SF Business Registry Lookup
Purpose
Look up businesses registered with the City & County of San Francisco by DBA name (doing-business-as / storefront name) or owner name (legal entity / ownership name) and return one row per registered location with: legal name, DBA, full address, NAICS code (+ description when available), current status (active / closed / administratively closed), and registration date. Read-only; backed by the public DataSF (Socrata) open-data catalog. Authoritative source for SF Business Account Number (BAN), location id, and tax-roll-level location history. Returns Zod-validatable JSON.
When to Use
- "Is
<biz>registered to operate in San Francisco?" - Look up the BAN (Business Account Number) and TTX Location ID for a known DBA.
- Enumerate every storefront location for a corporate parent (e.g. all Blue Bottle locations).
- Pull NAICS classification and registration dates for due-diligence / KYB / underwriting.
- Get a recent-closures list for a neighborhood or NAICS sector.
Workflow
The SF business registry is published as a Socrata Open Data API (SODA) dataset at g8m3-pdis (canonical name: Registered Business Locations - San Francisco). The dataset is unauthenticated, CORS-open, no proxies needed, and updated daily — the SODA API beats every browser path by ~100× on cost and is the recommended method. The DataSF "Explore / Filter" web UI and the published "Registered Business Lookup" Story (k6sk-2y6w) are layered on top of the same SODA endpoint — driving them with a Stagehand CUA agent is a valid fallback, but it's strictly slower and adds JS-render flakiness for zero new data. Lead with SODA; reach for the browser only if the SODA endpoint is unreachable from the caller's network.
1. Build the SoQL query
Endpoint: https://data.sfgov.org/resource/g8m3-pdis.json
Three filter strategies, in order of precision:
| Strategy | When to use | Param |
|---|---|---|
$where upper(dba_name) like '%<TERM>%' | User gave a storefront / DBA name | exact substring, case-insensitive |
$where upper(ownership_name) like '%<TERM>%' | User gave a legal entity / owner | exact substring, case-insensitive |
$q=<TERM> | User gave a fuzzy / multi-token query, or a term containing apostrophes (Joe's) | Socrata full-text index across all text columns |
Always wrap user input in upper(...) + like '%...%' for predictable case-insensitive substring matching — the columns are mixed-case in storage (Blue Bottle Coffee, BLUE BOTTLE COFFEE INC, etc.). Use $q when the term contains a ' apostrophe to avoid SoQL syntax errors (see gotchas).
Project only the columns you need with $select — the table has ~38 columns and full rows are wasteful:
$select = uniqueid, certificate_number, ttxid,
ownership_name, dba_name,
full_business_address, city, state, business_zip,
dba_start_date, dba_end_date,
location_start_date, location_end_date,
administratively_closed,
naic_code, naic_code_description,
neighborhoods_analysis_boundaries, supervisor_district,
location
Recommended sort + page size:
$order = location_start_date DESC, uniqueid ASC
$limit = 100 # default 1000 max, but keep small
$offset = 0
uniqueid as a tiebreaker on $order is mandatory for pagination — location_start_date collides for bulk-registered chains (see gotchas).
2. Fetch
URL='https://data.sfgov.org/resource/g8m3-pdis.json'\
'?%24where=upper(dba_name)%20like%20%27%25BLUE%20BOTTLE%25%27'\
'&%24select=uniqueid,certificate_number,ttxid,ownership_name,dba_name,'\
'full_business_address,city,state,business_zip,'\
'dba_start_date,dba_end_date,location_start_date,location_end_date,'\
'administratively_closed,naic_code,naic_code_description'\
'&%24order=location_start_date%20DESC,uniqueid%20ASC'\
'&%24limit=100'
curl -fsS "$URL" # or browse cloud fetch "$URL"
URL-encode $ as %24, single-quotes as %27, % as %25, spaces as %20. Do not use + for spaces inside like '%...%' patterns — Socrata silently fails some queries when + appears mid-literal. Use %20.
3. Derive status (the dataset has no single "status" column)
The status of each row is a function of three columns:
def status(row):
end = row.get("location_end_date") or row.get("dba_end_date")
if row.get("administratively_closed"):
return "administratively_closed" # closed by the Treasurer-Tax Collector, not the owner
if end:
return "closed" # owner-reported closure
return "active" # both end-date columns NULL → currently registered
Always check location_end_date first (per-location closure) before dba_end_date (entire DBA wind-down). administratively_closed is rare (≈ 0.x% of rows) but takes precedence when present — it means the Treasurer's office closed the account for non-payment / no-response, not the owner.
4. Validate with Zod
const Row = z.object({
uniqueid: z.string(),
certificate_number: z.string(), // BAN
ttxid: z.string(), // TTX Location ID
ownership_name: z.string(), // legal name
dba_name: z.string(),
full_business_address: z.string().optional(),
city: z.string().optional(),
state: z.string().optional(),
business_zip: z.string().optional(),
dba_start_date: z.string().datetime({ offset: false }).optional(),
dba_end_date: z.string().datetime({ offset: false }).optional(),
location_start_date: z.string().datetime({ offset: false }).optional(),
location_end_date: z.string().datetime({ offset: false }).optional(),
administratively_closed: z.string().optional(),
naic_code: z.string().optional(),
naic_code_description: z.string().optional(),
});
Mark every field except uniqueid, certificate_number, ttxid, ownership_name, and dba_name as .optional(). The SODA endpoint omits empty fields from the JSON entirely (no nulls, no empty strings) — strict schemas without .optional() will reject most rows.
5. (Optional) Paginate
If len(results) === $limit you probably have more. Re-issue with $offset += $limit until you get a short page. Maintain the secondary sort key uniqueid ASC on every page or you'll see duplicates.
Browser fallback (Stagehand CUA agent on the Explore UI)
If the SODA API is unreachable, drive the DataSF Explore/Filter UI. The dataset's filter page is:
https://data.sfgov.org/Economy-and-Community/Registered-Business-Locations-San-Francisco/g8m3-pdis/explore/query/<URL-encoded SoQL>/page/filter
You can pre-bake the entire SoQL query in the URL, navigate, wait for the grid to render, and read the rendered rows directly — Stagehand's extract over the table renders cleanly. Steps:
- URL-encode a
SELECT ... WHERE upper(\dba_name`) like '%TERM%'SoQL string and embed it in the/explore/query/<encoded>/page/filter` path. browse openthe URL with a residential-proxy session (no--verifiedneeded — DataSF is not anti-bot).browse wait load, thenwait timeout 4000for the grid to hydrate.browse snapshotand extract via Stagehand'sagentorextract(the grid has stable[role="row"]semantics).- Map the extracted strings into the same Zod schema as in step 4 above.
This path costs ~10–15 turns per query vs. 1 HTTP request for the SODA path. Use only when SODA is blocked.
Site-Specific Gotchas
naic_code_descriptionis sparsely populated for modern codes. The dataset contains two eras of NAICS coding: older rows have 4-digit range codes (e.g.naic_code: "2300-2399") withnaic_code_description: "Construction"populated; newer rows have specific 6-digit codes (e.g.naic_code: "722515"for snack/non-alcoholic-bev bars) withnaic_code_descriptionblank. If you need the description for a 6-digit code, look it up out-of-band against the US Census NAICS reference — do not assume the dataset will supply it.naics_code_descriptions_listis a separate denormalized "all descriptions concatenated" column that can also be empty for newer registrations.- Status must be derived, not read. There is no
statuscolumn.location_end_dateis the per-location wind-down (most useful for "is this storefront still open?");dba_end_dateis the DBA-level wind-down;administratively_closedis a separate text column populated when the Treasurer-Tax Collector closes the account for non-compliance. A row is "active" only when all three are NULL/absent. - Two distinct registration dates exist —
dba_start_date(when this ownership entity first registered the DBA, often years before opening this location) andlocation_start_date(when this specific storefront opened). The Blue Bottle Coffee Inc parent entity'sdba_start_dateis2015-11-20for every location, but individual storefrontlocation_start_dates range from 2014 to 2025. Surface both in your output; picklocation_start_dateas the canonical "registration date for this location" if forced to choose one. - Apostrophes in search terms break the
$where likepath. A$where upper(dba_name) like '%JOE'S%'query (URL-encoded with%27%27to escape the apostrophe) returns 500 Internal Server Error from Socrata's parser even though the doubled-quote escape is documented SoQL syntax. Workaround: strip apostrophes from the input before constructing thelikepattern ("Joe's" → "JOE"matches "Joe's Pizza" correctly becauselike '%JOE%'is a substring match), or use the$q=<term>full-text index which handles apostrophes transparently. - Pagination requires a secondary sort key. Sorting by
location_start_date DESCalone produces duplicate rows across pages because chain registrations (e.g. "Ba @ Google ..." in our sample data) share identicallocation_start_datevalues. Always append, uniqueid ASCto$order. Confirmed dup-on-page-boundary behavior 2026-05-21. - The SODA endpoint omits empty fields entirely. A row with no
dba_end_datewill not have the key in the JSON — there is no"dba_end_date": null, the key is simply absent. Zod schemas must use.optional(), not.nullable(), on every nullable column. The presence/absence of the key is the actual signal. naic_codeis a string, not a number. Both modern codes ("722515") and legacy range codes ("2300-2399") ship as JSON strings. Don'tparseInt— the range codes will silently become2300and lose meaning.- Some aggregate queries return Socrata internal-error 500s. Naked
count(*)over the full 361k-row dataset combined withIS NOT NULLfilters has been observed to returnerrorCode: internal-error(Socrata-side timeout/limit). Plain per-DBA filtered queries work reliably. Don't build the skill on top of unbounded aggregates. - Don't use
data.sfgov.org/api/views/...for filtered queries. That endpoint returns the dataset metadata (schema, descriptions, owner). The data itself lives atdata.sfgov.org/resource/<id>.jsonwith SoQL params. Confusing these two paths is the most common mistake when working with Socrata domains. - The "Active Business Locations" filtered view (
kvj8-g7jh) is not strictly active-only. It still returns rows with non-nulllocation_end_datefor some recently-closed locations. If you need "active only", filter on the baseg8m3-pdisdataset with your own status logic from step 3 — don't trust the view's name. - No anti-bot, no auth, no rate-limit shaping observed. DataSF (Socrata) is a true public-data endpoint. A residential proxy is not required for either the SODA API or the browser fallback. Anthropic / SOC-2 caveats:
data.sfgov.orgis hosted inaws-us-east-1-fedramp-prod; outbound from non-US sources is fine but logged. - Daily refresh, not real-time. Each row carries
data_as_ofanddata_loaded_attimestamps — surface these (or at leastdata_as_of) in your output if downstream consumers care about freshness. A business that registered yesterday may not appear until the next-morning load.
Expected Output
Three distinct outcome shapes. The schema is the same for owner-name and DBA-name searches; only the search_field discriminator changes.
// 1. Hit — one or more matching registrations
{
"search_field": "dba_name",
"search_term": "Blue Bottle",
"data_as_of": "2026-05-21",
"total_results": 19,
"results": [
{
"uniqueid": "1396358-08-251-1021127",
"business_account_number": "1021127",
"location_id": "1396358-08-251",
"legal_name": "Blue Bottle Coffee Inc",
"dba_name": "Blue Bottle Coffee",
"address": {
"street": "300 Toni Stone Xing Ste E",
"city": "San Francisco",
"state": "CA",
"zip": "94158"
},
"naics": {
"code": "722515",
"description": null
},
"status": "active",
"registration_date": {
"dba_start_date": "2015-11-20",
"location_start_date": "2025-08-05",
"location_end_date": null
},
"neighborhood": "Mission Bay",
"supervisor_district": "6",
"location": { "lat": 37.775547, "lon": -122.3887815 }
}
]
}
// 2. Hit — historical / closed location
{
"search_field": "dba_name",
"search_term": "Blue Bottle",
"results": [
{
"uniqueid": "...",
"legal_name": "Blue Bottle Coffee Inc",
"dba_name": "Blue Bottle Coffee",
"address": { "street": "66 Mint St", "city": "San Francisco", "state": "CA", "zip": "94103" },
"naics": { "code": "722515", "description": null },
"status": "closed",
"registration_date": {
"dba_start_date": "2015-11-20",
"location_start_date": "2023-01-01",
"location_end_date": "2023-04-14"
}
}
]
}
// 3. No match — term did not surface any registered business
{
"search_field": "dba_name",
"search_term": "ZZZ Nonexistent Biz",
"data_as_of": "2026-05-21",
"total_results": 0,
"results": []
}
A fourth shape — administratively closed — uses the same envelope as shape 1/2 with "status": "administratively_closed" and may have location_end_date: null (the TTX office closed the account without a wind-down date). Treat it as a closed record for most downstream purposes.