data.sfgov.org

sf-business-registry-lookup

Installation

Adds this website's skill for your agents

 

Summary

Search the San Francisco registered-business dataset on DataSF by DBA name or owner name and return legal name, DBA, address, NAICS code, status, and registration date per location. Read-only, Zod-validatable.

FIG. 01
FIG. 02
FIG. 03
FIG. 04
SKILL.md
258 lines

SF Business Registry Lookup

Purpose

Look up businesses registered with the City & County of San Francisco by DBA name (doing-business-as / storefront name) or owner name (legal entity / ownership name) and return one row per registered location with: legal name, DBA, full address, NAICS code (+ description when available), current status (active / closed / administratively closed), and registration date. Read-only; backed by the public DataSF (Socrata) open-data catalog. Authoritative source for SF Business Account Number (BAN), location id, and tax-roll-level location history. Returns Zod-validatable JSON.

When to Use

  • "Is <biz> registered to operate in San Francisco?"
  • Look up the BAN (Business Account Number) and TTX Location ID for a known DBA.
  • Enumerate every storefront location for a corporate parent (e.g. all Blue Bottle locations).
  • Pull NAICS classification and registration dates for due-diligence / KYB / underwriting.
  • Get a recent-closures list for a neighborhood or NAICS sector.

Workflow

The SF business registry is published as a Socrata Open Data API (SODA) dataset at g8m3-pdis (canonical name: Registered Business Locations - San Francisco). The dataset is unauthenticated, CORS-open, no proxies needed, and updated daily — the SODA API beats every browser path by ~100× on cost and is the recommended method. The DataSF "Explore / Filter" web UI and the published "Registered Business Lookup" Story (k6sk-2y6w) are layered on top of the same SODA endpoint — driving them with a Stagehand CUA agent is a valid fallback, but it's strictly slower and adds JS-render flakiness for zero new data. Lead with SODA; reach for the browser only if the SODA endpoint is unreachable from the caller's network.

1. Build the SoQL query

Endpoint: https://data.sfgov.org/resource/g8m3-pdis.json

Three filter strategies, in order of precision:

StrategyWhen to useParam
$where upper(dba_name) like '%<TERM>%'User gave a storefront / DBA nameexact substring, case-insensitive
$where upper(ownership_name) like '%<TERM>%'User gave a legal entity / ownerexact substring, case-insensitive
$q=<TERM>User gave a fuzzy / multi-token query, or a term containing apostrophes (Joe's)Socrata full-text index across all text columns

Always wrap user input in upper(...) + like '%...%' for predictable case-insensitive substring matching — the columns are mixed-case in storage (Blue Bottle Coffee, BLUE BOTTLE COFFEE INC, etc.). Use $q when the term contains a ' apostrophe to avoid SoQL syntax errors (see gotchas).

Project only the columns you need with $select — the table has ~38 columns and full rows are wasteful:

$select = uniqueid, certificate_number, ttxid,
          ownership_name, dba_name,
          full_business_address, city, state, business_zip,
          dba_start_date, dba_end_date,
          location_start_date, location_end_date,
          administratively_closed,
          naic_code, naic_code_description,
          neighborhoods_analysis_boundaries, supervisor_district,
          location

Recommended sort + page size:

$order = location_start_date DESC, uniqueid ASC
$limit = 100      # default 1000 max, but keep small
$offset = 0

uniqueid as a tiebreaker on $order is mandatory for paginationlocation_start_date collides for bulk-registered chains (see gotchas).

2. Fetch

URL='https://data.sfgov.org/resource/g8m3-pdis.json'\
'?%24where=upper(dba_name)%20like%20%27%25BLUE%20BOTTLE%25%27'\
'&%24select=uniqueid,certificate_number,ttxid,ownership_name,dba_name,'\
'full_business_address,city,state,business_zip,'\
'dba_start_date,dba_end_date,location_start_date,location_end_date,'\
'administratively_closed,naic_code,naic_code_description'\
'&%24order=location_start_date%20DESC,uniqueid%20ASC'\
'&%24limit=100'

curl -fsS "$URL"        # or browse cloud fetch "$URL"

URL-encode $ as %24, single-quotes as %27, % as %25, spaces as %20. Do not use + for spaces inside like '%...%' patterns — Socrata silently fails some queries when + appears mid-literal. Use %20.

3. Derive status (the dataset has no single "status" column)

The status of each row is a function of three columns:

def status(row):
    end = row.get("location_end_date") or row.get("dba_end_date")
    if row.get("administratively_closed"):
        return "administratively_closed"   # closed by the Treasurer-Tax Collector, not the owner
    if end:
        return "closed"                    # owner-reported closure
    return "active"                        # both end-date columns NULL → currently registered

Always check location_end_date first (per-location closure) before dba_end_date (entire DBA wind-down). administratively_closed is rare (≈ 0.x% of rows) but takes precedence when present — it means the Treasurer's office closed the account for non-payment / no-response, not the owner.

4. Validate with Zod

const Row = z.object({
  uniqueid: z.string(),
  certificate_number: z.string(),           // BAN
  ttxid: z.string(),                        // TTX Location ID
  ownership_name: z.string(),               // legal name
  dba_name: z.string(),
  full_business_address: z.string().optional(),
  city: z.string().optional(),
  state: z.string().optional(),
  business_zip: z.string().optional(),
  dba_start_date: z.string().datetime({ offset: false }).optional(),
  dba_end_date: z.string().datetime({ offset: false }).optional(),
  location_start_date: z.string().datetime({ offset: false }).optional(),
  location_end_date: z.string().datetime({ offset: false }).optional(),
  administratively_closed: z.string().optional(),
  naic_code: z.string().optional(),
  naic_code_description: z.string().optional(),
});

Mark every field except uniqueid, certificate_number, ttxid, ownership_name, and dba_name as .optional(). The SODA endpoint omits empty fields from the JSON entirely (no nulls, no empty strings) — strict schemas without .optional() will reject most rows.

5. (Optional) Paginate

If len(results) === $limit you probably have more. Re-issue with $offset += $limit until you get a short page. Maintain the secondary sort key uniqueid ASC on every page or you'll see duplicates.

Browser fallback (Stagehand CUA agent on the Explore UI)

If the SODA API is unreachable, drive the DataSF Explore/Filter UI. The dataset's filter page is:

https://data.sfgov.org/Economy-and-Community/Registered-Business-Locations-San-Francisco/g8m3-pdis/explore/query/<URL-encoded SoQL>/page/filter

You can pre-bake the entire SoQL query in the URL, navigate, wait for the grid to render, and read the rendered rows directly — Stagehand's extract over the table renders cleanly. Steps:

  1. URL-encode a SELECT ... WHERE upper(\dba_name`) like '%TERM%'SoQL string and embed it in the/explore/query/<encoded>/page/filter` path.
  2. browse open the URL with a residential-proxy session (no --verified needed — DataSF is not anti-bot).
  3. browse wait load, then wait timeout 4000 for the grid to hydrate.
  4. browse snapshot and extract via Stagehand's agent or extract (the grid has stable [role="row"] semantics).
  5. Map the extracted strings into the same Zod schema as in step 4 above.

This path costs ~10–15 turns per query vs. 1 HTTP request for the SODA path. Use only when SODA is blocked.

Site-Specific Gotchas

  • naic_code_description is sparsely populated for modern codes. The dataset contains two eras of NAICS coding: older rows have 4-digit range codes (e.g. naic_code: "2300-2399") with naic_code_description: "Construction" populated; newer rows have specific 6-digit codes (e.g. naic_code: "722515" for snack/non-alcoholic-bev bars) with naic_code_description blank. If you need the description for a 6-digit code, look it up out-of-band against the US Census NAICS reference — do not assume the dataset will supply it. naics_code_descriptions_list is a separate denormalized "all descriptions concatenated" column that can also be empty for newer registrations.
  • Status must be derived, not read. There is no status column. location_end_date is the per-location wind-down (most useful for "is this storefront still open?"); dba_end_date is the DBA-level wind-down; administratively_closed is a separate text column populated when the Treasurer-Tax Collector closes the account for non-compliance. A row is "active" only when all three are NULL/absent.
  • Two distinct registration dates existdba_start_date (when this ownership entity first registered the DBA, often years before opening this location) and location_start_date (when this specific storefront opened). The Blue Bottle Coffee Inc parent entity's dba_start_date is 2015-11-20 for every location, but individual storefront location_start_dates range from 2014 to 2025. Surface both in your output; pick location_start_date as the canonical "registration date for this location" if forced to choose one.
  • Apostrophes in search terms break the $where like path. A $where upper(dba_name) like '%JOE'S%' query (URL-encoded with %27%27 to escape the apostrophe) returns 500 Internal Server Error from Socrata's parser even though the doubled-quote escape is documented SoQL syntax. Workaround: strip apostrophes from the input before constructing the like pattern ("Joe's" → "JOE" matches "Joe's Pizza" correctly because like '%JOE%' is a substring match), or use the $q=<term> full-text index which handles apostrophes transparently.
  • Pagination requires a secondary sort key. Sorting by location_start_date DESC alone produces duplicate rows across pages because chain registrations (e.g. "Ba @ Google ..." in our sample data) share identical location_start_date values. Always append , uniqueid ASC to $order. Confirmed dup-on-page-boundary behavior 2026-05-21.
  • The SODA endpoint omits empty fields entirely. A row with no dba_end_date will not have the key in the JSON — there is no "dba_end_date": null, the key is simply absent. Zod schemas must use .optional(), not .nullable(), on every nullable column. The presence/absence of the key is the actual signal.
  • naic_code is a string, not a number. Both modern codes ("722515") and legacy range codes ("2300-2399") ship as JSON strings. Don't parseInt — the range codes will silently become 2300 and lose meaning.
  • Some aggregate queries return Socrata internal-error 500s. Naked count(*) over the full 361k-row dataset combined with IS NOT NULL filters has been observed to return errorCode: internal-error (Socrata-side timeout/limit). Plain per-DBA filtered queries work reliably. Don't build the skill on top of unbounded aggregates.
  • Don't use data.sfgov.org/api/views/... for filtered queries. That endpoint returns the dataset metadata (schema, descriptions, owner). The data itself lives at data.sfgov.org/resource/<id>.json with SoQL params. Confusing these two paths is the most common mistake when working with Socrata domains.
  • The "Active Business Locations" filtered view (kvj8-g7jh) is not strictly active-only. It still returns rows with non-null location_end_date for some recently-closed locations. If you need "active only", filter on the base g8m3-pdis dataset with your own status logic from step 3 — don't trust the view's name.
  • No anti-bot, no auth, no rate-limit shaping observed. DataSF (Socrata) is a true public-data endpoint. A residential proxy is not required for either the SODA API or the browser fallback. Anthropic / SOC-2 caveats: data.sfgov.org is hosted in aws-us-east-1-fedramp-prod; outbound from non-US sources is fine but logged.
  • Daily refresh, not real-time. Each row carries data_as_of and data_loaded_at timestamps — surface these (or at least data_as_of) in your output if downstream consumers care about freshness. A business that registered yesterday may not appear until the next-morning load.

Expected Output

Three distinct outcome shapes. The schema is the same for owner-name and DBA-name searches; only the search_field discriminator changes.

// 1. Hit — one or more matching registrations
{
  "search_field": "dba_name",
  "search_term": "Blue Bottle",
  "data_as_of": "2026-05-21",
  "total_results": 19,
  "results": [
    {
      "uniqueid": "1396358-08-251-1021127",
      "business_account_number": "1021127",
      "location_id": "1396358-08-251",
      "legal_name": "Blue Bottle Coffee Inc",
      "dba_name": "Blue Bottle Coffee",
      "address": {
        "street": "300 Toni Stone Xing Ste E",
        "city": "San Francisco",
        "state": "CA",
        "zip": "94158"
      },
      "naics": {
        "code": "722515",
        "description": null
      },
      "status": "active",
      "registration_date": {
        "dba_start_date": "2015-11-20",
        "location_start_date": "2025-08-05",
        "location_end_date": null
      },
      "neighborhood": "Mission Bay",
      "supervisor_district": "6",
      "location": { "lat": 37.775547, "lon": -122.3887815 }
    }
  ]
}
// 2. Hit — historical / closed location
{
  "search_field": "dba_name",
  "search_term": "Blue Bottle",
  "results": [
    {
      "uniqueid": "...",
      "legal_name": "Blue Bottle Coffee Inc",
      "dba_name": "Blue Bottle Coffee",
      "address": { "street": "66 Mint St", "city": "San Francisco", "state": "CA", "zip": "94103" },
      "naics": { "code": "722515", "description": null },
      "status": "closed",
      "registration_date": {
        "dba_start_date": "2015-11-20",
        "location_start_date": "2023-01-01",
        "location_end_date": "2023-04-14"
      }
    }
  ]
}
// 3. No match — term did not surface any registered business
{
  "search_field": "dba_name",
  "search_term": "ZZZ Nonexistent Biz",
  "data_as_of": "2026-05-21",
  "total_results": 0,
  "results": []
}

A fourth shape — administratively closed — uses the same envelope as shape 1/2 with "status": "administratively_closed" and may have location_end_date: null (the TTX office closed the account without a wind-down date). Treat it as a closed record for most downstream purposes.