zillow.com

extract-listings

Summary

Extract Zillow for-sale listings matching a complex multi-dimensional filter (price, beds/baths, sqft, lot, year built, property type, listing status, days-on-market, HOA, monthly payment, home features). Constructs the filtered SRP URL via Zillow's searchQueryState parameter, fetches via Browserbase Fetch API (bypasses PerimeterX), and parses __NEXT_DATA__ for structured listings + region totals + pagination. Read-only.

Zillow Extract Filtered Listings

Purpose

Given a location (city + state, ZIP, neighborhood, full Zillow URL, or free-form region) and a multi-dimensional filter spec (price, beds, baths, sqft, lot size, year built, days-on-market, property type, listing status, HOA, monthly payment, home features), return the matching active for-sale listings as structured JSON — including zpid, formatted + raw price, beds, baths, interior sqft, lot size with unit, full address, property type, listing status, days on Zillow, Zestimate (when present), HOA (when present), monthly-payment estimate (when present), primary photo URL, and the canonical detail URL — plus region-wide totalResultCount, totalPages, and the exact searchQueryState URL used. Read-only — never click Save, Tour, Contact Agent, or any mutation control.

When to Use

  • A user asks for "homes for sale in {region} with {filters}".
  • An agent needs comparable for-sale comps for a property.
  • Daily/hourly monitoring of new listings matching a complex filter (e.g., "single-family or townhouse, $400k–$750k, 3+ beds, listed in last 30 days, with garage and A/C").
  • Any workflow that previously scraped Zillow's HTML — the __NEXT_DATA__ blob is faster, structurally richer, and (via browse cloud fetch --proxies) bypasses Zillow's anti-bot wall that hard-blocks scripted browsers.

Workflow

Zillow's SRP is a Next.js app that server-renders the full search state into a <script id="__NEXT_DATA__"> JSON blob on the initial HTML response. The canonical filter representation is the searchQueryState URL query parameter (URL-encoded JSON). Constructing a filtered URL upfront and reading the response's __NEXT_DATA__ is the right path — fetching unfiltered results and post-filtering client-side is wasteful and lossy.

The optimal transport is browse cloud fetch --proxies (Browserbase's lightweight HTTP API path), NOT a scripted browser session. Zillow fronts the site with PerimeterX / HUMAN's "Press & Hold" bot defense, which fires a hard JS-challenge modal on the very first navigation from a Browserbase-fingerprinted Chromium. Even a bare https://www.zillow.com/austin-tx/ gets blocked across regions (us-east-1, us-west-2, eu-central-1) and across --verified / --proxies / --solve-captchas flag combinations. The Fetch API path uses a different HTTP stack + residential-proxy pool that PerimeterX serves freely (verified 200 OK on identical URLs where the browser path got 403 in the same minute).

1. Resolve location → Zillow SEO slug

Map free-form input to one of Zillow's path-based region slugs. The slug locks the region; filters layer on top via searchQueryState. Use the slug WITHOUT filters first if the region resolution looks ambiguous — it's idempotent.

| Input shape | Slug pattern | Example |
| --- | --- | --- |
| City + state | /{city-lowercased-hyphenated}-{state-2letter-lc}/ | Austin, TX → /austin-tx/ |
| ZIP code | /homes/{zip}/ | 30307 → /homes/30307/ |
| Neighborhood | /{nbhd-slug}-{city-slug}-{state}/ | Mission, San Francisco → /mission-san-francisco-ca/ |
| Full Zillow URL | Use as-is | https://www.zillow.com/austin-tx/houses/ |
| Free-form region | Fall back to homepage search resolver (see Gotchas) | "South Bay Area" → needs lookup |
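The slug patterns in the table can be sketched as a small helper. This is a minimal illustration, not Zillow's own logic: the function names (slugify, locationToSlug) and the input object shape ({city, state, zip, neighborhood, url}) are hypothetical, and free-form regions are deliberately not handled.

```javascript
// Hypothetical helper: lowercases text and joins words with hyphens.
function slugify(text) {
  return text
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse spaces and punctuation into single hyphens
    .replace(/^-+|-+$/g, "");    // drop leading and trailing hyphens
}

// Hypothetical helper: maps a structured location to a path slug
// following the patterns listed in the table.
function locationToSlug(loc) {
  if (loc.url) return new URL(loc.url).pathname; // full Zillow URL: use its path as-is
  if (loc.zip) return `/homes/${loc.zip}/`;
  if (!loc.city || !loc.state) {
    throw new Error("free-form regions need a lookup; city + state, zip, or url required");
  }
  const state = loc.state.trim().toLowerCase();
  if (loc.neighborhood) {
    return `/${slugify(loc.neighborhood)}-${slugify(loc.city)}-${state}/`;
  }
  return `/${slugify(loc.city)}-${state}/`;
}
```

For example, locationToSlug({city: "Austin", state: "TX"}) yields /austin-tx/. A slug produced this way is only a guess until the response's regionSelection echo confirms it.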

After fetching, verify searchPageState.queryState.regionSelection in the response — it returns [{regionId: N, regionType: T}] where:

  • regionType: 4 — state
  • regionType: 6 — city
  • regionType: 7 — ZIP
  • regionType: 8 — neighborhood

If the region didn't resolve as expected (e.g., 404 or wrong city), fall back to fetching the homepage and using the search-as-you-type endpoint (see Gotchas).
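A minimal sketch of that verification step, assuming the echoed regionSelection has the [{regionId, regionType}] shape described above. The helper name checkRegion and its return shape are hypothetical.

```javascript
// regionType codes as listed in this document.
const REGION_TYPES = { 4: "state", 6: "city", 7: "zip", 8: "neighborhood" };

// Hypothetical helper: compares the echoed region against the kind the caller expected.
function checkRegion(regionSelection, expectedKind) {
  const first = Array.isArray(regionSelection) ? regionSelection[0] : undefined;
  if (!first) return { ok: false, reason: "no_region_echoed" };
  const kind = REGION_TYPES[first.regionType] ?? "unknown";
  return { ok: kind === expectedKind, kind, regionId: first.regionId };
}
```

A caller would pass sps.queryState.regionSelection from the parsed response and treat ok: false as the trigger for the fallback path.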

2. Build searchQueryState with the full filter surface

searchQueryState is a URL-encoded JSON object. Minimum viable shape:

{
  "pagination": {"currentPage": 1},
  "isMapVisible": false,
  "isListVisible": true,
  "regionSelection": [{"regionId": 10221, "regionType": 6}],
  "filterState": { /* see below */ }
}

mapBounds is optional — Zillow fills it server-side from the region. Omit unless you specifically need to scope to a sub-bbox of the region.

filterState schema — these keys are the long-form id values (NOT the URL-bar shortIds like con/sf). All 110 keys come from searchPageState.filterDefinitions in any SRP response — print that map once per Zillow build to verify.

Property type (Boolean; default = true for ALL → everything included)

To narrow to a subset, explicitly set the unwanted types to false. Setting the wanted type to true alone does nothing because they're all-on by default.

| Filter id | shortId | Label |
| --- | --- | --- |
| isSingleFamily | sf | Houses |
| isCondo | con | Condos/Co-ops |
| isTownhouse | tow | Townhomes |
| isMultiFamily | mf | Multi-family |
| isApartment | apa | Apartments |
| isApartmentOrCondo | apco | Apt-or-condo composite; set false when narrowing away from condos |
| isManufactured | manu | Manufactured |
| isLotLand | land | Lots/Land |

"filterState": {
  "isCondo": {"value": false},
  "isMultiFamily": {"value": false},
  "isApartment": {"value": false},
  "isManufactured": {"value": false},
  "isLotLand": {"value": false},
  "isApartmentOrCondo": {"value": false}
  // isSingleFamily + isTownhouse stay default true → houses + townhouses only
}
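Because every type is on by default, narrowing means inverting the caller's wish list. The sketch below shows one way to do that; the helper name narrowPropertyTypes is hypothetical, and the rule for when to disable the isApartmentOrCondo composite (only when neither condos nor apartments are wanted) is an assumption, not something Zillow documents.

```javascript
// The individual property-type filter ids listed in the table.
const PROPERTY_TYPE_KEYS = [
  "isSingleFamily", "isCondo", "isTownhouse", "isMultiFamily",
  "isApartment", "isManufactured", "isLotLand",
];

// Hypothetical helper: returns the filterState entries that switch OFF
// every property type the caller did not ask for.
function narrowPropertyTypes(wanted) {
  const keep = new Set(wanted);
  const filterState = {};
  for (const key of PROPERTY_TYPE_KEYS) {
    if (!keep.has(key)) filterState[key] = { value: false };
  }
  // Assumption: the composite is disabled only when neither condos nor apartments are wanted.
  if (!keep.has("isCondo") && !keep.has("isApartment")) {
    filterState.isApartmentOrCondo = { value: false };
  }
  return filterState;
}
```

Calling narrowPropertyTypes(["isSingleFamily", "isTownhouse"]) produces the same six false entries as the houses + townhouses example.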

Listing status (Boolean; defaults vary)

| Filter id | shortId | Default | Label |
| --- | --- | --- | --- |
| isForSaleByAgent | fsba | true | Agent listed |
| isForSaleByOwner | fsbo | true | Owner posted |
| isNewConstruction | nc | true | New construction |
| isComingSoon | cmsn | true | Coming soon |
| isAuction | auc | true | Auctions |
| isForSaleForeclosure | fore | true | Foreclosures |
| isPreMarketForeclosure | pmf | false | Foreclosed (pre-market) |
| isPreMarketPreForeclosure | pf | false | Pre-foreclosures |
| isRecentlySold | rs | false | Recently sold |
| isPendingListingsSelected | pnd | false | Pending & under contract |
| isAcceptingBackupOffersSelected | abo | false | Accepting backup offers |
| isOpenHousesOnly | open | false | Open-house only |

"For sale" is the default — you only need to flip statuses when filtering to something non-default. Example: sold listings only → set isForSaleByAgent: false, isForSaleByOwner: false, isNewConstruction: false, isAuction: false, isForSaleForeclosure: false, isComingSoon: false, isRecentlySold: true.

Range filters ({min, max} — both nullable)

| Filter id | shortId | Unit |
| --- | --- | --- |
| price | (none; uses full id) | USD |
| monthlyPayment | mp | USD/month; mortgage + tax + HOA estimate |
| beds | (none) | integer |
| baths | (none) | number (half-baths allowed → 1.5, 2.5) |
| sqft | (none) | interior living area |
| lotSize | lot | sqft by default; {min, max, units: "acre"} for acres |
| built | (none) | year |
| hoa | (none) | max USD/month (set max, leave min: null) |
| parkingSpots | parks | {min} only |

"price":  {"min": 400000, "max": 750000},
"beds":   {"min": 3},
"baths":  {"min": 2},
"sqft":   {"min": 1500},
"built":  {"min": 1990},
"hoa":    {"max": 200},
"lotSize":{"min": 5000, "max": 20000}

Enum / String filters

| Filter id | shortId | Type | Values |
| --- | --- | --- | --- |
| doz | (none) | Enum | "1", "7", "14", "30", "90", "6m", "12m", "24m", "36m", "any" |
| sortSelection | sort | Enum | "globalrelevanceex" (Homes for You, default on SRP), "days" (newest), "priced" (low→high), "pricea" (high→low), "beds", "baths", "size", "lot", "built", "saved", "listingstatus", "featured", "paymentaa" |
| keywords | att | String | free text; searches listing descriptions |
| ageRestricted55Plus | 55plus | Enum | "i" (include), "e" (exclude), "o" (only) |
| dataSourceSelection | dsrc | String | "all" (default) |
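Since doz accepts only a fixed set of string values, it can help to validate it before building the URL rather than discover a silently ignored filter in the echo. A minimal sketch, with the hypothetical helper name dozFilter:

```javascript
// Accepted doz values as listed in the table.
const DOZ_VALUES = ["1", "7", "14", "30", "90", "6m", "12m", "24m", "36m", "any"];

// Hypothetical helper: builds the doz filterState entry, rejecting unsupported windows.
function dozFilter(value) {
  const v = String(value); // the filter expects a string, so 30 becomes "30"
  if (!DOZ_VALUES.includes(v)) {
    throw new RangeError(`unsupported doz value "${v}"; expected one of ${DOZ_VALUES.join(", ")}`);
  }
  return { doz: { value: v } };
}
```

The result can be spread into filterState, for example {...dozFilter(30)}.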

Home-feature toggles (Boolean; default = false → set true to require)

| Filter id | shortId | Label |
| --- | --- | --- |
| hasGarage | gar | Must have garage |
| parkingSpots (Range) | parks | Parking spots {min} |
| hasAirConditioning | ac | Must have A/C |
| hasPool | pool | Must have pool |
| isWaterfront | wat | Waterfront |
| singleStory | sto | Single-story only |
| hasBasement | hbas | Has basement |
| isBasementFinished | basf | Finished basement |
| isBasementUnfinished | basu | Unfinished basement |
| hasDisabledAccess | disac | Accessible |
| isCityView / isMountainView / isParkView / isWaterView | cityv / mouv / parkv / watv | View attributes |
| is3dHome | 3d | Has 3D tour |
| onlyWithPhotos | (none) | Has photos |
| onlyPriceReduction | (none) | Price-reduced |
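The filter families above (ranges, enum, Boolean toggles) can be merged into one filterState object. The assembler below is a sketch only: the function name buildFilterState and its option names (ranges, mustHave, exclude, doz) are hypothetical, and it performs no validation of filter ids.

```javascript
// Hypothetical assembler for a filterState object.
function buildFilterState({ ranges = {}, mustHave = [], exclude = [], doz } = {}) {
  const filterState = {};
  for (const [key, range] of Object.entries(ranges)) {
    // Range filters take {min, max}; a missing bound is sent as null.
    filterState[key] = { min: range.min ?? null, max: range.max ?? null };
    if (range.units) filterState[key].units = range.units; // e.g. lotSize in acres
  }
  for (const key of mustHave) filterState[key] = { value: true };  // feature toggles default to false
  for (const key of exclude) filterState[key] = { value: false };  // property types default to true
  if (doz !== undefined) filterState.doz = { value: String(doz) };
  return filterState;
}

// Usage: $400k to $750k, 3+ beds, garage and A/C required, listed in the last 30 days.
const exampleFilterState = buildFilterState({
  ranges: { price: { min: 400000, max: 750000 }, beds: { min: 3 } },
  mustHave: ["hasGarage", "hasAirConditioning"],
  exclude: ["isCondo", "isApartment", "isApartmentOrCondo"],
  doz: 30,
});
```

Whatever is sent, the filterState echoed in the response remains the authority on what Zillow actually applied.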

HOA hide-toggle

Setting hoa.max keeps listings whose HOA is at or below that value. Listings with unknown HOA data stay in the results only while this toggle is true (the default; set it to false to hide them):

"includeHomesWithNoHoaData": {"value": true}   // default = true

If the user wants "no HOA fee at all", combine hoa: {max: 0} with includeHomesWithNoHoaData: {value: true} and explicitly check hdpData.homeInfo for hoaFee after extraction — see Gotchas.
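The post-extraction check could look like the sketch below. It assumes hoaFee, when present, sits at hdpData.homeInfo.hoaFee as this document describes; the helper name classifyHoa and its three labels are hypothetical.

```javascript
// Hypothetical helper: classifies a listResults[] entry by its HOA data.
function classifyHoa(listing) {
  const fee = listing?.hdpData?.homeInfo?.hoaFee;
  if (fee === undefined || fee === null) return "unknown"; // field absent at the SRP level
  return Number(fee) === 0 ? "none" : "has_fee";
}

// Usage: keep only listings that explicitly report a zero fee.
function onlyZeroHoa(listings) {
  return listings.filter((l) => classifyHoa(l) === "none");
}
```

Listings classified as "unknown" are the ones that would need a detail-page fetch before they can be confirmed as fee-free.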

3. Construct the URL

const sqs = {
  pagination: {currentPage: 1},
  isMapVisible: false,
  isListVisible: true,
  regionSelection: [{regionId: 10221, regionType: 6}],
  filterState: { /* keys above */ }
};
const url = `https://www.zillow.com/austin-tx/?searchQueryState=${encodeURIComponent(JSON.stringify(sqs))}`;

For page N > 1, BOTH set searchQueryState.pagination.currentPage = N AND insert /N_p/ into the path:

https://www.zillow.com/austin-tx/2_p/?searchQueryState=...   # page 2
https://www.zillow.com/austin-tx/3_p/?searchQueryState=...   # page 3

The pagination.nextUrl / pagination.previousUrl fields in the response (searchPageState.cat1.searchList.pagination) tell you the next path slug verbatim — use those when paginating.
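A sketch of keeping the path segment and the query state in step. The helper name buildPageUrl and its parameters are hypothetical; it assumes regionPath ends with a slash (for example "/austin-tx/") and does not mutate the state object passed in.

```javascript
// Hypothetical helper: builds the SRP URL for a given page, updating both
// pagination.currentPage and the /N_p/ path segment.
function buildPageUrl(regionPath, sqs, page) {
  const state = { ...sqs, pagination: { ...sqs.pagination, currentPage: page } };
  const path = page > 1 ? `${regionPath}${page}_p/` : regionPath; // page 1 has no /N_p/ segment
  const query = encodeURIComponent(JSON.stringify(state));
  return `https://www.zillow.com${path}?searchQueryState=${query}`;
}
```

When the previous response supplies pagination.nextUrl, prefer that path over the constructed one and keep this helper for building the first request.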

4. Fetch via Browserbase Fetch API (not a browser session)

export BROWSERBASE_API_KEY="$BB_API_KEY"
browse cloud fetch "$URL" --proxies --allow-redirects --output /tmp/srp.html

Verified 200 OK, ~960KB HTML, full __NEXT_DATA__ blob on the filtered URL. The browser path returns the PerimeterX "Press & Hold" challenge instead (see Gotchas — this is the dominant gotcha on Zillow).
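Before parsing, it may be worth checking whether the saved HTML is a real SRP or the challenge page. The heuristic below is an assumption built from the strings this document reports for the block page (the "Access to this page has been denied" title and the "Press & Hold" body text); the helper name classifyFetchResult and its labels are hypothetical.

```javascript
// Hypothetical heuristic: classifies fetched HTML as ok, blocked, or missing data.
function classifyFetchResult(html) {
  const titleMatch = html.match(/<title[^>]*>([\s\S]*?)<\/title>/i);
  const title = titleMatch ? titleMatch[1] : "";
  const blocked =
    /Access to this page has been denied/i.test(title) ||
    /Press (?:&|&amp;) Hold/.test(html); // body text may be HTML-escaped
  if (blocked) return "perimeterx_block";
  if (!html.includes('id="__NEXT_DATA__"')) return "no_next_data";
  return "ok";
}
```

Only an "ok" result should proceed to the parsing step; the other two map naturally onto the failure outcomes in the output schema.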

5. Parse __NEXT_DATA__

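const html = require("fs").readFileSync("/tmp/srp.html", "utf8"); // file written by the fetch in step 4
const m = html.match(/<script id="__NEXT_DATA__" type="application\/json">([\s\S]+?)<\/script>/);
if (!m) throw new Error("__NEXT_DATA__ not found; the response may be a PerimeterX challenge page");
const data = JSON.parse(m[1]);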
const sps = data.props.pageProps.searchPageState;

const listings = sps.cat1.searchResults.listResults;      // 41 listings/page
const total    = sps.cat1.searchList.totalResultCount;    // region-wide total post-filter
const pages    = sps.cat1.searchList.totalPages;          // total pages
const perPage  = sps.cat1.searchList.resultsPerPage;      // 41 in observed responses
const nextSlug = sps.cat1.searchList.pagination?.nextUrl; // e.g. "/austin-tx/2_p/"
const filterEcho = sps.queryState.filterState;            // ← verify your filter was accepted
const regionEcho = sps.queryState.regionSelection;        // ← verify region resolution

6. Decode each listing

Each entry in listResults[] has:

| Path | Description |
| --- | --- |
| zpid (string); also hdpData.homeInfo.zpid (number) | Canonical property id |
| price (string, e.g. "$480,000") + unformattedPrice (number, e.g. 480000) | Display + raw price |
| beds, baths (numbers) | Bedrooms/bathrooms; baths may be 1.5, 2.5 (full + half summed) |
| area (number, sqft) | Interior living area (also hdpData.homeInfo.livingArea) |
| hdpData.homeInfo.lotAreaValue + hdpData.homeInfo.lotAreaUnit | Lot size + unit ("sqft" or "acres") |
| address (string) | "123 Main St, Austin, TX 78704" |
| addressStreet, addressCity, addressState, addressZipcode | Address parts |
| latLong ({latitude, longitude}) | Lat/Lon |
| statusType ("FOR_SALE", "FOR_RENT", "RECENTLY_SOLD", ...) | Listing status enum |
| rawHomeStatusCd ("ForSale", "Pending", ...) | Raw status code |
| marketingStatusSimplifiedCd ("For Sale by Agent", "For Sale by Owner", "New Construction", "Coming Soon", "Foreclosure", "Auction") | Marketing variant |
| statusText ("Active", "Pending", ...) | Display label |
| hdpData.homeInfo.homeType ("SINGLE_FAMILY", "CONDO", "TOWNHOUSE", "MULTI_FAMILY", "APARTMENT", "MANUFACTURED", "LOT") | Property-type enum |
| hdpData.homeInfo.daysOnZillow (number) | Days on market |
| hdpData.homeInfo.taxAssessedValue (number, optional) | Tax-assessed value |
| zestimate (number, top-level, optional) | Zillow estimate when shown |
| imgSrc (string) | Primary photo URL |
| detailUrl (string; already absolute, or prefix with https://www.zillow.com) | Canonical detail page |
| carouselPhotosComposable (array, optional) | Additional photo URLs |
| hdpData.homeInfo.listing_sub_type (object) | {is_FSBA: true}, {is_FSBO: true}, {is_newHome: true}, {is_foreclosure: true}, etc. |

Per-listing HOA (hoaFee) and monthly-payment-estimate fields are NOT consistently present in listResults[]. Zillow stores those on the property's detail page (/homedetails/.../<zpid>_zpid/). If the user requires them, follow the detailUrl and parse its __NEXT_DATA__ (gdpClientCache → property keys). See Gotchas.
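One way to flatten a listResults[] entry into the output listing shape is sketched below. The mapping from raw fields to output names (for example unformattedPrice → priceRaw, statusType → homeStatus) is inferred from the table and the "Expected Output" example, so treat it as an assumption; normalizeListing is a hypothetical name.

```javascript
// Hypothetical mapper from a raw listResults[] entry to the output listing shape.
function normalizeListing(raw) {
  const info = raw.hdpData?.homeInfo ?? {};
  // detailUrl may already be absolute; otherwise prefix the Zillow origin.
  const detailUrl = !raw.detailUrl
    ? null
    : raw.detailUrl.startsWith("http")
      ? raw.detailUrl
      : `https://www.zillow.com${raw.detailUrl}`;
  return {
    zpid: String(raw.zpid),
    price: raw.price ?? null,
    priceRaw: raw.unformattedPrice ?? null,
    beds: raw.beds ?? null,
    baths: raw.baths ?? null,
    area: raw.area ?? info.livingArea ?? null,
    lotAreaValue: info.lotAreaValue ?? null,
    lotAreaUnit: info.lotAreaUnit ?? null,
    address: {
      full: raw.address ?? null,
      streetAddress: raw.addressStreet ?? null,
      city: raw.addressCity ?? null,
      state: raw.addressState ?? null,
      zipcode: raw.addressZipcode ?? null,
    },
    latLong: raw.latLong ?? null,
    propertyType: info.homeType ?? null,
    homeStatus: raw.statusType ?? null,
    statusText: raw.statusText ?? null,
    marketingStatus: raw.marketingStatusSimplifiedCd ?? null,
    listingSubType: info.listing_sub_type ?? null,
    daysOnZillow: info.daysOnZillow ?? null,
    zestimate: raw.zestimate ?? null,
    taxAssessedValue: info.taxAssessedValue ?? null,
    hoa: info.hoaFee ?? null, // usually absent on the SRP
    monthlyPayment: null,     // not reliably on the SRP; fill from the detail page if required
    imgSrc: raw.imgSrc ?? null,
    detailUrl,
  };
}
```

Optional fields come back as null rather than being omitted, which keeps every listing object the same shape for downstream consumers.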

7. Paginate if needed

If totalPages > 1 AND the caller wants all results, fetch each /N_p/?searchQueryState=... page (incrementing pagination.currentPage AND the /N_p/ path segment in tandem). Zillow caps SRP results at the first ~500 listings (≤ 13 pages of 41) regardless of totalResultCount. If totalResultCount > 500, set resultsCapped: true in the output and advise the caller to narrow filters.
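The pagination decision can be sketched as follows. It assumes, per this document, that totalPages already reflects Zillow's cap, so the plan simply walks pages 1 through totalPages and flags the cap separately. The helper name planPagination and the default cap of 500 are illustrative.

```javascript
// Hypothetical helper: decides which pages to fetch and whether results are capped.
function planPagination(totalResultCount, totalPages, cap = 500) {
  const resultsCapped = totalResultCount > cap;
  const pages = [];
  for (let p = 1; p <= totalPages; p++) pages.push(p); // page numbers to request, in order
  return { resultsCapped, pages };
}
```

With the Austin example (258 results, 7 pages), this yields resultsCapped: false and pages 1 through 7.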

8. Build the response

Emit the JSON schema in "Expected Output" below. Include the exact searchQueryState_url you used and the echoed filterState_applied from the response (Zillow occasionally normalizes / drops unrecognized keys — the echo is the source of truth).

Site-Specific Gotchas

  • PerimeterX "Press & Hold" hard-blocks the browser path. A scripted Chromium (Browserbase browse open --remote) on https://www.zillow.com/austin-tx/ returns a page whose <title> is "Access to this page has been denied" and body reads "Press & Hold to confirm you are a human (and not a bot)" with a PerimeterX reference ID. Verified across:

    • --verified alone
    • --verified --proxies
    • --verified --proxies --solve-captchas
    • regions us-west-2, us-east-1, eu-central-1
    • viewport 1920x1080
    • bare homepage, bare regional SRP, path-based filters (/houses/), and full searchQueryState URLs

    The very first session occasionally renders the bare SRP successfully but shows the Press & Hold modal over the page; any subsequent navigation hard-locks at the HTTP level. PerimeterX appears to fingerprint Browserbase's Chromium fleet — --solve-captchas does NOT solve the Press & Hold puzzle (it's a long-hold gesture, not a checkbox/click captcha). Don't fight it — use the Fetch API.

  • browse cloud fetch --proxies bypasses PerimeterX entirely. Verified 200 OK on identical URLs that the browser path 403'd on, in the same minute. The Fetch API path uses a different HTTP stack + residential-proxy pool that Zillow's PerimeterX rules don't fingerprint. Always pass --proxies — bare Fetch without proxies may also be flagged on follow-up requests. Always pass --allow-redirects since Zillow 301's bare-IP hits to geo-localized variants.

  • __NEXT_DATA__ is the full SRP state. No XHR scraping needed — props.pageProps.searchPageState contains:

    • queryState (echo of the applied searchQueryState)
    • filterDefinitions (the full 110-filter schema — IDs, types, defaults, shortIds, allowed enums)
    • cat1.searchResults.listResults (the page's listings)
    • cat1.searchList.{totalResultCount, totalPages, resultsPerPage, pagination}
  • filterState keys are long-form IDs, not shortIds. Inside the searchQueryState JSON, use the long form (isSingleFamily, not sf; hasGarage, not gar; sortSelection, not sort). The shortIds appear only in Zillow's path-segment SEO URLs such as /3-_beds/.

  • Property-type filters are all-on by default. Setting isSingleFamily: {value: true} alone does NOTHING — every property type is included by default. To narrow to one type, set the other types to {value: false} (and remember isApartmentOrCondo is a composite that must also be false when narrowing away from condos).

  • Pagination requires BOTH the path AND the query state. Page 2 of Austin is /austin-tx/2_p/?searchQueryState=...pagination.currentPage=2.... Setting only one of them silently returns page 1. Use the response's searchList.pagination.nextUrl as the authoritative path slug.

  • 41 results per page, capped at ~500 total. resultsPerPage is 41 in every observed response. Zillow's SRP scroll caps at ~500 results (~13 pages); beyond that, the SRP returns the same last page and totalPages reflects the cap, not the true total. Mirror totalResultCount > 500 → resultsCapped: true in your output and advise the caller to narrow filters (tighter geography, tighter price band, etc.).

  • Lot-size units toggle. lotSize defaults to sqft ({min, max, units: "sqft"}). To filter by acres, set units: "acre" AND interpret hdpData.homeInfo.lotAreaUnit in the response — Zillow may return some listings in acres and some in sqft (typically sqft below 1 acre, acres above).

  • HOA filter behavior. hoa: {max: N} keeps listings with HOA ≤ N. To also include unknown-HOA listings, ensure includeHomesWithNoHoaData: {value: true} (the default). For "zero HOA only": hoa: {max: 0} AND check hdpData.homeInfo.hoaFee post-extraction.

  • Per-listing hoaFee, zestimate, monthlyPayment are inconsistently present in listResults[]. zestimate appears on some listings (~30% in Austin sample); hoaFee and monthlyPayment are usually absent at the SRP level. To fetch them reliably, follow detailUrl and parse the detail page's __NEXT_DATA__ (gdpClientCache → ForSaleShopperPlatformFullRenderQuery → property → hoaFee, zestimate, monthlyHoaFee, monthlyHoaFeeDisplay). Only do this when the caller specifically asks, since it costs N extra fetches.

  • Region slug for free-form input falls back to homepage search. ZIP (/homes/{zip}/), city + state (/{slug}-{state}/), and most major neighborhoods (/{nbhd}-{city}-{state}/) work via direct path. For unknown free-form regions ("South Bay Area", "DFW Metroplex"), the fallback is to fetch the homepage https://www.zillow.com/homes/ and submit the term to the search resolver. In practice, Zillow's autocomplete GraphQL endpoint (/zg-graph/autocomplete/results) returns 400 without Apollo client headers (x-apollo-operation-name, client-id, x-caller-id), so the easiest path is to slugify the input heuristically and fall back to a 404-detection retry chain.

  • mapBounds is optional. Zillow fills it server-side from regionSelection. Including it scopes to a sub-bbox; omitting it gives the full region.

  • Sold listings need isRecentlySold: true AND all for-sale toggles set to false. Otherwise the SRP returns the union of for-sale + recently-sold.

  • doz is an Enum, not a Range. Days-on-Zillow is {value: "30"} (string), not {min: 0, max: 30}. Only the listed enum values ("1", "7", "14", "30", "90", "6m", "12m", "24m", "36m", "any") are accepted; unrecognized values default to "any".

  • searchPageState.filterDefinitions is the canonical schema source. When in doubt about a filter's shape (Boolean? Range? Enum? what's the default?), fetch ANY SRP response and read its filterDefinitions block. Zillow updates the schema over time — re-derive once per quarter.

  • Detail URLs are canonical. detailUrl is https://www.zillow.com/homedetails/{slug}/{zpid}_zpid/ — the _zpid/ suffix is mandatory and the slug is informational. Hitting just https://www.zillow.com/homedetails/{zpid}_zpid/ also redirects to the canonical.

  • READ-ONLY. Do NOT click Save, Save Search, Tour, Contact Agent, Apply, or any other CTA on the SRP or detail pages. The skill never makes a state change.

Browser fallback (use only if Fetch API fails)

If browse cloud fetch --proxies starts returning 4xx (e.g., Zillow extends PX to the Fetch path):

  1. Spin a fresh browse cloud sessions create --keep-alive --verified --proxies --solve-captchas --region us-east-1 session.
  2. Open https://www.zillow.com/ (homepage, NOT the regional SRP first).
  3. wait timeout 6000 for any PX challenge animation to finish.
  4. Use browse fill to enter the location in the search box, browse press Enter. This routes through the SPA and may bypass the direct-URL PX fingerprint.
  5. Read __NEXT_DATA__ via browse eval: JSON.parse(document.getElementById('__NEXT_DATA__').textContent).props.pageProps.searchPageState
  6. To apply filters, use the SRP's filter UI — click each filter pill, set values via browse fill / browse click, then browse click the "Apply" button. Zillow's SPA will update searchQueryState via history.pushState without triggering a hard navigation — this avoids the PX fingerprint.
  7. After every action, wait timeout 2000 and re-read __NEXT_DATA__ / window.__INITIAL_STATE__ for fresh results.

This path costs ~5–10× more turns than the Fetch path and is brittle to PX modal interrupts. Use Fetch API as primary; fall back to browser only when Fetch breaks.

Expected Output

{
  "success": true,
  "searchQueryState_url": "https://www.zillow.com/austin-tx/?searchQueryState=%7B%22pagination%22%3A%7B%22currentPage%22%3A1%7D%2C...",
  "filterState_applied": {
    "sortSelection": {"value": "globalrelevanceex"},
    "price": {"min": 400000, "max": 750000},
    "beds": {"min": 3, "max": null},
    "baths": {"min": 2, "max": null},
    "sqft": {"min": 1500, "max": null},
    "built": {"min": 1990, "max": null},
    "doz": {"value": "30"},
    "isCondo": {"value": false},
    "isMultiFamily": {"value": false},
    "isApartment": {"value": false},
    "isManufactured": {"value": false},
    "isLotLand": {"value": false},
    "isApartmentOrCondo": {"value": false},
    "hasGarage": {"value": true},
    "hasAirConditioning": {"value": true}
  },
  "regionSelection": [{"regionId": 10221, "regionType": 6}],
  "totalResultCount": 258,
  "currentPage": 1,
  "totalPages": 7,
  "resultsPerPage": 41,
  "resultsCapped": false,
  "listings": [
    {
      "zpid": "111969308",
      "price": "$480,000",
      "priceRaw": 480000,
      "beds": 3,
      "baths": 2,
      "area": 1849,
      "lotAreaValue": 7056.72,
      "lotAreaUnit": "sqft",
      "address": {
        "full": "13933 Turkey Hollow Trl, Austin, TX 78717",
        "streetAddress": "13933 Turkey Hollow Trl",
        "city": "Austin",
        "state": "TX",
        "zipcode": "78717"
      },
      "latLong": {"latitude": 30.49, "longitude": -97.79543},
      "propertyType": "SINGLE_FAMILY",
      "homeStatus": "FOR_SALE",
      "statusText": "Active",
      "marketingStatus": "For Sale by Agent",
      "listingSubType": {"is_FSBA": true},
      "daysOnZillow": 3,
      "zestimate": null,
      "taxAssessedValue": 499783,
      "hoa": null,
      "monthlyPayment": null,
      "imgSrc": "https://photos.zillowstatic.com/fp/671419edea6ca08359352874f3b8ad57-p_e.jpg",
      "detailUrl": "https://www.zillow.com/homedetails/13933-Turkey-Hollow-Trl-Austin-TX-78717/111969308_zpid/"
    }
  ]
}

Outcome shapes

// Filtered results found
{ "success": true, "totalResultCount": 258, "listings": [ /* up to resultsPerPage */ ], ... }

// Zero results for the applied filter (Zillow returns the response anyway with empty listResults)
{ "success": true, "totalResultCount": 0, "listings": [],
  "zeroResultMessage": "No matching results — try widening your search radius or relaxing filters" }

// Region not resolved (404 on the constructed slug, or wrong region echoed)
{ "success": false, "reason": "region_not_resolved", "input": "South Bay Area",
  "attemptedSlugs": ["/south-bay-area/", "/south-bay-area-ca/"] }

// Anti-bot wall (Fetch API also blocked)
{ "success": false, "reason": "perimeterx_block", "challenge": "press_and_hold",
  "referenceId": "...", "fallback_attempted": "browser+homepage" }

// Results capped (totalResultCount > 500, Zillow won't paginate past the cap)
{ "success": true, "totalResultCount": 12450, "resultsCapped": true,
  "advice": "narrow filters or use a smaller region", "listings": [ /* first ≤500 */ ] }