Amazon Product Search
Purpose
Given a search intent (free-form keyword, full Amazon search URL, ASIN list, or a category-browse intent such as "Bestsellers in Coffee") plus any combination of Amazon's left-rail / top-bar filters, return a structured JSON page of products with per-item ASIN, title, brand, primary + thumbnail image URLs, current price + currency, list price + discount %, star rating + review count, Prime eligibility, sponsored / Amazon's-Choice / bestseller / Climate-Pledge / coupon badges, ships-from / sold-by attribution, and the canonical /dp/{ASIN} URL. Captures the page-level totalResultCount so callers know the returned slice is partial. Honors region-specific TLDs (.com, .co.uk, .de, .ca, .co.jp, ...); each storefront uses its own filter / category ID space. Read-only: never clicks Add to Cart, Buy Now, Subscribe & Save, or Sign In. When the page presents a captcha or Akamai bot-wall, captures a screenshot and emits a captcha_wall (or akamai_interstitial) failure shape rather than attempting to solve it.
When to Use
- "Find me {N} matching products on Amazon for {keyword}" with any filter set.
- Periodic price / rating / stock-level monitoring of a specific ASIN cohort.
- Bulk cross-category extraction (/zgbs/, /gp/bestsellers/, /gp/new-releases/).
- Comparison-shopping agents that need a structured product feed instead of HTML.
- Anywhere a caller would otherwise scrape /s?k=... by hand; this skill encapsulates the entire filter-encoding scheme + anti-bot wrapper.
Workflow
Amazon's /s search route is gated by Akamai Bot Manager (the bm_sz / ak_bmsc cookie family plus the bm-verify 5-second JS challenge interstitial). Browserless HTTP fetches, even through a residential proxy, fail closed on ~75% of the query variants observed during iteration. The reliable path is a Browserbase session created with both --verified and --proxies, which executes the bot-challenge JS, sets the bm-cookies, and then re-requests the SERP cleanly.
Primary: scripted-browser path
- Create a Verified + residential-proxy session; both flags are mandatory.
    SID=$(browse cloud sessions create --keep-alive --verified --proxies | jq -r .id)
    export BROWSE_SESSION="$SID"
- Warm the bm-cookies before the targeted SERP by opening the homepage:
    browse open --remote "https://www.amazon.com/"
    browse wait load --remote
    browse wait timeout 1500 --remote
  This lets Akamai's JS run once, seeding the ak_bmsc + bm_sz cookies in the session. Skipping the warm-up triples the captcha rate on the next request.
- Build the search URL. Pick the right storefront TLD by region (see the storefront list under Site-Specific Gotchas). Encode filters into rh= segments, sort via s=, and paginate via page=N. See "Filter encoding" in Site-Specific Gotchas for the full map.
    https://www.amazon.com/s
      ?k={URL-encoded keyword}
      &rh={filter-1},{filter-2},...   (URL-encoded comma is %2C)
      &s={sort enum}
      &page={N}
  Example: "wireless mechanical keyboard, 4+ stars, ≤$150, Prime-eligible, sorted by review rank":
    /s?k=wireless+mechanical+keyboard
      &rh=p_72%3A1248915011%2Cp_36%3A-15000%2Cp_85%3A2470955011
      &s=review-rank
- Open the SERP in the same session:
    browse open --remote "$URL"
    browse wait load --remote
    browse wait timeout 1500 --remote   # lazy-loaded badges / Prime icons / Climate Pledge render after `load`
- Detect the captcha wall before parsing. If the page title is "Robot Check" or the body contains <form action="/errors/validateCaptcha", stop and emit the captcha_wall shape (see Expected Output). Do not attempt to solve it.
    TITLE=$(browse get title --remote)
    if [ "$TITLE" = "Robot Check" ] || browse get html body --remote | grep -q '/errors/validateCaptcha'; then
      # ship the captcha_wall shape as a candidate outcome, capture a screenshot, then stop
      exit 1
    fi
- Read the rendered HTML and parse cards:
    browse get html body --remote > /tmp/serp.html
- Card root: each <div data-component-type="s-search-result" data-asin="..."> is one product. A clean SERP renders 22 cards per page (16 organic + 6 sponsored slots).
- Total result header: "1-{X} of [over ]{Y,YYY} results". Capture totalResultCount (the Y value, often "over 200,000").
- Per-card selectors. Use the stable data-cy anchors: these are Amazon's internal test selectors and are more stable than CSS classes.
    - asin: outer data-asin attribute
    - title: h2 > a > span text inside data-cy="title-recipe"
    - url: h2 > a href, normalized to https://www.amazon.com/dp/{ASIN}
    - image: img.s-image src attribute
    - price.formatted: .a-offscreen inside data-cy="price-recipe", value like "$39.99"
    - price.raw: parse price.formatted after stripping the $ (and any thousands separators)
    - list_price: .a-price.a-text-price > .a-offscreen (strikethrough variant). When absent, there is no sale.
    - discount_pct: round(100 * (1 - price / list_price)) when both are present
    - rating.stars: aria-label="X.Y out of 5 stars" inside data-cy="reviews-block"; extract the leading X.Y
    - rating.review_count: adjacent aria-label="N,NNN ratings"; extract N,NNN and parse to an integer
    - prime: presence of the i-aok-prime class OR aria-label="Prime" inside data-cy="delivery-recipe"
    - sponsored: enclosing card contains the AdHolder class OR the s-sponsored-result data attr
    - amazons_choice: text "Amazon's Choice" inside data-cy="s-pc-faceout-badge"
    - bestseller: text "Best Seller" / "#1 Best Seller in ..." inside data-cy="s-pc-faceout-badge"
    - climate_pledge: data-cy="certification-recipe" contains "Climate Pledge Friendly"
    - deal_label: red-tagged .a-color-price.s-background-color-platinum text ("Limited time deal", "Lightning Deal", "Coupon: $5 off")
    - ships_from / sold_by: text content inside data-cy="delivery-block"
- Paginate. Typically up to 7 pages of organic results are available; Amazon's cap ranges from page=7 to page=20 depending on category. Detect the last page by the presence or absence of <a aria-label="Go to next page">.
- Release the session:
    browse cloud sessions update "$SID" --status REQUEST_RELEASE
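The "Build the search URL" step can be sketched in Python. This is a minimal sketch under stated assumptions: build_search_url, price_segment, and SORT_VALUES are hypothetical helper names, the accepted sort values are the ones listed in the Sort enum note, and the filter IDs passed in must still come from the target storefront's own ID space. It relies on urllib.parse.urlencode, whose default quote_plus encoding turns ':' into %3A and ',' into %2C.

```python
from urllib.parse import urlencode

# Sort values from the "Sort enum" note; anything else is rejected.
SORT_VALUES = {
    "relevanceblender",
    "price-asc-rank",
    "price-desc-rank",
    "review-rank",
    "date-desc-rank",
    "exact-aware-popularity-rank",
}


def price_segment(min_usd=None, max_usd=None):
    """Encode a custom price range as a p_36 rh= segment (bounds in cents)."""
    if min_usd is None and max_usd is None:
        raise ValueError("need at least one price bound")
    lo = "" if min_usd is None else str(round(min_usd * 100))
    hi = "" if max_usd is None else str(round(max_usd * 100))
    return f"p_36:{lo}-{hi}"  # e.g. p_36:-15000, p_36:2500-5000, p_36:20000-


def build_search_url(keyword, rh_segments=(), sort=None, page=1, host="www.amazon.com"):
    """Build an /s SERP URL; rh segments are comma-joined, then URL-encoded."""
    params = {"k": keyword}
    if rh_segments:
        # quote_plus (urlencode's default) encodes ':' as %3A and ',' as %2C
        params["rh"] = ",".join(rh_segments)
    if sort is not None:
        if sort not in SORT_VALUES:
            raise ValueError(f"unknown sort value: {sort}")
        params["s"] = sort
    params["page"] = page  # always explicit, per the pagination note
    return f"https://{host}/s?{urlencode(params)}"
```

With the example filters above, build_search_url("wireless mechanical keyboard", ["p_72:1248915011", price_segment(max_usd=150), "p_85:2470955011"], sort="review-rank") produces the same URL as url_used in the successful-result example under Expected Output.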
Fast path for category-browse intents
For "Bestsellers in {category}" / "New Releases in {category}" / "Movers & Shakers in {category}" — skip /s entirely and use the curated browse roots. These are not Akamai-gated as aggressively as /s (verified 2026-05-18: GET /gp/bestsellers/ returned 644 KB of real HTML with data-asin markers via residential-proxy Fetch, no challenge interstitial).
https://www.amazon.com/gp/bestsellers/{category-slug}/ — top 100 bestsellers
https://www.amazon.com/gp/new-releases/{category-slug}/ — top 100 new releases
https://www.amazon.com/gp/movers-and-shakers/{category-slug}/ — biggest-gainers
https://www.amazon.com/zgbs/{category-slug} — alias for /gp/bestsellers
Slugs are visible in the /zgbs/ HTML: amazon-devices, appliances, arts-crafts, audible, automotive, baby-products, beauty, etc.
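Routing a category-browse intent to one of these roots can be sketched as below. This assumes intents arrive as plain text in the "{List} in {category}" form used above; browse_root_url, guess_slug, and BROWSE_ROOTS are hypothetical names, and the slug guess (lowercase, non-alphanumeric runs collapsed to "-") is only a heuristic that happens to fit slugs like arts-crafts and baby-products. Confirm the slug against the /zgbs/ HTML before relying on it.

```python
import re

# Path prefixes for the curated browse roots listed above.
BROWSE_ROOTS = {
    "bestsellers": "/gp/bestsellers/",
    "new releases": "/gp/new-releases/",
    "movers & shakers": "/gp/movers-and-shakers/",
}

INTENT_RE = re.compile(
    r"^\s*(bestsellers|new releases|movers & shakers) in (.+?)\s*$", re.IGNORECASE
)


def guess_slug(category):
    """Heuristic slug guess, e.g. Arts & Crafts -> arts-crafts. Verify against /zgbs/."""
    return re.sub(r"[^a-z0-9]+", "-", category.lower()).strip("-")


def browse_root_url(intent, host="www.amazon.com"):
    """Map a category-browse intent to a curated browse root URL, else None."""
    match = INTENT_RE.match(intent)
    if not match:
        return None  # not a category-browse intent: use the /s search path
    kind, category = match.group(1).lower(), match.group(2)
    return f"https://{host}{BROWSE_ROOTS[kind]}{guess_slug(category)}/"
```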
Fast path for ASIN lookup
When the caller already has ASINs, skip search and hit /dp/{ASIN} directly. Each /dp/{ASIN} HTML response is >1 MB, so use a Browserbase remote session — NOT cloud fetch (Fetch API is hard-capped at 1 MB response body and will 502).
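Before opening one remote-session page per ASIN, it can help to normalize the caller's list. A minimal sketch, assuming ASINs are 10 uppercase alphanumeric characters (as in the B08XXXXX01 placeholders under Expected Output); normalize_asins and dp_urls are hypothetical names. Duplicates are dropped in first-seen order, and malformed entries are returned separately so the caller can report them instead of wasting a page load.

```python
import re

# Assumption: an ASIN is exactly 10 uppercase letters/digits.
ASIN_RE = re.compile(r"^[A-Z0-9]{10}$")


def normalize_asins(raw_asins):
    """Uppercase and de-duplicate ASINs (first-seen order); split out invalid ones."""
    valid, invalid, seen = [], [], set()
    for raw in raw_asins:
        asin = raw.strip().upper()
        if not ASIN_RE.match(asin):
            invalid.append(raw)
        elif asin not in seen:
            seen.add(asin)
            valid.append(asin)
    return valid, invalid


def dp_urls(asins, host="www.amazon.com"):
    """Canonical /dp/{ASIN} URLs to open in a Browserbase remote session."""
    return [f"https://{host}/dp/{asin}" for asin in asins]
```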
Static-HTML fallback (browserless)
When a Browserbase remote session is unavailable, browse cloud fetch <SERP-URL> --proxies can return server-side-rendered HTML for single-word, no-filter, no-pagination queries (verified: /s?k=test&ref=nb_sb_noss → 933 KB clean response). This path:
- Returns 70-80% of the per-card fields. Static HTML does contain data-asin, the h2 title, .s-image src, the .a-offscreen price, the .a-price-whole integer, and the rating + review-count aria-labels.
- Does NOT contain the Prime icon (i-aok-prime), Amazon's Choice badge, Climate Pledge badge, bestseller badge, or the sort-dropdown anchor; these are JS-rendered post-load.
- Fails closed (returns the 2 KB Akamai bm-verify interstitial HTML) when the request contains gibberish strings, multi-segment rh= filters, i={department} shortcuts, or page>1 pagination. Detect it by <meta http-equiv="refresh" + bm-verify= in the response body.
- Is hard-capped at 1 MB response body (Browserbase Fetch API limit). Most legit SERPs are 900 KB to 1.4 MB; expect ~50% to overflow with a "502 The response body exceeded the maximum allowed size of 1MB" error. Use the browser session for those.
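The failure modes above can be folded into one classifier for static-fetch responses. A minimal sketch, assuming the caller already has the HTTP status code and the response body as a string (how those are read from the CLI output is not shown); classify_static_fetch is a hypothetical name, and its return values map onto the failure shapes in Expected Output. An "ok" result still lacks the JS-rendered badges listed above.

```python
def classify_static_fetch(status_code, body):
    """Classify a static-fetch SERP response using the markers described above.

    Returns "response_too_large", "captcha_wall", "akamai_interstitial", or "ok".
    """
    if status_code == 502 and "exceeded the maximum allowed size" in body:
        return "response_too_large"  # Fetch API 1 MB cap: retry in a browser session
    if "/errors/validateCaptcha" in body or "<title>Robot Check</title>" in body:
        return "captcha_wall"  # never solve: emit the captcha_wall shape
    lowered = body.lower()
    if '<meta http-equiv="refresh"' in lowered and "bm-verify=" in lowered:
        return "akamai_interstitial"  # ~2 KB bm-verify page instead of the SERP grid
    return "ok"  # server-rendered SERP; Prime / badge fields will be missing
```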
Site-Specific Gotchas
- Akamai Bot Manager gates the /s route aggressively. Symptoms: a 2-3 KB HTML body containing <meta http-equiv="refresh" content="5; URL='/s?...&bm-verify=...'" /> and an iframe src=https://m.media-amazon.com/images/S/sash/...gif instead of the SERP grid. Triggers observed during iteration: gibberish queries, multi-segment rh= filters, i={department} shortcuts, paginated requests, low-confidence query strings, and concatenated keywords (e.g. k=usbcable was blocked while k=usb+cable passed). The only reliable bypass is a --verified Browserbase session with the homepage-warmed bm-cookies in place before the SERP request.
- The Captcha (Robot Check) page is a distinct outcome; DO NOT solve it. Marker: <title>Robot Check</title> + <form action="/errors/validateCaptcha". Triggered by a sustained request rate from a flagged IP or a freshly minted session without a referer chain. Emit success: false, reason: "captcha_wall" and ship it as a candidate outcome. Captcha-solving services are out of scope for read-only product extraction.
- Filter IDs are NOT stable across queries / categories. The widely documented "4 Stars & Up" filter is p_72:1248915011 in the canonical US storefront, but a /s?k=test fetch on 2026-05-18 returned p_72:3014475011 in its rendered filter rail; Amazon resolves IDs per query / per category context, sometimes for A/B-test cohorts. The robust pattern is: (a) try the canonical IDs first; (b) if the resulting filter rail anchor href contains a different ID for the same logical filter, re-request with that ID; (c) cache the (query-prefix, category, filter-key) → ID mapping per session, not globally.
- Filter encoding scheme: rh= prefixes verified from a real SERP fetch (2026-05-18):
    - n:{id} (Department / category node): canonical examples 172282 (Electronics), 283155 (Books), 1055398 (Home & Kitchen), 7141123011 (Clothing, Shoes & Jewelry), 2619533011 (Beauty). Combine sub-categories with a comma: rh=n:172282%2Cn:172456
    - p_72:{id} (Customer rating threshold): canonical 1248915011 (4★+), 1248914011 (3★+), 1248913011 (2★+), 1248912011 (1★+). Per-query overrides happen; see "Filter IDs are NOT stable" above.
    - p_36:-{maxCents} / p_36:{minCents}-{maxCents} / p_36:{minCents}- (Custom price range, in cents): p_36:-15000 = ≤$150; p_36:2500-5000 = $25–$50; p_36:20000- = $200+.
    - p_n_price_fma:{id} (Preset price bucket): eight buckets observed (10346812011–10346819011) corresponding to the displayed labels $0 to $1, $1 to $3, $3 to $5, $5 to $10, $10 to $15, $15 to $20, Under $10, Over $20. IDs vary per category.
    - p_85:{id} (Prime eligibility): canonical 2470955011 (US).
    - p_76:{id} (Free shipping / FBA): per-storefront IDs.
    - p_90:{id} (Seller: Amazon as seller): canonical 8308921011 (US). For a third-party / specific seller, use the emi= query param or the seller-id me= shortcut.
    - p_n_deal_type:{id} (Deals): three observed values (Today's Deals, Lightning Deals, Coupons), with IDs in the 23566065011 family.
    - p_n_condition-type:{id} (Condition): 6461716011 New, 6461717011 Used, 6461718011 Renewed/Refurbished (US).
    - p_n_date:{id} (New arrivals): last 30 / 90 days etc.
    - p_n_availability:{id} (Include out-of-stock): toggle for showing OOS items.
    - p_n_feature_browse-bin:{id} (Apparel: Color, Size, Material, Pattern, Fit): highly category-specific; discover from the filter rail.
    - p_n_climate_pledge_friendly:{id} (Climate Pledge Friendly): niche filter.
    - p_n_subscribe_save_eligibility:{id} (Subscribe & Save): niche filter.
    - p_n_small_business:{id} (Small Business): niche filter.
  Multi-filter URL-encoding uses %2C (the URL-encoded comma) between segments. The same key repeated forms a logical OR; different keys form a logical AND.
- Sort enum (s= query param):
    - relevanceblender: Featured (default)
    - price-asc-rank: Price: Low to High
    - price-desc-rank: Price: High to Low
    - review-rank: Avg. Customer Review
    - date-desc-rank: Newest Arrivals
    - exact-aware-popularity-rank: Best Sellers
- Pagination URL shape: ?page=N&xpid={experiment-id}&qid={timestamp}&ref=sr_pg_N. xpid and qid are surfaced by Amazon's renderer but are not required for a correct response; pass only page=N for clean re-requests. Pages 2+ are heavily Akamai-gated; a browser session warmed by an earlier page-1 fetch is the reliable path.
- 22 cards per page, not 24. The "default 24 results" mentioned in some references is the target count, but the rendered SERP typically shows 22 cards, with sponsored content (AdHolder divs) interleaved among the organic results. Always derive results_returned from the actual card count, not from a fixed assumption.
- Sponsored cards repeat the same ASIN as organic. When parsing, the same data-asin value sometimes appears in two <div>s: once with AdHolder (sponsored placement) and once organically lower on the page. Dedupe by the (asin, sponsored) tuple, or surface both with position and sponsored flags so the caller can decide.
- Storefront TLD maps to a separate ID space. Filter IDs (p_72, p_85, etc.) and category nodes (n:...) are NOT portable across .com / .co.uk / .de / .ca / .co.jp. Each storefront uses its own integer ranges, so the skill must re-discover IDs per locale via the rendered filter rail. UK uses .co.uk, Germany/Austria/Switzerland use .de, Canada uses .ca, Japan uses .co.jp, Australia uses .com.au, Mexico uses .com.mx, Brazil uses .com.br, India uses .in.
- application/ld+json is NOT embedded on SERPs. Verified 2026-05-18: zero <script type="application/ld+json"> blocks in a clean /s?k=test response. Product LD-JSON appears only on /dp/{ASIN} detail pages. Do not waste time looking for structured data on /s.
- JS-rendered badges are missing from browse cloud fetch output. The Browserbase Fetch API path returns server-side-rendered HTML before Amazon's lazy-loaded widgets populate. Specifically absent: i-aok-prime icons, "Amazon's Choice" / Bestseller / Climate Pledge badges, and the sort-dropdown anchor. To extract these, drive a full browser session (which executes the JS) and snapshot after wait timeout 1500.
- 1 MB response cap on browse cloud fetch. The Fetch API returns "502 The response body exceeded the maximum allowed size of 1MB. Use a browser session to handle large responses." for any response payload ≥ 1 MB. Observed: /dp/{ASIN} pages and most full-category SERPs exceed this; /gp/bestsellers/ (~644 KB), /robots.txt (2 KB), and /s?k=test&ref=nb_sb_noss (933 KB) fit comfortably. Use a browser session as primary; reach for cloud fetch only as a fallback for known-small endpoints.
- DO NOT click data-cy="add-to-cart", the "Buy Now" button, "Subscribe & Save", "Sign In", "Try Prime", or any other mutation control. Read-only is the contract. Stop at the SERP; never navigate into the product detail page's purchase flow.
- Region routing on bare amazon.com. When the request IP is in a non-US country, Amazon may redirect to the local storefront mid-request (302 to /). Lock the locale by including the storefront TLD in the URL and setting &language=en_US if the response would otherwise localize text.
- browse cloud fetch --proxies is geo-locked by the Browserbase residential proxy egress. During iteration the egress IP was US-west, and results matched US storefront pricing. For non-US storefronts the proxy region matters; Browserbase doesn't currently expose a per-storefront proxy region flag, so non-US localization may be inconsistent.
- xpid / qid / ref=sr_* / dib=... tracking params are noise. Strip them from extracted product URLs before emitting; the canonical clean form is https://www.amazon.com/dp/{ASIN} (or /{locale-slug}/dp/{ASIN} if locale is meaningful).
- Volatility: Amazon changes selectors quarterly. The data-cy test-selector layer has been stable since early 2024 but is not contractual. If data-cy="title-recipe" returns empty, fall back to h2 a span text. Maintain a per-selector fallback chain.
Expected Output
Successful result with full filter set applied:
{
"success": true,
"storefront": "amazon.com",
"query": "wireless mechanical keyboard",
"filters_applied": {
"min_rating": 4,
"price_max_usd": 150,
"prime_only": true,
"sort": "review-rank",
"department": null,
"brand": null,
"condition": null
},
"url_used": "https://www.amazon.com/s?k=wireless+mechanical+keyboard&rh=p_72%3A1248915011%2Cp_36%3A-15000%2Cp_85%3A2470955011&s=review-rank&page=1",
"total_result_count": 4000,
"total_result_count_is_approximate": true,
"results_returned": 22,
"page": 1,
"has_next_page": true,
"products": [
{
"position": 1,
"asin": "B0XXXXXXXX",
"title": "Keychron K8 Pro QMK/VIA Wireless Mechanical Keyboard",
"brand": "Keychron",
"image": "https://m.media-amazon.com/images/I/81xxxxxxxxL._AC_UY218_.jpg",
"thumbnails": [
"https://m.media-amazon.com/images/I/71xxxxxxxxL._AC_UY218_.jpg"
],
"price": { "formatted": "$129.99", "raw": 129.99, "currency": "USD" },
"list_price": { "formatted": "$149.99", "raw": 149.99 },
"discount_pct": 13,
"rating": { "stars": 4.5, "review_count": 1820 },
"prime": true,
"sponsored": false,
"amazons_choice": false,
"bestseller": false,
"climate_pledge": false,
"deal_label": null,
"ships_from": "Amazon.com",
"sold_by": "Keychron Official",
"url": "https://www.amazon.com/dp/B0XXXXXXXX"
}
]
}
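The position, sponsored, and url fields in the shape above, plus results_returned, can be filled in a final pass over the parsed cards. A sketch, assuming each parsed card is a dict that already carries asin and sponsored keys in on-page order (finalize_cards is a hypothetical name). It drops only exact (asin, sponsored) repeats, so a sponsored and an organic copy of the same ASIN both survive with their own positions, and it rebuilds url from the ASIN, which keeps xpid / qid / ref=sr_* / dib= tracking params out of the output.

```python
def finalize_cards(cards, host="www.amazon.com"):
    """Drop exact (asin, sponsored) repeats, then assign positions and canonical URLs."""
    seen = set()
    finalized = []
    for card in cards:  # cards in on-page order
        key = (card["asin"], card["sponsored"])
        if key in seen:
            continue  # same ASIN in the same placement type: a true duplicate
        seen.add(key)
        finalized.append({
            **card,  # copy, so the caller's parsed cards are not mutated
            "position": len(finalized) + 1,
            # rebuilt from the ASIN, so no tracking params survive
            "url": f"https://{host}/dp/{card['asin']}",
        })
    return finalized
```

results_returned is then len(finalize_cards(cards)), which follows the rule of deriving it from the actual card count.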
Zero-result outcome (valid, not a failure):
{
"success": true,
"storefront": "amazon.com",
"query": "asdfkjasdljaksdljasldkjasl",
"total_result_count": 0,
"results_returned": 0,
"page": 1,
"products": []
}
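The numeric fields in these shapes come from header and card text. A sketch of the parsers, assuming US-format strings like the examples in the selector notes; parse_price, discount_pct, parse_stars, parse_review_count, and parse_total_results are hypothetical names, and other storefronts use different decimal and thousands separators, so they would need their own parsing. discount_pct is a whole-number percentage, so the ratio is scaled by 100: $129.99 against a $149.99 list price gives 13, matching the successful-result example.

```python
import re


def parse_price(formatted):
    """Parse a US-format price like $1,299.99 to 1299.99 (None if no digits)."""
    digits = re.sub(r"[^0-9.]", "", formatted)  # drop currency symbol and commas
    return float(digits) if digits else None


def discount_pct(price, list_price):
    """Whole-number discount percentage; None when there is no strikethrough price."""
    if price is None or not list_price:
        return None
    return round(100 * (1 - price / list_price))


def parse_stars(aria_label):
    """Extract the leading X.Y from an aria-label like 4.5 out of 5 stars."""
    m = re.match(r"\s*(\d+(?:\.\d+)?) out of 5 stars", aria_label)
    return float(m.group(1)) if m else None


def parse_review_count(aria_label):
    """Parse an aria-label like 1,820 ratings to the integer 1820."""
    m = re.match(r"\s*([\d,]+) ratings?", aria_label)
    return int(m.group(1).replace(",", "")) if m else None


def parse_total_results(header):
    """Return (total_result_count, total_result_count_is_approximate).

    The approximate flag is set when the header says "over". Returns
    (None, False) when no header matches; the caller can emit 0 when the
    page also has zero cards, as in the zero-result shape.
    """
    m = re.search(r"of (over )?([\d,]+) results", header)
    if not m:
        return None, False
    return int(m.group(2).replace(",", "")), m.group(1) is not None
```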
Captcha / Robot Check wall (ship as candidate, do NOT attempt to solve):
{
"success": false,
"reason": "captcha_wall",
"storefront": "amazon.com",
"query": "wireless mechanical keyboard",
"url_attempted": "https://www.amazon.com/s?k=wireless+mechanical+keyboard&...",
"page_title": "Robot Check",
"screenshot_path": "screenshots/04-robot-check.png",
"trigger_hint": "Sustained request rate from this residential-proxy IP or missing bm-verify session cookie. Retry with a fresh session + homepage warm-up."
}
Akamai bm-verify interstitial (HTML shape: 5-second meta-refresh + iframe + _sec/verify POST script):
{
"success": false,
"reason": "akamai_interstitial",
"storefront": "amazon.com",
"query": "...",
"url_attempted": "...",
"response_size_bytes": 2310,
"screenshot_path": "screenshots/02-akamai-bot-wall.png",
"trigger_hint": "Static-HTML Fetch path triggered Akamai. Retry via a verified Browserbase remote session with proxies + homepage warm-up."
}
ASIN-direct lookup (used when caller passes ASINs instead of a query):
{
"success": true,
"storefront": "amazon.com",
"asins_requested": ["B08XXXXX01", "B09XXXXX02"],
"results_returned": 2,
"products": [
{
"asin": "B08XXXXX01",
"title": "...",
"url": "https://www.amazon.com/dp/B08XXXXX01",
"price": { "formatted": "$24.99", "raw": 24.99, "currency": "USD" },
"rating": { "stars": 4.6, "review_count": 12504 },
"prime": true,
"ships_from": "Amazon.com",
"sold_by": "Acme Inc."
}
]
}
Ambiguous query (Amazon's "Did you mean" / spell-correction interstitial):
{
"success": true,
"storefront": "amazon.com",
"query": "wirless mechanikal keybord",
"spell_correction_suggested": "wireless mechanical keyboard",
"spell_correction_applied": true,
"url_used": "https://www.amazon.com/s?k=wireless+mechanical+keyboard&...",
"total_result_count": 4000,
"results_returned": 22,
"products": [ /* ... */ ]
}