Amazon India — Browse Products
Purpose
Search Amazon.in (the India marketplace) for products by keyword and return the first-page search-result listings as structured records — ASIN, title, current price (INR), MRP/list price, star rating, review count, canonical product URL, primary thumbnail image, sponsored flag, and free-delivery flag. Read-only; never adds to cart, never checks out, never signs in.
When to Use
- Capturing a price/rating snapshot for a query (price tracking, competitive intel).
- Building a comparison table across queries (e.g. "wireless earbuds under 2000", "python programming book").
- Feeding ASIN + canonical URL into a downstream product-detail crawler.
- Anywhere you'd otherwise scrape Amazon.in HTML by hand: a deep-link search URL plus a single document.querySelectorAll pass beats clicking through the search form.
Workflow
There is no public Amazon Product API available without a Seller Central / Product Advertising API approval and access keys. The deep-link search URL (https://www.amazon.in/s?k=<query>) is the reliable shortcut: it is an unauthenticated GET, accepts a small set of well-known query parameters (page, s for sort, rh for refinements), and renders fully server-side, so every result card is in the initial HTML and no scroll/XHR pagination is required. The extraction below runs in one document.querySelectorAll pass. Do not drive the search form via the homepage #twotabsearchtextbox: the homepage is heavier (~25s wall time, multiple A/B-test variants of the search box), while the deep link takes ~3s and lands on the same results page.
A Browserbase session with --verified --proxies (Indian residential proxy) is required when the outbound IP is outside India — amazon.in serves a reduced/redirect homepage to non-Indian IPs and a CSRF challenge on the search endpoint. With verified+proxies enabled, no captcha or login wall was observed across 4 distinct queries (commodity electronics, books, ascending-price sort, page 2 pagination).
- Create a stealth Browserbase session.

      sid=$(browse cloud sessions create --keep-alive --proxies --verified \
        | node -e "let s='';process.stdin.on('data',c=>s+=c).on('end',()=>process.stdout.write(JSON.parse(s).id))")
      export BROWSE_SESSION="$sid"
- Deep-link to the search URL. Skip the homepage entirely.

      QUERY="wireless earbuds under 2000"
      ENC=$(node -e "process.stdout.write(encodeURIComponent(process.argv[1]).replace(/%20/g,'+'))" "$QUERY")
      browse open "https://www.amazon.in/s?k=$ENC" --remote --session "$sid"
      browse wait timeout 3000 --remote --session "$sid"

  Optional query params:

  - page=N: pagination (1-indexed). 22 raw cards/page, ~19 unique after dedup.
  - s=price-asc-rank (low→high), s=price-desc-rank (high→low), s=review-rank (avg rating), s=date-desc-rank (newest), s=relevanceblender (default).
  - rh=p_36:50000-200000: price range in paise (×100, so 50000-200000 = ₹500–₹2000). Combine with &dc for "department-confined".
  - i=stripbooks / i=electronics / i=fashion: department refinement (the i= value matches the slug shown in Amazon's left-nav department links).
- Extract product cards via one DOM pass. Pipe this script through browse eval (the IIFE pattern below returns a JSON string that browse eval will surface in its result field):

      (() => {
        const ORIGIN = 'https://www.amazon.in';

        // Sponsored hrefs are /sspa/click redirects; the real /dp/<ASIN> path is in the url param.
        const decodeAsin = href => {
          if (!href) return null;
          if (href.startsWith('/sspa/click')) {
            try {
              const dest = new URL(href, ORIGIN).searchParams.get('url');
              if (dest) {
                const m = decodeURIComponent(dest).match(/\/dp\/([A-Z0-9]{10})/);
                if (m) return m[1];
              }
            } catch (e) {}
          }
          const m = href.match(/\/dp\/([A-Z0-9]{10})/);
          return m ? m[1] : null;
        };

        // Keep digits only: drops the currency symbol, commas, and any dots.
        const parsePrice = t => {
          if (!t) return null;
          const d = t.replace(/[^\d.]/g, '').replace(/\./g, '');
          return d ? parseInt(d, 10) : null;
        };

        // Expands the K/M suffix, e.g. "(13.4K)" -> 13400; falls back to plain digits.
        const parseReviewCount = t => {
          if (!t) return null;
          const c = t.replace(/[(),]/g, '').trim();
          const m = c.match(/^([\d.]+)\s*([KMkm]?)$/);
          if (!m) return parseInt(c.replace(/\D/g, ''), 10) || null;
          const n = parseFloat(m[1]), s = m[2].toLowerCase();
          if (s === 'k') return Math.round(n * 1000);
          if (s === 'm') return Math.round(n * 1e6);
          return Math.round(n);
        };

        // Reads the leading number from "4.2 out of 5 stars".
        const parseRating = t => {
          if (!t) return null;
          const m = t.match(/^([\d.]+)/);
          return m ? parseFloat(m[1]) : null;
        };

        const items = document.querySelectorAll('[data-component-type="s-search-result"]');
        const out = [], seen = new Set();

        items.forEach(el => {
          const rawAsin = el.getAttribute('data-asin');
          if (!rawAsin) return;
          const linkEl = el.querySelector('h2 a, a.s-line-clamp-2, a.s-no-outline');
          const href = linkEl?.getAttribute('href');
          const asin = decodeAsin(href) || rawAsin;
          // Dedupe by canonical ASIN: sponsored and organic cards can repeat a product.
          if (seen.has(asin)) return;
          seen.add(asin);

          const titleEl = el.querySelector('h2 span');
          const priceWhole = el.querySelector('.a-price:not(.a-text-price) .a-price-whole');
          const priceSym = el.querySelector('.a-price:not(.a-text-price) .a-price-symbol');
          const priceOff = el.querySelector('.a-price.a-text-price .a-offscreen');
          const ratingAria = el.querySelector('[aria-label*="out of"]');
          const ratingAlt = el.querySelector('.a-icon-alt');
          const reviewsEl = el.querySelector('a span.s-underline-text, a[aria-label*="ratings"] span');
          const sponsoredEl = el.querySelector('[aria-label="Sponsored"], .puis-sponsored-label-text, .s-sponsored-label-text');
          const imageEl = el.querySelector('img.s-image');
          const ratingText = ratingAria?.getAttribute('aria-label') || ratingAlt?.textContent || null;

          out.push({
            asin,
            title: titleEl?.textContent.trim() || null,
            price_inr: parsePrice((priceSym?.textContent || '') + (priceWhole?.textContent || '')),
            mrp_inr: parsePrice(priceOff?.textContent),
            rating: parseRating(ratingText),
            review_count: parseReviewCount(reviewsEl?.textContent),
            url: `${ORIGIN}/dp/${asin}`,
            image: imageEl?.getAttribute('src') || null,
            sponsored: !!sponsoredEl,
            free_delivery: (el.textContent || '').includes('FREE delivery'),
          });
        });

        return JSON.stringify({ query: location.search, total: items.length, items: out });
      })()
- Take the top N entries of items for return to the caller. Sponsored items are surfaced at the top by Amazon's ranker; result_count should be applied after dedup (sponsored cards re-appear inline mid-page, which is why the extractor dedupes by canonical ASIN).
- Paginate by re-navigating with &page=N and re-running the extractor. There is no client-side incremental loading; each page is a fresh server-rendered HTML document with ~22 raw cards.
- Release the session.

      browse cloud sessions update "$sid" --status REQUEST_RELEASE
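The optional query parameters from the deep-link step can also be combined programmatically instead of by hand. The following is a minimal sketch, not part of the original workflow: buildSearchUrl and its option names (minRupees, maxRupees, department, sort, page) are hypothetical, and it assumes the parameters can be appended in any order. It reuses the same encoding as the shell step (encodeURIComponent with spaces turned into +) and converts rupees to paise for the p_36 refinement.

```javascript
// Hypothetical helper: assembles an amazon.in deep-link search URL.
// Parameter names (k, i, rh, s, page) come from the workflow above;
// the function name and option names are illustrative only.
const ORIGIN = 'https://www.amazon.in';

// Same encoding as the shell step: encodeURIComponent, then spaces as '+'.
const enc = s => encodeURIComponent(String(s)).replace(/%20/g, '+');

function buildSearchUrl({ query, page, sort, minRupees, maxRupees, department } = {}) {
  if (!query) throw new Error('query is required');
  const parts = [`k=${enc(query)}`];
  if (department) parts.push(`i=${enc(department)}`);
  if (minRupees != null && maxRupees != null) {
    // p_36 takes paise, so rupees are multiplied by 100 (500 rupees -> 50000).
    const lo = Math.round(minRupees * 100);
    const hi = Math.round(maxRupees * 100);
    parts.push(`rh=p_36:${lo}-${hi}`);
  }
  if (sort) parts.push(`s=${enc(sort)}`);
  // page is 1-indexed; page 1 is the default, so it is left off.
  if (page && page > 1) parts.push(`page=${page}`);
  return `${ORIGIN}/s?${parts.join('&')}`;
}

// Usage: second page of earbuds priced between 500 and 2000 rupees, cheapest first.
const url = buildSearchUrl({
  query: 'wireless earbuds under 2000',
  minRupees: 500,
  maxRupees: 2000,
  sort: 'price-asc-rank',
  page: 2,
});
console.log(url);
```

Keeping the rupee-to-paise conversion inside one helper avoids the most likely mistake with this parameter, which is passing rupee values directly to p_36.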
Site-Specific Gotchas
- Geo-IP filtering: amazon.in serves a degraded homepage to non-Indian IPs and rejects their search requests without offering a captcha challenge. Always use --proxies --verified when running from a US/EU sandbox. With Indian residential proxies, no captcha or login wall was encountered across electronics, books, and price-sorted queries.
- Prime badge is effectively absent from amazon.in search cards: across 4 queries in 2026, zero matches for i.a-icon-prime, .s-prime, or [aria-label*="Prime"]. The Indian site advertises delivery on the result card via the literal string "FREE delivery" instead. Do not try to extract a Prime boolean; capture free_delivery: el.textContent.includes('FREE delivery') instead.
- Sponsored cards duplicate canonical ASINs. A sponsored card and the organic card for the same product both have data-asin="B0XXX..." on the wrapper, but the sponsored card's <a href> is a tracker redirect /sspa/click?...&url=%2F...%2Fdp%2FB0XXX%2F... while the organic card's href is the direct /dp/B0XXX/ref=sr_1_N. Decode the url query-param of the sspa redirect to extract the canonical ASIN, then dedupe by ASIN; otherwise you ship ~22 items where 2–4 are duplicates.
- ASIN format varies by category. Electronics use 10-char B0[A-Z0-9]{8}. Books use 10-digit ISBNs (often 1XXXXXXXXX or 9XXXXXXXXX). The regex \/dp\/([A-Z0-9]{10}) covers both; don't constrain to B0\w{8}.
- Price comes in two DOM nodes, currency symbol + integer whole. Selector pattern: .a-price .a-price-symbol + .a-price .a-price-whole. The integer is rendered without thousand-separator commas in the text node (CSS-injected via :before/:after for display only), so text.replace(/[^\d]/g,'') parses correctly. Beware of the .a-text-price sibling: that is the strike-through MRP/list price, exposed via .a-offscreen. Always filter the current price selector with :not(.a-text-price).
- s= sort parameter values are kebab-rank suffixes: price-asc-rank, price-desc-rank, review-rank, date-desc-rank (recent), exact-aware-popularity-rank, relevanceblender (default). Other values silently fall back to relevance.
- Price-filter param rh=p_36:lo-hi is in paise (₹ × 100). rh=p_36:50000-200000 does not mean ₹50,000 to ₹2,00,000; it is ₹500–₹2000, because the prefix p_36 is the price-range refinement and its values are in paise (50000 paise = ₹500). Empirically confirmed: sort by price-asc-rank after applying this filter and the lowest result is ≥₹500.
- Rating is in the aria-label of a popover trigger, not in the .a-icon-alt of the star icon child. Selector [aria-label*="out of"] reliably hits the trigger <a aria-label="4.2 out of 5 stars, rating details">. The fallback .a-icon-alt text "4.2 out of 5 stars" works for non-sponsored cards but is missing/empty on some sponsored placements; prefer the aria-label.
- Review count uses K/M abbreviation in parentheses, e.g. (13.4K) → 13,400, (1.5K) → 1,500. The raw integer is not in the DOM text; only the abbreviated form. Parse the suffix.
- Don't waste time on the homepage search form. Driving #twotabsearchtextbox + pressing Enter works but adds ~20s and one extra page load with no benefit. Deep-link https://www.amazon.in/s?k=<encoded> is the canonical path.
- No JSON API. The undocumented /s?k=<q>&format=json endpoint returns HTML, not JSON. The official Product Advertising API (webservices.amazon.in/paapi5/searchitems) requires a Seller Central account and access-key signing, so it is not available for general scraping. Confirmed dead-end during iteration; do not waste turns probing.
- net::ERR_ABORTED failures are benign. The browser-trace summary shows 7–64 failed requests per page navigation, all net::ERR_ABORTED on Scripts/XHRs cancelled by subsequent navigations or prefetch teardown. None affect the rendered search-result HTML. Do not treat these as anti-bot blocks.
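Several of the gotchas above translate into cheap sanity checks on each extracted record. The sketch below is illustrative only: validateRecord is a hypothetical helper, and the rule that mrp_inr should not be lower than price_inr is an assumption (MRP being a maximum retail price), not something the extraction runs verified. It returns a list of human-readable problems, empty when the record looks consistent.

```javascript
// Hypothetical helper: flags records that contradict the gotchas above.
// Returns an array of problem strings; an empty array means no issue found.
function validateRecord(item) {
  const problems = [];

  // ASINs are 10 chars of [A-Z0-9]; books use 10-digit ISBNs, so digits-only is valid.
  if (!/^[A-Z0-9]{10}$/.test(item.asin || '')) {
    problems.push('asin is not a 10-character ASIN/ISBN');
  }
  // The extractor always emits the canonical /dp/<asin> URL, never a tracker redirect.
  if (item.url !== `https://www.amazon.in/dp/${item.asin}`) {
    problems.push('url is not the canonical /dp/<asin> form');
  }
  // Prices are whole rupees after parsing.
  if (item.price_inr != null && !(Number.isInteger(item.price_inr) && item.price_inr > 0)) {
    problems.push('price_inr is not a positive integer');
  }
  // Assumption: a list price below the current price suggests swapped selectors.
  if (item.mrp_inr != null && item.price_inr != null && item.mrp_inr < item.price_inr) {
    problems.push('mrp_inr is below price_inr');
  }
  // Ratings come from "N out of 5 stars".
  if (item.rating != null && !(item.rating >= 0 && item.rating <= 5)) {
    problems.push('rating is outside 0-5');
  }
  // Review counts are expanded from K/M suffixes into integers.
  if (item.review_count != null && !(Number.isInteger(item.review_count) && item.review_count >= 0)) {
    problems.push('review_count is not a non-negative integer');
  }
  return problems;
}

// Usage with a record shaped like the extractor output.
console.log(validateRecord({
  asin: '1636512933',
  url: 'https://www.amazon.in/dp/1636512933',
  price_inr: 399,
  mrp_inr: 499,
  rating: 4.5,
  review_count: 1500,
}));
```

A non-empty result is a hint that a selector has drifted, which is usually cheaper to catch here than in a downstream crawler.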
Expected Output
{
"success": true,
"search_query": "wireless earbuds under 2000",
"search_url": "https://www.amazon.in/s?k=wireless+earbuds+under+2000",
"total_cards_on_page": 22,
"result_count": 10,
"products": [
{
"asin": "B0FMDL81GS",
"title": "OnePlus Nord Buds 3r TWS Earbuds up to 54 Hours Playback, 2-mic Clear Calls, 3D Spatial Audio, AI Translation, 12.4mm Drivers, Dual-Device Connectivity, 47ms Low Latency - Ash Black",
"price_inr": 1999,
"mrp_inr": null,
"rating": 4.3,
"review_count": 44300,
"url": "https://www.amazon.in/dp/B0FMDL81GS",
"image": "https://m.media-amazon.com/images/I/51nBTTG3hNL._AC_UY218_.jpg",
"sponsored": false,
"free_delivery": true
},
{
"asin": "B0BW8TXJJ2",
"title": "Boat Nirvana Ion, 120HRS Battery, ...",
"price_inr": 1699,
"mrp_inr": 7990,
"rating": 4.1,
"review_count": 13400,
"url": "https://www.amazon.in/dp/B0BW8TXJJ2",
"image": "https://m.media-amazon.com/images/I/81-TGXuOMAL._AC_UY218_.jpg",
"sponsored": true,
"free_delivery": true
}
],
"error_reasoning": null
}
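The envelope above can be assembled from the extractor's return value in a few lines. This is a minimal sketch under stated assumptions: buildResponse is a hypothetical name, payload is the parsed JSON string returned by the extractor ({ query, total, items }), and result_count is assumed to be the number of products actually returned after truncating the already-deduped list.

```javascript
// Hypothetical helper: wraps the extractor payload in the output envelope.
// payload = JSON.parse(<extractor result>) with fields { query, total, items }.
function buildResponse(searchQuery, searchUrl, payload, maxResults) {
  // items is already deduped by ASIN, so slicing here applies the cap after dedup.
  const products = payload.items.slice(0, maxResults);
  return {
    success: true,
    search_query: searchQuery,
    search_url: searchUrl,
    total_cards_on_page: payload.total, // raw card count, before dedup
    result_count: products.length,      // assumption: count of returned products
    products,
    error_reasoning: null,
  };
}

// Usage with a tiny stand-in payload.
const response = buildResponse(
  'wireless earbuds under 2000',
  'https://www.amazon.in/s?k=wireless+earbuds+under+2000',
  { query: '?k=wireless+earbuds+under+2000', total: 22, items: [{ asin: 'B0FMDL81GS' }, { asin: 'B0BW8TXJJ2' }] },
  1,
);
console.log(JSON.stringify(response, null, 2));
```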
Outcome shapes observed during 1-iter convergence across 4 queries:
- results_ok: success: true, products[] populated. The common path. Example above.
- results_ok_books: same shape; asin is a 10-digit ISBN (1636512933, 9367257651), mrp_inr is usually present (MRP is mandatory on books), image and rating populated. No structural difference; just be aware that the ASIN regex must accept digits-only IDs.
- results_empty: total_cards_on_page: 0, products: []. Returned for nonsense queries (?k=qwerasdf123); Amazon renders a "No results" banner. Treat as success with empty array, not as failure.
- geo_blocked: when the session is run without --proxies --verified from a non-IN IP, the homepage redirects to a thin landing with no search-bar accessibility refs and /s?k=… returns a "Sorry, we couldn't find that page" body. total_cards_on_page: 0. Set success: false, error_reasoning: "Geo-blocked: amazon.in requires an Indian IP. Re-run session with --proxies --verified.".
- captcha_wall: not observed under --verified --proxies during this iteration, but documented for completeness: Amazon's "Enter the characters you see" page renders a #captchacharacters input. If present, abort with success: false, error_reasoning: "Captcha wall — session fingerprint flagged; rotate proxy + retry." Do not attempt to solve.
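The outcome shapes listed here can be told apart with three page signals. The sketch below is one possible mapping, not the original implementation: classifyOutcome and its input names are hypothetical, hasCaptchaInput is assumed to come from a check such as document.querySelector('#captchacharacters') !== null, and bodyText is assumed to be the page's visible text. The exact wording and apostrophe style of the "Sorry" page may differ from the string quoted in this document, so that match is best treated as a heuristic.

```javascript
// Hypothetical helper: maps page signals to the outcome shapes listed above.
function classifyOutcome({ totalCards = 0, hasCaptchaInput = false, bodyText = '' } = {}) {
  if (hasCaptchaInput) {
    return {
      outcome: 'captcha_wall',
      success: false,
      // \u2014 reproduces the dash used in the documented message.
      error_reasoning: 'Captcha wall \u2014 session fingerprint flagged; rotate proxy + retry.',
    };
  }
  if (totalCards === 0 && bodyText.includes("Sorry, we couldn't find that page")) {
    return {
      outcome: 'geo_blocked',
      success: false,
      error_reasoning: 'Geo-blocked: amazon.in requires an Indian IP. Re-run session with --proxies --verified.',
    };
  }
  if (totalCards === 0) {
    // Nonsense queries render a "No results" banner: success with an empty array.
    return { outcome: 'results_empty', success: true, error_reasoning: null };
  }
  return { outcome: 'results_ok', success: true, error_reasoning: null };
}

// Usage: a normal results page with 22 cards.
console.log(classifyOutcome({ totalCards: 22 }).outcome);
```

Checking the captcha input first means a flagged session is never misreported as an empty result set.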