Cruise Critic Extract Reviews — Browser Skill
Purpose
Given a Cruise Critic ship page (a https://www.cruisecritic.com/cruise/{cruise-line-slug}/{ship-slug}/reviews URL, or a cruise line + ship name pair resolved via Browserbase Search), extract ship-level metadata (name, line, year built, passenger capacity, crew, overall rating, total review count, per-category averages) plus a filtered slice of member reviews each with {review_id, reviewer_username, reviewer_traveler_type, sailed_date, cruise_length_nights, destination, cabin_type_booked, overall_rating, sub_ratings: {...}, title, body_text, helpful_vote_count, review_url} and any cruise-line response. Read-only — never click Write a Review, Sign In, helpful-vote, or report-review controls.
When to Use
- Aggregating Cruise Critic sentiment for a ship across a date window or destination.
- Building a comparison table of ships within a cruise line (one skill invocation per ship).
- Quoting recent member-review excerpts in a research brief with provenance back to the source review page.
- Pulling the canonical per-category rating breakdown (Cabins, Dining, Entertainment, Public Rooms, Fitness & Recreation, Family, Shore Excursion, Embarkation, Service, Value for Money) used by the site's own award logic.
Workflow
Cruise Critic is a Next.js (Apollo Client) SSR site protected by DataDome (X-Datadome: protected on every response). The reviews-list page is server-rendered with the full visible review payload baked into the HTML (typically 800 kB to over 1 MB), so browse cloud fetch can fail on the list with "502 The response body exceeded the maximum allowed size of 1 MB", and DataDome starts serving captcha-challenge HTML after roughly 5–8 unauthenticated browse cloud fetch calls from the same source IP. Lead with a Browserbase session with stealth and residential proxies enabled. Two undocumented shortcuts that hold up under stealth cut cost dramatically; both are described below as optimizations.
Recommended path — Browserbase stealth session + per-review _next/data enrichment
1. Start a stealthed session.

        sid=$(browse cloud sessions create --keep-alive --proxies --verified | jq -r .id)
        export BROWSE_SESSION="$sid"

    --verified (advanced stealth) plus --proxies (residential) is mandatory. A bare session is served a DataDome interstitial on the first navigation to /cruise/.../reviews.

2. Resolve the ship URL if the caller gave you a line + ship name instead of a URL:

        # Cheap, no session needed
        browse cloud search "site:cruisecritic.com $LINE $SHIP reviews" \
          | jq -r '.results[] | select(.url | test("/cruise/[^/]+/[^/]+/reviews$")) | .url' \
          | head -1

    The canonical reviews URL is always https://www.cruisecritic.com/cruise/{cruise-line-slug}/{ship-slug}/reviews. Both slugs are kebab-case (royal-caribbean, symphony-of-the-seas, norwegian-cruise-line, viking-jupiter, …).

3. Optionally narrow by destination via URL path. Only destination is a URL-form filter; all other filters are React UI state.

        /cruise/{line}/{ship}/reviews/destination/{destination-slug}

    Verified working destination slugs: usa, caribbean, eastern-caribbean, western-caribbean, southern-caribbean, bahamas, mediterranean, europe, alaska, asia. The site canonicalises lower-case kebab; unknown slugs 404.

4. Open the list page with the session and capture both the rendered HTML and the Next.js data.

        browse open "https://www.cruisecritic.com/cruise/$LINE_SLUG/$SHIP_SLUG/reviews${DEST_PATH:+/destination/$DEST_PATH}" \
          --remote --session "$sid" --wait load --timeout 60000
        browse wait timeout 2500 --remote --session "$sid"   # Apollo hydration settles
        browse get html body --remote --session "$sid" > /tmp/list.html

    Extract the __NEXT_DATA__ blob; it carries everything you need without any further DOM scraping:

        node -e '
          const html = require("fs").readFileSync("/tmp/list.html", "utf8");
          const m = html.match(/<script id="__NEXT_DATA__"[^>]*>([\s\S]*?)<\/script>/);
          const j = JSON.parse(m[1]);
          require("fs").writeFileSync("/tmp/list-next.json", JSON.stringify(j, null, 2));
        '

    j.props.pageProps.apolloState is the Apollo cache. Key entries on a list page:

    - Ships:{shipId} is the ship core (name, seoName, slug, professionalOverallRating, totalShoreExcursions, cruiseLine.slug, reviewStatus).
    - ShipAttributes:{attrId} holds { passengerCapacity, totalCrew, maidenDate } (year built). Linked from Ships:{id}.attributes.__ref.
    - ROOT_QUERY.searchReviewsWithFilters({"filters":{"isPhotoJournal":false,"shipId":[{id}]},"limit":N}) → { totalResults, stats: { averageMemberRating } }.
    - Reviews:{reviewId} for every visible review in the current filter/sort/page bucket. Each Reviews:{id} carries { id, cruisedOn, hasChildren, withDisabled, numberOfCruisesTakenGroupId, cabinCategory, user.__ref, entries: [ReviewEntries refs] }.
    - ReviewEntries:{entryId} holds { reviewCategory, rating } for one sub-category.
    - SsoUser:{userKey} holds { username } for the reviewer.

    Pull review IDs with one expression:

        node -e '
          const j = require("/tmp/list-next.json");
          const a = j.props.pageProps.apolloState;
          console.log(JSON.stringify(Object.keys(a)
            .filter(k => k.startsWith("Reviews:"))
            .map(k => a[k].id)));
        '

5. Apply rating / traveler-type / cabin / sailed-within / sort / language filters in-browser.

    These are not URL filters. They render as a row of pills + dropdowns above the listing. The shape is stable: each filter is a button with aria-haspopup="listbox" and an accessible name like "Rating: Any", "Traveler Type: Any", "Cabin Type: Any", "Sailed Within: Any", "Sort By: Most Helpful". Pattern for each filter:

        browse snapshot --remote --session "$sid"
        # In the snapshot, find the button by its accessible name then the corresponding listbox option.
        browse click "@<button-ref>" --remote --session "$sid"
        browse wait timeout 500 --remote --session "$sid"
        browse click "@<option-ref>" --remote --session "$sid"
        browse wait timeout 1500 --remote --session "$sid"   # Apollo refetch

    After each filter change, the URL stays the same but __NEXT_DATA__ is regenerated on the next page load; to refresh it from React state, re-snapshot the body HTML or use browse eval against window.__APOLLO_STATE__ (Apollo writes the latest cache there if exposed; otherwise re-read the page via browse get html body). The simpler, cheaper alternative for non-destination filters is to fetch all visible reviews first and filter client-side from cruisedOn / entries / numberOfCruisesTakenGroupId. The data is denser than the UI exposes (e.g., hasChildren / withDisabled flags let you reconstruct the Family / Disabled traveler-type filter without a click).

6. Paginate. The site paginates ~10 reviews per page via an infinite-scroll / "Load more" pattern. Trigger more reviews to render:

        # repeat until totalResults reached or you have enough
        browse press End --remote --session "$sid"
        browse wait timeout 1500 --remote --session "$sid"
        # OR click the explicit "Load more reviews" button if present in the snapshot

    Each load merges new Reviews:{id} entries into the Apollo cache. Re-extract __NEXT_DATA__ or re-snapshot to capture the growing set.

7. Enrich each visible review with its full body. The list-page payload contains review sub-rating entries but does not include the review body text or title; that lives only on the per-review page. Two paths, in cost order:

    - (Cheap, preferred) _next/data JSON endpoint: fetches the per-review SSR props as ~150 kB JSON, well under the fetch 1 MB cap, and DataDome currently allows it through Browserbase Fetch when the request rides residential proxies. The build ID is on every page (grep -oE '"buildId":"[^"]+"' /tmp/list.html | head -1). Example:

            GET https://www.cruisecritic.com/_next/data/{buildId}/cruise/{line}/{ship}/reviews/{reviewId}.json
                ?cruise-line-slug={line}&cruise-ship-slug={ship}&review-id={reviewId}

      Response shape (verified): pageProps.review with { id, title, shipReview (the body text, ~9 kB typical), cruisedOn, overallRating, helpfulVotes, cabinCategory, destination: {id, slug, seoName}, user: {username}, hasChildren, withDisabled, numberOfCruisesTakenGroupId, entries: [{reviewCategory, rating}], comments (cruise-line response, when present), images, nextReview: {id}, previousReview: {id} }. Pace requests to ≤ 1 req/sec with brief jitter; DataDome served captcha HTML after roughly 5 back-to-back unauthenticated bursts in testing.

    - (Fallback) Same browse open ... /reviews/{id} --remote --session "$sid" flow inside the active session. ~5–8× more wall time per review than the _next/data path but immune to per-IP fetch throttling because the session traffic shares stealth + residential proxy state.

8. (Walk-the-chain optimisation) When the caller doesn't need filters, you can skip the list page entirely. The _next/data JSON for any review contains nextReview.id and previousReview.id for adjacent reviews in the site's default ordering; walk the chain in either direction until enough reviews are collected. This eliminates the >1 MB list-page load and the entire pagination loop. The chain order is approximately reverse-chronological but is not strictly sorted; verify by cruisedOn if you need date ordering.

9. Map sub-category labels. entries[].reviewCategory uses internal camelCase keys. Translate to the user-facing labels in your output:

    | reviewCategory (API) | UI label |
    | --- | --- |
    | cabin | Cabins |
    | dining | Dining |
    | entertainment | Entertainment |
    | publicRooms | Public Rooms |
    | fitnessAndRecreation | Fitness & Recreation |
    | family | Family |
    | shoreExcursion | Shore Excursion |
    | embarkation | Embarkation |
    | service | Service |
    | valueForMoney | Value for Money |

    Not every reviewer scores every category; absent entries are simply omitted from entries[]. Treat missing categories as null, not 0.

10. Derive reviewer_traveler_type. The site shows it as a badge but the data is split across three fields on Reviews:{id}:

    - hasChildren: true → Family
    - withDisabled: true → Disabled
    - Otherwise the badge string ("Couple", "Solo", "Friends", "Senior") is rendered from a separate Apollo entity that is not always in apolloState on the list page; it is reliably present in the per-review _next/data JSON under a sibling field. If you need the full string for every review, source it from the per-review fetch in step 7.

11. Build review URLs: https://www.cruisecritic.com/cruise/{cruise-line-slug}/{ship-slug}/reviews/{review-id}.

12. Release the session.

        browse cloud sessions update "$sid" --status REQUEST_RELEASE
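The cache layout, the sub-category label table, and the traveler-type rule described above can be combined into a single pass over apolloState. What follows is a minimal Node.js sketch, not the site's or this skill's own implementation. The function names (deref, travelerTypeFromFlags, flattenReviews), the choice of output fields, and the tiny sampleCache are assumptions made for illustration, and the sketch assumes references use Apollo's usual { __ref: "Type:id" } form.

```javascript
// Sketch only: sampleCache below is invented and mirrors the entity shapes
// described in this workflow; real payloads carry many more fields.
const CATEGORY_LABELS = {
  cabin: "Cabins",
  dining: "Dining",
  entertainment: "Entertainment",
  publicRooms: "Public Rooms",
  fitnessAndRecreation: "Fitness & Recreation",
  family: "Family",
  shoreExcursion: "Shore Excursion",
  embarkation: "Embarkation",
  service: "Service",
  valueForMoney: "Value for Money",
};

// Follow an Apollo { __ref: "Type:id" } pointer; plain values pass through.
function deref(cache, value) {
  return value && typeof value === "object" && "__ref" in value
    ? cache[value.__ref] ?? null
    : value;
}

// Family / Disabled come from flags; other badges need the per-review fetch.
function travelerTypeFromFlags(review) {
  if (review.hasChildren === true) return "Family";
  if (review.withDisabled === true) return "Disabled";
  return null;
}

function flattenReviews(cache) {
  return Object.keys(cache)
    .filter((key) => key.startsWith("Reviews:"))
    .map((key) => {
      const review = cache[key];
      // Start every category at null so unscored categories are never 0.
      const subRatings = Object.fromEntries(
        Object.values(CATEGORY_LABELS).map((label) => [label, null])
      );
      for (const entryRef of review.entries ?? []) {
        const entry = deref(cache, entryRef);
        const label = entry && CATEGORY_LABELS[entry.reviewCategory];
        if (label) subRatings[label] = entry.rating;
      }
      const user = deref(cache, review.user);
      return {
        review_id: review.id,
        reviewer_username: user ? user.username : null,
        reviewer_traveler_type: travelerTypeFromFlags(review),
        reviewer_experience_bucket: review.numberOfCruisesTakenGroupId ?? null,
        sailed_date: review.cruisedOn ?? null,
        cabin_type_booked: review.cabinCategory ?? null,
        sub_ratings: subRatings,
      };
    });
}

// Hypothetical miniature apolloState (ids and values are made up).
const sampleCache = {
  "Reviews:1001": {
    id: 1001,
    cruisedOn: "2025-04-30",
    hasChildren: true,
    withDisabled: false,
    numberOfCruisesTakenGroupId: 3,
    cabinCategory: null,
    user: { __ref: "SsoUser:u1" },
    entries: [{ __ref: "ReviewEntries:1" }, { __ref: "ReviewEntries:2" }],
  },
  "ReviewEntries:1": { reviewCategory: "cabin", rating: 4 },
  "ReviewEntries:2": { reviewCategory: "valueForMoney", rating: 5 },
  "SsoUser:u1": { username: "example_user" },
};

const flattened = flattenReviews(sampleCache);
console.log(JSON.stringify(flattened, null, 2));
```

Unscored categories are emitted as explicit nulls in this sketch. Omitting the key instead, as the sample reviews under Expected Output do, is equally consistent with the null-not-0 rule.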
Browser fallback (no _next/data)
If DataDome starts blocking _next/data JSON (it can happen on a hot residential exit IP), do all enrichment through browse open ... --remote --session "$sid" page loads inside the same stealthed session. Extract the review body from the rendered DOM via the JSON-LD <script type="application/ld+json"> block — the Product schema's review[0] contains name, datePublished, reviewBody (truncated to ~200 chars), author.name, reviewRating.ratingValue — and supplement the truncated body with browse get text body filtered to the main <article> selector.
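The JSON-LD extraction described above could look roughly like the following Node.js sketch. It is an illustration under stated assumptions: extractJsonLdReview is a hypothetical helper name, sampleHtml is a made-up stand-in for a rendered review page, and whether ratingValue arrives as a number or a string on the live site has not been checked here.

```javascript
// Sketch only: pulls the first review out of a Product JSON-LD block.
function extractJsonLdReview(html) {
  const pattern =
    /<script[^>]*type="application\/ld\+json"[^>]*>([\s\S]*?)<\/script>/g;
  for (const match of html.matchAll(pattern)) {
    let doc;
    try {
      doc = JSON.parse(match[1]);
    } catch {
      continue; // skip malformed or non-JSON blocks
    }
    // A page may carry one schema object or an array of them.
    for (const node of Array.isArray(doc) ? doc : [doc]) {
      if (!node || node["@type"] !== "Product") continue;
      const review = Array.isArray(node.review) ? node.review[0] : node.review;
      if (!review) continue;
      return {
        title: review.name ?? null,
        date_published: review.datePublished ?? null,
        body_excerpt: review.reviewBody ?? null, // truncated by the site
        author: review.author?.name ?? null,
        overall_rating: review.reviewRating?.ratingValue ?? null,
      };
    }
  }
  return null;
}

// Hypothetical page fragment (all values are invented).
const sampleHtml = `
<html><head>
<script type="application/ld+json">
{"@type":"Product","name":"Example Ship","review":[{"name":"Great sailing",
"datePublished":"2025-05-02","reviewBody":"Short excerpt...",
"author":{"name":"example_user"},"reviewRating":{"ratingValue":5}}]}
</script>
</head><body></body></html>`;

const jsonLdReview = extractJsonLdReview(sampleHtml);
console.log(jsonLdReview);
```

Because reviewBody is truncated, body_excerpt is only a provenance check; the full text still has to come from the article text as described above.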
Site-Specific Gotchas
- Anti-bot: DataDome is on every route. X-Datadome: protected appears on all responses; X-Datadome-Isbot: false on the first few from a fresh proxy IP, then captcha HTML (<html lang="en"><head><title>cruisecritic.com</title>…geo.captcha-delivery.com…) once the IP gets flagged. browse cloud sessions create --proxies --verified is mandatory. Avoid sustained fan-out via browse cloud fetch; keep per-review enrichment inside the active stealth session, or pace _next/data fetches to ≤ 1 req/sec with jitter.
- Reviews-list HTML is > 1 MB. browse cloud fetch returns 502 The response body exceeded the maximum allowed size of 1 MB for /cruise/.../reviews and any /cruise/.../reviews/destination/{slug} page (verified on usa, caribbean, mediterranean, alaska, bahamas, europe, asia; all > 1 MB). Always use a browser session for the list. The per-review _next/data JSON is ~150 kB and well under the cap.
- _next/data JSON list path is DataDome-blocked. GET /_next/data/{buildId}/cruise/{line}/{ship}/reviews/destination/{slug}.json returns HTTP 403 with the DataDome challenge cookie, whereas the per-review variant GET /_next/data/{buildId}/cruise/{line}/{ship}/reviews/{review-id}.json?...&review-id={id} returns 200 on the same session and IP. Don't chase the list-page JSON endpoint; it's not a viable shortcut.
- GraphQL endpoint is not externally callable. The site is Apollo Client + Next.js SSR, but the public-facing /graphql route is not exposed in any JS bundle reachable via browse cloud fetch (chunk inventory inspected: framework, main, webpack, per-page chunks; no graphql URL strings). The _app.js bundle exceeds 1 MB and cannot be inspected from this path. Don't waste iterations hunting a direct GraphQL POST endpoint; the _next/data JSON return is functionally equivalent and authenticates the same way the page does (no API key, just DataDome cookie).
- Only destination is a URL-form filter. Verified 404 for /reviews/rating/{N}, /reviews/traveler-type/{slug}, /reviews/cabin-type/{slug}, /reviews/sailed-within/{window}, /reviews/sort/{key}, /reviews/language/{lang}, and /reviews/page/{N}. Querystring forms (?rating=5, ?page=2, ?sortBy=mostRecent, ?travelerType=family) are silently ignored and return the unfiltered listing. All non-destination filters require clicking the React UI inside a session.
- Pagination is infinite-scroll, not numbered. ?page=N and /page/{N} both fall through to the unfiltered first page. Trigger additional reviews by pressing End or clicking the explicit "Load more reviews" control in the snapshot. ~10 reviews load per increment.
- Sub-category names are camelCase in the API. The list-page entries[].reviewCategory uses valueForMoney, publicRooms, fitnessAndRecreation, shoreExcursion; translate to the user-facing labels in your output mapping. Not every reviewer scores every category; missing → null, not 0.
- destination on a review is the itinerary destination, not the home port. It is an object { id, slug, seoName } (e.g., {slug:"eastern-caribbean", seoName:"the Eastern Caribbean"}). The departure port lives separately on DeparturePorts:{id} under Ships:{id}.departurePorts({"countryId":1}).
- cabinCategory is frequently null. Many reviews don't pin a cabin type; if the caller wants a Cabin filter, fall back to ReviewCabinPivots:{id} references on the review (when present, they carry {cabinType, deck, room} granularity).
- numberOfCruisesTakenGroupId is a bucketed-experience integer (1 = first cruise, larger = more experienced). Site renders this as a badge ("First time cruiser", "Experienced", etc.) but the mapping table is internal; emit the integer and let the consumer interpret, or hardcode the observed mapping (1=first, 2=novice, 3=intermediate, 4+=experienced) with a _unverified flag.
- nextReview / previousReview chains are not strictly chronological. They walk the site's default ordering, which is similar-but-not-identical to "Most Helpful". For strict date-window filtering, paginate via the list page and sort by cruisedOn client-side rather than walking the chain past your date boundary.
- Cruise-line responses live on review.comments. When the cruise line responded to a review, pageProps.review.comments is a non-null object { comment, user: {userName, title} }. Empty otherwise. Worth including in Expected Output because some downstream use-cases ask for it explicitly.
- shipReview is HTML-escaped plain text with \r\n newlines. Decode &quot;, &amp;, etc., before emitting. There is no rich-text markup.
- Build ID rotates per deploy. dpl_EoXom4Tk8881A4KbrwTMYtbTjHKs / build-TfctsWXpff2fKS were live during this skill's authoring. Always extract the current buildId from the page HTML before constructing _next/data URLs; a stale build ID 404s.
- AI-training crawlers are blocked in robots.txt (User-agent: GPTBot|ClaudeBot|Google-Extended|Cohere-ai|CCBot|...), but Disallow rules under User-agent: * cover /search, /feeds, /member-center, /storyblok/, etc., not /cruise/.../reviews. The review pages themselves are publicly indexable; the AI-crawler block is a policy signal rather than a per-route enforcement and DataDome operates regardless of user agent. Set a realistic browser UA on your session (Browserbase stealth does this by default).
- Two ID spaces exist: Reviews:{id} (the review_id in URLs) and ReviewEntries:{id} (per-subcategory rating rows). Don't conflate them. The review URL is built with the Reviews: id; the ReviewEntries: ids never appear in a URL.
- The mra legacy path ({port}-{line}-{ship}-{destination}-cruises_dp{N}-cl{N}-sh{N}-de{N}/mra) 308-redirects to /cruise/{line}/{ship}/reviews/destination/{slug}. Don't try to use it directly; follow the redirect and treat the new-shape URL as canonical.
- Read-only. Do not click Write a Review, Sign In, Helpful / vote controls, or Report Review. The first two start auth flows; the latter two mutate state and are disallowed by the task contract.
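Several of the gotchas above (build ID rotation, the captcha marker, entity-escaped shipReview, and the nextReview chain) meet in the per-review enrichment loop. The Node.js sketch below shows one way they might fit together; it is an illustration, not a tested client. fetchText is a hypothetical injected transport (nothing here touches the network), the helper names are invented, the entity list beyond &quot; and &amp; is an assumption, and the demo data is made up.

```javascript
// Sketch only: fetchText is an injected, hypothetical transport that resolves
// to the response body as text. Pacing follows the <= 1 req/sec guidance.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Always read the live build ID from page HTML; a stale one 404s.
function extractBuildId(html) {
  const m = html.match(/"buildId":"([^"]+)"/);
  return m ? m[1] : null;
}

function reviewDataUrl(buildId, line, ship, reviewId) {
  const query = new URLSearchParams({
    "cruise-line-slug": line,
    "cruise-ship-slug": ship,
    "review-id": String(reviewId),
  });
  return (
    `https://www.cruisecritic.com/_next/data/${buildId}` +
    `/cruise/${line}/${ship}/reviews/${reviewId}.json?${query}`
  );
}

// The captcha page references geo.captcha-delivery.com instead of JSON.
const looksLikeCaptcha = (body) => body.includes("geo.captcha-delivery.com");

// Minimal decoder; entities other than &quot; and &amp; are assumed.
function decodeEntities(text) {
  return text
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&"); // last, so "&amp;lt;" becomes "&lt;", not "<"
}

async function walkChain({ fetchText, buildId, line, ship, startId, limit, delayMs = 1000 }) {
  const reviews = [];
  const seen = new Set(); // guards against a chain that loops back
  let id = startId;
  while (id != null && !seen.has(id) && reviews.length < limit) {
    seen.add(id);
    const body = await fetchText(reviewDataUrl(buildId, line, ship, id));
    if (looksLikeCaptcha(body)) {
      return { success: false, reason: "anti_bot_block", reviews };
    }
    const review = JSON.parse(body).pageProps.review;
    reviews.push({ ...review, shipReview: decodeEntities(review.shipReview ?? "") });
    id = review.nextReview ? review.nextReview.id : null;
    // Wait the base delay plus up to 30% jitter before the next request.
    if (id != null) await sleep(delayMs + Math.random() * delayMs * 0.3);
  }
  return { success: true, reviews };
}

// Fake transport over an in-memory chain (ids and text are made up).
const fakePages = {
  1: { id: 1, shipReview: "Food &amp; drinks were &quot;fine&quot;", nextReview: { id: 2 } },
  2: { id: 2, shipReview: "Second", nextReview: null },
};
const fakeFetch = async (url) => {
  const id = Number(url.match(/reviews\/(\d+)\.json/)[1]);
  return JSON.stringify({ pageProps: { review: fakePages[id] } });
};

const chainDemo = walkChain({
  fetchText: fakeFetch, buildId: "build-example", line: "royal-caribbean",
  ship: "symphony-of-the-seas", startId: 1, limit: 5, delayMs: 0,
});
chainDemo.then((result) => console.log(result.reviews.map((r) => r.id)));
```

Injecting the transport keeps the pacing and chain logic independent of whether requests go through the stealth session or a proxied fetch. Since the chain is not strictly chronological, a caller with a date window would still filter the collected reviews by cruisedOn afterwards.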
Expected Output
Two shapes — success with payload, and error with reason. The skill emits success even when the filter window returns zero reviews (the empty array carries the same provenance + ship metadata as a populated one).
{
"success": true,
"ship": {
"ship_id": 984,
"name": "Symphony of the Seas",
"cruise_line": "Royal Caribbean International",
"cruise_line_slug": "royal-caribbean",
"ship_slug": "symphony-of-the-seas",
"year_built": "2018",
"year_refurbished": null,
"gross_tonnage": null,
"passenger_capacity": 5518,
"total_crew": 2200,
"length_meters": null,
"decks": null,
"professional_overall_rating": 4.50,
"member_overall_rating": 3.78,
"total_member_reviews": 463,
"rating_breakdown": {
"Cabins": null,
"Dining": null,
"Entertainment": null,
"Public Rooms": null,
"Fitness & Recreation": null,
"Family": null,
"Shore Excursion": null,
"Embarkation": null,
"Service": null,
"Value for Money": null
},
"url": "https://www.cruisecritic.com/cruise/royal-caribbean/symphony-of-the-seas/reviews"
},
"filters_applied": {
"min_rating": null,
"traveler_type": null,
"sailed_within": null,
"sailed_date_range": null,
"destination": "eastern-caribbean",
"cabin_type": null,
"sort": "Most Helpful",
"language": "en"
},
"total_results_matching_filters": 187,
"reviews_returned": 2,
"reviews": [
{
"review_id": 727851,
"review_url": "https://www.cruisecritic.com/cruise/royal-caribbean/symphony-of-the-seas/reviews/727851",
"reviewer_username": "steveknj",
"reviewer_traveler_type": "Couple",
"reviewer_experience_bucket": 3,
"sailed_date": "2025-04-30",
"cruise_length_nights": null,
"destination": {
"slug": "eastern-caribbean",
"label": "the Eastern Caribbean"
},
"cabin_type_booked": null,
"overall_rating": 5,
"sub_ratings": {
"Cabins": 4,
"Dining": 5,
"Entertainment": 4,
"Public Rooms": 5,
"Family": 5,
"Embarkation": 5,
"Service": 5,
"Value for Money": 5
},
"title": "Symphony of the Seas - 4/30/2025",
"body_text": "I wanted to preface this to say that this is NOT an extensive review… (full ~9000-character body)",
"pros": null,
"cons": null,
"tip_for_future_cruisers": null,
"helpful_vote_count": 2,
"images": [],
"cruise_line_response": null
},
{
"review_id": 738724,
"review_url": "https://www.cruisecritic.com/cruise/royal-caribbean/symphony-of-the-seas/reviews/738724",
"reviewer_username": "anonymous",
"reviewer_traveler_type": "Couple",
"reviewer_experience_bucket": 2,
"sailed_date": "2026-03-01",
"cruise_length_nights": null,
"destination": {
"slug": "eastern-caribbean",
"label": "the Eastern Caribbean"
},
"cabin_type_booked": null,
"overall_rating": 1,
"sub_ratings": {
"Cabins": 1,
"Dining": 1,
"Embarkation": 4,
"Entertainment": 3,
"Fitness & Recreation": 2,
"Public Rooms": 2,
"Service": 4,
"Value for Money": 1
},
"title": "Symphony is overcrowded, and kids gone wild",
"body_text": "…(full body)",
"pros": null,
"cons": null,
"tip_for_future_cruisers": null,
"helpful_vote_count": 0,
"images": [],
"cruise_line_response": {
"by": "Royal Caribbean Guest Services",
"comment": "We're sorry to hear about your experience…"
}
}
],
"evidence": {
"list_url_loaded": "https://www.cruisecritic.com/cruise/royal-caribbean/symphony-of-the-seas/reviews/destination/eastern-caribbean",
"build_id": "build-TfctsWXpff2fKS",
"session_id": "<browserbase-session-id>",
"fetched_at": "2026-05-18T18:35:00Z"
}
}
Error shapes:
// Ship not found on Cruise Critic
{ "success": false, "reason": "ship_not_found", "queried": { "line": "...", "ship": "..." } }
// DataDome blocked the session even with --proxies --verified (rare on first attempt;
// occurs on hot/burned residential exit IPs — retry with a new session)
{ "success": false, "reason": "anti_bot_block", "evidence": "datadome_captcha_html" }
// Filter window produces zero reviews; ship + total still reported
{ "success": true, "total_results_matching_filters": 0, "reviews": [], "ship": { ... }, "filters_applied": { ... } }
gross_tonnage, length_meters, decks, and year_refurbished appear in the ship's "Specifications" sidebar on the main /cruise/{line}/{ship} overview page, not on the /reviews subpage. If the caller requires them, load the overview page inside the same session (following any redirect); it hydrates a richer ShipAttributes record. Otherwise leave them null to keep the skill cheap.
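The success and error envelopes in this section could be assembled with a few small constructors. This is a hedged Node.js sketch: the helper names (successResult, shipNotFound, antiBotBlock) are hypothetical, the sample values are invented, and only the field names and reason codes are taken from the shapes shown above.

```javascript
// Sketch only: field names and reason codes follow the shapes in this section.
function successResult({ ship, filtersApplied, totalMatching, reviews, evidence }) {
  return {
    success: true,
    ship,
    filters_applied: filtersApplied,
    total_results_matching_filters: totalMatching,
    reviews_returned: reviews.length,
    reviews, // may be an empty array; an empty window is still a success
    evidence,
  };
}

function shipNotFound(line, ship) {
  return { success: false, reason: "ship_not_found", queried: { line, ship } };
}

function antiBotBlock() {
  return { success: false, reason: "anti_bot_block", evidence: "datadome_captcha_html" };
}

// Empty filter window: ship metadata and provenance are still reported.
const emptyWindow = successResult({
  ship: { name: "Example Ship" },
  filtersApplied: { destination: "alaska" },
  totalMatching: 0,
  reviews: [],
  evidence: { fetched_at: new Date().toISOString() },
});
console.log(JSON.stringify(emptyWindow));
console.log(JSON.stringify(shipNotFound("example-line", "example-ship")));
```

Keeping the zero-result case on the success path means consumers only need to branch on success and reason, never on whether reviews is empty.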