Saizeriya Singapore Menu Data — Use the Website Like an API
Purpose
Pull Saizeriya Singapore's current menu (Grand, Lunch, Kids) and outlet directory as structured data, without scripted browsing. The site is a thin static-HTML shell. Its only machine-readable surfaces are three versioned PDF files plus one static restaurant-search page, so the optimal "API" is a plain GET against predictable URLs whose current filenames are published by the HTML menu index. Read-only.
When to Use
- Daily / weekly snapshots of the Saizeriya SG menu to detect new items, price changes, or seasonal swaps.
- Bulk extraction of the 44-outlet directory (name, address, phone) for store-locator features.
- Anywhere you'd otherwise scrape saizeriya.com.sg HTML: there is no JS-rendered data path, so cheap HTTP GETs beat any browser-driven approach.
Workflow
The site is flat static HTML served by Apache. There is no JSON, GraphQL or XHR endpoint: every probe to /api/*, /sitemap.xml, /robots.txt, /menu.json and /data/menu.json returns 404 (verified 2026-05-19). The "API" is two GETs:
- GET /menu/ returns the HTML index, which surfaces the current three PDF filenames in <a href="/pdf/...pdf"> tags. Filenames are versioned by date (see Site-Specific Gotchas), so they rotate when the menu refreshes; always parse the HTML index first rather than hard-coding filenames.
- GET /pdf/{filename}.pdf returns the canonical machine-readable menu. Three variants are linked from /menu/:
  - Grand Menu: /pdf/GrandMenu{YYYYMM}S_single.pdf (current: GrandMenu202603S_single.pdf, 8.84 MB, last modified 2026-03-23).
  - Lunch Menu: /pdf/lunch_{YYYYMM}.pdf (current: lunch_202511.pdf, ~1.0 MB raw / ~750 KB decoded).
  - Kids Menu: /pdf/kids{YYYYMM}S.pdf (current: kids202603S.pdf, ~480 KB raw / ~360 KB decoded).
A residential proxy is not required: bare HTTPS GETs return 200 OK on /menu/, /search/ and the /pdf/ files. The site has no anti-bot layer, sets no cookies, requires no auth and sends no rate-limit headers. Browser-driven scraping costs roughly 50× more and surfaces no data that the static fetch doesn't.
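The index-then-fetch pattern above can be sketched in Node 18+ (which ships a global fetch). This is a minimal sketch, not a verified client: the regex assumes anchors of the href="/pdf/....pdf" shape described here, menu type is classified from the filename prefix rather than from the neighbouring <h3> label, and parseMenuIndex, discoverMenuPdfs and filenameStamp are hypothetical names.

```javascript
// Minimal sketch for Node 18+ (global fetch). Helper names are hypothetical.
const BASE = "https://www.saizeriya.com.sg";

// Classify by filename prefix; the <h3> label next to each anchor is the
// more authoritative source if the prefixes ever change.
function classifyMenu(filename) {
  const f = filename.toLowerCase();
  if (f.startsWith("grandmenu")) return "grand";
  if (f.startsWith("lunch")) return "lunch";
  if (f.startsWith("kids")) return "kids";
  return "unknown";
}

// Pure function: extract every distinct /pdf/*.pdf link from the /menu/ HTML.
function parseMenuIndex(html) {
  const hrefRe = /<a\b[^>]*\bhref="(\/pdf\/([^"]+?\.pdf))"/gi;
  const seen = new Set();
  const menus = [];
  for (const [, path, filename] of html.matchAll(hrefRe)) {
    if (seen.has(filename)) continue; // the same PDF may be linked more than once
    seen.add(filename);
    // YYYYMM embedded in the filename, e.g. lunch_202511.pdf -> "2025.11".
    // This may differ from the date stamp printed inside the PDF itself.
    const stamp = filename.match(/(\d{4})(\d{2})/);
    menus.push({
      type: classifyMenu(filename),
      filename,
      url: BASE + path,
      filenameStamp: stamp ? `${stamp[1]}.${stamp[2]}` : null,
    });
  }
  return menus;
}

// Network wrapper: a single GET against the index page.
async function discoverMenuPdfs() {
  const res = await fetch(`${BASE}/menu/`);
  if (!res.ok) throw new Error(`GET /menu/ failed with HTTP ${res.status}`);
  return parseMenuIndex(await res.text());
}
```

Calling discoverMenuPdfs() once per run and fetching whatever URLs it returns keeps the pipeline correct when the filenames rotate; keeping the parsing pure makes it easy to test against a saved copy of the page.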
Step-by-step (API path)
- Discover current PDF filenames: GET https://www.saizeriya.com.sg/menu/ and parse <a href="/pdf/(...)\.pdf"> to get the three current filenames. The <h3> next to each anchor labels it GRAND MENU, LUNCH MENU or KIDS MENU. The response is small (~11 KB) text/html and needs no JS rendering.
- Fetch each PDF: GET https://www.saizeriya.com.sg/pdf/{filename}.pdf. The response is application/pdf with Accept-Ranges: bytes and Last-Modified (e.g. Mon, 23 Mar 2026 04:14:02 GMT for the Grand Menu). Send If-Modified-Since: {Last-Modified} for cheap freshness polling; the menu refreshes on a multi-month cadence (lunch in 2025-11, grand and kids in 2026-03).
  The Grand Menu PDF is 8.84 MB. If your fetch transport caps response size (e.g. the Browserbase Fetch API's 1 MB cap, which returns 502 The response body exceeded the maximum allowed size of 1MB), there are two practical retrieval paths:
  - HTTP Range: send Range: bytes=0-1048575, then iterate Range: bytes={n}-{n+1048575} and concatenate the parts. The server returns 206 Partial Content with Content-Range. Verified 2026-05-19 via in-browser fetch() against the live PDF.
  - In-browser fetch(): open any same-origin page (/menu/ works), run await fetch('/pdf/GrandMenu202603S_single.pdf').then(r => r.arrayBuffer()), then exfiltrate the bytes (e.g. base64 over CDP or a data: URL). This bypasses the transport cap because the response is consumed inside the browser process.
- Extract text from each PDF. Recommended: Node + pdf-parse (new PDFParse({ data: buf }).getText()); Python + pypdf or pdfplumber works equally well. Each PDF is a single page with menu items in English and Simplified Chinese, plus prices in the form S$X.XX nett. Sample (Kids Menu), one item per line:
      Chicken Wing 5pcs S$4.90 nett 鸡翅5只
      Double Potato S$3.90 nett 双份薯角
      Corn Cream Soup S$2.90 nett 玉米奶油浓汤
      Kid's Meal S$5.90 nett 儿童套餐
      Italian Pudding S$3.90 nett 意式奶冻
      Oreo Cheese Cake S$3.90 nett 奥利奥芝士蛋糕
      Free Flow Drink for Kids (Age 4-12) S$1.50 nett per pax
  Items repeat the structure {English Name} S${price} nett {Chinese Name}. The header line (a date stamp like 2026.03) and the footer disclaimer (Presentation of food may differ...) are predictable boilerplate; strip them before parsing.
- (Optional) Outlet directory: GET https://www.saizeriya.com.sg/search/ returns ~60 KB of static HTML listing 44 outlets. Each outlet is a <div class="bubbleInfo"> containing:
      <div class="popup_m pop2-{slug}"><h6>{Outlet Name}</h6></div>
      <div class="popup pop1-{slug}"><p>
        <span class="header01">{Outlet Name}</span><br>
        Address:<br>
        {Street address}, Singapore {postal code}</br>
        Tel: {phone}<br>
        Fax: {phone}<br>
      </p></div>
  Outlets are grouped by region anchor (#central, #east, #north, #northEast, #west). A short slug (lcsc, csm, nex, etc.) appears in both the pop1-* and pop2-* class names and is usable as a stable outlet ID.
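The Range path and If-Modified-Since polling from the fetch step can be combined into one helper. This is a sketch under assumptions: Node 18+'s global fetch stands in for whatever capped transport you actually use, the server is assumed to honour both headers together (RFC 9110 evaluates If-Modified-Since before Range, so an unchanged file should yield 304 rather than 206), and fetchPdfInRanges, rangeHeaders and totalFromContentRange are hypothetical names.

```javascript
// Minimal sketch for Node 18+; the global fetch stands in for your capped transport.
// fetchPdfInRanges, rangeHeaders and totalFromContentRange are hypothetical names.
const CHUNK_BYTES = 1048576; // 1 MiB, so the first request is Range: bytes=0-1048575

// Pure helper: Range header values covering bytes 0..totalBytes-1 (ends are inclusive).
function rangeHeaders(totalBytes, chunk = CHUNK_BYTES) {
  const ranges = [];
  for (let start = 0; start < totalBytes; start += chunk) {
    ranges.push(`bytes=${start}-${Math.min(start + chunk, totalBytes) - 1}`);
  }
  return ranges;
}

// Pure helper: total size from a Content-Range value such as "bytes 0-1048575/8843067".
function totalFromContentRange(value) {
  const m = /\/(\d+)\s*$/.exec(value || "");
  return m ? Number(m[1]) : null;
}

// Resolves to null on 304 Not Modified, otherwise to { bytes, lastModified }.
// Persist lastModified and pass it back on the next poll.
async function fetchPdfInRanges(url, lastModified) {
  const conditional = lastModified ? { "If-Modified-Since": lastModified } : {};
  const first = await fetch(url, {
    headers: { ...conditional, Range: `bytes=0-${CHUNK_BYTES - 1}` },
  });
  if (first.status === 304) return null; // unchanged since lastModified
  const modified = first.headers.get("last-modified");
  if (first.status === 200) {
    // Range was ignored and the whole file came back in one response.
    return { bytes: Buffer.from(await first.arrayBuffer()), lastModified: modified };
  }
  if (first.status !== 206) throw new Error(`HTTP ${first.status} for ${url}`);
  const total = totalFromContentRange(first.headers.get("content-range"));
  if (total === null) throw new Error("206 response without a parsable Content-Range");
  const parts = [Buffer.from(await first.arrayBuffer())];
  for (const range of rangeHeaders(total).slice(1)) {
    const res = await fetch(url, { headers: { Range: range } });
    if (res.status !== 206) throw new Error(`HTTP ${res.status} for ${range}`);
    parts.push(Buffer.from(await res.arrayBuffer()));
  }
  const bytes = Buffer.concat(parts);
  if (bytes.length !== total) throw new Error(`expected ${total} bytes, got ${bytes.length}`);
  return { bytes, lastModified: modified };
}
```

For the current 8,843,067-byte Grand Menu this issues nine 1 MiB-or-smaller requests. Passing the stored Last-Modified value (e.g. "Mon, 23 Mar 2026 04:14:02 GMT") turns each poll into a single small request while the file is unchanged.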
Browser fallback
Only useful when (a) your transport blocks PDF downloads entirely or (b) you specifically need the rendered visual layout. Open https://www.saizeriya.com.sg/menu/ in any Chromium session, snapshot the three PDF anchor refs and click into each one; the browser's built-in PDF viewer renders the menu inline. This offers no benefit over GET + pdf-parse for data extraction: Chromium's built-in viewer (PDFium, hosted in an <embed>) yields no a11y refs in the snapshot tree, so you cannot extract menu text from the browser DOM. Use the browser only to download the bytes (via in-page fetch()) when your HTTP transport has a body-size cap.
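The in-page download can be sketched as plain browser JavaScript. It assumes the source is evaluated inside a page already open on https://www.saizeriya.com.sg/ (for example via the CDP Runtime.evaluate command with awaitPromise and returnByValue set), and bytesToBase64 and fetchPdfAsBase64 are hypothetical names; base64 is just one way to carry the bytes back to the controller.

```javascript
// Runs inside the page context; relies only on browser globals (fetch, btoa).
// Hypothetical helpers for moving PDF bytes out of the browser as base64.

// Encode bytes as base64 in slices, because String.fromCharCode.apply on a
// multi-megabyte array can exceed the engine's argument-count limit.
function bytesToBase64(bytes) {
  const SLICE = 0x8000;
  let binary = "";
  for (let i = 0; i < bytes.length; i += SLICE) {
    binary += String.fromCharCode.apply(null, bytes.subarray(i, i + SLICE));
  }
  return btoa(binary);
}

// Same-origin request: the path is relative to the open saizeriya.com.sg page.
async function fetchPdfAsBase64(path) {
  const res = await fetch(path);
  if (!res.ok) throw new Error(`${path}: HTTP ${res.status}`);
  return bytesToBase64(new Uint8Array(await res.arrayBuffer()));
}
```

Evaluate fetchPdfAsBase64('/pdf/GrandMenu202603S_single.pdf') in the page, then decode the returned string on the Node side with Buffer.from(b64, "base64") before handing it to pdf-parse.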
Site-Specific Gotchas
- Filenames are date-stamped and rotate. The pattern is roughly {type}{YYYYMM}[S][_suffix].pdf, with variations (Lunch puts the underscore before the date instead). Observed at time of writing: Grand GrandMenu202603S_single.pdf (2026-03), Lunch lunch_202511.pdf (2025-11), Kids kids202603S.pdf (2026-03). The S suffix appears on Grand and Kids but not Lunch; it may be a Singapore-region tag, but do not treat it as a guaranteed pattern. Always parse the /menu/ HTML for the current filenames and never hard-code them: hard-coded URLs will silently 404 the next time the marketing team refreshes the menu.
- Grand Menu exceeds 1 MB (currently 8.84 MB). Transports with a small response cap (notably the Browserbase Fetch API, capped at 1 MB and returning 502 The response body exceeded the maximum allowed size of 1MB) cannot pull it in one shot. Apache serves Accept-Ranges: bytes, so use Range requests, or pull the bytes inside a browser session via in-page fetch(). The Lunch and Kids PDFs fit under 1 MB and can use any transport.
- No sitemap, no robots.txt, no JSON endpoints. Confirmed 404 on /robots.txt, /sitemap.xml, /sitemap_index.xml, /api/menu, /menu.json and /data/menu.json (2026-05-19). Do not waste turns probing for a JSON API; it does not exist.
- No anti-bot, no auth. A plain HTTPS GET returns 200 OK on /menu/, /search/ and the /pdf/ files. The --proxies / --verified flags on a Browserbase session are unnecessary cost for this site; bare fetch is fine.
- The html element declares lang="ja" despite serving English. The site was forked from Saizeriya Japan and the lang attribute was never updated. Don't rely on lang to detect locale; trust the .sg domain instead.
- Multiple GA/GTM tags. Pages embed three Google Analytics IDs (UA-65535147-1, UA-134913146-1, UA-140695686-1) but no data of interest. They do not gate content and can be ignored.
- Currency / GST. Every menu page footer states "All prices are nett (inclusive of GST, No service charge)" and the homepage repeats GST Inclusive & No service charge. Treat all extracted prices as final consumer-paid SGD; no separate tax math is needed.
- Operating hours are global, not per-outlet. The footer states 11:00 am – 10:00 pm (Last Order 09:30 pm). Individual outlets may close earlier (especially over CNY); confirm with the outlet via the Tel: number in its /search/ block before relying on these hours.
- PDFs are flat single pages. pdf-parse reports pages=1 for each of the three. Don't paginate; iterate items via regex on the extracted text (S\$\d+\.\d{2}\s*nett is a reliable price anchor).
- Chinese translations are co-located. Items in the PDF text stream alternate English line → price line → Chinese line. Where the layout uses centered or spaced glyphs (e.g. KID'S ME NU), tabs and stray whitespace appear mid-word; normalize with s/\s+/ /g before keyword matching.
- Apache sends Content-Security-Policy: upgrade-insecure-requests as its only security header; there is no HSTS and no CORS configuration. An in-page fetch() from https://www.saizeriya.com.sg/ to its own /pdf/ paths is same-origin, so CORS never comes into play; this is what the in-browser large-PDF retrieval trick relies on.
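The item structure from the text-extraction step, together with the price anchor and whitespace rules above, can be turned into a small parser. A minimal sketch, assuming the text has already been extracted by pdf-parse or pypdf: any whitespace-separated token containing a CJK character is treated as part of the previous item's Chinese name, and the footer is assumed to start with "Presentation of food may differ" with nothing but boilerplate after it. parseMenuText, stripBoilerplate and splitLeadingCjk are hypothetical names, and richer notes (such as the lunch hours in the Expected Output) are not recovered by this heuristic.

```javascript
// Hypothetical helpers for turning extracted PDF text into item records.
const PRICE_RE = /S\$(\d+\.\d{2})\s*nett/g; // the price anchor from the gotchas
const CJK_RE = /[\u3400-\u9fff]/;            // token contains a Chinese character

// Drop the leading date stamp (e.g. "2026.03") and the footer disclaimer.
function stripBoilerplate(text) {
  return text
    .replace(/^\s*\d{4}\.\d{2}\s*/, "")
    .replace(/Presentation of food may differ[\s\S]*$/, "");
}

// Split "中文 English words" into its leading CJK tokens and the remainder.
function splitLeadingCjk(segment) {
  const tokens = segment ? segment.split(" ") : [];
  let i = 0;
  while (i < tokens.length && CJK_RE.test(tokens[i])) i++;
  return { zh: tokens.slice(0, i).join(" ") || null, rest: tokens.slice(i).join(" ") };
}

function parseMenuText(text, menu) {
  // Collapse tabs, newlines and stray spacing first, then strip boilerplate.
  const flat = stripBoilerplate(text.replace(/\s+/g, " ")).trim();
  const items = [];
  let cursor = 0;
  for (const m of flat.matchAll(PRICE_RE)) {
    // Text between the previous price and this one: "{prev Chinese} {this English}".
    const { zh, rest } = splitLeadingCjk(flat.slice(cursor, m.index).trim());
    if (zh && items.length) items[items.length - 1].name_zh = zh;
    items.push({ menu, name_en: rest, name_zh: null, price_sgd: Number(m[1]), price_nett: true, notes: null });
    cursor = m.index + m[0].length;
  }
  // Whatever follows the last price: its Chinese name, then any trailing note.
  const tail = splitLeadingCjk(flat.slice(cursor).trim());
  const last = items[items.length - 1];
  if (last) {
    if (tail.zh) last.name_zh = tail.zh;
    if (tail.rest) last.notes = tail.rest;
  }
  return items;
}
```

On the Kids Menu sample this yields seven items, e.g. Chicken Wing 5pcs / 鸡翅5只 / 4.9, with "per pax" left as the note on the free-flow drink. Splitting on the price anchor rather than on line breaks keeps the parser indifferent to how the PDF extractor wraps lines.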
Expected Output
{
"fetched_at": "2026-05-19T00:15:08Z",
"menu_index": {
"source_url": "https://www.saizeriya.com.sg/menu/",
"grand_menu_url": "https://www.saizeriya.com.sg/pdf/GrandMenu202603S_single.pdf",
"lunch_menu_url": "https://www.saizeriya.com.sg/pdf/lunch_202511.pdf",
"kids_menu_url": "https://www.saizeriya.com.sg/pdf/kids202603S.pdf",
"grand_menu_version": "2026.03",
"lunch_menu_version": "2025.12",
"kids_menu_version": "2026.03",
"grand_menu_last_modified": "2026-03-23T04:14:02Z",
"grand_menu_size_bytes": 8843067
},
"items": [
{
"menu": "kids",
"name_en": "Chicken Wing 5pcs",
"name_zh": "鸡翅5只",
"price_sgd": 4.90,
"price_nett": true,
"notes": null
},
{
"menu": "kids",
"name_en": "Free Flow Drink for Kids",
"name_zh": null,
"price_sgd": 1.50,
"price_nett": true,
"notes": "Age 4-12 years old only; per pax"
},
{
"menu": "lunch",
"name_en": "Teriyaki Chicken Lunch",
"name_zh": "照烧酱鸡排套餐",
"price_sgd": 9.00,
"price_nett": true,
"notes": "Mon-Fri 11:00am-5:00pm, excl. PH; includes free-flow hot & cold beverage"
}
],
"outlets": [
{
"id": "lcsc",
"name": "Liang Court SC",
"region": "central",
"address": "177 River Valley Road, #02-22 Liang Court Shopping Centre, Singapore 179030",
"tel": "6970 2588",
"fax": "6970 2589"
},
{
"id": "csm",
"name": "City Square Mall",
"region": "central",
"address": "180 Kitchener Road, #B2-55/56 City Square Mall, Singapore 208539",
"tel": null,
"fax": null
}
],
"outlet_count": 44,
"hours_global": "11:00 am – 10:00 pm (Last Order 09:30 pm)",
"currency": "SGD",
"tax_note": "All prices nett (inclusive of GST, no service charge)"
}
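The outlets entries above can be produced from the /search/ markup described in the outlet-directory step. A minimal regex sketch, assuming each popup pop1-{slug} div looks exactly as shown there (class attribute in that order, no nested divs) and, hypothetically, that each region anchor appears as id="central" or name="central" (etc.) before the outlets it groups. parseOutlets is a hypothetical name and HTML entities are not decoded; a real HTML parser would be more robust.

```javascript
// Hypothetical parser for the static /search/ page. Regex-based sketch.
const REGION_RE = /\b(?:id|name)="(central|east|north|northEast|west)"/g;
const POPUP_RE = /<div class="popup pop1-([\w-]+)"[^>]*>([\s\S]*?)<\/div>/g;

// Turn the popup's inner HTML into trimmed, non-empty text lines.
function popupLines(inner) {
  return inner
    .replace(/<\/?br\s*\/?>/gi, "\n") // <br>, </br> and <br/> all act as line breaks
    .replace(/<[^>]+>/g, "")          // drop remaining tags such as <p> and <span>
    .split("\n")
    .map((line) => line.replace(/\s+/g, " ").trim())
    .filter(Boolean);
}

// Value after "Tel:" / "Fax:", or null when the label is missing or empty.
function labelled(lines, label) {
  const line = lines.find((l) => l.startsWith(`${label}:`));
  return (line && line.slice(label.length + 1).trim()) || null;
}

function parseOutlets(html) {
  // Assumed: each region anchor precedes the outlets that belong to it.
  const anchors = [...html.matchAll(REGION_RE)].map((m) => ({ region: m[1], at: m.index }));
  const outlets = [];
  for (const m of html.matchAll(POPUP_RE)) {
    const lines = popupLines(m[2]);
    const addrAt = lines.findIndex((l) => /^Address:?$/.test(l));
    const afterAddr = addrAt >= 0 ? lines.slice(addrAt + 1) : [];
    const stop = afterAddr.findIndex((l) => /^(Tel|Fax):/.test(l));
    const region = anchors.filter((a) => a.at < m.index).pop();
    outlets.push({
      id: m[1], // slug shared by the pop1-* and pop2-* class names
      name: lines[0] || null,
      region: region ? region.region : null,
      address: (stop >= 0 ? afterAddr.slice(0, stop) : afterAddr).join(" ") || null,
      tel: labelled(lines, "Tel"),
      fax: labelled(lines, "Fax"),
    });
  }
  return outlets;
}
```

An outlet whose Tel: or Fax: line is present but empty comes out as null, matching how the City Square Mall entry is represented above.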