RecipeBee Recipe Discovery & Extraction
Purpose
Discover and extract structured recipe data from RecipeBee — the public catalog at recipebee.app. Given either a natural-language query (e.g. "chicken stir-fry", "vegan breakfast", "30-minute dinner") or a direct recipe URL, return the full schema.org/Recipe payload: name, description, hero image, author, ingredients with quantities, numbered cooking steps, prep/cook/total times, yield, recipe category, cuisine, keywords, and nutrition metadata. Also supports topic-based browsing via category and tag indexes, and bulk discovery via sitemap.xml. Read-only.
Out of scope (login-gated): RecipeBee's AI recipe generation, meal planning, shopping lists, and personal cookbooks live under /auth/, /meal-plans/, /shopping-lists/, and /dashboard/ — all Disallow'd in robots.txt and require an authenticated session. The iOS app drives those features; the public web surface is discovery + extraction only. Do not attempt to scrape or trigger those — they will redirect to /login.
When to Use
- Importing a single recipe from a known recipebee.app/recipes/{slug} URL into a downstream meal-planner, grocery-list builder, or recipe-card store.
- Topic-driven discovery: "give me three high-protein chicken recipes under 30 minutes", "find me Indian comfort food", "vegan breakfast ideas". Resolve the topic to a /tags/{slug} or /categories/{slug} index page, then extract each recipe.
- Bulk catalog mirroring (e.g. building a search index over RecipeBee's full corpus). Use sitemap.xml as the authoritative listing.
- Powering an LLM-side meal-plan or shopping-list synthesizer with verified structured recipes as input. The AI synthesis itself happens in the caller's context; this skill only fetches and structures the source recipes.
Workflow
RecipeBee is a Next.js App Router site (RSC). Every /recipes/{slug} page server-side-renders a complete schema.org/Recipe JSON-LD block, plus HowTo and FAQPage blocks — lead with HTTP fetch + JSON-LD parsing for extraction. Browser sessions are only needed to hydrate the /browse index (which renders client-side). No anti-bot, no auth required for public pages, no proxies needed. The site explicitly allows GPTBot, ChatGPT-User, Claude-Web, and PerplexityBot in robots.txt for the discovery surfaces below.
1. Resolve the query to one or more recipe URLs
Pick the discovery surface based on the input shape:
| Input | Surface | Method |
|---|---|---|
| Direct URL recipebee.app/recipes/{slug} | n/a (skip to step 2) | n/a |
| Topic / dietary preference matching a known tag | /tags/{slug} | HTTP fetch (partial SSR — see gotcha) |
| Topic matching a known category | /categories/{slug} | HTTP fetch (partial SSR) |
| Broad query / "anything" / "popular recipes" | /browse | Browser required (fully client-rendered) |
| Bulk mirror — all recipes | /sitemap.xml | HTTP fetch — complete listing |
| Natural-language free-text search | ⚠️ broken — see gotcha | Use sitemap + client-side fuzzy match instead |
Canonical tag/category enums (from sitemap.xml 2026-05-19):
- Categories: breakfast, dinner, dessert, salads, side-dishes, drinks, coffee, 30-minute-meals, one-pot-meals, meal-prep, quick-and-easy, budget-friendly, comfort-food, clean-eating, kids-friendly, baking, vegetarian, vegan, gluten-free, low-carb, high-protein, seed-oil-free, asian-cuisine, italian-cuisine, mediterranean, russian-cuisine, indian-cuisine, middle-eastern, chicken, beef, weird.
- Tags: comfort-food, indian, avocado, basil, beef, bell-peppers, broccoli, chicken, creamy, cucumber, customizable, egg, fish, fruity, no-bake, potato, refreshing, salmon, sauce, spiced, stir-fry, sweet, tomato, warming, weird, breakfast, lunch, dinner, snack, dessert, quick.
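For a natural-language query, map it to the closest tags or categories from these enums (this is the LLM-side intent step). If multiple terms apply, fetch each surface, record which surfaces each recipe slug appeared on, and dedupe by slug. Slugs listed on every surface (the intersection) are the strongest matches; keep partial matches as lower-ranked candidates, because index pages server-render only the first ~6 recipes and a strict intersection can come back empty. Example: "high-protein chicken stir-fry" → fetch /categories/high-protein, /categories/chicken, /tags/stir-fry, then rank the slugs by how many of those pages list them.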
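The merge step can be sketched as a small Node helper. This is a minimal sketch, not part of the skill itself: `mergeCandidates` is a hypothetical name, and the sample input mirrors the slugs from the discovery-mode example in Expected Output. It assumes each surface has already been fetched and reduced to a list of recipe slugs.

```javascript
// Hypothetical helper: merge per-surface slug lists into ranked candidates.
// `surfaceResults` maps a tag/category slug to the recipe slugs found on its index page.
function mergeCandidates(surfaceResults) {
  const bySlug = new Map();
  for (const [surface, slugs] of Object.entries(surfaceResults)) {
    // De-duplicate within a surface so a repeated anchor is only counted once.
    for (const slug of new Set(slugs)) {
      if (!bySlug.has(slug)) bySlug.set(slug, []);
      bySlug.get(slug).push(surface);
    }
  }
  return [...bySlug.entries()]
    .map(([slug, matchedOn]) => ({
      slug,
      url: `https://recipebee.app/recipes/${slug}`,
      matchedOn,
    }))
    // Most surfaces matched first; ties broken alphabetically for stable output.
    .sort((a, b) => b.matchedOn.length - a.matchedOn.length || a.slug.localeCompare(b.slug));
}

const candidates = mergeCandidates({
  chicken: ['diabetic-friendly-chicken-and-bell-pepper-stir-fry'],
  'stir-fry': [
    'spicy-seed-oil-free-beef-and-broccoli-stir-fry',
    'diabetic-friendly-chicken-and-bell-pepper-stir-fry',
  ],
});
console.log(JSON.stringify(candidates, null, 2));
```

A slug that appears on every requested surface sorts to the top, so a caller that wants a strict intersection can filter on `matchedOn.length` afterwards.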
Tag/category extraction (HTTP path):
curl -s "https://recipebee.app/tags/chicken" \
| grep -oE 'href="/recipes/[a-z0-9-]+"' \
| sed 's/href="//;s/"$//' \
| sort -u
# 6 SSR'd anchors for /tags/chicken as of 2026-05-19
Or via the browse cloud fetch envelope (same payload, easier to parse with node):
browse cloud fetch "https://recipebee.app/tags/chicken" \
| node -e "let s='';process.stdin.on('data',c=>s+=c).on('end',()=>{
const j=JSON.parse(s);
const links=[...new Set([...j.content.matchAll(/href=\"(\\/recipes\\/[a-z0-9-]+)\"/g)].map(m=>m[1]))];
console.log(JSON.stringify(links));
})"
/browse extraction (browser path — only when no tag/category fits):
sid=$(browse cloud sessions create --keep-alive | node -e "let s='';process.stdin.on('data',c=>s+=c).on('end',()=>process.stdout.write(JSON.parse(s).id))")
export BROWSE_SESSION="$sid"
browse open --remote "https://recipebee.app/browse"
sleep 3 # wait for hydration — /browse renders 0 anchors in initial HTML, ~20 after hydration
browse get html body --remote \
| node -e "let s='';process.stdin.on('data',c=>s+=c).on('end',()=>{
const j=JSON.parse(s);
const links=[...new Set([...j.html.matchAll(/href=\"(\\/recipes\\/[a-z0-9-]+)\"/g)].map(m=>m[1]))];
console.log(JSON.stringify(links));
})"
browse cloud sessions update "$sid" --status REQUEST_RELEASE
sitemap.xml extraction (bulk discovery — fastest, returns the full corpus):
browse cloud fetch "https://recipebee.app/sitemap.xml" \
| node -e "let s='';process.stdin.on('data',c=>s+=c).on('end',()=>{
const j=JSON.parse(s);
const slugs=[...j.content.matchAll(/<loc>https:\\/\\/recipebee\\.app\\/recipes\\/([a-z0-9-]+)<\\/loc>/g)].map(m=>m[1]);
console.log(slugs.length, 'recipes');
console.log(slugs);
})"
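Because /search is unavailable, free-text queries have to be matched against the sitemap slugs on the client side. The sketch below uses plain token overlap as a simple stand-in for fuzzy matching. `tokenize` and `fuzzyMatchSlugs` are hypothetical names, and the scoring rule (fraction of query tokens present in the slug) is an assumption, not something RecipeBee defines.

```javascript
// Split text into lowercase alphanumeric tokens ("stir-fry" -> ["stir", "fry"]).
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Hypothetical helper: rank sitemap slugs by the share of query tokens they contain.
function fuzzyMatchSlugs(query, slugs, limit = 10) {
  const queryTokens = new Set(tokenize(query));
  return slugs
    .map((slug) => {
      const slugTokens = new Set(tokenize(slug));
      let hits = 0;
      for (const token of queryTokens) if (slugTokens.has(token)) hits += 1;
      return { slug, score: queryTokens.size ? hits / queryTokens.size : 0 };
    })
    .filter((entry) => entry.score > 0) // drop slugs with no shared tokens
    .sort((a, b) => b.score - a.score || a.slug.localeCompare(b.slug))
    .slice(0, limit);
}

const sampleSlugs = [
  'fried-rice',
  'butter-chicken-stuffed-buns-soft-fluffy',
  'diabetic-friendly-chicken-and-bell-pepper-stir-fry',
];
const matches = fuzzyMatchSlugs('chicken stir-fry', sampleSlugs);
console.log(matches);
```

Exact token matching will miss plurals and typos ("fried" does not match "fry"); a caller that needs more recall could swap in stemming or an edit-distance score without changing the surrounding flow.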
2. Extract the recipe via JSON-LD
browse cloud fetch "https://recipebee.app/recipes/{slug}" \
| node -e "let s='';process.stdin.on('data',c=>s+=c).on('end',()=>{
const j=JSON.parse(s);
const blocks=[...j.content.matchAll(/<script[^>]*type=\"application\\/ld\\+json\"[^>]*>([\\s\\S]*?)<\\/script>/g)];
for (const b of blocks) {
try {
const o=JSON.parse(b[1]);
if (o['@type']==='Recipe') { console.log(JSON.stringify(o, null, 2)); return; }
} catch(e){}
}
console.error('no Recipe JSON-LD found');
process.exit(1);
})"
The page emits ~9 JSON-LD blocks (Organization × 2, WebSite × 2, BreadcrumbList × 2, Recipe, FAQPage, HowTo). The Recipe block is canonical; ignore the duplicate HowTo block (it carries the same instructions in a different schema for Google rich-snippet compatibility).
3. Normalize the output
Convert ISO-8601 durations (PT45M, PT1H20M) to integer minutes; split the comma-separated keywords string into an array; coerce recipeYield to { value, unit } (e.g. "12 servings" → { value: 12, unit: "servings" }). See the Expected Output section below for the canonical shape.
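The three conversions above can be sketched as follows. The function names are hypothetical, and the duration parser deliberately handles only the hour/minute forms shown in this document (PT45M, PT1H20M); durations with days or seconds return null here rather than a guessed value.

```javascript
// Convert an ISO-8601 duration such as "PT1H20M" to whole minutes.
// Returns null when the value is missing or not in the PT#H#M form.
function durationToMinutes(iso) {
  const m = /^PT(?:(\d+)H)?(?:(\d+)M)?$/.exec(iso || '');
  if (!m || (m[1] === undefined && m[2] === undefined)) return null;
  return Number(m[1] || 0) * 60 + Number(m[2] || 0);
}

// Split the comma-separated keywords string into a trimmed array.
function splitKeywords(keywords) {
  if (Array.isArray(keywords)) return keywords.map((k) => String(k).trim()).filter(Boolean);
  return String(keywords || '').split(',').map((k) => k.trim()).filter(Boolean);
}

// Coerce "12 servings" to { value: 12, unit: "servings" }; unit is null when absent.
function parseYield(recipeYield) {
  const raw = Array.isArray(recipeYield) ? recipeYield[0] : recipeYield;
  const m = /^\s*(\d+(?:\.\d+)?)\s*(.*)$/.exec(String(raw ?? ''));
  if (!m) return null;
  return { value: Number(m[1]), unit: m[2].trim() || null };
}

console.log(durationToMinutes('PT1H5M'));
console.log(splitKeywords('chicken, Indian, Snack'));
console.log(parseYield('12 servings'));
```

Returning null instead of throwing keeps the normalizer usable on the sparse user-submitted recipes, where fields such as cookTime are often missing.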
4. (Optional) Enrich with FAQ + HowTo blocks
The same page also exposes FAQPage (auto-generated Q&A about prep time, servings, and ingredients) and HowTo (re-rendering of recipeInstructions with an estimatedCost field and a supply[] summary). Extract these if your downstream wants user-facing FAQ snippets or a budget hint.
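A minimal sketch of the enrichment step, assuming the FAQPage block follows the standard schema.org layout (`mainEntity` holding `Question` items with `acceptedAnswer.text`). That layout is an assumption about RecipeBee's markup, the function names are hypothetical, and the sample HTML is made up for illustration. `estimatedCost` and `supply` are passed through untouched because their exact shape is not documented here.

```javascript
// Parse every JSON-LD <script> block in an HTML string, skipping any that fail to parse.
function parseJsonLdBlocks(html) {
  const pattern = /<script[^>]*type="application\/ld\+json"[^>]*>([\s\S]*?)<\/script>/g;
  const blocks = [];
  for (const match of html.matchAll(pattern)) {
    try {
      blocks.push(JSON.parse(match[1]));
    } catch (err) {
      // Malformed block: ignore it and keep going.
    }
  }
  return blocks;
}

// Pull FAQ pairs and the HowTo budget fields out of the parsed blocks.
function extractEnrichment(blocks) {
  const faqPage = blocks.find((b) => b && b['@type'] === 'FAQPage');
  const howTo = blocks.find((b) => b && b['@type'] === 'HowTo');
  // mainEntity may be a single object or an array; normalize to an array.
  const faq = [].concat(faqPage?.mainEntity ?? []).map((q) => ({
    question: q.name,
    answer: q.acceptedAnswer?.text ?? null,
  }));
  return {
    faq,
    estimatedCost: howTo?.estimatedCost ?? null,
    supply: howTo?.supply ?? [],
  };
}

// Made-up sample page fragment: one FAQPage block and one malformed block.
const sampleHtml = [
  '<script type="application/ld+json">{"@type":"FAQPage","mainEntity":[{"@type":"Question","name":"How many servings?","acceptedAnswer":{"@type":"Answer","text":"12"}}]}</script>',
  '<script type="application/ld+json">{not valid json}</script>',
].join('\n');
const enrichment = extractEnrichment(parseJsonLdBlocks(sampleHtml));
console.log(JSON.stringify(enrichment));
```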
Site-Specific Gotchas
- The in-page search backend is currently broken. /search?q=<query> and the on-page search form both return "Failed to load search results. Please try again later." (verified 2026-05-19 with q=chicken: no recipes returned, even though /tags/chicken lists six chicken recipes and the sitemap lists more). The page loads, populates the input from ?q=, then fails on the XHR. Do not depend on /search for discovery; fall back to sitemap + tag/category filtering. The breakage is server-side, not anti-bot, so a residential proxy will not fix it.
- /browse and /search are fully client-rendered. The initial HTML for these two routes contains zero /recipes/{slug} anchors. They only populate after React hydration runs. HTTP-fetch discovery from these surfaces will return an empty list. Use a browser session (browse open + 2–3s wait), or skip them in favor of /tags/{slug}, /categories/{slug}, or sitemap.xml, which are server-rendered.
- Tag/category pages are partial-SSR. /tags/{slug} and /categories/{slug} server-render the first ~6 recipes above the fold but load the rest after hydration. For complete topic coverage, either (a) open in a browser and scroll, or (b) cross-reference against sitemap.xml (which lists all published recipes regardless of tag).
- JSON-LD has duplicate Organization / WebSite / BreadcrumbList blocks. Don't be alarmed by blocks.length === 9 on a single recipe page: only one block matches @type: 'Recipe'. Filter on @type instead of array position.
- recipeInstructions shape is HowToStep[], not strings. Each step is an object { '@type': 'HowToStep', position: N, text: '...', name: 'Step N' }. Map to step.text for human-readable instructions. The legacy "string array" form of recipeInstructions (used by some other recipe sites) does not appear on RecipeBee.
- recipeIngredient lines are pre-formatted free text, not parsed. Each entry looks like "3 cup all-purpose flour" or "2 1/4 teaspoon active dry yeast". There's no separate quantity / unit / name decomposition. If the downstream needs a shopping-list aggregation, run an LLM or a recipe-parser library (e.g. ingreedy, recipe-scrapers) on these strings.
- Many recipes have sparse metadata. User-submitted recipes (e.g. /recipes/fried-rice) often omit cookTime, recipeCategory, and recipeCuisine, and have a one-word keywords value. Editorial recipes (e.g. /recipes/butter-chicken-stuffed-buns-soft-fluffy) carry the full set. Always defensive-parse: treat every field except name, recipeIngredient, and recipeInstructions as optional. nutrition is always present but minimal: most recipes only carry servingSize, not calorie/macro counts.
- keywords is a comma-separated string, not an array. Split on "," and trim. A typical value: "chicken, Indian, Snack, Comfort Food, spiced, Lunch, Dinner". These overlap with both tags and categories but are not a strict subset; use them as a third hint signal.
- Time fields are ISO-8601 durations. prepTime: "PT45M", cookTime: "PT20M", totalTime: "PT1H5M". Parse with a small regex (/PT(?:(\d+)H)?(?:(\d+)M)?/); Duration.fromISO from luxon also works if the caller has it.
- Image URLs come from the images.recipebee.app CDN. Some are user-uploaded (/users/{uuid}/recipes/{uuid}/...), some are AI-generated (/recipes/{uuid}/ai-generated/...). Both are publicly hot-linkable. The image field can be a single string or a single-element array; normalize to imageUrl = Array.isArray(image) ? image[0] : image.
- No /api/ is reachable. robots.txt Disallows it for all bots, and the endpoint returns nothing useful from an unauthenticated session. Don't waste time probing for an undocumented JSON API; the JSON-LD path IS the API.
- AI meal-plan / shopping-list / recipe-generation features require an account. Reachable only via the iOS app or after /login (which the agent has no credentials for). Do not attempt to drive /dashboard, /meal-plans, /shopping-lists, /settings, /verified, or /auth/*; they will 302 to /login. The skill's job is to surface source recipes; downstream AI synthesis (meal plans, shopping lists, recommendations tailored to dietary preferences) is the caller's responsibility, working from the extracted recipes.
- The iOS app's "import from website / social media" flow is not exposed on the web. RecipeBee's marketing copy mentions importing recipes from external sites and TikTok-style social videos; that capability lives in the iOS client and the private backend. There is no public /import endpoint. If the caller needs to import a recipe from a third-party site, they should use the agent's general schema.org/Recipe JSON-LD extraction skill directly on the source URL (most major recipe sites publish the same schema for Google rich snippets).
- No anti-bot, no rate-limit observed (Next.js + nginx, ~50ms p50 for cloud fetch). A bare cloud session (no --verified, no --proxies) handles every public surface tested. Keep request volume sane (≤ 2 req/s) as a courtesy.
- Build-id-tagged Next.js data endpoints (/_next/data/{buildId}/...json) are not exposed. The app uses RSC, not getStaticProps, so there's no JSON sidecar to short-circuit to. The JSON-LD inlined in the HTML is the cheapest structured source.
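Several of the gotchas above (HowToStep objects, string-or-array image, sparse optional fields) come down to defensive field mapping. A minimal sketch follows; the function names are hypothetical, the sample object is made up rather than taken from a real RecipeBee page, and the string-step and ImageObject branches are extra defensive cases that the notes above say should not occur on RecipeBee.

```javascript
// Map recipeInstructions (HowToStep[]) to { position, text } pairs.
function toSteps(recipeInstructions) {
  return [].concat(recipeInstructions ?? []).map((step, index) => {
    // Defensive: tolerate the legacy plain-string form used by other sites.
    if (typeof step === 'string') return { position: index + 1, text: step.trim() };
    return { position: step.position ?? index + 1, text: String(step.text ?? '').trim() };
  });
}

// Normalize image (string or single-element array) to one URL, or null.
function toImageUrl(image) {
  const first = Array.isArray(image) ? image[0] : image;
  // Defensive assumption: an ImageObject would carry its URL in `url`.
  if (first && typeof first === 'object') return first.url ?? null;
  return first ?? null;
}

// Require only name, recipeIngredient and recipeInstructions; everything else is optional.
function toRecipeSummary(ld) {
  for (const field of ['name', 'recipeIngredient', 'recipeInstructions']) {
    if (ld[field] == null) throw new Error(`missing required field: ${field}`);
  }
  return {
    name: ld.name,
    imageUrl: toImageUrl(ld.image),
    recipeCategory: ld.recipeCategory ?? null,
    recipeCuisine: ld.recipeCuisine ?? null,
    ingredients: ld.recipeIngredient,
    steps: toSteps(ld.recipeInstructions),
    nutrition: ld.nutrition ?? null,
  };
}

// Made-up sparse recipe, similar in shape to a user-submitted entry.
const summary = toRecipeSummary({
  '@type': 'Recipe',
  name: 'Example Recipe',
  image: ['https://images.recipebee.app/example.jpeg'],
  recipeIngredient: ['1 cup example ingredient'],
  recipeInstructions: [{ '@type': 'HowToStep', position: 1, text: ' Mix. ', name: 'Step 1' }],
});
console.log(JSON.stringify(summary, null, 2));
```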
Expected Output
{
"url": "https://recipebee.app/recipes/butter-chicken-stuffed-buns-soft-fluffy",
"slug": "butter-chicken-stuffed-buns-soft-fluffy",
"name": "Butter Chicken Stuffed Buns (Soft & Fluffy)",
"description": "Soft, fluffy buns stuffed with creamy butter chicken filling.",
"imageUrl": "https://images.recipebee.app/users/61e02866-.../gallery/11FEEF66-...jpeg",
"author": { "name": "RecipeBee", "url": "https://recipebee.app" },
"datePublished": "2026-05-10T01:36:33.000Z",
"recipeCategory": "Baking",
"recipeCuisine": "Indian",
"keywords": ["chicken", "Indian", "Snack", "Comfort Food", "spiced", "Lunch", "Dinner"],
"yield": { "value": 12, "unit": "servings" },
"times": {
"prepMinutes": 45,
"cookMinutes": 20,
"totalMinutes": 65
},
"ingredients": [
"3 cup all-purpose flour",
"2 1/4 teaspoon active dry yeast",
"2 tablespoon granulated sugar",
"1 teaspoon salt",
"1 cup warm milk (110°F/45°C)"
],
"steps": [
{ "position": 1, "text": "In a small bowl, combine warm milk, sugar, and yeast. Stir gently and let rest for 5-10 minutes until foamy." },
{ "position": 2, "text": "..." }
],
"nutrition": { "servingSize": "1 serving (makes 12)" },
"faq": [
{ "question": "How long does it take to make ...?", "answer": "..." }
],
"source": {
"site": "recipebee.app",
"extractedFrom": "jsonld",
"method": "http-fetch"
}
}
Discovery-mode output (when the input is a query, not a URL — return a list before extracting):
{
"query": "high-protein chicken stir-fry",
"resolved": {
"categories": ["high-protein", "chicken"],
"tags": ["chicken", "stir-fry"]
},
"candidates": [
{ "slug": "spicy-seed-oil-free-beef-and-broccoli-stir-fry", "url": "https://recipebee.app/recipes/spicy-seed-oil-free-beef-and-broccoli-stir-fry", "matchedOn": ["stir-fry"] },
{ "slug": "diabetic-friendly-chicken-and-bell-pepper-stir-fry", "url": "https://recipebee.app/recipes/diabetic-friendly-chicken-and-bell-pepper-stir-fry", "matchedOn": ["chicken", "stir-fry"] }
],
"fetched": [ /* full extracted Recipe objects from the top N candidates */ ]
}
Empty / failure shapes:
// Query resolves to a tag/category that has no recipes (rare — these are pre-curated enums)
{ "query": "...", "candidates": [], "reason": "no_recipes_in_topic" }
// Direct URL 404s (recipe was unpublished or slug typo)
{ "url": "...", "error": "not_found", "statusCode": 404 }
// Recipe page loaded but JSON-LD Recipe block missing (should not happen on /recipes/ — flag as anomaly)
{ "url": "...", "error": "no_recipe_jsonld", "statusCode": 200, "hint": "page may not be a recipe detail page" }
// Search route invoked — currently broken (see gotcha)
{ "query": "...", "error": "search_backend_unavailable", "fallback": "use sitemap.xml + tag/category filters instead" }
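One way to produce these shapes consistently is to route every fetch outcome through a single classifier. This is a hedged sketch: `classifyRecipeFetch` and `classifyDiscovery` are hypothetical names, and the `unexpected_status` code is not one of the documented shapes. It is an illustrative catch-all for statuses this document does not cover.

```javascript
// Map a recipe-page fetch outcome to a success or failure shape.
// `recipe` is the parsed Recipe JSON-LD object, or null/undefined if none was found.
function classifyRecipeFetch({ url, statusCode, recipe }) {
  if (statusCode === 404) return { url, error: 'not_found', statusCode };
  if (statusCode === 200 && !recipe) {
    return { url, error: 'no_recipe_jsonld', statusCode, hint: 'page may not be a recipe detail page' };
  }
  if (statusCode !== 200) {
    // Hypothetical catch-all: not one of the documented failure shapes.
    return { url, error: 'unexpected_status', statusCode };
  }
  return { url, recipe };
}

// Map a discovery result to the candidate list or the empty-topic shape.
function classifyDiscovery(query, candidates) {
  if (candidates.length === 0) return { query, candidates: [], reason: 'no_recipes_in_topic' };
  return { query, candidates };
}

console.log(classifyRecipeFetch({ url: 'https://recipebee.app/recipes/missing', statusCode: 404 }));
console.log(classifyDiscovery('vegan breakfast', []));
```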