The Windows Club Article Search
Purpose
Return a list of TheWindowsClub articles matching a keyword query — each with title, canonical URL, publish date (local + GMT), HTML excerpt, author id, category ids, and tag ids. Optionally scope the search to a category (broad bucket: Windows / General / Office / Downloads / Security) or a tag (any of 399 topical labels: Outlook, Excel, Chrome, Edge, Errors, Troubleshoot, Windows Updates, etc.). Read-only; never posts, comments, or interacts with login-gated routes.
When to Use
- Daily / hourly monitoring of new TheWindowsClub posts on a Windows topic (BSOD, Windows Update errors, registry tweaks, Edge/Chrome issues, Office 365 problems).
- Hydrating known article IDs into full title/excerpt/link records for downstream summarisation.
- Bulk extraction of every article in a category or tag (e.g. all 715 Outlook-tagged posts, all 848 Security category posts) for offline indexing.
- Anywhere you would otherwise scrape TheWindowsClub HTML — the WP REST API is faster, returns structured fields, and is Cloudflare-cached.
Workflow
TheWindowsClub is a standard WordPress site with its public REST API exposed at https://www.thewindowsclub.com/wp-json/wp/v2/... — no auth, no cookies, no anti-bot challenge, no --verified requirement. Cloudflare fronts the origin and caches responses (Cf-Cache-Status: HIT on repeat queries; max-age=691200 ≈ 8 days). Residential proxies are not required for the API (browse cloud fetch with default egress returns 200 OK), but most browser-sandbox environments have outbound HTTP firewalled, so route every request through browse cloud fetch or a Browserbase session. Lead with the API path; the browser path costs ~10-30× more turns per result and truncates excerpts.
- Build the search URL. The primary endpoint is /wp-json/wp/v2/posts. Keep the response small with _fields= and tune the sort:

      GET https://www.thewindowsclub.com/wp-json/wp/v2/posts
        ?search={url-encoded query}
        &per_page={1..100}      # WP cap is 100; default is 10
        &page={N}               # 1-indexed
        &orderby={date|relevance|modified|title|id}
        &order={desc|asc}       # default desc
        &categories={id}        # optional broad-bucket filter
        &tags={id}              # optional topical-tag filter
        &_fields=id,date,date_gmt,modified,modified_gmt,slug,link,title,excerpt,author,categories,tags

  Default sort is orderby=date&order=desc (newest first). orderby=relevance is only honoured when search= is also supplied, and it produces materially different (better-matched) results. For example, search=fix+windows+update with the default sort returns the latest "Windows Update" article (any topic); with orderby=relevance it returns "Fix Windows Update error 0x80070BC9" at rank 1.
- Read the response totals from headers, not the body. WP returns only the items array in the body:
  - X-Wp-Total: total matching posts (e.g. 728 for search=fix blue screen, 6979 for search=outlook).
  - X-Wp-Totalpages: total pages at the current per_page (e.g. 73 at per_page=10, 8 at per_page=100).
  - Link: <...page=N+1>; rel="next", <...page=N-1>; rel="prev" (RFC 5988 pagination links).
- Decode each post. Every item in the JSON array has WordPress's standard shape; the fields you need:
  - id: stable WP post id (e.g. 107739). Use for single-post hydration via GET /wp-json/wp/v2/posts/{id}.
  - date: local publish time ("2026-05-05T03:29:00", no timezone suffix; the site's TZ is IST, UTC+05:30).
  - date_gmt: UTC publish time ("2026-05-04T21:59:00"). Prefer this for sorting and "since" filters, because date is timezone-bare.
  - modified / modified_gmt: last-edit timestamps (articles are routinely updated; modified > date is normal and not a republish).
  - slug: URL slug ("logi-options-lets-you-control-and-personalize-logitech-devices").
  - link: canonical article URL ("https://www.thewindowsclub.com/{slug}"). No date in the path; flat slug-only URL pattern.
  - title.rendered: HTML-entity-encoded title ("This calendar can&#8217;t be shared..."). Decode HTML entities before display.
  - excerpt.rendered: opening-paragraph HTML, wrapped in <p>...</p>, occasionally truncated mid-word and followed by […] or similar. Strip tags and decode entities for plain text.
  - author: numeric WP user id. Hydrate via GET /wp-json/wp/v2/users/{id} if you need the display name.
  - categories: array of category ids. TheWindowsClub uses only 5 top-level categories: 569 Windows (11955 posts), 186 General (5520), 130 Office (2808), 8 Downloads (2750), 6 Security (848). Most posts have exactly one.
  - tags: array of tag ids. 399 tags total; this is the meaningful topical taxonomy. Top tags: 11 Games, 14 Freeware, 73 Troubleshoot, 753 Errors, 424 Outlook, 435 Excel, 174 Chrome, 1176 Edge, 4 Features, 150 Windows Updates.
- Resolve human-readable category/tag names (optional, recommended for output):

      GET /wp-json/wp/v2/categories?per_page=100&_fields=id,name,slug,count
      GET /wp-json/wp/v2/tags?per_page=100&orderby=count&order=desc&_fields=id,name,slug,count

  The categories endpoint returns only 5 items, so one request covers it. The tags endpoint paginates (X-Wp-Totalpages: 4 at per_page=100), so fetch all four pages. Cache locally; the taxonomy changes rarely.
- Paginate. Increment page= until you have enough results or page equals X-Wp-Totalpages. WP returns HTTP 400 (rest_post_invalid_page_number) if you request a page beyond the total, so treat X-Wp-Totalpages as the last valid page and never request past it.
- Batches of at most 100 for unbounded crawls. WP caps per_page at 100. For large result sets (e.g. all 6,979 "outlook" matches), iterate page=1..70 at per_page=100. Throttle to ~1 req/s: Cloudflare caches GETs so repeats are nearly free, but bursts on uncached queries can trip rate-limit middleware.
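The URL-building and pagination steps above can be sketched in Python as follows. This is a minimal sketch, not the site's or WordPress's own client: `build_posts_url`, `iter_posts`, and the `fetch` callable are hypothetical names. `fetch(url)` is assumed to be whatever transport you route requests through and must return a `(headers, body_text)` pair. `modified_gmt` is included in the field list on the assumption that you want it in the output.

```python
import json
import time
from urllib.parse import urlencode

BASE = "https://www.thewindowsclub.com/wp-json/wp/v2/posts"
# Field list from the workflow; modified_gmt is added here as an assumption.
FIELDS = ("id,date,date_gmt,modified,modified_gmt,slug,link,"
          "title,excerpt,author,categories,tags")


def build_posts_url(query, page=1, per_page=100, orderby="date",
                    order="desc", categories=None, tags=None):
    """Assemble a /wp/v2/posts search URL from the documented parameters."""
    params = {
        "search": query,
        "per_page": min(max(per_page, 1), 100),  # WP caps per_page at 100
        "page": page,                            # 1-indexed
        "orderby": orderby,
        "order": order,
        "_fields": FIELDS,
    }
    if categories is not None:
        params["categories"] = categories
    if tags is not None:
        params["tags"] = tags
    return f"{BASE}?{urlencode(params)}"


def iter_posts(query, fetch, max_pages=None, delay=1.0, **kwargs):
    """Yield posts page by page, never requesting past X-Wp-Totalpages.

    fetch(url) is caller-supplied and returns (headers_dict, body_text).
    """
    page = 1
    while True:
        headers, body = fetch(build_posts_url(query, page=page, **kwargs))
        # HTTP header names are case-insensitive, so normalise before lookup.
        lowered = {k.lower(): v for k, v in headers.items()}
        total_pages = int(lowered.get("x-wp-totalpages", 0))
        yield from json.loads(body)
        if page >= total_pages:
            break
        if max_pages is not None and page >= max_pages:
            break
        page += 1
        time.sleep(delay)  # roughly 1 req/s, per the throttling advice
```

Injecting `fetch` keeps the sketch independent of the transport, so the same loop works whether the request goes through a cloud fetch helper or a plain HTTP client.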
Lightweight alternative — the search endpoint
/wp-json/wp/v2/search returns the same set with only id, title, url, type, subtype per item (~10× smaller payload). It also includes WP pages (subtype=page), not just posts — pass subtype=post to filter. Use when you only need title + link and don't care about date/excerpt:
GET /wp-json/wp/v2/search?search={q}&subtype=post&per_page=100&page={N}
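As a rough illustration, the lightweight path could be wrapped as below. `build_search_url` and `to_minimal` are hypothetical helper names. The sketch assumes each item carries the `id`, `title`, `url`, and `type` keys listed above, that `title` arrives as a plain string, and that you pass in the total taken from the `X-Wp-Total` header.

```python
import html
from urllib.parse import urlencode

SEARCH_BASE = "https://www.thewindowsclub.com/wp-json/wp/v2/search"


def build_search_url(query, page=1, per_page=100):
    """Build a /wp/v2/search URL restricted to posts (no WP pages)."""
    params = {
        "search": query,
        "subtype": "post",
        "per_page": min(max(per_page, 1), 100),
        "page": page,
    }
    return f"{SEARCH_BASE}?{urlencode(params)}"


def to_minimal(query, items, total_results):
    """Reduce search-endpoint items to a title + URL listing."""
    return {
        "query": query,
        "total_results": total_results,
        "articles": [
            {
                "id": item["id"],
                # Decoding is harmless if the title is already plain text.
                "title": html.unescape(item["title"]),
                "url": item["url"],
                "type": item["type"],
            }
            for item in items
        ],
    }
```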
Browser fallback
If the JSON API ever returns a Cloudflare interstitial or /wp-json/ is disabled, fall back to the site's built-in search at https://www.thewindowsclub.com/?s={url-encoded query}. The page is server-rendered (snapshot returns refs; no need to wait for JS). Article cards live under repeating blocks; for each block extract:
- URL: <h2 class="entry-title"><a href="(https://www\.thewindowsclub\.com/[^"]+)"
- Title: text content of the same <a> (HTML-entity decoded)
- Date: <time[^>]+datetime="([^"]+)" (ISO 8601, IST)
- Excerpt: <div class="entry-summary">\s*<p>([^<]+)</p> (truncated to ~30 words by the theme, shorter than the API's excerpt)
- Author: <a rel="author"[^>]*>([^<]+)</a>
Pagination at the bottom: /page/{N}/?s={q} — same ?s= query carried forward. Capture a browse get html body per page and run the above regex set; do not use browse snapshot + click to enumerate (~3 turns per card vs. one fetch for the whole page). A Browserbase session with --verified --proxies is recommended for the browser path because Cloudflare's bot challenge can fire on bare egress.
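The regex set above could be applied to one fetched page roughly as follows. This is a sketch under two assumptions that the source does not confirm: that each card starts at its `<h2 class="entry-title">` heading, and that the date, excerpt, and author markup for a card appear after that heading and before the next one. `parse_search_page` is a hypothetical name.

```python
import html
import re

CARD_START = re.compile(r'<h2 class="entry-title">')
URL_TITLE = re.compile(
    r'<h2 class="entry-title"><a href="(https://www\.thewindowsclub\.com/[^"]+)"'
    r'[^>]*>(.*?)</a>',
    re.S,
)
DATE = re.compile(r'<time[^>]+datetime="([^"]+)"')
EXCERPT = re.compile(r'<div class="entry-summary">\s*<p>([^<]+)</p>')
AUTHOR = re.compile(r'<a rel="author"[^>]*>([^<]+)</a>')
TAGS = re.compile(r"<[^>]+>")


def _first(pattern, chunk):
    """Return the first capture group, entity-decoded, or None if absent."""
    match = pattern.search(chunk)
    return html.unescape(match.group(1).strip()) if match else None


def parse_search_page(page_html):
    """Extract one dict per article card from a ?s= results page."""
    starts = [m.start() for m in CARD_START.finditer(page_html)]
    cards = []
    for i, start in enumerate(starts):
        # Assumption: a card runs from its heading to the next heading.
        end = starts[i + 1] if i + 1 < len(starts) else len(page_html)
        chunk = page_html[start:end]
        head = URL_TITLE.search(chunk)
        if head is None:
            continue
        cards.append({
            "url": head.group(1),
            "title": html.unescape(TAGS.sub("", head.group(2)).strip()),
            "date": _first(DATE, chunk),
            "excerpt": _first(EXCERPT, chunk),
            "author": _first(AUTHOR, chunk),
        })
    return cards
```

If the theme places the `<time>` element before the heading, the slicing boundary would need to move; check against a real page before relying on it.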
Site-Specific Gotchas
- Cloudflare caches API GETs aggressively (max-age=691200 ≈ 8 days, Cf-Cache-Status: HIT on repeats). Identical queries return identical bytes, so a freshly published article may not appear in search= results for several hours after publish if a popular query is being served from a stale cached response. For monitoring, use orderby=date&search= on each poll and de-dupe by id client-side; do not rely on X-Wp-Total changing in real time.
- The date field is timezone-naive (IST = UTC+05:30): "2026-05-05T03:29:00" is IST, not UTC. For absolute timestamps, use date_gmt, which also lacks a Z suffix but is always UTC. The same applies to modified vs modified_gmt.
- Title and excerpt are HTML-encoded. Smart quotes appear as &#8217;, ampersands as &amp;, etc. Always decode HTML entities before display. excerpt.rendered is wrapped in <p>...</p>, so strip tags first.
- Only 5 top-level categories; topical filtering lives in tags. categories=569 (Windows) covers ~12k posts and isn't a useful narrowing filter. Use tags={tag-id} (e.g. tags=424 for Outlook → 283 results when combined with search=error) for meaningful scope. Fetch the full tag list once and cache locally.
- orderby=relevance is silently ignored without search=; you'll get date-desc results. Always pair orderby=relevance with a non-empty search query.
- search does fuzzy multi-token AND-matching: search=fix+windows+update matches posts containing all three tokens anywhere in title/content/excerpt. There is no quoted-phrase operator; search="fix windows update" is treated the same as the unquoted version. For exact-phrase matching, post-filter the JSON by title.rendered.toLowerCase().includes(phrase).
- per_page is hard-capped at 100. Requesting per_page=200 silently caps to 100 (no error). Total result count comes from headers (X-Wp-Total), not from counting items.
- Page overflow returns HTTP 400, not 200 with an empty array. Requesting page=N+1 past X-Wp-Totalpages returns {"code":"rest_post_invalid_page_number","data":{"status":400}}. Check X-Wp-Totalpages and stop at that page.
- Excerpts are sometimes truncated mid-word with […] or …. They are not full article bodies; for full text, fetch content.rendered by omitting _fields= from the request (the response will be 5-20× larger per post).
- Article URL pattern is a flat slug: https://www.thewindowsclub.com/{slug}, with no /year/month/ prefix. Easy to construct from slug alone.
- Modified timestamp ≠ republish. Articles are routinely edited (typo fixes, link refreshes). modified_gmt later than date_gmt by months or years is normal; do not interpret it as a fresh publish event.
- X-Robots-Tag: noindex on the /wp-json/ API responses is meta-information about the API endpoint itself (not the underlying posts); it tells search engines not to index the API URLs. Safe to ignore for scraping.
- The site exposes a sitemap at https://www.thewindowsclub.com/sitemap_index.xml (referenced in /robots.txt). For complete-archive enumeration (~25k posts), the sitemap is faster than paginating wp/v2/posts, but it has only URLs + lastmod, no titles or excerpts. Use it for URL inventory; use the API for content.
- Article-page browse open may report a waitForMainLoadState timeout because of slow third-party ad/analytics scripts on the article body. The DOM is interactive long before load fires, so the screenshot and HTML are valid even when the navigation call returns with a timeout error. For the API path this is irrelevant; for the browser fallback, use browse get html body rather than waiting for load.
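Two of the gotchas above, timezone-bare timestamps and the lack of a phrase operator, lend themselves to small client-side helpers. The following is a sketch with hypothetical function names; it assumes the site's local zone stays at a fixed UTC+05:30 offset, as stated above.

```python
import html
from datetime import datetime, timedelta, timezone

IST = timezone(timedelta(hours=5, minutes=30))  # site-local offset per the notes


def parse_gmt(value):
    """Attach UTC to a suffix-less date_gmt / modified_gmt string."""
    return datetime.fromisoformat(value).replace(tzinfo=timezone.utc)


def parse_local(value):
    """Attach IST to a suffix-less date / modified string."""
    return datetime.fromisoformat(value).replace(tzinfo=IST)


def published_since(posts, cutoff_utc):
    """Keep posts whose date_gmt is at or after a timezone-aware cutoff."""
    return [p for p in posts if parse_gmt(p["date_gmt"]) >= cutoff_utc]


def matches_phrase(post, phrase):
    """Exact-phrase post-filter on the entity-decoded, lower-cased title."""
    title = html.unescape(post["title"]["rendered"]).lower()
    return phrase.lower() in title


def dedupe_by_id(posts, seen_ids):
    """Return posts not seen on earlier polls and record their ids."""
    fresh = [p for p in posts if p["id"] not in seen_ids]
    seen_ids.update(p["id"] for p in fresh)
    return fresh
```

Comparing on `date_gmt` avoids the half-hour-offset mistakes that creep in when the bare `date` string is treated as UTC.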
Expected Output
{
"query": "fix blue screen",
"filters": {
"categories": null,
"tags": null,
"orderby": "relevance",
"order": "desc"
},
"total_results": 728,
"total_pages": 8,
"per_page": 100,
"page": 1,
"articles": [
{
"id": 107739,
"title": "How to fix Blue Screen in Windows 11 or Windows 10",
"slug": "blue-screen-death-windows-10",
"url": "https://www.thewindowsclub.com/blue-screen-death-windows-10",
"date_local": "2025-01-04T21:09:00",
"date_gmt": "2025-01-04T15:39:00",
"modified_gmt": "2026-01-12T10:22:00",
"excerpt": "Windows 11/10 too has the Blue Screen of Death (BSOD) or Stop Error screen that appears when you are in the middle of something, upgrading the operating system...",
"author_id": 136,
"category_ids": [569],
"category_names": ["Windows"],
"tag_ids": [239],
"tag_names": ["Blue Screen"]
},
{
"id": 534689,
"title": "Logi Options+ lets you control and personalize Logitech devices",
"slug": "logi-options-lets-you-control-and-personalize-logitech-devices",
"url": "https://www.thewindowsclub.com/logi-options-lets-you-control-and-personalize-logitech-devices",
"date_local": "2026-05-05T03:29:00",
"date_gmt": "2026-05-04T21:59:00",
"modified_gmt": "2026-05-05T08:37:38",
"excerpt": "Logitech devices are designed not just to work, but to work smarter, with added customization, comfort, and productivity-focused features...",
"author_id": 136,
"category_ids": [8],
"category_names": ["Downloads"],
"tag_ids": [14],
"tag_names": ["Freeware"]
}
]
}
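One way to map a raw API post onto an article record of the shape above is sketched here. `plain_text` and `to_record` are hypothetical names, and `category_names` / `tag_names` are assumed to be id-to-name dicts built from the cached taxonomy requests. Unknown ids fall back to their string form rather than failing.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")


def plain_text(rendered):
    """Strip tags first, then decode entities, as the gotchas recommend."""
    return html.unescape(TAG_RE.sub("", rendered)).strip()


def to_record(post, category_names, tag_names):
    """Flatten one /wp/v2/posts item into the documented article record."""
    return {
        "id": post["id"],
        "title": plain_text(post["title"]["rendered"]),
        "slug": post["slug"],
        "url": post["link"],
        "date_local": post["date"],
        "date_gmt": post["date_gmt"],
        # Absent unless modified_gmt was requested via _fields.
        "modified_gmt": post.get("modified_gmt"),
        "excerpt": plain_text(post["excerpt"]["rendered"]),
        "author_id": post["author"],
        "category_ids": post["categories"],
        "category_names": [category_names.get(i, str(i))
                           for i in post["categories"]],
        "tag_ids": post["tags"],
        "tag_names": [tag_names.get(i, str(i)) for i in post["tags"]],
    }
```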
Minimal-shape output when callers only need title + URL (using /wp-json/wp/v2/search):
{
"query": "fix blue screen",
"total_results": 729,
"articles": [
{
"id": 107739,
"title": "How to fix Blue Screen in Windows 11 or Windows 10",
"url": "https://www.thewindowsclub.com/blue-screen-death-windows-10",
"type": "post"
}
]
}
Empty-result shape (valid query, no matches):
{
"query": "completely-nonsense-query-xyz-zzz",
"total_results": 0,
"total_pages": 0,
"articles": []
}
Page-overflow error shape (when caller paginates past total_pages):
{
"error": "rest_post_invalid_page_number",
"status": 400,
"message": "The page number requested is larger than the number of pages available."
}
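A caller might translate WordPress's raw error body into the error shape above with something like the following. `normalise_wp_error` and `is_page_overflow` are hypothetical helpers, and the sketch assumes the raw body carries `code`, `message`, and `data.status` keys.

```python
import json


def normalise_wp_error(body_text):
    """Map a raw WP REST error body onto the flat error shape."""
    raw = json.loads(body_text)
    data = raw.get("data") or {}
    return {
        "error": raw.get("code"),
        "status": data.get("status"),
        "message": raw.get("message"),
    }


def is_page_overflow(error):
    """True when the caller paginated past the last available page."""
    return (error["status"] == 400
            and error["error"] == "rest_post_invalid_page_number")
```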