Fetch example.com Homepage Content
Purpose
Read-only extraction of the example.com homepage that returns its h1 heading text and first paragraph text (and, optionally, the trailing "Learn more" link). example.com is the IANA-reserved illustrative domain; its homepage is a single static HTML document served by Cloudflare, with no JavaScript rendering, no anti-bot measures, and no authentication. The optimal path is a raw HTTP fetch and a minimal HTML parse; a browser session is not required.
When to Use
- An agent needs a known-stable, zero-friction target to smoke-test its fetch + parse pipeline end to end.
- A documentation, tutorial, or eval harness needs the canonical "hello world" web payload returned in a normalized shape.
- A connectivity / DNS / TLS check needs to confirm not just that example.com is reachable but that the expected document body is being served (e.g. detecting a captive portal or middlebox interception).
- A demo wants to show a JSON-shaped extraction of h1 + lead paragraph from any URL, using example.com as the safe reference input.
Workflow
The recommended path is a single HTTP fetch. example.com serves a complete, server-rendered HTML document — there is nothing for a browser to do that curl-equivalent tooling cannot.
- Fetch the page with browse cloud fetch. No --proxies, no --verified, no session needed:
    browse cloud fetch https://example.com
  The response is a JSON envelope; the content field holds the raw HTML and statusCode should be 200.
- Parse the HTML for the two required fields. The document structure is stable: a single <h1> inside a <div>, followed by two <p> elements (the descriptive paragraph and a paragraph containing only the "Learn more" <a>). Any minimal parser works, for example:
  - Node: cheerio → $('h1').text() and $('p').first().text().
  - Python: BeautifulSoup → soup.h1.get_text(strip=True) and soup.find('p').get_text(strip=True).
  - Regex (acceptable because the document is hand-authored and stable): /<h1>([^<]+)<\/h1>/ and /<p>([^<]+)<\/p>/.
- Normalize whitespace on the extracted strings (collapse runs of whitespace, strip leading/trailing) before returning. The served HTML is minified onto a single line, so naive substring extraction will not have stray newlines, but downstream consumers should still be defensive.
- Return the structured shape shown in Expected Output.
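The parse and normalize steps above can be sketched with only the Python standard library. This is a minimal illustration, not the skill's own implementation: it swaps in html.parser for the cheerio / BeautifulSoup options listed above, and the names HeadingAndLeadParser, extract_fields, and SAMPLE_HTML are hypothetical. SAMPLE_HTML is a hand-written stand-in for the envelope's content field, not a captured response.

```python
import re
from html.parser import HTMLParser

# Assumption: a hand-written stand-in shaped like the structure described above.
SAMPLE_HTML = (
    "<!doctype html><html><head><title>Example Domain</title></head>"
    "<body><div><h1>Example Domain</h1>"
    "<p>This domain is for use in documentation examples without needing "
    "permission. Avoid use in operations.</p>"
    '<p><a href="https://iana.org/domains/example">Learn more</a></p>'
    "</div></body></html>"
)


class HeadingAndLeadParser(HTMLParser):
    """Collects the text of the first <h1> and the first <p>."""

    BLOCK_TAGS = ("h1", "p", "div", "body", "html")

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.h1 = None
        self.first_paragraph = None
        self._target = None  # attribute currently being captured, if any
        self._chunks = []

    def finish_capture(self):
        # Store the captured text with whitespace runs collapsed and ends trimmed.
        if self._target is not None:
            text = re.sub(r"\s+", " ", "".join(self._chunks)).strip()
            setattr(self, self._target, text)
            self._target = None
            self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "p"):
            # A new block also ends any capture left open by a missing end tag.
            self.finish_capture()
            if tag == "h1" and self.h1 is None:
                self._target = "h1"
            elif tag == "p" and self.first_paragraph is None:
                self._target = "first_paragraph"

    def handle_endtag(self, tag):
        if tag in self.BLOCK_TAGS:
            self.finish_capture()

    def handle_data(self, data):
        if self._target is not None:
            self._chunks.append(data)


def extract_fields(html):
    """Return the h1 and first paragraph text, or None for a missing element."""
    parser = HeadingAndLeadParser()
    parser.feed(html)
    parser.close()
    parser.finish_capture()
    return {"h1": parser.h1, "first_paragraph": parser.first_paragraph}


fields = extract_fields(SAMPLE_HTML)
print(fields["h1"])
print(fields["first_paragraph"])
```

A missing element comes back as None rather than an empty default, which lets the caller tell a drifted document apart from a successful extraction.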
Browser fallback
Only worth using if your harness has no HTTP-fetch primitive at all, or if you want a visual screenshot for a marketplace card. Cost is ~2 orders of magnitude higher than browse cloud fetch (cloud session spin-up dominates).
sid=$(browse cloud sessions create --keep-alive | jq -r .id)
export BROWSE_SESSION="$sid"
browse open https://example.com --remote
browse get markdown body --remote
# {"markdown":"# Example Domain\n\nThis domain is for use in documentation examples without needing permission. Avoid use in operations.\n\n[Learn more](https://iana.org/domains/example)"}
browse cloud sessions update "$sid" --status REQUEST_RELEASE
The browse get markdown body output is already cleanly normalized; split on \n\n to separate the heading line from the first paragraph.
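As a rough sketch of that split, assuming the output has the JSON shape shown in the comment above (a single markdown key), the heading and first paragraph could be separated like this. The function name split_markdown and the SAMPLE_OUTPUT constant are hypothetical, and the sample is copied from the comment rather than from a live session.

```python
import json

# Assumption: sample copied from the commented output shown above.
SAMPLE_OUTPUT = (
    r'{"markdown":"# Example Domain\n\nThis domain is for use in documentation '
    r'examples without needing permission. Avoid use in operations.\n\n'
    r'[Learn more](https://iana.org/domains/example)"}'
)


def split_markdown(markdown):
    """Split markdown on blank lines into the heading and the first paragraph."""
    blocks = [block.strip() for block in markdown.split("\n\n") if block.strip()]
    heading = None
    first_paragraph = None
    for block in blocks:
        if heading is None and block.startswith("# "):
            heading = block[2:].strip()  # drop the leading "# " marker
        elif first_paragraph is None and not block.startswith("#"):
            first_paragraph = block
        if heading is not None and first_paragraph is not None:
            break
    return {"h1": heading, "first_paragraph": first_paragraph}


result = split_markdown(json.loads(SAMPLE_OUTPUT)["markdown"])
print(result)
```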
Site-Specific Gotchas
- The page text is not frozen historical content; it has changed. The widely-quoted older version that began "This domain is for use in illustrative examples in documents…" is no longer what's served. As of Last-Modified: Thu, 14 May 2026 05:31:28 GMT, the lead paragraph reads: "This domain is for use in documentation examples without needing permission. Avoid use in operations." Do not hardcode the paragraph text in tests; extract it at runtime, or your skill will silently rot the next time IANA updates the copy.
- Cloudflare edge caching is aggressive (Cf-Cache-Status: HIT, Age header in the tens of thousands of seconds is normal). The Last-Modified header is therefore the authoritative freshness signal, not Date. If you need to detect a content change, compare Last-Modified values (HEAD is allowed, so the body need not be downloaded) rather than re-fetching and diffing the body on a timer.
- Allowed methods are GET, HEAD only (Allow: GET, HEAD). Do not waste retries on POST/OPTIONS; the origin will refuse them.
- No robots.txt enforcement and no rate limiting observed at single-digit requests per minute. This is the IANA reference domain, intentionally permissive for documentation use. Do not abuse it (do not use it as a load-test target; there are dedicated services for that).
- example.com, example.org, example.net, and example.edu all serve the same payload from the same infrastructure. If your skill is generalized for "IANA example domains", treat them interchangeably; only the host header in the request differs.
- The page does NOT include the host string example.com in its visible body. Only the title (<title>Example Domain</title>) and the h1 (Example Domain) name the page. Do not assume the body contains the domain literal.
- There is no API in the conventional sense. The HTML document itself is the API. Do not waste iterations probing for /api/, /v1/, GraphQL, or sitemaps; none exist.
- Content-Encoding: br (Brotli) is returned by default. browse cloud fetch and modern HTTP clients decode this transparently; raw socket-level clients will need to advertise Accept-Encoding: identity if they cannot decode Brotli.
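The Last-Modified comparison mentioned in the gotchas can be sketched with the standard library alone. This assumes the header arrives in the usual HTTP date format, as in the value quoted above; the helper names last_modified_iso and content_changed are hypothetical, and how the header is obtained from the fetch response is left out.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def last_modified_iso(header_value):
    """Convert an HTTP date header into an ISO 8601 UTC timestamp string."""
    parsed = parsedate_to_datetime(header_value)
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def content_changed(previous_header, current_header):
    """Report whether two Last-Modified values point at different instants."""
    return parsedate_to_datetime(previous_header) != parsedate_to_datetime(current_header)


seen = "Thu, 14 May 2026 05:31:28 GMT"
print(last_modified_iso(seen))
print(content_changed(seen, seen))
```

Comparing parsed timestamps rather than raw strings avoids false positives if an intermediary reformats the header.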
Expected Output
{
"url": "https://example.com",
"status": 200,
"fetched_at": "2026-05-19T00:00:43Z",
"last_modified": "2026-05-14T05:31:28Z",
"title": "Example Domain",
"h1": "Example Domain",
"first_paragraph": "This domain is for use in documentation examples without needing permission. Avoid use in operations.",
"learn_more_url": "https://iana.org/domains/example"
}
If you cannot reach the origin (DNS failure, TLS failure, captive portal returning a non-Example Domain body), return an error shape rather than fabricated content:
{
"url": "https://example.com",
"status": 0,
"error": "fetch_failed",
"error_detail": "ENOTFOUND example.com"
}
If you reach the origin but the document shape has drifted (no h1, or zero p elements found), return a partial-success shape with the raw HTML attached for debugging — never silently substitute defaults:
{
"url": "https://example.com",
"status": 200,
"h1": null,
"first_paragraph": null,
"error": "unexpected_document_shape",
"raw_html": "<!doctype html>..."
}
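One way to choose between the three shapes above is a single function that takes the already-extracted values. This is a sketch under stated assumptions: build_result and its parameters are hypothetical names, a status of 0 or a missing body stands for a failed fetch, and the success branch carries only a subset of the fields shown in the full example (title, last_modified, and learn_more_url are omitted for brevity).

```python
def build_result(url, status, html, h1, first_paragraph,
                 fetched_at=None, error_detail=None):
    """Pick the success, fetch-failure, or drifted-document shape."""
    # Origin not reached: report the failure instead of inventing content.
    if status == 0 or html is None:
        return {
            "url": url,
            "status": 0,
            "error": "fetch_failed",
            "error_detail": error_detail,
        }
    # Origin reached but an expected element is missing: attach the raw HTML.
    if h1 is None or first_paragraph is None:
        return {
            "url": url,
            "status": status,
            "h1": h1,
            "first_paragraph": first_paragraph,
            "error": "unexpected_document_shape",
            "raw_html": html,
        }
    # Both required fields are present.
    return {
        "url": url,
        "status": status,
        "fetched_at": fetched_at,
        "h1": h1,
        "first_paragraph": first_paragraph,
    }


failure = build_result("https://example.com", 0, None, None, None,
                       error_detail="ENOTFOUND example.com")
print(failure)
```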