OpenRouter: Compare LLMs and Build with the Best
Purpose
Given a project requirement — typical inputs are some combination of (a) budget per million tokens, (b) speed/latency target, (c) required input/output modalities (text/image/audio/file/video), (d) required supported_parameters (e.g. tools, structured_outputs, reasoning), (e) minimum context length, (f) use-case category (programming, roleplay, finance, legal, marketing, health, academia, translation, technology, SEO) — return a ranked shortlist of OpenRouter models with full cost/speed/uptime data, name a recommended pick with rationale, and emit a ready-to-run code snippet that calls the chosen model through OpenRouter's OpenAI-compatible /api/v1/chat/completions endpoint. Read-only with respect to the catalog; the "build" step issues a real (billable) inference call only when the caller supplies their own OPENROUTER_API_KEY.
When to Use
- A developer asks "what's the cheapest model that supports tool calling and vision?"
- A latency-sensitive product (autocomplete, voice agent) needs the fastest model meeting a quality bar.
- An agent picking a sub-model dynamically per task class — long-context summarization vs. cheap classification.
- Cost-optimization sweeps across an existing OpenRouter integration ("what's the next-cheapest model that still passes our evals?").
- Anywhere you'd otherwise scrape the `/models` page or read screenshots of the leaderboard: the JSON API has every field the UI surfaces, plus per-provider throughput and uptime that the UI buries.
Workflow
OpenRouter exposes a public JSON catalog at https://openrouter.ai/api/v1/models and per-provider stats at https://openrouter.ai/api/v1/models/{model_id}/endpoints. Both reads are unauthenticated — no API key, no cookies, no anti-bot stealth, no proxy required. The /api/v1/chat/completions build endpoint is OpenAI-compatible and uses a Bearer key from https://openrouter.ai/settings/keys. Lead with the API path; the browser path works as a fallback and exists chiefly to surface the use-case category leaderboards that don't have a clean JSON equivalent.
1. Pull the model catalog
GET https://openrouter.ai/api/v1/models
Accept: application/json
Returns { "data": [Model, ...] } with ~350 models. Response is ~450 KB gzipped JSON, no pagination. Each Model has:
| Field | Notes |
|---|---|
| `id` | Slug used in the chat-completion model field, e.g. anthropic/claude-sonnet-4, openai/gpt-5-nano, deepseek/deepseek-v4-flash. Free tiers append :free (e.g. minimax/minimax-m2.5:free). |
| `canonical_slug` | Internal versioned slug. Don't pass to /chat/completions; use id. |
| `name` | Human display name. |
| `description` | Long-form. Often embeds use-case category rankings as inline text like "Translation (#27)Finance (#19)", a useful regex signal for category filtering. |
| `context_length` | Max prompt+completion tokens. |
| `architecture.input_modalities` | Subset of text, image, file, audio, video. |
| `architecture.output_modalities` | Subset of text, image, audio. |
| `architecture.tokenizer` | Claude, GPT, Llama, Mistral, Other, etc.; relevant only for offline token counting. |
| `pricing.prompt` | USD per token (not per 1M; see gotcha below). |
| `pricing.completion` | USD per token. |
| `pricing.web_search`, `pricing.input_cache_read`, `pricing.input_cache_write` | Add-on rates; web_search is per call, caches are per token. Often missing for non-Anthropic/OpenAI models; treat absence as "not supported". |
| `top_provider.context_length` | Effective context after provider-side truncation. May be lower than the model's nominal context_length. |
| `top_provider.max_completion_tokens` | Max output tokens. |
| `top_provider.is_moderated` | Provider applies a moderation layer. |
| `supported_parameters` | Array; see "Filtering by capability" below for the canonical enum. |
| `default_parameters` | Provider-recommended defaults (temperature, top_p, etc.). |
| `supported_voices` | TTS voices for audio-output models; null for text-only. |
| `knowledge_cutoff` | ISO date or null. |
| `expiration_date` | When a model variant will be removed. Filter out non-null past dates. |
| `links` | { deeplink, ... } to docs and provider pages. |
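A minimal sketch of reading the catalog and normalizing one record, based on the field shapes in the table above. The helper names (`fetch_catalog`, `per_million`, `is_expired`) are illustrative, not part of any OpenRouter SDK, and the code assumes `expiration_date` begins with an ISO date (YYYY-MM-DD) when present, which the table does not confirm.

```python
import json
import urllib.request
from datetime import date
from decimal import Decimal

CATALOG_URL = "https://openrouter.ai/api/v1/models"


def fetch_catalog(url: str = CATALOG_URL) -> list:
    """Fetch the unauthenticated catalog and return the list under "data"."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["data"]


def per_million(price_per_token: str) -> float:
    """Convert a per-token USD string such as "0.000003" to USD per 1M tokens."""
    # Decimal avoids binary floating-point drift on very small per-token prices.
    return float(Decimal(price_per_token) * 1_000_000)


def is_expired(model: dict, today: date) -> bool:
    """True when expiration_date is set and already in the past.

    Assumption: the value starts with an ISO date (YYYY-MM-DD).
    """
    raw = model.get("expiration_date")
    return raw is not None and date.fromisoformat(raw[:10]) < today
```

Nothing here runs a network call until `fetch_catalog()` is invoked; a typical use would be `[m for m in fetch_catalog() if not is_expired(m, date.today())]`.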
Filtering by capability — supported_parameters is the most useful filter axis. Observed enum across the live catalog:
frequency_penalty, include_reasoning, logit_bias, logprobs, max_completion_tokens,
max_tokens, min_p, parallel_tool_calls, presence_penalty, reasoning, reasoning_effort,
repetition_penalty, response_format, seed, stop, structured_outputs, temperature,
tool_choice, tools, top_a, top_k, top_logprobs, top_p, verbosity, web_search_options
The /api/v1/models?supported_parameters=tools query param is accepted by the server and narrows the catalog server-side — but input_modalities=image, category=programming, and order=... are page-only params (no JSON filtering); apply those client-side after fetching the full list.
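The split between server-side and client-side filtering could be sketched as follows. The function names are hypothetical, and only the single-value `supported_parameters=tools` form described above is assumed to work against the JSON API.

```python
from typing import Optional
from urllib.parse import urlencode

API_MODELS = "https://openrouter.ai/api/v1/models"


def catalog_url(supported_parameter: Optional[str] = None) -> str:
    """Build the catalog URL, optionally with the one server-side filter."""
    if supported_parameter is None:
        return API_MODELS
    query = urlencode({"supported_parameters": supported_parameter})
    return f"{API_MODELS}?{query}"


def filter_by_input_modality(models: list, input_modality: str) -> list:
    """Apply the page-only input-modality filter locally on fetched records."""
    return [
        m for m in models
        if input_modality in m.get("architecture", {}).get("input_modalities", [])
    ]
```

Fetching `catalog_url("tools")` narrows the list on the server; `filter_by_input_modality(models, "image")` then handles the vision requirement locally.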
2. (Optional) Pull per-provider endpoint stats — the speed signal
The catalog has prices but no speed data. For a candidate id, fetch:
GET https://openrouter.ai/api/v1/models/{id}/endpoints
Returns { "data": { "id": ..., "endpoints": [Endpoint, ...] } }. Each Endpoint is one provider's serving of the model. Speed/health fields:
| Field | Meaning |
|---|---|
| `provider_name` / `tag` | e.g. Anthropic / anthropic, Google / google-vertex/global. The tag is what provider routing accepts in chat-completions. |
| `pricing.*` | Per-provider price (can differ from catalog top-line; the catalog shows top_provider's price). |
| `context_length` / `max_completion_tokens` | Per-provider; Bedrock is often capped at 200K where Anthropic-direct exposes 1M. |
| `throughput_last_30m` | Tokens/sec rolling avg. Often null if the endpoint hasn't been hit in 30m. |
| `latency_last_30m` | First-token latency, seconds. Often null for low-traffic endpoints. |
| `uptime_last_30m` / `uptime_last_5m` / `uptime_last_1d` | Percent. uptime_last_1d is the most reliable health signal: below 95 over a day means actively unstable. |
| `status` | 0 = healthy, -5 (observed) = degraded/unavailable. Filter to status === 0. |
| `quantization` | e.g. unknown, fp8, int4. Affects quality. |
| `supports_implicit_caching` | Anthropic-style 90% input-token discount for cache hits. |
Note on null speed fields. Throughput and latency are rolling 30-minute aggregates; low-volume models (most of the 350-model catalog) will show null for one or both. Don't treat a null value as "broken" and filter the endpoint out; fall back to uptime_last_1d and provider reputation. When speed is present, it's authoritative.
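One way to apply the null-handling rule is to summarize speed across healthy endpoints while keeping "unknown" distinct from "slow". This is a sketch over the endpoint fields listed above; the function names and the summary keys are made up for illustration.

```python
def healthy_endpoints(endpoints: list) -> list:
    """Keep only endpoints reporting status 0 (healthy)."""
    return [e for e in endpoints if e.get("status") == 0]


def speed_summary(endpoints: list) -> dict:
    """Summarize speed and uptime over healthy endpoints.

    Null throughput is treated as "no data", never as zero.
    """
    healthy = healthy_endpoints(endpoints)
    throughputs = [
        e["throughput_last_30m"] for e in healthy
        if e.get("throughput_last_30m") is not None
    ]
    uptimes = [
        e["uptime_last_1d"] for e in healthy
        if e.get("uptime_last_1d") is not None
    ]
    return {
        "best_throughput_tps": max(throughputs) if throughputs else None,
        "best_uptime_1d": max(uptimes) if uptimes else None,
        "speed_known": bool(throughputs),
    }
```

When `speed_known` is false, the caller can rank on `best_uptime_1d` alone instead of discarding the model.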
3. Rank and pick
A practical ranking function for "cheapest acceptable" looks like:
// Lower score is better; Infinity marks a model that fails a hard requirement.
const score = (m) => {
  if (!m.architecture.input_modalities.includes(requiredModality)) return Infinity;
  if (requiredParams.some((p) => !m.supported_parameters.includes(p))) return Infinity;
  if (m.context_length < minContext) return Infinity;
  // Prices are per-token USD strings, so parse them before summing.
  return parseFloat(m.pricing.prompt) + parseFloat(m.pricing.completion);
};
For "fastest acceptable", fetch /endpoints for each candidate and rank by throughput_last_30m descending, tie-break by latency_last_30m ascending, then by price ascending. For a "balanced" pick, use a weighted geometric mean of normalized 1/price, throughput, and uptime_last_1d.
When in doubt, present 3 candidates ranked by the user's stated priority and let them pick.
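The "fastest acceptable" ordering and the "balanced" score might look like the sketch below. The flattened candidate keys (`throughput_tps`, `latency_s`, `price_per_1m`) are assumptions for illustration, and how each input is normalized before the geometric mean is left to the caller.

```python
import math


def fastest_first(candidates: list) -> list:
    """Sort by throughput descending, then latency ascending, then price ascending.

    Candidates with unknown (None) throughput or latency sort after known ones.
    """
    def key(c: dict):
        tps = c.get("throughput_tps")
        lat = c.get("latency_s")
        return (
            tps is None, -(tps or 0.0),
            lat is None, lat or 0.0,
            c["price_per_1m"],
        )
    return sorted(candidates, key=key)


def weighted_geometric_mean(values: list, weights: list) -> float:
    """Weighted geometric mean of strictly positive, pre-normalized values."""
    if any(v <= 0 for v in values):
        raise ValueError("values must be strictly positive")
    total = sum(weights)
    log_sum = sum(w * math.log(v) for v, w in zip(values, weights))
    return math.exp(log_sum / total)
```

A geometric mean punishes a model that is very weak on one axis more than an arithmetic mean would, which suits a "balanced" pick.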
4. Use-case category lookup
The catalog doesn't expose a clean category field per model, but two signals are available:
- Inline in `description`: most ranked models embed strings like "Programming (#3)Roleplay (#12)" near the top of `description`. Parse with `/\b(Programming|Roleplay|Marketing|Finance|Legal|Health|Academia|Translation|Technology|SEO)\s*\(#(\d+)\)/g`.
- Browser fallback: https://openrouter.ai/rankings has the human-curated leaderboards. The `category` URL param on `/models` (`?category=programming`) filters the page but is not honored by `/api/v1/models`.
If the user asks "which model is best for X" where X is a use case, start with the inline-description signal; fall back to the rankings page only if you need beyond-top-10 data.
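The inline-description signal can be extracted with the same regex in Python. This is a sketch: `category_ranks` and `best_for` are hypothetical helper names, and the category list is the one observed above.

```python
import re

CATEGORY_RANK = re.compile(
    r"\b(Programming|Roleplay|Marketing|Finance|Legal|Health|Academia|"
    r"Translation|Technology|SEO)\s*\(#(\d+)\)"
)


def category_ranks(description: str) -> dict:
    """Map each category named in the description to its inline rank."""
    return {name: int(rank) for name, rank in CATEGORY_RANK.findall(description)}


def best_for(models: list, category: str) -> list:
    """Return models ranked in `category`, best (lowest) rank first."""
    ranked = []
    for m in models:
        ranks = category_ranks(m.get("description") or "")
        if category in ranks:
            ranked.append((ranks[category], m))
    return [m for _, m in sorted(ranked, key=lambda pair: pair[0])]
```

Models whose description carries no rank for the category are simply omitted, so an empty result means the signal is unavailable rather than that no model is suitable.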
5. Build — call the chosen model
OpenRouter exposes an OpenAI-compatible chat endpoint:
POST https://openrouter.ai/api/v1/chat/completions
Authorization: Bearer $OPENROUTER_API_KEY
Content-Type: application/json
HTTP-Referer: <your-site-url> # optional, helps with usage analytics
X-Title: <your-app-name> # optional, surfaces in /apps leaderboard
{
"model": "<id from step 1, e.g. anthropic/claude-sonnet-4>",
"messages": [
{ "role": "system", "content": "..." },
{ "role": "user", "content": "..." }
],
"tools": [ ... ], // if you filtered by supported_parameters=tools
"temperature": 0.7,
"max_tokens": 2048,
"provider": { "order": ["anthropic", "google-vertex/global"] } // optional routing
}
Three idiomatic forms (Python SDK, Node fetch, curl); all return identical OpenAI-shaped responses:
# Python: OpenAI SDK pointed at OpenRouter
import os
from openai import OpenAI
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)
resp = client.chat.completions.create(
    model="anthropic/claude-sonnet-4",
    messages=[{"role": "user", "content": "Hello"}],
    extra_headers={"HTTP-Referer": "https://myapp.com", "X-Title": "MyApp"},
)
// Node: fetch
const r = await fetch("https://openrouter.ai/api/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.OPENROUTER_API_KEY}`,
    "Content-Type": "application/json",
    "HTTP-Referer": "https://myapp.com",
    "X-Title": "MyApp",
  },
  body: JSON.stringify({
    model: "anthropic/claude-sonnet-4",
    messages: [{ role: "user", content: "Hello" }],
  }),
});
const data = await r.json();
# curl
curl https://openrouter.ai/api/v1/chat/completions \
-H "Authorization: Bearer $OPENROUTER_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model":"anthropic/claude-sonnet-4","messages":[{"role":"user","content":"Hello"}]}'
Do not actually issue the chat-completion call unless the user explicitly asks you to and supplies their own key. The catalog/endpoint reads are free; chat completions bill against the key holder's balance. Emit the snippet ready-to-run; let the user execute.
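To emit a snippet without executing anything, the request can be rendered as a curl command string. The sketch below is one possible approach; `curl_snippet` is a hypothetical helper, and it references the key only by its environment-variable name so no secret is read or sent.

```python
import json
import shlex

CHAT_URL = "https://openrouter.ai/api/v1/chat/completions"


def curl_snippet(model_id: str, messages: list, provider_order=None) -> str:
    """Render a ready-to-run curl command; nothing is sent by this function."""
    body = {"model": model_id, "messages": messages}
    if provider_order:
        # Optional provider pinning, mirroring the "provider.order" field.
        body["provider"] = {"order": list(provider_order)}
    return " ".join([
        "curl",
        CHAT_URL,
        # Double quotes let the user's shell expand the key at run time.
        '-H "Authorization: Bearer $OPENROUTER_API_KEY"',
        '-H "Content-Type: application/json"',
        "-d",
        shlex.quote(json.dumps(body)),
    ])
```

`shlex.quote` wraps the JSON body in single quotes and escapes any embedded single quote, so message text containing apostrophes still yields a valid command.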
Browser fallback
If the JSON API is unreachable (it never has been in practice; it sits behind the Cloudflare edge with Cache-Control: private, no-store and no rate limit on read paths) or you need a screenshot for a human reviewer:
https://openrouter.ai/models?fmt=table
&order=<pricing-low-to-high|throughput-high-to-low|latency-low-to-high|context-high-to-low|newest>
&supported_parameters=<comma-separated, e.g. tools,structured_outputs>
&input_modalities=<comma-separated, e.g. image,file>
&category=<programming|roleplay|finance|legal|marketing|health|academia|translation|technology|seo>
All URL params are preserved on navigation — the page is a thin client over the same /api/v1/models JSON with client-side sort/filter. No login wall, no Cloudflare challenge, no proxy needed; a bare browse open works. fmt=table is denser than fmt=cards. Wait ~3s after load before snapshotting — the model list lazy-renders.
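Composing the fallback page URL from the parameters listed above could be done like this. It is a sketch with a hypothetical function name; commas are kept unescaped to match the comma-separated form shown in the template.

```python
from urllib.parse import urlencode

MODELS_PAGE = "https://openrouter.ai/models"


def models_page_url(order=None, supported_parameters=(), input_modalities=(),
                    category=None) -> str:
    """Build the /models page URL with the dense table layout."""
    params = {"fmt": "table"}
    if order:
        params["order"] = order
    if supported_parameters:
        params["supported_parameters"] = ",".join(supported_parameters)
    if input_modalities:
        params["input_modalities"] = ",".join(input_modalities)
    if category:
        params["category"] = category
    # safe="," keeps list separators readable instead of encoding them as %2C.
    return f"{MODELS_PAGE}?{urlencode(params, safe=',')}"
```

For example, `models_page_url(order="pricing-low-to-high", supported_parameters=["tools"], input_modalities=["image"])` gives a page suitable for a screenshot of the cheapest tool-calling vision models.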
The model detail page https://openrouter.ai/{org}/{model-name} (e.g. /anthropic/claude-sonnet-4) is the human view of the /api/v1/models/{id}/endpoints data — provider pricing table, throughput chart, uptime sparkline. Capture as a screenshot when a user wants a visual comparison.
Site-Specific Gotchas
- `pricing.prompt` and `pricing.completion` are per token, not per 1M tokens. The web UI displays the per-1M figure (e.g. "$3 /M input tokens" for Claude Sonnet 4); the API returns `"0.000003"`. Multiply by `1_000_000` for the per-1M display value. Free-tier `:free` models have `pricing.prompt === "0"` and `pricing.completion === "0"`; there were 28 of them in the live catalog as of 2026-05-19.
- Use `id`, not `canonical_slug`, in the chat-completion `model` field. `canonical_slug` is the dated internal version (`anthropic/claude-4.7-opus-fast-20260512`) and is not accepted by the routing layer.
- Free models (`:free` suffix) are heavily rate-limited and route to community endpoints. Throughput/latency are best-effort and frequently `null`. Useful for evaluation but not production unless paired with a paid fallback in `provider.order`.
- `/api/v1/models?supported_parameters=tools` is the only server-side filter that works. `input_modalities=`, `category=`, `order=` are accepted by the `/models` page (URL state, client-side filter) but not by `/api/v1/models`; passing them returns the full catalog. Apply those filters client-side after fetching.
- `/rankings/{category}` sub-paths 404-redirect. `https://openrouter.ai/rankings/finance` redirects back to `/rankings` (no per-category page). The category breakdown lives in dropdowns on the `/rankings` page itself and as inline ranks inside each model's `description` field; parse from there.
- `/api/v1/credits` and most non-`/models` paths require a session cookie or Bearer token. Don't probe them anonymously expecting JSON; you'll get `{"error":{"message":"No cookie auth credentials found","code":401}}`. The two open read paths are `/api/v1/models` and `/api/v1/models/{id}/endpoints`.
- `throughput_last_30m` and `latency_last_30m` are frequently `null`. Low-traffic models (anything outside the top-30 by usage) won't have a recent rolling window. `null` ≠ slow; it means "no data". Fall back to `uptime_last_1d` for health and treat speed as unknown when both speed fields are null on every endpoint.
- Per-provider context/max-output vary. Same model `id`, different `endpoints[i].context_length`: Anthropic-direct typically exposes the maximum (1M for Sonnet 4); Bedrock often caps at 200K. If the user needs the full context window, route explicitly via `provider.order: ["anthropic"]`.
- `top_provider` is the default route, not necessarily the cheapest or fastest. OpenRouter's load balancer picks among healthy endpoints; the price/speed of an actual call depends on which provider got the request. For deterministic cost, pin a provider via the `provider.order` field on the chat-completion call.
- `status === -5` indicates a degraded/unavailable endpoint. Observed for Bedrock Sonnet 4 (eu-west-1, uptime ~44% over 1d). Always filter `endpoints` to `status === 0` before computing aggregate speed/cost.
- `web_search`, `input_cache_read`, `input_cache_write` are often missing in `pricing`. Treat absence as "feature not supported"; don't substitute `0`.
- Use-case categories are not first-class API fields. They're embedded as inline rank strings in each model's `description` ("Programming (#3)Marketing (#5)...") and as page-only UI on `/rankings`. The known category enum from page traversal: Programming, Roleplay, Marketing, Finance, Legal, Health, Academia, Translation, Technology, SEO. Use the description regex for ranking; the `/models?category=` URL is a UI filter only.
- `HTTP-Referer` and `X-Title` headers are optional but recommended. They surface your app in `/apps` leaderboards and help OpenRouter route to better providers for your traffic profile. They do not affect billing.
- No anti-bot, no Cloudflare challenge on read paths. A residential proxy is unnecessary for `/api/v1/models`, `/api/v1/models/{id}/endpoints`, or the `/models` HTML page. We probed with a bare `browse cloud fetch` (no proxy, no stealth) and got 200 OK with `Server: cloudflare` and `Cache-Control: private, no-store`. The metadata's `verified: true / proxies: true` flags reflect the screenshot-capture session config, not a requirement.
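Two of the gotchas above, per-provider context limits and optional add-on prices, lend themselves to small helpers. This is a sketch with hypothetical function names; it relies only on the `status`, `context_length`, `tag`, and `pricing` fields described earlier.

```python
from typing import Optional


def provider_order_for_context(endpoints: list, min_context: int) -> list:
    """Tags of healthy endpoints whose own context_length covers min_context.

    The result is ordered largest context first, for use as provider.order.
    """
    eligible = [
        e for e in endpoints
        if e.get("status") == 0 and (e.get("context_length") or 0) >= min_context
    ]
    eligible.sort(key=lambda e: e.get("context_length") or 0, reverse=True)
    return [e["tag"] for e in eligible]


def addon_price(pricing: dict, key: str) -> Optional[float]:
    """Return an add-on rate as a float, or None when the key is absent.

    Absence means "feature not supported", so it must not collapse to 0.
    """
    raw = pricing.get(key)
    return None if raw is None else float(raw)
```

An empty list from `provider_order_for_context` signals that no healthy provider currently serves the requested window, which is worth reporting to the user rather than silently routing to a truncating endpoint.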
Expected Output
The skill should return a JSON object with the ranked shortlist, the recommended pick, and a build snippet. Three illustrative shapes:
// Shape 1 — cheap+tools+vision request
{
"success": true,
"criteria": {
"modalities": ["image", "text"],
"supported_parameters": ["tools"],
"min_context": 100000,
"rank_by": "cost"
},
"shortlist": [
{
"id": "google/gemma-3-12b-it",
"name": "Google: Gemma 3 12B",
"prompt_price_per_1m": 0.04,
"completion_price_per_1m": 0.13,
"context_length": 131072,
"input_modalities": ["text", "image"],
"supported_parameters": ["tools", "structured_outputs", "..."],
"top_provider_uptime_1d": 99.8,
"throughput_last_30m_tps": null
},
{
"id": "amazon/nova-lite-v1",
"name": "Amazon: Nova Lite 1.0",
"prompt_price_per_1m": 0.06,
"completion_price_per_1m": 0.24,
"context_length": 307200,
"input_modalities": ["text", "image", "video"],
"supported_parameters": ["tools", "..."],
"top_provider_uptime_1d": 99.95,
"throughput_last_30m_tps": 142.3
}
],
"recommendation": {
"id": "amazon/nova-lite-v1",
"rationale": "Cheapest model meeting all criteria with non-null throughput data and 99.95% 1-day uptime; Gemma 3 12B is fractionally cheaper but has no live throughput signal."
},
"build_snippet": {
"language": "python",
"code": "import os\nfrom openai import OpenAI\nclient = OpenAI(base_url='https://openrouter.ai/api/v1', api_key=os.environ['OPENROUTER_API_KEY'])\nresp = client.chat.completions.create(model='amazon/nova-lite-v1', messages=[...], tools=[...])\n"
}
}
// Shape 2 — speed-first request with provider pinning
{
"success": true,
"criteria": { "rank_by": "throughput", "use_case": "programming" },
"shortlist": [
{
"id": "anthropic/claude-opus-4.7-fast",
"name": "Anthropic: Claude Opus 4.7 (Fast)",
"prompt_price_per_1m": 30,
"completion_price_per_1m": 150,
"endpoints": [
{ "provider_name": "Anthropic", "tag": "anthropic", "throughput_tps": 142, "latency_s": 0.7, "uptime_1d": 100 }
]
}
],
"recommendation": {
"id": "anthropic/claude-opus-4.7-fast",
"provider_pin": "anthropic",
"rationale": "Fast-mode variant explicitly priced for high throughput; single-provider routing avoids fallback variance."
},
"build_snippet": { "language": "curl", "code": "curl https://openrouter.ai/api/v1/chat/completions -H \"Authorization: Bearer $OPENROUTER_API_KEY\" -H \"Content-Type: application/json\" -d '{\"model\":\"anthropic/claude-opus-4.7-fast\",\"provider\":{\"order\":[\"anthropic\"]},\"messages\":[...]}'" }
}
// Shape 3 — no model meets criteria
{
"success": false,
"reason": "no_model_matches_criteria",
"criteria": { "modalities": ["audio"], "supported_parameters": ["tools"], "max_prompt_price_per_1m": 0.10 },
"closest_matches": [
{
"id": "...",
"name": "...",
"violation": "prompt_price_per_1m=0.50 exceeds 0.10 budget"
}
],
"suggestion": "Relax max_prompt_price_per_1m to 0.50 or drop the tools requirement (no audio-input model currently both supports tool calling and prices below $0.10/M input)."
}
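The Shape 3 response can be assembled by explaining, per candidate, which criteria it violates. The sketch below assumes candidates have already been flattened to the per-1M shortlist shape shown in Shape 1; the function names and the wording of the non-price violation messages are illustrative.

```python
def explain_violations(model: dict, criteria: dict) -> list:
    """List human-readable reasons why `model` fails `criteria`."""
    violations = []
    budget = criteria.get("max_prompt_price_per_1m")
    price = model.get("prompt_price_per_1m")
    if budget is not None and price is not None and price > budget:
        violations.append(
            f"prompt_price_per_1m={price:.2f} exceeds {budget:.2f} budget"
        )
    for param in criteria.get("supported_parameters", []):
        if param not in model.get("supported_parameters", []):
            violations.append(f"missing supported_parameter {param}")
    for modality in criteria.get("modalities", []):
        if modality not in model.get("input_modalities", []):
            violations.append(f"missing input modality {modality}")
    return violations


def no_match_response(criteria: dict, candidates: list, limit: int = 3) -> dict:
    """Build the failure shape, listing the candidates with the fewest violations."""
    closest = sorted(
        candidates, key=lambda m: len(explain_violations(m, criteria))
    )[:limit]
    return {
        "success": False,
        "reason": "no_model_matches_criteria",
        "criteria": criteria,
        "closest_matches": [
            {
                "id": m["id"],
                "name": m["name"],
                "violation": "; ".join(explain_violations(m, criteria)),
            }
            for m in closest
        ],
    }
```

Sorting by violation count surfaces the models that need the smallest relaxation, which is what the `suggestion` field is meant to summarize.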