Fine-tune routing with suffix shortcuts, multi-model requests, quality constraints, and data policies. For basic routing, see Routing options.

Prerequisites

  • An Auriko API key
  • Python 3.10+ with the OpenAI SDK (pip install openai) or the auriko SDK (pip install auriko)
    • OR Node.js 18+ with the OpenAI SDK (npm install openai) or @auriko/sdk (npm install @auriko/sdk)
  • Familiarity with Routing options

How routing works

When you send a request, Auriko’s router:
  1. Enumerates candidates — finds all providers offering the requested model(s)
  2. Filters by constraints — removes providers that violate your routing options (data policy, Bring Your Own Key (BYOK) requirement, performance constraints, excluded providers)
  3. Scores by strategy — ranks remaining candidates using your optimize strategy:
    • cost: Cost-optimized, well-rounded
    • cost-focus: Aggressively minimize cost (default)
    • ttft: TTFT-optimized, well-rounded
    • ttft-focus: Aggressively minimize time to first token
    • tps: Throughput-optimized, well-rounded
    • tps-focus: Aggressively maximize tokens per second
    • balanced: All dimensions weighted evenly
  4. Selects and routes — selects from the ranked list, favoring higher-scored providers
  5. Falls back if needed — if the provider fails and allow_fallbacks is true, retries with the next candidate (up to max_fallback_attempts)
See Python SDK or TypeScript SDK for routing code examples.
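The five steps above can be pictured as a small pipeline. The following is only a simplified sketch, not Auriko's implementation: the `Candidate` fields, the `route` function, and the `send` callback are hypothetical, scoring is reduced to "cheapest first", and selection is deterministic, whereas the real router only favors higher-scored providers.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    provider: str
    cost_per_1m: float  # USD per 1M tokens (made-up figure)
    data_policy: str    # "none", "no_training", or "zdr"


# Hypothetical ranking of policies, following zdr > no_training > none.
POLICY_RANK = {"none": 0, "no_training": 1, "zdr": 2}


def route(candidates, required_policy, excluded, send, max_fallback_attempts=19):
    """Filter, score, select, and fall back over already-enumerated candidates."""
    # Step 2: drop providers that violate the constraints.
    eligible = [
        c for c in candidates
        if c.provider not in excluded
        and POLICY_RANK[c.data_policy] >= POLICY_RANK[required_policy]
    ]
    # Step 3: score; here "cheapest first" stands in for a cost strategy.
    ranked = sorted(eligible, key=lambda c: c.cost_per_1m)
    # Steps 4 and 5: try the primary, then up to max_fallback_attempts others.
    for candidate in ranked[: 1 + max_fallback_attempts]:
        try:
            return candidate.provider, send(candidate)
        except RuntimeError:
            continue  # stand-in for a retryable provider failure
    raise RuntimeError("all candidates failed")
```

With candidates `a` (cheapest, no policy), `b`, and `c` (both `zdr`), a `zdr` request skips `a`, tries `b`, and moves on to `c` if `b` fails.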

Use suffix shortcuts

Append a suffix to any model name for quick routing configuration:
| Suffix | Strategy | Description |
| --- | --- | --- |
| :cost-focus | cost-focus | Aggressively minimize cost |
| :cost | cost | Cost-optimized, well-rounded |
| :ttft-focus | ttft-focus | Aggressively minimize time to first token |
| :ttft | ttft | TTFT-optimized, well-rounded |
| :tps-focus | tps-focus | Aggressively maximize tokens per second |
| :tps | tps | Throughput-optimized, well-rounded |
| :balanced | balanced | All dimensions weighted evenly |
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["AURIKO_API_KEY"],
    base_url="https://api.auriko.ai/v1"
)

response = client.chat.completions.create(
    model="gpt-4o:cost-focus",
    messages=[{"role": "user", "content": "Hello!"}]
)

response = client.chat.completions.create(
    model="claude-sonnet-4-20250514:ttft",
    messages=[{"role": "user", "content": "Hello!"}]
)
Suffixes work with any HTTP client:
curl https://api.auriko.ai/v1/chat/completions \
  -H "Authorization: Bearer $AURIKO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o:cost-focus",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
The router parses suffixes only when the model ID contains exactly one colon. Fine-tuned models with multiple colons (for example, ft:gpt-4o:org:custom) pass through unchanged.
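The one-colon rule can be illustrated with a small helper. This is a sketch of the rule as described above, not the router's actual parser; `split_routing_suffix` is a hypothetical name, and how the router treats an unrecognized suffix is not covered here.

```python
from typing import Optional, Tuple


def split_routing_suffix(model_id: str) -> Tuple[str, Optional[str]]:
    """Split "model:suffix" only when the ID contains exactly one colon."""
    if model_id.count(":") != 1:
        # No colon, or several colons (for example a fine-tuned model ID):
        # the ID passes through unchanged.
        return model_id, None
    base, suffix = model_id.split(":")
    return base, suffix


print(split_routing_suffix("gpt-4o:cost-focus"))     # ('gpt-4o', 'cost-focus')
print(split_routing_suffix("ft:gpt-4o:org:custom"))  # ('ft:gpt-4o:org:custom', None)
```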

Route across models

Pass gateway.models instead of model to route across multiple models (mutually exclusive with model, max 10):
response = client.chat.completions.create(
    model="",  # left empty: gateway.models replaces model
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={"gateway": {
        # Up to 10 models; the router chooses among their providers
        "models": ["gpt-4o", "claude-sonnet-4-20250514", "gemini-flash-latest"],
        "routing": {"mode": "pool"}
    }}
)
| Mode | Behavior |
| --- | --- |
| pool (default) | Select the best-scoring provider across all requested models |
| fallback | Try all providers for the first model, then the second model, and so on |
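The difference between the two modes comes down to the order in which candidates are tried. The sketch below assumes each candidate is a hypothetical `(model, provider, score)` tuple where a higher score is better, and that providers within one model are tried best-first in fallback mode; the source does not state the within-model order.

```python
def order_candidates(candidates, models, mode="pool"):
    """Return candidates in the order they would be tried."""
    if mode == "pool":
        # Best score first, regardless of which model a provider serves.
        return sorted(candidates, key=lambda c: -c[2])
    if mode == "fallback":
        # Exhaust the first requested model before moving to the next one.
        position = {model: index for index, model in enumerate(models)}
        return sorted(candidates, key=lambda c: (position[c[0]], -c[2]))
    raise ValueError(f"unknown mode: {mode}")


models = ["gpt-4o", "claude-sonnet-4-20250514"]
candidates = [
    ("gpt-4o", "provider_1", 0.6),
    ("gpt-4o", "provider_2", 0.4),
    ("claude-sonnet-4-20250514", "provider_3", 0.9),
]
pool_order = [c[1] for c in order_candidates(candidates, models, "pool")]
fallback_order = [c[1] for c in order_candidates(candidates, models, "fallback")]
```

Here `pool_order` starts with `provider_3` because it has the best score overall, while `fallback_order` tries both `gpt-4o` providers first.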

Set quality constraints

Filter providers by performance requirements. Pass constraint thresholds under gateway.routing:
| Field | Type | Description |
| --- | --- | --- |
| max_cost_per_1m | number | Maximum cost per 1 million tokens (USD) |
| max_ttft_ms | integer | Maximum time to first token in milliseconds |
| min_throughput_tps | number | Minimum throughput in tokens per second |
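response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={"gateway": {"routing": {
        "optimize": "balanced",
        # Constraint: exclude providers below 30 tokens per second (p50)
        "min_throughput_tps": 30,
        # Custom weights take precedence over the optimize preset
        "weights": {"cost": 0.6, "ttft": 0.4}
    }}}
)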
Constraint thresholds (max_ttft_ms, min_throughput_tps, max_cost_per_1m) are evaluated against median (p50) metrics. To score providers by worst-case (p95) TTFT or throughput instead, set ttft_percentile or throughput_percentile (see Choose metric percentile).
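As a rough illustration of the filtering step, the sketch below checks one provider's p50 metrics against the three constraint fields. The metric keys (`cost_per_1m`, `ttft_ms`, `throughput_tps`) and the function name are hypothetical, and treating a value exactly at the limit as passing is an assumption.

```python
def passes_constraints(p50_metrics, routing):
    """Return True if a provider's median metrics satisfy every constraint set."""
    checks = [
        # (routing field, metric key, comparison that must hold)
        ("max_cost_per_1m", "cost_per_1m", lambda value, limit: value <= limit),
        ("max_ttft_ms", "ttft_ms", lambda value, limit: value <= limit),
        ("min_throughput_tps", "throughput_tps", lambda value, limit: value >= limit),
    ]
    for field, metric, holds in checks:
        # Constraints that are not set in the request are skipped.
        if field in routing and not holds(p50_metrics[metric], routing[field]):
            return False
    return True


fast = {"cost_per_1m": 5.0, "ttft_ms": 250, "throughput_tps": 80}
slow = {"cost_per_1m": 2.0, "ttft_ms": 900, "throughput_tps": 20}
routing = {"max_ttft_ms": 500, "min_throughput_tps": 30}
```

With these made-up numbers, `fast` stays eligible and `slow` is removed before scoring, even though `slow` is cheaper.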

Filter by parameter support

Not all providers support every optional parameter. By default, Auriko drops unsupported parameters and adds a warning to the response. Set require_parameters to true to only route to providers that accept the optional parameters you sent:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    seed=42,
    extra_body={"gateway": {"routing": {
        "optimize": "cost",
        "require_parameters": True,
    }}}
)
The following parameters have per-provider support. When you set require_parameters to true, Auriko checks that each candidate provider supports every one of them that you sent: temperature, top_p, seed, logit_bias, logprobs, top_logprobs, n, presence_penalty, frequency_penalty, user, parallel_tool_calls, web_search_options, verbosity, prompt_cache_key, safety_identifier.

require_parameters composes with other constraints: a provider must pass all filters to be eligible.

To check which parameters each provider supports, use the model directory endpoint, where each provider entry includes accepted_params and supported_parameters fields.
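The two behaviors (drop with a warning versus filter the provider out) can be sketched as follows. The `plan_parameters` helper and the shape of `provider_support` are hypothetical; in practice the support data would come from the model directory endpoint.

```python
# Parameters with per-provider support, as listed above.
PER_PROVIDER_PARAMS = {
    "temperature", "top_p", "seed", "logit_bias", "logprobs", "top_logprobs",
    "n", "presence_penalty", "frequency_penalty", "user", "parallel_tool_calls",
    "web_search_options", "verbosity", "prompt_cache_key", "safety_identifier",
}


def plan_parameters(sent_params, provider_support, require_parameters=False):
    """Map each eligible provider to the sent parameters it would drop."""
    checked = set(sent_params) & PER_PROVIDER_PARAMS
    plan = {}
    for provider, supported in provider_support.items():
        missing = sorted(checked - set(supported))
        if missing and require_parameters:
            continue  # strict mode: the provider is not eligible
        plan[provider] = missing  # default mode: these would be dropped
    return plan


support = {
    "provider_a": {"temperature", "seed"},
    "provider_b": {"temperature"},
}
lenient = plan_parameters({"seed", "temperature"}, support)
strict = plan_parameters({"seed", "temperature"}, support, require_parameters=True)
```

In the lenient plan `provider_b` remains eligible but would drop `seed`; in the strict plan only `provider_a` remains.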

Set custom weights

You can override preset strategies with custom weights across three dimensions.
| Dimension | Field | What it controls |
| --- | --- | --- |
| Cost | cost | Favor lower-cost providers |
| Latency | ttft | Favor lower time-to-first-token |
| Throughput | throughput | Favor higher tokens-per-second |
Pass routing.weights with your desired dimensions:
# Only cost matters
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={"gateway": {"routing": {"weights": {"cost": 1}}}}
)

# Mostly cost, some latency
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={"gateway": {"routing": {"weights": {"cost": 0.7, "ttft": 0.3}}}}
)

# All three dimensions
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={"gateway": {"routing": {"weights": {"cost": 0.85, "ttft": 0.5, "throughput": 0.5}}}}
)
  • The server accepts any non-negative numbers and normalizes them proportionally.
  • Omitted dimensions default to 0.
  • At least one dimension must be greater than 0.
  • weights overrides the optimize preset, and the response metadata contains routing_strategy: "custom".
  • To score using worst-case metrics, set ttft_percentile and/or throughput_percentile to "p95". See Choose metric percentile.
Providers approaching their rate limits are automatically deprioritized.
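Proportional normalization means each weight is divided by the sum of all weights, so only the ratios matter. The helper below is a client-side sketch of that arithmetic under the rules listed above; `normalize_weights` is a hypothetical name and the server's exact validation behavior may differ.

```python
DIMENSIONS = ("cost", "ttft", "throughput")


def normalize_weights(weights):
    """Scale the three weights so they sum to 1, keeping their ratios."""
    # Omitted dimensions default to 0.
    full = {dim: float(weights.get(dim, 0)) for dim in DIMENSIONS}
    if any(value < 0 for value in full.values()):
        raise ValueError("weights must be non-negative")
    total = sum(full.values())
    if total == 0:
        raise ValueError("at least one dimension must be greater than 0")
    return {dim: value / total for dim, value in full.items()}


# 0.85 + 0.5 + 0.5 = 1.85, so cost ends up near 0.46 and the others near 0.27.
normalized = normalize_weights({"cost": 0.85, "ttft": 0.5, "throughput": 0.5})
```

Because only ratios matter, `{"cost": 3, "ttft": 1}` and `{"cost": 0.75, "ttft": 0.25}` describe the same preference.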

Choose metric percentile

By default, Auriko scores providers using median (p50) metrics. You can switch to 95th-percentile (worst-case) independently for TTFT and throughput:
| Field | Controls | Default |
| --- | --- | --- |
| ttft_percentile | TTFT scoring (which providers rank higher) | p50 |
| throughput_percentile | Throughput scoring (which providers rank higher) | p50 |
Both scoring fields accept "p50" (median) or "p95" (worst-case). They apply to presets as well as custom weights, so you don't need to set weights to use them.

Example: rank providers by worst-case (p95) TTFT instead of median:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={"gateway": {"routing": {
        "optimize": "ttft",
        "ttft_percentile": "p95"
    }}}
)
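To see why the percentile choice matters, consider two providers whose median and tail latencies disagree. The numbers and the `rank_by_ttft` helper below are made up for illustration; they only show how a ranking can flip between p50 and p95.

```python
def rank_by_ttft(ttft_ms, ttft_percentile="p50"):
    """Order provider names from lowest to highest TTFT at the chosen percentile."""
    if ttft_percentile not in ("p50", "p95"):
        raise ValueError("ttft_percentile must be 'p50' or 'p95'")
    return sorted(ttft_ms, key=lambda provider: ttft_ms[provider][ttft_percentile])


# Made-up latencies in milliseconds.
ttft_ms = {
    "provider_a": {"p50": 200, "p95": 1500},  # fast on average, long tail
    "provider_b": {"p50": 300, "p95": 450},   # slower on average, consistent
}
by_median = rank_by_ttft(ttft_ms, "p50")
by_tail = rank_by_ttft(ttft_ms, "p95")
```

`by_median` puts `provider_a` first, while `by_tail` prefers the more consistent `provider_b`.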

Data policy

Control how providers handle your data:
| Policy | Description |
| --- | --- |
| none (default) | No restrictions |
| no_training | Provider must not use data for training |
| zdr | Zero data retention (strictest policy) |
The hierarchy is zdr > no_training > none. When both a per-request policy and an account-level policy apply, the more restrictive one wins.
response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Sensitive financial data..."}],
    extra_body={"gateway": {"routing": {"data_policy": "zdr"}}}
)
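The "most restrictive wins" rule amounts to taking the maximum over the hierarchy. A minimal sketch, assuming a hypothetical `effective_policy` helper that mirrors the rule on the client side:

```python
# Higher rank means more restrictive: zdr > no_training > none.
POLICY_RANK = {"none": 0, "no_training": 1, "zdr": 2}


def effective_policy(request_policy="none", account_policy="none"):
    """Return whichever of the two policies is more restrictive."""
    return max(request_policy, account_policy, key=lambda policy: POLICY_RANK[policy])


print(effective_policy("none", "no_training"))  # no_training
print(effective_policy("zdr", "no_training"))   # zdr
```

A per-request policy can therefore tighten the account-level setting but not loosen it.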

Opt in to premium tiers

Premium-tier offerings are excluded from routing by default. Set tier to opt in:
| Value | Effect |
| --- | --- |
| "priority" | Includes Anthropic Fast Mode offerings (2.5x speed, 6x cost) |
| omitted (default) | Excludes premium-tier offerings |
response = client.chat.completions.create(
    model="claude-opus-4-6",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={"gateway": {"routing": {"tier": "priority"}}}
)
Auriko’s “priority” tier refers to Anthropic Fast Mode, not Anthropic’s separate Priority Tier (committed capacity SLA). See Routing options for details.

Provider alias normalization

Provider names in providers and exclude_providers are case-insensitive and support aliases:
| Alias | Canonical name |
| --- | --- |
| google, google_ai, googleai, gemini | google_ai_studio |
| fireworks | fireworks_ai |
| together | together_ai |
Unrecognized names pass through as-is (lowercased).
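The normalization rule can be mirrored client-side, for example to deduplicate provider lists before sending them. The alias table below is copied from the one above; `normalize_provider` is a hypothetical helper, not part of any SDK.

```python
# Alias table from this section.
PROVIDER_ALIASES = {
    "google": "google_ai_studio",
    "google_ai": "google_ai_studio",
    "googleai": "google_ai_studio",
    "gemini": "google_ai_studio",
    "fireworks": "fireworks_ai",
    "together": "together_ai",
}


def normalize_provider(name):
    """Lowercase the name and map known aliases to their canonical form."""
    lowered = name.lower()
    # Unrecognized names pass through as-is, lowercased.
    return PROVIDER_ALIASES.get(lowered, lowered)


print(normalize_provider("Gemini"))     # google_ai_studio
print(normalize_provider("Fireworks"))  # fireworks_ai
print(normalize_provider("OpenAI"))     # openai
```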

Configure fallbacks

By default, Auriko retries with alternative providers on 429 (rate limit), 5xx (server error), and timeout responses.
| Setting | Default | Description |
| --- | --- | --- |
| allow_fallbacks | true | Enable automatic fallback to alternative providers |
| max_fallback_attempts | 19 | Safety ceiling on fallback attempts beyond the primary, range 1-19 (chain length 20 total) |
| timeout_ms | 20000 (streaming) / 180000 (non-streaming) | Per-attempt timeout in milliseconds. For streaming: time to first byte. For non-streaming: time to complete response |
| deadline_ms | None (streaming) / 540000 (non-streaming) | Hard wall-clock cap across all fallback attempts. Non-streaming requests default to 9 minutes; streaming has no default (connections are long-lived). Set explicitly to override |
Use timeout_ms to limit each attempt and deadline_ms to cap the total wall-clock time across all fallback attempts. Non-streaming requests default to a 9-minute deadline; streaming requests have no deadline unless you set one. If the deadline is exceeded, the request fails with a timeout error.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={"gateway": {"routing": {
        "allow_fallbacks": True,
        "max_fallback_attempts": 5
    }}}
)
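How the attempt limit and the deadline interact can be sketched as a retry loop. This is a simplified model, not the gateway's code: `RetryableError` stands in for 429, 5xx, and timeout responses, each entry in `attempts` is a hypothetical callable for one provider, and the per-attempt `timeout_ms` is left out for brevity.

```python
import time


class RetryableError(Exception):
    """Stand-in for a 429, 5xx, or timeout response from a provider."""


def run_with_fallbacks(attempts, allow_fallbacks=True, max_fallback_attempts=19,
                       deadline_ms=None, clock=time.monotonic):
    """Try providers in order until one succeeds or a limit is reached."""
    start = clock()
    # The primary attempt plus the allowed number of fallbacks.
    limit = 1 + (max_fallback_attempts if allow_fallbacks else 0)
    last_error = None
    for attempt in attempts[:limit]:
        elapsed_ms = (clock() - start) * 1000
        if deadline_ms is not None and elapsed_ms >= deadline_ms:
            raise TimeoutError("deadline exceeded across fallback attempts")
        try:
            return attempt()
        except RetryableError as exc:
            last_error = exc  # remember the failure and try the next provider
    if last_error is None:
        raise RuntimeError("no candidates to try")
    raise last_error
```

With `max_fallback_attempts=5`, as in the request above, this loop would try at most six providers: the primary plus five fallbacks.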