Auriko selects a provider for each request based on your routing configuration. Pass routing options in the gateway.routing object to control strategy, constraints, and provider filtering.

Prerequisites

  • An Auriko API key
  • Python 3.10+ with the OpenAI SDK (pip install openai) or the auriko SDK (pip install auriko)
    • OR Node.js 18+ with the OpenAI SDK (npm install openai) or @auriko/sdk (npm install @auriko/sdk)

Compare strategies

Auriko supports seven optimization strategies:
| Strategy | Description | Best for |
| --- | --- | --- |
| cost | Cost-optimized, well-rounded | Cost-conscious production, budget-sensitive apps |
| cost-focus (default) | Aggressively minimize cost | Maximum cost savings, no latency requirements |
| ttft | TTFT-optimized, well-rounded | Streaming UX, interactive apps |
| ttft-focus | Aggressively minimize time to first token | Real-time applications, chatbots |
| tps | Throughput-optimized, well-rounded | High-volume processing |
| tps-focus | Aggressively maximize tokens per second | Maximum throughput, pipeline processing |
| balanced | All dimensions weighted evenly | General-purpose, mixed workloads |

Base vs. focus

Base strategies (cost, ttft, tps) optimize for the named dimension while still considering other quality factors. Focus strategies (cost-focus, ttft-focus, tps-focus) optimize almost entirely for the named dimension. Other factors have minimal influence.
| Type | Behavior | Use when |
| --- | --- | --- |
| Base (cost, ttft, tps) | Favors the named dimension, well-rounded | Production workloads needing reliable performance |
| Focus (cost-focus, ttft-focus, tps-focus) | Aggressively optimizes the named dimension | Batch processing, real-time streaming UI, high-throughput pipelines |
For custom weight configurations beyond the preset strategies, see Set custom weights.
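The strategy names follow a simple pattern: a base dimension, an optional -focus suffix, and balanced on its own. As a minimal sketch, the helper below validates a name on the client side and builds the extra_body payload used in the examples on this page. routing_for is a hypothetical name for illustration; it is not part of the Auriko or OpenAI SDKs.

```python
# Hypothetical client-side helper; not part of the Auriko or OpenAI SDKs.
BASE_DIMENSIONS = ("cost", "ttft", "tps")

# The seven strategy names: three base, three focus, plus "balanced".
STRATEGIES = frozenset(
    [*BASE_DIMENSIONS, *(f"{d}-focus" for d in BASE_DIMENSIONS), "balanced"]
)


def routing_for(dimension: str, focus: bool = False) -> dict:
    """Build an extra_body payload for a base or focus strategy."""
    name = f"{dimension}-focus" if focus else dimension
    if name not in STRATEGIES:
        raise ValueError(f"unknown strategy: {name!r}")
    return {"gateway": {"routing": {"optimize": name}}}


print(routing_for("ttft", focus=True))
```

Pass the result as extra_body to client.chat.completions.create. Validating locally only catches typos early; the gateway remains the authority on which names it accepts.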

Optimize for cost

Auriko computes the expected cost of each request at every available provider and routes to the cheapest one. The cost model accounts for caching and pricing tiers. See Cost optimization for configuration, code examples, and the full cost model.

Optimize for latency

Route requests to low-latency providers:
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["AURIKO_API_KEY"],
    base_url="https://api.auriko.ai/v1"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Quick answer: 2+2?"}],
    extra_body={"gateway": {"routing": {"optimize": "ttft-focus"}}}
)

Set latency constraints

Set a maximum time to first token (TTFT):
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["AURIKO_API_KEY"],
    base_url="https://api.auriko.ai/v1"
)

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={"gateway": {"routing": {
        "optimize": "cost",
        "max_ttft_ms": 1000,
        "ttft_percentile": "p50",
    }}}
)
If no provider can meet the latency constraint, Auriko returns a 400 error.
max_ttft_ms evaluates against median (p50) metrics by default. To constrain on worst-case latency, set ttft_percentile to "p95". See Choose metric percentile.
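Because an unsatisfiable constraint fails the request, callers may want to retry with a looser limit. The sketch below shows one possible approach: try a list of max_ttft_ms values in order and return the first that succeeds. send_with_relaxed_ttft, NoProviderAvailable, and fake_send are hypothetical names used only for illustration, and the retry policy is an assumption, not documented Auriko behavior.

```python
from typing import Any, Callable, Sequence


class NoProviderAvailable(Exception):
    """Stand-in for the HTTP 400 exception your client raises (assumption)."""


def send_with_relaxed_ttft(
    send: Callable[[dict], Any],
    routing: dict,
    ttft_limits_ms: Sequence[int],
    error_type: type[Exception] = NoProviderAvailable,
) -> tuple[int, Any]:
    """Try each TTFT limit in order; return (limit, response) on first success."""
    if not ttft_limits_ms:
        raise ValueError("ttft_limits_ms must not be empty")
    last_error = None
    for limit in ttft_limits_ms:
        attempt = {**routing, "max_ttft_ms": limit}
        try:
            return limit, send(attempt)
        except error_type as exc:
            last_error = exc  # constraint not satisfiable; try the next limit
    raise last_error


# Demo with a fake sender that only accepts limits of 2000 ms or more.
def fake_send(routing: dict) -> str:
    if routing["max_ttft_ms"] < 2000:
        raise NoProviderAvailable("no provider meets max_ttft_ms")
    return "ok"


limit, result = send_with_relaxed_ttft(
    fake_send, {"optimize": "cost"}, [500, 1000, 2000]
)
print(limit, result)
```

With the OpenAI SDK, send would wrap client.chat.completions.create and error_type would be openai.BadRequestError. That class covers every 400 response, so inspect the error body before retrying if other causes need different handling.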

Set cost ceilings

Exclude providers that exceed a per-1M-token budget:
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["AURIKO_API_KEY"],
    base_url="https://api.auriko.ai/v1"
)

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={"gateway": {"routing": {
        "optimize": "cost",
        "max_cost_per_1m": 10.00,
    }}}
)
Auriko calculates cost as the average of input and output price per 1M tokens. Providers exceeding this ceiling are excluded from routing. For fine-grained constraints, see Advanced routing and Cost optimization.
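To make the averaging rule concrete, here is a small worked example. The provider names and prices are hypothetical, and treating a price exactly equal to the ceiling as allowed is an assumption based on the wording "exceeding".

```python
def blended_price_per_1m(input_price: float, output_price: float) -> float:
    """Average of the input and output price per 1M tokens."""
    return (input_price + output_price) / 2


def within_ceiling(offers: dict, max_cost_per_1m: float) -> list:
    """Return provider names whose blended price does not exceed the ceiling."""
    return [
        name
        for name, (inp, out) in offers.items()
        if blended_price_per_1m(inp, out) <= max_cost_per_1m
    ]


# Hypothetical (input, output) prices in USD per 1M tokens.
offers = {
    "provider-a": (2.50, 10.00),  # blended 6.25
    "provider-b": (5.00, 15.00),  # blended 10.00
    "provider-c": (8.00, 24.00),  # blended 16.00
}

print(within_ceiling(offers, 10.00))
```

Note that a provider with a cheap input price and an expensive output price can still pass the ceiling, because only the average is compared.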

Require supported parameters

Set require_parameters to true to route only to providers that accept every optional parameter you send (such as seed, logit_bias, or top_logprobs). Without this flag, Auriko drops unsupported parameters and adds a warning to the response.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["AURIKO_API_KEY"],
    base_url="https://api.auriko.ai/v1"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    seed=42,
    extra_body={"gateway": {"routing": {
        "optimize": "cost",
        "require_parameters": True,
    }}}
)
If no provider supports the parameters you sent, Auriko returns a 400 error with code required_params_not_supported. See Filter by parameter support for the full list of parameters this applies to.
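One way to apply this flag only when it matters is to set it automatically whenever a request carries one of the optional parameters. The sketch below is a hypothetical client-side helper (strict_routing is not an SDK function), and its parameter set contains only the three examples named in this section, so treat it as incomplete.

```python
# Optional parameters named in this section; the full list is documented
# in "Filter by parameter support", so this set is deliberately partial.
OPTIONAL_PARAMS = {"seed", "logit_bias", "top_logprobs"}


def strict_routing(request_kwargs: dict, routing: dict) -> dict:
    """Return a copy of routing with require_parameters set when needed."""
    uses_optional = any(
        request_kwargs.get(name) is not None for name in OPTIONAL_PARAMS
    )
    if uses_optional:
        return {**routing, "require_parameters": True}
    return dict(routing)


print(strict_routing({"seed": 42}, {"optimize": "cost"}))
print(strict_routing({}, {"optimize": "cost"}))
```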

Prefer or exclude providers

Restrict routing to an allowlist of providers, or exclude specific providers:
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["AURIKO_API_KEY"],
    base_url="https://api.auriko.ai/v1"
)

# Only consider these providers
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={"gateway": {"routing": {"providers": ["openai", "anthropic"]}}}
)

# Exclude providers
response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={"gateway": {"routing": {"exclude_providers": ["deepseek"]}}}
)
You can hint at a preferred provider without restricting the candidate pool:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={"gateway": {"routing": {"prefer": "openai"}}}
)
prefer is a soft hint. If the preferred provider is available, Auriko routes to it. If not, routing proceeds normally. Unlike providers, a prefer miss doesn’t fail the request.
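The difference between the hard filters (providers, exclude_providers) and the soft hint (prefer) can be modeled in a few lines. This is a toy model of the behavior described above, not Auriko's implementation: pool[0] stands in for the strategy-based ranking, and applying prefer only within the filtered pool is an assumption.

```python
def candidate_pool(available, providers=None, exclude_providers=None):
    """Apply the hard filters: the allowlist first, then the exclusions."""
    pool = [p for p in available if providers is None or p in providers]
    return [p for p in pool if p not in (exclude_providers or [])]


def choose(available, prefer=None, **filters):
    """Pick a provider; pool[0] stands in for the strategy-based ranking."""
    pool = candidate_pool(available, **filters)
    if not pool:
        # A hard filter that matches nothing fails the request.
        raise LookupError("no provider left after filtering")
    # A prefer miss falls through to normal routing instead of failing.
    return prefer if prefer in pool else pool[0]


available = ["openai", "anthropic", "deepseek"]
print(choose(available, prefer="anthropic"))
print(choose(available, prefer="mistral"))
print(choose(available, exclude_providers=["openai"]))
```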

Restrict key source

Force requests to use only BYOK (bring-your-own-key) or only platform-managed keys:
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["AURIKO_API_KEY"],
    base_url="https://api.auriko.ai/v1"
)

# Use only your own provider keys
response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={"gateway": {"routing": {"only_byok": True}}}
)

# Use only Auriko platform keys
response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={"gateway": {"routing": {"only_platform": True}}}
)
Both options are booleans that default to false. They are mutually exclusive: setting both to true returns a 400 error. When no key of the requested type is available, the request fails with no fallback. See Bring Your Own Key for BYOK setup.
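Since the two flags cannot be combined, it can help to reject the combination before the request leaves your application. A minimal sketch, assuming a hypothetical helper named key_source_routing (not an SDK function):

```python
def key_source_routing(only_byok: bool = False, only_platform: bool = False) -> dict:
    """Build the key-source part of a routing object, rejecting invalid input."""
    if only_byok and only_platform:
        # The gateway would answer with a 400 error; fail earlier and locally.
        raise ValueError("only_byok and only_platform are mutually exclusive")
    routing = {}
    if only_byok:
        routing["only_byok"] = True
    if only_platform:
        routing["only_platform"] = True
    return routing


print(key_source_routing(only_byok=True))
```

Omitting a flag is equivalent to sending false, so the helper leaves unset flags out of the payload.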

Opt in to premium tiers

Premium-tier offerings are excluded from routing by default to prevent accidental cost escalation. Set tier to opt in.
| Value | Effect |
| --- | --- |
| "priority" | Includes Anthropic Fast Mode offerings (2.5x speed, 6x cost) |
| omitted (default) | Excludes premium-tier offerings |
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["AURIKO_API_KEY"],
    base_url="https://api.auriko.ai/v1"
)

response = client.chat.completions.create(
    model="claude-opus-4-6",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={"gateway": {"routing": {"tier": "priority"}}}
)
Auriko’s “priority” tier refers to Anthropic Fast Mode, not Anthropic’s separate Priority Tier (committed capacity SLA).
Without tier, requests to models available only under a premium tier return tier_opt_in_required.
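To get a rough sense of the trade-off before opting in, the multipliers quoted for the "priority" tier can be applied to a baseline request. This is back-of-the-envelope arithmetic only: the baseline figures are hypothetical, and applying the 2.5x speedup to total generation time is an assumption.

```python
SPEED_MULTIPLIER = 2.5  # "2.5x speed", as quoted for the priority tier
COST_MULTIPLIER = 6.0   # "6x cost", as quoted for the priority tier


def priority_estimate(base_cost_usd: float, base_seconds: float) -> tuple:
    """Estimate (cost, duration) of a request under the priority tier."""
    return base_cost_usd * COST_MULTIPLIER, base_seconds / SPEED_MULTIPLIER


# Hypothetical baseline: a request that costs $0.02 and takes 10 seconds.
cost, seconds = priority_estimate(0.02, 10.0)
print(f"${cost:.2f} in {seconds:.1f}s")
```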

Read routing metadata

Every response carries routing information:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)

metadata = response.routing_metadata
print(f"Provider: {metadata.provider}")
print(f"Model: {metadata.provider_model_id}")
print(f"Strategy: {metadata.routing_strategy}")
print(f"Cost: ${metadata.cost.usd:.6f}")
For routing metadata with the OpenAI SDK, see OpenAI Compatibility. For the complete field reference including fallback chain, warnings, and all optional fields, see Response Extensions.

Combine routing options

Combine a strategy with constraints in a single routing object:
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["AURIKO_API_KEY"],
    base_url="https://api.auriko.ai/v1"
)

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What's the capital of France?"}
    ],
    extra_body={"gateway": {"routing": {
        "optimize": "ttft-focus",
        "max_ttft_ms": 1000,
    }}}
)

print(response.choices[0].message.content)