Routing Options

Auriko routes each request across multiple providers. Use routing options to tell the router what to optimize for (cost, latency, or throughput) and which constraints to enforce.

Prerequisites

- An Auriko API key
- Python 3.10+ with the auriko SDK installed (pip install auriko), or Node.js 18+ with @auriko/sdk installed (npm install @auriko/sdk)

Overview

Auriko supports six optimization strategies:

Strategy	Description	Best For
`cost`	Route to cheapest provider	Batch processing, non-urgent tasks
`cheapest`	Absolute lowest cost	Maximum cost savings, no latency requirements
`speed`	Minimize latency, maximize throughput	Real-time applications, chatbots
`ttft`	Minimize time to first token	Streaming UX, interactive apps
`throughput`	Maximize tokens per second	High-volume processing
`balanced` (default)	Weighted combination	General-purpose, mixed workloads
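Because the strategy is passed as a plain string, a typo only surfaces once the request is sent. The sketch below checks the name locally against the six strategies in the table. `build_routing` is a hypothetical helper for illustration, not part of the Auriko SDK.

```python
# The six strategy names listed in the table above.
STRATEGIES = {"cost", "cheapest", "speed", "ttft", "throughput", "balanced"}


def build_routing(optimize: str = "balanced", **constraints) -> dict:
    """Return a routing dict, rejecting unknown strategy names early.

    Hypothetical helper: it only assembles the dict passed as `routing=`.
    """
    if optimize not in STRATEGIES:
        raise ValueError(
            f"unknown strategy {optimize!r}; expected one of {sorted(STRATEGIES)}"
        )
    return {"optimize": optimize, **constraints}


routing = build_routing("cost", max_ttft_ms=200)
print(routing)
```

The resulting dict can be passed unchanged as the `routing` argument shown in the sections that follow.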

Cost Optimization

Minimize your LLM costs:

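import os
from auriko import Client

client = Client(
    api_key=os.environ["AURIKO_API_KEY"],
    base_url="https://api.auriko.ai/v1"
)

response = client.chat.completions.create(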
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
    routing={
        "optimize": "cost"
    }
)

# See which provider was used and the cost
print(f"Provider: {response.routing_metadata.provider}")
print(f"Cost: ${response.routing_metadata.cost.billable_cost_usd:.6f}")

Latency Optimization

Get the fastest response:

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Quick answer: 2+2?"}],
    routing={
        "optimize": "speed"
    }
)

print(f"Latency: {response.routing_metadata.total_latency_ms}ms")

Latency Constraints

Set maximum time-to-first-token (TTFT):

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
    routing={
        "optimize": "cost",
        "max_ttft_ms": 200  # Must start responding within 200ms
    }
)

If no provider can meet the latency constraint, Auriko returns a 503 error.
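One way to cope with that 503 is to retry with a progressively looser ceiling. The sketch below is SDK-agnostic: `send` is any callable that takes a routing dict and performs the request. It assumes the raised exception exposes the HTTP status as a `status_code` attribute, which may not match how your SDK reports errors. The function name and the step values are illustrative.

```python
from typing import Any, Callable, Sequence


def send_with_relaxed_ttft(
    send: Callable[[dict], Any],
    routing: dict,
    ttft_steps_ms: Sequence[int] = (200, 500, 1000),
) -> Any:
    """Try each TTFT ceiling in order, moving on only after a 503."""
    if not ttft_steps_ms:
        raise ValueError("ttft_steps_ms must contain at least one value")
    last_error = None
    for ceiling in ttft_steps_ms:
        attempt = {**routing, "max_ttft_ms": ceiling}
        try:
            return send(attempt)
        except Exception as exc:
            # Assumption: the SDK error carries the HTTP status as `status_code`.
            if getattr(exc, "status_code", None) != 503:
                raise  # unrelated failure: do not retry
            last_error = exc
    # Every ceiling was rejected with a 503; surface the last one.
    raise last_error
```

In practice `send` would wrap the call from the example above, for instance `lambda r: client.chat.completions.create(model="gpt-5.4", messages=messages, routing=r)`.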

Set a cost ceiling

Exclude providers that exceed a per-1M-token budget:

import os
from auriko import Client

client = Client(
    api_key=os.environ["AURIKO_API_KEY"],
    base_url="https://api.auriko.ai/v1"
)

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
    routing={
        "optimize": "cost",
        "max_cost_per_1m": 5.00  # Max $5.00 per 1M tokens (average of input + output)
    }
)

Auriko calculates cost as the average of input and output price per 1M tokens. Providers exceeding this ceiling are excluded from routing. For fine-grained quality and cost constraints, see Advanced routing.
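To make the averaging rule concrete, here is a small worked example. The provider names and prices are invented for illustration, and the filter is a local approximation of the rule described above, not Auriko's actual routing code.

```python
def blended_price_per_1m(input_price: float, output_price: float) -> float:
    """Average of input and output price, both in USD per 1M tokens."""
    return (input_price + output_price) / 2


# Hypothetical providers with (input, output) prices in USD per 1M tokens.
PRICES = {
    "provider_a": (2.00, 6.00),  # blended 4.00
    "provider_b": (3.00, 9.00),  # blended 6.00
    "provider_c": (5.00, 5.00),  # blended 5.00
}


def eligible_providers(prices: dict, max_cost_per_1m: float) -> list:
    """Keep providers whose blended price does not exceed the ceiling."""
    return [
        name
        for name, (input_price, output_price) in prices.items()
        if blended_price_per_1m(input_price, output_price) <= max_cost_per_1m
    ]


print(eligible_providers(PRICES, 5.00))  # ['provider_a', 'provider_c']
```

Note that provider_b is excluded even though its input price alone is under $5.00, because the ceiling applies to the blended figure.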

Provider Preferences

Prefer or exclude specific providers:

# Only consider these providers
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    routing={
        "providers": ["openai", "anthropic"]
    }
)

# Exclude providers
response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
    routing={
        "exclude_providers": ["deepseek"]
    }
)

Restrict key source

Force requests to use only BYOK (bring-your-own-key) or only platform-managed keys:

import os
from auriko import Client

client = Client(
    api_key=os.environ["AURIKO_API_KEY"],
    base_url="https://api.auriko.ai/v1"
)

# Use only your own provider keys
response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
    routing={
        "only_byok": True
    }
)

# Use only Auriko platform keys
response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
    routing={
        "only_platform": True
    }
)

Both options are booleans and default to false. They are mutually exclusive: setting both to true returns a 400 error. When no key of the requested type is available, the request fails with no fallback. See Bring Your Own Key for BYOK setup.
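If routing options are assembled from configuration, the mutually exclusive pair can be caught before the request leaves your process. This is a minimal client-side sketch. `check_key_source` is a hypothetical helper, and the server-side 400 remains the authoritative check.

```python
def check_key_source(routing: dict) -> dict:
    """Raise early if both key-source flags are set; otherwise return routing."""
    # Both flags default to false when omitted.
    only_byok = routing.get("only_byok", False)
    only_platform = routing.get("only_platform", False)
    if only_byok and only_platform:
        raise ValueError("only_byok and only_platform are mutually exclusive")
    return routing


routing = check_key_source({"optimize": "cost", "only_byok": True})
print(routing)
```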

Routing Metadata

Every response carries routing information:

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)

metadata = response.routing_metadata
print(f"Provider: {metadata.provider}")
print(f"Model: {metadata.provider_model_id}")
print(f"Latency: {metadata.total_latency_ms}ms")
print(f"Input tokens: {metadata.cost.input_tokens}")
print(f"Output tokens: {metadata.cost.output_tokens}")
print(f"Cost: ${metadata.cost.billable_cost_usd:.6f}")

For the complete field reference including fallback chain, warnings, and all optional fields, see Response Extensions.
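These fields lend themselves to simple per-provider reporting. The sketch below aggregates request count, total cost, and mean latency using only the attributes shown above. The `fake_response` objects are stand-ins with made-up values so the example runs without network access. Real responses would come from `client.chat.completions.create`.

```python
from collections import defaultdict
from types import SimpleNamespace


def summarize_by_provider(responses):
    """Group request count, total cost, and mean latency by provider."""
    totals = defaultdict(lambda: {"requests": 0, "cost_usd": 0.0, "latency_ms": 0.0})
    for response in responses:
        meta = response.routing_metadata
        entry = totals[meta.provider]
        entry["requests"] += 1
        entry["cost_usd"] += meta.cost.billable_cost_usd
        entry["latency_ms"] += meta.total_latency_ms
    return {
        provider: {
            "requests": t["requests"],
            "cost_usd": t["cost_usd"],
            "avg_latency_ms": t["latency_ms"] / t["requests"],
        }
        for provider, t in totals.items()
    }


def fake_response(provider, latency_ms, cost_usd):
    """Build a stand-in object with only the fields used above (made-up values)."""
    return SimpleNamespace(
        routing_metadata=SimpleNamespace(
            provider=provider,
            total_latency_ms=latency_ms,
            cost=SimpleNamespace(billable_cost_usd=cost_usd),
        )
    )


responses = [
    fake_response("openai", 400, 0.25),
    fake_response("openai", 600, 0.5),
    fake_response("anthropic", 300, 0.125),
]
summary = summarize_by_provider(responses)
for provider, stats in summary.items():
    print(provider, stats)
```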

Full Example

import os
from auriko import Client

client = Client(
    api_key=os.environ["AURIKO_API_KEY"],
    base_url="https://api.auriko.ai/v1"
)

# For a chatbot: optimize for speed with a TTFT ceiling
response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What's the capital of France?"}
    ],
    routing={
        "optimize": "speed",
        "max_ttft_ms": 150,
    }
)

print(response.choices[0].message.content)
print("\n--- Routing Info ---")
print(f"Provider: {response.routing_metadata.provider}")
print(f"Latency: {response.routing_metadata.total_latency_ms}ms")
print(f"Cost: ${response.routing_metadata.cost.billable_cost_usd:.6f}")

OpenAI SDK Compatibility

If you use the OpenAI SDK instead of the Auriko SDK, pass routing options through extra_body:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["AURIKO_API_KEY"],
    base_url="https://api.auriko.ai/v1"
)

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={
        "routing": {
            "optimize": "cost",
            "max_ttft_ms": 200
        }
    }
)
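Code that has to work with either client can hide the difference behind one small function. The sketch below returns the keyword arguments to splat into `chat.completions.create`. `routing_kwargs` and its `sdk` labels are hypothetical names chosen for this example.

```python
def routing_kwargs(routing: dict, sdk: str = "auriko") -> dict:
    """Return the keyword arguments that carry routing options for each SDK."""
    if sdk == "auriko":
        # The Auriko SDK accepts routing options directly.
        return {"routing": routing}
    if sdk == "openai":
        # The OpenAI SDK forwards unknown fields through extra_body.
        return {"extra_body": {"routing": routing}}
    raise ValueError(f"unsupported sdk {sdk!r}; expected 'auriko' or 'openai'")


routing = {"optimize": "cost", "max_ttft_ms": 200}
print(routing_kwargs(routing, "auriko"))
print(routing_kwargs(routing, "openai"))
```

A call then looks the same for both clients: `client.chat.completions.create(model="gpt-5.4", messages=messages, **routing_kwargs(routing, "openai"))`.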

Choose a strategy

Match your use case to the right routing strategy:

Use case	Strategy	Key constraints	Example
Chatbot / real-time UI	`speed` or `ttft`	`max_ttft_ms: 200`	Interactive conversation
Batch processing	`cost` or `cheapest`	—	Document summarization
High-volume pipeline	`throughput`	`min_throughput_tps: 50`	Log analysis
Cost-conscious real-time	`cost`	`max_ttft_ms: 500`	Customer support
Compliance-sensitive	`balanced`	`data_policy: "zdr"`	Financial data
Multi-model exploration	`balanced`	`models: [...]`	A/B testing

Start max_ttft_ms at 200-500ms and adjust from there. Setting it too low causes 503 errors when no provider can meet the constraint.
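The table rows translate directly into reusable presets. The sketch below encodes five of them as routing dicts. The preset names and the `routing_for` helper are hypothetical, where a row offers two strategies one was picked arbitrarily, and the multi-model row is left out because the table does not spell out its `models` list.

```python
# Presets taken from the table above; the keys are illustrative names.
PRESETS = {
    "chatbot": {"optimize": "ttft", "max_ttft_ms": 200},
    "batch": {"optimize": "cost"},
    "high_volume": {"optimize": "throughput", "min_throughput_tps": 50},
    "cost_conscious_realtime": {"optimize": "cost", "max_ttft_ms": 500},
    "compliance": {"optimize": "balanced", "data_policy": "zdr"},
}


def routing_for(use_case: str, **overrides) -> dict:
    """Return a copy of the preset for use_case with any overrides applied."""
    if use_case not in PRESETS:
        raise KeyError(f"unknown use case {use_case!r}; expected one of {sorted(PRESETS)}")
    return {**PRESETS[use_case], **overrides}


# Loosen the chatbot ceiling without mutating the shared preset.
print(routing_for("chatbot", max_ttft_ms=350))
```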

For fine-grained control, see Advanced routing.
