Auriko intelligently routes your requests across multiple providers. Use routing options to optimize for your specific needs.
Prerequisites
- An Auriko API key
- Python 3.10+ with the auriko SDK installed (pip install auriko)
- or Node.js 18+ with @auriko/sdk installed (npm install @auriko/sdk)
Overview
Auriko supports six optimization strategies:
| Strategy | Description | Best For |
|---|---|---|
| cost | Route to cheapest provider | Batch processing, non-urgent tasks |
| cheapest | Absolute lowest cost | Maximum cost savings, no latency requirements |
| speed | Minimize latency, maximize throughput | Real-time applications, chatbots |
| ttft | Minimize time to first token | Streaming UX, interactive apps |
| throughput | Maximize tokens per second | High-volume processing |
| balanced (default) | Weighted combination | General-purpose, mixed workloads |
Cost Optimization
Minimize your LLM costs:
response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
    routing={
        "optimize": "cost"
    }
)

# See which provider was used and the cost
print(f"Provider: {response.routing_metadata.provider}")
print(f"Cost: ${response.routing_metadata.cost.billable_cost_usd:.6f}")
Latency Optimization
Get the fastest response:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Quick answer: 2+2?"}],
    routing={
        "optimize": "speed"
    }
)

print(f"Latency: {response.routing_metadata.total_latency_ms}ms")
Latency Constraints
Set maximum time-to-first-token (TTFT):
response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
    routing={
        "optimize": "cost",
        "max_ttft_ms": 200  # Must start responding within 200ms
    }
)
If no provider can meet the latency constraint, Auriko returns a 503 error.
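One way to handle that 503 client-side is to retry with a progressively looser TTFT budget. A minimal sketch, assuming the SDK surfaces the failure as a raised exception — the helper name is illustrative, and the broad `Exception` catch should be narrowed to the SDK's actual "no provider available" error class:

```python
# Try each TTFT budget in order, loosening the constraint on each retry.
# The exception handling is deliberately broad here; substitute the
# SDK's specific 503 error class in real code.
def create_with_ttft_fallback(client, budgets_ms, **kwargs):
    """Return the first response that succeeds under one of the budgets."""
    last_error = None
    for ttft in budgets_ms:
        routing = {**kwargs.get("routing", {}), "max_ttft_ms": ttft}
        try:
            return client.chat.completions.create(**{**kwargs, "routing": routing})
        except Exception as err:  # ideally: the SDK's 503 / constraint error
            last_error = err
    raise last_error
```

Called as `create_with_ttft_fallback(client, [200, 400, 800], model="gpt-5.4", messages=[...], routing={"optimize": "cost"})`, this tries a 200ms budget first and relaxes twice before giving up.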
Set a cost ceiling
Exclude providers that exceed a per-1M-token budget:
import os
from auriko import Client

client = Client(
    api_key=os.environ["AURIKO_API_KEY"],
    base_url="https://api.auriko.ai/v1"
)

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
    routing={
        "optimize": "cost",
        "max_cost_per_1m": 5.00  # Max $5.00 per 1M tokens (average of input + output)
    }
)
Auriko calculates cost as the average of input and output price per 1M tokens. Providers exceeding this ceiling are excluded from routing. For fine-grained quality and cost constraints, see Advanced routing.
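The blended-price rule is simple arithmetic. This sketch mirrors the documented check — the helper itself and the example prices are illustrative:

```python
def passes_cost_ceiling(input_price_per_1m, output_price_per_1m, max_cost_per_1m):
    """A provider qualifies when the average of its input and output
    prices (USD per 1M tokens) is at or below the ceiling."""
    blended = (input_price_per_1m + output_price_per_1m) / 2
    return blended <= max_cost_per_1m

# $2.50 input / $10.00 output blends to $6.25 -> excluded by a $5.00 ceiling
# $1.00 input / $4.00 output blends to $2.50 -> eligible
```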
Provider Preferences
Prefer or exclude specific providers:
# Only consider these providers
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    routing={
        "providers": ["openai", "anthropic"]
    }
)

# Exclude providers
response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
    routing={
        "exclude_providers": ["deepseek"]
    }
)
Restrict key source
Force requests to use only BYOK (bring-your-own-key) or only platform-managed keys:
import os
from auriko import Client

client = Client(
    api_key=os.environ["AURIKO_API_KEY"],
    base_url="https://api.auriko.ai/v1"
)

# Use only your own provider keys
response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
    routing={
        "only_byok": True
    }
)

# Use only Auriko platform keys
response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
    routing={
        "only_platform": True
    }
)
Both are booleans, default false. Setting both to true returns a 400 error — they are mutually exclusive. When no key of the requested type is available, the request fails with no fallback. See Bring Your Own Key for BYOK setup.
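Since the conflicting combination is rejected server-side with a 400, it can be caught before any network round trip. A small client-side pre-check — the helper is illustrative and simply mirrors the documented rule:

```python
def validate_key_source(routing):
    """Reject routing options the API would refuse with a 400:
    only_byok and only_platform are mutually exclusive."""
    if routing.get("only_byok") and routing.get("only_platform"):
        raise ValueError("only_byok and only_platform are mutually exclusive")
    return routing

# validate_key_source({"only_byok": True}) passes through unchanged;
# setting both flags raises before the request is ever sent.
```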
Routing Metadata
Every response carries routing information:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)

metadata = response.routing_metadata
print(f"Provider: {metadata.provider}")
print(f"Model: {metadata.provider_model_id}")
print(f"Latency: {metadata.total_latency_ms}ms")
print(f"Input tokens: {metadata.cost.input_tokens}")
print(f"Output tokens: {metadata.cost.output_tokens}")
print(f"Cost: ${metadata.cost.billable_cost_usd:.6f}")
For the complete field reference including fallback chain, warnings, and all optional fields, see Response Extensions.
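The token and cost fields combine naturally into a per-request unit cost. A minimal sketch — the helper is illustrative; the field names it consumes (input_tokens, output_tokens, billable_cost_usd) are the documented ones:

```python
def cost_per_1k_tokens(input_tokens, output_tokens, billable_cost_usd):
    """Effective USD cost per 1,000 tokens for a single response."""
    total = input_tokens + output_tokens
    if total == 0:
        return 0.0
    return billable_cost_usd / total * 1000

# e.g. 150 input + 350 output tokens billed at $0.002 -> $0.004 per 1K tokens
```

In practice you would call it with `metadata.cost.input_tokens`, `metadata.cost.output_tokens`, and `metadata.cost.billable_cost_usd`.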
Full Example
import os
from auriko import Client

client = Client(
    api_key=os.environ["AURIKO_API_KEY"],
    base_url="https://api.auriko.ai/v1"
)

# For a chatbot: optimize for speed with a TTFT ceiling
response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What's the capital of France?"}
    ],
    routing={
        "optimize": "speed",
        "max_ttft_ms": 150,
    }
)

print(response.choices[0].message.content)
print("\n--- Routing Info ---")
print(f"Provider: {response.routing_metadata.provider}")
print(f"Latency: {response.routing_metadata.total_latency_ms}ms")
print(f"Cost: ${response.routing_metadata.cost.billable_cost_usd:.6f}")
OpenAI SDK Compatibility
If you're using the OpenAI SDK, pass routing options via extra_body:
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["AURIKO_API_KEY"],
    base_url="https://api.auriko.ai/v1"
)

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={
        "routing": {
            "optimize": "cost",
            "max_ttft_ms": 200
        }
    }
)
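extra_body works here because routing is just one more field in the JSON request body. For clients speaking raw HTTP, assembling the payload by hand makes that explicit — the helper below is illustrative, not part of any SDK:

```python
def build_chat_payload(model, messages, routing=None):
    """JSON body for POST {base_url}/chat/completions; routing is optional."""
    payload = {"model": model, "messages": messages}
    if routing is not None:
        payload["routing"] = routing
    return payload

# The same request as above, as a plain dict:
# build_chat_payload("gpt-5.4", [{"role": "user", "content": "Hello!"}],
#                    routing={"optimize": "cost", "max_ttft_ms": 200})
```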
Choose a strategy
Match your use case to the right routing strategy:
| Use case | Strategy | Key constraints | Example |
|---|---|---|---|
| Chatbot / real-time UI | speed or ttft | max_ttft_ms: 200 | Interactive conversation |
| Batch processing | cost or cheapest | — | Document summarization |
| High-volume pipeline | throughput | min_throughput_tps: 50 | Log analysis |
| Cost-conscious real-time | cost | max_ttft_ms: 500 | Customer support |
| Compliance-sensitive | balanced | data_policy: "zdr" | Financial data |
| Multi-model exploration | balanced | models: [...] | A/B testing |
Start max_ttft_ms at 200-500ms and adjust — setting it too low causes 503 errors when no provider meets the constraint.
For fine-grained control, see Advanced routing.