Auriko routes each request across multiple providers. Use routing options to optimize for cost, latency, or throughput, and to constrain which providers are eligible.

Prerequisites

  • An Auriko API key
  • Python 3.10+ with auriko SDK installed (pip install auriko)
    • OR Node.js 18+ with @auriko/sdk installed (npm install @auriko/sdk)

Overview

Auriko supports six optimization strategies:
| Strategy | Description | Best For |
|---|---|---|
| cost | Route to cheapest provider | Batch processing, non-urgent tasks |
| cheapest | Absolute lowest cost | Maximum cost savings, no latency requirements |
| speed | Minimize latency, maximize throughput | Real-time applications, chatbots |
| ttft | Minimize time to first token | Streaming UX, interactive apps |
| throughput | Maximize tokens per second | High-volume processing |
| balanced (default) | Weighted combination | General-purpose, mixed workloads |

Cost Optimization

Minimize your LLM costs:
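import os
from auriko import Client

client = Client(
    api_key=os.environ["AURIKO_API_KEY"],
    base_url="https://api.auriko.ai/v1"
)

response = client.chat.completions.create(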
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
    routing={
        "optimize": "cost"
    }
)

# See which provider was used and the cost
print(f"Provider: {response.routing_metadata.provider}")
print(f"Cost: ${response.routing_metadata.cost.billable_cost_usd:.6f}")

Latency Optimization

Get the fastest response:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Quick answer: 2+2?"}],
    routing={
        "optimize": "speed"
    }
)

print(f"Latency: {response.routing_metadata.total_latency_ms}ms")

Latency Constraints

Set maximum time-to-first-token (TTFT):
response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
    routing={
        "optimize": "cost",
        "max_ttft_ms": 200  # Must start responding within 200ms
    }
)
If no provider can meet the latency constraint, Auriko returns a 503 error.

Set a cost ceiling

Exclude providers that exceed a per-1M-token budget:
import os
from auriko import Client

client = Client(
    api_key=os.environ["AURIKO_API_KEY"],
    base_url="https://api.auriko.ai/v1"
)

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
    routing={
        "optimize": "cost",
        "max_cost_per_1m": 5.00  # Max $5.00 per 1M tokens (average of input + output)
    }
)
Auriko calculates cost as the average of input and output price per 1M tokens. Providers exceeding this ceiling are excluded from routing. For fine-grained quality and cost constraints, see Advanced routing.
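To make the ceiling arithmetic concrete, here is a minimal sketch of the check as described above. The function names and the prices are illustrative assumptions, not Auriko's internal implementation or real provider pricing.

```python
def blended_price_per_1m(input_price: float, output_price: float) -> float:
    """Average of input and output price per 1M tokens, as described above."""
    return (input_price + output_price) / 2


def within_ceiling(input_price: float, output_price: float, max_cost_per_1m: float) -> bool:
    """True if a provider's blended price does not exceed the ceiling."""
    return blended_price_per_1m(input_price, output_price) <= max_cost_per_1m


# Hypothetical prices in USD per 1M tokens: (input, output).
example_prices = {
    "provider_a": (2.00, 6.00),   # blended 4.00
    "provider_b": (2.50, 10.00),  # blended 6.25
}

eligible = [
    name
    for name, (inp, out) in example_prices.items()
    if within_ceiling(inp, out, max_cost_per_1m=5.00)
]
print(eligible)  # ['provider_a']
```

With a ceiling of 5.00, a provider charging 2.50 for input and 10.00 for output is excluded even though its input price alone is under the ceiling, because the blended price is 6.25.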

Provider Preferences

Restrict routing to specific providers, or exclude providers:
# Only consider these providers
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    routing={
        "providers": ["openai", "anthropic"]
    }
)

# Exclude providers
response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
    routing={
        "exclude_providers": ["deepseek"]
    }
)

Restrict key source

Force requests to use only BYOK (bring-your-own-key) or only platform-managed keys:
import os
from auriko import Client

client = Client(
    api_key=os.environ["AURIKO_API_KEY"],
    base_url="https://api.auriko.ai/v1"
)

# Use only your own provider keys
response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
    routing={
        "only_byok": True
    }
)

# Use only Auriko platform keys
response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
    routing={
        "only_platform": True
    }
)
Both options are booleans and default to false. They are mutually exclusive: setting both to true returns a 400 error. When no key of the requested type is available, the request fails with no fallback. See Bring Your Own Key for BYOK setup.
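If you build routing options dynamically, you may want to catch the conflicting combination before sending the request. The helper below is a hypothetical client-side check, not part of the Auriko SDK; it only mirrors the mutual-exclusion rule stated above.

```python
def check_key_source(routing: dict) -> dict:
    """Raise ValueError if both key-source flags are set; otherwise return routing unchanged."""
    # Both flags default to False when absent, matching the documented defaults.
    if routing.get("only_byok", False) and routing.get("only_platform", False):
        raise ValueError("only_byok and only_platform are mutually exclusive")
    return routing


# Valid: only one flag set.
routing = check_key_source({"optimize": "cost", "only_byok": True})
```

The returned dict can be passed as the routing argument shown in the examples above.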

Routing Metadata

Every response carries routing information:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)

metadata = response.routing_metadata
print(f"Provider: {metadata.provider}")
print(f"Model: {metadata.provider_model_id}")
print(f"Latency: {metadata.total_latency_ms}ms")
print(f"Input tokens: {metadata.cost.input_tokens}")
print(f"Output tokens: {metadata.cost.output_tokens}")
print(f"Cost: ${metadata.cost.billable_cost_usd:.6f}")
For the complete field reference including fallback chain, warnings, and all optional fields, see Response Extensions.

Full Example

import os
from auriko import Client

client = Client(
    api_key=os.environ["AURIKO_API_KEY"],
    base_url="https://api.auriko.ai/v1"
)

# For a chatbot: optimize for speed with a TTFT ceiling
response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What's the capital of France?"}
    ],
    routing={
        "optimize": "speed",
        "max_ttft_ms": 150,
    }
)

print(response.choices[0].message.content)
print("\n--- Routing Info ---")
print(f"Provider: {response.routing_metadata.provider}")
print(f"Latency: {response.routing_metadata.total_latency_ms}ms")
print(f"Cost: ${response.routing_metadata.cost.billable_cost_usd:.6f}")

OpenAI SDK Compatibility

Using the OpenAI SDK, pass routing options via extra_body:
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["AURIKO_API_KEY"],
    base_url="https://api.auriko.ai/v1"
)

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={
        "routing": {
            "optimize": "cost",
            "max_ttft_ms": 200
        }
    }
)

Choose a strategy

Match your use case to the right routing strategy:
| Use case | Strategy | Key constraints | Example |
|---|---|---|---|
| Chatbot / real-time UI | speed or ttft | max_ttft_ms: 200 | Interactive conversation |
| Batch processing | cost or cheapest | - | Document summarization |
| High-volume pipeline | throughput | min_throughput_tps: 50 | Log analysis |
| Cost-conscious real-time | cost | max_ttft_ms: 500 | Customer support |
| Compliance-sensitive | balanced | data_policy: "zdr" | Financial data |
| Multi-model exploration | balanced | models: [...] | A/B testing |
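One way to keep these choices consistent across a codebase is a small preset table. The sketch below covers only the rows whose options (optimize, max_ttft_ms) are demonstrated on this page; the preset names and the routing_for helper are illustrative assumptions, not part of the SDK.

```python
# Presets taken from the table above; the keys are hypothetical names.
ROUTING_PRESETS = {
    "chatbot": {"optimize": "speed", "max_ttft_ms": 200},
    "batch": {"optimize": "cost"},
    "support": {"optimize": "cost", "max_ttft_ms": 500},
}


def routing_for(use_case: str) -> dict:
    """Return a copy of the preset so callers can adjust it without mutating the table."""
    try:
        return dict(ROUTING_PRESETS[use_case])
    except KeyError:
        raise ValueError(f"unknown use case: {use_case!r}") from None


chatbot_routing = routing_for("chatbot")
```

The result can be passed as routing=routing_for("chatbot") in the chat completion calls shown earlier.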
Start max_ttft_ms at 200-500 ms and adjust from there. Setting it too low causes 503 errors when no provider can meet the constraint.
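While tuning, it can help to loosen the TTFT budget step by step instead of failing on the first 503. The following is a minimal sketch under stated assumptions: NoProviderAvailable is a hypothetical stand-in for however your client surfaces a 503, and send is a callable you supply that performs the request with the given routing options.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class NoProviderAvailable(Exception):
    """Hypothetical stand-in for a 503 response; not an Auriko SDK class."""


def call_with_relaxed_ttft(
    send: Callable[[dict], T],
    base_routing: dict,
    ttft_budgets_ms: Sequence[int] = (200, 350, 500),
) -> T:
    """Try each TTFT budget in order, loosening it only after a 503-style failure."""
    last_error = None
    for budget in ttft_budgets_ms:
        routing = {**base_routing, "max_ttft_ms": budget}
        try:
            return send(routing)
        except NoProviderAvailable as exc:
            last_error = exc  # no provider met this budget; try the next one
    if last_error is None:
        raise ValueError("ttft_budgets_ms must not be empty")
    raise last_error
```

In practice, send would wrap client.chat.completions.create and translate the 503 error raised by your SDK into NoProviderAvailable. The exact exception type depends on the client library and is not specified here.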
For fine-grained control, see Advanced routing.