Advanced Routing

Fine-tune routing with suffix shortcuts, multi-model requests, quality constraints, and data policies. For basic routing, see Routing options.

Prerequisites

An Auriko API key
A session token (for routing defaults)
Python 3.10+ with auriko SDK installed (pip install auriko)
- OR Node.js 18+ with @auriko/sdk installed (npm install @auriko/sdk)
Familiarity with Routing options

How routing works

When you send a request, Auriko’s router:

Enumerates candidates — finds all providers offering the requested model(s)
Filters by constraints — removes providers that violate your routing options (data policy, Bring Your Own Key (BYOK) requirement, min success rate, excluded providers)
Scores by strategy — ranks remaining candidates using your optimize strategy:
- cost / cheapest: lowest price per token
- ttft / speed: lowest latency to first token
- throughput: highest tokens per second
- balanced: weighted combination of cost, latency, and throughput
Selects and routes — selects from the ranked list, favoring higher-scored providers
Falls back if needed — if the provider fails and allow_fallbacks is true, retries with the next candidate (up to max_fallback_attempts)

See Python SDK or TypeScript SDK for routing code examples.

Use suffix shortcuts

Append a suffix to any model name for quick routing configuration:

Suffix	Strategy	Effect
`:floor`	`cheapest`	Absolute lowest cost
`:cost`	`cost`	Cost-optimized with more spread
`:nitro`	`speed`	Fastest overall provider
`:fast`	`ttft`	Fastest time to first token
`:balanced`	`balanced`	Weighted combination

import os
from auriko import Client

client = Client(
    api_key=os.environ["AURIKO_API_KEY"],
    base_url="https://api.auriko.ai/v1"
)

# Cheapest provider for gpt-4o
response = client.chat.completions.create(
    model="gpt-4o:floor",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Fastest time to first token
response = client.chat.completions.create(
    model="claude-sonnet-4-20250514:fast",
    messages=[{"role": "user", "content": "Hello!"}]
)

The router parses suffixes only when the model ID contains exactly one colon. Fine-tuned models with multiple colons (for example, ft:gpt-4o:org:custom) pass through unchanged.

Route across models

Pass models instead of model to route across multiple models (mutually exclusive with model, max 10):

import os
from auriko import Client

client = Client(
    api_key=os.environ["AURIKO_API_KEY"],
    base_url="https://api.auriko.ai/v1"
)

# Pool mode (default): best provider across all models
response = client.chat.completions.create(
    models=["gpt-4o", "claude-sonnet-4-20250514", "gemini-2.0-flash"],
    messages=[{"role": "user", "content": "Hello!"}],
    routing={"mode": "pool"}
)

# Fallback mode: try models in order
response = client.chat.completions.create(
    models=["gpt-4o", "claude-sonnet-4-20250514"],
    messages=[{"role": "user", "content": "Hello!"}],
    routing={"mode": "fallback"}
)

Mode	Behavior
`pool` (default)	Select the best-scoring provider across all requested models
`fallback`	Try all providers for the first model, then the second model, and so on

Set quality constraints

Filter providers by performance requirements:

Constraint	Type	Description
`min_throughput_tps`	number	Minimum tokens per second
`min_success_rate`	number (0–1)	Minimum success rate
`max_cost_per_1m`	number	Maximum cost per 1M tokens (average of input + output)
`max_ttft_ms`	number	Maximum time to first token in milliseconds
`weights`	object	Custom scoring weights: `{ cost, ttft, throughput, reliability }`. Overrides preset.

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    routing={
        "optimize": "balanced",
        "min_throughput_tps": 50,
        "min_success_rate": 0.95,
        "weights": {"cost": 0.6, "ttft": 0.4}
    }
)

Custom weights let you control the exact tradeoff between cost, latency, throughput, and reliability. When provided, they override the preset coefficients. The server normalizes weights to sum to 1.0.

Data policy

Control how providers handle your data:

Policy	Description
`none` (default)	No restrictions
`no_training`	Provider must not use data for training
`zdr`	Zero data retention — strictest policy

The hierarchy is zdr > no_training > none. When a per-request policy intersects with an account-level policy, the most restrictive one wins.

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Sensitive financial data..."}],
    routing={"data_policy": "zdr"}
)

Provider alias normalization

Provider names in providers and exclude_providers are case-insensitive and support aliases:

Alias	Canonical name
`google`, `google_ai`, `googleai`, `gemini`	`google_ai_studio`
`fireworks`	`fireworks_ai`
`together`	`together_ai`

Unrecognized names pass through as-is (lowercased).

Configure fallbacks

By default, Auriko retries with alternative providers on 429 (rate limit), 5xx (server error), and timeout responses.

Setting	Default	Description
`allow_fallbacks`	`true`	Enable automatic fallback to alternative providers
`max_fallback_attempts`	3	Maximum fallback attempts (not counting the primary attempt)

Timeouts: 10 seconds for the first byte on streaming requests, 60 seconds total for non-streaming requests.

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    routing={
        "allow_fallbacks": True,
        "max_fallback_attempts": 5
    }
)

Set workspace defaults

Set default routing options for all requests in a workspace:

# Get current defaults
curl https://api.auriko.ai/v1/workspaces/{workspace_id}/routing-defaults \
  -H "Authorization: Bearer $SESSION_JWT"

# Set defaults
curl -X PATCH https://api.auriko.ai/v1/workspaces/{workspace_id}/routing-defaults \
  -H "Authorization: Bearer $SESSION_JWT" \
  -H "Content-Type: application/json" \
  -d '{
    "optimize": "cost",
    "data_policy": "no_training"
  }'

Routing defaults use session authentication, not API key authentication. See Authentication.

To clear all routing defaults, send an empty object {} as the PATCH body. Per-request routing options override workspace defaults. Model suffix overrides sit between workspace defaults and per-request options.

Precedence

Per-request routing options (highest)
Model suffix overrides (for example, :floor)
Workspace routing defaults (lowest)

Get Started

Guides

Frameworks

Advanced Routing

Prerequisites

How routing works

Use suffix shortcuts

Route across models

Set quality constraints

Data policy

Provider alias normalization

Configure fallbacks

Set workspace defaults

Precedence

Get Started

Guides

Frameworks

​Prerequisites

​How routing works

​Use suffix shortcuts

​Route across models

​Set quality constraints

​Data policy

​Provider alias normalization

​Configure fallbacks

​Set workspace defaults

​Precedence

Prerequisites

How routing works

Use suffix shortcuts

Route across models

Set quality constraints

Data policy

Provider alias normalization

Configure fallbacks

Set workspace defaults

Precedence