Fine-tune routing with suffix shortcuts, multi-model requests, quality constraints, and data policies. For basic routing, see Routing options.

Prerequisites

  • An Auriko API key
  • A session token (for routing defaults)
  • Python 3.10+ with auriko SDK installed (pip install auriko)
    • OR Node.js 18+ with @auriko/sdk installed (npm install @auriko/sdk)
  • Familiarity with Routing options

How routing works

When you send a request, Auriko’s router:
  1. Enumerates candidates — finds all providers offering the requested model(s)
  2. Filters by constraints — removes providers that violate your routing options (data policy, Bring Your Own Key (BYOK) requirement, min success rate, excluded providers)
  3. Scores by strategy — ranks remaining candidates using your optimize strategy:
    • cost / cheapest: lowest price per token
    • ttft / speed: lowest latency to first token
    • throughput: highest tokens per second
    • balanced: weighted combination of cost, latency, and throughput
  4. Selects and routes — selects from the ranked list, favoring higher-scored providers
  5. Falls back if needed — if the provider fails and allow_fallbacks is true, retries with the next candidate (up to max_fallback_attempts)
See Python SDK or TypeScript SDK for routing code examples.
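The filter and score steps above can be sketched locally in a few lines. This is only an illustration of the idea, not Auriko's implementation: the `Candidate` record, its field names, and the single-metric sort keys are all assumptions, and the real router selects from the ranked list rather than always taking the top entry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical provider record; field names are illustrative only."""
    provider: str
    cost_per_1m: float      # price per 1M tokens
    ttft_ms: float          # time to first token, in milliseconds
    throughput_tps: float   # tokens per second
    success_rate: float     # between 0 and 1


def filter_candidates(candidates, min_success_rate=0.0, exclude_providers=()):
    """Step 2: drop candidates that violate the given constraints."""
    excluded = {name.lower() for name in exclude_providers}
    return [
        c for c in candidates
        if c.success_rate >= min_success_rate and c.provider.lower() not in excluded
    ]


def rank_candidates(candidates, optimize="cost"):
    """Step 3: order candidates best-first for a single-metric strategy."""
    sort_keys = {
        "cost": lambda c: c.cost_per_1m,            # lower is better
        "ttft": lambda c: c.ttft_ms,                # lower is better
        "throughput": lambda c: -c.throughput_tps,  # higher is better
    }
    return sorted(candidates, key=sort_keys[optimize])


candidates = [
    Candidate("provider_a", 5.0, 300.0, 80.0, 0.99),
    Candidate("provider_b", 3.0, 450.0, 60.0, 0.90),
    Candidate("provider_c", 4.0, 200.0, 120.0, 0.97),
]

eligible = filter_candidates(candidates, min_success_rate=0.95)
ranked = rank_candidates(eligible, optimize="cost")
print([c.provider for c in ranked])  # ['provider_c', 'provider_a']
```

Here `provider_b` is the cheapest option but is removed by the success-rate constraint before scoring, which is why filtering runs before ranking.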

Use suffix shortcuts

Append a suffix to any model name for quick routing configuration:
| Suffix | Strategy | Effect |
|---|---|---|
| :floor | cheapest | Absolute lowest cost |
| :cost | cost | Cost-optimized with more spread |
| :nitro | speed | Fastest overall provider |
| :fast | ttft | Fastest time to first token |
| :balanced | balanced | Weighted combination |
import os
from auriko import Client

client = Client(
    api_key=os.environ["AURIKO_API_KEY"],
    base_url="https://api.auriko.ai/v1"
)

# Cheapest provider for gpt-4o
response = client.chat.completions.create(
    model="gpt-4o:floor",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Fastest time to first token
response = client.chat.completions.create(
    model="claude-sonnet-4-20250514:fast",
    messages=[{"role": "user", "content": "Hello!"}]
)
The router parses suffixes only when the model ID contains exactly one colon. Fine-tuned models with multiple colons (for example, ft:gpt-4o:org:custom) pass through unchanged.
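To make the one-colon rule concrete, here is a minimal sketch of how a client could mirror it. The function name `split_model_suffix` is hypothetical, and the handling of an unrecognized suffix (leaving the ID untouched) is an assumption; only the exactly-one-colon rule and the suffix-to-strategy pairs come from this page.

```python
# Suffix-to-strategy pairs as listed in the suffix table.
SUFFIX_STRATEGIES = {
    "floor": "cheapest",
    "cost": "cost",
    "nitro": "speed",
    "fast": "ttft",
    "balanced": "balanced",
}


def split_model_suffix(model_id):
    """Return (base_model, strategy); strategy is None when no suffix applies."""
    # Only IDs with exactly one colon are candidates for suffix parsing.
    if model_id.count(":") != 1:
        return model_id, None
    base, suffix = model_id.split(":")
    strategy = SUFFIX_STRATEGIES.get(suffix)
    if strategy is None:
        # Assumption: an unrecognized suffix leaves the ID untouched.
        return model_id, None
    return base, strategy


print(split_model_suffix("gpt-4o:floor"))          # ('gpt-4o', 'cheapest')
print(split_model_suffix("ft:gpt-4o:org:custom"))  # ('ft:gpt-4o:org:custom', None)
```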

Route across models

Pass models instead of model to route across up to 10 models. The two parameters are mutually exclusive, so send one or the other:
import os
from auriko import Client

client = Client(
    api_key=os.environ["AURIKO_API_KEY"],
    base_url="https://api.auriko.ai/v1"
)

# Pool mode (default): best provider across all models
response = client.chat.completions.create(
    models=["gpt-4o", "claude-sonnet-4-20250514", "gemini-2.0-flash"],
    messages=[{"role": "user", "content": "Hello!"}],
    routing={"mode": "pool"}
)

# Fallback mode: try models in order
response = client.chat.completions.create(
    models=["gpt-4o", "claude-sonnet-4-20250514"],
    messages=[{"role": "user", "content": "Hello!"}],
    routing={"mode": "fallback"}
)
| Mode | Behavior |
|---|---|
| pool (default) | Select the best-scoring provider across all requested models |
| fallback | Try all providers for the first model, then the second model, and so on |
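The difference between the two modes is easiest to see as an ordering problem. The sketch below is illustrative only: `order_attempts`, the `(model, provider, score)` tuples, and the scores are made up, and ordering providers by score within a single model in fallback mode is an assumption.

```python
def order_attempts(scored, models, mode="pool"):
    """Order (model, provider, score) tuples; a higher score is better."""
    if mode == "pool":
        # One global ranking across every requested model.
        return sorted(scored, key=lambda c: -c[2])
    if mode == "fallback":
        # Exhaust the first model's providers before moving to the next model.
        position = {model: index for index, model in enumerate(models)}
        return sorted(scored, key=lambda c: (position[c[0]], -c[2]))
    raise ValueError(f"unknown mode: {mode}")


models = ["gpt-4o", "claude-sonnet-4-20250514"]
scored = [
    ("gpt-4o", "provider_a", 0.70),
    ("gpt-4o", "provider_b", 0.55),
    ("claude-sonnet-4-20250514", "provider_c", 0.90),
]

print([p for _, p, _ in order_attempts(scored, models, "pool")])
# ['provider_c', 'provider_a', 'provider_b']
print([p for _, p, _ in order_attempts(scored, models, "fallback")])
# ['provider_a', 'provider_b', 'provider_c']
```

In pool mode the highest-scoring provider comes first even though it serves the second model. In fallback mode the list order of models takes priority over score.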

Set quality constraints

Filter providers by performance requirements:
| Constraint | Type | Description |
|---|---|---|
| min_throughput_tps | number | Minimum tokens per second |
| min_success_rate | number (0–1) | Minimum success rate |
| max_cost_per_1m | number | Maximum cost per 1M tokens (average of input + output) |
| max_ttft_ms | number | Maximum time to first token in milliseconds |
| weights | object | Custom scoring weights: { cost, ttft, throughput, reliability }. Overrides preset. |
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    routing={
        "optimize": "balanced",
        "min_throughput_tps": 50,
        "min_success_rate": 0.95,
        "weights": {"cost": 0.6, "ttft": 0.4}
    }
)
Custom weights let you control the exact tradeoff between cost, latency, throughput, and reliability. When provided, they override the preset coefficients. The server normalizes weights to sum to 1.0.
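Because weights are normalized, only their ratios matter. The following sketch shows one plausible form of that normalization; the function name is hypothetical, and treating omitted keys as 0 and rejecting negative values are assumptions about server behavior that this page does not confirm.

```python
WEIGHT_KEYS = ("cost", "ttft", "throughput", "reliability")


def normalize_weights(weights):
    """Scale weights so they sum to 1.0; omitted keys are assumed to be 0."""
    unknown = set(weights) - set(WEIGHT_KEYS)
    if unknown:
        raise ValueError(f"unknown weight keys: {sorted(unknown)}")
    if any(value < 0 for value in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {key: weights.get(key, 0.0) / total for key in WEIGHT_KEYS}


print(normalize_weights({"cost": 3, "ttft": 1}))
# {'cost': 0.75, 'ttft': 0.25, 'throughput': 0.0, 'reliability': 0.0}
```

Under these assumptions, `{"cost": 3, "ttft": 1}` and `{"cost": 0.75, "ttft": 0.25}` describe the same tradeoff.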

Data policy

Control how providers handle your data:
| Policy | Description |
|---|---|
| none (default) | No restrictions |
| no_training | Provider must not use data for training |
| zdr | Zero data retention (strictest policy) |
The policies are ordered by strictness: zdr > no_training > none. When a request sets a policy and the account also has one, the more restrictive of the two applies.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Sensitive financial data..."}],
    routing={"data_policy": "zdr"}
)
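The "most restrictive wins" rule amounts to taking the maximum over the policy hierarchy. A minimal sketch, with a hypothetical helper name and the ranks taken from the hierarchy described above:

```python
# Higher rank means stricter, following zdr > no_training > none.
POLICY_RANK = {"none": 0, "no_training": 1, "zdr": 2}


def effective_data_policy(account_policy="none", request_policy="none"):
    """Return whichever of the two policies is more restrictive."""
    # An unrecognized policy name raises KeyError rather than being guessed at.
    return max(account_policy, request_policy, key=POLICY_RANK.__getitem__)


print(effective_data_policy("no_training", "none"))  # no_training
print(effective_data_policy("no_training", "zdr"))   # zdr
```

In this model a per-request policy can tighten the account-level policy but cannot loosen it.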

Provider alias normalization

Provider names in providers and exclude_providers are case-insensitive and support aliases:
| Alias | Canonical name |
|---|---|
| google, google_ai, googleai, gemini | google_ai_studio |
| fireworks | fireworks_ai |
| together | together_ai |
Unrecognized names pass through as-is (lowercased).
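If you want to predict which canonical name a value resolves to, the normalization can be mirrored client-side. This is a sketch built from the alias table in this section; the helper name is hypothetical, and the server's alias list may include entries not shown here.

```python
# Aliases listed in this section, mapped to their canonical names.
PROVIDER_ALIASES = {
    "google": "google_ai_studio",
    "google_ai": "google_ai_studio",
    "googleai": "google_ai_studio",
    "gemini": "google_ai_studio",
    "fireworks": "fireworks_ai",
    "together": "together_ai",
}


def normalize_provider(name):
    """Lowercase the name, then map known aliases to the canonical name."""
    lowered = name.lower()
    # Unrecognized names pass through lowercased.
    return PROVIDER_ALIASES.get(lowered, lowered)


print(normalize_provider("Gemini"))    # google_ai_studio
print(normalize_provider("Together"))  # together_ai
print(normalize_provider("OpenAI"))    # openai
```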

Configure fallbacks

By default, Auriko retries with alternative providers on 429 (rate limit), 5xx (server error), and timeout responses.
| Setting | Default | Description |
|---|---|---|
| allow_fallbacks | true | Enable automatic fallback to alternative providers |
| max_fallback_attempts | 3 | Maximum fallback attempts (not counting the primary attempt) |
Timeouts: 10 seconds for the first byte on streaming requests, 60 seconds total for non-streaming requests.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    routing={
        "allow_fallbacks": True,
        "max_fallback_attempts": 5
    }
)
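The fallback behavior runs server-side, but a small simulation can help you reason about how many providers a request may touch. Everything in this sketch is hypothetical (`ProviderError`, `call_with_fallbacks`, the `send` callback); it models only the documented rules: retry on 429, 5xx, and timeouts, with the primary attempt not counted against `max_fallback_attempts`.

```python
class ProviderError(Exception):
    """Hypothetical error carrying an HTTP status; None stands for a timeout."""

    def __init__(self, status=None):
        super().__init__(f"provider failed (status={status})")
        self.status = status


def is_retryable(status):
    """Rate limits, server errors, and timeouts trigger a fallback."""
    return status is None or status == 429 or 500 <= status <= 599


def call_with_fallbacks(providers, send, allow_fallbacks=True, max_fallback_attempts=3):
    """Try providers in ranked order and return (provider, result)."""
    # The primary attempt does not count against max_fallback_attempts.
    max_attempts = 1 + (max_fallback_attempts if allow_fallbacks else 0)
    last_error = None
    for provider in providers[:max_attempts]:
        try:
            return provider, send(provider)
        except ProviderError as err:
            if not is_retryable(err.status):
                raise  # e.g. a 400 is not retried
            last_error = err
    if last_error is None:
        raise ValueError("no providers to try")
    raise last_error


def fake_send(provider):
    """Stand-in for a real provider call."""
    if provider == "provider_a":
        raise ProviderError(429)   # rate limited
    if provider == "provider_b":
        raise ProviderError(None)  # timed out
    return f"ok from {provider}"


ranked = ["provider_a", "provider_b", "provider_c"]
print(call_with_fallbacks(ranked, fake_send))
# ('provider_c', 'ok from provider_c')
```

With the default of 3 fallback attempts, a single request can reach up to four providers in this model: the primary plus three fallbacks.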

Set workspace defaults

Set default routing options for all requests in a workspace:
# Get current defaults
curl https://api.auriko.ai/v1/workspaces/{workspace_id}/routing-defaults \
  -H "Authorization: Bearer $SESSION_JWT"

# Set defaults
curl -X PATCH https://api.auriko.ai/v1/workspaces/{workspace_id}/routing-defaults \
  -H "Authorization: Bearer $SESSION_JWT" \
  -H "Content-Type: application/json" \
  -d '{
    "optimize": "cost",
    "data_policy": "no_training"
  }'
Routing defaults use session authentication, not API key authentication. See Authentication.
To clear all routing defaults, send an empty object {} as the PATCH body. Per-request routing options override workspace defaults. Model suffix overrides sit between workspace defaults and per-request options.

Precedence

  1. Per-request routing options (highest)
  2. Model suffix overrides (for example, :floor)
  3. Workspace routing defaults (lowest)
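One way to picture this precedence is as a layered merge in which later layers overwrite earlier ones. The sketch below is an assumption about how the layers combine (a shallow, per-key merge) and uses a hypothetical helper name; it also assumes a suffix such as :fast contributes an optimize value.

```python
def resolve_routing(workspace_defaults=None, suffix_overrides=None, request_options=None):
    """Merge routing layers from lowest to highest precedence."""
    resolved = {}
    # Later layers overwrite earlier ones key by key.
    for layer in (workspace_defaults, suffix_overrides, request_options):
        resolved.update(layer or {})
    return resolved


effective = resolve_routing(
    workspace_defaults={"optimize": "cost", "data_policy": "no_training"},
    suffix_overrides={"optimize": "ttft"},          # e.g. from a :fast suffix
    request_options={"max_fallback_attempts": 5},
)
print(effective)
# {'optimize': 'ttft', 'data_policy': 'no_training', 'max_fallback_attempts': 5}
```

Data policy is the exception to a plain overwrite: as described in the data policy section, the most restrictive policy wins, so a request cannot relax a stricter account-level setting.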