Auriko can save you 30-70% on LLM costs by routing each request to the most cost-effective provider that can serve your model.

Prerequisites

  • An Auriko API key
  • Python 3.10+ with the auriko SDK installed (pip install auriko), or Node.js 18+ with @auriko/sdk installed (npm install @auriko/sdk)
  • Some request history, so there is usage data for cost comparisons

How It Works

When you set optimize: "cost", Auriko:
  1. Identifies all providers that can serve your model
  2. Compares real-time pricing across providers
  3. Routes to the cheapest available option
  4. Falls back to alternatives if the cheapest is unavailable
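To make the four steps concrete, here is a minimal, client-side sketch of "filter, sort by price, try cheapest, fall back". It is not Auriko's implementation: `ProviderOffer`, `route_by_cost`, `send_stub`, the model ID `claude-sonnet`, and all prices are hypothetical, and real-time pricing is reduced to a static number per provider.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class ProviderOffer:
    """Hypothetical record: one provider's current price for the models it serves."""
    provider: str
    models: frozenset
    price_per_1k_tokens: float


class AllProvidersFailed(Exception):
    """Raised when no candidate provider could serve the request."""


def route_by_cost(
    model: str,
    offers: Sequence[ProviderOffer],
    send: Callable[[str], str],
) -> tuple[str, str]:
    """Return (provider, result) from the cheapest provider that succeeds."""
    # Step 1: keep only providers that can serve the requested model
    candidates = [o for o in offers if model in o.models]
    # Steps 2-3: order candidates from cheapest to most expensive
    candidates.sort(key=lambda o: o.price_per_1k_tokens)
    errors: dict[str, str] = {}
    # Step 4: try the cheapest first and fall back to the next one on failure
    for offer in candidates:
        try:
            return offer.provider, send(offer.provider)
        except ConnectionError as exc:  # treat connection errors as "unavailable"
            errors[offer.provider] = str(exc)
    raise AllProvidersFailed(f"no provider served {model!r}: {errors}")


# Hypothetical prices, for illustration only
offers = [
    ProviderOffer("openai", frozenset({"gpt-4o"}), 0.005),
    ProviderOffer("anthropic", frozenset({"claude-sonnet"}), 0.003),
    ProviderOffer("fireworks", frozenset({"claude-sonnet"}), 0.0025),
]


def send_stub(provider: str) -> str:
    # Simulate the cheapest provider being unavailable
    if provider == "fireworks":
        raise ConnectionError("fireworks unavailable")
    return f"response from {provider}"


provider, result = route_by_cost("claude-sonnet", offers, send_stub)
```

Here the cheapest provider fails, so the request falls back to the next-cheapest one that serves the model.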

Enable Cost Optimization

import os
from auriko import Client

client = Client(
    api_key=os.environ["AURIKO_API_KEY"],
    base_url="https://api.auriko.ai/v1"
)

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
    routing={
        "optimize": "cost"
    }
)

# See the actual cost
print(f"Cost: ${response.routing_metadata.cost.billable_cost_usd:.6f}")
print(f"Provider: {response.routing_metadata.provider}")

Cost with Latency Constraints

Optimize for cost while maintaining latency requirements:
response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
    routing={
        "optimize": "cost",
        "max_ttft_ms": 500  # Max 500ms to first token
    }
)
Auriko will find the cheapest provider that can meet the latency constraint.
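As a rough illustration of "cheapest subject to a latency cap", the sketch below drops any provider whose observed time to first token exceeds the budget and then picks the lowest price among the rest. `Candidate`, `cheapest_within_ttft`, and the sample numbers are hypothetical, and this page does not say what Auriko does when no provider meets the constraint; the sketch simply returns `None` in that case.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """Hypothetical per-provider stats: price and recently observed TTFT."""
    provider: str
    price_per_1k_tokens: float
    ttft_ms: float


def cheapest_within_ttft(
    candidates: Sequence[Candidate], max_ttft_ms: float
) -> Optional[Candidate]:
    # Keep only providers whose time to first token fits the budget
    eligible = [c for c in candidates if c.ttft_ms <= max_ttft_ms]
    # Pick the lowest price among them, or None if nothing qualifies
    return min(eligible, key=lambda c: c.price_per_1k_tokens, default=None)


# Illustrative numbers only
candidates = [
    Candidate("provider-a", 0.002, 900.0),
    Candidate("provider-b", 0.003, 350.0),
    Candidate("provider-c", 0.005, 180.0),
]
best = cheapest_within_ttft(candidates, max_ttft_ms=500)
```

With a 500 ms budget, the cheapest provider is excluded for being too slow, so the next-cheapest eligible provider wins.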

Restrict Key Source

If you have negotiated provider rates on your own API keys, set only_byok to route requests exclusively through your bring-your-own-key (BYOK) credentials:
import os
from auriko import Client

client = Client(
    api_key=os.environ["AURIKO_API_KEY"],
    base_url="https://api.auriko.ai/v1"
)

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
    routing={
        "optimize": "cost",
        "only_byok": True  # Use only your own provider keys
    }
)
See Routing options for the full constraint API and Bring Your Own Key for BYOK setup.

View Your Costs

Every response includes token counts and cost details in routing_metadata.cost:
cost = response.routing_metadata.cost
print(f"Input tokens: {cost.input_tokens}")
print(f"Output tokens: {cost.output_tokens}")
print(f"Total cost: ${cost.billable_cost_usd:.6f}")
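If you want per-provider totals in your own code as well as the dashboard, one option is to sum `billable_cost_usd` by provider. This is a minimal sketch: `spend_by_provider` is a hypothetical helper, and it converts each cost through `str` into `Decimal` so that repeated additions do not accumulate binary floating-point error (this works whether the SDK returns a float, string, or Decimal).

```python
from collections import defaultdict
from decimal import Decimal
from typing import Iterable


def spend_by_provider(records: Iterable[tuple[str, float]]) -> dict[str, Decimal]:
    """Sum billable cost (USD) per provider from (provider, cost) pairs."""
    totals: dict[str, Decimal] = defaultdict(Decimal)  # Decimal() == Decimal("0")
    for provider, cost_usd in records:
        # Go through str() so 0.0012 becomes Decimal("0.0012"), not its binary expansion
        totals[provider] += Decimal(str(cost_usd))
    return dict(totals)
```

To feed it from responses, collect pairs such as `(r.routing_metadata.provider, r.routing_metadata.cost.billable_cost_usd)` for each response `r` you keep.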

Cost Comparison Example

Without Auriko (single provider):
100,000 requests/day × $0.01/request = $1,000/day
With Auriko cost optimization:
100,000 requests/day × $0.004/request = $400/day
Savings: $600/day (60%)
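The arithmetic above can be reproduced for your own traffic with a small helper. `daily_savings` is a hypothetical function, and the inputs are the illustrative per-request costs from this example; it uses `Decimal` so that currency amounts stay exact.

```python
from decimal import Decimal


def daily_savings(
    requests_per_day: int, baseline_cost: Decimal, optimized_cost: Decimal
) -> tuple[Decimal, Decimal, Decimal, Decimal]:
    """Return (baseline/day, optimized/day, savings/day, savings %)."""
    baseline = requests_per_day * baseline_cost
    optimized = requests_per_day * optimized_cost
    saved = baseline - optimized
    pct = saved / baseline * 100
    return baseline, optimized, saved, pct


baseline, optimized, saved, pct = daily_savings(100_000, Decimal("0.01"), Decimal("0.004"))
```

Substitute your own request volume and average per-request costs (for example, from routing_metadata.cost) to estimate savings.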

Cost Breakdown

Track costs by model and provider in your dashboard:
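| Model | OpenAI | Anthropic | Fireworks AI | Auriko (optimized) |
|---|---|---|---|---|
| GPT-4o | $0.005/1K | - | - | $0.005/1K |
| Claude Sonnet | - | $0.003/1K | $0.003/1K | $0.003/1K |

Prices are per 1K tokens; "-" means the provider does not offer that model. Auriko automatically selects the cheapest option for each model.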

Best Practices

Batch Similar Requests

Group similar requests to maximize cache hits and reduce costs

Use Appropriate Models

Use smaller models for simple tasks, reserve large models for complex ones

Monitor Usage

Track costs in your dashboard to identify optimization opportunities

Set Budgets

Configure spending limits in your dashboard settings

Use Cases

Background Processing

For batch jobs where latency doesn’t matter:
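# Process documents overnight at lowest cost
# `documents` and `save_summary` stand in for your own data source and storage;
# each doc is assumed to have .id and .text attributes
for doc in documents:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Summarize: {doc.text}"}],
        routing={"optimize": "cost"}
    )
    save_summary(doc.id, response.choices[0].message.content)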

With Latency Budget

For user-facing features with cost consciousness:
# Respond quickly but minimize cost
response = client.chat.completions.create(
    model="gpt-5.4",
    messages=conversation,
    routing={
        "optimize": "cost",
        "max_ttft_ms": 300  # Cap time to first token at 300ms to keep the UI responsive
    }
)

A/B Test Providers

Compare a single-provider baseline against cost-optimized routing:
import random

# 10% pinned to a single provider (baseline), 90% cost-optimized
if random.random() < 0.1:
    arm = "baseline"
    routing = {"providers": ["openai"]}  # pick a provider that serves this model
else:
    arm = "cost_optimized"
    routing = {"optimize": "cost"}

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=messages,
    routing=routing
)

# Log for analysis (log_cost is your own logging helper)
log_cost(
    arm=arm,
    provider=response.routing_metadata.provider,
    cost=response.routing_metadata.cost.billable_cost_usd
)
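Once enough requests are logged, compare the average cost per request in each arm. This sketch assumes your logging helper records which arm each request took along with its billable_cost_usd; `mean_cost_by_arm` and the arm names are hypothetical.

```python
from collections import defaultdict
from statistics import fmean
from typing import Iterable


def mean_cost_by_arm(rows: Iterable[tuple[str, float]]) -> dict[str, float]:
    """Average billable cost (USD) per request for each experiment arm."""
    by_arm: dict[str, list[float]] = defaultdict(list)
    for arm, cost_usd in rows:
        by_arm[arm].append(cost_usd)
    # fmean computes a float mean of each arm's costs
    return {arm: fmean(costs) for arm, costs in by_arm.items()}


means = mean_cost_by_arm([
    ("baseline", 0.01),
    ("baseline", 0.02),
    ("cost_optimized", 0.004),
])
```

Because only about 10% of traffic goes to the baseline arm, let it accumulate enough requests before you draw conclusions from the difference in means.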

Dashboard

Track your cost savings in the Auriko dashboard:
  • Total spend by day/week/month
  • Cost per model
  • Cost per provider
  • Savings vs. single-provider baseline
