Cost Optimization

Auriko can reduce your LLM costs by 30-70% by routing each request to the most cost-effective provider that can serve the requested model.

Prerequisites

An Auriko API key
Python 3.10+ with the auriko SDK installed (pip install auriko), or Node.js 18+ with @auriko/sdk installed (npm install @auriko/sdk)
Existing request traffic, if you want to see cost comparisons

How It Works

When you set "optimize": "cost" in the routing options, Auriko:

Identifies all providers that can serve your model
Compares real-time pricing across providers
Routes to the cheapest available option
Falls back to alternatives if the cheapest is unavailable
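The four steps above can be pictured as a small selection loop. The sketch below is a hypothetical illustration, not Auriko's actual router: the `Provider` dataclass, provider names, models, and prices are invented, and `is_available` stands in for whatever health checks the service really performs.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    # Hypothetical provider record; all fields are illustrative only.
    name: str
    models: set[str]
    price_per_1k_tokens: float
    is_available: bool


def route_by_cost(model: str, providers: list[Provider]) -> Provider:
    """Pick the cheapest available provider that serves `model`."""
    # 1. Keep only providers that can serve the requested model.
    candidates = [p for p in providers if model in p.models]
    # 2. Order them by price, cheapest first.
    candidates.sort(key=lambda p: p.price_per_1k_tokens)
    # 3-4. Take the cheapest provider that is up, falling back down the list.
    for provider in candidates:
        if provider.is_available:
            return provider
    raise RuntimeError(f"No available provider serves {model!r}")


providers = [
    Provider("provider-a", {"model-x"}, 0.005, is_available=False),
    Provider("provider-b", {"model-x"}, 0.006, is_available=True),
    Provider("provider-c", {"model-y"}, 0.001, is_available=True),
]
print(route_by_cost("model-x", providers).name)
```

Here `provider-a` is cheapest for `model-x` but unavailable, so the loop falls back to `provider-b`.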

Enable Cost Optimization

import os
from auriko import Client

client = Client(
    api_key=os.environ["AURIKO_API_KEY"],
    base_url="https://api.auriko.ai/v1"
)

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
    routing={
        "optimize": "cost"
    }
)

# See the actual cost
print(f"Cost: ${response.routing_metadata.cost.billable_cost_usd:.6f}")
print(f"Provider: {response.routing_metadata.provider}")

Cost with Latency Constraints

Optimize for cost while maintaining latency requirements:

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
    routing={
        "optimize": "cost",
        "max_ttft_ms": 500  # Max 500ms to first token
    }
)

Auriko will find the cheapest provider that can meet the latency constraint.

Restrict Key Source

If you have negotiated rates with providers and use your own API keys for them, you can restrict requests to your BYOK (Bring Your Own Key) credentials:

import os
from auriko import Client

client = Client(
    api_key=os.environ["AURIKO_API_KEY"],
    base_url="https://api.auriko.ai/v1"
)

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
    routing={
        "optimize": "cost",
        "only_byok": True  # Use only your own provider keys
    }
)

See Routing options for the full constraint API and Bring Your Own Key for BYOK setup.

View Your Costs

Every response includes detailed cost information:

cost = response.routing_metadata.cost
print(f"Input tokens: {cost.input_tokens}")
print(f"Output tokens: {cost.output_tokens}")
print(f"Total cost: ${cost.billable_cost_usd:.6f}")
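Because each response carries `routing_metadata.provider` and `routing_metadata.cost.billable_cost_usd`, you can total your spend per provider on the client side. This sketch reads only those two fields; the `fake_response` helper and provider names are hypothetical stand-ins so the example runs without an API key.

```python
from collections import defaultdict
from types import SimpleNamespace


def spend_by_provider(responses) -> dict[str, float]:
    """Sum billable cost (USD) per provider across responses."""
    totals: dict[str, float] = defaultdict(float)
    for response in responses:
        meta = response.routing_metadata
        totals[meta.provider] += meta.cost.billable_cost_usd
    return dict(totals)


def fake_response(provider: str, cost_usd: float) -> SimpleNamespace:
    # Hypothetical stand-in shaped like the fields used above;
    # not a real SDK response object.
    return SimpleNamespace(
        routing_metadata=SimpleNamespace(
            provider=provider,
            cost=SimpleNamespace(billable_cost_usd=cost_usd),
        )
    )


responses = [
    fake_response("provider-a", 0.0020),
    fake_response("provider-b", 0.0005),
    fake_response("provider-a", 0.0010),
]
for provider, total in spend_by_provider(responses).items():
    print(f"{provider}: ${total:.6f}")
```

In real code, pass the response objects returned by client.chat.completions.create instead of the fake ones.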

Cost Comparison Example

Without Auriko (single provider):

100,000 requests × $0.01/request = $1,000/day

With Auriko cost optimization:

100,000 requests × $0.004/request = $400/day
Savings: $600/day (60%)
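The same arithmetic in Python, using `decimal.Decimal` so dollar amounts do not pick up binary floating-point rounding. The request volume and per-request prices are the illustrative figures from this example, not measured results.

```python
from decimal import Decimal


def daily_savings(requests_per_day: int, cost_before: Decimal, cost_after: Decimal):
    """Return (baseline, optimized, saved, percent_saved) per day in USD."""
    baseline = requests_per_day * cost_before
    optimized = requests_per_day * cost_after
    saved = baseline - optimized
    percent_saved = saved / baseline * 100
    return baseline, optimized, saved, percent_saved


baseline, optimized, saved, pct = daily_savings(
    100_000, Decimal("0.01"), Decimal("0.004")
)
print(f"Without Auriko: ${baseline:,.2f}/day")
print(f"With Auriko:    ${optimized:,.2f}/day")
print(f"Savings:        ${saved:,.2f}/day ({pct:.0f}%)")
```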

Cost Breakdown

Track costs by model and provider in your dashboard. Prices are per 1K tokens; a dash means the provider does not serve that model:

Model	OpenAI	Anthropic	Fireworks AI	Auriko (optimized)
GPT-4o	$0.005/1K	-	-	$0.005/1K
Claude Sonnet	-	$0.003/1K	$0.003/1K	$0.003/1K

Auriko automatically selects the cheapest option for each model.
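Because the table quotes prices per 1K tokens, a request's cost scales with its token count. A minimal sketch of that conversion, assuming a single blended rate for all tokens (providers often price input and output tokens differently, so treat this as an approximation):

```python
from decimal import Decimal


def request_cost_usd(tokens: int, price_per_1k: Decimal) -> Decimal:
    """Cost of one request at a flat (assumed blended) price per 1,000 tokens."""
    return Decimal(tokens) / Decimal(1000) * price_per_1k


# 2,000 tokens at the table's Claude Sonnet rate of $0.003/1K
print(request_cost_usd(2_000, Decimal("0.003")))  # 0.006
# 1,500 tokens at the table's GPT-4o rate of $0.005/1K
print(request_cost_usd(1_500, Decimal("0.005")))  # 0.0075
```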

Best Practices

Batch Similar Requests

Group similar requests to maximize cache hits and reduce costs.

Use Appropriate Models

Use smaller models for simple tasks and reserve large models for complex ones.

Monitor Usage

Track costs in your dashboard to identify optimization opportunities.

Set Budgets

Configure spending limits in your dashboard settings.

Use Cases

Background Processing

For batch jobs where latency doesn’t matter:

# Process documents overnight at lowest cost.
# `documents` (objects with .id and .text) and `save_summary` are
# placeholders for your own data and storage code.
for doc in documents:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Summarize: {doc.text}"}],
        routing={"optimize": "cost"}
    )
    save_summary(doc.id, response.choices[0].message.content)

With Latency Budget

For user-facing features where responses must start quickly but cost still matters:

# Keep time to first token under 300 ms, then minimize cost.
# `conversation` is your own list of chat messages.
response = client.chat.completions.create(
    model="gpt-5.4",
    messages=conversation,
    routing={
        "optimize": "cost",
        "max_ttft_ms": 300  # Max 300 ms to first token
    }
)

A/B Test Providers

Compare the cost of pinning a single provider against cost-optimized routing:

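import random

# 10% of traffic pinned to a single provider that serves the model,
# 90% cost-optimized. `messages` and `log_cost` are your own code.
if random.random() < 0.1:
    arm = "pinned"
    routing = {"providers": ["openai"]}
else:
    arm = "optimized"
    routing = {"optimize": "cost"}

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=messages,
    routing=routing
)

# Log the arm as well as the provider: a cost-optimized request
# can land on the same provider as the pinned arm.
log_cost(
    arm=arm,
    provider=response.routing_metadata.provider,
    cost=response.routing_metadata.cost.billable_cost_usd
)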
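Once enough requests are logged, comparing the arms comes down to averaging the cost per request in each. The sketch assumes a hypothetical in-memory log of `(arm, billable_cost_usd)` pairs; adapt it to wherever your own `log_cost` writes.

```python
from collections import defaultdict
from statistics import mean


def mean_cost_by_arm(records: list[tuple[str, float]]) -> dict[str, float]:
    """Average billable cost per request for each experiment arm."""
    costs: dict[str, list[float]] = defaultdict(list)
    for arm, cost_usd in records:
        costs[arm].append(cost_usd)
    return {arm: mean(values) for arm, values in costs.items()}


# Hypothetical log entries: (arm, billable_cost_usd)
records = [
    ("pinned", 0.0100),
    ("pinned", 0.0120),
    ("optimized", 0.0040),
    ("optimized", 0.0050),
    ("optimized", 0.0030),
]
averages = mean_cost_by_arm(records)
for arm, avg in averages.items():
    print(f"{arm}: ${avg:.6f} per request")
```

With only a few requests per arm the difference may be noise, so collect a reasonable sample before drawing conclusions.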

Dashboard

Track your cost savings in the Auriko dashboard:

Total spend by day/week/month
Cost per model
Cost per provider
Savings vs. single-provider baseline

View Dashboard

Monitor your usage and costs in real time.
