Auriko’s proprietary cost model computes the expected cost (the predicted cost accounting for caching, pricing tiers, and your usage patterns) of each request at every available provider and routes to the cheapest one.Documentation Index
Fetch the complete documentation index at: https://docs.auriko.ai/llms.txt
Use this file to discover all available pages before exploring further.
Prerequisites
- An Auriko API key
- Python 3.10+ with the OpenAI SDK (
pip install openai) or the auriko SDK (pip install auriko)- OR Node.js 18+ with the OpenAI SDK (
npm install openai) or@auriko/sdk(npm install @auriko/sdk)
- OR Node.js 18+ with the OpenAI SDK (
Enable cost optimization
To route by cost, setoptimize to "cost":
Understand the cost model
Pricing page rates show what a cached token costs if it gets cached. They don’t tell you which tokens get cached or under what conditions. Two providers quoting identical rates can produce different bills on the same workload. Auriko maintains a proprietary data pipeline and cost model that tracks provider-side caching mechanics, estimates your usage patterns, and predicts the expected cost of each request at every available provider.Provider tracking
Auriko’s data pipeline tracks each provider’s caching mechanics: discount depths, minimum token thresholds, block granularity, write costs, expiration windows, and pricing tiers that shift with context length. This data updates as providers change infrastructure.Usage estimation
Auriko estimates request-level variables from your usage patterns: prefix length, reuse frequency, request timing, conversation depth, and output volume. This predicts how each provider’s caching performs for your specific traffic. Auriko is a zero data retention proxy. Pattern estimation uses usage metadata only. Read the Privacy Policy for details.Per-request cost prediction
For each request, the cost model combines provider data and usage estimates to compute the expected cost at every available provider. It routes to the cheapest one. This is a per-request decision, not a static ranking. A provider with higher list prices can be cheaper over a multi-turn conversation if its caching mechanics produce more cache hits for your workload. Cached tokens cost less than uncached tokens. Cache reads cost less than regular input, but writing to cache can cost more. The cost model accounts for these differences.Set latency constraints
To optimize for cost while enforcing a latency ceiling, addmax_ttft_ms:
Maximize savings with cost-focus
cost-focus aggressively minimizes cost with minimal weight on other factors:
| Strategy | Behavior |
|---|---|
cost | Favors cheaper providers while considering performance and latency |
cost-focus | Routes to the cheapest provider with minimal weight on other factors |
cost-focus weights cost more aggressively.
For the general base vs. focus explanation, see Base vs. focus.
Set cost ceilings
To exclude providers above a price threshold, setmax_cost_per_1m:
Restrict key source
If you have negotiated provider rates through your own API keys, force requests to use only BYOK keys for cost control:Track cost and savings
Every response includes the billable cost incost.usd. The usage breakdown shows prompt_tokens, cached_tokens, and completion_tokens.
cached_tokens from usage.prompt_tokens_details.
When cost-optimized routing triggers a failover, Auriko falls back in cost order to the next cheapest eligible provider.
The cost and savings data in each response reflect the output of Auriko’s cost model, not list-price arithmetic.
Optimize your workload
Structure your workload to maximize cost savings.- Long, stable system prompts: Maximize cache reuse across requests.
- Consistent conversation IDs: These help providers maintain cache affinity.
- Steady request cadence: Bursty traffic can defeat cache expiration windows.
- Prompt length: Prompts below provider minimum token thresholds get zero cache discount.
- Strategy choice:
cost-focusaggressively minimizes cost.costadds weight to latency and performance. - Monitor: Track cost and savings in the dashboard.
Apply to use cases
Background processing
Batch processing withcost-focus routing:
With latency budget
Cost routing with a latency constraint:Monitor costs
Track your cost savings in the Auriko dashboard:- Total spend by day/week/month
- Cost per model
- Cost per provider
- Savings vs. single-provider baseline
View Dashboard
Monitor your usage and costs in real-time