Base URL
Endpoints
Inference API
| Endpoint | Method | Description |
|---|---|---|
/v1/chat/completions | POST | Create a chat completion |
/v1/me | GET | Get API key identity |
Discovery API
| Endpoint | Method | Description |
|---|---|---|
/v1/registry/providers | GET | List available providers |
/v1/registry/models | GET | List canonical models |
/v1/directory/models | GET | Full model directory with pricing |
Management API
| Resource | Endpoints | Description |
|---|---|---|
| Workspaces | 4 | Create, list, get, update workspaces |
| Routing | 2 | Get and update workspace routing defaults |
| Budgets | 5 | CRUD for spending limits |
| API Keys | 4 | Create, list, revoke keys; usage stats |
| Billing | 1 | Credit balance |
| Provider Keys | 7 | BYOK key management |
Management API endpoints use session authentication. Workspace and budget reads also accept API keys. See Authentication.
OpenAI Compatibility
Auriko supports the same request/response format as OpenAI. Existing code using OpenAI’s API can switch to Auriko by changing:- Base URL:
https://api.openai.com/v1→https://api.auriko.ai/v1 - API Key: Use your Auriko API key (starts with
ak_)
Auriko Extensions
In addition to OpenAI-compatible fields, Auriko responses carry:routing_metadata- Information about which provider handled the requestrouting(request) - Options to optimize for cost, speed, or throughputauriko_metadata(request) - Custom tags and trace IDs for request tracking
Response Extensions
Every chat completion response carries arouting_metadata object:
Field reference
| Field | Type | Always present | Description |
|---|---|---|---|
provider | string | yes | Provider that served the request (e.g., openai, anthropic) |
provider_model_id | string | yes | Provider’s internal model ID |
tier | string | no | Pricing tier when applicable (e.g., flex, standard). Omitted for providers without tiers. |
model_canonical | string | yes | Canonical model ID from the request |
routing_strategy | string | yes | Strategy used: cost, speed, balanced, ttft, throughput, cheapest, or custom. custom is returned when explicit routing.weights are provided. |
candidates_total | number | yes | Total provider candidates before filtering |
candidates_viable | number | yes | Candidates remaining after constraint filtering |
routing_decision_ms | number | yes | Time spent on the routing decision (ms) |
ttft_ms | number | no | Time to first token (ms). Streaming responses only. |
total_latency_ms | number | yes | Total request latency (ms) |
cost | object | no | Cost breakdown. Present when token usage is available and pricing is token-based. |
fallback_chain | array | no | Fallback attempt history. Present only when the primary provider failed and a fallback was used. |
warnings | string[] | no | Warnings about ignored or unsupported routing configuration. Omitted when empty. |
cost fields: input_tokens (number), output_tokens (number), provider_cost_usd (number), billable_cost_usd (number).
fallback_chain entries: provider (string), status ("success" or "failed"), reason (string, present on failed entries only).
Request Extensions
Pass routing options to optimize your requests:Request metadata
Attach custom metadata to requests for tracking and observability. Auriko stripsauriko_metadata before forwarding to the provider. The auriko_ prefix avoids collision with OpenAI/Anthropic native metadata fields.
| Field | Type | Limits |
|---|---|---|
tags | string[] | Max 100 tags, each max 50 chars |
user_id | string | Max 255 chars |
trace_id | string | Max 255 chars |
custom_fields | Record<string, string> | Max 10 fields, keys max 50 chars, values max 200 chars |
Response headers
Every response carries custom headers organized by category.Request tracing
| Header | Description |
|---|---|
X-Request-ID | Unique request identifier for support and debugging |
Routing headers
| Header | Description |
|---|---|
X-Provider-Used | Provider that served the request |
X-Model-Requested | Model ID from the original request |
X-Model-Canonical | Canonical model ID after alias resolution |
X-Model-Used | Actual model ID sent to the provider |
X-Routing-Strategy | Strategy used (cost, speed, balanced, etc.) |
X-Routing-Time-Ms | Time spent on routing decision |
X-Api-Key-Source | Key type used (platform or byok) |
X-Multi-Model-Count | Number of models in a multi-model request |
Fallback headers
| Header | Description |
|---|---|
X-Fallback-Enabled | Whether fallback is enabled for this request |
X-Fallback-Used | Whether a fallback was triggered |
X-Fallback-Depth | Number of fallback attempts made |
X-Fallback-Original-Provider | Provider of the first (failed) attempt |
X-Fallback-Attempted-Providers | Comma-separated list of all attempted providers |
X-Fallback-Reason | Reason the primary provider failed |
X-Fallback-Total-Time-Ms | Total time across all attempts |
X-Fallback-Max-Attempts | Maximum fallback attempts configured |
Error diagnostic headers
| Header | Description |
|---|---|
X-Error-Provider | Provider that returned the error |
X-Error-Type | Error classification |
X-Error-Retryable | Whether the error is safe to retry |
Billing headers
| Header | Description |
|---|---|
X-Credits-Balance-Microdollars | Current credit balance in microdollars |
X-Credits-Tier | Current billing tier |
Budget headers
Returned when budgets are configured. Error headers appear on 402 responses when a budget is exceeded. Spend and limit headers appear on successful responses.| Header | Description |
|---|---|
X-Budget-Exceeded | true when the request was rejected for exceeding a budget (402 only) |
X-Budget-Exceeded-Period | Budget period that was exceeded: daily, weekly, or monthly (402 only) |
X-Budget-Exceeded-Scope | Scope of the exceeded budget: workspace, api_key, or byok_provider (402 only) |
X-Budget-Daily-Spend | Current daily spend in USD |
X-Budget-Daily-Limit | Configured daily budget limit in USD |
X-Budget-Weekly-Spend | Current weekly spend in USD |
X-Budget-Weekly-Limit | Configured weekly budget limit in USD |
X-Budget-Monthly-Spend | Current monthly spend in USD |
X-Budget-Monthly-Limit | Configured monthly budget limit in USD |
Cache headers
| Header | Description |
|---|---|
X-Cache-Savings-Percent | Percentage of tokens saved via prompt caching |
Authentication
Learn how to authenticate your requests