# AUTO-GENERATED — DO NOT EDIT MANUALLY # Source: scripts/docs/generate_llms_txt.py # Regenerated on every deploy # Auriko > Intelligent LLM routing API with OpenAI-compatible interface. Route requests > across multiple AI providers to optimize for cost, latency, and reliability. ## Sections in this document 1. Guides — streaming, tool calling, routing, cost optimization, error handling, prompt caching, budget management, advanced routing 2. API Reference — endpoints, parameters, schemas, error codes 3. Python SDK Reference 4. TypeScript SDK Reference 5. Framework Integrations — LangChain, CrewAI, OpenAI Agents SDK, Google ADK, LlamaIndex 6. Platform — rate limits, team management, BYOK === # Auriko Guides ## Page: Streaming Stream responses in real-time for a better user experience. Auriko supports Server-Sent Events (SSE) for streaming. --- ## Page: Streaming > Section: Prerequisites - An [Auriko API key](https://auriko.ai/signup) - Python 3.10+ with `auriko` SDK installed (`pip install auriko`) - OR Node.js 18+ with `@auriko/sdk` installed (`npm install @auriko/sdk`) --- ## Page: Streaming > Section: Stream responses Stream a chat completion response: ```python Python import os from auriko import Client client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) stream = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Write a short story"}], stream=True ) for chunk in stream: if chunk.choices and chunk.choices[0].delta.content: print(chunk.choices[0].delta.content, end="", flush=True) ``` ```typescript TypeScript import { Client } from "@auriko/sdk"; const client = new Client({ apiKey: process.env.AURIKO_API_KEY, baseUrl: "https://api.auriko.ai/v1", }); const stream = await client.chat.completions.create({ model: "gpt-4o", messages: [{ role: "user", content: "Write a short story" }], stream: true, }); for await (const chunk of stream) { if (chunk.choices[0]?.delta?.content) { 
process.stdout.write(chunk.choices[0].delta.content); } } ``` ```bash cURL curl https://api.auriko.ai/v1/chat/completions \ -H "Authorization: Bearer $AURIKO_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "gpt-4o", "messages": [{"role": "user", "content": "Write a short story"}], "stream": true }' ``` --- ## Page: Streaming > Section: Stream asynchronously (Python) Stream with the async client: ```python import os from auriko import AsyncClient import asyncio async def stream_response(): client = AsyncClient( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) stream = await client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Write a short story"}], stream=True ) async for chunk in stream: if chunk.choices and chunk.choices[0].delta.content: print(chunk.choices[0].delta.content, end="", flush=True) asyncio.run(stream_response()) ``` --- ## Page: Streaming > Section: Stream events Each chunk contains: ```python # ChatCompletionChunk chunk.id # "chatcmpl-abc123" chunk.model # "gpt-4o" chunk.created # 1234567890 chunk.choices[0].delta.content # Token content (may be None) chunk.choices[0].delta.role # "assistant" (first chunk only) chunk.choices[0].finish_reason # None until last chunk ("stop") ``` --- ## Page: Streaming > Section: Handle final chunks The last chunk carries `finish_reason` and usage. Auriko forces `include_usage: true` on all streaming requests. You don't need to set `stream_options` manually. 
```python stream = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}], stream=True ) full_content = "" usage = None for chunk in stream: if chunk.choices: if chunk.choices[0].delta.content: full_content += chunk.choices[0].delta.content if chunk.choices[0].finish_reason: print(f"\n\nFinished: {chunk.choices[0].finish_reason}") if chunk.usage: usage = chunk.usage if usage: print(f"Tokens used: {usage.total_tokens}") ``` Auriko forces `stream_options.include_usage` to `true` for accurate billing. Setting it explicitly is harmless but unnecessary. --- ## Page: Streaming > Section: Stream properties The stream object exposes usage, routing metadata, and response headers after iteration completes. | Property | Python | TypeScript | Available | |----------|--------|------------|-----------| | Token usage | `stream.usage` | `stream.usage` | After iteration | | Routing info | `stream.routing_metadata` | `stream.routing_metadata` | After iteration | | Response headers | `stream.response_headers` | `stream.responseHeaders` | Immediately | | Close connection | `stream.close()` | `stream.close()` | Any time | Use the stream as a context manager to ensure the connection is released: ```python Python with client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}], stream=True ) as stream: for chunk in stream: if chunk.choices and chunk.choices[0].delta.content: print(chunk.choices[0].delta.content, end="", flush=True) # Available after iteration if stream.usage: print(f"Tokens: {stream.usage.total_tokens}") if stream.routing_metadata: print(f"Provider: {stream.routing_metadata.provider}") ``` ```typescript TypeScript const stream = await client.chat.completions.create({ model: "gpt-4o", messages: [{ role: "user", content: "Hello!" 
}], stream: true, }); for await (const chunk of stream) { if (chunk.choices[0]?.delta?.content) { process.stdout.write(chunk.choices[0].delta.content); } } // Available after iteration console.log(`Tokens: ${stream.usage?.total_tokens}`); console.log(`Provider: ${stream.routing_metadata?.provider}`); ``` Use an async context manager for automatic cleanup: ```python async with await client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}], stream=True ) as stream: async for chunk in stream: if chunk.choices and chunk.choices[0].delta.content: print(chunk.choices[0].delta.content, end="", flush=True) ``` `routing_metadata` and `usage` are only present in the **final chunk** (with `choices: []`). Consume the stream to completion to access them. In TypeScript, you can only iterate a stream once. A second attempt throws an error. --- ## Page: Streaming > Section: Stream with tools Accumulate tool call fragments from a streamed response: ```python stream = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "What's the weather in Paris?"}], tools=[{ "type": "function", "function": { "name": "get_weather", "description": "Get weather", "parameters": {"type": "object", "properties": {"city": {"type": "string"}}} } }], stream=True ) tool_calls = [] for chunk in stream: if not chunk.choices: continue delta = chunk.choices[0].delta # Handle tool call streaming if delta.tool_calls: for tc in delta.tool_calls: if tc.index >= len(tool_calls): tool_calls.append({"id": tc.id, "function": {"name": "", "arguments": ""}}) if tc.function and tc.function.name: tool_calls[tc.index]["function"]["name"] += tc.function.name if tc.function and tc.function.arguments: tool_calls[tc.index]["function"]["arguments"] += tc.function.arguments print(tool_calls) ``` See [Tool Calling Guide](/guides/tool-calling) for function definitions and multi-turn tool conversations. 
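The accumulation pattern above is easy to verify offline. A minimal dry run over hand-built delta fragments (illustrative only, not captured from a real stream — actual chunking varies by provider) shows how name and argument pieces recombine into a complete tool call:

```python
import json

# Illustrative delta fragments as a provider might stream them.
fragments = [
    {"index": 0, "id": "call_abc", "name": "get_weather", "arguments": ""},
    {"index": 0, "id": None, "name": None, "arguments": '{"city": '},
    {"index": 0, "id": None, "name": None, "arguments": '"Paris"}'},
]

tool_calls = []
for tc in fragments:
    if tc["index"] >= len(tool_calls):
        tool_calls.append({"id": tc["id"], "function": {"name": "", "arguments": ""}})
    if tc["name"]:
        tool_calls[tc["index"]]["function"]["name"] += tc["name"]
    if tc["arguments"]:
        tool_calls[tc["index"]]["function"]["arguments"] += tc["arguments"]

# Arguments only parse as JSON once all fragments are concatenated.
args = json.loads(tool_calls[0]["function"]["arguments"])
print(tool_calls[0]["function"]["name"], args)  # get_weather {'city': 'Paris'}
```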
--- ## Page: Streaming > Section: Stream with routing options Pass routing options to a streaming request: ```python Python stream = client.chat.completions.create( model="gpt-5.4", messages=[{"role": "user", "content": "Hello!"}], stream=True, routing={ "optimize": "speed", "max_ttft_ms": 100, } ) for chunk in stream: if chunk.choices and chunk.choices[0].delta.content: print(chunk.choices[0].delta.content, end="", flush=True) ``` ```typescript TypeScript const stream = await client.chat.completions.create({ model: "gpt-5.4", messages: [{ role: "user", content: "Hello!" }], stream: true, routing: { optimize: "speed", max_ttft_ms: 100, }, }); for await (const chunk of stream) { if (chunk.choices[0]?.delta?.content) { process.stdout.write(chunk.choices[0].delta.content); } } ``` --- ## Page: Streaming > Section: Handle stream errors Catch errors during streaming: ```python import os from auriko import Client, ProviderError, RateLimitError client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) try: stream = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}], stream=True ) for chunk in stream: if chunk.choices and chunk.choices[0].delta.content: print(chunk.choices[0].delta.content, end="", flush=True) except ProviderError as e: print(f"Provider error: {e}") except RateLimitError as e: print(f"Rate limited: {e}") ``` See [Error Handling Guide](/guides/error-handling) for retry strategies and circuit breakers. --- ## Page: Streaming > Section: SSE format Raw SSE events look like this. Auriko appends a final event with `routing_metadata` and `usage` before `[DONE]`. 
``` data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]} data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]} data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4o","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]} data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4o","choices":[],"usage":{"prompt_tokens":8,"completion_tokens":2,"total_tokens":10},"routing_metadata":{"provider":"openai","routing_strategy":"balanced","total_latency_ms":847,"cost":{"billable_cost_usd":0.00015}}} data: [DONE] ``` The final event before `[DONE]` carries `routing_metadata` and `usage` with `choices: []`. SDKs expose these as `stream.routing_metadata` and `stream.usage` after iteration. --- ## Page: Tool Calling Let LLMs call your functions to interact with external systems, databases, and APIs. 
--- ## Page: Tool Calling > Section: Prerequisites - An [Auriko API key](https://auriko.ai/signup) - Python 3.10+ with `auriko` SDK installed (`pip install auriko`) - OR Node.js 18+ with `@auriko/sdk` installed (`npm install @auriko/sdk`) - A model that supports tool calling (e.g., GPT-4o, Claude 3.5 Sonnet) --- ## Page: Tool Calling > Section: Define tools Define tools as JSON schemas describing the function signature: ```python tools = [ { "type": "function", "function": { "name": "get_weather", "description": "Get the current weather for a city", "parameters": { "type": "object", "properties": { "city": { "type": "string", "description": "The city name" }, "unit": { "type": "string", "enum": ["celsius", "fahrenheit"], "description": "Temperature unit" } }, "required": ["city"] } } } ] ``` --- ## Page: Tool Calling > Section: Call tools Send a request with tools and check the response: ```python Python import os from auriko import Client import json client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) tools = [ { "type": "function", "function": { "name": "get_weather", "description": "Get weather for a city", "parameters": { "type": "object", "properties": { "city": {"type": "string"} }, "required": ["city"] } } } ] response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "What's the weather in Paris?"}], tools=tools ) if response.choices[0].message.tool_calls: tool_call = response.choices[0].message.tool_calls[0] print(f"Function: {tool_call.function.name}") print(f"Arguments: {json.loads(tool_call.function.arguments)}") ``` ```typescript TypeScript import { Client } from "@auriko/sdk"; const client = new Client({ apiKey: process.env.AURIKO_API_KEY, baseUrl: "https://api.auriko.ai/v1", }); const tools = [ { type: "function" as const, function: { name: "get_weather", description: "Get weather for a city", parameters: { type: "object", properties: { city: { type: "string" }, }, 
required: ["city"], }, }, }, ]; const response = await client.chat.completions.create({ model: "gpt-4o", messages: [{ role: "user", content: "What's the weather in Paris?" }], tools, }); if (response.choices[0].message.tool_calls) { const toolCall = response.choices[0].message.tool_calls[0]; console.log(`Function: ${toolCall.function.name}`); console.log(`Arguments: ${JSON.parse(toolCall.function.arguments)}`); } ``` --- ## Page: Tool Calling > Section: Execute tool calls After receiving tool calls, execute them and send the results back: ```python import json def get_weather(city: str) -> str: # Your actual implementation here return f"Weather in {city}: 72°F, sunny" # Step 1: Initial request response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "What's the weather in Paris?"}], tools=tools ) # Step 2: Check for tool calls message = response.choices[0].message if message.tool_calls: # Build message history messages = [ {"role": "user", "content": "What's the weather in Paris?"}, message.model_dump(), # Assistant message with tool_calls ] # Execute each tool call for tool_call in message.tool_calls: args = json.loads(tool_call.function.arguments) if tool_call.function.name == "get_weather": result = get_weather(args["city"]) # Add tool result messages.append({ "role": "tool", "tool_call_id": tool_call.id, "content": result }) # Step 3: Get final response final_response = client.chat.completions.create( model="gpt-4o", messages=messages, tools=tools ) print(final_response.choices[0].message.content) ``` --- ## Page: Tool Calling > Section: Use multiple tools Define multiple tools in the same request: ```python tools = [ { "type": "function", "function": { "name": "get_weather", "description": "Get weather for a city", "parameters": { "type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"] } } }, { "type": "function", "function": { "name": "search_web", "description": "Search the web for 
information", "parameters": { "type": "object", "properties": {"query": {"type": "string"}}, "required": ["query"] } } }, { "type": "function", "function": { "name": "send_email", "description": "Send an email", "parameters": { "type": "object", "properties": { "to": {"type": "string"}, "subject": {"type": "string"}, "body": {"type": "string"} }, "required": ["to", "subject", "body"] } } } ] ``` --- ## Page: Tool Calling > Section: Use parallel tool calls Models can request multiple tool calls in parallel: ```python response = client.chat.completions.create( model="gpt-4o", messages=[{ "role": "user", "content": "What's the weather in Paris and Tokyo?" }], tools=tools ) # May return two tool calls if response.choices[0].message.tool_calls: for tool_call in response.choices[0].message.tool_calls: print(f"{tool_call.function.name}: {tool_call.function.arguments}") ``` --- ## Page: Tool Calling > Section: Control tool choice Control which tools the model can use: ```python # Let model decide response = client.chat.completions.create( model="gpt-4o", messages=messages, tools=tools, tool_choice="auto" # default ) # Force tool use response = client.chat.completions.create( model="gpt-4o", messages=messages, tools=tools, tool_choice="required" ) # Force specific tool response = client.chat.completions.create( model="gpt-4o", messages=messages, tools=tools, tool_choice={"type": "function", "function": {"name": "get_weather"}} ) # Disable tools response = client.chat.completions.create( model="gpt-4o", messages=messages, tools=tools, tool_choice="none" ) ``` --- ## Page: Tool Calling > Section: Stream tool calls Accumulate streamed tool call fragments: ```python stream = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "What's the weather in Paris?"}], tools=tools, stream=True ) tool_calls = {} for chunk in stream: if not chunk.choices: continue delta = chunk.choices[0].delta if delta.tool_calls: for tc in delta.tool_calls: idx = 
tc.index if idx not in tool_calls: tool_calls[idx] = {"id": tc.id, "function": {"name": "", "arguments": ""}} if tc.function and tc.function.name: tool_calls[idx]["function"]["name"] += tc.function.name if tc.function and tc.function.arguments: tool_calls[idx]["function"]["arguments"] += tc.function.arguments print(list(tool_calls.values())) ``` See [Streaming Guide](/guides/streaming#stream-with-tools) for full streaming patterns including error handling and metadata access. --- ## Page: Tool Calling > Section: Convert legacy functions Auriko auto-converts the deprecated `functions`/`function_call` parameters to the modern `tools`/`tool_choice` format: | Legacy parameter | Converted to | Condition | |-----------------|-------------|-----------| | `functions` | `tools` | Only if `tools` is absent | | `function_call: "auto"` | `tool_choice: "auto"` | Only if `tool_choice` is absent | | `function_call: "none"` | `tool_choice: "none"` | Only if `tool_choice` is absent | | `function_call: {name: "fn"}` | `tool_choice: {type: "function", function: {name: "fn"}}` | Only if `tool_choice` is absent | Conversion only runs when the legacy field is present and the modern field is absent. If both are present, the modern field takes precedence. Use `tools`/`tool_choice` for new code. Auriko supports the legacy format for backward compatibility. **Provider compatibility:** Most major providers support function calling (`tools` / `tool_choice`), but subfeatures such as `parallel_tool_calls` vary by provider. Auriko filters out providers that don't support tool calling at all, but doesn't guarantee every provider-specific tool subfeature. Check `/v1/directory/models` for current capability details. --- ## Page: Tool Calling > Section: Best practices Write clear, specific function descriptions so the model knows when to use them. Always validate tool call arguments before executing. Return helpful error messages in tool results when execution fails. 
Only include relevant tools to reduce confusion and latency. --- ## Page: Structured Output You can force models to return valid JSON, optionally conforming to a specific schema. --- ## Page: Structured Output > Section: Prerequisites - An [Auriko API key](https://auriko.ai/signup) - Python 3.10+ with `auriko` SDK installed (`pip install auriko`) - OR Node.js 18+ with `@auriko/sdk` installed (`npm install @auriko/sdk`) --- ## Page: Structured Output > Section: Choose a response format Auriko supports three response format types: | Type | Description | Use case | |------|-------------|----------| | `text` | Default. Model returns plain text. | General chat, creative writing | | `json_object` | Model returns valid JSON. No schema enforcement. | Flexible JSON extraction | | `json_schema` | Model returns JSON matching a provided schema. | Typed data extraction, API responses | `json_schema` and `json_object` are separate capabilities. `json_schema` has broader model support. **Claude** supports `json_schema` but not `json_object`. If you request an unsupported mode, Auriko returns a `503` with a suggested alternative. Check per-model support on the [models page](https://optimal-inference.vercel.app/models) or via the [Directory API](/api-reference/list-directory-models). `json_schema` appears as **Structured Output**, `json_object` as **JSON Mode**. 
--- ## Page: Structured Output > Section: Return JSON To return any JSON output, set `response_format` to `json_object`: ```python Python import os from auriko import Client client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) response = client.chat.completions.create( model="gpt-4o", messages=[ {"role": "system", "content": "Extract the user's name and age as JSON."}, {"role": "user", "content": "I'm Alice and I'm 30 years old."} ], response_format={"type": "json_object"} ) print(response.choices[0].message.content) # {"name": "Alice", "age": 30} ``` ```typescript TypeScript import { Client } from "@auriko/sdk"; const client = new Client({ apiKey: process.env.AURIKO_API_KEY, baseUrl: "https://api.auriko.ai/v1", }); const response = await client.chat.completions.create({ model: "gpt-4o", messages: [ { role: "system", content: "Extract the user's name and age as JSON." }, { role: "user", content: "I'm Alice and I'm 30 years old." }, ], response_format: { type: "json_object" }, }); console.log(response.choices[0].message.content); // {"name": "Alice", "age": 30} ``` The model returns valid JSON, but the structure isn't guaranteed. For strict schema conformance, use `json_schema` instead. When using `json_object` mode, always include the word "JSON" in your system or user message. **OpenAI** and **DeepSeek** require this and return a 400 error without it. Including it is harmless on other providers. The `json_schema` mode does not have this requirement. The examples above include "JSON" in the system message. 
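Because a missing "JSON" mention fails only at request time on some providers, a loose client-side guard can catch it earlier. This check is a heuristic (case-insensitive); the providers' exact matching rules may be stricter:

```python
# Loose guard for the json_object requirement noted above: OpenAI and
# DeepSeek reject json_object requests whose messages never mention JSON.
def mentions_json(messages: list[dict]) -> bool:
    return any("json" in str(m.get("content", "")).lower() for m in messages)

messages = [
    {"role": "system", "content": "Extract the user's name and age as JSON."},
    {"role": "user", "content": "I'm Alice and I'm 30 years old."},
]
print(mentions_json(messages))  # True
```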
--- ## Page: Structured Output > Section: Enforce schema You can enforce a specific JSON structure by providing a schema: ```python Python import os import json from auriko import Client client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) response = client.chat.completions.create( model="gpt-4o", messages=[ {"role": "user", "content": "Extract: Alice is 30, lives in NYC, alice@example.com"} ], response_format={ "type": "json_schema", "json_schema": { "name": "ContactInfo", "schema": { "type": "object", "properties": { "name": {"type": "string"}, "age": {"type": "integer"}, "city": {"type": "string"}, "email": {"type": "string"} }, "required": ["name", "age", "city", "email"] } } } ) contact = json.loads(response.choices[0].message.content) print(contact["name"]) # Alice print(contact["email"]) # alice@example.com ``` ```typescript TypeScript import { Client } from "@auriko/sdk"; const client = new Client({ apiKey: process.env.AURIKO_API_KEY, baseUrl: "https://api.auriko.ai/v1", }); const response = await client.chat.completions.create({ model: "gpt-4o", messages: [ { role: "user", content: "Extract: Alice is 30, lives in NYC, alice@example.com" }, ], response_format: { type: "json_schema", json_schema: { name: "ContactInfo", schema: { type: "object", properties: { name: { type: "string" }, age: { type: "integer" }, city: { type: "string" }, email: { type: "string" }, }, required: ["name", "age", "city", "email"], }, }, }, }); const contact = JSON.parse(response.choices[0].message.content!); console.log(contact.name); // Alice console.log(contact.email); // alice@example.com ``` The `json_schema` object requires a `name` field. The `schema` field accepts a standard JSON Schema definition. Auriko automatically routes to providers that support your requested format. If no provider supports it, you get a clear error with suggestions. 
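Even with schema enforcement, validating the parsed payload before use is cheap insurance (for example, against a truncated response). A stdlib-only sketch for the `ContactInfo` shape above:

```python
import json

# Expected fields and types for the ContactInfo schema above.
REQUIRED = {"name": str, "age": int, "city": str, "email": str}

def parse_contact(raw: str) -> dict:
    """Parse the model's JSON output and defensively check required fields."""
    contact = json.loads(raw)
    for key, typ in REQUIRED.items():
        if not isinstance(contact.get(key), typ):
            raise ValueError(f"bad or missing field: {key}")
    return contact

contact = parse_contact(
    '{"name": "Alice", "age": 30, "city": "NYC", "email": "alice@example.com"}'
)
print(contact["name"])  # Alice
```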
---

## Page: Structured Output > Section: Use OpenAI SDK

You can use the standard OpenAI SDK pointed at Auriko:

```python Python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["AURIKO_API_KEY"],
    base_url="https://api.auriko.ai/v1"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Extract: Bob is 25, lives in London, bob@example.com"}
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "ContactInfo",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                    "city": {"type": "string"},
                    "email": {"type": "string"}
                },
                "required": ["name", "age", "city", "email"]
            }
        }
    }
)

print(response.choices[0].message.content)
# {"name": "Bob", "age": 25, "city": "London", "email": "bob@example.com"}
```

```typescript TypeScript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.AURIKO_API_KEY,
  baseURL: "https://api.auriko.ai/v1",
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [
    { role: "user", content: "Extract: Bob is 25, lives in London, bob@example.com" },
  ],
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "ContactInfo",
      schema: {
        type: "object",
        properties: {
          name: { type: "string" },
          age: { type: "integer" },
          city: { type: "string" },
          email: { type: "string" },
        },
        required: ["name", "age", "city", "email"],
      },
    },
  },
});

console.log(response.choices[0].message.content);
// {"name": "Bob", "age": 25, "city": "London", "email": "bob@example.com"}
```

The same `response_format` field works with both the Auriko SDK and the OpenAI SDK.

---

## Page: Structured Output > Section: Resources

- Call functions from LLM responses
- Optimize for cost, speed, or throughput
- Handle errors and retries
- See which models support structured output and JSON mode

---

## Page: Routing Options

Auriko intelligently routes your requests across multiple providers.
Use routing options to optimize for your specific needs. --- ## Page: Routing Options > Section: Prerequisites - An [Auriko API key](https://auriko.ai/signup) - Python 3.10+ with `auriko` SDK installed (`pip install auriko`) - OR Node.js 18+ with `@auriko/sdk` installed (`npm install @auriko/sdk`) --- ## Page: Routing Options > Section: Overview Auriko supports six optimization strategies: | Strategy | Description | Best For | |----------|-------------|----------| | `cost` | Route to cheapest provider | Batch processing, non-urgent tasks | | `cheapest` | Absolute lowest cost | Maximum cost savings, no latency requirements | | `speed` | Minimize latency, maximize throughput | Real-time applications, chatbots | | `ttft` | Minimize time to first token | Streaming UX, interactive apps | | `throughput` | Maximize tokens per second | High-volume processing | | `balanced` (default) | Weighted combination | General-purpose, mixed workloads | --- ## Page: Routing Options > Section: Cost Optimization Minimize your LLM costs: ```python Python response = client.chat.completions.create( model="gpt-5.4", messages=[{"role": "user", "content": "Hello!"}], routing={ "optimize": "cost" } ) # See which provider was used and the cost print(f"Provider: {response.routing_metadata.provider}") print(f"Cost: ${response.routing_metadata.cost.billable_cost_usd:.6f}") ``` ```typescript TypeScript const response = await client.chat.completions.create({ model: "gpt-5.4", messages: [{ role: "user", content: "Hello!" 
}], routing: { optimize: "cost", }, }); console.log(`Provider: ${response.routing_metadata?.provider}`); ``` --- ## Page: Routing Options > Section: Latency Optimization Get the fastest response: ```python Python response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Quick answer: 2+2?"}], routing={ "optimize": "speed" } ) print(f"Latency: {response.routing_metadata.total_latency_ms}ms") ``` ```typescript TypeScript const response = await client.chat.completions.create({ model: "gpt-4o", messages: [{ role: "user", content: "Quick answer: 2+2?" }], routing: { optimize: "speed", }, }); console.log(`Latency: ${response.routing_metadata?.total_latency_ms}ms`); ``` --- ## Page: Routing Options > Section: Latency Constraints Set maximum time-to-first-token (TTFT): ```python Python response = client.chat.completions.create( model="gpt-5.4", messages=[{"role": "user", "content": "Hello!"}], routing={ "optimize": "cost", "max_ttft_ms": 200 # Must start responding within 200ms } ) ``` ```typescript TypeScript const response = await client.chat.completions.create({ model: "gpt-5.4", messages: [{ role: "user", content: "Hello!" }], routing: { optimize: "cost", max_ttft_ms: 200, // Must start responding within 200ms }, }); ``` If no provider can meet the latency constraint, Auriko returns a 503 error. --- ## Page: Routing Options > Section: Set a cost ceiling Exclude providers that exceed a per-1M-token budget: ```python Python import os from auriko import Client client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) response = client.chat.completions.create( model="gpt-5.4", messages=[{"role": "user", "content": "Hello!"}], routing={ "optimize": "cost", "max_cost_per_1m": 5.00 # Max $5.00 per 1M tokens (average of input + output) } ) ``` ```typescript TypeScript const response = await client.chat.completions.create({ model: "gpt-5.4", messages: [{ role: "user", content: "Hello!" 
}], routing: { optimize: "cost", max_cost_per_1m: 5.0, // Max $5.00 per 1M tokens (average of input + output) }, }); ``` Auriko calculates cost as the average of input and output price per 1M tokens. Providers exceeding this ceiling are excluded from routing. For fine-grained quality and cost constraints, see [Advanced routing](/guides/advanced-routing). --- ## Page: Routing Options > Section: Provider Preferences Prefer or exclude specific providers: ```python Python # Only consider these providers response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}], routing={ "providers": ["openai", "anthropic"] } ) # Exclude providers response = client.chat.completions.create( model="gpt-5.4", messages=[{"role": "user", "content": "Hello!"}], routing={ "exclude_providers": ["deepseek"] } ) ``` ```typescript TypeScript // Only consider these providers const response = await client.chat.completions.create({ model: "gpt-4o", messages: [{ role: "user", content: "Hello!" }], routing: { providers: ["openai", "anthropic"], }, }); // Exclude providers const response2 = await client.chat.completions.create({ model: "gpt-5.4", messages: [{ role: "user", content: "Hello!" 
}], routing: { exclude_providers: ["deepseek"], }, }); ``` --- ## Page: Routing Options > Section: Restrict key source Force requests to use only BYOK (bring-your-own-key) or only platform-managed keys: ```python Python import os from auriko import Client client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) # Use only your own provider keys response = client.chat.completions.create( model="gpt-5.4", messages=[{"role": "user", "content": "Hello!"}], routing={ "only_byok": True } ) # Use only Auriko platform keys response = client.chat.completions.create( model="gpt-5.4", messages=[{"role": "user", "content": "Hello!"}], routing={ "only_platform": True } ) ``` ```typescript TypeScript // Use only your own provider keys const response = await client.chat.completions.create({ model: "gpt-5.4", messages: [{ role: "user", content: "Hello!" }], routing: { only_byok: true, }, }); // Use only Auriko platform keys const response2 = await client.chat.completions.create({ model: "gpt-5.4", messages: [{ role: "user", content: "Hello!" }], routing: { only_platform: true, }, }); ``` Both are booleans, default `false`. Setting both to `true` returns a 400 error — they are mutually exclusive. When no key of the requested type is available, the request fails with no fallback. See [Bring Your Own Key](/platform/byok) for BYOK setup. 
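The mutual-exclusion rule described above can be checked client-side before sending, which fails faster than waiting for the server's 400. A minimal sketch:

```python
# Mirrors the server-side rule: only_byok and only_platform cannot both be true.
def validate_key_source(routing: dict) -> None:
    if routing.get("only_byok") and routing.get("only_platform"):
        raise ValueError("only_byok and only_platform are mutually exclusive")

validate_key_source({"only_byok": True})  # ok, no exception
try:
    validate_key_source({"only_byok": True, "only_platform": True})
except ValueError as e:
    print(e)  # only_byok and only_platform are mutually exclusive
```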
---

## Page: Routing Options > Section: Routing Metadata

Every response carries routing information:

```python Python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)

metadata = response.routing_metadata
print(f"Provider: {metadata.provider}")
print(f"Model: {metadata.provider_model_id}")
print(f"Latency: {metadata.total_latency_ms}ms")
print(f"Input tokens: {metadata.cost.input_tokens}")
print(f"Output tokens: {metadata.cost.output_tokens}")
print(f"Cost: ${metadata.cost.billable_cost_usd:.6f}")
```

```typescript TypeScript
const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello!" }],
});

const metadata = response.routing_metadata;
console.log(`Provider: ${metadata?.provider}`);
console.log(`Model: ${metadata?.provider_model_id}`);
console.log(`Latency: ${metadata?.total_latency_ms}ms`);
console.log(`Input tokens: ${metadata?.cost?.input_tokens}`);
console.log(`Output tokens: ${metadata?.cost?.output_tokens}`);
console.log(`Cost: $${metadata?.cost?.billable_cost_usd}`);
```

For the complete field reference including fallback chain, warnings, and all optional fields, see [Response Extensions](/api-reference/overview#response-extensions).
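Per-request metadata becomes more useful when aggregated. A minimal in-process tally, keyed by provider — the field names follow the Python example above, and the recorded values are illustrative (the first matches the SSE sample earlier in this document):

```python
from collections import defaultdict

class RoutingTally:
    """Accumulate per-provider request counts, spend, and latencies."""

    def __init__(self):
        self.by_provider = defaultdict(
            lambda: {"requests": 0, "cost_usd": 0.0, "latency_ms": []}
        )

    def record(self, provider: str, latency_ms: int, cost_usd: float) -> None:
        stats = self.by_provider[provider]
        stats["requests"] += 1
        stats["cost_usd"] += cost_usd
        stats["latency_ms"].append(latency_ms)

tally = RoutingTally()
# In real code: tally.record(m.provider, m.total_latency_ms, m.cost.billable_cost_usd)
tally.record("openai", 847, 0.00015)
tally.record("anthropic", 620, 0.00042)
print(tally.by_provider["openai"]["requests"])  # 1
```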
--- ## Page: Routing Options > Section: Full Example ```python Python import os from auriko import Client client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) # For a chatbot: optimize speed with cost ceiling response = client.chat.completions.create( model="gpt-5.4", messages=[ {"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What's the capital of France?"} ], routing={ "optimize": "speed", "max_ttft_ms": 150, } ) print(response.choices[0].message.content) print(f"\n--- Routing Info ---") print(f"Provider: {response.routing_metadata.provider}") print(f"Latency: {response.routing_metadata.total_latency_ms}ms") print(f"Cost: ${response.routing_metadata.cost.billable_cost_usd:.6f}") ``` ```typescript TypeScript import { Client } from "@auriko/sdk"; const client = new Client({ apiKey: process.env.AURIKO_API_KEY, baseUrl: "https://api.auriko.ai/v1", }); const response = await client.chat.completions.create({ model: "gpt-5.4", messages: [ { role: "system", content: "You are a helpful assistant." }, { role: "user", content: "What's the capital of France?" 
}, ], routing: { optimize: "speed", max_ttft_ms: 150, }, }); console.log(response.choices[0].message.content); console.log(`\n--- Routing Info ---`); console.log(`Provider: ${response.routing_metadata?.provider}`); console.log(`Latency: ${response.routing_metadata?.total_latency_ms}ms`); console.log(`Cost: $${response.routing_metadata?.cost?.billable_cost_usd}`); ``` --- ## Page: Routing Options > Section: OpenAI SDK Compatibility Using the OpenAI SDK, pass routing options via `extra_body`: ```python import os from openai import OpenAI client = OpenAI( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) response = client.chat.completions.create( model="gpt-5.4", messages=[{"role": "user", "content": "Hello!"}], extra_body={ "routing": { "optimize": "cost", "max_ttft_ms": 200 } } ) ``` --- ## Page: Routing Options > Section: Choose a strategy Match your use case to the right routing strategy: | Use case | Strategy | Key constraints | Example | |----------|----------|-----------------|---------| | Chatbot / real-time UI | `speed` or `ttft` | `max_ttft_ms: 200` | Interactive conversation | | Batch processing | `cost` or `cheapest` | — | Document summarization | | High-volume pipeline | `throughput` | `min_throughput_tps: 50` | Log analysis | | Cost-conscious real-time | `cost` | `max_ttft_ms: 500` | Customer support | | Compliance-sensitive | `balanced` | `data_policy: "zdr"` | Financial data | | Multi-model exploration | `balanced` | `models: [...]` | A/B testing | Start `max_ttft_ms` at 200-500ms and adjust — setting it too low causes 503 errors when no provider meets the constraint. For fine-grained control, see [Advanced routing](/guides/advanced-routing). --- ## Page: Cost Optimization Auriko can save you 30-70% on LLM costs by intelligently routing requests to the most cost-effective provider. 
--- ## Page: Cost Optimization > Section: Prerequisites - An [Auriko API key](https://auriko.ai/signup) - Python 3.10+ with `auriko` SDK installed (`pip install auriko`) - OR Node.js 18+ with `@auriko/sdk` installed (`npm install @auriko/sdk`) - Active usage to see cost comparisons --- ## Page: Cost Optimization > Section: How It Works When you set `optimize: "cost"`, Auriko: 1. Identifies all providers that can serve your model 2. Compares real-time pricing across providers 3. Routes to the cheapest available option 4. Falls back to alternatives if the cheapest is unavailable --- ## Page: Cost Optimization > Section: Enable Cost Optimization ```python Python import os from auriko import Client client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) response = client.chat.completions.create( model="gpt-5.4", messages=[{"role": "user", "content": "Hello!"}], routing={ "optimize": "cost" } ) # See the actual cost print(f"Cost: ${response.routing_metadata.cost.billable_cost_usd:.6f}") print(f"Provider: {response.routing_metadata.provider}") ``` ```typescript TypeScript import { Client } from "@auriko/sdk"; const client = new Client({ apiKey: process.env.AURIKO_API_KEY, baseUrl: "https://api.auriko.ai/v1", }); const response = await client.chat.completions.create({ model: "gpt-5.4", messages: [{ role: "user", content: "Hello!" }], routing: { optimize: "cost", }, }); console.log(`Provider: ${response.routing_metadata?.provider}`); ``` --- ## Page: Cost Optimization > Section: Cost with Latency Constraints Optimize for cost while maintaining latency requirements: ```python Python response = client.chat.completions.create( model="gpt-5.4", messages=[{"role": "user", "content": "Hello!"}], routing={ "optimize": "cost", "max_ttft_ms": 500 # Max 500ms to first token } ) ``` ```typescript TypeScript const response = await client.chat.completions.create({ model: "gpt-5.4", messages: [{ role: "user", content: "Hello!" 
}], routing: { optimize: "cost", max_ttft_ms: 500, // Max 500ms to first token }, }); ``` Auriko will find the cheapest provider that can meet the latency constraint. --- ## Page: Cost Optimization > Section: Restrict key source If you have negotiated provider rates through your own API keys, force requests to use only BYOK keys for cost control: ```python Python import os from auriko import Client client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) response = client.chat.completions.create( model="gpt-5.4", messages=[{"role": "user", "content": "Hello!"}], routing={ "optimize": "cost", "only_byok": True # Use only your own provider keys } ) ``` ```typescript TypeScript const response = await client.chat.completions.create({ model: "gpt-5.4", messages: [{ role: "user", content: "Hello!" }], routing: { optimize: "cost", only_byok: true, // Use only your own provider keys }, }); ``` See [Routing options](/guides/routing-options#restrict-key-source) for the full constraint API and [Bring Your Own Key](/platform/byok) for BYOK setup. 
--- ## Page: Cost Optimization > Section: View Your Costs Every response includes detailed cost information: ```python Python cost = response.routing_metadata.cost print(f"Input tokens: {cost.input_tokens}") print(f"Output tokens: {cost.output_tokens}") print(f"Total cost: ${cost.billable_cost_usd:.6f}") ``` ```typescript TypeScript const cost = response.routing_metadata?.cost; console.log(`Input tokens: ${cost?.input_tokens}`); console.log(`Output tokens: ${cost?.output_tokens}`); console.log(`Total cost: $${cost?.billable_cost_usd}`); ``` --- ## Page: Cost Optimization > Section: Cost Comparison Example Without Auriko (single provider): ``` 100,000 requests × $0.01/request = $1,000/day ``` With Auriko cost optimization: ``` 100,000 requests × $0.004/request = $400/day Savings: $600/day (60%) ``` --- ## Page: Cost Optimization > Section: Cost Breakdown Track costs by model and provider in your dashboard: | Model | OpenAI | Anthropic | Fireworks AI | **Auriko (optimized)** | |-------|--------|-----------|--------------|------------------------| | GPT-4o | $0.005/1K | - | - | **$0.005/1K** | | Claude Sonnet | - | $0.003/1K | $0.003/1K | **$0.003/1K** | Auriko automatically selects the cheapest option for each model. 
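The per-response cost fields above can be rolled up across a batch to reproduce comparisons like the one in this section. A minimal sketch using plain dicts that mirror the shape of `routing_metadata.cost` (the `total_spend_usd` helper is illustrative, not an SDK function):

```python
def total_spend_usd(costs: list[dict]) -> dict:
    # Roll up token counts and billable cost across a batch of responses.
    # Each entry mirrors routing_metadata.cost: input_tokens,
    # output_tokens, billable_cost_usd.
    summary = {"input_tokens": 0, "output_tokens": 0, "billable_cost_usd": 0.0}
    for cost in costs:
        summary["input_tokens"] += cost["input_tokens"]
        summary["output_tokens"] += cost["output_tokens"]
        summary["billable_cost_usd"] += cost["billable_cost_usd"]
    return summary

batch = [
    {"input_tokens": 120, "output_tokens": 40, "billable_cost_usd": 0.0009},
    {"input_tokens": 300, "output_tokens": 85, "billable_cost_usd": 0.0021},
]
print(total_spend_usd(batch))
```

Comparing this total against a fixed single-provider price gives your actual savings rate.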
--- ## Page: Cost Optimization > Section: Best Practices
- Group similar requests to maximize cache hits and reduce costs
- Use smaller models for simple tasks; reserve large models for complex ones
- Track costs in your dashboard to identify optimization opportunities
- Configure spending limits in your dashboard settings
--- ## Page: Cost Optimization > Section: Use Cases ### Background Processing For batch jobs where latency doesn't matter: ```python # Process documents overnight at lowest cost for doc in documents: response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": f"Summarize: {doc}"}], routing={"optimize": "cost"} ) save_summary(doc.id, response.choices[0].message.content) ``` ### With Latency Budget For user-facing features with cost consciousness: ```python # Respond quickly but minimize cost response = client.chat.completions.create( model="gpt-5.4", messages=conversation, routing={ "optimize": "cost", "max_ttft_ms": 300 # User won't notice < 300ms } ) ``` ### A/B test providers Compare costs across providers: ```python import random # 10% to primary, 90% cost-optimized if random.random() < 0.1: routing = {"providers": ["anthropic"]} else: routing = {"optimize": "cost"} response = client.chat.completions.create( model="gpt-5.4", messages=messages, routing=routing ) # Log for analysis log_cost( provider=response.routing_metadata.provider, cost=response.routing_metadata.cost.billable_cost_usd ) ``` --- ## Page: Cost Optimization > Section: Dashboard Track your cost savings in the Auriko dashboard: - Total spend by day/week/month - Cost per model - Cost per provider - Savings vs. single-provider baseline Monitor your usage and costs in real-time --- ## Page: Error Handling Handle errors gracefully with retries, fallbacks, and proper exception handling.
--- ## Page: Error Handling > Section: Prerequisites - An [Auriko API key](https://auriko.ai/signup) - Python 3.10+ with `auriko` SDK installed (`pip install auriko`) - OR Node.js 18+ with `@auriko/sdk` installed (`npm install @auriko/sdk`) --- ## Page: Error Handling > Section: Error types All Auriko errors extend `AurikoAPIError` with these fields: | Field | Type | Description | |-------|------|-------------| | `message` | `str` | Human-readable error message | | `status_code` | `int` | HTTP status code | | `code` | `str` | Machine-readable error code | | `type` | `str \| None` | Error category | | `param` | `str \| None` | Parameter that caused the error | | `response_headers` | `ResponseHeaders` | Response headers (includes `request_id` for support) | The SDK provides 10 specific error classes: | Exception | Status | When | |-----------|--------|------| | `AuthenticationError` | 401 | Invalid or missing API key | | `InvalidRequestError` | 400 | Malformed request, invalid parameter value, or missing required parameter | | `InsufficientCreditsError` | 402 | Account has insufficient credits | | `BudgetExceededError` | 402 | Budget limit hit (workspace, key, or Bring Your Own Key (BYOK) scope) | | `ModelNotFoundError` | 404 | Requested model not in catalog | | `RateLimitError` | 429 | Rate limit exceeded | | `InternalError` | 500 | Unexpected Auriko server error | | `ProviderError` | 502/503/504 | Upstream provider error, timeout, or all providers failed | | `ProviderAuthError` | 401 | BYOK key authentication failed at provider | | `ServiceUnavailableError` | 503 | Auriko service temporarily unavailable | Some error codes (like `missing_required_parameter` and `no_providers_available`) map to shared classes (`InvalidRequestError` and `ProviderError` respectively). Any unrecognized error falls to the `AurikoAPIError` base class via status-code fallback. 
See the [Python SDK Reference](/sdk/python-reference#error-classes) or [TypeScript SDK Reference](/sdk/typescript-reference#error-classes) for complete error class fields and hierarchy. --- ## Page: Error Handling > Section: Handle errors Catch typed exceptions: ```python Python import os from auriko import ( Client, AurikoAPIError, AuthenticationError, RateLimitError, BudgetExceededError, ModelNotFoundError, ProviderError, # Also available: InvalidRequestError, InsufficientCreditsError, # InternalError, ProviderAuthError, ServiceUnavailableError ) client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) try: response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}] ) print(response.choices[0].message.content) except AuthenticationError as e: print(f"Check your API key: {e}") except RateLimitError as e: print(f"Rate limited, retry later: {e}") except BudgetExceededError as e: print(f"Budget exceeded: {e}") except ModelNotFoundError as e: print(f"Model not found: {e}") except ProviderError as e: print(f"Provider error: {e}") except AurikoAPIError as e: # Catches all other Auriko errors (InternalError, # ServiceUnavailableError, InvalidRequestError, etc.) print(f"API error ({e.status_code}): {e}") ``` ```typescript TypeScript import { Client, AurikoAPIError, AuthenticationError, RateLimitError, BudgetExceededError, ModelNotFoundError, ProviderError, // Also available: InvalidRequestError, InsufficientCreditsError, // InternalError, ProviderAuthError, ServiceUnavailableError } from "@auriko/sdk"; const client = new Client({ apiKey: process.env.AURIKO_API_KEY, baseUrl: "https://api.auriko.ai/v1", }); try { const response = await client.chat.completions.create({ model: "gpt-4o", messages: [{ role: "user", content: "Hello!" 
}], }); console.log(response.choices[0].message.content); } catch (e) { if (e instanceof AuthenticationError) { console.log(`Check your API key: ${e.message}`); } else if (e instanceof RateLimitError) { console.log(`Rate limited, retry later: ${e.message}`); } else if (e instanceof BudgetExceededError) { console.log(`Budget exceeded: ${e.message}`); } else if (e instanceof ModelNotFoundError) { console.log(`Model not found: ${e.message}`); } else if (e instanceof ProviderError) { console.log(`Provider error: ${e.message}`); } else if (e instanceof AurikoAPIError) { console.log(`API error (${e.statusCode}): ${e.message}`); } } ``` --- ## Page: Error Handling > Section: Use built-in retries The SDK automatically retries transient errors with exponential backoff: | Setting | Value | |---------|-------| | Max retries | 2 (default) | | Initial interval | 500ms | | Max interval | 30 seconds | | Backoff | Exponential (1.5 exponent) + random jitter | | Retried status codes | 429, 500, 502, 503, 504 | | Connection/timeout errors | Retried | | `Retry-After` header | Respected (overrides backoff when present) | ```python Python import os from auriko import Client # Default: 2 retries with exponential backoff client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) # More retries for resilience client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1", max_retries=5 ) # Disable retries entirely client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1", max_retries=0 ) ``` ```typescript TypeScript import { Client } from "@auriko/sdk"; // Default: 2 retries with exponential backoff const client = new Client({ apiKey: process.env.AURIKO_API_KEY, baseUrl: "https://api.auriko.ai/v1", }); // More retries for resilience const resilientClient = new Client({ apiKey: process.env.AURIKO_API_KEY, baseUrl: "https://api.auriko.ai/v1", maxRetries: 5, }); // Disable retries entirely const 
noRetryClient = new Client({ apiKey: process.env.AURIKO_API_KEY, baseUrl: "https://api.auriko.ai/v1", maxRetries: 0, }); ``` When the server returns a `Retry-After` header (common with 429 responses), the SDK uses that value instead of the calculated backoff interval. --- ## Page: Error Handling > Section: Retry manually For more control, implement custom retry logic: ```python Python import os import time from auriko import Client, RateLimitError, ProviderError client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1", max_retries=0 # Disable auto-retry ) def make_request_with_retry(messages, max_retries=3): last_error = None for attempt in range(max_retries): try: return client.chat.completions.create( model="gpt-4o", messages=messages ) except RateLimitError as e: last_error = e wait_time = min(2 ** attempt, 60) # Cap at 60 seconds print(f"Rate limited, waiting {wait_time}s...") time.sleep(wait_time) except ProviderError as e: last_error = e wait_time = 2 ** attempt print(f"Provider error, retrying in {wait_time}s...") time.sleep(wait_time) raise last_error # Usage response = make_request_with_retry([{"role": "user", "content": "Hello!"}]) ``` ```typescript TypeScript import { Client, RateLimitError, ProviderError } from "@auriko/sdk"; const client = new Client({ apiKey: process.env.AURIKO_API_KEY, baseUrl: "https://api.auriko.ai/v1", maxRetries: 0, // Disable auto-retry }); async function makeRequestWithRetry( messages: Array<{ role: string; content: string }>, maxRetries = 3 ) { let lastError: Error | undefined; for (let attempt = 0; attempt < maxRetries; attempt++) { try { return await client.chat.completions.create({ model: "gpt-4o", messages, }); } catch (e) { lastError = e as Error; const waitTime = Math.min(2 ** attempt, 60) * 1000; if (e instanceof RateLimitError || e instanceof ProviderError) { await new Promise((r) => setTimeout(r, waitTime)); } else { throw e; } } } throw lastError; } ``` --- ## Page: Error Handling > 
Section: Retry asynchronously Retry with async/await: ```python import os import asyncio from auriko import AsyncClient, RateLimitError, ProviderError client = AsyncClient( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1", max_retries=0 ) async def make_request_with_backoff(messages, max_retries=3): for attempt in range(max_retries): try: return await client.chat.completions.create( model="gpt-4o", messages=messages ) except (RateLimitError, ProviderError) as e: if attempt == max_retries - 1: raise wait_time = 2 ** attempt await asyncio.sleep(wait_time) ``` **Side effects and retries:** When using tools or multi-step workflows, consider whether retries are safe. A retried request that triggers a tool call may execute the tool twice. For idempotency-sensitive operations, either disable automatic retries (`max_retries=0`) or implement your own deduplication logic. --- ## Page: Error Handling > Section: Fall back to another model Use a cheaper/faster model as fallback: ```python import os from auriko import Client, AurikoAPIError client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) def chat_with_fallback(messages): try: # Try primary model return client.chat.completions.create( model="gpt-4o", messages=messages, routing={"max_ttft_ms": 200} ) except AurikoAPIError as e: print(f"Primary failed ({e}), trying fallback...") # Fallback to a different model return client.chat.completions.create( model="gpt-4o-mini", messages=messages ) ``` --- ## Page: Error Handling > Section: Use circuit breakers Prevent cascading failures: ```python import os from datetime import datetime, timedelta, timezone from auriko import Client, ProviderError client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) class CircuitBreaker: def __init__(self, failure_threshold=5, reset_timeout=60): self.failures = 0 self.failure_threshold = failure_threshold self.reset_timeout = reset_timeout 
self.last_failure = None self.is_open = False def record_failure(self): self.failures += 1 self.last_failure = datetime.now(timezone.utc) if self.failures >= self.failure_threshold: self.is_open = True def record_success(self): self.failures = 0 self.is_open = False def can_proceed(self): if not self.is_open: return True if datetime.now(timezone.utc) - self.last_failure > timedelta(seconds=self.reset_timeout): self.is_open = False return True return False # Usage breaker = CircuitBreaker() def safe_request(messages): if not breaker.can_proceed(): raise Exception("Circuit breaker open, try later") try: response = client.chat.completions.create( model="gpt-4o", messages=messages ) breaker.record_success() return response except ProviderError as e: breaker.record_failure() raise ``` --- ## Page: Error Handling > Section: Set timeouts ```python Python import os import httpx from auriko import Client client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1", timeout=30.0 # 30 second timeout ) try: response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Write a long essay..."}] ) except httpx.TimeoutException: print("Request timed out") ``` ```typescript TypeScript import { Client } from "@auriko/sdk"; const client = new Client({ apiKey: process.env.AURIKO_API_KEY, baseUrl: "https://api.auriko.ai/v1", timeout: 30000, // 30 second timeout (ms) }); try { const response = await client.chat.completions.create({ model: "gpt-4o", messages: [{ role: "user", content: "Write a long essay..." 
}], }); } catch (e) { if (e instanceof Error && e.name === "TimeoutError") { console.log("Request timed out"); } } ``` --- ## Page: Error Handling > Section: Log errors Log errors for debugging: ```python import os import logging from auriko import Client, AurikoAPIError logging.basicConfig(level=logging.INFO) logger = logging.getLogger(__name__) client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) try: response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}] ) except AurikoAPIError as e: logger.exception("Chat completion failed", extra={ "error_type": type(e).__name__, "status_code": e.status_code, "request_id": e.response_headers.request_id, "model": "gpt-4o", }) raise ``` --- ## Page: Error Handling > Section: Map OpenAI SDK errors If you use the OpenAI SDK directly (with `base_url` pointed at Auriko), you can convert OpenAI errors to typed Auriko errors using `map_openai_error()`: ```python import os import openai from auriko import map_openai_error, RateLimitError, BudgetExceededError client = openai.OpenAI( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) try: response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}] ) except openai.APIStatusError as e: auriko_error = map_openai_error(e) if isinstance(auriko_error, RateLimitError): print(f"Rate limited. Retry after: {auriko_error.response_headers.rate_limit_reset}") elif isinstance(auriko_error, BudgetExceededError): print(f"Budget exceeded: {auriko_error.message}") else: raise auriko_error ``` This gives you access to typed error fields (`status_code`, `code`, `response_headers`) and fine-grained `isinstance` checks, even when using the OpenAI client. `map_openai_error()` is Python-only. TypeScript users should use the Auriko SDK directly for typed errors. 
See [Switching from OpenAI](/switching-from-openai#error-mapping) for migration-focused error mapping. --- ## Page: Error Handling > Section: Best practices
- Let the SDK handle transient errors automatically
- Log errors with context for debugging
- Set timeouts to prevent requests from hanging indefinitely
- Use fallback models for critical paths
--- ## Page: Prompt Caching Reduce costs and latency by reusing cached prompt prefixes across requests. Auriko automatically injects cache control directives for all supported providers — no user action needed. --- ## Page: Prompt Caching > Section: Prerequisites - An [Auriko API key](https://auriko.ai/signup) - Python 3.10+ with `auriko` SDK installed (`pip install auriko`) - OR Node.js 18+ with `@auriko/sdk` installed (`npm install @auriko/sdk`) --- ## Page: Prompt Caching > Section: How it works Auriko intercepts outgoing requests and injects provider-specific caching directives when conversations exceed each provider's token thresholds. You send requests normally — caching happens transparently. When a subsequent request shares the same prompt prefix, the provider serves the cached portion at a reduced cost and lower latency.
--- ## Page: Prompt Caching > Section: See it in action Send a normal request — caching is automatic: ```python Python import os from auriko import Client client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) response = client.chat.completions.create( model="claude-sonnet-4-20250514", messages=[ {"role": "system", "content": "You are a helpful coding assistant..."}, {"role": "user", "content": "Explain async/await in Python."} ] ) # Check cache usage in the response usage = response.usage if hasattr(usage, "prompt_tokens_details") and usage.prompt_tokens_details: cached = getattr(usage.prompt_tokens_details, "cached_tokens", 0) print(f"Cached tokens: {cached}") print(f"Total prompt tokens: {usage.prompt_tokens}") ``` ```typescript TypeScript import { Client } from "@auriko/sdk"; const client = new Client({ apiKey: process.env.AURIKO_API_KEY, baseUrl: "https://api.auriko.ai/v1", }); const response = await client.chat.completions.create({ model: "claude-sonnet-4-20250514", messages: [ { role: "system", content: "You are a helpful coding assistant..." }, { role: "user", content: "Explain async/await in Python." }, ], }); // Check cache usage in the response const cached = response.usage?.prompt_tokens_details?.cached_tokens ?? 0; console.log(`Cached tokens: ${cached}`); console.log(`Total prompt tokens: ${response.usage?.prompt_tokens}`); ``` --- ## Page: Prompt Caching > Section: Provider support Auriko injects caching directives for four providers. Each uses a different mechanism: | Provider | User action | Auriko behavior | |----------|-------------|-----------------| | Anthropic | None — automatic | Injects `cache_control: {type: "ephemeral"}` when estimated tokens exceed a per-model threshold (1024–4096 tokens depending on model). Skips if user already added `cache_control` blocks. | | OpenAI | None — automatic | Injects `prompt_cache_key` for server affinity on all requests with a conversation ID. 
Adds `prompt_cache_retention: "24h"` for supported models (gpt-5, gpt-5.1, gpt-5.2, gpt-4.1 families). Skips retention for zdr data policy. | | Fireworks | None — automatic | Sets `user` field to conversation ID for same-replica routing and KV-cache reuse. | | xAI | None — automatic | Injects `x-grok-conv-id` header (UUID4 derived from conversation ID) for session affinity. | ### Anthropic token thresholds Caching is only activated when the estimated prompt token count exceeds the model-specific threshold: | Model family | Threshold | |-------------|-----------| | claude-sonnet-4-5, claude-sonnet-4, claude-opus-4, claude-opus-4-1, claude-3-7-sonnet | 1024 tokens | | claude-sonnet-4-6, claude-3-5-haiku, claude-3-haiku | 2048 tokens | | claude-haiku-4-5, claude-opus-4-5, claude-opus-4-6 | 4096 tokens | Requests below the threshold skip auto-injection because Anthropic charges for cache writes on small requests. ### Manual cache control If you need fine-grained control (for example, caching a specific system prompt block), add `cache_control` blocks manually. When Auriko detects any existing `cache_control` in the request, it skips auto-injection entirely. --- ## Page: Prompt Caching > Section: Check cache usage Cache hit information appears in the response `usage` object: ```json { "usage": { "prompt_tokens": 1500, "completion_tokens": 200, "total_tokens": 1700, "prompt_tokens_details": { "cached_tokens": 1200 } } } ``` The `cached_tokens` field shows how many prompt tokens were served from cache. 
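A quick way to gauge caching effectiveness is the fraction of prompt tokens served from cache. A minimal sketch over the usage shape shown above (the `cache_hit_ratio` helper is illustrative, not part of the SDK):

```python
def cache_hit_ratio(usage: dict) -> float:
    # Fraction of prompt tokens served from cache, based on the usage
    # object shown above. Returns 0.0 when no details are present.
    details = usage.get("prompt_tokens_details") or {}
    cached = details.get("cached_tokens", 0)
    prompt = usage.get("prompt_tokens", 0)
    return cached / prompt if prompt else 0.0

usage = {
    "prompt_tokens": 1500,
    "completion_tokens": 200,
    "total_tokens": 1700,
    "prompt_tokens_details": {"cached_tokens": 1200},
}
print(f"{cache_hit_ratio(usage):.0%}")  # 80%
```

A persistently low ratio on multi-turn traffic usually means prompts fall below the provider's token threshold or prefixes aren't stable across turns.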
--- ## Page: Prompt Caching > Section: When to use Caching works best with: - **Multi-turn conversations** — the shared conversation prefix grows with each turn - **Long system prompts** — reused across many requests - **Few-shot examples** — static example blocks cached across calls Caching provides minimal benefit for: - **Unique prompts** — no shared prefix to cache - **Very short prompts** — below provider token thresholds - **Single-turn requests** — no subsequent requests to benefit from the cache --- ## Page: Extensions and Thinking Access provider-specific features like thinking tokens through a normalized interface. Auriko translates a single `extensions.thinking` configuration into provider-native formats automatically. --- ## Page: Extensions and Thinking > Section: Prerequisites - An [Auriko API key](https://auriko.ai/signup) - Python 3.10+ with `auriko` SDK installed (`pip install auriko`) - OR Node.js 18+ with `@auriko/sdk` installed (`npm install @auriko/sdk`) - A model that supports reasoning (Claude 3.5+, o1, o3, o4-mini, DeepSeek R1, Gemini 2.0 Flash Thinking) --- ## Page: Extensions and Thinking > Section: Enable thinking Pass `extensions.thinking` in your request to enable extended reasoning: ```python Python import os from auriko import Client client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) response = client.chat.completions.create( model="claude-sonnet-4-20250514", messages=[{"role": "user", "content": "Solve this step by step: what is 23! / 20!?"}], extensions={"thinking": {"enabled": True, "budget_tokens": 10000}} ) print(response.choices[0].message.content) ``` ```typescript TypeScript import { Client } from "@auriko/sdk"; const client = new Client({ apiKey: process.env.AURIKO_API_KEY, baseUrl: "https://api.auriko.ai/v1", }); const response = await client.chat.completions.create({ model: "claude-sonnet-4-20250514", messages: [{ role: "user", content: "Solve this step by step: what is 23! / 20!?" 
}], extensions: { thinking: { enabled: true, budget_tokens: 10000 } }, }); console.log(response.choices[0].message.content); ``` --- ## Page: Extensions and Thinking > Section: Check provider support Auriko translates `extensions.thinking` into provider-native formats: | Provider | Models | Translation | |----------|--------|-------------| | Anthropic | Claude 3.5+, Claude 4 | `thinking: {type: "enabled", budget_tokens: <n>}` — budget passed directly | | OpenAI | o1, o3, o4-mini | `reasoning_effort: "low" / "medium" / "high"` — mapped from budget_tokens thresholds | | DeepSeek | R1 | `thinking: {enabled: true, max_tokens: <n>}` — budget passed directly | | Google AI Studio | Gemini 2.0 Flash Thinking | `thinking_config: {thinking_budget: <n>}` — budget passed directly | | Other providers | Varies | OpenAI-compatible `reasoning_effort` format (default translator) | ### OpenAI budget mapping Since OpenAI uses discrete `reasoning_effort` levels instead of a token budget, Auriko maps `budget_tokens` to the appropriate level: | Budget tokens | Reasoning effort | |--------------|-----------------| | < 5,000 | `low` | | 5,000 – 14,999 | `medium` | | >= 15,000 | `high` | If `budget_tokens` is omitted, the default is 8,000 (maps to `medium`). --- ## Page: Extensions and Thinking > Section: Read thinking output When a model supports reasoning, the thinking output appears in the `reasoning_content` field on the response message: ```python Python response = client.chat.completions.create( model="claude-sonnet-4-20250514", messages=[{"role": "user", "content": "Solve step by step: what is 23!
/ 20!?"}], extensions={"thinking": {"enabled": True, "budget_tokens": 10000}} ) # Access the reasoning (if the model returns it) if response.choices[0].message.reasoning_content: print(f"Reasoning: {response.choices[0].message.reasoning_content}") print(f"Answer: {response.choices[0].message.content}") ``` ```typescript TypeScript const response = await client.chat.completions.create({ model: "claude-sonnet-4-20250514", messages: [{ role: "user", content: "Solve step by step: what is 23! / 20!?" }], extensions: { thinking: { enabled: true, budget_tokens: 10000 } }, }); if (response.choices[0].message.reasoning_content) { console.log(`Reasoning: ${response.choices[0].message.reasoning_content}`); } console.log(`Answer: ${response.choices[0].message.content}`); ``` ### Providers with `reasoning_content` | Provider | `reasoning_content` populated? | Notes | |----------|-------------------------------|-------| | Anthropic | Yes | Extracted from thinking block | | DeepSeek | Yes | Extracted from thinking content | | Google | Yes | Extracted from thinking_config response | | Fireworks AI | Yes | Extracted from `<think>` tags in content (Qwen3 models) | | OpenAI | No | Reasoning is internal; not exposed in response | Fireworks AI Qwen3 models populate `reasoning_content` by default, without `extensions.thinking`. --- ## Page: Extensions and Thinking > Section: Use provider passthrough For provider-specific features beyond thinking, use provider-keyed extensions.
Auriko forwards these as-is after security sanitization: ```python Python import os from auriko import Client client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) response = client.chat.completions.create( model="claude-sonnet-4-20250514", messages=[{"role": "user", "content": "Hello!"}], extensions={ "thinking": {"enabled": True, "budget_tokens": 10000}, "anthropic": { "custom_metadata": {"session_id": "abc123"} } } ) ``` ```typescript TypeScript import { Client } from "@auriko/sdk"; const client = new Client({ apiKey: process.env.AURIKO_API_KEY, baseUrl: "https://api.auriko.ai/v1", }); const response = await client.chat.completions.create({ model: "claude-sonnet-4-20250514", messages: [{ role: "user", content: "Hello!" }], extensions: { thinking: { enabled: true, budget_tokens: 10000 }, anthropic: { custom_metadata: { session_id: "abc123" }, }, }, }); ``` Auriko normalizes provider aliases automatically. The aliases `google`, `google_ai`, `googleai`, and `gemini` all map to `google_ai_studio`. ### Precedence When both normalized features and provider passthrough contain the same field, the provider passthrough wins. For example, if you set `extensions.thinking.budget_tokens: 10000` and `extensions.anthropic.thinking.budget_tokens: 15000`, Anthropic receives `15000`. ### Security filtering Auriko blocks authentication-related keys (`api_key`, `authorization`, `token`, etc.) at all nesting levels in passthrough extensions. Auriko also blocks core request fields (`model`, `messages`, `temperature`, etc.) at the top level to prevent routing bypass. --- ## Page: Extensions and Thinking > Section: Cost and latency Thinking tokens count toward output tokens and increase both cost and latency. Use `budget_tokens` to cap the reasoning budget for your use case. For cost-sensitive workloads, see [Cost optimization](/guides/cost-optimization). 
See [Check reasoning token availability](#check-reasoning-token-availability) for which providers report a breakdown. --- ## Page: Extensions and Thinking > Section: Check reasoning token availability The `completion_tokens_details.reasoning_tokens` field reports how many tokens the model spent on reasoning. Auriko passes through what the upstream provider reports. | Provider | Model examples | `reasoning_tokens` reported? | Notes | |----------|---------------|----------------------------|-------| | OpenAI | o1, o3, o4-mini | Yes | Native field | | DeepSeek | deepseek-v3.2-thinking | Yes | Native field (routed to DeepSeek API) | | xAI | grok-4-fast-reasoning | Yes | Native field | | Google | Gemini 2.5 Flash | Yes | Mapped from `thoughtsTokenCount` | | Anthropic | All Claude models | No | Upstream returns combined `output_tokens` only | | Moonshot | kimi-k2-thinking, kimi-k2-thinking-turbo | No | Upstream doesn't include token details | | Fireworks | deepseek-v3.2 | No | Upstream doesn't include token details for hosted models | When the provider doesn't report a reasoning token breakdown, Auriko doesn't include `completion_tokens_details` in the response. Check for the field before accessing it: ```python Python if response.usage.completion_tokens_details: print(f"Reasoning: {response.usage.completion_tokens_details.reasoning_tokens}") ``` ```typescript TypeScript if (response.usage?.completion_tokens_details) { console.log(`Reasoning: ${response.usage.completion_tokens_details.reasoning_tokens}`); } ``` When `completion_tokens_details` isn't available, `completion_tokens` reflects the combined total of reasoning and content tokens. You can still use it for cost tracking. --- ## Page: Budget Management Set spending limits at the workspace, API key, or BYOK provider level. Budgets enforce hard limits that block requests when exceeded. 
--- ## Page: Budget Management > Section: Prerequisites - An [Auriko API key](https://auriko.ai/signup) for read operations, or a [session token](/api-reference/authentication#session-authentication) for full access - Workspace owner or admin role --- ## Page: Budget Management > Section: Authentication | Operation | API key (`ak_*`) | Session JWT | |-----------|:-:|:-:| | List / Get budgets | Yes | Yes | | Create / Update / Delete | No | Yes | Read operations accept API keys. Write operations require session authentication. --- ## Page: Budget Management > Section: Create a budget Budget endpoints aren't wrapped by the SDK. Use cURL or any HTTP client with a session token: ```bash curl -X POST https://api.auriko.ai/v1/workspaces/{workspace_id}/budgets \ -H "Authorization: Bearer $SESSION_JWT" \ -H "Content-Type: application/json" \ -d '{ "scope_type": "workspace", "period": "monthly", "limit_usd": 500, "enforce": true }' ``` Response: ```json { "id": "bdgt_abc123", "workspace_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7", "scope_type": "workspace", "period": "monthly", "limit_usd": 500.0, "enforce": true, "include_byok": false, "spend_usd": 42.50, "percent_used": 8.5, "created_at": "2026-03-20T10:00:00Z", "updated_at": "2026-03-20T10:00:00Z" } ``` --- ## Page: Budget Management > Section: Budget scopes The `scope_type` field determines what spending the budget tracks: | `scope_type` | Description | Required extra field | |--------------|-------------|----------------------| | `workspace` | Total workspace spend | `include_byok` (optional, default `false`) | | `api_key` | Per-key spend | `scope_id` (API key ID, required) | | `byok_provider` | Per-BYOK-provider spend | `scope_provider` (provider name, required) | ### Workspace budget with BYOK To include BYOK usage in a workspace budget, set `include_byok: true`: ```bash curl -X POST https://api.auriko.ai/v1/workspaces/{workspace_id}/budgets \ -H "Authorization: Bearer $SESSION_JWT" \ -H "Content-Type: 
application/json" \ -d '{ "scope_type": "workspace", "period": "monthly", "limit_usd": 1000, "enforce": true, "include_byok": true }' ``` ### API key budget ```bash curl -X POST https://api.auriko.ai/v1/workspaces/{workspace_id}/budgets \ -H "Authorization: Bearer $SESSION_JWT" \ -H "Content-Type: application/json" \ -d '{ "scope_type": "api_key", "scope_id": "key_abc123", "period": "daily", "limit_usd": 50, "enforce": true }' ``` --- ## Page: Budget Management > Section: Periods Budgets reset on a fixed schedule (UTC): | Period | Resets at | |--------|----------| | `daily` | 00:00 UTC | | `weekly` | Monday 00:00 UTC | | `monthly` | 1st of month 00:00 UTC | --- ## Page: Budget Management > Section: Check budget status ```bash curl https://api.auriko.ai/v1/workspaces/{workspace_id}/budgets \ -H "Authorization: Bearer $AURIKO_API_KEY" ``` Each budget in the response shows current spend: ```json { "id": "bdgt_abc123", "scope_type": "workspace", "period": "monthly", "limit_usd": 500.0, "spend_usd": 127.50, "percent_used": 25.5, "enforce": true } ``` --- ## Page: Budget Management > Section: Update and delete Update a budget (at least one field required): ```bash curl -X PATCH https://api.auriko.ai/v1/workspaces/{workspace_id}/budgets/{budget_id} \ -H "Authorization: Bearer $SESSION_JWT" \ -H "Content-Type: application/json" \ -d '{"limit_usd": 750}' ``` Delete a budget: ```bash curl -X DELETE https://api.auriko.ai/v1/workspaces/{workspace_id}/budgets/{budget_id} \ -H "Authorization: Bearer $SESSION_JWT" ``` --- ## Page: Budget Management > Section: Enforcement When `enforce` is `true` and spending reaches the enforcement threshold, subsequent inference requests return a `402` error with code `budget_exceeded`. The enforcement threshold has a buffer to account for in-flight requests: ``` enforcement_limit = limit_usd - min($10, 10% of limit_usd) ``` For example, a $100 budget enforces at $90. A $500 budget enforces at $490. 
Auriko triggers alerts at 50%, 75%, 90%, and 100% of the budget limit. For handling `budget_exceeded` errors, see [Error handling](/guides/error-handling). --- ## Page: Budget Management > Section: Rate limiting Auriko rate-limits budget management writes to 10 per minute per user and API key reads to 60 per minute per IP. See [Rate limits](/platform/rate-limits) for details. --- ## Page: Advanced Routing Fine-tune routing with suffix shortcuts, multi-model requests, quality constraints, and data policies. For basic routing, see [Routing options](/guides/routing-options). --- ## Page: Advanced Routing > Section: Prerequisites - An [Auriko API key](https://auriko.ai/signup) - A [session token](/api-reference/authentication#session-authentication) (for routing defaults) - Python 3.10+ with `auriko` SDK installed (`pip install auriko`) - OR Node.js 18+ with `@auriko/sdk` installed (`npm install @auriko/sdk`) - Familiarity with [Routing options](/guides/routing-options) --- ## Page: Advanced Routing > Section: How routing works When you send a request, Auriko's router: 1. **Enumerates candidates** — finds all providers offering the requested model(s) 2. **Filters by constraints** — removes providers that violate your routing options (data policy, Bring Your Own Key (BYOK) requirement, min success rate, excluded providers) 3. **Scores by strategy** — ranks remaining candidates using your `optimize` strategy: - `cost` / `cheapest`: lowest price per token - `ttft` / `speed`: lowest latency to first token - `throughput`: highest tokens per second - `balanced`: weighted combination of cost, latency, and throughput 4. **Selects and routes** — selects from the ranked list, favoring higher-scored providers 5. 
**Falls back if needed** — if the provider fails and `allow_fallbacks` is true, retries with the next candidate (up to `max_fallback_attempts`) See [Python SDK](/sdk/python#with-routing-options) or [TypeScript SDK](/sdk/typescript#with-routing-options) for routing code examples. --- ## Page: Advanced Routing > Section: Use suffix shortcuts Append a suffix to any model name for quick routing configuration: | Suffix | Strategy | Effect | |--------|----------|--------| | `:floor` | `cheapest` | Absolute lowest cost | | `:cost` | `cost` | Cost-optimized with more spread | | `:nitro` | `speed` | Fastest overall provider | | `:fast` | `ttft` | Fastest time to first token | | `:balanced` | `balanced` | Weighted combination | ```python Python import os from auriko import Client client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) # Cheapest provider for gpt-4o response = client.chat.completions.create( model="gpt-4o:floor", messages=[{"role": "user", "content": "Hello!"}] ) # Fastest time to first token response = client.chat.completions.create( model="claude-sonnet-4-20250514:fast", messages=[{"role": "user", "content": "Hello!"}] ) ``` ```typescript TypeScript import { Client } from "@auriko/sdk"; const client = new Client({ apiKey: process.env.AURIKO_API_KEY, baseUrl: "https://api.auriko.ai/v1", }); // Cheapest provider for gpt-4o const response = await client.chat.completions.create({ model: "gpt-4o:floor", messages: [{ role: "user", content: "Hello!" }], }); // Fastest time to first token const fast = await client.chat.completions.create({ model: "claude-sonnet-4-20250514:fast", messages: [{ role: "user", content: "Hello!" }], }); ``` The router parses suffixes only when the model ID contains exactly one colon. Fine-tuned models with multiple colons (for example, `ft:gpt-4o:org:custom`) pass through unchanged. 
--- ## Page: Advanced Routing > Section: Route across models Pass `models` instead of `model` to route across multiple models (mutually exclusive with `model`, max 10): ```python Python import os from auriko import Client client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) # Pool mode (default): best provider across all models response = client.chat.completions.create( models=["gpt-4o", "claude-sonnet-4-20250514", "gemini-2.0-flash"], messages=[{"role": "user", "content": "Hello!"}], routing={"mode": "pool"} ) # Fallback mode: try models in order response = client.chat.completions.create( models=["gpt-4o", "claude-sonnet-4-20250514"], messages=[{"role": "user", "content": "Hello!"}], routing={"mode": "fallback"} ) ``` ```typescript TypeScript import { Client } from "@auriko/sdk"; const client = new Client({ apiKey: process.env.AURIKO_API_KEY, baseUrl: "https://api.auriko.ai/v1", }); // Pool mode (default): best provider across all models const response = await client.chat.completions.create({ models: ["gpt-4o", "claude-sonnet-4-20250514", "gemini-2.0-flash"], messages: [{ role: "user", content: "Hello!" }], routing: { mode: "pool" }, }); // Fallback mode: try models in order const fallback = await client.chat.completions.create({ models: ["gpt-4o", "claude-sonnet-4-20250514"], messages: [{ role: "user", content: "Hello!" 
}], routing: { mode: "fallback" }, }); ``` | Mode | Behavior | |------|----------| | `pool` (default) | Select the best-scoring provider across all requested models | | `fallback` | Try all providers for the first model, then the second model, and so on | --- ## Page: Advanced Routing > Section: Set quality constraints Filter providers by performance requirements: | Constraint | Type | Description | |-----------|------|-------------| | `min_throughput_tps` | number | Minimum tokens per second | | `min_success_rate` | number (0–1) | Minimum success rate | | `max_cost_per_1m` | number | Maximum cost per 1M tokens (average of input + output) | | `max_ttft_ms` | number | Maximum time to first token in milliseconds | | `weights` | object | Custom scoring weights: `{ cost, ttft, throughput, reliability }`. Overrides preset. | ```python Python response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}], routing={ "optimize": "balanced", "min_throughput_tps": 50, "min_success_rate": 0.95, "weights": {"cost": 0.6, "ttft": 0.4} } ) ``` ```typescript TypeScript const response = await client.chat.completions.create({ model: "gpt-4o", messages: [{ role: "user", content: "Hello!" }], routing: { optimize: "balanced", min_throughput_tps: 50, min_success_rate: 0.95, weights: { cost: 0.6, ttft: 0.4 }, }, }); ``` Custom weights let you control the exact tradeoff between cost, latency, throughput, and reliability. When provided, they override the preset coefficients. The server normalizes weights to sum to 1.0. --- ## Page: Advanced Routing > Section: Data policy Control how providers handle your data: | Policy | Description | |--------|-------------| | `none` (default) | No restrictions | | `no_training` | Provider must not use data for training | | `zdr` | Zero data retention — strictest policy | The hierarchy is `zdr` > `no_training` > `none`. 
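The hierarchy can be modeled as a simple rank ordering. A minimal sketch for comparing two policies by restrictiveness (the numeric ranks are illustrative, not part of the API):

```python
# Higher rank = stronger data-handling guarantees.
POLICY_RANK = {"none": 0, "no_training": 1, "zdr": 2}

def most_restrictive(a: str, b: str) -> str:
    """Return whichever data policy imposes the stronger guarantee."""
    return a if POLICY_RANK[a] >= POLICY_RANK[b] else b

print(most_restrictive("no_training", "zdr"))   # zdr
print(most_restrictive("none", "no_training"))  # no_training
```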
When a per-request policy intersects with an account-level policy, the most restrictive one wins. ```python Python response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Sensitive financial data..."}], routing={"data_policy": "zdr"} ) ``` ```typescript TypeScript const response = await client.chat.completions.create({ model: "gpt-4o", messages: [{ role: "user", content: "Sensitive financial data..." }], routing: { data_policy: "zdr" }, }); ``` --- ## Page: Advanced Routing > Section: Provider alias normalization Provider names in `providers` and `exclude_providers` are case-insensitive and support aliases: | Alias | Canonical name | |-------|----------------| | `google`, `google_ai`, `googleai`, `gemini` | `google_ai_studio` | | `fireworks` | `fireworks_ai` | | `together` | `together_ai` | Unrecognized names pass through as-is (lowercased). --- ## Page: Advanced Routing > Section: Configure fallbacks By default, Auriko retries with alternative providers on 429 (rate limit), 5xx (server error), and timeout responses. | Setting | Default | Description | |---------|---------|-------------| | `allow_fallbacks` | `true` | Enable automatic fallback to alternative providers | | `max_fallback_attempts` | 3 | Maximum fallback attempts (not counting the primary attempt) | Timeouts: 10 seconds for the first byte on streaming requests, 60 seconds total for non-streaming requests. ```python Python response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}], routing={ "allow_fallbacks": True, "max_fallback_attempts": 5 } ) ``` ```typescript TypeScript const response = await client.chat.completions.create({ model: "gpt-4o", messages: [{ role: "user", content: "Hello!" 
}], routing: { allow_fallbacks: true, max_fallback_attempts: 5, }, }); ``` --- ## Page: Advanced Routing > Section: Set workspace defaults Set default routing options for all requests in a workspace: ```bash # Get current defaults curl https://api.auriko.ai/v1/workspaces/{workspace_id}/routing-defaults \ -H "Authorization: Bearer $SESSION_JWT" # Set defaults curl -X PATCH https://api.auriko.ai/v1/workspaces/{workspace_id}/routing-defaults \ -H "Authorization: Bearer $SESSION_JWT" \ -H "Content-Type: application/json" \ -d '{ "optimize": "cost", "data_policy": "no_training" }' ``` Routing defaults use session authentication, not API key authentication. See [Authentication](/api-reference/authentication). To clear all routing defaults, send an empty object `{}` as the PATCH body. Per-request `routing` options override workspace defaults. Model suffix overrides sit between workspace defaults and per-request options. ### Precedence 1. Per-request `routing` options (highest) 2. Model suffix overrides (for example, `:floor`) 3. Workspace routing defaults (lowest) === # Auriko API Reference --- ## Page: Introduction Auriko is an LLM routing layer that applies quantitative trading methodology to inference optimization. You can access a growing list of models across providers through a single API, define your own routing strategy, and switch models without changing application code. --- ## Page: Introduction > Section: What Auriko Provides - **[Routing and arbitrage](/guides/routing-options)** — Cost, latency, and quality optimization across models and providers. Auriko runs deep [prompt-caching optimization](/guides/prompt-caching). - **[Automatic failover](/guides/error-handling)** — Redundancy and provider-aware rate limit management.
- **[Budget controls](/guides/budget-management)** — Spending limits and alerts at the workspace or API key level. - **[BYOK](/platform/byok)** — Use your own provider keys, platform keys, or both. Auriko provides native SDKs for [Python](/sdk/python) and [TypeScript](/sdk/typescript). It's also OpenAI-compatible — any existing OpenAI client or [framework](/frameworks/langchain) works without modification. --- ## Page: Introduction > Section: Resources - [Available Models](https://optimal-inference.vercel.app/models) — Supported models and providers - [Pricing](https://optimal-inference.vercel.app/pricing) — Pricing information --- ## Page: Introduction > Section: Machine-readable sources You can access Auriko's documentation in machine-readable formats for AI agents and programmatic use. - [llms.txt](/llms.txt) — Index of all documentation sections in plaintext, following the [llms.txt standard](https://llmstxt.org/) - [llms-full.txt](/llms-full.txt) — Complete documentation in a single file - [OpenAPI spec](/openapi.yaml) — OpenAPI 3.1 specification for all API endpoints --- ## Page: Quickstart Get your first LLM response through Auriko in under 2 minutes. --- ## Page: Quickstart > Section: Prerequisites - An [Auriko account](https://auriko.ai/signup) with an API key --- ## Page: Quickstart > Section: 1. Get an API Key Create your account and get an API key from the dashboard. **Base URL:** Use `https://api.auriko.ai/v1` as your base URL. This value matches `servers[0].url` in our [OpenAPI spec](https://github.com/zxyaction/optimal_inference/blob/main/api_gateway/openapi/auriko-api.yaml) and is the canonical endpoint for all API requests. --- ## Page: Quickstart > Section: 2. Install ```bash Python pip install auriko ``` ```bash TypeScript npm install @auriko/sdk ``` ```bash OpenAI SDK (Alternative) pip install openai ``` --- ## Page: Quickstart > Section: 3.
Make Your First Request ```python Python (Auriko SDK) import os from auriko import Client client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) response = client.chat.completions.create( model="gpt-5.4", messages=[{"role": "user", "content": "Hello!"}] ) print(response.choices[0].message.content) if response.routing_metadata: print(f"Provider: {response.routing_metadata.provider}") if response.routing_metadata.cost: print(f"Cost: ${response.routing_metadata.cost.billable_cost_usd:.6f}") ``` ```typescript TypeScript import { Client } from "@auriko/sdk"; const client = new Client({ apiKey: process.env.AURIKO_API_KEY, baseUrl: "https://api.auriko.ai/v1", }); const response = await client.chat.completions.create({ model: "gpt-5.4", messages: [{ role: "user", content: "Hello!" }], }); console.log(response.choices[0].message.content); ``` ```python OpenAI SDK (Drop-in) import os from openai import OpenAI client = OpenAI( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) response = client.chat.completions.create( model="gpt-5.4", messages=[{"role": "user", "content": "Hello!"}] ) print(response.choices[0].message.content) ``` --- ## Page: Quickstart > Section: 4. Enable Routing Features (Optional) ```python Python response = client.chat.completions.create( model="gpt-5.4", messages=[{"role": "user", "content": "Hello!"}], routing={ "optimize": "cost", # Optimize for cost "max_ttft_ms": 200, # Max 200ms to first token } ) ``` ```typescript TypeScript const response = await client.chat.completions.create({ model: "gpt-5.4", messages: [{ role: "user", content: "Hello!" }], routing: { optimize: "cost", max_ttft_ms: 200, }, }); ``` --- ## Page: Quickstart > Section: Next Steps
- Full API documentation
- Configure cost/speed optimization
- Real-time streaming responses
- Use with LangChain

--- ## Page: Switching from OpenAI Switch from OpenAI to Auriko in 3 lines of code.
Your existing chat completions, streaming, and tool calling code works without changes. --- ## Page: Switching from OpenAI > Section: Before and after ```python Python # Before (OpenAI) from openai import OpenAI client = OpenAI(api_key="sk-...") # After (Auriko) import os from auriko import Client client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) # Everything else stays the same response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}] ) ``` ```typescript TypeScript // Before (OpenAI) import OpenAI from "openai"; const client = new OpenAI({ apiKey: "sk-..." }); // After (Auriko) import { Client } from "@auriko/sdk"; const client = new Client({ apiKey: process.env.AURIKO_API_KEY, baseUrl: "https://api.auriko.ai/v1", }); // Everything else stays the same const response = await client.chat.completions.create({ model: "gpt-4o", messages: [{ role: "user", content: "Hello!" }], }); ``` --- ## Page: Switching from OpenAI > Section: What works identically All standard OpenAI API features work through Auriko with no code changes: | Feature | Status | |---------|--------| | Chat completions | Fully compatible | | Streaming | Fully compatible | | Tool calling | Fully compatible | | Structured output | Fully compatible | | Models list | Fully compatible | | Async client | Fully compatible | | Error classes | Fully compatible | | Retry logic | Built-in (max 2 retries, exponential backoff) | --- ## Page: Switching from OpenAI > Section: What's new Auriko adds capabilities on top of the OpenAI-compatible interface: - **Routing options** — optimize for cost, speed, or throughput across providers. See [Routing options](/guides/routing-options). - **Cost optimization** — save 30-70% by routing to the cheapest provider. See [Cost optimization](/guides/cost-optimization). - **Prompt caching** — automatic cache injection for all supported providers. See [Prompt caching](/guides/prompt-caching). 
- **Budget management** — set spending limits per workspace, API key, or BYOK provider. See [Budget management](/guides/budget-management). - **Response headers** — every response carries `request_id`, rate limit headers, and credit usage. See [Python SDK](/sdk/python#read-response-headers). --- ## Page: Switching from OpenAI > Section: Use OpenAI SDK directly You don't need the `auriko` package at all. The OpenAI SDK works with a `base_url` override: ```python Python import os from openai import OpenAI client = OpenAI( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) response = client.chat.completions.create( model="claude-sonnet-4-20250514", messages=[{"role": "user", "content": "Hello from Claude via Auriko!"}] ) print(response.choices[0].message.content) ``` ```typescript TypeScript import OpenAI from "openai"; const client = new OpenAI({ apiKey: process.env.AURIKO_API_KEY, baseURL: "https://api.auriko.ai/v1", }); const response = await client.chat.completions.create({ model: "claude-sonnet-4-20250514", messages: [{ role: "user", content: "Hello from Claude via Auriko!" }], }); console.log(response.choices[0].message.content); ``` This approach lets you access any model from any provider (Anthropic, Google, Meta, and more) through the familiar OpenAI client. --- ## Page: Switching from OpenAI > Section: Error mapping When using the OpenAI SDK directly, convert errors to typed Auriko errors with `map_openai_error()`: ```python import os import openai from auriko import map_openai_error, RateLimitError, BudgetExceededError client = openai.OpenAI( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) try: response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}] ) except openai.APIStatusError as e: auriko_error = map_openai_error(e) if isinstance(auriko_error, RateLimitError): print(f"Rate limited. 
Retry after: {auriko_error.response_headers.rate_limit_reset}") elif isinstance(auriko_error, BudgetExceededError): print(f"Budget exceeded: {auriko_error.message}") else: raise auriko_error ``` `map_openai_error()` is Python-only. See [Error Handling](/guides/error-handling) for the full error handling guide. --- ## Page: Switching from OpenAI > Section: Access routing metadata When using the OpenAI SDK directly, extract routing metadata from responses with `parse_routing_metadata()`: ```python from auriko.route_types import parse_routing_metadata response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}] ) metadata = parse_routing_metadata(response) if metadata: print(f"Provider: {metadata.provider}") if metadata.cost: print(f"Cost: ${metadata.cost.billable_cost_usd}") ``` `parse_routing_metadata()` is Python-only. The Auriko SDK exposes `routing_metadata` as a typed property on `ChatCompletion` and `Stream` directly. Use the native SDK for the best experience. For the full native SDK experience with typed responses and errors, see the [Python SDK Guide](/sdk/python) or [TypeScript SDK Guide](/sdk/typescript). --- ## Page: API Overview The Auriko API is OpenAI-compatible, meaning you can use it as a drop-in replacement for OpenAI's API. 
--- ## Page: API Overview > Section: Base URL ``` https://api.auriko.ai/v1 ``` --- ## Page: API Overview > Section: Endpoints ### Inference API | Endpoint | Method | Description | |----------|--------|-------------| | [`/v1/chat/completions`](/api-reference/chat-completions) | POST | Create a chat completion | | [`/v1/me`](/api-reference/get-api-key-identity) | GET | Get API key identity | ### Discovery API | Endpoint | Method | Description | |----------|--------|-------------| | [`/v1/registry/providers`](/api-reference/list-providers) | GET | List available providers | | [`/v1/registry/models`](/api-reference/list-registry-models) | GET | List canonical models | | [`/v1/directory/models`](/api-reference/list-directory-models) | GET | Full model directory with pricing | ### Management API | Resource | Endpoints | Description | |----------|-----------|-------------| | [Workspaces](/api-reference/create-workspace) | 4 | Create, list, get, update workspaces | | [Routing](/api-reference/get-routing-defaults) | 2 | Get and update workspace routing defaults | | [Budgets](/api-reference/list-budgets) | 5 | CRUD for spending limits | | [API Keys](/api-reference/create-api-key) | 4 | Create, list, revoke keys; usage stats | | [Billing](/api-reference/get-credit-balance) | 1 | Credit balance | | [Provider Keys](/api-reference/add-provider-key) | 7 | BYOK key management | Management API endpoints use session authentication. Workspace and budget reads also accept API keys. See [Authentication](/api-reference/authentication#session-authentication). See also: [Team management](/platform/team-management), [Budget management](/guides/budget-management), [Bring Your Own Key](/platform/byok). --- ## Page: API Overview > Section: OpenAI Compatibility Auriko supports the same request/response format as OpenAI. Existing code using OpenAI's API can switch to Auriko by changing: 1. **Base URL:** `https://api.openai.com/v1` → `https://api.auriko.ai/v1` 2. 
**API Key:** Use your Auriko API key (starts with `ak_`) --- ## Page: API Overview > Section: Auriko Extensions In addition to OpenAI-compatible fields, Auriko responses carry: - **`routing_metadata`** - Information about which provider handled the request - **`routing`** (request) - Options to optimize for cost, speed, or throughput - **`auriko_metadata`** (request) - Custom tags and trace IDs for request tracking ### Response Extensions Every chat completion response carries a `routing_metadata` object: ```json { "routing_metadata": { "provider": "openai", "provider_model_id": "gpt-5.4", "tier": "standard", "model_canonical": "gpt-5.4", "routing_strategy": "balanced", "total_latency_ms": 1234, "ttft_ms": 312, "candidates_total": 5, "candidates_viable": 3, "routing_decision_ms": 2.1, "cost": { "input_tokens": 100, "output_tokens": 50, "provider_cost_usd": 0.000135, "billable_cost_usd": 0.00015 } } } ``` #### Field reference | Field | Type | Always present | Description | |-------|------|----------------|-------------| | `provider` | string | yes | Provider that served the request (e.g., `openai`, `anthropic`) | | `provider_model_id` | string | yes | Provider's internal model ID | | `tier` | string | no | Pricing tier when applicable (e.g., `flex`, `standard`). Omitted for providers without tiers. | | `model_canonical` | string | yes | Canonical model ID from the request | | `routing_strategy` | string | yes | Strategy used: `cost`, `speed`, `balanced`, `ttft`, `throughput`, `cheapest`, or `custom`. `custom` is returned when explicit `routing.weights` are provided. | | `candidates_total` | number | yes | Total provider candidates before filtering | | `candidates_viable` | number | yes | Candidates remaining after constraint filtering | | `routing_decision_ms` | number | yes | Time spent on the routing decision (ms) | | `ttft_ms` | number | no | Time to first token (ms). Streaming responses only. 
| | `total_latency_ms` | number | yes | Total request latency (ms) | | `cost` | object | no | Cost breakdown. Present when token usage is available and pricing is token-based. | | `fallback_chain` | array | no | Fallback attempt history. Present only when the primary provider failed and a fallback was used. | | `warnings` | string[] | no | Warnings about ignored or unsupported routing configuration. Omitted when empty. | **`cost` fields:** `input_tokens` (number), `output_tokens` (number), `provider_cost_usd` (number), `billable_cost_usd` (number). **`fallback_chain` entries:** `provider` (string), `status` (`"success"` or `"failed"`), `reason` (string, present on failed entries only). ### Request Extensions Pass routing options to optimize your requests: ```json { "model": "gpt-5.4", "messages": [...], "routing": { "optimize": "cost", "max_ttft_ms": 200 } } ``` ### Request metadata Attach custom metadata to requests for tracking and observability. Auriko strips `auriko_metadata` before forwarding to the provider. The `auriko_` prefix avoids collision with OpenAI/Anthropic native `metadata` fields. ```json { "model": "gpt-5.4", "messages": [...], "auriko_metadata": { "tags": ["production", "chatbot"], "user_id": "user_abc123", "trace_id": "trace_xyz789", "custom_fields": { "environment": "production", "feature": "customer-support" } } } ``` | Field | Type | Limits | |-------|------|--------| | `tags` | `string[]` | Max 100 tags, each max 50 chars | | `user_id` | `string` | Max 255 chars | | `trace_id` | `string` | Max 255 chars | | `custom_fields` | `Record<string, string>` | Max 10 fields, keys max 50 chars, values max 200 chars | --- ## Page: API Overview > Section: Response headers Every response carries custom headers organized by category.
### Request tracing | Header | Description | |--------|-------------| | `X-Request-ID` | Unique request identifier for support and debugging | ### Routing headers | Header | Description | |--------|-------------| | `X-Provider-Used` | Provider that served the request | | `X-Model-Requested` | Model ID from the original request | | `X-Model-Canonical` | Canonical model ID after alias resolution | | `X-Model-Used` | Actual model ID sent to the provider | | `X-Routing-Strategy` | Strategy used (`cost`, `speed`, `balanced`, etc.) | | `X-Routing-Time-Ms` | Time spent on routing decision | | `X-Api-Key-Source` | Key type used (`platform` or `byok`) | | `X-Multi-Model-Count` | Number of models in a multi-model request | ### Fallback headers | Header | Description | |--------|-------------| | `X-Fallback-Enabled` | Whether fallback is enabled for this request | | `X-Fallback-Used` | Whether a fallback was triggered | | `X-Fallback-Depth` | Number of fallback attempts made | | `X-Fallback-Original-Provider` | Provider of the first (failed) attempt | | `X-Fallback-Attempted-Providers` | Comma-separated list of all attempted providers | | `X-Fallback-Reason` | Reason the primary provider failed | | `X-Fallback-Total-Time-Ms` | Total time across all attempts | | `X-Fallback-Max-Attempts` | Maximum fallback attempts configured | ### Error diagnostic headers | Header | Description | |--------|-------------| | `X-Error-Provider` | Provider that returned the error | | `X-Error-Type` | Error classification | | `X-Error-Retryable` | Whether the error is safe to retry | ### Billing headers | Header | Description | |--------|-------------| | `X-Credits-Balance-Microdollars` | Current credit balance in microdollars | | `X-Credits-Tier` | Current billing tier | Per-key rate limit headers are covered in [Rate limits](/platform/rate-limits#rate-limit-headers). ### Budget headers Returned when budgets are configured. Error headers appear on 402 responses when a budget is exceeded. 
Spend and limit headers appear on successful responses. | Header | Description | |--------|-------------| | `X-Budget-Exceeded` | `true` when the request was rejected for exceeding a budget (402 only) | | `X-Budget-Exceeded-Period` | Budget period that was exceeded: `daily`, `weekly`, or `monthly` (402 only) | | `X-Budget-Exceeded-Scope` | Scope of the exceeded budget: `workspace`, `api_key`, or `byok_provider` (402 only) | | `X-Budget-Daily-Spend` | Current daily spend in USD | | `X-Budget-Daily-Limit` | Configured daily budget limit in USD | | `X-Budget-Weekly-Spend` | Current weekly spend in USD | | `X-Budget-Weekly-Limit` | Configured weekly budget limit in USD | | `X-Budget-Monthly-Spend` | Current monthly spend in USD | | `X-Budget-Monthly-Limit` | Configured monthly budget limit in USD | Spend and limit headers appear only for budget periods that are configured. If a workspace has only a monthly budget, daily and weekly headers are omitted. ### Cache headers | Header | Description | |--------|-------------| | `X-Cache-Savings-Percent` | Percentage of tokens saved via prompt caching | --- ## Page: Authentication All API requests require authentication using a Bearer token. --- ## Page: Authentication > Section: API Keys API keys are prefixed with `ak_` and can be created in your [dashboard](https://auriko.ai/dashboard). Keep your API key secret. Do not share it or commit it to version control.
--- ## Page: Authentication > Section: Use your API key Include your API key in the `Authorization` header: ```bash curl https://api.auriko.ai/v1/chat/completions \ -H "Authorization: Bearer $AURIKO_API_KEY" \ -H "Content-Type: application/json" \ -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}' ``` --- ## Page: Authentication > Section: SDK Authentication ```python Python import os from auriko import Client # Option 1: Pass via environment variable (recommended) client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) # Option 2: Auto-detect from AURIKO_API_KEY env var client = Client(base_url="https://api.auriko.ai/v1") ``` ```typescript TypeScript import { Client } from "@auriko/sdk"; // Option 1: Pass via environment variable (recommended) const client = new Client({ apiKey: process.env.AURIKO_API_KEY, baseUrl: "https://api.auriko.ai/v1", }); // Option 2: Auto-detect from AURIKO_API_KEY env var const client = new Client({ baseUrl: "https://api.auriko.ai/v1", }); ``` --- ## Page: Authentication > Section: Environment Variables Set your API key as an environment variable for security: ```bash export AURIKO_API_KEY=ak_your_api_key_here ``` Then use the SDK without passing the key directly: ```python from auriko import Client client = Client(base_url="https://api.auriko.ai/v1") ``` --- ## Page: Authentication > Section: Error Responses | Status | Code | Description | |--------|------|-------------| | 401 | `invalid_api_key` | API key is invalid or missing | ```json 401 Response { "error": { "message": "Invalid API key provided", "type": "invalid_request_error", "code": "invalid_api_key" } } ``` --- ## Page: Authentication > Section: Session authentication Management API endpoints use session tokens for authentication. Workspace and budget read endpoints also accept API keys. The dashboard handles session authentication automatically. 
For programmatic access, use the returned `access_token` from your sign-in flow as a Bearer token: ```bash curl https://api.auriko.ai/v1/workspaces \ -H "Authorization: Bearer $SESSION_JWT" ``` --- ## Page: Authentication > Section: Authentication summary | Category | Auth method | Token prefix | |----------|-------------|-------------| | Inference (`/v1/chat/completions`, `/v1/models`, `/v1/me`) | API key | `ak_` | | Discovery (`/v1/registry/*`, `/v1/directory/*`) | None required | — | | Workspace & budget reads | API key or Session token | `ak_` / JWT | | All other management | Session token | JWT | --- ## Page: Errors Auriko uses standard HTTP status codes and returns detailed error information in the response body. --- ## Page: Errors > Section: Error response format All errors follow this format: ```json { "error": { "message": "Human-readable error message", "type": "error_type", "code": "error_code", "param": "optional_parameter_name" } } ``` --- ## Page: Errors > Section: HTTP status codes | Status | Description | |--------|-------------| | 400 | Bad Request — Invalid parameters or missing required fields | | 401 | Unauthorized — Invalid API key or BYOK provider key | | 402 | Payment Required — Insufficient credits or budget exceeded | | 403 | Forbidden — Insufficient permissions for this action | | 404 | Not Found — Model not in catalog | | 429 | Too Many Requests — Rate limit exceeded | | 500 | Internal Server Error — Unexpected Auriko error | | 502 | Bad Gateway — Upstream provider error | | 503 | Service Unavailable — No providers available or service down | | 504 | Gateway Timeout — Upstream provider timeout | --- ## Page: Errors > Section: Error codes ### Authentication (401) | Code | Description | |------|-------------| | `invalid_api_key` | The API key is invalid or missing | | `provider_auth_error` | BYOK key authentication failed at the upstream provider | `provider_auth_error` means your own provider key (BYOK) was rejected. 
This is distinct from `invalid_api_key`, which means your Auriko API key is invalid. ### Request errors (400) | Code | Description | |------|-------------| | `invalid_request` | The request body is malformed or a parameter has an invalid value | | `missing_required_parameter` | A required parameter is missing | The `param` field in the error response identifies the offending field (for example, `messages` or `routing.only_byok`). ### Billing errors (402) | Code | Description | |------|-------------| | `insufficient_quota` | Account has insufficient credits | | `budget_exceeded` | Budget limit hit (workspace, key, or BYOK scope) | ### Authorization (403) | Code | Description | |------|-------------| | `forbidden` | You don't have permission for this action | API keys are read-only for management endpoints (budgets, workspaces). Write operations require session authentication. See [Budget management](/guides/budget-management#authentication). ### Not found (404) | Code | Description | |------|-------------| | `model_not_found` | The specified model is not in the catalog | ### Rate limiting (429) | Code | Description | |------|-------------| | `rate_limit_exceeded` | Too many requests — back off and retry | ### Server errors (500) | Code | Description | |------|-------------| | `internal_error` | An unexpected error occurred on Auriko's side | ### Provider errors (502, 503, 504) | Code | Status | Description | |------|--------|-------------| | `provider_error` | 502 | Upstream provider returned an error | | `provider_error` | 504 | Upstream provider timed out | | `no_providers_available` | 503 | No providers can serve this model/request | | `service_unavailable` | 503 | Auriko service temporarily unavailable | --- ## Page: Errors > Section: Retry guidance | Status | Retryable | Recommended action | |--------|-----------|-------------------| | 400 | No | Fix the request parameters | | 401 (`invalid_api_key`) | No | Check your Auriko API key | | 401 
(`provider_auth_error`) | No | Check your BYOK provider key | | 402 | No | Add credits or raise budget limit | | 403 | No | Check permissions or use session authentication | | 404 | No | Use a valid model name (see [Models](/models)) | | 429 | Yes | Back off using `Retry-After` header or exponential backoff | | 500 | Yes | Retry with backoff | | 502 | Yes | Retry — different provider may be selected | | 503 | Yes | Retry with backoff | | 504 | Yes | Retry — upstream provider timed out | --- ## Page: Errors > Section: Provider error mapping When an upstream provider returns an error, Auriko maps it to a client-facing response: | Upstream condition | Client status | Client code | |-------------------|---------------|-------------| | 504 timeout | 504 | `provider_error` | | Other 5xx | 502 | `provider_error` | | 401 auth failure | 401 | `provider_auth_error` | | 429 rate limit | 429 | `rate_limit_exceeded` | | 400 bad request | 400 | `invalid_request` | | Other 4xx | 502 | `provider_error` | --- ## Page: Errors > Section: Handle SDK errors ```python Python import os from auriko import ( Client, AurikoAPIError, AuthenticationError, RateLimitError, BudgetExceededError, ModelNotFoundError, ProviderError, # Also available: InvalidRequestError, InsufficientCreditsError, # InternalError, ProviderAuthError, ServiceUnavailableError ) client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) try: response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}] ) except AuthenticationError as e: print(f"Check your API key: {e}") except RateLimitError as e: print(f"Rate limited, retry later: {e}") except BudgetExceededError as e: print(f"Budget exceeded: {e}") except ModelNotFoundError as e: print(f"Model not found: {e}") except ProviderError as e: print(f"Provider error: {e}") except AurikoAPIError as e: # Catches all other Auriko errors print(f"API error
({e.status_code}): {e}") ``` ```typescript TypeScript import { Client, AurikoAPIError, AuthenticationError, RateLimitError, BudgetExceededError, ModelNotFoundError, ProviderError, // Also available: InvalidRequestError, InsufficientCreditsError, // InternalError, ProviderAuthError, ServiceUnavailableError } from "@auriko/sdk"; const client = new Client({ apiKey: process.env.AURIKO_API_KEY, baseUrl: "https://api.auriko.ai/v1", }); try { const response = await client.chat.completions.create({ model: "gpt-4o", messages: [{ role: "user", content: "Hello!" }], }); } catch (e) { if (e instanceof AuthenticationError) { console.log(`Check your API key: ${e.message}`); } else if (e instanceof RateLimitError) { console.log(`Rate limited, retry later: ${e.message}`); } else if (e instanceof BudgetExceededError) { console.log(`Budget exceeded: ${e.message}`); } else if (e instanceof ModelNotFoundError) { console.log(`Model not found: ${e.message}`); } else if (e instanceof ProviderError) { console.log(`Provider error: ${e.message}`); } else if (e instanceof AurikoAPIError) { console.log(`API error (${e.statusCode}): ${e.message}`); } } ``` --- ## Page: Errors > Section: Retry logic For transient errors (429, 500, 502, 503, 504), the SDK includes built-in retries with exponential backoff. See the [Error Handling guide](/guides/error-handling) for retry configuration and custom retry patterns. ```python import os from auriko import Client # Default: 2 retries with exponential backoff client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) # Disable retries for manual control client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1", max_retries=0 ) ``` The Auriko SDK includes built-in retry logic with configurable `max_retries` parameter. The SDK respects `Retry-After` headers when present. --- ## Page: Create Chat Completion Creates a model response for the given chat conversation. 
Auriko routes the request to the optimal provider based on your routing preferences (cost, speed, throughput, etc.). --- ## Page: Create Chat Completion > Section: Auriko Extensions Beyond OpenAI compatibility, this endpoint supports: - **Multi-model routing**: Use `models[]` instead of `model` to route across multiple models - **Routing options**: Control provider selection with the `routing` object - **Provider extensions**: Pass provider-specific parameters with `extensions` - **Cost transparency**: Response includes `routing_metadata` with cost breakdown All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: Get API key identity Returns the identity associated with your API key. Use this to discover your `workspace_id` for management API calls. --- ## Page: List Available Providers Returns all LLM providers available on the Auriko platform. This endpoint is public and does not require authentication. All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: List Canonical Models Returns all canonical models in the Auriko registry with provider availability and metadata. This endpoint is public and does not require authentication. All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: Full Model Directory Returns the complete model directory with detailed provider information including context windows, capabilities, pricing tiers, and modalities. This is the richest model metadata endpoint. This endpoint is public and does not require authentication. All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: Create a Workspace Creates a new workspace. The authenticated user becomes the owner. 
This endpoint uses session authentication, not API key authentication. See [Authentication](/api-reference/authentication). See also: [Team management](/platform/team-management). All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: List Workspaces Lists all workspaces the authenticated user is a member of. This endpoint accepts both API key and session authentication. See [Authentication](/api-reference/authentication). All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: Get Workspace Details Returns details for a workspace the authenticated user is a member of. This endpoint accepts both API key and session authentication. See [Authentication](/api-reference/authentication). All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: Update Workspace Updates workspace settings. Requires owner role. This endpoint uses session authentication, not API key authentication. See [Authentication](/api-reference/authentication). All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: Get Routing Defaults Returns the workspace routing defaults. Any workspace member can read. This endpoint uses session authentication, not API key authentication. See [Authentication](/api-reference/authentication). See also: [Routing options](/guides/routing-options). All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: Update Routing Defaults Updates workspace routing defaults. Requires owner or admin role. 
- Present fields are updated; omitted fields retain their current value - Fields set to `null` are cleared - Empty object `{}` clears all routing defaults This endpoint uses session authentication, not API key authentication. See [Authentication](/api-reference/authentication). See also: [Routing options](/guides/routing-options). All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: List Budgets Lists all budgets for the workspace. Any workspace member can read. This endpoint accepts both API key and session authentication. See [Authentication](/api-reference/authentication). See also: [Budget management](/guides/budget-management). All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: Create a Budget Creates a budget for the workspace. Requires owner or admin role. - **`workspace`**: Applies to all usage in the workspace - **`api_key`**: Applies to a specific API key (requires `scope_id`) - **`byok_provider`**: Applies to a BYOK provider (requires `scope_provider`) This endpoint uses session authentication, not API key authentication. See [Authentication](/api-reference/authentication). See also: [Budget management](/guides/budget-management). All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: Get Budget Details Returns a single budget with current spend. Any workspace member can read. This endpoint accepts both API key and session authentication. See [Authentication](/api-reference/authentication). See also: [Budget management](/guides/budget-management). All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: Update a Budget Updates a budget. Requires owner or admin role. 
At least one field must be provided. This endpoint uses session authentication, not API key authentication. See [Authentication](/api-reference/authentication). See also: [Budget management](/guides/budget-management). All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: Delete a Budget Deletes a budget. Requires owner or admin role. This action is irreversible. The budget and its spend history will be permanently removed. This endpoint uses session authentication, not API key authentication. See [Authentication](/api-reference/authentication). See also: [Budget management](/guides/budget-management). All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: Create an API Key Creates a new API key for the workspace. Requires owner or admin role. The full API key is returned exactly once in the response. It is never stored or retrievable again. Save it securely. This endpoint uses session authentication, not API key authentication. See [Authentication](/api-reference/authentication). All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: List API Keys Lists API keys for the workspace. Any workspace member can read. Keys are returned with prefixes only — full keys are never retrievable. This endpoint uses session authentication, not API key authentication. See [Authentication](/api-reference/authentication). All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: Revoke an API Key Revokes an API key. Owners and admins can revoke any key in the workspace. Members can revoke keys they created. This endpoint uses session authentication, not API key authentication. 
See [Authentication](/api-reference/authentication). All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: Get API Key Usage Returns usage statistics for an API key. Any workspace member can read. This endpoint uses session authentication, not API key authentication. See [Authentication](/api-reference/authentication). All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: Get Credit Balance Returns the workspace credit balance, tier, and billing configuration. Requires owner role. This endpoint uses session authentication, not API key authentication. See [Authentication](/api-reference/authentication). All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: List Supported Providers Returns the list of providers that support bring-your-own-key (BYOK). This endpoint is public and does not require authentication. See also: [Bring Your Own Key](/platform/byok). All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: Add Provider Key Adds a provider API key (BYOK) to the workspace. This endpoint uses session authentication, not API key authentication. See [Authentication](/api-reference/authentication). See also: [Bring Your Own Key](/platform/byok). All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: List Provider Keys Lists all provider API keys (BYOK) for the workspace. This endpoint uses session authentication, not API key authentication. See [Authentication](/api-reference/authentication). See also: [Bring Your Own Key](/platform/byok). 
All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: Delete Provider Key Deletes a provider API key from the workspace. This endpoint uses session authentication, not API key authentication. See [Authentication](/api-reference/authentication). See also: [Bring Your Own Key](/platform/byok). All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: Validate Provider Key Re-validates a provider API key to check if it is still active. This endpoint uses session authentication, not API key authentication. See [Authentication](/api-reference/authentication). See also: [Bring Your Own Key](/platform/byok). All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: Set Default Provider Key Sets the specified provider key as the default for its provider. This endpoint uses session authentication, not API key authentication. See [Authentication](/api-reference/authentication). See also: [Bring Your Own Key](/platform/byok). All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: Update Provider Key Tier Updates the tier associated with a provider key. This endpoint uses session authentication, not API key authentication. See [Authentication](/api-reference/authentication). See also: [Bring Your Own Key](/platform/byok). All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- # OpenAPI Specification Reference The following is extracted from the OpenAPI spec — the canonical source of truth for endpoint parameters, request/response schemas, and error codes. --- ## Authentication (ApiKeyAuth) API key authentication. Keys start with `ak_` prefix. 
Example: `Authorization: Bearer ak_live_xxxxxxxxxxxx` ## Authentication (UserAuth) User session authentication for management endpoints. Use a Supabase session JWT as the bearer token. Example: `Authorization: Bearer eyJhbGciOiJIUzI1NiIs...` ## Endpoint: POST /v1/chat/completions **Create a chat completion** Creates a model response for the given chat conversation. Auriko routes the request to the optimal provider based on your routing preferences (cost, speed, throughput, etc.). ## Streaming When `stream: true`, responses are delivered as Server-Sent Events (SSE). The final event contains `routing_metadata` with routing decision details. ## Multi-Model Routing Use `models[]` instead of `model` to enable multi-model routing: - `routing.mode: "pool"` (default): Best provider across all models - `routing.mode: "fallback"`: Try models in order ### Request Parameters - `model` (string, optional): Model ID to use. Mutually exclusive with `models`. Providing both `model` and `models` returns 400. Examples: `gpt-4o`, `claude-3-5-sonnet`, `llama-3.1-70b` - `models` (array[string], optional): Auriko extension: Multi-model routing. Mutually exclusive with `model`. Providing both returns 400. Allows routing across multiple models. Use with `routing.mode`: - `pool` (default): Route to best provider across all models - `fallback`: Try models in order until one succeeds See `routing_metadata.fallback_chain` in the response for the sequence of providers attempted when using `fallback` mode. 
- `messages` (array[Message], required): The messages to generate a completion for - `temperature` (number, optional): Sampling temperature (0-2) - `top_p` (number, optional): Nucleus sampling parameter - `max_tokens` (integer, optional): Maximum tokens to generate (legacy, use max_completion_tokens) - `max_completion_tokens` (integer, optional): Maximum tokens to generate (preferred for o1/o3 models) - `stop` (string | array, optional): Stop sequences - `presence_penalty` (number, optional): Presence penalty (-2 to 2) - `frequency_penalty` (number, optional): Frequency penalty (-2 to 2) - `logit_bias` (object, optional): Token logit bias - `seed` (integer, optional): Random seed for reproducibility - `tools` (array[Tool], optional): Tools the model can call - `tool_choice` (ToolChoice, optional): - `parallel_tool_calls` (boolean, optional): Allow parallel tool calls - `functions` (array[FunctionDefinition], optional): **Deprecated.** Use `tools` instead. Auto-converted. - `function_call` (string | object, optional): **Deprecated.** Use `tool_choice` instead. Auto-converted. - `response_format` (ResponseFormat, optional): - `type` (string, required): Response format type: - `text`: Plain text response (default) - `json_object`: JSON mode - model outputs valid JSON - `json_schema`: Structured output - model follows provided schema Values: `text`, `json_object`, `json_schema`. - `json_schema` (object, optional): Required when type is `json_schema` - `stream` (boolean, optional): Enable streaming responses Default: `false`. - `stream_options` (StreamOptions, optional): - `include_usage` (boolean, optional): Include token usage in final streaming chunk - `user` (string, optional): User identifier for abuse detection - `n` (integer, optional): Number of completions to generate Default: `1`.
- `logprobs` (boolean, optional): Return log probabilities - `top_logprobs` (integer, optional): Number of top logprobs to return - `routing` (RoutingOptions, optional): Auriko routing configuration (15 fields). Controls how Auriko selects providers for your request. All fields are optional. Setting a field to `null` is equivalent to omitting it. - `extensions` (Extensions, optional): Auriko extensions for normalized features and provider-specific passthrough. ## Normalized Features These are translated to provider-native format automatically: - `thinking`: Enable thinking/reasoning mode ## Provider Passthrough Pass provider-specific parameters directly: - `anthropic`: Anthropic-specific parameters - `openai`: OpenAI-specific parameters - `google`: Google/Gemini-specific parameters - `deepseek`: DeepSeek-specific parameters Passthrough parameters are forwarded as-is to the target provider. - `auriko_metadata` (RequestMetadata, optional): Optional request metadata for tracking and observability. Attached via the `auriko_metadata` field on chat completion requests. Field name uses `auriko_metadata` (not `metadata`) to avoid collision with OpenAI's native metadata field. - `tags` (array[string], optional): Tags for categorizing requests (max 100 items, each ≤50 chars) - `user_id` (string, optional): Your application's user identifier for per-user analytics - `trace_id` (string, optional): Distributed tracing identifier to correlate with your observability stack - `custom_fields` (object, optional): Arbitrary key-value pairs (max 10 keys, keys ≤50 chars, values ≤200 chars) ### Request Examples **Basic completion**: ```json { "model": "gpt-4o", "messages": [ { "role": "user", "content": "Hello!"
} ] } ``` **With routing options**: ```json { "model": "claude-3-5-sonnet", "messages": [ { "role": "user", "content": "Explain quantum computing" } ], "routing": { "optimize": "cost", "max_cost_per_1m": 5.0 } } ``` **Multi-model routing**: ```json { "models": [ "gpt-4o", "claude-3-5-sonnet" ], "messages": [ { "role": "user", "content": "Hello!" } ], "routing": { "mode": "pool", "optimize": "cost" } } ``` ### Response (200) Successful completion. For streaming (`stream: true`), responses are Server-Sent Events. Each event is a `ChatCompletionChunk`. The final chunk has `choices: []` (empty) and contains `usage` and `routing_metadata`. Stream ends with `data: [DONE]`. ### Response Properties - `id` (string, required): Unique completion identifier - `object` (string, required): - `created` (integer, required): Unix timestamp of creation - `model` (string, required): Model used for completion - `choices` (array[Choice], required): Completion choices - `usage` (Usage, optional): - `prompt_tokens` (integer, required): Input tokens used - `completion_tokens` (integer, required): Output tokens generated - `total_tokens` (integer, required): Total tokens (prompt + completion) - `prompt_tokens_details` (PromptTokensDetails, optional): Detailed breakdown of prompt tokens - `completion_tokens_details` (CompletionTokensDetails, optional): Breakdown of completion tokens. Provider-dependent: present when the upstream provider reports token-level details, absent otherwise. - `system_fingerprint` (string, optional): System fingerprint for reproducibility - `routing_metadata` (RoutingMetadata, optional): Routing decision metadata included in all responses. Provides transparency into how Auriko selected the provider. ### Error Responses - **400**: Bad request - invalid parameters - **401**: Authentication failed - **402**: Insufficient credits. 
- `insufficient_quota`: workspace balance too low - **404**: Model not found - **429**: Rate limit exceeded - **500**: Internal server error - **502**: Upstream provider failure (Bad Gateway). All non-timeout 5xx errors from upstream providers are normalized to 502. Provider errors may also surface as: - 400: Invalid request passthrough (code: `invalid_request`) - 401: BYOK key auth failure (code: `provider_auth_error`) - 429: Provider rate limit (code: `rate_limit_exceeded`) These use their respective status codes with the same `ErrorResponse` body format. - **503**: Service unavailable — transient issue. Possible causes: - All providers for the model are rate-limited or unhealthy (`no_providers_available`) - Transient infrastructure issue such as KV outage (`service_unavailable`) Note: If the model doesn't support a requested capability (e.g., reasoning), the response is 400 `capability_mismatch`, not 503. If routing constraints excluded all providers, the response is 400 `routing_constraint_unsatisfiable`. - **504**: Upstream provider timed out. The client may retry with a longer timeout. ## Endpoint: GET /v1/models **List available models** Lists all models available through Auriko, including provider availability and pricing information. The response includes Auriko-specific extensions: - `providers[]`: Available providers with pricing - `catalog_version`: Version of the model catalog - `catalog_age_seconds`: Age of the catalog data ### Response (200) List of available models ### Response Properties - `object` (string, required): - `data` (array[Model], required): - `catalog_version` (string, optional): Version of the model catalog - `catalog_age_seconds` (number, optional): Age of catalog in seconds ### Error Responses - **401**: Authentication failed - **429**: Rate limit exceeded - **500**: Internal server error ## Endpoint: GET /v1/registry/providers **List available providers** Returns all LLM providers available on the Auriko platform. 
### Response (200) List of providers ### Response Properties - `providers` (array[ProviderResponse], required): Available providers - `count` (integer, required): Total number of providers ### Error Responses - **429**: Rate limit exceeded - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway. - **503**: Service unavailable — transient issue. Possible causes: - All providers for the model are rate-limited or unhealthy (`no_providers_available`) - Transient infrastructure issue such as KV outage (`service_unavailable`) Note: If the model doesn't support a requested capability (e.g., reasoning), the response is 400 `capability_mismatch`, not 503. If routing constraints excluded all providers, the response is 400 `routing_constraint_unsatisfiable`. ## Endpoint: GET /v1/registry/models **List canonical models** Returns all canonical models in the Auriko registry with provider availability and metadata. ### Response (200) List of canonical models ### Response Properties - `models` (array[CanonicalModelResponse], required): Canonical models in the registry - `count` (integer, required): Total number of models - `source` (string, required): Data source (supabase or cache) ### Error Responses - **429**: Rate limit exceeded - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway. - **503**: Service unavailable — transient issue. Possible causes: - All providers for the model are rate-limited or unhealthy (`no_providers_available`) - Transient infrastructure issue such as KV outage (`service_unavailable`) Note: If the model doesn't support a requested capability (e.g., reasoning), the response is 400 `capability_mismatch`, not 503. If routing constraints excluded all providers, the response is 400 `routing_constraint_unsatisfiable`.
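The model catalog endpoints above can be consumed with plain HTTP. A minimal sketch using Python's standard library, assuming each entry in `data[]` carries an `id` (as in OpenAI-compatible model lists); the staleness threshold is an illustrative choice, not an Auriko default:

```python
import json
import os
import urllib.request

STALE_AFTER_SECONDS = 3600  # illustrative threshold, not an Auriko default


def summarize_catalog(payload: dict) -> dict:
    """Reduce a GET /v1/models payload to model IDs plus catalog freshness."""
    return {
        "model_ids": [m["id"] for m in payload.get("data", [])],
        "catalog_version": payload.get("catalog_version"),
        "stale": (payload.get("catalog_age_seconds") or 0) > STALE_AFTER_SECONDS,
    }


if __name__ == "__main__":
    req = urllib.request.Request(
        "https://api.auriko.ai/v1/models",
        headers={"Authorization": f"Bearer {os.environ['AURIKO_API_KEY']}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        print(summarize_catalog(json.load(resp)))
```

`catalog_age_seconds` is optional in the response, so the helper treats a missing value as fresh.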
## Endpoint: GET /v1/directory/models **Full model directory** Returns the complete model directory with detailed provider information including context windows, capabilities, pricing tiers, and modalities. This is the richest model metadata endpoint. ### Response (200) Full model directory ### Response Properties - `models` (object, required): Map of canonical model ID to model entry - `generated_at` (string, required): ISO 8601 timestamp when the directory was generated ### Error Responses - **429**: Rate limit exceeded - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway. - **503**: Service unavailable — transient issue. Possible causes: - All providers for the model are rate-limited or unhealthy (`no_providers_available`) - Transient infrastructure issue such as KV outage (`service_unavailable`) Note: If the model doesn't support a requested capability (e.g., reasoning), the response is 400 `capability_mismatch`, not 503. If routing constraints excluded all providers, the response is 400 `routing_constraint_unsatisfiable`. ## Endpoint: POST /v1/workspaces **Create a workspace** Creates a new workspace. The authenticated user becomes the owner. ### Request Parameters - `name` (string, required): Workspace display name - `slug` (string, optional): URL-friendly workspace slug. Auto-generated from name if omitted. Must be lowercase alphanumeric with hyphens, cannot start or end with a hyphen. ### Error Responses - **400**: Bad request - invalid parameters - **401**: Authentication failed - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway. ## Endpoint: GET /v1/workspaces **List workspaces** Lists all workspaces the authenticated user is a member of. When authenticated with an API key, returns only the key's workspace. 
### Response (200) List of workspaces ### Response Properties - `workspaces` (array[WorkspaceResponse], required): Workspaces the user belongs to - `count` (integer, required): Total number of workspaces ### Error Responses - **401**: Authentication failed - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway. ## Endpoint: GET /v1/workspaces/{workspace_id} **Get workspace details** Returns details for a workspace the authenticated user is a member of. API keys can only access their own workspace. ### Response (200) Workspace details ### Response Properties - `id` (string, required): Workspace identifier - `name` (string, required): Workspace display name - `slug` (string, required): URL-friendly workspace slug - `tier` (string, required): Current billing tier - `billing_email` (string | null, optional): Billing contact email - `created_at` (string, required): When the workspace was created - `updated_at` (string, required): When the workspace was last updated - `member_count` (integer | null, optional): Number of members in the workspace - `user_role` (string | null, optional): Current user's role in this workspace (owner, admin, member) - `can_use_paid_models` (boolean, required): Whether this workspace has credits for paid models ### Error Responses - **400**: Bad request - invalid parameters - **401**: Authentication failed - **404**: Resource not found - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway. ## Endpoint: PATCH /v1/workspaces/{workspace_id} **Update workspace** Updates workspace settings. Requires owner role.
### Request Parameters - `name` (string, optional): Updated workspace display name - `billing_email` (string, optional): Billing contact email ### Response (200) Workspace updated ### Response Properties - `id` (string, required): Workspace identifier - `name` (string, required): Workspace display name - `slug` (string, required): URL-friendly workspace slug - `tier` (string, required): Current billing tier - `billing_email` (string | null, optional): Billing contact email - `created_at` (string, required): When the workspace was created - `updated_at` (string, required): When the workspace was last updated - `member_count` (integer | null, optional): Number of members in the workspace - `user_role` (string | null, optional): Current user's role in this workspace (owner, admin, member) - `can_use_paid_models` (boolean, required): Whether this workspace has credits for paid models ### Error Responses - **400**: Bad request - invalid parameters - **401**: Authentication failed - **403**: You do not have permission to perform this action - **404**: Resource not found - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway. ## Endpoint: GET /v1/workspaces/{workspace_id}/routing-defaults **Get routing defaults** Returns the workspace routing defaults. Any workspace member can read. ### Response (200) Current routing defaults ### Response Properties - `workspace_id` (string, required): Workspace identifier - `routing_defaults` (RoutingDefaults | null, optional): Current routing defaults, or null if none are configured. ### Error Responses - **400**: Bad request - invalid parameters - **401**: Authentication failed - **404**: Resource not found - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway.
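For concreteness, a GET routing-defaults response for a workspace with defaults configured might look like this (all field values are hypothetical):

```json
{
  "workspace_id": "ws_abc123",
  "routing_defaults": {
    "optimize": "cost",
    "data_policy": "no_training",
    "allow_fallbacks": true,
    "max_fallback_attempts": 2,
    "providers": null,
    "exclude_providers": ["together_ai"]
  }
}
```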
## Endpoint: PATCH /v1/workspaces/{workspace_id}/routing-defaults **Update routing defaults** Updates workspace routing defaults. Requires owner or admin role. **Merge semantics:** - Present fields are updated - Omitted fields retain their current value - Fields set to `null` are cleared - Empty object `{}` clears all routing defaults ### Request Parameters - `optimize` (string | null, optional): Default optimization strategy. Values: `cost`, `ttft`, `speed`, `throughput`, `balanced`, `cheapest`, or `null`. - `data_policy` (string | null, optional): Default data retention policy. Values: `none`, `no_training`, `zdr`, or `null`. - `allow_fallbacks` (boolean | null, optional): Whether to allow automatic fallbacks - `max_fallback_attempts` (integer | null, optional): Maximum number of fallback attempts - `providers` (array | null, optional): Default provider allowlist - `exclude_providers` (array | null, optional): Default provider blocklist ### Response (200) Routing defaults updated ### Response Properties - `workspace_id` (string, required): Workspace identifier - `routing_defaults` (RoutingDefaults | null, optional): Current routing defaults, or null if none are configured. ### Error Responses - **400**: Bad request - invalid parameters - **401**: Authentication failed - **403**: You do not have permission to perform this action - **404**: Resource not found - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway. ## Endpoint: GET /v1/workspaces/{workspace_id}/budgets **List budgets** Lists all budgets for the workspace. Any workspace member can read.
### Response (200) List of budgets with current spend ### Response Properties - `budgets` (array[BudgetResponse], required): Budgets for the workspace, with current spend - `count` (integer, required): Total number of budgets ### Error Responses - **400**: Bad request - invalid parameters - **401**: Authentication failed - **404**: Resource not found - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway. ## Endpoint: POST /v1/workspaces/{workspace_id}/budgets **Create a budget** Creates a budget for the workspace. Requires owner or admin role. **Scope types:** - `workspace`: Applies to all usage in the workspace - `api_key`: Applies to a specific API key (requires `scope_id`) - `byok_provider`: Applies to a BYOK provider (requires `scope_provider`) ### Request Parameters - `scope_type` (string, required): Budget scope: - `workspace`: Applies to all usage - `api_key`: Scoped to a specific API key (requires `scope_id`) - `byok_provider`: Scoped to a BYOK provider (requires `scope_provider`) Values: `workspace`, `api_key`, `byok_provider`. - `scope_id` (string | null, optional): API key ID (required when scope_type is `api_key`) - `scope_provider` (string | null, optional): Provider ID (required when scope_type is `byok_provider`) - `period` (string, required): Budget period. Values: `monthly`, `weekly`, `daily`. - `limit_usd` (number, required): Budget limit in USD - `enforce` (boolean, optional): Whether to block requests when budget is exceeded. Default: `true`. - `include_byok` (boolean, optional): Whether to include BYOK usage in budget tracking (workspace scope only). Default: `false`. ### Error Responses - **400**: Bad request - invalid parameters - **401**: Authentication failed - **403**: You do not have permission to perform this action - **404**: Resource not found - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway.
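Putting the create-budget parameters above together, a hypothetical request body for a $100 monthly workspace-wide budget that blocks requests once the limit is reached:

```json
{
  "scope_type": "workspace",
  "period": "monthly",
  "limit_usd": 100.0,
  "enforce": true,
  "include_byok": false
}
```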
## Endpoint: GET /v1/workspaces/{workspace_id}/budgets/{budget_id} **Get budget details** Returns a single budget with current spend. Any workspace member can read. ### Response (200) Budget details with current spend ### Response Properties - `id` (string, required): Budget identifier - `workspace_id` (string, required): Workspace this budget belongs to - `scope_type` (string, required): Budget scope type. Values: `workspace`, `api_key`, `byok_provider`. - `scope_id` (string | null, optional): Scoped API key ID (when scope_type is api_key) - `scope_provider` (string | null, optional): Scoped provider ID (when scope_type is byok_provider) - `period` (string, required): Budget period. Values: `monthly`, `weekly`, `daily`. - `limit_usd` (number, required): Budget limit in USD - `enforce` (boolean, required): Whether requests are blocked when exceeded - `include_byok` (boolean, required): Whether BYOK usage is included - `created_by` (string | null, optional): User ID who created the budget - `created_at` (string, required): When the budget was created - `updated_at` (string, required): When the budget was last updated - `spend_microdollars` (integer, required): Current period spend in microdollars - `spend_usd` (number, required): Current period spend in USD - `percent_used` (number, required): Percentage of budget used (0-100+) ### Error Responses - **400**: Bad request - invalid parameters - **401**: Authentication failed - **404**: Resource not found - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway. ## Endpoint: PATCH /v1/workspaces/{workspace_id}/budgets/{budget_id} **Update a budget** Updates a budget. Requires owner or admin role. At least one field must be provided.
### Request Parameters - `limit_usd` (number, optional): Updated budget limit in USD - `enforce` (boolean, optional): Whether to block requests when budget is exceeded - `include_byok` (boolean, optional): Whether to include BYOK usage in budget tracking ### Response (200) Budget updated ### Response Properties - `id` (string, required): Budget identifier - `workspace_id` (string, required): Workspace this budget belongs to - `scope_type` (string, required): Budget scope type. Values: `workspace`, `api_key`, `byok_provider`. - `scope_id` (string | null, optional): Scoped API key ID (when scope_type is api_key) - `scope_provider` (string | null, optional): Scoped provider ID (when scope_type is byok_provider) - `period` (string, required): Budget period. Values: `monthly`, `weekly`, `daily`. - `limit_usd` (number, required): Budget limit in USD - `enforce` (boolean, required): Whether requests are blocked when exceeded - `include_byok` (boolean, required): Whether BYOK usage is included - `created_by` (string | null, optional): User ID who created the budget - `created_at` (string, required): When the budget was created - `updated_at` (string, required): When the budget was last updated - `spend_microdollars` (integer, required): Current period spend in microdollars - `spend_usd` (number, required): Current period spend in USD - `percent_used` (number, required): Percentage of budget used (0-100+) ### Error Responses - **400**: Bad request - invalid parameters - **401**: Authentication failed - **403**: You do not have permission to perform this action - **404**: Resource not found - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway. ## Endpoint: DELETE /v1/workspaces/{workspace_id}/budgets/{budget_id} **Delete a budget** Deletes a budget. Requires owner or admin role.
### Error Responses - **400**: Bad request - invalid parameters - **401**: Authentication failed - **403**: You do not have permission to perform this action - **404**: Resource not found - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway. ## Endpoint: POST /v1/workspaces/{workspace_id}/keys **Create an API key** Creates a new API key for the workspace. Requires owner or admin role. **Important:** The full API key is returned exactly once in the response. It is never stored or retrievable again. Save it securely. ### Request Parameters - `name` (string, optional): Display name for the API key. Default: `Default Key`. - `rate_limit_rpm` (integer, optional): Custom rate limit in requests per minute (overrides tier default) - `expires_at` (string, optional): Optional expiration time for the API key ### Error Responses - **400**: Bad request - invalid parameters - **401**: Authentication failed - **403**: You do not have permission to perform this action - **404**: Resource not found - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway. ## Endpoint: GET /v1/workspaces/{workspace_id}/keys **List API keys** Lists API keys for the workspace. Any workspace member can read. Keys are returned with prefixes only — full keys are never retrievable. ### Response (200) List of API keys ### Response Properties - `keys` (array[ApiKeyResponse], required): API keys in the workspace - `count` (integer, required): Total number of keys returned - `workspace_id` (string, required): Workspace the keys belong to ### Error Responses - **400**: Bad request - invalid parameters - **401**: Authentication failed - **404**: Resource not found - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway. ## Endpoint: DELETE /v1/workspaces/{workspace_id}/keys/{key_id} **Revoke an API key** Revokes an API key.
Owners and admins can revoke any key in the workspace. Members can revoke keys they created. ### Response (200) API key revoked ### Response Properties - `success` (boolean, required): Whether the revocation succeeded - `message` (string, required): Human-readable result message - `revoked_at` (string, required): When the key was revoked ### Error Responses - **400**: Bad request - invalid parameters - **401**: Authentication failed - **403**: You do not have permission to perform this action - **404**: Resource not found - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway. ## Endpoint: GET /v1/workspaces/{workspace_id}/keys/{key_id}/usage **Get API key usage** Returns usage statistics for an API key. Any workspace member can read. ### Response (200) Key usage statistics ### Response Properties - `key_id` (string, required): API key identifier - `workspace_id` (string, required): Workspace identifier - `period` (string, required): Usage period (day, week, month) - `start_date` (string, required): Period start date (ISO 8601) - `end_date` (string, required): Period end date (ISO 8601) - `request_count` (integer, required): Total requests in the period - `error_count` (integer, required): Total errors in the period - `rate_limit_count` (integer, required): Rate-limited requests in the period - `tokens_input` (integer, required): Total input tokens consumed - `tokens_output` (integer, required): Total output tokens generated - `tokens_total` (integer, required): Total tokens (input + output) - `cost_usd` (number, required): Total cost in USD ### Error Responses - **400**: Bad request - invalid parameters - **401**: Authentication failed - **404**: Resource not found - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway. 
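The usage fields above lend themselves to simple derived metrics. A minimal sketch; the helper name is illustrative, not part of any Auriko SDK:

```python
def usage_summary(usage: dict) -> dict:
    """Derive per-request averages from a key-usage payload
    (request_count, error_count, tokens_total, cost_usd)."""
    n = usage["request_count"]
    if n == 0:
        # Idle key: avoid division by zero.
        return {"avg_tokens_per_request": 0.0, "error_rate": 0.0, "cost_per_request_usd": 0.0}
    return {
        "avg_tokens_per_request": usage["tokens_total"] / n,
        "error_rate": usage["error_count"] / n,
        "cost_per_request_usd": usage["cost_usd"] / n,
    }


sample = {"request_count": 200, "error_count": 4, "tokens_total": 50_000, "cost_usd": 1.50}
print(usage_summary(sample))
# {'avg_tokens_per_request': 250.0, 'error_rate': 0.02, 'cost_per_request_usd': 0.0075}
```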
## Endpoint: GET /v1/workspaces/{workspace_id}/billing/balance **Get credit balance** Returns the workspace credit balance, tier, and billing configuration. Requires owner role. ### Response (200) Credit balance and billing details ### Response Properties - `balance_microdollars` (integer, required): Current balance in microdollars (1 USD = 1,000,000 μ$) - `balance_cents` (integer, required): Current balance in cents (computed) - `balance_dollars` (string, required): Current balance in dollars (computed, string for precision) - `lifetime_purchased_microdollars` (integer, required): Total credits ever purchased in microdollars - `lifetime_purchased_cents` (integer, required): Total credits ever purchased in cents (computed) - `lifetime_used_microdollars` (integer, required): Total credits ever consumed in microdollars - `lifetime_used_cents` (integer, required): Total credits ever consumed in cents (computed) - `auto_reload_enabled` (boolean, required): Whether auto-reload is enabled - `auto_reload_threshold_microdollars` (integer | null, optional): Balance threshold triggering auto-reload - `auto_reload_threshold_cents` (integer | null, optional): Balance threshold in cents (computed) - `auto_reload_amount_microdollars` (integer | null, optional): Target balance amount for auto-reload - `auto_reload_amount_cents` (integer | null, optional): Target balance amount in cents (computed) - `current_tier` (string, required): Current billing tier - `platform_fee_rate` (string | null, optional): Current platform fee rate as decimal string - `tier_volume_usd` (string | null, optional): Lifetime volume in USD for tier calculation - `next_tier_threshold_usd` (string | null, optional): Volume needed to reach next tier in USD - `byok_monthly_cap` (integer | null, optional): Monthly BYOK request cap (null if unlimited) - `byok_monthly_remaining` (integer | null, optional): Remaining BYOK requests this month - `has_payment_method`
(boolean, required): Whether a payment method is on file - `payment_method_last4` (string | null, optional): Last 4 digits of payment method - `payment_method_brand` (string | null, optional): Payment method brand (visa, mastercard, etc.) ### Error Responses - **400**: Bad request - invalid parameters - **401**: Authentication failed - **403**: You do not have permission to perform this action - **404**: Resource not found - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway. ## Endpoint: GET /v1/workspaces/providers **List supported BYOK providers** Returns the list of providers that support bring-your-own-key (BYOK). ### Response (200) List of supported BYOK providers ### Response Properties - `providers` (array[SupportedProviderItem], required): Providers that support BYOK ### Error Responses - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway. ## Endpoint: POST /v1/workspaces/{workspace_id}/provider-keys **Add provider API key** Adds a provider API key (BYOK) to the workspace. ### Request Parameters - `provider` (string, required): Provider identifier (e.g., openai, anthropic) - `api_key` (string, required): The API key to store - `label` (string | null, optional): Friendly name for the key - `is_default` (boolean, optional): Whether this is the default key for the provider. Default: `true`. - `validate_before_save` (boolean, optional): Whether to validate the key before saving. Default: `true`. - `is_enterprise` (boolean, optional): User asserts this is an enterprise key. Default: `false`. - `selected_tier` (string | null, optional): User-selected tier for providers requiring manual selection ### Error Responses - **400**: Bad request - invalid parameters - **401**: Authentication failed - **403**: You do not have permission to perform this action - **500**: Internal server error - **502**: API gateway is unavailable.
The edge worker could not reach the backend gateway. ## Endpoint: GET /v1/workspaces/{workspace_id}/provider-keys **List provider API keys** Lists all provider API keys (BYOK) for the workspace. ### Response (200) List of provider keys ### Response Properties - `keys` (array[ProviderKeyResponse], required): Provider keys in the workspace - `count` (integer, required): Total number of provider keys ### Error Responses - **400**: Bad request - invalid parameters - **401**: Authentication failed - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway. ## Endpoint: DELETE /v1/workspaces/{workspace_id}/provider-keys/{key_id} **Delete provider API key** Deletes a provider API key from the workspace. ### Response (200) Provider key deleted ### Response Properties - `success` (boolean, required): Whether the deletion succeeded - `message` (string, required): Human-readable result message ### Error Responses - **400**: Bad request - invalid parameters - **401**: Authentication failed - **403**: You do not have permission to perform this action - **404**: Resource not found - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway. ## Endpoint: POST /v1/workspaces/{workspace_id}/provider-keys/{key_id}/validate **Re-validate provider API key** Re-validates a provider API key to check if it is still active. ### Response (200) Validation result ### Response Properties - `status` (string, required): Validation outcome. Values: `valid`, `invalid`, `error`. - `message` (string, required): Human-readable validation message - `provider` (string, required): Provider identifier ### Error Responses - **400**: Bad request - invalid parameters - **401**: Authentication failed - **404**: Resource not found - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway. ## Endpoint: POST /v1/workspaces/{workspace_id}/provider-keys/{key_id}/set-default **Set provider key as default** Sets the specified provider key as the default for its provider.
### Response (200) Provider key set as default ### Response Properties - `id` (string, required): Provider key identifier - `workspace_id` (string, required): Workspace the key belongs to - `provider` (string, required): Provider identifier - `provider_name` (string, required): Human-readable provider name - `name` (string, required): Display name for the key - `key_prefix` (string, required): First characters of the stored key - `is_default` (boolean, required): Whether this is the default key for the provider - `validation_status` (string, required): Current validation status. Values: `pending`, `valid`, `invalid`, `error`. - `last_validated_at` (string | null, optional): When the key was last validated - `validation_error` (string | null, optional): Error from the most recent failed validation - `detected_tier` (string | null, optional): Automatically detected provider tier - `tier_source` (string | null, optional): How the tier was determined. Values: `auto_detected`, `user_specified`, `fallback`, or `null`. - `tier_detected_at` (string | null, optional): When the tier was detected - `created_at` (string, required): When the key was created - `updated_at` (string, required): When the key was last updated ### Error Responses - **400**: Bad request - invalid parameters - **401**: Authentication failed - **404**: Resource not found - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway. ## Endpoint: PATCH /v1/workspaces/{workspace_id}/provider-keys/{key_id}/tier **Update provider key tier** Updates the tier associated with a provider key. ### Request Parameters - `tier` (string, required): The tier name to set ### Response (200) Provider key tier updated ### Response Properties - `id` (string, required): Provider key identifier - `workspace_id` (string, required): Workspace the key belongs to - `provider` (string, required): Provider identifier - `provider_name` (string, required): Human-readable provider name - `name` (string, required): Display name for the key - `key_prefix` (string, required): First characters of the stored key - `is_default` (boolean, required): Whether this is the default key for the provider - `validation_status` (string, required): Current validation status. Values: `pending`, `valid`, `invalid`, `error`. - `last_validated_at` (string | null, optional): When the key was last validated - `validation_error` (string | null, optional): Error from the most recent failed validation - `detected_tier` (string | null, optional): Automatically detected provider tier - `tier_source` (string | null, optional): How the tier was determined. Values: `auto_detected`, `user_specified`, `fallback`, or `null`.
- `tier_detected_at` (string | null, optional): When the tier was detected - `created_at` (string, required): When the key was created - `updated_at` (string, required): When the key was last updated ### Error Responses - **400**: Bad request - invalid parameters - **401**: Authentication failed - **404**: Resource not found - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway. ## Endpoint: GET /v1/me **Get API key identity** Returns the identity associated with your API key. Use this to discover your `workspace_id` for management API calls. ### Response (200) API key identity ### Response Properties - `object` (string, required): Object type identifier. Values: `api_key_identity`. - `user_id` (string | null, required): User associated with the key - `workspace_id` (string | null, required): Workspace the key belongs to - `tier` (string | null, required): Workspace billing tier - `rate_limit_rpm` (integer | null, required): Rate limit for this key in requests per minute ### Error Responses - **401**: Authentication failed - **429**: Rate limit exceeded ## Schema: Message Variants (discriminator: `role`): `SystemMessage`, `UserMessage`, `AssistantMessage`, `ToolMessage` **SystemMessage**: - `role` (string, required): Always `system` - `content` (string, required): System instructions - `name` (string, optional): Optional participant name **UserMessage**: - `role` (string, required): Always `user` - `content` (string | array, required): Message text, or an array of content parts - `name` (string, optional): Optional participant name **AssistantMessage**: - `role` (string, required): Always `assistant` - `content` (string | null, optional): Assistant message text - `name` (string, optional): Optional participant name - `tool_calls` (array[ToolCall], optional): Tool calls made by the assistant **ToolMessage**: - `role` (string, required): Always `tool` - `content` (string, required): Tool result content - `tool_call_id` (string, required): ID of the tool call this message answers ## Schema: RoutingOptions Auriko routing configuration (15 fields). Controls how Auriko selects providers for your request. All fields are optional. Setting a field to `null` is equivalent to omitting it.
- `optimize` (string | null, optional): Optimization strategy: - `cost`: Minimize cost per token - `cheapest`: Absolute lowest cost (ignores other dimensions) - `ttft`: Minimize time to first token - `speed`: Minimize total latency + maximize throughput - `throughput`: Maximize tokens per second - `balanced`: Weighted combination (default) Values: `cost`, `ttft`, `speed`, `throughput`, `balanced`, `cheapest`, or `null`. Default: `balanced`. - `weights` (object | null, optional): Custom scoring weights for routing optimization. When provided, overrides the `optimize` preset coefficients. All values must be non-negative. At least one dimension must be > 0. Unspecified dimensions default to 0. Server normalizes to sum to 1.0. - `cost` (number | null, optional): Weight for cost minimization. - `ttft` (number | null, optional): Weight for time-to-first-token optimization. - `throughput` (number | null, optional): Weight for tokens-per-second optimization. - `reliability` (number | null, optional): Weight for provider reliability. - `max_cost_per_1m` (number | null, optional): Maximum cost per 1M tokens (USD) - `max_ttft_ms` (integer | null, optional): Maximum time to first token (milliseconds) - `min_throughput_tps` (number | null, optional): Minimum throughput (tokens per second) - `min_success_rate` (number | null, optional): Minimum provider success rate (0-1) - `providers` (array | null, optional): Provider allowlist. Only consider these providers. Example: `["openai", "anthropic", "fireworks_ai"]` - `exclude_providers` (array | null, optional): Provider blocklist. Exclude these providers. Example: `["together_ai"]` - `prefer` (string | null, optional): Provider to prefer. The preferred provider is selected whenever it meets the active routing constraints.
- `mode` (string | null, optional): How to interpret the `models[]` array: - `pool` (default): Route to the best provider across all models - `fallback`: Try models in order until one succeeds Values: `pool`, `fallback`, or `null`. Default: `pool`. - `allow_fallbacks` (boolean | null, optional): Enable automatic fallback to alternative providers on failure. Default: `true`. - `max_fallback_attempts` (integer | null, optional): Maximum fallback attempts before giving up. Default: `3`. - `data_policy` (string | null, optional): Data retention policy requirement: - `none`: No restrictions (default) - `no_training`: Provider must not use data for training - `zdr`: Zero Data Retention (strictest) Values: `none`, `no_training`, `zdr`, or `null`. Default: `none`. - `only_byok` (boolean | null, optional): Only use Bring Your Own Key (BYOK) providers. Mutually exclusive with `only_platform`. Returns 400 if both are set. Default: `false`. - `only_platform` (boolean | null, optional): Only use platform-managed API keys. Mutually exclusive with `only_byok`. Returns 400 if both are set. Default: `false`. ## Schema: Extensions Auriko extensions for normalized features and provider-specific passthrough. ## Normalized Features These are translated to provider-native format automatically: - `thinking`: Enable thinking/reasoning mode ## Provider Passthrough Pass provider-specific parameters directly: - `anthropic`: Anthropic-specific parameters - `openai`: OpenAI-specific parameters - `google`: Google/Gemini-specific parameters - `deepseek`: DeepSeek-specific parameters Passthrough parameters are forwarded as-is to the target provider. - `thinking` (ThinkingConfig, optional): Normalized thinking/reasoning configuration.
Translates to provider-native format: - Anthropic: thinking block configuration - OpenAI o1/o3: reasoning_effort based on budget - DeepSeek R1: Native support - Gemini 2.0 Flash Thinking: thinking_config - `anthropic` (object, optional): Anthropic-specific parameters (passed through) - `openai` (object, optional): OpenAI-specific parameters (passed through) - `google` (object, optional): Google/Gemini-specific parameters (passed through) - `deepseek` (object, optional): DeepSeek-specific parameters (passed through) ## Schema: ThinkingConfig Normalized thinking/reasoning configuration. Translates to provider-native format: - Anthropic: thinking block configuration - OpenAI o1/o3: reasoning_effort based on budget - DeepSeek R1: Native support - Gemini 2.0 Flash Thinking: thinking_config - `enabled` (boolean, optional): Enable thinking/reasoning mode - `budget_tokens` (integer, optional): Token budget for thinking (provider-dependent minimum) ## Schema: RoutingMetadata Routing decision metadata included in all responses. Provides transparency into how Auriko selected the provider. - `provider` (string, required): Provider name (e.g., "fireworks_ai", "anthropic") - `provider_model_id` (string, required): Provider's model ID - `tier` (string, optional): Pricing tier if applicable (e.g., "flex", "standard") - `model_canonical` (string, required): Canonical model ID requested - `routing_strategy` (string, required): Strategy used for routing. Known values: `cost`, `ttft`, `speed`, `throughput`, `balanced`, `cheapest`, `custom`. `custom` is returned when explicit `routing.weights` are provided. Additional strategies may be added in future versions. 
- `candidates_total` (integer, required): Total catalog offerings before filtering (offering-level) - `candidates_viable` (integer, required): Source-level candidates after filtering and dedup (offering x key source pairs entering ranking) - `routing_decision_ms` (number, required): Time spent in routing decision (ms) - `ttft_ms` (number, optional): Time to first token (streaming only) - `total_latency_ms` (number, required): Total request latency (ms) - `cost` (CostInfo, optional): Cost breakdown for the request - `fallback_chain` (array[FallbackChainEntry], optional): Providers attempted during fallback, in order. Only includes providers that were actually called. Providers skipped due to cooldown, missing keys, or policy constraints are NOT included. Absent if no fallback was needed (primary succeeded). - `warnings` (array[string], optional): Warnings about ignored/unsupported configuration ## Schema: CostInfo Cost breakdown for the request - `input_tokens` (integer, required): Input tokens used - `output_tokens` (integer, required): Output tokens generated - `provider_cost_usd` (number, required): Cost at provider rates (USD) - `billable_cost_usd` (number, required): Billable cost including margin (USD) ## Error Response Details - **BadRequest**: Bad request - invalid parameters — example: "Missing required parameter: 'model'." - **Unauthorized**: Authentication failed — code: `invalid_api_key`, message: "Invalid API key or unauthorized access." - **InsufficientCredits**: Insufficient credits. - `insufficient_quota`: workspace balance too low — example: "Insufficient credits to complete this request. Please add credits to your account." - **ModelNotFound**: Model not found — code: `model_not_found`, message: "Model 'unknown-model' not found." - **RateLimited**: Rate limit exceeded — code: `rate_limit_exceeded`, message: "Rate limit exceeded. Please retry after 60 seconds." 
- **InternalError**: Internal server error — code: `internal_error`, message: "An internal server error occurred." - **ServiceUnavailable**: Service unavailable — transient issue. Possible causes: - All providers for the model are rate-limited or unhealthy (`no_providers_available`) - Transient infrastructure issue such as KV outage (`service_unavailable`) Note: If the model doesn't support a requested capability (e.g., reasoning), the response is 400 `capability_mismatch`, not 503. If routing constraints excluded all providers, the response is 400 `routing_constraint_unsatisfiable`. — example: "No providers available for model 'gpt-4o'." - **ProviderError**: Upstream provider failure (Bad Gateway). All non-timeout 5xx errors from upstream providers are normalized to 502. Provider errors may also surface as: - 400: Invalid request passthrough (code: `invalid_request`) - 401: BYOK key auth failure (code: `provider_auth_error`) - 429: Provider rate limit (code: `rate_limit_exceeded`) These use their respective status codes with the same `ErrorResponse` body format. — code: `provider_error`, message: "All providers failed for model gpt-4o (attempted: openai, azure). Last error: Bad Gateway" - **ProviderTimeout**: Upstream provider timed out. The client may retry with a longer timeout. — code: `provider_error`, message: "All providers failed for model gpt-4o (attempted: openai). Last error: Gateway Timeout" - **Forbidden**: You do not have permission to perform this action — code: `forbidden`, message: "You do not have permission to perform this action." - **NotFound**: Resource not found — code: `not_found`, message: "The requested resource was not found." - **GatewayUnavailable**: API gateway is unavailable. The edge worker could not reach the backend gateway. — code: `gateway_unavailable`, message: "API gateway is temporarily unavailable. Please retry." 
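Because **ServiceUnavailable**, **ProviderTimeout**, and **GatewayUnavailable** are transient, clients can safely retry them with exponential backoff. A minimal sketch of such a retry loop — the `retry_transient` helper and its parameters are illustrative, not part of the SDK; in real code you would pass the SDK's typed transient error classes (e.g. `ServiceUnavailableError`) as `transient`:

```python
import time

def retry_transient(fn, transient, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fn(), retrying only the given transient exception classes
    with exponential backoff (0.5s, 1s, 2s, ...)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except transient:
            if attempt == max_attempts:
                raise  # exhausted: surface the last transient error
            sleep(base_delay * 2 ** (attempt - 1))

# Stand-in for a transient 503/504 from the API (hypothetical class).
class Transient(Exception):
    pass

calls = []
def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise Transient("503 service unavailable")
    return "ok"

# Succeeds on the third attempt; sleep is stubbed out here to skip real delays.
result = retry_transient(flaky, Transient, max_attempts=3, sleep=lambda s: None)
print(result)  # -> ok
```

Non-transient errors (400/401/403/404) should not be retried — they will fail the same way again.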
=== # Auriko Python SDK Reference ## Page: Python SDK The `auriko` Python package provides an OpenAI-compatible client for the Auriko API. Complete API reference with all types, parameters, and examples --- ## Page: Python SDK > Section: Installation ```bash pip install auriko ``` Requires Python 3.10 or later. --- ## Page: Python SDK > Section: Get started ```python from auriko import Client client = Client() # reads AURIKO_API_KEY from environment response = client.chat.completions.create( model="gpt-5.4", messages=[{"role": "user", "content": "Hello!"}] ) print(response.choices[0].message.content) ``` --- ## Page: Python SDK > Section: Configure ### API Key ```python import os # Option 1: Auto-detect from AURIKO_API_KEY env var (recommended) client = Client() # Option 2: Pass explicitly client = Client(api_key=os.environ["AURIKO_API_KEY"]) ``` ### Base URL ```python # Default: https://api.auriko.ai/v1 # Override for self-hosted or proxy setups: client = Client(base_url="https://your-proxy.example.com/v1") ``` ### Timeout ```python client = Client(timeout=60.0) # seconds ``` ### Retries ```python client = Client(max_retries=3) # default is 2 ``` --- ## Page: Python SDK > Section: Create chat completions ### Basic request Send a chat completion request: ```python response = client.chat.completions.create( model="gpt-4o", messages=[ {"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is 2+2?"} ] ) print(response.choices[0].message.content) ``` ### With routing options ```python response = client.chat.completions.create( model="gpt-5.4", messages=[{"role": "user", "content": "Hello!"}], routing={ "optimize": "cost", "max_ttft_ms": 200, } ) # Access routing metadata print(f"Provider: {response.routing_metadata.provider}") if response.routing_metadata.cost: print(f"Cost: ${response.routing_metadata.cost.billable_cost_usd:.6f}") ``` You can also pass a `RoutingOptions` object for IDE autocomplete and validation: ```python 
from auriko.route_types import RoutingOptions, Optimize response = client.chat.completions.create( model="gpt-5.4", messages=[{"role": "user", "content": "Hello!"}], routing=RoutingOptions(optimize=Optimize.COST, max_ttft_ms=200) ) ``` **All routing fields:** | Field | Type | Description | |-------|------|-------------| | `optimize` | `Optimize` | Strategy: `"cost"`, `"speed"`, `"ttft"`, `"throughput"`, `"balanced"`, `"cheapest"` | | `weights` | `dict[str, float]` | Custom scoring weights: `cost`, `ttft`, `throughput`, `reliability`. Overrides preset. | | `max_cost_per_1m` | `float` | Max cost per 1M tokens | | `max_ttft_ms` | `int` | Max time to first token (ms) | | `min_throughput_tps` | `float` | Min tokens per second | | `min_success_rate` | `float` | Min provider success rate (0.0–1.0) | | `providers` | `list[str]` | Allowlist of providers | | `exclude_providers` | `list[str]` | Blocklist of providers | | `prefer` | `str` | Preferred provider (soft preference) | | `mode` | `Mode` | `"pool"` (default) or `"fallback"` | | `allow_fallbacks` | `bool` | Enable fallback on failure | | `max_fallback_attempts` | `int` | Max fallback retries | | `data_policy` | `DataPolicy` | `"none"`, `"no_training"`, `"zdr"` | | `only_byok` | `bool` | Only use BYOK providers | | `only_platform` | `bool` | Only use platform providers | See [Advanced Routing](/guides/advanced-routing) for detailed strategy guides. ### Multi-model routing Route a request across multiple models. The router picks the best option based on your routing strategy: ```python response = client.chat.completions.create( models=["gpt-4o", "claude-sonnet-4-20250514", "gemini-2.5-flash"], messages=[{"role": "user", "content": "Explain quantum computing briefly."}], routing={"optimize": "cost"} ) print(f"Model used: {response.model}") print(f"Provider: {response.routing_metadata.provider}") print(response.choices[0].message.content) ``` `model` and `models` are mutually exclusive. Specify exactly one. 
Passing both raises `InvalidRequestError`. ### Extended thinking Enable extended reasoning for complex tasks using the `extensions` parameter: ```python response = client.chat.completions.create( model="claude-sonnet-4-20250514", messages=[{"role": "user", "content": "Solve step by step: what is 23! / 20!?"}], extensions={"thinking": {"enabled": True, "budget_tokens": 10000}} ) # Access the reasoning output (if the model returns it) if response.choices[0].message.reasoning_content: print(f"Reasoning: {response.choices[0].message.reasoning_content}") print(f"Answer: {response.choices[0].message.content}") ``` You can also pass provider-specific parameters through `extensions`: ```python response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}], extensions={"openai": {"logit_bias": {"1234": -100}}} ) ``` See [Extensions and Thinking](/guides/extensions-and-thinking) for provider details and streaming thinking output. ### Request metadata Attach metadata to requests for tracking and analytics: ```python response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}], auriko_metadata={"session_id": "abc-123", "user_tier": "premium"} ) ``` The Auriko dashboard logs and displays your metadata. 
### Stream responses ```python stream = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Count to 10"}], stream=True ) for chunk in stream: if chunk.choices and chunk.choices[0].delta.content: print(chunk.choices[0].delta.content, end="", flush=True) ``` After consuming all chunks, access stream-level metadata: ```python print(f"\nProvider: {stream.routing_metadata.provider}") print(f"Tokens: {stream.usage.total_tokens}") print(f"Request ID: {stream.response_headers.request_id}") ``` Use a context manager for automatic cleanup: ```python with client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Count to 10"}], stream=True ) as stream: for chunk in stream: if chunk.choices and chunk.choices[0].delta.content: print(chunk.choices[0].delta.content, end="", flush=True) # stream is automatically closed ``` Or close manually with `stream.close()`. Routing metadata, usage, and response headers are available only after consuming all chunks. See [Streaming Guide](/guides/streaming) for full patterns including tool call streaming. ### Tool calling ```python tools = [ { "type": "function", "function": { "name": "get_weather", "description": "Get weather for a city", "parameters": { "type": "object", "properties": { "city": {"type": "string"} }, "required": ["city"] } } } ] response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "What's the weather in Paris?"}], tools=tools ) if response.choices[0].message.tool_calls: tool_call = response.choices[0].message.tool_calls[0] print(f"Function: {tool_call.function.name}") print(f"Arguments: {tool_call.function.arguments}") ``` See [Tool Calling Guide](/guides/tool-calling) for multi-turn tool conversations. 
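A complete tool-calling turn sends the assistant's tool-call message back to the model along with one `role: "tool"` result message per call, matched by `tool_call_id` (the standard OpenAI-compatible shape). A sketch of building that follow-up message list, using plain dicts so the shape is explicit — the weather result here is hypothetical:

```python
import json

def build_tool_followup(messages, assistant_message, results):
    """Append the assistant's tool-call message plus one `tool` message
    per call, matched by tool_call_id (OpenAI-compatible shape)."""
    followup = list(messages) + [assistant_message]
    for call in assistant_message["tool_calls"]:
        followup.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(results[call["function"]["name"]]),
        })
    return followup

messages = [{"role": "user", "content": "What's the weather in Paris?"}]
assistant_message = {  # shape of response.choices[0].message with a tool call
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
    }],
}
followup = build_tool_followup(
    messages, assistant_message, {"get_weather": {"temp_c": 18}}
)
print([m["role"] for m in followup])  # -> ['user', 'assistant', 'tool']
```

Pass `followup` as `messages` in a second `create()` call (with the same `tools`) to get the final natural-language answer.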
--- ## Page: Python SDK > Section: Read response headers Every response and error includes a `response_headers` object with typed accessors: ```python response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}] ) response.response_headers.request_id # str | None response.response_headers.rate_limit_remaining # int | None response.response_headers.rate_limit_limit # int | None response.response_headers.rate_limit_reset # str | None response.response_headers.credits_balance_microdollars # int | None response.response_headers.provider_used # str | None response.response_headers.routing_strategy # str | None response.response_headers.get("x-custom-header") # generic lookup ``` | Property | Header | Type | |----------|--------|------| | `request_id` | `x-request-id` | `str \| None` | | `rate_limit_remaining` | `x-ratelimit-remaining-requests` | `int \| None` | | `rate_limit_limit` | `x-ratelimit-limit-requests` | `int \| None` | | `rate_limit_reset` | `x-ratelimit-reset-requests` | `str \| None` | | `credits_balance_microdollars` | `x-credits-balance-microdollars` | `int \| None` | | `provider_used` | `x-provider-used` | `str \| None` | | `routing_strategy` | `x-routing-strategy` | `str \| None` | Error objects also carry `response_headers`. Use `e.response_headers.request_id` when filing support tickets to correlate with server logs. See the [Python SDK Reference](/sdk/python-reference#response-headers) for the complete `ResponseHeaders` API. 
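Since every response exposes rate-limit and credit state in headers, you can monitor both client-side and warn before hitting a limit. A small sketch — the thresholds are illustrative, and `credits_balance_microdollars` is assumed to mean USD × 10⁻⁶, per the header name:

```python
def check_headers(rate_limit_remaining, credits_balance_microdollars,
                  min_requests=10, min_usd=1.0):
    """Return warnings based on response-header values.
    Either input may be None when the header is absent."""
    warnings = []
    if rate_limit_remaining is not None and rate_limit_remaining < min_requests:
        warnings.append(f"only {rate_limit_remaining} requests left in window")
    if credits_balance_microdollars is not None:
        balance_usd = credits_balance_microdollars / 1_000_000
        if balance_usd < min_usd:
            warnings.append(f"credits low: ${balance_usd:.2f}")
    return warnings

# In real code:
# check_headers(response.response_headers.rate_limit_remaining,
#               response.response_headers.credits_balance_microdollars)
print(check_headers(5, 250_000))  # two warnings: low rate limit, low credits
```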
--- ## Page: Python SDK > Section: Read token usage The `Usage` object on every response carries optional detail breakdowns: ```python response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}] ) usage = response.usage # Prompt token breakdown if usage.prompt_tokens_details: print(f"Cached: {usage.prompt_tokens_details.cached_tokens}") print(f"Text: {usage.prompt_tokens_details.text_tokens}") print(f"Image: {usage.prompt_tokens_details.image_tokens}") print(f"Audio: {usage.prompt_tokens_details.audio_tokens}") # Completion token breakdown if usage.completion_tokens_details: print(f"Reasoning: {usage.completion_tokens_details.reasoning_tokens}") print(f"Text: {usage.completion_tokens_details.text_tokens}") ``` | Field | Sub-fields | Type | |-------|-----------|------| | `prompt_tokens_details` | `cached_tokens`, `text_tokens`, `image_tokens`, `audio_tokens` | `Optional[int]` each | | `completion_tokens_details` | `reasoning_tokens`, `text_tokens`, `image_tokens`, `audio_tokens` | `Optional[int]` each | Availability depends on the provider. `completion_tokens_details.reasoning_tokens` is present for OpenAI o-series, DeepSeek, xAI, and Google Gemini. It's `None` for providers that don't report reasoning token counts (Anthropic, Moonshot, Fireworks). See [Check reasoning token availability](/guides/extensions-and-thinking#check-reasoning-token-availability) for the full breakdown. 
--- ## Page: Python SDK > Section: Handle errors Catch typed exceptions: ```python from auriko import ( Client, AurikoAPIError, AuthenticationError, RateLimitError, BudgetExceededError, ModelNotFoundError, ProviderError, # Also available: InvalidRequestError, InsufficientCreditsError, # InternalError, ProviderAuthError, ServiceUnavailableError ) client = Client() try: response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}] ) except AuthenticationError as e: print(f"Check your API key: {e}") except RateLimitError as e: print(f"Rate limited: {e}") except BudgetExceededError as e: print(f"Budget exceeded: {e}") except ModelNotFoundError as e: print(f"Model not found: {e}") except ProviderError as e: print(f"Provider error: {e}") except AurikoAPIError as e: print(f"API error ({e.status_code}): {e}") ``` See [Error Handling Guide](/guides/error-handling) for retry patterns and `map_openai_error()`. --- ## Page: Python SDK > Section: Use management APIs Query workspace, budget, and model information: ```python # Identity (discover your workspace) identity = client.me.get() print(f"Workspace: {identity.workspace_id}") # Workspaces workspaces = client.workspaces.list() workspace = client.workspaces.get("ws-123") # Budgets budgets = client.budgets.list("ws-123") budget = client.budgets.get("ws-123", "budget-456") # Models registry = client.models.list_registry() directory = client.models.list_directory() providers = client.models.list_providers() ``` ### Model listing choices | Method | Returns | Use when | |--------|---------|----------| | `list_registry()` | Flat list: `id`, `family`, `display_name` | You need a quick model ID lookup | | `list_directory()` | Rich detail: provider entries, context windows, capabilities, pricing tiers | You need to compare providers or check capabilities | | `list_providers()` | Provider catalog: display name, description, data policy | You need to see available providers | See the [Python 
SDK Reference](/sdk/python-reference) for the complete API. --- ## Page: Python SDK > Section: Use async client Use the async client for non-blocking requests: ```python from auriko import AsyncClient async def main(): client = AsyncClient() response = await client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}] ) print(response.choices[0].message.content) import asyncio asyncio.run(main()) ``` ### Async streaming Stream responses asynchronously: ```python from auriko import AsyncClient async def stream_response(): client = AsyncClient() stream = await client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Count to 10"}], stream=True ) async for chunk in stream: if chunk.choices and chunk.choices[0].delta.content: print(chunk.choices[0].delta.content, end="", flush=True) ``` ### Async context manager Use `async with` for automatic connection cleanup: ```python from auriko import AsyncClient async def main(): async with AsyncClient() as client: response = await client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}] ) print(response.choices[0].message.content) # client.close() called automatically ``` Or close explicitly: `await client.close()` --- ## Page: Python SDK > Section: Use context managers Use a context manager for automatic cleanup: ```python with Client() as client: response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}] ) print(response.choices[0].message.content) ``` --- ## Page: Python SDK > Section: SDK scope The Auriko SDK covers: inference (chat completions with routing), read-only management (workspaces, budgets, identity), and model discovery. For full platform operations (workspace creation, budget management, API key rotation), use the [REST API](/api-reference/overview) directly. 
--- ## Page: Python SDK > Section: Use type hints The SDK provides typed responses, errors, and routing configuration. Use your IDE's autocomplete for the best experience: ```python from auriko import Client from auriko.models.chat import ChatCompletion, ChatCompletionChunk client = Client() response: ChatCompletion = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}] ) ``` === # Auriko TypeScript SDK Reference ## Page: TypeScript SDK The `@auriko/sdk` package provides a typed TypeScript client for the Auriko API. Complete API reference with all types, parameters, and examples --- ## Page: TypeScript SDK > Section: Installation ```bash npm install @auriko/sdk # or yarn add @auriko/sdk # or pnpm add @auriko/sdk ``` --- ## Page: TypeScript SDK > Section: Get started ```typescript import { Client } from "@auriko/sdk"; const client = new Client(); // reads AURIKO_API_KEY from environment const response = await client.chat.completions.create({ model: "gpt-5.4", messages: [{ role: "user", content: "Hello!" 
}], }); console.log(response.choices[0].message.content); ``` --- ## Page: TypeScript SDK > Section: Configure ### API Key ```typescript // Option 1: Auto-detect from AURIKO_API_KEY env var (recommended) const client = new Client(); // Option 2: Pass explicitly const client = new Client({ apiKey: process.env.AURIKO_API_KEY, }); ``` ### Base URL ```typescript // Default: https://api.auriko.ai/v1 // Override for self-hosted or proxy setups: const client = new Client({ baseUrl: "https://your-proxy.example.com/v1", }); ``` ### Timeout ```typescript const client = new Client({ timeout: 60000, // milliseconds }); ``` ### Retries ```typescript const client = new Client({ maxRetries: 3, // default is 2 }); ``` --- ## Page: TypeScript SDK > Section: Create chat completions ### Basic request Send a chat completion request: ```typescript const response = await client.chat.completions.create({ model: "gpt-4o", messages: [ { role: "system", content: "You are a helpful assistant." }, { role: "user", content: "What is 2+2?" }, ], }); console.log(response.choices[0].message.content); ``` ### With routing options ```typescript import { Optimize } from "@auriko/sdk"; const response = await client.chat.completions.create({ model: "gpt-5.4", messages: [{ role: "user", content: "Hello!" 
}], routing: { optimize: "cost", max_ttft_ms: 200, }, }); // Access routing metadata console.log(`Provider: ${response.routing_metadata?.provider}`); if (response.routing_metadata?.cost) { console.log(`Cost: $${response.routing_metadata.cost.billable_cost_usd}`); } ``` You can also use the `RoutingOptions` type with enum constants for IDE autocomplete: ```typescript import { Optimize } from "@auriko/sdk"; import type { RoutingOptions } from "@auriko/sdk"; const routing: RoutingOptions = { optimize: Optimize.COST, max_ttft_ms: 200, }; ``` **All routing fields:** | Field | Type | Description | |-------|------|-------------| | `optimize` | `Optimize` | Strategy: `"cost"`, `"speed"`, `"ttft"`, `"throughput"`, `"balanced"`, `"cheapest"` | | `weights` | `RoutingWeights` | Custom scoring weights: `cost`, `ttft`, `throughput`, `reliability`. Overrides preset. | | `max_cost_per_1m` | `number` | Max cost per 1M tokens | | `max_ttft_ms` | `number` | Max time to first token (ms) | | `min_throughput_tps` | `number` | Min tokens per second | | `min_success_rate` | `number` | Min provider success rate (0.0–1.0) | | `providers` | `string[]` | Allowlist of providers | | `exclude_providers` | `string[]` | Blocklist of providers | | `prefer` | `string` | Preferred provider (soft preference) | | `mode` | `Mode` | `"pool"` (default) or `"fallback"` | | `allow_fallbacks` | `boolean` | Enable fallback on failure | | `max_fallback_attempts` | `number` | Max fallback retries | | `data_policy` | `DataPolicy` | `"none"`, `"no_training"`, `"zdr"` | | `only_byok` | `boolean` | Only use BYOK providers | | `only_platform` | `boolean` | Only use platform providers | See [Advanced Routing](/guides/advanced-routing) for detailed strategy guides. ### Multi-model routing Route a request across multiple models. 
The router picks the best option based on your routing strategy: ```typescript const response = await client.chat.completions.create({ models: ["gpt-4o", "claude-sonnet-4-20250514", "gemini-2.5-flash"], messages: [{ role: "user", content: "Explain quantum computing briefly." }], routing: { optimize: "cost" }, }); console.log(`Model used: ${response.model}`); console.log(`Provider: ${response.routing_metadata?.provider}`); console.log(response.choices[0].message.content); ``` `model` and `models` are mutually exclusive. Specify exactly one. Passing both raises `InvalidRequestError`. ### Extended thinking Enable extended reasoning for complex tasks using the `extensions` parameter: ```typescript const response = await client.chat.completions.create({ model: "claude-sonnet-4-20250514", messages: [{ role: "user", content: "Solve step by step: what is 23! / 20!?" }], extensions: { thinking: { enabled: true, budget_tokens: 10000 } }, }); // Access the reasoning output (if the model returns it) if (response.choices[0].message.reasoning_content) { console.log(`Reasoning: ${response.choices[0].message.reasoning_content}`); } console.log(`Answer: ${response.choices[0].message.content}`); ``` You can also pass provider-specific parameters through `extensions`: ```typescript const response = await client.chat.completions.create({ model: "gpt-4o", messages: [{ role: "user", content: "Hello!" }], extensions: { openai: { logit_bias: { "1234": -100 } } }, }); ``` See [Extensions and Thinking](/guides/extensions-and-thinking) for provider details and streaming thinking output. ### Request metadata Attach metadata to requests for tracking and analytics: ```typescript const response = await client.chat.completions.create({ model: "gpt-4o", messages: [{ role: "user", content: "Hello!" }], auriko_metadata: { session_id: "abc-123", user_tier: "premium" }, }); ``` The Auriko dashboard logs and displays your metadata. 
### Stream responses ```typescript const stream = await client.chat.completions.create({ model: "gpt-4o", messages: [{ role: "user", content: "Count to 10" }], stream: true, }); for await (const chunk of stream) { if (chunk.choices[0]?.delta?.content) { process.stdout.write(chunk.choices[0].delta.content); } } ``` After consuming all chunks, access stream-level metadata: ```typescript console.log(`\nProvider: ${stream.routing_metadata?.provider}`); console.log(`Tokens: ${stream.usage?.total_tokens}`); console.log(`Request ID: ${stream.responseHeaders.requestId}`); console.log(`Closed: ${stream.isClosed}`); ``` Close a stream manually with `stream.close()`. Routing metadata, usage, and response headers are available only after consuming all chunks. See [Streaming Guide](/guides/streaming) for full patterns including tool call streaming. ### Tool calling ```typescript const tools = [ { type: "function" as const, function: { name: "get_weather", description: "Get weather for a city", parameters: { type: "object", properties: { city: { type: "string" }, }, required: ["city"], }, }, }, ]; const response = await client.chat.completions.create({ model: "gpt-4o", messages: [{ role: "user", content: "What's the weather in Paris?" }], tools, }); if (response.choices[0].message.tool_calls) { const toolCall = response.choices[0].message.tool_calls[0]; console.log(`Function: ${toolCall.function.name}`); console.log(`Arguments: ${toolCall.function.arguments}`); } ``` See [Tool Calling Guide](/guides/tool-calling) for multi-turn tool conversations. --- ## Page: TypeScript SDK > Section: Read response headers Every response and error includes a `responseHeaders` object with typed accessors: ```typescript const response = await client.chat.completions.create({ model: "gpt-4o", messages: [{ role: "user", content: "Hello!" 
}], }); response.responseHeaders.requestId; // string | undefined response.responseHeaders.rateLimitRemaining; // number | undefined response.responseHeaders.rateLimitLimit; // number | undefined response.responseHeaders.rateLimitReset; // string | undefined response.responseHeaders.creditsBalanceMicrodollars; // number | undefined response.responseHeaders.providerUsed; // string | undefined response.responseHeaders.routingStrategy; // string | undefined response.responseHeaders.get("x-custom-header"); // generic lookup response.responseHeaders.getAll("x-multi-header"); // string[] for multi-value headers ``` | Property | Header | Type | |----------|--------|------| | `requestId` | `x-request-id` | `string \| undefined` | | `rateLimitRemaining` | `x-ratelimit-remaining-requests` | `number \| undefined` | | `rateLimitLimit` | `x-ratelimit-limit-requests` | `number \| undefined` | | `rateLimitReset` | `x-ratelimit-reset-requests` | `string \| undefined` | | `creditsBalanceMicrodollars` | `x-credits-balance-microdollars` | `number \| undefined` | | `providerUsed` | `x-provider-used` | `string \| undefined` | | `routingStrategy` | `x-routing-strategy` | `string \| undefined` | Error objects also carry `responseHeaders`. Use `e.responseHeaders.requestId` when filing support tickets to correlate with server logs. See the [TypeScript SDK Reference](/sdk/typescript-reference#response-headers) for the complete `ResponseHeaders` API. --- ## Page: TypeScript SDK > Section: Read token usage The `Usage` object on every response carries optional detail breakdowns: ```typescript const response = await client.chat.completions.create({ model: "gpt-4o", messages: [{ role: "user", content: "Hello!" 
}], }); const usage = response.usage; // Prompt token breakdown if (usage?.prompt_tokens_details) { console.log(`Cached: ${usage.prompt_tokens_details.cached_tokens}`); console.log(`Text: ${usage.prompt_tokens_details.text_tokens}`); console.log(`Image: ${usage.prompt_tokens_details.image_tokens}`); console.log(`Audio: ${usage.prompt_tokens_details.audio_tokens}`); } // Completion token breakdown if (usage?.completion_tokens_details) { console.log(`Reasoning: ${usage.completion_tokens_details.reasoning_tokens}`); console.log(`Text: ${usage.completion_tokens_details.text_tokens}`); } ``` | Field | Sub-fields | Type | |-------|-----------|------| | `prompt_tokens_details` | `cached_tokens`, `text_tokens`, `image_tokens`, `audio_tokens` | `number \| undefined` each | | `completion_tokens_details` | `reasoning_tokens`, `text_tokens`, `image_tokens`, `audio_tokens` | `number \| undefined` each | Availability depends on the provider. `completion_tokens_details.reasoning_tokens` is present for OpenAI o-series, DeepSeek, xAI, and Google Gemini. It's `undefined` for providers that don't report reasoning token counts (Anthropic, Moonshot, Fireworks). See [Check reasoning token availability](/guides/extensions-and-thinking#check-reasoning-token-availability) for the full breakdown. --- ## Page: TypeScript SDK > Section: Handle errors Catch typed exceptions: ```typescript import { Client, AurikoAPIError, AuthenticationError, RateLimitError, BudgetExceededError, ModelNotFoundError, ProviderError, // Also available: InvalidRequestError, InsufficientCreditsError, // InternalError, ProviderAuthError, ServiceUnavailableError } from "@auriko/sdk"; const client = new Client(); try { const response = await client.chat.completions.create({ model: "gpt-4o", messages: [{ role: "user", content: "Hello!" 
}], }); } catch (e) { if (e instanceof AuthenticationError) { console.log(`Check your API key: ${e.message}`); } else if (e instanceof RateLimitError) { console.log(`Rate limited: ${e.message}`); } else if (e instanceof BudgetExceededError) { console.log(`Budget exceeded: ${e.message}`); } else if (e instanceof ModelNotFoundError) { console.log(`Model not found: ${e.message}`); } else if (e instanceof ProviderError) { console.log(`Provider error: ${e.message}`); } else if (e instanceof AurikoAPIError) { console.log(`API error (${e.statusCode}): ${e.message}`); } } ``` See [Error Handling Guide](/guides/error-handling) for retry patterns. --- ## Page: TypeScript SDK > Section: Use management APIs Query workspace, budget, and model information: ```typescript // Identity (discover your workspace) const identity = await client.me.get(); // Workspaces const workspaces = await client.workspaces.list(); const workspace = await client.workspaces.get("ws-123"); // Budgets const budgets = await client.budgets.list("ws-123"); const budget = await client.budgets.get("ws-123", "budget-456"); // Models const registry = await client.models.listRegistry(); const directory = await client.models.listDirectory(); const providers = await client.models.listProviders(); ``` ### Model listing choices | Method | Returns | Use when | |--------|---------|----------| | `listRegistry()` | Flat list: `id`, `family`, `display_name` | You need a quick model ID lookup | | `listDirectory()` | Rich detail: provider entries, context windows, capabilities, pricing tiers | You need to compare providers or check capabilities | | `listProviders()` | Provider catalog: display name, description, data policy | You need to see available providers | See the [TypeScript SDK Reference](/sdk/typescript-reference) for the complete API. 
--- ## Page: TypeScript SDK > Section: SDK scope The Auriko SDK covers: inference (chat completions with routing), read-only management (workspaces, budgets, identity), and model discovery. For full platform operations (workspace creation, budget management, API key rotation), use the [REST API](/api-reference/overview) directly. --- ## Page: TypeScript SDK > Section: Use TypeScript types The SDK provides typed responses, errors, and routing configuration. Import types directly: ```typescript import type { ChatCompletion, ChatCompletionChunk, ChoiceMessage, Choice, Usage, RoutingMetadata, RoutingOptions, Extensions, } from "@auriko/sdk"; ``` --- ## Page: TypeScript SDK > Section: Node.js, Deno, and Browser The SDK works in multiple environments: ### Node.js ```typescript import { Client } from "@auriko/sdk"; const client = new Client(); // reads AURIKO_API_KEY from env ``` ### Deno ```typescript import { Client } from "npm:@auriko/sdk"; const client = new Client({ apiKey: Deno.env.get("AURIKO_API_KEY"), }); ``` ### Browser (with bundler) ```typescript import { Client } from "@auriko/sdk"; // Pass API key from your backend - never expose in client-side code! const client = new Client({ apiKey: apiKeyFromBackend, }); ``` Never expose your API key in client-side code. Use a backend proxy instead. === # Auriko Framework Integrations ## Page: LangChain + Auriko Use Auriko as your LLM provider in LangChain with a drop-in `ChatOpenAI` replacement. 
--- ## Page: LangChain + Auriko > Section: Prerequisites - An [Auriko API key](https://auriko.ai/signup) --- ## Page: LangChain + Auriko > Section: Installation ```bash pip install "auriko[langchain]" ``` --- ## Page: LangChain + Auriko > Section: Use SDK adapter Use the `AurikoChatOpenAI` adapter: ```python from auriko.frameworks.langchain import AurikoChatOpenAI llm = AurikoChatOpenAI(model="gpt-5.4") ``` `AurikoChatOpenAI` extends LangChain's `ChatOpenAI` with: - Automatic `use_responses_api=False` (LangChain >=1.1 auto-routes GPT-5/Codex to the Responses API, which Auriko doesn't implement) - Routing injection via `extra_body` - OpenAI error mapping to typed Auriko error classes ```python from auriko.frameworks.langchain import AurikoChatOpenAI llm = AurikoChatOpenAI(model="gpt-5.4") # Simple invoke response = llm.invoke("What is 2+2?") print(response.content) # Streaming for chunk in llm.stream("Count to 5"): print(chunk.content, end="", flush=True) # With messages from langchain_core.messages import HumanMessage, SystemMessage messages = [ SystemMessage(content="You are a helpful assistant."), HumanMessage(content="Explain quantum computing briefly."), ] response = llm.invoke(messages) print(response.content) ``` --- ## Page: LangChain + Auriko > Section: Configure options | Parameter | Type | Default | Description | |-----------|------|---------|-------------| | `model` | `str` | (required, via parent) | Model ID | | `api_key` | `str \| None` | `AURIKO_API_KEY` env | API key | | `routing` | `RoutingOptions \| None` | `None` | Routing configuration | | `base_url` | `str` | `"https://api.auriko.ai/v1"` | API base URL | | `**kwargs` | | | Passed through to `ChatOpenAI` (e.g., `temperature`, `max_tokens`) | --- ## Page: LangChain + Auriko > Section: Configure routing Configure routing options: ```python from auriko.frameworks.langchain import AurikoChatOpenAI from auriko.route_types import RoutingOptions llm = AurikoChatOpenAI( model="gpt-5.4", 
routing=RoutingOptions(optimize="cost", max_ttft_ms=200), ) response = llm.invoke("Hello!") print(response.content) ``` Routing metadata is available through response generation info when using `generate()`: ```python result = llm.generate([[HumanMessage(content="Hello!")]]) info = result.generations[0][0].generation_info if info and "routing_metadata" in info: print(f"Provider: {info['routing_metadata']['provider']}") ``` --- ## Page: LangChain + Auriko > Section: Configure manually If you prefer to use `ChatOpenAI` directly: ```python import os from langchain_openai import ChatOpenAI llm = ChatOpenAI( model="gpt-5.4", api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1", use_responses_api=False, # required for Auriko ) ``` Note: you must set `use_responses_api=False` manually, and routing options aren't available without `extra_body` configuration. --- ## Page: LangChain + Auriko > Section: Notes - `AurikoChatOpenAI` inherits all `ChatOpenAI` capabilities: chains, agents, tool calling, async, streaming. - OpenAI API errors are automatically mapped to typed Auriko error classes (`RateLimitError`, `BudgetExceededError`, etc.). - The `use_responses_api=False` flag is set automatically — you don't need to remember it. --- ## Page: OpenAI Agents SDK + Auriko Use Auriko as your LLM provider in the OpenAI Agents SDK. --- ## Page: OpenAI Agents SDK + Auriko > Section: Prerequisites - An [Auriko API key](https://auriko.ai/signup) --- ## Page: OpenAI Agents SDK + Auriko > Section: Installation ```bash pip install "auriko[agents]" ``` --- ## Page: OpenAI Agents SDK + Auriko > Section: Use SDK adapter Use the `AurikoModel` adapter: ```python from auriko.frameworks.agents import AurikoModel model = AurikoModel(model="gpt-5.4") ``` `AurikoModel` replaces 4 lines of global client configuration with a single model parameter. It extends `OpenAIChatCompletionsModel` with routing injection, error mapping, and per-task metadata isolation via `ContextVar`. 
```python import asyncio from auriko.frameworks.agents import AurikoModel from agents import Agent, Runner model = AurikoModel(model="gpt-5.4") agent = Agent( name="assistant", instructions="You are a helpful assistant.", model=model, ) async def main(): result = await Runner.run(agent, input="What is the capital of France?") print(result.final_output) asyncio.run(main()) ``` --- ## Page: OpenAI Agents SDK + Auriko > Section: Configure options | Parameter | Type | Default | Description | |-----------|------|---------|-------------| | `model` | `str` | (required) | Model ID | | `api_key` | `str \| None` | `AURIKO_API_KEY` env | API key | | `routing` | `RoutingOptions \| None` | `None` | Routing configuration | | `base_url` | `str` | `"https://api.auriko.ai/v1"` | API base URL | --- ## Page: OpenAI Agents SDK + Auriko > Section: Configure routing Configure routing options: ```python import asyncio from auriko.frameworks.agents import AurikoModel from auriko.route_types import RoutingOptions from agents import Agent, Runner model = AurikoModel( model="gpt-5.4", routing=RoutingOptions(optimize="cost"), ) agent = Agent(name="assistant", instructions="You are helpful.", model=model) async def main(): result = await Runner.run(agent, input="Hello!") print(result.final_output) asyncio.run(main()) ``` Routing metadata is isolated per async task using `ContextVar`, so concurrent `Runner.run()` calls sharing the same `AurikoModel` instance don't interfere with each other. 
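The isolation behavior can be illustrated with plain `contextvars`. This is a minimal sketch that does not touch the Agents SDK; `routing_metadata` and `fake_run` are stand-ins for the adapter's internals, not real SDK names:

```python
import asyncio
import contextvars

# Stand-in for the adapter's internal per-task metadata variable.
routing_metadata: contextvars.ContextVar[dict] = contextvars.ContextVar("routing_metadata")

async def fake_run(provider: str) -> str:
    # asyncio copies the current context into each Task, so a set() here
    # is invisible to the sibling task started by gather().
    routing_metadata.set({"provider": provider})
    await asyncio.sleep(0)  # yield so the two tasks interleave
    return routing_metadata.get()["provider"]

async def main() -> list[str]:
    # Two concurrent "runs" sharing the same module-level ContextVar.
    return await asyncio.gather(fake_run("openai"), fake_run("azure"))

print(asyncio.run(main()))  # ['openai', 'azure'] (no cross-task leakage)
```

Each `asyncio` task receives its own copy of the context, which is the same mechanism the adapter relies on for per-task metadata.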
--- ## Page: OpenAI Agents SDK + Auriko > Section: Configure manually If you prefer to configure the SDK's client directly: ```python import asyncio import os from openai import AsyncOpenAI from agents import Agent, Runner, set_default_openai_client, set_default_openai_api, set_tracing_disabled set_default_openai_api("chat_completions") set_tracing_disabled(True) client = AsyncOpenAI( base_url="https://api.auriko.ai/v1", api_key=os.environ["AURIKO_API_KEY"], ) set_default_openai_client(client, use_for_tracing=False) agent = Agent(name="assistant", instructions="You are helpful.", model="gpt-5.4") async def main(): result = await Runner.run(agent, input="Hello!") print(result.final_output) asyncio.run(main()) ``` Note: `set_default_openai_api("chat_completions")` is required because Auriko implements the Chat Completions API, not the Responses API. Routing options, error mapping, and per-task metadata isolation aren't available with manual configuration. --- ## Page: OpenAI Agents SDK + Auriko > Section: Notes - `AurikoModel` extends `OpenAIChatCompletionsModel` — it works with all Agents SDK features: tools, handoffs, streaming, guardrails. - OpenAI API errors are automatically mapped to typed Auriko error classes (`RateLimitError`, `BudgetExceededError`, etc.). - Concurrent agent runs using the same `AurikoModel` instance have isolated routing metadata via `ContextVar`. --- ## Page: Google ADK + Auriko Use Auriko as your LLM provider in Google's Agent Development Kit (ADK). --- ## Page: Google ADK + Auriko > Section: Prerequisites - An [Auriko API key](https://auriko.ai/signup) --- ## Page: Google ADK + Auriko > Section: Installation ```bash pip install "auriko[adk]" ``` --- ## Page: Google ADK + Auriko > Section: Use SDK adapter Use the `AurikoLlm` adapter: ```python from auriko.frameworks.adk import AurikoLlm llm = AurikoLlm(model="gpt-5.4") ``` `AurikoLlm` is a native `BaseLlm` implementation that doesn't use LiteLLM as an intermediary. 
It converts directly between ADK (Gemini) types and OpenAI message format, supporting text and function calling. ```python import asyncio from auriko.frameworks.adk import AurikoLlm from google.adk import Agent, Runner from google.adk.sessions import InMemorySessionService from google.genai import types llm = AurikoLlm(model="gpt-5.4") agent = Agent( model=llm, name="assistant", instruction="You are a helpful assistant.", ) session_service = InMemorySessionService() runner = Runner(agent=agent, app_name="my_app", session_service=session_service, auto_create_session=True) user_message = types.Content( role="user", parts=[types.Part(text="What is 2+2?")] ) async def main(): async for event in runner.run_async(user_id="user-1", session_id="session-1", new_message=user_message): if event.content and event.content.parts: for part in event.content.parts: if part.text: print(part.text, end="", flush=True) asyncio.run(main()) ``` --- ## Page: Google ADK + Auriko > Section: Configure options | Parameter | Type | Default | Description | |-----------|------|---------|-------------| | `model` | `str` | (required) | Model ID | | `api_key` | `str` | `""` (reads `AURIKO_API_KEY` at first use) | API key | | `routing` | `RoutingOptions \| None` | `None` | Routing configuration | | `base_url` | `str` | `"https://api.auriko.ai/v1"` | API base URL | --- ## Page: Google ADK + Auriko > Section: Configure routing Configure routing options: ```python from auriko.frameworks.adk import AurikoLlm from auriko.route_types import RoutingOptions llm = AurikoLlm( model="gpt-5.4", routing=RoutingOptions(optimize="cost"), ) ``` --- ## Page: Google ADK + Auriko > Section: Configure manually If you prefer to use Google's `LiteLlm` class directly: ```python import os from google.adk.models.lite_llm import LiteLlm llm = LiteLlm( model="openai/gpt-5.4", api_key=os.environ["AURIKO_API_KEY"], api_base="https://api.auriko.ai/v1", custom_llm_provider="openai", ) ``` LiteLLM ignores `api_base` for model 
names containing provider keywords (like `gpt` or `claude`). Always include `custom_llm_provider="openai"` to force LiteLLM to respect your custom base URL. Note: routing options and Auriko error mapping aren't available with manual configuration. --- ## Page: Google ADK + Auriko > Section: Notes - Supports text and function calling. Inline data (`inline_data`) and file data (`file_data`) aren't yet supported and raise `NotImplementedError`. - OpenAI API errors are automatically mapped to typed Auriko error classes. - The adapter uses `AsyncOpenAI` internally; the client is lazily initialized on first use. --- ## Page: CrewAI + Auriko Use Auriko as your LLM provider in CrewAI for cost-effective multi-agent workflows. --- ## Page: CrewAI + Auriko > Section: Prerequisites - An [Auriko API key](https://auriko.ai/signup) --- ## Page: CrewAI + Auriko > Section: Installation ```bash pip install "auriko[crewai]" ``` --- ## Page: CrewAI + Auriko > Section: Use SDK adapter Use the `AurikoCrewAILLM` adapter: ```python from auriko.frameworks.crewai import AurikoCrewAILLM auriko_llm = AurikoCrewAILLM(model="gpt-5.4") ``` `AurikoCrewAILLM` adds an `openai/` prefix internally so all models (including Claude) route through Auriko. Without this wrapper, CrewAI detects `claude-` model names and silently routes to the native Anthropic SDK, bypassing Auriko entirely. 
```python from crewai import Agent, Task, Crew researcher = Agent( role="Researcher", goal="Find accurate and comprehensive information", backstory="You are an expert researcher with attention to detail.", llm=auriko_llm.llm, verbose=True, ) writer = Agent( role="Writer", goal="Write clear, engaging content based on research", backstory="You are a skilled technical writer.", llm=auriko_llm.llm, verbose=True, ) research_task = Task( description="Research the latest trends in AI agents", agent=researcher, expected_output="A detailed summary of AI agent trends with sources", ) writing_task = Task( description="Write a blog post based on the research findings", agent=writer, expected_output="A 500-word blog post about AI agent trends", context=[research_task], ) crew = Crew( agents=[researcher, writer], tasks=[research_task, writing_task], verbose=True, ) result = crew.kickoff() print(result) ``` --- ## Page: CrewAI + Auriko > Section: Configure options | Parameter | Type | Default | Description | |-----------|------|---------|-------------| | `model` | `str` | (required) | Model ID (e.g., `"gpt-5.4"`, `"claude-sonnet-4-20250514"`) | | `api_key` | `str \| None` | `AURIKO_API_KEY` env | API key | | `routing` | `RoutingOptions \| None` | `None` | Routing configuration | | `base_url` | `str` | `"https://api.auriko.ai/v1"` | API base URL | | `**kwargs` | | | Passed through to `crewai.LLM` | --- ## Page: CrewAI + Auriko > Section: Configure routing Configure routing options: ```python from auriko.frameworks.crewai import AurikoCrewAILLM from auriko.route_types import RoutingOptions auriko_llm = AurikoCrewAILLM( model="gpt-5.4", routing=RoutingOptions(optimize="cost"), ) # After crew.kickoff(), access routing metadata from the last request metadata = auriko_llm.last_routing_metadata if metadata: print(f"Provider: {metadata.provider}") ``` Different agents can use different models and routing strategies: ```python fast_llm = AurikoCrewAILLM(model="gpt-4o", 
routing=RoutingOptions(optimize="speed")) smart_llm = AurikoCrewAILLM(model="gpt-5.4", routing=RoutingOptions(optimize="balanced")) researcher = Agent(role="Researcher", goal="Find information", backstory="Expert", llm=smart_llm.llm) writer = Agent(role="Writer", goal="Write content", backstory="Skilled writer", llm=fast_llm.llm) ``` --- ## Page: CrewAI + Auriko > Section: Configure manually If you prefer not to use the SDK adapter, you can configure CrewAI's `LLM` directly. You must add the `openai/` prefix to the model name manually: ```python import os from crewai import LLM llm = LLM( model="openai/gpt-5.4", # openai/ prefix required base_url="https://api.auriko.ai/v1", api_key=os.environ["AURIKO_API_KEY"], ) ``` Note: routing options and metadata access aren't available with manual configuration. --- ## Page: CrewAI + Auriko > Section: Notes - `AurikoCrewAILLM` wraps `crewai.LLM` — pass the `.llm` property to `Agent`, not the wrapper itself. - The `openai/` prefix is added automatically for all models, including Claude. - `last_routing_metadata` returns metadata from the most recent non-streaming response only. --- ## Page: LlamaIndex + Auriko Use Auriko as your LLM provider in LlamaIndex. --- ## Page: LlamaIndex + Auriko > Section: Prerequisites - An [Auriko API key](https://auriko.ai/signup) --- ## Page: LlamaIndex + Auriko > Section: Installation ```bash pip install "auriko[llamaindex]" ``` --- ## Page: LlamaIndex + Auriko > Section: Use SDK adapter Use the `AurikoLlamaIndexLLM` adapter: ```python from auriko.frameworks.llamaindex import AurikoLlamaIndexLLM llm = AurikoLlamaIndexLLM(model="gpt-5.4") ``` `AurikoLlamaIndexLLM` extends LlamaIndex's `OpenAI` LLM class with routing injection, per-call routing overrides, and Auriko error mapping. 
```python
from auriko.frameworks.llamaindex import AurikoLlamaIndexLLM
from llama_index.core.llms import ChatMessage

llm = AurikoLlamaIndexLLM(model="gpt-5.4")

# Simple chat
response = llm.chat([ChatMessage(role="user", content="What is 2+2?")])
print(response.message.content)

# Streaming
for chunk in llm.stream_chat([ChatMessage(role="user", content="Count to 5")]):
    print(chunk.delta, end="", flush=True)
```

---

## Page: LlamaIndex + Auriko > Section: Configure options

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `model` | `str` | (required, via parent) | Model ID |
| `api_key` | `str \| None` | `AURIKO_API_KEY` env | API key |
| `routing` | `RoutingOptions \| None` | `None` | Default routing configuration |
| `api_base` | `str` | `"https://api.auriko.ai/v1"` | API base URL |
| `**kwargs` | | | Passed through to LlamaIndex's `OpenAI` (e.g., `temperature`, `max_tokens`) |

---

## Page: LlamaIndex + Auriko > Section: Configure routing

Instance-level routing applies to all requests:

```python
from auriko.frameworks.llamaindex import AurikoLlamaIndexLLM
from auriko.route_types import RoutingOptions

llm = AurikoLlamaIndexLLM(
    model="gpt-5.4",
    routing=RoutingOptions(optimize="cost"),
)
```

Per-call routing overrides the instance default:

```python
from auriko.route_types import RoutingOptions

# Use speed optimization for this call only
response = llm.chat(
    [ChatMessage(role="user", content="Hello!")],
    routing=RoutingOptions(optimize="speed"),
)
```

Access routing metadata from the response:

```python
response = llm.chat([ChatMessage(role="user", content="Hello!")])
metadata = response.additional_kwargs.get("routing_metadata")
if metadata:
    print(f"Provider: {metadata['provider']}")
```

---

## Page: LlamaIndex + Auriko > Section: Configure manually

If you prefer to use LlamaIndex's `OpenAI` class directly:

```python
import os
from llama_index.llms.openai import OpenAI

llm = OpenAI(
    model="gpt-5.4",
api_key=os.environ["AURIKO_API_KEY"], api_base="https://api.auriko.ai/v1", ) ``` Note: routing options, per-call overrides, and Auriko error mapping aren't available with manual configuration. --- ## Page: LlamaIndex + Auriko > Section: Notes - `AurikoLlamaIndexLLM` inherits all LlamaIndex OpenAI capabilities: chat, completion, streaming, async. - OpenAI API errors are automatically mapped to typed Auriko error classes (`RateLimitError`, `BudgetExceededError`, etc.). - Per-call routing overrides are unique to this adapter — pass `routing=RoutingOptions(...)` to any chat/complete call. === # Auriko Platform ## Page: Rate Limits Inference rate limits apply only to BYOK (Bring Your Own Key) requests and scale with your usage tier. Platform keys have no inference rate limits. Management endpoints have separate per-user limits. --- ## Page: Rate Limits > Section: Prerequisites - An [Auriko API key](https://auriko.ai/signup) --- ## Page: Rate Limits > Section: Inference rate limits Your rate limit tier is determined by rolling 30-day inference spend and recalculates every 60 minutes: | Tier | 30-day spend | BYOK RPM | BYOK monthly cap | Platform fee | |------|-------------|----------|-------------------|--------------| | Starter | $0 – $500 | 30 | 1,000 | 2.0% | | Growth | $500 – $10,000 | 120 | 50,000 | 1.0% | | Scale | $10,000+ | 600 | Unlimited | 0.5% | | Enterprise | Custom | 1,200 | Unlimited | Custom | Enterprise tier is assigned manually — it is not auto-detected from spend. The limits above apply only to BYOK requests. See [BYOK](/platform/byok) for details. 
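As an illustration, the spend-to-tier mapping above can be expressed as a small lookup. `tier_for_spend` is a hypothetical helper, not part of any Auriko SDK; it assumes spend of exactly $500 or $10,000 falls into the higher tier, and omits Enterprise since that tier is assigned manually:

```python
def tier_for_spend(spend_usd: float) -> tuple[str, int, float]:
    """Map rolling 30-day inference spend to (tier, BYOK RPM, platform fee %).

    Hypothetical helper mirroring the tier table. Enterprise is excluded
    because it is assigned manually, not derived from spend.
    """
    if spend_usd >= 10_000:  # assumption: exactly $10,000 counts as Scale
        return ("Scale", 600, 0.5)
    if spend_usd >= 500:     # assumption: exactly $500 counts as Growth
        return ("Growth", 120, 1.0)
    return ("Starter", 30, 2.0)

print(tier_for_spend(250))     # ('Starter', 30, 2.0)
print(tier_for_spend(7_500))   # ('Growth', 120, 1.0)
print(tier_for_spend(12_000))  # ('Scale', 600, 0.5)
```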
--- ## Page: Rate Limits > Section: Rate limit headers Every response carries OpenAI-compatible rate limit headers: | Header | Description | |--------|-------------| | `Retry-After` | Seconds until rate limit resets (RFC 7231) | | `X-RateLimit-Limit-Requests` | Requests allowed per window | | `X-RateLimit-Remaining-Requests` | Requests remaining in current window | | `X-RateLimit-Reset-Requests` | ISO 8601 timestamp when the window resets | --- ## Page: Rate Limits > Section: Management API rate limits Management endpoints have separate per-user rate limits: | Endpoint | Limit | |----------|-------| | API key creation | 10/min | | Billing checkout | 5/min | | Billing portal | 5/min | | Team invites | 20/min | | BYOK operations | 20/min | | Workspace creation | 5/min | | Account deletion | 2/min | | Budget writes | 10/min | | Management reads (API key) | 60/min (per IP) | | Public registry | 60/min (per IP) | --- ## Page: Rate Limits > Section: Handle 429 responses When you exceed a rate limit, the API returns a `429 Too Many Requests` response with a `Retry-After` header indicating when to retry. ```json { "error": { "message": "Rate limit exceeded. Retry after 12 seconds.", "type": "rate_limit_error", "code": "rate_limit_exceeded" } } ``` The Auriko SDK handles retries automatically with exponential backoff (up to 2 retries by default). For manual handling, see [Error handling — Retry manually](/guides/error-handling#retry-manually). --- ## Page: Team Management Create workspaces, invite members, and manage roles. Member and invite endpoints aren't yet available through the public API (`api.auriko.ai`). Manage team members through the [dashboard](https://auriko.ai/dashboard) instead. 
--- ## Page: Team Management > Section: Prerequisites - A [session token](/api-reference/authentication#session-authentication) - Workspace owner or admin role (for member management) --- ## Page: Team Management > Section: Roles Workspace permissions are role-based: | Action | Owner | Admin | Member | |--------|-------|-------|--------| | Invite members | Yes | Yes | — | | Change roles | Yes | — | — | | Remove members | Yes | Yes | — | | Transfer ownership | Yes | — | — | | Update workspace | Yes | — | — | | Delete workspace | Yes | — | — | | Cancel invites | Yes | Yes | — | | View members | Yes | Yes | Yes | | Use API keys | Yes | Yes | Yes | | Leave workspace | Yes | Yes | Yes | --- ## Page: Team Management > Section: Create a workspace Workspace management uses session authentication. See [Authentication](/api-reference/authentication#session-authentication) for details. Any authenticated user can create a workspace and becomes the owner: ```bash curl -X POST https://api.auriko.ai/v1/workspaces \ -H "Authorization: Bearer $SESSION_JWT" \ -H "Content-Type: application/json" \ -d '{ "name": "My Team", "slug": "my-team" }' ``` Response: ```json { "id": "7c9e6679-7425-40de-944b-e07fc1f90ae7", "name": "My Team", "slug": "my-team", "tier": "explorer", "user_role": "owner", "member_count": 1, "can_use_paid_models": false, "created_at": "2026-03-20T10:00:00Z" } ``` --- ## Page: Team Management > Section: Invite a member Owners and admins can invite new members: ```bash curl -X POST https://api.auriko.ai/v1/workspaces/{workspace_id}/members/invite \ -H "Authorization: Bearer $SESSION_JWT" \ -H "Content-Type: application/json" \ -d '{ "email": "teammate@example.com", "role": "member" }' ``` Invitations expire after 7 days and can be resent. 
--- ## Page: Team Management > Section: Accept an invitation The invited user accepts by authenticating and calling the accept endpoint with the invite token: ```bash curl -X POST https://api.auriko.ai/v1/invites/{token}/accept \ -H "Authorization: Bearer $SESSION_JWT" ``` The invite token acts as a secret — it is sent to the invitee's email and is not exposed to workspace admins. --- ## Page: Team Management > Section: Change a member's role Only workspace owners can change roles: ```bash curl -X PATCH https://api.auriko.ai/v1/workspaces/{workspace_id}/members/{user_id} \ -H "Authorization: Bearer $SESSION_JWT" \ -H "Content-Type: application/json" \ -d '{"role": "admin"}' ``` Assignable roles: `admin`, `member`. The `owner` role can only be transferred (see below). --- ## Page: Team Management > Section: Remove a member Owners and admins can remove members: ```bash curl -X DELETE https://api.auriko.ai/v1/workspaces/{workspace_id}/members/{user_id} \ -H "Authorization: Bearer $SESSION_JWT" ``` Members can leave voluntarily: ```bash curl -X POST https://api.auriko.ai/v1/workspaces/{workspace_id}/leave \ -H "Authorization: Bearer $SESSION_JWT" ``` --- ## Page: Team Management > Section: Transfer ownership Only the current owner can transfer ownership: ```bash curl -X POST https://api.auriko.ai/v1/workspaces/{workspace_id}/transfer-ownership \ -H "Authorization: Bearer $SESSION_JWT" \ -H "Content-Type: application/json" \ -d '{"new_owner_id": "550e8400-e29b-41d4-a716-446655440000"}' ``` Auriko demotes the previous owner to admin after the transfer. 
--- ## Page: Team Management > Section: List and manage invites ```bash # List pending invites curl https://api.auriko.ai/v1/workspaces/{workspace_id}/invites \ -H "Authorization: Bearer $SESSION_JWT" # Cancel an invite curl -X DELETE https://api.auriko.ai/v1/workspaces/{workspace_id}/invites/{invite_id} \ -H "Authorization: Bearer $SESSION_JWT" # Resend an invite curl -X POST https://api.auriko.ai/v1/invites/{invite_id}/resend \ -H "Authorization: Bearer $SESSION_JWT" ``` --- ## Page: Bring Your Own Key Use your own provider API keys with Auriko's routing, monitoring, and fallback capabilities. --- ## Page: Bring Your Own Key > Section: Prerequisites - An [Auriko API key](https://auriko.ai/signup) for inference - A [session token](/api-reference/authentication#session-authentication) for key management - Workspace owner or admin role (for key management) - A valid API key from a supported provider --- ## Page: Bring Your Own Key > Section: Find your workspace ID Your API key is scoped to a workspace. To discover your workspace ID, call `/v1/me`: ```bash curl https://api.auriko.ai/v1/me \ -H "Authorization: Bearer $AURIKO_API_KEY" ``` The response includes your `workspace_id`: ```json { "object": "api_key_identity", "workspace_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7", "user_id": "550e8400-e29b-41d4-a716-446655440000", "tier": "explorer", "rate_limit_rpm": 60 } ``` You can also find your workspace ID in the [dashboard](https://auriko.ai/dashboard) under Settings. `workspace_id` is `null` for keys created before workspace support. --- ## Page: Bring Your Own Key > Section: Add a provider key Provider key management uses [session authentication](/api-reference/authentication#session-authentication). 
Get a session token from the [dashboard](https://auriko.ai/dashboard), then register a provider key: ```bash curl -X POST https://api.auriko.ai/v1/workspaces/{workspace_id}/provider-keys \ -H "Authorization: Bearer $SESSION_JWT" \ -H "Content-Type: application/json" \ -d '{ "provider": "openai", "api_key": "sk-...", "label": "Production OpenAI", "validate_before_save": true }' ``` Response: ```json { "id": "pk_abc123", "provider": "openai", "provider_name": "OpenAI", "key_prefix": "sk-...wxyz", "is_default": true, "validation_status": "valid", "detected_tier": "tier-5", "tier_source": "auto_detected", "created_at": "2026-03-20T10:00:00Z" } ``` When `validate_before_save` is `true` (default), Auriko makes a lightweight probe request to the provider to verify the key works before saving it. --- ## Page: Bring Your Own Key > Section: Supported providers | Provider identifier | Provider name | |---------------------|---------------| | `openai` | OpenAI | | `anthropic` | Anthropic Claude | | `google_ai_studio` | Google AI Studio | | `deepseek` | DeepSeek | | `xai` | xAI Grok | | `fireworks_ai` | Fireworks AI | | `together_ai` | Together AI | | `z_ai` | Z.AI | | `minimax` | MiniMax | | `moonshot` | Moonshot AI | --- ## Page: Bring Your Own Key > Section: Tier detection Auriko auto-detects your provider account tier from rate limit headers on first use. The detected tier affects available RPM and TPM limits for routing decisions. Override auto-detection: - **Enterprise flag** — set `is_enterprise: true` when adding a key to mark it as enterprise tier - **Manual tier** — for providers that require tier selection (for example, Google AI Studio), pass `selected_tier` at key creation - **Update later** — `PATCH /v1/workspaces/{workspace_id}/provider-keys/{id}/tier` to change the tier after creation Once a tier is manually set (`tier_source: "user_specified"`), auto-detection is disabled for that key. 
--- ## Page: Bring Your Own Key > Section: Use BYOK in requests Control key source with routing constraints: ```python Python import os from auriko import Client client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) # Use only your own keys response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}], routing={"only_byok": True} ) # Use only platform keys (no BYOK) response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}], routing={"only_platform": True} ) ``` ```typescript TypeScript import { Client } from "@auriko/sdk"; const client = new Client({ apiKey: process.env.AURIKO_API_KEY, baseUrl: "https://api.auriko.ai/v1", }); // Use only your own keys const response = await client.chat.completions.create({ model: "gpt-4o", messages: [{ role: "user", content: "Hello!" }], routing: { only_byok: true }, }); // Use only platform keys (no BYOK) const platform = await client.chat.completions.create({ model: "gpt-4o", messages: [{ role: "user", content: "Hello!" }], routing: { only_platform: true }, }); ``` --- ## Page: Bring Your Own Key > Section: Routing behavior The router **prefers your BYOK key** when one exists for the requested provider. You get direct billing control and your provider tier applies. The router falls back to platform keys in two cases: 1. **Exhausted:** your BYOK key has zero remaining rate-limit headroom and the platform key has capacity. 2. **Fetch failure:** your BYOK key can't be retrieved or decrypted at request time and a platform key is available. Override the default with routing constraints: - `only_byok: true`: use only your BYOK key and fail the request if unavailable. - `only_platform: true`: ignore BYOK keys entirely. 
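The fallback rules above can be summarized in a short decision function. This is an illustrative sketch of the documented behavior, not router code; the function and parameter names are hypothetical:

```python
def select_key_source(byok_usable: bool, byok_has_headroom: bool,
                      platform_available: bool,
                      only_byok: bool = False,
                      only_platform: bool = False) -> str:
    """Pick "byok" or "platform" per the routing rules (illustrative sketch)."""
    if only_platform:
        if platform_available:
            return "platform"
        raise RuntimeError("no platform key available")
    if only_byok:
        if byok_usable and byok_has_headroom:
            return "byok"
        raise RuntimeError("BYOK key unavailable and only_byok is set")
    # Default: prefer BYOK; fall back to platform on exhaustion or fetch failure.
    if byok_usable and byok_has_headroom:
        return "byok"
    if platform_available:
        return "platform"
    raise RuntimeError("no usable key for this provider")

print(select_key_source(True, True, True))    # byok (preferred)
print(select_key_source(True, False, True))   # platform (BYOK exhausted)
print(select_key_source(False, False, True))  # platform (fetch failure)
```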
---

## Page: Bring Your Own Key > Section: Manage keys

These endpoints also use session authentication:

```bash
# List all provider keys
curl https://api.auriko.ai/v1/workspaces/{workspace_id}/provider-keys \
  -H "Authorization: Bearer $SESSION_JWT"

# Delete a key
curl -X DELETE https://api.auriko.ai/v1/workspaces/{workspace_id}/provider-keys/{id} \
  -H "Authorization: Bearer $SESSION_JWT"

# Re-validate a key
curl -X POST https://api.auriko.ai/v1/workspaces/{workspace_id}/provider-keys/{id}/validate \
  -H "Authorization: Bearer $SESSION_JWT"

# Set as default for provider
curl -X POST https://api.auriko.ai/v1/workspaces/{workspace_id}/provider-keys/{id}/set-default \
  -H "Authorization: Bearer $SESSION_JWT"
```

---

## Page: Bring Your Own Key > Section: Security

Auriko encrypts your provider keys and isolates them per workspace.

- **Encrypted at rest:** XSalsa20-Poly1305 with per-workspace HKDF-SHA256 key derivation.
- **Masked in responses:** API responses return keys as `sk-xxxxx...****` with only the first 8 characters visible.
- **Decrypted at request time only:** the edge router decrypts your key when calling the provider, then discards it.
- **Never logged:** Auriko never logs or persists decrypted keys.
- **Key rotation supported:** encryption key versions are tracked per key for zero-downtime master key rotation.

---

## Page: Bring Your Own Key > Section: Data policies

BYOK keys inherit the account-level data policy. Options: `none`, `no_training`, and `zdr` (zero data retention). When a per-request data policy conflicts with the account-level policy, the most restrictive one wins.

For more on data policies, see [Advanced routing — Data policy](/guides/advanced-routing#data-policy).

---

## Page: Bring Your Own Key > Section: Rate limiting

Auriko rate-limits BYOK management endpoints to 20 operations per minute per user.

Permissions: owner and admin for add, delete, and tier changes; all members can list keys and use them in requests.
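The response masking format described under Security can be sketched like this. `mask_provider_key` is a hypothetical helper that keeps the first 8 characters and replaces the remainder with a fixed suffix; the exact mask string is an assumption, not Auriko's actual output:

```python
def mask_provider_key(key: str) -> str:
    # Keep only the first 8 characters visible, as in the Security section;
    # the "...****" suffix is an assumed rendering of the mask.
    return key[:8] + "...****"

print(mask_provider_key("sk-abc123def456ghi789"))  # sk-abc12...****
```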