# AUTO-GENERATED — DO NOT EDIT MANUALLY # Source: scripts/docs/generate_llms_txt.py # Regenerated on every deploy # Auriko > Intelligent LLM routing API with OpenAI-compatible interface. Route requests > across multiple AI providers to optimize for cost, latency, and reliability. ## Sections in this document 1. Guides — streaming, tool calling, routing, cost optimization, error handling, prompt caching, budget management, advanced routing 2. API Reference — endpoints, parameters, schemas, error codes 3. Python SDK Reference 4. TypeScript SDK Reference 5. Framework Integrations — LangChain, CrewAI, OpenAI Agents SDK, Google ADK, LlamaIndex 6. Platform — rate limits, team management, BYOK === # Auriko Guides ## Page: Streaming Stream responses in real-time for a better user experience. Auriko supports Server-Sent Events (SSE) for streaming. --- ## Page: Streaming > Section: Prerequisites - An [Auriko API key](https://auriko.ai/signup) - Python 3.10+ with `auriko` SDK installed (`pip install auriko`) - OR Node.js 18+ with `@auriko/sdk` installed (`npm install @auriko/sdk`) --- ## Page: Streaming > Section: Stream responses Stream a chat completion response: ```python Python import os from auriko import Client client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) stream = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Write a short story"}], stream=True ) for chunk in stream: if chunk.choices and chunk.choices[0].delta.content: print(chunk.choices[0].delta.content, end="", flush=True) ``` ```typescript TypeScript import { Client } from "@auriko/sdk"; const client = new Client({ apiKey: process.env.AURIKO_API_KEY, baseUrl: "https://api.auriko.ai/v1", }); const stream = await client.chat.completions.create({ model: "gpt-4o", messages: [{ role: "user", content: "Write a short story" }], stream: true, }); for await (const chunk of stream) { if (chunk.choices[0]?.delta?.content) { 
process.stdout.write(chunk.choices[0].delta.content); } } ``` ```bash cURL curl https://api.auriko.ai/v1/chat/completions \ -H "Authorization: Bearer $AURIKO_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "gpt-4o", "messages": [{"role": "user", "content": "Write a short story"}], "stream": true }' ``` --- ## Page: Streaming > Section: Stream asynchronously (Python) Stream with the async client: ```python import os from auriko import AsyncClient import asyncio async def stream_response(): client = AsyncClient( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) stream = await client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Write a short story"}], stream=True ) async for chunk in stream: if chunk.choices and chunk.choices[0].delta.content: print(chunk.choices[0].delta.content, end="", flush=True) asyncio.run(stream_response()) ``` --- ## Page: Streaming > Section: Stream events Each chunk contains: ```python # ChatCompletionChunk chunk.id # "chatcmpl-abc123" chunk.model # "gpt-4o" chunk.created # 1234567890 chunk.choices[0].delta.content # Token content (may be None) chunk.choices[0].delta.role # "assistant" (first chunk only) chunk.choices[0].finish_reason # None until last chunk ("stop") ``` --- ## Page: Streaming > Section: Handle final chunks The last chunk carries `finish_reason` and usage. Auriko forces `include_usage: true` on all streaming requests. You don't need to set `stream_options` manually. 
```python stream = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}], stream=True ) full_content = "" usage = None for chunk in stream: if chunk.choices: if chunk.choices[0].delta.content: full_content += chunk.choices[0].delta.content if chunk.choices[0].finish_reason: print(f"\n\nFinished: {chunk.choices[0].finish_reason}") if chunk.usage: usage = chunk.usage if usage: print(f"Tokens used: {usage.total_tokens}") ``` Auriko forces `stream_options.include_usage` to `true` for accurate billing. Setting it explicitly is harmless but unnecessary. --- ## Page: Streaming > Section: Stream properties The stream object exposes usage, routing metadata, and response headers after iteration completes. | Property | Python | TypeScript | Available | |----------|--------|------------|-----------| | Token usage | `stream.usage` | `stream.usage` | After iteration | | Routing info | `stream.routing_metadata` | `stream.routing_metadata` | After iteration | | Response headers | `stream.response_headers` | `stream.responseHeaders` | Immediately | | Close connection | `stream.close()` | `stream.close()` | Any time | Use the stream as a context manager to ensure the connection is released: ```python Python with client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}], stream=True ) as stream: for chunk in stream: if chunk.choices and chunk.choices[0].delta.content: print(chunk.choices[0].delta.content, end="", flush=True) # Available after iteration if stream.usage: print(f"Tokens: {stream.usage.total_tokens}") if stream.routing_metadata: print(f"Provider: {stream.routing_metadata.provider}") ``` ```typescript TypeScript const stream = await client.chat.completions.create({ model: "gpt-4o", messages: [{ role: "user", content: "Hello!" 
}], stream: true, }); for await (const chunk of stream) { if (chunk.choices[0]?.delta?.content) { process.stdout.write(chunk.choices[0].delta.content); } } // Available after iteration console.log(`Tokens: ${stream.usage?.total_tokens}`); console.log(`Provider: ${stream.routing_metadata?.provider}`); ``` Use an async context manager for automatic cleanup: ```python async with await client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}], stream=True ) as stream: async for chunk in stream: if chunk.choices and chunk.choices[0].delta.content: print(chunk.choices[0].delta.content, end="", flush=True) ``` `routing_metadata` and `usage` are only present in the **final chunk** (with `choices: []`). Consume the stream to completion to access them. In TypeScript, you can only iterate a stream once. A second attempt throws an error. --- ## Page: Streaming > Section: Stream with tools Accumulate tool call fragments from a streamed response: ```python stream = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "What's the weather in Paris?"}], tools=[{ "type": "function", "function": { "name": "get_weather", "description": "Get weather", "parameters": {"type": "object", "properties": {"city": {"type": "string"}}} } }], stream=True ) tool_calls = [] for chunk in stream: if not chunk.choices: continue delta = chunk.choices[0].delta # Handle tool call streaming if delta.tool_calls: for tc in delta.tool_calls: if tc.index >= len(tool_calls): tool_calls.append({"id": tc.id, "function": {"name": "", "arguments": ""}}) if tc.function and tc.function.name: tool_calls[tc.index]["function"]["name"] += tc.function.name if tc.function and tc.function.arguments: tool_calls[tc.index]["function"]["arguments"] += tc.function.arguments print(tool_calls) ``` See [Tool Calling Guide](/guides/tool-calling) for function definitions and multi-turn tool conversations. 
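The accumulation pattern above is easy to verify offline. A minimal dry run over hand-built delta fragments (illustrative only, not captured from a real stream — actual chunking varies by provider) shows how name and argument pieces recombine into a complete tool call:

```python
import json

# Illustrative delta fragments as a provider might stream them.
fragments = [
    {"index": 0, "id": "call_abc", "name": "get_weather", "arguments": ""},
    {"index": 0, "id": None, "name": None, "arguments": '{"city": '},
    {"index": 0, "id": None, "name": None, "arguments": '"Paris"}'},
]

tool_calls = []
for tc in fragments:
    if tc["index"] >= len(tool_calls):
        tool_calls.append({"id": tc["id"], "function": {"name": "", "arguments": ""}})
    if tc["name"]:
        tool_calls[tc["index"]]["function"]["name"] += tc["name"]
    if tc["arguments"]:
        tool_calls[tc["index"]]["function"]["arguments"] += tc["arguments"]

# Arguments only parse as JSON once all fragments are concatenated.
args = json.loads(tool_calls[0]["function"]["arguments"])
print(tool_calls[0]["function"]["name"], args)  # get_weather {'city': 'Paris'}
```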
--- ## Page: Streaming > Section: Stream with routing options Pass routing options to a streaming request: ```python Python stream = client.chat.completions.create( model="gpt-5.4", messages=[{"role": "user", "content": "Hello!"}], stream=True, routing={ "optimize": "speed", "max_ttft_ms": 100, } ) for chunk in stream: if chunk.choices and chunk.choices[0].delta.content: print(chunk.choices[0].delta.content, end="", flush=True) ``` ```typescript TypeScript const stream = await client.chat.completions.create({ model: "gpt-5.4", messages: [{ role: "user", content: "Hello!" }], stream: true, routing: { optimize: "speed", max_ttft_ms: 100, }, }); for await (const chunk of stream) { if (chunk.choices[0]?.delta?.content) { process.stdout.write(chunk.choices[0].delta.content); } } ``` --- ## Page: Streaming > Section: Handle stream errors Catch errors during streaming: ```python import os from auriko import Client, ProviderError, RateLimitError client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) try: stream = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}], stream=True ) for chunk in stream: if chunk.choices and chunk.choices[0].delta.content: print(chunk.choices[0].delta.content, end="", flush=True) except ProviderError as e: print(f"Provider error: {e}") except RateLimitError as e: print(f"Rate limited: {e}") ``` See [Error Handling Guide](/guides/error-handling) for retry strategies and circuit breakers. --- ## Page: Streaming > Section: SSE format Raw SSE events look like this. Auriko appends a final event with `routing_metadata` and `usage` before `[DONE]`. 
``` data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]} data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]} data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4o","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]} data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4o","choices":[],"usage":{"prompt_tokens":8,"completion_tokens":2,"total_tokens":10},"routing_metadata":{"provider":"openai","routing_strategy":"balanced","total_latency_ms":847,"cost":{"billable_cost_usd":0.00015}}} data: [DONE] ``` The final event before `[DONE]` carries `routing_metadata` and `usage` with `choices: []`. SDKs expose these as `stream.routing_metadata` and `stream.usage` after iteration. --- ## Page: Tool Calling Let LLMs call your functions to interact with external systems, databases, and APIs. 
--- ## Page: Tool Calling > Section: Prerequisites - An [Auriko API key](https://auriko.ai/signup) - Python 3.10+ with `auriko` SDK installed (`pip install auriko`) - OR Node.js 18+ with `@auriko/sdk` installed (`npm install @auriko/sdk`) - A model that supports tool calling (e.g., GPT-4o, Claude 3.5 Sonnet) --- ## Page: Tool Calling > Section: Define tools Define tools as JSON schemas describing the function signature: ```python tools = [ { "type": "function", "function": { "name": "get_weather", "description": "Get the current weather for a city", "parameters": { "type": "object", "properties": { "city": { "type": "string", "description": "The city name" }, "unit": { "type": "string", "enum": ["celsius", "fahrenheit"], "description": "Temperature unit" } }, "required": ["city"] } } } ] ``` --- ## Page: Tool Calling > Section: Call tools Send a request with tools and check the response: ```python Python import os from auriko import Client import json client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) tools = [ { "type": "function", "function": { "name": "get_weather", "description": "Get weather for a city", "parameters": { "type": "object", "properties": { "city": {"type": "string"} }, "required": ["city"] } } } ] response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "What's the weather in Paris?"}], tools=tools ) if response.choices[0].message.tool_calls: tool_call = response.choices[0].message.tool_calls[0] print(f"Function: {tool_call.function.name}") print(f"Arguments: {json.loads(tool_call.function.arguments)}") ``` ```typescript TypeScript import { Client } from "@auriko/sdk"; const client = new Client({ apiKey: process.env.AURIKO_API_KEY, baseUrl: "https://api.auriko.ai/v1", }); const tools = [ { type: "function" as const, function: { name: "get_weather", description: "Get weather for a city", parameters: { type: "object", properties: { city: { type: "string" }, }, 
required: ["city"], }, }, }, ]; const response = await client.chat.completions.create({ model: "gpt-4o", messages: [{ role: "user", content: "What's the weather in Paris?" }], tools, }); if (response.choices[0].message.tool_calls) { const toolCall = response.choices[0].message.tool_calls[0]; console.log(`Function: ${toolCall.function.name}`); console.log(`Arguments: ${JSON.parse(toolCall.function.arguments)}`); } ``` --- ## Page: Tool Calling > Section: Execute tool calls After receiving tool calls, execute them and send the results back: ```python import json def get_weather(city: str) -> str: # Your actual implementation here return f"Weather in {city}: 72°F, sunny" # Step 1: Initial request response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "What's the weather in Paris?"}], tools=tools ) # Step 2: Check for tool calls message = response.choices[0].message if message.tool_calls: # Build message history messages = [ {"role": "user", "content": "What's the weather in Paris?"}, message.model_dump(), # Assistant message with tool_calls ] # Execute each tool call for tool_call in message.tool_calls: args = json.loads(tool_call.function.arguments) if tool_call.function.name == "get_weather": result = get_weather(args["city"]) # Add tool result messages.append({ "role": "tool", "tool_call_id": tool_call.id, "content": result }) # Step 3: Get final response final_response = client.chat.completions.create( model="gpt-4o", messages=messages, tools=tools ) print(final_response.choices[0].message.content) ``` --- ## Page: Tool Calling > Section: Use multiple tools Define multiple tools in the same request: ```python tools = [ { "type": "function", "function": { "name": "get_weather", "description": "Get weather for a city", "parameters": { "type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"] } } }, { "type": "function", "function": { "name": "search_web", "description": "Search the web for 
information", "parameters": { "type": "object", "properties": {"query": {"type": "string"}}, "required": ["query"] } } }, { "type": "function", "function": { "name": "send_email", "description": "Send an email", "parameters": { "type": "object", "properties": { "to": {"type": "string"}, "subject": {"type": "string"}, "body": {"type": "string"} }, "required": ["to", "subject", "body"] } } } ] ``` --- ## Page: Tool Calling > Section: Use parallel tool calls Models can request multiple tool calls in parallel: ```python response = client.chat.completions.create( model="gpt-4o", messages=[{ "role": "user", "content": "What's the weather in Paris and Tokyo?" }], tools=tools ) # May return two tool calls if response.choices[0].message.tool_calls: for tool_call in response.choices[0].message.tool_calls: print(f"{tool_call.function.name}: {tool_call.function.arguments}") ``` --- ## Page: Tool Calling > Section: Control tool choice Control which tools the model can use: ```python # Let model decide response = client.chat.completions.create( model="gpt-4o", messages=messages, tools=tools, tool_choice="auto" # default ) # Force tool use response = client.chat.completions.create( model="gpt-4o", messages=messages, tools=tools, tool_choice="required" ) # Force specific tool response = client.chat.completions.create( model="gpt-4o", messages=messages, tools=tools, tool_choice={"type": "function", "function": {"name": "get_weather"}} ) # Disable tools response = client.chat.completions.create( model="gpt-4o", messages=messages, tools=tools, tool_choice="none" ) ``` --- ## Page: Tool Calling > Section: Stream tool calls Accumulate streamed tool call fragments: ```python stream = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "What's the weather in Paris?"}], tools=tools, stream=True ) tool_calls = {} for chunk in stream: if not chunk.choices: continue delta = chunk.choices[0].delta if delta.tool_calls: for tc in delta.tool_calls: idx = 
tc.index if idx not in tool_calls: tool_calls[idx] = {"id": tc.id, "function": {"name": "", "arguments": ""}} if tc.function and tc.function.name: tool_calls[idx]["function"]["name"] += tc.function.name if tc.function and tc.function.arguments: tool_calls[idx]["function"]["arguments"] += tc.function.arguments print(list(tool_calls.values())) ``` See [Streaming Guide](/guides/streaming#stream-with-tools) for full streaming patterns including error handling and metadata access. --- ## Page: Tool Calling > Section: Convert legacy functions Auriko auto-converts the deprecated `functions`/`function_call` parameters to the modern `tools`/`tool_choice` format: | Legacy parameter | Converted to | Condition | |-----------------|-------------|-----------| | `functions` | `tools` | Only if `tools` is absent | | `function_call: "auto"` | `tool_choice: "auto"` | Only if `tool_choice` is absent | | `function_call: "none"` | `tool_choice: "none"` | Only if `tool_choice` is absent | | `function_call: {name: "fn"}` | `tool_choice: {type: "function", function: {name: "fn"}}` | Only if `tool_choice` is absent | Conversion only runs when the legacy field is present and the modern field is absent. If both are present, the modern field takes precedence. Use `tools`/`tool_choice` for new code. Auriko supports the legacy format for backward compatibility. **Provider compatibility:** Most major providers support function calling (`tools` / `tool_choice`), but subfeatures such as `parallel_tool_calls` vary by provider. Auriko filters out providers that don't support tool calling at all, but doesn't guarantee every provider-specific tool subfeature. Check `/v1/directory/models` for current capability details. --- ## Page: Tool Calling > Section: Best practices Write clear, specific function descriptions so the model knows when to use them. Always validate tool call arguments before executing. Return helpful error messages in tool results when execution fails. 
Only include relevant tools to reduce confusion and latency. --- ## Page: Structured Output You can force models to return valid JSON, optionally conforming to a specific schema. --- ## Page: Structured Output > Section: Prerequisites - An [Auriko API key](https://auriko.ai/signup) - Python 3.10+ with `auriko` SDK installed (`pip install auriko`) - OR Node.js 18+ with `@auriko/sdk` installed (`npm install @auriko/sdk`) --- ## Page: Structured Output > Section: Choose a response format Auriko supports three response format types: | Type | Description | Use case | |------|-------------|----------| | `text` | Default. Model returns plain text. | General chat, creative writing | | `json_object` | Model returns valid JSON. No schema enforcement. | Flexible JSON extraction | | `json_schema` | Model returns JSON matching a provided schema. | Typed data extraction, API responses | `json_schema` and `json_object` are separate capabilities. `json_schema` has broader model support. **Claude** supports `json_schema` but not `json_object`. If you request an unsupported mode, Auriko returns a `503` with a suggested alternative. Check per-model support on the [models page](https://optimal-inference.vercel.app/models) or via the [Directory API](/api-reference/list-directory-models). `json_schema` appears as **Structured Output**, `json_object` as **JSON Mode**. 
--- ## Page: Structured Output > Section: Return JSON To return any JSON output, set `response_format` to `json_object`: ```python Python import os from auriko import Client client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) response = client.chat.completions.create( model="gpt-4o", messages=[ {"role": "system", "content": "Extract the user's name and age as JSON."}, {"role": "user", "content": "I'm Alice and I'm 30 years old."} ], response_format={"type": "json_object"} ) print(response.choices[0].message.content) # {"name": "Alice", "age": 30} ``` ```typescript TypeScript import { Client } from "@auriko/sdk"; const client = new Client({ apiKey: process.env.AURIKO_API_KEY, baseUrl: "https://api.auriko.ai/v1", }); const response = await client.chat.completions.create({ model: "gpt-4o", messages: [ { role: "system", content: "Extract the user's name and age as JSON." }, { role: "user", content: "I'm Alice and I'm 30 years old." }, ], response_format: { type: "json_object" }, }); console.log(response.choices[0].message.content); // {"name": "Alice", "age": 30} ``` The model returns valid JSON, but the structure isn't guaranteed. For strict schema conformance, use `json_schema` instead. When using `json_object` mode, always include the word "JSON" in your system or user message. **OpenAI** and **DeepSeek** require this and return a 400 error without it. Including it is harmless on other providers. The `json_schema` mode does not have this requirement. The examples above include "JSON" in the system message. 
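Because a missing "JSON" mention fails only at request time on some providers, a loose client-side guard can catch it earlier. This check is a heuristic (case-insensitive); the providers' exact matching rules may be stricter:

```python
# Loose guard for the json_object requirement noted above: OpenAI and
# DeepSeek reject json_object requests whose messages never mention JSON.
def mentions_json(messages: list[dict]) -> bool:
    return any("json" in str(m.get("content", "")).lower() for m in messages)

messages = [
    {"role": "system", "content": "Extract the user's name and age as JSON."},
    {"role": "user", "content": "I'm Alice and I'm 30 years old."},
]
print(mentions_json(messages))  # True
```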
--- ## Page: Structured Output > Section: Enforce schema You can enforce a specific JSON structure by providing a schema: ```python Python import os import json from auriko import Client client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) response = client.chat.completions.create( model="gpt-4o", messages=[ {"role": "user", "content": "Extract: Alice is 30, lives in NYC, alice@example.com"} ], response_format={ "type": "json_schema", "json_schema": { "name": "ContactInfo", "schema": { "type": "object", "properties": { "name": {"type": "string"}, "age": {"type": "integer"}, "city": {"type": "string"}, "email": {"type": "string"} }, "required": ["name", "age", "city", "email"] } } } ) contact = json.loads(response.choices[0].message.content) print(contact["name"]) # Alice print(contact["email"]) # alice@example.com ``` ```typescript TypeScript import { Client } from "@auriko/sdk"; const client = new Client({ apiKey: process.env.AURIKO_API_KEY, baseUrl: "https://api.auriko.ai/v1", }); const response = await client.chat.completions.create({ model: "gpt-4o", messages: [ { role: "user", content: "Extract: Alice is 30, lives in NYC, alice@example.com" }, ], response_format: { type: "json_schema", json_schema: { name: "ContactInfo", schema: { type: "object", properties: { name: { type: "string" }, age: { type: "integer" }, city: { type: "string" }, email: { type: "string" }, }, required: ["name", "age", "city", "email"], }, }, }, }); const contact = JSON.parse(response.choices[0].message.content!); console.log(contact.name); // Alice console.log(contact.email); // alice@example.com ``` The `json_schema` object requires a `name` field. The `schema` field accepts a standard JSON Schema definition. Auriko automatically routes to providers that support your requested format. If no provider supports it, you get a clear error with suggestions. 
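Even with schema enforcement, validating the parsed payload before use is cheap insurance (for example, against a truncated response). A stdlib-only sketch for the `ContactInfo` shape above:

```python
import json

# Expected fields and types for the ContactInfo schema above.
REQUIRED = {"name": str, "age": int, "city": str, "email": str}

def parse_contact(raw: str) -> dict:
    """Parse the model's JSON output and defensively check required fields."""
    contact = json.loads(raw)
    for key, typ in REQUIRED.items():
        if not isinstance(contact.get(key), typ):
            raise ValueError(f"bad or missing field: {key}")
    return contact

contact = parse_contact(
    '{"name": "Alice", "age": 30, "city": "NYC", "email": "alice@example.com"}'
)
print(contact["name"])  # Alice
```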
---

## Page: Structured Output > Section: Use OpenAI SDK

You can use the standard OpenAI SDK pointed at Auriko:

```python Python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["AURIKO_API_KEY"],
    base_url="https://api.auriko.ai/v1"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Extract: Bob is 25, lives in London, bob@example.com"}
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "ContactInfo",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                    "city": {"type": "string"},
                    "email": {"type": "string"}
                },
                "required": ["name", "age", "city", "email"]
            }
        }
    }
)

print(response.choices[0].message.content)
# {"name": "Bob", "age": 25, "city": "London", "email": "bob@example.com"}
```

```typescript TypeScript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.AURIKO_API_KEY,
  baseURL: "https://api.auriko.ai/v1",
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [
    { role: "user", content: "Extract: Bob is 25, lives in London, bob@example.com" },
  ],
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "ContactInfo",
      schema: {
        type: "object",
        properties: {
          name: { type: "string" },
          age: { type: "integer" },
          city: { type: "string" },
          email: { type: "string" },
        },
        required: ["name", "age", "city", "email"],
      },
    },
  },
});

console.log(response.choices[0].message.content);
// {"name": "Bob", "age": 25, "city": "London", "email": "bob@example.com"}
```

The same `response_format` field works with both the Auriko SDK and the OpenAI SDK.

---

## Page: Structured Output > Section: Resources

- Call functions from LLM responses
- Optimize for cost, speed, or throughput
- Handle errors and retries
- See which models support structured output and JSON mode

---

## Page: Routing Options

Auriko intelligently routes your requests across multiple providers.
Use routing options to optimize for your specific needs. --- ## Page: Routing Options > Section: Prerequisites - An [Auriko API key](https://auriko.ai/signup) - Python 3.10+ with `auriko` SDK installed (`pip install auriko`) - OR Node.js 18+ with `@auriko/sdk` installed (`npm install @auriko/sdk`) --- ## Page: Routing Options > Section: Overview Auriko supports six optimization strategies: | Strategy | Description | Best For | |----------|-------------|----------| | `cost` | Route to cheapest provider | Batch processing, non-urgent tasks | | `cheapest` | Absolute lowest cost | Maximum cost savings, no latency requirements | | `speed` | Minimize latency, maximize throughput | Real-time applications, chatbots | | `ttft` | Minimize time to first token | Streaming UX, interactive apps | | `throughput` | Maximize tokens per second | High-volume processing | | `balanced` (default) | Weighted combination | General-purpose, mixed workloads | --- ## Page: Routing Options > Section: Cost Optimization Minimize your LLM costs: ```python Python response = client.chat.completions.create( model="gpt-5.4", messages=[{"role": "user", "content": "Hello!"}], routing={ "optimize": "cost" } ) # See which provider was used and the cost print(f"Provider: {response.routing_metadata.provider}") print(f"Cost: ${response.routing_metadata.cost.billable_cost_usd:.6f}") ``` ```typescript TypeScript const response = await client.chat.completions.create({ model: "gpt-5.4", messages: [{ role: "user", content: "Hello!" 
}], routing: { optimize: "cost", }, }); console.log(`Provider: ${response.routing_metadata?.provider}`); ``` --- ## Page: Routing Options > Section: Latency Optimization Get the fastest response: ```python Python response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Quick answer: 2+2?"}], routing={ "optimize": "speed" } ) print(f"Latency: {response.routing_metadata.total_latency_ms}ms") ``` ```typescript TypeScript const response = await client.chat.completions.create({ model: "gpt-4o", messages: [{ role: "user", content: "Quick answer: 2+2?" }], routing: { optimize: "speed", }, }); console.log(`Latency: ${response.routing_metadata?.total_latency_ms}ms`); ``` --- ## Page: Routing Options > Section: Latency Constraints Set maximum time-to-first-token (TTFT): ```python Python response = client.chat.completions.create( model="gpt-5.4", messages=[{"role": "user", "content": "Hello!"}], routing={ "optimize": "cost", "max_ttft_ms": 200 # Must start responding within 200ms } ) ``` ```typescript TypeScript const response = await client.chat.completions.create({ model: "gpt-5.4", messages: [{ role: "user", content: "Hello!" }], routing: { optimize: "cost", max_ttft_ms: 200, // Must start responding within 200ms }, }); ``` If no provider can meet the latency constraint, Auriko returns a 503 error. --- ## Page: Routing Options > Section: Set a cost ceiling Exclude providers that exceed a per-1M-token budget: ```python Python import os from auriko import Client client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) response = client.chat.completions.create( model="gpt-5.4", messages=[{"role": "user", "content": "Hello!"}], routing={ "optimize": "cost", "max_cost_per_1m": 5.00 # Max $5.00 per 1M tokens (average of input + output) } ) ``` ```typescript TypeScript const response = await client.chat.completions.create({ model: "gpt-5.4", messages: [{ role: "user", content: "Hello!" 
}], routing: { optimize: "cost", max_cost_per_1m: 5.0, // Max $5.00 per 1M tokens (average of input + output) }, }); ``` Auriko calculates cost as the average of input and output price per 1M tokens. Providers exceeding this ceiling are excluded from routing. For fine-grained quality and cost constraints, see [Advanced routing](/guides/advanced-routing). --- ## Page: Routing Options > Section: Provider Preferences Prefer or exclude specific providers: ```python Python # Only consider these providers response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}], routing={ "providers": ["openai", "anthropic"] } ) # Exclude providers response = client.chat.completions.create( model="gpt-5.4", messages=[{"role": "user", "content": "Hello!"}], routing={ "exclude_providers": ["deepseek"] } ) ``` ```typescript TypeScript // Only consider these providers const response = await client.chat.completions.create({ model: "gpt-4o", messages: [{ role: "user", content: "Hello!" }], routing: { providers: ["openai", "anthropic"], }, }); // Exclude providers const response2 = await client.chat.completions.create({ model: "gpt-5.4", messages: [{ role: "user", content: "Hello!" 
}], routing: { exclude_providers: ["deepseek"], }, }); ``` --- ## Page: Routing Options > Section: Restrict key source Force requests to use only BYOK (bring-your-own-key) or only platform-managed keys: ```python Python import os from auriko import Client client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) # Use only your own provider keys response = client.chat.completions.create( model="gpt-5.4", messages=[{"role": "user", "content": "Hello!"}], routing={ "only_byok": True } ) # Use only Auriko platform keys response = client.chat.completions.create( model="gpt-5.4", messages=[{"role": "user", "content": "Hello!"}], routing={ "only_platform": True } ) ``` ```typescript TypeScript // Use only your own provider keys const response = await client.chat.completions.create({ model: "gpt-5.4", messages: [{ role: "user", content: "Hello!" }], routing: { only_byok: true, }, }); // Use only Auriko platform keys const response2 = await client.chat.completions.create({ model: "gpt-5.4", messages: [{ role: "user", content: "Hello!" }], routing: { only_platform: true, }, }); ``` Both are booleans, default `false`. Setting both to `true` returns a 400 error — they are mutually exclusive. When no key of the requested type is available, the request fails with no fallback. See [Bring Your Own Key](/platform/byok) for BYOK setup. 
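The mutual-exclusion rule described above can be checked client-side before sending, which fails faster than waiting for the server's 400. A minimal sketch:

```python
# Mirrors the server-side rule: only_byok and only_platform cannot both be true.
def validate_key_source(routing: dict) -> None:
    if routing.get("only_byok") and routing.get("only_platform"):
        raise ValueError("only_byok and only_platform are mutually exclusive")

validate_key_source({"only_byok": True})  # ok, no exception
try:
    validate_key_source({"only_byok": True, "only_platform": True})
except ValueError as e:
    print(e)  # only_byok and only_platform are mutually exclusive
```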
---

## Page: Routing Options > Section: Routing Metadata

Every response carries routing information:

```python Python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)

metadata = response.routing_metadata
print(f"Provider: {metadata.provider}")
print(f"Model: {metadata.provider_model_id}")
print(f"Latency: {metadata.total_latency_ms}ms")
print(f"Input tokens: {metadata.cost.input_tokens}")
print(f"Output tokens: {metadata.cost.output_tokens}")
print(f"Cost: ${metadata.cost.billable_cost_usd:.6f}")
```

```typescript TypeScript
const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello!" }],
});

const metadata = response.routing_metadata;
console.log(`Provider: ${metadata?.provider}`);
console.log(`Model: ${metadata?.provider_model_id}`);
console.log(`Latency: ${metadata?.total_latency_ms}ms`);
console.log(`Input tokens: ${metadata?.cost?.input_tokens}`);
console.log(`Output tokens: ${metadata?.cost?.output_tokens}`);
console.log(`Cost: $${metadata?.cost?.billable_cost_usd}`);
```

For the complete field reference including fallback chain, warnings, and all optional fields, see [Response Extensions](/api-reference/overview#response-extensions).
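Per-request metadata becomes more useful when aggregated. A minimal in-process tally, keyed by provider — the field names follow the Python example above, and the recorded values are illustrative (the first matches the SSE sample earlier in this document):

```python
from collections import defaultdict

class RoutingTally:
    """Accumulate per-provider request counts, spend, and latencies."""

    def __init__(self):
        self.by_provider = defaultdict(
            lambda: {"requests": 0, "cost_usd": 0.0, "latency_ms": []}
        )

    def record(self, provider: str, latency_ms: int, cost_usd: float) -> None:
        stats = self.by_provider[provider]
        stats["requests"] += 1
        stats["cost_usd"] += cost_usd
        stats["latency_ms"].append(latency_ms)

tally = RoutingTally()
# In real code: tally.record(m.provider, m.total_latency_ms, m.cost.billable_cost_usd)
tally.record("openai", 847, 0.00015)
tally.record("anthropic", 620, 0.00042)
print(tally.by_provider["openai"]["requests"])  # 1
```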
--- ## Page: Routing Options > Section: Full Example ```python Python import os from auriko import Client client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) # For a chatbot: optimize speed with cost ceiling response = client.chat.completions.create( model="gpt-5.4", messages=[ {"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What's the capital of France?"} ], routing={ "optimize": "speed", "max_ttft_ms": 150, } ) print(response.choices[0].message.content) print(f"\n--- Routing Info ---") print(f"Provider: {response.routing_metadata.provider}") print(f"Latency: {response.routing_metadata.total_latency_ms}ms") print(f"Cost: ${response.routing_metadata.cost.billable_cost_usd:.6f}") ``` ```typescript TypeScript import { Client } from "@auriko/sdk"; const client = new Client({ apiKey: process.env.AURIKO_API_KEY, baseUrl: "https://api.auriko.ai/v1", }); const response = await client.chat.completions.create({ model: "gpt-5.4", messages: [ { role: "system", content: "You are a helpful assistant." }, { role: "user", content: "What's the capital of France?" 
}, ], routing: { optimize: "speed", max_ttft_ms: 150, }, }); console.log(response.choices[0].message.content); console.log(`\n--- Routing Info ---`); console.log(`Provider: ${response.routing_metadata?.provider}`); console.log(`Latency: ${response.routing_metadata?.total_latency_ms}ms`); console.log(`Cost: $${response.routing_metadata?.cost?.billable_cost_usd}`); ``` --- ## Page: Routing Options > Section: OpenAI SDK Compatibility Using the OpenAI SDK, pass routing options via `extra_body`: ```python import os from openai import OpenAI client = OpenAI( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) response = client.chat.completions.create( model="gpt-5.4", messages=[{"role": "user", "content": "Hello!"}], extra_body={ "routing": { "optimize": "cost", "max_ttft_ms": 200 } } ) ``` --- ## Page: Routing Options > Section: Choose a strategy Match your use case to the right routing strategy: | Use case | Strategy | Key constraints | Example | |----------|----------|-----------------|---------| | Chatbot / real-time UI | `speed` or `ttft` | `max_ttft_ms: 200` | Interactive conversation | | Batch processing | `cost` or `cheapest` | — | Document summarization | | High-volume pipeline | `throughput` | `min_throughput_tps: 50` | Log analysis | | Cost-conscious real-time | `cost` | `max_ttft_ms: 500` | Customer support | | Compliance-sensitive | `balanced` | `data_policy: "zdr"` | Financial data | | Multi-model exploration | `balanced` | `models: [...]` | A/B testing | Start `max_ttft_ms` at 200-500ms and adjust — setting it too low causes 503 errors when no provider meets the constraint. For fine-grained control, see [Advanced routing](/guides/advanced-routing). --- ## Page: Cost Optimization Auriko can save you 30-70% on LLM costs by intelligently routing requests to the most cost-effective provider. 
--- ## Page: Cost Optimization > Section: Prerequisites - An [Auriko API key](https://auriko.ai/signup) - Python 3.10+ with `auriko` SDK installed (`pip install auriko`) - OR Node.js 18+ with `@auriko/sdk` installed (`npm install @auriko/sdk`) - Active usage to see cost comparisons --- ## Page: Cost Optimization > Section: How It Works When you set `optimize: "cost"`, Auriko: 1. Identifies all providers that can serve your model 2. Compares real-time pricing across providers 3. Routes to the cheapest available option 4. Falls back to alternatives if the cheapest is unavailable --- ## Page: Cost Optimization > Section: Enable Cost Optimization ```python Python import os from auriko import Client client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) response = client.chat.completions.create( model="gpt-5.4", messages=[{"role": "user", "content": "Hello!"}], routing={ "optimize": "cost" } ) # See the actual cost print(f"Cost: ${response.routing_metadata.cost.billable_cost_usd:.6f}") print(f"Provider: {response.routing_metadata.provider}") ``` ```typescript TypeScript import { Client } from "@auriko/sdk"; const client = new Client({ apiKey: process.env.AURIKO_API_KEY, baseUrl: "https://api.auriko.ai/v1", }); const response = await client.chat.completions.create({ model: "gpt-5.4", messages: [{ role: "user", content: "Hello!" }], routing: { optimize: "cost", }, }); console.log(`Provider: ${response.routing_metadata?.provider}`); ``` --- ## Page: Cost Optimization > Section: Cost with Latency Constraints Optimize for cost while maintaining latency requirements: ```python Python response = client.chat.completions.create( model="gpt-5.4", messages=[{"role": "user", "content": "Hello!"}], routing={ "optimize": "cost", "max_ttft_ms": 500 # Max 500ms to first token } ) ``` ```typescript TypeScript const response = await client.chat.completions.create({ model: "gpt-5.4", messages: [{ role: "user", content: "Hello!" 
}], routing: { optimize: "cost", max_ttft_ms: 500, // Max 500ms to first token }, }); ``` Auriko will find the cheapest provider that can meet the latency constraint. --- ## Page: Cost Optimization > Section: Restrict key source If you have negotiated provider rates through your own API keys, force requests to use only BYOK keys for cost control: ```python Python import os from auriko import Client client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) response = client.chat.completions.create( model="gpt-5.4", messages=[{"role": "user", "content": "Hello!"}], routing={ "optimize": "cost", "only_byok": True # Use only your own provider keys } ) ``` ```typescript TypeScript const response = await client.chat.completions.create({ model: "gpt-5.4", messages: [{ role: "user", content: "Hello!" }], routing: { optimize: "cost", only_byok: true, // Use only your own provider keys }, }); ``` See [Routing options](/guides/routing-options#restrict-key-source) for the full constraint API and [Bring Your Own Key](/platform/byok) for BYOK setup. 
--- ## Page: Cost Optimization > Section: View Your Costs Every response includes detailed cost information: ```python Python cost = response.routing_metadata.cost print(f"Input tokens: {cost.input_tokens}") print(f"Output tokens: {cost.output_tokens}") print(f"Total cost: ${cost.billable_cost_usd:.6f}") ``` ```typescript TypeScript const cost = response.routing_metadata?.cost; console.log(`Input tokens: ${cost?.input_tokens}`); console.log(`Output tokens: ${cost?.output_tokens}`); console.log(`Total cost: $${cost?.billable_cost_usd}`); ``` --- ## Page: Cost Optimization > Section: Cost Comparison Example Without Auriko (single provider): ``` 100,000 requests × $0.01/request = $1,000/day ``` With Auriko cost optimization: ``` 100,000 requests × $0.004/request = $400/day Savings: $600/day (60%) ``` --- ## Page: Cost Optimization > Section: Cost Breakdown Track costs by model and provider in your dashboard: | Model | OpenAI | Anthropic | Fireworks AI | **Auriko (optimized)** | |-------|--------|-----------|--------------|------------------------| | GPT-4o | $0.005/1K | - | - | **$0.005/1K** | | Claude Sonnet | - | $0.003/1K | $0.003/1K | **$0.003/1K** | Auriko automatically selects the cheapest option for each model. 
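The per-response cost fields above can be rolled up across a batch to reproduce comparisons like the one in this section. A minimal sketch using plain dicts that mirror the shape of `routing_metadata.cost` (the `total_spend_usd` helper is illustrative, not an SDK function):

```python
def total_spend_usd(costs: list[dict]) -> dict:
    # Roll up token counts and billable cost across a batch of responses.
    # Each entry mirrors routing_metadata.cost: input_tokens,
    # output_tokens, billable_cost_usd.
    summary = {"input_tokens": 0, "output_tokens": 0, "billable_cost_usd": 0.0}
    for cost in costs:
        summary["input_tokens"] += cost["input_tokens"]
        summary["output_tokens"] += cost["output_tokens"]
        summary["billable_cost_usd"] += cost["billable_cost_usd"]
    return summary

batch = [
    {"input_tokens": 120, "output_tokens": 40, "billable_cost_usd": 0.0009},
    {"input_tokens": 300, "output_tokens": 85, "billable_cost_usd": 0.0021},
]
print(total_spend_usd(batch))
```

Comparing this total against a fixed single-provider price gives your actual savings rate.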
--- ## Page: Cost Optimization > Section: Best Practices
- Group similar requests to maximize cache hits and reduce costs
- Use smaller models for simple tasks; reserve large models for complex ones
- Track costs in your dashboard to identify optimization opportunities
- Configure spending limits in your dashboard settings
--- ## Page: Cost Optimization > Section: Use Cases ### Background Processing For batch jobs where latency doesn't matter: ```python # Process documents overnight at lowest cost for doc in documents: response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": f"Summarize: {doc}"}], routing={"optimize": "cost"} ) save_summary(doc.id, response.choices[0].message.content) ``` ### With Latency Budget For user-facing features with cost consciousness: ```python # Respond quickly but minimize cost response = client.chat.completions.create( model="gpt-5.4", messages=conversation, routing={ "optimize": "cost", "max_ttft_ms": 300 # User won't notice < 300ms } ) ``` ### A/B test providers Compare costs across providers: ```python import random # 10% to primary, 90% cost-optimized if random.random() < 0.1: routing = {"providers": ["anthropic"]} else: routing = {"optimize": "cost"} response = client.chat.completions.create( model="gpt-5.4", messages=messages, routing=routing ) # Log for analysis log_cost( provider=response.routing_metadata.provider, cost=response.routing_metadata.cost.billable_cost_usd ) ``` --- ## Page: Cost Optimization > Section: Dashboard Track your cost savings in the Auriko dashboard: - Total spend by day/week/month - Cost per model - Cost per provider - Savings vs. single-provider baseline Monitor your usage and costs in real-time --- ## Page: Error Handling Handle errors gracefully with retries, fallbacks, and proper exception handling.
--- ## Page: Error Handling > Section: Prerequisites - An [Auriko API key](https://auriko.ai/signup) - Python 3.10+ with `auriko` SDK installed (`pip install auriko`) - OR Node.js 18+ with `@auriko/sdk` installed (`npm install @auriko/sdk`) --- ## Page: Error Handling > Section: Error types All Auriko errors extend `AurikoAPIError` with these fields: | Field | Type | Description | |-------|------|-------------| | `message` | `str` | Human-readable error message | | `status_code` | `int` | HTTP status code | | `code` | `str` | Machine-readable error code | | `type` | `str \| None` | Error category | | `param` | `str \| None` | Parameter that caused the error | | `response_headers` | `ResponseHeaders` | Response headers (includes `request_id` for support) | The SDK provides 10 specific error classes: | Exception | Status | When | |-----------|--------|------| | `AuthenticationError` | 401 | Invalid or missing API key | | `InvalidRequestError` | 400 | Malformed request, invalid parameter value, or missing required parameter | | `InsufficientCreditsError` | 402 | Account has insufficient credits | | `BudgetExceededError` | 402 | Budget limit hit (workspace, key, or Bring Your Own Key (BYOK) scope) | | `ModelNotFoundError` | 404 | Requested model not in catalog | | `RateLimitError` | 429 | Rate limit exceeded | | `InternalError` | 500 | Unexpected Auriko server error | | `ProviderError` | 502/503/504 | Upstream provider error, timeout, or all providers failed | | `ProviderAuthError` | 401 | BYOK key authentication failed at provider | | `ServiceUnavailableError` | 503 | Auriko service temporarily unavailable | Some error codes (like `missing_required_parameter` and `no_providers_available`) map to shared classes (`InvalidRequestError` and `ProviderError` respectively). Any unrecognized error falls to the `AurikoAPIError` base class via status-code fallback. 
See the [Python SDK Reference](/sdk/python-reference#error-classes) or [TypeScript SDK Reference](/sdk/typescript-reference#error-classes) for complete error class fields and hierarchy. --- ## Page: Error Handling > Section: Handle errors Catch typed exceptions: ```python Python import os from auriko import ( Client, AurikoAPIError, AuthenticationError, RateLimitError, BudgetExceededError, ModelNotFoundError, ProviderError, # Also available: InvalidRequestError, InsufficientCreditsError, # InternalError, ProviderAuthError, ServiceUnavailableError ) client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) try: response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}] ) print(response.choices[0].message.content) except AuthenticationError as e: print(f"Check your API key: {e}") except RateLimitError as e: print(f"Rate limited, retry later: {e}") except BudgetExceededError as e: print(f"Budget exceeded: {e}") except ModelNotFoundError as e: print(f"Model not found: {e}") except ProviderError as e: print(f"Provider error: {e}") except AurikoAPIError as e: # Catches all other Auriko errors (InternalError, # ServiceUnavailableError, InvalidRequestError, etc.) print(f"API error ({e.status_code}): {e}") ``` ```typescript TypeScript import { Client, AurikoAPIError, AuthenticationError, RateLimitError, BudgetExceededError, ModelNotFoundError, ProviderError, // Also available: InvalidRequestError, InsufficientCreditsError, // InternalError, ProviderAuthError, ServiceUnavailableError } from "@auriko/sdk"; const client = new Client({ apiKey: process.env.AURIKO_API_KEY, baseUrl: "https://api.auriko.ai/v1", }); try { const response = await client.chat.completions.create({ model: "gpt-4o", messages: [{ role: "user", content: "Hello!" 
}], }); console.log(response.choices[0].message.content); } catch (e) { if (e instanceof AuthenticationError) { console.log(`Check your API key: ${e.message}`); } else if (e instanceof RateLimitError) { console.log(`Rate limited, retry later: ${e.message}`); } else if (e instanceof BudgetExceededError) { console.log(`Budget exceeded: ${e.message}`); } else if (e instanceof ModelNotFoundError) { console.log(`Model not found: ${e.message}`); } else if (e instanceof ProviderError) { console.log(`Provider error: ${e.message}`); } else if (e instanceof AurikoAPIError) { console.log(`API error (${e.statusCode}): ${e.message}`); } } ``` --- ## Page: Error Handling > Section: Use built-in retries The SDK automatically retries transient errors with exponential backoff: | Setting | Value | |---------|-------| | Max retries | 2 (default) | | Initial interval | 500ms | | Max interval | 30 seconds | | Backoff | Exponential (1.5 exponent) + random jitter | | Retried status codes | 429, 500, 502, 503, 504 | | Connection/timeout errors | Retried | | `Retry-After` header | Respected (overrides backoff when present) | ```python Python import os from auriko import Client # Default: 2 retries with exponential backoff client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) # More retries for resilience client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1", max_retries=5 ) # Disable retries entirely client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1", max_retries=0 ) ``` ```typescript TypeScript import { Client } from "@auriko/sdk"; // Default: 2 retries with exponential backoff const client = new Client({ apiKey: process.env.AURIKO_API_KEY, baseUrl: "https://api.auriko.ai/v1", }); // More retries for resilience const resilientClient = new Client({ apiKey: process.env.AURIKO_API_KEY, baseUrl: "https://api.auriko.ai/v1", maxRetries: 5, }); // Disable retries entirely const 
noRetryClient = new Client({ apiKey: process.env.AURIKO_API_KEY, baseUrl: "https://api.auriko.ai/v1", maxRetries: 0, }); ``` When the server returns a `Retry-After` header (common with 429 responses), the SDK uses that value instead of the calculated backoff interval. --- ## Page: Error Handling > Section: Retry manually For more control, implement custom retry logic: ```python Python import os import time from auriko import Client, RateLimitError, ProviderError client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1", max_retries=0 # Disable auto-retry ) def make_request_with_retry(messages, max_retries=3): last_error = None for attempt in range(max_retries): try: return client.chat.completions.create( model="gpt-4o", messages=messages ) except RateLimitError as e: last_error = e wait_time = min(2 ** attempt, 60) # Cap at 60 seconds print(f"Rate limited, waiting {wait_time}s...") time.sleep(wait_time) except ProviderError as e: last_error = e wait_time = 2 ** attempt print(f"Provider error, retrying in {wait_time}s...") time.sleep(wait_time) raise last_error # Usage response = make_request_with_retry([{"role": "user", "content": "Hello!"}]) ``` ```typescript TypeScript import { Client, RateLimitError, ProviderError } from "@auriko/sdk"; const client = new Client({ apiKey: process.env.AURIKO_API_KEY, baseUrl: "https://api.auriko.ai/v1", maxRetries: 0, // Disable auto-retry }); async function makeRequestWithRetry( messages: Array<{ role: string; content: string }>, maxRetries = 3 ) { let lastError: Error | undefined; for (let attempt = 0; attempt < maxRetries; attempt++) { try { return await client.chat.completions.create({ model: "gpt-4o", messages, }); } catch (e) { lastError = e as Error; const waitTime = Math.min(2 ** attempt, 60) * 1000; if (e instanceof RateLimitError || e instanceof ProviderError) { await new Promise((r) => setTimeout(r, waitTime)); } else { throw e; } } } throw lastError; } ``` --- ## Page: Error Handling > 
Section: Retry asynchronously Retry with async/await: ```python import os import asyncio from auriko import AsyncClient, RateLimitError, ProviderError client = AsyncClient( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1", max_retries=0 ) async def make_request_with_backoff(messages, max_retries=3): for attempt in range(max_retries): try: return await client.chat.completions.create( model="gpt-4o", messages=messages ) except (RateLimitError, ProviderError) as e: if attempt == max_retries - 1: raise wait_time = 2 ** attempt await asyncio.sleep(wait_time) ``` **Side effects and retries:** When using tools or multi-step workflows, consider whether retries are safe. A retried request that triggers a tool call may execute the tool twice. For idempotency-sensitive operations, either disable automatic retries (`max_retries=0`) or implement your own deduplication logic. --- ## Page: Error Handling > Section: Fall back to another model Use a cheaper/faster model as fallback: ```python import os from auriko import Client, AurikoAPIError client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) def chat_with_fallback(messages): try: # Try primary model return client.chat.completions.create( model="gpt-4o", messages=messages, routing={"max_ttft_ms": 200} ) except AurikoAPIError as e: print(f"Primary failed ({e}), trying fallback...") # Fallback to a different model return client.chat.completions.create( model="gpt-4o-mini", messages=messages ) ``` --- ## Page: Error Handling > Section: Use circuit breakers Prevent cascading failures: ```python import os from datetime import datetime, timedelta, timezone from auriko import Client, ProviderError client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) class CircuitBreaker: def __init__(self, failure_threshold=5, reset_timeout=60): self.failures = 0 self.failure_threshold = failure_threshold self.reset_timeout = reset_timeout 
self.last_failure = None self.is_open = False def record_failure(self): self.failures += 1 self.last_failure = datetime.now(timezone.utc) if self.failures >= self.failure_threshold: self.is_open = True def record_success(self): self.failures = 0 self.is_open = False def can_proceed(self): if not self.is_open: return True if datetime.now(timezone.utc) - self.last_failure > timedelta(seconds=self.reset_timeout): self.is_open = False return True return False # Usage breaker = CircuitBreaker() def safe_request(messages): if not breaker.can_proceed(): raise Exception("Circuit breaker open, try later") try: response = client.chat.completions.create( model="gpt-4o", messages=messages ) breaker.record_success() return response except ProviderError as e: breaker.record_failure() raise ``` --- ## Page: Error Handling > Section: Set timeouts ```python Python import os import httpx from auriko import Client client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1", timeout=30.0 # 30 second timeout ) try: response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Write a long essay..."}] ) except httpx.TimeoutException: print("Request timed out") ``` ```typescript TypeScript import { Client } from "@auriko/sdk"; const client = new Client({ apiKey: process.env.AURIKO_API_KEY, baseUrl: "https://api.auriko.ai/v1", timeout: 30000, // 30 second timeout (ms) }); try { const response = await client.chat.completions.create({ model: "gpt-4o", messages: [{ role: "user", content: "Write a long essay..." 
}], }); } catch (e) { if (e instanceof Error && e.name === "TimeoutError") { console.log("Request timed out"); } } ``` --- ## Page: Error Handling > Section: Log errors Log errors for debugging: ```python import os import logging from auriko import Client, AurikoAPIError logging.basicConfig(level=logging.INFO) logger = logging.getLogger(__name__) client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) try: response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}] ) except AurikoAPIError as e: logger.exception("Chat completion failed", extra={ "error_type": type(e).__name__, "status_code": e.status_code, "request_id": e.response_headers.request_id, "model": "gpt-4o", }) raise ``` --- ## Page: Error Handling > Section: Map OpenAI SDK errors If you use the OpenAI SDK directly (with `base_url` pointed at Auriko), you can convert OpenAI errors to typed Auriko errors using `map_openai_error()`: ```python import os import openai from auriko import map_openai_error, RateLimitError, BudgetExceededError client = openai.OpenAI( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) try: response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}] ) except openai.APIStatusError as e: auriko_error = map_openai_error(e) if isinstance(auriko_error, RateLimitError): print(f"Rate limited. Retry after: {auriko_error.response_headers.rate_limit_reset}") elif isinstance(auriko_error, BudgetExceededError): print(f"Budget exceeded: {auriko_error.message}") else: raise auriko_error ``` This gives you access to typed error fields (`status_code`, `code`, `response_headers`) and fine-grained `isinstance` checks, even when using the OpenAI client. `map_openai_error()` is Python-only. TypeScript users should use the Auriko SDK directly for typed errors. 
See [Switching from OpenAI](/switching-from-openai#error-mapping) for migration-focused error mapping. --- ## Page: Error Handling > Section: Best practices
- Let the SDK handle transient errors automatically
- Log errors with context for debugging
- Set timeouts to prevent requests from hanging indefinitely
- Use fallback models for critical paths
--- ## Page: Prompt Caching Reduce costs and latency by reusing cached prompt prefixes across requests. Auriko automatically injects cache control directives for all supported providers — no user action needed. --- ## Page: Prompt Caching > Section: Prerequisites - An [Auriko API key](https://auriko.ai/signup) - Python 3.10+ with `auriko` SDK installed (`pip install auriko`) - OR Node.js 18+ with `@auriko/sdk` installed (`npm install @auriko/sdk`) --- ## Page: Prompt Caching > Section: How it works Auriko intercepts outgoing requests and injects provider-specific caching directives when conversations exceed each provider's token thresholds. You send requests normally — caching happens transparently. When a subsequent request shares the same prompt prefix, the provider serves the cached portion at a reduced cost and lower latency.
--- ## Page: Prompt Caching > Section: See it in action Send a normal request — caching is automatic: ```python Python import os from auriko import Client client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) response = client.chat.completions.create( model="claude-sonnet-4-20250514", messages=[ {"role": "system", "content": "You are a helpful coding assistant..."}, {"role": "user", "content": "Explain async/await in Python."} ] ) # Check cache usage in the response usage = response.usage if hasattr(usage, "prompt_tokens_details") and usage.prompt_tokens_details: cached = getattr(usage.prompt_tokens_details, "cached_tokens", 0) print(f"Cached tokens: {cached}") print(f"Total prompt tokens: {usage.prompt_tokens}") ``` ```typescript TypeScript import { Client } from "@auriko/sdk"; const client = new Client({ apiKey: process.env.AURIKO_API_KEY, baseUrl: "https://api.auriko.ai/v1", }); const response = await client.chat.completions.create({ model: "claude-sonnet-4-20250514", messages: [ { role: "system", content: "You are a helpful coding assistant..." }, { role: "user", content: "Explain async/await in Python." }, ], }); // Check cache usage in the response const cached = response.usage?.prompt_tokens_details?.cached_tokens ?? 0; console.log(`Cached tokens: ${cached}`); console.log(`Total prompt tokens: ${response.usage?.prompt_tokens}`); ``` --- ## Page: Prompt Caching > Section: Provider support Auriko injects caching directives for four providers. Each uses a different mechanism: | Provider | User action | Auriko behavior | |----------|-------------|-----------------| | Anthropic | None — automatic | Injects `cache_control: {type: "ephemeral"}` when estimated tokens exceed a per-model threshold (1024–4096 tokens depending on model). Skips if user already added `cache_control` blocks. | | OpenAI | None — automatic | Injects `prompt_cache_key` for server affinity on all requests with a conversation ID. 
Adds `prompt_cache_retention: "24h"` for supported models (gpt-5, gpt-5.1, gpt-5.2, gpt-4.1 families). Skips retention for zdr data policy. | | Fireworks | None — automatic | Sets `user` field to conversation ID for same-replica routing and KV-cache reuse. | | xAI | None — automatic | Injects `x-grok-conv-id` header (UUID4 derived from conversation ID) for session affinity. | ### Anthropic token thresholds Caching is only activated when the estimated prompt token count exceeds the model-specific threshold: | Model family | Threshold | |-------------|-----------| | claude-sonnet-4-5, claude-sonnet-4, claude-opus-4, claude-opus-4-1, claude-3-7-sonnet | 1024 tokens | | claude-sonnet-4-6, claude-3-5-haiku, claude-3-haiku | 2048 tokens | | claude-haiku-4-5, claude-opus-4-5, claude-opus-4-6 | 4096 tokens | Requests below the threshold skip auto-injection because Anthropic charges for cache writes on small requests. ### Manual cache control If you need fine-grained control (for example, caching a specific system prompt block), add `cache_control` blocks manually. When Auriko detects any existing `cache_control` in the request, it skips auto-injection entirely. --- ## Page: Prompt Caching > Section: Check cache usage Cache hit information appears in the response `usage` object: ```json { "usage": { "prompt_tokens": 1500, "completion_tokens": 200, "total_tokens": 1700, "prompt_tokens_details": { "cached_tokens": 1200 } } } ``` The `cached_tokens` field shows how many prompt tokens were served from cache. 
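A quick way to gauge caching effectiveness is the fraction of prompt tokens served from cache. A minimal sketch over the usage shape shown above (the `cache_hit_ratio` helper is illustrative, not part of the SDK):

```python
def cache_hit_ratio(usage: dict) -> float:
    # Fraction of prompt tokens served from cache, based on the usage
    # object shown above. Returns 0.0 when no details are present.
    details = usage.get("prompt_tokens_details") or {}
    cached = details.get("cached_tokens", 0)
    prompt = usage.get("prompt_tokens", 0)
    return cached / prompt if prompt else 0.0

usage = {
    "prompt_tokens": 1500,
    "completion_tokens": 200,
    "total_tokens": 1700,
    "prompt_tokens_details": {"cached_tokens": 1200},
}
print(f"{cache_hit_ratio(usage):.0%}")  # 80%
```

A persistently low ratio on multi-turn traffic usually means prompts fall below the provider's token threshold or prefixes aren't stable across turns.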
--- ## Page: Prompt Caching > Section: When to use Caching works best with: - **Multi-turn conversations** — the shared conversation prefix grows with each turn - **Long system prompts** — reused across many requests - **Few-shot examples** — static example blocks cached across calls Caching provides minimal benefit for: - **Unique prompts** — no shared prefix to cache - **Very short prompts** — below provider token thresholds - **Single-turn requests** — no subsequent requests to benefit from the cache --- ## Page: Extensions and Thinking Access provider-specific features like thinking tokens through a normalized interface. Auriko translates a single `extensions.thinking` configuration into provider-native formats automatically. --- ## Page: Extensions and Thinking > Section: Prerequisites - An [Auriko API key](https://auriko.ai/signup) - Python 3.10+ with `auriko` SDK installed (`pip install auriko`) - OR Node.js 18+ with `@auriko/sdk` installed (`npm install @auriko/sdk`) - A model that supports reasoning (Claude 3.5+, o1, o3, o4-mini, DeepSeek R1, Gemini 2.0 Flash Thinking) --- ## Page: Extensions and Thinking > Section: Enable thinking Pass `extensions.thinking` in your request to enable extended reasoning: ```python Python import os from auriko import Client client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) response = client.chat.completions.create( model="claude-sonnet-4-20250514", messages=[{"role": "user", "content": "Solve this step by step: what is 23! / 20!?"}], extensions={"thinking": {"enabled": True, "budget_tokens": 10000}} ) print(response.choices[0].message.content) ``` ```typescript TypeScript import { Client } from "@auriko/sdk"; const client = new Client({ apiKey: process.env.AURIKO_API_KEY, baseUrl: "https://api.auriko.ai/v1", }); const response = await client.chat.completions.create({ model: "claude-sonnet-4-20250514", messages: [{ role: "user", content: "Solve this step by step: what is 23! / 20!?" 
}], extensions: { thinking: { enabled: true, budget_tokens: 10000 } }, }); console.log(response.choices[0].message.content); ``` --- ## Page: Extensions and Thinking > Section: Check provider support Auriko translates `extensions.thinking` into provider-native formats: | Provider | Models | Translation | |----------|--------|-------------| | Anthropic | Claude 3.5+, Claude 4 | `thinking: {type: "enabled", budget_tokens: <n>}` — budget passed directly | | OpenAI | o1, o3, o4-mini | `reasoning_effort: "low" / "medium" / "high"` — mapped from budget_tokens thresholds | | DeepSeek | R1 | `thinking: {enabled: true, max_tokens: <n>}` — budget passed directly | | Google AI Studio | Gemini 2.0 Flash Thinking | `thinking_config: {thinking_budget: <n>}` — budget passed directly | | Other providers | Varies | OpenAI-compatible `reasoning_effort` format (default translator) | ### OpenAI budget mapping Since OpenAI uses discrete `reasoning_effort` levels instead of a token budget, Auriko maps `budget_tokens` to the appropriate level: | Budget tokens | Reasoning effort | |--------------|-----------------| | < 5,000 | `low` | | 5,000 – 14,999 | `medium` | | >= 15,000 | `high` | If `budget_tokens` is omitted, the default is 8,000 (maps to `medium`). --- ## Page: Extensions and Thinking > Section: Read thinking output When a model supports reasoning, the thinking output appears in the `reasoning_content` field on the response message: ```python Python response = client.chat.completions.create( model="claude-sonnet-4-20250514", messages=[{"role": "user", "content": "Solve step by step: what is 23!
/ 20!?"}], extensions={"thinking": {"enabled": True, "budget_tokens": 10000}} ) # Access the reasoning (if the model returns it) if response.choices[0].message.reasoning_content: print(f"Reasoning: {response.choices[0].message.reasoning_content}") print(f"Answer: {response.choices[0].message.content}") ``` ```typescript TypeScript const response = await client.chat.completions.create({ model: "claude-sonnet-4-20250514", messages: [{ role: "user", content: "Solve step by step: what is 23! / 20!?" }], extensions: { thinking: { enabled: true, budget_tokens: 10000 } }, }); if (response.choices[0].message.reasoning_content) { console.log(`Reasoning: ${response.choices[0].message.reasoning_content}`); } console.log(`Answer: ${response.choices[0].message.content}`); ``` ### Providers with `reasoning_content` | Provider | `reasoning_content` populated? | Notes | |----------|-------------------------------|-------| | Anthropic | Yes | Extracted from thinking block | | DeepSeek | Yes | Extracted from thinking content | | Google | Yes | Extracted from thinking_config response | | Fireworks AI | Yes | Extracted from `<think>` tags in content (Qwen3 models) | | OpenAI | No | Reasoning is internal; not exposed in response | Fireworks AI Qwen3 models populate `reasoning_content` by default, without `extensions.thinking`. --- ## Page: Extensions and Thinking > Section: Use provider passthrough For provider-specific features beyond thinking, use provider-keyed extensions.
Auriko forwards these as-is after security sanitization: ```python Python import os from auriko import Client client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) response = client.chat.completions.create( model="claude-sonnet-4-20250514", messages=[{"role": "user", "content": "Hello!"}], extensions={ "thinking": {"enabled": True, "budget_tokens": 10000}, "anthropic": { "custom_metadata": {"session_id": "abc123"} } } ) ``` ```typescript TypeScript import { Client } from "@auriko/sdk"; const client = new Client({ apiKey: process.env.AURIKO_API_KEY, baseUrl: "https://api.auriko.ai/v1", }); const response = await client.chat.completions.create({ model: "claude-sonnet-4-20250514", messages: [{ role: "user", content: "Hello!" }], extensions: { thinking: { enabled: true, budget_tokens: 10000 }, anthropic: { custom_metadata: { session_id: "abc123" }, }, }, }); ``` Auriko normalizes provider aliases automatically. The aliases `google`, `google_ai`, `googleai`, and `gemini` all map to `google_ai_studio`. ### Precedence When both normalized features and provider passthrough contain the same field, the provider passthrough wins. For example, if you set `extensions.thinking.budget_tokens: 10000` and `extensions.anthropic.thinking.budget_tokens: 15000`, Anthropic receives `15000`. ### Security filtering Auriko blocks authentication-related keys (`api_key`, `authorization`, `token`, etc.) at all nesting levels in passthrough extensions. Auriko also blocks core request fields (`model`, `messages`, `temperature`, etc.) at the top level to prevent routing bypass. --- ## Page: Extensions and Thinking > Section: Cost and latency Thinking tokens count toward output tokens and increase both cost and latency. Use `budget_tokens` to cap the reasoning budget for your use case. For cost-sensitive workloads, see [Cost optimization](/guides/cost-optimization). 
See [Check reasoning token availability](#check-reasoning-token-availability) for which providers report a breakdown. --- ## Page: Extensions and Thinking > Section: Check reasoning token availability The `completion_tokens_details.reasoning_tokens` field reports how many tokens the model spent on reasoning. Auriko passes through what the upstream provider reports. | Provider | Model examples | `reasoning_tokens` reported? | Notes | |----------|---------------|----------------------------|-------| | OpenAI | o1, o3, o4-mini | Yes | Native field | | DeepSeek | deepseek-v3.2-thinking | Yes | Native field (routed to DeepSeek API) | | xAI | grok-4-fast-reasoning | Yes | Native field | | Google | Gemini 2.5 Flash | Yes | Mapped from `thoughtsTokenCount` | | Anthropic | All Claude models | No | Upstream returns combined `output_tokens` only | | Moonshot | kimi-k2-thinking, kimi-k2-thinking-turbo | No | Upstream doesn't include token details | | Fireworks | deepseek-v3.2 | No | Upstream doesn't include token details for hosted models | When the provider doesn't report a reasoning token breakdown, Auriko doesn't include `completion_tokens_details` in the response. Check for the field before accessing it: ```python Python if response.usage.completion_tokens_details: print(f"Reasoning: {response.usage.completion_tokens_details.reasoning_tokens}") ``` ```typescript TypeScript if (response.usage?.completion_tokens_details) { console.log(`Reasoning: ${response.usage.completion_tokens_details.reasoning_tokens}`); } ``` When `completion_tokens_details` isn't available, `completion_tokens` reflects the combined total of reasoning and content tokens. You can still use it for cost tracking. --- ## Page: Budget Management Set spending limits at the workspace, API key, or BYOK provider level. Budgets enforce hard limits that block requests when exceeded. 
--- ## Page: Budget Management > Section: Prerequisites - An [Auriko API key](https://auriko.ai/signup) for read operations, or a [session token](/api-reference/authentication#session-authentication) for full access - Workspace owner or admin role --- ## Page: Budget Management > Section: Authentication | Operation | API key (`ak_*`) | Session JWT | |-----------|:-:|:-:| | List / Get budgets | Yes | Yes | | Create / Update / Delete | No | Yes | Read operations accept API keys. Write operations require session authentication. --- ## Page: Budget Management > Section: Create a budget Budget endpoints aren't wrapped by the SDK. Use cURL or any HTTP client with a session token: ```bash curl -X POST https://api.auriko.ai/v1/workspaces/{workspace_id}/budgets \ -H "Authorization: Bearer $SESSION_JWT" \ -H "Content-Type: application/json" \ -d '{ "scope_type": "workspace", "period": "monthly", "limit_usd": 500, "enforce": true }' ``` Response: ```json { "id": "bdgt_abc123", "workspace_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7", "scope_type": "workspace", "period": "monthly", "limit_usd": 500.0, "enforce": true, "include_byok": false, "spend_usd": 42.50, "percent_used": 8.5, "created_at": "2026-03-20T10:00:00Z", "updated_at": "2026-03-20T10:00:00Z" } ``` --- ## Page: Budget Management > Section: Budget scopes The `scope_type` field determines what spending the budget tracks: | `scope_type` | Description | Required extra field | |--------------|-------------|----------------------| | `workspace` | Total workspace spend | `include_byok` (optional, default `false`) | | `api_key` | Per-key spend | `scope_id` (API key ID, required) | | `byok_provider` | Per-BYOK-provider spend | `scope_provider` (provider name, required) | ### Workspace budget with BYOK To include BYOK usage in a workspace budget, set `include_byok: true`: ```bash curl -X POST https://api.auriko.ai/v1/workspaces/{workspace_id}/budgets \ -H "Authorization: Bearer $SESSION_JWT" \ -H "Content-Type: 
application/json" \ -d '{ "scope_type": "workspace", "period": "monthly", "limit_usd": 1000, "enforce": true, "include_byok": true }' ``` ### API key budget ```bash curl -X POST https://api.auriko.ai/v1/workspaces/{workspace_id}/budgets \ -H "Authorization: Bearer $SESSION_JWT" \ -H "Content-Type: application/json" \ -d '{ "scope_type": "api_key", "scope_id": "key_abc123", "period": "daily", "limit_usd": 50, "enforce": true }' ``` --- ## Page: Budget Management > Section: Periods Budgets reset on a fixed schedule (UTC): | Period | Resets at | |--------|----------| | `daily` | 00:00 UTC | | `weekly` | Monday 00:00 UTC | | `monthly` | 1st of month 00:00 UTC | --- ## Page: Budget Management > Section: Check budget status ```bash curl https://api.auriko.ai/v1/workspaces/{workspace_id}/budgets \ -H "Authorization: Bearer $AURIKO_API_KEY" ``` Each budget in the response shows current spend: ```json { "id": "bdgt_abc123", "scope_type": "workspace", "period": "monthly", "limit_usd": 500.0, "spend_usd": 127.50, "percent_used": 25.5, "enforce": true } ``` --- ## Page: Budget Management > Section: Update and delete Update a budget (at least one field required): ```bash curl -X PATCH https://api.auriko.ai/v1/workspaces/{workspace_id}/budgets/{budget_id} \ -H "Authorization: Bearer $SESSION_JWT" \ -H "Content-Type: application/json" \ -d '{"limit_usd": 750}' ``` Delete a budget: ```bash curl -X DELETE https://api.auriko.ai/v1/workspaces/{workspace_id}/budgets/{budget_id} \ -H "Authorization: Bearer $SESSION_JWT" ``` --- ## Page: Budget Management > Section: Enforcement When `enforce` is `true` and spending reaches the enforcement threshold, subsequent inference requests return a `402` error with code `budget_exceeded`. The enforcement threshold has a buffer to account for in-flight requests: ``` enforcement_limit = limit_usd - min($10, 10% of limit_usd) ``` For example, a $100 budget enforces at $90. A $500 budget enforces at $490. 
Auriko triggers alerts at 50%, 75%, 90%, and 100% of the budget limit. For handling `budget_exceeded` errors, see [Error handling](/guides/error-handling). --- ## Page: Budget Management > Section: Rate limiting Auriko rate-limits budget management writes to 10 per minute per user and API key reads to 60 per minute per IP. See [Rate limits](/platform/rate-limits) for details. --- ## Page: Advanced Routing Fine-tune routing with suffix shortcuts, multi-model requests, quality constraints, and data policies. For basic routing, see [Routing options](/guides/routing-options). --- ## Page: Advanced Routing > Section: Prerequisites - An [Auriko API key](https://auriko.ai/signup) - A [session token](/api-reference/authentication#session-authentication) (for routing defaults) - Python 3.10+ with `auriko` SDK installed (`pip install auriko`) - OR Node.js 18+ with `@auriko/sdk` installed (`npm install @auriko/sdk`) - Familiarity with [Routing options](/guides/routing-options) --- ## Page: Advanced Routing > Section: How routing works When you send a request, Auriko's router: 1. **Enumerates candidates** — finds all providers offering the requested model(s) 2. **Filters by constraints** — removes providers that violate your routing options (data policy, Bring Your Own Key (BYOK) requirement, min success rate, excluded providers) 3. **Scores by strategy** — ranks remaining candidates using your `optimize` strategy: - `cost` / `cheapest`: lowest price per token - `ttft` / `speed`: lowest latency to first token - `throughput`: highest tokens per second - `balanced`: weighted combination of cost, latency, and throughput 4. **Selects and routes** — selects from the ranked list, favoring higher-scored providers 5. 
**Falls back if needed** — if the provider fails and `allow_fallbacks` is true, retries with the next candidate (up to `max_fallback_attempts`) See [Python SDK](/sdk/python#with-routing-options) or [TypeScript SDK](/sdk/typescript#with-routing-options) for routing code examples. --- ## Page: Advanced Routing > Section: Use suffix shortcuts Append a suffix to any model name for quick routing configuration: | Suffix | Strategy | Effect | |--------|----------|--------| | `:floor` | `cheapest` | Absolute lowest cost | | `:cost` | `cost` | Cost-optimized with more spread | | `:nitro` | `speed` | Fastest overall provider | | `:fast` | `ttft` | Fastest time to first token | | `:balanced` | `balanced` | Weighted combination | ```python Python import os from auriko import Client client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) # Cheapest provider for gpt-4o response = client.chat.completions.create( model="gpt-4o:floor", messages=[{"role": "user", "content": "Hello!"}] ) # Fastest time to first token response = client.chat.completions.create( model="claude-sonnet-4-20250514:fast", messages=[{"role": "user", "content": "Hello!"}] ) ``` ```typescript TypeScript import { Client } from "@auriko/sdk"; const client = new Client({ apiKey: process.env.AURIKO_API_KEY, baseUrl: "https://api.auriko.ai/v1", }); // Cheapest provider for gpt-4o const response = await client.chat.completions.create({ model: "gpt-4o:floor", messages: [{ role: "user", content: "Hello!" }], }); // Fastest time to first token const fast = await client.chat.completions.create({ model: "claude-sonnet-4-20250514:fast", messages: [{ role: "user", content: "Hello!" }], }); ``` The router parses suffixes only when the model ID contains exactly one colon. Fine-tuned models with multiple colons (for example, `ft:gpt-4o:org:custom`) pass through unchanged. 
--- ## Page: Advanced Routing > Section: Route across models Pass `models` instead of `model` to route across multiple models (mutually exclusive with `model`, max 10): ```python Python import os from auriko import Client client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) # Pool mode (default): best provider across all models response = client.chat.completions.create( models=["gpt-4o", "claude-sonnet-4-20250514", "gemini-2.0-flash"], messages=[{"role": "user", "content": "Hello!"}], routing={"mode": "pool"} ) # Fallback mode: try models in order response = client.chat.completions.create( models=["gpt-4o", "claude-sonnet-4-20250514"], messages=[{"role": "user", "content": "Hello!"}], routing={"mode": "fallback"} ) ``` ```typescript TypeScript import { Client } from "@auriko/sdk"; const client = new Client({ apiKey: process.env.AURIKO_API_KEY, baseUrl: "https://api.auriko.ai/v1", }); // Pool mode (default): best provider across all models const response = await client.chat.completions.create({ models: ["gpt-4o", "claude-sonnet-4-20250514", "gemini-2.0-flash"], messages: [{ role: "user", content: "Hello!" }], routing: { mode: "pool" }, }); // Fallback mode: try models in order const fallback = await client.chat.completions.create({ models: ["gpt-4o", "claude-sonnet-4-20250514"], messages: [{ role: "user", content: "Hello!" 
}], routing: { mode: "fallback" }, }); ``` | Mode | Behavior | |------|----------| | `pool` (default) | Select the best-scoring provider across all requested models | | `fallback` | Try all providers for the first model, then the second model, and so on | --- ## Page: Advanced Routing > Section: Set quality constraints Filter providers by performance requirements: | Constraint | Type | Description | |-----------|------|-------------| | `min_throughput_tps` | number | Minimum tokens per second | | `min_success_rate` | number (0–1) | Minimum success rate | | `max_cost_per_1m` | number | Maximum cost per 1M tokens (average of input + output) | | `max_ttft_ms` | number | Maximum time to first token in milliseconds | | `weights` | object | Custom scoring weights: `{ cost, ttft, throughput, reliability }`. Overrides preset. | ```python Python response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}], routing={ "optimize": "balanced", "min_throughput_tps": 50, "min_success_rate": 0.95, "weights": {"cost": 0.6, "ttft": 0.4} } ) ``` ```typescript TypeScript const response = await client.chat.completions.create({ model: "gpt-4o", messages: [{ role: "user", content: "Hello!" }], routing: { optimize: "balanced", min_throughput_tps: 50, min_success_rate: 0.95, weights: { cost: 0.6, ttft: 0.4 }, }, }); ``` Custom weights let you control the exact tradeoff between cost, latency, throughput, and reliability. When provided, they override the preset coefficients. The server normalizes weights to sum to 1.0. --- ## Page: Advanced Routing > Section: Data policy Control how providers handle your data: | Policy | Description | |--------|-------------| | `none` (default) | No restrictions | | `no_training` | Provider must not use data for training | | `zdr` | Zero data retention — strictest policy | The hierarchy is `zdr` > `no_training` > `none`. 
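The hierarchy can be modeled as a simple rank ordering. A minimal sketch for comparing two policies by restrictiveness (the numeric ranks are illustrative, not part of the API):

```python
# Higher rank = stronger data-handling guarantees.
POLICY_RANK = {"none": 0, "no_training": 1, "zdr": 2}

def most_restrictive(a: str, b: str) -> str:
    """Return whichever data policy imposes the stronger guarantee."""
    return a if POLICY_RANK[a] >= POLICY_RANK[b] else b

print(most_restrictive("no_training", "zdr"))   # zdr
print(most_restrictive("none", "no_training"))  # no_training
```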
When a per-request policy intersects with an account-level policy, the most restrictive one wins. ```python Python response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Sensitive financial data..."}], routing={"data_policy": "zdr"} ) ``` ```typescript TypeScript const response = await client.chat.completions.create({ model: "gpt-4o", messages: [{ role: "user", content: "Sensitive financial data..." }], routing: { data_policy: "zdr" }, }); ``` --- ## Page: Advanced Routing > Section: Provider alias normalization Provider names in `providers` and `exclude_providers` are case-insensitive and support aliases: | Alias | Canonical name | |-------|----------------| | `google`, `google_ai`, `googleai`, `gemini` | `google_ai_studio` | | `fireworks` | `fireworks_ai` | | `together` | `together_ai` | Unrecognized names pass through as-is (lowercased). --- ## Page: Advanced Routing > Section: Configure fallbacks By default, Auriko retries with alternative providers on 429 (rate limit), 5xx (server error), and timeout responses. | Setting | Default | Description | |---------|---------|-------------| | `allow_fallbacks` | `true` | Enable automatic fallback to alternative providers | | `max_fallback_attempts` | 3 | Maximum fallback attempts (not counting the primary attempt) | Timeouts: 10 seconds for the first byte on streaming requests, 60 seconds total for non-streaming requests. ```python Python response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}], routing={ "allow_fallbacks": True, "max_fallback_attempts": 5 } ) ``` ```typescript TypeScript const response = await client.chat.completions.create({ model: "gpt-4o", messages: [{ role: "user", content: "Hello!" 
}], routing: { allow_fallbacks: true, max_fallback_attempts: 5, }, }); ``` --- ## Page: Advanced Routing > Section: Set workspace defaults Set default routing options for all requests in a workspace: ```bash # Get current defaults curl https://api.auriko.ai/v1/workspaces/{workspace_id}/routing-defaults \ -H "Authorization: Bearer $SESSION_JWT" # Set defaults curl -X PATCH https://api.auriko.ai/v1/workspaces/{workspace_id}/routing-defaults \ -H "Authorization: Bearer $SESSION_JWT" \ -H "Content-Type: application/json" \ -d '{ "optimize": "cost", "data_policy": "no_training" }' ``` Routing defaults use session authentication, not API key authentication. See [Authentication](/api-reference/authentication). To clear all routing defaults, send an empty object `{}` as the PATCH body. Per-request `routing` options override workspace defaults. Model suffix overrides sit between workspace defaults and per-request options. ### Precedence 1. Per-request `routing` options (highest) 2. Model suffix overrides (for example, `:floor`) 3. Workspace routing defaults (lowest) === # Auriko API Reference --- ## Page: Introduction Auriko is an LLM routing layer that applies quantitative trading methodology to inference optimization. You can access a growing list of models across providers through a single API, define your own routing strategy, and switch models without changing application code. --- ## Page: Introduction > Section: What Auriko Provides - **[Routing and arbitrage](/guides/routing-options)** — Cost, latency, and quality optimization across models and providers. Auriko runs deep [prompt-caching optimization](/guides/prompt-caching). - **[Automatic failover](/guides/error-handling)** — Redundancy and provider-aware rate limit management.
- **[Budget controls](/guides/budget-management)** — Spending limits and alerts at the workspace or API key level. - **[BYOK](/platform/byok)** — Use your own provider keys, platform keys, or both. Auriko provides native SDKs for [Python](/sdk/python) and [TypeScript](/sdk/typescript). It's also OpenAI-compatible — any existing OpenAI client or [framework](/frameworks/langchain) works without modification. --- ## Page: Introduction > Section: Resources - [Available Models](https://optimal-inference.vercel.app/models) — Supported models and providers - [Pricing](https://optimal-inference.vercel.app/pricing) — Pricing information --- ## Page: Introduction > Section: Machine-readable sources You can access Auriko's documentation in machine-readable formats for AI agents and programmatic use. - [llms.txt](/llms.txt) — Index of all documentation sections in plaintext, following the [llms.txt standard](https://llmstxt.org/) - [llms-full.txt](/llms-full.txt) — Complete documentation in a single file - [OpenAPI spec](/openapi.yaml) — OpenAPI 3.1 specification for all API endpoints --- ## Page: Quickstart Get your first LLM response through Auriko in under 2 minutes. --- ## Page: Quickstart > Section: Prerequisites - An [Auriko account](https://auriko.ai/signup) with an API key --- ## Page: Quickstart > Section: 1. Get an API Key Create your account and get an API key from the dashboard. **Base URL:** Use `https://api.auriko.ai/v1` as your base URL. This value matches `servers[0].url` in our [OpenAPI spec](https://github.com/zxyaction/optimal_inference/blob/main/api_gateway/openapi/auriko-api.yaml) and is the canonical endpoint for all API requests. --- ## Page: Quickstart > Section: 2. Install ```bash Python pip install auriko ``` ```bash TypeScript npm install @auriko/sdk ``` ```bash OpenAI SDK (Alternative) pip install openai ``` --- ## Page: Quickstart > Section: 3.
Make Your First Request ```python Python (Auriko SDK) import os from auriko import Client client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) response = client.chat.completions.create( model="gpt-5.4", messages=[{"role": "user", "content": "Hello!"}] ) print(response.choices[0].message.content) if response.routing_metadata: print(f"Provider: {response.routing_metadata.provider}") if response.routing_metadata.cost: print(f"Cost: ${response.routing_metadata.cost.billable_cost_usd:.6f}") ``` ```typescript TypeScript import { Client } from "@auriko/sdk"; const client = new Client({ apiKey: process.env.AURIKO_API_KEY, baseUrl: "https://api.auriko.ai/v1", }); const response = await client.chat.completions.create({ model: "gpt-5.4", messages: [{ role: "user", content: "Hello!" }], }); console.log(response.choices[0].message.content); ``` ```python OpenAI SDK (Drop-in) import os from openai import OpenAI client = OpenAI( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) response = client.chat.completions.create( model="gpt-5.4", messages=[{"role": "user", "content": "Hello!"}] ) print(response.choices[0].message.content) ``` --- ## Page: Quickstart > Section: 4. Enable Routing Features (Optional) ```python Python response = client.chat.completions.create( model="gpt-5.4", messages=[{"role": "user", "content": "Hello!"}], routing={ "optimize": "cost", # Optimize for cost "max_ttft_ms": 200, # Max 200ms to first token } ) ``` ```typescript TypeScript const response = await client.chat.completions.create({ model: "gpt-5.4", messages: [{ role: "user", content: "Hello!" }], routing: { optimize: "cost", max_ttft_ms: 200, }, }); ``` --- ## Page: Quickstart > Section: Next Steps
- Full API documentation
- Configure cost/speed optimization
- Real-time streaming responses
- Use with LangChain

--- ## Page: Switching from OpenAI Switch from OpenAI to Auriko in 3 lines of code.
Your existing chat completions, streaming, and tool calling code works without changes. --- ## Page: Switching from OpenAI > Section: Before and after ```python Python # Before (OpenAI) from openai import OpenAI client = OpenAI(api_key="sk-...") # After (Auriko) import os from auriko import Client client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) # Everything else stays the same response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}] ) ``` ```typescript TypeScript // Before (OpenAI) import OpenAI from "openai"; const client = new OpenAI({ apiKey: "sk-..." }); // After (Auriko) import { Client } from "@auriko/sdk"; const client = new Client({ apiKey: process.env.AURIKO_API_KEY, baseUrl: "https://api.auriko.ai/v1", }); // Everything else stays the same const response = await client.chat.completions.create({ model: "gpt-4o", messages: [{ role: "user", content: "Hello!" }], }); ``` --- ## Page: Switching from OpenAI > Section: What works identically All standard OpenAI API features work through Auriko with no code changes: | Feature | Status | |---------|--------| | Chat completions | Fully compatible | | Streaming | Fully compatible | | Tool calling | Fully compatible | | Structured output | Fully compatible | | Models list | Fully compatible | | Async client | Fully compatible | | Error classes | Fully compatible | | Retry logic | Built-in (max 2 retries, exponential backoff) | --- ## Page: Switching from OpenAI > Section: What's new Auriko adds capabilities on top of the OpenAI-compatible interface: - **Routing options** — optimize for cost, speed, or throughput across providers. See [Routing options](/guides/routing-options). - **Cost optimization** — save 30-70% by routing to the cheapest provider. See [Cost optimization](/guides/cost-optimization). - **Prompt caching** — automatic cache injection for all supported providers. See [Prompt caching](/guides/prompt-caching). 
- **Budget management** — set spending limits per workspace, API key, or BYOK provider. See [Budget management](/guides/budget-management). - **Response headers** — every response carries `request_id`, rate limit headers, and credit usage. See [Python SDK](/sdk/python#read-response-headers). --- ## Page: Switching from OpenAI > Section: Use OpenAI SDK directly You don't need the `auriko` package at all. The OpenAI SDK works with a `base_url` override: ```python Python import os from openai import OpenAI client = OpenAI( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) response = client.chat.completions.create( model="claude-sonnet-4-20250514", messages=[{"role": "user", "content": "Hello from Claude via Auriko!"}] ) print(response.choices[0].message.content) ``` ```typescript TypeScript import OpenAI from "openai"; const client = new OpenAI({ apiKey: process.env.AURIKO_API_KEY, baseURL: "https://api.auriko.ai/v1", }); const response = await client.chat.completions.create({ model: "claude-sonnet-4-20250514", messages: [{ role: "user", content: "Hello from Claude via Auriko!" }], }); console.log(response.choices[0].message.content); ``` This approach lets you access any model from any provider (Anthropic, Google, Meta, and more) through the familiar OpenAI client. --- ## Page: Switching from OpenAI > Section: Error mapping When using the OpenAI SDK directly, convert errors to typed Auriko errors with `map_openai_error()`: ```python import os import openai from auriko import map_openai_error, RateLimitError, BudgetExceededError client = openai.OpenAI( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) try: response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}] ) except openai.APIStatusError as e: auriko_error = map_openai_error(e) if isinstance(auriko_error, RateLimitError): print(f"Rate limited. 
Retry after: {auriko_error.response_headers.rate_limit_reset}") elif isinstance(auriko_error, BudgetExceededError): print(f"Budget exceeded: {auriko_error.message}") else: raise auriko_error ``` `map_openai_error()` is Python-only. See [Error Handling](/guides/error-handling) for the full error handling guide. --- ## Page: Switching from OpenAI > Section: Access routing metadata When using the OpenAI SDK directly, extract routing metadata from responses with `parse_routing_metadata()`: ```python from auriko.route_types import parse_routing_metadata response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}] ) metadata = parse_routing_metadata(response) if metadata: print(f"Provider: {metadata.provider}") if metadata.cost: print(f"Cost: ${metadata.cost.billable_cost_usd}") ``` `parse_routing_metadata()` is Python-only. The Auriko SDK exposes `routing_metadata` as a typed property on `ChatCompletion` and `Stream` directly. Use the native SDK for the best experience. For the full native SDK experience with typed responses and errors, see the [Python SDK Guide](/sdk/python) or [TypeScript SDK Guide](/sdk/typescript). --- ## Page: API Overview The Auriko API is OpenAI-compatible, meaning you can use it as a drop-in replacement for OpenAI's API. 
--- ## Page: API Overview > Section: Base URL ``` https://api.auriko.ai/v1 ``` --- ## Page: API Overview > Section: Endpoints ### Inference API | Endpoint | Method | Description | |----------|--------|-------------| | [`/v1/chat/completions`](/api-reference/chat-completions) | POST | Create a chat completion | | [`/v1/me`](/api-reference/get-api-key-identity) | GET | Get API key identity | ### Discovery API | Endpoint | Method | Description | |----------|--------|-------------| | [`/v1/registry/providers`](/api-reference/list-providers) | GET | List available providers | | [`/v1/registry/models`](/api-reference/list-registry-models) | GET | List canonical models | | [`/v1/directory/models`](/api-reference/list-directory-models) | GET | Full model directory with pricing | ### Management API | Resource | Endpoints | Description | |----------|-----------|-------------| | [Workspaces](/api-reference/create-workspace) | 4 | Create, list, get, update workspaces | | [Routing](/api-reference/get-routing-defaults) | 2 | Get and update workspace routing defaults | | [Budgets](/api-reference/list-budgets) | 5 | CRUD for spending limits | | [API Keys](/api-reference/create-api-key) | 4 | Create, list, revoke keys; usage stats | | [Billing](/api-reference/get-credit-balance) | 1 | Credit balance | | [Provider Keys](/api-reference/add-provider-key) | 7 | BYOK key management | Management API endpoints use session authentication. Workspace and budget reads also accept API keys. See [Authentication](/api-reference/authentication#session-authentication). See also: [Team management](/platform/team-management), [Budget management](/guides/budget-management), [Bring Your Own Key](/platform/byok). --- ## Page: API Overview > Section: OpenAI Compatibility Auriko supports the same request/response format as OpenAI. Existing code using OpenAI's API can switch to Auriko by changing: 1. **Base URL:** `https://api.openai.com/v1` → `https://api.auriko.ai/v1` 2. 
**API Key:** Use your Auriko API key (starts with `ak_`) --- ## Page: API Overview > Section: Auriko Extensions In addition to OpenAI-compatible fields, Auriko responses carry: - **`routing_metadata`** - Information about which provider handled the request - **`routing`** (request) - Options to optimize for cost, speed, or throughput - **`auriko_metadata`** (request) - Custom tags and trace IDs for request tracking ### Response Extensions Every chat completion response carries a `routing_metadata` object: ```json { "routing_metadata": { "provider": "openai", "provider_model_id": "gpt-5.4", "tier": "standard", "model_canonical": "gpt-5.4", "routing_strategy": "balanced", "total_latency_ms": 1234, "ttft_ms": 312, "candidates_total": 5, "candidates_viable": 3, "routing_decision_ms": 2.1, "cost": { "input_tokens": 100, "output_tokens": 50, "provider_cost_usd": 0.000135, "billable_cost_usd": 0.00015 } } } ``` #### Field reference | Field | Type | Always present | Description | |-------|------|----------------|-------------| | `provider` | string | yes | Provider that served the request (e.g., `openai`, `anthropic`) | | `provider_model_id` | string | yes | Provider's internal model ID | | `tier` | string | no | Pricing tier when applicable (e.g., `flex`, `standard`). Omitted for providers without tiers. | | `model_canonical` | string | yes | Canonical model ID from the request | | `routing_strategy` | string | yes | Strategy used: `cost`, `speed`, `balanced`, `ttft`, `throughput`, `cheapest`, or `custom`. `custom` is returned when explicit `routing.weights` are provided. | | `candidates_total` | number | yes | Total provider candidates before filtering | | `candidates_viable` | number | yes | Candidates remaining after constraint filtering | | `routing_decision_ms` | number | yes | Time spent on the routing decision (ms) | | `ttft_ms` | number | no | Time to first token (ms). Streaming responses only. 
| | `total_latency_ms` | number | yes | Total request latency (ms) | | `cost` | object | no | Cost breakdown. Present when token usage is available and pricing is token-based. | | `fallback_chain` | array | no | Fallback attempt history. Present only when the primary provider failed and a fallback was used. | | `warnings` | string[] | no | Warnings about ignored or unsupported routing configuration. Omitted when empty. | **`cost` fields:** `input_tokens` (number), `output_tokens` (number), `provider_cost_usd` (number), `billable_cost_usd` (number). **`fallback_chain` entries:** `provider` (string), `status` (`"success"` or `"failed"`), `reason` (string, present on failed entries only). ### Request Extensions Pass routing options to optimize your requests: ```json { "model": "gpt-5.4", "messages": [...], "routing": { "optimize": "cost", "max_ttft_ms": 200 } } ``` ### Request metadata Attach custom metadata to requests for tracking and observability. Auriko strips `auriko_metadata` before forwarding to the provider. The `auriko_` prefix avoids collision with OpenAI/Anthropic native `metadata` fields. ```json { "model": "gpt-5.4", "messages": [...], "auriko_metadata": { "tags": ["production", "chatbot"], "user_id": "user_abc123", "trace_id": "trace_xyz789", "custom_fields": { "environment": "production", "feature": "customer-support" } } } ``` | Field | Type | Limits | |-------|------|--------| | `tags` | `string[]` | Max 100 tags, each max 50 chars | | `user_id` | `string` | Max 255 chars | | `trace_id` | `string` | Max 255 chars | | `custom_fields` | `Record<string, string>` | Max 10 fields, keys max 50 chars, values max 200 chars | --- ## Page: API Overview > Section: Response headers Every response carries custom headers organized by category.
### Request tracing | Header | Description | |--------|-------------| | `X-Request-ID` | Unique request identifier for support and debugging | ### Routing headers | Header | Description | |--------|-------------| | `X-Provider-Used` | Provider that served the request | | `X-Model-Requested` | Model ID from the original request | | `X-Model-Canonical` | Canonical model ID after alias resolution | | `X-Model-Used` | Actual model ID sent to the provider | | `X-Routing-Strategy` | Strategy used (`cost`, `speed`, `balanced`, etc.) | | `X-Routing-Time-Ms` | Time spent on routing decision | | `X-Api-Key-Source` | Key type used (`platform` or `byok`) | | `X-Multi-Model-Count` | Number of models in a multi-model request | ### Fallback headers | Header | Description | |--------|-------------| | `X-Fallback-Enabled` | Whether fallback is enabled for this request | | `X-Fallback-Used` | Whether a fallback was triggered | | `X-Fallback-Depth` | Number of fallback attempts made | | `X-Fallback-Original-Provider` | Provider of the first (failed) attempt | | `X-Fallback-Attempted-Providers` | Comma-separated list of all attempted providers | | `X-Fallback-Reason` | Reason the primary provider failed | | `X-Fallback-Total-Time-Ms` | Total time across all attempts | | `X-Fallback-Max-Attempts` | Maximum fallback attempts configured | ### Error diagnostic headers | Header | Description | |--------|-------------| | `X-Error-Provider` | Provider that returned the error | | `X-Error-Type` | Error classification | | `X-Error-Retryable` | Whether the error is safe to retry | ### Billing headers | Header | Description | |--------|-------------| | `X-Credits-Balance-Microdollars` | Current credit balance in microdollars | | `X-Credits-Tier` | Current billing tier | Per-key rate limit headers are covered in [Rate limits](/platform/rate-limits#rate-limit-headers). ### Budget headers Returned when budgets are configured. Error headers appear on 402 responses when a budget is exceeded. 
Spend and limit headers appear on successful responses. | Header | Description | |--------|-------------| | `X-Budget-Exceeded` | `true` when the request was rejected for exceeding a budget (402 only) | | `X-Budget-Exceeded-Period` | Budget period that was exceeded: `daily`, `weekly`, or `monthly` (402 only) | | `X-Budget-Exceeded-Scope` | Scope of the exceeded budget: `workspace`, `api_key`, or `byok_provider` (402 only) | | `X-Budget-Daily-Spend` | Current daily spend in USD | | `X-Budget-Daily-Limit` | Configured daily budget limit in USD | | `X-Budget-Weekly-Spend` | Current weekly spend in USD | | `X-Budget-Weekly-Limit` | Configured weekly budget limit in USD | | `X-Budget-Monthly-Spend` | Current monthly spend in USD | | `X-Budget-Monthly-Limit` | Configured monthly budget limit in USD | Spend and limit headers appear only for budget periods that are configured. If a workspace has only a monthly budget, daily and weekly headers are omitted. ### Cache headers | Header | Description | |--------|-------------| | `X-Cache-Savings-Percent` | Percentage of tokens saved via prompt caching | --- ## Page: Authentication All API requests require authentication using a Bearer token. --- ## Page: Authentication > Section: API Keys API keys are prefixed with `ak_` and can be created in your [dashboard](https://auriko.ai/dashboard). Keep your API key secret. Do not share it or commit it to version control.
--- ## Page: Authentication > Section: Use your API key Include your API key in the `Authorization` header: ```bash curl https://api.auriko.ai/v1/chat/completions \ -H "Authorization: Bearer $AURIKO_API_KEY" \ -H "Content-Type: application/json" \ -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}' ``` --- ## Page: Authentication > Section: SDK Authentication ```python Python import os from auriko import Client # Option 1: Pass via environment variable (recommended) client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) # Option 2: Auto-detect from AURIKO_API_KEY env var client = Client(base_url="https://api.auriko.ai/v1") ``` ```typescript TypeScript import { Client } from "@auriko/sdk"; // Option 1: Pass via environment variable (recommended) const client = new Client({ apiKey: process.env.AURIKO_API_KEY, baseUrl: "https://api.auriko.ai/v1", }); // Option 2: Auto-detect from AURIKO_API_KEY env var const client = new Client({ baseUrl: "https://api.auriko.ai/v1", }); ``` --- ## Page: Authentication > Section: Environment Variables Set your API key as an environment variable for security: ```bash export AURIKO_API_KEY=ak_your_api_key_here ``` Then use the SDK without passing the key directly: ```python from auriko import Client client = Client(base_url="https://api.auriko.ai/v1") ``` --- ## Page: Authentication > Section: Error Responses | Status | Code | Description | |--------|------|-------------| | 401 | `invalid_api_key` | API key is invalid or missing | ```json 401 Response { "error": { "message": "Invalid API key provided", "type": "invalid_request_error", "code": "invalid_api_key" } } ``` --- ## Page: Authentication > Section: Session authentication Management API endpoints use session tokens for authentication. Workspace and budget read endpoints also accept API keys. The dashboard handles session authentication automatically. 
For programmatic access, use the returned `access_token` from your sign-in flow as a Bearer token: ```bash curl https://api.auriko.ai/v1/workspaces \ -H "Authorization: Bearer $SESSION_JWT" ``` --- ## Page: Authentication > Section: Authentication summary | Category | Auth method | Token prefix | |----------|-------------|-------------| | Inference (`/v1/chat/completions`, `/v1/models`, `/v1/me`) | API key | `ak_` | | Discovery (`/v1/registry/*`, `/v1/directory/*`) | None required | — | | Workspace & budget reads | API key or Session token | `ak_` / JWT | | All other management | Session token | JWT | --- ## Page: Errors Auriko uses standard HTTP status codes and returns detailed error information in the response body. --- ## Page: Errors > Section: Error response format All errors follow this format: ```json { "error": { "message": "Human-readable error message", "type": "error_type", "code": "error_code", "param": "optional_parameter_name" } } ``` --- ## Page: Errors > Section: HTTP status codes | Status | Description | |--------|-------------| | 400 | Bad Request — Invalid parameters or missing required fields | | 401 | Unauthorized — Invalid API key or BYOK provider key | | 402 | Payment Required — Insufficient credits or budget exceeded | | 403 | Forbidden — Insufficient permissions for this action | | 404 | Not Found — Model not in catalog | | 429 | Too Many Requests — Rate limit exceeded | | 500 | Internal Server Error — Unexpected Auriko error | | 502 | Bad Gateway — Upstream provider error | | 503 | Service Unavailable — No providers available or service down | | 504 | Gateway Timeout — Upstream provider timeout | --- ## Page: Errors > Section: Error codes ### Authentication (401) | Code | Description | |------|-------------| | `invalid_api_key` | The API key is invalid or missing | | `provider_auth_error` | BYOK key authentication failed at the upstream provider | `provider_auth_error` means your own provider key (BYOK) was rejected. 
This is distinct from `invalid_api_key`, which means your Auriko API key is invalid. ### Request errors (400) | Code | Description | |------|-------------| | `invalid_request` | The request body is malformed or a parameter has an invalid value | | `missing_required_parameter` | A required parameter is missing | The `param` field in the error response identifies the offending field (for example, `messages` or `routing.only_byok`). ### Billing errors (402) | Code | Description | |------|-------------| | `insufficient_quota` | Account has insufficient credits | | `budget_exceeded` | Budget limit hit (workspace, key, or BYOK scope) | ### Authorization (403) | Code | Description | |------|-------------| | `forbidden` | You don't have permission for this action | API keys are read-only for management endpoints (budgets, workspaces). Write operations require session authentication. See [Budget management](/guides/budget-management#authentication). ### Not found (404) | Code | Description | |------|-------------| | `model_not_found` | The specified model is not in the catalog | ### Rate limiting (429) | Code | Description | |------|-------------| | `rate_limit_exceeded` | Too many requests — back off and retry | ### Server errors (500) | Code | Description | |------|-------------| | `internal_error` | An unexpected error occurred on Auriko's side | ### Provider errors (502, 503, 504) | Code | Status | Description | |------|--------|-------------| | `provider_error` | 502 | Upstream provider returned an error | | `provider_error` | 504 | Upstream provider timed out | | `no_providers_available` | 503 | No providers can serve this model/request | | `service_unavailable` | 503 | Auriko service temporarily unavailable | --- ## Page: Errors > Section: Retry guidance | Status | Retryable | Recommended action | |--------|-----------|-------------------| | 400 | No | Fix the request parameters | | 401 (`invalid_api_key`) | No | Check your Auriko API key | | 401 
(`provider_auth_error`) | No | Check your BYOK provider key | | 402 | No | Add credits or raise budget limit | | 403 | No | Check permissions or use session authentication | | 404 | No | Use a valid model name (see [Models](/models)) | | 429 | Yes | Back off using `Retry-After` header or exponential backoff | | 500 | Yes | Retry with backoff | | 502 | Yes | Retry — different provider may be selected | | 503 | Yes | Retry with backoff | | 504 | Yes | Retry — upstream provider timed out | --- ## Page: Errors > Section: Provider error mapping When an upstream provider returns an error, Auriko maps it to a client-facing response: | Upstream condition | Client status | Client code | |-------------------|---------------|-------------| | 504 timeout | 504 | `provider_error` | | Other 5xx | 502 | `provider_error` | | 401 auth failure | 401 | `provider_auth_error` | | 429 rate limit | 429 | `rate_limit_exceeded` | | 400 bad request | 400 | `invalid_request` | | Other 4xx | 502 | `provider_error` | --- ## Page: Errors > Section: Handle SDK errors ```python Python import os from auriko import ( Client, AurikoAPIError, AuthenticationError, RateLimitError, BudgetExceededError, ModelNotFoundError, ProviderError, # Also available: InvalidRequestError, InsufficientCreditsError, # InternalError, ProviderAuthError, ServiceUnavailableError ) client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) try: response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}] ) except AuthenticationError as e: print(f"Check your API key: {e}") except RateLimitError as e: print(f"Rate limited, retry later: {e}") except BudgetExceededError as e: print(f"Budget exceeded: {e}") except ModelNotFoundError as e: print(f"Model not found: {e}") except ProviderError as e: print(f"Provider error: {e}") except AurikoAPIError as e: # Catches all other Auriko errors print(f"API error
({e.status_code}): {e}") ``` ```typescript TypeScript import { Client, AurikoAPIError, AuthenticationError, RateLimitError, BudgetExceededError, ModelNotFoundError, ProviderError, // Also available: InvalidRequestError, InsufficientCreditsError, // InternalError, ProviderAuthError, ServiceUnavailableError } from "@auriko/sdk"; const client = new Client({ apiKey: process.env.AURIKO_API_KEY, baseUrl: "https://api.auriko.ai/v1", }); try { const response = await client.chat.completions.create({ model: "gpt-4o", messages: [{ role: "user", content: "Hello!" }], }); } catch (e) { if (e instanceof AuthenticationError) { console.log(`Check your API key: ${e.message}`); } else if (e instanceof RateLimitError) { console.log(`Rate limited, retry later: ${e.message}`); } else if (e instanceof BudgetExceededError) { console.log(`Budget exceeded: ${e.message}`); } else if (e instanceof ModelNotFoundError) { console.log(`Model not found: ${e.message}`); } else if (e instanceof ProviderError) { console.log(`Provider error: ${e.message}`); } else if (e instanceof AurikoAPIError) { console.log(`API error (${e.statusCode}): ${e.message}`); } } ``` --- ## Page: Errors > Section: Retry logic For transient errors (429, 500, 502, 503, 504), the SDK includes built-in retries with exponential backoff. See the [Error Handling guide](/guides/error-handling) for retry configuration and custom retry patterns. ```python import os from auriko import Client # Default: 2 retries with exponential backoff client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) # Disable retries for manual control client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1", max_retries=0 ) ``` The Auriko SDK includes built-in retry logic with configurable `max_retries` parameter. The SDK respects `Retry-After` headers when present. --- ## Page: Create Chat Completion Creates a model response for the given chat conversation. 
Auriko routes the request to the optimal provider based on your routing preferences (cost, speed, throughput, etc.). --- ## Page: Create Chat Completion > Section: Auriko Extensions Beyond OpenAI compatibility, this endpoint supports: - **Multi-model routing**: Use `models[]` instead of `model` to route across multiple models - **Routing options**: Control provider selection with the `routing` object - **Provider extensions**: Pass provider-specific parameters with `extensions` - **Cost transparency**: Response includes `routing_metadata` with cost breakdown All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: Get API key identity Returns the identity associated with your API key. Use this to discover your `workspace_id` for management API calls. --- ## Page: List Available Providers Returns all LLM providers available on the Auriko platform. This endpoint is public and does not require authentication. All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: List Canonical Models Returns all canonical models in the Auriko registry with provider availability and metadata. This endpoint is public and does not require authentication. All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: Full Model Directory Returns the complete model directory with detailed provider information including context windows, capabilities, pricing tiers, and modalities. This is the richest model metadata endpoint. This endpoint is public and does not require authentication. All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: Create a Workspace Creates a new workspace. The authenticated user becomes the owner. 
This endpoint uses session authentication, not API key authentication. See [Authentication](/api-reference/authentication). See also: [Team management](/platform/team-management). All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: List Workspaces Lists all workspaces the authenticated user is a member of. This endpoint accepts both API key and session authentication. See [Authentication](/api-reference/authentication). All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: Get Workspace Details Returns details for a workspace the authenticated user is a member of. This endpoint accepts both API key and session authentication. See [Authentication](/api-reference/authentication). All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: Update Workspace Updates workspace settings. Requires owner role. This endpoint uses session authentication, not API key authentication. See [Authentication](/api-reference/authentication). All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: Get Routing Defaults Returns the workspace routing defaults. Any workspace member can read. This endpoint uses session authentication, not API key authentication. See [Authentication](/api-reference/authentication). See also: [Routing options](/guides/routing-options). All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: Update Routing Defaults Updates workspace routing defaults. Requires owner or admin role. 
- Present fields are updated; omitted fields retain their current value - Fields set to `null` are cleared - Empty object `{}` clears all routing defaults This endpoint uses session authentication, not API key authentication. See [Authentication](/api-reference/authentication). See also: [Routing options](/guides/routing-options). All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: List Budgets Lists all budgets for the workspace. Any workspace member can read. This endpoint accepts both API key and session authentication. See [Authentication](/api-reference/authentication). See also: [Budget management](/guides/budget-management). All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: Create a Budget Creates a budget for the workspace. Requires owner or admin role. - **`workspace`**: Applies to all usage in the workspace - **`api_key`**: Applies to a specific API key (requires `scope_id`) - **`byok_provider`**: Applies to a BYOK provider (requires `scope_provider`) This endpoint uses session authentication, not API key authentication. See [Authentication](/api-reference/authentication). See also: [Budget management](/guides/budget-management). All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: Get Budget Details Returns a single budget with current spend. Any workspace member can read. This endpoint accepts both API key and session authentication. See [Authentication](/api-reference/authentication). See also: [Budget management](/guides/budget-management). All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: Update a Budget Updates a budget. Requires owner or admin role. 
At least one field must be provided. This endpoint uses session authentication, not API key authentication. See [Authentication](/api-reference/authentication). See also: [Budget management](/guides/budget-management). All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: Delete a Budget Deletes a budget. Requires owner or admin role. This action is irreversible. The budget and its spend history will be permanently removed. This endpoint uses session authentication, not API key authentication. See [Authentication](/api-reference/authentication). See also: [Budget management](/guides/budget-management). All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: Create an API Key Creates a new API key for the workspace. Requires owner or admin role. The full API key is returned exactly once in the response. It is never stored or retrievable again. Save it securely. This endpoint uses session authentication, not API key authentication. See [Authentication](/api-reference/authentication). All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: List API Keys Lists API keys for the workspace. Any workspace member can read. Keys are returned with prefixes only — full keys are never retrievable. This endpoint uses session authentication, not API key authentication. See [Authentication](/api-reference/authentication). All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: Revoke an API Key Revokes an API key. Owners and admins can revoke any key in the workspace. Members can revoke keys they created. This endpoint uses session authentication, not API key authentication. 
See [Authentication](/api-reference/authentication). All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: Get API Key Usage Returns usage statistics for an API key. Any workspace member can read. This endpoint uses session authentication, not API key authentication. See [Authentication](/api-reference/authentication). All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: Get Credit Balance Returns the workspace credit balance, tier, and billing configuration. Requires owner role. This endpoint uses session authentication, not API key authentication. See [Authentication](/api-reference/authentication). All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: List Supported Providers Returns the list of providers that support bring-your-own-key (BYOK). This endpoint is public and does not require authentication. See also: [Bring Your Own Key](/platform/byok). All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: Add Provider Key Adds a provider API key (BYOK) to the workspace. This endpoint uses session authentication, not API key authentication. See [Authentication](/api-reference/authentication). See also: [Bring Your Own Key](/platform/byok). All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: List Provider Keys Lists all provider API keys (BYOK) for the workspace. This endpoint uses session authentication, not API key authentication. See [Authentication](/api-reference/authentication). See also: [Bring Your Own Key](/platform/byok). 
All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: Delete Provider Key Deletes a provider API key from the workspace. This endpoint uses session authentication, not API key authentication. See [Authentication](/api-reference/authentication). See also: [Bring Your Own Key](/platform/byok). All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: Validate Provider Key Re-validates a provider API key to check if it is still active. This endpoint uses session authentication, not API key authentication. See [Authentication](/api-reference/authentication). See also: [Bring Your Own Key](/platform/byok). All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: Set Default Provider Key Sets the specified provider key as the default for its provider. This endpoint uses session authentication, not API key authentication. See [Authentication](/api-reference/authentication). See also: [Bring Your Own Key](/platform/byok). All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- ## Page: Update Provider Key Tier Updates the tier associated with a provider key. This endpoint uses session authentication, not API key authentication. See [Authentication](/api-reference/authentication). See also: [Bring Your Own Key](/platform/byok). All parameters, request/response schemas, and examples are auto-generated from the [OpenAPI specification](/api-reference/overview). --- # OpenAPI Specification Reference The following is extracted from the OpenAPI spec — the canonical source of truth for endpoint parameters, request/response schemas, and error codes. --- ## Authentication (ApiKeyAuth) API key authentication. Keys start with `ak_` prefix. 
Example: `Authorization: Bearer ak_live_xxxxxxxxxxxx` ## Authentication (UserAuth) User session authentication for management endpoints. Use a Supabase session JWT as the bearer token. Example: `Authorization: Bearer eyJhbGciOiJIUzI1NiIs...` ## Endpoint: POST /v1/chat/completions **Create a chat completion** Creates a model response for the given chat conversation. Auriko routes the request to the optimal provider based on your routing preferences (cost, speed, throughput, etc.). ## Streaming When `stream: true`, responses are delivered as Server-Sent Events (SSE). The final event contains `routing_metadata` with routing decision details. ## Multi-Model Routing Use `models[]` instead of `model` to enable multi-model routing: - `routing.mode: "pool"` (default): Best provider across all models - `routing.mode: "fallback"`: Try models in order ### Request Parameters - `model` (string, optional): Model ID to use. Mutually exclusive with `models`. Providing both `model` and `models` returns 400. Examples: `gpt-4o`, `claude-3-5-sonnet`, `llama-3.1-70b` - `models` (array[string], optional): Auriko extension: Multi-model routing. Mutually exclusive with `model`. Providing both returns 400. Allows routing across multiple models. Use with `routing.mode`: - `pool` (default): Route to best provider across all models - `fallback`: Try models in order until one succeeds See `routing_metadata.fallback_chain` in the response for the sequence of providers attempted when using `fallback` mode. 
- `messages` (array[Message], required): The messages to generate a completion for - `temperature` (number, optional): Sampling temperature (0-2) - `top_p` (number, optional): Nucleus sampling parameter - `max_tokens` (integer, optional): Maximum tokens to generate (legacy, use max_completion_tokens) - `max_completion_tokens` (integer, optional): Maximum tokens to generate (preferred for o1/o3 models) - `stop` (string | array, optional): Stop sequences - `presence_penalty` (number, optional): Presence penalty (-2 to 2) - `frequency_penalty` (number, optional): Frequency penalty (-2 to 2) - `logit_bias` (object, optional): Token logit bias - `seed` (integer, optional): Random seed for reproducibility - `tools` (array[Tool], optional): Tools the model can call - `tool_choice` (ToolChoice, optional): - `parallel_tool_calls` (boolean, optional): Allow parallel tool calls - `functions` (array[FunctionDefinition], optional): **Deprecated.** Use `tools` instead. Auto-converted. - `function_call` (string | object, optional): **Deprecated.** Use `tool_choice` instead. Auto-converted. - `response_format` (ResponseFormat, optional): - `type` (string, required): Response format type: - `text`: Plain text response (default) - `json_object`: JSON mode - model outputs valid JSON - `json_schema`: Structured output - model follows provided schema Values: `text`, `json_object`, `json_schema`. - `json_schema` (object, optional): Required when type is `json_schema` - `stream` (boolean, optional): Enable streaming responses Default: `false`. - `stream_options` (StreamOptions, optional): - `include_usage` (boolean, optional): Include token usage in final streaming chunk - `user` (string, optional): User identifier for abuse detection - `n` (integer, optional): Number of completions to generate Default: `1`.
- `logprobs` (boolean, optional): Return log probabilities - `top_logprobs` (integer, optional): Number of top logprobs to return - `routing` (RoutingOptions, optional): Auriko routing configuration (15 fields). Controls how Auriko selects providers for your request. All fields are optional. Setting a field to `null` is equivalent to omitting it. - `extensions` (Extensions, optional): Auriko extensions for normalized features and provider-specific passthrough. ## Normalized Features These are translated to provider-native format automatically: - `thinking`: Enable thinking/reasoning mode ## Provider Passthrough Pass provider-specific parameters directly: - `anthropic`: Anthropic-specific parameters - `openai`: OpenAI-specific parameters - `google`: Google/Gemini-specific parameters - `deepseek`: DeepSeek-specific parameters Passthrough parameters are forwarded as-is to the target provider. - `auriko_metadata` (RequestMetadata, optional): Optional request metadata for tracking and observability. Attached via the `auriko_metadata` field on chat completion requests. Field name uses `auriko_metadata` (not `metadata`) to avoid collision with OpenAI's native metadata field. - `tags` (array[string], optional): Tags for categorizing requests (max 100 items, each ≤50 chars) - `user_id` (string, optional): Your application's user identifier for per-user analytics - `trace_id` (string, optional): Distributed tracing identifier to correlate with your observability stack - `custom_fields` (object, optional): Arbitrary key-value pairs (max 10 keys, keys ≤50 chars, values ≤200 chars) ### Request Examples **Basic completion**: ```json { "model": "gpt-4o", "messages": [ { "role": "user", "content": "Hello!"
} ] } ``` **With routing options**: ```json { "model": "claude-3-5-sonnet", "messages": [ { "role": "user", "content": "Explain quantum computing" } ], "routing": { "optimize": "cost", "max_cost_per_1m": 5.0 } } ``` **Multi-model routing**: ```json { "models": [ "gpt-4o", "claude-3-5-sonnet" ], "messages": [ { "role": "user", "content": "Hello!" } ], "routing": { "mode": "pool", "optimize": "cost" } } ``` ### Response (200) Successful completion. For streaming (`stream: true`), responses are Server-Sent Events. Each event is a `ChatCompletionChunk`. The final chunk has `choices: []` (empty) and contains `usage` and `routing_metadata`. Stream ends with `data: [DONE]`. ### Response Properties - `id` (string, required): Unique completion identifier - `object` (string, required): - `created` (integer, required): Unix timestamp of creation - `model` (string, required): Model used for completion - `choices` (array[Choice], required): Completion choices - `usage` (Usage, optional): - `prompt_tokens` (integer, required): Input tokens used - `completion_tokens` (integer, required): Output tokens generated - `total_tokens` (integer, required): Total tokens (prompt + completion) - `prompt_tokens_details` (PromptTokensDetails, optional): Detailed breakdown of prompt tokens - `completion_tokens_details` (CompletionTokensDetails, optional): Breakdown of completion tokens. Provider-dependent: present when the upstream provider reports token-level details, absent otherwise. - `system_fingerprint` (string, optional): System fingerprint for reproducibility - `routing_metadata` (RoutingMetadata, optional): Routing decision metadata included in all responses. Provides transparency into how Auriko selected the provider. ### Error Responses - **400**: Bad request - invalid parameters - **401**: Authentication failed - **402**: Insufficient credits. 
- `insufficient_quota`: workspace balance too low - **404**: Model not found - **429**: Rate limit exceeded - **500**: Internal server error - **502**: Upstream provider failure (Bad Gateway). All non-timeout 5xx errors from upstream providers are normalized to 502. Provider errors may also surface as: - 400: Invalid request passthrough (code: `invalid_request`) - 401: BYOK key auth failure (code: `provider_auth_error`) - 429: Provider rate limit (code: `rate_limit_exceeded`) These use their respective status codes with the same `ErrorResponse` body format. - **503**: Service unavailable — transient issue. Possible causes: - All providers for the model are rate-limited or unhealthy (`no_providers_available`) - Transient infrastructure issue such as KV outage (`service_unavailable`) Note: If the model doesn't support a requested capability (e.g., reasoning), the response is 400 `capability_mismatch`, not 503. If routing constraints excluded all providers, the response is 400 `routing_constraint_unsatisfiable`. - **504**: Upstream provider timed out. The client may retry with a longer timeout. ## Endpoint: GET /v1/models **List available models** Lists all models available through Auriko, including provider availability and pricing information. The response includes Auriko-specific extensions: - `providers[]`: Available providers with pricing - `catalog_version`: Version of the model catalog - `catalog_age_seconds`: Age of the catalog data ### Response (200) List of available models ### Response Properties - `object` (string, required): - `data` (array[Model], required): - `catalog_version` (string, optional): Version of the model catalog - `catalog_age_seconds` (number, optional): Age of catalog in seconds ### Error Responses - **401**: Authentication failed - **429**: Rate limit exceeded - **500**: Internal server error ## Endpoint: GET /v1/registry/providers **List available providers** Returns all LLM providers available on the Auriko platform. 
### Response (200) List of providers ### Response Properties - `providers` (array[ProviderResponse], required): Available providers - `count` (integer, required): Total number of providers ### Error Responses - **429**: Rate limit exceeded - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway. - **503**: Service unavailable — transient issue. Possible causes: - All providers for the model are rate-limited or unhealthy (`no_providers_available`) - Transient infrastructure issue such as KV outage (`service_unavailable`) Note: If the model doesn't support a requested capability (e.g., reasoning), the response is 400 `capability_mismatch`, not 503. If routing constraints excluded all providers, the response is 400 `routing_constraint_unsatisfiable`. ## Endpoint: GET /v1/registry/models **List canonical models** Returns all canonical models in the Auriko registry with provider availability and metadata. ### Response (200) List of canonical models ### Response Properties - `models` (array[CanonicalModelResponse], required): Canonical models in the registry - `count` (integer, required): Total number of models - `source` (string, required): Data source (supabase or cache) ### Error Responses - **429**: Rate limit exceeded - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway. - **503**: Service unavailable — transient issue. Possible causes: - All providers for the model are rate-limited or unhealthy (`no_providers_available`) - Transient infrastructure issue such as KV outage (`service_unavailable`) Note: If the model doesn't support a requested capability (e.g., reasoning), the response is 400 `capability_mismatch`, not 503. If routing constraints excluded all providers, the response is 400 `routing_constraint_unsatisfiable`.
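The model catalog endpoints above can be consumed with plain HTTP. A minimal sketch using Python's standard library, assuming each entry in `data[]` carries an `id` (as in OpenAI-compatible model lists); the staleness threshold is an illustrative choice, not an Auriko default:

```python
import json
import os
import urllib.request

STALE_AFTER_SECONDS = 3600  # illustrative threshold, not an Auriko default


def summarize_catalog(payload: dict) -> dict:
    """Reduce a GET /v1/models payload to model IDs plus catalog freshness."""
    return {
        "model_ids": [m["id"] for m in payload.get("data", [])],
        "catalog_version": payload.get("catalog_version"),
        "stale": (payload.get("catalog_age_seconds") or 0) > STALE_AFTER_SECONDS,
    }


if __name__ == "__main__":
    req = urllib.request.Request(
        "https://api.auriko.ai/v1/models",
        headers={"Authorization": f"Bearer {os.environ['AURIKO_API_KEY']}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        print(summarize_catalog(json.load(resp)))
```

`catalog_age_seconds` is optional in the response, so the helper treats a missing value as fresh.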
## Endpoint: GET /v1/directory/models **Full model directory** Returns the complete model directory with detailed provider information including context windows, capabilities, pricing tiers, and modalities. This is the richest model metadata endpoint. ### Response (200) Full model directory ### Response Properties - `models` (object, required): Map of canonical model ID to model entry - `generated_at` (string, required): ISO 8601 timestamp when the directory was generated ### Error Responses - **429**: Rate limit exceeded - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway. - **503**: Service unavailable — transient issue. Possible causes: - All providers for the model are rate-limited or unhealthy (`no_providers_available`) - Transient infrastructure issue such as KV outage (`service_unavailable`) Note: If the model doesn't support a requested capability (e.g., reasoning), the response is 400 `capability_mismatch`, not 503. If routing constraints excluded all providers, the response is 400 `routing_constraint_unsatisfiable`. ## Endpoint: POST /v1/workspaces **Create a workspace** Creates a new workspace. The authenticated user becomes the owner. ### Request Parameters - `name` (string, required): Workspace display name - `slug` (string, optional): URL-friendly workspace slug. Auto-generated from name if omitted. Must be lowercase alphanumeric with hyphens, cannot start or end with a hyphen. ### Error Responses - **400**: Bad request - invalid parameters - **401**: Authentication failed - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway. ## Endpoint: GET /v1/workspaces **List workspaces** Lists all workspaces the authenticated user is a member of. When authenticated with an API key, returns only the key's workspace. 
### Response (200) List of workspaces ### Response Properties - `workspaces` (array[WorkspaceResponse], required): Workspaces the user belongs to - `count` (integer, required): Total number of workspaces ### Error Responses - **401**: Authentication failed - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway. ## Endpoint: GET /v1/workspaces/{workspace_id} **Get workspace details** Returns details for a workspace the authenticated user is a member of. API keys can only access their own workspace. ### Response (200) Workspace details ### Response Properties - `id` (string, required): Workspace identifier - `name` (string, required): Workspace display name - `slug` (string, required): URL-friendly workspace slug - `tier` (string, required): Current billing tier - `billing_email` (string | null, optional): Billing contact email - `created_at` (string, required): When the workspace was created - `updated_at` (string, required): When the workspace was last updated - `member_count` (integer | null, optional): Number of members in the workspace - `user_role` (string | null, optional): Current user's role in this workspace (owner, admin, member) - `can_use_paid_models` (boolean, required): Whether this workspace has credits for paid models ### Error Responses - **400**: Bad request - invalid parameters - **401**: Authentication failed - **404**: Resource not found - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway. ## Endpoint: PATCH /v1/workspaces/{workspace_id} **Update workspace** Updates workspace settings. Requires owner role.
### Request Parameters - `name` (string, optional): Updated workspace display name - `billing_email` (string, optional): Billing contact email ### Response (200) Workspace updated ### Response Properties - `id` (string, required): Workspace identifier - `name` (string, required): Workspace display name - `slug` (string, required): URL-friendly workspace slug - `tier` (string, required): Current billing tier - `billing_email` (string | null, optional): Billing contact email - `created_at` (string, required): When the workspace was created - `updated_at` (string, required): When the workspace was last updated - `member_count` (integer | null, optional): Number of members in the workspace - `user_role` (string | null, optional): Current user's role in this workspace (owner, admin, member) - `can_use_paid_models` (boolean, required): Whether this workspace has credits for paid models ### Error Responses - **400**: Bad request - invalid parameters - **401**: Authentication failed - **403**: You do not have permission to perform this action - **404**: Resource not found - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway. ## Endpoint: GET /v1/workspaces/{workspace_id}/routing-defaults **Get routing defaults** Returns the workspace routing defaults. Any workspace member can read. ### Response (200) Current routing defaults ### Response Properties - `workspace_id` (string, required): Workspace identifier - `routing_defaults` (RoutingDefaults | null, optional): Current routing defaults, or null if none are configured. ### Error Responses - **400**: Bad request - invalid parameters - **401**: Authentication failed - **404**: Resource not found - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway.
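For concreteness, a GET routing-defaults response for a workspace with defaults configured might look like this (all field values are hypothetical):

```json
{
  "workspace_id": "ws_abc123",
  "routing_defaults": {
    "optimize": "cost",
    "data_policy": "no_training",
    "allow_fallbacks": true,
    "max_fallback_attempts": 2,
    "providers": null,
    "exclude_providers": ["together_ai"]
  }
}
```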
## Endpoint: PATCH /v1/workspaces/{workspace_id}/routing-defaults **Update routing defaults** Updates workspace routing defaults. Requires owner or admin role. **Merge semantics:** - Present fields are updated - Omitted fields retain their current value - Fields set to `null` are cleared - Empty object `{}` clears all routing defaults ### Request Parameters - `optimize` (string | null, optional): Default optimization strategy. Values: `cost`, `ttft`, `speed`, `throughput`, `balanced`, `cheapest`, or `null`. - `data_policy` (string | null, optional): Default data retention policy. Values: `none`, `no_training`, `zdr`, or `null`. - `allow_fallbacks` (boolean | null, optional): Whether to allow automatic fallbacks - `max_fallback_attempts` (integer | null, optional): Maximum number of fallback attempts - `providers` (array | null, optional): Default provider allowlist - `exclude_providers` (array | null, optional): Default provider blocklist ### Response (200) Routing defaults updated ### Response Properties - `workspace_id` (string, required): Workspace identifier - `routing_defaults` (RoutingDefaults | null, optional): Current routing defaults, or null if none are configured. ### Error Responses - **400**: Bad request - invalid parameters - **401**: Authentication failed - **403**: You do not have permission to perform this action - **404**: Resource not found - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway. ## Endpoint: GET /v1/workspaces/{workspace_id}/budgets **List budgets** Lists all budgets for the workspace. Any workspace member can read.
### Response (200) List of budgets with current spend ### Response Properties - `budgets` (array[BudgetResponse], required): Budgets for the workspace, with current spend - `count` (integer, required): Total number of budgets ### Error Responses - **400**: Bad request - invalid parameters - **401**: Authentication failed - **404**: Resource not found - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway. ## Endpoint: POST /v1/workspaces/{workspace_id}/budgets **Create a budget** Creates a budget for the workspace. Requires owner or admin role. **Scope types:** - `workspace`: Applies to all usage in the workspace - `api_key`: Applies to a specific API key (requires `scope_id`) - `byok_provider`: Applies to a BYOK provider (requires `scope_provider`) ### Request Parameters - `scope_type` (string, required): Budget scope: - `workspace`: Applies to all usage - `api_key`: Scoped to a specific API key (requires `scope_id`) - `byok_provider`: Scoped to a BYOK provider (requires `scope_provider`) Values: `workspace`, `api_key`, `byok_provider`. - `scope_id` (string | null, optional): API key ID (required when scope_type is `api_key`) - `scope_provider` (string | null, optional): Provider ID (required when scope_type is `byok_provider`) - `period` (string, required): Budget period. Values: `monthly`, `weekly`, `daily`. - `limit_usd` (number, required): Budget limit in USD - `enforce` (boolean, optional): Whether to block requests when budget is exceeded. Default: `true`. - `include_byok` (boolean, optional): Whether to include BYOK usage in budget tracking (workspace scope only). Default: `false`. ### Error Responses - **400**: Bad request - invalid parameters - **401**: Authentication failed - **403**: You do not have permission to perform this action - **404**: Resource not found - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway.
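Putting the create-budget parameters above together, a hypothetical request body for a $100 monthly workspace-wide budget that blocks requests once the limit is reached:

```json
{
  "scope_type": "workspace",
  "period": "monthly",
  "limit_usd": 100.0,
  "enforce": true,
  "include_byok": false
}
```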
## Endpoint: GET /v1/workspaces/{workspace_id}/budgets/{budget_id} **Get budget details** Returns a single budget with current spend. Any workspace member can read. ### Response (200) Budget details with current spend ### Response Properties - `id` (string, required): Budget identifier - `workspace_id` (string, required): Workspace this budget belongs to - `scope_type` (string, required): Budget scope type. Values: `workspace`, `api_key`, `byok_provider`. - `scope_id` (string | null, optional): Scoped API key ID (when scope_type is api_key) - `scope_provider` (string | null, optional): Scoped provider ID (when scope_type is byok_provider) - `period` (string, required): Budget period. Values: `monthly`, `weekly`, `daily`. - `limit_usd` (number, required): Budget limit in USD - `enforce` (boolean, required): Whether requests are blocked when exceeded - `include_byok` (boolean, required): Whether BYOK usage is included - `created_by` (string | null, optional): User ID who created the budget - `created_at` (string, required): When the budget was created - `updated_at` (string, required): When the budget was last updated - `spend_microdollars` (integer, required): Current period spend in microdollars - `spend_usd` (number, required): Current period spend in USD - `percent_used` (number, required): Percentage of budget used (0-100+) ### Error Responses - **400**: Bad request - invalid parameters - **401**: Authentication failed - **404**: Resource not found - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway. ## Endpoint: PATCH /v1/workspaces/{workspace_id}/budgets/{budget_id} **Update a budget** Updates a budget. Requires owner or admin role. At least one field must be provided.
### Request Parameters - `limit_usd` (number, optional): Updated budget limit in USD - `enforce` (boolean, optional): Whether to block requests when budget is exceeded - `include_byok` (boolean, optional): Whether to include BYOK usage in budget tracking ### Response (200) Budget updated ### Response Properties - `id` (string, required): Budget identifier - `workspace_id` (string, required): Workspace this budget belongs to - `scope_type` (string, required): Budget scope type. Values: `workspace`, `api_key`, `byok_provider`. - `scope_id` (string | null, optional): Scoped API key ID (when scope_type is api_key) - `scope_provider` (string | null, optional): Scoped provider ID (when scope_type is byok_provider) - `period` (string, required): Budget period. Values: `monthly`, `weekly`, `daily`. - `limit_usd` (number, required): Budget limit in USD - `enforce` (boolean, required): Whether requests are blocked when exceeded - `include_byok` (boolean, required): Whether BYOK usage is included - `created_by` (string | null, optional): User ID who created the budget - `created_at` (string, required): When the budget was created - `updated_at` (string, required): When the budget was last updated - `spend_microdollars` (integer, required): Current period spend in microdollars - `spend_usd` (number, required): Current period spend in USD - `percent_used` (number, required): Percentage of budget used (0-100+) ### Error Responses - **400**: Bad request - invalid parameters - **401**: Authentication failed - **403**: You do not have permission to perform this action - **404**: Resource not found - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway. ## Endpoint: DELETE /v1/workspaces/{workspace_id}/budgets/{budget_id} **Delete a budget** Deletes a budget. Requires owner or admin role.
### Error Responses - **400**: Bad request - invalid parameters - **401**: Authentication failed - **403**: You do not have permission to perform this action - **404**: Resource not found - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway. ## Endpoint: POST /v1/workspaces/{workspace_id}/keys **Create an API key** Creates a new API key for the workspace. Requires owner or admin role. **Important:** The full API key is returned exactly once in the response. It is never stored or retrievable again. Save it securely. ### Request Parameters - `name` (string, optional): Display name for the API key. Default: `Default Key`. - `rate_limit_rpm` (integer, optional): Custom rate limit in requests per minute (overrides tier default) - `expires_at` (string, optional): Optional expiration time for the API key ### Error Responses - **400**: Bad request - invalid parameters - **401**: Authentication failed - **403**: You do not have permission to perform this action - **404**: Resource not found - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway. ## Endpoint: GET /v1/workspaces/{workspace_id}/keys **List API keys** Lists API keys for the workspace. Any workspace member can read. Keys are returned with prefixes only — full keys are never retrievable. ### Response (200) List of API keys ### Response Properties - `keys` (array[ApiKeyResponse], required): API keys in the workspace - `count` (integer, required): Total number of keys returned - `workspace_id` (string, required): Workspace the keys belong to ### Error Responses - **400**: Bad request - invalid parameters - **401**: Authentication failed - **404**: Resource not found - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway. ## Endpoint: DELETE /v1/workspaces/{workspace_id}/keys/{key_id} **Revoke an API key** Revokes an API key.
Owners and admins can revoke any key in the workspace. Members can revoke keys they created. ### Response (200) API key revoked ### Response Properties - `success` (boolean, required): Whether the revocation succeeded - `message` (string, required): Human-readable result message - `revoked_at` (string, required): When the key was revoked ### Error Responses - **400**: Bad request - invalid parameters - **401**: Authentication failed - **403**: You do not have permission to perform this action - **404**: Resource not found - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway. ## Endpoint: GET /v1/workspaces/{workspace_id}/keys/{key_id}/usage **Get API key usage** Returns usage statistics for an API key. Any workspace member can read. ### Response (200) Key usage statistics ### Response Properties - `key_id` (string, required): API key identifier - `workspace_id` (string, required): Workspace identifier - `period` (string, required): Usage period (day, week, month) - `start_date` (string, required): Period start date (ISO 8601) - `end_date` (string, required): Period end date (ISO 8601) - `request_count` (integer, required): Total requests in the period - `error_count` (integer, required): Total errors in the period - `rate_limit_count` (integer, required): Rate-limited requests in the period - `tokens_input` (integer, required): Total input tokens consumed - `tokens_output` (integer, required): Total output tokens generated - `tokens_total` (integer, required): Total tokens (input + output) - `cost_usd` (number, required): Total cost in USD ### Error Responses - **400**: Bad request - invalid parameters - **401**: Authentication failed - **404**: Resource not found - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway. 
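The usage fields above lend themselves to simple derived metrics. A minimal sketch; the helper name is illustrative, not part of any Auriko SDK:

```python
def usage_summary(usage: dict) -> dict:
    """Derive per-request averages from a key-usage payload
    (request_count, error_count, tokens_total, cost_usd)."""
    n = usage["request_count"]
    if n == 0:
        # Idle key: avoid division by zero.
        return {"avg_tokens_per_request": 0.0, "error_rate": 0.0, "cost_per_request_usd": 0.0}
    return {
        "avg_tokens_per_request": usage["tokens_total"] / n,
        "error_rate": usage["error_count"] / n,
        "cost_per_request_usd": usage["cost_usd"] / n,
    }


sample = {"request_count": 200, "error_count": 4, "tokens_total": 50_000, "cost_usd": 1.50}
print(usage_summary(sample))
# {'avg_tokens_per_request': 250.0, 'error_rate': 0.02, 'cost_per_request_usd': 0.0075}
```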
## Endpoint: GET /v1/workspaces/{workspace_id}/billing/balance **Get credit balance** Returns the workspace credit balance, tier, and billing configuration. Requires owner role. ### Response (200) Credit balance and billing details ### Response Properties - `balance_microdollars` (integer, required): Current balance in microdollars (1 USD = 1,000,000 μ$) - `balance_cents` (integer, required): Current balance in cents (computed) - `balance_dollars` (string, required): Current balance in dollars (computed, string for precision) - `lifetime_purchased_microdollars` (integer, required): Total credits ever purchased in microdollars - `lifetime_purchased_cents` (integer, required): Total credits ever purchased in cents (computed) - `lifetime_used_microdollars` (integer, required): Total credits ever consumed in microdollars - `lifetime_used_cents` (integer, required): Total credits ever consumed in cents (computed) - `auto_reload_enabled` (boolean, required): Whether auto-reload is enabled - `auto_reload_threshold_microdollars` (integer | null, optional): Balance threshold triggering auto-reload - `auto_reload_threshold_cents` (integer | null, optional): Balance threshold in cents (computed) - `auto_reload_amount_microdollars` (integer | null, optional): Target balance amount for auto-reload - `auto_reload_amount_cents` (integer | null, optional): Target balance amount in cents (computed) - `current_tier` (string, required): Current billing tier - `platform_fee_rate` (string | null, optional): Current platform fee rate as decimal string - `tier_volume_usd` (string | null, optional): Lifetime volume in USD for tier calculation - `next_tier_threshold_usd` (string | null, optional): Volume needed to reach next tier in USD - `byok_monthly_cap` (integer | null, optional): Monthly BYOK request cap (null if unlimited) - `byok_monthly_remaining` (integer | null, optional): Remaining BYOK requests this month - `has_payment_method`
(boolean, required): Whether a payment method is on file - `payment_method_last4` (string | null, optional): Last 4 digits of payment method - `payment_method_brand` (string | null, optional): Payment method brand (visa, mastercard, etc.) ### Error Responses - **400**: Bad request - invalid parameters - **401**: Authentication failed - **403**: You do not have permission to perform this action - **404**: Resource not found - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway. ## Endpoint: GET /v1/workspaces/providers **List supported BYOK providers** Returns the list of providers that support bring-your-own-key (BYOK). ### Response (200) List of supported BYOK providers ### Response Properties - `providers` (array[SupportedProviderItem], required): Providers that support BYOK ### Error Responses - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway. ## Endpoint: POST /v1/workspaces/{workspace_id}/provider-keys **Add provider API key** Adds a provider API key (BYOK) to the workspace. ### Request Parameters - `provider` (string, required): Provider identifier (e.g., openai, anthropic) - `api_key` (string, required): The API key to store - `label` (string | null, optional): Friendly name for the key - `is_default` (boolean, optional): Whether this is the default key for the provider. Default: `true`. - `validate_before_save` (boolean, optional): Whether to validate the key before saving. Default: `true`. - `is_enterprise` (boolean, optional): User asserts this is an enterprise key. Default: `false`. - `selected_tier` (string | null, optional): User-selected tier for providers requiring manual selection ### Error Responses - **400**: Bad request - invalid parameters - **401**: Authentication failed - **403**: You do not have permission to perform this action - **500**: Internal server error - **502**: API gateway is unavailable.
The edge worker could not reach the backend gateway. ## Endpoint: GET /v1/workspaces/{workspace_id}/provider-keys **List provider API keys** Lists all provider API keys (BYOK) for the workspace. ### Response (200) List of provider keys ### Response Properties - `keys` (array[ProviderKeyResponse], required): Provider keys in the workspace - `count` (integer, required): Total number of provider keys ### Error Responses - **400**: Bad request - invalid parameters - **401**: Authentication failed - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway. ## Endpoint: DELETE /v1/workspaces/{workspace_id}/provider-keys/{key_id} **Delete provider API key** Deletes a provider API key from the workspace. ### Response (200) Provider key deleted ### Response Properties - `success` (boolean, required): Whether the deletion succeeded - `message` (string, required): Human-readable result message ### Error Responses - **400**: Bad request - invalid parameters - **401**: Authentication failed - **403**: You do not have permission to perform this action - **404**: Resource not found - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway. ## Endpoint: POST /v1/workspaces/{workspace_id}/provider-keys/{key_id}/validate **Re-validate provider API key** Re-validates a provider API key to check if it is still active. ### Response (200) Validation result ### Response Properties - `status` (string, required): Validation outcome. Values: `valid`, `invalid`, `error`. - `message` (string, required): Human-readable validation message - `provider` (string, required): Provider identifier ### Error Responses - **400**: Bad request - invalid parameters - **401**: Authentication failed - **404**: Resource not found - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway. ## Endpoint: POST /v1/workspaces/{workspace_id}/provider-keys/{key_id}/set-default **Set provider key as default** Sets the specified provider key as the default for its provider.
### Response (200) Provider key set as default ### Response Properties - `id` (string, required): Provider key identifier - `workspace_id` (string, required): Workspace the key belongs to - `provider` (string, required): Provider identifier - `provider_name` (string, required): Human-readable provider name - `name` (string, required): Display name for the key - `key_prefix` (string, required): First characters of the stored key - `is_default` (boolean, required): Whether this is the default key for the provider - `validation_status` (string, required): Current validation status. Values: `pending`, `valid`, `invalid`, `error`. - `last_validated_at` (string | null, optional): When the key was last validated - `validation_error` (string | null, optional): Error from the most recent failed validation - `detected_tier` (string | null, optional): Automatically detected provider tier - `tier_source` (string | null, optional): How the tier was determined. Values: `auto_detected`, `user_specified`, `fallback`, or `null`. - `tier_detected_at` (string | null, optional): When the tier was detected - `created_at` (string, required): When the key was created - `updated_at` (string, required): When the key was last updated ### Error Responses - **400**: Bad request - invalid parameters - **401**: Authentication failed - **404**: Resource not found - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway. ## Endpoint: PATCH /v1/workspaces/{workspace_id}/provider-keys/{key_id}/tier **Update provider key tier** Updates the tier associated with a provider key. ### Request Parameters - `tier` (string, required): The tier name to set ### Response (200) Provider key tier updated ### Response Properties - `id` (string, required): Provider key identifier - `workspace_id` (string, required): Workspace the key belongs to - `provider` (string, required): Provider identifier - `provider_name` (string, required): Human-readable provider name - `name` (string, required): Display name for the key - `key_prefix` (string, required): First characters of the stored key - `is_default` (boolean, required): Whether this is the default key for the provider - `validation_status` (string, required): Current validation status. Values: `pending`, `valid`, `invalid`, `error`. - `last_validated_at` (string | null, optional): When the key was last validated - `validation_error` (string | null, optional): Error from the most recent failed validation - `detected_tier` (string | null, optional): Automatically detected provider tier - `tier_source` (string | null, optional): How the tier was determined. Values: `auto_detected`, `user_specified`, `fallback`, or `null`.
- `tier_detected_at` (string | null, optional): When the tier was detected - `created_at` (string, required): When the key was created - `updated_at` (string, required): When the key was last updated ### Error Responses - **400**: Bad request - invalid parameters - **401**: Authentication failed - **404**: Resource not found - **500**: Internal server error - **502**: API gateway is unavailable. The edge worker could not reach the backend gateway. ## Endpoint: GET /v1/me **Get API key identity** Returns the identity associated with your API key. Use this to discover your `workspace_id` for management API calls. ### Response (200) API key identity ### Response Properties - `object` (string, required): Object type identifier. Values: `api_key_identity`. - `user_id` (string | null, required): User associated with the key - `workspace_id` (string | null, required): Workspace the key belongs to - `tier` (string | null, required): Workspace billing tier - `rate_limit_rpm` (integer | null, required): Rate limit for this key in requests per minute ### Error Responses - **401**: Authentication failed - **429**: Rate limit exceeded ## Schema: Message Variants (discriminator: `role`): `SystemMessage`, `UserMessage`, `AssistantMessage`, `ToolMessage` **SystemMessage**: - `role` (string, required): Always `system` - `content` (string, required): System instructions - `name` (string, optional): Optional participant name **UserMessage**: - `role` (string, required): Always `user` - `content` (string | array, required): Message text, or an array of content parts - `name` (string, optional): Optional participant name **AssistantMessage**: - `role` (string, required): Always `assistant` - `content` (string | null, optional): Assistant message text - `name` (string, optional): Optional participant name - `tool_calls` (array[ToolCall], optional): Tool calls made by the assistant **ToolMessage**: - `role` (string, required): Always `tool` - `content` (string, required): Tool result content - `tool_call_id` (string, required): ID of the tool call this message answers ## Schema: RoutingOptions Auriko routing configuration (15 fields). Controls how Auriko selects providers for your request. All fields are optional. Setting a field to `null` is equivalent to omitting it.
- `optimize` (string | null, optional): Optimization strategy: - `cost`: Minimize cost per token - `cheapest`: Absolute lowest cost (ignores other dimensions) - `ttft`: Minimize time to first token - `speed`: Minimize total latency + maximize throughput - `throughput`: Maximize tokens per second - `balanced`: Weighted combination (default) Values: `cost`, `ttft`, `speed`, `throughput`, `balanced`, `cheapest`, or `null`. Default: `balanced`. - `weights` (object | null, optional): Custom scoring weights for routing optimization. When provided, overrides the `optimize` preset coefficients. All values must be non-negative. At least one dimension must be > 0. Unspecified dimensions default to 0. Server normalizes to sum to 1.0. - `cost` (number | null, optional): Weight for cost minimization. - `ttft` (number | null, optional): Weight for time-to-first-token optimization. - `throughput` (number | null, optional): Weight for tokens-per-second optimization. - `reliability` (number | null, optional): Weight for provider reliability. - `max_cost_per_1m` (number | null, optional): Maximum cost per 1M tokens (USD) - `max_ttft_ms` (integer | null, optional): Maximum time to first token (milliseconds) - `min_throughput_tps` (number | null, optional): Minimum throughput (tokens per second) - `min_success_rate` (number | null, optional): Minimum provider success rate (0-1) - `providers` (array | null, optional): Provider allowlist. Only consider these providers. Example: `["openai", "anthropic", "fireworks_ai"]` - `exclude_providers` (array | null, optional): Provider blocklist. Exclude these providers. Example: `["together_ai"]` - `prefer` (string | null, optional): Provider to prefer. The preferred provider is selected whenever it meets the active routing constraints.
- `mode` (string | null, optional): How to interpret the `models[]` array: - `pool` (default): Route to the best provider across all models - `fallback`: Try models in order until one succeeds Values: `pool`, `fallback`, or `null`. Default: `pool`. - `allow_fallbacks` (boolean | null, optional): Enable automatic fallback to alternative providers on failure. Default: `true`. - `max_fallback_attempts` (integer | null, optional): Maximum fallback attempts before giving up. Default: `3`. - `data_policy` (string | null, optional): Data retention policy requirement: - `none`: No restrictions (default) - `no_training`: Provider must not use data for training - `zdr`: Zero Data Retention (strictest) Values: `none`, `no_training`, `zdr`, or `null`. Default: `none`. - `only_byok` (boolean | null, optional): Only use Bring Your Own Key (BYOK) providers. Mutually exclusive with `only_platform`. Returns 400 if both are set. Default: `false`. - `only_platform` (boolean | null, optional): Only use platform-managed API keys. Mutually exclusive with `only_byok`. Returns 400 if both are set. Default: `false`. ## Schema: Extensions Auriko extensions for normalized features and provider-specific passthrough. ## Normalized Features These are translated to provider-native format automatically: - `thinking`: Enable thinking/reasoning mode ## Provider Passthrough Pass provider-specific parameters directly: - `anthropic`: Anthropic-specific parameters - `openai`: OpenAI-specific parameters - `google`: Google/Gemini-specific parameters - `deepseek`: DeepSeek-specific parameters Passthrough parameters are forwarded as-is to the target provider. - `thinking` (ThinkingConfig, optional): Normalized thinking/reasoning configuration.
Translates to provider-native format: - Anthropic: thinking block configuration - OpenAI o1/o3: reasoning_effort based on budget - DeepSeek R1: Native support - Gemini 2.0 Flash Thinking: thinking_config - `anthropic` (object, optional): Anthropic-specific parameters (passed through) - `openai` (object, optional): OpenAI-specific parameters (passed through) - `google` (object, optional): Google/Gemini-specific parameters (passed through) - `deepseek` (object, optional): DeepSeek-specific parameters (passed through) ## Schema: ThinkingConfig Normalized thinking/reasoning configuration. Translates to provider-native format: - Anthropic: thinking block configuration - OpenAI o1/o3: reasoning_effort based on budget - DeepSeek R1: Native support - Gemini 2.0 Flash Thinking: thinking_config - `enabled` (boolean, optional): Enable thinking/reasoning mode - `budget_tokens` (integer, optional): Token budget for thinking (provider-dependent minimum) ## Schema: RoutingMetadata Routing decision metadata included in all responses. Provides transparency into how Auriko selected the provider. - `provider` (string, required): Provider name (e.g., "fireworks_ai", "anthropic") - `provider_model_id` (string, required): Provider's model ID - `tier` (string, optional): Pricing tier if applicable (e.g., "flex", "standard") - `model_canonical` (string, required): Canonical model ID requested - `routing_strategy` (string, required): Strategy used for routing. Known values: `cost`, `ttft`, `speed`, `throughput`, `balanced`, `cheapest`, `custom`. `custom` is returned when explicit `routing.weights` are provided. Additional strategies may be added in future versions. 
- `candidates_total` (integer, required): Total catalog offerings before filtering (offering-level) - `candidates_viable` (integer, required): Source-level candidates after filtering and dedup (offering x key source pairs entering ranking) - `routing_decision_ms` (number, required): Time spent in routing decision (ms) - `ttft_ms` (number, optional): Time to first token (streaming only) - `total_latency_ms` (number, required): Total request latency (ms) - `cost` (CostInfo, optional): Cost breakdown for the request - `fallback_chain` (array[FallbackChainEntry], optional): Providers attempted during fallback, in order. Only includes providers that were actually called. Providers skipped due to cooldown, missing keys, or policy constraints are NOT included. Absent if no fallback was needed (primary succeeded). - `warnings` (array[string], optional): Warnings about ignored/unsupported configuration ## Schema: CostInfo Cost breakdown for the request - `input_tokens` (integer, required): Input tokens used - `output_tokens` (integer, required): Output tokens generated - `provider_cost_usd` (number, required): Cost at provider rates (USD) - `billable_cost_usd` (number, required): Billable cost including margin (USD) ## Error Response Details - **BadRequest**: Bad request - invalid parameters — example: "Missing required parameter: 'model'." - **Unauthorized**: Authentication failed — code: `invalid_api_key`, message: "Invalid API key or unauthorized access." - **InsufficientCredits**: Insufficient credits. - `insufficient_quota`: workspace balance too low — example: "Insufficient credits to complete this request. Please add credits to your account." - **ModelNotFound**: Model not found — code: `model_not_found`, message: "Model 'unknown-model' not found." - **RateLimited**: Rate limit exceeded — code: `rate_limit_exceeded`, message: "Rate limit exceeded. Please retry after 60 seconds." 
- **InternalError**: Internal server error — code: `internal_error`, message: "An internal server error occurred." - **ServiceUnavailable**: Service unavailable — transient issue. Possible causes: - All providers for the model are rate-limited or unhealthy (`no_providers_available`) - Transient infrastructure issue such as KV outage (`service_unavailable`) Note: If the model doesn't support a requested capability (e.g., reasoning), the response is 400 `capability_mismatch`, not 503. If routing constraints excluded all providers, the response is 400 `routing_constraint_unsatisfiable`. — example: "No providers available for model 'gpt-4o'." - **ProviderError**: Upstream provider failure (Bad Gateway). All non-timeout 5xx errors from upstream providers are normalized to 502. Provider errors may also surface as: - 400: Invalid request passthrough (code: `invalid_request`) - 401: BYOK key auth failure (code: `provider_auth_error`) - 429: Provider rate limit (code: `rate_limit_exceeded`) These use their respective status codes with the same `ErrorResponse` body format. — code: `provider_error`, message: "All providers failed for model gpt-4o (attempted: openai, azure). Last error: Bad Gateway" - **ProviderTimeout**: Upstream provider timed out. The client may retry with a longer timeout. — code: `provider_error`, message: "All providers failed for model gpt-4o (attempted: openai). Last error: Gateway Timeout" - **Forbidden**: You do not have permission to perform this action — code: `forbidden`, message: "You do not have permission to perform this action." - **NotFound**: Resource not found — code: `not_found`, message: "The requested resource was not found." - **GatewayUnavailable**: API gateway is unavailable. The edge worker could not reach the backend gateway. — code: `gateway_unavailable`, message: "API gateway is temporarily unavailable. Please retry." 
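Because **ServiceUnavailable**, **ProviderTimeout**, and **GatewayUnavailable** are transient, clients can safely retry them with exponential backoff. A minimal sketch of such a retry loop — the `retry_transient` helper and its parameters are illustrative, not part of the SDK; in real code you would pass the SDK's typed transient error classes (e.g. `ServiceUnavailableError`) as `transient`:

```python
import time

def retry_transient(fn, transient, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fn(), retrying only the given transient exception classes
    with exponential backoff (0.5s, 1s, 2s, ...)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except transient:
            if attempt == max_attempts:
                raise  # exhausted: surface the last transient error
            sleep(base_delay * 2 ** (attempt - 1))

# Stand-in for a transient 503/504 from the API (hypothetical class).
class Transient(Exception):
    pass

calls = []
def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise Transient("503 service unavailable")
    return "ok"

# Succeeds on the third attempt; sleep is stubbed out here to skip real delays.
result = retry_transient(flaky, Transient, max_attempts=3, sleep=lambda s: None)
print(result)  # -> ok
```

Non-transient errors (400/401/403/404) should not be retried — they will fail the same way again.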
=== # Auriko Python SDK Reference ## Page: Python SDK The `auriko` Python package provides an OpenAI-compatible client for the Auriko API. Complete API reference with all types, parameters, and examples --- ## Page: Python SDK > Section: Installation ```bash pip install auriko ``` Requires Python 3.10 or later. --- ## Page: Python SDK > Section: Get started ```python from auriko import Client client = Client() # reads AURIKO_API_KEY from environment response = client.chat.completions.create( model="gpt-5.4", messages=[{"role": "user", "content": "Hello!"}] ) print(response.choices[0].message.content) ``` --- ## Page: Python SDK > Section: Configure ### API Key ```python import os # Option 1: Auto-detect from AURIKO_API_KEY env var (recommended) client = Client() # Option 2: Pass explicitly client = Client(api_key=os.environ["AURIKO_API_KEY"]) ``` ### Base URL ```python # Default: https://api.auriko.ai/v1 # Override for self-hosted or proxy setups: client = Client(base_url="https://your-proxy.example.com/v1") ``` ### Timeout ```python client = Client(timeout=60.0) # seconds ``` ### Retries ```python client = Client(max_retries=3) # default is 2 ``` --- ## Page: Python SDK > Section: Create chat completions ### Basic request Send a chat completion request: ```python response = client.chat.completions.create( model="gpt-4o", messages=[ {"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is 2+2?"} ] ) print(response.choices[0].message.content) ``` ### With routing options ```python response = client.chat.completions.create( model="gpt-5.4", messages=[{"role": "user", "content": "Hello!"}], routing={ "optimize": "cost", "max_ttft_ms": 200, } ) # Access routing metadata print(f"Provider: {response.routing_metadata.provider}") if response.routing_metadata.cost: print(f"Cost: ${response.routing_metadata.cost.billable_cost_usd:.6f}") ``` You can also pass a `RoutingOptions` object for IDE autocomplete and validation: ```python 
from auriko.route_types import RoutingOptions, Optimize response = client.chat.completions.create( model="gpt-5.4", messages=[{"role": "user", "content": "Hello!"}], routing=RoutingOptions(optimize=Optimize.COST, max_ttft_ms=200) ) ``` **All routing fields:** | Field | Type | Description | |-------|------|-------------| | `optimize` | `Optimize` | Strategy: `"cost"`, `"speed"`, `"ttft"`, `"throughput"`, `"balanced"`, `"cheapest"` | | `weights` | `dict[str, float]` | Custom scoring weights: `cost`, `ttft`, `throughput`, `reliability`. Overrides preset. | | `max_cost_per_1m` | `float` | Max cost per 1M tokens | | `max_ttft_ms` | `int` | Max time to first token (ms) | | `min_throughput_tps` | `float` | Min tokens per second | | `min_success_rate` | `float` | Min provider success rate (0.0–1.0) | | `providers` | `list[str]` | Allowlist of providers | | `exclude_providers` | `list[str]` | Blocklist of providers | | `prefer` | `str` | Preferred provider (soft preference) | | `mode` | `Mode` | `"pool"` (default) or `"fallback"` | | `allow_fallbacks` | `bool` | Enable fallback on failure | | `max_fallback_attempts` | `int` | Max fallback retries | | `data_policy` | `DataPolicy` | `"none"`, `"no_training"`, `"zdr"` | | `only_byok` | `bool` | Only use BYOK providers | | `only_platform` | `bool` | Only use platform providers | See [Advanced Routing](/guides/advanced-routing) for detailed strategy guides. ### Multi-model routing Route a request across multiple models. The router picks the best option based on your routing strategy: ```python response = client.chat.completions.create( models=["gpt-4o", "claude-sonnet-4-20250514", "gemini-2.5-flash"], messages=[{"role": "user", "content": "Explain quantum computing briefly."}], routing={"optimize": "cost"} ) print(f"Model used: {response.model}") print(f"Provider: {response.routing_metadata.provider}") print(response.choices[0].message.content) ``` `model` and `models` are mutually exclusive. Specify exactly one. 
Passing both raises `InvalidRequestError`. ### Extended thinking Enable extended reasoning for complex tasks using the `extensions` parameter: ```python response = client.chat.completions.create( model="claude-sonnet-4-20250514", messages=[{"role": "user", "content": "Solve step by step: what is 23! / 20!?"}], extensions={"thinking": {"enabled": True, "budget_tokens": 10000}} ) # Access the reasoning output (if the model returns it) if response.choices[0].message.reasoning_content: print(f"Reasoning: {response.choices[0].message.reasoning_content}") print(f"Answer: {response.choices[0].message.content}") ``` You can also pass provider-specific parameters through `extensions`: ```python response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}], extensions={"openai": {"logit_bias": {"1234": -100}}} ) ``` See [Extensions and Thinking](/guides/extensions-and-thinking) for provider details and streaming thinking output. ### Request metadata Attach metadata to requests for tracking and analytics: ```python response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}], auriko_metadata={"session_id": "abc-123", "user_tier": "premium"} ) ``` The Auriko dashboard logs and displays your metadata. 
### Stream responses ```python stream = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Count to 10"}], stream=True ) for chunk in stream: if chunk.choices and chunk.choices[0].delta.content: print(chunk.choices[0].delta.content, end="", flush=True) ``` After consuming all chunks, access stream-level metadata: ```python print(f"\nProvider: {stream.routing_metadata.provider}") print(f"Tokens: {stream.usage.total_tokens}") print(f"Request ID: {stream.response_headers.request_id}") ``` Use a context manager for automatic cleanup: ```python with client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Count to 10"}], stream=True ) as stream: for chunk in stream: if chunk.choices and chunk.choices[0].delta.content: print(chunk.choices[0].delta.content, end="", flush=True) # stream is automatically closed ``` Or close manually with `stream.close()`. Routing metadata, usage, and response headers are available only after consuming all chunks. See [Streaming Guide](/guides/streaming) for full patterns including tool call streaming. ### Tool calling ```python tools = [ { "type": "function", "function": { "name": "get_weather", "description": "Get weather for a city", "parameters": { "type": "object", "properties": { "city": {"type": "string"} }, "required": ["city"] } } } ] response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "What's the weather in Paris?"}], tools=tools ) if response.choices[0].message.tool_calls: tool_call = response.choices[0].message.tool_calls[0] print(f"Function: {tool_call.function.name}") print(f"Arguments: {tool_call.function.arguments}") ``` See [Tool Calling Guide](/guides/tool-calling) for multi-turn tool conversations. 
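A complete tool-calling turn sends the assistant's tool-call message back to the model along with one `role: "tool"` result message per call, matched by `tool_call_id` (the standard OpenAI-compatible shape). A sketch of building that follow-up message list, using plain dicts so the shape is explicit — the weather result here is hypothetical:

```python
import json

def build_tool_followup(messages, assistant_message, results):
    """Append the assistant's tool-call message plus one `tool` message
    per call, matched by tool_call_id (OpenAI-compatible shape)."""
    followup = list(messages) + [assistant_message]
    for call in assistant_message["tool_calls"]:
        followup.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(results[call["function"]["name"]]),
        })
    return followup

messages = [{"role": "user", "content": "What's the weather in Paris?"}]
assistant_message = {  # shape of response.choices[0].message with a tool call
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
    }],
}
followup = build_tool_followup(
    messages, assistant_message, {"get_weather": {"temp_c": 18}}
)
print([m["role"] for m in followup])  # -> ['user', 'assistant', 'tool']
```

Pass `followup` as `messages` in a second `create()` call (with the same `tools`) to get the final natural-language answer.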
--- ## Page: Python SDK > Section: Read response headers Every response and error includes a `response_headers` object with typed accessors: ```python response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}] ) response.response_headers.request_id # str | None response.response_headers.rate_limit_remaining # int | None response.response_headers.rate_limit_limit # int | None response.response_headers.rate_limit_reset # str | None response.response_headers.credits_balance_microdollars # int | None response.response_headers.provider_used # str | None response.response_headers.routing_strategy # str | None response.response_headers.get("x-custom-header") # generic lookup ``` | Property | Header | Type | |----------|--------|------| | `request_id` | `x-request-id` | `str \| None` | | `rate_limit_remaining` | `x-ratelimit-remaining-requests` | `int \| None` | | `rate_limit_limit` | `x-ratelimit-limit-requests` | `int \| None` | | `rate_limit_reset` | `x-ratelimit-reset-requests` | `str \| None` | | `credits_balance_microdollars` | `x-credits-balance-microdollars` | `int \| None` | | `provider_used` | `x-provider-used` | `str \| None` | | `routing_strategy` | `x-routing-strategy` | `str \| None` | Error objects also carry `response_headers`. Use `e.response_headers.request_id` when filing support tickets to correlate with server logs. See the [Python SDK Reference](/sdk/python-reference#response-headers) for the complete `ResponseHeaders` API. 
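Since every response exposes rate-limit and credit state in headers, you can monitor both client-side and warn before hitting a limit. A small sketch — the thresholds are illustrative, and `credits_balance_microdollars` is assumed to mean USD × 10⁻⁶, per the header name:

```python
def check_headers(rate_limit_remaining, credits_balance_microdollars,
                  min_requests=10, min_usd=1.0):
    """Return warnings based on response-header values.
    Either input may be None when the header is absent."""
    warnings = []
    if rate_limit_remaining is not None and rate_limit_remaining < min_requests:
        warnings.append(f"only {rate_limit_remaining} requests left in window")
    if credits_balance_microdollars is not None:
        balance_usd = credits_balance_microdollars / 1_000_000
        if balance_usd < min_usd:
            warnings.append(f"credits low: ${balance_usd:.2f}")
    return warnings

# In real code:
# check_headers(response.response_headers.rate_limit_remaining,
#               response.response_headers.credits_balance_microdollars)
print(check_headers(5, 250_000))  # two warnings: low rate limit, low credits
```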
--- ## Page: Python SDK > Section: Read token usage The `Usage` object on every response carries optional detail breakdowns: ```python response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}] ) usage = response.usage # Prompt token breakdown if usage.prompt_tokens_details: print(f"Cached: {usage.prompt_tokens_details.cached_tokens}") print(f"Text: {usage.prompt_tokens_details.text_tokens}") print(f"Image: {usage.prompt_tokens_details.image_tokens}") print(f"Audio: {usage.prompt_tokens_details.audio_tokens}") # Completion token breakdown if usage.completion_tokens_details: print(f"Reasoning: {usage.completion_tokens_details.reasoning_tokens}") print(f"Text: {usage.completion_tokens_details.text_tokens}") ``` | Field | Sub-fields | Type | |-------|-----------|------| | `prompt_tokens_details` | `cached_tokens`, `text_tokens`, `image_tokens`, `audio_tokens` | `Optional[int]` each | | `completion_tokens_details` | `reasoning_tokens`, `text_tokens`, `image_tokens`, `audio_tokens` | `Optional[int]` each | Availability depends on the provider. `completion_tokens_details.reasoning_tokens` is present for OpenAI o-series, DeepSeek, xAI, and Google Gemini. It's `None` for providers that don't report reasoning token counts (Anthropic, Moonshot, Fireworks). See [Check reasoning token availability](/guides/extensions-and-thinking#check-reasoning-token-availability) for the full breakdown. 
--- ## Page: Python SDK > Section: Handle errors Catch typed exceptions: ```python from auriko import ( Client, AurikoAPIError, AuthenticationError, RateLimitError, BudgetExceededError, ModelNotFoundError, ProviderError, # Also available: InvalidRequestError, InsufficientCreditsError, # InternalError, ProviderAuthError, ServiceUnavailableError ) client = Client() try: response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}] ) except AuthenticationError as e: print(f"Check your API key: {e}") except RateLimitError as e: print(f"Rate limited: {e}") except BudgetExceededError as e: print(f"Budget exceeded: {e}") except ModelNotFoundError as e: print(f"Model not found: {e}") except ProviderError as e: print(f"Provider error: {e}") except AurikoAPIError as e: print(f"API error ({e.status_code}): {e}") ``` See [Error Handling Guide](/guides/error-handling) for retry patterns and `map_openai_error()`. --- ## Page: Python SDK > Section: Use management APIs Query workspace, budget, and model information: ```python # Identity (discover your workspace) identity = client.me.get() print(f"Workspace: {identity.workspace_id}") # Workspaces workspaces = client.workspaces.list() workspace = client.workspaces.get("ws-123") # Budgets budgets = client.budgets.list("ws-123") budget = client.budgets.get("ws-123", "budget-456") # Models registry = client.models.list_registry() directory = client.models.list_directory() providers = client.models.list_providers() ``` ### Model listing choices | Method | Returns | Use when | |--------|---------|----------| | `list_registry()` | Flat list: `id`, `family`, `display_name` | You need a quick model ID lookup | | `list_directory()` | Rich detail: provider entries, context windows, capabilities, pricing tiers | You need to compare providers or check capabilities | | `list_providers()` | Provider catalog: display name, description, data policy | You need to see available providers | See the [Python 
SDK Reference](/sdk/python-reference) for the complete API. --- ## Page: Python SDK > Section: Use async client Use the async client for non-blocking requests: ```python from auriko import AsyncClient async def main(): client = AsyncClient() response = await client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}] ) print(response.choices[0].message.content) import asyncio asyncio.run(main()) ``` ### Async streaming Stream responses asynchronously: ```python from auriko import AsyncClient async def stream_response(): client = AsyncClient() stream = await client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Count to 10"}], stream=True ) async for chunk in stream: if chunk.choices and chunk.choices[0].delta.content: print(chunk.choices[0].delta.content, end="", flush=True) ``` ### Async context manager Use `async with` for automatic connection cleanup: ```python from auriko import AsyncClient async def main(): async with AsyncClient() as client: response = await client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}] ) print(response.choices[0].message.content) # client.close() called automatically ``` Or close explicitly: `await client.close()` --- ## Page: Python SDK > Section: Use context managers Use a context manager for automatic cleanup: ```python with Client() as client: response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}] ) print(response.choices[0].message.content) ``` --- ## Page: Python SDK > Section: SDK scope The Auriko SDK covers: inference (chat completions with routing), read-only management (workspaces, budgets, identity), and model discovery. For full platform operations (workspace creation, budget management, API key rotation), use the [REST API](/api-reference/overview) directly. 
--- ## Page: Python SDK > Section: Use type hints The SDK provides typed responses, errors, and routing configuration. Use your IDE's autocomplete for the best experience: ```python from auriko import Client from auriko.models.chat import ChatCompletion, ChatCompletionChunk client = Client() response: ChatCompletion = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}] ) ``` === # Auriko TypeScript SDK Reference ## Page: TypeScript SDK The `@auriko/sdk` package provides a typed TypeScript client for the Auriko API. Complete API reference with all types, parameters, and examples --- ## Page: TypeScript SDK > Section: Installation ```bash npm install @auriko/sdk # or yarn add @auriko/sdk # or pnpm add @auriko/sdk ``` --- ## Page: TypeScript SDK > Section: Get started ```typescript import { Client } from "@auriko/sdk"; const client = new Client(); // reads AURIKO_API_KEY from environment const response = await client.chat.completions.create({ model: "gpt-5.4", messages: [{ role: "user", content: "Hello!" 
}], }); console.log(response.choices[0].message.content); ``` --- ## Page: TypeScript SDK > Section: Configure ### API Key ```typescript // Option 1: Auto-detect from AURIKO_API_KEY env var (recommended) const client = new Client(); // Option 2: Pass explicitly const client = new Client({ apiKey: process.env.AURIKO_API_KEY, }); ``` ### Base URL ```typescript // Default: https://api.auriko.ai/v1 // Override for self-hosted or proxy setups: const client = new Client({ baseUrl: "https://your-proxy.example.com/v1", }); ``` ### Timeout ```typescript const client = new Client({ timeout: 60000, // milliseconds }); ``` ### Retries ```typescript const client = new Client({ maxRetries: 3, // default is 2 }); ``` --- ## Page: TypeScript SDK > Section: Create chat completions ### Basic request Send a chat completion request: ```typescript const response = await client.chat.completions.create({ model: "gpt-4o", messages: [ { role: "system", content: "You are a helpful assistant." }, { role: "user", content: "What is 2+2?" }, ], }); console.log(response.choices[0].message.content); ``` ### With routing options ```typescript import { Optimize } from "@auriko/sdk"; const response = await client.chat.completions.create({ model: "gpt-5.4", messages: [{ role: "user", content: "Hello!" 
}], routing: { optimize: "cost", max_ttft_ms: 200, }, }); // Access routing metadata console.log(`Provider: ${response.routing_metadata?.provider}`); if (response.routing_metadata?.cost) { console.log(`Cost: $${response.routing_metadata.cost.billable_cost_usd}`); } ``` You can also use the `RoutingOptions` type with enum constants for IDE autocomplete: ```typescript import { Optimize } from "@auriko/sdk"; import type { RoutingOptions } from "@auriko/sdk"; const routing: RoutingOptions = { optimize: Optimize.COST, max_ttft_ms: 200, }; ``` **All routing fields:** | Field | Type | Description | |-------|------|-------------| | `optimize` | `Optimize` | Strategy: `"cost"`, `"speed"`, `"ttft"`, `"throughput"`, `"balanced"`, `"cheapest"` | | `weights` | `RoutingWeights` | Custom scoring weights: `cost`, `ttft`, `throughput`, `reliability`. Overrides preset. | | `max_cost_per_1m` | `number` | Max cost per 1M tokens | | `max_ttft_ms` | `number` | Max time to first token (ms) | | `min_throughput_tps` | `number` | Min tokens per second | | `min_success_rate` | `number` | Min provider success rate (0.0–1.0) | | `providers` | `string[]` | Allowlist of providers | | `exclude_providers` | `string[]` | Blocklist of providers | | `prefer` | `string` | Preferred provider (soft preference) | | `mode` | `Mode` | `"pool"` (default) or `"fallback"` | | `allow_fallbacks` | `boolean` | Enable fallback on failure | | `max_fallback_attempts` | `number` | Max fallback retries | | `data_policy` | `DataPolicy` | `"none"`, `"no_training"`, `"zdr"` | | `only_byok` | `boolean` | Only use BYOK providers | | `only_platform` | `boolean` | Only use platform providers | See [Advanced Routing](/guides/advanced-routing) for detailed strategy guides. ### Multi-model routing Route a request across multiple models. 
The router picks the best option based on your routing strategy: ```typescript const response = await client.chat.completions.create({ models: ["gpt-4o", "claude-sonnet-4-20250514", "gemini-2.5-flash"], messages: [{ role: "user", content: "Explain quantum computing briefly." }], routing: { optimize: "cost" }, }); console.log(`Model used: ${response.model}`); console.log(`Provider: ${response.routing_metadata?.provider}`); console.log(response.choices[0].message.content); ``` `model` and `models` are mutually exclusive. Specify exactly one. Passing both raises `InvalidRequestError`. ### Extended thinking Enable extended reasoning for complex tasks using the `extensions` parameter: ```typescript const response = await client.chat.completions.create({ model: "claude-sonnet-4-20250514", messages: [{ role: "user", content: "Solve step by step: what is 23! / 20!?" }], extensions: { thinking: { enabled: true, budget_tokens: 10000 } }, }); // Access the reasoning output (if the model returns it) if (response.choices[0].message.reasoning_content) { console.log(`Reasoning: ${response.choices[0].message.reasoning_content}`); } console.log(`Answer: ${response.choices[0].message.content}`); ``` You can also pass provider-specific parameters through `extensions`: ```typescript const response = await client.chat.completions.create({ model: "gpt-4o", messages: [{ role: "user", content: "Hello!" }], extensions: { openai: { logit_bias: { "1234": -100 } } }, }); ``` See [Extensions and Thinking](/guides/extensions-and-thinking) for provider details and streaming thinking output. ### Request metadata Attach metadata to requests for tracking and analytics: ```typescript const response = await client.chat.completions.create({ model: "gpt-4o", messages: [{ role: "user", content: "Hello!" }], auriko_metadata: { session_id: "abc-123", user_tier: "premium" }, }); ``` The Auriko dashboard logs and displays your metadata. 
### Stream responses ```typescript const stream = await client.chat.completions.create({ model: "gpt-4o", messages: [{ role: "user", content: "Count to 10" }], stream: true, }); for await (const chunk of stream) { if (chunk.choices[0]?.delta?.content) { process.stdout.write(chunk.choices[0].delta.content); } } ``` After consuming all chunks, access stream-level metadata: ```typescript console.log(`\nProvider: ${stream.routing_metadata?.provider}`); console.log(`Tokens: ${stream.usage?.total_tokens}`); console.log(`Request ID: ${stream.responseHeaders.requestId}`); console.log(`Closed: ${stream.isClosed}`); ``` Close a stream manually with `stream.close()`. Routing metadata, usage, and response headers are available only after consuming all chunks. See [Streaming Guide](/guides/streaming) for full patterns including tool call streaming. ### Tool calling ```typescript const tools = [ { type: "function" as const, function: { name: "get_weather", description: "Get weather for a city", parameters: { type: "object", properties: { city: { type: "string" }, }, required: ["city"], }, }, }, ]; const response = await client.chat.completions.create({ model: "gpt-4o", messages: [{ role: "user", content: "What's the weather in Paris?" }], tools, }); if (response.choices[0].message.tool_calls) { const toolCall = response.choices[0].message.tool_calls[0]; console.log(`Function: ${toolCall.function.name}`); console.log(`Arguments: ${toolCall.function.arguments}`); } ``` See [Tool Calling Guide](/guides/tool-calling) for multi-turn tool conversations. --- ## Page: TypeScript SDK > Section: Read response headers Every response and error includes a `responseHeaders` object with typed accessors: ```typescript const response = await client.chat.completions.create({ model: "gpt-4o", messages: [{ role: "user", content: "Hello!" 
}], }); response.responseHeaders.requestId; // string | undefined response.responseHeaders.rateLimitRemaining; // number | undefined response.responseHeaders.rateLimitLimit; // number | undefined response.responseHeaders.rateLimitReset; // string | undefined response.responseHeaders.creditsBalanceMicrodollars; // number | undefined response.responseHeaders.providerUsed; // string | undefined response.responseHeaders.routingStrategy; // string | undefined response.responseHeaders.get("x-custom-header"); // generic lookup response.responseHeaders.getAll("x-multi-header"); // string[] for multi-value headers ``` | Property | Header | Type | |----------|--------|------| | `requestId` | `x-request-id` | `string \| undefined` | | `rateLimitRemaining` | `x-ratelimit-remaining-requests` | `number \| undefined` | | `rateLimitLimit` | `x-ratelimit-limit-requests` | `number \| undefined` | | `rateLimitReset` | `x-ratelimit-reset-requests` | `string \| undefined` | | `creditsBalanceMicrodollars` | `x-credits-balance-microdollars` | `number \| undefined` | | `providerUsed` | `x-provider-used` | `string \| undefined` | | `routingStrategy` | `x-routing-strategy` | `string \| undefined` | Error objects also carry `responseHeaders`. Use `e.responseHeaders.requestId` when filing support tickets to correlate with server logs. See the [TypeScript SDK Reference](/sdk/typescript-reference#response-headers) for the complete `ResponseHeaders` API. --- ## Page: TypeScript SDK > Section: Read token usage The `Usage` object on every response carries optional detail breakdowns: ```typescript const response = await client.chat.completions.create({ model: "gpt-4o", messages: [{ role: "user", content: "Hello!" 
}], }); const usage = response.usage; // Prompt token breakdown if (usage?.prompt_tokens_details) { console.log(`Cached: ${usage.prompt_tokens_details.cached_tokens}`); console.log(`Text: ${usage.prompt_tokens_details.text_tokens}`); console.log(`Image: ${usage.prompt_tokens_details.image_tokens}`); console.log(`Audio: ${usage.prompt_tokens_details.audio_tokens}`); } // Completion token breakdown if (usage?.completion_tokens_details) { console.log(`Reasoning: ${usage.completion_tokens_details.reasoning_tokens}`); console.log(`Text: ${usage.completion_tokens_details.text_tokens}`); } ``` | Field | Sub-fields | Type | |-------|-----------|------| | `prompt_tokens_details` | `cached_tokens`, `text_tokens`, `image_tokens`, `audio_tokens` | `number \| undefined` each | | `completion_tokens_details` | `reasoning_tokens`, `text_tokens`, `image_tokens`, `audio_tokens` | `number \| undefined` each | Availability depends on the provider. `completion_tokens_details.reasoning_tokens` is present for OpenAI o-series, DeepSeek, xAI, and Google Gemini. It's `undefined` for providers that don't report reasoning token counts (Anthropic, Moonshot, Fireworks). See [Check reasoning token availability](/guides/extensions-and-thinking#check-reasoning-token-availability) for the full breakdown. --- ## Page: TypeScript SDK > Section: Handle errors Catch typed exceptions: ```typescript import { Client, AurikoAPIError, AuthenticationError, RateLimitError, BudgetExceededError, ModelNotFoundError, ProviderError, // Also available: InvalidRequestError, InsufficientCreditsError, // InternalError, ProviderAuthError, ServiceUnavailableError } from "@auriko/sdk"; const client = new Client(); try { const response = await client.chat.completions.create({ model: "gpt-4o", messages: [{ role: "user", content: "Hello!" 
}], }); } catch (e) { if (e instanceof AuthenticationError) { console.log(`Check your API key: ${e.message}`); } else if (e instanceof RateLimitError) { console.log(`Rate limited: ${e.message}`); } else if (e instanceof BudgetExceededError) { console.log(`Budget exceeded: ${e.message}`); } else if (e instanceof ModelNotFoundError) { console.log(`Model not found: ${e.message}`); } else if (e instanceof ProviderError) { console.log(`Provider error: ${e.message}`); } else if (e instanceof AurikoAPIError) { console.log(`API error (${e.statusCode}): ${e.message}`); } } ``` See [Error Handling Guide](/guides/error-handling) for retry patterns. --- ## Page: TypeScript SDK > Section: Use management APIs Query workspace, budget, and model information: ```typescript // Identity (discover your workspace) const identity = await client.me.get(); // Workspaces const workspaces = await client.workspaces.list(); const workspace = await client.workspaces.get("ws-123"); // Budgets const budgets = await client.budgets.list("ws-123"); const budget = await client.budgets.get("ws-123", "budget-456"); // Models const registry = await client.models.listRegistry(); const directory = await client.models.listDirectory(); const providers = await client.models.listProviders(); ``` ### Model listing choices | Method | Returns | Use when | |--------|---------|----------| | `listRegistry()` | Flat list: `id`, `family`, `display_name` | You need a quick model ID lookup | | `listDirectory()` | Rich detail: provider entries, context windows, capabilities, pricing tiers | You need to compare providers or check capabilities | | `listProviders()` | Provider catalog: display name, description, data policy | You need to see available providers | See the [TypeScript SDK Reference](/sdk/typescript-reference) for the complete API. 
--- ## Page: TypeScript SDK > Section: SDK scope The Auriko SDK covers: inference (chat completions with routing), read-only management (workspaces, budgets, identity), and model discovery. For full platform operations (workspace creation, budget management, API key rotation), use the [REST API](/api-reference/overview) directly. --- ## Page: TypeScript SDK > Section: Use TypeScript types The SDK provides typed responses, errors, and routing configuration. Import types directly: ```typescript import type { ChatCompletion, ChatCompletionChunk, ChoiceMessage, Choice, Usage, RoutingMetadata, RoutingOptions, Extensions, } from "@auriko/sdk"; ``` --- ## Page: TypeScript SDK > Section: Node.js, Deno, and Browser The SDK works in multiple environments: ### Node.js ```typescript import { Client } from "@auriko/sdk"; const client = new Client(); // reads AURIKO_API_KEY from env ``` ### Deno ```typescript import { Client } from "npm:@auriko/sdk"; const client = new Client({ apiKey: Deno.env.get("AURIKO_API_KEY"), }); ``` ### Browser (with bundler) ```typescript import { Client } from "@auriko/sdk"; // Pass API key from your backend - never expose in client-side code! const client = new Client({ apiKey: apiKeyFromBackend, }); ``` Never expose your API key in client-side code. Use a backend proxy instead. === # Auriko Framework Integrations ## Page: LangChain + Auriko Use Auriko as your LLM provider in LangChain with a drop-in `ChatOpenAI` replacement. 
--- ## Page: LangChain + Auriko > Section: Prerequisites - An [Auriko API key](https://auriko.ai/signup) --- ## Page: LangChain + Auriko > Section: Installation ```bash pip install "auriko[langchain]" ``` --- ## Page: LangChain + Auriko > Section: Use SDK adapter Use the `AurikoChatOpenAI` adapter: ```python from auriko.frameworks.langchain import AurikoChatOpenAI llm = AurikoChatOpenAI(model="gpt-5.4") ``` `AurikoChatOpenAI` extends LangChain's `ChatOpenAI` with: - Automatic `use_responses_api=False` (LangChain >=1.1 auto-routes GPT-5/Codex to the Responses API, which Auriko doesn't implement) - Routing injection via `extra_body` - OpenAI error mapping to typed Auriko error classes ```python from auriko.frameworks.langchain import AurikoChatOpenAI llm = AurikoChatOpenAI(model="gpt-5.4") # Simple invoke response = llm.invoke("What is 2+2?") print(response.content) # Streaming for chunk in llm.stream("Count to 5"): print(chunk.content, end="", flush=True) # With messages from langchain_core.messages import HumanMessage, SystemMessage messages = [ SystemMessage(content="You are a helpful assistant."), HumanMessage(content="Explain quantum computing briefly."), ] response = llm.invoke(messages) print(response.content) ``` --- ## Page: LangChain + Auriko > Section: Configure options | Parameter | Type | Default | Description | |-----------|------|---------|-------------| | `model` | `str` | (required, via parent) | Model ID | | `api_key` | `str \| None` | `AURIKO_API_KEY` env | API key | | `routing` | `RoutingOptions \| None` | `None` | Routing configuration | | `base_url` | `str` | `"https://api.auriko.ai/v1"` | API base URL | | `**kwargs` | | | Passed through to `ChatOpenAI` (e.g., `temperature`, `max_tokens`) | --- ## Page: LangChain + Auriko > Section: Configure routing Configure routing options: ```python from auriko.frameworks.langchain import AurikoChatOpenAI from auriko.route_types import RoutingOptions llm = AurikoChatOpenAI( model="gpt-5.4", 
routing=RoutingOptions(optimize="cost", max_ttft_ms=200), ) response = llm.invoke("Hello!") print(response.content) ``` Routing metadata is available through response generation info when using `generate()`: ```python result = llm.generate([[HumanMessage(content="Hello!")]]) info = result.generations[0][0].generation_info if info and "routing_metadata" in info: print(f"Provider: {info['routing_metadata']['provider']}") ``` --- ## Page: LangChain + Auriko > Section: Configure manually If you prefer to use `ChatOpenAI` directly: ```python import os from langchain_openai import ChatOpenAI llm = ChatOpenAI( model="gpt-5.4", api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1", use_responses_api=False, # required for Auriko ) ``` Note: you must set `use_responses_api=False` manually, and routing options aren't available without `extra_body` configuration. --- ## Page: LangChain + Auriko > Section: Notes - `AurikoChatOpenAI` inherits all `ChatOpenAI` capabilities: chains, agents, tool calling, async, streaming. - OpenAI API errors are automatically mapped to typed Auriko error classes (`RateLimitError`, `BudgetExceededError`, etc.). - The `use_responses_api=False` flag is set automatically — you don't need to remember it. --- ## Page: OpenAI Agents SDK + Auriko Use Auriko as your LLM provider in the OpenAI Agents SDK. --- ## Page: OpenAI Agents SDK + Auriko > Section: Prerequisites - An [Auriko API key](https://auriko.ai/signup) --- ## Page: OpenAI Agents SDK + Auriko > Section: Installation ```bash pip install "auriko[agents]" ``` --- ## Page: OpenAI Agents SDK + Auriko > Section: Use SDK adapter Use the `AurikoModel` adapter: ```python from auriko.frameworks.agents import AurikoModel model = AurikoModel(model="gpt-5.4") ``` `AurikoModel` replaces 4 lines of global client configuration with a single model parameter. It extends `OpenAIChatCompletionsModel` with routing injection, error mapping, and per-task metadata isolation via `ContextVar`. 
```python import asyncio from auriko.frameworks.agents import AurikoModel from agents import Agent, Runner model = AurikoModel(model="gpt-5.4") agent = Agent( name="assistant", instructions="You are a helpful assistant.", model=model, ) async def main(): result = await Runner.run(agent, input="What is the capital of France?") print(result.final_output) asyncio.run(main()) ``` --- ## Page: OpenAI Agents SDK + Auriko > Section: Configure options | Parameter | Type | Default | Description | |-----------|------|---------|-------------| | `model` | `str` | (required) | Model ID | | `api_key` | `str \| None` | `AURIKO_API_KEY` env | API key | | `routing` | `RoutingOptions \| None` | `None` | Routing configuration | | `base_url` | `str` | `"https://api.auriko.ai/v1"` | API base URL | --- ## Page: OpenAI Agents SDK + Auriko > Section: Configure routing Configure routing options: ```python import asyncio from auriko.frameworks.agents import AurikoModel from auriko.route_types import RoutingOptions from agents import Agent, Runner model = AurikoModel( model="gpt-5.4", routing=RoutingOptions(optimize="cost"), ) agent = Agent(name="assistant", instructions="You are helpful.", model=model) async def main(): result = await Runner.run(agent, input="Hello!") print(result.final_output) asyncio.run(main()) ``` Routing metadata is isolated per async task using `ContextVar`, so concurrent `Runner.run()` calls sharing the same `AurikoModel` instance don't interfere with each other. 
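The isolation behavior can be illustrated with plain `contextvars`. This is a minimal sketch that does not touch the Agents SDK; `routing_metadata` and `fake_run` are stand-ins for the adapter's internals, not real SDK names:

```python
import asyncio
import contextvars

# Stand-in for the adapter's internal per-task metadata variable.
routing_metadata: contextvars.ContextVar[dict] = contextvars.ContextVar("routing_metadata")

async def fake_run(provider: str) -> str:
    # asyncio copies the current context into each Task, so a set() here
    # is invisible to the sibling task started by gather().
    routing_metadata.set({"provider": provider})
    await asyncio.sleep(0)  # yield so the two tasks interleave
    return routing_metadata.get()["provider"]

async def main() -> list[str]:
    # Two concurrent "runs" sharing the same module-level ContextVar.
    return await asyncio.gather(fake_run("openai"), fake_run("azure"))

print(asyncio.run(main()))  # ['openai', 'azure'] (no cross-task leakage)
```

Each `asyncio` task receives its own copy of the context, which is the same mechanism the adapter relies on for per-task metadata.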
--- ## Page: OpenAI Agents SDK + Auriko > Section: Configure manually If you prefer to configure the SDK's client directly: ```python import asyncio import os from openai import AsyncOpenAI from agents import Agent, Runner, set_default_openai_client, set_default_openai_api, set_tracing_disabled set_default_openai_api("chat_completions") set_tracing_disabled(True) client = AsyncOpenAI( base_url="https://api.auriko.ai/v1", api_key=os.environ["AURIKO_API_KEY"], ) set_default_openai_client(client, use_for_tracing=False) agent = Agent(name="assistant", instructions="You are helpful.", model="gpt-5.4") async def main(): result = await Runner.run(agent, input="Hello!") print(result.final_output) asyncio.run(main()) ``` Note: `set_default_openai_api("chat_completions")` is required because Auriko implements the Chat Completions API, not the Responses API. Routing options, error mapping, and per-task metadata isolation aren't available with manual configuration. --- ## Page: OpenAI Agents SDK + Auriko > Section: Notes - `AurikoModel` extends `OpenAIChatCompletionsModel` — it works with all Agents SDK features: tools, handoffs, streaming, guardrails. - OpenAI API errors are automatically mapped to typed Auriko error classes (`RateLimitError`, `BudgetExceededError`, etc.). - Concurrent agent runs using the same `AurikoModel` instance have isolated routing metadata via `ContextVar`. --- ## Page: Google ADK + Auriko Use Auriko as your LLM provider in Google's Agent Development Kit (ADK). --- ## Page: Google ADK + Auriko > Section: Prerequisites - An [Auriko API key](https://auriko.ai/signup) --- ## Page: Google ADK + Auriko > Section: Installation ```bash pip install "auriko[adk]" ``` --- ## Page: Google ADK + Auriko > Section: Use SDK adapter Use the `AurikoLlm` adapter: ```python from auriko.frameworks.adk import AurikoLlm llm = AurikoLlm(model="gpt-5.4") ``` `AurikoLlm` is a native `BaseLlm` implementation that doesn't use LiteLLM as an intermediary. 
It converts directly between ADK (Gemini) types and OpenAI message format, supporting text and function calling. ```python import asyncio from auriko.frameworks.adk import AurikoLlm from google.adk import Agent, Runner from google.adk.sessions import InMemorySessionService from google.genai import types llm = AurikoLlm(model="gpt-5.4") agent = Agent( model=llm, name="assistant", instruction="You are a helpful assistant.", ) session_service = InMemorySessionService() runner = Runner(agent=agent, app_name="my_app", session_service=session_service, auto_create_session=True) user_message = types.Content( role="user", parts=[types.Part(text="What is 2+2?")] ) async def main(): async for event in runner.run_async(user_id="user-1", session_id="session-1", new_message=user_message): if event.content and event.content.parts: for part in event.content.parts: if part.text: print(part.text, end="", flush=True) asyncio.run(main()) ``` --- ## Page: Google ADK + Auriko > Section: Configure options | Parameter | Type | Default | Description | |-----------|------|---------|-------------| | `model` | `str` | (required) | Model ID | | `api_key` | `str` | `""` (reads `AURIKO_API_KEY` at first use) | API key | | `routing` | `RoutingOptions \| None` | `None` | Routing configuration | | `base_url` | `str` | `"https://api.auriko.ai/v1"` | API base URL | --- ## Page: Google ADK + Auriko > Section: Configure routing Configure routing options: ```python from auriko.frameworks.adk import AurikoLlm from auriko.route_types import RoutingOptions llm = AurikoLlm( model="gpt-5.4", routing=RoutingOptions(optimize="cost"), ) ``` --- ## Page: Google ADK + Auriko > Section: Configure manually If you prefer to use Google's `LiteLlm` class directly: ```python import os from google.adk.models.lite_llm import LiteLlm llm = LiteLlm( model="openai/gpt-5.4", api_key=os.environ["AURIKO_API_KEY"], api_base="https://api.auriko.ai/v1", custom_llm_provider="openai", ) ``` LiteLLM ignores `api_base` for model 
names containing provider keywords (like `gpt` or `claude`). Always include `custom_llm_provider="openai"` to force LiteLLM to respect your custom base URL. Note: routing options and Auriko error mapping aren't available with manual configuration. --- ## Page: Google ADK + Auriko > Section: Notes - Supports text and function calling. Inline data (`inline_data`) and file data (`file_data`) aren't yet supported and raise `NotImplementedError`. - OpenAI API errors are automatically mapped to typed Auriko error classes. - The adapter uses `AsyncOpenAI` internally; the client is lazily initialized on first use. --- ## Page: CrewAI + Auriko Use Auriko as your LLM provider in CrewAI for cost-effective multi-agent workflows. --- ## Page: CrewAI + Auriko > Section: Prerequisites - An [Auriko API key](https://auriko.ai/signup) --- ## Page: CrewAI + Auriko > Section: Installation ```bash pip install "auriko[crewai]" ``` --- ## Page: CrewAI + Auriko > Section: Use SDK adapter Use the `AurikoCrewAILLM` adapter: ```python from auriko.frameworks.crewai import AurikoCrewAILLM auriko_llm = AurikoCrewAILLM(model="gpt-5.4") ``` `AurikoCrewAILLM` adds an `openai/` prefix internally so all models (including Claude) route through Auriko. Without this wrapper, CrewAI detects `claude-` model names and silently routes to the native Anthropic SDK, bypassing Auriko entirely. 
```python from crewai import Agent, Task, Crew researcher = Agent( role="Researcher", goal="Find accurate and comprehensive information", backstory="You are an expert researcher with attention to detail.", llm=auriko_llm.llm, verbose=True, ) writer = Agent( role="Writer", goal="Write clear, engaging content based on research", backstory="You are a skilled technical writer.", llm=auriko_llm.llm, verbose=True, ) research_task = Task( description="Research the latest trends in AI agents", agent=researcher, expected_output="A detailed summary of AI agent trends with sources", ) writing_task = Task( description="Write a blog post based on the research findings", agent=writer, expected_output="A 500-word blog post about AI agent trends", context=[research_task], ) crew = Crew( agents=[researcher, writer], tasks=[research_task, writing_task], verbose=True, ) result = crew.kickoff() print(result) ``` --- ## Page: CrewAI + Auriko > Section: Configure options | Parameter | Type | Default | Description | |-----------|------|---------|-------------| | `model` | `str` | (required) | Model ID (e.g., `"gpt-5.4"`, `"claude-sonnet-4-20250514"`) | | `api_key` | `str \| None` | `AURIKO_API_KEY` env | API key | | `routing` | `RoutingOptions \| None` | `None` | Routing configuration | | `base_url` | `str` | `"https://api.auriko.ai/v1"` | API base URL | | `**kwargs` | | | Passed through to `crewai.LLM` | --- ## Page: CrewAI + Auriko > Section: Configure routing Configure routing options: ```python from auriko.frameworks.crewai import AurikoCrewAILLM from auriko.route_types import RoutingOptions auriko_llm = AurikoCrewAILLM( model="gpt-5.4", routing=RoutingOptions(optimize="cost"), ) # After crew.kickoff(), access routing metadata from the last request metadata = auriko_llm.last_routing_metadata if metadata: print(f"Provider: {metadata.provider}") ``` Different agents can use different models and routing strategies: ```python fast_llm = AurikoCrewAILLM(model="gpt-4o", 
routing=RoutingOptions(optimize="speed")) smart_llm = AurikoCrewAILLM(model="gpt-5.4", routing=RoutingOptions(optimize="balanced")) researcher = Agent(role="Researcher", goal="Find information", backstory="Expert", llm=smart_llm.llm) writer = Agent(role="Writer", goal="Write content", backstory="Skilled writer", llm=fast_llm.llm) ``` --- ## Page: CrewAI + Auriko > Section: Configure manually If you prefer not to use the SDK adapter, you can configure CrewAI's `LLM` directly. You must add the `openai/` prefix to the model name manually: ```python import os from crewai import LLM llm = LLM( model="openai/gpt-5.4", # openai/ prefix required base_url="https://api.auriko.ai/v1", api_key=os.environ["AURIKO_API_KEY"], ) ``` Note: routing options and metadata access aren't available with manual configuration. --- ## Page: CrewAI + Auriko > Section: Notes - `AurikoCrewAILLM` wraps `crewai.LLM` — pass the `.llm` property to `Agent`, not the wrapper itself. - The `openai/` prefix is added automatically for all models, including Claude. - `last_routing_metadata` returns metadata from the most recent non-streaming response only. --- ## Page: LlamaIndex + Auriko Use Auriko as your LLM provider in LlamaIndex. --- ## Page: LlamaIndex + Auriko > Section: Prerequisites - An [Auriko API key](https://auriko.ai/signup) --- ## Page: LlamaIndex + Auriko > Section: Installation ```bash pip install "auriko[llamaindex]" ``` --- ## Page: LlamaIndex + Auriko > Section: Use SDK adapter Use the `AurikoLlamaIndexLLM` adapter: ```python from auriko.frameworks.llamaindex import AurikoLlamaIndexLLM llm = AurikoLlamaIndexLLM(model="gpt-5.4") ``` `AurikoLlamaIndexLLM` extends LlamaIndex's `OpenAI` LLM class with routing injection, per-call routing overrides, and Auriko error mapping. 
```python
from auriko.frameworks.llamaindex import AurikoLlamaIndexLLM
from llama_index.core.llms import ChatMessage

llm = AurikoLlamaIndexLLM(model="gpt-5.4")

# Simple chat
response = llm.chat([ChatMessage(role="user", content="What is 2+2?")])
print(response.message.content)

# Streaming
for chunk in llm.stream_chat([ChatMessage(role="user", content="Count to 5")]):
    print(chunk.delta, end="", flush=True)
```

---

## Page: LlamaIndex + Auriko > Section: Configure options

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `model` | `str` | (required, via parent) | Model ID |
| `api_key` | `str \| None` | `AURIKO_API_KEY` env | API key |
| `routing` | `RoutingOptions \| None` | `None` | Default routing configuration |
| `api_base` | `str` | `"https://api.auriko.ai/v1"` | API base URL |
| `**kwargs` | | | Passed through to LlamaIndex's `OpenAI` (e.g., `temperature`, `max_tokens`) |

---

## Page: LlamaIndex + Auriko > Section: Configure routing

Instance-level routing applies to all requests:

```python
from auriko.frameworks.llamaindex import AurikoLlamaIndexLLM
from auriko.route_types import RoutingOptions

llm = AurikoLlamaIndexLLM(
    model="gpt-5.4",
    routing=RoutingOptions(optimize="cost"),
)
```

Per-call routing overrides the instance default:

```python
from auriko.route_types import RoutingOptions

# Use speed optimization for this call only
response = llm.chat(
    [ChatMessage(role="user", content="Hello!")],
    routing=RoutingOptions(optimize="speed"),
)
```

Access routing metadata from the response:

```python
response = llm.chat([ChatMessage(role="user", content="Hello!")])
metadata = response.additional_kwargs.get("routing_metadata")
if metadata:
    print(f"Provider: {metadata['provider']}")
```

---

## Page: LlamaIndex + Auriko > Section: Configure manually

If you prefer to use LlamaIndex's `OpenAI` class directly:

```python
import os
from llama_index.llms.openai import OpenAI

llm = OpenAI(
    model="gpt-5.4",
api_key=os.environ["AURIKO_API_KEY"], api_base="https://api.auriko.ai/v1", ) ``` Note: routing options, per-call overrides, and Auriko error mapping aren't available with manual configuration. --- ## Page: LlamaIndex + Auriko > Section: Notes - `AurikoLlamaIndexLLM` inherits all LlamaIndex OpenAI capabilities: chat, completion, streaming, async. - OpenAI API errors are automatically mapped to typed Auriko error classes (`RateLimitError`, `BudgetExceededError`, etc.). - Per-call routing overrides are unique to this adapter — pass `routing=RoutingOptions(...)` to any chat/complete call. === # Auriko Platform ## Page: Rate Limits Inference rate limits apply only to BYOK (Bring Your Own Key) requests and scale with your usage tier. Platform keys have no inference rate limits. Management endpoints have separate per-user limits. --- ## Page: Rate Limits > Section: Prerequisites - An [Auriko API key](https://auriko.ai/signup) --- ## Page: Rate Limits > Section: Inference rate limits Your rate limit tier is determined by rolling 30-day inference spend and recalculates every 60 minutes: | Tier | 30-day spend | BYOK RPM | BYOK monthly cap | Platform fee | |------|-------------|----------|-------------------|--------------| | Starter | $0 – $500 | 30 | 1,000 | 2.0% | | Growth | $500 – $10,000 | 120 | 50,000 | 1.0% | | Scale | $10,000+ | 600 | Unlimited | 0.5% | | Enterprise | Custom | 1,200 | Unlimited | Custom | Enterprise tier is assigned manually — it is not auto-detected from spend. The limits above apply only to BYOK requests. See [BYOK](/platform/byok) for details. 
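As an illustration, the spend-to-tier mapping above can be expressed as a small lookup. `tier_for_spend` is a hypothetical helper, not part of any Auriko SDK; it assumes spend of exactly $500 or $10,000 falls into the higher tier, and omits Enterprise since that tier is assigned manually:

```python
def tier_for_spend(spend_usd: float) -> tuple[str, int, float]:
    """Map rolling 30-day inference spend to (tier, BYOK RPM, platform fee %).

    Hypothetical helper mirroring the tier table. Enterprise is excluded
    because it is assigned manually, not derived from spend.
    """
    if spend_usd >= 10_000:  # assumption: exactly $10,000 counts as Scale
        return ("Scale", 600, 0.5)
    if spend_usd >= 500:     # assumption: exactly $500 counts as Growth
        return ("Growth", 120, 1.0)
    return ("Starter", 30, 2.0)

print(tier_for_spend(250))     # ('Starter', 30, 2.0)
print(tier_for_spend(7_500))   # ('Growth', 120, 1.0)
print(tier_for_spend(12_000))  # ('Scale', 600, 0.5)
```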
--- ## Page: Rate Limits > Section: Rate limit headers Every response carries OpenAI-compatible rate limit headers: | Header | Description | |--------|-------------| | `Retry-After` | Seconds until rate limit resets (RFC 7231) | | `X-RateLimit-Limit-Requests` | Requests allowed per window | | `X-RateLimit-Remaining-Requests` | Requests remaining in current window | | `X-RateLimit-Reset-Requests` | ISO 8601 timestamp when the window resets | --- ## Page: Rate Limits > Section: Management API rate limits Management endpoints have separate per-user rate limits: | Endpoint | Limit | |----------|-------| | API key creation | 10/min | | Billing checkout | 5/min | | Billing portal | 5/min | | Team invites | 20/min | | BYOK operations | 20/min | | Workspace creation | 5/min | | Account deletion | 2/min | | Budget writes | 10/min | | Management reads (API key) | 60/min (per IP) | | Public registry | 60/min (per IP) | --- ## Page: Rate Limits > Section: Handle 429 responses When you exceed a rate limit, the API returns a `429 Too Many Requests` response with a `Retry-After` header indicating when to retry. ```json { "error": { "message": "Rate limit exceeded. Retry after 12 seconds.", "type": "rate_limit_error", "code": "rate_limit_exceeded" } } ``` The Auriko SDK handles retries automatically with exponential backoff (up to 2 retries by default). For manual handling, see [Error handling — Retry manually](/guides/error-handling#retry-manually). --- ## Page: Team Management Create workspaces, invite members, and manage roles. Member and invite endpoints aren't yet available through the public API (`api.auriko.ai`). Manage team members through the [dashboard](https://auriko.ai/dashboard) instead. 
--- ## Page: Team Management > Section: Prerequisites - A [session token](/api-reference/authentication#session-authentication) - Workspace owner or admin role (for member management) --- ## Page: Team Management > Section: Roles Workspace permissions are role-based: | Action | Owner | Admin | Member | |--------|-------|-------|--------| | Invite members | Yes | Yes | — | | Change roles | Yes | — | — | | Remove members | Yes | Yes | — | | Transfer ownership | Yes | — | — | | Update workspace | Yes | — | — | | Delete workspace | Yes | — | — | | Cancel invites | Yes | Yes | — | | View members | Yes | Yes | Yes | | Use API keys | Yes | Yes | Yes | | Leave workspace | Yes | Yes | Yes | --- ## Page: Team Management > Section: Create a workspace Workspace management uses session authentication. See [Authentication](/api-reference/authentication#session-authentication) for details. Any authenticated user can create a workspace and becomes the owner: ```bash curl -X POST https://api.auriko.ai/v1/workspaces \ -H "Authorization: Bearer $SESSION_JWT" \ -H "Content-Type: application/json" \ -d '{ "name": "My Team", "slug": "my-team" }' ``` Response: ```json { "id": "7c9e6679-7425-40de-944b-e07fc1f90ae7", "name": "My Team", "slug": "my-team", "tier": "explorer", "user_role": "owner", "member_count": 1, "can_use_paid_models": false, "created_at": "2026-03-20T10:00:00Z" } ``` --- ## Page: Team Management > Section: Invite a member Owners and admins can invite new members: ```bash curl -X POST https://api.auriko.ai/v1/workspaces/{workspace_id}/members/invite \ -H "Authorization: Bearer $SESSION_JWT" \ -H "Content-Type: application/json" \ -d '{ "email": "teammate@example.com", "role": "member" }' ``` Invitations expire after 7 days and can be resent. 
--- ## Page: Team Management > Section: Accept an invitation The invited user accepts by authenticating and calling the accept endpoint with the invite token: ```bash curl -X POST https://api.auriko.ai/v1/invites/{token}/accept \ -H "Authorization: Bearer $SESSION_JWT" ``` The invite token acts as a secret — it is sent to the invitee's email and is not exposed to workspace admins. --- ## Page: Team Management > Section: Change a member's role Only workspace owners can change roles: ```bash curl -X PATCH https://api.auriko.ai/v1/workspaces/{workspace_id}/members/{user_id} \ -H "Authorization: Bearer $SESSION_JWT" \ -H "Content-Type: application/json" \ -d '{"role": "admin"}' ``` Assignable roles: `admin`, `member`. The `owner` role can only be transferred (see below). --- ## Page: Team Management > Section: Remove a member Owners and admins can remove members: ```bash curl -X DELETE https://api.auriko.ai/v1/workspaces/{workspace_id}/members/{user_id} \ -H "Authorization: Bearer $SESSION_JWT" ``` Members can leave voluntarily: ```bash curl -X POST https://api.auriko.ai/v1/workspaces/{workspace_id}/leave \ -H "Authorization: Bearer $SESSION_JWT" ``` --- ## Page: Team Management > Section: Transfer ownership Only the current owner can transfer ownership: ```bash curl -X POST https://api.auriko.ai/v1/workspaces/{workspace_id}/transfer-ownership \ -H "Authorization: Bearer $SESSION_JWT" \ -H "Content-Type: application/json" \ -d '{"new_owner_id": "550e8400-e29b-41d4-a716-446655440000"}' ``` Auriko demotes the previous owner to admin after the transfer. 
--- ## Page: Team Management > Section: List and manage invites ```bash # List pending invites curl https://api.auriko.ai/v1/workspaces/{workspace_id}/invites \ -H "Authorization: Bearer $SESSION_JWT" # Cancel an invite curl -X DELETE https://api.auriko.ai/v1/workspaces/{workspace_id}/invites/{invite_id} \ -H "Authorization: Bearer $SESSION_JWT" # Resend an invite curl -X POST https://api.auriko.ai/v1/invites/{invite_id}/resend \ -H "Authorization: Bearer $SESSION_JWT" ``` --- ## Page: Bring Your Own Key Use your own provider API keys with Auriko's routing, monitoring, and fallback capabilities. --- ## Page: Bring Your Own Key > Section: Prerequisites - An [Auriko API key](https://auriko.ai/signup) for inference - A [session token](/api-reference/authentication#session-authentication) for key management - Workspace owner or admin role (for key management) - A valid API key from a supported provider --- ## Page: Bring Your Own Key > Section: Find your workspace ID Your API key is scoped to a workspace. To discover your workspace ID, call `/v1/me`: ```bash curl https://api.auriko.ai/v1/me \ -H "Authorization: Bearer $AURIKO_API_KEY" ``` The response includes your `workspace_id`: ```json { "object": "api_key_identity", "workspace_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7", "user_id": "550e8400-e29b-41d4-a716-446655440000", "tier": "explorer", "rate_limit_rpm": 60 } ``` You can also find your workspace ID in the [dashboard](https://auriko.ai/dashboard) under Settings. `workspace_id` is `null` for keys created before workspace support. --- ## Page: Bring Your Own Key > Section: Add a provider key Provider key management uses [session authentication](/api-reference/authentication#session-authentication). 
Get a session token from the [dashboard](https://auriko.ai/dashboard), then register a provider key: ```bash curl -X POST https://api.auriko.ai/v1/workspaces/{workspace_id}/provider-keys \ -H "Authorization: Bearer $SESSION_JWT" \ -H "Content-Type: application/json" \ -d '{ "provider": "openai", "api_key": "sk-...", "label": "Production OpenAI", "validate_before_save": true }' ``` Response: ```json { "id": "pk_abc123", "provider": "openai", "provider_name": "OpenAI", "key_prefix": "sk-...wxyz", "is_default": true, "validation_status": "valid", "detected_tier": "tier-5", "tier_source": "auto_detected", "created_at": "2026-03-20T10:00:00Z" } ``` When `validate_before_save` is `true` (default), Auriko makes a lightweight probe request to the provider to verify the key works before saving it. --- ## Page: Bring Your Own Key > Section: Supported providers | Provider identifier | Provider name | |---------------------|---------------| | `openai` | OpenAI | | `anthropic` | Anthropic Claude | | `google_ai_studio` | Google AI Studio | | `deepseek` | DeepSeek | | `xai` | xAI Grok | | `fireworks_ai` | Fireworks AI | | `together_ai` | Together AI | | `z_ai` | Z.AI | | `minimax` | MiniMax | | `moonshot` | Moonshot AI | --- ## Page: Bring Your Own Key > Section: Tier detection Auriko auto-detects your provider account tier from rate limit headers on first use. The detected tier affects available RPM and TPM limits for routing decisions. Override auto-detection: - **Enterprise flag** — set `is_enterprise: true` when adding a key to mark it as enterprise tier - **Manual tier** — for providers that require tier selection (for example, Google AI Studio), pass `selected_tier` at key creation - **Update later** — `PATCH /v1/workspaces/{workspace_id}/provider-keys/{id}/tier` to change the tier after creation Once a tier is manually set (`tier_source: "user_specified"`), auto-detection is disabled for that key. 
--- ## Page: Bring Your Own Key > Section: Use BYOK in requests Control key source with routing constraints: ```python Python import os from auriko import Client client = Client( api_key=os.environ["AURIKO_API_KEY"], base_url="https://api.auriko.ai/v1" ) # Use only your own keys response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}], routing={"only_byok": True} ) # Use only platform keys (no BYOK) response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Hello!"}], routing={"only_platform": True} ) ``` ```typescript TypeScript import { Client } from "@auriko/sdk"; const client = new Client({ apiKey: process.env.AURIKO_API_KEY, baseUrl: "https://api.auriko.ai/v1", }); // Use only your own keys const response = await client.chat.completions.create({ model: "gpt-4o", messages: [{ role: "user", content: "Hello!" }], routing: { only_byok: true }, }); // Use only platform keys (no BYOK) const platform = await client.chat.completions.create({ model: "gpt-4o", messages: [{ role: "user", content: "Hello!" }], routing: { only_platform: true }, }); ``` --- ## Page: Bring Your Own Key > Section: Routing behavior The router **prefers your BYOK key** when one exists for the requested provider. You get direct billing control and your provider tier applies. The router falls back to platform keys in two cases: 1. **Exhausted:** your BYOK key has zero remaining rate-limit headroom and the platform key has capacity. 2. **Fetch failure:** your BYOK key can't be retrieved or decrypted at request time and a platform key is available. Override the default with routing constraints: - `only_byok: true`: use only your BYOK key and fail the request if unavailable. - `only_platform: true`: ignore BYOK keys entirely. 
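The fallback rules above can be summarized in a short decision function. This is an illustrative sketch of the documented behavior, not router code; the function and parameter names are hypothetical:

```python
def select_key_source(byok_usable: bool, byok_has_headroom: bool,
                      platform_available: bool,
                      only_byok: bool = False,
                      only_platform: bool = False) -> str:
    """Pick "byok" or "platform" per the routing rules (illustrative sketch)."""
    if only_platform:
        if platform_available:
            return "platform"
        raise RuntimeError("no platform key available")
    if only_byok:
        if byok_usable and byok_has_headroom:
            return "byok"
        raise RuntimeError("BYOK key unavailable and only_byok is set")
    # Default: prefer BYOK; fall back to platform on exhaustion or fetch failure.
    if byok_usable and byok_has_headroom:
        return "byok"
    if platform_available:
        return "platform"
    raise RuntimeError("no usable key for this provider")

print(select_key_source(True, True, True))    # byok (preferred)
print(select_key_source(True, False, True))   # platform (BYOK exhausted)
print(select_key_source(False, False, True))  # platform (fetch failure)
```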
---

## Page: Bring Your Own Key > Section: Manage keys

These endpoints also use session authentication:

```bash
# List all provider keys
curl https://api.auriko.ai/v1/workspaces/{workspace_id}/provider-keys \
  -H "Authorization: Bearer $SESSION_JWT"

# Delete a key
curl -X DELETE https://api.auriko.ai/v1/workspaces/{workspace_id}/provider-keys/{id} \
  -H "Authorization: Bearer $SESSION_JWT"

# Re-validate a key
curl -X POST https://api.auriko.ai/v1/workspaces/{workspace_id}/provider-keys/{id}/validate \
  -H "Authorization: Bearer $SESSION_JWT"

# Set as default for provider
curl -X POST https://api.auriko.ai/v1/workspaces/{workspace_id}/provider-keys/{id}/set-default \
  -H "Authorization: Bearer $SESSION_JWT"
```

---

## Page: Bring Your Own Key > Section: Security

Auriko encrypts your provider keys and isolates them per workspace.

- **Encrypted at rest:** XSalsa20-Poly1305 with per-workspace HKDF-SHA256 key derivation.
- **Masked in responses:** API responses return keys as `sk-xxxxx...****` with only the first 8 characters visible.
- **Decrypted at request time only:** the edge router decrypts your key when calling the provider, then discards it.
- **Never logged:** Auriko never logs or persists decrypted keys.
- **Key rotation supported:** encryption key versions are tracked per key for zero-downtime master key rotation.

---

## Page: Bring Your Own Key > Section: Data policies

BYOK keys inherit the account-level data policy. Options: `none`, `no_training`, and `zdr` (zero data retention). When a per-request data policy conflicts with the account-level policy, the most restrictive one wins.

For more on data policies, see [Advanced routing — Data policy](/guides/advanced-routing#data-policy).

---

## Page: Bring Your Own Key > Section: Rate limiting

Auriko rate-limits BYOK management endpoints to 20 operations per minute per user.

Permissions: owner and admin for add, delete, and tier changes; all members can list keys and use them in requests.
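The response masking format described under Security can be sketched like this. `mask_provider_key` is a hypothetical helper that keeps the first 8 characters and replaces the remainder with a fixed suffix; the exact mask string is an assumption, not Auriko's actual output:

```python
def mask_provider_key(key: str) -> str:
    # Keep only the first 8 characters visible, as in the Security section;
    # the "...****" suffix is an assumed rendering of the mask.
    return key[:8] + "...****"

print(mask_provider_key("sk-abc123def456ghi789"))  # sk-abc12...****
```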