The auriko Python package provides an OpenAI-compatible client for the Auriko API.
Full SDK Reference Complete API reference with all types, parameters, and examples
Installation
pip install auriko
Requires Python 3.10 or later.
Get started
from auriko import Client

client = Client()  # reads AURIKO_API_KEY from environment
response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
API Key
import os

from auriko import Client

# Option 1: Auto-detect from AURIKO_API_KEY env var (recommended)
client = Client()
# Option 2: Pass explicitly
client = Client(api_key=os.environ["AURIKO_API_KEY"])
Base URL
# Default: https://api.auriko.ai/v1
# Override for self-hosted or proxy setups:
client = Client(base_url="https://your-proxy.example.com/v1")
Timeout
client = Client(timeout=60.0)  # seconds
Retries
client = Client(max_retries=3)  # default is 2
Create chat completions
Basic request
Send a chat completion request:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is 2+2?"},
    ],
)
print(response.choices[0].message.content)
With routing options
response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
    gateway={
        "routing": {
            "optimize": "cost",
            "max_ttft_ms": 1000,
        },
    },
)
# Access routing metadata
print(f"Provider: {response.routing_metadata.provider}")
if response.routing_metadata.cost:
    print(f"Cost: ${response.routing_metadata.cost.usd:.6f}")
You can also pass typed GatewayOptions and RoutingOptions objects for IDE autocomplete and validation:
from auriko.route_types import GatewayOptions, Optimize, RoutingOptions

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
    gateway=GatewayOptions(
        routing=RoutingOptions(optimize=Optimize.COST, max_ttft_ms=1000)
    ),
)
All routing fields:
| Field | Type | Description |
| --- | --- | --- |
| optimize | Optimize | Strategy: "cost", "cost-focus", "ttft", "ttft-focus", "tps", "tps-focus", "balanced" |
| weights | dict[str, float] | Custom scoring weights: cost, ttft, throughput. Overrides preset. |
| ttft_percentile | str | TTFT scoring percentile: "p50" (default) or "p95" |
| throughput_percentile | str | Throughput scoring percentile: "p50" (default) or "p95" |
| max_cost_per_1m | float | Max $ per 1M tokens (average of input + output) |
| max_ttft_ms | int | Max TTFT in milliseconds |
| min_throughput_tps | float | Min throughput in tokens/sec |
| providers | list[str] | Allowlist of providers |
| exclude_providers | list[str] | Blocklist of providers |
| prefer | str | Preferred provider (soft preference) |
| mode | Mode | "pool" (default) or "fallback" |
| allow_fallbacks | bool | Enable fallback on failure |
| max_fallback_attempts | int | Max fallback retries |
| data_policy | DataPolicy | "none", "no_training", "zdr" |
| only_byok | bool | Only use BYOK providers |
| only_platform | bool | Only use platform providers |
See Advanced Routing for detailed strategy guides.
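When routing options are assembled from configuration, a small local check can catch a misspelled strategy before the request is sent. The sketch below is a minimal illustration, assuming you pass the plain-dict form of gateway; build_gateway is a hypothetical helper, not part of the SDK, and the strategy names are copied from the table above.

```python
from typing import Any

# Strategy names copied from the "All routing fields" table.
OPTIMIZE_STRATEGIES = {
    "cost", "cost-focus", "ttft", "ttft-focus", "tps", "tps-focus", "balanced",
}


def build_gateway(optimize: str, **limits: Any) -> dict[str, Any]:
    """Hypothetical helper: build a gateway dict, rejecting unknown strategies."""
    if optimize not in OPTIMIZE_STRATEGIES:
        raise ValueError(f"unknown optimize strategy: {optimize!r}")
    routing: dict[str, Any] = {"optimize": optimize}
    # Drop limits left as None so only explicit constraints are included.
    routing.update({key: value for key, value in limits.items() if value is not None})
    return {"routing": routing}


gateway = build_gateway("cost", max_ttft_ms=1000, max_cost_per_1m=None)
print(gateway)
```

The resulting dict can be passed as the gateway argument shown earlier. The typed RoutingOptions object likely gives similar validation; this sketch only covers the dict form.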
Multi-model routing
Route a request across multiple models. The router picks the best option based on your routing strategy:
response = client.chat.completions.create(
    messages=[{"role": "user", "content": "Explain quantum computing briefly."}],
    gateway={
        "models": ["gpt-4o", "claude-sonnet-4-20250514", "gemini-2.5-flash"],
        "routing": {"optimize": "cost"},
    },
)
print(f"Model used: {response.model}")
print(f"Provider: {response.routing_metadata.provider}")
print(response.choices[0].message.content)
model and gateway.models are mutually exclusive. Specify exactly one. Passing both raises BadRequestError.
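The "exactly one" rule can also be checked locally, before any network call. The following is a sketch under stated assumptions: check_model_selection is a hypothetical helper, it raises ValueError rather than the SDK's BadRequestError, and it treats "neither given" as invalid because the text says to specify exactly one.

```python
from typing import Any, Optional


def check_model_selection(
    model: Optional[str], gateway: Optional[dict[str, Any]]
) -> None:
    """Hypothetical pre-flight check for the model / gateway.models rule."""
    has_model = model is not None
    has_models = bool(gateway and gateway.get("models"))
    # Valid only when exactly one of the two is provided.
    if has_model == has_models:
        raise ValueError("specify exactly one of `model` or `gateway['models']`")


# Single-model request: passes.
check_model_selection("gpt-4o", {"routing": {"optimize": "cost"}})
# Multi-model request: passes.
check_model_selection(None, {"models": ["gpt-4o", "gemini-2.5-flash"]})
```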
Reasoning effort
Enable extended reasoning for complex tasks using the reasoning_effort parameter:
response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Solve step by step: what is 23! / 20!?"}],
    reasoning_effort="high",
)
# Access the reasoning output (if the model returns it)
if response.choices[0].message.reasoning_content:
    print(f"Reasoning: {response.choices[0].message.reasoning_content}")
print(f"Answer: {response.choices[0].message.content}")
You can also pass provider-specific parameters through extensions:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    extensions={"openai": {"logit_bias": {"1234": -100}}},
)
See Extensions and Thinking for provider details and streaming thinking output.
Attach metadata to requests for tracking and analytics:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    gateway={"metadata": {"user_id": "user-123", "tags": ["premium"]}},
)
Valid metadata fields: user_id, tags (list), trace_id, and custom_fields (dict for arbitrary key-value pairs). See the Python SDK Reference for field constraints.
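Because only four metadata fields are listed as valid, a thin wrapper can reject typos such as userId before they reach the API. This is a minimal sketch: build_metadata is a hypothetical helper, and it checks only the field names and the container types named above, not the constraints documented in the SDK reference.

```python
from typing import Any

# Field names taken from the sentence above.
ALLOWED_METADATA_FIELDS = {"user_id", "tags", "trace_id", "custom_fields"}


def build_metadata(**fields: Any) -> dict[str, Any]:
    """Hypothetical helper: return a gateway dict carrying validated metadata."""
    unknown = set(fields) - ALLOWED_METADATA_FIELDS
    if unknown:
        raise ValueError(f"unsupported metadata fields: {sorted(unknown)}")
    if "tags" in fields and not isinstance(fields["tags"], list):
        raise TypeError("tags must be a list")
    if "custom_fields" in fields and not isinstance(fields["custom_fields"], dict):
        raise TypeError("custom_fields must be a dict")
    return {"metadata": fields}


gateway = build_metadata(
    user_id="user-123",
    tags=["premium"],
    custom_fields={"experiment": "b"},
)
print(gateway)
```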
Stream responses
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Count to 10"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
After consuming all chunks, access stream-level metadata:
print(f"\nProvider: {stream.routing_metadata.provider}")
print(f"Tokens: {stream.usage.total_tokens}")
print(f"Request ID: {stream.response_headers.request_id}")
Use a context manager for automatic cleanup:
with client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Count to 10"}],
    stream=True,
) as stream:
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
# stream is automatically closed
Or close manually with stream.close().
Routing metadata, usage, and response headers are available only after consuming all chunks.
See Streaming Guide for full patterns including tool call streaming.
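If you need the full reply as one string rather than printing chunks as they arrive, the same loop can accumulate the deltas. The sketch below runs offline: fake_chunk builds stand-in objects with the choices[0].delta.content shape used in the examples above, and collect_text is a hypothetical helper that would accept a real stream in the same way.

```python
from types import SimpleNamespace
from typing import Iterable, Optional


def collect_text(stream: Iterable) -> str:
    """Join the text deltas of a stream, skipping empty or choice-less chunks."""
    parts = []
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    return "".join(parts)


def fake_chunk(text: Optional[str]) -> SimpleNamespace:
    """Stand-in for a streamed chunk; real chunks come from the SDK."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


chunks = [
    fake_chunk("1, "),
    fake_chunk(None),              # chunk without text content
    fake_chunk("2, "),
    SimpleNamespace(choices=[]),   # chunk with no choices at all
    fake_chunk("3"),
]
full_text = collect_text(chunks)
print(full_text)  # prints "1, 2, 3"
```

Since collect_text consumes every chunk, stream-level metadata on a real stream would be readable once it returns.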
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                },
                "required": ["city"],
            },
        },
    }
]
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    print(f"Function: {tool_call.function.name}")
    print(f"Arguments: {tool_call.function.arguments}")
See Tool Calling Guide for multi-turn tool conversations.
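Once a tool call comes back, your code still has to run the matching function. The following is a rough sketch of that dispatch step, assuming tool_call.function.arguments is a JSON-encoded string as in the OpenAI chat completions format. TOOL_REGISTRY, run_tool_call, and the get_weather body are hypothetical, and the tool call is faked so the example runs offline.

```python
import json
from types import SimpleNamespace


def get_weather(city: str) -> str:
    """Placeholder implementation; a real one would query a weather service."""
    return f"Sunny in {city}"


# Map tool names (as declared in `tools`) to local callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_call(tool_call) -> str:
    """Look up the named tool and call it with the decoded arguments."""
    handler = TOOL_REGISTRY.get(tool_call.function.name)
    if handler is None:
        raise KeyError(f"no local handler for tool {tool_call.function.name!r}")
    # Assumption: arguments arrive as a JSON string.
    arguments = json.loads(tool_call.function.arguments)
    return handler(**arguments)


# Stand-in for response.choices[0].message.tool_calls[0].
fake_call = SimpleNamespace(
    function=SimpleNamespace(name="get_weather", arguments='{"city": "Paris"}')
)
result = run_tool_call(fake_call)
print(result)  # prints "Sunny in Paris"
```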
Every response and error includes a response_headers object with typed accessors:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
response.response_headers.request_id                    # str | None
response.response_headers.rate_limit_remaining          # int | None
response.response_headers.rate_limit_limit              # int | None
response.response_headers.rate_limit_reset              # str | None
response.response_headers.credits_balance_microdollars  # int | None
response.response_headers.get("x-custom-header")        # generic lookup

| Property | Header | Type |
| --- | --- | --- |
| request_id | x-request-id | str \| None |
| rate_limit_remaining | x-ratelimit-remaining-requests | int \| None |
| rate_limit_limit | x-ratelimit-limit-requests | int \| None |
| rate_limit_reset | x-ratelimit-reset-requests | str \| None |
| credits_balance_microdollars | x-credits-balance-microdollars | int \| None |
Error objects also carry response_headers. Use e.response_headers.request_id when filing support tickets to correlate with server logs.
See the Python SDK Reference for the complete ResponseHeaders API.
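Every typed accessor can return None, so code that reads these values should handle a missing header. The sketch below shows two hypothetical helpers: one converts the microdollar balance to dollars (1 dollar = 1,000,000 microdollars), and the other flags when few requests remain. The 10% threshold is an arbitrary choice, and the headers object is a stand-in so the example runs offline.

```python
from types import SimpleNamespace
from typing import Optional

MICRODOLLARS_PER_DOLLAR = 1_000_000


def credits_in_dollars(headers) -> Optional[float]:
    """Convert the credit balance to dollars, or None if the header is absent."""
    balance = headers.credits_balance_microdollars
    return None if balance is None else balance / MICRODOLLARS_PER_DOLLAR


def is_near_rate_limit(headers, threshold: float = 0.1) -> bool:
    """True when the remaining share of requests is at or below the threshold."""
    remaining, limit = headers.rate_limit_remaining, headers.rate_limit_limit
    if remaining is None or not limit:
        return False  # not enough information to decide
    return remaining / limit <= threshold


# Stand-in for response.response_headers.
headers = SimpleNamespace(
    credits_balance_microdollars=12_500_000,
    rate_limit_remaining=5,
    rate_limit_limit=100,
)
print(credits_in_dollars(headers))   # prints 12.5
print(is_near_rate_limit(headers))   # prints True
```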
Read token usage
The Usage object on every response carries optional detail breakdowns:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
usage = response.usage
# Prompt token breakdown
if usage.prompt_tokens_details:
    print(f"Cached: {usage.prompt_tokens_details.cached_tokens}")
# Completion token breakdown
if usage.completion_tokens_details:
    print(f"Reasoning: {usage.completion_tokens_details.reasoning_tokens}")

| Field | Sub-fields | Type |
| --- | --- | --- |
| prompt_tokens_details | cached_tokens | Optional[int] |
| completion_tokens_details | reasoning_tokens | Optional[int] |
Availability depends on the provider. completion_tokens_details.reasoning_tokens is present for OpenAI o-series, DeepSeek, xAI, and Google Gemini. It’s None for providers that don’t report reasoning token counts (Anthropic, Moonshot, Fireworks).
See Check reasoning token availability for the full breakdown.
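Since both the detail objects and their sub-fields are optional, logging code has to guard two levels of None. A minimal sketch of that guard follows: usage_breakdown is a hypothetical helper, and the usage objects are stand-ins shaped like the fields in the table above.

```python
from types import SimpleNamespace
from typing import Optional


def usage_breakdown(usage) -> dict[str, Optional[int]]:
    """Return cached and reasoning token counts, using None where not reported."""
    prompt_details = usage.prompt_tokens_details
    completion_details = usage.completion_tokens_details
    return {
        "cached_tokens": prompt_details.cached_tokens if prompt_details else None,
        "reasoning_tokens": (
            completion_details.reasoning_tokens if completion_details else None
        ),
    }


# Stand-in for a provider that reports both breakdowns.
with_details = SimpleNamespace(
    prompt_tokens_details=SimpleNamespace(cached_tokens=128),
    completion_tokens_details=SimpleNamespace(reasoning_tokens=64),
)
# Stand-in for a provider that reports neither.
without_details = SimpleNamespace(
    prompt_tokens_details=None,
    completion_tokens_details=None,
)
print(usage_breakdown(with_details))
print(usage_breakdown(without_details))
```

Keeping None distinct from 0 preserves the difference between "no reasoning tokens used" and "provider does not report them".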
Handle errors
Catch typed exceptions:
from auriko import Client
from auriko.errors import (
    AurikoAPIError,
    APIConnectionError,
    AuthenticationError,
    PermissionDeniedError,
    BadRequestError,
    ConflictError,
    NotFoundError,
    RateLimitError,
    InternalServerError,
    APIStatusError,
)

client = Client()
try:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
    )
except AuthenticationError as e:
    print(f"Check your API key (request_id={e.request_id})")
except RateLimitError as e:
    print(f"Rate limited, retry after {e.retry_after_seconds}s (code={e.code})")
except NotFoundError as e:
    print(f"Not found: {e.message}")
except BadRequestError as e:
    print(f"Bad request: {e.message} (param={e.param})")
except PermissionDeniedError as e:
    print(f"Not allowed: {e.message}")
except ConflictError as e:
    print(f"Conflict: {e.message} (code={e.code})")
except InternalServerError as e:
    print(f"Server error (request_id={e.request_id})")
except APIStatusError as e:
    print(f"Upstream error ({e.status_code}): {e.message}")
except APIConnectionError as e:
    print(f"Network error: {e.message}")
except AurikoAPIError as e:
    print(f"API error ({e.status_code}): {e.message}")
See Error Handling Guide for retry patterns and map_openai_error().
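The retry_after_seconds value on a rate-limit error can drive an application-level retry on top of the client's own max_retries. The sketch below is one possible shape, not the SDK's retry logic: call_with_rate_limit_retry is a hypothetical helper, FakeRateLimitError stands in for auriko.errors.RateLimitError so the example runs offline, and the fallback delay assumes retry_after_seconds may be missing.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class FakeRateLimitError(Exception):
    """Stand-in for auriko.errors.RateLimitError so the sketch runs offline."""

    def __init__(self, retry_after_seconds):
        super().__init__("rate limited")
        self.retry_after_seconds = retry_after_seconds


def call_with_rate_limit_retry(
    fn: Callable[[], T],
    *,
    retryable: type = FakeRateLimitError,
    max_attempts: int = 3,
    default_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, waiting for the server-suggested delay between attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retryable as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Fall back to default_delay if no delay was suggested.
            sleep(getattr(exc, "retry_after_seconds", None) or default_delay)
    raise RuntimeError("max_attempts must be at least 1")


attempts = {"count": 0}
delays: list = []


def flaky_request() -> str:
    """Fails twice with a rate-limit error, then succeeds."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise FakeRateLimitError(retry_after_seconds=0.5)
    return "ok"


# Record delays instead of sleeping so the demo finishes instantly.
result = call_with_rate_limit_retry(flaky_request, sleep=delays.append)
print(result, delays)  # prints "ok [0.5, 0.5]"
```

In real code you would pass retryable=RateLimitError and wrap the client.chat.completions.create call in a lambda or functools.partial.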
Use identity and model discovery APIs
Query identity and model information:
# Identity (discover your workspace)
identity = client.me.get()
print(f"Workspace: {identity.workspace_id}")
# Models
models = client.models.list()
model = client.models.retrieve("claude-sonnet-4-6")
registry = client.models.list_registry()
directory = client.models.list_directory()
providers = client.models.list_providers()
Model listing choices
| Method | Returns | Use when |
| --- | --- | --- |
| list() | All models with provider availability, pricing, data policy | You need the full model catalog |
| retrieve(model_id) | Single model: provider availability, pricing, data policy | You have a model ID and need its details |
| list_registry() | Flat list: id, family, display_name | You need a quick model ID lookup |
| list_directory() | Rich detail: provider entries, context windows, capabilities, pricing tiers | You need to compare providers or check capabilities |
| list_providers() | Provider catalog: display name, description, data policy | You need to see available providers |
See the Python SDK Reference for the complete API.
Use async client
Use the async client for non-blocking requests:
import asyncio

from auriko import AsyncClient

async def main():
    client = AsyncClient()
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)

asyncio.run(main())
Async streaming
Stream responses asynchronously:
import asyncio

from auriko import AsyncClient

async def stream_response():
    client = AsyncClient()
    stream = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Count to 10"}],
        stream=True,
    )
    async for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)

asyncio.run(stream_response())
Async context manager
Use async with for automatic connection cleanup:
import asyncio

from auriko import AsyncClient

async def main():
    async with AsyncClient() as client:
        response = await client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Hello!"}],
        )
        print(response.choices[0].message.content)
    # client.close() called automatically

asyncio.run(main())
Or close explicitly: await client.close()
Use with OpenAI-compatible frameworks
AurikoAsyncOpenAI (experimental) is an AsyncOpenAI subclass that captures routing metadata automatically. Pass it to any framework that accepts an external AsyncOpenAI instance. The kwarg name varies across frameworks.
Install with the optional openai-compat extra:
pip install "auriko[openai-compat]"
Basic usage
Call it directly like any AsyncOpenAI client, then read last_routing_metadata on the client after the response completes:
import asyncio

from auriko import AurikoAsyncOpenAI

async def main():
    client = AurikoAsyncOpenAI()
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)
    print(client.last_routing_metadata.provider)

asyncio.run(main())
last_routing_metadata is a single-slot property. Under concurrent use it reflects the most recent response. For per-request capture, pass an on_response callback:
import asyncio

from auriko import AurikoAsyncOpenAI

captured = []

def handle(metadata):
    captured.append(metadata.provider)

async def main():
    client = AurikoAsyncOpenAI(on_response=handle)
    await asyncio.gather(
        client.chat.completions.create(model="gpt-4o", messages=[{"role": "user", "content": "one"}]),
        client.chat.completions.create(model="gpt-4o", messages=[{"role": "user", "content": "two"}]),
    )
    print(captured)

asyncio.run(main())
The callback must be synchronous. An async callable raises TypeError at construction.
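To catch an async callback in your own code before it reaches the constructor, you can test it with the standard library's inspect.iscoroutinefunction. This is only a sketch of how such a check could look. It is not the SDK's implementation, and require_sync_callback is a hypothetical helper.

```python
import inspect
from typing import Callable


def require_sync_callback(callback: Callable) -> Callable:
    """Raise TypeError if the callback is declared with `async def`."""
    if inspect.iscoroutinefunction(callback):
        raise TypeError("on_response must be a synchronous callable")
    return callback


def sync_handler(metadata) -> None:
    """Accepted: a plain function."""


async def async_handler(metadata) -> None:
    """Rejected: a coroutine function."""


checked = require_sync_callback(sync_handler)
try:
    require_sync_callback(async_handler)
except TypeError as exc:
    print(f"rejected: {exc}")
```

If the per-request work is itself asynchronous, one option is a synchronous callback that only appends to a list or queue, with the async processing done elsewhere.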
Pass routing options
Pass routing options via the extra_body kwarg. RoutingOptions.to_extra_body() returns a dict shaped for the Auriko API:
import asyncio

from auriko import AurikoAsyncOpenAI
from auriko.route_types import RoutingOptions

async def main():
    client = AurikoAsyncOpenAI()
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
        extra_body=RoutingOptions(optimize="cost").to_extra_body(),
    )
    print(response.choices[0].message.content)

asyncio.run(main())
RoutingOptions lives in auriko.route_types. It is not exported at top-level.
Framework wiring
Each supported framework accepts an external AsyncOpenAI instance via its own kwarg:
| Framework | Constructor call |
| --- | --- |
| OpenAI Agents SDK | OpenAIChatCompletionsModel(model="gpt-4o", openai_client=client) |
| LangChain ChatOpenAI | ChatOpenAI(model="gpt-4o", async_client=client.chat.completions, api_key="placeholder") |
| LlamaIndex OpenAI | OpenAI(model="gpt-4o", async_openai_client=client, api_key="placeholder") |
LangChain takes the chat.completions resource rather than the full client. LangChain and LlamaIndex both still require an api_key argument for their own parent-class construction; pass any placeholder value.
For the Agents SDK path, see OpenAI Agents SDK . For the full class reference, see AurikoAsyncOpenAI .
AurikoAsyncOpenAI (experimental) or AsyncClient?
Use AurikoAsyncOpenAI when a framework needs an AsyncOpenAI instance. Use auriko.AsyncClient for direct Python code. AsyncClient exposes routing_metadata directly on each response, so you do not need to read a separate client-level property.
AurikoAsyncOpenAI is Python-only. TypeScript consumers can use @auriko/ai-sdk-provider with the Vercel AI SDK, or the OpenAI TS SDK with baseURL: 'https://api.auriko.ai/v1'.
Use context managers
Use a context manager for automatic cleanup:
from auriko import Client

with Client() as client:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)
SDK scope
The Auriko SDK covers: inference (chat completions and the Response API, both with routing), identity, and model discovery. For full platform operations, use the REST API directly.
Use type hints
The SDK provides typed responses, errors, and routing configuration. Use your IDE’s autocomplete for the best experience:
from auriko import Client
from auriko.models.chat import ChatCompletion, ChatCompletionChunk

client = Client()
response: ChatCompletion = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)