The auriko Python package provides an OpenAI-compatible client for the Auriko API.

Full SDK Reference

Complete API reference with all types, parameters, and examples

Installation

pip install auriko
Requires Python 3.10 or later.

Get started

from auriko import Client

client = Client()  # reads AURIKO_API_KEY from environment

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)

Configure

API Key

import os

# Option 1: Auto-detect from AURIKO_API_KEY env var (recommended)
client = Client()

# Option 2: Pass explicitly
client = Client(api_key=os.environ["AURIKO_API_KEY"])

Base URL

# Default: https://api.auriko.ai/v1
# Override for self-hosted or proxy setups:
client = Client(base_url="https://your-proxy.example.com/v1")

Timeout

client = Client(timeout=60.0)  # seconds

Retries

client = Client(max_retries=3)  # default is 2
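
The four constructor options above can be passed together in one call. The sketch below gathers them from environment-style settings. Only AURIKO_API_KEY is named in this guide; the helper client_kwargs_from_env and the variable names AURIKO_BASE_URL, AURIKO_TIMEOUT, and AURIKO_MAX_RETRIES are hypothetical conventions chosen for illustration, not something the SDK reads on its own.

```python
import os


def client_kwargs_from_env(env=None):
    """Build keyword arguments for Client() from environment-style settings.

    Only AURIKO_API_KEY appears in the guide; the other variable names are
    hypothetical and exist only in this sketch.
    """
    env = os.environ if env is None else env
    kwargs = {}
    if "AURIKO_API_KEY" in env:
        kwargs["api_key"] = env["AURIKO_API_KEY"]
    if "AURIKO_BASE_URL" in env:
        kwargs["base_url"] = env["AURIKO_BASE_URL"]
    if "AURIKO_TIMEOUT" in env:
        kwargs["timeout"] = float(env["AURIKO_TIMEOUT"])  # seconds
    if "AURIKO_MAX_RETRIES" in env:
        kwargs["max_retries"] = int(env["AURIKO_MAX_RETRIES"])
    return kwargs


# Usage (requires the auriko package):
#   from auriko import Client
#   client = Client(**client_kwargs_from_env())
```

Settings that are absent are left out of the result, so the client's own defaults still apply.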

Create chat completions

Basic request

Send a chat completion request:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is 2+2?"}
    ]
)

print(response.choices[0].message.content)

With routing options

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
    gateway={
        "routing": {
            "optimize": "cost",
            "max_ttft_ms": 1000,
        },
    }
)

# Access routing metadata
print(f"Provider: {response.routing_metadata.provider}")
if response.routing_metadata.cost:
    print(f"Cost: ${response.routing_metadata.cost.usd:.6f}")
You can also pass typed GatewayOptions and RoutingOptions objects for IDE autocomplete and validation:
from auriko.route_types import GatewayOptions, Optimize, RoutingOptions

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
    gateway=GatewayOptions(routing=RoutingOptions(optimize=Optimize.COST, max_ttft_ms=1000)),
)
All routing fields:
| Field | Type | Description |
| --- | --- | --- |
| optimize | Optimize | Strategy: "cost", "cost-focus", "ttft", "ttft-focus", "tps", "tps-focus", "balanced" |
| weights | dict[str, float] | Custom scoring weights: cost, ttft, throughput. Overrides preset. |
| ttft_percentile | str | TTFT scoring percentile: "p50" (default) or "p95" |
| throughput_percentile | str | Throughput scoring percentile: "p50" (default) or "p95" |
| max_cost_per_1m | float | Max $ per 1M tokens (average of input + output) |
| max_ttft_ms | int | Max TTFT in milliseconds |
| min_throughput_tps | float | Min throughput in tokens/sec |
| providers | list[str] | Allowlist of providers |
| exclude_providers | list[str] | Blocklist of providers |
| prefer | str | Preferred provider (soft preference) |
| mode | Mode | "pool" (default) or "fallback" |
| allow_fallbacks | bool | Enable fallback on failure |
| max_fallback_attempts | int | Max fallback retries |
| data_policy | DataPolicy | "none", "no_training", "zdr" |
| only_byok | bool | Only use BYOK providers |
| only_platform | bool | Only use platform providers |
See Advanced Routing for detailed strategy guides.
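
When routing options are assembled as plain dicts, a typo in a strategy name only surfaces once the request is sent. The sketch below checks a few of the fields against the values listed in the table before building the dict. build_routing is a hypothetical helper, not part of the SDK, and it covers only a subset of the fields; the typed RoutingOptions class is the supported way to get validation.

```python
# Values copied from the routing fields table.
OPTIMIZE_VALUES = {
    "cost", "cost-focus", "ttft", "ttft-focus", "tps", "tps-focus", "balanced",
}
PERCENTILES = {"p50", "p95"}


def build_routing(optimize, ttft_percentile=None, throughput_percentile=None,
                  max_ttft_ms=None, max_cost_per_1m=None):
    """Return a routing dict, rejecting values the table does not list."""
    if optimize not in OPTIMIZE_VALUES:
        raise ValueError(f"unknown optimize strategy: {optimize!r}")
    for name, value in (("ttft_percentile", ttft_percentile),
                        ("throughput_percentile", throughput_percentile)):
        if value is not None and value not in PERCENTILES:
            raise ValueError(f"{name} must be 'p50' or 'p95', got {value!r}")

    routing = {"optimize": optimize}
    optional = {
        "ttft_percentile": ttft_percentile,
        "throughput_percentile": throughput_percentile,
        "max_ttft_ms": max_ttft_ms,
        "max_cost_per_1m": max_cost_per_1m,
    }
    # Omit unset fields so the server-side defaults apply.
    routing.update({k: v for k, v in optional.items() if v is not None})
    return routing


# Usage: gateway={"routing": build_routing("cost", max_ttft_ms=1000)}
```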

Multi-model routing

Route a request across multiple models. The router picks the best option based on your routing strategy:
response = client.chat.completions.create(
    messages=[{"role": "user", "content": "Explain quantum computing briefly."}],
    gateway={
        "models": ["gpt-4o", "claude-sonnet-4-20250514", "gemini-2.5-flash"],
        "routing": {"optimize": "cost"},
    },
)

print(f"Model used: {response.model}")
print(f"Provider: {response.routing_metadata.provider}")
print(response.choices[0].message.content)
model and gateway.models are mutually exclusive. Specify exactly one. Passing both raises BadRequestError.
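
The exactly-one rule can also be enforced locally, before the request is sent. A minimal sketch, assuming you assemble the request keyword arguments yourself; completion_target is a hypothetical helper and raises a plain ValueError rather than the SDK's BadRequestError:

```python
def completion_target(model=None, models=None, routing=None):
    """Return kwargs for chat.completions.create with exactly one target."""
    # True when both are set or both are missing.
    if (model is None) == (models is None):
        raise ValueError("specify exactly one of model or models")

    gateway = {}
    if models is not None:
        gateway["models"] = list(models)
    if routing is not None:
        gateway["routing"] = dict(routing)

    kwargs = {}
    if model is not None:
        kwargs["model"] = model
    if gateway:
        kwargs["gateway"] = gateway
    return kwargs


# Usage:
#   client.chat.completions.create(
#       messages=[{"role": "user", "content": "Hello!"}],
#       **completion_target(models=["gpt-4o", "gemini-2.5-flash"],
#                           routing={"optimize": "cost"}),
#   )
```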

Reasoning effort

Enable extended reasoning for complex tasks using the reasoning_effort parameter:
response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Solve step by step: what is 23! / 20!?"}],
    reasoning_effort="high",
)

# Access the reasoning output (if the model returns it)
if response.choices[0].message.reasoning_content:
    print(f"Reasoning: {response.choices[0].message.reasoning_content}")
print(f"Answer: {response.choices[0].message.content}")
You can also pass provider-specific parameters through extensions:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    extensions={"openai": {"logit_bias": {"1234": -100}}}
)
See Extensions and Thinking for provider details and streaming thinking output.

Request metadata

Attach metadata to requests for tracking and analytics:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    gateway={"metadata": {"user_id": "user-123", "tags": ["premium"]}},
)
Valid metadata fields: user_id, tags (list), trace_id, and custom_fields (dict for arbitrary key-value pairs). See the Python SDK Reference for field constraints.

Stream responses

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Count to 10"}],
    stream=True
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
After consuming all chunks, access stream-level metadata:
print(f"\nProvider: {stream.routing_metadata.provider}")
print(f"Tokens: {stream.usage.total_tokens}")
print(f"Request ID: {stream.response_headers.request_id}")
Use a context manager for automatic cleanup:
with client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Count to 10"}],
    stream=True
) as stream:
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
# stream is automatically closed
Or close manually with stream.close().
Routing metadata, usage, and response headers are available only after consuming all chunks.
See Streaming Guide for full patterns including tool call streaming.
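
If you need the full text as well as incremental output, the loop above can be wrapped in a small function. This is a sketch, not an SDK feature: collect_text is a hypothetical helper that relies only on the chunk attributes used in the examples above (choices, delta, content).

```python
def collect_text(stream, on_delta=None):
    """Consume a chat completion stream and return the concatenated text.

    on_delta, if given, is called with each text fragment as it arrives.
    """
    parts = []
    for chunk in stream:
        # Same guard as the loop above: skip chunks without choices or text.
        if chunk.choices and chunk.choices[0].delta.content:
            text = chunk.choices[0].delta.content
            parts.append(text)
            if on_delta is not None:
                on_delta(text)
    return "".join(parts)


# Usage:
#   text = collect_text(stream, on_delta=lambda t: print(t, end="", flush=True))
```

Because the function consumes every chunk, stream.routing_metadata, stream.usage, and stream.response_headers can be read once it returns.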

Tool calling

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"}
                },
                "required": ["city"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools
)

if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    print(f"Function: {tool_call.function.name}")
    print(f"Arguments: {tool_call.function.arguments}")
See Tool Calling Guide for multi-turn tool conversations.
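
Once the model returns a tool call, your code still has to run the function. A minimal dispatch sketch follows. It assumes tool_call.function.arguments is a JSON-encoded string, as in the OpenAI chat completions format; TOOL_REGISTRY, run_tool_call, and the placeholder get_weather body are hypothetical names for illustration.

```python
import json


def get_weather(city):
    """Placeholder implementation; a real one would query a weather service."""
    return {"city": city, "forecast": "unknown"}


# Maps tool names from the `tools` schema to local callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_call(tool_call, registry=TOOL_REGISTRY):
    """Execute one tool call and return its result as a JSON string."""
    name = tool_call.function.name
    if name not in registry:
        raise KeyError(f"model requested unknown tool: {name}")
    # Assumption: arguments arrive as a JSON string (OpenAI-compatible format).
    args = json.loads(tool_call.function.arguments or "{}")
    return json.dumps(registry[name](**args))


# Usage:
#   for tool_call in response.choices[0].message.tool_calls or []:
#       result = run_tool_call(tool_call)
```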

Read response headers

Every response and error includes a response_headers object with typed accessors:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)

response.response_headers.request_id                  # str | None
response.response_headers.rate_limit_remaining         # int | None
response.response_headers.rate_limit_limit             # int | None
response.response_headers.rate_limit_reset             # str | None
response.response_headers.credits_balance_microdollars # int | None
response.response_headers.get("x-custom-header")       # generic lookup
| Property | Header | Type |
| --- | --- | --- |
| request_id | x-request-id | str \| None |
| rate_limit_remaining | x-ratelimit-remaining-requests | int \| None |
| rate_limit_limit | x-ratelimit-limit-requests | int \| None |
| rate_limit_reset | x-ratelimit-reset-requests | str \| None |
| credits_balance_microdollars | x-credits-balance-microdollars | int \| None |
Error objects also carry response_headers. Use e.response_headers.request_id when filing support tickets to correlate with server logs. See the Python SDK Reference for the complete ResponseHeaders API.
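
Every accessor can return None, so code that logs or acts on these values needs to tolerate gaps. A small sketch that turns the typed accessors into a summary; describe_quota is a hypothetical helper, and the only conversion it performs is microdollars to dollars (1 dollar = 1,000,000 microdollars).

```python
def describe_quota(headers):
    """Summarize rate-limit and credit headers, tolerating missing values."""
    remaining = headers.rate_limit_remaining
    limit = headers.rate_limit_limit
    micro = headers.credits_balance_microdollars
    return {
        "request_id": headers.request_id,
        # Share of the request budget still available, when both are known.
        "remaining_fraction": (
            remaining / limit if remaining is not None and limit else None
        ),
        # 1 dollar = 1,000,000 microdollars.
        "credits_usd": micro / 1_000_000 if micro is not None else None,
    }


# Usage: print(describe_quota(response.response_headers))
```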

Read token usage

The Usage object on every response carries optional detail breakdowns:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)

usage = response.usage

# Prompt token breakdown
if usage.prompt_tokens_details:
    print(f"Cached: {usage.prompt_tokens_details.cached_tokens}")

# Completion token breakdown
if usage.completion_tokens_details:
    print(f"Reasoning: {usage.completion_tokens_details.reasoning_tokens}")
| Field | Sub-fields | Type |
| --- | --- | --- |
| prompt_tokens_details | cached_tokens | Optional[int] |
| completion_tokens_details | reasoning_tokens | Optional[int] |
Availability depends on the provider. completion_tokens_details.reasoning_tokens is present for OpenAI o-series, DeepSeek, xAI, and Google Gemini. It’s None for providers that don’t report reasoning token counts (Anthropic, Moonshot, Fireworks). See Check reasoning token availability for the full breakdown.
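
Since both detail objects are optional, it helps to flatten them in one place rather than repeat the None checks. A sketch using only the attributes shown above plus total_tokens; usage_breakdown is a hypothetical helper. It keeps None instead of substituting 0, so "not reported by the provider" stays distinguishable from "zero tokens".

```python
def usage_breakdown(usage):
    """Flatten a Usage object into a dict, keeping None for unreported values."""
    prompt_details = usage.prompt_tokens_details
    completion_details = usage.completion_tokens_details
    return {
        "total_tokens": usage.total_tokens,
        "cached_tokens": (
            prompt_details.cached_tokens if prompt_details else None
        ),
        "reasoning_tokens": (
            completion_details.reasoning_tokens if completion_details else None
        ),
    }


# Usage: print(usage_breakdown(response.usage))
```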

Handle errors

Catch typed exceptions:
from auriko import Client
from auriko.errors import (
    AurikoAPIError,
    APIConnectionError,
    AuthenticationError,
    PermissionDeniedError,
    BadRequestError,
    ConflictError,
    NotFoundError,
    RateLimitError,
    InternalServerError,
    APIStatusError,
)

client = Client()

try:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}]
    )
except AuthenticationError as e:
    print(f"Check your API key (request_id={e.request_id})")
except RateLimitError as e:
    print(f"Rate limited, retry after {e.retry_after_seconds}s (code={e.code})")
except NotFoundError as e:
    print(f"Not found: {e.message}")
except BadRequestError as e:
    print(f"Bad request: {e.message} (param={e.param})")
except PermissionDeniedError as e:
    print(f"Not allowed: {e.message}")
except ConflictError as e:
    print(f"Conflict: {e.message} (code={e.code})")
except InternalServerError as e:
    print(f"Server error (request_id={e.request_id})")
except APIStatusError as e:
    print(f"Upstream error ({e.status_code}): {e.message}")
except APIConnectionError as e:
    print(f"Network error: {e.message}")
except AurikoAPIError as e:
    print(f"API error ({e.status_code}): {e.message}")
See Error Handling Guide for retry patterns and map_openai_error().
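
The client already retries internally (see max_retries above). If you want an additional application-level retry, for example around RateLimitError, one possible shape is sketched below. call_with_retry is a hypothetical helper, not the pattern from the Error Handling Guide. It honors retry_after_seconds when the exception carries it and falls back to exponential backoff otherwise.

```python
import time


def call_with_retry(fn, retryable, max_attempts=3, default_delay=1.0,
                    sleep=time.sleep):
    """Call fn(), retrying when it raises one of the `retryable` exceptions.

    `retryable` is an exception class or a tuple of classes. The final
    failure is re-raised unchanged.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retryable as exc:
            if attempt == max_attempts:
                raise
            # Prefer the server's hint when the exception provides one.
            delay = getattr(exc, "retry_after_seconds", None)
            if delay is None:
                delay = default_delay * 2 ** (attempt - 1)
            sleep(delay)


# Usage (requires the auriko package):
#   response = call_with_retry(
#       lambda: client.chat.completions.create(model="gpt-4o", messages=msgs),
#       retryable=(RateLimitError,),
#   )
```

The sleep parameter is injectable so the helper can be tested without waiting.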

Use identity and model discovery APIs

Query identity and model information:
# Identity (discover your workspace)
identity = client.me.get()
print(f"Workspace: {identity.workspace_id}")

# Models
models = client.models.list()
model = client.models.retrieve("claude-sonnet-4-6")
registry = client.models.list_registry()
directory = client.models.list_directory()
providers = client.models.list_providers()

Model listing choices

| Method | Returns | Use when |
| --- | --- | --- |
| list() | All models with provider availability, pricing, data policy | You need the full model catalog |
| retrieve(model_id) | Single model: provider availability, pricing, data policy | You have a model ID and need its details |
| list_registry() | Flat list: id, family, display_name | You need a quick model ID lookup |
| list_directory() | Rich detail: provider entries, context windows, capabilities, pricing tiers | You need to compare providers or check capabilities |
| list_providers() | Provider catalog: display name, description, data policy | You need to see available providers |
See the Python SDK Reference for the complete API.

Use async client

Use the async client for non-blocking requests:
import asyncio

from auriko import AsyncClient

async def main():
    client = AsyncClient()

    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}]
    )

    print(response.choices[0].message.content)

asyncio.run(main())
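
The main benefit of the async client is running several requests at once. A sketch of bounded fan-out with asyncio.gather and a semaphore follows. complete_many and its max_concurrency parameter are hypothetical, and the function uses only the create call and response attributes shown above.

```python
import asyncio


async def complete_many(client, prompts, model="gpt-4o", max_concurrency=5):
    """Send one request per prompt, at most max_concurrency at a time.

    Returns the reply texts in the same order as `prompts`.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def one(prompt):
        async with semaphore:
            response = await client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            return response.choices[0].message.content

    # gather preserves the order of its arguments.
    return await asyncio.gather(*(one(p) for p in prompts))


# Usage (requires the auriko package):
#   async with AsyncClient() as client:
#       replies = await complete_many(client, ["one", "two", "three"])
```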

Async streaming

Stream responses asynchronously:
import asyncio

from auriko import AsyncClient

async def stream_response():
    client = AsyncClient()

    stream = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Count to 10"}],
        stream=True
    )

    async for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)

asyncio.run(stream_response())

Async context manager

Use async with for automatic connection cleanup:
import asyncio

from auriko import AsyncClient

async def main():
    async with AsyncClient() as client:
        response = await client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Hello!"}]
        )
        print(response.choices[0].message.content)
    # client.close() called automatically

asyncio.run(main())
Or close explicitly: await client.close()

Use with OpenAI-compatible frameworks

AurikoAsyncOpenAI (experimental) is an AsyncOpenAI subclass that captures routing metadata automatically. Pass it to any framework that accepts an external AsyncOpenAI instance. The kwarg name varies across frameworks. Install with the optional openai-compat extra:
pip install "auriko[openai-compat]"

Basic usage

Call it directly like any AsyncOpenAI client, then read last_routing_metadata on the client after the response completes:
import asyncio
from auriko import AurikoAsyncOpenAI

async def main():
    client = AurikoAsyncOpenAI()
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)
    print(client.last_routing_metadata.provider)

asyncio.run(main())

Capture metadata per request

last_routing_metadata is a single-slot property. Under concurrent use it reflects the most recent response. For per-request capture, pass an on_response callback:
import asyncio
from auriko import AurikoAsyncOpenAI

captured = []

def handle(metadata):
    captured.append(metadata.provider)

async def main():
    client = AurikoAsyncOpenAI(on_response=handle)
    await asyncio.gather(
        client.chat.completions.create(model="gpt-4o", messages=[{"role": "user", "content": "one"}]),
        client.chat.completions.create(model="gpt-4o", messages=[{"role": "user", "content": "two"}]),
    )
    print(captured)

asyncio.run(main())
The callback must be synchronous. An async callable raises TypeError at construction.
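
When the callback needs to keep state, a bound method of a small class is one way to stay synchronous. The sketch below tallies responses per provider. ProviderTally is a hypothetical class, and it reads only the provider attribute used in the example above.

```python
from collections import Counter


class ProviderTally:
    """Counts how many responses each provider served."""

    def __init__(self):
        self.counts = Counter()

    def record(self, metadata):
        # Plain synchronous method: no awaits, no I/O.
        self.counts[metadata.provider] += 1


# Usage (requires auriko[openai-compat]):
#   tally = ProviderTally()
#   client = AurikoAsyncOpenAI(on_response=tally.record)
#   ... run requests ...
#   print(tally.counts.most_common())
```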

Pass routing options

Pass routing options via the extra_body kwarg. RoutingOptions.to_extra_body() returns a dict shaped for the Auriko API:
import asyncio
from auriko import AurikoAsyncOpenAI
from auriko.route_types import RoutingOptions

async def main():
    client = AurikoAsyncOpenAI()
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
        extra_body=RoutingOptions(optimize="cost").to_extra_body(),
    )
    print(response.choices[0].message.content)

asyncio.run(main())
RoutingOptions lives in auriko.route_types. It is not exported at top-level.

Framework wiring

Each supported framework accepts an external AsyncOpenAI instance via its own kwarg:
| Framework | Constructor call |
| --- | --- |
| OpenAI Agents SDK | OpenAIChatCompletionsModel(model="gpt-4o", openai_client=client) |
| LangChain ChatOpenAI | ChatOpenAI(model="gpt-4o", async_client=client.chat.completions, api_key="placeholder") |
| LlamaIndex OpenAI | OpenAI(model="gpt-4o", async_openai_client=client, api_key="placeholder") |
LangChain takes the chat.completions resource rather than the full client. LangChain and LlamaIndex both still require an api_key argument for their own parent-class construction; pass any placeholder value. For the Agents SDK path, see OpenAI Agents SDK. For the full class reference, see AurikoAsyncOpenAI.

AurikoAsyncOpenAI (experimental) or AsyncClient?

Use AurikoAsyncOpenAI when a framework needs an AsyncOpenAI instance. Use auriko.AsyncClient for direct Python code. AsyncClient exposes routing_metadata directly on each response, so you do not need to read a separate client-level property.
AurikoAsyncOpenAI is Python-only. TypeScript consumers can use @auriko/ai-sdk-provider with the Vercel AI SDK, or the OpenAI TS SDK with baseURL: 'https://api.auriko.ai/v1'.

Use context managers

Use a context manager for automatic cleanup:
with Client() as client:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(response.choices[0].message.content)

SDK scope

The Auriko SDK covers: inference (chat completions and the Response API, both with routing), identity, and model discovery. For full platform operations, use the REST API directly.

Use type hints

The SDK provides typed responses, errors, and routing configuration. Use your IDE’s autocomplete for the best experience:
from auriko import Client
from auriko.models.chat import ChatCompletion, ChatCompletionChunk

client = Client()

response: ChatCompletion = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)