Python SDK - Auriko

The auriko Python package provides an OpenAI-compatible client for the Auriko API.

Full SDK Reference

Complete API reference with all types, parameters, and examples

Installation

pip install auriko

Requires Python 3.10 or later.

Get started

from auriko import Client

client = Client()  # reads AURIKO_API_KEY from environment

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)

Configure

API Key

import os

# Option 1: Auto-detect from AURIKO_API_KEY env var (recommended)
client = Client()

# Option 2: Pass explicitly
client = Client(api_key=os.environ["AURIKO_API_KEY"])
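
If you prefer a fail-fast check before the client is constructed, the lookup can be wrapped in a small helper. This is a minimal sketch: `require_api_key` is a hypothetical name used for illustration and is not part of the auriko package.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return AURIKO_API_KEY or raise with a clear message.

    Hypothetical helper for illustration; not part of the auriko package.
    """
    source = os.environ if env is None else env
    key = source.get("AURIKO_API_KEY", "").strip()
    if not key:
        raise RuntimeError("AURIKO_API_KEY is not set; export it before creating a Client")
    return key
```

The result can then be passed explicitly, for example `Client(api_key=require_api_key())`.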

Base URL

# Default: https://api.auriko.ai/v1
# Override for self-hosted or proxy setups:
client = Client(base_url="https://your-proxy.example.com/v1")

Timeout

client = Client(timeout=60.0)  # seconds

Retries

client = Client(max_retries=3)  # default is 2

Create chat completions

Basic request

Send a chat completion request:

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is 2+2?"}
    ]
)

print(response.choices[0].message.content)

With routing options

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
    gateway={
        "routing": {
            "optimize": "cost",
            "max_ttft_ms": 1000,
        },
    }
)

# Access routing metadata
print(f"Provider: {response.routing_metadata.provider}")
if response.routing_metadata.cost:
    print(f"Cost: ${response.routing_metadata.cost.usd:.6f}")

You can also pass typed GatewayOptions and RoutingOptions objects to get IDE autocomplete and validation:

from auriko.route_types import GatewayOptions, Optimize, RoutingOptions

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
    gateway=GatewayOptions(routing=RoutingOptions(optimize=Optimize.COST, max_ttft_ms=1000)),
)

All routing fields:

| Field | Type | Description |
| --- | --- | --- |
| `optimize` | `Optimize` | Strategy: `"cost"`, `"cost-focus"`, `"ttft"`, `"ttft-focus"`, `"tps"`, `"tps-focus"`, `"balanced"` |
| `weights` | `dict[str, float]` | Custom scoring weights: `cost`, `ttft`, `throughput`. Overrides preset. |
| `ttft_percentile` | `str` | TTFT scoring percentile: `"p50"` (default) or `"p95"` |
| `throughput_percentile` | `str` | Throughput scoring percentile: `"p50"` (default) or `"p95"` |
| `max_cost_per_1m` | `float` | Max $ per 1M tokens (average of input + output) |
| `max_ttft_ms` | `int` | Max TTFT in milliseconds |
| `min_throughput_tps` | `float` | Min throughput in tokens/sec |
| `providers` | `list[str]` | Allowlist of providers |
| `exclude_providers` | `list[str]` | Blocklist of providers |
| `prefer` | `str` | Preferred provider (soft preference) |
| `mode` | `Mode` | `"pool"` (default) or `"fallback"` |
| `allow_fallbacks` | `bool` | Enable fallback on failure |
| `max_fallback_attempts` | `int` | Max fallback retries |
| `data_policy` | `DataPolicy` | `"none"`, `"no_training"`, `"zdr"` |
| `only_byok` | `bool` | Only use BYOK providers |
| `only_platform` | `bool` | Only use platform providers |

See Advanced Routing for detailed strategy guides.
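
When routing is passed as a plain dict, typos in field values only surface once the request is sent. The sketch below builds the dict locally and checks two things from the table above: that `optimize` is one of the listed strategies, and that no provider appears in both the allowlist and the blocklist. `build_routing` is a hypothetical helper, the overlap check is a local sanity check rather than documented server behavior, and the provider names in the usage line are placeholders.

```python
from typing import Any, Dict, List, Optional

# Strategy names copied from the routing fields table.
OPTIMIZE_STRATEGIES = {
    "cost", "cost-focus", "ttft", "ttft-focus", "tps", "tps-focus", "balanced",
}


def build_routing(
    optimize: str,
    max_ttft_ms: Optional[int] = None,
    providers: Optional[List[str]] = None,
    exclude_providers: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build a plain routing dict, omitting fields that were not set.

    Hypothetical helper for illustration; not part of the auriko package.
    """
    if optimize not in OPTIMIZE_STRATEGIES:
        raise ValueError(f"unknown optimize strategy: {optimize!r}")
    overlap = set(providers or []) & set(exclude_providers or [])
    if overlap:
        raise ValueError(f"providers both allowed and excluded: {sorted(overlap)}")

    routing: Dict[str, Any] = {"optimize": optimize}
    if max_ttft_ms is not None:
        routing["max_ttft_ms"] = max_ttft_ms
    if providers:
        routing["providers"] = list(providers)
    if exclude_providers:
        routing["exclude_providers"] = list(exclude_providers)
    return routing
```

The result slots into the existing call shape: `gateway={"routing": build_routing("cost", max_ttft_ms=1000)}`.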

Multi-model routing

Route a request across multiple models. The router picks the best option based on your routing strategy:

response = client.chat.completions.create(
    messages=[{"role": "user", "content": "Explain quantum computing briefly."}],
    gateway={
        "models": ["gpt-4o", "claude-sonnet-4-20250514", "gemini-2.5-flash"],
        "routing": {"optimize": "cost"},
    },
)

print(f"Model used: {response.model}")
print(f"Provider: {response.routing_metadata.provider}")
print(response.choices[0].message.content)

model and gateway.models are mutually exclusive. Specify exactly one. Passing both raises BadRequestError.
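
The exactly-one rule can also be enforced before the request leaves your process. A minimal sketch, assuming you assemble keyword arguments yourself; `completion_kwargs` is a hypothetical helper, not an SDK function:

```python
from typing import Any, Dict, List, Optional


def completion_kwargs(
    messages: List[Dict[str, Any]],
    model: Optional[str] = None,
    models: Optional[List[str]] = None,
    routing: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble create() keyword arguments with exactly one of model / models.

    Hypothetical helper for illustration; not part of the auriko package.
    """
    # True when both are set or both are missing.
    if (model is None) == (models is None):
        raise ValueError("specify exactly one of model or models")

    kwargs: Dict[str, Any] = {"messages": messages}
    gateway: Dict[str, Any] = {}
    if model is not None:
        kwargs["model"] = model
    else:
        gateway["models"] = list(models or [])
    if routing:
        gateway["routing"] = routing
    if gateway:
        kwargs["gateway"] = gateway
    return kwargs
```

Usage: `client.chat.completions.create(**completion_kwargs(messages, models=["gpt-4o", "gemini-2.5-flash"], routing={"optimize": "cost"}))`.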

Reasoning effort

Enable extended reasoning for complex tasks using the reasoning_effort parameter:

response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Solve step by step: what is 23! / 20!?"}],
    reasoning_effort="high",
)

# Access the reasoning output (if the model returns it)
if response.choices[0].message.reasoning_content:
    print(f"Reasoning: {response.choices[0].message.reasoning_content}")
print(f"Answer: {response.choices[0].message.content}")

You can also pass provider-specific parameters through extensions:

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
    extensions={"openai": {"logit_bias": {"1234": -100}}}
)

See Extensions and Thinking for provider details and streaming thinking output.

Request metadata

Attach metadata to requests for tracking and analytics:

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
    gateway={"metadata": {"user_id": "user-123", "tags": ["premium"]}},
)

Valid metadata fields: user_id, tags (list), trace_id, and custom_fields (dict for arbitrary key-value pairs). See the Python SDK Reference for field constraints.

Stream responses

stream = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Count to 10"}],
    stream=True
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

After consuming all chunks, access stream-level metadata:

print(f"\nProvider: {stream.routing_metadata.provider}")
print(f"Tokens: {stream.usage.total_tokens}")
print(f"Request ID: {stream.response_headers.request_id}")

Use a context manager for automatic cleanup:

with client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Count to 10"}],
    stream=True
) as stream:
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
# stream is automatically closed

Or close manually with stream.close().

Routing metadata, usage, and response headers are available only after consuming all chunks.
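
One way to respect that ordering is to wrap consumption and metadata access in a single function, so the metadata is never read early. A minimal sketch; `collect_stream` is a hypothetical helper that relies only on the attributes shown above (`choices`, `delta.content`, `routing_metadata`):

```python
from typing import Any, Iterable, List, Tuple


def collect_stream(stream: Iterable[Any]) -> Tuple[str, Any]:
    """Consume every chunk, then return (full_text, routing_metadata).

    Hypothetical helper for illustration; not part of the auriko package.
    """
    parts: List[str] = []
    for chunk in stream:
        # Some chunks carry no choices or an empty delta; skip them.
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    # Read stream-level metadata only after the loop has finished.
    return "".join(parts), stream.routing_metadata
```

Usage: `text, routing = collect_stream(stream)` followed by `print(routing.provider)`.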

See Streaming Guide for full patterns including tool call streaming.

Tool calling

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"}
                },
                "required": ["city"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools
)

if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    print(f"Function: {tool_call.function.name}")
    print(f"Arguments: {tool_call.function.arguments}")

See Tool Calling Guide for multi-turn tool conversations.
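
Once the model returns a tool call, your code still has to run the function. The sketch below dispatches a tool call to a local Python function. It assumes `tool_call.function.arguments` is a JSON-encoded string, as in the OpenAI chat format; `run_tool_call`, `TOOL_REGISTRY`, and the stand-in `get_weather` body are hypothetical names for illustration.

```python
import json
from typing import Any, Callable, Dict, Optional


def get_weather(city: str) -> str:
    """Stand-in implementation; a real tool would query a weather service."""
    return f"No live weather data for {city} in this demo"


# Maps tool names declared in `tools` to local callables.
TOOL_REGISTRY: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_call(tool_call: Any, registry: Optional[Dict[str, Callable[..., Any]]] = None) -> Any:
    """Look up the requested function and call it with the decoded arguments."""
    tools = TOOL_REGISTRY if registry is None else registry
    name = tool_call.function.name
    if name not in tools:
        raise KeyError(f"model requested unknown tool: {name}")
    # Assumption: arguments arrive as a JSON string (OpenAI-compatible format).
    arguments = json.loads(tool_call.function.arguments or "{}")
    return tools[name](**arguments)
```

Usage: `result = run_tool_call(response.choices[0].message.tool_calls[0])`.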

Read response headers

Every response and error includes a response_headers object with typed accessors:

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}]
)

response.response_headers.request_id                  # str | None
response.response_headers.rate_limit_remaining         # int | None
response.response_headers.rate_limit_limit             # int | None
response.response_headers.rate_limit_reset             # str | None
response.response_headers.credits_balance_microdollars # int | None
response.response_headers.get("x-custom-header")       # generic lookup

| Property | Header | Type |
| --- | --- | --- |
| `request_id` | `x-request-id` | `str \| None` |
| `rate_limit_remaining` | `x-ratelimit-remaining-requests` | `int \| None` |
| `rate_limit_limit` | `x-ratelimit-limit-requests` | `int \| None` |
| `rate_limit_reset` | `x-ratelimit-reset-requests` | `str \| None` |
| `credits_balance_microdollars` | `x-credits-balance-microdollars` | `int \| None` |

Error objects also carry response_headers. Use e.response_headers.request_id when filing support tickets to correlate with server logs. See the Python SDK Reference for the complete ResponseHeaders API.
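
Because every accessor may return None, derived values need a guard. A minimal sketch of two such values, assuming a microdollar is one millionth of a US dollar (inferred from the header name); both function names are hypothetical:

```python
from typing import Any, Optional


def credits_balance_usd(headers: Any) -> Optional[float]:
    """Convert the microdollar balance to dollars, or None if the header is absent."""
    micro = headers.credits_balance_microdollars
    # Assumption: 1 USD == 1_000_000 microdollars.
    return None if micro is None else micro / 1_000_000


def rate_limit_fraction_left(headers: Any) -> Optional[float]:
    """Return remaining/limit for the request rate limit, or None if unknown."""
    remaining = headers.rate_limit_remaining
    limit = headers.rate_limit_limit
    if remaining is None or not limit:
        return None
    return remaining / limit
```

Usage: `credits_balance_usd(response.response_headers)` or `rate_limit_fraction_left(e.response_headers)` inside an exception handler.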

Read token usage

The Usage object on every response carries optional detail breakdowns:

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}]
)

usage = response.usage

# Prompt token breakdown
if usage.prompt_tokens_details:
    print(f"Cached: {usage.prompt_tokens_details.cached_tokens}")

# Completion token breakdown
if usage.completion_tokens_details:
    print(f"Reasoning: {usage.completion_tokens_details.reasoning_tokens}")

| Field | Sub-fields | Type |
| --- | --- | --- |
| `prompt_tokens_details` | `cached_tokens` | `Optional[int]` |
| `completion_tokens_details` | `reasoning_tokens` | `Optional[int]` |

Availability depends on the provider. completion_tokens_details.reasoning_tokens is present for OpenAI o-series, DeepSeek, xAI, and Google Gemini. It’s None for providers that don’t report reasoning token counts (Anthropic, Moonshot, Fireworks). See Check reasoning token availability for the full breakdown.
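
Since both the detail objects and their sub-fields can be missing, logging code benefits from a flattening step that tolerates None at either level. A minimal sketch; `usage_summary` is a hypothetical helper that reads only fields named in this section plus `total_tokens`:

```python
from typing import Any, Dict, Optional


def usage_summary(usage: Any) -> Dict[str, Optional[int]]:
    """Flatten a Usage object into a dict, using None for unreported values.

    Hypothetical helper for illustration; not part of the auriko package.
    """
    prompt_details = getattr(usage, "prompt_tokens_details", None)
    completion_details = getattr(usage, "completion_tokens_details", None)
    return {
        "total_tokens": getattr(usage, "total_tokens", None),
        # getattr on None falls back to the default, so missing details are safe.
        "cached_tokens": getattr(prompt_details, "cached_tokens", None),
        "reasoning_tokens": getattr(completion_details, "reasoning_tokens", None),
    }
```

Usage: `print(usage_summary(response.usage))`.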

Handle errors

Catch typed exceptions:

from auriko import Client
from auriko.errors import (
    AurikoAPIError,
    APIConnectionError,
    AuthenticationError,
    PermissionDeniedError,
    BadRequestError,
    ConflictError,
    NotFoundError,
    RateLimitError,
    InternalServerError,
    APIStatusError,
)

client = Client()

try:
    response = client.chat.completions.create(
        model="gpt-5.4",
        messages=[{"role": "user", "content": "Hello!"}]
    )
except AuthenticationError as e:
    print(f"Check your API key (request_id={e.request_id})")
except RateLimitError as e:
    print(f"Rate limited, retry after {e.retry_after_seconds}s (code={e.code})")
except NotFoundError as e:
    print(f"Not found: {e.message}")
except BadRequestError as e:
    print(f"Bad request: {e.message} (param={e.param})")
except PermissionDeniedError as e:
    print(f"Not allowed: {e.message}")
except ConflictError as e:
    print(f"Conflict: {e.message} (code={e.code})")
except InternalServerError as e:
    print(f"Server error (request_id={e.request_id})")
except APIStatusError as e:
    print(f"Upstream error ({e.status_code}): {e.message}")
except APIConnectionError as e:
    print(f"Network error: {e.message}")
except AurikoAPIError as e:
    print(f"API error ({e.status_code}): {e.message}")

See Error Handling Guide for retry patterns and map_openai_error().
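
The client already retries on its own (see `max_retries`), so an application-level loop is only needed when you want control over the wait. The sketch below is one such loop, assuming the exception exposes `retry_after_seconds` as `RateLimitError` does above. `call_with_retry` is a hypothetical helper; it takes the exception types as a parameter so it does not depend on the SDK.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 3,
    default_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given exceptions and honoring retry_after_seconds.

    Hypothetical helper for illustration; not part of the auriko package.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Prefer the server-provided hint; fall back to a fixed delay.
            hint = getattr(exc, "retry_after_seconds", None)
            sleep(default_delay if hint is None else float(hint))
    raise AssertionError("unreachable")
```

Usage: `call_with_retry(lambda: client.chat.completions.create(model="gpt-5.4", messages=messages), retry_on=(RateLimitError,))`.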

Use identity and model discovery APIs

Query identity and model information:

# Identity (discover your workspace)
identity = client.me.get()
print(f"Workspace: {identity.workspace.id}")

# Models
models = client.models.list()
model = client.models.retrieve("claude-sonnet-4-6")
registry = client.models.list_registry()
directory = client.models.list_directory()
providers = client.models.list_providers()

Model listing choices

| Method | Returns | Use when |
| --- | --- | --- |
| `list()` | All models with provider availability, pricing, data policy | You need the full model catalog |
| `retrieve(model_id)` | Single model: provider availability, pricing, data policy | You have a model ID and need its details |
| `list_registry()` | Flat list: `id`, `family`, `display_name` | You need a quick model ID lookup |
| `list_directory()` | Rich detail: provider entries, context windows, capabilities, pricing tiers | You need to compare providers or check capabilities |
| `list_providers()` | Provider catalog: display name, description, data policy | You need to see available providers |

See the Python SDK Reference for the complete API.

Use async client

Use the async client for non-blocking requests:

import asyncio

from auriko import AsyncClient

async def main():
    client = AsyncClient()

    response = await client.chat.completions.create(
        model="gpt-5.4",
        messages=[{"role": "user", "content": "Hello!"}]
    )

    print(response.choices[0].message.content)

asyncio.run(main())
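
The async client pays off when several requests are in flight at once. A minimal sketch of a bounded fan-out, assuming the client exposes the `chat.completions.create` coroutine shown above; `complete_many` and its `limit` parameter are hypothetical:

```python
import asyncio
from typing import Any, List


async def complete_many(client: Any, prompts: List[str], model: str, limit: int = 4) -> List[str]:
    """Send one request per prompt, with at most `limit` requests in flight.

    Hypothetical helper for illustration; not part of the auriko package.
    """
    semaphore = asyncio.Semaphore(limit)

    async def one(prompt: str) -> str:
        async with semaphore:
            response = await client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            return response.choices[0].message.content

    # gather() returns results in the same order as the prompts.
    return await asyncio.gather(*(one(p) for p in prompts))
```

Usage inside a coroutine: `answers = await complete_many(client, ["one", "two", "three"], model="gpt-5.4")`.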

Async streaming

Stream responses asynchronously:

import asyncio

from auriko import AsyncClient

async def stream_response():
    client = AsyncClient()

    stream = await client.chat.completions.create(
        model="gpt-5.4",
        messages=[{"role": "user", "content": "Count to 10"}],
        stream=True
    )

    # Print each text delta as it arrives
    async for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)

asyncio.run(stream_response())

Async context manager

Use async with for automatic connection cleanup:

import asyncio

from auriko import AsyncClient

async def main():
    async with AsyncClient() as client:
        response = await client.chat.completions.create(
            model="gpt-5.4",
            messages=[{"role": "user", "content": "Hello!"}]
        )
        print(response.choices[0].message.content)
    # client.close() called automatically

asyncio.run(main())

Or close explicitly: await client.close()

Use with OpenAI-compatible frameworks

AurikoAsyncOpenAI (experimental) is an AsyncOpenAI subclass that captures routing metadata automatically. Pass it to any framework that accepts an external AsyncOpenAI instance. The kwarg name varies across frameworks. Install with the optional openai-compat extra:

pip install "auriko[openai-compat]"

Basic usage

Call it directly like any AsyncOpenAI client, then read last_routing_metadata on the client after the response completes:

import asyncio
from auriko import AurikoAsyncOpenAI

async def main():
    client = AurikoAsyncOpenAI()
    response = await client.chat.completions.create(
        model="gpt-5.4",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)
    print(client.last_routing_metadata.provider)

asyncio.run(main())

Capture metadata per request

last_routing_metadata is a single-slot property, so under concurrent use it reflects only the most recent response. To capture metadata per request, pass an on_response callback:

import asyncio
from auriko import AurikoAsyncOpenAI

captured = []

def handle(metadata):
    captured.append(metadata.provider)

async def main():
    client = AurikoAsyncOpenAI(on_response=handle)
    await asyncio.gather(
        client.chat.completions.create(model="gpt-5.4", messages=[{"role": "user", "content": "one"}]),
        client.chat.completions.create(model="gpt-5.4", messages=[{"role": "user", "content": "two"}]),
    )
    print(captured)

asyncio.run(main())

The callback must be synchronous. An async callable raises TypeError at construction.
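
If you want more than a list of provider names, the callback can be a small object that keeps every metadata record. A minimal sketch, assuming `on_response` accepts any synchronous callable, including an instance with a synchronous `__call__`; `RoutingCollector` is a hypothetical class:

```python
from collections import Counter
from typing import Any, List


class RoutingCollector:
    """Synchronous callback that stores each routing metadata object it receives.

    Hypothetical class for illustration; not part of the auriko package.
    """

    def __init__(self) -> None:
        self.records: List[Any] = []

    def __call__(self, metadata: Any) -> None:
        # Plain (non-async) call, matching the synchronous-callback requirement.
        self.records.append(metadata)

    def provider_counts(self) -> Counter:
        """Count how many responses each provider served."""
        return Counter(record.provider for record in self.records)
```

Usage: create `collector = RoutingCollector()`, pass `on_response=collector` when constructing `AurikoAsyncOpenAI`, then call `collector.provider_counts()` after the requests finish.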

Pass routing options

Pass routing options via the extra_body kwarg. RoutingOptions.to_extra_body() returns a dict shaped for the Auriko API:

import asyncio
from auriko import AurikoAsyncOpenAI
from auriko.route_types import RoutingOptions

async def main():
    client = AurikoAsyncOpenAI()
    response = await client.chat.completions.create(
        model="gpt-5.4",
        messages=[{"role": "user", "content": "Hello!"}],
        extra_body=RoutingOptions(optimize="cost").to_extra_body(),
    )
    print(response.choices[0].message.content)

asyncio.run(main())

RoutingOptions lives in auriko.route_types. It is not exported at top-level.

Framework wiring

Each supported framework accepts an external AsyncOpenAI instance via its own kwarg:

| Framework | Constructor call |
| --- | --- |
| OpenAI Agents SDK | `OpenAIChatCompletionsModel(model="gpt-5.4", openai_client=client)` |
| LangChain `ChatOpenAI` | `ChatOpenAI(model="gpt-5.4", async_client=client.chat.completions, api_key="placeholder")` |
| LlamaIndex `OpenAI` | `OpenAI(model="gpt-5.4", async_openai_client=client, api_key="placeholder")` |

LangChain takes the chat.completions resource rather than the full client. LangChain and LlamaIndex both still require an api_key argument for their own parent-class construction; pass any placeholder value. For the Agents SDK path, see OpenAI Agents SDK. For the full class reference, see AurikoAsyncOpenAI.

`AurikoAsyncOpenAI` (experimental) or `AsyncClient`?

Use AurikoAsyncOpenAI when a framework needs an AsyncOpenAI instance. Use auriko.AsyncClient for direct Python code. AsyncClient exposes routing_metadata directly on each response, so you do not need to read a separate client-level property.

AurikoAsyncOpenAI is Python-only. TypeScript consumers can use @auriko/ai-sdk-provider with the Vercel AI SDK, or the OpenAI TS SDK with baseURL: 'https://api.auriko.ai/v1'.

Use context managers

Use a context manager for automatic cleanup:

with Client() as client:
    response = client.chat.completions.create(
        model="gpt-5.4",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(response.choices[0].message.content)

SDK scope

The Auriko SDK covers inference (chat completions and the Response API, both with routing), identity, and model discovery. For full platform operations, use the REST API directly.

Use type hints

The SDK provides typed responses, errors, and routing configuration. Use your IDE’s autocomplete for the best experience:

from auriko import Client
from auriko.models.chat import ChatCompletion, ChatCompletionChunk

client = Client()

response: ChatCompletion = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}]
)
