Python SDK Reference
See the Python SDK Guide for usage examples and getting started.
Client
Initialize a client with configuration options:
from auriko import Client, AsyncClient

client = Client(
    api_key="ak_...",  # or AURIKO_API_KEY env var
    base_url="https://api.auriko.ai/v1",  # default
    timeout=60.0,  # seconds, default 60
    max_retries=2,  # default 2 (0 disables)
)
Resources
| Resource | Methods |
|---|---|
| client.chat.completions | create(...) |
| client.responses | create(...) |
| client.models | list(), retrieve(model_id), list_directory(), list_registry(), list_providers() |
| client.me | get() |
All resources are available on both Client (sync) and AsyncClient (async).
Chat Completions
client.chat.completions.create(...)
Creates a chat completion. Supports single-model and multi-model routing.
# Non-streaming
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=100,
)

# Streaming
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| messages | list[dict] | Yes | Conversation messages (non-empty) |
| model | str | Yes | Model ID (or use gateway.models for multi-model routing) |
| stream | bool | No | Enable streaming (default: False) |
| temperature | float | No | Sampling temperature (0–2) |
| max_tokens | int | No | Max tokens to generate |
| max_completion_tokens | int | No | Max completion tokens (alias for max_tokens) |
| reasoning_effort | Literal['low', 'medium', 'high', 'xhigh', 'max', 'off'] | No | Reasoning effort for supported models, translated to provider-native control (see guide) |
| top_p | float | No | Nucleus sampling (0–1) |
| frequency_penalty | float | No | Frequency penalty (-2 to 2) |
| presence_penalty | float | No | Presence penalty (-2 to 2) |
| top_k | int | No | Top-K sampling |
| min_p | float | No | Min-P sampling (0–1) |
| top_a | float | No | Top-A sampling (0–1) |
| repetition_penalty | float | No | Repetition penalty |
| stop | str \| list[str] | No | Stop sequences |
| seed | int | No | Deterministic sampling seed |
| n | int | No | Number of completions to generate |
| tools | list[dict] | No | Function calling tool definitions |
| tool_choice | str \| dict | No | Tool selection: "auto", "none", "required", or function spec |
| parallel_tool_calls | bool | No | Allow parallel function calls |
| response_format | dict | No | Output format (e.g., {"type": "json_object"}) |
| stream_options | dict | No | Stream options (e.g., {"include_usage": True}) |
| logprobs | bool | No | Return log probabilities |
| top_logprobs | int | No | Number of top logprobs per token (0–20) |
| logit_bias | dict[str, float] | No | Token bias adjustments |
| user | str | No | End-user identifier |
| gateway | GatewayOptions \| dict | No | Gateway namespace for routing, multi-model, and metadata options (see gateway.routing, gateway.models, gateway.metadata) |
| extensions | Extensions \| dict | No | Provider-specific extensions (provider passthrough) |
| extra_body | dict | No | Additional body fields (merged last, except stream) |
gateway.metadata accepts the following fields:
| Field | Type | Description |
|---|---|---|
| tags | list[str] | Tags for categorizing requests (max 100 items, each ≤50 chars) |
| user_id | str | Your application’s user identifier for per-user analytics (max 255 chars) |
| trace_id | str | Distributed tracing identifier (max 255 chars) |
| custom_fields | dict[str, str] | Arbitrary key-value pairs (max 10 keys, keys ≤50 chars, values ≤200 chars) |
from auriko import Client
from auriko.route_types import GatewayOptions

client = Client()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    gateway=GatewayOptions(
        metadata={
            "user_id": "user_123",
            "trace_id": "req-abc",
            "custom_fields": {"env": "prod", "team": "backend"},
        }
    ),
)
Only the four fields above are accepted. Use custom_fields for arbitrary key-value pairs.
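The documented limits can be checked client-side before a request is sent. The following is a minimal sketch, not part of the SDK: validate_metadata and ALLOWED_FIELDS are hypothetical names, and the checks only restate the limits listed in the table above. How the gateway itself reports a violation is not covered here.

```python
from typing import Any

# Hypothetical helper, not part of the auriko SDK. It restates the documented
# gateway.metadata limits so violations surface before the request is sent.
ALLOWED_FIELDS = {"tags", "user_id", "trace_id", "custom_fields"}


def validate_metadata(metadata: dict) -> None:
    """Raise ValueError if metadata breaks one of the documented limits."""
    unknown = set(metadata) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unsupported metadata fields: {sorted(unknown)}")

    tags: Any = metadata.get("tags", [])
    if len(tags) > 100 or any(len(tag) > 50 for tag in tags):
        raise ValueError("tags: max 100 items, each at most 50 chars")

    for field in ("user_id", "trace_id"):
        if len(metadata.get(field, "")) > 255:
            raise ValueError(f"{field}: max 255 chars")

    custom = metadata.get("custom_fields", {})
    if len(custom) > 10:
        raise ValueError("custom_fields: max 10 keys")
    for key, value in custom.items():
        if len(key) > 50 or len(value) > 200:
            raise ValueError("custom_fields: keys <= 50 chars, values <= 200 chars")


# Passes silently: every field is within its limit.
validate_metadata({"user_id": "user_123", "custom_fields": {"env": "prod"}})
```

Calling the helper just before client.chat.completions.create(...) keeps the failure local and easy to trace.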
Response (non-streaming)
class ChatCompletion:
    id: str
    created: int
    model: str
    object: str  # "chat.completion"
    system_fingerprint: Optional[str]  # not all models include this
    choices: list[Choice]
    usage: Optional[Usage]
    routing_metadata: Optional[RoutingMetadata]
    service_tier: Optional[str]  # processing tier (OpenAI-routed models)
    response_headers: Optional[ResponseHeaders]

class ChoiceMessage:
    role: str
    content: Optional[str]
    reasoning_content: Optional[str]  # chain-of-thought text (plain string)
    reasoning: Optional[list[ThinkingReasoningBlock | RedactedReasoningBlock]]  # structured reasoning blocks with signatures
    refusal: Optional[str]  # model refusal content (OpenAI passthrough)
    tool_calls: Optional[list[ToolCall]]
    annotations: Optional[list[Any]]  # URL citations and model annotations (OpenAI-routed models)
Response (streaming)
Returns a Stream that yields ChatCompletionChunk objects.
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    chunk.choices[0].delta.content  # incremental content
    chunk.choices[0].delta.reasoning_content  # incremental reasoning text (if enabled)
    chunk.choices[0].delta.reasoning_signature  # signature for current thinking block
    chunk.choices[0].delta.reasoning_redacted_data  # encrypted redacted thinking data

stream.usage  # available after iteration
stream.routing_metadata  # available after iteration
stream.response_headers  # available immediately
stream.close()  # manual cleanup (or use context manager)
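In practice the incremental fields are usually concatenated into a full reply. The sketch below shows one way to do that. It uses stand-in chunk objects built with SimpleNamespace so it runs without the SDK or network access; collect_stream and fake_chunk are hypothetical names, and the handling of missing fields and empty choices is an assumption about how chunks may arrive.

```python
from types import SimpleNamespace
from typing import Iterable, Tuple


def collect_stream(chunks: Iterable) -> Tuple[str, str]:
    """Concatenate content and reasoning deltas from an iterable of chunks."""
    content_parts, reasoning_parts = [], []
    for chunk in chunks:
        if not chunk.choices:  # assumed: some chunks may carry no choices
            continue
        delta = chunk.choices[0].delta
        # Either field may be absent or None on a given chunk (assumption).
        if getattr(delta, "content", None):
            content_parts.append(delta.content)
        if getattr(delta, "reasoning_content", None):
            reasoning_parts.append(delta.reasoning_content)
    return "".join(content_parts), "".join(reasoning_parts)


def fake_chunk(**delta_fields):
    """Stand-in for ChatCompletionChunk, for illustration only."""
    delta = SimpleNamespace(**delta_fields)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


chunks = [
    fake_chunk(reasoning_content="thinking"),
    fake_chunk(content="Hel"),
    fake_chunk(content="lo!"),
    SimpleNamespace(choices=[]),
]
text, reasoning = collect_stream(chunks)
print(text)  # Hello!
```

With the real client, the object returned by create(..., stream=True) would be passed in place of the stand-in list.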
Responses
client.responses.create(...)
Creates a response using the OpenAI Responses API format. Supports single-model and multi-model routing.
# Non-streaming
response = client.responses.create(
    model="gpt-4o",
    input="Hello!",
)

# Streaming
stream = client.responses.create(
    model="gpt-4o",
    input="Hello!",
    stream=True,
)
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| input | str \| list[dict] | Yes | Text string or structured input items |
| model | str \| None | Yes* | Model ID (*or use gateway.models for multi-model routing) |
| stream | bool | No | Enable streaming (default: False) |
| instructions | str | No | System instructions for the model |
| tools | list[dict] | No | Tool definitions |
| tool_choice | str \| dict | No | Tool selection: "auto", "none", "required", or function spec |
| parallel_tool_calls | bool | No | Allow parallel function calls |
| max_output_tokens | int | No | Max tokens to generate |
| temperature | float | No | Sampling temperature (0–2) |
| top_p | float | No | Nucleus sampling (0–1) |
| top_k | int | No | Top-K sampling |
| top_logprobs | int | No | Number of top logprobs per token (0–20) |
| reasoning | dict | No | Reasoning config: effort, summary, generate_summary |
| text | dict | No | Text format config (e.g., {"format": {"type": "json_schema", ...}}) |
| user | str | No | End-user identifier |
| metadata | dict[str, str] | No | Arbitrary key-value metadata |
| include | list[str] | No | Additional data to include in the response |
| truncation | str | No | Truncation strategy for long inputs |
| prompt_cache_key | str | No | Key for prompt caching |
| safety_identifier | str | No | Safety policy identifier |
| gateway | GatewayOptions \| dict | No | Gateway namespace for routing, multi-model, and metadata options |
| extensions | Extensions \| dict | No | Provider-specific extensions |
| extra_body | dict | No | Additional body fields (merged last) |
Response (non-streaming)
class Response:
    id: str
    object: str  # "response"
    created_at: int
    model: str
    status: str  # "completed", "failed", "incomplete", "in_progress"
    output: list[ResponseOutputItem]
    output_text: str  # concatenated text output
    parallel_tool_calls: bool
    tool_choice: Any
    tools: list[Any]
    usage: Optional[ResponseUsage]
    error: Optional[dict]
    incomplete_details: Optional[dict]
    metadata: Optional[dict[str, str]]
    routing_metadata: Optional[RoutingMetadata]
    response_headers: Optional[ResponseHeaders]  # property, set by SDK after parsing
Response (streaming)
Returns a ResponseStream that yields Responses API events.
stream = client.responses.create(
    model="gpt-4o",
    input="Hello!",
    stream=True,
)
for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="")

# After iteration, the terminal event's response is available:
final = stream.completed_response  # Response object from the terminal event
final.usage  # token usage
final.routing_metadata  # routing details
stream.response_headers  # available immediately (before iteration)
stream.close()  # manual cleanup (or use context manager)
routing_metadata is available for both streaming and non-streaming responses. For streaming, read it from stream.completed_response, which is populated once iteration completes.
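When a consumer may stop early or raise mid-iteration, wrapping the stream in the standard library's contextlib.closing guarantees that close() is called. The sketch below illustrates the pattern with a stand-in stream so it runs offline; FakeResponseStream and read_text are hypothetical names, and the only assumption about the real ResponseStream is that it is iterable and exposes close(), as shown above.

```python
from contextlib import closing
from types import SimpleNamespace


class FakeResponseStream:
    """Stand-in for ResponseStream: iterable, with close(). Illustration only."""

    def __init__(self, events):
        self._events = events
        self.closed = False

    def __iter__(self):
        return iter(self._events)

    def close(self):
        self.closed = True


def read_text(stream) -> str:
    """Concatenate output-text deltas and always close the stream."""
    parts = []
    # closing() calls stream.close() on exit, including when an exception is raised.
    with closing(stream) as events:
        for event in events:
            if event.type == "response.output_text.delta":
                parts.append(event.delta)
    return "".join(parts)


stream = FakeResponseStream([
    SimpleNamespace(type="response.created"),
    SimpleNamespace(type="response.output_text.delta", delta="Hi"),
    SimpleNamespace(type="response.output_text.delta", delta=" there"),
    SimpleNamespace(type="response.completed"),
])
result = read_text(stream)
print(result)  # Hi there
```

Events of other types are skipped rather than rejected, which keeps the loop tolerant of event types it does not know about.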
Models
Query the model catalog:
models = client.models.list() # GET /v1/models
model = client.models.retrieve("gpt-4o") # GET /v1/models/{model_id}
directory = client.models.list_directory() # GET /v1/directory/models
registry = client.models.list_registry() # GET /v1/registry/models
providers = client.models.list_providers() # GET /v1/registry/providers
Identity
Get current API key identity:
identity = client.me.get() # GET /v1/me
# Returns: ApiKeyIdentity { object, user_id, workspace_id, tier, rate_limit_rpm }
Error Classes
All errors extend AurikoAPIError. Dispatch is driven by the type field of the canonical error envelope (see Errors for the full envelope and retry policy).
| Error Class | HTTP | type |
|---|---|---|
| BadRequestError | 400 / 413 / 422 | invalid_request_error |
| AuthenticationError | 401 | authentication_error |
| PermissionDeniedError | 403 | permission_error |
| NotFoundError | 404 | not_found_error |
| ConflictError | 409 | invalid_request_error |
| RateLimitError | 429 | rate_limit_error |
| InternalServerError | 500 | api_error |
| APIStatusError | 502 / 503 / 504 | api_error |
| APIConnectionError | n/a | (network failure before response) |
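The table can be restated as a lookup. This is an illustrative sketch only, not the SDK's dispatch code. Because invalid_request_error and api_error each appear on more than one row, the sketch keys on both the type and the HTTP status; that pairing, and the names ERROR_CLASS_BY_TYPE_AND_STATUS and error_class_name, are assumptions made for the example. Pairs not listed map to the base class AurikoAPIError, which every error extends.

```python
# Illustrative lookup that restates the table; not the SDK's actual dispatch.
# Keys are (type, HTTP status); values are the documented class names.
ERROR_CLASS_BY_TYPE_AND_STATUS = {
    ("invalid_request_error", 400): "BadRequestError",
    ("invalid_request_error", 413): "BadRequestError",
    ("invalid_request_error", 422): "BadRequestError",
    ("authentication_error", 401): "AuthenticationError",
    ("permission_error", 403): "PermissionDeniedError",
    ("not_found_error", 404): "NotFoundError",
    ("invalid_request_error", 409): "ConflictError",
    ("rate_limit_error", 429): "RateLimitError",
    ("api_error", 500): "InternalServerError",
    ("api_error", 502): "APIStatusError",
    ("api_error", 503): "APIStatusError",
    ("api_error", 504): "APIStatusError",
}


def error_class_name(error_type: str, status_code: int) -> str:
    """Return the documented class name, or the base class for unlisted pairs."""
    key = (error_type, status_code)
    return ERROR_CLASS_BY_TYPE_AND_STATUS.get(key, "AurikoAPIError")


print(error_class_name("invalid_request_error", 409))  # ConflictError
```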
AurikoAPIError Fields
| Field | Type | Description |
|---|---|---|
| message | str | Human-readable error description |
| status_code | int | HTTP status code |
| code | str | Machine-readable error code (see Error Codes) |
| type | str | Canonical error type (one of six values) |
| param | Optional[str] | Parameter that caused the error, when attributable |
| request_id | str | Value of x-request-id on the failing response |
| doc_url | Optional[str] | Link to the error’s docs page |
| retry_after_seconds | Optional[int] | Retry-After header value (429 / 503 only) |
| provider | Optional[str] | Upstream provider that produced this error, when attributable |
from auriko import RateLimitError, AuthenticationError

try:
    client.chat.completions.create(...)
except RateLimitError as e:
    print(f"retry after {e.retry_after_seconds}s (request_id={e.request_id})")
except AuthenticationError as e:
    print(f"{e.message} (request_id={e.request_id})")
Unknown error responses fall through to the base AurikoAPIError class. Always keep a catch-all for forward compatibility.
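The retry_after_seconds field lends itself to an application-level retry loop, for cases where the client's own max_retries has been exhausted. The following is a minimal sketch under stated assumptions: StubAPIError and StubRateLimitError are local stand-ins for AurikoAPIError and RateLimitError so the example runs without the SDK, and call_with_retry is a hypothetical helper, not an SDK function.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class StubAPIError(Exception):
    """Stand-in for AurikoAPIError so the sketch runs without the SDK."""

    def __init__(self, message: str, retry_after_seconds: Optional[int] = None):
        super().__init__(message)
        self.message = message
        self.retry_after_seconds = retry_after_seconds


class StubRateLimitError(StubAPIError):
    """Stand-in for RateLimitError."""


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 3,
    default_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry fn on rate limits, waiting retry_after_seconds when it is set."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except StubRateLimitError as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the rate-limit error
            delay = exc.retry_after_seconds
            sleep(default_delay if delay is None else delay)
        # Any other StubAPIError propagates to the caller's catch-all handler.
    raise AssertionError("unreachable")


attempts, waits = [], []


def flaky() -> str:
    """Fails twice with a rate limit, then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise StubRateLimitError("slow down", retry_after_seconds=2)
    return "ok"


# waits.append replaces time.sleep so the demo finishes instantly.
result = call_with_retry(flaky, sleep=waits.append)
print(result, waits)  # ok [2, 2]
```

With the real SDK, the stand-in classes would be swapped for the imported error classes, and a final except clause for the base class would act as the catch-all.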
Response Headers
Response headers are available on ChatCompletion.response_headers and Stream.response_headers:
response.response_headers.request_id # X-Request-ID
response.response_headers.rate_limit_remaining # X-RateLimit-Remaining-Requests
response.response_headers.rate_limit_limit # X-RateLimit-Limit-Requests
response.response_headers.rate_limit_reset # X-RateLimit-Reset-Requests
response.response_headers.credits_balance_microdollars # X-Credits-Balance-Microdollars
response.response_headers.get("x-custom-header") # any header by name
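The credits balance is expressed in microdollars, that is, millionths of a dollar. A small conversion sketch follows. microdollars_to_dollars is a hypothetical helper, and accepting an int, a numeric string, or None is an assumption, since the attribute's type is not stated here.

```python
from decimal import Decimal
from typing import Optional, Union


def microdollars_to_dollars(value: Union[int, str, None]) -> Optional[Decimal]:
    """Convert microdollars to dollars (1 dollar = 1,000,000 microdollars)."""
    if value is None:  # assumed: the header may be absent
        return None
    # Decimal avoids binary floating-point rounding on currency amounts.
    return Decimal(int(value)) / Decimal(1_000_000)


print(microdollars_to_dollars(12_500_000))  # 12.5
```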
AurikoAsyncOpenAI (experimental)
AurikoAsyncOpenAI is an AsyncOpenAI subclass that captures routing metadata from every successful response. Use it with frameworks that accept an external AsyncOpenAI instance. This is a Tier 4 experimental integration. For most use cases, use the native SDK or AsyncOpenAI(base_url=...) directly.
Install with the optional openai-compat extra:
pip install "auriko[openai-compat]"
Constructor
class AurikoAsyncOpenAI(AsyncOpenAI):
    def __init__(
        self,
        *,
        api_key: str | None = None,
        base_url: str = "https://api.auriko.ai/v1",
        on_response: Callable[[RoutingMetadata], Any] | None = None,
        **kwargs: Any,
    ) -> None: ...
All named parameters are keyword-only.
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str \| None | None | Falls back to AURIKO_API_KEY env var. Does not fall back to OPENAI_API_KEY. Raises AuthenticationError if neither source supplies a key. |
| base_url | str | "https://api.auriko.ai/v1" | Auriko API base URL. |
| on_response | Callable[[RoutingMetadata], Any] \| None | None | Sync callback invoked on every successful response. Passing an async callable raises TypeError. |
| **kwargs | Any | n/a | Forwarded to AsyncOpenAI.__init__ (for example max_retries, timeout). Passing http_client raises TypeError. |
last_routing_metadata property
Returns RoutingMetadata | None.
Populated after a successful response completes. Returns None before any request, after a request errors, or when the response carried no routing_metadata field.
Concurrency caveat: the property uses last-write-wins semantics on a shared client. For per-request capture across concurrent callers, use the on_response callback.
Streaming caveat: metadata is extracted during SDK byte iteration, not on stream creation. A streaming caller that does not iterate every chunk may read None.
on_response callback
Signature: Callable[[RoutingMetadata], Any].
- Sync only. Passing an async callable raises TypeError at construction.
- Fires once per successful response with a populated routing_metadata. Does not fire on error status, absent routing_metadata, or malformed routing_metadata.
- Use this callback for per-request capture in concurrent scenarios where the shared last_routing_metadata property is race-prone.
Import RoutingMetadata for type annotations from auriko.route_types:
from auriko.route_types import RoutingMetadata
RoutingMetadata is not exported from the top-level auriko package. Attempting from auriko import RoutingMetadata raises ImportError.
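One way to use the callback under concurrency is to pass a small thread-safe collector instead of reading the shared property. The sketch below is an assumption-laden illustration: MetadataCollector and ensure_sync are hypothetical names, the stand-in metadata objects carry a made-up label attribute, and the pre-check with inspect.iscoroutinefunction only mirrors the documented TypeError rather than reproducing the SDK's own validation.

```python
import inspect
import threading
from types import SimpleNamespace
from typing import Any, Callable, List


class MetadataCollector:
    """Thread-safe sync callback that records every metadata object it receives."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._items: List[Any] = []

    def __call__(self, metadata: Any) -> None:
        with self._lock:
            self._items.append(metadata)

    def snapshot(self) -> List[Any]:
        """Return a copy so callers cannot mutate the internal list."""
        with self._lock:
            return list(self._items)


def ensure_sync(callback: Callable[..., Any]) -> Callable[..., Any]:
    """Reject coroutine functions up front, mirroring the documented TypeError."""
    call_method = getattr(callback, "__call__", None)
    if inspect.iscoroutinefunction(callback) or inspect.iscoroutinefunction(call_method):
        raise TypeError("on_response must be a sync callable")
    return callback


collector = ensure_sync(MetadataCollector())
# Stand-in metadata objects; "label" is a made-up attribute for the demo.
collector(SimpleNamespace(label="first"))
collector(SimpleNamespace(label="second"))
print([m.label for m in collector.snapshot()])  # ['first', 'second']
```

The collector would be passed as on_response=collector when constructing the client. It records metadata in arrival order; matching each entry to a specific request is left to the application.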
Error behavior
AurikoAsyncOpenAI raises dual-inheritance errors for HTTP failures (4xx, 5xx). Each error is catchable as both an Auriko error and an OpenAI error:
import asyncio
import auriko
from auriko import AurikoAsyncOpenAI

async def main():
    client = AurikoAsyncOpenAI()
    try:
        response = await client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Hello!"}],
        )
    except auriko.RateLimitError as e:  # also catchable as openai.RateLimitError
        print(f"Rate limited: {e.message}")

asyncio.run(main())
Because each error inherits from both hierarchies, no map_openai_error() wrapping is needed.
Network-layer exceptions (openai.APITimeoutError, openai.APIConnectionError) propagate unchanged.
Mid-stream SSE errors (raised after the HTTP 200 during stream=True) remain unmapped openai.APIError. The bridge covers HTTP-level status errors only.
For map_openai_error() usage with plain AsyncOpenAI, see Error mapping.
Types
Client & Stream
from auriko import Client, AsyncClient, ResponseStream, AsyncResponseStream
Chat Response Types
from auriko.models.chat import (
    ChatCompletion, ChatCompletionChunk, Choice, ChoiceMessage,
    StreamChoice, Delta, ToolCall, ToolCallFunction,
    ToolCallDelta, ToolCallDeltaFunction,
    ThinkingReasoningBlock, RedactedReasoningBlock,
)
Response Types
from auriko.models.responses import Response, ResponseStreamEvent, ResponseUsage, UnknownStreamEvent
Common Types
from auriko.models.common import Usage, PromptTokensDetails, CompletionTokensDetails, ApiKeyIdentity
Routing Types
from auriko.route_types import (
    RoutingOptions, RoutingMetadata, CostInfo, StructuredWarning, StructuredWarningType,
    Optimize, Mode, DataPolicy,
)
Extensions
from auriko.models.extensions import Extensions
| Field | Type | Description |
|---|---|---|
| anthropic | dict | Anthropic-specific parameters |
| openai | dict | OpenAI-specific parameters |
| google | dict | Google-specific parameters |
| deepseek | dict | DeepSeek-specific parameters |
| [key] | dict | Arbitrary provider passthrough |
Model Discovery Types
from auriko.models.providers import (
    ModelsListResponse, CanonicalModel,
    DirectoryResponse, DirectoryModel, ProviderEntry, TierEntry,
    ProviderList, ProviderInfo,
)
Error Classes
from auriko.errors import (
    AurikoAPIError, APIConnectionError, APIStatusError,
    AuthenticationError, BadRequestError, ConflictError,
    InternalServerError, NotFoundError, PermissionDeniedError,
    RateLimitError,
)
Utilities
from auriko import ResponseHeaders, map_openai_error
from auriko.route_types import parse_routing_metadata
from auriko.errors import map_error_from_code
parse_routing_metadata(response) extracts RoutingMetadata from an OpenAI SDK response. Returns None if absent or unparseable. Returns None on Auriko SDK responses — use response.routing_metadata directly instead.
parse_routing_metadata is Python-only. TypeScript SDK responses include routing_metadata as a typed property.
map_error_from_code(code, message, *, param=None, doc_url=None, provider=None, suggestion=None, response_headers=None) constructs a typed AurikoAPIError subclass from an error code string (e.g., "rate_limit_error" → RateLimitError).