Python SDK Reference

See the Python SDK Guide for usage examples and getting started.

Client

Initialize a client with configuration options:
from auriko import Client, AsyncClient

client = Client(
    api_key="ak_...",                   # or AURIKO_API_KEY env var
    base_url="https://api.auriko.ai/v1",   # default
    timeout=60.0,                           # seconds, default 60
    max_retries=2,                          # default 2 (0 disables)
)

Resources

| Resource | Methods |
| --- | --- |
| client.chat.completions | create(...) |
| client.responses | create(...) |
| client.models | list(), retrieve(model_id), list_directory(), list_registry(), list_providers() |
| client.me | get() |

All resources are available on both Client (sync) and AsyncClient (async).

Chat Completions

client.chat.completions.create(...)

Creates a chat completion. Supports single-model and multi-model routing.
# Non-streaming
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=100,
)

# Streaming
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| messages | list[dict] | Yes | Conversation messages (non-empty) |
| model | str | Yes | Model ID (or use gateway.models for multi-model routing) |
| stream | bool | No | Enable streaming (default: False) |
| temperature | float | No | Sampling temperature (0–2) |
| max_tokens | int | No | Max tokens to generate |
| max_completion_tokens | int | No | Max completion tokens (alias for max_tokens) |
| reasoning_effort | Literal['low', 'medium', 'high', 'xhigh', 'max', 'off'] | No | Reasoning effort for supported models; translated to provider-native control (see guide) |
| top_p | float | No | Nucleus sampling (0–1) |
| frequency_penalty | float | No | Frequency penalty (-2 to 2) |
| presence_penalty | float | No | Presence penalty (-2 to 2) |
| top_k | int | No | Top-K sampling |
| min_p | float | No | Min-P sampling (0–1) |
| top_a | float | No | Top-A sampling (0–1) |
| repetition_penalty | float | No | Repetition penalty |
| stop | str \| list[str] | No | Stop sequences |
| seed | int | No | Deterministic sampling seed |
| n | int | No | Number of completions to generate |
| tools | list[dict] | No | Function calling tool definitions |
| tool_choice | str \| dict | No | Tool selection: "auto", "none", "required", or function spec |
| parallel_tool_calls | bool | No | Allow parallel function calls |
| response_format | dict | No | Output format (e.g., {"type": "json_object"}) |
| stream_options | dict | No | Stream options (e.g., {"include_usage": True}) |
| logprobs | bool | No | Return log probabilities |
| top_logprobs | int | No | Number of top logprobs per token (0–20) |
| logit_bias | dict[str, float] | No | Token bias adjustments |
| user | str | No | End-user identifier |
| gateway | GatewayOptions \| dict | No | Gateway namespace for routing, multi-model, and metadata options (see gateway.routing, gateway.models, gateway.metadata) |
| extensions | Extensions \| dict | No | Provider-specific extensions (provider passthrough) |
| extra_body | dict | No | Additional body fields (merged last, except stream) |
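
The extra_body note ("merged last, except stream") can be read as follows: named arguments build the request body first, then extra_body keys are layered on top, while stream stays under the control of the stream argument. The sketch below illustrates that reading only. build_body is a hypothetical helper, not part of the SDK, and the SDK's actual merge rules may differ.

```python
from typing import Any, Dict, Optional


def build_body(params: Dict[str, Any], extra_body: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Hypothetical helper: layer extra_body over named params, protecting `stream`."""
    # Drop unset (None) named parameters so they are not sent at all.
    body = {key: value for key, value in params.items() if value is not None}
    for key, value in (extra_body or {}).items():
        if key == "stream":
            # Assumption: `stream` is only ever set by the named argument.
            continue
        body[key] = value  # extra_body wins over named parameters
    return body


body = build_body(
    {"model": "gpt-4o", "temperature": 0.2, "stream": False, "seed": None},
    extra_body={"temperature": 0.7, "stream": True, "custom_flag": 1},
)
print(body)
```

Under this reading, temperature ends up as 0.7, custom_flag is added, and stream stays False.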

gateway.metadata fields

| Field | Type | Description |
| --- | --- | --- |
| tags | list[str] | Tags for categorizing requests (max 100 items, each ≤50 chars) |
| user_id | str | Your application’s user identifier for per-user analytics (max 255 chars) |
| trace_id | str | Distributed tracing identifier (max 255 chars) |
| custom_fields | dict[str, str] | Arbitrary key-value pairs (max 10 keys, keys ≤50 chars, values ≤200 chars) |

from auriko import Client
from auriko.route_types import GatewayOptions

client = Client()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    gateway=GatewayOptions(
        metadata={
            "user_id": "user_123",
            "trace_id": "req-abc",
            "custom_fields": {"env": "prod", "team": "backend"}
        }
    ),
)
Only the four fields above are accepted. Use custom_fields for arbitrary key-value pairs.
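
The limits listed for gateway.metadata can be checked before a request is sent. The following is a minimal sketch of such a client-side check. validate_metadata is a hypothetical helper, not an SDK function, and it assumes the limits count characters as Python's len() does. How the API itself reacts to out-of-range values is not covered here.

```python
from typing import Any, Dict, List

# The four accepted gateway.metadata fields.
ALLOWED_FIELDS = {"tags", "user_id", "trace_id", "custom_fields"}


def validate_metadata(metadata: Dict[str, Any]) -> List[str]:
    """Hypothetical pre-flight check; returns a list of problems (empty if none)."""
    problems = []
    unknown = set(metadata) - ALLOWED_FIELDS
    if unknown:
        problems.append(f"unknown fields: {sorted(unknown)}")

    tags = metadata.get("tags", [])
    if len(tags) > 100:
        problems.append("tags: more than 100 items")
    if any(len(tag) > 50 for tag in tags):
        problems.append("tags: item longer than 50 chars")

    for field in ("user_id", "trace_id"):
        if len(metadata.get(field, "")) > 255:
            problems.append(f"{field}: longer than 255 chars")

    custom = metadata.get("custom_fields", {})
    if len(custom) > 10:
        problems.append("custom_fields: more than 10 keys")
    if any(len(key) > 50 for key in custom):
        problems.append("custom_fields: key longer than 50 chars")
    if any(len(value) > 200 for value in custom.values()):
        problems.append("custom_fields: value longer than 200 chars")
    return problems


ok = validate_metadata({"user_id": "user_123", "custom_fields": {"env": "prod"}})
bad = validate_metadata({"session": "x", "tags": ["t" * 51]})
print(ok)
print(bad)
```

Returning a list of problems rather than raising on the first one lets a caller report every violation at once.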

Response (non-streaming)

class ChatCompletion:
    id: str
    created: int
    model: str
    object: str  # "chat.completion"
    system_fingerprint: Optional[str]  # not all models include this
    choices: list[Choice]
    usage: Optional[Usage]
    routing_metadata: Optional[RoutingMetadata]
    service_tier: Optional[str]           # processing tier (OpenAI-routed models)
    response_headers: Optional[ResponseHeaders]

class ChoiceMessage:
    role: str
    content: Optional[str]
    reasoning_content: Optional[str]  # chain-of-thought text (plain string)
    reasoning: Optional[list[ThinkingReasoningBlock | RedactedReasoningBlock]]  # structured reasoning blocks with signatures
    refusal: Optional[str]            # model refusal content (OpenAI passthrough)
    tool_calls: Optional[list[ToolCall]]
    annotations: Optional[list[Any]]      # URL citations and model annotations (OpenAI-routed models)

Response (streaming)

Returns a Stream that yields ChatCompletionChunk objects.
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)

for chunk in stream:
    chunk.choices[0].delta.content                  # incremental content
    chunk.choices[0].delta.reasoning_content        # incremental reasoning text (if enabled)
    chunk.choices[0].delta.reasoning_signature      # signature for current thinking block
    chunk.choices[0].delta.reasoning_redacted_data  # encrypted redacted thinking data

stream.usage            # available after iteration
stream.routing_metadata # available after iteration
stream.response_headers # available immediately
stream.close()          # manual cleanup (or use context manager)
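
A common pattern with the chunk fields above is to join the incremental pieces into full strings. The sketch below shows one way to do that. collect_stream_text and fake_chunk are hypothetical names, and the chunks are SimpleNamespace stand-ins so the example runs without the SDK. It also assumes that some chunks may arrive with an empty choices list or with None for either field.

```python
from types import SimpleNamespace
from typing import Iterable, Tuple


def collect_stream_text(chunks: Iterable) -> Tuple[str, str]:
    """Join content and reasoning text from ChatCompletionChunk-like objects."""
    content_parts, reasoning_parts = [], []
    for chunk in chunks:
        if not chunk.choices:
            continue  # assumption: some chunks may carry no choices
        delta = chunk.choices[0].delta
        if getattr(delta, "content", None):
            content_parts.append(delta.content)
        if getattr(delta, "reasoning_content", None):
            reasoning_parts.append(delta.reasoning_content)
    return "".join(content_parts), "".join(reasoning_parts)


def fake_chunk(content=None, reasoning_content=None):
    """Stand-in for a ChatCompletionChunk (not an SDK type)."""
    delta = SimpleNamespace(content=content, reasoning_content=reasoning_content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


chunks = [
    fake_chunk(reasoning_content="Greeting. "),
    fake_chunk("Hel"),
    fake_chunk("lo!"),
    SimpleNamespace(choices=[]),  # chunk without choices is skipped
]
text, reasoning = collect_stream_text(chunks)
print(text)  # Hello!
```

With a real stream, the same function would take the object returned by create(..., stream=True); stream.usage and stream.routing_metadata can then be read once it returns.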

Responses

client.responses.create(...)

Creates a response using the OpenAI Responses API format. Supports single-model and multi-model routing.
# Non-streaming
response = client.responses.create(
    model="gpt-4o",
    input="Hello!",
)

# Streaming
stream = client.responses.create(
    model="gpt-4o",
    input="Hello!",
    stream=True,
)

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| input | str \| list[dict] | Yes | Text string or structured input items |
| model | str \| None | Yes* | Model ID (*or use gateway.models for multi-model routing) |
| stream | bool | No | Enable streaming (default: False) |
| instructions | str | No | System instructions for the model |
| tools | list[dict] | No | Tool definitions |
| tool_choice | str \| dict | No | Tool selection: "auto", "none", "required", or function spec |
| parallel_tool_calls | bool | No | Allow parallel function calls |
| max_output_tokens | int | No | Max tokens to generate |
| temperature | float | No | Sampling temperature (0–2) |
| top_p | float | No | Nucleus sampling (0–1) |
| top_k | int | No | Top-K sampling |
| top_logprobs | int | No | Number of top logprobs per token (0–20) |
| reasoning | dict | No | Reasoning config: effort, summary, generate_summary |
| text | dict | No | Text format config (e.g., {"format": {"type": "json_schema", ...}}) |
| user | str | No | End-user identifier |
| metadata | dict[str, str] | No | Arbitrary key-value metadata |
| include | list[str] | No | Additional data to include in the response |
| truncation | str | No | Truncation strategy for long inputs |
| prompt_cache_key | str | No | Key for prompt caching |
| safety_identifier | str | No | Safety policy identifier |
| gateway | GatewayOptions \| dict | No | Gateway namespace for routing, multi-model, and metadata options |
| extensions | Extensions \| dict | No | Provider-specific extensions |
| extra_body | dict | No | Additional body fields (merged last) |

Response (non-streaming)

class Response:
    id: str
    object: str                              # "response"
    created_at: int
    model: str
    status: str                              # "completed", "failed", "incomplete", "in_progress"
    output: list[ResponseOutputItem]
    output_text: str                         # concatenated text output
    parallel_tool_calls: bool
    tool_choice: Any
    tools: list[Any]
    usage: Optional[ResponseUsage]
    error: Optional[dict]
    incomplete_details: Optional[dict]
    metadata: Optional[dict[str, str]]
    routing_metadata: Optional[RoutingMetadata]
    response_headers: Optional[ResponseHeaders]  # property, set by SDK after parsing

Response (streaming)

Returns a ResponseStream that yields Responses API events.
stream = client.responses.create(
    model="gpt-4o",
    input="Hello!",
    stream=True,
)

for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="")

# After iteration, the terminal event's response is available:
final = stream.completed_response  # Response object from the terminal event
final.usage                        # token usage
final.routing_metadata             # routing details

stream.response_headers            # available immediately (before iteration)
stream.close()                     # manual cleanup (or use context manager)
routing_metadata is available on the Response object for both streaming and non-streaming calls. For streaming, read it from stream.completed_response, which is populated after iteration completes.
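
When more than one event type matters, it can help to consume the stream in a single pass that both joins the text deltas and records which event types were seen. The sketch below assumes only that each event has a type attribute and that response.output_text.delta events carry a delta string. consume_events is a hypothetical helper, the events are SimpleNamespace stand-ins, and the other event type names are used for illustration.

```python
from collections import Counter
from types import SimpleNamespace
from typing import Iterable, Tuple


def consume_events(events: Iterable) -> Tuple[str, Counter]:
    """Join output-text deltas and count every event type encountered."""
    parts = []
    seen = Counter()
    for event in events:
        seen[event.type] += 1
        if event.type == "response.output_text.delta":
            parts.append(event.delta)
        # Other event types are counted but otherwise ignored in this sketch.
    return "".join(parts), seen


# Stand-in events (not SDK objects) so the example runs on its own.
events = [
    SimpleNamespace(type="response.created"),
    SimpleNamespace(type="response.output_text.delta", delta="Hi"),
    SimpleNamespace(type="response.output_text.delta", delta=" there"),
    SimpleNamespace(type="response.completed"),
]
text, seen = consume_events(events)
print(text)  # Hi there
```

The counter is mainly a debugging aid: it shows at a glance whether a terminal event arrived before the stream ended.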

Models

Query the model catalog:
models = client.models.list()                # GET /v1/models
model = client.models.retrieve("gpt-4o")     # GET /v1/models/{model_id}
directory = client.models.list_directory()    # GET /v1/directory/models
registry = client.models.list_registry()      # GET /v1/registry/models
providers = client.models.list_providers()    # GET /v1/registry/providers

Identity

Get current API key identity:
identity = client.me.get()  # GET /v1/me
# Returns: ApiKeyIdentity { object, user_id, workspace_id, tier, rate_limit_rpm }

Error Classes

All errors extend AurikoAPIError. Dispatch is driven by the type field of the canonical error envelope (see Errors for the full envelope and retry policy).

| Error Class | HTTP | type |
| --- | --- | --- |
| BadRequestError | 400 / 413 / 422 | invalid_request_error |
| AuthenticationError | 401 | authentication_error |
| PermissionDeniedError | 403 | permission_error |
| NotFoundError | 404 | not_found_error |
| ConflictError | 409 | invalid_request_error |
| RateLimitError | 429 | rate_limit_error |
| InternalServerError | 500 | api_error |
| APIStatusError | 502 / 503 / 504 | api_error |
| APIConnectionError | (network failure before response) | |

AurikoAPIError Fields

| Field | Type | Description |
| --- | --- | --- |
| message | str | Human-readable error description |
| status_code | int | HTTP status code |
| code | str | Machine-readable error code (see Error Codes) |
| type | str | Canonical error type (one of six values) |
| param | Optional[str] | Parameter that caused the error, when attributable |
| request_id | str | Value of x-request-id on the failing response |
| doc_url | Optional[str] | Link to the error’s docs page |
| retry_after_seconds | Optional[int] | Retry-After header value (429 / 503 only) |
| provider | Optional[str] | Upstream provider that produced this error, when attributable |

from auriko import RateLimitError, AuthenticationError

try:
    client.chat.completions.create(...)
except RateLimitError as e:
    print(f"retry after {e.retry_after_seconds}s (request_id={e.request_id})")
except AuthenticationError as e:
    print(f"{e.message} (request_id={e.request_id})")
Unknown error responses fall through to the base AurikoAPIError class. Always keep a catch-all for forward compatibility.
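
The retry_after_seconds field lends itself to a small retry wrapper. The following is a sketch under stated assumptions, not the SDK's retry policy: call_with_retry and FakeRateLimitError are hypothetical names, the stand-in exception replaces auriko.RateLimitError so the example runs without the SDK, and the exponential fallback delay is an arbitrary choice for when no hint is present.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class FakeRateLimitError(Exception):
    """Stand-in for auriko.RateLimitError (hypothetical, for illustration)."""

    def __init__(self, retry_after_seconds: Optional[int] = None):
        super().__init__("rate limited")
        self.retry_after_seconds = retry_after_seconds


def call_with_retry(
    fn: Callable[[], T],
    *,
    retry_on: type = FakeRateLimitError,
    max_attempts: int = 3,
    default_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, waiting retry_after_seconds (or an exponential fallback) between attempts."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            hinted = getattr(exc, "retry_after_seconds", None)
            # Prefer the server hint; otherwise back off 1x, 2x, 4x, ...
            delay = hinted if hinted is not None else default_delay * 2 ** (attempt - 1)
            sleep(delay)
    raise AssertionError("unreachable")


attempts = {"n": 0}
waits = []


def flaky():
    """Fails twice (once with a hint, once without), then succeeds."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise FakeRateLimitError(retry_after_seconds=2 if attempts["n"] == 1 else None)
    return "ok"


result = call_with_retry(flaky, sleep=waits.append)  # record delays instead of sleeping
print(result, waits)
```

With the real SDK, retry_on would be RateLimitError and fn a lambda wrapping the create(...) call. Because the client has its own max_retries setting, an outer loop like this may make the most sense with max_retries=0.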

Response Headers

Available on ChatCompletion.response_headers and Stream.response_headers:
response.response_headers.request_id                 # X-Request-ID
response.response_headers.rate_limit_remaining        # X-RateLimit-Remaining-Requests
response.response_headers.rate_limit_limit            # X-RateLimit-Limit-Requests
response.response_headers.rate_limit_reset            # X-RateLimit-Reset-Requests
response.response_headers.credits_balance_microdollars # X-Credits-Balance-Microdollars
response.response_headers.get("x-custom-header")      # any header by name
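
The credits balance is reported in microdollars, that is, millionths of a dollar. A small conversion helper is sketched below. microdollars_to_dollars is a hypothetical name, and because this page does not state whether the attribute is an int, a string, or possibly None, the helper accepts all three.

```python
from decimal import Decimal
from typing import Optional, Union


def microdollars_to_dollars(value: Union[int, str, None]) -> Optional[Decimal]:
    """Convert a microdollar balance to dollars; None passes through unchanged."""
    if value is None:
        return None  # assumption: the header may be absent on some responses
    # Decimal avoids binary floating-point rounding on currency values.
    return Decimal(str(value)) / Decimal(1_000_000)


print(microdollars_to_dollars(12_345_678))  # 12.345678
print(microdollars_to_dollars("500000"))    # 0.5
```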

AurikoAsyncOpenAI (experimental)

AurikoAsyncOpenAI is an AsyncOpenAI subclass that captures routing metadata from every successful response. Use it with frameworks that accept an external AsyncOpenAI instance.

This is a Tier 4 experimental integration. For most use cases, use the native SDK or AsyncOpenAI(base_url=...) directly.

Install with the optional openai-compat extra:
pip install "auriko[openai-compat]"

Constructor

class AurikoAsyncOpenAI(AsyncOpenAI):
    def __init__(
        self,
        *,
        api_key: str | None = None,
        base_url: str = "https://api.auriko.ai/v1",
        on_response: Callable[[RoutingMetadata], Any] | None = None,
        **kwargs: Any,
    ) -> None: ...
All named parameters are keyword-only.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| api_key | str \| None | None | Falls back to AURIKO_API_KEY env var. Does not fall back to OPENAI_API_KEY. Raises AuthenticationError if neither source supplies a key. |
| base_url | str | "https://api.auriko.ai/v1" | Auriko API base URL. |
| on_response | Callable[[RoutingMetadata], Any] \| None | None | Sync callback invoked on every successful response. Passing an async callable raises TypeError. |
| **kwargs | Any | | Forwarded to AsyncOpenAI.__init__ (for example max_retries, timeout). Passing http_client raises TypeError. |
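
The api_key fallback described above can be pictured as a short resolution function. This is a sketch of the documented behavior, not the SDK's code: resolve_api_key and MissingKeyError are hypothetical names (the latter stands in for AuthenticationError), and treating an empty string as "not supplied" is an assumption.

```python
import os
from typing import Mapping, Optional


class MissingKeyError(Exception):
    """Stand-in for AuthenticationError (hypothetical, for illustration)."""


def resolve_api_key(api_key: Optional[str] = None, env: Mapping[str, str] = os.environ) -> str:
    """Return the explicit key, else AURIKO_API_KEY from env, else raise."""
    if api_key:
        return api_key  # explicit argument wins
    env_key = env.get("AURIKO_API_KEY")
    if env_key:
        return env_key
    # OPENAI_API_KEY is deliberately never consulted.
    raise MissingKeyError("no API key: pass api_key or set AURIKO_API_KEY")


print(resolve_api_key("ak_explicit", env={}))
print(resolve_api_key(env={"AURIKO_API_KEY": "ak_env", "OPENAI_API_KEY": "sk_other"}))
```

Skipping OPENAI_API_KEY means that an environment already configured for OpenAI does not silently send that key to a different endpoint.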

last_routing_metadata property

Returns RoutingMetadata | None. Populated after a successful response completes. Returns None before any request, after a request errors, or when the response carried no routing_metadata field.

Concurrency caveat: the property uses last-write-wins semantics on a shared client. For per-request capture across concurrent callers, use the on_response callback.

Streaming caveat: metadata is extracted during SDK byte iteration, not on stream creation. A streaming caller that does not iterate every chunk may read None.

on_response callback

Signature: Callable[[RoutingMetadata], Any].
  • Sync only. Passing an async callable raises TypeError at construction.
  • Fires once per successful response with a populated routing_metadata. Does not fire on error status, absent routing_metadata, or malformed routing_metadata.
  • Use this callback for per-request capture in concurrent scenarios where the shared last_routing_metadata property is race-prone.
Import RoutingMetadata for type annotations from auriko.route_types:
from auriko.route_types import RoutingMetadata
RoutingMetadata is not exported at top-level auriko. The import from auriko import RoutingMetadata raises ImportError.
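
Since on_response must be a sync callable and may fire for many concurrent requests, one option is a small collector object that appends each metadata value under a lock. The sketch below is one such design, not an SDK class: RoutingMetadataCollector is a hypothetical name, and the provider attribute on the stand-in objects is illustrative rather than a documented RoutingMetadata field.

```python
import threading
from types import SimpleNamespace
from typing import Any, List


class RoutingMetadataCollector:
    """Sync callable that records every metadata object it receives."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._items: List[Any] = []

    def __call__(self, metadata: Any) -> None:
        # Plain (non-async) method, so the instance is a sync callable.
        with self._lock:
            self._items.append(metadata)

    def snapshot(self) -> List[Any]:
        """Return a copy so callers cannot mutate the internal list."""
        with self._lock:
            return list(self._items)


collector = RoutingMetadataCollector()
# With the SDK installed, this would be passed as:
#   client = AurikoAsyncOpenAI(on_response=collector)
# Here the callback is invoked by hand with stand-in objects.
collector(SimpleNamespace(provider="example-a"))
collector(SimpleNamespace(provider="example-b"))
print(len(collector.snapshot()))  # 2
```

Unlike last_routing_metadata, nothing is overwritten here: every successful response leaves its own entry, at the cost of the list growing until the caller clears or replaces the collector.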

Error behavior

AurikoAsyncOpenAI raises dual-inheritance errors for HTTP failures (4xx, 5xx). Each error is catchable as both an Auriko error and an OpenAI error:
import asyncio
import auriko
from auriko import AurikoAsyncOpenAI

async def main():
    client = AurikoAsyncOpenAI()
    try:
        response = await client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Hello!"}],
        )
    except auriko.RateLimitError as e:  # also catchable as openai.RateLimitError
        print(f"Rate limited: {e.message}")

asyncio.run(main())
The same error is also catchable as openai.RateLimitError. No map_openai_error() wrapping is needed. Network-layer exceptions (openai.APITimeoutError, openai.APIConnectionError) propagate unchanged.
Mid-stream SSE errors (raised after the HTTP 200 during stream=True) remain unmapped openai.APIError. The bridge covers HTTP-level status errors only.
For map_openai_error() usage with plain AsyncOpenAI, see Error mapping.
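
"Dual-inheritance" here means that a single exception class has both an Auriko error and an OpenAI error among its bases, so either except clause matches. The sketch below demonstrates the mechanism with stand-in classes. None of these class names are the SDK's real types, and the SDK's actual class layout is not shown on this page.

```python
class AurikoRateLimitError(Exception):
    """Stand-in for auriko.RateLimitError."""


class OpenAIRateLimitError(Exception):
    """Stand-in for openai.RateLimitError."""


class BridgedRateLimitError(AurikoRateLimitError, OpenAIRateLimitError):
    """One exception type that belongs to both hierarchies."""


def caught_by(handler_type: type) -> bool:
    """Raise the bridged error and report whether handler_type catches it."""
    try:
        raise BridgedRateLimitError("rate limited")
    except handler_type:
        return True
    except Exception:
        return False


print(caught_by(AurikoRateLimitError))  # True
print(caught_by(OpenAIRateLimitError))  # True
print(caught_by(KeyError))              # False
```

The practical consequence is that framework code written against openai exceptions and application code written against auriko exceptions can both handle the same failure without a translation layer.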

Types

Client & Stream

from auriko import Client, AsyncClient, ResponseStream, AsyncResponseStream

Chat Response Types

from auriko.models.chat import (
    ChatCompletion, ChatCompletionChunk, Choice, ChoiceMessage,
    StreamChoice, Delta, ToolCall, ToolCallFunction,
    ToolCallDelta, ToolCallDeltaFunction,
    ThinkingReasoningBlock, RedactedReasoningBlock,
)

Response Types

from auriko.models.responses import Response, ResponseStreamEvent, ResponseUsage, UnknownStreamEvent

Common Types

from auriko.models.common import Usage, PromptTokensDetails, CompletionTokensDetails, ApiKeyIdentity

Routing Types

from auriko.route_types import (
    RoutingOptions, RoutingMetadata, CostInfo, StructuredWarning, StructuredWarningType,
    Optimize, Mode, DataPolicy,
)

Extensions

from auriko.models.extensions import Extensions

| Field | Type | Description |
| --- | --- | --- |
| anthropic | dict | Anthropic-specific parameters |
| openai | dict | OpenAI-specific parameters |
| google | dict | Google-specific parameters |
| deepseek | dict | DeepSeek-specific parameters |
| [key] | dict | Arbitrary provider passthrough |

Model Discovery Types

from auriko.models.providers import (
    ModelsListResponse, CanonicalModel,
    DirectoryResponse, DirectoryModel, ProviderEntry, TierEntry,
    ProviderList, ProviderInfo,
)

Error Classes

from auriko.errors import (
    AurikoAPIError, APIConnectionError, APIStatusError,
    AuthenticationError, BadRequestError, ConflictError,
    InternalServerError, NotFoundError, PermissionDeniedError,
    RateLimitError,
)

Utilities

from auriko import ResponseHeaders, map_openai_error
from auriko.route_types import parse_routing_metadata
from auriko.errors import map_error_from_code

parse_routing_metadata(response) extracts RoutingMetadata from an OpenAI SDK response. Returns None if absent or unparseable. Returns None on Auriko SDK responses; use response.routing_metadata directly instead.

parse_routing_metadata is Python-only. TypeScript SDK responses include routing_metadata as a typed property.

map_error_from_code(code, message, *, param=None, doc_url=None, provider=None, suggestion=None, response_headers=None) constructs a typed AurikoAPIError subclass from an error code string (e.g., "rate_limit_error" → RateLimitError).