Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.auriko.ai/llms.txt

Use this file to discover all available pages before exploring further.

Track changes to the Auriko API, SDKs, and supported models.

0.2.0 — 2026-05-18

Routing & Providers

  • Priority tier routing: gateway.routing.tier: "priority" enables Anthropic fast mode (2.5x speed, 6x cost). Premium-tier offerings excluded from routing by default
  • Requests with tool_choice="required" route only to providers that honor the constraint. Returns tool_choice_required_not_supported if no capable provider is available
  • SiliconFlow-hosted models are routable, including 13 exclusive models and reasoning variants (DeepSeek-R1, Qwen3 thinking)
  • qwen-3.6-plus routes correctly through Fireworks AI and Together AI
  • Fireworks AI no longer serves 9 open-weight Qwen 3 models. All remain routable through other providers

Parameters & Request Handling

  • Default timeouts support reasoning models with longer first-token times. deadline_ms is opt-in only
  • developer message role works across all providers
  • Empty assistant messages (content: null, "", or empty text blocks) route correctly to Anthropic

Model Compatibility

  • Anthropic response_format supports type arrays (e.g., type: ["string", "null"]) with correct nullable enum and object handling
  • Zero-parameter tools validate correctly on Anthropic and Google models
  • reasoning_effort coexists with temperature, top_p, and top_k on Claude models. Incompatible sampling parameters are stripped with a warning
  • reasoning_effort normalization across providers:
    • Claude: "xhigh" preserved on Opus 4.7, clamped to "high" on Opus 4.5, mapped to "max" on other models
    • Gemini 3.x: "xhigh" and "max" normalize to "high"
    • OpenAI and xAI: "max" normalizes to each model’s ceiling
    • All normalizations include an unsupported_parameter warning
  • reasoning_effort with extreme max_tokens on Gemini clamps to each model’s accepted range
  • DeepInfra Gemma 3, Qwen3 VL, and MiMo return correct tool call responses, streaming and non-streaming
  • MiniMax models separate reasoning into reasoning_content cleanly
  • output_tokens in Anthropic-format responses includes reasoning tokens for Gemini and xAI
  • llama-3.1-8b-instruct-turbo returns a clean capability error for unsupported structured output

Error Handling

  • Streaming errors return structured messages (message, type, code, provider) instead of empty objects
  • Multi-provider fallback errors include per-provider failure details

Platform

  • New signups receive a personalized welcome email
  • Playground credit for new signups drops from 1.00to1.00 to 0.50

SDKs

Integrations


0.1.1 — 2026-05-12

Parameters & Request Handling

  • User-configurable timeouts via gateway.routing.timeout_ms (per-attempt) and gateway.routing.deadline_ms (request deadline)
    • Streaming uses timeout_ms as time-to-first-byte
    • Defaults: 10s streaming first-byte / 120s streaming deadline; non-streaming 120s / 180s
  • reasoning_effort accepts "xhigh" and "max" levels. Anthropic models use adaptive thinking with automatic effort-to-budget translation
  • Timeout errors include actionable guidance naming the configurable parameters
    • provider_timeout warning emitted when fallback succeeds after a timeout
  • timeout_ms and deadline_ms apply to all execution paths including allow_fallbacks: false

Error Handling

  • Inference validation returns specific error codes — missing_required_parameter, invalid_parameter_value, unknown_field, operation_not_allowed — for precise programmatic error handling
    • invalid_request is reserved for malformed request bodies. HTTP status codes unchanged
  • Platform operations return state_precondition_failed, duplicate_resource, and operation_not_allowed. 405 responses return method_not_allowed
  • New platform codes: field_immutable, invalid_recovery_code, mfa_verification_failed

Model Compatibility

  • Gemini tool calls return correct finish_reason: tool_calls for client-side tool execution
  • Gemini accepts non-object tool results (numbers, booleans, arrays)
  • Multi-turn tool calling works correctly with the OpenAI Python SDK on all providers
  • reasoning_effort works on Gemma-4, qwen-3-vl-thinking, and gpt-5.4+ (when combined with tools)
  • Explicit thinking.budget_tokens requests (e.g. from Claude Code) route correctly to models requiring adaptive thinking
  • Pricing values display with correct precision

SDKs

  • Python SDK auriko 0.1.1
  • TypeScript SDK @auriko/sdk 0.1.1
  • Response models include service_tier, annotations, and prediction token fields

Integrations


0.1.0 — 2026-05-04

API

  • OpenAI-compatible chat completions with multi-provider routing and automatic failover
  • Anthropic Messages API compatibility — POST /v1/messages and token counting
  • Streaming and non-streaming responses across all supported providers
  • Model directory with pricing, capability metadata, and provider health status
  • Workspace billing and credit balance (Preview)
  • Structured reasoning support with cryptographic signatures for multi-turn round-trip

SDKs

Frameworks

Models

  • GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro, Llama 3.1, Mistral Large, DeepSeek V4, and more — see Models for the complete list