Changelog - Auriko

Track changes to the Auriko API, SDKs, and supported models.

1.0.1 — 2026-06-27

API

API key identity restructure: GET /v1/me returns full credential introspection — credential_type, status, profile, key_id, key_prefix, key_name, nested workspace, scopes, structured rate_limits, expires_at, created_at. The object, user_id, and tier fields are removed
API key list default: GET /v1/workspaces/{workspace_id}/api-keys excludes revoked keys by default; pass include_revoked=true to include them
Credit balance fields removed: current_tier, platform_fee_rate, tier_volume_usd, next_tier_threshold_usd from the credit balance API response
/api/v1/pricing/tiers endpoint removed
Management API: Programmatic workspace management via API key — BYOK key lifecycle (byok:read/byok:write), BYOK discovery, workspace settings (workspace:write), routing defaults (routing:read/routing:write), billing and subscription status (billing:read), member listing (members:read), invite lifecycle (members:read/invites:write), audit events (audit:read), budget management (budgets:write), API key lifecycle (keys:write). All write operations return 202
Analytics dashboard API: Summary, performance trend, routing evidence, paginated request log, and CSV export with truncation headers (X-Export-Row-Count, X-Export-Row-Limit, X-Export-Truncated)
/v1/models contract: endpoint query parameter and supported_endpoints/context_window response fields
Model directory aliases: Each entry in /v1/directory/models includes an aliases list
Budget headers on /v1/me: X-Budget-* headers when budgets are configured
Response API phase field: output[].phase ("commentary" | "final_answer") typed in both SDKs and the API reference schema
Usage dashboard access: All workspace members can view usage analytics

Error Handling

response_api_only error code: Response-API-only models return this dedicated code with an actionable suggestion field
client_disconnected error code: Error code for client-initiated disconnections
idempotency_replay_unavailable: Idempotent endpoints return 409 when replay data is unavailable
Model-not-found suggestions: model_not_found errors include an optional suggestion field with similar model names
X-Rate-Limit-Source header: Rate limit errors distinguish between provider and Auriko limits (provider or routing)
Double /v1/ path detection: Requests to /v1/v1/* return 400 naming the duplicate prefix
Retry-After minimum: Rate-limited responses return Retry-After: 1 minimum

Parameters & Request Handling

Response API parameters: frequency_penalty, presence_penalty, max_tool_calls on responses.create()
Response API echo-back fields: temperature, top_p, frequency_penalty, presence_penalty, top_logprobs, truncation reflect request values in responses
max_tool_calls clarification: Limits built-in tool calls only (e.g., web_search, code_interpreter), not user-defined function tools
reasoning.effort on Claude: Enables thinking and produces reasoning output through the Response API. Values exceeding a model’s range are clamped with an unsupported_parameter warning
Forced tool_choice with reasoning on Anthropic: tool_choice (any/tool) coexists with reasoning_effort; incompatible parameters are dropped with a warning

Routing & Providers

Per-model rate limit isolation: Rate limits track each model independently; custom per-key rate_limit_rpm values are enforced
Provider error messages: Include the provider name in error.message with status-appropriate messages. Provider 403s are retryable
Timeout reporting: Stream first-byte timeouts report reason: "timeout" with provider_timeout warnings
tool_choice="required" on GLM models: Targets capable providers; models with no capable provider return tool_choice_required_not_supported

Model Compatibility

Claude Opus 4.8: Tools, structured output, vision, web search, code execution, prompt caching (1,024-token minimum). Adaptive thinking only
Claude Fable 5: Adaptive thinking, 1M context, 128K output. Tools, structured output, vision, web search, code execution, computer use, prompt caching (512-token minimum)
Gemini 3 Pro Image and Gemini 3.1 Flash Image: Image generation with text-and-image I/O, tool calling, web search, reasoning
MiniMax M3: Tiered pricing, tool calling, reasoning, vision, prompt caching. Additional provider via SiliconFlow
Nemotron 3 Ultra 550B: Reasoning, tool calling, JSON mode via Together AI. Additional provider via DeepInfra
Nex N2 Pro: 262K context, adaptive thinking, vision via SiliconFlow (free tier)
Undated Claude model aliases resolve to dated counterparts
12 responses-only models (o1-pro, o3-pro, gpt-5/5.2/5.4/5.5-pro, codex family, deep-research) discoverable in /v1/directory/models
Qwen 3.7 Max via DeepInfra; MiniMax M2.7 via DeepInfra
Structured output works with array-typed schemas (e.g. ["string", "null"]) on Gemini
SiliconFlow Qwen 3 VL 32B/8B does not support response_format: json_schema
reasoning_effort with low max_tokens on Gemini 3.x returns finish_reason: "length"
Response API streaming usage includes token counts for all providers

SDKs

Python SDK auriko 1.0.1
TypeScript SDK @auriko/sdk 1.0.1
Vercel AI SDK provider @auriko/ai-sdk-provider 1.0.1

Documentation

Response API guides: overview, streaming, tool calling, structured output, reasoning, and routing

0.3.0 — 2026-05-24

API

Response API: POST /v1/responses creates model responses using the OpenAI Response API format, with streaming, tool calling, reasoning summaries, and multi-model routing
Gemini image generation: Image-capable Gemini models return generated images inline as images[] in chat completions, with image_tokens in usage
Gemini 3.5 Flash: Routing to gemini-3.5-flash on Google AI Studio with function calling, structured output, streaming, vision, reasoning, prompt caching, and code execution

Routing & Providers

Streaming-only provider routing: Providers that only support streaming are excluded from non-streaming requests. Requests targeting a specific provider return non_streaming_not_supported with guidance to set stream: true
kimi-k2.5, kimi-k2.6, gpt-oss, and glm-5.1 support the Response API
tool_choice: "required" downgrades to "auto" on reasoning models with an unsupported_parameter warning. Models with no capable provider return tool_choice_required_not_supported
tool_choice: "none" is honored on all endpoints

Parameters & Request Handling

Default timeouts: 120s streaming first-byte, 5-minute non-streaming total, 18-minute non-streaming deadline. User-set timeout_ms / deadline_ms override defaults
Unsupported parameter warnings: Human-readable format (e.g., 'metadata' is not supported.)
system as top-level parameter: Returns a structured warning directing to the messages array

Model Compatibility

claude-opus-4-7 sampling parameters: temperature, top_p, and top_k are accepted with an unsupported_parameter warning on all endpoints
Anthropic reasoning with low max_tokens: Requests with reasoning_effort and low max_tokens return successful responses with an unsupported_parameter warning
Response API returns structured reasoning output for DeepSeek R1 0528, Qwen 3 30B A3B, and Qwen 3 32B

SDKs

Python SDK auriko 0.3.0
TypeScript SDK @auriko/sdk 0.3.0

0.2.0 — 2026-05-18

Routing & Providers

Priority tier routing: gateway.routing.tier: "priority" enables faster inference (2.5x speed, 6x cost). Premium-tier offerings not included by default
Requests with tool_choice="required" target providers that honor the constraint. Returns tool_choice_required_not_supported if no capable provider is available
SiliconFlow-hosted models are supported, including 13 exclusive models and reasoning variants (DeepSeek-R1, Qwen3 thinking)
qwen-3.6-plus available through additional providers
9 Qwen 3 models are available through alternative providers

Parameters & Request Handling

Default timeouts support reasoning models with longer first-token times. deadline_ms is opt-in only

Model Compatibility

Anthropic response_format supports type arrays (e.g., type: ["string", "null"]) with nullable enum and object handling
Zero-parameter tools validate on Anthropic and Google models
reasoning_effort coexists with temperature, top_p, and top_k on Claude models. Incompatible sampling parameters are omitted with a warning
reasoning_effort mapping across providers:
- Claude: "xhigh" preserved on Opus 4.7, clamped to "high" on Opus 4.5, mapped to "max" on other models
- Gemini 3.x: "xhigh" and "max" map to "high"
- OpenAI and xAI: "max" maps to each model’s ceiling
- All mappings include an unsupported_parameter warning
reasoning_effort with extreme max_tokens on Gemini clamps to each model’s accepted range
DeepInfra Gemma 3, Qwen3 VL, and MiMo return tool call responses, streaming and non-streaming
MiniMax models return structured reasoning output
output_tokens in Anthropic-format responses includes reasoning tokens
llama-3.1-8b-instruct-turbo returns a capability error for unsupported structured output

Error Handling

Streaming errors return structured messages with message, type, code, and provider fields
Multi-provider fallback errors include per-provider failure details

SDKs

Python SDK auriko 0.2.0
TypeScript SDK @auriko/sdk 0.2.0
Vercel AI SDK provider @auriko/ai-sdk-provider 0.2.0

Integrations

Kilo Code integration guide

0.1.1 — 2026-05-12

Parameters & Request Handling

User-configurable timeouts via gateway.routing.timeout_ms (per-attempt) and gateway.routing.deadline_ms (request deadline)
- Streaming uses timeout_ms as time-to-first-byte
- Defaults: 10s streaming first-byte / 120s streaming deadline; non-streaming 120s / 180s
reasoning_effort accepts "xhigh" and "max" levels. Anthropic models use adaptive thinking with automatic effort-to-budget translation
Timeout errors include actionable guidance naming the configurable parameters
- provider_timeout warning emitted when fallback succeeds after a timeout
timeout_ms and deadline_ms apply to all execution paths including allow_fallbacks: false

Error Handling

Error codes unknown_parameter, capability_mismatch, routing_constraint_unsatisfiable, tokens_per_min_exceeded, concurrent_requests_exceeded, and routing_failed — capability-specific codes listed below
Inference validation returns specific error codes — missing_required_parameter, invalid_parameter_value, unknown_field, operation_not_allowed — for precise programmatic error handling
- invalid_request is reserved for malformed request bodies. HTTP status codes unchanged
Platform operations return state_precondition_failed, duplicate_resource, and operation_not_allowed. 405 responses return method_not_allowed
New platform codes: field_immutable, invalid_recovery_code, mfa_verification_failed

Model Compatibility

Gemini tool calls return finish_reason: tool_calls for client-side tool execution
Gemini accepts non-object tool results (numbers, booleans, arrays)
reasoning_effort works on Gemma-4, qwen-3-vl-thinking, and gpt-5.4+ (when combined with tools)
Explicit thinking.budget_tokens requests (e.g. from Claude Code) work on models that support them

SDKs

Python SDK auriko 0.1.1
TypeScript SDK @auriko/sdk 0.1.1
Response models include service_tier, annotations, and prediction token fields

Integrations

OpenCode models.dev definitions (15 curated models)
Claude Code integration guide

0.1.0 — 2026-05-04

API

OpenAI-compatible chat completions with multi-provider routing and automatic failover
Anthropic Messages API compatibility — POST /v1/messages and token counting
Streaming and non-streaming responses across all supported providers
Model directory with pricing, capability metadata, and provider health status
Workspace billing and credit balance (Preview)
Structured reasoning support with cryptographic signatures for multi-turn round-trip
Structured error responses with machine-readable code and type fields for programmatic error handling

SDKs

Python SDK auriko 0.1.0 — with OpenAI Agents SDK compatibility via auriko[openai-compat]
TypeScript SDK @auriko/sdk 0.1.0
Vercel AI SDK provider @auriko/ai-sdk-provider 0.1.0

Frameworks

LangChain, LlamaIndex, CrewAI, Google ADK, OpenAI Agents SDK
Claude Agent SDK via ANTHROPIC_BASE_URL
Vercel AI SDK

Models

GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro, Llama 3.1, Mistral Large, DeepSeek V4, and more — see Models for the complete list

​1.0.1 — 2026-06-27

​API

​Error Handling

​Parameters & Request Handling

​Routing & Providers

​Model Compatibility

​SDKs

​Documentation

​0.3.0 — 2026-05-24

​API

​Routing & Providers

​Parameters & Request Handling

​Model Compatibility

​SDKs

​0.2.0 — 2026-05-18

​Routing & Providers

​Parameters & Request Handling

​Model Compatibility

​Error Handling

​SDKs

​Integrations

​0.1.1 — 2026-05-12

​Parameters & Request Handling

​Error Handling

​Model Compatibility

​SDKs

​Integrations

​0.1.0 — 2026-05-04

​API

​SDKs

​Frameworks

​Models

1.0.1 — 2026-06-27

API

Error Handling

Parameters & Request Handling

Routing & Providers

Model Compatibility

SDKs

Documentation

0.3.0 — 2026-05-24

API

Routing & Providers

Parameters & Request Handling

Model Compatibility

SDKs

0.2.0 — 2026-05-18

Routing & Providers

Parameters & Request Handling

Model Compatibility

Error Handling

SDKs

Integrations

0.1.1 — 2026-05-12

Parameters & Request Handling

Error Handling

Model Compatibility

SDKs

Integrations

0.1.0 — 2026-05-04

API

SDKs

Frameworks

Models