AddDocumentation Index
Fetch the complete documentation index at: https://docs.auriko.ai/llms.txt
Use this file to discover all available pages before exploring further.
reasoning_effort to your request. Auriko translates it into each provider’s native format. Use extensions keyed by provider name to pass through provider-specific parameters like Anthropic’s metadata or Google’s safety_settings.
Prerequisites
- An Auriko API key
- Python 3.10+ with the OpenAI SDK (
pip install openai) or the auriko SDK (pip install auriko)- OR Node.js 18+ with the OpenAI SDK (
npm install openai) or@auriko/sdk(npm install @auriko/sdk)
- OR Node.js 18+ with the OpenAI SDK (
- A model that supports reasoning (see provider support table)
Enable thinking
Passreasoning_effort in your request to control extended reasoning:
Check provider support
Auriko translatesreasoning_effort for each provider:
| Provider | Models | Behavior |
|---|---|---|
| Anthropic | Claude 4.6 (Opus, Sonnet) | Adaptive thinking with effort control |
| Anthropic | Claude 4.5 Opus | Thinking budget + effort control |
| Anthropic | Claude 4.5 Sonnet/Haiku | Thinking budget derived from effort level |
| OpenAI | o3, o4-mini, GPT-5 | Native reasoning_effort (dropped when tools present on GPT-5.4+) |
| Gemini 3.x | Thinking level (low/medium/high) | |
| Gemini 2.5 Flash/Pro | Thinking budget derived from effort level | |
| DeepSeek | V4 Flash, V4 Pro | Thinking budget derived from effort level |
| xAI | Grok 3 mini, Grok 4.3 | Native reasoning_effort (low/high on Grok 3 mini, low/medium/high on Grok 4.3) |
| MiniMax | M2 series | Built-in reasoning; reasoning_effort dropped |
| Moonshot | Kimi K2.5, Kimi K2.6 | Native reasoning_effort |
Non-reasoning models (e.g. GPT-4o, GPT-4.1, Llama) reject
reasoning_effort with 400 reasoning_not_supported. The one exception is GPT-5.4+ with tools: Auriko drops reasoning_effort to prevent an upstream 400.Read thinking output
Some providers surface the model’s chain-of-thought in thereasoning_content field on the response message:
Not all reasoning models populate
reasoning_content, so check before accessing. OpenAI keeps reasoning internal, and other providers vary by model.Preserve reasoning across turns
Some providers return reasoning context you echo back for multi-turn continuity. Anthropic and Google use structuredreasoning blocks with cryptographic signatures, while DeepSeek uses a plain-text reasoning_content field. Include the relevant fields from the assistant response in your next request to preserve context.
Read structured reasoning
type:
thinking: containsthinking(the reasoning text) andsignature(cryptographic signature)redacted: containsdata(encrypted, opaque to the client)
Round-trip reasoning
To continue a multi-turn conversation with reasoning context, include the full assistant message (withreasoning) in your next request:
DeepSeek reasoning content
DeepSeek models return reasoning as a plainreasoning_content string instead of structured reasoning blocks. For multi-turn conversations with DeepSeek, include reasoning_content on assistant messages you send back. To preserve it, serialize the full message object:
reasoning_content, Auriko sets it to an empty string. Echo back the original value from the response.
Stream reasoning fields
When streaming with extended thinking, two additional delta fields carry reasoning block data:delta.reasoning_signature: cryptographic signature for the current thinking blockdelta.reasoning_redacted_data: encrypted data for a redacted thinking block (complete in one event)
delta.reasoning_content (the incremental reasoning text).
Use provider passthrough
For provider-specific features beyond reasoning effort, use provider-keyed extensions. Auriko forwards these to the provider:google, google_ai, googleai, and gemini are interchangeable.
Transform-controlled fields
If you setreasoning_effort, Auriko controls each provider’s thinking budget. Thinking-budget parameters in extensions are overwritten. If you don’t set reasoning_effort, your passthrough values are preserved.
Passthrough fields
Fields that aren’t transform-controlled pass through to the provider unchanged. Examples:- Anthropic:
metadata - OpenAI:
store,metadata - Google Gemini:
safety_settings
Handle sampling constraints
On Anthropic models,temperature, top_p, and top_k are incompatible with active thinking. If you send reasoning_effort alongside these parameters, Auriko drops the incompatible values and returns a warning in routing_metadata.warnings:
| Parameter | Constraint |
|---|---|
temperature | Must be exactly 1, or omitted |
top_p | Must be >= 0.95, or omitted |
top_k | Must be omitted |
Check effort normalization
Some models support only a subset ofreasoning_effort levels. If you request a level above the model’s maximum, Auriko normalizes it to the highest supported value and includes a warning in routing_metadata.warnings:
| Provider | Models affected | xhigh/max normalized to |
|---|---|---|
| OpenAI | GPT-5, GPT-5 mini, o3-pro | high |
| Anthropic | Claude Opus 4.5 | high |
| xAI | Grok 4.3 | high |
| xAI | Grok 3 mini | high |
| Gemini 3.x | high |
xhigh and max without a warning. For the full provider support table, see Check provider support.
Handle max_tokens constraints
Anthropic models that use thinking budgets require max_tokens above 1024. If you send reasoning_effort with max_tokens at or below 1024, Auriko skips thinking and returns a warning in routing_metadata.warnings:
Estimate cost and latency
Thereasoning_effort level (low/medium/high/xhigh/max) determines the thinking budget per provider. Exact token budgets aren’t guaranteed; reasoning_effort="off" disables thinking on supported models. See Check reasoning token availability for which providers report a breakdown.
Check reasoning token availability
Thecompletion_tokens_details.reasoning_tokens field reports how many tokens the model spent on reasoning. Auriko passes through what the upstream provider reports.
| Provider | Model examples | reasoning_tokens reported? | Notes |
|---|---|---|---|
| OpenAI | o1, o3, o4-mini | Yes | Native field |
| DeepSeek | deepseek-v4-flash, deepseek-v4-pro | Yes | Native field |
| xAI | grok-4-fast-reasoning | Yes | Native field |
| Gemini 2.5 Flash | Yes | Derived from provider token counts | |
| Anthropic | All Claude models | No | Reports combined output tokens only |
| Moonshot | kimi-k2-thinking, kimi-k2-thinking-turbo | No | Token breakdown not reported |
| Fireworks | deepseek-v3.2 | No | Token breakdown not reported for hosted models |
completion_tokens_details in the response.
Check for the field before accessing it:
When
completion_tokens_details isn’t available, completion_tokens reflects the combined total of reasoning and content tokens. You can still use it for cost tracking.