Access provider-specific features like thinking tokens through a normalized interface. Auriko translates a single `extensions.thinking` configuration into provider-native formats automatically.
## Prerequisites

- An Auriko API key
- Python 3.10+ with the `auriko` SDK installed (`pip install auriko`), or Node.js 18+ with `@auriko/sdk` installed (`npm install @auriko/sdk`)
- A model that supports reasoning (Claude 3.5+, o1, o3, o4-mini, DeepSeek R1, Gemini 2.0 Flash Thinking)
## Enable thinking

Pass `extensions.thinking` in your request to enable extended reasoning:

```python
import os

from auriko import Client

client = Client(
    api_key=os.environ["AURIKO_API_KEY"],
    base_url="https://api.auriko.ai/v1",
)

response = client.chat.completions.create(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Solve this step by step: what is 23! / 20!?"}],
    extensions={"thinking": {"enabled": True, "budget_tokens": 10000}},
)

print(response.choices[0].message.content)
```
## Check provider support

Auriko translates `extensions.thinking` into provider-native formats:

| Provider | Models | Translation |
|---|---|---|
| Anthropic | Claude 3.5+, Claude 4 | `thinking: {type: "enabled", budget_tokens: <value>}` — budget passed directly |
| OpenAI | o1, o3, o4-mini | `reasoning_effort: "low"` / `"medium"` / `"high"` — mapped from `budget_tokens` thresholds |
| DeepSeek | R1 | `thinking: {enabled: true, max_tokens: <value>}` — budget passed directly |
| Google AI Studio | Gemini 2.0 Flash Thinking | `thinking_config: {thinking_budget: <value>}` — budget passed directly |
| Other providers | Varies | OpenAI-compatible `reasoning_effort` format (default translator) |
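For the providers that take the budget directly, the translation in the table above can be sketched as follows. The payload shapes come from the table; the function itself is a hypothetical illustration, not Auriko's actual internals:

```python
def native_thinking_payload(provider: str, budget_tokens: int) -> dict:
    """Translate a normalized thinking budget into a provider-native payload.

    Illustrative only: covers the providers that accept a token budget
    directly, per the translation table above.
    """
    if provider == "anthropic":
        return {"thinking": {"type": "enabled", "budget_tokens": budget_tokens}}
    if provider == "deepseek":
        return {"thinking": {"enabled": True, "max_tokens": budget_tokens}}
    if provider == "google_ai_studio":
        return {"thinking_config": {"thinking_budget": budget_tokens}}
    raise ValueError(f"no direct budget translation for {provider!r}")
```

Providers without a direct budget field (OpenAI and the default translator) instead receive a `reasoning_effort` level, as described in the next section.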
## OpenAI budget mapping

Since OpenAI uses discrete `reasoning_effort` levels instead of a token budget, Auriko maps `budget_tokens` to the appropriate level:

| Budget tokens | Reasoning effort |
|---|---|
| < 5,000 | `low` |
| 5,000 – 14,999 | `medium` |
| ≥ 15,000 | `high` |

If `budget_tokens` is omitted, the default is 8,000 (maps to `medium`).
## Read thinking output

When a model supports reasoning, the thinking output appears in the `reasoning_content` field on the response message:

```python
response = client.chat.completions.create(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Solve step by step: what is 23! / 20!?"}],
    extensions={"thinking": {"enabled": True, "budget_tokens": 10000}},
)

# Access the reasoning (if the model returns it)
if response.choices[0].message.reasoning_content:
    print(f"Reasoning: {response.choices[0].message.reasoning_content}")

print(f"Answer: {response.choices[0].message.content}")
```
### Providers with reasoning_content

| Provider | `reasoning_content` populated? | Notes |
|---|---|---|
| Anthropic | Yes | Extracted from `thinking` block |
| DeepSeek | Yes | Extracted from thinking content |
| Google | Yes | Extracted from `thinking_config` response |
| Fireworks AI | Yes | Extracted from `<think>` tags in content (Qwen3 models) |
| OpenAI | No | Reasoning is internal; not exposed in response |

Fireworks AI Qwen3 models populate `reasoning_content` by default, without `extensions.thinking`.
## Use provider passthrough

For provider-specific features beyond thinking, use provider-keyed extensions. Auriko forwards these as-is after security sanitization:

```python
import os

from auriko import Client

client = Client(
    api_key=os.environ["AURIKO_API_KEY"],
    base_url="https://api.auriko.ai/v1",
)

response = client.chat.completions.create(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello!"}],
    extensions={
        "thinking": {"enabled": True, "budget_tokens": 10000},
        "anthropic": {
            "custom_metadata": {"session_id": "abc123"}
        },
    },
)
```
Auriko normalizes provider aliases automatically. The aliases `google`, `google_ai`, `googleai`, and `gemini` all map to `google_ai_studio`.
### Precedence

When both normalized features and provider passthrough contain the same field, the provider passthrough wins. For example, if you set `extensions.thinking.budget_tokens: 10000` and `extensions.anthropic.thinking.budget_tokens: 15000`, Anthropic receives `15000`.
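This precedence behaves like a deep merge where passthrough values overwrite normalized ones. A minimal sketch of that merge rule (the function is illustrative, not Auriko's internals):

```python
def merge_with_passthrough(normalized: dict, passthrough: dict) -> dict:
    """Deep-merge two payloads; passthrough values win on conflicts."""
    merged = dict(normalized)
    for key, value in passthrough.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_with_passthrough(merged[key], value)
        else:
            merged[key] = value
    return merged

# Normalized thinking config vs. an Anthropic passthrough override:
normalized = {"thinking": {"type": "enabled", "budget_tokens": 10000}}
passthrough = {"thinking": {"budget_tokens": 15000}}
# merge_with_passthrough(normalized, passthrough) keeps type="enabled"
# but Anthropic receives budget_tokens=15000
```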
### Security filtering

Auriko blocks authentication-related keys (`api_key`, `authorization`, `token`, etc.) at all nesting levels in passthrough extensions. Auriko also blocks core request fields (`model`, `messages`, `temperature`, etc.) at the top level to prevent routing bypass.
## Cost and latency

Thinking tokens count toward output tokens and increase both cost and latency. Use `budget_tokens` to cap the reasoning budget for your use case. For cost-sensitive workloads, see Cost optimization. See Check reasoning token availability for which providers report a breakdown.
## Check reasoning token availability

The `completion_tokens_details.reasoning_tokens` field reports how many tokens the model spent on reasoning. Auriko passes through what the upstream provider reports.

| Provider | Model examples | `reasoning_tokens` reported? | Notes |
|---|---|---|---|
| OpenAI | o1, o3, o4-mini | Yes | Native field |
| DeepSeek | deepseek-v3.2-thinking | Yes | Native field (routed to DeepSeek API) |
| xAI | grok-4-fast-reasoning | Yes | Native field |
| Google | Gemini 2.5 Flash | Yes | Mapped from `thoughtsTokenCount` |
| Anthropic | All Claude models | No | Upstream returns combined `output_tokens` only |
| Moonshot | kimi-k2-thinking, kimi-k2-thinking-turbo | No | Upstream doesn't include token details |
| Fireworks | deepseek-v3.2 | No | Upstream doesn't include token details for hosted models |

When the provider doesn't report a reasoning token breakdown, Auriko doesn't include `completion_tokens_details` in the response.

Check for the field before accessing it:

```python
if response.usage.completion_tokens_details:
    print(f"Reasoning: {response.usage.completion_tokens_details.reasoning_tokens}")
```
When `completion_tokens_details` isn't available, `completion_tokens` reflects the combined total of reasoning and content tokens. You can still use it for cost tracking.
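A defensive accessor that handles both cases can look like this. The helper is a sketch; the `SimpleNamespace` objects stand in for real SDK usage objects:

```python
from types import SimpleNamespace

def reasoning_token_breakdown(usage) -> tuple:
    """Return (completion_tokens, reasoning_tokens or None if not reported)."""
    details = getattr(usage, "completion_tokens_details", None)
    reasoning = getattr(details, "reasoning_tokens", None) if details else None
    return usage.completion_tokens, reasoning

# Stand-in usage objects for illustration (real ones come from response.usage)
with_details = SimpleNamespace(
    completion_tokens=120,
    completion_tokens_details=SimpleNamespace(reasoning_tokens=80),
)
without_details = SimpleNamespace(completion_tokens=50, completion_tokens_details=None)
```

Either way, `completion_tokens` remains usable for cost tracking; the second tuple element simply tells you whether a reasoning breakdown was reported.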