Access provider-specific features like thinking tokens through a normalized interface. Auriko translates a single extensions.thinking configuration into provider-native formats automatically.

Prerequisites

  • An Auriko API key
  • Python 3.10+ with the auriko SDK installed (pip install auriko), or Node.js 18+ with @auriko/sdk installed (npm install @auriko/sdk)
  • A model that supports reasoning (Claude 3.5+, o1, o3, o4-mini, DeepSeek R1, Gemini 2.0 Flash Thinking)

Enable thinking

Pass extensions.thinking in your request to enable extended reasoning:
import os
from auriko import Client

client = Client(
    api_key=os.environ["AURIKO_API_KEY"],
    base_url="https://api.auriko.ai/v1"
)

response = client.chat.completions.create(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Solve this step by step: what is 23! / 20!?"}],
    extensions={"thinking": {"enabled": True, "budget_tokens": 10000}}
)
print(response.choices[0].message.content)

Check provider support

Auriko translates extensions.thinking into provider-native formats:
  • Anthropic (Claude 3.5+, Claude 4): thinking: {type: "enabled", budget_tokens: <value>} — budget passed directly
  • OpenAI (o1, o3, o4-mini): reasoning_effort: "low" / "medium" / "high" — mapped from budget_tokens thresholds
  • DeepSeek (R1): thinking: {enabled: true, max_tokens: <value>} — budget passed directly
  • Google AI Studio (Gemini 2.0 Flash Thinking): thinking_config: {thinking_budget: <value>} — budget passed directly
  • Other providers (varies): OpenAI-compatible reasoning_effort format (default translator)

OpenAI budget mapping

Since OpenAI uses discrete reasoning_effort levels instead of a token budget, Auriko maps budget_tokens to the appropriate level:
  • < 5,000 → low
  • 5,000 – 14,999 → medium
  • >= 15,000 → high
If budget_tokens is omitted, the default is 8,000 (maps to medium).

Read thinking output

When a model supports reasoning, the thinking output appears in the reasoning_content field on the response message:
response = client.chat.completions.create(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Solve step by step: what is 23! / 20!?"}],
    extensions={"thinking": {"enabled": True, "budget_tokens": 10000}}
)

# Access the reasoning (if the model returns it)
if response.choices[0].message.reasoning_content:
    print(f"Reasoning: {response.choices[0].message.reasoning_content}")
print(f"Answer: {response.choices[0].message.content}")

Providers with reasoning_content

  • Anthropic: yes (extracted from the thinking block)
  • DeepSeek: yes (extracted from the thinking content)
  • Google: yes (extracted from the thinking_config response)
  • Fireworks AI: yes (extracted from <think> tags in content, Qwen3 models)
  • OpenAI: no (reasoning is internal; not exposed in the response)

Fireworks AI Qwen3 models populate reasoning_content by default, without extensions.thinking.

Use provider passthrough

For provider-specific features beyond thinking, use provider-keyed extensions. Auriko forwards these as-is after security sanitization:
import os
from auriko import Client

client = Client(
    api_key=os.environ["AURIKO_API_KEY"],
    base_url="https://api.auriko.ai/v1"
)

response = client.chat.completions.create(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello!"}],
    extensions={
        "thinking": {"enabled": True, "budget_tokens": 10000},
        "anthropic": {
            "custom_metadata": {"session_id": "abc123"}
        }
    }
)
Auriko normalizes provider aliases automatically. The aliases google, google_ai, googleai, and gemini all map to google_ai_studio.
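The alias table above can be pictured as a simple lookup. The alias list comes from the documentation; the function itself is an illustrative sketch, not Auriko's implementation:

```python
# Aliases documented above; all resolve to the canonical provider key.
_PROVIDER_ALIASES = {
    "google": "google_ai_studio",
    "google_ai": "google_ai_studio",
    "googleai": "google_ai_studio",
    "gemini": "google_ai_studio",
}

def canonical_provider(name: str) -> str:
    """Resolve a provider alias to its canonical key (case-insensitive)."""
    key = name.lower()
    return _PROVIDER_ALIASES.get(key, key)
```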

Precedence

When both normalized features and provider passthrough contain the same field, the provider passthrough wins. For example, if you set extensions.thinking.budget_tokens: 10000 and extensions.anthropic.thinking.budget_tokens: 15000, Anthropic receives 15000.
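One way to picture this precedence is a deep merge in which provider-keyed values overwrite normalized ones. This is a sketch of the merge semantics, not Auriko's actual code; the example values mirror the scenario above:

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Merge override into base; override wins on conflicts,
    and nested dicts are merged recursively."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# Normalized thinking config, already translated to Anthropic's format:
normalized = {"thinking": {"type": "enabled", "budget_tokens": 10000}}
# Provider passthrough for the same field:
passthrough = {"thinking": {"budget_tokens": 15000}}
# The passthrough budget wins; other normalized fields survive the merge.
```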

Security filtering

Auriko blocks authentication-related keys (api_key, authorization, token, etc.) at all nesting levels in passthrough extensions. Auriko also blocks core request fields (model, messages, temperature, etc.) at the top level to prevent routing bypass.
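A minimal sketch of this two-tier filtering, assuming the key lists named above (the block lists shown here are abbreviated examples, and the function is illustrative rather than Auriko's implementation):

```python
BLOCKED_ANYWHERE = {"api_key", "authorization", "token"}   # auth keys, any depth
BLOCKED_TOP_LEVEL = {"model", "messages", "temperature"}   # core request fields

def sanitize(ext: dict, top_level: bool = True) -> dict:
    """Drop blocked keys from a passthrough extensions dict.

    Auth-related keys are removed at every nesting level; core request
    fields are removed only at the top level.
    """
    clean = {}
    for key, value in ext.items():
        if key in BLOCKED_ANYWHERE:
            continue
        if top_level and key in BLOCKED_TOP_LEVEL:
            continue
        clean[key] = sanitize(value, top_level=False) if isinstance(value, dict) else value
    return clean
```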

Cost and latency

Thinking tokens count toward output tokens and increase both cost and latency. Use budget_tokens to cap the reasoning budget for your use case. For cost-sensitive workloads, see Cost optimization. See Check reasoning token availability for which providers report a breakdown.

Check reasoning token availability

The completion_tokens_details.reasoning_tokens field reports how many tokens the model spent on reasoning. Auriko passes through what the upstream provider reports.
  • OpenAI (o1, o3, o4-mini): yes (native field)
  • DeepSeek (deepseek-v3.2-thinking): yes (native field; routed to DeepSeek API)
  • xAI (grok-4-fast-reasoning): yes (native field)
  • Google (Gemini 2.5 Flash): yes (mapped from thoughtsTokenCount)
  • Anthropic (all Claude models): no (upstream returns combined output_tokens only)
  • Moonshot (kimi-k2-thinking, kimi-k2-thinking-turbo): no (upstream doesn't include token details)
  • Fireworks (deepseek-v3.2): no (upstream doesn't include token details for hosted models)
When the provider doesn’t report a reasoning token breakdown, Auriko doesn’t include completion_tokens_details in the response. Check for the field before accessing it:
if response.usage.completion_tokens_details:
    print(f"Reasoning: {response.usage.completion_tokens_details.reasoning_tokens}")
When completion_tokens_details isn’t available, completion_tokens reflects the combined total of reasoning and content tokens. You can still use it for cost tracking.
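For cost tracking across providers, a small helper can handle both cases. This is an illustrative sketch; the attribute names match the response shape shown above, and SimpleNamespace stands in for a real usage object:

```python
from types import SimpleNamespace

def token_breakdown(usage) -> dict:
    """Split completion tokens into reasoning vs. content when the provider
    reports a breakdown; otherwise only the combined total is known."""
    details = getattr(usage, "completion_tokens_details", None)
    if details is None:
        return {"total": usage.completion_tokens, "reasoning": None, "content": None}
    return {
        "total": usage.completion_tokens,
        "reasoning": details.reasoning_tokens,
        "content": usage.completion_tokens - details.reasoning_tokens,
    }

# Example with a provider that reports the breakdown:
usage = SimpleNamespace(
    completion_tokens=1200,
    completion_tokens_details=SimpleNamespace(reasoning_tokens=900),
)
```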