Access provider-specific features like thinking tokens through a normalized interface. Auriko translates a single `extensions.thinking` configuration into provider-native formats automatically.
## Prerequisites

- An Auriko API key
- Python 3.10+ with the `auriko` SDK installed (`pip install auriko`), or Node.js 18+ with `@auriko/sdk` installed (`npm install @auriko/sdk`)
- A model that supports reasoning (Claude 3.5+, o1, o3, o4-mini, DeepSeek R1, Gemini 2.0 Flash Thinking)
## Enable thinking

Pass `extensions.thinking` in your request to enable extended reasoning:

```python
import os

from auriko import Client

client = Client(
    api_key=os.environ["AURIKO_API_KEY"],
    base_url="https://api.auriko.ai/v1",
)

response = client.chat.completions.create(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Solve this step by step: what is 23! / 20!?"}],
    extensions={"thinking": {"enabled": True, "budget_tokens": 10000}},
)

print(response.choices[0].message.content)
```
## Check provider support

Auriko translates `extensions.thinking` into provider-native formats:

| Provider | Models | Translation |
|---|---|---|
| Anthropic | Claude 3.5+, Claude 4 | `thinking: {type: "enabled", budget_tokens: <value>}` — budget passed directly |
| OpenAI | o1, o3, o4-mini | `reasoning_effort: "low"` / `"medium"` / `"high"` — mapped from `budget_tokens` thresholds |
| DeepSeek | R1 | `thinking: {enabled: true, max_tokens: <value>}` — budget passed directly |
| Google AI Studio | Gemini 2.0 Flash Thinking | `thinking_config: {thinking_budget: <value>}` — budget passed directly |
| Other providers | Varies | OpenAI-compatible `reasoning_effort` format (default translator) |
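For the providers that take the budget directly, the translation in the table above can be sketched as follows. The payload shapes come from the table; the function itself is a hypothetical illustration, not Auriko's actual internals:

```python
def native_thinking_payload(provider: str, budget_tokens: int) -> dict:
    """Translate a normalized thinking budget into a provider-native payload.

    Illustrative only: covers the providers that accept a token budget
    directly, per the translation table above.
    """
    if provider == "anthropic":
        return {"thinking": {"type": "enabled", "budget_tokens": budget_tokens}}
    if provider == "deepseek":
        return {"thinking": {"enabled": True, "max_tokens": budget_tokens}}
    if provider == "google_ai_studio":
        return {"thinking_config": {"thinking_budget": budget_tokens}}
    raise ValueError(f"no direct budget translation for {provider!r}")
```

Providers without a direct budget field (OpenAI and the default translator) instead receive a `reasoning_effort` level, as described in the next section.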
## OpenAI budget mapping

Since OpenAI uses discrete `reasoning_effort` levels instead of a token budget, Auriko maps `budget_tokens` to the appropriate level:

| Budget tokens | Reasoning effort |
|---|---|
| < 5,000 | `low` |
| 5,000 – 14,999 | `medium` |
| ≥ 15,000 | `high` |

If `budget_tokens` is omitted, the default is 8,000 (maps to `medium`).
## Read thinking output

When a model supports reasoning, the thinking output appears in the `reasoning_content` field on the response message:

```python
response = client.chat.completions.create(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Solve step by step: what is 23! / 20!?"}],
    extensions={"thinking": {"enabled": True, "budget_tokens": 10000}},
)

# Access the reasoning (if the model returns it)
if response.choices[0].message.reasoning_content:
    print(f"Reasoning: {response.choices[0].message.reasoning_content}")

print(f"Answer: {response.choices[0].message.content}")
```
### Providers with reasoning_content

| Provider | `reasoning_content` populated? | Notes |
|---|---|---|
| Anthropic | Yes | Extracted from `thinking` block |
| DeepSeek | Yes | Extracted from thinking content |
| Google | Yes | Extracted from `thinking_config` response |
| Fireworks AI | Yes | Extracted from `<think>` tags in content (Qwen3 models) |
| OpenAI | No | Reasoning is internal; not exposed in response |

Fireworks AI Qwen3 models populate `reasoning_content` by default, without `extensions.thinking`.
## Use provider passthrough

For provider-specific features beyond thinking, use provider-keyed extensions. Auriko forwards these as-is after security sanitization:

```python
import os

from auriko import Client

client = Client(
    api_key=os.environ["AURIKO_API_KEY"],
    base_url="https://api.auriko.ai/v1",
)

response = client.chat.completions.create(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello!"}],
    extensions={
        "thinking": {"enabled": True, "budget_tokens": 10000},
        "anthropic": {
            "custom_metadata": {"session_id": "abc123"}
        },
    },
)
```
Auriko normalizes provider aliases automatically. The aliases `google`, `google_ai`, `googleai`, and `gemini` all map to `google_ai_studio`.
### Precedence

When both normalized features and provider passthrough contain the same field, the provider passthrough wins. For example, if you set `extensions.thinking.budget_tokens: 10000` and `extensions.anthropic.thinking.budget_tokens: 15000`, Anthropic receives `15000`.
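This precedence behaves like a deep merge where passthrough values overwrite normalized ones. A minimal sketch of that merge rule (the function is illustrative, not Auriko's internals):

```python
def merge_with_passthrough(normalized: dict, passthrough: dict) -> dict:
    """Deep-merge two payloads; passthrough values win on conflicts."""
    merged = dict(normalized)
    for key, value in passthrough.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_with_passthrough(merged[key], value)
        else:
            merged[key] = value
    return merged

# Normalized thinking config vs. an Anthropic passthrough override:
normalized = {"thinking": {"type": "enabled", "budget_tokens": 10000}}
passthrough = {"thinking": {"budget_tokens": 15000}}
# merge_with_passthrough(normalized, passthrough) keeps type="enabled"
# but Anthropic receives budget_tokens=15000
```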
### Security filtering

Auriko blocks authentication-related keys (`api_key`, `authorization`, `token`, etc.) at all nesting levels in passthrough extensions. Auriko also blocks core request fields (`model`, `messages`, `temperature`, etc.) at the top level to prevent routing bypass.
## Cost and latency

Thinking tokens count toward output tokens and increase both cost and latency. Use `budget_tokens` to cap the reasoning budget for your use case. For cost-sensitive workloads, see Cost optimization. See Check reasoning token availability for which providers report a breakdown.
## Check reasoning token availability

The `completion_tokens_details.reasoning_tokens` field reports how many tokens the model spent on reasoning. Auriko passes through what the upstream provider reports.

| Provider | Model examples | `reasoning_tokens` reported? | Notes |
|---|---|---|---|
| OpenAI | o1, o3, o4-mini | Yes | Native field |
| DeepSeek | deepseek-v3.2-thinking | Yes | Native field (routed to DeepSeek API) |
| xAI | grok-4-fast-reasoning | Yes | Native field |
| Google | Gemini 2.5 Flash | Yes | Mapped from `thoughtsTokenCount` |
| Anthropic | All Claude models | No | Upstream returns combined `output_tokens` only |
| Moonshot | kimi-k2-thinking, kimi-k2-thinking-turbo | No | Upstream doesn't include token details |
| Fireworks | deepseek-v3.2 | No | Upstream doesn't include token details for hosted models |

When the provider doesn't report a reasoning token breakdown, Auriko doesn't include `completion_tokens_details` in the response.

Check for the field before accessing it:

```python
if response.usage.completion_tokens_details:
    print(f"Reasoning: {response.usage.completion_tokens_details.reasoning_tokens}")
```
When `completion_tokens_details` isn't available, `completion_tokens` reflects the combined total of reasoning and content tokens. You can still use it for cost tracking.
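A defensive accessor that handles both cases can look like this. The helper is a sketch; the `SimpleNamespace` objects stand in for real SDK usage objects:

```python
from types import SimpleNamespace

def reasoning_token_breakdown(usage) -> tuple:
    """Return (completion_tokens, reasoning_tokens or None if not reported)."""
    details = getattr(usage, "completion_tokens_details", None)
    reasoning = getattr(details, "reasoning_tokens", None) if details else None
    return usage.completion_tokens, reasoning

# Stand-in usage objects for illustration (real ones come from response.usage)
with_details = SimpleNamespace(
    completion_tokens=120,
    completion_tokens_details=SimpleNamespace(reasoning_tokens=80),
)
without_details = SimpleNamespace(completion_tokens=50, completion_tokens_details=None)
```

Either way, `completion_tokens` remains usable for cost tracking; the second tuple element simply tells you whether a reasoning breakdown was reported.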