Add reasoning_effort to your request. Auriko translates it into each provider’s native format. Use extensions keyed by provider name to pass through provider-specific parameters like Anthropic’s metadata or Google’s safety_settings.

Prerequisites

  • An Auriko API key
  • Python 3.10+ with the OpenAI SDK (pip install openai) or the auriko SDK (pip install auriko)
    • OR Node.js 18+ with the OpenAI SDK (npm install openai) or @auriko/sdk (npm install @auriko/sdk)
  • A model that supports reasoning (see provider support table)

Enable thinking

Pass reasoning_effort in your request to control extended reasoning:
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["AURIKO_API_KEY"],
    base_url="https://api.auriko.ai/v1"
)

response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Solve this step by step: what is 23! / 20!?"}],
    extra_body={"reasoning_effort": "high"}
)
print(response.choices[0].message.content)

Check provider support

Auriko translates reasoning_effort for each provider:
| Provider | Models | Behavior |
| --- | --- | --- |
| Anthropic | Claude 4.6 (Opus, Sonnet) | Adaptive thinking with effort control |
| Anthropic | Claude 4.5 Opus | Thinking budget + effort control |
| Anthropic | Claude 4.5 Sonnet/Haiku | Thinking budget derived from effort level |
| OpenAI | o3, o4-mini, GPT-5 | Native reasoning_effort (dropped when tools present on GPT-5.4+) |
| Google | Gemini 3.x | Thinking level (low/medium/high) |
| Google | Gemini 2.5 Flash/Pro | Thinking budget derived from effort level |
| DeepSeek | V4 Flash, V4 Pro | Thinking budget derived from effort level |
| xAI | Grok 3 mini, Grok 4.3 | Native reasoning_effort (low/high on Grok 3 mini, low/medium/high on Grok 4.3) |
| MiniMax | M2 series | Built-in reasoning; reasoning_effort dropped |
| Moonshot | Kimi K2.5, Kimi K2.6 | Native reasoning_effort |
Non-reasoning models (e.g. GPT-4o, GPT-4.1, Llama) reject reasoning_effort with 400 reasoning_not_supported. The one exception is GPT-5.4+ with tools: Auriko drops reasoning_effort to prevent an upstream 400.
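One way to cope with the 400 reasoning_not_supported rejection is to retry once without reasoning_effort. The sketch below is only an illustration of that idea, not documented Auriko client behavior: both helper names are hypothetical, and it assumes the error code string can be read from the error response, whose exact shape isn't specified on this page.

```python
from typing import Any, Optional


def is_reasoning_not_supported(status_code: int, error_code: Optional[str]) -> bool:
    """Return True for the documented 400 reasoning_not_supported rejection."""
    # Assumption: the caller has already extracted the error code string.
    return status_code == 400 and error_code == "reasoning_not_supported"


def without_reasoning_effort(extra_body: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of extra_body with reasoning_effort removed.

    Other keys (for example extensions) are kept as they are.
    """
    return {key: value for key, value in extra_body.items() if key != "reasoning_effort"}
```

With the OpenAI SDK, a 400 surfaces as openai.BadRequestError, which exposes status_code. Checking it with is_reasoning_not_supported and retrying once with without_reasoning_effort(extra_body) is one possible pattern.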

Read thinking output

Some providers surface the model’s chain-of-thought in the reasoning_content field on the response message:
response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Solve step by step: what is 23! / 20!?"}],
    extra_body={"reasoning_effort": "high"}
)

msg = response.choices[0].message
reasoning = getattr(msg, "reasoning_content", None)
if reasoning:
    print(f"Reasoning: {reasoning}")
print(f"Answer: {msg.content}")
Not all reasoning models populate reasoning_content, so check before accessing. OpenAI keeps reasoning internal, and other providers vary by model.

Preserve reasoning across turns

Some providers return reasoning context that you echo back for multi-turn continuity. Anthropic and Google use structured reasoning blocks with cryptographic signatures, while DeepSeek uses a plain-text reasoning_content field. To preserve context, include the relevant fields from the assistant response in your next request.

Read structured reasoning

response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Analyze this problem step by step."}],
    extra_body={"reasoning_effort": "high"}
)

msg = response.choices[0].message
reasoning = getattr(msg, "reasoning", None)
if reasoning:
    for block in reasoning:
        if block.get("type") == "thinking":
            print(f"Thinking: {block['thinking'][:80]}...")
            print(f"Signature: {block['signature'][:20]}...")
        elif block.get("type") == "redacted":
            print("Redacted block (encrypted)")
Each block has a type:
  • thinking: contains thinking (the reasoning text) and signature (cryptographic signature)
  • redacted: contains data (encrypted, opaque to the client)

Round-trip reasoning

To continue a multi-turn conversation with reasoning context, include the full assistant message (with reasoning) in your next request:
messages = [
    {"role": "user", "content": "What are the trade-offs of microservices vs monoliths?"},
]

first = client.chat.completions.create(
    model="claude-sonnet-4-6", messages=messages,
    extra_body={"reasoning_effort": "high"}
)

assistant_msg = first.choices[0].message
messages.append(assistant_msg.model_dump(exclude_none=True))
messages.append({"role": "user", "content": "Now apply that analysis to a 5-person startup."})

second = client.chat.completions.create(
    model="claude-sonnet-4-6", messages=messages,
    extra_body={"reasoning_effort": "high"}
)

DeepSeek reasoning content

DeepSeek models return reasoning as a plain reasoning_content string instead of structured reasoning blocks. For multi-turn conversations with DeepSeek, include reasoning_content on assistant messages you send back. To preserve it, serialize the full message object:
first = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Explain quantum entanglement step by step."}],
    extra_body={"reasoning_effort": "high"}
)

msg = first.choices[0].message
messages = [
    {"role": "user", "content": "Explain quantum entanglement step by step."},
    msg.model_dump(exclude_none=True),  # preserves reasoning_content
    {"role": "user", "content": "Now explain it to a five-year-old."},
]

second = client.chat.completions.create(
    model="deepseek-v4-flash", messages=messages,
    extra_body={"reasoning_effort": "high"}
)
If you construct assistant messages manually and omit reasoning_content, Auriko sets it to an empty string. To keep the reasoning context, echo back the original value from the response instead.
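If you do build assistant messages by hand, a small helper can make it harder to forget the field. This is a minimal sketch; assistant_turn is a hypothetical name, and the message shape simply mirrors the fields described above.

```python
from typing import Any, Optional


def assistant_turn(content: str, reasoning_content: Optional[str] = None) -> dict[str, Any]:
    """Build an assistant message dict, keeping reasoning_content when it is known."""
    message: dict[str, Any] = {"role": "assistant", "content": content}
    if reasoning_content is not None:
        # Echo the value from the earlier response back unchanged.
        message["reasoning_content"] = reasoning_content
    return message
```

For example, assistant_turn(msg.content, getattr(msg, "reasoning_content", None)) would carry the original reasoning into the next request.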

Stream reasoning fields

When streaming with extended thinking, two additional delta fields carry reasoning block data:
  • delta.reasoning_signature: cryptographic signature for the current thinking block
  • delta.reasoning_redacted_data: encrypted data for a redacted thinking block (complete in one event)
These appear alongside delta.reasoning_content (the incremental reasoning text).
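The three delta fields can be folded back into reasoning blocks on the client. The sketch below is one possible way to do that, under assumptions this page doesn't confirm: each delta is handled as a plain dict, the signature for a thinking block arrives after that block's text, and the rebuilt blocks reuse the thinking/redacted shape from Read structured reasoning. collect_reasoning_blocks is a hypothetical helper name.

```python
from typing import Any, Iterable


def collect_reasoning_blocks(deltas: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Assemble streamed reasoning deltas into a list of reasoning blocks."""
    blocks: list[dict[str, Any]] = []
    text_parts: list[str] = []

    for delta in deltas:
        # Incremental reasoning text for the current thinking block.
        text = delta.get("reasoning_content")
        if text:
            text_parts.append(text)

        # Assumption: a signature closes the thinking block collected so far.
        signature = delta.get("reasoning_signature")
        if signature:
            blocks.append({
                "type": "thinking",
                "thinking": "".join(text_parts),
                "signature": signature,
            })
            text_parts = []

        # Redacted data arrives complete in a single event.
        redacted = delta.get("reasoning_redacted_data")
        if redacted:
            blocks.append({"type": "redacted", "data": redacted})

    if text_parts:
        # Leftover text with no signature, e.g. plain reasoning_content streams.
        blocks.append({"type": "thinking", "thinking": "".join(text_parts)})
    return blocks
```

With the OpenAI SDK, the dicts could come from chunk.choices[0].delta.model_dump() on each streamed chunk.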

Use provider passthrough

For provider-specific features beyond reasoning effort, use provider-keyed extensions. Auriko forwards these to the provider:
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["AURIKO_API_KEY"],
    base_url="https://api.auriko.ai/v1"
)

response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={
        "reasoning_effort": "high",
        "extensions": {
            "anthropic": {
                "metadata": {"user_id": "user-123"}
            }
        }
    }
)
Auriko normalizes provider aliases: google, google_ai, googleai, and gemini are interchangeable.

Transform-controlled fields

If you set reasoning_effort, Auriko controls each provider’s thinking budget. Thinking-budget parameters in extensions are overwritten. If you don’t set reasoning_effort, your passthrough values are preserved.

Passthrough fields

Fields that aren’t transform-controlled pass through to the provider unchanged. Examples:
  • Anthropic: metadata
  • OpenAI: store, metadata
  • Google Gemini: safety_settings
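As an illustration of the passthrough shape, the sketch below builds an extra_body payload for Google's safety_settings. with_extensions is a hypothetical helper, and the entry format (category and threshold strings) follows Google's own API naming; that Auriko forwards it in exactly this shape is an assumption.

```python
from typing import Any, Optional


def with_extensions(provider: str, params: dict[str, Any],
                    reasoning_effort: Optional[str] = None) -> dict[str, Any]:
    """Build an extra_body dict with provider-keyed extensions."""
    extra_body: dict[str, Any] = {"extensions": {provider: dict(params)}}
    if reasoning_effort is not None:
        extra_body["reasoning_effort"] = reasoning_effort
    return extra_body


# Assumed Google-style safety setting; adjust to the provider's documented values.
extra_body = with_extensions(
    "google",
    {"safety_settings": [
        {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_ONLY_HIGH"}
    ]},
)
```

The resulting dict can be passed as extra_body to client.chat.completions.create, the same way as in the Anthropic example above.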

Handle sampling constraints

On Anthropic models, temperature, top_p, and top_k are incompatible with active thinking. If you send reasoning_effort alongside these parameters, Auriko drops the incompatible values and returns a warning in routing_metadata.warnings:
{
  "type": "unsupported_parameter",
  "code": "temperature",
  "message": "temperature dropped — incompatible with thinking on anthropic (must be exactly 1 or unset)"
}
Anthropic’s constraints when thinking is active:
| Parameter | Constraint |
| --- | --- |
| temperature | Must be exactly 1, or omitted |
| top_p | Must be >= 0.95, or omitted |
| top_k | Must be omitted |
Values within these bounds pass through unchanged. Other providers don’t enforce these constraints.
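The constraints above can also be checked on the client before sending a request. The following is a minimal sketch of such a pre-check; incompatible_sampling_params is a hypothetical helper that mirrors the documented bounds and is not part of any SDK.

```python
from typing import Optional


def incompatible_sampling_params(temperature: Optional[float] = None,
                                 top_p: Optional[float] = None,
                                 top_k: Optional[int] = None) -> list[str]:
    """List the sampling parameters that fall outside Anthropic's thinking bounds."""
    dropped: list[str] = []
    if temperature is not None and temperature != 1:
        dropped.append("temperature")  # must be exactly 1, or omitted
    if top_p is not None and top_p < 0.95:
        dropped.append("top_p")  # must be >= 0.95, or omitted
    if top_k is not None:
        dropped.append("top_k")  # must be omitted
    return dropped
```

An empty list means the values are within bounds, so no warning would be expected for them.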

Check effort normalization

Some models support only a subset of reasoning_effort levels. If you request a level above the model’s maximum, Auriko normalizes it to the highest supported value and includes a warning in routing_metadata.warnings:
{
  "type": "unsupported_parameter",
  "code": "reasoning_effort",
  "message": "reasoning_effort adjusted to 'high' — exceeds model maximum on openai"
}
| Provider | Models affected | xhigh/max normalized to |
| --- | --- | --- |
| OpenAI | GPT-5, GPT-5 mini, o3-pro | high |
| Anthropic | Claude Opus 4.5 | high |
| xAI | Grok 4.3 | high |
| xAI | Grok 3 mini | high |
| Google | Gemini 3.x | high |
Models not listed above accept xhigh and max without a warning. For the full provider support table, see Check provider support.
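To detect a normalization programmatically, you can filter the warnings by code. The sketch below assumes the response body has been converted to a dict (for example with response.model_dump()) and that routing_metadata sits at its top level; both the location and the helper name warnings_for are assumptions.

```python
from typing import Any


def warnings_for(response_body: dict[str, Any], code: str) -> list[dict[str, Any]]:
    """Return routing_metadata warnings whose code matches, or an empty list."""
    routing = response_body.get("routing_metadata") or {}
    warnings = routing.get("warnings") or []
    return [warning for warning in warnings if warning.get("code") == code]
```

Calling warnings_for(body, "reasoning_effort") returns a non-empty list when the requested level was adjusted or dropped.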

Handle max_tokens constraints

Anthropic models that use thinking budgets require max_tokens above 1024. If you send reasoning_effort with max_tokens at or below 1024, Auriko skips thinking and returns a warning in routing_metadata.warnings:
{
  "type": "unsupported_parameter",
  "code": "reasoning_effort",
  "message": "reasoning_effort dropped — max_tokens (200) is below the 1025 minimum required for thinking on anthropic"
}
Claude 4.6+ models use adaptive thinking rather than thinking budgets. For the full model list, see Check provider support.
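A client-side guard can flag requests where thinking would be skipped. This is a small sketch based on the 1025 minimum quoted in the warning message; the helper name is hypothetical, and treating an unset max_tokens as safe is an assumption.

```python
from typing import Optional

# Minimum max_tokens quoted in the warning message for budget-based Anthropic models.
THINKING_MIN_MAX_TOKENS = 1025


def thinking_would_be_skipped(max_tokens: Optional[int]) -> bool:
    """Return True when max_tokens is set but below the thinking minimum."""
    # Assumption: an unset max_tokens does not trigger the constraint.
    return max_tokens is not None and max_tokens < THINKING_MIN_MAX_TOKENS
```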

Estimate cost and latency

The reasoning_effort level (low/medium/high/xhigh/max) determines the thinking budget per provider. Exact token budgets aren’t guaranteed; reasoning_effort="off" disables thinking on supported models. See Check reasoning token availability for which providers report a breakdown.
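Because budgets aren't guaranteed, measuring is often more reliable than estimating. The sketch below times a caller-supplied function at several effort levels; time_efforts and the run callback are hypothetical names, and the default levels are just the subset that every listed provider appears to accept.

```python
import time
from typing import Any, Callable, Sequence


def time_efforts(run: Callable[[str], Any],
                 levels: Sequence[str] = ("low", "medium", "high")) -> dict[str, float]:
    """Call run(level) once per effort level and record wall-clock seconds."""
    timings: dict[str, float] = {}
    for level in levels:
        start = time.perf_counter()
        run(level)  # e.g. a function that sends one request at this effort level
        timings[level] = time.perf_counter() - start
    return timings
```

Here run could wrap client.chat.completions.create with extra_body={"reasoning_effort": level}. A single sample per level is noisy, so repeat the measurement before drawing conclusions.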

Check reasoning token availability

The completion_tokens_details.reasoning_tokens field reports how many tokens the model spent on reasoning. Auriko passes through what the upstream provider reports.
| Provider | Model examples | reasoning_tokens reported? | Notes |
| --- | --- | --- | --- |
| OpenAI | o1, o3, o4-mini | Yes | Native field |
| DeepSeek | deepseek-v4-flash, deepseek-v4-pro | Yes | Native field |
| xAI | grok-4-fast-reasoning | Yes | Native field |
| Google | Gemini 2.5 Flash | Yes | Derived from provider token counts |
| Anthropic | All Claude models | No | Reports combined output tokens only |
| Moonshot | kimi-k2-thinking, kimi-k2-thinking-turbo | No | Token breakdown not reported |
| Fireworks | deepseek-v3.2 | No | Token breakdown not reported for hosted models |
When the provider doesn’t report a reasoning token breakdown, Auriko doesn’t include completion_tokens_details in the response. Check for the field before accessing it:
if response.usage.completion_tokens_details:
    print(f"Reasoning: {response.usage.completion_tokens_details.reasoning_tokens}")
When completion_tokens_details isn’t available, completion_tokens reflects the combined total of reasoning and content tokens. You can still use it for cost tracking.