POST /v1/responses
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["AURIKO_API_KEY"],
    base_url="https://api.auriko.ai/v1"
)
response = client.responses.create(
    model="gpt-4o",
    input="Hello!"
)
print(response.output_text)

{
  "id": "<string>",
  "object": "<unknown>",
  "created_at": 123,
  "model": "<string>",
  "output": [
    {
      "type": "<unknown>",
      "id": "<string>",
      "role": "<unknown>",
      "content": [
        {
          "type": "<unknown>",
          "text": "<string>",
          "annotations": [
            "<unknown>"
          ],
          "logprobs": [
            "<unknown>"
          ]
        }
      ]
    }
  ],
  "output_text": "<string>",
  "parallel_tool_calls": true,
  "tool_choice": "<unknown>",
  "tools": [
    "<unknown>"
  ],
  "usage": {
    "input_tokens": 123,
    "output_tokens": 123,
    "total_tokens": 123,
    "input_tokens_details": {
      "cached_tokens": 123
    },
    "output_tokens_details": {
      "reasoning_tokens": 123
    }
  },
  "error": {
    "code": "<string>",
    "message": "<string>"
  },
  "incomplete_details": {
    "reason": "<string>"
  },
  "metadata": {},
  "routing_metadata": {
    "provider": "<string>",
    "provider_model_id": "<string>",
    "model_canonical": "<string>",
    "routing_strategy": "<string>",
    "ttft_ms": 123,
    "throughput_tps": 123,
    "cost": {
      "usd": 123,
      "cache_savings_percent": 123,
      "cache_savings_usd": 123
    },
    "warnings": [
      {
        "code": "<string>",
        "message": "<string>"
      }
    ]
  }
}

Auriko routes each request to the optimal provider based on your routing preferences, such as cost, latency, and throughput.
The Response API is in preview. The interface may change before GA.

Auriko extensions

Auriko adds these capabilities to the standard Response API:
  • Multi-model routing: Use gateway.models[] instead of model to route across multiple models
  • Routing options: Control provider selection with the gateway.routing object
  • Provider extensions: Pass provider-specific parameters with extensions
  • Cost transparency: Response includes routing_metadata with cost breakdown
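As a rough sketch of how these extensions might be sent from the OpenAI Python SDK: gateway is not a standard SDK argument, so one option is the SDK's extra_body parameter, which merges extra keys into the JSON request body. The helper name build_gateway_body is hypothetical, and treating gateway.models as a list of model ID strings is an assumption. The keys accepted inside gateway.routing are not documented on this page, so the helper forwards them untouched.

```python
from typing import Any, Optional


def build_gateway_body(
    models: list[str],
    routing: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Build the extra request fields for multi-model routing.

    Assumption: gateway.models is a list of model ID strings.
    """
    if not models:
        raise ValueError("gateway.models needs at least one model ID")
    gateway: dict[str, Any] = {"models": list(models)}
    if routing is not None:
        # Keys inside gateway.routing are not documented on this page,
        # so they are forwarded exactly as the caller supplied them.
        gateway["routing"] = dict(routing)
    return {"gateway": gateway}


body = build_gateway_body(["gpt-4o", "claude-sonnet-4-20250514"])
print(body)
# With the client from the example at the top of this page, the body could be
# sent as: client.responses.create(input="Hello!", extra_body=body)
# (whether model may be omitted depends on the installed SDK version).
```

Keeping the body construction in a plain function makes it easy to inspect the payload before any request is sent.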

Authorizations

Authorization
string
header
required

API key authentication. Keys start with the ak_ prefix. Example: Authorization: Bearer ak_live_xxxxxxxxxxxx
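For clients that do not use the OpenAI SDK (which sets this header from api_key automatically), the header could be built by hand. This is a minimal sketch; auth_headers is a hypothetical helper, and the prefix check only mirrors the ak_ convention stated above.

```python
def auth_headers(api_key: str) -> dict[str, str]:
    """Return the Authorization header for an Auriko API key."""
    if not api_key.startswith("ak_"):
        # Client-side sanity check based on the documented key prefix.
        raise ValueError("Auriko API keys are expected to start with 'ak_'")
    return {"Authorization": f"Bearer {api_key}"}


# Placeholder key copied from the documentation example, not a real key.
headers = auth_headers("ak_live_xxxxxxxxxxxx")
print(headers)
```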

Body

application/json

Request body for creating a response via the Response API.

model
string
required

Model ID to use (e.g., "gpt-4o", "claude-sonnet-4-20250514").

input
required

The input to generate a response for.

instructions
string

System instructions for the model.

tools
object[]

Tools available to the model.

A tool available to the model.

tool_choice

How the model should use tools.

Available options: auto, none, required

parallel_tool_calls
boolean

Whether the model can make multiple tool calls in parallel.

max_output_tokens
integer

Maximum number of output tokens.

temperature
number

Sampling temperature.

Required range: 0 <= x <= 2

top_p
number

Nucleus sampling parameter.

Required range: 0 <= x <= 1

top_k
integer

Top-k sampling parameter.

top_logprobs
integer

Number of top logprobs to return per token position. Requires provider logprobs support.

Required range: 0 <= x <= 20

stream
boolean

Whether to stream the response.

text
object

Text generation configuration.

reasoning
object

Reasoning/thinking configuration.

truncation
string

Truncation strategy for long inputs.

metadata
object

Arbitrary key-value metadata.

include
string[]

Additional data to include in the response.

user
string

End-user identifier for abuse detection.

store
enum<boolean>

Omit this field or set to false. Sending true returns 400.

Available options: false

gateway
object

Auriko gateway directives.

extensions
object

Auriko extensions for provider-specific passthrough.

For reasoning control, use the top-level reasoning parameter instead of extensions.

Provider Passthrough

Pass provider-specific parameters directly:

  • anthropic: Anthropic-specific parameters
  • openai: OpenAI-specific parameters
  • google: Google/Gemini-specific parameters
  • deepseek: DeepSeek-specific parameters

Passthrough parameters are forwarded as-is to the target provider.
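A passthrough payload might be assembled as follows. This is a sketch under assumptions: build_extensions is a hypothetical helper, the nesting extensions.<provider>.<parameter> is inferred from the list above, and example_param is a made-up placeholder, not a real provider parameter.

```python
from typing import Any

# Provider keys listed in the Provider Passthrough section.
PASSTHROUGH_PROVIDERS = {"anthropic", "openai", "google", "deepseek"}


def build_extensions(provider: str, params: dict[str, Any]) -> dict[str, Any]:
    """Wrap provider-specific parameters under the extensions object."""
    if provider not in PASSTHROUGH_PROVIDERS:
        raise ValueError(f"unknown passthrough provider: {provider!r}")
    # Parameters are copied unchanged, since they are forwarded as-is.
    return {"extensions": {provider: dict(params)}}


# "example_param" is a placeholder for illustration only.
body = build_extensions("anthropic", {"example_param": "value"})
print(body)
```

With the OpenAI Python SDK, a dictionary like this could be supplied through the extra_body argument of client.responses.create.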

prompt_cache_key
string

Key for prompt caching.

safety_identifier
string

Safety policy identifier.
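Some constraints documented in the Body section (the ranges for temperature, top_p, and top_logprobs, and the rule that store must be omitted or false) can be checked client-side before a request is sent. The function below is an illustrative sketch, not part of any Auriko SDK; validate_request_body is a hypothetical name.

```python
from typing import Any

# Inclusive ranges taken from the parameter descriptions above.
RANGES = {"temperature": (0, 2), "top_p": (0, 1), "top_logprobs": (0, 20)}


def validate_request_body(body: dict[str, Any]) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems: list[str] = []
    for field, (low, high) in RANGES.items():
        value = body.get(field)
        if value is not None and not (low <= value <= high):
            problems.append(f"{field} must satisfy {low} <= x <= {high}")
    if body.get("store") is True:
        problems.append("store must be omitted or false (true returns 400)")
    return problems


ok = validate_request_body({"model": "gpt-4o", "input": "Hello!", "temperature": 0.7})
bad = validate_request_body(
    {"model": "gpt-4o", "input": "Hello!", "temperature": 2.5, "store": True}
)
print(ok, bad)
```

The check is deliberately partial: the server remains the authority on what is accepted.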

Response

Successful response.

For non-streaming requests, returns a ResponseObject. For streaming (stream: true), returns Server-Sent Events in the Response API format: event: <type>\ndata: <json>\n\n.
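The event framing described above (an event: line, a data: line, then a blank line) can be parsed with a few lines of code. This is a simplified sketch that assumes every data payload is JSON; iter_sse_events is a hypothetical helper, and the event type in the sample is a made-up placeholder rather than a documented event name.

```python
import json
from typing import Any, Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[tuple[str, Any]]:
    """Yield (event_type, parsed_data) pairs from decoded SSE lines."""
    event_type = ""
    data_parts: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield event_type, json.loads("\n".join(data_parts))
            event_type, data_parts = "", []
        elif line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_parts.append(line[len("data:"):].strip())
        # Any other line (for example an SSE comment) is ignored.


# "example.event" is a placeholder event type for illustration.
sample = [
    "event: example.event\n",
    'data: {"value": 1}\n',
    "\n",
]
events = list(iter_sse_events(sample))
print(events)
```

In practice the OpenAI SDK handles this framing when stream=True is passed; a manual parser is mainly useful with a raw HTTP client.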

A completed Response API response.

id
string
required

Unique response identifier.

object
any
required

created_at
integer
required

Unix timestamp of creation.

model
string
required

Model used for generation.

status
enum<string>
required

Response status.

Available options: completed, failed, incomplete, in_progress

output
object[]
required

Output items generated by the model. Known types include message, function_call, and reasoning. Additional types from the provider (e.g., web_search_call, file_search_call) are passed through verbatim.

An output item from the model. Discriminated on type. Known types: message, function_call, reasoning. Unknown types from the provider are preserved verbatim.
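Because output items are discriminated on type and unknown types are preserved, client code may want to separate the known types from everything else without discarding anything. A minimal sketch follows, assuming the response has been decoded into plain dictionaries; split_output_items is a hypothetical helper, and the IDs and content values in the sample are made up.

```python
from typing import Any

KNOWN_TYPES = ("message", "function_call", "reasoning")


def split_output_items(
    output: list[dict[str, Any]],
) -> tuple[dict[str, list[dict[str, Any]]], list[dict[str, Any]]]:
    """Separate known output items from provider-specific ones."""
    known: dict[str, list[dict[str, Any]]] = {t: [] for t in KNOWN_TYPES}
    passthrough: list[dict[str, Any]] = []
    for item in output:
        item_type = item.get("type")
        if item_type in known:
            known[item_type].append(item)
        else:
            # Unknown types are kept verbatim rather than dropped.
            passthrough.append(item)
    return known, passthrough


# Sample items with made-up IDs and content, for illustration only.
sample_output = [
    {"type": "message", "id": "msg_1", "content": [{"text": "Hi"}]},
    {"type": "web_search_call", "id": "ws_1"},
]
known, passthrough = split_output_items(sample_output)
print(len(known["message"]), passthrough)
```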

output_text
string

Concatenated text output for convenience.

parallel_tool_calls
boolean

Whether parallel tool calls were enabled.

tool_choice
any

Tool choice setting used.

tools
any[]

Tools that were available.

usage
object

Token usage for a Response API request.
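The usage fields shown in the example response can be turned into simple ratios. The sketch below assumes that cached_tokens is a subset of input_tokens and that reasoning_tokens is a subset of output_tokens; this page does not state either relationship. usage_summary is a hypothetical helper and the numbers are invented.

```python
from typing import Any


def usage_summary(usage: dict[str, Any]) -> dict[str, float]:
    """Compute cached-input and reasoning-output shares from a usage object."""
    input_tokens = usage.get("input_tokens", 0)
    output_tokens = usage.get("output_tokens", 0)
    # The detail objects may be missing or null, so fall back to empty dicts.
    cached = (usage.get("input_tokens_details") or {}).get("cached_tokens", 0)
    reasoning = (usage.get("output_tokens_details") or {}).get("reasoning_tokens", 0)
    return {
        "cached_input_share": cached / input_tokens if input_tokens else 0.0,
        "reasoning_output_share": reasoning / output_tokens if output_tokens else 0.0,
    }


# Invented numbers for illustration.
summary = usage_summary({
    "input_tokens": 200,
    "output_tokens": 50,
    "total_tokens": 250,
    "input_tokens_details": {"cached_tokens": 100},
    "output_tokens_details": {"reasoning_tokens": 10},
})
print(summary)
```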

error
object

Error details if status is "failed".

incomplete_details
object

Details if status is "incomplete".
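Since error accompanies a failed status and incomplete_details accompanies an incomplete one, a caller can branch on status before reading either object. A small sketch, assuming the response has been decoded into a dictionary; describe_status is a hypothetical helper and example_reason is a placeholder value.

```python
from typing import Any


def describe_status(response: dict[str, Any]) -> str:
    """Summarize a response's status using the matching detail object."""
    status = response.get("status")
    if status == "failed":
        error = response.get("error") or {}
        return f"failed: {error.get('code')}: {error.get('message')}"
    if status == "incomplete":
        details = response.get("incomplete_details") or {}
        return f"incomplete: {details.get('reason')}"
    if status in ("completed", "in_progress"):
        return status
    return f"unexpected status: {status!r}"


# "example_reason" is a placeholder, not a documented reason value.
message = describe_status(
    {"status": "incomplete", "incomplete_details": {"reason": "example_reason"}}
)
print(message)
```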

metadata
object

routing_metadata
object

Routing decision metadata included in successful responses. The current public contract defines 10 stable fields: 4 required and 6 optional.
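This page does not say which routing_metadata fields are required, so client code is safest reading them defensively. The sketch below uses field names from the example response at the top of the page; routing_cost_line is a hypothetical helper and the sample values are invented.

```python
from typing import Any, Optional


def routing_cost_line(response: dict[str, Any]) -> Optional[str]:
    """Format a one-line routing summary, or None if no metadata is present."""
    meta = response.get("routing_metadata")
    if not meta:
        return None
    cost = meta.get("cost") or {}
    parts = [f"provider={meta.get('provider')}"]
    if "usd" in cost:
        parts.append(f"cost_usd={cost['usd']}")
    if "cache_savings_usd" in cost:
        parts.append(f"cache_savings_usd={cost['cache_savings_usd']}")
    for warning in meta.get("warnings") or []:
        parts.append(f"warning={warning.get('code')}")
    return " ".join(parts)


# Invented provider name and cost for illustration.
line = routing_cost_line(
    {"routing_metadata": {"provider": "example-provider", "cost": {"usd": 0.0021}}}
)
print(line)
```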