Skip to main content
POST
/
v1
/
responses
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["AURIKO_API_KEY"],
    base_url="https://api.auriko.ai/v1"
)
response = client.responses.create(
    model="gpt-4o",
    input="Hello!"
)
print(response.output_text)
{
  "id": "<string>",
  "object": "<string>",
  "created_at": 123,
  "model": "<string>",
  "output": [
    {
      "type": "<string>",
      "id": "<string>",
      "role": "<string>",
      "content": [
        {
          "type": "<string>",
          "text": "<string>",
          "annotations": [
            "<unknown>"
          ],
          "logprobs": [
            "<unknown>"
          ]
        }
      ],
      "phase": "<string>"
    }
  ],
  "output_text": "<string>",
  "parallel_tool_calls": true,
  "tool_choice": "<unknown>",
  "tools": [
    "<unknown>"
  ],
  "usage": {
    "input_tokens": 123,
    "output_tokens": 123,
    "total_tokens": 123,
    "input_tokens_details": {
      "cached_tokens": 123
    },
    "output_tokens_details": {
      "reasoning_tokens": 123,
      "image_tokens": 123
    }
  },
  "error": {
    "code": "<string>",
    "message": "<string>"
  },
  "incomplete_details": {
    "reason": "<string>"
  },
  "metadata": {},
  "routing_metadata": {
    "provider": "<string>",
    "provider_model_id": "<string>",
    "model_canonical": "<string>",
    "routing_strategy": "<string>",
    "ttft_ms": 123,
    "throughput_tps": 123,
    "cost": {
      "usd": 123,
      "cache_savings_percent": 123,
      "cache_savings_usd": 123
    },
    "warnings": [
      {
        "code": "<string>",
        "message": "<string>"
      }
    ]
  },
  "temperature": 123,
  "top_p": 123,
  "max_output_tokens": 123,
  "frequency_penalty": 123,
  "presence_penalty": 123,
  "top_logprobs": 123,
  "instructions": "<string>",
  "truncation": "<string>",
  "reasoning": {},
  "text": {},
  "user": "<string>",
  "prompt_cache_key": "<string>",
  "safety_identifier": "<string>",
  "max_tool_calls": 123,
  "store": true,
  "previous_response_id": "<string>",
  "background": true,
  "completed_at": 123,
  "service_tier": "<string>",
  "tool_usage": {}
}
Auriko routes the request to the optimal provider based on your routing preferences (cost, latency, throughput, etc.). For guide-level documentation, see the Response API overview.
The Response API is in preview. The interface may change before GA.

Auriko extensions

Auriko adds these capabilities to the standard Response API:
  • Multi-model routing: Use gateway.models[] instead of model to route across multiple models
  • Routing options: Control provider selection with the gateway.routing object
  • Provider extensions: Pass provider-specific parameters with extensions
  • Cost transparency: Response includes routing_metadata with cost breakdown

Authorizations

Authorization
string
header
required

API key authentication. Keys start with ak_ prefix. Example: Authorization: Bearer ak_live_xxxxxxxxxxxx

Body

application/json

Request body for creating a response via the Response API.

model
string
required

Model ID to use (e.g., "gpt-4o", "claude-sonnet-4-20250514").

input
required

The input to generate a response for.

instructions
string

System instructions for the model.

tools
object[]

Tools available to the model.

A tool available to the model.

tool_choice

How the model should use tools.

Available options:
auto,
none,
required
parallel_tool_calls
boolean

Whether the model can make multiple tool calls in parallel.

max_output_tokens
integer

Maximum number of output tokens.

temperature
number

Sampling temperature.

Required range: 0 <= x <= 2
top_p
number

Nucleus sampling parameter.

Required range: 0 <= x <= 1
top_k
integer

Top-k sampling parameter.

top_logprobs
integer

Number of top logprobs to return per token position. Requires provider logprobs support.

Required range: 0 <= x <= 20
stream
boolean

Whether to stream the response.

text
object

Text generation configuration.

reasoning
object

Reasoning/thinking configuration.

truncation
string

Truncation strategy for long inputs.

metadata
object

Arbitrary key-value metadata.

include
string[]

Additional data to include in the response.

user
string

End-user identifier for abuse detection.

store
enum<boolean>

Omit this field or set to false. Sending true returns 400.

Available options:
false
gateway
object

Auriko gateway directives.

extensions
object

Auriko extensions for provider-specific passthrough.

For reasoning control, use the top-level reasoning_effort parameter instead of extensions.

Provider Passthrough

Pass provider-specific parameters directly:

  • anthropic: Anthropic-specific parameters
  • openai: OpenAI-specific parameters
  • google: Google/Gemini-specific parameters
  • deepseek: DeepSeek-specific parameters

Passthrough parameters are forwarded as-is to the target provider.

prompt_cache_key
string

Key for prompt caching.

safety_identifier
string

Safety policy identifier.

frequency_penalty
number

Penalizes new tokens based on their frequency in the text so far.

presence_penalty
number

Penalizes new tokens based on whether they appear in the text so far.

max_tool_calls
integer

Maximum number of built-in tool calls (e.g., web_search, code_interpreter) allowed in a response.

Required range: x >= 1

Response

Successful response.

For non-streaming requests, returns a ResponseObject. For streaming (stream: true), returns Server-Sent Events in the Response API format: event: <type>\ndata: <json>\n\n.

A completed Response API response.

id
string
required

Unique response identifier.

object
string
required
Allowed value: "response"
created_at
integer
required

Unix timestamp of creation.

model
string
required

Model used for generation.

status
enum<string>
required

Response status.

Available options:
completed,
failed,
incomplete,
in_progress
output
object[]
required

Output items generated by the model. Known types include message, function_call, and reasoning. Additional types from the provider (e.g., web_search_call, file_search_call) are passed through verbatim.

An output item from the model. Discriminated on type. Known types: message, function_call, reasoning, image_generation_call. Unknown types from the provider are preserved verbatim.

output_text
string

Concatenated text output for convenience.

parallel_tool_calls
boolean

Whether parallel tool calls were enabled.

tool_choice
any

Tool choice setting used.

tools
any[]

Tools that were available.

usage
object

Token usage for a Response API request.

error
object | null

Error details if status is "failed".

incomplete_details
object | null

Details if status is "incomplete".

metadata
object | null
routing_metadata
object

Routing decision metadata included in successful responses. 8 STABLE fields (4 required + 4 optional) in the current public contract.

temperature
number

Sampling temperature used.

top_p
number

Nucleus sampling parameter used.

max_output_tokens
integer | null

Maximum output tokens setting.

frequency_penalty
number

Frequency penalty setting.

presence_penalty
number

Presence penalty setting.

top_logprobs
integer

Number of top logprobs returned.

instructions
string | null

System instructions used.

truncation
string

Truncation strategy used.

reasoning
object | null

Reasoning configuration used.

text
object

Text generation configuration used.

user
string | null

End-user identifier.

prompt_cache_key
string | null

Prompt cache key used.

safety_identifier
string | null

Safety policy identifier used.

max_tool_calls
integer | null

Maximum number of built-in tool calls allowed in a response.

store
boolean

Whether the response is stored.

previous_response_id
string | null

Previous response ID for conversation continuity.

background
boolean

Whether this was a background response.

completed_at
integer | null

Unix timestamp when the response completed.

service_tier
string | null

Service tier used by the provider.

tool_usage
object | null

Built-in tool consumption metrics from the provider (e.g. image generation tokens, web search request counts). Present on OpenAI responses; absent for other providers.