Create response
Create a model response using the OpenAI Response API format
Auriko extensions
Auriko adds these capabilities to the standard Response API:
- Multi-model routing: Use gateway.models[] instead of model to route across multiple models
- Routing options: Control provider selection with the gateway.routing object
- Provider extensions: Pass provider-specific parameters with extensions
- Cost transparency: Response includes routing_metadata with cost breakdown
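As a rough illustration of the first two points, the sketch below builds a request body that uses either model or gateway.models. It assumes that gateway.models takes a list of model ID strings, that gateway.routing is a free-form object, and that the input field is named input; none of these shapes is confirmed on this page, and build_body is a hypothetical helper name.

```python
from typing import Any, Optional


def build_body(
    input_text: str,
    models: list[str],
    routing: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Build a request body using either `model` or `gateway.models`."""
    if not models:
        raise ValueError("at least one model ID is required")

    # Assumption: the input field is called "input".
    body: dict[str, Any] = {"input": input_text}

    if len(models) == 1 and routing is None:
        # Single model, no routing options: the standard Response API shape.
        body["model"] = models[0]
        return body

    # Several models or explicit routing: use the gateway object
    # instead of `model`. The list-of-strings shape is an assumption.
    gateway: dict[str, Any] = {"models": list(models)}
    if routing is not None:
        gateway["routing"] = routing
    body["gateway"] = gateway
    return body
```

Calling build_body("Hello", ["gpt-4o", "claude-sonnet-4-20250514"]) produces a body with a gateway object and no model key, while a single-element list falls back to the plain model field.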
Authorizations
API key authentication.
Keys start with the ak_ prefix.
Example: Authorization: Bearer ak_live_xxxxxxxxxxxx
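A small sketch of how a client might assemble the header shown above. The function name auth_headers is hypothetical, and the Content-Type header is an assumption based on the request body being JSON.

```python
def auth_headers(api_key: str) -> dict[str, str]:
    """Return HTTP headers that carry the API key as a Bearer token."""
    # Keys are documented to start with "ak_"; fail early on anything else.
    if not api_key.startswith("ak_"):
        raise ValueError("API key is expected to start with 'ak_'")
    return {
        "Authorization": f"Bearer {api_key}",
        # Assumption: the endpoint accepts a JSON request body.
        "Content-Type": "application/json",
    }
```

Checking the prefix locally catches a misconfigured key before any request is sent; the gateway remains the authority on whether a key is actually valid.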
Body
Request body for creating a response via the Response API.
Model ID to use (e.g., "gpt-4o", "claude-sonnet-4-20250514").
The input to generate a response for.
System instructions for the model.
Tools available to the model.
A tool available to the model.
How the model should use tools.
Allowed values: auto, none, required
Whether the model can make multiple tool calls in parallel.
Maximum number of output tokens.
Sampling temperature.
Range: 0 <= x <= 2
Nucleus sampling parameter.
Range: 0 <= x <= 1
Top-k sampling parameter.
Number of top logprobs to return per token position. Requires provider logprobs support.
Range: 0 <= x <= 20
Whether to stream the response.
Text generation configuration.
Reasoning/thinking configuration.
Truncation strategy for long inputs.
Arbitrary key-value metadata.
Additional data to include in the response.
End-user identifier for abuse detection.
Omit this field or set to false. Sending true returns 400.
Allowed values: false
Auriko gateway directives.
Auriko extensions for provider-specific passthrough.
For reasoning control, use the top-level reasoning_effort parameter
instead of extensions.
Provider Passthrough
Pass provider-specific parameters directly:
- anthropic: Anthropic-specific parameters
- openai: OpenAI-specific parameters
- google: Google/Gemini-specific parameters
- deepseek: DeepSeek-specific parameters
Passthrough parameters are forwarded as-is to the target provider.
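The passthrough layout described above could be assembled as in the following sketch. It treats the four provider keys listed here as the only valid ones, which is an assumption; drop the check if other providers are accepted. The helper name with_extensions and the example_param key in the usage note are placeholders, not real provider parameters.

```python
from typing import Any

# Provider keys named in this section. Treating the set as exhaustive
# is an assumption made for this sketch.
KNOWN_PROVIDERS = {"anthropic", "openai", "google", "deepseek"}


def with_extensions(
    body: dict[str, Any],
    extensions: dict[str, dict[str, Any]],
) -> dict[str, Any]:
    """Return a copy of `body` with provider passthrough parameters attached."""
    unknown = set(extensions) - KNOWN_PROVIDERS
    if unknown:
        raise ValueError(f"unknown provider keys: {sorted(unknown)}")

    merged = dict(body)  # shallow copy so the caller's dict is not mutated
    merged["extensions"] = {
        provider: dict(params) for provider, params in extensions.items()
    }
    return merged
```

For example, with_extensions({"model": "gpt-4o"}, {"anthropic": {"example_param": "value"}}) returns a new body whose extensions object holds the anthropic parameters unchanged, matching the as-is forwarding described above.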
Key for prompt caching.
Safety policy identifier.
Penalizes new tokens based on their frequency in the text so far.
Penalizes new tokens based on whether they appear in the text so far.
Maximum number of built-in tool calls (e.g., web_search, code_interpreter) allowed in a response.
Range: x >= 1
Response
Successful response.
For non-streaming requests, returns a ResponseObject.
For streaming (stream: true), returns Server-Sent Events in
the Response API format: event: <type>\ndata: <json>\n\n.
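The framing above can be decoded with a few lines of Python. This is a minimal sketch that parses an already-received text payload; a real client would read the stream incrementally and might also need to handle \r\n line endings. The function name parse_sse is hypothetical.

```python
import json
from typing import Any, Iterator


def parse_sse(stream_text: str) -> Iterator[tuple[str, Any]]:
    """Yield (event_type, decoded_data) pairs from an SSE text payload."""
    # Events are separated by a blank line, i.e. two consecutive newlines.
    for block in stream_text.split("\n\n"):
        event_type = None
        data_lines: list[str] = []
        for line in block.split("\n"):
            if line.startswith("event:"):
                event_type = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data_lines.append(line[len("data:"):].strip())
        if event_type is None or not data_lines:
            continue  # skip empty or incomplete blocks
        yield event_type, json.loads("\n".join(data_lines))
```

Each yielded pair carries the event type string and the JSON-decoded data payload, so a caller can dispatch on the type without re-parsing the raw text.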
A completed Response API response.
Unique response identifier.
Object type. Always "response".
Unix timestamp of creation.
Model used for generation.
Response status.
Allowed values: completed, failed, incomplete, in_progress
Output items generated by the model. Known types include
message, function_call, and reasoning. Additional types
from the provider (e.g., web_search_call, file_search_call)
are passed through verbatim.
An output item from the model. Discriminated on type.
Known types: message, function_call, reasoning, image_generation_call.
Unknown types from the provider are preserved verbatim.
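Because output items are discriminated on type and unknown types are preserved, a client may want to separate the items it understands from the ones it simply passes along. The sketch below assumes each item is a JSON object with a type key, as stated above; split_output is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Any

# Item types this page lists as known.
KNOWN_TYPES = {"message", "function_call", "reasoning", "image_generation_call"}


def split_output(
    items: list[dict[str, Any]],
) -> tuple[dict[str, list[dict[str, Any]]], list[dict[str, Any]]]:
    """Group known output items by type; collect all others untouched."""
    known: dict[str, list[dict[str, Any]]] = defaultdict(list)
    passthrough: list[dict[str, Any]] = []
    for item in items:
        item_type = item.get("type")
        if item_type in KNOWN_TYPES:
            known[item_type].append(item)
        else:
            # Unknown provider types are kept verbatim, not dropped.
            passthrough.append(item)
    return dict(known), passthrough
```

Keeping unrecognized items in a separate list rather than discarding them mirrors the verbatim passthrough behavior and avoids breaking when a provider introduces a new item type.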
Concatenated text output for convenience.
Whether parallel tool calls were enabled.
Tool choice setting used.
Tools that were available.
Token usage for a Response API request.
Error details if status is "failed".
Details if status is "incomplete".
Routing decision metadata included in successful responses. The current public contract defines 8 stable fields (4 required, 4 optional).
Sampling temperature used.
Nucleus sampling parameter used.
Maximum output tokens setting.
Frequency penalty setting.
Presence penalty setting.
Number of top logprobs returned.
System instructions used.
Truncation strategy used.
Reasoning configuration used.
Text generation configuration used.
End-user identifier.
Prompt cache key used.
Safety policy identifier used.
Maximum number of built-in tool calls allowed in a response.
Whether the response is stored.
Previous response ID for conversation continuity.
Whether this was a background response.
Unix timestamp when the response completed.
Service tier used by the provider.
Built-in tool consumption metrics from the provider (e.g., image generation tokens, web search request counts). Present on OpenAI responses; absent for other providers.