Create Chat Completion
Create a model response for a chat conversation with intelligent routing
Creates a model response for the given chat conversation. Auriko routes the request to the optimal provider based on your routing preferences (cost, latency, throughput, etc.).Documentation Index
Fetch the complete documentation index at: https://docs.auriko.ai/llms.txt
Use this file to discover all available pages before exploring further.
Auriko Extensions
Beyond OpenAI compatibility, this endpoint supports:- Multi-model routing: Use
models[]instead ofmodelto route across multiple models - Routing options: Control provider selection with the
routingobject - Provider extensions: Pass provider-specific parameters with
extensions - Cost transparency: Response includes
routing_metadatawith cost breakdown
Authorizations
API key authentication.
Keys start with ak_ prefix.
Example: Authorization: Bearer ak_live_xxxxxxxxxxxx
Body
- Option 1
- Option 2
Model to route to, required for single-model requests.
Mutually exclusive with gateway.models; providing both returns 400.
Examples: gpt-4o, claude-sonnet-4-6, llama-3.3-70b-instruct
"gpt-4o"
The messages to generate a completion for
1- Option 1
- Option 2
- Option 3
- Option 4
- Option 5
- Option 6
Sampling temperature (0-2). Some providers restrict this value when reasoning is enabled.
0 <= x <= 2Nucleus sampling parameter. Some providers restrict this value when reasoning is enabled.
0 <= x <= 1Maximum tokens to generate (legacy, use max_completion_tokens)
x >= 1Maximum tokens to generate. Reasoning models (o1/o3) use this field instead of max_tokens.
x >= 1Controls reasoning effort for supported models. Auriko translates this into each provider's native reasoning control.
See /guides/extensions-and-thinking for per-provider behavior.
low, medium, high, xhigh, max, off Stop sequences. Restrictions vary by provider and model.
Presence penalty (-2 to 2). Not supported by all providers.
-2 <= x <= 2Frequency penalty (-2 to 2). Not supported by all providers.
-2 <= x <= 2Token logit bias. Not supported by all providers.
Random seed for reproducibility
Top-K sampling. Restricted by some providers when reasoning is enabled. Supported by Anthropic, Google, and vLLM.
x >= 1Min-P sampling. Supported by some vLLM providers.
0 <= x <= 1Top-A sampling. Supported by some vLLM providers.
0 <= x <= 1Repetition penalty. Supported by vLLM providers.
Tools the model can call
auto, required, none Allow parallel tool calls
Deprecated. Use tools instead. Auto-converted.
Deprecated. Use tool_choice instead. Auto-converted.
none, auto Enable streaming responses
User identifier for abuse detection
Number of completions to generate. Not supported by all providers.
1 <= x <= 10Return log probabilities. Not supported by all providers.
Number of top logprobs to return. Requires logprobs support.
0 <= x <= 20Web search configuration. Supported by OpenAI.
Output verbosity control. Supported by OpenAI.
Prompt caching identifier. Supported by OpenAI.
Safety policy identifier. Supported by OpenAI.
Auriko routing, metadata, and multi-model configuration. Omit for default single-model routing.
Auriko extensions for provider-specific passthrough.
For reasoning control, use the top-level reasoning_effort parameter
instead of extensions.
Provider Passthrough
Pass provider-specific parameters directly:
anthropic: Anthropic-specific parametersopenai: OpenAI-specific parametersgoogle: Google/Gemini-specific parametersdeepseek: DeepSeek-specific parameters
Passthrough parameters are forwarded as-is to the target provider.
Response
Successful completion.
For streaming (stream: true), responses are Server-Sent Events.
Each event is a ChatCompletionChunk. The final chunk has choices: []
(empty) and contains usage and routing_metadata. Stream ends with
data: [DONE].
Unique completion identifier
"chat.completion"Unix timestamp of creation
Model used for completion
Completion choices
Token usage statistics. Present in virtually all responses. May be absent in rare cases where upstream provider does not report usage.
Backend fingerprint for reproducibility. Not all models include this field.
Routing decision metadata included in successful responses. 10 STABLE fields (4 required + 6 optional) in the current public contract.
The service tier used for processing the request. Present for OpenAI-routed models.