POST /v1/chat/completions
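
Example request

```python
from auriko import Client

client = Client()

# Single-model request; Auriko selects the provider for gpt-4o
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
```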

Example response

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1745100000,
  "model": "gpt-4o-2024-08-06",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 14,
    "completion_tokens": 7,
    "total_tokens": 21,
    "prompt_tokens_details": {
      "cached_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 0
    }
  },
  "system_fingerprint": "fp_d08c293973",
  "routing_metadata": {
    "provider": "openai",
    "provider_model_id": "gpt-4o-2024-08-06",
    "model_canonical": "gpt-4o",
    "routing_strategy": "cost-focus",
    "throughput_tps": 9.2,
    "cost": {
      "usd": 0.000105
    }
  }
}
```

Creates a model response for the given chat conversation. Auriko routes the request to a provider selected according to your routing preferences (for example cost, latency, or throughput).

Auriko Extensions

Beyond OpenAI compatibility, this endpoint supports:
  • Multi-model routing: Use gateway.models instead of model to route across multiple models
  • Routing options: Control provider selection with the routing options in the gateway object
  • Provider extensions: Pass provider-specific parameters with extensions
  • Cost transparency: The response includes routing_metadata with a cost breakdown

All parameters, request/response schemas, and examples are auto-generated from the OpenAPI specification.

Authorizations

Authorization (string, header, required)

API key authentication. Keys start with the ak_ prefix. Example: Authorization: Bearer ak_live_xxxxxxxxxxxx
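
For callers that do not use the SDK, the headers can be built by hand. A minimal sketch: the helper name build_headers is hypothetical, it only checks the documented ak_ prefix and formats the Bearer value shown in the example above, and the Content-Type value follows from the application/json request body.

```python
def build_headers(api_key: str) -> dict[str, str]:
    """Build HTTP headers for a chat completions request (hypothetical helper)."""
    # Auriko API keys start with the documented "ak_" prefix
    if not api_key.startswith("ak_"):
        raise ValueError("Auriko API keys start with 'ak_'")
    return {
        # Bearer scheme, as in the Authorization example
        "Authorization": f"Bearer {api_key}",
        # The request body is JSON
        "Content-Type": "application/json",
    }
```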

Body

application/json
model (string, required for single-model requests)

Model to route to. Required for single-model requests. Mutually exclusive with gateway.models: omit model when routing across multiple models, because providing both returns 400.

Examples: gpt-4o, claude-sonnet-4-6, llama-3.3-70b-instruct

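Because model and gateway.models are mutually exclusive, a client can catch a conflicting body before the API rejects it with 400. A minimal sketch, assuming the body is a plain dict and gateway.models is a list of model names; check_model_selection is a hypothetical helper, and rejecting a body with neither field is an assumption rather than documented behavior.

```python
def check_model_selection(body: dict) -> None:
    """Raise ValueError unless exactly one of model / gateway.models is set (hypothetical check)."""
    has_model = "model" in body
    # gateway is optional; a missing or null gateway means no multi-model list
    has_models = bool((body.get("gateway") or {}).get("models"))
    if has_model and has_models:
        # The API returns 400 for this combination, so fail early
        raise ValueError("set either model or gateway.models, not both")
    if not has_model and not has_models:
        # Assumption: every request needs one of the two fields
        raise ValueError("set model, or gateway.models for multi-model routing")


messages = [{"role": "user", "content": "Hi"}]
# Single-model and multi-model bodies both pass the check
check_model_selection({"model": "gpt-4o", "messages": messages})
check_model_selection({"gateway": {"models": ["gpt-4o", "claude-sonnet-4-6"]}, "messages": messages})
```
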
messages (object[], required)

The conversation messages to generate a completion for.

Minimum array length: 1
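
A sketch of assembling the messages array in the role/content shape used by the example request. The optional system message and the helper name build_messages are assumptions based on the OpenAI-compatible format; the result always satisfies the minimum length of 1 because it contains at least the user message.

```python
from typing import Optional


def build_messages(user_prompt: str, system_prompt: Optional[str] = None) -> list[dict[str, str]]:
    """Assemble an OpenAI-style messages array (hypothetical helper)."""
    messages = []
    if system_prompt is not None:
        # Optional instructions go first, before the user turn
        messages.append({"role": "system", "content": system_prompt})
    # The user turn is always present, so the array is never empty
    messages.append({"role": "user", "content": user_prompt})
    return messages
```
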

temperature (number)

Sampling temperature (0-2). Some providers restrict this value when reasoning is enabled.

Required range: 0 <= x <= 2

top_p (number)

Nucleus sampling parameter. Some providers restrict this value when reasoning is enabled.

Required range: 0 <= x <= 1

max_tokens (integer)

Maximum number of tokens to generate. Legacy; use max_completion_tokens instead.

Required range: x >= 1

max_completion_tokens (integer)

Maximum number of tokens to generate. Reasoning models (for example o1 and o3) use this field instead of max_tokens.

Required range: x >= 1
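
Since max_tokens is legacy, older request bodies can be normalized before sending. A minimal sketch with a hypothetical helper: it moves max_tokens to max_completion_tokens only when the newer field is absent, then checks the documented x >= 1 bound. This page does not say how Auriko handles a body that sets both fields, so the sketch leaves such a body as it is.

```python
def normalize_token_limit(body: dict) -> dict:
    """Return a copy of body that uses max_completion_tokens instead of legacy max_tokens."""
    out = dict(body)  # shallow copy so the caller's dict is unchanged
    if "max_tokens" in out and "max_completion_tokens" not in out:
        out["max_completion_tokens"] = out.pop("max_tokens")
    limit = out.get("max_completion_tokens")
    # Documented bound for both fields: an integer >= 1
    if limit is not None and (not isinstance(limit, int) or limit < 1):
        raise ValueError("max_completion_tokens must be an integer >= 1")
    return out
```
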

reasoning_effort (enum<string>)

Controls reasoning effort for supported models. Auriko translates this into each provider's native reasoning control.

See /guides/extensions-and-thinking for per-provider behavior.

Available options: low, medium, high, xhigh, max, off
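
A sketch of setting reasoning_effort with a client-side check against the documented options. REASONING_EFFORTS and with_reasoning_effort are illustrative names, not part of any SDK; the per-provider translation is done by Auriko, not by this code.

```python
# Documented values of reasoning_effort
REASONING_EFFORTS = frozenset({"low", "medium", "high", "xhigh", "max", "off"})


def with_reasoning_effort(body: dict, effort: str) -> dict:
    """Return a copy of body with a validated reasoning_effort (hypothetical helper)."""
    if effort not in REASONING_EFFORTS:
        raise ValueError(f"reasoning_effort must be one of {sorted(REASONING_EFFORTS)}")
    # Reasoning control goes in the top-level field, not in extensions
    return {**body, "reasoning_effort": effort}
```
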

stop

Stop sequences. Restrictions vary by provider and model.

presence_penalty (number)

Presence penalty (-2 to 2). Not supported by all providers.

Required range: -2 <= x <= 2

frequency_penalty (number)

Frequency penalty (-2 to 2). Not supported by all providers.

Required range: -2 <= x <= 2

logit_bias (object)

Token logit bias. Not supported by all providers.

seed (integer)

Random seed for reproducibility.

top_k (integer)

Top-K sampling. Restricted by some providers when reasoning is enabled. Supported by Anthropic, Google, and vLLM-based providers.

Required range: x >= 1

min_p (number)

Min-P sampling. Supported by some vLLM-based providers.

Required range: 0 <= x <= 1

top_a (number)

Top-A sampling. Supported by some vLLM-based providers.

Required range: 0 <= x <= 1

repetition_penalty (number)

Repetition penalty. Supported by vLLM-based providers.
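
The documented ranges can be checked before a request is sent. A minimal sketch: RANGES transcribes the Required range entries on this page for the sampling parameters, and check_sampling_params is a hypothetical helper. Provider-specific restrictions, such as those applied when reasoning is enabled, are not modeled.

```python
# (min, max) bounds from the "Required range" entries; None means no upper bound
RANGES = {
    "temperature": (0, 2),
    "top_p": (0, 1),
    "presence_penalty": (-2, 2),
    "frequency_penalty": (-2, 2),
    "top_k": (1, None),
    "min_p": (0, 1),
    "top_a": (0, 1),
}


def check_sampling_params(body: dict) -> list[str]:
    """Return range violations for sampling parameters present in body (empty list if none)."""
    errors = []
    for name, (low, high) in RANGES.items():
        if name not in body:
            continue  # only parameters that are set are checked
        value = body[name]
        if value < low:
            errors.append(f"{name}={value} is below {low}")
        if high is not None and value > high:
            errors.append(f"{name}={value} is above {high}")
    return errors
```
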

tools (object[])

Tools the model may call.

tool_choice

Available options: auto, required, none

parallel_tool_calls (boolean)

Allow parallel tool calls.
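
A request body that offers the model one tool. This page does not spell out the tool object schema, so the sketch assumes the OpenAI function-tool shape, given the endpoint's OpenAI compatibility; get_weather and its parameters are hypothetical.

```python
import json

# Hypothetical tool in the OpenAI function-calling shape (assumed format)
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

body = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [get_weather_tool],
    "tool_choice": "auto",         # one of: auto, required, none
    "parallel_tool_calls": False,  # disallow parallel tool calls
}

print(json.dumps(body, indent=2))
```
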

functions (object[], deprecated)

Deprecated. Use tools instead; functions are converted to tools automatically.

function_call (deprecated)

Deprecated. Use tool_choice instead; function_call is converted to tool_choice automatically.

Available options: none, auto
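
Auriko converts the deprecated fields automatically, but client code can also be migrated directly. A minimal sketch of that conversion, assuming the OpenAI-style mapping in which each functions entry becomes a {"type": "function", "function": ...} tool and the documented function_call values none and auto carry over unchanged to tool_choice. Auriko's exact server-side conversion is not described on this page, and the helper name is hypothetical.

```python
def migrate_legacy_functions(body: dict) -> dict:
    """Return a copy of body with functions/function_call rewritten as tools/tool_choice."""
    out = dict(body)  # shallow copy; the caller's dict keeps its legacy fields
    functions = out.pop("functions", None)
    if functions is not None:
        # Wrap each legacy function definition as a function tool
        out["tools"] = list(out.get("tools", [])) + [
            {"type": "function", "function": f} for f in functions
        ]
    function_call = out.pop("function_call", None)
    if function_call in ("none", "auto"):
        # Both documented values are also tool_choice options
        out["tool_choice"] = function_call
    elif function_call is not None:
        raise ValueError(f"unsupported function_call value: {function_call!r}")
    return out
```
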

response_format (object)

stream (boolean, default: false)

Enable streaming responses.

stream_options (object)

user (string)

User identifier for abuse detection.

n (integer, default: 1)

Number of completions to generate. Not supported by all providers.

Required range: 1 <= x <= 10

logprobs (boolean)

Return log probabilities. Not supported by all providers.

top_logprobs (integer)

Number of most likely tokens to return at each token position. Requires logprobs support.

Required range: 0 <= x <= 20
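
A sketch of checking the output-count and logprobs parameters together against the documented ranges. The rule that logprobs must be true whenever top_logprobs is set follows OpenAI's convention and is an assumption here; this page only says top_logprobs requires logprobs support. check_output_params is a hypothetical helper.

```python
def check_output_params(body: dict) -> list[str]:
    """Return problems with n, logprobs, and top_logprobs (hypothetical check)."""
    errors = []
    n = body.get("n", 1)  # documented default is 1
    if not 1 <= n <= 10:
        errors.append(f"n={n} must be between 1 and 10")
    top = body.get("top_logprobs")
    if top is not None:
        if not 0 <= top <= 20:
            errors.append(f"top_logprobs={top} must be between 0 and 20")
        # Assumed (OpenAI convention): top_logprobs only applies with logprobs=True
        if body.get("logprobs") is not True:
            errors.append("top_logprobs requires logprobs=True")
    return errors
```
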

web_search_options (object)

Web search configuration. Supported by OpenAI.

verbosity (string)

Output verbosity control. Supported by OpenAI.

prompt_cache_key (string)

Prompt caching identifier. Supported by OpenAI.

safety_identifier (string)

Safety policy identifier. Supported by OpenAI.

gateway (object)

Auriko routing, metadata, and multi-model configuration. Omit for default single-model routing.

extensions (object)

Auriko extensions for provider-specific passthrough.

For reasoning control, use the top-level reasoning_effort parameter instead of extensions.

Provider Passthrough

Pass provider-specific parameters directly:

  • anthropic: Anthropic-specific parameters
  • openai: OpenAI-specific parameters
  • google: Google/Gemini-specific parameters
  • deepseek: DeepSeek-specific parameters

Passthrough parameters are forwarded as-is to the target provider.
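
A sketch of attaching passthrough parameters under extensions, keyed by one of the four documented provider names. with_passthrough is a hypothetical helper, and some_provider_param is a placeholder: this page does not list which provider-specific parameters each provider accepts.

```python
# Provider keys documented for passthrough
PASSTHROUGH_PROVIDERS = {"anthropic", "openai", "google", "deepseek"}


def with_passthrough(body: dict, provider: str, params: dict) -> dict:
    """Return a copy of body with params merged into extensions[provider] (hypothetical helper)."""
    if provider not in PASSTHROUGH_PROVIDERS:
        raise ValueError(f"unknown passthrough provider: {provider}")
    extensions = dict(body.get("extensions") or {})
    # Merge with any parameters already set for this provider
    extensions[provider] = {**extensions.get(provider, {}), **params}
    return {**body, "extensions": extensions}


body = with_passthrough({"model": "claude-sonnet-4-6"}, "anthropic", {"some_provider_param": 1})
```
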

Response

Successful completion.

For streaming (stream: true), responses are Server-Sent Events. Each event is a ChatCompletionChunk. The final chunk has an empty choices array (choices: []) and contains usage and routing_metadata. The stream ends with data: [DONE].
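
The streaming format can be consumed with a small reader over the received lines. A minimal sketch with no HTTP client involved: it assumes content chunks carry text at choices[i].delta.content, as in OpenAI-compatible streams, and it follows only the first choice. usage and routing_metadata are taken from the final chunk with an empty choices array, and reading stops at data: [DONE]. read_stream is a hypothetical name.

```python
import json


def read_stream(lines):
    """Return (text, usage, routing_metadata) collected from SSE lines of a chat completion stream."""
    text_parts = []
    usage = None
    routing_metadata = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines, SSE comments, and other fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(data)  # one ChatCompletionChunk
        choices = chunk.get("choices") or []
        for choice in choices:
            if choice.get("index", 0) != 0:
                continue  # this sketch follows only the first choice
            content = (choice.get("delta") or {}).get("content")
            if content:
                text_parts.append(content)
        if not choices:
            # Final chunk: empty choices, carries usage and routing_metadata
            usage = chunk.get("usage", usage)
            routing_metadata = chunk.get("routing_metadata", routing_metadata)
    return "".join(text_parts), usage, routing_metadata
```

Fed with the decoded lines of a streaming HTTP response body, it returns the full text together with the final usage and routing_metadata objects.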

id (string, required)

Unique completion identifier

object (string, required)

Allowed value: "chat.completion"

created (integer, required)

Unix timestamp (in seconds) of when the completion was created.

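model (string, required)

Model used for the completion. This may be a versioned model ID (for example gpt-4o-2024-08-06) rather than the model name sent in the request.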

choices (object[], required)

Completion choices

usage (object)

Token usage statistics. Present in almost all responses, but may be absent in rare cases where the upstream provider does not report usage.

system_fingerprint (string)

Backend fingerprint for reproducibility. Not all models include this field.

routing_metadata (object)

Routing decision metadata included in successful responses. The current public contract defines 10 stable fields: 4 required and 6 optional.
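
A sketch of reading routing and cost details from a parsed, non-streaming response. It relies only on field names that appear in the example response and treats usage as optional, since it can be absent in rare cases. summarize_response and the keys of its result are hypothetical.

```python
def summarize_response(resp: dict) -> dict:
    """Pull routing and cost details out of a parsed chat completion response."""
    meta = resp.get("routing_metadata") or {}
    usage = resp.get("usage")  # may be missing if the provider did not report usage
    return {
        "provider": meta.get("provider"),
        "served_model": resp["model"],
        "strategy": meta.get("routing_strategy"),
        "cost_usd": (meta.get("cost") or {}).get("usd"),
        "total_tokens": usage["total_tokens"] if usage else None,
    }
```

Applied to the example response on this page, it returns provider "openai", served_model "gpt-4o-2024-08-06", strategy "cost-focus", cost_usd 0.000105, and total_tokens 21.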

service_tier (string | null)

The service tier used for processing the request. Present for OpenAI-routed models.