> ## Documentation Index
> Fetch the complete documentation index at: https://docs.auriko.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Create response

> Create a model response using the OpenAI Response API format

Auriko routes the request to the optimal provider based on your routing preferences (cost, latency, throughput, etc.). For guide-level documentation, see the [Response API overview](/response-api/overview).

<Note>
  The Response API is in preview. The interface may change before GA.
</Note>

## Auriko extensions

Auriko adds these capabilities to the standard Response API:

* **Multi-model routing**: Use `gateway.models[]` instead of `model` to route across multiple models
* **Routing options**: Control provider selection with the `gateway.routing` object
* **Provider extensions**: Pass provider-specific parameters with `extensions`
* **Cost transparency**: Response includes `routing_metadata` with cost breakdown


## OpenAPI

````yaml POST /v1/responses
openapi: 3.1.0
info:
  title: Auriko API
  version: 1.0.0
  description: >
    Intelligent LLM routing API with OpenAI-compatible interface.


    ## Overview


    Auriko provides a unified API for accessing multiple LLM providers with
    intelligent

    routing based on cost, latency, throughput, and availability.


    ## OpenAI Compatibility


    The API is compatible with OpenAI's Chat Completions API. You can use any

    OpenAI SDK by changing the base URL and API key.


    ## Auriko Extensions


    Beyond OpenAI compatibility, Auriko adds:


    - **Multi-model routing**: Request multiple models with `gateway.models[]`
    array

    - **Routing options**: Control provider selection with `routing` object

    - **Provider extensions**: Pass provider-specific parameters with
    `extensions` object

    - **Routing metadata**: Get transparency into routing decisions via
    `routing_metadata`
  contact:
    name: Auriko Support
    url: https://auriko.ai
  license:
    name: Proprietary
    url: https://auriko.ai/terms
servers:
  - url: https://api.auriko.ai
    description: Production
security:
  - ApiKeyAuth: []
tags:
  - name: Chat
    description: Chat completion endpoints
  - name: Responses
    description: OpenAI Response API endpoints
  - name: Callable Models
    description: Model IDs available for authenticated inference requests
  - name: Model Catalog
    description: Public model and provider catalog endpoints
  - name: Workspaces
    description: Workspace management
  - name: Budgets
    description: Spend control and budget management
  - name: API Keys
    description: API key management
  - name: BYOK
    description: Bring-your-own-key provider key management
  - name: Billing
    description: Credit balance and billing
  - name: Account
    description: API key introspection
  - name: Messages
    description: Anthropic Messages API endpoints
  - name: Routing
    description: Workspace routing configuration
  - name: Usage
    description: Usage statistics and request lookup
  - name: Members
    description: Workspace member and invite management
  - name: Audit
    description: Workspace audit event log
paths:
  /v1/responses:
    post:
      tags:
        - Responses
      summary: Create a response (OpenAI Response API)
      description: >
        Auriko routes the request to the optimal provider based on your

        routing preferences (cost, latency, throughput, etc.).


        ## Streaming


        When `stream: true`, Auriko delivers responses as Server-Sent Events
        (SSE)

        using the Response API event format: `event: <type>\ndata: <json>\n\n`.


        Terminal events: `response.completed`, `response.incomplete`,
        `response.failed`.

        There's no `data: [DONE]` terminator.


        ## Multi-Model Routing


        Use `gateway.models[]` instead of top-level `model` to enable
        multi-model routing.
      operationId: createResponse
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/CreateResponseRequest'
            examples:
              basic:
                summary: Basic text response
                value:
                  model: gpt-4o
                  input: Hello!
              with_tools:
                summary: With function tools
                value:
                  model: gpt-4o
                  input:
                    - type: message
                      role: user
                      content: What's the weather in NYC?
                  tools:
                    - type: function
                      name: get_weather
                      parameters:
                        type: object
                        properties:
                          city:
                            type: string
      responses:
        '200':
          description: |
            Successful response.

            For non-streaming requests, returns a `ResponseObject`.
            For streaming (`stream: true`), returns Server-Sent Events in
            the Response API format: `event: <type>\ndata: <json>\n\n`.
          headers:
            X-Request-ID:
              $ref: '#/components/headers/X-Request-ID'
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ResponseObject'
              example:
                id: resp_abc123def456
                object: response
                created_at: 1779308695
                model: gpt-4o-2024-08-06
                status: completed
                output:
                  - type: message
                    id: msg_abc123
                    role: assistant
                    status: completed
                    content:
                      - type: output_text
                        text: Hi there! How can I assist you today?
                        annotations: []
                output_text: Hi there! How can I assist you today?
                usage:
                  input_tokens: 9
                  output_tokens: 11
                  total_tokens: 20
                  input_tokens_details:
                    cached_tokens: 0
                  output_tokens_details:
                    reasoning_tokens: 0
                routing_metadata:
                  provider: openai
                  provider_model_id: gpt-4o-2024-08-06
                  model_canonical: gpt-4o
                  routing_strategy: cost-focus
                  throughput_tps: 48.3
                  cost:
                    usd: 0.000133
            text/event-stream:
              schema:
                type: string
                description: |
                  Response API SSE stream. Events use the format:
                  `event: <type>\ndata: <json>\n\n`

                  No `data: [DONE]` terminator is used.
        '400':
          $ref: '#/components/responses/BadRequest'
        '401':
          $ref: '#/components/responses/Unauthorized'
        '404':
          $ref: '#/components/responses/ModelNotFound'
        '429':
          $ref: '#/components/responses/RateLimited'
        '500':
          $ref: '#/components/responses/InternalError'
        '502':
          $ref: '#/components/responses/ProviderError'
        '503':
          $ref: '#/components/responses/ServiceUnavailable'
        '504':
          $ref: '#/components/responses/ProviderTimeout'
      security:
        - ApiKeyAuth: []
      x-codeSamples:
        - lang: bash
          label: curl
          source: |
            curl https://api.auriko.ai/v1/responses \
              -H "Authorization: Bearer $AURIKO_API_KEY" \
              -H "Content-Type: application/json" \
              -d '{
                "model": "gpt-4o",
                "input": "Hello!"
              }'
        - lang: python
          label: Python (OpenAI-compatible)
          source: |
            import os
            from openai import OpenAI

            client = OpenAI(
                api_key=os.environ["AURIKO_API_KEY"],
                base_url="https://api.auriko.ai/v1"
            )
            response = client.responses.create(
                model="gpt-4o",
                input="Hello!"
            )
            print(response.output_text)
        - lang: typescript
          label: TypeScript (OpenAI-compatible)
          source: |
            import OpenAI from "openai";

            const client = new OpenAI({
                apiKey: process.env.AURIKO_API_KEY,
                baseURL: "https://api.auriko.ai/v1",
            });
            const response = await client.responses.create({
                model: "gpt-4o",
                input: "Hello!",
            });
            console.log(response.output_text);
        - lang: python
          label: Python
          source: |
            from auriko import Client

            client = Client()
            response = client.responses.create(
                model="gpt-4o",
                input="Hello!"
            )
            print(response.output_text)
        - lang: typescript
          label: TypeScript
          source: |
            import { Client } from "@auriko/sdk";

            const client = new Client();
            const response = await client.responses.create({
                model: "gpt-4o",
                input: "Hello!",
            });
            console.log(response.output_text);
components:
  schemas:
    CreateResponseRequest:
      type: object
      required:
        - input
      oneOf:
        - required:
            - model
        - required:
            - gateway
          properties:
            gateway:
              required:
                - models
      description: Request body for creating a response via the Response API.
      properties:
        model:
          type: string
          description: Model ID to use (e.g., "gpt-4o", "claude-sonnet-4-20250514").
        input:
          oneOf:
            - type: string
              description: Simple text input.
            - type: array
              items:
                $ref: '#/components/schemas/ResponseInputItem'
              description: Structured input with message history and tool results.
          description: The input to generate a response for.
        instructions:
          type: string
          description: System instructions for the model.
        tools:
          type: array
          items:
            $ref: '#/components/schemas/ResponseTool'
          description: Tools available to the model.
        tool_choice:
          description: How the model should use tools.
          oneOf:
            - type: string
              enum:
                - auto
                - none
                - required
            - type: object
              properties:
                type:
                  type: string
                  const: function
                name:
                  type: string
              required:
                - type
                - name
            - type: object
              description: >-
                Non-function tool choice for hosted tools (e.g.,
                web_search_preview).
              properties:
                type:
                  type: string
                  not:
                    enum:
                      - function
              required:
                - type
              additionalProperties: true
        parallel_tool_calls:
          type: boolean
          description: Whether the model can make multiple tool calls in parallel.
        max_output_tokens:
          type: integer
          description: Maximum number of output tokens.
        temperature:
          type: number
          minimum: 0
          maximum: 2
          description: Sampling temperature.
        top_p:
          type: number
          minimum: 0
          maximum: 1
          description: Nucleus sampling parameter.
        top_k:
          type: integer
          description: Top-k sampling parameter.
        top_logprobs:
          type: integer
          minimum: 0
          maximum: 20
          description: >-
            Number of top logprobs to return per token position. Requires
            provider logprobs support.
        stream:
          type: boolean
          description: Whether to stream the response.
        text:
          type: object
          description: Text generation configuration.
          properties:
            format:
              $ref: '#/components/schemas/ResponseTextFormat'
        reasoning:
          type: object
          description: Reasoning/thinking configuration.
          properties:
            effort:
              type: string
              enum:
                - none
                - minimal
                - low
                - medium
                - high
                - xhigh
                - max
                - 'off'
              description: Level of reasoning effort.
            summary:
              type: string
              enum:
                - auto
                - concise
                - detailed
              description: Reasoning summary format.
            generate_summary:
              type: string
              enum:
                - auto
                - concise
                - detailed
              description: Reasoning summary format (alternative field name).
        truncation:
          type: string
          description: Truncation strategy for long inputs.
        metadata:
          type: object
          additionalProperties:
            type: string
          description: Arbitrary key-value metadata.
        include:
          type: array
          items:
            type: string
          description: Additional data to include in the response.
        user:
          type: string
          description: End-user identifier for abuse detection.
        store:
          type: boolean
          enum:
            - false
          description: Omit this field or set to false. Sending true returns 400.
        gateway:
          type: object
          description: Auriko gateway directives.
          properties:
            routing:
              $ref: '#/components/schemas/RoutingOptions'
            metadata:
              $ref: '#/components/schemas/RequestMetadata'
            models:
              type: array
              items:
                type: string
              description: Model pool for multi-model routing.
        extensions:
          $ref: '#/components/schemas/Extensions'
        prompt_cache_key:
          type: string
          description: Key for prompt caching.
        safety_identifier:
          type: string
          description: Safety policy identifier.
        frequency_penalty:
          type: number
          description: Penalizes new tokens based on their frequency in the text so far.
        presence_penalty:
          type: number
          description: >-
            Penalizes new tokens based on whether they appear in the text so
            far.
        max_tool_calls:
          type: integer
          minimum: 1
          description: >-
            Maximum number of built-in tool calls (e.g., web_search,
            code_interpreter) allowed in a response.
    ResponseObject:
      type: object
      description: A completed Response API response.
      required:
        - id
        - object
        - created_at
        - model
        - status
        - output
      properties:
        id:
          type: string
          description: Unique response identifier.
        object:
          type: string
          const: response
        created_at:
          type: integer
          description: Unix timestamp of creation.
        model:
          type: string
          description: Model used for generation.
        status:
          type: string
          enum:
            - completed
            - failed
            - incomplete
            - in_progress
          description: Response status.
        output:
          type: array
          items:
            $ref: '#/components/schemas/ResponseOutputItem'
          description: |
            Output items generated by the model. Known types include
            `message`, `function_call`, and `reasoning`. Additional types
            from the provider (e.g., `web_search_call`, `file_search_call`)
            are passed through verbatim.
        output_text:
          type: string
          description: Concatenated text output for convenience.
        parallel_tool_calls:
          type: boolean
          description: Whether parallel tool calls were enabled.
        tool_choice:
          description: Tool choice setting used.
        tools:
          type: array
          items: {}
          description: Tools that were available.
        usage:
          $ref: '#/components/schemas/ResponseUsageSchema'
        error:
          type:
            - object
            - 'null'
          properties:
            code:
              type: string
            message:
              type: string
          description: Error details if status is "failed".
        incomplete_details:
          type:
            - object
            - 'null'
          properties:
            reason:
              type: string
          description: Details if status is "incomplete".
        metadata:
          type:
            - object
            - 'null'
          additionalProperties:
            type: string
        routing_metadata:
          $ref: '#/components/schemas/RoutingMetadata'
        temperature:
          type: number
          description: Sampling temperature used.
        top_p:
          type: number
          description: Nucleus sampling parameter used.
        max_output_tokens:
          type:
            - integer
            - 'null'
          description: Maximum output tokens setting.
        frequency_penalty:
          type: number
          description: Frequency penalty setting.
        presence_penalty:
          type: number
          description: Presence penalty setting.
        top_logprobs:
          type: integer
          description: Number of top logprobs returned.
        instructions:
          type:
            - string
            - 'null'
          description: System instructions used.
        truncation:
          type: string
          description: Truncation strategy used.
        reasoning:
          type:
            - object
            - 'null'
          description: Reasoning configuration used.
        text:
          type: object
          description: Text generation configuration used.
        user:
          type:
            - string
            - 'null'
          description: End-user identifier.
        prompt_cache_key:
          type:
            - string
            - 'null'
          description: Prompt cache key used.
        safety_identifier:
          type:
            - string
            - 'null'
          description: Safety policy identifier used.
        max_tool_calls:
          type:
            - integer
            - 'null'
          description: Maximum number of built-in tool calls allowed in a response.
        store:
          type: boolean
          description: Whether the response is stored.
        previous_response_id:
          type:
            - string
            - 'null'
          description: Previous response ID for conversation continuity.
        background:
          type: boolean
          description: Whether this was a background response.
        completed_at:
          type:
            - integer
            - 'null'
          description: Unix timestamp when the response completed.
        service_tier:
          type:
            - string
            - 'null'
          description: Service tier used by the provider.
        tool_usage:
          type:
            - object
            - 'null'
          description: >-
            Built-in tool consumption metrics from the provider (e.g. image
            generation tokens, web search request counts). Present on OpenAI
            responses; absent for other providers.
    ResponseInputItem:
      description: An input item in the conversation history.
      oneOf:
        - type: object
          description: A message in the conversation.
          required:
            - role
            - content
          properties:
            type:
              type: string
              description: Optional, defaults to "message".
            role:
              type: string
              enum:
                - user
                - assistant
                - system
                - developer
            content:
              oneOf:
                - type: string
                - type: array
                  items:
                    $ref: '#/components/schemas/ResponseContentPart'
        - type: object
          description: A function call made by the model.
          required:
            - type
            - call_id
            - name
            - arguments
          properties:
            type:
              type: string
              const: function_call
            call_id:
              type: string
            name:
              type: string
            arguments:
              type: string
        - type: object
          description: Output of a function call.
          required:
            - type
            - call_id
            - output
          properties:
            type:
              type: string
              const: function_call_output
            call_id:
              type: string
            output:
              type: string
        - type: object
          description: >-
            Reasoning item from a previous response. Include in input for
            multi-turn conversations with reasoning models. Only available for
            models that support the Response API.
          required:
            - type
          properties:
            type:
              type: string
              const: reasoning
            id:
              type: string
            summary:
              type: array
              items:
                $ref: '#/components/schemas/ResponseReasoningSummaryBlock'
            encrypted_content:
              type: string
    ResponseTool:
      description: A tool available to the model.
      oneOf:
        - type: object
          description: A function tool.
          required:
            - type
            - name
          properties:
            type:
              type: string
              const: function
            name:
              type: string
            description:
              type: string
            parameters: c9d9a8d1-d209-4403-8b63-231c3bbb2318
            strict:
              type: boolean
        - type: object
          description: A built-in hosted tool (e.g., web_search, code_interpreter).
          required:
            - type
          properties:
            type:
              type: string
              description: Tool type identifier.
          additionalProperties: true
    ResponseTextFormat:
      description: Text output format configuration.
      oneOf:
        - type: object
          required:
            - type
          properties:
            type:
              type: string
              const: text
        - type: object
          required:
            - type
          properties:
            type:
              type: string
              const: json_object
        - type: object
          required:
            - type
            - name
            - schema
          properties:
            type:
              type: string
              const: json_schema
            name:
              type: string
            schema:
              type: object
            strict:
              type: boolean
    RoutingOptions:
      type: object
      description: >
        Auriko routing configuration (20 fields).


        Controls how Auriko selects providers for your request.

        All fields are optional. Setting a field to `null` is equivalent to
        omitting it.
      properties:
        optimize:
          type:
            - string
            - 'null'
          enum:
            - cost
            - cost-focus
            - ttft
            - ttft-focus
            - tps
            - tps-focus
            - balanced
            - null
          default: cost-focus
          description: |
            Optimization strategy:
            - `cost`: Minimize cost per token (well-rounded)
            - `cost-focus`: Aggressively minimize cost (default)
            - `ttft`: Minimize time to first token (well-rounded)
            - `ttft-focus`: Aggressively minimize time to first token
            - `tps`: Maximize tokens per second (well-rounded)
            - `tps-focus`: Aggressively maximize tokens per second
            - `balanced`: All dimensions weighted evenly
        weights:
          type:
            - object
            - 'null'
          additionalProperties: false
          description: >
            Custom scoring weights for routing optimization.

            When provided, overrides the `optimize` preset coefficients.

            All values must be non-negative. At least one dimension must be > 0.

            Unspecified dimensions default to 0. Server normalizes to sum to
            1.0.
          properties:
            cost:
              type:
                - number
                - 'null'
              minimum: 0
              description: Weight for cost minimization.
            ttft:
              type:
                - number
                - 'null'
              minimum: 0
              description: Weight for time-to-first-token optimization.
            throughput:
              type:
                - number
                - 'null'
              minimum: 0
              description: Weight for tokens-per-second optimization.
        ttft_percentile:
          type:
            - string
            - 'null'
          enum:
            - p50
            - p95
            - null
          default: p50
          description: >
            Which percentile to use for TTFT metrics in scoring and constraint
            filtering.

            p50 uses median, p95 uses worst-case. Default: p50.
        throughput_percentile:
          type:
            - string
            - 'null'
          enum:
            - p50
            - p95
            - null
          default: p50
          description: >
            Which percentile to use for throughput metrics in scoring and
            constraint filtering.

            p50 uses median, p95 uses worst-case. Default: p50.
        max_cost_per_1m:
          type:
            - number
            - 'null'
          exclusiveMinimum: 0
          description: |
            Maximum cost per 1 million tokens (USD).
            Providers exceeding this cost are excluded from selection.
        max_ttft_ms:
          type:
            - integer
            - 'null'
          exclusiveMinimum: 0
          description: |
            Maximum time to first token in milliseconds.
            Providers with estimated TTFT above this threshold are excluded.
        min_throughput_tps:
          type:
            - number
            - 'null'
          exclusiveMinimum: 0
          description: >
            Minimum throughput in tokens per second.

            Providers with estimated throughput below this threshold are
            excluded.
        providers:
          type:
            - array
            - 'null'
          items:
            type: string
          description: |
            Provider allowlist. Only consider these providers.

            Examples: `["openai", "anthropic", "fireworks_ai"]`
        exclude_providers:
          type:
            - array
            - 'null'
          items:
            type: string
          description: |
            Provider blocklist. Exclude these providers.

            Examples: `["together_ai"]`
        prefer:
          type:
            - string
            - 'null'
          description: |
            Preference boost for this provider.
            Provider will be selected if it meets constraints.
        mode:
          type:
            - string
            - 'null'
          enum:
            - pool
            - fallback
            - null
          default: pool
          description: |
            How to interpret `gateway.models[]` array:
            - `pool` (default): Route to best provider across all models
            - `fallback`: Try models in order until one succeeds
        allow_fallbacks:
          type:
            - boolean
            - 'null'
          default: true
          description: Enable automatic fallback to alternative providers on failure
        max_fallback_attempts:
          type:
            - integer
            - 'null'
          minimum: 1
          maximum: 19
          default: 19
          description: >-
            Maximum fallback attempts before giving up (chain length 20 =
            primary + 19 fallbacks). Range: 1-19.
        timeout_ms:
          type:
            - integer
            - 'null'
          minimum: 1
          description: >
            Per-attempt timeout in milliseconds. For streaming requests: time to
            first SSE byte. For non-streaming requests: time to complete
            response. Default: 120000 (120s streaming first-byte), 300000 (5min
            non-streaming total).
        deadline_ms:
          type:
            - integer
            - 'null'
          minimum: 1
          description: >
            Request deadline in milliseconds. Hard wall-clock cap across all
            fallback attempts. Must be >= timeout_ms when both are set.
            Non-streaming requests default to 1080000 (18 minutes); streaming
            has no default.
        data_policy:
          type:
            - string
            - 'null'
          enum:
            - none
            - no_training
            - zdr
            - null
          default: none
          description: |
            Data retention policy requirement:
            - `none`: No restrictions (default)
            - `no_training`: Provider must not use data for training
            - `zdr`: Zero Data Retention (strictest)
        only_byok:
          type:
            - boolean
            - 'null'
          default: false
          description: >
            Only use Bring Your Own Key (BYOK) providers.

            Mutually exclusive with `only_platform`. Returns 400 if both are
            set.
        only_platform:
          type:
            - boolean
            - 'null'
          default: false
          description: |
            Only use platform-managed API keys.
            Mutually exclusive with `only_byok`. Returns 400 if both are set.
        require_parameters:
          type:
            - boolean
            - 'null'
          default: false
          description: |
            When true, only route to providers whose accepted_params includes
            all optional parameters sent in this request. Default: false.
        tier:
          type:
            - string
            - 'null'
          description: >
            Opt in to a pricing tier. Currently supported: `priority` (Anthropic
            fast mode — 2.5x speed at 6x cost).

            Note: Auriko's "priority" tier refers to Anthropic Fast Mode, not
            Anthropic's separate Priority Tier (committed capacity SLA).

            Omitting this field (default) excludes premium-tier offerings from
            routing.
          enum:
            - priority
            - null
    RequestMetadata:
      type: object
      description: |
        Optional request metadata for tracking and observability.
        Attached via the `gateway.metadata` field on chat completion requests.
      properties:
        tags:
          type: array
          items:
            type: string
            maxLength: 50
          maxItems: 100
          description: Tags for categorizing requests (max 100 items, each ≤50 chars)
        user_id:
          type: string
          maxLength: 255
          description: Your application's user identifier for per-user analytics
        trace_id:
          type: string
          maxLength: 255
          description: >-
            Distributed tracing identifier to correlate with your observability
            stack
        custom_fields:
          type: object
          maxProperties: 10
          additionalProperties:
            type: string
            maxLength: 200
          description: >
            Arbitrary key-value pairs (max 10 keys, keys ≤50 chars, values ≤200
            chars)
      additionalProperties: false
    Extensions:
      type: object
      description: |
        Auriko extensions for provider-specific passthrough.

        For reasoning control, use the top-level `reasoning_effort` parameter
        instead of extensions.

        ## Provider Passthrough

        Pass provider-specific parameters directly:
        - `anthropic`: Anthropic-specific parameters
        - `openai`: OpenAI-specific parameters
        - `google`: Google/Gemini-specific parameters
        - `deepseek`: DeepSeek-specific parameters

        Passthrough parameters are forwarded as-is to the target provider.
      properties:
        anthropic:
          type: object
          additionalProperties: true
          description: Anthropic-specific parameters (passed through)
        openai:
          type: object
          additionalProperties: true
          description: OpenAI-specific parameters (passed through)
        google:
          type: object
          additionalProperties: true
          description: Google/Gemini-specific parameters (passed through)
        deepseek:
          type: object
          additionalProperties: true
          description: DeepSeek-specific parameters (passed through)
      additionalProperties:
        type: object
        description: Other provider-specific passthrough
      not:
        required:
          - thinking
    ResponseOutputItem:
      description: >
        An output item from the model. Discriminated on `type`.

        Known types: `message`, `function_call`, `reasoning`,
        `image_generation_call`.

        Unknown types from the provider are preserved verbatim.
      oneOf:
        - $ref: '#/components/schemas/ResponseMessageOutputItem'
        - $ref: '#/components/schemas/ResponseFunctionCallOutputItem'
        - $ref: '#/components/schemas/ResponseReasoningOutputItem'
        - $ref: '#/components/schemas/ResponseImageGenerationCallOutputItem'
        - type: object
          description: >
            Unknown output item type from the provider (e.g., web_search_call,
            file_search_call). Preserved verbatim.
          required:
            - type
          properties:
            type:
              type: string
              not:
                enum:
                  - message
                  - function_call
                  - reasoning
                  - image_generation_call
          additionalProperties: true
    ResponseUsageSchema:
      type: object
      description: Token usage for a Response API request.
      required:
        - input_tokens
        - output_tokens
        - total_tokens
      properties:
        input_tokens:
          type: integer
        output_tokens:
          type: integer
        total_tokens:
          type: integer
        input_tokens_details:
          type: object
          properties:
            cached_tokens:
              type: integer
        output_tokens_details:
          type: object
          properties:
            reasoning_tokens:
              type: integer
            image_tokens:
              type: integer
    RoutingMetadata:
      type: object
      description: >
        Routing decision metadata included in successful responses.

        8 STABLE fields (4 required + 4 optional) in the current public
        contract.
      required:
        - provider
        - provider_model_id
        - model_canonical
        - routing_strategy
      properties:
        provider:
          type: string
          description: Provider name (e.g., "fireworks_ai", "anthropic")
        provider_model_id:
          type: string
          description: Provider's model ID
        model_canonical:
          type: string
          description: Canonical model ID requested
        routing_strategy:
          type: string
          description: >
            Strategy used for routing. Known values:

            `cost`, `cost-focus`, `ttft`, `ttft-focus`, `tps`, `tps-focus`,
            `balanced`, `custom`.

            `custom` is returned when explicit `routing.weights` are provided.

            Additional strategies may be added in future versions.
        ttft_ms:
          type: number
          description: Time to first token (streaming only)
        throughput_tps:
          type: number
          description: Output throughput (tokens per second)
        cost:
          $ref: '#/components/schemas/CostInfo'
        warnings:
          type: array
          items:
            $ref: '#/components/schemas/StructuredWarning'
          description: |
            Structured warnings emitted when the gateway modifies or ignores
            part of the request (e.g., unsupported parameters, blocked fields).
    ErrorResponse:
      type: object
      required:
        - error
      properties:
        error:
          $ref: '#/components/schemas/ErrorObject'
    ResponseContentPart:
      description: A content part in a message.
      oneOf:
        - type: object
          required:
            - type
            - text
          properties:
            type:
              type: string
              const: input_text
            text:
              type: string
        - type: object
          required:
            - type
          properties:
            type:
              type: string
              const: input_image
            image_url:
              type: string
            detail:
              type: string
              enum:
                - auto
                - low
                - high
        - type: object
          description: >-
            Output text from a previous assistant response, used in multi-turn
            input.
          required:
            - type
            - text
          properties:
            type:
              type: string
              const: output_text
            text:
              type: string
            annotations:
              type: array
              items: {}
    ResponseReasoningSummaryBlock:
      type: object
      required:
        - type
        - text
      properties:
        type:
          type: string
          const: summary_text
        text:
          type: string
    ResponseMessageOutputItem:
      type: object
      required:
        - type
        - id
        - role
        - status
        - content
      properties:
        type:
          type: string
          const: message
        id:
          type: string
        role:
          type: string
          const: assistant
        status:
          type: string
          enum:
            - in_progress
            - completed
            - incomplete
        content:
          type: array
          items:
            $ref: '#/components/schemas/ResponseOutputContentPart'
        phase:
          type: string
          description: >-
            Message phase label ("commentary" or "final_answer") emitted by
            gpt-5.4+ OpenAI models; absent otherwise — never synthesized for
            providers that did not emit it. Preserve it when resending output
            items as input.
    ResponseFunctionCallOutputItem:
      type: object
      required:
        - type
        - id
        - call_id
        - name
        - arguments
        - status
      properties:
        type:
          type: string
          const: function_call
        id:
          type: string
        call_id:
          type: string
        name:
          type: string
        arguments:
          type: string
          description: JSON-encoded function arguments.
        status:
          type: string
          enum:
            - in_progress
            - completed
            - incomplete
    ResponseReasoningOutputItem:
      type: object
      required:
        - type
        - id
        - summary
      properties:
        type:
          type: string
          const: reasoning
        id:
          type: string
        summary:
          type: array
          items:
            $ref: '#/components/schemas/ResponseReasoningSummaryBlock'
        status:
          type: string
          enum:
            - in_progress
            - completed
            - incomplete
    ResponseImageGenerationCallOutputItem:
      type: object
      required:
        - type
        - id
        - status
        - result
      properties:
        type:
          type: string
          const: image_generation_call
        id:
          type: string
        status:
          type: string
          enum:
            - in_progress
            - completed
            - generating
            - failed
        result:
          type:
            - string
            - 'null'
          description: Base64-encoded image data, or null while in progress.
    CostInfo:
      type: object
      description: Cost for the request
      required:
        - usd
      properties:
        usd:
          type: number
          description: Billable cost in USD
        cache_savings_percent:
          type: integer
          description: >-
            Cache savings as integer percentage (0-100). Present only when
            savings > 0.
        cache_savings_usd:
          type: number
          description: Cache savings in USD. Present only when savings > 0.
    StructuredWarning:
      type: object
      description: |
        Structured warning emitted in routing_metadata.warnings when the
        gateway modifies or ignores part of the request.
      required:
        - type
        - code
        - message
      properties:
        type:
          type: string
          enum:
            - unsupported_parameter
            - unsupported_feature
            - unsupported_field
            - ignored_extension
            - unknown_provider
            - blocked_field
            - provider_timeout
          description: Warning category
        code:
          type: string
          description: >-
            Concrete identifier (parameter name, feature name, provider name, or
            field name)
        message:
          type: string
          description: Human-readable explanation
    ErrorObject:
      type: object
      description: OpenAI-compatible error format
      required:
        - message
        - type
        - param
        - code
      properties:
        message:
          type: string
          description: Human-readable error message
        type:
          type: string
          description: Error category per guide §4.
          enum:
            - invalid_request_error
            - authentication_error
            - permission_error
            - not_found_error
            - rate_limit_error
            - api_error
        param:
          type:
            - string
            - 'null'
          description: Parameter that caused the error (null if not applicable)
        code:
          type: string
          description: Canonical code per guide §5 and docs/reference/error_codes.json.
          enum:
            - invalid_api_key
            - expired_api_key
            - insufficient_permissions
            - feature_disabled
            - mfa_required
            - invalid_recovery_code
            - mfa_verification_failed
            - invalid_request
            - missing_required_parameter
            - invalid_parameter_value
            - payload_too_large
            - context_length_exceeded
            - content_filtered
            - idempotency_conflict
            - idempotency_replay_unavailable
            - field_immutable
            - operation_not_allowed
            - unknown_field
            - state_precondition_failed
            - duplicate_resource
            - method_not_allowed
            - tools_not_supported
            - json_mode_not_supported
            - structured_output_not_supported
            - tools_with_structured_output_not_supported
            - vision_not_supported
            - reasoning_not_supported
            - thinking_disable_not_supported
            - streaming_not_supported
            - non_streaming_not_supported
            - batch_only
            - tier_opt_in_required
            - tool_choice_required_not_supported
            - cost_constraint_exceeded
            - latency_constraint_exceeded
            - throughput_constraint_not_met
            - provider_not_in_allowlist
            - provider_blocked
            - required_params_not_supported
            - byok_keys_required
            - platform_keys_unavailable
            - unsupported_modalities
            - no_compatible_endpoint
            - no_responses_endpoint
            - response_api_only
            - hosted_tool_not_supported
            - model_not_found
            - resource_not_found
            - rate_limit_exceeded
            - budget_exhausted
            - insufficient_quota
            - no_provider_available
            - upstream_error
            - upstream_timeout
            - client_disconnected
            - model_unavailable
            - internal_error
            - service_unavailable
        doc_url:
          type: string
          description: Link to the canonical documentation page for this error code.
        provider:
          type:
            - string
            - 'null'
          description: >-
            Upstream provider that generated the error (e.g.,
            "google_ai_studio", "openai"). Null for errors not attributable to
            an upstream provider (local validation, auth, internal config).
        suggestion:
          type: string
          description: >-
            Actionable fix hint for routing, capability, or model-not-found
            errors. Absent for other error types.
    ResponseOutputContentPart:
      type: object
      required:
        - type
        - text
      properties:
        type:
          type: string
          const: output_text
        text:
          type: string
        annotations:
          type: array
          items: {}
        logprobs:
          type: array
          items: {}
          description: Token-level log probabilities. Present when top_logprobs is set.
  headers:
    X-Request-ID:
      description: |
        Unique request identifier for debugging and support.
        Include this in support requests for fast resolution.
      schema:
        type: string
      example: req_abc123def456
    X-Error-Type:
      description: |
        Same value as the `error.type` field in the response body, exposed as a
        header for programmatic routing without body parsing.
      schema:
        type: string
        enum:
          - invalid_request_error
          - rate_limit_error
          - api_error
          - authentication_error
          - not_found_error
          - permission_error
      example: invalid_request_error
    X-Error-Retryable:
      description: |
        Whether the client should retry this request.
        'true' for api_error and rate_limit_error types; 'false' for all others.
      schema:
        type: string
        enum:
          - 'true'
          - 'false'
      example: 'false'
  responses:
    BadRequest:
      description: Bad request - invalid parameters
      headers:
        X-Request-ID:
          $ref: '#/components/headers/X-Request-ID'
        X-Error-Type:
          $ref: '#/components/headers/X-Error-Type'
        X-Error-Retryable:
          $ref: '#/components/headers/X-Error-Retryable'
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ErrorResponse'
          examples:
            missing_model:
              summary: Missing model parameter
              value:
                error:
                  message: 'Missing required parameter: ''model''.'
                  type: invalid_request_error
                  param: model
                  code: missing_required_parameter
            invalid_models:
              summary: Invalid models array
              value:
                error:
                  message: gateway.models array cannot exceed 10 models
                  type: invalid_request_error
                  param: gateway.models
                  code: invalid_request
            vision_not_supported:
              summary: Model doesn't support requested capability
              value:
                error:
                  message: Model 'gpt-4.1-mini' does not support vision/image input.
                  type: invalid_request_error
                  param: null
                  code: vision_not_supported
            cost_constraint_exceeded:
              summary: Routing constraints excluded all providers
              value:
                error:
                  message: No provider meets the cost constraint for model 'gpt-4o'.
                  type: invalid_request_error
                  param: gateway.routing.max_cost_per_1m
                  code: cost_constraint_exceeded
    Unauthorized:
      description: Authentication failed
      headers:
        X-Request-ID:
          $ref: '#/components/headers/X-Request-ID'
        X-Error-Type:
          $ref: '#/components/headers/X-Error-Type'
        X-Error-Retryable:
          $ref: '#/components/headers/X-Error-Retryable'
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ErrorResponse'
          example:
            error:
              message: API key is invalid.
              type: authentication_error
              param: null
              code: invalid_api_key
    ModelNotFound:
      description: Model not found
      headers:
        X-Request-ID:
          $ref: '#/components/headers/X-Request-ID'
        X-Error-Type:
          $ref: '#/components/headers/X-Error-Type'
        X-Error-Retryable:
          $ref: '#/components/headers/X-Error-Retryable'
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ErrorResponse'
          example:
            error:
              message: >-
                Model 'unknown-model' is not in the catalog. See
                https://api.auriko.ai/v1/directory/models for available models.
              type: not_found_error
              param: model
              code: model_not_found
    RateLimited:
      description: |
        Rate limit exceeded. Two sub-causes share this status:
        - `rate_limit_exceeded`: throughput-based rate limit.
        - `budget_exhausted`: account or project budget reached (self-service;
          not auto-retried by SDKs — requires adding credits or raising the limit).
      headers:
        X-Request-ID:
          $ref: '#/components/headers/X-Request-ID'
        X-Error-Type:
          $ref: '#/components/headers/X-Error-Type'
        X-Error-Retryable:
          $ref: '#/components/headers/X-Error-Retryable'
        Retry-After:
          description: >-
            Seconds until rate limit resets. Standard HTTP header for retry
            backoff.
          schema:
            type: integer
          example: 60
        X-Rate-Limit-Source:
          description: >-
            Identifies whether the rate limit originated from an upstream
            provider or from Auriko's routing layer.
          schema:
            type: string
            enum:
              - provider
              - routing
          required: false
          example: provider
        X-RateLimit-Limit-Requests:
          description: Request limit per window (present on throughput-based rate limits)
          schema:
            type: integer
          required: false
          example: 60
        X-RateLimit-Remaining-Requests:
          description: >-
            Remaining requests in window (present on throughput-based rate
            limits)
          schema:
            type: integer
          required: false
          example: 0
        X-RateLimit-Reset-Requests:
          description: >-
            When limit resets (ISO 8601 format; present on throughput-based rate
            limits)
          schema:
            type: string
            format: date-time
          required: false
          example: '2026-06-15T12:01:00.000Z'
        X-Budget-Exceeded:
          description: Present on budget-exhaustion 429s.
          schema:
            type: string
            enum:
              - 'true'
              - 'false'
          required: false
          example: 'true'
        X-Budget-Exceeded-Period:
          description: Period of the exceeded budget (present on budget-exhaustion 429s).
          schema:
            type: string
            enum:
              - daily
              - weekly
              - monthly
          required: false
          example: monthly
        X-Budget-Exceeded-Scope:
          description: Scope of the exceeded budget (present on budget-exhaustion 429s).
          schema:
            type: string
            enum:
              - workspace
              - api_key
              - byok_provider
          required: false
          example: workspace
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ErrorResponse'
          examples:
            rate_limit_exceeded:
              summary: Throughput-based rate limit
              value:
                error:
                  message: Rate limit exceeded. Retry after 60 seconds.
                  type: rate_limit_error
                  param: null
                  code: rate_limit_exceeded
            budget_exhausted:
              summary: Account budget exhausted
              value:
                error:
                  message: Account budget exhausted. Add credits to continue.
                  type: rate_limit_error
                  param: null
                  code: budget_exhausted
    InternalError:
      description: Internal server error
      headers:
        X-Request-ID:
          $ref: '#/components/headers/X-Request-ID'
        X-Error-Type:
          $ref: '#/components/headers/X-Error-Type'
        X-Error-Retryable:
          $ref: '#/components/headers/X-Error-Retryable'
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ErrorResponse'
          example:
            error:
              message: >-
                An unexpected error occurred. Contact support with the request
                ID.
              type: api_error
              param: null
              code: internal_error
    ProviderError:
      description: >
        Upstream provider failure (Bad Gateway).


        All non-timeout 5xx errors from upstream providers are normalized to 502
        (code: `upstream_error`).

        Upstream failures may also surface as:

        - 429: Provider rate limit (code: `rate_limit_exceeded`)

        - 504: Provider timed out (code: `upstream_timeout`)


        These use their respective status codes with the same `ErrorResponse`
        body format.
      headers:
        X-Request-ID:
          $ref: '#/components/headers/X-Request-ID'
        X-Error-Type:
          description: Canonical error type (one of the X-Error-Type enum values)
          schema:
            type: string
          required: false
        X-Error-Retryable:
          description: Whether the error is retryable ('true' or 'false')
          schema:
            type: string
            enum:
              - 'true'
              - 'false'
          required: false
        Retry-After:
          description: Seconds to wait before retrying (present on rate limit errors)
          schema:
            type: integer
          required: false
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ErrorResponse'
          example:
            error:
              message: >-
                Model 'gpt-4o' is temporarily unavailable. Retry in a few
                seconds.
              type: api_error
              param: null
              code: upstream_error
    ServiceUnavailable:
      description: >
        Service unavailable — transient issue. Possible causes:

        - All providers for the model are rate-limited or temporarily
        unavailable (`no_provider_available`)

        - Transient infrastructure issue (`service_unavailable`)


        Note: If the model doesn't support a requested capability (e.g.,
        reasoning),

        the response is 400 with a specific code (e.g.,
        `reasoning_not_supported`), not 503.

        If routing constraints excluded all providers, the response is 400 with
        a

        specific constraint code (e.g., `cost_constraint_exceeded`).
      headers:
        X-Request-ID:
          $ref: '#/components/headers/X-Request-ID'
        X-Error-Type:
          $ref: '#/components/headers/X-Error-Type'
        X-Error-Retryable:
          $ref: '#/components/headers/X-Error-Retryable'
        Retry-After:
          description: Seconds to wait before retrying (present on transient failures)
          schema:
            type: integer
          required: false
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ErrorResponse'
          examples:
            no_providers:
              summary: No providers available for the model
              value:
                error:
                  message: >-
                    No provider available for model 'gpt-4o'. Try a different
                    model.
                  type: api_error
                  param: null
                  code: no_provider_available
            service_unavailable:
              summary: Transient infrastructure issue
              value:
                error:
                  message: Service is temporarily unavailable. Retry in a few seconds.
                  type: api_error
                  param: null
                  code: service_unavailable
    ProviderTimeout:
      description: |
        Upstream provider timed out. The client may retry with a longer timeout.
      headers:
        X-Request-ID:
          $ref: '#/components/headers/X-Request-ID'
        X-Error-Type:
          description: Error type (always 'api_error' for timeout responses)
          schema:
            type: string
          required: false
        X-Error-Retryable:
          description: Whether the error is retryable ('true' or 'false')
          schema:
            type: string
            enum:
              - 'true'
              - 'false'
          required: false
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ErrorResponse'
          example:
            error:
              message: Model 'gpt-4o' timed out. Retry or use a smaller input.
              type: api_error
              param: null
              code: upstream_timeout
  securitySchemes:
    ApiKeyAuth:
      type: http
      scheme: bearer
      description: |
        API key authentication.
        Keys start with `ak_` prefix.
        Example: `Authorization: Bearer ak_live_xxxxxxxxxxxx`

````