Each streaming event carries a type field like response.output_text.delta or response.completed. The stream ends with a terminal event (response.completed, response.incomplete, or response.failed) instead of a data: [DONE] sentinel.

Prerequisites

  • An Auriko API key
  • Python 3.10+ with the OpenAI SDK (pip install openai) or the Auriko SDK (pip install auriko)
    • OR Node.js 18+ with the OpenAI SDK (npm install openai) or @auriko/sdk (npm install @auriko/sdk)

Stream text

Stream a response and print each text chunk as it arrives:
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["AURIKO_API_KEY"],
    base_url="https://api.auriko.ai/v1"
)

stream = client.responses.create(
    model="gpt-4o",
    input="Count from 1 to 10",
    stream=True
)

for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)

Handle event types

A basic text response emits events in this order: response.created → response.in_progress → response.output_item.added → response.content_part.added → response.output_text.delta (repeated) → response.output_text.done → response.content_part.done → response.output_item.done → response.completed

Lifecycle events

| Event | Description |
| --- | --- |
| response.created | Response object created; status is in_progress |
| response.in_progress | Processing has started |
| response.completed | Response finished; includes the final response object |
| response.incomplete | Response stopped early (token limit, content filter) |
| response.failed | Response failed; includes error details |

Content events

| Event | Description |
| --- | --- |
| response.output_item.added | New output item started (text, function call, or reasoning) |
| response.output_item.done | Output item finished |
| response.content_part.added | New content part within an output item |
| response.content_part.done | Content part finished |
| response.output_text.delta | Text chunk; access it via event.delta |
| response.output_text.done | Text output complete; access the full text via event.text |
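As a rough sketch of how the two text events relate, the helper below joins the delta chunks and prefers the full text from response.output_text.done when that event arrives. The function name collect_text and the simulated events are illustrative assumptions, not part of either SDK; they stand in for a live stream so the logic can run offline.

```python
from types import SimpleNamespace


def collect_text(events):
    """Join text deltas; prefer the full text from the .done event when present."""
    chunks = []
    final_text = None
    for event in events:
        if event.type == "response.output_text.delta":
            chunks.append(event.delta)
        elif event.type == "response.output_text.done":
            final_text = event.text
    # Fall back to the joined deltas if the stream stopped before .done
    return final_text if final_text is not None else "".join(chunks)


# Simulated events (hypothetical stand-ins for a live stream)
fake_events = [
    SimpleNamespace(type="response.created"),
    SimpleNamespace(type="response.output_text.delta", delta="Hel"),
    SimpleNamespace(type="response.output_text.delta", delta="lo"),
    SimpleNamespace(type="response.output_text.done", text="Hello"),
    SimpleNamespace(type="response.completed"),
]

print(collect_text(fake_events))
```

With a real stream, the same function could be passed the iterator returned by client.responses.create(..., stream=True).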

Reasoning events

| Event | Description |
| --- | --- |
| response.reasoning_summary_part.added | Reasoning summary part started |
| response.reasoning_summary_part.done | Reasoning summary part finished |
| response.reasoning_summary_text.delta | Reasoning summary text chunk |
| response.reasoning_summary_text.done | Reasoning summary text complete |
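One way to keep reasoning summaries apart from the answer is to route each delta type into its own buffer. The sketch below assumes that response.reasoning_summary_text.delta events expose their chunk as event.delta, the same way text deltas do; this page does not state that, so treat it as an assumption. The function name and the simulated events are hypothetical.

```python
from types import SimpleNamespace


def split_reasoning_and_answer(events):
    """Return (reasoning_summary, answer_text) built from their delta events."""
    reasoning, answer = [], []
    for event in events:
        if event.type == "response.reasoning_summary_text.delta":
            # Assumption: the chunk is exposed as event.delta
            reasoning.append(event.delta)
        elif event.type == "response.output_text.delta":
            answer.append(event.delta)
    return "".join(reasoning), "".join(answer)


# Simulated events (hypothetical stand-ins for a live stream)
reasoning_events = [
    SimpleNamespace(type="response.reasoning_summary_part.added"),
    SimpleNamespace(type="response.reasoning_summary_text.delta", delta="Add the "),
    SimpleNamespace(type="response.reasoning_summary_text.delta", delta="two numbers."),
    SimpleNamespace(type="response.reasoning_summary_part.done"),
    SimpleNamespace(type="response.output_text.delta", delta="4"),
]

summary, answer = split_reasoning_and_answer(reasoning_events)
print(f"Reasoning: {summary}")
print(f"Answer: {answer}")
```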

Tool call events

| Event | Description |
| --- | --- |
| response.function_call_arguments.delta | Function call arguments chunk |
| response.function_call_arguments.done | Function call arguments complete |
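Function call arguments arrive as fragments of a JSON string, so they are usually buffered and parsed only once the done event arrives. The sketch below assumes each event carries an item_id identifying the call, that delta events expose the fragment as event.delta, and that the done event may expose the complete string as event.arguments. These field names follow the OpenAI Responses API but are not confirmed by this page; the function name and simulated events are hypothetical.

```python
import json
from types import SimpleNamespace


def collect_function_args(events):
    """Buffer argument fragments per call and parse them when the call is done."""
    buffers = {}  # item_id -> list of JSON string fragments
    parsed = {}   # item_id -> decoded arguments
    for event in events:
        if event.type == "response.function_call_arguments.delta":
            buffers.setdefault(event.item_id, []).append(event.delta)
        elif event.type == "response.function_call_arguments.done":
            # Assumption: done may carry the full string; otherwise join the fragments
            raw = getattr(event, "arguments", None)
            if raw is None:
                raw = "".join(buffers.get(event.item_id, []))
            parsed[event.item_id] = json.loads(raw)
    return parsed


# Simulated events (hypothetical stand-ins for a live stream)
tool_events = [
    SimpleNamespace(type="response.function_call_arguments.delta",
                    item_id="fc_1", delta='{"city":'),
    SimpleNamespace(type="response.function_call_arguments.delta",
                    item_id="fc_1", delta=' "Paris"}'),
    SimpleNamespace(type="response.function_call_arguments.done",
                    item_id="fc_1", arguments='{"city": "Paris"}'),
]

print(collect_function_args(tool_events))
```

Parsing only on the done event avoids calling json.loads on a partial fragment, which would raise an error.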

Error event

| Event | Description |
| --- | --- |
| error | Stream-level error |

Terminal events (response.completed, response.incomplete, response.failed) carry the final response object with usage and routing_metadata.
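Because a stream can end in any of three terminal events, or be interrupted by an error event, a consumer loop typically checks for all of them rather than only response.completed. The following is a minimal sketch under stated assumptions: StreamError and wait_for_terminal are hypothetical names, and reading the error text from event.message is an assumption about the error event's shape.

```python
from types import SimpleNamespace

TERMINAL_TYPES = {"response.completed", "response.incomplete", "response.failed"}


class StreamError(RuntimeError):
    """Hypothetical exception raised when the stream emits an error event."""


def wait_for_terminal(events):
    """Consume events until a terminal event; return (event_type, response)."""
    for event in events:
        if event.type == "error":
            # Assumption: the error event exposes a human-readable message
            raise StreamError(getattr(event, "message", "stream-level error"))
        if event.type in TERMINAL_TYPES:
            return event.type, event.response
    # The stream closed without a terminal event (for example, a dropped connection)
    return None, None


# Simulated events (hypothetical stand-ins for a live stream)
final_response = SimpleNamespace(status="incomplete")
terminal_events = [
    SimpleNamespace(type="response.created"),
    SimpleNamespace(type="response.output_text.delta", delta="Hi"),
    SimpleNamespace(type="response.incomplete", response=final_response),
]

kind, response = wait_for_terminal(terminal_events)
print(kind, response.status)
```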

Access completed response

You can read response_headers before iterating. After iteration, the stream exposes the terminal event’s full response object.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["AURIKO_API_KEY"],
    base_url="https://api.auriko.ai/v1"
)

stream = client.responses.create(
    model="gpt-4o",
    input="What is 2 + 2?",
    stream=True
)

completed = None
for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
    elif event.type == "response.completed":
        completed = event.response

# completed stays None if the stream ended with response.incomplete or response.failed
if completed is not None:
    print(f"\nModel: {completed.model}")
    print(f"Usage: {completed.usage.input_tokens} in, {completed.usage.output_tokens} out")
cURL streams raw SSE events. See Read raw SSE for parsing terminal events. For routing metadata with the OpenAI SDK, see OpenAI Compatibility.

Stream asynchronously

Stream with the async client:
import os
import asyncio
from openai import AsyncOpenAI

async def stream_response():
    client = AsyncOpenAI(
        api_key=os.environ["AURIKO_API_KEY"],
        base_url="https://api.auriko.ai/v1"
    )

    stream = await client.responses.create(
        model="gpt-4o",
        input="Write a haiku about code",
        stream=True
    )

    async for event in stream:
        if event.type == "response.output_text.delta":
            print(event.delta, end="", flush=True)

asyncio.run(stream_response())
TypeScript’s SDK is inherently async. See the Stream text example above.

Read raw SSE

The raw wire format uses event: and data: lines. A basic text response looks like this:
event: response.created
data: {"type":"response.created","response":{"id":"resp_abc123","object":"response","status":"in_progress",...}}

event: response.in_progress
data: {"type":"response.in_progress","response":{"id":"resp_abc123","object":"response","status":"in_progress",...}}

event: response.output_item.added
data: {"type":"response.output_item.added","output_index":0,"item":{"type":"message","role":"assistant","content":[]}}

event: response.output_text.delta
data: {"type":"response.output_text.delta","output_index":0,"content_index":0,"delta":"The"}

event: response.output_text.delta
data: {"type":"response.output_text.delta","output_index":0,"content_index":0,"delta":" capital"}

event: response.completed
data: {"type":"response.completed","response":{"id":"resp_abc123","object":"response","status":"completed","output":[...],"usage":{...},"routing_metadata":{...}}}
See Chat Completions streaming for the data: [DONE] format used by the other endpoint. See Error Handling for error recovery patterns.
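For clients that read the wire format directly, the event: and data: lines can be parsed with a small generator. This is a minimal sketch rather than a complete SSE implementation: parse_sse is a hypothetical helper, it ignores fields such as id: and retry:, and the sample payloads are shortened versions of the ones above with the elided parts left out so they are valid JSON.

```python
import json


def parse_sse(lines):
    """Yield (event_name, payload) for each SSE event in an iterable of lines."""
    event_name = None
    data_lines = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event
            if data_lines:
                yield event_name, json.loads("\n".join(data_lines))
            event_name, data_lines = None, []
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].lstrip())
    # Flush a final event that was not followed by a blank line
    if data_lines:
        yield event_name, json.loads("\n".join(data_lines))


sample = """event: response.output_text.delta
data: {"type":"response.output_text.delta","output_index":0,"content_index":0,"delta":"The"}

event: response.output_text.delta
data: {"type":"response.output_text.delta","output_index":0,"content_index":0,"delta":" capital"}

event: response.completed
data: {"type":"response.completed","response":{"id":"resp_abc123","status":"completed"}}
"""

sse_events = list(parse_sse(sample.splitlines()))
sse_text = "".join(
    payload["delta"]
    for name, payload in sse_events
    if name == "response.output_text.delta"
)
print(sse_text)
print(sse_events[-1][0])
```

In practice the lines would come from a streaming HTTP response body instead of a string; the parsing loop stays the same.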