Streaming

The Kilo AI Gateway supports streaming responses from all models using Server-Sent Events (SSE). Streaming allows your application to display tokens as they're generated, providing a more responsive user experience.

Enabling streaming

Set stream: true in your request body to enable streaming:

{
	"model": "anthropic/claude-sonnet-4.5",
	"messages": [{ "role": "user", "content": "Write a short story" }],
	"stream": true
}
â„šī¸Info

The gateway automatically injects stream_options.include_usage = true on all streaming requests, so you always receive token usage information in the final chunk.

Streaming with the Vercel AI SDK

The Vercel AI SDK handles SSE parsing and provides a clean streaming interface:

import { streamText } from "ai"
import { createOpenAI } from "@ai-sdk/openai"

const kilo = createOpenAI({
	baseURL: "https://api.kilo.ai/api/gateway",
	apiKey: process.env.KILO_API_KEY,
})

const result = streamText({
	model: kilo("anthropic/claude-sonnet-4.5"),
	prompt: "Write a short story about a robot.",
})

for await (const textPart of result.textStream) {
	process.stdout.write(textPart)
}

// Access usage data after streaming completes
const usage = await result.usage
console.log("Tokens used:", usage)

Streaming with the OpenAI SDK

import OpenAI from "openai"

const client = new OpenAI({
	apiKey: process.env.KILO_API_KEY,
	baseURL: "https://api.kilo.ai/api/gateway",
})

const stream = await client.chat.completions.create({
	model: "anthropic/claude-sonnet-4.5",
	messages: [{ role: "user", content: "Write a short story" }],
	stream: true,
})

for await (const chunk of stream) {
	const content = chunk.choices[0]?.delta?.content
	if (content) {
		process.stdout.write(content)
	}
}

Raw SSE format

When streaming, the gateway returns data in SSE format: each event is a JSON object on a line beginning with data:, and the stream ends with a data: [DONE] sentinel:

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1234567890,"model":"anthropic/claude-sonnet-4.5","choices":[{"index":0,"delta":{"role":"assistant","content":"Once"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1234567890,"model":"anthropic/claude-sonnet-4.5","choices":[{"index":0,"delta":{"content":" upon"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1234567890,"model":"anthropic/claude-sonnet-4.5","choices":[{"index":0,"delta":{"content":" a"},"finish_reason":null}]}

data: [DONE]
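If you consume the stream without an SDK, you need to parse these lines yourself. A minimal sketch of such a parser (the parseSSEText name is illustrative, not part of the gateway API):

```typescript
// Minimal SSE chunk parser: extracts assistant text from raw event lines.
// Each event line starts with "data: "; the stream ends with "data: [DONE]".
function parseSSEText(raw: string): string {
	let text = ""
	for (const line of raw.split("\n")) {
		const trimmed = line.trim()
		// Skip blank lines and anything that isn't a data event
		if (!trimmed.startsWith("data: ")) continue
		const payload = trimmed.slice("data: ".length)
		// [DONE] is a sentinel, not JSON -- stop before parsing it
		if (payload === "[DONE]") break
		const chunk = JSON.parse(payload)
		text += chunk.choices?.[0]?.delta?.content ?? ""
	}
	return text
}
```

In production, prefer a streaming parser that handles events split across network reads; this sketch assumes each event arrives on a complete line.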

Usage in the final chunk

Token usage data is included in the final chunk before [DONE], with an empty choices array:

{
	"id": "chatcmpl-abc123",
	"object": "chat.completion.chunk",
	"usage": {
		"prompt_tokens": 12,
		"completion_tokens": 150,
		"total_tokens": 162
	},
	"choices": []
}
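In client code, you can recognize this final chunk by its empty choices array. A small helper along these lines (the extractUsage name is illustrative):

```typescript
interface UsageChunk {
	usage?: { prompt_tokens: number; completion_tokens: number; total_tokens: number }
	choices: unknown[]
}

// Returns the usage object if this is the final usage-bearing chunk,
// otherwise null (content chunks have a non-empty choices array).
function extractUsage(chunk: UsageChunk) {
	if (chunk.choices.length === 0 && chunk.usage) return chunk.usage
	return null
}
```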

Stream cancellation

You can cancel a streaming request by aborting the connection. This stops token generation, so you are not billed for tokens that were never generated:

const controller = new AbortController()

// Schedule cancellation after 5 seconds
setTimeout(() => controller.abort(), 5000)

const response = await fetch("https://api.kilo.ai/api/gateway/chat/completions", {
	method: "POST",
	headers: {
		Authorization: `Bearer ${process.env.KILO_API_KEY}`,
		"Content-Type": "application/json",
	},
	body: JSON.stringify({
		model: "anthropic/claude-sonnet-4.5",
		messages: [{ role: "user", content: "Write a long essay" }],
		stream: true,
	}),
	signal: controller.signal,
})

// Read the stream (Node 18+: response.body is async-iterable);
// an abort surfaces as an AbortError
try {
	for await (const chunk of response.body) {
		process.stdout.write(new TextDecoder().decode(chunk))
	}
} catch (err) {
	if ((err as Error).name !== "AbortError") throw err
}
âš ī¸Warning

Stream cancellation behavior depends on the upstream provider. Some providers stop processing immediately, while others may continue processing after disconnection. The gateway handles partial usage tracking for cancelled streams.

Error handling during streaming

Errors before streaming starts

If an error occurs before any tokens are sent, the gateway returns a standard JSON error response with the appropriate HTTP status code:

{
	"error": {
		"message": "Insufficient balance",
		"code": 402
	}
}
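A simple guard for this case might look like the following sketch (the checkPreStreamError name is illustrative; it assumes you have already read the status code and parsed the JSON body):

```typescript
// If the gateway rejects a request before any tokens are sent, the
// response is a JSON error body with a non-200 status. Throw a
// descriptive error so callers never try to read it as an SSE stream.
function checkPreStreamError(
	status: number,
	body: { error?: { message: string; code: number } },
): void {
	if (status !== 200 && body.error) {
		throw new Error(`Gateway error ${body.error.code}: ${body.error.message}`)
	}
}
```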

Errors during streaming

If an error occurs after tokens have already been sent, the HTTP status code (already sent as 200) can no longer be changed, so the error is delivered as an SSE event:

data: {"error":{"message":"Provider disconnected","code":502},"choices":[{"index":0,"delta":{"content":""},"finish_reason":"error"}]}

Check for finish_reason: "error" to detect mid-stream errors in your client code.
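That check can be sketched as a small predicate (the isStreamError name is illustrative):

```typescript
interface StreamChunk {
	error?: { message: string; code: number }
	choices?: { finish_reason: string | null }[]
}

// A chunk signals a mid-stream error if it carries an error object
// or a choice whose finish_reason is "error".
function isStreamError(chunk: StreamChunk): boolean {
	return chunk.error !== undefined || chunk.choices?.[0]?.finish_reason === "error"
}
```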

For parsing SSE streams, we recommend these libraries: