Streaming
The Kilo AI Gateway supports streaming responses from all models using Server-Sent Events (SSE). Streaming allows your application to display tokens as they're generated, providing a more responsive user experience.
Enabling streaming
Set stream: true in your request body to enable streaming:
{
"model": "anthropic/claude-sonnet-4.5",
"messages": [{ "role": "user", "content": "Write a short story" }],
"stream": true
}
The gateway automatically injects stream_options.include_usage = true on all streaming requests, so you always receive token usage information in the final chunk.
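Conceptually, the request above is forwarded upstream as if you had written:
{
"model": "anthropic/claude-sonnet-4.5",
"messages": [{ "role": "user", "content": "Write a short story" }],
"stream": true,
"stream_options": { "include_usage": true }
}
You don't need to set stream_options yourself; the gateway adds it.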
Streaming with the Vercel AI SDK
The Vercel AI SDK handles SSE parsing and provides a clean streaming interface:
import { streamText } from "ai"
import { createOpenAI } from "@ai-sdk/openai"
const kilo = createOpenAI({
baseURL: "https://api.kilo.ai/api/gateway",
apiKey: process.env.KILO_API_KEY,
})
const result = streamText({
model: kilo("anthropic/claude-sonnet-4.5"),
prompt: "Write a short story about a robot.",
})
for await (const textPart of result.textStream) {
process.stdout.write(textPart)
}
// Access usage data after streaming completes
const usage = await result.usage
console.log("Tokens used:", usage)
Streaming with the OpenAI SDK
import OpenAI from "openai"
const client = new OpenAI({
apiKey: process.env.KILO_API_KEY,
baseURL: "https://api.kilo.ai/api/gateway",
})
const stream = await client.chat.completions.create({
model: "anthropic/claude-sonnet-4.5",
messages: [{ role: "user", content: "Write a short story" }],
stream: true,
})
for await (const chunk of stream) {
const content = chunk.choices[0]?.delta?.content
if (content) {
process.stdout.write(content)
}
}
Raw SSE format
When streaming, the gateway returns data in SSE format. Each event is a JSON object on a line prefixed with data:, and the stream is terminated by a data: [DONE] sentinel:
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1234567890,"model":"anthropic/claude-sonnet-4.5","choices":[{"index":0,"delta":{"role":"assistant","content":"Once"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1234567890,"model":"anthropic/claude-sonnet-4.5","choices":[{"index":0,"delta":{"content":" upon"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1234567890,"model":"anthropic/claude-sonnet-4.5","choices":[{"index":0,"delta":{"content":" a"},"finish_reason":null}]}
data: [DONE]
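If you're not using an SDK, you can parse this format directly. Below is a minimal sketch for Node 18+, where the fetch response body is async-iterable; it buffers partial lines, extracts data: payloads, and stops at the [DONE] sentinel:
const response = await fetch("https://api.kilo.ai/api/gateway/chat/completions", {
method: "POST",
headers: {
Authorization: `Bearer ${process.env.KILO_API_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
model: "anthropic/claude-sonnet-4.5",
messages: [{ role: "user", content: "Write a short story" }],
stream: true,
}),
})
const decoder = new TextDecoder()
let buffer = ""
for await (const bytes of response.body) {
buffer += decoder.decode(bytes, { stream: true })
// Events are newline-delimited; keep any partial line in the buffer
const lines = buffer.split("\n")
buffer = lines.pop() ?? ""
for (const line of lines) {
if (!line.startsWith("data: ")) continue
const data = line.slice("data: ".length)
if (data === "[DONE]") continue // end-of-stream sentinel
const chunk = JSON.parse(data)
const content = chunk.choices?.[0]?.delta?.content
if (content) process.stdout.write(content)
}
}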
Usage in the final chunk
Token usage data is included in the final chunk before [DONE], with an empty choices array:
{
"id": "chatcmpl-abc123",
"object": "chat.completion.chunk",
"usage": {
"prompt_tokens": 12,
"completion_tokens": 150,
"total_tokens": 162
},
"choices": []
}
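With the OpenAI SDK, you can capture this in the same iteration loop, since the usage field appears on the final chunk (reusing the client from the example above):
const stream = await client.chat.completions.create({
model: "anthropic/claude-sonnet-4.5",
messages: [{ role: "user", content: "Write a short story" }],
stream: true,
})
for await (const chunk of stream) {
const content = chunk.choices[0]?.delta?.content
if (content) process.stdout.write(content)
// The final chunk has an empty choices array and carries the usage object
if (chunk.usage) {
console.log("Tokens used:", chunk.usage)
}
}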
Stream cancellation
You can cancel a streaming request by aborting the connection. This stops token generation, and you are not billed for tokens that were never generated:
const controller = new AbortController()
// Abort the request 5 seconds after it starts
setTimeout(() => controller.abort(), 5000)
try {
const response = await fetch("https://api.kilo.ai/api/gateway/chat/completions", {
method: "POST",
headers: {
Authorization: `Bearer ${process.env.KILO_API_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
model: "anthropic/claude-sonnet-4.5",
messages: [{ role: "user", content: "Write a long essay" }],
stream: true,
}),
signal: controller.signal,
})
// Consume the stream; the abort interrupts this loop
const decoder = new TextDecoder()
for await (const bytes of response.body) {
process.stdout.write(decoder.decode(bytes, { stream: true }))
}
} catch (err) {
if (err.name !== "AbortError") throw err // AbortError is expected on cancellation
}
Stream cancellation behavior depends on the upstream provider. Some providers stop processing immediately, while others may continue processing after disconnection. The gateway handles partial usage tracking for cancelled streams.
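If you're using the OpenAI SDK, you don't need a manual fetch: breaking out of the for await loop cancels the stream, and the SDK also exposes the underlying AbortController as stream.controller. A minimal sketch:
const stream = await client.chat.completions.create({
model: "anthropic/claude-sonnet-4.5",
messages: [{ role: "user", content: "Write a long essay" }],
stream: true,
})
let received = 0
for await (const chunk of stream) {
const content = chunk.choices[0]?.delta?.content ?? ""
received += content.length
process.stdout.write(content)
// Stop after roughly 500 characters; aborting closes the upstream connection
if (received > 500) {
stream.controller.abort()
break
}
}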
Error handling during streaming
Errors before streaming starts
If an error occurs before any tokens are sent, the gateway returns a standard JSON error response with the appropriate HTTP status code:
{
"error": {
"message": "Insufficient balance",
"code": 402
}
}
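Because these errors arrive before any SSE data, the response has a non-200 status and a plain JSON body, so check response.ok before reading the stream. A minimal sketch with fetch:
const response = await fetch("https://api.kilo.ai/api/gateway/chat/completions", {
method: "POST",
headers: {
Authorization: `Bearer ${process.env.KILO_API_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
model: "anthropic/claude-sonnet-4.5",
messages: [{ role: "user", content: "Write a short story" }],
stream: true,
}),
})
// A non-200 status means the body is a plain JSON error, not SSE
if (!response.ok) {
const { error } = await response.json()
throw new Error(`Gateway error ${error.code}: ${error.message}`)
}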
Errors during streaming
If an error occurs after tokens have already been sent, the 200 status code has already gone out and cannot be changed, so the error is delivered as an SSE event instead:
data: {"error":{"message":"Provider disconnected","code":502},"choices":[{"index":0,"delta":{"content":""},"finish_reason":"error"}]}
Check for finish_reason: "error" to detect mid-stream errors in your client code.
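For example, extending the manual parsing loop from the Raw SSE format section above, a guard on each parsed chunk surfaces these errors:
const chunk = JSON.parse(data)
// Mid-stream failures keep the 200 status but set finish_reason to "error"
if (chunk.choices?.[0]?.finish_reason === "error") {
throw new Error(chunk.error?.message ?? "Stream failed mid-response")
}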
Recommended SSE clients
For parsing SSE streams, we recommend these libraries:
- eventsource-parser -- Lightweight SSE parser (see the sketch below)
- OpenAI SDK -- Built-in streaming support
- Vercel AI SDK -- High-level streaming abstractions
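As an illustration, here is a sketch using eventsource-parser, assuming its v3 createParser({ onEvent }) API (v1 passes a plain callback instead):
import { createParser } from "eventsource-parser"
// Assumes eventsource-parser v3's object-style API
const parser = createParser({
onEvent(event) {
if (event.data === "[DONE]") return // end-of-stream sentinel
const chunk = JSON.parse(event.data)
const content = chunk.choices?.[0]?.delta?.content
if (content) process.stdout.write(content)
},
})
const response = await fetch("https://api.kilo.ai/api/gateway/chat/completions", {
method: "POST",
headers: {
Authorization: `Bearer ${process.env.KILO_API_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
model: "anthropic/claude-sonnet-4.5",
messages: [{ role: "user", content: "Write a short story" }],
stream: true,
}),
})
// Feed decoded bytes to the parser as they arrive
const decoder = new TextDecoder()
for await (const bytes of response.body) {
parser.feed(decoder.decode(bytes, { stream: true }))
}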