API Reference

The Kilo AI Gateway provides an OpenAI-compatible API. Unless a full path is shown (as with FIM completions), endpoints are relative to the base URL:

https://api.kilo.ai/api/gateway

Chat completions

Create a chat completion. This is the primary endpoint for interacting with AI models.

POST /chat/completions

Request body

type ChatCompletionRequest = {
	// Required
	model: string // Model ID (e.g., "anthropic/claude-sonnet-4.5")
	messages: Message[] // Array of conversation messages

	// Streaming
	stream?: boolean // Enable SSE streaming (default: false)

	// Generation parameters
	max_tokens?: number // Maximum tokens to generate
	temperature?: number // Sampling temperature (0-2)
	top_p?: number // Nucleus sampling (0-1)
	stop?: string | string[] // Stop sequences
	frequency_penalty?: number // Frequency penalty (-2 to 2)
	presence_penalty?: number // Presence penalty (-2 to 2)

	// Tool calling
	tools?: Tool[] // Available tools/functions
	tool_choice?: ToolChoice // Tool selection strategy

	// Structured output
	response_format?: ResponseFormat

	// Other
	user?: string // End-user identifier for safety
	seed?: number // Deterministic sampling seed
}

Message types

type Message =
	| { role: "system"; content: string }
	| { role: "user"; content: string | ContentPart[] }
	| { role: "assistant"; content: string | null; tool_calls?: ToolCall[] }
	| { role: "tool"; content: string; tool_call_id: string }

type ContentPart = { type: "text"; text: string } | { type: "image_url"; image_url: { url: string; detail?: string } }

type Tool = {
	type: "function"
	function: {
		name: string
		description?: string
		parameters: object // JSON Schema
	}
}

type ToolChoice = "none" | "auto" | "required" | { type: "function"; function: { name: string } }

Response (non-streaming)

type ChatCompletionResponse = {
	id: string
	object: "chat.completion"
	created: number
	model: string
	choices: Array<{
		index: number
		message: {
			role: "assistant"
			content: string | null
			tool_calls?: ToolCall[]
		}
		finish_reason: "stop" | "length" | "tool_calls" | "content_filter"
	}>
	usage: {
		prompt_tokens: number
		completion_tokens: number
		total_tokens: number
	}
}

Response (streaming)

When stream: true is set, the response is delivered as a stream of server-sent events (SSE); each event's data payload is a chunk:

type ChatCompletionChunk = {
	id: string
	object: "chat.completion.chunk"
	created: number
	model: string
	choices: Array<{
		index: number
		delta: {
			role?: "assistant"
			content?: string
			tool_calls?: ToolCall[]
		}
		finish_reason: string | null
	}>
	// Only in the final chunk
	usage?: {
		prompt_tokens: number
		completion_tokens: number
		total_tokens: number
	}
}
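To reconstruct the full message client-side, concatenate the content deltas for a given choice index. A minimal sketch (assembleMessage and the chunk values are hypothetical; the Chunk shape mirrors ChatCompletionChunk above):

```typescript
type Delta = { role?: "assistant"; content?: string };
type Chunk = { choices: Array<{ index: number; delta: Delta; finish_reason: string | null }> };

// Fold a stream of chunks into the final assistant message for choice 0.
function assembleMessage(chunks: Chunk[]): { role: string; content: string } {
	let role = "assistant";
	let content = "";
	for (const chunk of chunks) {
		const choice = chunk.choices.find((c) => c.index === 0);
		if (!choice) continue;
		if (choice.delta.role) role = choice.delta.role;
		if (choice.delta.content) content += choice.delta.content;
	}
	return { role, content };
}

// Hypothetical chunk sequence, in the order the gateway would stream it:
const chunks: Chunk[] = [
	{ choices: [{ index: 0, delta: { role: "assistant" }, finish_reason: null }] },
	{ choices: [{ index: 0, delta: { content: "Quantum " }, finish_reason: null }] },
	{ choices: [{ index: 0, delta: { content: "computing..." }, finish_reason: "stop" }] },
];
```

Note that usage only appears in the final chunk, so token accounting must wait until the stream ends.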

Example request

curl -X POST "https://api.kilo.ai/api/gateway/chat/completions" \
  -H "Authorization: Bearer $KILO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4.5",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is quantum computing?"}
    ],
    "max_tokens": 500,
    "temperature": 0.7
  }'

Example response

{
	"id": "gen-abc123",
	"object": "chat.completion",
	"created": 1739000000,
	"model": "anthropic/claude-sonnet-4.5",
	"choices": [
		{
			"index": 0,
			"message": {
				"role": "assistant",
				"content": "Quantum computing is a type of computation that uses quantum mechanics..."
			},
			"finish_reason": "stop"
		}
	],
	"usage": {
		"prompt_tokens": 25,
		"completion_tokens": 150,
		"total_tokens": 175
	}
}

Tool calling

The gateway supports function/tool calling with automatic repair for common issues such as duplicate tool calls and orphaned tool results.

Request with tools

{
	"model": "anthropic/claude-sonnet-4.5",
	"messages": [{ "role": "user", "content": "What's the weather in San Francisco?" }],
	"tools": [
		{
			"type": "function",
			"function": {
				"name": "get_weather",
				"description": "Get the current weather for a location",
				"parameters": {
					"type": "object",
					"properties": {
						"location": {
							"type": "string",
							"description": "City name"
						}
					},
					"required": ["location"]
				}
			}
		}
	],
	"tool_choice": "auto"
}

Tool call response

{
	"choices": [
		{
			"message": {
				"role": "assistant",
				"content": null,
				"tool_calls": [
					{
						"id": "call_abc123",
						"type": "function",
						"function": {
							"name": "get_weather",
							"arguments": "{\"location\":\"San Francisco\"}"
						}
					}
				]
			},
			"finish_reason": "tool_calls"
		}
	]
}
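After receiving a tool_calls response, the client runs each function and appends one tool message per call, keyed by tool_call_id, before sending the conversation back. A sketch of that round trip (appendToolResults is an illustrative helper, not part of the API):

```typescript
type ToolCall = { id: string; type: "function"; function: { name: string; arguments: string } };
type Msg =
	| { role: "user" | "system"; content: string }
	| { role: "assistant"; content: string | null; tool_calls?: ToolCall[] }
	| { role: "tool"; content: string; tool_call_id: string };

// Append the assistant turn, then one "tool" result message per tool call,
// each tied back to its call via tool_call_id.
function appendToolResults(
	history: Msg[],
	assistant: { role: "assistant"; content: string | null; tool_calls?: ToolCall[] },
	run: (name: string, args: unknown) => string,
): Msg[] {
	const out: Msg[] = [...history, assistant];
	for (const call of assistant.tool_calls ?? []) {
		out.push({
			role: "tool",
			tool_call_id: call.id,
			content: run(call.function.name, JSON.parse(call.function.arguments)),
		});
	}
	return out;
}
```

The resulting array is sent as the messages field of the next /chat/completions request.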

Tool call repair

The gateway automatically handles common tool calling issues:

  • Deduplication: Removes duplicate tool calls with the same ID
  • Orphan cleanup: Removes tool result messages without matching tool calls
  • Missing results: Inserts placeholder results for tool calls without responses
  • ID normalization: Normalizes tool call IDs per provider requirements (Anthropic, Mistral)
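The first two repairs can be pictured with a small client-side sketch (repairToolCalls is illustrative only, not the gateway's actual implementation):

```typescript
type ToolCall = { id: string; type: "function"; function: { name: string; arguments: string } };
type Msg =
	| { role: "user" | "system"; content: string }
	| { role: "assistant"; content: string | null; tool_calls?: ToolCall[] }
	| { role: "tool"; content: string; tool_call_id: string };

// Deduplication: keep only the first tool call with a given ID.
// Orphan cleanup: drop "tool" messages whose tool_call_id matches no prior call.
function repairToolCalls(messages: Msg[]): Msg[] {
	const seenIds = new Set<string>();
	const out: Msg[] = [];
	for (const msg of messages) {
		if (msg.role === "assistant" && msg.tool_calls) {
			const calls: ToolCall[] = [];
			for (const call of msg.tool_calls) {
				if (!seenIds.has(call.id)) {
					seenIds.add(call.id);
					calls.push(call);
				}
			}
			out.push({ ...msg, tool_calls: calls });
		} else if (msg.role === "tool") {
			if (seenIds.has(msg.tool_call_id)) out.push(msg);
		} else {
			out.push(msg);
		}
	}
	return out;
}
```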

FIM completions

Fill-in-the-middle completions for code generation, powered by Mistral Codestral.

POST /api/fim/completions

Request body

type FIMRequest = {
	model: string // Must be a Mistral model (e.g., "mistralai/codestral-2508")
	prompt: string // Code before the cursor
	suffix?: string // Code after the cursor
	max_tokens?: number // Maximum tokens (capped at 1000)
	temperature?: number
	stop?: string[]
	stream?: boolean
}

Example request

curl -X POST "https://api.kilo.ai/api/fim/completions" \
  -H "Authorization: Bearer $KILO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistralai/codestral-2508",
    "prompt": "def fibonacci(n):\n    if n <= 1:\n        return n\n    ",
    "suffix": "\n\nprint(fibonacci(10))",
    "max_tokens": 200,
    "stream": false
  }'

ℹ️ Info: FIM completions are limited to Mistral models (model IDs starting with mistralai/). BYOK is supported with the codestral key type.
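In an editor integration, the prompt/suffix pair falls out of splitting the buffer at the cursor. A sketch (buildFimRequest is a hypothetical helper; model and max_tokens follow the example above):

```typescript
// Split a source buffer at the cursor offset into the prompt (code before the
// cursor) and suffix (code after it) that the FIM endpoint expects.
function buildFimRequest(source: string, cursor: number) {
	return {
		model: "mistralai/codestral-2508",
		prompt: source.slice(0, cursor),
		suffix: source.slice(cursor),
		max_tokens: 200,
		stream: false,
	};
}
```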

List models

Retrieve the list of available models.

GET /models

No authentication required.

Response

Returns an OpenAI-compatible model list:

{
	"data": [
		{
			"id": "anthropic/claude-sonnet-4.5",
			"object": "model",
			"created": 1739000000,
			"owned_by": "anthropic",
			"name": "Claude Sonnet 4.5",
			"context_length": 200000,
			"pricing": {
				"prompt": "0.000003",
				"completion": "0.000015"
			}
		}
	]
}
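Assuming the pricing strings are USD per token (consistent with the magnitudes shown), a request's cost can be estimated from its usage block. estimateCost is a hypothetical helper; the figures below reuse the chat completion example above:

```typescript
// Estimated cost in USD: tokens in each direction times the per-token price.
function estimateCost(
	pricing: { prompt: string; completion: string },
	usage: { prompt_tokens: number; completion_tokens: number },
): number {
	return (
		usage.prompt_tokens * Number(pricing.prompt) +
		usage.completion_tokens * Number(pricing.completion)
	);
}
```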

List providers

Retrieve the list of available providers.

GET /providers

No authentication required.

Error codes

HTTP Status   Description
400           Bad request -- invalid parameters or model ID
401           Unauthorized -- invalid or missing API key
402           Insufficient balance -- add credits to continue
403           Forbidden -- model not allowed by organization policy
429           Rate limited -- too many requests
500           Internal server error
502           Provider error -- upstream provider returned an error
503           Service unavailable -- provider temporarily unavailable

Error response format

{
	"error": {
		"message": "Human-readable error description",
		"code": 400
	}
}

ℹ️ Info: When the gateway receives a 402 (Payment Required) from an upstream provider, it returns 503 to the client to avoid exposing internal billing details.
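One reasonable client-side policy (an assumption, not a gateway requirement) is to retry transient statuses with backoff and surface the rest immediately:

```typescript
// 429/500/502/503 are transient per the table above; 4xx client errors
// (bad request, auth, balance, policy) will not succeed on retry.
function shouldRetry(status: number): boolean {
	return status === 429 || status === 500 || status === 502 || status === 503;
}
```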

Context length errors

If your request exceeds the model's context window, you'll receive a descriptive error:

{
	"error": {
		"message": "This request exceeds the model's context window of 200000 tokens. Your request contains approximately 250000 tokens.",
		"code": 400
	}
}