Models & Providers

The Kilo AI Gateway provides access to hundreds of AI models through a single unified API. To switch models, change the model ID string; no other code changes are required.

Specifying a model

Models are identified using the format provider/model-name. Pass this as the model parameter in your request:

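// Assumes streamText comes from the AI SDK (`ai` package).
import { streamText } from "ai"

// `kilo` is assumed to be a gateway provider instance configured elsewhere.
const result = streamText({
  model: kilo.chat("anthropic/claude-sonnet-4.6"),
  prompt: "Hello!",
})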

Or in a raw API request:

{
  "model": "anthropic/claude-sonnet-4.6",
  "messages": [{ "role": "user", "content": "Hello!" }]
}
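Because every ID follows the provider/model-name format, a client can check an ID before sending it. The following is a minimal sketch; `parseModelId` and `buildChatBody` are hypothetical helper names, not part of any Kilo SDK, and the check covers only the slash-separated shape described above.

```typescript
// Hypothetical helpers for illustration; not part of any Kilo SDK.
interface ModelId {
  provider: string;
  name: string;
}

// Splits "provider/model-name" on the first slash.
function parseModelId(id: string): ModelId {
  const slash = id.indexOf("/");
  // Require a non-empty provider and a non-empty model name.
  if (slash <= 0 || slash === id.length - 1) {
    throw new Error(`Expected "provider/model-name", got "${id}"`);
  }
  return { provider: id.slice(0, slash), name: id.slice(slash + 1) };
}

// Builds the raw request body shown above for a single user message.
function buildChatBody(model: string, content: string) {
  parseModelId(model); // fail fast on malformed IDs
  return { model, messages: [{ role: "user", content }] };
}
```

Calling `buildChatBody("anthropic/claude-sonnet-4.6", "Hello!")` produces the same JSON structure as the raw request above.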

Available models

You can browse the full list of available models via the models endpoint:

GET https://api.kilo.ai/api/gateway/models

This returns model information including pricing, context window, and supported features. No authentication is required.

| Model ID | Provider | Description |
| --- | --- | --- |
| anthropic/claude-opus-4.6 | Anthropic | Most capable Claude model for complex reasoning |
| anthropic/claude-sonnet-4.6 | Anthropic | Balanced performance and cost |
| anthropic/claude-haiku-4.5 | Anthropic | Fast and cost-effective |
| openai/gpt-5.4 | OpenAI | Latest GPT model |
| openai/gpt-5.4-mini | OpenAI | Fast and efficient |
| google/gemini-3.1-pro-preview | Google | Advanced reasoning |
| google/gemini-2.5-flash | Google | Fast and efficient |
| x-ai/grok-4 | xAI | Most capable Grok model |
| x-ai/grok-code-fast-1 | xAI | Optimized for code tasks |
| deepseek/deepseek-v3.2 | DeepSeek | Strong coding and reasoning model |
| moonshotai/kimi-k2.5 | Moonshot | Strong coding and multilingual model |
| minimax/minimax-m2.7 | MiniMax | High-performance MoE model |
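The models endpoint can be queried without credentials. The sketch below assumes a runtime with a global `fetch` (for example Node 18+ or a browser). The response shape is not specified in this section, so `fetchModels` returns it untyped. `groupByProvider` is a hypothetical helper that relies only on the provider/model-name ID format.

```typescript
const MODELS_URL = "https://api.kilo.ai/api/gateway/models";

// Fetches the model list. The response shape is an unknown here,
// so the parsed JSON is returned as `unknown` for the caller to inspect.
async function fetchModels(): Promise<unknown> {
  const res = await fetch(MODELS_URL); // no Authorization header needed
  if (!res.ok) throw new Error(`Models request failed: ${res.status}`);
  return res.json();
}

// Groups model IDs by the provider prefix before the first "/".
function groupByProvider(ids: string[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const id of ids) {
    const slash = id.indexOf("/");
    const provider = slash === -1 ? id : id.slice(0, slash);
    const list = groups.get(provider) ?? [];
    list.push(id);
    groups.set(provider, list);
  }
  return groups;
}
```

Once the IDs are extracted from the response, `groupByProvider` gives a quick per-provider overview of what is available.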

Free models

Several models are available at no cost, subject to rate limits:

| Model ID | Description |
| --- | --- |
| bytedance-seed/dola-seed-2.0-pro:free | ByteDance Dola Seed 2.0 Pro |
| x-ai/grok-code-fast-1:optimized:free | xAI Grok Code Fast 1 Optimized |
| nvidia/nemotron-3-super-120b-a12b:free | NVIDIA Nemotron 3 Super 120B |
| arcee-ai/trinity-large-thinking:free | Arcee Trinity Large |
| openrouter/free | Best available free model |

Free models are available to both authenticated and anonymous users. Anonymous users are rate-limited to 200 requests per hour per IP address.
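An anonymous client can track its own request count to avoid hitting the 200-requests-per-hour limit. The sketch below is a client-side sliding-window counter. `HourlyBudget` is a hypothetical name, and whether the gateway itself counts with a sliding or a fixed window is not stated here, so treat this as a conservative approximation.

```typescript
// Client-side request budget; the real limit is enforced by the gateway.
class HourlyBudget {
  private timestamps: number[] = [];
  private readonly limit: number;
  private readonly windowMs: number;

  // Defaults mirror the anonymous limit above: 200 requests per hour.
  constructor(limit = 200, windowMs = 60 * 60 * 1000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true and records the request if it fits in the current window.
  tryAcquire(now: number = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    // Drop requests that have aged out of the window.
    this.timestamps = this.timestamps.filter((t) => t > cutoff);
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(now);
    return true;
  }
}
```

Call `tryAcquire()` before each request and delay or skip the request when it returns false.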

⚠️ Nemotron 3 Super Free (NVIDIA free endpoints)

Provided under the NVIDIA API Trial Terms of Service. Trial use only; not for production or sensitive data. Prompts and outputs are logged by NVIDIA to improve its models and services. Do not submit personal or confidential data.

Auto models

Kilo Auto virtual models automatically select the best underlying model based on the task type. The selection is controlled by the x-kilocode-mode request header.

kilo-auto/frontier

Highest performance and capability for any task.

| Mode | Resolved Model |
| --- | --- |
| plan, general, architect, orchestrator, ask, debug | anthropic/claude-opus-4.6 |
| build, explore, code | anthropic/claude-sonnet-4.6 |
| Default (no mode specified) | anthropic/claude-sonnet-4.6 |

kilo-auto/balanced

Great balance of price and capability.

| Mode | Resolved Model |
| --- | --- |
| plan, general, architect, orchestrator, ask, debug | openai/gpt-5.3-codex |
| build, explore, code | openai/gpt-5.3-codex |
| Default (no mode specified) | openai/gpt-5.3-codex |

kilo-auto/free

Free with limited capability. No credits required.

| Mode | Resolved Model |
| --- | --- |
| All modes | minimax/minimax-m2.5 |

kilo-auto/small

Automatically routes to a small, fast model.

| Mode | Resolved Model |
| --- | --- |
| Default | openai/gpt-5-nano |
| Free fallback | openai/gpt-oss-20b |
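The mode tables above can be mirrored in a small client-side lookup, for example to log which model a request is expected to resolve to. This is only a sketch: resolution happens in the gateway, `resolveAutoModel` is a hypothetical name, and treating an unrecognized mode like the default is an assumption. `kilo-auto/small` is left out because its table is not keyed by mode.

```typescript
type AutoTier = "kilo-auto/frontier" | "kilo-auto/balanced" | "kilo-auto/free";

// Modes that the frontier table maps to the Opus model.
const OPUS_MODES = new Set(["plan", "general", "architect", "orchestrator", "ask", "debug"]);

// Mirrors the documented tables; the gateway remains the source of truth.
function resolveAutoModel(tier: AutoTier, mode?: string): string {
  switch (tier) {
    case "kilo-auto/frontier":
      // Assumption: unknown modes fall through to the default row.
      return mode !== undefined && OPUS_MODES.has(mode)
        ? "anthropic/claude-opus-4.6"
        : "anthropic/claude-sonnet-4.6";
    case "kilo-auto/balanced":
      return "openai/gpt-5.3-codex"; // same model for every mode
    case "kilo-auto/free":
      return "minimax/minimax-m2.5"; // same model for every mode
  }
}
```

Because the mapping may change on the server, a lookup like this is better suited to diagnostics than to application logic.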

Example usage

{
  "model": "kilo-auto/frontier",
  "messages": [{ "role": "user", "content": "Help me design a database schema" }]
}

With the mode header:

curl -X POST "https://api.kilo.ai/api/gateway/chat/completions" \
  -H "Authorization: Bearer $KILO_API_KEY" \
  -H "x-kilocode-mode: plan" \
  -H "Content-Type: application/json" \
  -d '{"model": "kilo-auto/balanced", "messages": [{"role": "user", "content": "Design a database schema"}]}'
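The same request can be made from TypeScript. The sketch below assumes a runtime with a global `fetch`. `buildAutoRequest` and `sendAutoRequest` are hypothetical helper names, and the response is returned untyped because its shape is not described in this section.

```typescript
const CHAT_URL = "https://api.kilo.ai/api/gateway/chat/completions";

// Builds the fetch options for a chat completion; the mode header is optional.
function buildAutoRequest(apiKey: string, model: string, prompt: string, mode?: string) {
  const headers: Record<string, string> = {
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  };
  if (mode) headers["x-kilocode-mode"] = mode; // omitted means default resolution
  return {
    method: "POST",
    headers,
    body: JSON.stringify({ model, messages: [{ role: "user", content: prompt }] }),
  };
}

// Sends the request and returns the parsed JSON response as `unknown`.
async function sendAutoRequest(
  apiKey: string,
  model: string,
  prompt: string,
  mode?: string,
): Promise<unknown> {
  const res = await fetch(CHAT_URL, buildAutoRequest(apiKey, model, prompt, mode));
  if (!res.ok) throw new Error(`Gateway request failed: ${res.status}`);
  return res.json();
}
```

For example, `sendAutoRequest(key, "kilo-auto/balanced", "Design a database schema", "plan")` corresponds to the curl command above, where `key` holds the value of `KILO_API_KEY`.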