Cost Controls and Usage Safeguards
Overview
How much you spend with Kilo Code is shaped by several factors working together:
- Model selection — frontier models cost significantly more per token than efficient or free tiers
- Prompt and context size — every token in your system prompt, conversation history, file attachments, and tool definitions is billed as input
- Number of agent steps and tool calls — each step the agent takes (read a file, run a command, write code) generates its own request
- Repeated retries or loops — a stuck agent that keeps retrying the same failing step multiplies your cost
- Background and automated tasks — long-running or unattended tasks accumulate cost without immediate visibility
- Session length — long sessions carry more conversation history into every new request
No single control eliminates cost on its own. The most effective approach combines model selection, context management, task scope, and account-level monitoring together.
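As a rough illustration of how these factors multiply, the sketch below estimates the cost of a task from token counts and request counts. The prices used with it are hypothetical placeholders, not Kilo Code or provider rates, and the names `Pricing`, `request_cost`, and `task_cost` are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical USD prices per million tokens (placeholders, not real rates)."""
    input_per_mtok: float
    output_per_mtok: float


def request_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request: input and output tokens are billed separately."""
    return (
        input_tokens * pricing.input_per_mtok
        + output_tokens * pricing.output_per_mtok
    ) / 1_000_000


def task_cost(
    pricing: Pricing,
    steps: int,
    input_tokens_per_step: int,
    output_tokens_per_step: int,
    retries: int = 0,
) -> float:
    """Cost of a task in which every agent step and every retry is its own request."""
    per_request = request_cost(pricing, input_tokens_per_step, output_tokens_per_step)
    return (steps + retries) * per_request
```

With an assumed price of $3 input and $15 output per million tokens, ten steps of 20,000 input and 1,000 output tokens plus two retries come to about $0.90. A model priced at a tenth of that would cost about $0.09 for the same task.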
This page covers the controls currently available in Kilo Code. For a direct overview of Auto Model tiers and token optimization tips, see Cost Efficiency & Model Selection.
Preventing Loops and Runaway Usage
Doom loop protection
When the agent enters a repeated failure cycle — attempting the same action multiple times without making progress — Kilo pauses and asks for permission before continuing. This is controlled by the doom_loop permission, which defaults to ask.
Where to configure: Settings → Auto Approve (VS Code) or the permission.doom_loop key in kilo.jsonc (CLI).
When to use: Leave this at ask (the default) unless you are running fully unattended automation in a controlled environment. Setting it to deny blocks recovery entirely; allow lets loops continue without interruption.
Per-tool approval controls
Every action Kilo takes — reading files, editing code, running shell commands, launching sub-agents — is governed by the permission system. Each tool can be set to allow, ask, or deny. When set to ask, Kilo pauses before executing and you can approve or reject that specific action.
Where to configure: Settings → Auto Approve (VS Code) or the permission section in kilo.jsonc (CLI). See Auto-Approving Actions for the full list of available permissions.
When to use: Keep bash set to ask by default for unfamiliar tasks. You can allow specific safe command prefixes (e.g. git *, npm *) while keeping everything else at ask. This prevents the agent from running expensive or destructive commands in a loop without oversight.
Runtime auto-approve toggle (VS Code)
A shield button in the prompt controls lets you toggle auto-approve on and off at runtime without opening Settings. When enabled, pending permission prompts are approved automatically. The state stays synced across the sidebar and open Kilo tabs.
When to use: Turn it on when working on a well-understood, low-risk task that does not need step-by-step review. Turn it off as soon as you want to pause and review the agent's next actions.
Spending limits
Individual accounts stop spending when their balance reaches zero — further requests to paid models return an error and prompt you to add credits. This acts as a hard ceiling on total spend.
Organization accounts can additionally configure per-user daily spending limits. When a member reaches their daily cap, subsequent requests are blocked until midnight UTC, when the limit resets.
Where to configure: Organization spending limits are managed in the organization dashboard at app.kilo.ai. Individual credit top-up is at Settings → Adding Credits.
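Because daily limits reset at midnight UTC rather than at local midnight, it can help to compute how long a blocked member has to wait. The helper below is a minimal sketch; `seconds_until_utc_midnight` is a hypothetical name, not part of any Kilo API.

```python
from datetime import datetime, timedelta, timezone


def seconds_until_utc_midnight(now: datetime) -> float:
    """Seconds from `now` until the next 00:00 UTC.

    `now` must be timezone-aware; a naive datetime would be interpreted
    as local time by astimezone(), which is rarely what you want here.
    """
    if now.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    now_utc = now.astimezone(timezone.utc)
    next_midnight = (now_utc + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_midnight - now_utc).total_seconds()
```

Calling `seconds_until_utc_midnight(datetime.now(timezone.utc))` gives the remaining wait in seconds.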
Free model rate limits
Requests to free models (kilo-auto/free and other free-tier models) are rate-limited to 200 requests per hour. If you exceed this, requests return HTTP 429 and you must wait before continuing.
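For scripted or unattended use of free models, a client-side budget can keep you under the limit instead of discovering it through HTTP 429 responses. The sketch below assumes a sliding one-hour window; this page does not say whether Kilo counts requests in a sliding or a fixed window, so treat that as an assumption. `HourlyRequestBudget` is a hypothetical helper.

```python
from collections import deque


class HourlyRequestBudget:
    """Client-side tracker for a requests-per-window limit (sliding window assumed)."""

    def __init__(self, limit: int = 200, window_seconds: float = 3600.0) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._sent = deque()  # timestamps of recorded requests, oldest first

    def seconds_until_allowed(self, now: float) -> float:
        """Return 0.0 if a request may be sent at `now`, else the wait in seconds."""
        # Forget requests that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.limit:
            return 0.0
        # The next slot opens when the oldest request leaves the window.
        return self.window_seconds - (now - self._sent[0])

    def record(self, now: float) -> None:
        """Record that a request was sent at `now`."""
        self._sent.append(now)
```

In practice you would pass `time.monotonic()` as `now`, sleep for the returned number of seconds when it is non-zero, and call `record` after each request.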
Practical recommendations
- Define a narrow scope before starting. "Fix the null pointer in processData" generates far fewer steps than "keep fixing everything that looks wrong." Specificity reduces both steps and cost.
- Ask the agent to stop and report after a stage. Phrases like "analyze the problem and summarize your findings before making any changes" let you review the plan and cost before committing to implementation.
- Review plans before allowing broad execution. Use Architect mode (which cannot modify code) to get a plan first, then switch to Code mode to apply it incrementally.
- Monitor long-running or unattended tasks. Check the per-request cost estimates in the chat history as the session progresses. If cost is climbing unexpectedly, pause and review what the agent is doing.
- Avoid open-ended prompts. Prompts like "keep trying until it works" or "explore the whole codebase and clean it up" give the agent unlimited scope to continue generating steps.
Reducing Context and Token Usage
Start a new session when the topic changes
Conversation history accumulates with every turn. When you finish a task and move on to something unrelated, starting a new session resets the context to just your system prompt and instructions — significantly cheaper than carrying the full prior conversation.
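The effect is easy to underestimate because the history is resent with every request, so total input grows roughly quadratically with the number of turns. The sketch below is a simplified model that ignores prompt caching and compaction; the function name and the token figures are illustrative assumptions.

```python
def cumulative_input_tokens(base_tokens: int, tokens_added_per_turn: int, turns: int) -> int:
    """Total input tokens billed over `turns` requests in one session.

    Each request resends the fixed base (system prompt, instructions)
    plus all history accumulated in earlier turns.
    """
    total = 0
    history = 0
    for _ in range(turns):
        total += base_tokens + history
        history += tokens_added_per_turn
    return total
```

With a 5,000-token base and 2,000 tokens added per turn, one 20-turn session bills 480,000 input tokens, while two separate 10-turn sessions bill 280,000 in total.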
Keep prompts focused
Concise, specific prompts cost fewer tokens. Instead of pasting large blocks of background, describe what you need in plain terms and let the agent ask for files if it needs them. Repeating context you already provided earlier in the session is rarely necessary.
Use @file and @folder mentions selectively
Attaching an entire folder sends every file in it as input tokens, even if only one or two files are relevant. Use @file with specific paths rather than broad directory mentions. When reviewing a bug, include only the file containing the bug and its closest dependencies.
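To get a feel for the difference before attaching anything, you can estimate token counts locally. The sketch below uses a rough rule of thumb of about four characters per token, which is an assumption and varies by model and tokenizer. `estimate_tokens` is a hypothetical helper, and its skip list simply mirrors the directories this page says Kilo skips automatically.

```python
from pathlib import Path

# Directory names this page lists as skipped automatically.
SKIPPED_DIRS = {"node_modules", "dist", "build", ".git", "__pycache__", ".cache", "vendor"}


def estimate_tokens(path, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate for one file or for every text file under a folder."""
    root = Path(path)
    if root.is_file():
        files = [root]
    else:
        files = [
            p
            for p in root.rglob("*")
            if p.is_file()
            and not SKIPPED_DIRS.intersection(p.relative_to(root).parent.parts)
        ]
    total_chars = 0
    for f in files:
        try:
            total_chars += len(f.read_text(encoding="utf-8"))
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
    return int(total_chars / chars_per_token)
```

Comparing `estimate_tokens("src")` with `estimate_tokens("src/parser.py")` (both paths are placeholders) shows roughly how much extra input a folder mention would add.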
Exclude generated, build, and vendor directories
Kilo automatically skips a set of directories including node_modules, dist, build, .git, __pycache__, .cache, and vendor. You can add additional paths using permission deny rules in kilo.jsonc or using a .kilocodeignore file at your workspace root.
{
  "permission": {
    "read": {
      "coverage/**": "deny",
      ".next/**": "deny",
      "*": "allow"
    }
  }
}
Where to configure: kilo.jsonc (VS Code / CLI). See .kilocodeignore for full details.
Compact long conversations
When a conversation grows long, use /compact in the chat (also searchable as smol or condense) to summarize the history and free up context space. Kilo replaces older conversation turns with an anchored summary that captures your goal, constraints, progress, and next steps.
Auto-compaction is enabled by default — Kilo automatically compacts when approaching the context window limit so you do not need to intervene manually.
Where to configure: Toggle auto-compaction in Settings → Context (VS Code) or set compaction.auto in kilo.jsonc. Configure the trigger threshold with compaction.threshold_percent (e.g. 80 to compact at 80% of the model's context window).
You can also configure a cheaper model specifically for compaction, so summarization does not consume frontier model tokens:
{
  "agent": {
    "compaction": {
      "model": "anthropic/claude-haiku-4-5"
    }
  }
}
See Context Condensing for full configuration options.
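To make the threshold concrete, the sketch below converts a `compaction.threshold_percent` value into a token count, following the description above that 80 means 80% of the model's context window. Whether Kilo also subtracts reserved output or buffer tokens before applying the percentage is not stated here, so this is a simplification; the function names are hypothetical.

```python
def compaction_trigger_tokens(context_window: int, threshold_percent: int) -> int:
    """Token count at which auto-compaction would start, per the percentage rule."""
    if not 0 < threshold_percent <= 100:
        raise ValueError("threshold_percent must be in the range 1-100")
    return context_window * threshold_percent // 100


def should_compact(used_tokens: int, context_window: int, threshold_percent: int) -> bool:
    """True once the conversation has reached the trigger point."""
    return used_tokens >= compaction_trigger_tokens(context_window, threshold_percent)
```

For a 200,000-token window, a threshold of 80 triggers at 160,000 tokens and a threshold of 50 triggers at 100,000.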
Keep max output tokens conservative
Every token you allocate to model output reduces how much conversation history can remain in the context window. For routine coding tasks, keep Code mode at 16k max output tokens or below. Raise the limit only in Architect or Debug modes where extended reasoning is useful.
Where to configure: Model settings in the Kilo Code UI, or the limit.output key in custom model configuration.
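The trade-off can be sketched as simple arithmetic, assuming (as described above) that the output allowance is reserved out of the same context window as the input. `history_budget` and the token figures are illustrative, not values taken from Kilo.

```python
def history_budget(context_window: int, max_output_tokens: int, fixed_prompt_tokens: int) -> int:
    """Tokens left for conversation history after reserving output and fixed prompt.

    `fixed_prompt_tokens` stands for the system prompt, instructions, and
    tool definitions that are sent with every request.
    """
    return max(0, context_window - max_output_tokens - fixed_prompt_tokens)
```

In a 200,000-token window with 10,000 tokens of fixed prompt, a 16k output limit leaves 174,000 tokens for history, while a 64k limit leaves 126,000.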
Use project instructions efficiently
Encode recurring guidance — coding standards, project conventions, preferred libraries — in your AGENTS.md or custom instructions once. This avoids repeating the same context in every prompt, and prompt caching means stable instructions are served from cache at a discounted rate on supported providers.
Disable unused MCP servers
MCP tool definitions are included in the system prompt sent with every request. If you are not using MCP features, disable MCP servers in Settings → Agent Behaviour → MCP Servers. This can meaningfully reduce per-request system prompt size.
See MCP Overview for details.
Prompt caching
Kilo automatically applies prompt caching on supported providers. Repeated context — your system prompt, stable file contents, and tool definitions — is reused from cache at a discounted rate. No configuration is required to benefit from this.
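The saving depends on how much of each request is a cache hit and on the provider's cached-token price, neither of which is specified on this page. The sketch below shows the arithmetic with both as explicit, assumed inputs; `input_cost_with_cache` is a hypothetical helper.

```python
def input_cost_with_cache(
    input_tokens: int,
    cached_tokens: int,
    price_per_mtok: float,
    cached_price_multiplier: float,
) -> float:
    """Input cost in USD when `cached_tokens` of the input are billed at a discount.

    `cached_price_multiplier` is the fraction of the normal input price charged
    for cached tokens (for example 0.1); the real value is provider-specific.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached_tokens = input_tokens - cached_tokens
    billable = uncached_tokens + cached_tokens * cached_price_multiplier
    return billable * price_per_mtok / 1_000_000
```

With an assumed $3 per million input tokens and a 0.1 multiplier, a 100,000-token request with 80,000 cached tokens costs about $0.084 in input instead of $0.30.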
Choosing Models for Specific Tasks
Different tasks benefit from different model characteristics. Routing work to the right model reduces cost without sacrificing quality. Kilo has auto-models that can help you control costs; more information is available in Auto Model.
Practical examples by task type
| Task type | Suggested approach |
|---|---|
| Quick questions, syntax lookups, simple formatting | kilo-auto/efficient or a lightweight model |
| Routine edits, test generation, straightforward refactors | kilo-auto/efficient or a mid-tier model |
| Complex debugging, tracing unexpected behavior | kilo-auto/frontier or a strong reasoning model; Debug mode |
| Architecture planning, design decisions | kilo-auto/frontier; Architect mode |
| Repository-wide analysis or search | A model with a large context window (256K+); Architect mode |
| Code review and summarization | kilo-auto/efficient or a cost-effective model |
| Automated background tasks (CI, scripting) | kilo-auto/efficient or kilo-auto/free |
Manually selecting a model
Use the model selector dropdown in the Kilo Code chat interface to switch models for the current session. In the CLI, pass the --model flag to kilo run or use the model picker in the TUI (Ctrl+X m or /models).
Configuring a model per agent or mode
You can set a default model for each agent (Code, Architect, Debug, Plan, or a custom subagent) independently:
- VS Code: Settings → Models → Model per Mode, or edit kilo.jsonc directly.
- CLI: Set agent.<name>.model in kilo.jsonc.
{
  "agent": {
    "code": {
      "model": "kilo-auto/efficient"
    },
    "architect": {
      "model": "kilo-auto/frontier"
    }
  }
}
This lets you run cost-effective models for implementation while automatically routing planning tasks to a more capable model.
Organization-level model restrictions
Enterprise organizations can restrict which models team members may use. See Enterprise Cost Controls below.
For full guidance on model selection, see the Model Selection Guide.
Enterprise Cost Controls
The following controls are available to organizations and, where noted, are exclusive to Enterprise plans.
Model access controls (Enterprise only)
Enterprise organization owners can block specific models or entire providers for all team members using the Providers & Models page in the dashboard. The system uses a blocklist approach — everything is allowed by default, and admins explicitly block what should not be accessible.
Blocking a provider blocks all current and future models from that provider. Filters are available for:
- Data policy (trains on prompts, retains prompts)
- Provider location / datacenter region
- Specific model ID
Only Owners can modify model access controls. Individual members cannot override organization-level restrictions.
Where to configure: Dashboard → Providers and Models (Enterprise only). See Model Access Controls.
How it helps: Prevents accidental use of high-cost frontier models; enforces data residency or compliance requirements; limits cost surface by allowing only approved models.
Shared credit pool and auto top-up
All organization members draw from a single shared credit balance. Administrators can configure:
- Auto top-up: Automatically replenish credits when the balance drops below a threshold (minimum $50 balance, minimum $100 purchase)
- Minimum balance alerts: Email notifications when the balance drops below a configured amount
Where to configure: Dashboard → Billing.
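The auto top-up rule can be read as a simple threshold check. The sketch below assumes that "minimum $50 balance, minimum $100 purchase" means the trigger threshold must be at least $50 and each purchase at least $100, and that a top-up fires only when the balance is strictly below the threshold. Both are interpretations of this page, not confirmed behavior, and `top_up_amount` is a hypothetical name.

```python
MIN_THRESHOLD_USD = 50.0   # assumed lowest allowed trigger threshold
MIN_PURCHASE_USD = 100.0   # assumed lowest allowed top-up purchase


def top_up_amount(
    balance: float,
    threshold: float = MIN_THRESHOLD_USD,
    purchase: float = MIN_PURCHASE_USD,
) -> float:
    """Return the amount to purchase now, or 0.0 if no top-up is due."""
    if threshold < MIN_THRESHOLD_USD or purchase < MIN_PURCHASE_USD:
        raise ValueError("threshold or purchase is below the assumed minimum")
    return purchase if balance < threshold else 0.0
```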
Usage analytics
The Usage tab of the organization dashboard provides:
- Total spend, request count, average cost per request, total tokens, and active users for any selected time period (past week, month, year, or all time)
- Usage broken down by day, by model and day, or by project
- Per-user attribution — individual usage statistics visible to Owners and Admins
This gives administrators visibility into which team members, models, and projects are driving the majority of spend.
Where to access: app.kilo.ai → Usage tab.
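If you export or copy usage data for your own reporting, the same headline figures are straightforward to recompute. The sketch below assumes a list of records with `user`, `model`, and `cost` keys; that shape is hypothetical and does not describe a Kilo export format.

```python
from collections import defaultdict


def summarize_usage(records):
    """Aggregate total spend, request count, average cost, and spend per model and user."""
    spend_by_model = defaultdict(float)
    spend_by_user = defaultdict(float)
    total = 0.0
    count = 0
    for record in records:
        cost = float(record["cost"])
        total += cost
        count += 1
        spend_by_model[record["model"]] += cost
        spend_by_user[record["user"]] += cost
    return {
        "total_spend": total,
        "request_count": count,
        "avg_cost_per_request": total / count if count else 0.0,
        "spend_by_model": dict(spend_by_model),
        "spend_by_user": dict(spend_by_user),
    }
```

Sorting `spend_by_model` or `spend_by_user` by value then shows which models and members account for most of the spend.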
Administrative permissions
Dashboard administrative actions (model restrictions, spending limits, billing management) are gated by role. Only Owners can modify model access controls and organization-level settings. Owners and Admins can view per-user usage data.
Recommended Configurations
Cost-conscious individual developer
- Use kilo-auto/efficient as the default model
- Switch to kilo-auto/free for low-stakes questions and exploration
- Enable auto-compaction (on by default); set compaction.threshold_percent: 80 to compact earlier
- Set Code agent max output tokens to 16k or below
- Keep doom_loop permission at ask
- Start a new session whenever you switch to an unrelated task
- Use @file mentions with specific paths instead of @folder for whole directories
- Disable any MCP servers you are not actively using
Developer working on a large repository
- Use Architect mode for initial codebase exploration — it cannot modify code, keeping exploration cost lower
- Use @file mentions with specific paths instead of attaching whole directories
- Add generated and build directories to permission deny rules (coverage/**, .next/**, etc.)
- Configure a cheap model for compaction (anthropic/claude-haiku-4-5 or equivalent)
- Consider using a model with a large context window (256K+) for cross-file analysis tasks
- Break large tasks into focused sub-tasks rather than asking for a single comprehensive change
Team using multiple models
- Assign kilo-auto/efficient to Code and Debug agents for everyday work
- Assign kilo-auto/frontier to Architect (or Plan) agent for planning tasks
- Set a low-cost compaction model shared by all agents (the example below uses anthropic/claude-haiku-4-5; kilo-auto/efficient is an alternative)
- If on an Enterprise plan, use Providers & Models to block high-cost models that are not needed for your team's typical work
{
  "agent": {
    "code": { "model": "kilo-auto/efficient" },
    "debug": { "model": "kilo-auto/efficient" },
    "architect": { "model": "kilo-auto/frontier" },
    "compaction": { "model": "anthropic/claude-haiku-4-5" }
  }
}
Maximum-constraint starter configuration
The snippet below is a ready-to-copy kilo.jsonc that turns on every available cost-control knob at its most restrictive setting. Drop it into your project root (or your global ~/.config/kilo/kilo.jsonc) and adjust individual values upward as you get comfortable with how each one behaves.
This configuration uses only the kilo-auto/efficient and kilo-auto/free tiers.
{
  "$schema": "https://app.kilo.ai/config.json",
  // ── Model selection ──────────────────────────────────────────────────────
  // Route all requests through the two lowest-cost Kilo Auto tiers.
  // kilo-auto/efficient: lowest-cost paid tier (classifies each request by
  // difficulty and routes to the cheapest benchmark-proven model).
  "model": "kilo-auto/efficient",
  "subagent_model": "kilo-auto/efficient", // default model for Task-tool subagents
  // ── Per-agent model and step limits ─────────────────────────────────────
  // Assign the cheapest suitable tier to each agent and cap how many
  // agentic iterations it may take before it must produce a text-only reply.
  // Raise `steps` for agents that need more room; lower it to tighten cost.
  "agent": {
    "code": {
      "model": "kilo-auto/efficient",
      "steps": 20 // hard cap on agentic iterations per turn
    },
    "plan": {
      "model": "kilo-auto/efficient",
      "steps": 10
    },
    "debug": {
      "model": "kilo-auto/efficient",
      "steps": 20
    },
    "ask": {
      "model": "kilo-auto/efficient",
      "steps": 5
    },
    "orchestrator": {
      "model": "kilo-auto/efficient",
      "steps": 10
    },
    "explore": {
      "model": "kilo-auto/free",
      "steps": 15
    },
    "general": {
      "model": "kilo-auto/efficient",
      "steps": 15
    },
    // Dedicated agents for background summarization
    "compaction": { "model": "kilo-auto/free" },
    "title": { "model": "kilo-auto/free" },
    "summary": { "model": "kilo-auto/free" }
  },
  // ── Compaction (context management) ─────────────────────────────────────
  // Auto-compact aggressively to keep conversation history short and cheap.
  "compaction": {
    "auto": true, // enable automatic compaction (default: true)
    "threshold_percent": 50, // compact when context reaches 50% full (default: ~80%)
    "prune": true, // prune old tool outputs to recover context space
    "tail_turns": 1, // keep only 1 recent user-turn verbatim after compaction
    "preserve_recent_tokens": 2000, // cap on tokens preserved verbatim from recent turns
    "reserved": 8000 // token buffer reserved so compaction itself doesn't overflow
  },
  // ── Tool output truncation ───────────────────────────────────────────────
  // Clip large tool responses early so they don't bloat the context window.
  // Increase these if the agent needs more output (e.g. long test logs).
  "tool_output": {
    "max_lines": 500, // default: 2000 lines
    "max_bytes": 10240 // default: 51200 bytes (~50 KB)
  },
  // ── Permission safeguards ────────────────────────────────────────────────
  // "ask" means Kilo pauses and requires your approval before executing.
  // This prevents runaway loops from autonomously consuming tokens or making
  // irreversible changes. Flip individual entries to "allow" once you trust them.
  "permission": {
    "bash": "ask", // shell commands (highest risk of runaway cost)
    "edit": "ask", // file writes and edits
    "task": "ask", // launching sub-agents (each sub-agent = extra LLM requests)
    "webfetch": "ask", // outbound HTTP fetches
    "websearch": "ask", // web search calls
    "doom_loop": "ask" // repeated-failure loop detection; always keep at "ask" or "deny"
  }
}
The model, compaction, and permission settings in this block are covered in the sections above; the remaining fields (such as steps and tool_output) are described only by the inline comments. Use it as a starting point, then relax individual settings (for example, setting permission.edit to "allow" for a trusted project, or raising compaction.threshold_percent to 70 if compaction feels too aggressive) as you build confidence in how the agent behaves.
Troubleshooting Unexpected Usage
If your spend is higher than expected:
- Check your usage dashboard at app.kilo.ai/usage for a breakdown by day, model, and project
- Review the model in use — an accidental switch to a frontier model for routine tasks can significantly raise costs
- Look for long sessions: sessions that were never compacted carry their full history as input tokens on every request; use /compact to reset them
- Check MCP server configuration: unused MCP servers add tool definitions to every system prompt
- Review permission settings: auto-approving all actions with no doom_loop guard removes the friction that normally slows down runaway loops
For further reading: 4 Levers to Take Control of Your AI Spend
Related
- Cost Efficiency & Model Selection — Auto Model tier comparison, rate limits, and per-request cost calculation
- Auto Model — Full details on each Auto Model tier and routing strategy
- Context Condensing — How compaction works and all configuration options
- Auto-Approving Actions — Permission system reference for VS Code and CLI
- Model Access Controls — Enterprise model and provider blocklist configuration
- Usage & Billing — Gateway billing mechanics and organization controls