Context Condensing
Overview
When working on complex tasks, conversations with Kilo Code can grow long and consume a significant portion of the AI model's context window. Context Condensing is a feature that intelligently summarizes your conversation history, reducing token usage while preserving the essential information needed to continue your work effectively.
The Problem: Context Window Limits
Every AI model has a maximum context window — a limit on how much text it can process at once. As your conversation grows with code snippets, file contents, and back-and-forth discussions, you may approach this limit. When this happens, you might experience:
- Slower responses as the model processes more tokens
- Higher API costs due to increased token usage
- Eventually hitting the context limit and being unable to continue
The Solution: Auto-Compaction
Kilo Code uses a Compaction system to manage context automatically. When your conversation approaches the token limit, compaction kicks in and produces a structured summary that captures:
- The overall goal of the session
- Instructions given along the way
- Key discoveries made
- What has been accomplished so far
- Relevant files and directories
This summary replaces the earlier conversation history, freeing up context window space while maintaining continuity in your work.
How Compaction Triggers
Automatic trigger
Kilo tracks the total token count for the session — input, output, and cached reads and writes — and compares it to the model's context window. Compaction runs when the total reaches the window size minus a reserved buffer of headroom kept free for the next turn.
How the buffer is chosen depends on what the model declares. When the model advertises a separate input limit, the buffer defaults to 20,000 tokens (or the model's maximum output size, whichever is smaller). When the model only declares a single context window, Kilo instead reserves the model's full output cap — up to 32,000 tokens.
Custom models that do not declare a context window are not tracked, and auto-compaction does not run for them.
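The trigger logic above can be sketched in TypeScript. This is an illustrative model only; the `ModelInfo` shape and the `reservedBuffer` and `shouldCompact` names are hypothetical, not Kilo's actual API:

```typescript
// Illustrative sketch of the auto-compaction trigger; the types and
// function names here are hypothetical, not Kilo Code's internals.
interface ModelInfo {
  contextWindow: number;          // total context window, in tokens
  maxOutputTokens: number;        // the model's declared output cap
  hasSeparateInputLimit: boolean; // does the model advertise a separate input limit?
}

function reservedBuffer(model: ModelInfo): number {
  if (model.hasSeparateInputLimit) {
    // Separate input limit: default 20,000 tokens, or the output cap if smaller.
    return Math.min(20_000, model.maxOutputTokens);
  }
  // Single context window: reserve the full output cap, up to 32,000 tokens.
  return Math.min(model.maxOutputTokens, 32_000);
}

function shouldCompact(totalSessionTokens: number, model: ModelInfo): boolean {
  // Compact once the session fills the window minus the reserved headroom.
  return totalSessionTokens >= model.contextWindow - reservedBuffer(model);
}
```

For example, a model with a 200,000-token window and an 8,192-token output cap would compact once the session passes 191,808 tokens (200,000 − 8,192).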
Context Pruning
Between turns, Kilo also runs a lighter prune pass: completed tool outputs that fall outside a 40,000-token recency window are replaced with "[Old tool result content cleared]". Pruning runs incrementally, so large tool outputs don't consume space forever, even before full compaction is needed.
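A minimal sketch of such a prune pass, assuming a simplified `Message` shape (Kilo's real message model is richer than this):

```typescript
// Simplified prune pass: walk the history newest-first, keep a running
// token total, and clear tool outputs that fall outside the recency window.
interface Message {
  role: "user" | "assistant" | "tool";
  text: string;
  tokens: number;
}

const RECENCY_WINDOW = 40_000; // tokens of recent history left untouched

function pruneOldToolResults(history: Message[]): Message[] {
  let recentTokens = 0;
  const newestFirst = [...history].reverse().map((msg) => {
    recentTokens += msg.tokens;
    if (msg.role === "tool" && recentTokens > RECENCY_WINDOW) {
      // The placeholder keeps the turn structure intact while freeing tokens.
      return { ...msg, text: "[Old tool result content cleared]", tokens: 0 };
    }
    return msg;
  });
  return newestFirst.reverse();
}
```

Because only tool messages are touched, user instructions and assistant reasoning survive pruning; only bulky, already-consumed tool output is dropped.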
Manual Compaction
You can trigger compaction at any time:
- Slash command: type `/compact` in chat (also findable by typing `smol` or `condense`)
- Task header button: click the compact icon in the active task header
- Settings: toggle auto-compaction in Settings → Context
Defaults
| Setting | Default | Effect |
|---|---|---|
| `compaction.auto` | `true` | Automatically compact when the usable window is reached |
| `compaction.prune` | `true` | Clear old tool outputs beyond the 40K recency window |
| `compaction.reserved` | `min(20000, model_max_output_tokens)` | Token headroom kept free for the next turn; also defines the compaction trigger point |
Configuration
Compaction is configured in your `kilo.jsonc` file:

```jsonc
{
  "compaction": {
    "auto": true,     // Enable or disable automatic compaction
    "prune": true,    // Enable pruning of old tool outputs beyond the recency window
    "reserved": 20000 // Token buffer kept free; smaller = later trigger, larger = earlier trigger
  }
}
```
| Option | Type | Default | Description |
|---|---|---|---|
| `compaction.auto` | boolean | `true` | Enable or disable automatic compaction when the usable window is reached |
| `compaction.prune` | boolean | `true` | Enable pruning of old tool outputs outside the 40K-token recency window |
| `compaction.reserved` | number | `min(20000, model_max_output)` | Token headroom reserved for the next turn. Applies only to models that advertise a separate input limit; models with a single context window use their full output cap as the reserve instead. |
Use a different model for compaction
Summarization can use a cheaper or larger-context model than your main agent. Configure a dedicated compaction agent:
```jsonc
{
  "agent": {
    "compaction": {
      "model": "anthropic/claude-haiku-4-5"
    }
  }
}
```
If no compaction agent is set, the current session's model is used.
Environment overrides
| Variable | Effect |
|---|---|
| `KILO_DISABLE_AUTOCOMPACT=1` | Forces `compaction.auto = false` |
| `KILO_DISABLE_PRUNE=1` | Forces `compaction.prune = false` |
| `KILO_EXPERIMENTAL_OUTPUT_TOKEN_MAX` | Overrides the 32,000 default output-token ceiling |
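A sketch of how these overrides might be applied at startup. Only the variable names come from the table above; the `applyEnvOverrides` helper and `CompactionConfig` shape are hypothetical:

```typescript
// Hypothetical startup hook: environment variables win over kilo.jsonc.
declare const process: { env: Record<string, string | undefined> };

interface CompactionConfig {
  auto: boolean;
  prune: boolean;
  outputTokenMax: number; // defaults to the 32,000-token ceiling
}

function applyEnvOverrides(cfg: CompactionConfig): CompactionConfig {
  const out = { ...cfg };
  if (process.env.KILO_DISABLE_AUTOCOMPACT === "1") out.auto = false;
  if (process.env.KILO_DISABLE_PRUNE === "1") out.prune = false;
  const ceiling = Number(process.env.KILO_EXPERIMENTAL_OUTPUT_TOKEN_MAX);
  if (Number.isFinite(ceiling) && ceiling > 0) out.outputTokenMax = ceiling;
  return out;
}
```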
Best Practices
When to Compact
- Long sessions: If you've been working for an extended period on a complex task
- Before major transitions: When switching to a different aspect of your project
- When approaching limits: Run `/compact` manually before hitting the automatic trigger if you want control over when the summary is produced
Tuning compaction.reserved
On models that advertise a separate input limit, the reserved value is a trade-off:
- Lower value (e.g. `10000`) → compaction triggers later, you get more turns out of the raw window, but you risk a mid-turn context overflow if a single response is larger than the buffer.
- Higher value (e.g. `40000`) → compaction triggers earlier, fewer overflow errors, but shorter effective conversations between summaries.
The default of ~20K is tuned to leave room for a full-size assistant response plus tool output. The setting has no effect on models with a single context window, which always reserve their full output cap instead.
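To make the trade-off concrete, here are the trigger points for a hypothetical model with a 200,000-token context window and a separate input limit (the window size is an illustrative figure, not a Kilo default):

```typescript
// Trigger point = context window minus the reserved buffer.
// The 200,000-token window below is an illustrative figure.
const CONTEXT_WINDOW = 200_000;

function triggerPoint(reserved: number): number {
  return CONTEXT_WINDOW - reserved;
}

triggerPoint(10_000); // 190,000 tokens: later trigger, more raw-window turns
triggerPoint(20_000); // 180,000 tokens: roughly the default trade-off
triggerPoint(40_000); // 160,000 tokens: earlier trigger, fewer overflow errors
```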
Maintaining Context Quality
- Be specific in your initial task: A clear task description helps create better summaries
- Use AGENTS.md: Combine with AGENTS.md for persistent project context that doesn't need to be compacted
- Review the summary: After compaction, the summary is visible in your chat history
Related Features
- AGENTS.md - Persistent context storage across sessions
- Large Projects - Managing context for large codebases
- Codebase Indexing - Efficient code search and retrieval