Context Condensing

Overview

When working on complex tasks, conversations with Kilo Code can grow long and consume a significant portion of the AI model's context window. Context Condensing is a feature that intelligently summarizes your conversation history, reducing token usage while preserving the essential information needed to continue your work effectively.

The Problem: Context Window Limits

Every AI model has a maximum context window — a limit on how much text it can process at once. As your conversation grows with code snippets, file contents, and back-and-forth discussions, you may approach this limit. When this happens, you might experience:

  • Slower responses as the model processes more tokens
  • Higher API costs due to increased token usage
  • Eventually hitting the context limit and being unable to continue

The Solution: Auto-Compaction

Kilo Code uses a Compaction system to manage context automatically. When your conversation approaches the token limit, compaction kicks in and produces an anchored summary that captures:

  • The overall goal of the session
  • Constraints and preferences you gave along the way
  • Progress, key decisions, and next steps
  • Critical context needed to continue
  • Relevant files and directories

This summary replaces older conversation history while Kilo keeps the most recent turns verbatim when they fit. If a session has already been compacted, Kilo updates the previous summary instead of starting over, preserving still-relevant details and removing stale ones.
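In message terms, "the summary replaces older history while recent turns stay verbatim" can be pictured roughly like this. This is an illustrative sketch only; the message shape, the role used for the summary, and the tail-selection logic are assumptions, not Kilo's actual implementation:

```typescript
interface Message {
  role: "user" | "assistant" | "tool";
  text: string;
}

// Replace older history with the summary, keeping the last `tailTurns`
// user turns (and everything after them) verbatim.
function compacted(history: Message[], summary: string, tailTurns = 2): Message[] {
  // Find the index of the Nth-from-last user message.
  let seen = 0;
  let cut = 0;
  for (let i = history.length - 1; i >= 0; i--) {
    if (history[i].role === "user" && ++seen === tailTurns) {
      cut = i;
      break;
    }
  }
  return [
    { role: "assistant", text: `Summary of earlier conversation:\n${summary}` },
    ...history.slice(cut),
  ];
}
```

On a later compaction, the summary message itself would be rewritten in place rather than stacked, which matches the "updates the previous summary instead of starting over" behavior described above.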

How Compaction Triggers

Automatic trigger

Kilo tracks the total token count for the session — input, output, and cached reads and writes — and compares it to the model's context window. Compaction runs when the total fills the window minus a reserved buffer of headroom kept free for the next turn.

How the buffer is chosen depends on what the model declares. When the model advertises a separate input limit, the buffer defaults to 20,000 tokens (or the model's maximum output size, whichever is smaller). When the model only declares a single context window, Kilo instead reserves the model's full output cap — up to 32,000 tokens.

Custom models that do not declare a context window are not tracked, and auto-compaction does not run for them.
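Put together, the trigger rules above can be sketched like this. The interface and function names are hypothetical, chosen only to mirror the prose:

```typescript
// Hypothetical model metadata; Kilo's real field names may differ.
interface ModelLimits {
  contextWindow?: number;   // total window (undefined for untracked custom models)
  inputLimit?: number;      // separate input limit, if the model declares one
  maxOutputTokens: number;  // the model's output cap
}

// Reserved headroom: 20,000 tokens (or the output cap, if smaller) when a
// separate input limit exists; otherwise the full output cap, up to 32,000.
function reservedBuffer(m: ModelLimits): number {
  if (m.inputLimit !== undefined) {
    return Math.min(20_000, m.maxOutputTokens);
  }
  return Math.min(m.maxOutputTokens, 32_000);
}

// Compaction runs once total session tokens (input + output + cache
// reads/writes) fill the window minus the reserved buffer.
function shouldCompact(totalTokens: number, m: ModelLimits): boolean {
  if (m.contextWindow === undefined) return false; // untracked custom model
  return totalTokens >= m.contextWindow - reservedBuffer(m);
}
```

For example, a model with a 200K window, a declared input limit, and a 64K output cap would reserve 20K and trigger compaction at 180K total tokens.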

Context Pruning

Between turns, Kilo also runs a lighter prune pass. It walks completed tool outputs that fall outside a 40,000-token recency window and replaces their content with "[Old tool result content cleared]". Pruning runs incrementally, so large tool outputs don't consume space indefinitely, even before full compaction is needed.
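The prune pass can be pictured like this. The data shape and boundary handling are illustrative assumptions, not Kilo's actual code:

```typescript
interface ToolResult {
  content: string;
  tokens: number;
  completed: boolean; // only finished tool outputs are eligible for pruning
}

const RECENCY_WINDOW = 40_000; // tokens of recent history left untouched
const CLEARED = "[Old tool result content cleared]";

// Walk tool results from newest to oldest, accumulating tokens; once the
// recency window is exhausted, clear completed results in place.
function prune(results: ToolResult[]): void {
  let recent = 0;
  for (let i = results.length - 1; i >= 0; i--) {
    recent += results[i].tokens;
    if (recent > RECENCY_WINDOW && results[i].completed) {
      results[i].content = CLEARED;
      results[i].tokens = 0; // assume cleared output no longer counts
    }
  }
}
```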

Manual Compaction

You can trigger compaction at any time:

  • Slash command: type `/compact` in chat (also findable by typing `smol` or `condense`)
  • Task header button: click the compact icon in the active task header
  • Settings: toggle auto-compaction in Settings → Context

Defaults

| Setting | Default | Effect |
|---|---|---|
| `compaction.auto` | `true` | Automatically compact when the usable window is reached |
| `compaction.prune` | `true` | Clear old tool outputs beyond the 40K recency window |
| `compaction.tail_turns` | `2` | Keep the most recent user turns and their responses verbatim when possible |
| `compaction.preserve_recent_tokens` | 25% of usable context, clamped between 2,000 and 8,000 tokens | Token budget for the verbatim recent tail |
| `compaction.reserved` | `min(20000, model_max_output_tokens)` | Token headroom kept free for the next turn; also defines the compaction trigger point |
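The `preserve_recent_tokens` default above is a simple clamp, which can be written out as follows (an illustrative sketch of the stated formula, not Kilo's code):

```typescript
// Default budget for the verbatim tail: 25% of the usable context,
// clamped to the range [2,000, 8,000] tokens.
function defaultPreserveRecentTokens(usableContext: number): number {
  return Math.min(8_000, Math.max(2_000, Math.floor(usableContext * 0.25)));
}
```

So any usable context of 32K or more hits the 8,000-token ceiling, while very small contexts still keep at least 2,000 tokens of recent turns.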

Configuration

Compaction is configured in your `kilo.jsonc` file:

```jsonc
{
  "compaction": {
    "auto": true, // Enable or disable automatic compaction
    "prune": true, // Enable pruning of old tool outputs beyond the recency window
    "tail_turns": 2, // Recent user turns to keep verbatim during compaction
    "preserve_recent_tokens": 8000, // Maximum token budget for the recent tail
    "reserved": 20000, // Token buffer kept free; smaller = later trigger, larger = earlier trigger
  },
}
```
| Option | Type | Default | Description |
|---|---|---|---|
| `compaction.auto` | boolean | `true` | Enable or disable automatic compaction when the usable window is reached |
| `compaction.prune` | boolean | `true` | Enable pruning of old tool outputs outside the 40K-token recency window |
| `compaction.tail_turns` | number | `2` | Number of recent user turns, including following assistant and tool responses, to keep verbatim during compaction |
| `compaction.preserve_recent_tokens` | number | 25% of usable context, clamped between 2,000 and 8,000 tokens | Maximum token budget for recent turns kept verbatim after compaction |
| `compaction.reserved` | number | `min(20000, model_max_output)` | Token headroom reserved for the next turn. Applies only to models that advertise a separate input limit; models with a single context window use their full output cap as the reserve instead. |

Use a different model for compaction

Summarization can use a cheaper or larger-context model than your main agent. Configure a dedicated compaction agent:

```jsonc
{
  "agent": {
    "compaction": {
      "model": "anthropic/claude-haiku-4-5",
    },
  },
}
```

If no compaction agent is set, the current session's model is used.

Environment overrides

| Variable | Effect |
|---|---|
| `KILO_DISABLE_AUTOCOMPACT=1` | Forces `compaction.auto = false` |
| `KILO_DISABLE_PRUNE=1` | Forces `compaction.prune = false` |
| `KILO_EXPERIMENTAL_OUTPUT_TOKEN_MAX` | Overrides the 32,000 default output-token ceiling |

Best Practices

When to Compact

  • Long sessions: If you've been working for an extended period on a complex task
  • Before major transitions: When switching to a different aspect of your project
  • When approaching limits: Run /compact manually before hitting the automatic trigger if you want control over when the summary is produced

Tuning compaction.reserved

On models that advertise a separate input limit, the reserved value is a trade-off:

  • Lower value (e.g. 10000) → compaction triggers later, you get more turns out of the raw window, but you risk a mid-turn context overflow if a single response is larger than the buffer.
  • Higher value (e.g. 40000) → compaction triggers earlier, fewer overflow errors, but shorter effective conversations between summaries.

The default of ~20K is tuned to leave room for a full-size assistant response plus tool output. The setting has no effect on models with a single context window, which always reserve their full output cap instead.
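As a worked example of the trade-off, assume a model with a 200,000-token context window that advertises a separate input limit; the trigger point is simply window minus reserved:

```typescript
const contextWindow = 200_000; // assumed model window for illustration

// Compaction triggers once total session tokens reach window - reserved.
const triggerAt = (reserved: number) => contextWindow - reserved;

triggerAt(10_000); // 190,000 tokens: later trigger, more raw turns, overflow risk
triggerAt(20_000); // 180,000 tokens: the default trade-off
triggerAt(40_000); // 160,000 tokens: earlier trigger, safer headroom
```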

Maintaining Context Quality

  • Be specific in your initial task: A clear task description helps create better summaries
  • Use AGENTS.md: Combine with AGENTS.md for persistent project context that doesn't need to be compacted
  • Review the summary: After compaction, the summary is visible in your chat history