Context Condensing

Overview

When working on complex tasks, conversations with Kilo Code can grow long and consume a significant portion of the AI model's context window. Context Condensing is a feature that intelligently summarizes your conversation history, reducing token usage while preserving the essential information needed to continue your work effectively.

The Problem: Context Window Limits

Every AI model has a maximum context window — a limit on how much text it can process at once. As your conversation grows with code snippets, file contents, and back-and-forth discussions, you may approach this limit. When this happens, you might experience:

  • Slower responses as the model processes more tokens
  • Higher API costs due to increased token usage
  • Eventually hitting the context limit and being unable to continue

The Solution: Auto-Compaction

Kilo Code uses a Compaction system to manage context automatically. When your conversation approaches the token limit, compaction kicks in and produces a structured summary that captures:

  • The overall goal of the session
  • Instructions given along the way
  • Key discoveries made
  • What has been accomplished so far
  • Relevant files and directories

This summary replaces the earlier conversation history, freeing up context window space while maintaining continuity in your work.
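As a sketch, the summary can be thought of as a small structured record that stands in for the condensed history. The field names below are illustrative assumptions, not Kilo Code's internal format:

```typescript
// Illustrative shape of a compaction summary (field names are assumptions,
// not Kilo Code internals).
interface CompactionSummary {
  goal: string;            // the overall goal of the session
  instructions: string[];  // instructions given along the way
  discoveries: string[];   // key discoveries made
  accomplished: string[];  // what has been accomplished so far
  relevantPaths: string[]; // relevant files and directories
}

// Rendered to text, the record replaces the earlier conversation history.
function renderSummary(s: CompactionSummary): string {
  return [
    `Goal: ${s.goal}`,
    `Instructions: ${s.instructions.join("; ")}`,
    `Discoveries: ${s.discoveries.join("; ")}`,
    `Accomplished: ${s.accomplished.join("; ")}`,
    `Relevant paths: ${s.relevantPaths.join(", ")}`,
  ].join("\n");
}
```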

How Compaction Triggers

Automatic trigger

Kilo tracks the total token count for the session — input, output, and cached reads and writes — and compares it to the model's context window. Compaction runs once the total reaches the context window minus a reserved buffer: headroom kept free for the next turn.

How the buffer is chosen depends on what the model declares. When the model advertises a separate input limit, the buffer defaults to 20,000 tokens (or the model's maximum output size, whichever is smaller). When the model only declares a single context window, Kilo instead reserves the model's full output cap — up to 32,000 tokens.

Custom models that do not declare a context window are not tracked, and auto-compaction does not run for them.
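The trigger logic above can be sketched as follows. This is a simplified model; the field names on ModelInfo are illustrative assumptions, not Kilo Code internals:

```typescript
// Sketch of the auto-compaction trigger described above.
interface ModelInfo {
  contextWindow?: number;  // single declared context window, if any
  inputLimit?: number;     // separate input limit, if the model declares one
  maxOutputTokens: number; // the model's output cap
}

const DEFAULT_RESERVE = 20_000;
const OUTPUT_CAP_CEILING = 32_000;

function reservedBuffer(model: ModelInfo): number {
  if (model.inputLimit !== undefined) {
    // Separate input limit: default 20K, or the output cap if smaller.
    return Math.min(DEFAULT_RESERVE, model.maxOutputTokens);
  }
  // Single context window: reserve the full output cap, up to 32K.
  return Math.min(model.maxOutputTokens, OUTPUT_CAP_CEILING);
}

function shouldCompact(totalTokens: number, model: ModelInfo): boolean {
  const window = model.inputLimit ?? model.contextWindow;
  if (window === undefined) return false; // untracked custom model
  return totalTokens >= window - reservedBuffer(model);
}
```

For example, a model with a 200,000-token input limit and a 32,000-token output cap would compact once the session reaches 180,000 tokens.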

Context Pruning

Between turns, Kilo also runs a lighter prune pass. It walks completed tool calls and replaces outputs that fall outside a 40,000-token recency window with "[Old tool result content cleared]". Pruning runs incrementally so large tool outputs don't consume space forever, even before full compaction is needed.
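The prune pass can be sketched like this. The message shape and token accounting are illustrative assumptions, not Kilo Code internals:

```typescript
// Sketch of the prune pass described above: tool results outside the
// most recent 40K tokens are replaced with a cheap stub.
const RECENCY_WINDOW = 40_000;
const CLEARED = "[Old tool result content cleared]";

interface Message {
  role: "user" | "assistant" | "tool";
  content: string;
  tokens: number;
}

function prune(messages: Message[]): Message[] {
  let recentTokens = 0;
  // Walk newest-to-oldest; once the running count passes the recency
  // window, older tool results get stubbed out.
  const walked = [...messages].reverse().map((m) => {
    recentTokens += m.tokens;
    if (m.role === "tool" && recentTokens > RECENCY_WINDOW) {
      return { ...m, content: CLEARED, tokens: 8 };
    }
    return m;
  });
  return walked.reverse();
}
```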

Manual Compaction

You can trigger compaction yourself, or control the automatic behavior:

  • Slash command: type /compact in chat (also findable by typing smol or condense)
  • Task header button: click the compact icon in the active task header
  • Settings: toggle auto-compaction in Settings → Context

Defaults

| Setting | Default | Effect |
| --- | --- | --- |
| compaction.auto | true | Automatically compact when the usable window is reached |
| compaction.prune | true | Clear old tool outputs beyond the 40K recency window |
| compaction.reserved | min(20000, model_max_output_tokens) | Token headroom kept free for the next turn; also defines the compaction trigger point |

Configuration

Compaction is configured in your kilo.jsonc file:

```jsonc
{
  "compaction": {
    "auto": true,      // Enable or disable automatic compaction
    "prune": true,     // Enable pruning of old tool outputs beyond the recency window
    "reserved": 20000, // Token buffer kept free; smaller = later trigger, larger = earlier trigger
  },
}
```
| Option | Type | Default | Description |
| --- | --- | --- | --- |
| compaction.auto | boolean | true | Enable or disable automatic compaction when the usable window is reached |
| compaction.prune | boolean | true | Enable pruning of old tool outputs outside the 40K token recency window |
| compaction.reserved | number | min(20000, model_max_output) | Token headroom reserved for the next turn. Applies only to models that advertise a separate input limit; models with a single context window use their full output cap as the reserve instead. |

Use a different model for compaction

Summarization can use a cheaper or larger-context model than your main agent. Configure a dedicated compaction agent:

```jsonc
{
  "agent": {
    "compaction": {
      "model": "anthropic/claude-haiku-4-5",
    },
  },
}
```

If no compaction agent is set, the current session's model is used.

Environment overrides

| Variable | Effect |
| --- | --- |
| KILO_DISABLE_AUTOCOMPACT=1 | Forces compaction.auto = false |
| KILO_DISABLE_PRUNE=1 | Forces compaction.prune = false |
| KILO_EXPERIMENTAL_OUTPUT_TOKEN_MAX | Overrides the 32,000 default output-token ceiling |
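For example, to turn off auto-compaction and pruning for everything launched from the current shell session:

```shell
# Disable auto-compaction and pruning via the environment overrides above.
export KILO_DISABLE_AUTOCOMPACT=1
export KILO_DISABLE_PRUNE=1
```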

Best Practices

When to Compact

  • Long sessions: If you've been working for an extended period on a complex task
  • Before major transitions: When switching to a different aspect of your project
  • When approaching limits: Run /compact manually before hitting the automatic trigger if you want control over when the summary is produced

Tuning compaction.reserved

On models that advertise a separate input limit, the reserved value is a trade-off:

  • Lower value (e.g. 10000) → compaction triggers later, you get more turns out of the raw window, but you risk a mid-turn context overflow if a single response is larger than the buffer.
  • Higher value (e.g. 40000) → compaction triggers earlier, fewer overflow errors, but shorter effective conversations between summaries.

The default of ~20K is tuned to leave room for a full-size assistant response plus tool output. The setting has no effect on models with a single context window, which always reserve their full output cap instead.
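To see the trade-off in numbers, here is a quick sketch using an illustrative 200,000-token input limit:

```typescript
// Compaction trigger point = input limit - reserved buffer.
function triggerPoint(inputLimit: number, reserved: number): number {
  return inputLimit - reserved;
}

const limit = 200_000; // illustrative separate input limit
triggerPoint(limit, 10_000); // 190,000: later trigger, more raw turns, more overflow risk
triggerPoint(limit, 20_000); // 180,000: the default trade-off
triggerPoint(limit, 40_000); // 160,000: earlier trigger, safer, shorter stretches
```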

Maintaining Context Quality

  • Be specific in your initial task: A clear task description helps create better summaries
  • Use AGENTS.md: Combine with AGENTS.md for persistent project context that doesn't need to be compacted
  • Review the summary: After compaction, the summary is visible in your chat history