Codebase Indexing

Codebase Indexing enables semantic code search across your entire project using AI embeddings. Instead of searching for exact text matches, it understands the meaning of your queries, helping Kilo Code find relevant code even when you don't know specific function names or file locations.

ℹ️Opt-in indexing

Codebase Indexing is disabled by default. It starts only after you enable indexing globally or for an individual project. Configuring an embedding provider without enabling one of those toggles does not start indexing.

What It Does

When enabled, the indexing system:

Parses your code using Tree-sitter to identify semantic blocks (functions, classes, methods)
Creates embeddings of each code block using AI models
Stores vectors in a vector database for fast similarity search
Provides the semantic_search tool to Kilo Code for intelligent code discovery

This enables natural language queries like "user authentication logic" or "database connection handling" to find relevant code across your entire project.

Key Benefits

Semantic Search: Find code by meaning, not just keywords
Enhanced AI Understanding: Kilo Code can better comprehend and work with your codebase
Project-Wide Discovery: Search across all files in the workspace, not just the ones you have open
Pattern Recognition: Locate similar implementations and code patterns

Setup

Configure indexing

Open Kilo Code Settings → Indexing, or click the indexing indicator at the bottom of the prompt input panel.
Turn on Global Enable to index every workspace, or turn on Enable for This Project to index only the current workspace. Both toggles are off until explicitly enabled.
Pick an Embedding Provider and fill in its required fields.
Pick a Vector Store (Qdrant or LanceDB) and configure it.
Optionally adjust Tuning Parameters (search score, batch size, retries, max results).
Save to start the initial scan.

You can also edit the indexing section in kilo.jsonc directly:

{
  "indexing": {
    "enabled": true,
    "provider": "openai",
    "model": "text-embedding-3-small",
    "vectorStore": "lancedb",
    "openai": { "apiKey": "sk-..." },
    "lancedb": {}
  }
}

Embedding providers

Provider	How to use	Notes
OpenAI	API key	Default model: `text-embedding-3-small`. Use `text-embedding-3-large` for higher accuracy.
Ollama	Local base URL	No API costs. Runs fully offline.
OpenAI-Compatible	Base URL + API key	For self-hosted or third-party OpenAI-compatible endpoints.
Gemini	Google AI API key	Supports `gemini-embedding-001` and other Gemini embedding models.
Mistral	API key from La Plateforme	Use a standard Mistral API key. The Codestral-specific keys from the Mistral autocomplete setup guide are not interchangeable — those only work for completion.
Vercel AI Gateway	API key	Routes requests through Vercel AI Gateway.
AWS Bedrock	AWS region + profile	Uses the AWS SDK credential chain.
OpenRouter	API key (optional specific provider)	Routes through OpenRouter.
Voyage	API key	Voyage `voyage-code-3` is tuned for code.

Vector stores

Qdrant (default) — external server. Recommended for team deployments and larger codebases. See Setting Up Qdrant.
LanceDB — embedded, file-based. No server to run. Stores data under your Kilo data directory by default.

💡Tip

For a fully local, zero-cost setup, combine Ollama (embeddings) with LanceDB (vector store — no separate server needed).

Status indicator

The prompt input panel shows a compact indexing status indicator that reflects the current state (Standby, In Progress, Complete, Error, or Disabled; see Understanding Index Status) along with progress when scanning or embedding.

Setting Up Qdrant

If you choose Qdrant as your vector store, you need a running Qdrant server.

Quick Local Setup

Using Docker:

docker run -p 6333:6333 qdrant/qdrant

Using Docker Compose:

version: "3.8"
services:
  qdrant:
    image: qdrant/qdrant
    ports:
      - "6333:6333"
    volumes:
      - qdrant_storage:/qdrant/storage
volumes:
  qdrant_storage:
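
Once the container is running, it can help to confirm that the server answers before pointing the indexer at it. The following is a minimal sketch, not part of Kilo Code: `qdrant_is_reachable` is a hypothetical helper, and it assumes the default port mapping shown above and an instance without API-key authentication (an authenticated instance rejects the request, so the function returns False).

```python
import urllib.error
import urllib.request


def qdrant_is_reachable(base_url: str = "http://localhost:6333", timeout: float = 2.0) -> bool:
    """Return True if a Qdrant server answers a collection listing request.

    Hypothetical helper for illustration; it is not part of Kilo Code.
    """
    # GET /collections is part of Qdrant's REST API and lists existing collections.
    url = base_url.rstrip("/") + "/collections"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or malformed URL.
        return False
```

Calling `qdrant_is_reachable()` with no arguments checks the local setup above; pass the URL of a remote instance to check a shared server.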

Production Deployment

For team or production use:

Qdrant Cloud — managed service
Self-hosted on AWS, GCP, or Azure
Local server with network access for team sharing

Understanding Index Status

The interface shows real-time status:

Standby: Not running, awaiting configuration or paused
In Progress: Currently processing files (with a progress percentage and processed/total count)
Complete: Up-to-date and ready for searches
Error: Failed state, with an error message
Disabled: Indexing is turned off or not yet configured

How Files Are Processed

Smart Code Parsing

Tree-sitter Integration: Uses AST parsing to identify semantic code blocks
Language Support: Broad language coverage via Tree-sitter — C, C#, C++, CSS, Elisp, Elixir, Go, HTML, Java, JavaScript, Kotlin, Lua, OCaml, PHP, Python, Ruby, Rust, Scala, Solidity, Swift, SystemRDL, TLA+, TOML, TSX, TypeScript, Vue, Zig, and more
Markdown Support: Dedicated parser for markdown and documentation
Fallback: Line-based chunking for unsupported file types
Block Sizing:
- Minimum: 100 characters
- Maximum: 1,000 characters
- Splits large functions intelligently
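
To make the sizing limits concrete, here is a rough sketch of line-based fallback chunking under those limits. It is an illustration only, not the indexer's actual algorithm: the function name `chunk_lines` is hypothetical, and dropping fragments shorter than the minimum is an assumption about how the minimum is applied.

```python
def chunk_lines(text: str, min_chars: int = 100, max_chars: int = 1000) -> list[str]:
    """Group whole lines into chunks of at most max_chars characters.

    Illustrative sketch; chunks shorter than min_chars are dropped (an assumption).
    """
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for line in text.splitlines(keepends=True):
        # A single line longer than max_chars is hard-split into pieces.
        pieces = [line[i:i + max_chars] for i in range(0, len(line), max_chars)]
        for piece in pieces:
            # Close the current chunk if adding this piece would exceed the limit.
            if current and size + len(piece) > max_chars:
                chunks.append("".join(current))
                current, size = [], 0
            current.append(piece)
            size += len(piece)
    if current:
        chunks.append("".join(current))
    # Skip fragments that are too small to carry useful meaning.
    return [chunk for chunk in chunks if len(chunk) >= min_chars]
```

A Tree-sitter-aware splitter would cut at function or class boundaries instead of arbitrary line boundaries, which is why the line-based approach is only the fallback.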

Automatic File Filtering

The indexer automatically excludes:

Binary files and images
Large files (>1MB)
Git metadata (.git folders)
Dependencies (node_modules, vendor, etc.)
Files matching .gitignore and .kilocodeignore patterns
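
The exclusion rules above can be pictured as a single predicate applied to each candidate file. This is a simplified sketch under stated assumptions: `should_index` is a hypothetical name, the byte value used for the 1MB limit and the list of binary suffixes are illustrative, and `fnmatch` globbing only approximates real .gitignore semantics.

```python
import fnmatch
from pathlib import PurePosixPath

MAX_FILE_BYTES = 1_000_000  # documented 1MB limit; the exact byte value is an assumption
EXCLUDED_DIRS = {".git", "node_modules", "vendor"}
BINARY_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".ico", ".pdf", ".zip", ".exe"}  # illustrative


def should_index(rel_path: str, size_bytes: int, ignore_patterns: list[str]) -> bool:
    """Return True if a workspace-relative path passes every exclusion rule."""
    path = PurePosixPath(rel_path)
    if size_bytes > MAX_FILE_BYTES:
        return False
    # Reject anything that lives under an excluded directory.
    if any(part in EXCLUDED_DIRS for part in path.parts[:-1]):
        return False
    if path.suffix.lower() in BINARY_SUFFIXES:
        return False
    # Simplified glob matching; real .gitignore rules (negation, anchoring) are richer.
    return not any(
        fnmatch.fnmatch(rel_path, pattern) or fnmatch.fnmatch(path.name, pattern)
        for pattern in ignore_patterns
    )
```

For example, `should_index("src/app.ts", 2048, ["*.log"])` passes every rule, while a path under `node_modules` is rejected regardless of its size.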

Incremental Updates

File Watching: Monitors the workspace for changes and re-indexes in the background
Smart Updates: Only reprocesses modified files
Hash-based Caching: Avoids reprocessing unchanged content
Branch Switching: Automatically handles Git branch changes
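
Hash-based caching boils down to comparing a digest of each file's current content with the digest recorded at the last indexing run. The sketch below shows the idea; the function names, the in-memory cache shape, and the choice of SHA-256 are assumptions made for illustration, not details of the actual indexer.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Digest of a file's content (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def files_to_reindex(current: dict[str, bytes], cache: dict[str, str]) -> list[str]:
    """Return paths whose content changed since the last run, updating cache in place."""
    changed: list[str] = []
    for path, data in current.items():
        digest = content_hash(data)
        if cache.get(path) != digest:
            # New file or modified content: it needs fresh embeddings.
            changed.append(path)
            cache[path] = digest
    # Forget files that no longer exist in the workspace.
    for path in set(cache) - set(current):
        del cache[path]
    return changed
```

Because the comparison is on content rather than timestamps, switching to a branch and back leaves unchanged files untouched.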

Tuning Parameters

These advanced settings live under the indexing key and are exposed in the CLI's /indexing → Tuning Parameters menu and the VS Code extension's Indexing settings:

Setting	Default	Description
`searchMinScore`	`0.4`	Minimum cosine similarity (0-1) for a result to be returned.
`searchMaxResults`	`50`	Maximum number of results returned per search.
`embeddingBatchSize`	`60`	Number of code segments per embedding batch. Lower this if your embedding endpoint has strict rate limits.
`scannerMaxBatchRetries`	`3`	Maximum retry attempts for a failed embedding batch.
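
The two batch settings interact roughly as sketched below: segments are grouped into batches, and a failed batch is retried a bounded number of times. This is an illustration, not the scanner's code. `embed_in_batches` and the `embed_batch` callback are hypothetical, and the sketch assumes that "retries" means attempts after the first try; a real client would typically also wait between attempts.

```python
from typing import Callable, Sequence

Vector = list[float]


def embed_in_batches(
    segments: Sequence[str],
    embed_batch: Callable[[Sequence[str]], list[Vector]],
    batch_size: int = 60,  # mirrors embeddingBatchSize
    max_retries: int = 3,  # mirrors scannerMaxBatchRetries
) -> list[Vector]:
    """Embed segments batch by batch, retrying each failed batch up to max_retries times."""
    vectors: list[Vector] = []
    for start in range(0, len(segments), batch_size):
        batch = segments[start:start + batch_size]
        # One initial attempt plus up to max_retries retries (assumed interpretation).
        for attempt in range(max_retries + 1):
            try:
                vectors.extend(embed_batch(batch))
                break
            except Exception:
                if attempt == max_retries:
                    raise  # Retries exhausted: surface the provider error.
    return vectors
```

With the default batch size, 130 segments result in three requests of 60, 60, and 10 segments, which is why lowering the batch size reduces the load per request.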

Best Practices

Model Selection

OpenAI:

text-embedding-3-small: Best balance of performance and cost
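text-embedding-3-large: Higher accuracy, at several times the per-token price of text-embedding-3-small
text-embedding-ada-002: Legacy model. At OpenAI's list prices it costs more than text-embedding-3-small, so prefer the newer models for new indexes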

Ollama:

mxbai-embed-large: The largest of these three models, with the highest embedding quality
nomic-embed-text: Best balance of performance and embedding quality
all-minilm: Compact model with lower quality but faster performance

Voyage:

voyage-code-3: Code-tuned embeddings; strong default for source-heavy repos

Security Considerations

API Keys: Stored in your kilo.jsonc config. Treat that file as a secret in shared environments.
Code Privacy: Only small code snippets are sent for embedding — never whole files.
Local Processing: All parsing (Tree-sitter) happens locally.
Fully Local Option: Pair Ollama (embeddings) with LanceDB (local vector store) so that your code never leaves your machine.
Qdrant Security: Use authentication for production deployments.

Current Limitations

File Size: 1MB maximum per file
Single Workspace: One workspace at a time
Dependencies: Requires an embedding provider, and — for Qdrant — a running Qdrant instance
Language Coverage: Semantic parsing is available only for Tree-sitter-supported languages; other file types fall back to line-based chunking

Troubleshooting

Embeddings fail or indexing stalls (llama.cpp / Ollama)

If your local embedding server is based on llama.cpp (including Ollama), indexing can fail with errors about n_ubatch or GGML_ASSERT. Ensure both batch size (-b) and micro-batch size (-ub) are set to the same value for embedding models, then restart the server. For Ollama, configure num_batch in your Modelfile or request options to match the same effective value.

Indexing status stays on "Disabled"

Check that indexing.enabled is true in your kilo.jsonc
Verify that the selected provider has all required credentials set
If using Qdrant, make sure the Qdrant server is reachable at the configured URL
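
The first two checks can be scripted against the contents of your kilo.jsonc. The sketch below is a hypothetical helper, not a Kilo Code command. It assumes the config layout shown in the Setup section, strips `//` and `/* */` comments before parsing, and does not handle trailing commas. It also cannot see toggles that were enabled elsewhere, such as the global setting.

```python
import json
import re

# Matches a JSON string literal first, so comment markers inside strings are left alone.
_JSONC_TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|//[^\n]*|/\*.*?\*/', re.DOTALL)


def strip_jsonc_comments(text: str) -> str:
    """Remove // and /* */ comments while keeping string literals intact."""
    return _JSONC_TOKEN.sub(
        lambda match: match.group(0) if match.group(0).startswith('"') else "",
        text,
    )


def indexing_problems(config_text: str) -> list[str]:
    """Return human-readable problems found in the indexing section (illustrative)."""
    config = json.loads(strip_jsonc_comments(config_text))
    indexing = config.get("indexing", {})
    problems: list[str] = []
    if indexing.get("enabled") is not True:
        problems.append("indexing.enabled is not true")
    if not indexing.get("provider"):
        problems.append("indexing.provider is not set")
    return problems
```

An empty list means both checks passed; provider credentials and Qdrant reachability still need to be verified separately.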

Rate-limit or batch errors with a hosted provider

Lower embeddingBatchSize under indexing (default 60). Smaller batches send fewer segments per request and are less likely to hit per-request or per-minute rate limits.

Using the Search Feature

Once indexed, Kilo Code can use the semantic_search tool to find relevant code:

Example Queries:

"How is user authentication handled?"
"Database connection setup"
"Error handling patterns"
"API endpoint definitions"

The tool provides Kilo Code with:

Relevant code snippets (up to your configured searchMaxResults)
File paths and line numbers
Similarity scores
Contextual information

Search Results Configuration

Tune result volume and quality via:

searchMaxResults — default 50. Lower for faster, more focused responses; higher for more context.
searchMinScore — default 0.4. Raise to require closer matches; lower to include more tangentially related code.
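
The effect of the two settings can be seen in a small brute-force sketch: score every stored vector against the query, drop results below the minimum score, and keep the best matches up to the maximum count. The function names and the in-memory list of `(path, vector)` pairs are assumptions for illustration; a real vector store such as Qdrant or LanceDB performs this search internally and far more efficiently.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    query_vec: list[float],
    indexed: list[tuple[str, list[float]]],
    min_score: float = 0.4,  # mirrors searchMinScore
    max_results: int = 50,   # mirrors searchMaxResults
) -> list[tuple[str, float]]:
    """Return (path, score) pairs at or above min_score, best first, capped at max_results."""
    scored = [(path, cosine_similarity(query_vec, vec)) for path, vec in indexed]
    hits = [(path, score) for path, score in scored if score >= min_score]
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:max_results]
```

Raising `min_score` shrinks the candidate set before the cap applies, so the two settings are complementary: one controls relevance, the other controls volume.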

Privacy & Security

Limited exposure: Only small code snippets are sent to the embedding provider; parsing and everything else stays on your machine
Embeddings are numeric: Not human-readable representations
Secure storage: API keys are stored in your local kilo.jsonc configuration
Fully local option: Use Ollama + LanceDB for completely local processing
Access control: Respects existing file permissions and .kilocodeignore