Free and Budget Models
Why this matters: AI model costs can add up quickly during development. This guide shows you how to use Kilo Code effectively while minimizing or eliminating costs through free models, budget-friendly alternatives, and smart usage strategies.
Completely Free Options
Kilo Gateway Free Models
From time to time, Kilo partners with AI inference providers to offer free models through the Kilo Gateway. The following free models are currently available:
- MiniMax M2.1 (free) - A capable model from MiniMax with strong general-purpose performance.
- Z.AI: GLM 4.7 (free) - Latest variant of the GLM family, purpose-built for agent-centric applications.
- MoonshotAI: Kimi K2.5 (free) - Optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis.
- Giga Potato (free) - A stealth-release model that is free during its evaluation period.
- Arcee AI: Trinity Large Preview (free) - A preview model from Arcee AI with strong capabilities.
OpenRouter Free Tier Models
OpenRouter offers several models with generous free tiers. Note: You'll need to create a free OpenRouter account to access these models.
Setup:
- Create a free OpenRouter account
- Get your API key from the dashboard
- Configure Kilo Code with the OpenRouter provider
Available free models:
- Qwen3 Coder (free) - Optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning over repositories.
- Z.AI: GLM 4.5 Air (free) - Lightweight variant of the GLM-4.5 family, purpose-built for agent-centric applications.
- DeepSeek: R1 0528 (free) - Performance on par with OpenAI o1, but open-sourced and with fully open reasoning tokens.
- MoonshotAI: Kimi K2 (free) - Optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis.
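If you want to verify your OpenRouter key before pointing Kilo Code at it, a quick call to OpenRouter's OpenAI-compatible chat completions endpoint is enough. This is a minimal sketch: the model slug `qwen/qwen3-coder:free` is an assumption (check OpenRouter's model list for the exact free-tier ID), and Kilo Code itself only needs the key pasted into the provider settings.

```typescript
// Minimal smoke test for an OpenRouter API key using a free model.
// Assumes Node 18+ (global fetch) and OPENROUTER_API_KEY set in the environment.
// The model ID "qwen/qwen3-coder:free" is an assumption -- confirm the exact
// slug in OpenRouter's model list.

async function main(): Promise<void> {
  const response = await fetch("https://openrouter.ai/api/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "qwen/qwen3-coder:free",
      messages: [{ role: "user", content: "Reply with OK if you can read this." }],
    }),
  });

  if (!response.ok) {
    throw new Error(`OpenRouter request failed: ${response.status} ${await response.text()}`);
  }

  const data = await response.json();
  console.log(data.choices[0].message.content);
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```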
Cost-Effective Premium Models
When you need more capability than free models provide, these options deliver excellent value:
Ultra-Budget Champions (Under $0.50 per million tokens)
Mistral Devstral Small
- Cost: ~$0.20 per million input tokens
- Best for: Code generation, debugging, refactoring
- Performance: 85% of premium model capability at 10% of the cost
Llama 4 Maverick
- Cost: ~$0.30 per million input tokens
- Best for: Complex reasoning, architecture planning
- Performance: Excellent for most development tasks
DeepSeek v3
- Cost: ~$0.27 per million input tokens
- Best for: Code analysis, large codebase understanding
- Performance: Strong technical reasoning
Mid-Range Value Models ($0.50-$2.00 per million tokens)
Qwen3 235B
- Cost: ~$1.20 per million input tokens
- Best for: Complex projects requiring high accuracy
- Performance: Near-premium quality at 40% of the cost
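To see what these per-token prices mean for a real workload, the arithmetic is simply tokens divided by one million, times the listed price. The sketch below uses the approximate input prices above; the 20-million-token monthly volume is an illustrative assumption, and output-token prices (not listed here) would add to the totals.

```typescript
// Rough monthly cost estimate: (tokens / 1,000,000) * price per million tokens.
// Prices are the approximate input prices listed above; the 20M-token volume
// and the omission of output-token pricing are illustrative assumptions.

const monthlyInputTokens = 20_000_000;

const inputPricePerMillion: Record<string, number> = {
  "Mistral Devstral Small": 0.2,
  "DeepSeek v3": 0.27,
  "Llama 4 Maverick": 0.3,
  "Qwen3 235B": 1.2,
};

for (const [model, price] of Object.entries(inputPricePerMillion)) {
  const cost = (monthlyInputTokens / 1_000_000) * price;
  console.log(`${model}: ~$${cost.toFixed(2)} per month`);
}
// Mistral Devstral Small: ~$4.00, DeepSeek v3: ~$5.40,
// Llama 4 Maverick: ~$6.00, Qwen3 235B: ~$24.00
```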
Smart Usage Strategies
The 50% Rule
Principle: Use budget models for 50% of your tasks, premium models for the other 50%.
Budget model tasks:
- Code reviews and analysis
- Documentation writing
- Simple bug fixes
- Boilerplate generation
- Refactoring existing code
Premium model tasks:
- Complex architecture decisions
- Debugging difficult issues
- Performance optimization
- New feature design
- Critical production code
Context Management for Cost Savings
Minimize context size:
```
// Instead of mentioning entire files
@src/components/UserProfile.tsx

// Mention specific functions or sections
@src/components/UserProfile.tsx:45-67
```
Reuse context effectively:
- Keep key project notes in your repository (e.g., an AGENTS.md file or docs folder)
- Reduces need to re-explain project details
- Saves tokens per conversation
Strategic file mentions:
- Only include files directly relevant to the task
- Use @folder/ for broad context, specific files for targeted work
Model Switching Strategies
Start cheap, escalate when needed:
- Begin with free models (Qwen3 Coder, GLM-4.5-Air)
- Switch to budget models if free models struggle
- Escalate to premium models only for complex tasks
Use API Configuration Profiles:
- Set up multiple profiles for different cost tiers
- Quick switching between free, budget, and premium models
- Match model capability to task complexity
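Conceptually, the escalation path is just a mapping from cost tier to the configuration profile you would select. The sketch below is purely illustrative and is not Kilo Code's configuration format; the profile names are assumptions standing in for API Configuration Profiles you create yourself.

```typescript
// Conceptual sketch of tiered model selection -- not Kilo Code's actual API.
// Profile names are assumptions; they stand in for API Configuration Profiles
// you create yourself (e.g., one per cost tier).

type Tier = "free" | "budget" | "premium";

const profileForTier: Record<Tier, string> = {
  free: "openrouter-qwen3-coder-free",
  budget: "openrouter-devstral-small",
  premium: "premium-frontier-model",
};

// Start cheap; escalate one tier when the current one struggles with the task.
function escalate(current: Tier): Tier {
  if (current === "free") return "budget";
  if (current === "budget") return "premium";
  return "premium";
}

console.log(profileForTier[escalate("free")]); // -> "openrouter-devstral-small"
```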
Mode-Based Cost Optimization
Use appropriate modes to limit expensive operations:
- Ask Mode: Information gathering without code changes
- Architect Mode: Planning without expensive file operations
- Debug Mode: Focused troubleshooting
Custom modes for budget control:
- Create modes that restrict expensive tools
- Limit file access to specific directories
- Control which operations are auto-approved
Real-World Performance Comparisons
Code Generation Tasks
Simple function creation:
- Mistral Devstral Small: 95% success rate
- GPT-4: 98% success rate
- Cost difference: Free vs $0.20 vs $30 per million tokens
Complex refactoring:
- Budget models: 70-80% success rate
- Premium models: 90-95% success rate
- Recommendation: Start with budget, escalate if needed
Debugging Performance
Simple bugs:
- Free models: Usually sufficient
- Budget models: Excellent performance
- Premium models: Overkill for most cases
Complex system issues:
- Free models: 40-60% success rate
- Budget models: 60-80% success rate
- Premium models: 85-95% success rate
Hybrid Approach Recommendations
Daily Development Workflow
Morning planning session:
- Use Architect mode with DeepSeek R1
- Plan features and architecture
- Create task breakdowns
Implementation phase:
- Use Code mode with budget models
- Generate and modify code
- Handle routine development tasks
Complex problem solving:
- Switch to premium models when stuck
- Use for critical debugging
- Architecture decisions affecting multiple systems
Project Phase Strategy
Early development:
- Free and budget models for prototyping
- Rapid iteration without cost concerns
- Establish patterns and structure
Production preparation:
- Premium models for critical code review
- Performance optimization
- Security considerations
Cost Monitoring and Control
Track Your Usage
Monitor credit consumption:
- Check cost estimates in chat history
- Review monthly usage patterns
- Identify high-cost operations
Set spending limits:
- Use provider billing alerts
- Configure provider rate limits to control usage
- Set daily/monthly budgets
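Budget alerts are easier to set when you know roughly how many tokens a budget buys. A back-of-the-envelope calculation, assuming a single blended price per million tokens, looks like this (the $20 budget and $0.30 blended price are illustrative assumptions):

```typescript
// How many tokens does a monthly budget buy at a given blended price?
// The $20 budget and $0.30-per-million blended price are illustrative assumptions.

const monthlyBudgetUsd = 20;
const blendedPricePerMillion = 0.3; // assumed average across input/output tokens

const tokensPerMonth = (monthlyBudgetUsd / blendedPricePerMillion) * 1_000_000;
const tokensPerDay = tokensPerMonth / 30;

console.log(`~${(tokensPerMonth / 1_000_000).toFixed(1)}M tokens/month`); // ~66.7M
console.log(`~${(tokensPerDay / 1_000_000).toFixed(1)}M tokens/day`);     // ~2.2M
```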
Cost-Saving Tips
Reduce system prompt size:
- Disable MCP if not using external tools
- Use focused custom modes
- Minimize unnecessary context
Optimize conversation length:
- Use Checkpoints to reset context
- Start fresh conversations for unrelated tasks
- Archive completed work
Batch similar tasks:
- Group related code changes
- Handle multiple files in single requests
- Reduce conversation overhead
Getting Started with Budget Models
Quick Setup Guide
- Create OpenRouter account for free models
- Configure multiple providers in Kilo Code
- Set up API Configuration Profiles for easy switching
- Start with free models and escalate to budget models when needed
- Reserve premium models for complex work
Recommended Provider Mix
Free tier foundation:
- OpenRouter - Free models
- Groq - Fast inference for supported models
- Z.AI - Offers the free GLM-4.5-Flash model
Budget tier options:
- Providers serving Mistral Devstral Small, DeepSeek v3, or Llama 4 Maverick (see the ultra-budget models above)
Premium tier backup:
- A premium provider reserved for the small share of tasks that need frontier-level capability
Measuring Success
Track these metrics:
- Monthly AI costs vs. development productivity
- Task completion rates by model tier
- Time saved vs. money spent
- Code quality improvements
Success indicators:
- 70%+ of tasks completed with free/budget models
- Monthly costs under your target budget
- Maintained or improved code quality
- Faster development cycles
By combining free models, strategic budget model usage, and smart optimization techniques, you can harness the full power of AI-assisted development while keeping costs minimal. Start with free options and gradually incorporate budget models as your needs and comfort with costs grow.