Free and Budget Models
Why this matters: AI model costs can add up quickly during development. This guide shows you how to use Kilo Code effectively while minimizing or eliminating costs through free models, budget-friendly alternatives, and smart usage strategies.
Completely Free Options
Kilo Gateway Free Models
From time to time, Kilo partners with AI inference providers to offer free models through the Kilo Gateway. The following free models are currently available:
- MiniMax M2.1 (free) - A capable model from MiniMax with strong general-purpose performance.
- Z.AI: GLM 4.7 (free) - Latest variant of the GLM family, purpose-built for agent-centric applications.
- MoonshotAI: Kimi K2.5 (free) - Optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis.
- Giga Potato (free) - A stealth-release model that is free during its evaluation period.
- Arcee AI: Trinity Large Preview (free) - A preview model from Arcee AI with strong capabilities.
OpenRouter Free Tier Models
OpenRouter offers several models with generous free tiers. Note: You'll need to create a free OpenRouter account to access these models.
Setup:
- Create a free OpenRouter account
- Get your API key from the dashboard
- Configure Kilo Code with the OpenRouter provider
Available free models:
- Qwen3 Coder (free) - Optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning over repositories.
- Z.AI: GLM 4.5 Air (free) - Lightweight variant of the GLM-4.5 family, purpose-built for agent-centric applications.
- DeepSeek: R1 0528 (free) - Performance on par with OpenAI o1, but open-sourced and with fully open reasoning tokens.
- MoonshotAI: Kimi K2 (free) - Optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis.
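If you want to verify your OpenRouter key before pointing Kilo Code at it, a quick call to OpenRouter's OpenAI-compatible chat completions endpoint is enough. This is a minimal sketch: the model slug `qwen/qwen3-coder:free` is an assumption (check OpenRouter's model list for the exact free-tier ID), and Kilo Code itself only needs the key pasted into the provider settings.

```typescript
// Minimal smoke test for an OpenRouter API key using a free model.
// Assumes Node 18+ (global fetch) and OPENROUTER_API_KEY set in the environment.
// The model ID "qwen/qwen3-coder:free" is an assumption -- confirm the exact
// slug in OpenRouter's model list.

async function main(): Promise<void> {
  const response = await fetch("https://openrouter.ai/api/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "qwen/qwen3-coder:free",
      messages: [{ role: "user", content: "Reply with OK if you can read this." }],
    }),
  });

  if (!response.ok) {
    throw new Error(`OpenRouter request failed: ${response.status} ${await response.text()}`);
  }

  const data = await response.json();
  console.log(data.choices[0].message.content);
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```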
Cost-Effective Premium Models
When you need more capability than free models provide, these options deliver excellent value:
Ultra-Budget Champions (Under $0.50 per million tokens)
Mistral Devstral Small
- Cost: ~$0.20 per million input tokens
- Best for: Code generation, debugging, refactoring
- Performance: 85% of premium model capability at 10% of the cost
Llama 4 Maverick
- Cost: ~$0.30 per million input tokens
- Best for: Complex reasoning, architecture planning
- Performance: Excellent for most development tasks
DeepSeek v3
- Cost: ~$0.27 per million input tokens
- Best for: Code analysis, large codebase understanding
- Performance: Strong technical reasoning
Mid-Range Value Models ($0.50-$2.00 per million tokens)
Qwen3 235B
- Cost: ~$1.20 per million input tokens
- Best for: Complex projects requiring high accuracy
- Performance: Near-premium quality at 40% of the cost
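To see what these per-token prices mean for a real workload, the arithmetic is simply tokens divided by one million, times the listed price. The sketch below uses the approximate input prices above; the 20-million-token monthly volume is an illustrative assumption, and output-token prices (not listed here) would add to the totals.

```typescript
// Rough monthly cost estimate: (tokens / 1,000,000) * price per million tokens.
// Prices are the approximate input prices listed above; the 20M-token volume
// and the omission of output-token pricing are illustrative assumptions.

const monthlyInputTokens = 20_000_000;

const inputPricePerMillion: Record<string, number> = {
  "Mistral Devstral Small": 0.2,
  "DeepSeek v3": 0.27,
  "Llama 4 Maverick": 0.3,
  "Qwen3 235B": 1.2,
};

for (const [model, price] of Object.entries(inputPricePerMillion)) {
  const cost = (monthlyInputTokens / 1_000_000) * price;
  console.log(`${model}: ~$${cost.toFixed(2)} per month`);
}
// Mistral Devstral Small: ~$4.00, DeepSeek v3: ~$5.40,
// Llama 4 Maverick: ~$6.00, Qwen3 235B: ~$24.00
```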
Smart Usage Strategies
The 50% Rule
Principle: Use budget models for 50% of your tasks, premium models for the other 50%.
Budget model tasks:
- Code reviews and analysis
- Documentation writing
- Simple bug fixes
- Boilerplate generation
- Refactoring existing code
Premium model tasks:
- Complex architecture decisions
- Debugging difficult issues
- Performance optimization
- New feature design
- Critical production code
Context Management for Cost Savings
Minimize context size:
```
// Instead of mentioning entire files
@src/components/UserProfile.tsx

// Mention specific functions or sections
@src/components/UserProfile.tsx:45-67
```
Reuse context effectively:
- Keep key project notes in your repository (e.g., an AGENTS.md file or docs folder)
- Reduces need to re-explain project details
- Saves tokens per conversation
Strategic file mentions:
- Only include files directly relevant to the task
- Use @folder/ for broad context, specific files for targeted work
Model Switching Strategies
Start cheap, escalate when needed:
- Begin with free models (Qwen3 Coder, GLM-4.5-Air)
- Switch to budget models if free models struggle
- Escalate to premium models only for complex tasks
Use API Configuration Profiles:
- Set up multiple profiles for different cost tiers
- Quick switching between free, budget, and premium models
- Match model capability to task complexity
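Conceptually, the escalation path is just a mapping from cost tier to the configuration profile you would select. The sketch below is purely illustrative and is not Kilo Code's configuration format; the profile names are assumptions standing in for API Configuration Profiles you create yourself.

```typescript
// Conceptual sketch of tiered model selection -- not Kilo Code's actual API.
// Profile names are assumptions; they stand in for API Configuration Profiles
// you create yourself (e.g., one per cost tier).

type Tier = "free" | "budget" | "premium";

const profileForTier: Record<Tier, string> = {
  free: "openrouter-qwen3-coder-free",
  budget: "openrouter-devstral-small",
  premium: "premium-frontier-model",
};

// Start cheap; escalate one tier when the current one struggles with the task.
function escalate(current: Tier): Tier {
  if (current === "free") return "budget";
  if (current === "budget") return "premium";
  return "premium";
}

console.log(profileForTier[escalate("free")]); // -> "openrouter-devstral-small"
```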
Mode-Based Cost Optimization
Use appropriate modes to limit expensive operations:
- Ask Mode: Information gathering without code changes
- Architect Mode: Planning without expensive file operations
- Debug Mode: Focused troubleshooting
Custom modes for budget control:
- Create modes that restrict expensive tools
- Limit file access to specific directories
- Control which operations are auto-approved
Real-World Performance Comparisons
Code Generation Tasks
Simple function creation:
- Mistral Devstral Small: 95% success rate
- GPT-4: 98% success rate
- Cost difference: Free vs $0.20 vs $30 per million tokens
Complex refactoring:
- Budget models: 70-80% success rate
- Premium models: 90-95% success rate
- Recommendation: Start with budget, escalate if needed
Debugging Performance
Simple bugs:
- Free models: Usually sufficient
- Budget models: Excellent performance
- Premium models: Overkill for most cases
Complex system issues:
- Free models: 40-60% success rate
- Budget models: 60-80% success rate
- Premium models: 85-95% success rate
Hybrid Approach Recommendations
Daily Development Workflow
Morning planning session:
- Use Architect mode with DeepSeek R1
- Plan features and architecture
- Create task breakdowns
Implementation phase:
- Use Code mode with budget models
- Generate and modify code
- Handle routine development tasks
Complex problem solving:
- Switch to premium models when stuck
- Use for critical debugging
- Architecture decisions affecting multiple systems
Project Phase Strategy
Early development:
- Free and budget models for prototyping
- Rapid iteration without cost concerns
- Establish patterns and structure
Production preparation:
- Premium models for critical code review
- Performance optimization
- Security considerations
Cost Monitoring and Control
Track Your Usage
Monitor credit consumption:
- Check cost estimates in chat history
- Review monthly usage patterns
- Identify high-cost operations
Set spending limits:
- Use provider billing alerts
- Configure provider rate limits to control usage
- Set daily/monthly budgets
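Budget alerts are easier to set when you know roughly how many tokens a budget buys. A back-of-the-envelope calculation, assuming a single blended price per million tokens, looks like this (the $20 budget and $0.30 blended price are illustrative assumptions):

```typescript
// How many tokens does a monthly budget buy at a given blended price?
// The $20 budget and $0.30-per-million blended price are illustrative assumptions.

const monthlyBudgetUsd = 20;
const blendedPricePerMillion = 0.3; // assumed average across input/output tokens

const tokensPerMonth = (monthlyBudgetUsd / blendedPricePerMillion) * 1_000_000;
const tokensPerDay = tokensPerMonth / 30;

console.log(`~${(tokensPerMonth / 1_000_000).toFixed(1)}M tokens/month`); // ~66.7M
console.log(`~${(tokensPerDay / 1_000_000).toFixed(1)}M tokens/day`);     // ~2.2M
```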
Cost-Saving Tips
Reduce system prompt size:
- Disable MCP if not using external tools
- Use focused custom modes
- Minimize unnecessary context
Optimize conversation length:
- Use Checkpoints to reset context
- Start fresh conversations for unrelated tasks
- Archive completed work
Batch similar tasks:
- Group related code changes
- Handle multiple files in single requests
- Reduce conversation overhead
Getting Started with Budget Models
Quick Setup Guide
- Create OpenRouter account for free models
- Configure multiple providers in Kilo Code
- Set up API Configuration Profiles for easy switching
- Start with free models and escalate to budget models when needed
- Reserve premium models for complex work
Recommended Provider Mix
Free tier foundation:
- OpenRouter - Free models
- Groq - Fast inference for supported models
- Z.AI - Offers the free GLM-4.5-Flash model
Budget tier options:
- Providers serving Mistral Devstral Small, DeepSeek v3, or Llama 4 Maverick (see the ultra-budget models above)
Premium tier backup:
- A premium provider reserved for the small share of tasks that need frontier-level capability
Measuring Success
Track these metrics:
- Monthly AI costs vs. development productivity
- Task completion rates by model tier
- Time saved vs. money spent
- Code quality improvements
Success indicators:
- 70%+ of tasks completed with free/budget models
- Monthly costs under your target budget
- Maintained or improved code quality
- Faster development cycles
By combining free models, strategic budget model usage, and smart optimization techniques, you can harness the full power of AI-assisted development while keeping costs minimal. Start with free options and gradually incorporate budget models as your needs and comfort with costs grow.