Top Coding Models in Kilo This Week
Our picks based on real-world testing
- by Anthropic: The most capable model for complex planning and orchestration
- by Qwen: State-of-the-art performance for coding and multi-agent flows
Top Models by Mode
KiloClaw
| # | Model | % |
|---|---|---|
| 1 | ling-2.6-1t | 21.0% |
| 2 | step-3.5-flash | 21.0% |
| 3 | qwen3.6-plus | 19.7% |
| 4 | nemotron-3-super-120b-a12b | 17.7% |
| 5 | claude-sonnet-4.6 | 3.5% |
| 6 | laguna-m.1 | 2.9% |
| 7 | minimax-m2.7 | 2.3% |
| 8 | claude-opus-4.7 | 2.1% |
| 9 | deepseek-v4-flash | 1.4% |
| 10 | kimi-k2.6 | 1.2% |
Code
| # | Model | % |
|---|---|---|
| 1 | step-3.5-flash | 29.6% |
| 2 | ling-2.6-1t | 25.9% |
| 3 | grok-code-fast-1 | 15.7% |
| 4 | nemotron-3-super-120b-a12b | 7.7% |
| 5 | hy3-preview | 3.3% |
| 6 | qwen3.6-plus | 2.5% |
| 7 | laguna-m.1 | 2.4% |
| 8 | claude-opus-4.7 | 1.9% |
| 9 | claude-sonnet-4.6 | 1.9% |
| 10 | kimi-k2.6 | 1.0% |
Plan
| # | Model | % |
|---|---|---|
| 1 | ling-2.6-1t | 25.6% |
| 2 | step-3.5-flash | 20.2% |
| 3 | nemotron-3-super-120b-a12b | 10.2% |
| 4 | grok-code-fast-1 | 9.6% |
| 5 | claude-opus-4.7 | 5.7% |
| 6 | qwen3.6-plus | 5.4% |
| 7 | hy3-preview | 4.3% |
| 8 | kimi-k2.6 | 2.5% |
| 9 | laguna-m.1 | 2.4% |
| 10 | claude-sonnet-4.6 | 2.4% |
Ask
| # | Model | % |
|---|---|---|
| 1 | ling-2.6-1t | 33.9% |
| 2 | grok-code-fast-1 | 27.4% |
| 3 | step-3.5-flash | 15.4% |
| 4 | nemotron-3-super-120b-a12b | 6.4% |
| 5 | qwen3.6-plus | 2.1% |
| 6 | claude-sonnet-4.6 | 2.0% |
| 7 | hy3-preview | 1.8% |
| 8 | claude-opus-4.7 | 1.5% |
| 9 | kimi-k2.6 | 1.4% |
| 10 | laguna-m.1 | 1.3% |
Debug
| # | Model | % |
|---|---|---|
| 1 | step-3.5-flash | 27.4% |
| 2 | ling-2.6-1t | 25.3% |
| 3 | grok-code-fast-1 | 15.0% |
| 4 | nemotron-3-super-120b-a12b | 7.3% |
| 5 | hy3-preview | 6.6% |
| 6 | qwen3.6-plus | 3.4% |
| 7 | laguna-m.1 | 2.1% |
| 8 | claude-sonnet-4.6 | 1.8% |
| 9 | claude-opus-4.7 | 1.7% |
| 10 | kimi-k2.6 | 1.2% |
Review
| # | Model | % |
|---|---|---|
| 1 | ling-2.6-1t | 38.0% |
| 2 | step-3.5-flash | 24.9% |
| 3 | nemotron-3-super-120b-a12b | 11.9% |
| 4 | qwen3.6-plus | 6.1% |
| 5 | claude-sonnet-4.6 | 3.3% |
| 6 | kimi-k2.6 | 2.6% |
| 7 | claude-opus-4.7 | 1.5% |
| 8 | gpt-5.5 | 1.3% |
| 9 | claude-sonnet-4.5 | 1.2% |
| 10 | laguna-m.1 | 1.2% |
Top Models Today
| # | Model | Tokens |
|---|---|---|
| 1 | step-3.5-flash | 45.3B |
| 2 | ling-2.6-1t | 43.0B |
| 3 | grok-code-fast-1 | 24.3B |
| 4 | deepseek-v4-pro | 24.3B |
| 5 | nemotron-3-super-120b-a12b | 18.2B |
| 6 | laguna-m.1 | 17.0B |
| 7 | glm-5.1 | 15.1B |
| 8 | deepseek-v4-flash | 14.8B |
| 9 | claude-opus-4.7 | 9.4B |
| 10 | gpt-5.5 | 8.6B |
All Models
Browse and compare all available AI coding models
Anthropic: Claude Sonnet 4.5
Claude Sonnet 4.5 is Anthropic’s most advanced Sonnet model to date, optimized for real-world agents and coding workflows. It delivers state-of-the-art performance on coding benchmarks such as SWE-bench Verified, with...
inclusionAI: Ling-2.6-1T (free)
Ling-2.6-1T is an instant (instruct) model from inclusionAI and the company’s trillion-parameter flagship, designed for real-world agents that require fast execution and high efficiency at scale. It uses a “fast...
Z.ai: GLM 5
GLM-5 is Z.ai’s flagship open-source foundation model engineered for complex systems design and long-horizon agent workflows. Built for expert developers, it delivers production-grade performance on large-scale programming tasks, rivaling leading...
Tencent: Hy3 preview (free)
Hy3 preview is a high-efficiency Mixture-of-Experts model from Tencent designed for agentic workflows and production use. It supports configurable reasoning levels across disabled, low, and high modes, allowing it to...
Anthropic: Claude Opus 4.5
Claude Opus 4.5 is Anthropic’s frontier reasoning model optimized for complex software engineering, agentic workflows, and long-horizon computer use. It offers strong multimodal capabilities, competitive performance across real-world coding and...
Google: Gemini 3 Flash Preview
Gemini 3 Flash Preview is a high speed, high value thinking model designed for agentic workflows, multi turn chat, and coding assistance. It delivers near Pro level reasoning and tool...
Qwen: Qwen3.6 Plus
Qwen 3.6 Plus builds on a hybrid architecture that combines efficient linear attention with sparse mixture-of-experts routing, enabling strong scalability and high-performance inference. Compared to the 3.5 series, it delivers...
MoonshotAI: Kimi K2.6
Kimi K2.6 is Moonshot AI's next-generation multimodal model, designed for long-horizon coding, coding-driven UI/UX generation, and multi-agent orchestration. It handles complex end-to-end coding tasks across Python, Rust, and Go, and...
Z.ai: GLM 5.1
GLM-5.1 delivers a major leap in coding capability, with particularly significant gains in handling long-horizon tasks. Unlike previous models built around minute-level interactions, GLM-5.1 can work independently and continuously on...
Z.ai: GLM 4.7
GLM-4.7 is Z.ai’s latest flagship model, featuring upgrades in two key areas: enhanced programming capabilities and more stable multi-step reasoning/execution. It demonstrates significant improvements in executing complex agent tasks while...
OpenAI: GPT-5.3-Codex
GPT-5.3-Codex is OpenAI’s most advanced agentic coding model, combining the frontier software engineering performance of GPT-5.2-Codex with the broader reasoning and professional knowledge capabilities of GPT-5.2. It achieves state-of-the-art results...
DeepSeek: DeepSeek V4 Flash
DeepSeek V4 Flash is an efficiency-optimized Mixture-of-Experts model from DeepSeek with 284B total parameters and 13B activated parameters, supporting a 1M-token context window. It is designed for fast inference and...
OpenAI: GPT-5.2
GPT-5.2 is the latest frontier-grade model in the GPT-5 series, offering stronger agentic and long-context performance compared to GPT-5.1. It uses adaptive reasoning to allocate computation dynamically, responding quickly...
Poolside: Laguna M.1 (free)
Laguna M.1 is the flagship coding agent model from [Poolside](https://poolside.ai), optimized for complex software engineering tasks. Designed for agentic coding workflows, it supports tool calling and reasoning, with a 128K...
OpenAI: GPT-5.5
GPT-5.5 is OpenAI’s frontier model designed for complex professional workloads, building on GPT-5.4 with stronger reasoning, higher reliability, and improved token efficiency on hard tasks. It features a 1M+ token...
OpenAI: GPT-5.2-Codex
GPT-5.2-Codex is an upgraded version of GPT-5.1-Codex optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks....
OpenAI: GPT-5.4
GPT-5.4 is OpenAI’s latest frontier model, unifying the Codex and GPT lines into a single system. It features a 1M+ token context window (922K input, 128K output) with support for...
StepFun: Step 3.5 Flash
Step 3.5 Flash is StepFun's most capable open-source foundation model. Built on a sparse Mixture of Experts (MoE) architecture, it selectively activates only 11B of its 196B parameters per token....
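The sparse-activation idea behind this card can be sketched in a few lines: a router scores every expert for each token and only the top-k experts actually run, so most parameters stay idle. The sizes and the simple softmax router below are toy illustrations, not Step 3.5 Flash's actual configuration.

```python
import math
import random

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def route(token_scores, k=2):
    """Pick the top-k experts for one token and renormalize their weights,
    so only k experts' parameters are activated for this token."""
    ranked = sorted(range(len(token_scores)),
                    key=lambda i: token_scores[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([token_scores[i] for i in chosen])
    return list(zip(chosen, weights))

random.seed(0)
scores = [random.gauss(0, 1) for _ in range(8)]  # router logits for 8 experts
picked = route(scores, k=2)  # only 2 of 8 experts run for this token
```

In a real MoE layer each chosen expert's output is scaled by its routing weight and summed; the compute savings come from skipping the unchosen experts entirely.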
Google: Gemini 3.1 Pro Preview
Gemini 3.1 Pro Preview is Google’s frontier reasoning model, delivering enhanced software engineering performance, improved agentic reliability, and more efficient token usage across complex workflows. Building on the multimodal foundation...
MoonshotAI: Kimi K2.5
Kimi K2.5 is Moonshot AI's native multimodal model, delivering state-of-the-art visual coding capability and a self-directed agent swarm paradigm. Built on Kimi K2 with continued pretraining over approximately 15T mixed...
Qwen: Qwen3 Coder Plus
Qwen3 Coder Plus is Alibaba's proprietary version of the open-source Qwen3 Coder 480B A35B. It is a powerful coding agent model specializing in autonomous programming via tool calling and...
Google: Gemini 2.5 Flash
Gemini 2.5 Flash is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in "thinking" capabilities, enabling it to provide responses with greater...
xAI: Grok Code Fast 1
Grok Code Fast 1 is a speedy and economical reasoning model that excels at agentic coding. With reasoning traces visible in the response, developers can steer Grok Code for high-quality...
Anthropic: Claude Sonnet 4.6
Sonnet 4.6 is Anthropic's most capable Sonnet-class model yet, with frontier performance across coding, agents, and professional work. It excels at iterative development, complex codebase navigation, end-to-end project management with...
DeepSeek: DeepSeek V4 Pro
DeepSeek V4 Pro is a large-scale Mixture-of-Experts model from DeepSeek with 1.6T total parameters and 49B activated parameters, supporting a 1M-token context window. It is designed for advanced reasoning, coding,...
MiniMax: MiniMax M2.5
MiniMax-M2.5 is a SOTA large language model designed for real-world productivity. Trained in a diverse range of complex real-world digital working environments, M2.5 builds upon the coding expertise of M2.1...
Z.ai: GLM 4.6
Compared with GLM-4.5, this generation brings several key improvements: Longer context window: The context window has been expanded from 128K to 200K tokens, enabling the model to handle more complex...
Mistral: Devstral 2 2512
Devstral 2 is a state-of-the-art open-source model by Mistral AI specializing in agentic coding. It is a 123B-parameter dense transformer model supporting a 256K context window. Devstral 2 supports exploring...
Anthropic: Claude Haiku 4.5
Claude Haiku 4.5 is Anthropic’s fastest and most efficient model, delivering near-frontier intelligence at a fraction of the cost and latency of larger Claude models. Matching Claude Sonnet 4’s performance...
Anthropic: Claude Opus 4.6
Opus 4.6 is Anthropic’s strongest model for coding and long-running professional tasks. It is built for agents that operate across entire workflows rather than single prompts, making it especially effective...
Anthropic: Claude Opus 4.7
Opus 4.7 is the next generation of Anthropic's Opus family, built for long-running, asynchronous agents. Building on the coding and agentic strengths of Opus 4.6, it delivers stronger performance on...
DeepSeek: DeepSeek V3.1 Terminus
DeepSeek-V3.1 Terminus is an update to [DeepSeek V3.1](/deepseek/deepseek-chat-v3.1) that maintains the model's original capabilities while addressing issues reported by users, including language consistency and agent capabilities, further optimizing the model's...
Google: Gemini 3 Pro Preview
Gemini 3 Pro is Google’s flagship frontier model for high-precision multimodal reasoning, combining strong performance across text, image, video, audio, and code with a 1M-token context window.

Reasoning Details must be preserved when using multi-turn tool calling; see the docs at https://openrouter.ai/docs/use-cases/reasoning-tokens#preserving-reasoning-blocks.

It delivers state-of-the-art benchmark results in general reasoning, STEM problem solving, factual QA, and multimodal understanding, including leading scores on LMArena, GPQA Diamond, MathArena Apex, MMMU-Pro, and Video-MMMU. Interactions emphasize depth and interpretability: the model is designed to infer intent with minimal prompting and produce direct, insight-focused responses.

Built for advanced development and agentic workflows, Gemini 3 Pro provides robust tool-calling, long-horizon planning stability, and strong zero-shot generation for complex UI, visualization, and coding tasks. It excels at agentic coding (SWE-Bench Verified, Terminal-Bench 2.0), multimodal analysis, and structured long-form tasks such as research synthesis, planning, and interactive learning experiences. Suitable applications include autonomous agents, coding assistants, multimodal analytics, scientific reasoning, and high-context information processing.
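Preserving reasoning blocks in practice means echoing them back unchanged when you replay the message history on the next turn. The sketch below shows that pattern; the field names (`reasoning_details`, `tool_calls`) follow OpenRouter's chat schema as documented at the link above, but treat the exact response shape here as an assumption and verify against the docs.

```python
def assistant_turn_to_message(choice: dict) -> dict:
    """Convert one assistant response into a history message, keeping
    reasoning_details intact so the next request replays it verbatim."""
    msg = {
        "role": "assistant",
        "content": choice.get("content", ""),
    }
    # Preserve reasoning blocks: dropping them between tool-call turns
    # can degrade or break multi-turn tool calling.
    if "reasoning_details" in choice:
        msg["reasoning_details"] = choice["reasoning_details"]
    if "tool_calls" in choice:
        msg["tool_calls"] = choice["tool_calls"]
    return msg

history = [{"role": "user", "content": "Plan the refactor."}]
response_choice = {  # hypothetical shape of one API response choice
    "content": "",
    "reasoning_details": [{"type": "reasoning.text", "text": "..."}],
    "tool_calls": [{"id": "call_1", "type": "function",
                    "function": {"name": "read_file", "arguments": "{}"}}],
}
history.append(assistant_turn_to_message(response_choice))
history.append({"role": "tool", "tool_call_id": "call_1",
                "content": "file contents"})
# `history` can now be sent back with reasoning blocks preserved.
```

The key design point is that the client never inspects or rewrites the reasoning blocks; it simply round-trips them alongside the tool results.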
Google: Gemma 4 26B A4B (free)
Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind. Despite 25.2B total parameters, only 3.8B activate per token during inference — delivering near-31B quality at...
Google: Gemma 4 31B
Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output. Features a 256K token context window, configurable thinking/reasoning mode, native function...
inclusionAI: Ling-2.6-flash (free)
Ling-2.6-flash is an instant (instruct) model from inclusionAI with 104B total parameters and 7.4B active parameters, designed for real-world agents that require fast responses, strong execution, and high token efficiency....
Kwaipilot: KAT-Coder-Pro V1
KAT-Coder-Pro V1 is KwaiKAT's most advanced agentic coding model in the KAT-Coder series. Designed specifically for agentic coding tasks, it excels in real-world software engineering scenarios, achieving 73.4% solve rate on the SWE-Bench Verified benchmark. The model has been optimized for tool-use capability, multi-turn interaction, instruction following, generalization, and comprehensive capabilities through a multi-stage training process, including mid-training, supervised fine-tuning (SFT), reinforcement fine-tuning (RFT), and scalable agentic RL.
MiniMax: MiniMax M2.1
MiniMax-M2.1 is a lightweight, state-of-the-art large language model optimized for coding, agentic workflows, and modern application development. With only 10 billion activated parameters, it delivers a major jump in real-world...
MiniMax: MiniMax M2.7
MiniMax-M2.7 is a next-generation large language model designed for autonomous, real-world productivity and continuous improvement. Built to actively participate in its own evolution, M2.7 integrates advanced agentic capabilities through multi-agent...
NVIDIA: Nemotron 3 Super (free)
NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameters for maximum compute efficiency and accuracy in complex multi-agent applications. Built on a hybrid Mamba-Transformer...
OpenAI: GPT-5.1
GPT-5.1 is the latest frontier-grade model in the GPT-5 series, offering stronger general-purpose reasoning, improved instruction adherence, and a more natural conversational style compared to GPT-5. It uses adaptive reasoning...
OpenAI: GPT-5.1-Codex
GPT-5.1-Codex is a specialized version of GPT-5.1 optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks....
Qwen: Qwen3 Coder 480B A35B
Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning over...
Qwen: Qwen3 Coder Next
Qwen3-Coder-Next is an open-weight causal language model optimized for coding agents and local development workflows. It uses a sparse MoE design with 80B total parameters and only 3B activated per...
xAI: Grok 4.3
Grok 4.3 is a reasoning model from xAI. It accepts text and image inputs with text output, and is suited for agentic workflows, instruction-following tasks, and applications requiring high factual...
Xiaomi: MiMo-V2-Pro
MiMo-V2-Pro is Xiaomi's flagship foundation model, featuring over 1T total parameters and a 1M context length, deeply optimized for agentic scenarios. It is highly adaptable to general agent frameworks like...
Z.ai: GLM 4.7 Flash
As a 30B-class SOTA model, GLM-4.7-Flash offers a new option that balances performance and efficiency. It is further optimized for agentic coding use cases, strengthening coding capabilities, long-horizon task planning,...
‡ Real-world usage from Kilo Code Leaderboard
* Performance metrics from Artificial Analysis
† Pricing from OpenRouter
Recent posts
Read the latest news and updates from the Kilo Code team.
Warp Finally Went Open Source
Here's why the community forked it by Friday
Brian Turcotte
Rebuilding a viral Hacker News game with Kilo CLI + Opus 4.7
Yesterday, I saw an interesting game on the front page of Hacker News: Cursor Camp. The game got pretty popular, with over 1,000 upvotes on HN.
Darko
How I Built a Claw-Powered Dashboard That Helps Us Work
The pattern behind turning a proactive AI agent into a work pipeline.
Brian Turcotte