Updated May 2026 · Tracking 60+ open-weight coding models

Best Open-Source & Open-Weight AI Coding Models in 2026

The best open-source coding models in 2026 are GLM-5.1, Kimi K2.6, DeepSeek V4-Pro, and Qwen3-Coder-Next for agentic work; MiniMax M2.5 for free hosted use; and Devstral Small 2 or Qwen3.6-27B for local development. All ship as open weights under Apache 2.0, MIT, or Modified MIT licenses — use them in Kilo Code, run them locally, or bring your own keys.

Best Open-Source Coding Models Ranked (2026)

Ranked by SWE-Bench Verified, SWE-Bench Pro, Terminal-Bench 2.0, LiveCodeBench, and real-world Kilo leaderboard usage. Each model links to a full review and is available either hosted in Kilo Code, locally via Ollama / LM Studio / vLLM, or through your own provider keys.

#1

GLM-5.1

Z.ai

In Kilo

Best for long-horizon agentic engineering

Z.ai's flagship for agentic coding: stronger judgment on ambiguous problems, sustained iteration over thousands of tool calls, SOTA on SWE-Bench Pro, and large gains on NL2Repo and Terminal-Bench 2.0.

Parameters
744B · 40B active
Context
200K
License
MIT
Top benchmark
Terminal-Bench 2.0 SOTA
View GLM-5.1 in Kilo
#2

Kimi K2.6

Moonshot AI

In Kilo

Best for agent swarms and long autonomous runs

Held the highest SWE-Bench Pro score of any open-weight model at its April 2026 release (58.6%) and remains a top-tier choice. Native 300-sub-agent swarms, 4,000-step coordination, 12-hour autonomous runs. Modified MIT, weights on Hugging Face.

Parameters
1T · 32B active
Context
256K
License
Modified MIT
Top benchmark
SWE-Bench Pro 58.6%
View Kimi K2.6 in Kilo
#3

DeepSeek V4-Pro

DeepSeek

In Kilo

Best for 1M-context and competitive coding

1.6T MoE with a true 1M-token context window; #1 on LiveCodeBench (93.5) and Codeforces (3206) among all evaluated models, including closed APIs. MIT license, FP4/FP8 mixed precision.

Parameters
1.6T · 49B active
Context
1M
License
MIT
Top benchmark
LiveCodeBench 93.5
View DeepSeek V4-Pro in Kilo
#4

Qwen3-Coder-Next

Alibaba Qwen

In Kilo

Best efficiency per active parameter

80B total / 3B active hybrid MoE built specifically for coding agents. Competitive with much larger frontier and open-weight models on SWE-Bench Verified, Multilingual, and Pro. Apache 2.0.

Parameters
80B · 3B active
Context
256K
License
Apache 2.0
Top benchmark
SWE-Bench Verified 71.3
View Qwen3-Coder-Next in Kilo
#5

Qwen3.6-27B

Alibaba Qwen

In Kilo

Best dense model for repo-level coding

Dense 27B model that beats the much larger Qwen3.5-397B-A17B MoE on agentic coding. Matches Claude 4.5 Opus on Terminal-Bench 2.0 (59.3). Apache 2.0, BF16 and FP8 weights on Hugging Face.

Parameters
27B dense
Context
262K
License
Apache 2.0
Top benchmark
SWE-Bench Verified 77.2
View Qwen3.6-27B in Kilo
#6

MiniMax M2.5

MiniMax

In Kilo

Best free hosted open-weight coding model

High-throughput open-weight coding model with heavy real-world usage on the Kilo leaderboard across Code, Plan, Ask, Debug, Review, and OpenClaw workflows. Free hosted variant in Kilo.

Parameters
MoE
Context
200K
License
Open weights
Top benchmark
Free in Kilo
View MiniMax M2.5 in Kilo
#7

GLM-5

Z.ai

In Kilo

Best foundation for local self-hosting

The 744B-A40B MIT-licensed predecessor to GLM-5.1, with published guidance and weights for self-serving via vLLM, SGLang, xLLM, or Ktransformers. First open model to reach SWE-Bench Verified frontier tier.

Parameters
744B · 40B active
Context
200K
License
MIT
Top benchmark
#1 Vending Bench 2 OSS
View GLM-5 in Kilo
#8

Devstral 2

Mistral AI

Best Mistral coding model

123B parameter model purpose-built for agentic software engineering. 72.2% on SWE-Bench Verified (self-reported), 256K context. Mistral describes it as 7× more cost-efficient than Sonnet.

Parameters
123B
Context
256K
License
Apache 2.0
Top benchmark
SWE-Bench Verified 72.2
Read release
#9

Devstral Small 2

Mistral AI

Best for consumer GPUs

24B model that runs on a single 24GB consumer GPU. 68% on SWE-Bench Verified (self-reported), comparable to GLM-4.5-Air at a fraction of the size. Apache 2.0.

Parameters
24B
Context
128K
License
Apache 2.0
Top benchmark
SWE-Bench Verified 68%
Read release
#10

Trinity Large Thinking

Arcee AI

In Kilo

Best US-origin open reasoning model

399B sparse MoE thinking model for multi-turn tool calling, instruction following, and stable long-horizon agent loops. #2 on PinchBench at launch. Apache 2.0.

Parameters
399B sparse MoE
Context
128K
License
Apache 2.0
Top benchmark
#2 PinchBench
Read release

Open-Source Coding Models Compared (Benchmarks)

Side-by-side specs and benchmarks for the top open-weight coding models. SWE-Bench Verified, Terminal-Bench 2.0, and LiveCodeBench are the most comparable measures of real software-engineering capability today.

| Model | Params | Context | License | SWE-Bench Verified | Terminal-Bench 2.0 | LiveCodeBench | In Kilo |
| --- | --- | --- | --- | --- | --- | --- | --- |
| GLM-5.1 | 744B-A40B | 200K | MIT | SWE-Bench Pro SOTA | SOTA OSS | — | Yes |
| Kimi K2.6 | 1T-A32B | 256K | Modified MIT | 80.2% | 66.7% | 89.6% | Yes |
| DeepSeek V4-Pro | 1.6T-A49B | 1M | MIT | 80.6% | — | 93.5% | Yes |
| Qwen3-Coder-Next | 80B-A3B | 256K | Apache 2.0 | 71.3% | — | — | Yes |
| Qwen3.6-27B | 27B dense | 262K | Apache 2.0 | 77.2% | 59.3% | — | Yes |
| GLM-5 | 744B-A40B | 200K | MIT | Frontier tier | — | — | Yes |
| Devstral 2 | 123B | 256K | Apache 2.0 | 72.2% | 66.79% | — | BYOK / local |
| Devstral Small 2 | 24B | 128K | Apache 2.0 | 68% | — | — | BYOK / local |

Benchmarks sourced from each model’s technical report and release notes. Results may vary by scaffold, prompt, and agent setup. — = not yet published.

Best Open-Source Coding Model for Each Use Case

Different jobs call for different open-weight models. Pick by need:

| Need | Recommended | Why |
| --- | --- | --- |
| Best overall agentic coding | GLM-5.1 | SOTA on SWE-Bench Pro and long-horizon stability. |
| Best for 1M-token context | DeepSeek V4-Pro | True 1M context window, #1 on LiveCodeBench. |
| Best for consumer hardware | Devstral Small 2 / Qwen3.6-27B | Run on a single 24GB GPU at strong SWE-Bench scores. |
| Best free hosted | MiniMax M2.5 | Free in Kilo, leaderboard-favorite for everyday coding. |
| Best for agent swarms | Kimi K2.6 | Native 300-sub-agent swarm, 12-hour autonomous runs. |
| Best efficiency per active param | Qwen3-Coder-Next | 80B total / 3B active matches much larger frontier models. |
| Best US-origin open reasoning | Trinity Large Thinking | Apache 2.0, multi-turn tool use, long-horizon agent loops. |

Open-Source vs Open-Weight Coding Models: What’s the Difference?

Understanding the terminology helps you make informed decisions about which models to use in your development workflow.

Open Source

Fully transparent — includes model weights, training code, datasets, and documentation. You can inspect, modify, and understand exactly how the model was built. Examples include OpenCoder, StarCoder2, and IBM Granite Code.

Open Weight

Model weights available — you can download and run the model, but training code and datasets may not be public. License terms vary: Apache 2.0 and MIT are highly permissive; Modified MIT and bespoke model licenses add usage, distribution, or attribution rules.

The bottom line: both open-source and open-weight coding models give you freedom from single-vendor lock-in. You can run many of them locally, self-host them, use hosted endpoints, fine-tune within the license, or route them through Kilo Code alongside closed frontier models.

Why Choose an Open-Weight Coding Model?

Open-source and open-weight models are getting serious. Here’s why developers are choosing them for production coding agents.

Run Locally

Use Ollama, LM Studio, vLLM, SGLang, or another OpenAI-compatible runtime to run open weights on hardware you control. Keep sensitive prompts on your own network.

No Vendor Lock-In

Your code never depends on a single provider. Switch between local and hosted, or between different models, without changing your workflow.

Rapidly Improving

Open-weight models are moving from chat benchmarks into real agent loops: planning, editing many files, calling tools, reading terminal output, retrying, and staying coherent over long sessions.

Community Driven

Benefit from community fine-tunes, optimizations, and improvements. Open models get better through collective effort.

Cost Effective

Run local queries for the cost of your hardware, choose free hosted promos when they appear, or pay low per-token rates for efficient MoE models. Route cheap work to open models and save frontier models for the hardest steps.
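The routing idea above can be sketched in a few lines. This is a toy illustration only: the task types, the line-count threshold, and the two model labels are invented for the example, not recommendations from any benchmark.

```python
def pick_model(task_type: str, lines_changed: int) -> str:
    """Toy cost router: send routine edits to a cheap open-weight model
    and reserve a frontier model for the hardest steps.
    Task types, threshold, and model labels are illustrative assumptions."""
    routine = {"format", "rename", "docstring", "lint-fix"}
    if task_type in routine or lines_changed < 20:
        return "open-weight-local"   # e.g. a small model on your own GPU
    return "frontier-hosted"         # e.g. a hosted frontier model

print(pick_model("rename", 5))       # routine edit -> cheap local model
print(pick_model("refactor", 400))   # large change -> frontier model
```

In practice the routing signal could be anything cheap to compute before the call: task label, diff size, or whether the previous attempt failed.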

More Inspectable

Download weights, run reproducible tests, audit release notes, compare provider behavior, and pin deployments when you need predictable model behavior.

How to Run Open-Source Coding Models (Local, Hosted, BYOK)

Three ways to use open-weight coding models: locally on hardware you control, hosted in Kilo Code, or through your own provider keys.

1

Run Locally

Install Ollama, LM Studio, vLLM, or another OpenAI-compatible runtime. Download a model such as Qwen3.6-27B, Devstral Small 2, GLM-5, DeepSeek V4, or Trinity, then connect it to Kilo Code. Your code and prompts stay on hardware you control.

Local setup guide →
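Once a local runtime is up, any OpenAI-compatible client can talk to it. The sketch below builds a chat-completion request with only the Python standard library; the default base URL is Ollama's OpenAI-compatible endpoint, and the model tag is a placeholder for whatever you have pulled locally.

```python
import json
import urllib.request

def build_chat_request(model: str, prompt: str,
                       base_url: str = "http://localhost:11434/v1"):
    """Build an OpenAI-compatible chat completion request for a local
    runtime. The default base_url is Ollama's; vLLM and LM Studio expose
    the same API shape on their own ports."""
    payload = {
        "model": model,  # placeholder tag: use the model you actually pulled
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request("devstral-small-2", "Explain this stack trace")
# urllib.request.urlopen(req) would send it once the local server is running
```

Because the API shape is shared, switching between runtimes (or pointing the same code at a hosted endpoint) is just a change of `base_url` and model tag.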
2

Use Hosted in Kilo

Access 500+ models through Kilo Code’s hosted service, including leaderboard favorites from MiniMax, Qwen, Z.ai, Mistral, DeepSeek, Moonshot, NVIDIA, Arcee, and more. Pay only for what you use, no markup.

All models →
3

Bring Your Own Keys

Connect your own API keys from providers like OpenRouter, Together AI, Google AI, Arcee, Z.ai, or any compatible direct model provider. Full control, full flexibility.

BYOK guides →

Open Source Never Sleeps

Innovation isn’t limited to Silicon Valley. Open-source AI models come from labs and researchers across the globe, democratizing access to cutting-edge technology.

From DeepSeek, Qwen, MiniMax, Kimi, and GLM in China to Arcee’s Trinity family in the United States, Mistral’s Devstral in Europe, Falcon in the UAE, Singapore’s SEAL-LION, and India’s Sarvam — the open model movement is global. The Kilo community evaluates models from every corner of the world for performance, transparency, speed, licensing, and cost.

Open-Weight Models Are Becoming Production Coding Agents

Kilo research, Kilo leaderboard usage, Stanford AI Index data, and new Apache-licensed launches point in the same direction: open models are now credible for real coding work.

By the Numbers

According to Stanford’s 2025 AI Index Report

5.4% performance gap

The gap between the top-ranked and 10th-ranked models fell from 11.9% to just 5.4% in one year. The frontier is increasingly competitive.

0.7% between #1 and #2

The top two models are now separated by just 0.7%. Chinese open-weight models like DeepSeek and Qwen are competing at the highest level.

Real-World Evidence

From Kilo usage, Kilo research, and recent releases

MiniMax M2.5 and Qwen 3.6 Plus

Recent Kilo leaderboard snapshots show free/open-weight families earning heavy real-world usage across Code, Plan, Ask, Debug, Review, and OpenClaw workflows.

GLM-5.1

GLM-5.1 is the follow-up flagship for longer-running engineering agents. Z.ai describes stronger coding capability, better judgment on ambiguous work, repo generation improvements, terminal-task gains, and sustained iteration over many tool calls.

Kimi K2.6 and DeepSeek V4

Kimi K2.6 sets the highest open-weight SWE-Bench Pro score to date with native 300-sub-agent swarms. DeepSeek V4-Pro leads all evaluated models on LiveCodeBench (93.5) and Codeforces (3206), including closed frontier APIs.

Trinity Large Thinking

Arcee’s thinking release adds reasoning before answers and improves multi-turn tool use, instruction following, and context coherence for long-running agent loops. The checkpoint is published under Apache 2.0.

The bottom line: treat open-weight models as first-class options. Some are ready for hosted Kilo coding today; others are best used locally or through your own provider while the ecosystem catches up.

Use Open-Source Coding Models Everywhere with Kilo Code

Kilo works where you work. Build solo or with your engineering team.

Use Open-Weight Coding Models in KiloClaw

KiloClaw brings the same model flexibility to managed OpenClaw automations. Pick GLM-5.1, GLM-5, Kimi K2.6, MiniMax M2.5, Trinity Large Thinking, or another Kilo-hosted model for workflows that run beyond the IDE.

Route by Workflow

Use GLM-5.1 for long-running research and planning, MiniMax M2.5 for fast routine automations, and Trinity Large Thinking for reasoning-heavy tasks.

Automate Outside the IDE

Let OpenClaw recipes work across browser tasks, documents, inboxes, calendars, business apps, and recurring research while still choosing the model behind each step.

Keep Model Control

KiloClaw runs on Kilo Gateway, so teams can access 500+ hosted models, compare OSS options, and avoid rebuilding automations around one provider.

Open-Source Coding Models: FAQ

Quick answers to the most common questions about open-source and open-weight coding models in 2026.

What is the best open-source model for coding in 2026?

GLM-5.1 from Z.ai is the strongest all-around open-source coding model in 2026 for long-horizon agentic engineering. Kimi K2.6 leads on SWE-Bench Pro, DeepSeek V4-Pro leads on LiveCodeBench and 1M-context tasks, and Qwen3-Coder-Next offers the best efficiency per active parameter. All four are available in Kilo Code.

What is the difference between open-source and open-weight coding models?

Open-source models publish weights, training code, and datasets — you can fully reproduce them. Open-weight models publish the weights so you can download and run the model, but training code and data may be private. Most modern coding LLMs (GLM-5, Kimi K2.6, DeepSeek V4, Qwen3-Coder) are open-weight under permissive licenses such as Apache 2.0, MIT, or Modified MIT.

Can open-source coding models match Claude or GPT-5?

Yes. On coding benchmarks, the gap is now small. DeepSeek V4-Pro leads all evaluated models on LiveCodeBench (93.5) and Codeforces rating (3206). Kimi K2.6 outscores GPT-5.4 and Claude Opus 4.6 on SWE-Bench Pro. The Stanford AI Index 2025 shows the gap between the top-ranked and 10th-ranked models fell from 11.9% to 5.4% in one year.

What is the best open-weight coding model I can run locally?

For consumer hardware (single 24GB GPU), Devstral Small 2 and Qwen3.6-27B deliver strong SWE-Bench performance. For workstations with 80GB+ VRAM, Qwen3-Coder-Next (80B-A3B) and OpenHands LM 32B run well. For server-class hardware, GLM-5 and Kimi K2.6 weights are publicly downloadable from Hugging Face.

Are open-weight coding models free?

The weights themselves are free to download under Apache 2.0, MIT, or Modified MIT licenses. You pay only for compute when you self-host. Hosted access through Kilo Code is pay-as-you-go with no markup, and several models (such as MiniMax M2.5) are offered free in Kilo from time to time.

Which open-source coding models work in Kilo Code?

Kilo Code supports 500+ models including GLM-5.1, GLM-5, Kimi K2.6, DeepSeek V4-Pro, Qwen3-Coder, Qwen3.6, MiniMax M2.5, Trinity Large Thinking, and many others. You can also bring your own keys for OpenRouter, Together AI, Z.ai, or any OpenAI-compatible provider, or run local models through Ollama, LM Studio, vLLM, or SGLang.

Start Using the Best Open-Source Coding Models Today

Install Kilo Code and get instant access to GLM-5.1, Kimi K2.6, DeepSeek V4-Pro, Qwen3-Coder, MiniMax M2.5, and 500+ other open and frontier models. Free to start, no credit card required.