Skip to main content
Updated June 2026 · Tracking the leading open-weight coding models

Best Open-Source & Open-Weight AI Coding Models in 2026

The best open-source coding models in 2026 are GLM-5.1, MiniMax M3 (just released, June 2026), Kimi K2.6, DeepSeek V4-Pro, V4-Flash, and Qwen3-Coder-Next for agentic work; Nemotron 3 Super and MiniMax M2.5 for free hosted use; and Qwen3.6-27B for local development. All ship as open weights under Apache 2.0, MIT, Modified MIT, or the NVIDIA Nemotron Open Model License — use them in Kilo Code, run them locally, or bring your own keys.

Best Open-Source Coding Models Ranked (2026)

Ranked by SWE-Bench Verified, SWE-Bench Pro, Terminal-Bench 2.0/2.1, LiveCodeBench, and real-world Kilo leaderboard usage. Each model links to a full review and is available either hosted in Kilo Code, locally via Ollama / LM Studio / vLLM, or through your own provider keys.

#1

GLM-5.1

Z.ai

In Kilo

Best for long-horizon agentic engineering

Z.ai's flagship for agentic coding: stronger judgment on ambiguous problems, sustained iteration over thousands of tool calls, SOTA on SWE-Bench Pro, and large gains on NL2Repo and Terminal-Bench 2.0.

Parameters
744B · 40B active
Context
200K
License
MIT
Top benchmark
Terminal-Bench 2.0 SOTA
View GLM-5.1 in Kilo
#2

MiniMax M3

MiniMax

In Kilo

Best new frontier open-weight model (June 2026)

Released June 1, 2026. The first open-weight model to combine frontier-tier coding with 1M-token context AND native multimodality (image/video input + computer use). 59.0% SWE-Bench Pro (beats GPT-5.5 and Gemini 3.1 Pro), 66.0% Terminal-Bench 2.1, 74.2% MCP Atlas, 70.06% OSWorld-Verified. Built on the new MSA (MiniMax Sparse Attention) architecture. Open weights and tech report committed within ~10 days of launch.

Parameters
MoE (MSA)
Context
1M
License
Open weights
Top benchmark
SWE-Bench Pro 59.0%
View MiniMax M3 in Kilo
#3

Kimi K2.6

Moonshot AI

In Kilo

Best for agent swarms and long autonomous runs

Held the highest SWE-Bench Pro score of any open-weight model at its April 2026 release (58.6%) and remains a top-tier choice. Native 300-sub-agent swarms, 4,000-step coordination, 12-hour autonomous runs. Modified MIT, weights on Hugging Face.

Parameters
1T · 32B active
Context
256K
License
Modified MIT
Top benchmark
SWE-Bench Pro 58.6%
View Kimi K2.6 in Kilo
#4

DeepSeek V4-Pro

DeepSeek models
In Kilo

Best for 1M-context and competitive coding

1.6T MoE with a true 1M-token context window, #1 on LiveCodeBench (93.5) and Codeforces (3206) among all evaluated models including closed APIs. MIT license, FP4/FP8 mixed precision.

Parameters
1.6T · 49B active
Context
1M
License
MIT
Top benchmark
LiveCodeBench 93.5
View DeepSeek V4-Pro in Kilo
#5

DeepSeek V4-Flash

DeepSeek models
In Kilo

Best cost-efficient self-hosted MoE

The lighter-weight half of the V4 family: 284B total / 13B active with the same 1M context window as V4-Pro. 79% SWE-Bench Verified, 91.6 LiveCodeBench (Max mode), and a 98% prompt cache discount put it in the top tier for cost-to-quality. MIT license, weights on Hugging Face, runs on a single H100 (or quantized to RTX 4090).

Parameters
284B · 13B active
Context
1M
License
MIT
Top benchmark
SWE-Bench Verified 79%
View DeepSeek V4-Flash in Kilo
#6

Qwen3-Coder-Next

Alibaba Qwen

In Kilo

Best efficiency per active parameter

80B total / 3B active hybrid MoE built specifically for coding agents. Competitive with much larger frontier and open-weight models on SWE-Bench Verified, Multilingual, and Pro. Apache 2.0.

Parameters
80B · 3B active
Context
256K
License
Apache 2.0
Top benchmark
SWE-Bench Verified 71.3
View Qwen3-Coder-Next in Kilo
#7

Qwen3.6-27B

Alibaba Qwen

In Kilo

Best dense model for repo-level coding

Dense 27B model that beats the much larger Qwen3.5-397B-A17B MoE on agentic coding. Matches Claude 4.5 Opus on Terminal-Bench 2.0 (59.3). Apache 2.0, BF16 and FP8 weights on Hugging Face.

Parameters
27B dense
Context
262K
License
Apache 2.0
Top benchmark
SWE-Bench Verified 77.2
View Qwen3.6-27B in Kilo
#8

MiniMax M2.5

MiniMax

In Kilo

Best free hosted open-weight coding model

High-throughput open-weight coding model with heavy real-world usage on the Kilo leaderboard across Code, Plan, Ask, Debug, Review, and OpenClaw workflows. Free hosted variant in Kilo.

Parameters
MoE
Context
200K
License
Open weights
Top benchmark
Free in Kilo
View MiniMax M2.5 in Kilo
#9

GLM-5

Z.ai

In Kilo

Best foundation for local self-hosting

The 744B-A40B MIT-licensed predecessor to GLM-5.1, with published guidance and weights for self-serving via vLLM, SGLang, xLLM, or Ktransformers. First open model to reach SWE-bench Verified frontier tier.

Parameters
744B · 40B active
Context
200K
License
MIT
Top benchmark
#1 Vending Bench 2 OSS
View GLM-5 in Kilo
#10

Nemotron 3 Super

NVIDIA

In Kilo

Best for NVIDIA-optimized agent infrastructure

120B total / 12B active hybrid Mamba-Transformer LatentMoE with native NVFP4 pretraining and Multi-Token Prediction (MTP) for fast speculative decoding. Strong inference throughput, 1M context, and 91.8 RULER@1M long-context retrieval. Free in Kilo. NVIDIA Nemotron Open Model License — open weights, training data, and recipes.

Parameters
120B · 12B active
Context
1M
License
NVIDIA Nemotron Open
Top benchmark
RULER@1M 91.8
View Nemotron 3 Super in Kilo
#11

Devstral 2

Mistral AI

Best Mistral coding model

123B parameter model purpose-built for agentic software engineering. 72.2% on SWE-Bench Verified (self-reported), 256K context. Mistral describes it as 7× more cost-efficient than Sonnet.

Parameters
123B
Context
256K
License
Apache 2.0
Top benchmark
SWE-Bench Verified 72.2
Read release
#12

Trinity Large Thinking

Arcee AI

In Kilo

Best US-origin open reasoning model

399B sparse MoE thinking model for multi-turn tool calling, instruction following, and stable long-horizon agent loops. #2 on PinchBench at launch. Apache 2.0.

Parameters
399B sparse MoE
Context
128K
License
Apache 2.0
Top benchmark
#2 PinchBench
Read release

Open-Source Coding Models Compared (Benchmarks)

Side-by-side specs and benchmarks for the top open-weight coding models. SWE-Bench Verified, Terminal-Bench 2.0, and LiveCodeBench are the most comparable measures of real software-engineering capability today.

ModelParamsContextLicenseSWE-Bench VerifiedTerminal-Bench 2.0/2.1LiveCodeBenchIn Kilo
GLM-5.1744B-A40B200KMITSWE-Bench Pro SOTASOTA OSSYes
MiniMax M3MoE (MSA)1MOpen weights59.0% (Pro)66.0% (2.1)Yes
Kimi K2.61T-A32B256KModified MIT80.2%66.7%89.6%Yes
DeepSeek V4-Pro1.6T-A49B1MMIT80.6%67.9%93.5%Yes
DeepSeek V4-Flash284B-A13B1MMIT79.0%57.9%91.6%Yes
Qwen3-Coder-Next80B-A3B256KApache 2.071.3%Yes
Qwen3.6-27B27B dense262KApache 2.077.2%59.3%Yes
GLM-5744B-A40B200KMITFrontier tierYes
Nemotron 3 Super120B-A12B1MNVIDIA Nemotron Open60.5% (OpenHands)31.0% (Core 2.0)81.2% (v5)Yes
Devstral 2123B256KApache 2.072.2%66.79%BYOK / local

Benchmarks sourced from each model’s technical report and release notes. Results may vary by scaffold, prompt, and agent setup. — = not yet published.

Best Open-Source Coding Model for Each Use Case

Different jobs call for different open-weight models. Pick by need:

NeedRecommendedWhy
Best overall agentic codingGLM-5.1SOTA on SWE-Bench Pro and long-horizon stability.
Best newest open-weight frontier modelMiniMax M3First open-weight model combining frontier coding, 1M context, and native multimodality.
Best for 1M-token contextDeepSeek V4-ProTrue 1M context window, #1 on LiveCodeBench.
Best cost-efficient self-hosted MoEDeepSeek V4-Flash284B/13B active with 1M context, runs on a single H100, MIT licensed.
Best for consumer hardwareQwen3.6-27BDense 27B model with strong SWE-Bench scores that fits a single 24GB GPU.
Best free hosted (long context)Nemotron 3 SuperFree in Kilo; NVIDIA-optimized hybrid Mamba-Transformer with 1M context and fast speculative decoding.
Best free hosted (high throughput)MiniMax M2.5Free in Kilo; high-throughput MoE with heavy real-world usage across Code, Plan, Ask, and Debug workflows.
Best for agent swarmsKimi K2.6Native 300-sub-agent swarm, 12-hour autonomous runs.
Best efficiency per active paramQwen3-Coder-Next80B total / 3B active matches much larger frontier models.
Best US-origin open reasoningTrinity Large ThinkingApache 2.0, multi-turn tool use, long-horizon agent loops.

Open-Source vs Open-Weight Coding Models: What’s the Difference?

Understanding the terminology helps you make informed decisions about which models to use in your development workflow.

Open Source

Fully transparent — includes model weights, training code, datasets, and documentation. You can inspect, modify, and understand exactly how the model was built. Examples include OpenCoder, StarCoder2, and IBM Granite Code.

Open Weight

Model weights available — you can download and run the model, but training code and datasets may not be public. License terms vary: Apache 2.0 and MIT are highly permissive; Modified MIT and bespoke model licenses add usage, distribution, or attribution rules.

The bottom line: both open-source and open-weight coding models give you freedom from single-vendor lock-in. You can run many of them locally, self-host them, use hosted endpoints, fine-tune within the license, or route them through Kilo Code alongside closed frontier models.

Why Choose an Open-Weight Coding Model?

Open-source and open-weight models are getting serious. Here’s why developers are choosing them for production coding agents.

Run Locally

Use Ollama, LM Studio, vLLM, SGLang, or another OpenAI-compatible runtime to run open weights on hardware you control. Keep sensitive prompts on your own network.

No Vendor Lock-In

Your code never depends on a single provider. Switch between local and hosted, or between different models, without changing your workflow.

Rapidly Improving

Open-weight models are moving from chat benchmarks into real agent loops: planning, editing many files, calling tools, reading terminal output, retrying, and staying coherent over long sessions.

Community Driven

Benefit from community fine-tunes, optimizations, and improvements. Open models get better through collective effort.

Cost Effective

Run local queries for the cost of your hardware, choose free hosted promos when they appear, or pay low per-token rates for efficient MoE models. Route cheap work to open models and save frontier models for the hardest steps.

More Inspectable

Download weights, run reproducible tests, audit release notes, compare provider behavior, and pin deployments when you need predictable model behavior.

How to Run Open-Source Coding Models (Local, Hosted, BYOK)

Three ways to use open-weight coding models: locally on hardware you control, hosted in Kilo Code, or through your own provider keys.

1

Run Locally

Install Ollama, LM Studio, vLLM, or another OpenAI-compatible runtime. Download a model such as Qwen3.6-27B, GLM-5, DeepSeek V4, or Trinity, then connect it to Kilo Code. Your code and prompts stay on hardware you control.

Local setup guide →
2

Use Hosted in Kilo

Access 500+ models through Kilo Code’s hosted service, including leaderboard favorites from MiniMax, Qwen, Z.ai, Mistral, DeepSeek, Moonshot, NVIDIA, Arcee, and more. Pay only for what you use, no markup.

All models →
3

Bring Your Own Keys

Connect your own API keys from providers like OpenRouter, Together AI, Google AI, Arcee, Z.ai, or any compatible direct model provider. Full control, full flexibility.

BYOK guides →

Open Source Never Sleeps

Innovation isn’t limited to Silicon Valley. Open-source AI models come from labs and researchers across the globe, democratizing access to cutting-edge technology.

From DeepSeek, Qwen, MiniMax, Kimi, and GLM in China to Arcee’s Trinity family in the United States, Mistral’s Devstral in Europe, Falcon in the UAE, Singapore’s SEAL-LION, and India’s Sarvam — the open model movement is global. The Kilo community evaluates models from every corner of the world for performance, transparency, speed, licensing, and cost.

Open-Weight Models Are Becoming Production Coding Agents

Kilo research, Kilo leaderboard usage, Stanford AI Index data, and new Apache-licensed launches point in the same direction: open models are now credible for real coding work.

By the Numbers

According to Stanford’s 2025 AI Index Report

5.4%performance gap

The gap between top and 10th-ranked models fell from 11.9% to just 5.4% in one year. The frontier is increasingly competitive.

0.7%between #1 and #2

The top two models are now separated by just 0.7%. Chinese open-weight models like DeepSeek and Qwen are competing at the highest level.

Real-World Evidence

From Kilo usage, Kilo research, and recent releases

MiniMax M2.5 and Qwen 3.6 Plus

Recent Kilo leaderboard snapshots show free/open-weight families earning heavy real-world usage across Code, Plan, Ask, Debug, Review, and OpenClaw workflows.

GLM-5.1

GLM-5.1 is the follow-up flagship for longer-running engineering agents. Z.ai describes stronger coding capability, better judgment on ambiguous work, repo generation improvements, terminal-task gains, and sustained iteration over many tool calls.

Kimi K2.6 and DeepSeek V4

MiniMax M3 tops the open-weight SWE-Bench Pro at 59.0% (June 2026), edging past Kimi K2.6’s 58.6% from April. DeepSeek V4-Pro leads all evaluated models on LiveCodeBench (93.5) and Codeforces (3206), including closed frontier APIs.

Trinity Large Thinking

Arcee’s thinking release adds reasoning before answers and improves multi-turn tool use, instruction following, and context coherence for long-running agent loops. The checkpoint is published under Apache 2.0.

The bottom line: treat open-weight models as first-class options. Some are ready for hosted Kilo coding today; others are best used locally or through your own provider while the ecosystem catches up.

Use Open-Source Coding Models Everywhere with Kilo Code

Kilo works where you work. Build solo or with your engineering team.

Use Open-Weight Coding Models in KiloClaw

KiloClaw brings the same model flexibility to managed OpenClaw automations. Pick GLM-5.1, GLM-5, Kimi K2.6, MiniMax M2.5, Trinity Large Thinking, or another Kilo-hosted model for workflows that run beyond the IDE.

Route by Workflow

Use GLM-5.1 for long-running research and planning, MiniMax M2.5 for fast routine automations, and Trinity Large Thinking for reasoning-heavy tasks.

Automate Outside the IDE

Let OpenClaw recipes work across browser tasks, documents, inboxes, calendars, business apps, and recurring research while still choosing the model behind each step.

Keep Model Control

KiloClaw runs on Kilo Gateway, so teams can access 500+ hosted models, compare OSS options, and avoid rebuilding automations around one provider.

Open-Source Coding Models: FAQ

Quick answers to the most common questions about open-source and open-weight coding models in 2026.

What is the best open-source model for coding in 2026?+

GLM-5.1 from Z.ai is the strongest all-around open-source coding model in 2026 for long-horizon agentic engineering. MiniMax M3 (released June 2026) is the first open-weight model to combine frontier coding, 1M context, and native multimodality, and tops the open-weight SWE-Bench Pro at 59.0%. Kimi K2.6 excels at agent swarms and long autonomous runs, DeepSeek V4-Pro leads on LiveCodeBench and 1M-context tasks, and Qwen3-Coder-Next offers the best efficiency per active parameter. All five are available in Kilo Code.

What is the difference between open-source and open-weight coding models?+

Open-source models publish weights, training code, and datasets — you can fully reproduce them. Open-weight models publish the weights so you can download and run the model, but training code and data may be private. Most modern coding LLMs ( GLM-5, Kimi K2.6, DeepSeek V4, Qwen3-Coder) are open-weight under permissive licenses such as Apache 2.0, MIT, or Modified MIT.

Can open-source coding models match Claude or GPT-5?+

Yes, on coding benchmarks the gap is now small. DeepSeek V4-Pro leads all evaluated models on LiveCodeBench (93.5) and Codeforces Rating (3206). MiniMax M3 surpasses GPT-5.5 and Gemini 3.1 Pro on SWE-Bench Pro at 59.0%, and Kimi K2.6 (58.6%) sits just behind. The Stanford AI Index 2025 shows the gap between top and 10th-ranked models fell from 11.9% to 5.4% in one year.

What is the best open-weight coding model I can run locally?+

For consumer hardware (single 24GB GPU), Qwen3.6-27B delivers strong SWE-Bench performance. For a single 80GB H100 or AMD MI300X, DeepSeek V4-Flash (MIT, 1M context) is a standout choice. For server-class hardware, GLM-5, Kimi K2.6, MiniMax M3, and Nemotron 3 Super are all available in Kilo Code and run well via vLLM, SGLang, or other OpenAI-compatible runtimes.

Are open-weight coding models free?+

The weights themselves are free to download under Apache 2.0, MIT, or Modified MIT licenses. You pay only for compute when you self-host. Hosted access through Kilo Code is pay-as-you-go with no markup, and several models (such as MiniMax M2.5) are offered free in Kilo from time to time.

Which open-source coding models work in Kilo Code?+

Kilo Code supports 500+ models including GLM-5.1, GLM-5, MiniMax M3, Kimi K2.6, DeepSeek V4-Pro, DeepSeek V4-Flash, Qwen3-Coder, Qwen3.6, Nemotron 3 Super, MiniMax M2.5, Trinity Large Thinking, and many others. You can also bring your own keys for OpenRouter, Together AI, Z.ai, or any OpenAI-compatible provider, or run local models through Ollama, LM Studio, vLLM, or SGLang.

Start Using the Best Open-Source Coding Models Today

Install Kilo Code and get instant access to GLM-5.1, Kimi K2.6, DeepSeek V4-Pro, Qwen3-Coder, MiniMax M2.5, and 500+ other open and frontier models. Free to start, no credit card required.