Use Cutting-Edge Open-Weight AI Coding Models in Kilo Code
Get the top open-source and open-weight AI models in Kilo Code: GLM-5.1 and GLM-5 when you want hosted agentic coding, MiniMax M2.5 for practical free OSS workflows, Trinity Large Thinking for open reasoning, and local models you run yourself. No vendor lock-in.
Latest Open-Weight Releases to Watch
The open model frontier is moving every week. These releases are worth tracking now: use Kilo-hosted models when they appear in the picker, use local runtimes for downloaded weights, or connect OpenAI-compatible providers with BYOK.
GLM-5.1
GLM-5.1 is Z.ai's newest model for agentic engineering. It is designed to stay productive across long sessions: breaking down ambiguous problems, running experiments, reading terminal output, identifying blockers, and revising strategy over many tool calls.
GLM-5
GLM-5 is an Apache-licensed 744B-A40B open foundation model built for complex systems design and long-horizon agent workflows. Z.ai publishes local-serving guidance and weights for teams that want to host the GLM-5 series themselves.
Trinity Large Thinking
Arcee upgraded Trinity Large from a preview instruct model into a thinking model for multi-turn tool calling, cleaner instruction following, and stable long-horizon agent loops. The weights are released under Apache 2.0.
MiniMax M2.5
MiniMax M2.5 is a high-throughput open-weight coding model built for real-world productivity. Kilo developers are using the free hosted variant heavily across coding and agent workflows, making it one of the most practical OSS models to try first.
Open Source Never Sleeps
Innovation isn't limited to Silicon Valley. Open-source AI models come from labs and researchers across the globe, democratizing access to cutting-edge technology.
From DeepSeek, Qwen, MiniMax, Kimi, and GLM in China to Arcee's Trinity family in the United States, Mistral's Devstral in Europe, Falcon in the UAE, Singapore's SEA-LION, and India's Sarvam - the open model movement is global. The Kilo community evaluates models from every corner of the world for performance, transparency, speed, licensing, and cost.
Open Source vs Open Weight: What's the Difference?
Understanding the terminology helps you make informed decisions about which models to use in your development workflow.
Open Source
Fully transparent - includes model weights, training code, datasets, and documentation. You can inspect, modify, and understand exactly how the model was built. Examples include smaller research models and community projects.
Open Weight
Model weights available - you can download and run the model, but training code and datasets may not be public. License terms vary: Apache 2.0 and MIT are highly permissive, while other model licenses add usage, distribution, or attribution rules.
The Bottom Line: Both open-source and open-weight models give you freedom from single-vendor lock-in. You can run many of them locally, self-host them, use hosted endpoints, fine-tune within the license, or route them through Kilo Code alongside closed frontier models.
Use Open Source Models Everywhere with Kilo Code
Kilo works where you work. Build solo or with your engineering team.
Why More Developers Are Switching to OSS Models
Open-source and open-weight models are getting serious. Here's why developers are choosing them.
Run Locally
Use Ollama, LM Studio, vLLM, SGLang, or another OpenAI-compatible runtime to run open weights on hardware you control. Keep sensitive prompts on your own network.
No Vendor Lock-In
Your code never depends on a single provider. Switch between local and hosted, or between different models, without changing your workflow.
Rapidly Improving
Open-weight models are moving from chat benchmarks into real agent loops: planning, editing many files, calling tools, reading terminal output, retrying, and staying coherent over long sessions.
Community Driven
Benefit from community fine-tunes, optimizations, and improvements. Open models get better through collective effort.
Cost Effective
Run local queries for the cost of your hardware, choose free hosted promos when they appear, or pay low per-token rates for efficient MoE models. Route cheap work to open models and save frontier models for the hardest steps.
More Inspectable
Download weights, run reproducible tests, audit release notes, compare provider behavior, and pin deployments when you need predictable model behavior.
Top Open Source Models
Open-weight and open-source model families used by Kilo developers. The catalog refreshes hourly from Kilo model stats; visit the leaderboard for real-world usage by mode.
Arcee AI: Trinity Large Preview (free)
Trinity-Large-Preview is a frontier-scale open-weight language model from Arcee, built as a 400B-parameter sparse Mixture-of-Experts with 13B active parameters per token using 4-of-256 expert routing. It excels in creative writing,...
DeepSeek: DeepSeek V3.1 Terminus
DeepSeek-V3.1 Terminus is an update to [DeepSeek V3.1](/deepseek/deepseek-chat-v3.1) that maintains the model's original capabilities while addressing issues reported by users, including language consistency and agent capabilities, further optimizing the model's...
Google: Gemma 4 26B A4B (free)
Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind. Of its 25.2B total parameters, only 3.8B activate per token during inference — delivering near-31B quality at...
Google: Gemma 4 31B
Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output. Features a 256K token context window, configurable thinking/reasoning mode, native function...
Kwaipilot: KAT-Coder-Pro V1
KAT-Coder-Pro V1 is KwaiKAT's most advanced agentic coding model in the KAT-Coder series. Built specifically for agentic coding, it excels in real-world software engineering scenarios, achieving a 73.4% solve rate on the SWE-Bench Verified benchmark. The model is optimized for tool use, multi-turn interaction, instruction following, and generalization through a multi-stage training process: mid-training, supervised fine-tuning (SFT), reinforcement fine-tuning (RFT), and scalable agentic RL.
MiniMax: MiniMax M2.1
MiniMax-M2.1 is a lightweight, state-of-the-art large language model optimized for coding, agentic workflows, and modern application development. With only 10 billion activated parameters, it delivers a major jump in real-world...
MiniMax: MiniMax M2.5
MiniMax-M2.5 is a SOTA large language model designed for real-world productivity. Trained in a diverse range of complex real-world digital working environments, M2.5 builds upon the coding expertise of M2.1...
MiniMax: MiniMax M2.5 (free)
MiniMax-M2.5 is a SOTA large language model designed for real-world productivity. Trained in a diverse range of complex real-world digital working environments, M2.5 builds upon the coding expertise of M2.1...
MiniMax: MiniMax M2.7
MiniMax-M2.7 is a next-generation large language model designed for autonomous, real-world productivity and continuous improvement. Built to actively participate in its own evolution, M2.7 integrates advanced agentic capabilities through multi-agent...
Open-Weight Models Are Becoming Production Coding Agents
Kilo research, Kilo leaderboard usage, Stanford AI Index data, and new Apache-licensed launches point in the same direction: open models are now credible for real coding work.
By the Numbers
According to Stanford's 2025 AI Index Report
The gap between top and 10th-ranked models fell from 11.9% to just 5.4% in one year. The frontier is increasingly competitive.
The top two models are now separated by just 0.7%. Chinese open-weight models like DeepSeek and Qwen are competing at the highest level.
Source: Stanford HAI AI Index 2025
Real-World Evidence
From Kilo usage, Kilo research, and recent releases
MiniMax M2.5 and Qwen 3.6 Plus
Recent Kilo leaderboard snapshots show free/open-weight families earning heavy real-world usage across Code, Plan, Ask, Debug, Review, and OpenClaw workflows.
GLM-5
Z.ai's 744B-A40B GLM-5 moves from vibe coding toward agentic engineering: complex systems design, long-horizon workflow targets, open weights, and local deployment paths through vLLM, SGLang, xLLM, and KTransformers.
GLM-5.1
GLM-5.1 is the follow-up flagship for longer-running engineering agents. Z.ai describes stronger coding capability, better judgment on ambiguous work, repo generation improvements, terminal-task gains, and sustained iteration over many tool calls.
Trinity Large Thinking
Arcee's thinking release adds reasoning before answers and improves multi-turn tool use, instruction following, and context coherence for long-running agent loops. The checkpoint is published under Apache 2.0.
Gemma 4 from Google
Google calls Gemma 4 its first Gemma release that is truly open source: Apache 2.0 terms, downloadable weights from edge-scale sizes through 31B parameters, local and private deployment, modification rights, and native function calling for agentic apps.
The Bottom Line: Treat open-weight models as first-class options. Some are ready for hosted Kilo coding today; others are best used locally or through your own provider while the ecosystem catches up.
How to Use Open Source Models in Kilo Code
Three ways to run open models: locally, hosted in Kilo, or through your own providers.
Run Locally
Install Ollama, LM Studio, vLLM, or another OpenAI-compatible runtime. Download a model such as Gemma, Devstral, Qwen, GLM, DeepSeek, or Trinity, then connect it to Kilo Code. Your code and prompts stay on hardware you control.
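Under the hood, all of these runtimes expose the same OpenAI-compatible HTTP API, which is what makes them interchangeable. The sketch below builds (without sending) a chat-completions request against Ollama's default local endpoint; the port and the `qwen2.5-coder` model tag are examples, not requirements, so substitute whatever runtime and weights you actually run:

```python
import json
from urllib import request

# Ollama's default OpenAI-compatible endpoint. LM Studio typically serves at
# http://localhost:1234/v1 and vLLM at http://localhost:8000/v1; only this
# constant changes between runtimes.
BASE_URL = "http://localhost:11434/v1"

def build_chat_request(model: str, prompt: str) -> request.Request:
    """Build (but do not send) an OpenAI-style chat-completions request."""
    payload = {
        "model": model,  # the tag of a model you pulled locally (example name)
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }
    return request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("qwen2.5-coder", "Write a binary search in Python.")
print(req.full_url)  # http://localhost:11434/v1/chat/completions
```

Because every runtime listed above speaks this protocol, pointing `BASE_URL` at a different server is the only change needed to switch runtimes.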
Use Hosted
Access 500+ models through Kilo Code's hosted service, including leaderboard favorites from MiniMax, Qwen, Z.ai, Mistral, DeepSeek, NVIDIA, Arcee, and more. Pay only for what you use, no markup.
Bring Your Own Keys
Connect your own API keys from providers like OpenRouter, Together AI, Google AI, Arcee, Z.ai, or any compatible direct model provider. Full control, full flexibility.
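Bringing your own keys is low-friction because these providers expose OpenAI-compatible endpoints, so switching is configuration rather than code. A minimal sketch of that idea, with base URLs and environment-variable names as commonly documented (verify against each provider's own docs before relying on them):

```python
import os

# Provider table: every entry speaks the same OpenAI-compatible protocol,
# so changing providers means changing config, not rewriting your workflow.
# Base URLs and env-var names below are assumptions to verify per provider.
PROVIDERS = {
    "openrouter": {"base_url": "https://openrouter.ai/api/v1", "key_env": "OPENROUTER_API_KEY"},
    "together":   {"base_url": "https://api.together.xyz/v1",  "key_env": "TOGETHER_API_KEY"},
    "local":      {"base_url": "http://localhost:11434/v1",    "key_env": None},  # no key needed
}

def client_config(provider: str) -> dict:
    """Resolve the base URL and API key for the chosen provider."""
    p = PROVIDERS[provider]
    key = os.environ.get(p["key_env"], "") if p["key_env"] else "not-needed"
    return {"base_url": p["base_url"], "api_key": key}

cfg = client_config("local")
```

The same `client_config("together")` call with a `TOGETHER_API_KEY` in the environment yields a hosted setup; nothing downstream of the config changes.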
Use Open-Weight Models in KiloClaw
KiloClaw brings the same model flexibility to managed OpenClaw automations. Pick GLM-5.1, GLM-5, MiniMax M2.5, Trinity Large Thinking, or another Kilo-hosted model for workflows that run beyond the IDE.
Route by Workflow
Use GLM-5.1 for long-running research and planning, MiniMax M2.5 for fast routine automations, and Trinity Large Thinking for reasoning-heavy tasks.
Automate Outside the IDE
Let OpenClaw recipes work across browser tasks, documents, inboxes, calendars, business apps, and recurring research while still choosing the model behind each step.
Keep Model Control
KiloClaw runs on Kilo Gateway, so teams can access 500+ hosted models, compare OSS options, and avoid rebuilding automations around one provider.
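The per-workflow routing described above can be sketched as a simple lookup: long-horizon research and planning to GLM-5.1, routine automation to MiniMax M2.5, and reasoning-heavy steps to Trinity Large Thinking. The task categories and model identifiers here are illustrative placeholders, not official Kilo configuration:

```python
# Hypothetical routing table following the workflow guidance above.
# Keys are step categories you define; values are illustrative model names.
ROUTES = {
    "research": "glm-5.1",
    "plan":     "glm-5.1",
    "automate": "minimax-m2.5",
    "reason":   "trinity-large-thinking",
}

def pick_model(task_kind: str, default: str = "minimax-m2.5") -> str:
    """Choose a model for a workflow step, falling back to the cheap default."""
    return ROUTES.get(task_kind, default)
```

Routing this way keeps the expensive, long-horizon model reserved for the steps that need it while everything else runs on the fast default.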