Best AI Models in Kilo

Grok Build 0.1

Built for agentic engineering: 256K context, no output limits

Context window: 256K

GPT-5.5

Remarkably detailed and consistent across modes

SWE-bench Verified: 88.7%

Nemotron 3 Ultra

Frontier-class coding model from NVIDIA. Currently free in Kilo.

PinchBench: 91%


From our research: "The Quiet Arrival of Grok Build 0.1"; "We Audited the Same Codebase With Claude Opus 4.8"; "NVIDIA Nemotron 3 Ultra Review"

Kilo Benchmark

Cost vs performance across the best coding models

Bubble size indicates Kilo usage over the last 7 days

Top 10 Most Capable Models

Rank	Model	Completion	Cost per attempt
1	GPT-5.5	74.2%	$72.63
2	Claude Fable 5 ($$$$)	71.0%	$87.52
3	Grok 4.5	70.8%	$27.29
4	GPT-5.6 Sol	70.3%	$58.99
5	Claude Opus 4.7	70.1%	$100.51
6	Claude Opus 4.8	67.6%	$85.19
7	Gemini 3.5 Flash	64.7%	$104.49
8	Kimi K2.7 Code	60.7%	$32.94
9	Muse Spark 1.1	59.8%	$30.15
10	Claude Sonnet 5	59.6%	$36.19

Official Kilo eval results on Terminal-Bench 2.0. Cost and token usage are averaged per complete benchmark attempt.
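The table reports both a completion rate and an average cost per attempt. You can divide one by the other to get a rough value comparison. The sketch below is a minimal illustration under one assumption: that "percentage points of completion per dollar" is a useful proxy. This is not an official Kilo metric, and the function names are hypothetical.

```python
# Completion rate (%) and average cost per attempt (USD), copied from the
# Top 10 table (Kilo eval on Terminal-Bench 2.0).
TOP_10 = [
    ("GPT-5.5", 74.2, 72.63),
    ("Claude Fable 5", 71.0, 87.52),
    ("Grok 4.5", 70.8, 27.29),
    ("GPT-5.6 Sol", 70.3, 58.99),
    ("Claude Opus 4.7", 70.1, 100.51),
    ("Claude Opus 4.8", 67.6, 85.19),
    ("Gemini 3.5 Flash", 64.7, 104.49),
    ("Kimi K2.7 Code", 60.7, 32.94),
    ("Muse Spark 1.1", 59.8, 30.15),
    ("Claude Sonnet 5", 59.6, 36.19),
]


def completion_per_dollar(completion_pct: float, cost_usd: float) -> float:
    """Hypothetical value metric: completion percentage points per dollar per attempt."""
    return completion_pct / cost_usd


def rank_by_value(rows):
    """Sort (name, completion_pct, cost_usd) rows from most to least completion per dollar."""
    return sorted(rows, key=lambda r: completion_per_dollar(r[1], r[2]), reverse=True)


if __name__ == "__main__":
    for name, pct, cost in rank_by_value(TOP_10):
        print(f"{name:<18} {completion_per_dollar(pct, cost):.2f} pts/$")
```

By this ratio, Grok 4.5 comes out on top at about 2.59 points per dollar, and Gemini 3.5 Flash comes last at about 0.62. The ratio ignores absolute capability, so a cheaper model can rank above one that simply completes more tasks.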

Top Models by Mode

See which models are used most in Code, Plan, Ask, Debug, Review, and KiloClaw

Code

Rank	Model	Usage
01	step-3.7-flash	48.9%
02	hy3	25.2%
03	laguna-m.1	4.3%
04	nemotron-3-ultra-550b-a55b	3.9%
05	deepseek-v4-pro	3.4%
06	minimax-m3	2.8%
07	deepseek-v4-flash	1.7%
08	laguna-xs-2.1	1.7%
09	glm-5.2	1.0%
10	claude-sonnet-5	0.8%

Plan

Rank	Model	Usage
01	step-3.7-flash	36.5%
02	hy3	22.3%
03	deepseek-v4-pro	7.4%
04	nemotron-3-ultra-550b-a55b	7.0%
05	laguna-m.1	5.3%
06	minimax-m3	3.7%
07	kimi-k3	1.8%
08	glm-5.2	1.7%
09	deepseek-v4-flash	1.6%
10	gpt-5.6-terra	1.5%

Ask

Rank	Model	Usage
01	step-3.7-flash	44.1%
02	hy3	17.8%
03	deepseek-v4-flash	8.8%
04	laguna-m.1	4.4%
05	deepseek-v4-pro	3.6%
06	nemotron-3-ultra-550b-a55b	2.7%
07	glm-5.2	2.3%
08	qwen3.7-plus	2.1%
09	claude-opus-4.8	2.0%
10	minimax-m3	1.8%

Debug

Rank	Model	Usage
01	step-3.7-flash	41.6%
02	hy3	22.6%
03	nemotron-3-ultra-550b-a55b	7.2%
04	laguna-m.1	6.2%
05	deepseek-v4-pro	3.8%
06	deepseek-v4-flash	3.0%
07	minimax-m3	2.4%
08	claude-opus-4.8	2.2%
09	laguna-xs-2.1	1.9%
10	nemotron-3-super-120b-a12b	1.3%

Review

Rank	Model	Usage
01	laguna-m.1	32.5%
02	step-3.7-flash	21.6%
03	hy3	13.1%
04	minimax-m3	6.7%
05	deepseek-v4-pro	6.5%
06	gpt-5.5	3.5%
07	gpt-5.6-sol	3.0%
08	kimi-k2.7-code	2.4%
09	muse-spark-1.1	2.0%
10	qwen3.7-plus	1.7%

KiloClaw

Rank	Model	Usage
01	claude-sonnet-5	22.2%
02	step-3.7-flash	21.6%
03	qwen3.7-plus	14.9%
04	minimax-m3	11.6%
05	hy3	10.5%
06	claude-opus-4.8	3.1%
07	gemma-4-26b-a4b-it	2.7%
08	laguna-m.1	1.9%
09	deepseek-v4-flash	1.8%
10	nemotron-3-ultra-550b-a55b	1.7%

Top Models Today

Most-used models across Kilo Code in the last 24 hours

Rank	Model	Usage
01	step-3.7-flash	263.8B
02	atlasforcausallm	148.0B
03	ling	70.2B
04	deepseek-v4-pro	69.4B
05	deepseek-v4-flash	57.2B
06	glm-5.2	48.2B
07	minimax-m3	39.7B
08	gpt-5.6-sol	33.7B
09	benchflow-ckpt-model_d2036e94-3852-4f68-928e-b5be5fd6ba14	17.3B
10	claude-opus-4.8	13.8B

Daily Top Models

Token usage by model over time, stacked daily

All Models

Browse and compare all available AI coding models

Hy3 (free)

Hy3 is a 295B-parameter Mixture-of-Experts model from Tencent (21B active, 192 experts with top-8 routing) built for reasoning, agentic workflows, and real-world production use. It supports a configurable reasoning effort:...

Kilo Bench completion: 47.6%

$0.00/1M · $0.00/attempt

DeepSeek V4 Flash

DeepSeek V4 Flash is an efficiency-optimized Mixture-of-Experts model from DeepSeek with 284B total parameters and 13B activated parameters, supporting a 1M-token context window. It is designed for fast inference and...

Coding index: 52.0

$0.09/1M

DeepSeek V4 Pro

Kilo Bench completion: 44.0%

DeepSeek V4 Pro is a large-scale Mixture-of-Experts model from DeepSeek with 1.6T total parameters and 49B activated parameters, supporting a 1M-token context window. It is designed for advanced reasoning, coding,...

$0.43/1M · $15.91/attempt

GLM 5.2

Kilo Bench completion: 53.0%

GLM 5.2 is a large-scale reasoning model from Z.ai. It supports text input and output with a 1M-token context window, and is suited for long-horizon agent workflows, project-level software engineering,...

$1.40/1M · $26.21/attempt

GPT-5.4

GPT-5.4 is OpenAI’s latest frontier model, unifying the Codex and GPT lines into a single system. It features a 1M+ token context window (922K input, 128K output) with support for...

Code mode rank: #13

$2.50/1M

Laguna M.1 (free)

Kilo Bench completion: 25.4%

Laguna M.1 is the flagship coding agent model from [Poolside](https://poolside.ai/), optimized for complex software engineering tasks. Designed for agentic coding workflows, it supports tool calling and reasoning, with a 256K...

$0.00/1M · $0.00/attempt

Claude Sonnet 4.5

Claude Sonnet 4.5 is Anthropic’s most advanced Sonnet model to date, optimized for real-world agents and coding workflows. It delivers state-of-the-art performance on coding benchmarks such as SWE-bench Verified, with...

Coding index: 52.1

$3.00/1M

Nemotron 3 Ultra (free)

NVIDIA Nemotron 3 Ultra is an open frontier-reasoning and orchestration model from NVIDIA, with 55B active parameters out of 550B total (MoE). Built on a hybrid Transformer-Mamba mixture-of-experts architecture, it...

Code mode rank: #22

GLM 5

GLM-5 is Z.ai’s flagship open-source foundation model engineered for complex systems design and long-horizon agent workflows. Built for expert developers, it delivers production-grade performance on large-scale programming tasks, rivaling leading...

Code mode rank: #24

$1.00/1M

Claude Opus 4.5

Claude Opus 4.5 is Anthropic’s frontier reasoning model optimized for complex software engineering, agentic workflows, and long-horizon computer use. It offers strong multimodal capabilities, competitive performance across real-world coding and...

Code mode rank: #28

$5.00/1M

Gemini 3.5 Flash

Kilo Bench completion: 64.7%

Gemini 3.5 Flash is Google's high-efficiency multimodal model, bringing near-Pro level coding and reasoning at Flash-tier cost and speed. It is highly optimized for coding proficiency and parallel agentic execution...

$1.50/1M · $104.49/attempt

GPT-5.6 Terra

GPT-5.6 Terra is a balanced model in OpenAI's GPT-5.6 series, positioned between the flagship Sol tier and the cost-efficient Luna tier. It is suited for everyday coding, reasoning, and agentic...

Code mode rank: #35

$2.50/1M

Claude Sonnet 5

Kilo Bench completion: 59.6%

Sonnet 5 is Anthropic's most capable Sonnet-class model, with frontier performance across coding, agents, and professional work. It supports adaptive thinking with selectable reasoning effort levels (low, medium, high, max,...

$2.00/1M · $36.19/attempt

Grok 4.5

Kilo Bench completion: 70.8%

Grok 4.5 is SpaceXAI's smartest model with frontier performance on coding, knowledge work, and STEM.

$2.00/1M · $27.29/attempt

kilo-auto/efficient

Kilo Bench completion: 46.7%

$19.60/attempt

Ling-2.6-1T (free)

Ling-2.6-1T is an instant (instruct) model from inclusionAI and the company’s trillion-parameter flagship, designed for real-world agents that require fast execution and high efficiency at scale. It uses a “fast...

Code mode rank: #60

GPT-5.2

GPT-5.2 is the latest frontier-grade model in the GPT-5 series, offering stronger agentic and long-context performance compared to GPT-5.1. It uses adaptive reasoning to allocate computation dynamically, responding quickly...

Code mode rank: #61

$1.75/1M

GLM 4.7

GLM-4.7 is Z.ai’s latest flagship model, featuring upgrades in two key areas: enhanced programming capabilities and more stable multi-step reasoning/execution. It demonstrates significant improvements in executing complex agent tasks while...

Code mode rank: #74

$0.60/1M

MiniMax M3

Kilo Bench completion: 47.6%

MiniMax-M3 is a multimodal foundation model from MiniMax. It supports text, image, and video inputs with text output, a 1M-token context window, and is suited for long-horizon agentic work, coding,...

$0.30/1M · $10.35/attempt

Nemotron 3 Ultra

Kilo Bench completion: 19.1%

$0.50/1M · $101.82/attempt

Grok Code Fast 1

Grok Code Fast 1 is a speedy and economical reasoning model that excels at agentic coding. With reasoning traces visible in the response, developers can steer Grok Code for high-quality...

Code mode rank: #87

$0.20/1M

Kimi K2.7 Code

Kilo Bench completion: 60.7%

Kimi K2.7 Code is a coding-focused model in Moonshot AI's Kimi K2 family, built to complete end-to-end programming tasks reliably over long contexts. It uses a native multimodal mixture-of-experts...

$0.95/1M · $32.94/attempt

Qwen3.7 Max (50% off)

Kilo Bench completion: 54.6%

Qwen3.7-Max is the flagship model in Alibaba's Qwen3.7 series. It supports text input and output and is designed for agent-centric workloads, with particular strengths in coding, office and productivity tasks,...

$1.25/1M · $20.65/attempt

MiMo-V2.5-Pro

Kilo Bench completion: 47.6%

MiMo-V2.5-Pro is Xiaomi’s flagship model, delivering strong performance in general agentic capabilities, complex software engineering, and long-horizon tasks, with top rankings on benchmarks such as ClawEval, GDPVal, and SWE-bench Pro....

$0.43/1M · $4.92/attempt

MiniMax M2.1

MiniMax-M2.1 is a lightweight, state-of-the-art large language model optimized for coding, agentic workflows, and modern application development. With only 10 billion activated parameters, it delivers a major jump in real-world...

Code mode rank: #100

Claude Fable 5 ($$$$)

Kilo Bench completion: 71.0%

Claude Fable 5 is a Mythos-class model from Anthropic, built for autonomous knowledge work and coding. It supports text, image, and file inputs with text output, with reasoning support and...

$10.00/1M · $87.52/attempt

Claude Haiku 4.5

Claude Haiku 4.5 is Anthropic’s fastest and most efficient model, delivering near-frontier intelligence at a fraction of the cost and latency of larger Claude models. Matching Claude Sonnet 4’s performance...

anthropic

$1.00/1M

Claude Opus 4.6

Opus 4.6 is Anthropic’s strongest model for coding and long-running professional tasks. It is built for agents that operate across entire workflows rather than single prompts, making it especially effective...

anthropic

$5.00/1M

Claude Opus 4.7

Kilo Bench completion: 70.1%

Opus 4.7 is the next generation of Anthropic's Opus family, built for long-running, asynchronous agents. Building on the coding and agentic strengths of Opus 4.6, it delivers stronger performance on...

$5.00/1M · $100.51/attempt

Claude Opus 4.8

Kilo Bench completion: 67.6%

Claude Opus 4.8 is Anthropic's most capable generally available model in the Opus family. It supports text, image, and file inputs with text output, with reasoning support and a 1M-token...

$5.00/1M · $85.19/attempt

Claude Sonnet 4.6

Kilo Bench completion: 55.1%

Sonnet 4.6 is Anthropic's most capable Sonnet-class model yet, with frontier performance across coding, agents, and professional work. It excels at iterative development, complex codebase navigation, end-to-end project management with...

$3.00/1M · $53.37/attempt

free

baidu

DeepSeek V3.1 Terminus

DeepSeek-V3.1 Terminus is an update to DeepSeek V3.1 that maintains the model's original capabilities while addressing issues reported by users, including language consistency and agent capabilities, further optimizing the model's...

Coding index: 43.5

$0.27/1M

Gemini 2.5 Flash

Gemini 2.5 Flash is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in "thinking" capabilities, enabling it to provide responses with greater...

Gemini 3 Flash Preview

Gemini 3 Flash Preview is a high-speed, high-value thinking model designed for agentic workflows, multi-turn chat, and coding assistance. It delivers near-Pro-level reasoning and tool...

$0.50/1M

Gemini 3 Pro Preview

Gemini 3 Pro is Google’s flagship frontier model for high-precision multimodal reasoning, combining strong performance across text, image, video, audio, and code with a 1M-token context window. Reasoning details must be preserved when using multi-turn tool calling; see the OpenRouter documentation on preserving reasoning blocks (https://openrouter.ai/docs/use-cases/reasoning-tokens#preserving-reasoning-blocks). It delivers state-of-the-art benchmark results in general reasoning, STEM problem solving, factual QA, and multimodal understanding, including leading scores on LMArena, GPQA Diamond, MathArena Apex, MMMU-Pro, and Video-MMMU. Interactions emphasize depth and interpretability: the model is designed to infer intent with minimal prompting and produce direct, insight-focused responses. Built for advanced development and agentic workflows, Gemini 3 Pro provides robust tool calling, long-horizon planning stability, and strong zero-shot generation for complex UI, visualization, and coding tasks. It excels at agentic coding (SWE-Bench Verified, Terminal-Bench 2.0), multimodal analysis, and structured long-form tasks such as research synthesis, planning, and interactive learning experiences. Suitable applications include autonomous agents, coding assistants, multimodal analytics, scientific reasoning, and high-context information processing.

$2.00/1M

Gemini 3.1 Pro Preview

Gemini 3.1 Pro Preview is Google’s frontier reasoning model, delivering enhanced software engineering performance, improved agentic reliability, and more efficient token usage across complex workflows. Building on the multimodal foundation...

$2.00/1M

Gemma 4 26B A4B

Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind. Although it has 25.2B total parameters, only 3.8B are active per token during inference, delivering near-31B quality at...

$0.06/1M

Gemma 4 26B A4B (free)

Gemma 4 31B

Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output. Features a 256K token context window, configurable thinking/reasoning mode, native function...

$0.10/1M

Ling-2.6-1T

Kilo Bench completion: 28.1%

$0.30/1M · $30.82/attempt

Ling-2.6-flash (free)

Ling-2.6-flash is an instant (instruct) model from inclusionAI with 104B total parameters and 7.4B active parameters, designed for real-world agents that require fast responses, strong execution, and high token efficiency....

inclusionai

Ring-2.6-1T (free)

Ring-2.6-1T is a 1T-parameter-scale thinking model with 63B active parameters, built for real-world agent workflows that require both strong capability and operational efficiency. It is optimized for coding agents, tool...

inclusionai

KAT-Coder-Pro V1

KAT-Coder-Pro V1 is KwaiKAT's most advanced agentic coding model in the KAT-Coder series. Designed specifically for agentic coding tasks, it excels in real-world software engineering scenarios, achieving a 73.4% solve rate on the SWE-Bench Verified benchmark. The model has been optimized for tool use, multi-turn interaction, instruction following, generalization, and comprehensive capabilities through a multi-stage training process, including mid-training, supervised fine-tuning (SFT), reinforcement fine-tuning (RFT), and scalable agentic RL.

kwaipilot

$0.21/1M

KAT-Coder-Pro V2.5

Kilo Bench completion: 50.3%

KAT-Coder-Pro V2.5 is a flagship-level agentic coding model to which you can hand over an entire issue or an entire business workflow, allowing it to autonomously locate and make...

$0.74/1M · $36.16/attempt

Muse Spark 1.1

Kilo Bench completion: 59.8%

Muse Spark 1.1 is a multimodal reasoning model from Meta, built for agentic tasks. It accepts text, images, video, audio, and PDF documents and returns text, with a 1M-token context...

$1.25/1M · $30.15/attempt

MiniMax M2.5

MiniMax-M2.5 is a SOTA large language model designed for real-world productivity. Trained in a diverse range of complex real-world digital working environments, M2.5 builds upon the coding expertise of M2.1...

minimax

MiniMax M2.7

MiniMax-M2.7 is a next-generation large language model designed for autonomous, real-world productivity and continuous improvement. Built to actively participate in its own evolution, M2.7 integrates advanced agentic capabilities through multi-agent...

minimax

Devstral 2 2512

Devstral 2 is a state-of-the-art open-source model by Mistral AI specializing in agentic coding. It is a 123B-parameter dense transformer model supporting a 256K context window. Devstral 2 supports exploring...

mistralai

$0.40/1M

Mistral Medium 3.5

Mistral Medium 3.5 is a dense 128B instruction-following model from Mistral AI. It supports text and image inputs with text output, and is designed for agentic workflows, coding, and complex...

Coding index: 46.9

$1.50/1M

Kimi K2.5

Kimi K2.5 is Moonshot AI's native multimodal model, delivering state-of-the-art visual coding capability and a self-directed agent swarm paradigm. Built on Kimi K2 with continued pretraining over approximately 15T mixed...

moonshotai

$0.60/1M

Kimi K2.6

Kilo Bench completion: 54.4%

Kimi K2.6 is Moonshot AI's next-generation multimodal model, designed for long-horizon coding, coding-driven UI/UX generation, and multi-agent orchestration. It handles complex end-to-end coding tasks across Python, Rust, and Go, and...

$0.80/1M · $24.84/attempt

Kimi K3 (new)

Kimi K3 is a 2.8T parameter open-weight multimodal reasoning model from Moonshot AI. It is suited for complex coding, knowledge work, and long-horizon agentic workflows, and is particularly strong at...

Coding index: 76.2

$3.00/1M

Nex-N2-Pro (free)

Nex-N2-Pro is an agentic mixture-of-experts model from Nex AGI, with 17B active parameters out of 397B total. Built on the Qwen3.5 architecture, it accepts text and image input and produces...

nex-agi

Nemotron 3 Super (free)

Kilo Bench completion: 15.5%

NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameters for maximum compute efficiency and accuracy in complex multi-agent applications. Built on a hybrid Mamba-Transformer...

$0.00/1M · $0.00/attempt

GPT-5.1

GPT-5.1 is the latest frontier-grade model in the GPT-5 series, offering stronger general-purpose reasoning, improved instruction adherence, and a more natural conversational style compared to GPT-5. It uses adaptive reasoning...

Coding index: 49.4

$1.25/1M

GPT-5.1-Codex

GPT-5.1-Codex is a specialized version of GPT-5.1 optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks....

openai

$1.25/1M

GPT-5.2-Codex

GPT-5.2-Codex is an upgraded version of GPT-5.1-Codex optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks....

openai

$1.75/1M

GPT-5.3-Codex

GPT-5.3-Codex is OpenAI’s most advanced agentic coding model, combining the frontier software engineering performance of GPT-5.2-Codex with the broader reasoning and professional knowledge capabilities of GPT-5.2. It achieves state-of-the-art results...

openai

$1.75/1M

GPT-5.5

Kilo Bench completion: 74.2%

GPT-5.5 is OpenAI’s frontier model designed for complex professional workloads, building on GPT-5.4 with stronger reasoning, higher reliability, and improved token efficiency on hard tasks. It features a 1M+ token...

$5.00/1M · $72.63/attempt

GPT-5.6 Sol

Kilo Bench completion: 70.3%

GPT-5.6 Sol is the flagship model in OpenAI's GPT-5.6 series. It is suited for complex reasoning, coding, and agentic workflows, and is particularly strong at command-line and multi-step coding tasks...

$5.00/1M · $58.99/attempt

Laguna M.1 (retires Jul 28)

Kilo Bench completion: 32.6%

$0.20/1M · $40.55/attempt

Laguna S 2.1 (free)

Laguna S 2.1 is the latest coding agent model from [Poolside](https://poolside.ai/). Laguna S 2.1 is a 118B total parameter model with 8B active parameters, scoring 70.2% on Terminal-Bench 2.1 and...

poolside

Laguna XS 2.1

Kilo Bench completion: 26.7%

Laguna XS 2.1 is the latest coding agent model in the 33B-A3B category from [Poolside](https://poolside.ai/) and a step forward from their Laguna XS.2 model (released in April 2026). It combines...

$0.10/1M · $12.03/attempt

Qwen3 Coder 480B A35B

Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning over...

$1.50/1M

Qwen3 Coder Next

Qwen3-Coder-Next is an open-weight causal language model optimized for coding agents and local development workflows. It uses a sparse MoE design with 80B total parameters and only 3B activated per...

Qwen3 Coder Plus

Qwen3 Coder Plus is Alibaba's proprietary version of the open-source Qwen3 Coder 480B A35B. It is a powerful coding agent model specializing in autonomous programming via tool calling and...

$1.00/1M

Qwen3.6 Plus

Qwen 3.6 Plus builds on a hybrid architecture that combines efficient linear attention with sparse mixture-of-experts routing, enabling strong scalability and high-performance inference. Compared to the 3.5 series, it delivers...

Coding index: 54.5

$0.50/1M

Qwen3.7 Plus (20% off)

Qwen3.7-Plus is a cost-effective model in Alibaba's Qwen3.7 series. It supports text and image input with text output, building on the series' text capabilities with a comprehensive upgrade to its...

$0.32/1M

Step 3.5 Flash

Step 3.5 Flash is StepFun's most capable open-source foundation model. Built on a sparse Mixture of Experts (MoE) architecture, it selectively activates only 11B of its 196B parameters per token....

stepfun

$0.10/1M

Hy3

Coding index: 58.8

$0.14/1M

Hy3 preview (free)

Hy3 preview is a high-efficiency Mixture-of-Experts model from Tencent designed for agentic workflows and production use. It supports configurable reasoning levels across disabled, low, and high modes, allowing it to...

tencent

Grok 4.3

Grok 4.3 is a reasoning model from xAI. It accepts text and image inputs with text output, and is suited for agentic workflows, instruction-following tasks, and applications requiring high factual...

Coding index: 42.2

$1.25/1M

Grok Build 0.1

Kilo Bench completion: 50.6%

Grok Build 0.1 is xAI’s fast coding model trained specifically for agentic software engineering workflows. It supports text and image inputs with text output, and is optimized for interactive coding...

$1.00/1M · $30.70/attempt

MiMo-V2-Flash

Available in Kilo

MiMo-V2.5

MiMo-V2.5 is a native omnimodal model by Xiaomi. It delivers Pro-level agentic performance at roughly half the inference cost, while surpassing MiMo-V2-Omni in multimodal perception across image and video understanding...

Coding index: 56.8

$0.10/1M

GLM 4.6

Compared with GLM-4.5, this generation brings several key improvements, including a longer context window: it has been expanded from 128K to 200K tokens, enabling the model to handle more complex...

Coding index: 45.8

$0.55/1M

GLM 4.7 Flash

As a 30B-class SOTA model, GLM-4.7-Flash offers a new option that balances performance and efficiency. It is further optimized for agentic coding use cases, strengthening coding capabilities, long-horizon task planning,...

z-ai

$0.07/1M

GLM 5.1

Kilo Bench completion: 49.4%

GLM-5.1 delivers a major leap in coding capability, with particularly significant gains in handling long-horizon tasks. Unlike previous models built around minute-level interactions, GLM-5.1 can work independently and continuously on...

$1.38/1M · $23.98/attempt