Auto Efficient vs Frontier Models

Auto Efficient delivers 71% of the published frontier models' average completion rate at 72% lower cost per attempt on KiloBench.

Anthropic, OpenAI, Google, xAI, DeepSeek, Qwen, Moonshot, Mistral, MiniMax, Tencent Hunyuan, Z.ai, InclusionAI, Kwaipilot, Poolside, StepFun, NVIDIA

Headline numbers

Official KiloBench results for Auto Efficient and published frontier model evaluations.

Cost savings

72% cheaper

Auto Efficient vs published frontier average

Cost per attempt

$19.60 vs $70.40

Auto Efficient vs published frontier average

Completion rate

46.7% vs 65.6%

KiloBench

One-shot apps

Four one-shot app prompts, compared across Auto Efficient and frontier models.

App 01

Earth visualizer

Prompt: "Create an animation of the earth spinning in space"

Auto Efficient

Cost: $0.01

Claude Opus 4.8

Cost: $0.37

Claude Sonnet 4.6

Cost: $0.20

GPT-5.5

Cost: $0.82

App 02

Sports car visualizer

Prompt: "Create a 3D visualizer of a sportscar"

Auto Efficient

Cost: $0.17

Claude Opus 4.8

Cost: $0.35

Claude Sonnet 4.6

Cost: $0.58

GPT-5.5

Cost: $0.76

App 03

Blocks physics simulator

Prompt: "Create a physics simulator that allows you to drag and stack 3D blocks onto one another, and they tumble if they aren't balanced."

Auto Efficient

Cost: $0.05

Claude Opus 4.8

Cost: $1.31

Claude Sonnet 4.6

Cost: $0.34

GPT-5.5

Cost: $0.95

App 04

Basketball game

Prompt: "Create a game that lets you shoot basketballs into a hoop"

Auto Efficient

Cost: $0.09

Claude Opus 4.8

Cost: $0.29

Claude Sonnet 4.6

Cost: $0.32

GPT-5.5

Cost: $0.14
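
To compare the four one-shot prompts at a glance, the per-prompt costs above can be summed per model. This is a minimal sketch: the dictionary and function names are illustrative, and only the dollar figures come from the cards above.

```python
# Per-prompt costs in USD, copied from the four one-shot app cards above
# (order: Earth visualizer, Sports car visualizer, Blocks physics simulator, Basketball game).
ONE_SHOT_COSTS = {
    "Auto Efficient": [0.01, 0.17, 0.05, 0.09],
    "Claude Opus 4.8": [0.37, 0.35, 1.31, 0.29],
    "Claude Sonnet 4.6": [0.20, 0.58, 0.34, 0.32],
    "GPT-5.5": [0.82, 0.76, 0.95, 0.14],
}


def total_costs(costs):
    """Sum each model's cost across all prompts, rounded to whole cents."""
    return {model: round(sum(values), 2) for model, values in costs.items()}


totals = total_costs(ONE_SHOT_COSTS)
for model, total in totals.items():
    print(f"{model}: ${total:.2f}")
```

Four prompts are a small sample, so these totals illustrate the spread between models rather than replace the KiloBench figures.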

Head-to-head comparison

Official KiloBench metrics side by side.

Auto Efficient

efficient

Completion rate: 46.7%
Cost per attempt: $19.60
Cost per task: $0.22
Tasks solved: 208 / 445
nAttempts: 5

Claude Opus 4.8

published

Completion rate: 67.6%
Cost per attempt: $85.19
Cost per task: $0.97
Tasks solved: 301 / 445
nAttempts: 5

Claude Sonnet 4.6

published

Completion rate: 55.1%
Cost per attempt: $53.37
Cost per task: $0.60
Tasks solved: 245 / 445
nAttempts: 5

GPT-5.5

published

Completion rate: 74.2%
Cost per attempt: $72.63
Cost per task: $0.82
Tasks solved: 330 / 445
nAttempts: 5

Bottom line: Auto Efficient solves 208 / 445 tasks at $0.22 per task. Published frontier models average $70.40 per attempt, while Auto Efficient averages $19.60 per attempt.
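
The headline percentages can be re-derived from the head-to-head cards. The sketch below assumes the "published frontier average" is an unweighted mean over the three published models; the page does not state the averaging method, but this assumption reproduces the $70.40 and 65.6% figures. Variable and function names are illustrative.

```python
# Figures copied from the head-to-head cards above.
RESULTS = {
    "Auto Efficient": {"completion_pct": 46.7, "cost_per_attempt": 19.60},
    "Claude Opus 4.8": {"completion_pct": 67.6, "cost_per_attempt": 85.19},
    "Claude Sonnet 4.6": {"completion_pct": 55.1, "cost_per_attempt": 53.37},
    "GPT-5.5": {"completion_pct": 74.2, "cost_per_attempt": 72.63},
}
FRONTIER_MODELS = ["Claude Opus 4.8", "Claude Sonnet 4.6", "GPT-5.5"]


def frontier_mean(metric):
    """Unweighted mean of one metric over the frontier models (assumed method)."""
    return sum(RESULTS[m][metric] for m in FRONTIER_MODELS) / len(FRONTIER_MODELS)


auto = RESULTS["Auto Efficient"]
frontier_cost = frontier_mean("cost_per_attempt")
frontier_completion = frontier_mean("completion_pct")

# Savings relative to the frontier average, and completion as a share of it.
cost_savings_pct = 100 * (1 - auto["cost_per_attempt"] / frontier_cost)
relative_completion_pct = 100 * auto["completion_pct"] / frontier_completion

print(f"Frontier average cost per attempt: ${frontier_cost:.2f}")
print(f"Frontier average completion: {frontier_completion:.1f}%")
print(f"Cost savings: {cost_savings_pct:.0f}%")
print(f"Share of frontier completion: {relative_completion_pct:.0f}%")
```

Under that assumption the script yields the 72% savings and 71% relative completion quoted in the headline.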

Methodology

How to read this benchmark

Every published number comes from running the model through the same Kilo agent harness, not a generic scaffold. Costs include reasoning tokens, accumulated context re-sends, and all agent loop overhead.

Eval suite: KiloBench
Harness: Kilo's agent framework
nAttempts: 5 per published model
Tasks: 89 TB2 tasks × 5 = 445
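
As a quick check, the completion rates follow from the solved counts and the 89 × 5 = 445 task runs. This is a small sketch with illustrative names; the solved counts are taken from the head-to-head cards.

```python
TASKS = 89        # TB2 tasks in the suite
N_ATTEMPTS = 5    # attempts per published model
TOTAL_RUNS = TASKS * N_ATTEMPTS  # 445 task runs per model

# "Tasks solved" numerators from the head-to-head cards.
SOLVED = {
    "Auto Efficient": 208,
    "Claude Opus 4.8": 301,
    "Claude Sonnet 4.6": 245,
    "GPT-5.5": 330,
}


def completion_rate(solved, total_runs=TOTAL_RUNS):
    """Completion rate as a percentage of all task runs."""
    return 100 * solved / total_runs


rates = {model: round(completion_rate(n), 1) for model, n in SOLVED.items()}
for model, rate in rates.items():
    print(f"{model}: {rate}%")
```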

What the metrics mean

Completion % is the primary signal
Higher is better. It measures the fraction of benchmark tasks the model completed end-to-end through Kilo's harness — not a synthetic scaffold.
Cost per attempt reflects your real bill
Sticker per-token pricing tells you almost nothing. These costs include reasoning tokens, cumulative context re-sends, and all agent loop overhead from the actual Kilo pipeline.
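Cost per task spreads the bill across tasks
The reported figures track cost per attempt divided by the 89 tasks in a pass ($19.60 / 89 ≈ $0.22). Dividing further by completion rate gives the cost per solved task, where a model that is cheap but rarely completes tasks can cost more than one with a higher attempt price.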
Published results only
This comparison only shows promoted KiloBench results. Models without an official result remain visible, but their cells stay marked pending instead of using estimates.
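
The relationship between the cost metrics can be sketched in a few lines. This assumes that "cost per attempt" is the cost of one full pass over the 89 tasks, which the page does not state outright. Under that reading, dividing by 89 reproduces the reported cost per task for three of the four models (Claude Opus 4.8 comes out near $0.96 against the reported $0.97). The cost-per-solved-task helper is an additional, hypothetical metric, not one the page reports.

```python
TASKS = 89       # tasks in one pass of the suite
N_ATTEMPTS = 5   # passes per model

# (cost per attempt in USD, tasks solved out of 445), from the head-to-head cards.
MODELS = {
    "Auto Efficient": (19.60, 208),
    "Claude Opus 4.8": (85.19, 301),
    "Claude Sonnet 4.6": (53.37, 245),
    "GPT-5.5": (72.63, 330),
}


def cost_per_attempted_task(cost_per_attempt):
    """Spread one pass's cost evenly over its tasks (assumed definition)."""
    return cost_per_attempt / TASKS


def cost_per_solved_task(cost_per_attempt, solved):
    """Total spend over all passes divided by solved task runs (hypothetical metric)."""
    return cost_per_attempt * N_ATTEMPTS / solved


for model, (cost, solved) in MODELS.items():
    attempted = cost_per_attempted_task(cost)
    per_solved = cost_per_solved_task(cost, solved)
    print(f"{model}: ${attempted:.2f} per attempted task, ${per_solved:.2f} per solved task")
```

The second number penalizes low completion: a model with a low attempt price but few solved tasks moves closer to the more expensive models on this measure.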

Why Auto Efficient

The case for cost-efficient routing when frontier spend is not always justified.

01

72% lower cost

Auto Efficient keeps routine work affordable while reserving expensive frontier models for tasks that actually need them.

02

Session-aware routing

Auto Efficient uses live session classification to route each request to the model that fits the work — not just the most expensive one available.

03

71% of frontier completion

For exploratory work, refactoring, documentation, and straightforward coding tasks, the lower-cost tradeoff is often the right one.

04

Best-fit-for-task

Auto Efficient is not a single frozen model — it routes to benchmark-proven models that match the session type, so quality tracks the work rather than the price tag.

Auto Efficient

Get 71% of frontier completion at 72% lower cost.

Let Kilo's Auto Efficient tier route your tasks intelligently. Session-aware routing picks the right model for the work — so you spend less without manually managing model selection.
