
Auto Efficient vs Frontier Models

Auto Efficient delivers 71% of the published frontier models' average completion rate at 72% lower cost per attempt on KiloBench.

Headline numbers

Official KiloBench results for Auto Efficient, compared with the average of the three published frontier model evaluations.

Cost savings: 72% cheaper (Auto Efficient vs published frontier average)

Cost per attempt: $19.60 vs $70.40 (Auto Efficient vs published frontier average)

Completion rate: 46.7% vs 65.6% on KiloBench (Auto Efficient vs published frontier average)

One-shot apps

Four one-shot app prompts, compared across Auto Efficient and frontier models.

App 01: Earth visualizer

Prompt: "Create an animation of the earth spinning in space"

Auto Efficient: $0.01
Claude Opus 4.8: $0.37
Claude Sonnet 4.6: $0.20
GPT-5.5: $0.82

App 02: Sports car visualizer

Prompt: "Create a 3D visualizer of a sportscar"

Auto Efficient: $0.17
Claude Opus 4.8: $0.35
Claude Sonnet 4.6: $0.58
GPT-5.5: $0.76

App 03: Blocks physics simulator

Prompt: "Create a physics simulator that allows you to drag and stack 3D blocks onto one another, and they tumble if they aren't balanced."

Auto Efficient: $0.05
Claude Opus 4.8: $1.31
Claude Sonnet 4.6: $0.34
GPT-5.5: $0.95

App 04: Basketball game

Prompt: "Create a game that lets you shoot basketballs into a hoop"

Auto Efficient: $0.09
Claude Opus 4.8: $0.29
Claude Sonnet 4.6: $0.32
GPT-5.5: $0.14
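As a quick tally of the four prompts above, the sketch below sums each model's per-prompt cost in integer cents (to avoid floating-point drift) and lists the models from cheapest to most expensive. The totals are derived from the listed costs only; they are not an official KiloBench metric.

```python
# Per-prompt costs in USD for Apps 01-04, copied from the one-shot results above.
costs = {
    "Auto Efficient":    [0.01, 0.17, 0.05, 0.09],
    "Claude Opus 4.8":   [0.37, 0.35, 1.31, 0.29],
    "Claude Sonnet 4.6": [0.20, 0.58, 0.34, 0.32],
    "GPT-5.5":           [0.82, 0.76, 0.95, 0.14],
}

# Convert each cost to integer cents before summing so float rounding cannot accumulate.
totals_cents = {
    model: sum(round(c * 100) for c in per_app)
    for model, per_app in costs.items()
}

# Print models from cheapest to most expensive across all four prompts.
for model, cents in sorted(totals_cents.items(), key=lambda kv: kv[1]):
    print(f"{model}: ${cents / 100:.2f} across 4 prompts")
```

This prints Auto Efficient at $0.32, Claude Sonnet 4.6 at $1.44, Claude Opus 4.8 at $2.32 and GPT-5.5 at $2.67 for the four prompts combined.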

Head-to-head comparison

Official KiloBench metrics side by side.

Auto Efficient (efficient tier)
Completion rate: 46.7%
Cost per attempt: $19.60
Cost per task: $0.22
Tasks solved: 208 / 445
nAttempts: 5

Claude Opus 4.8 (published result)
Completion rate: 67.6%
Cost per attempt: $85.19
Cost per task: $0.97
Tasks solved: 301 / 445
nAttempts: 5

Claude Sonnet 4.6 (published result)
Completion rate: 55.1%
Cost per attempt: $53.37
Cost per task: $0.60
Tasks solved: 245 / 445
nAttempts: 5

GPT-5.5 (published result)
Completion rate: 74.2%
Cost per attempt: $72.63
Cost per task: $0.82
Tasks solved: 330 / 445
nAttempts: 5

Bottom line: Auto Efficient solves 208 of 445 tasks at $0.22 per task. It averages $19.60 per attempt, against $70.40 per attempt for the published frontier average.
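The headline figures are consistent with a simple unweighted mean over the three published frontier models. This sketch recomputes them from the head-to-head table: the average completion rate and average cost per attempt, then Auto Efficient's cost savings and its share of frontier completion.

```python
# Published KiloBench figures from the head-to-head comparison.
frontier = {
    "Claude Opus 4.8":   {"completion": 67.6, "cost_per_attempt": 85.19},
    "Claude Sonnet 4.6": {"completion": 55.1, "cost_per_attempt": 53.37},
    "GPT-5.5":           {"completion": 74.2, "cost_per_attempt": 72.63},
}
auto_efficient = {"completion": 46.7, "cost_per_attempt": 19.60}


def mean(values):
    """Unweighted arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


avg_completion = mean(m["completion"] for m in frontier.values())
avg_cost = mean(m["cost_per_attempt"] for m in frontier.values())

# Fraction of the frontier cost that Auto Efficient avoids.
cost_savings = 1 - auto_efficient["cost_per_attempt"] / avg_cost
# Auto Efficient's completion rate relative to the frontier average.
completion_share = auto_efficient["completion"] / avg_completion

print(f"Frontier average completion: {avg_completion:.1f}%")
print(f"Frontier average cost per attempt: ${avg_cost:.2f}")
print(f"Cost savings: {cost_savings:.0%}")
print(f"Share of frontier completion: {completion_share:.0%}")
```

The four lines print 65.6%, $70.40, 72% and 71%, matching the headline numbers.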

Methodology

How to read this benchmark

Every published number comes from running the model through the same Kilo agent harness, not a generic scaffold. Costs include reasoning tokens, accumulated context re-sends, and all agent loop overhead.

Eval suite: KiloBench
Harness: Kilo's agent framework
nAttempts: 5 per published model
Tasks: 89 TB2 tasks × 5 attempts = 445

What the metrics mean

  1. Completion % is the primary signal

    Higher is better. It measures the fraction of the 445 task runs (89 tasks × 5 attempts) that the model completed end-to-end through Kilo's harness, not a synthetic scaffold.

  2. Cost per attempt reflects your real bill

    Sticker per-token pricing says little about what an agent run actually costs. These costs include reasoning tokens, cumulative context re-sends, and all agent loop overhead from the actual Kilo pipeline.

  3. Cost per task normalizes for completions

    Cost per attempt divided by completion rate. A model that is cheap but rarely completes tasks can cost more per solved task than one with a higher attempt price.

  4. Published results only

    This comparison only shows promoted KiloBench results. Models without an official result remain visible, but their cells stay marked pending instead of using estimates.
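The sketch below recomputes the head-to-head metrics, assuming one attempt is a single pass over the 89 KiloBench tasks, so five attempts give the 445 task runs. Under that assumption, cost per attempt divided by 89 reproduces the listed Cost per task to the cent for three of the four models; for Claude Opus 4.8 it gives $0.96 against the listed $0.97. The last column applies the completion-rate normalization from item 3 at the task level, which works out to total spend over five attempts divided by tasks solved. It is derived here, not a published KiloBench figure.

```python
# Assumption: one attempt = one pass over all 89 KiloBench tasks.
TASKS_PER_ATTEMPT = 89
N_ATTEMPTS = 5
TOTAL_TASKS = TASKS_PER_ATTEMPT * N_ATTEMPTS  # 445 task runs

# (tasks solved out of 445, cost per attempt in USD, listed cost per task in USD)
results = {
    "Auto Efficient":    (208, 19.60, 0.22),
    "Claude Opus 4.8":   (301, 85.19, 0.97),
    "Claude Sonnet 4.6": (245, 53.37, 0.60),
    "GPT-5.5":           (330, 72.63, 0.82),
}


def metrics(solved, cost_per_attempt):
    """Return (completion rate, cost per task run, cost per solved task)."""
    completion = solved / TOTAL_TASKS
    cost_per_task_run = cost_per_attempt / TASKS_PER_ATTEMPT
    # Dividing by completion rate spreads the spend over solved tasks only.
    cost_per_solved_task = cost_per_task_run / completion
    return completion, cost_per_task_run, cost_per_solved_task


for name, (solved, cost_per_attempt, listed) in results.items():
    completion, per_task, per_solved = metrics(solved, cost_per_attempt)
    print(f"{name}: completion {completion:.1%}, "
          f"cost per task ${per_task:.2f} (listed ${listed:.2f}), "
          f"cost per solved task ${per_solved:.2f}")
```

On these numbers the cost per solved task comes out at $0.47 for Auto Efficient, $1.42 for Claude Opus 4.8, $1.09 for Claude Sonnet 4.6 and $1.10 for GPT-5.5.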

Why Auto Efficient

The case for cost-efficient routing when frontier spend is not always justified.

01

72% lower cost

Auto Efficient keeps routine work affordable while reserving expensive frontier models for tasks that actually need them.

02

Session-aware routing

Auto Efficient uses live session classification to route each request to the model that fits the work, rather than defaulting to the most expensive one available.

03

71% of frontier completion

For exploratory work, refactoring, documentation, and straightforward coding tasks, the lower-cost tradeoff is often the right one.

04

Best-fit-for-task

Auto Efficient is not a single frozen model; it routes to benchmark-proven models that match the session type, so quality tracks the work rather than the price tag.
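This page does not say how Kilo classifies sessions or which models it routes to. As a purely hypothetical illustration of session-aware routing in general, the sketch below labels each request with a session type using keyword rules (a stand-in for a live classifier), maps that type to a model, and escalates to a frontier model after repeated failures. The session labels echo the kinds of work named under item 03; the keyword lists, the two-failure threshold, the `failed_attempts` field and every model name are invented placeholders.

```python
from dataclasses import dataclass

# Hypothetical session types and trigger keywords (not Kilo's classifier).
SESSION_KEYWORDS = {
    "documentation": ("docstring", "readme", "document", "comment"),
    "refactoring": ("refactor", "rename", "extract", "clean up"),
    "exploratory": ("explain", "what does", "explore", "how does"),
}

# Hypothetical routing table with placeholder model names.
ROUTES = {
    "documentation": "efficient-model-a",
    "refactoring": "efficient-model-b",
    "exploratory": "efficient-model-a",
    "coding": "efficient-model-b",
    "hard": "frontier-model",
}


@dataclass
class Request:
    prompt: str
    failed_attempts: int = 0  # hypothetical signal: prior failures this session


def classify(request: Request) -> str:
    """Toy keyword classifier standing in for live session classification."""
    if request.failed_attempts >= 2:  # invented escalation threshold
        return "hard"
    text = request.prompt.lower()
    for session_type, keywords in SESSION_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return session_type
    return "coding"  # default for straightforward coding work


def route(request: Request) -> str:
    """Pick a model for the request based on its session type."""
    return ROUTES[classify(request)]


print(route(Request("Add a README section for the CLI")))     # efficient-model-a
print(route(Request("Fix the flaky test", failed_attempts=2)))  # frontier-model
```

Keeping classification and the routing table separate means the table can be updated as benchmark results change without touching the classifier.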

Auto Efficient

Get most of the frontier completion rate at a fraction of the cost.

Let Kilo's Auto Efficient tier route your tasks. Session-aware routing picks a model suited to the work, so you spend less without managing model selection by hand.