Usage & Billing

The Kilo AI Gateway tracks usage and costs for every request with microdollar precision (1 USD = 1,000,000 microdollars). This enables accurate billing even for very low-cost requests.
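As a worked example, a request that costs $0.000123 is recorded as 123 microdollars. Below is a minimal Python sketch of the conversion. It assumes amounts are stored as integers and rounded half up; the gateway's actual rounding rule is not documented here, and the function names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

MICRODOLLARS_PER_USD = 1_000_000


def usd_to_microdollars(usd: str) -> int:
    """Convert a USD amount, given as a decimal string, to integer microdollars."""
    # Decimal avoids binary floating-point error; rounding half up is an assumption.
    micro = Decimal(usd) * MICRODOLLARS_PER_USD
    return int(micro.quantize(Decimal(1), rounding=ROUND_HALF_UP))


def microdollars_to_usd(micro: int) -> Decimal:
    """Convert integer microdollars back to an exact Decimal USD amount."""
    return Decimal(micro) / MICRODOLLARS_PER_USD


print(usd_to_microdollars("0.000123"))  # 123
print(microdollars_to_usd(1_500_000))   # 1.5
```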

How billing works

Every request to the gateway follows this flow:

  1. Balance check: Before proxying the request, the gateway verifies you have sufficient balance
  2. Request execution: The request is sent to the upstream provider
  3. Usage tracking: Token counts and costs are extracted from the response
  4. Balance update: Your balance is atomically updated with the request cost

Cost calculation

Costs are determined by the upstream provider's pricing based on token usage:

  • Input tokens: Tokens in your prompt (system message, user messages, tool definitions)
  • Output tokens: Tokens generated by the model
  • Cache write tokens: Tokens written to the provider's prompt cache
  • Cache hit tokens: Tokens served from the provider's prompt cache (typically discounted)
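Prices are often quoted in USD per million tokens. Multiplying a token count by that price therefore gives the cost directly in microdollars. The sketch below assumes the four token counts are disjoint; providers differ on whether cached tokens are also counted as input. The prices are illustrative and do not represent any provider's billing terms.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP


@dataclass(frozen=True)
class Pricing:
    """Per-model prices in USD per 1M tokens (illustrative values only)."""
    input: Decimal
    output: Decimal
    cache_write: Decimal
    cache_hit: Decimal


def cost_microdollars(p: Pricing, input_tokens: int, output_tokens: int,
                      cache_write_tokens: int = 0, cache_hit_tokens: int = 0) -> int:
    # (USD / 1M tokens) * tokens * (1M microdollars / USD) == tokens * price,
    # so each term below is already in microdollars.
    # Assumes the four counts are disjoint (cached tokens not also counted as input).
    total = (input_tokens * p.input
             + output_tokens * p.output
             + cache_write_tokens * p.cache_write
             + cache_hit_tokens * p.cache_hit)
    return int(total.quantize(Decimal(1), rounding=ROUND_HALF_UP))


prices = Pricing(input=Decimal("3"), output=Decimal("15"),
                 cache_write=Decimal("3.75"), cache_hit=Decimal("0.30"))
print(cost_microdollars(prices, input_tokens=1_000, output_tokens=500,
                        cache_hit_tokens=2_000))  # 11100, i.e. $0.0111
```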

Free and BYOK requests

  • Free models: Models tagged with :free have zero cost; usage is tracked but not billed
  • BYOK requests: When you bring your own provider API key (BYOK), Kilo records the cost as $0. You pay the provider directly under your agreement with them
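For these two cases, the billing rule reduces to a short check. The sketch assumes the :free tag appears as a suffix of the model ID. The ID `vendor/model:free` and the function name are hypothetical.

```python
def billed_microdollars(model_id: str, is_byok: bool, provider_cost: int) -> int:
    """Return the amount Kilo bills for a request, given its provider-priced cost."""
    if model_id.endswith(":free"):
        return 0  # free model: usage is still recorded, but nothing is billed
    if is_byok:
        return 0  # BYOK: the provider bills you directly under your own agreement
    return provider_cost


print(billed_microdollars("vendor/model:free", is_byok=False, provider_cost=500))  # 0
print(billed_microdollars("vendor/model", is_byok=True, provider_cost=500))        # 0
print(billed_microdollars("vendor/model", is_byok=False, provider_cost=500))       # 500
```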

Balance management

Individual accounts

Your account balance is the difference between total credits purchased and total usage. Check your balance in the Kilo dashboard.

When your balance reaches zero, requests to paid models will return HTTP 402 with a link to add credits:

{
	"error": {
		"message": "Insufficient balance. Please add credits to continue.",
		"code": 402,
		"metadata": {
			"buyCreditsUrl": "https://app.kilo.ai/credits"
		}
	}
}
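A client can detect this response and show the user the top-up link. This sketch parses only the error shape shown above. The function name is hypothetical, and how you obtain the status code and body depends on your HTTP client.

```python
import json
from typing import Optional


def buy_credits_url(status: int, body: str) -> Optional[str]:
    """Return the top-up link from a 402 error body, or None for any other response."""
    if status != 402:
        return None
    error = json.loads(body).get("error", {})
    return error.get("metadata", {}).get("buyCreditsUrl")


body = ('{"error": {"message": "Insufficient balance. Please add credits to continue.", '
        '"code": 402, "metadata": {"buyCreditsUrl": "https://app.kilo.ai/credits"}}}')
print(buy_credits_url(402, body))  # https://app.kilo.ai/credits
```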

Organization accounts

Organizations have their own balance pool that members draw from. Organization billing supports:

  • Shared balance: All members use a common credit pool
  • Per-user daily limits: Cap individual member spending (e.g., $5/day per user)
  • Auto top-up: Automatically replenish credits when the balance drops below a threshold
  • Minimum balance alerts: Email notifications when the balance drops below a configured amount
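The shared pool, auto top-up, and balance alert interact roughly as follows. All field names and thresholds are hypothetical settings chosen for illustration. The order in which Kilo evaluates alerts and top-ups is also an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class OrgPool:
    balance: int            # shared pool all members draw from, in microdollars
    top_up_threshold: int   # hypothetical: refill when the balance drops below this
    top_up_amount: int      # hypothetical: credits added per automatic top-up
    alert_threshold: int    # hypothetical: notify when the balance drops below this
    alerts: list[str] = field(default_factory=list)

    def charge(self, cost: int) -> None:
        before = self.balance
        self.balance -= cost
        # Alert once, on the transition from at-or-above to below the threshold.
        if before >= self.alert_threshold > self.balance:
            self.alerts.append(f"balance below {self.alert_threshold} microdollars")
        # Auto top-up replenishes the shared pool.
        if self.balance < self.top_up_threshold:
            self.balance += self.top_up_amount


pool = OrgPool(balance=10_000_000, top_up_threshold=2_000_000,
               top_up_amount=20_000_000, alert_threshold=5_000_000)
pool.charge(6_000_000)  # drops to $4: alert fires, no top-up yet
pool.charge(3_000_000)  # drops to $1: below $2, so $20 is added
print(pool.balance, len(pool.alerts))  # 21000000 1
```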

Organization controls

Organizations can enforce policies on gateway usage for their members.

Model allow lists

Restrict which models organization members can use:

# Examples of allow list entries
anthropic/claude-sonnet-4.5      # Specific model
anthropic/*                      # All Anthropic models
openai/gpt-5.2                   # Specific OpenAI model

The allow list supports exact matches and wildcard patterns. Requests for models not on the list return HTTP 403.
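Here is a minimal sketch of the matching rule. It uses Python's `fnmatch.fnmatchcase`, so exact entries and `*` wildcards share one code path. This is an assumption about the semantics, not Kilo's matcher. Note that `fnmatch` also treats `?` and `[...]` as wildcards, which model IDs do not normally contain.

```python
from fnmatch import fnmatchcase

# Entries taken from the examples above: two exact IDs and one wildcard.
ALLOW_LIST = ["anthropic/claude-sonnet-4.5", "anthropic/*", "openai/gpt-5.2"]


def is_model_allowed(model_id: str, allow_list: list[str]) -> bool:
    # An entry without wildcards matches only the identical ID;
    # "*" matches any run of characters.
    return any(fnmatchcase(model_id, pattern) for pattern in allow_list)


def status_for(model_id: str, allow_list: list[str]) -> int:
    """Return 200 if the request may proceed, or 403 if the model is not allowed."""
    return 200 if is_model_allowed(model_id, allow_list) else 403


print(status_for("anthropic/claude-opus-4.1", ALLOW_LIST))  # 200 (matches anthropic/*)
print(status_for("openai/gpt-4o", ALLOW_LIST))              # 403
```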

Provider allow lists

Restrict which inference providers can be used for routing. The list is passed to the upstream router, which then serves requests only from the allowed providers.
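Conceptually, the provider allow list narrows the set of backends the router can choose from while preserving the router's preference order. The sketch below shows that filtering step. The provider names are hypothetical, and in practice the filtering happens in the upstream router, not in client code.

```python
def route_candidates(preferred: list[str], allowed: set[str]) -> list[str]:
    """Filter the router's provider preference list down to the allowed providers."""
    # Preserve the original preference order; only drop disallowed providers.
    candidates = [p for p in preferred if p in allowed]
    if not candidates:
        # No permitted backend can serve this model.
        raise LookupError("no allowed provider can serve this model")
    return candidates


print(route_candidates(["provider-a", "provider-b", "provider-c"], {"provider-b", "provider-c"}))
# ['provider-b', 'provider-c']
```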

Data collection controls

Organizations can set a data collection policy (allow or deny) that is applied to all requests from their members. Some free models require data collection to be allowed.

Per-user daily spending limits

Set a maximum daily spend per organization member. When a member reaches their daily limit, subsequent requests return a balance error. The daily limit resets at midnight UTC.
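The reset rule amounts to keying each member's spend by calendar date in UTC. Below is a minimal sketch. It assumes spend is recorded in microdollars after each request; `DailyLimiter` is a hypothetical name.

```python
from collections import defaultdict
from datetime import datetime, timezone


class DailyLimiter:
    """Per-member daily spend caps that reset at midnight UTC (sketch)."""

    def __init__(self, limit_microdollars: int) -> None:
        self.limit = limit_microdollars
        # Keyed by (member, UTC date): a new UTC day starts a fresh counter.
        self.spent: dict[tuple[str, str], int] = defaultdict(int)

    def _key(self, member: str, now: datetime) -> tuple[str, str]:
        return (member, now.astimezone(timezone.utc).date().isoformat())

    def allow(self, member: str, now: datetime) -> bool:
        # Once the limit is reached, further requests are refused until the next UTC day.
        return self.spent[self._key(member, now)] < self.limit

    def record(self, member: str, cost: int, now: datetime) -> None:
        self.spent[self._key(member, now)] += cost


limiter = DailyLimiter(limit_microdollars=5_000_000)  # $5/day
late = datetime(2025, 1, 1, 23, 30, tzinfo=timezone.utc)
limiter.record("alice", 5_000_000, late)
print(limiter.allow("alice", late))      # False: limit reached
next_day = datetime(2025, 1, 2, 0, 5, tzinfo=timezone.utc)
print(limiter.allow("alice", next_day))  # True: a new UTC day has started
```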

Rate limiting

Free model rate limits

All free model requests (both anonymous and authenticated) are rate-limited by IP address:

Scope                 Limit
Free models per IP    200 requests per hour

When rate-limited, you receive HTTP 429:

{
	"error": {
		"message": "Rate limit exceeded for free models. Please try again later.",
		"code": 429
	}
}

Paid model requests are not rate-limited by the gateway itself, but may be rate-limited by upstream providers. Organization per-user daily spending limits provide an additional layer of cost control.
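The limit is documented, but the enforcement algorithm is not. A sliding-window log is one way to enforce 200 requests per hour per IP. The sketch below uses that technique as an assumption, not as Kilo's implementation, and returns the HTTP status each request would receive.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class IpRateLimiter:
    """Sliding-window log: at most `limit` requests per IP within `window` seconds."""

    def __init__(self, limit: int = 200, window: float = 3600.0) -> None:
        self.limit = limit
        self.window = window
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)

    def check(self, ip: str, now: Optional[float] = None) -> int:
        """Return 200 if the request may proceed, or 429 if the IP is over its limit."""
        now = time.monotonic() if now is None else now
        q = self.hits[ip]
        # Forget requests that have slid out of the one-hour window.
        while q and q[0] <= now - self.window:
            q.popleft()
        if len(q) >= self.limit:
            return 429  # rejected requests are not counted against the window
        q.append(now)
        return 200


limiter = IpRateLimiter()
statuses = [limiter.check("203.0.113.7", now=float(t)) for t in range(201)]
print(statuses.count(200), statuses[-1])  # 200 429
```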

Usage data

Usage data is tracked per request and includes:

Field                 Description
model                 Model ID used
provider              Inference provider that served the request
input_tokens          Number of input/prompt tokens
output_tokens         Number of output/completion tokens
cache_write_tokens    Tokens written to cache
cache_hit_tokens      Tokens served from cache
cost_microdollars     Cost in microdollars (1 USD = 1,000,000)
time_to_first_token   Latency to first token (streaming only)
is_byok               Whether a BYOK key was used
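A usage record can be modeled as a plain data class. The field names follow the table above. The Python types, the `None` convention for non-streaming requests, and the `cost_usd` and `total_cost_usd` helpers are assumptions.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Iterable, Optional


@dataclass(frozen=True)
class UsageRecord:
    model: str
    provider: str
    input_tokens: int
    output_tokens: int
    cache_write_tokens: int
    cache_hit_tokens: int
    cost_microdollars: int
    time_to_first_token: Optional[float]  # assumed None when not streaming; unit not stated
    is_byok: bool

    @property
    def cost_usd(self) -> Decimal:
        # Exact conversion: 1 USD = 1,000,000 microdollars.
        return Decimal(self.cost_microdollars) / 1_000_000


def total_cost_usd(records: Iterable[UsageRecord]) -> Decimal:
    """Sum costs in integer microdollars first, converting to USD only once."""
    return Decimal(sum(r.cost_microdollars for r in records)) / 1_000_000


r = UsageRecord("anthropic/claude-sonnet-4.5", "provider-a", 1_000, 500, 0, 2_000,
                11_100, None, False)
print(r.cost_usd)  # 0.0111
```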

Token counting

Token counts are reported by the upstream provider and reflect the model's native tokenizer; the gateway does not re-tokenize content. Usage data is available:

  • Non-streaming: In the usage field of the response body
  • Streaming: In the final SSE chunk before [DONE]
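In a streaming response, the client must pick the usage object out of the SSE stream. The sketch below assumes OpenAI-compatible `data:` lines and uses `prompt_tokens` and `completion_tokens` as illustrative field names. Check the actual response format before relying on them.

```python
import json
from typing import Optional


def usage_from_body(body: str) -> Optional[dict]:
    """Non-streaming: usage is a top-level field of the JSON response body."""
    return json.loads(body).get("usage")


def usage_from_sse(stream_text: str) -> Optional[dict]:
    """Streaming: return the usage object from the last chunk before [DONE]."""
    usage = None
    for line in stream_text.splitlines():
        if not line.startswith("data:"):
            continue  # skip blank event separators and SSE comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        if chunk.get("usage"):
            usage = chunk["usage"]
    return usage


stream = (
    'data: {"choices": [{"delta": {"content": "Hi"}}]}\n\n'
    'data: {"choices": [], "usage": {"prompt_tokens": 12, "completion_tokens": 3}}\n\n'
    "data: [DONE]\n\n"
)
print(usage_from_sse(stream))  # {'prompt_tokens': 12, 'completion_tokens': 3}
```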