# Cost Tracking

## What cost tracking monitors
The cost scorer tracks token consumption and spend across all inferences, providing budget alerts before you hit limits.
| Metric | Description |
|---|---|
| Input tokens | Tokens in the prompt (messages + system) |
| Output tokens | Tokens in the model response |
| Total tokens | Input + output |
| Estimated cost | Spend calculated from provider pricing tables |
| Budget burn rate | Projected hourly/daily spend at current rate |
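The budget burn rate in the table above is a simple projection of current spend over a full hour. A minimal sketch (the function name and inputs are illustrative, not the Probe's actual API):

```python
def projected_hourly_spend(spend_so_far_usd: float, elapsed_seconds: float) -> float:
    """Project spend over a full hour from the spend observed so far."""
    if elapsed_seconds <= 0:
        return 0.0
    return spend_so_far_usd * 3600.0 / elapsed_seconds

# Example: $2.50 spent in the first 10 minutes projects to $15/hour.
print(projected_hourly_spend(2.50, 600))  # → 15.0
```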
## Token pricing
The Probe includes a built-in pricing table for common providers. Costs are calculated per inference and accumulated in the GOVERN platform.
| Provider | Model | Input (per 1M) | Output (per 1M) |
|---|---|---|---|
| Anthropic | Claude Sonnet 4 | $3.00 | $15.00 |
| Anthropic | Claude Haiku 4.5 | $0.80 | $4.00 |
| OpenAI | GPT-4o | $2.50 | $10.00 |
| OpenAI | GPT-4o mini | $0.15 | $0.60 |
| Groq | Llama 3.1 70B | $0.59 | $0.79 |
Pricing is updated monthly; `GET /api/govern/probe/policy-sync` returns the latest table.
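Per-inference cost follows directly from the pricing table: tokens times the per-1M rate for each direction. A sketch using an illustrative subset of the table (the `PRICING` dict and function are hypothetical, not part of the Probe):

```python
# Per-1M-token prices from the table above (illustrative subset).
PRICING = {
    "claude-sonnet-4": (3.00, 15.00),  # (input, output) USD per 1M tokens
    "gpt-4o-mini":     (0.15, 0.60),
}

def inference_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one inference: tokens scaled by the per-1M-token price."""
    in_price, out_price = PRICING[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# 10k input + 2k output tokens on Claude Sonnet 4:
print(inference_cost_usd("claude-sonnet-4", 10_000, 2_000))  # → 0.06
```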
## Budget configuration
Set token and spend budgets at the hourly level:
```yaml
scoring:
  cost:
    enabled: true
    budget_tokens_per_hour: 1000000   # 1M tokens/hour
    budget_spend_per_hour_usd: 15.00  # $15/hour
    alert_at_percent: 0.80            # Alert at 80% of budget
    block_at_percent: 1.00            # Block at 100% (optional)
```

When `block_at_percent` is set to 1.00 and `scoring.mode` is `block`, new inferences are rejected once the hourly budget is exhausted. The budget resets at the top of each hour.
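The alert/block thresholds amount to comparing the hourly utilization ratio against the two configured percentages. A minimal sketch of that decision logic (the names here are illustrative, not the Probe's internals):

```python
from enum import Enum

class BudgetAction(Enum):
    ALLOW = "allow"
    ALERT = "alert"
    BLOCK = "block"

def budget_check(spend_usd: float, budget_usd: float,
                 alert_at: float = 0.80, block_at: float = 1.00) -> BudgetAction:
    """Map current hourly spend against the configured thresholds."""
    utilization = spend_usd / budget_usd
    if utilization >= block_at:
        return BudgetAction.BLOCK
    if utilization >= alert_at:
        return BudgetAction.ALERT
    return BudgetAction.ALLOW

print(budget_check(12.00, 15.00))  # 80% of a $15 budget → BudgetAction.ALERT
```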
## Cost score interpretation
The cost score is a budget utilization ratio, not a quality score:
| Score | Meaning |
|---|---|
| 0.00 – 0.50 | Under 50% of hourly budget used |
| 0.50 – 0.80 | 50–80% of budget used (normal) |
| 0.80 – 1.00 | 80–100% of budget used (alert threshold) |
| > 1.00 | Budget exceeded |
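The score bands above can be sketched as a small interpreter over the utilization ratio (a hypothetical helper, assuming a token-denominated budget):

```python
def cost_score(tokens_used: int, budget_tokens: int) -> float:
    """Budget utilization ratio; exceeds 1.0 once the budget is blown."""
    return tokens_used / budget_tokens

def interpret(score: float) -> str:
    """Translate a cost score into the bands from the table above."""
    if score > 1.0:
        return "budget exceeded"
    if score >= 0.80:
        return "alert threshold"
    if score >= 0.50:
        return "normal"
    return "under 50% of budget"

print(interpret(cost_score(900_000, 1_000_000)))  # → alert threshold
```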
## Per-model breakdown
Cost data in the GOVERN dashboard is broken down by model, enabling you to see which models are driving spend:
```
claude-sonnet-4:  $12.40 (68%)
claude-haiku-4.5: $4.20  (23%)
gpt-4o:           $1.60  (9%)
```

## Anomaly detection
The cost scorer also detects unusual spending patterns:
- Spike detection — a single inference using 10x the normal token count
- Model substitution — sudden shift to a more expensive model
- Runaway loops — very high inference frequency in a short window
These are reported as `cost.anomaly` events in the GOVERN platform.
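Spike detection, the first pattern above, reduces to comparing an inference's token count against a multiple of the recent average. A minimal sketch under that assumption (the 10x factor comes from the example above; the function itself is illustrative):

```python
from statistics import mean

def is_token_spike(history: list[int], current: int, factor: float = 10.0) -> bool:
    """Flag an inference whose token count is `factor`x the recent average."""
    if not history:
        return False  # no baseline yet, nothing to compare against
    return current >= factor * mean(history)

recent = [1_200, 900, 1_100, 1_000]          # recent per-inference token counts
print(is_token_spike(recent, 12_000))        # ≥10x the ~1,050 average → True
print(is_token_spike(recent, 2_000))         # within normal range → False
```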
## Environment variable configuration
```shell
SCORING_COST_ENABLED=true
SCORING_COST_BUDGET_TOKENS=1000000
SCORING_COST_BUDGET_USD=15.00
SCORING_COST_ALERT_PERCENT=0.80
```
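A sketch of how these variables might be read into a config dict; the variable names match the list above, but the parsing logic and fallback values are illustrative assumptions, not the Probe's actual loader:

```python
import os

def load_cost_config(env=os.environ) -> dict:
    """Parse the cost-scorer environment variables; fallback values are illustrative."""
    return {
        "enabled": env.get("SCORING_COST_ENABLED", "false").lower() == "true",
        "budget_tokens": int(env.get("SCORING_COST_BUDGET_TOKENS", "1000000")),
        "budget_usd": float(env.get("SCORING_COST_BUDGET_USD", "15.00")),
        "alert_percent": float(env.get("SCORING_COST_ALERT_PERCENT", "0.80")),
    }

cfg = load_cost_config({"SCORING_COST_ENABLED": "true"})
print(cfg["enabled"], cfg["budget_tokens"])  # → True 1000000
```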