- benchmarks routing cost-optimization
We ran 200 questions per model. Here is what we found.
Routerly routing policies matched Claude Sonnet 4.6 accuracy on MMLU and HumanEval while cutting costs by up to 69%.
Carlo Satta · 5 min read

- benchmarks routing cost-optimization
LLM routing policies work: what three benchmarks confirm
Three benchmarks validate LLM-based routing policies. Cost savings are confirmed on all tasks; the right success metric depends on the use case.
Carlo Satta · 7 min read

- benchmarks routing performance
Measuring Routerly: MMLU, HumanEval, and BIRD Benchmarks
We published routerly-benchmark, an open suite that measures the impact of intelligent routing on quality, cost, and latency across three standard AI evaluation tasks. Here is how it works and what we found.
Carlo Satta · 4 min read