Measuring Routerly: MMLU, HumanEval, and BIRD Benchmarks
We published routerly-benchmark, an open suite that measures the impact of intelligent routing on quality, cost, and latency across three standard AI evaluation tasks: MMLU, HumanEval, and BIRD. Here is how it works and what we found.
Carlo Satta · 4 min read