Introducing Routerly: One Gateway for Every AI Model
After months of internal use, we are open-sourcing Routerly: a self-hosted LLM gateway that sits between your application and any AI provider. Change one environment variable. Nothing else in your code changes.
The problem
Sooner or later, every team using LLMs runs into the same cluster of problems:
- Which provider should serve this request? GPT-4o is expensive; gpt-4.1-nano is cheap but less capable. Claude Sonnet handles some tasks better. Gemini Flash is fast. Manually routing each request gets messy fast.
- You are paying for tokens at a cloud provider’s prices with no visibility into exactly what each service or tenant is spending.
- One provider goes down. Your application returns 500s until someone manually updates the config.
- A new developer needs API access. You hand them a production key. Now you have a problem.
Routerly is designed to solve all four at once.
What Routerly does
Routerly is a drop-in, OpenAI- and Anthropic-compatible proxy. You point base_url at it, use a project token as the API key, and every request passes through a routing engine before it reaches a provider.
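As a sketch of what "drop-in" means in practice, the request below is a standard OpenAI-style chat completion body aimed at a local gateway. The gateway URL, project token, and model name are illustrative placeholders, not values shipped with Routerly; only the base URL and the API key differ from talking to OpenAI directly.

```python
import json
import urllib.request

# Illustrative placeholders: a local gateway and a project token.
GATEWAY_URL = "http://localhost:3000/v1/chat/completions"
PROJECT_TOKEN = "rly_example_project_token"

def build_chat_request(prompt: str, model: str = "gpt-4o") -> urllib.request.Request:
    """Build an OpenAI-compatible request aimed at the gateway.
    The payload shape is unchanged from calling the provider directly."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            # A project token stands in for the provider API key.
            "Authorization": f"Bearer {PROJECT_TOKEN}",
        },
    )

req = build_chat_request("Summarise this ticket in one line.")
# Actually sending it requires a running gateway:
#   resp = urllib.request.urlopen(req)
```

The same swap works with the official OpenAI SDKs by setting their base_url option to the gateway address.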
The routing engine scores candidate models across up to nine pluggable policies:
| Policy | What it does |
|---|---|
| llm | Asks a language model to pick the best candidate given the request context |
| cheapest | Minimises cost per token |
| health | Deprioritises models with recent errors |
| performance | Favours lower average latency |
| capability | Matches models to task requirements (vision, tools, JSON mode) |
| context | Filters by context window size relative to the prompt |
| budget-remaining | Excludes models that would push a project over its budget |
| rate-limit | Steers away from rate-limited providers |
| fairness | Balances load across candidates |
Policies are combined into a final score. The top candidate wins. If it fails, Routerly falls back to the next one automatically.
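To make the score-and-fallback idea concrete, here is a minimal sketch in Python. The policy names mirror the table above, but the weights, scoring formulas, and model data are invented for illustration; they are not Routerly's actual implementation.

```python
# Two toy policies: each maps a candidate model to a score in [0, 1].

def cheapest(model: dict) -> float:
    # Lower cost per million tokens -> higher score.
    return 1.0 / (1.0 + model["cost_per_mtok"])

def health(model: dict) -> float:
    # Deprioritise models that errored recently.
    return 0.0 if model["recent_errors"] else 1.0

POLICIES = {cheapest: 0.6, health: 0.4}  # hypothetical weights

def rank(candidates: list) -> list:
    """Combine policy scores and order candidates best-first.
    The caller tries them in turn, which gives automatic fallback
    when the top candidate fails."""
    def score(m):
        return sum(w * policy(m) for policy, w in POLICIES.items())
    return sorted(candidates, key=score, reverse=True)

models = [
    {"name": "gpt-4o", "cost_per_mtok": 10.0, "recent_errors": False},
    {"name": "gpt-4.1-nano", "cost_per_mtok": 0.4, "recent_errors": False},
    {"name": "claude-sonnet", "cost_per_mtok": 15.0, "recent_errors": True},
]
print([m["name"] for m in rank(models)])
# -> ['gpt-4.1-nano', 'gpt-4o', 'claude-sonnet']
```

The cheap, healthy model wins; the model with recent errors sinks to the bottom and is only tried last.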
Zero infrastructure
Most comparable tools require a database. LiteLLM needs SQLite or PostgreSQL. Routerly stores everything in plain JSON files under ~/.routerly/. No migrations, no schemas, no connection strings to manage.
~/.routerly/
├── config/
│   ├── settings.json
│   ├── models.json     # AES-256 encrypted API keys
│   ├── projects.json   # encrypted project tokens
│   ├── users.json
│   └── roles.json
└── data/
    └── usage.json
Multi-tenant from day one
Each project has its own Bearer token, its own model pool, its own routing configuration, and its own budget envelope. You can run dev, staging, and production traffic through the same gateway without mixing credentials or costs.
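In code, the per-project isolation reduces to one thing changing per environment: the credential. The token values below are invented placeholders; everything else about the request stays identical.

```python
# Invented placeholder tokens -- one gateway, three projects.
PROJECT_TOKENS = {
    "dev": "rly_dev_example",
    "staging": "rly_staging_example",
    "production": "rly_prod_example",
}

def auth_header(env: str) -> dict:
    """Headers for a request billed against, and routed under,
    the given project's budget, model pool, and routing config."""
    return {"Authorization": f"Bearer {PROJECT_TOKENS[env]}"}

# Same application code, different project envelope:
assert auth_header("dev") != auth_header("production")
```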
Supported providers
OpenAI, Anthropic, Google Gemini, Ollama (local), Mistral, Cohere, xAI (Grok), and any custom HTTP endpoint. Mix cloud and local models in the same project.
Native Anthropic support
Most gateways translate everything to the OpenAI format. If you use the Anthropic SDK directly, that creates subtle incompatibilities: top_k, extended thinking, and other Anthropic-specific parameters silently break. Routerly handles /v1/messages natively, without translation.
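To illustrate what survives native handling, here is a standard Anthropic Messages API body carrying top_k, a parameter with no OpenAI equivalent that a translation layer would typically drop. The model alias is an assumed gateway-registered name, and the URL in the comment is the local default from the quick start below.

```python
import json

# Standard Anthropic /v1/messages body. top_k has no OpenAI
# equivalent, so it only survives a gateway that proxies the
# Messages API natively rather than translating it.
payload = {
    "model": "claude-sonnet",  # assumed alias registered in the gateway
    "max_tokens": 256,
    "top_k": 40,
    "messages": [
        {"role": "user", "content": "Name three LLM routing policies."}
    ],
}
body = json.dumps(payload)
# POST this body to http://localhost:3000/v1/messages, authenticating
# with a project token just as you would with an Anthropic API key.
```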
Quick start
# macOS / Linux
curl -fsSL https://www.routerly.ai/install.sh | bash

# Or with Docker
docker run -d \
  --name routerly \
  -p 3000:3000 \
  -v routerly_data:/data \
  -e ROUTERLY_HOME=/data \
  --restart unless-stopped \
  inebrio/routerly:latest
After install, open http://localhost:3000/dashboard to register your first model and create a project.
Open source, AGPL-3.0
Routerly is free to self-host under AGPL-3.0. Your prompts and API keys never leave your infrastructure. We plan to offer a commercial license for teams that cannot comply with AGPL copyleft terms.
The source is on GitHub at github.com/Inebrio/Routerly. Issues, PRs, and feedback are welcome.
Sources
- Routerly GitHub repository: github.com/Inebrio/Routerly
- Documentation: docs.routerly.ai