Introducing Routerly: One Gateway for Every AI Model

Carlo Satta · 4 min read

After months of internal use, we are open-sourcing Routerly: a self-hosted LLM gateway that sits between your application and any AI provider. Change one environment variable. Nothing else in your code changes.

The problem

Every team using LLMs eventually runs into the same cluster of problems:

  • Which provider should serve this request? GPT-4o is expensive; gpt-4.1-nano is cheap but less capable. Claude Sonnet handles some tasks better. Gemini Flash is fast. Manually routing each request gets messy fast.
  • You are paying for tokens at a cloud provider’s prices with no visibility into exactly what each service or tenant is spending.
  • One provider goes down. Your application returns 500s until someone manually updates the config.
  • A new developer needs API access. You hand them a production key. Now you have a problem.

Routerly is designed to solve all four at once.

What Routerly does

Routerly is a drop-in OpenAI- and Anthropic-compatible proxy. You point base_url at it, use a project token as the API key, and every request goes through a routing engine before it reaches a provider.
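To make the "change one environment variable" claim concrete, here is a minimal sketch of the request an OpenAI-compatible client would send once base_url points at the gateway. The gateway address (localhost:3000) and the project token are assumptions for illustration; /v1/chat/completions is the standard OpenAI chat path.

```python
# Sketch: an OpenAI-style request aimed at the gateway instead of the provider.
# GATEWAY_BASE_URL and PROJECT_TOKEN are assumed values, not Routerly defaults.
GATEWAY_BASE_URL = "http://localhost:3000/v1"      # assumed gateway address
PROJECT_TOKEN = "rly-example-project-token"        # hypothetical project token

def build_chat_request(model: str, prompt: str) -> dict:
    """Assemble the HTTP request an OpenAI-compatible client would send."""
    return {
        "url": f"{GATEWAY_BASE_URL}/chat/completions",
        "headers": {
            "Authorization": f"Bearer {PROJECT_TOKEN}",
            "Content-Type": "application/json",
        },
        "json": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    }

req = build_chat_request("gpt-4o", "Summarise this ticket.")
print(req["url"])  # → http://localhost:3000/v1/chat/completions
```

Nothing in the payload changes; only the URL and the credential do, which is why existing client code keeps working.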

The routing engine scores candidate models across up to 9 pluggable policies:

Policy             What it does
llm                Asks a language model to pick the best candidate given the request context
cheapest           Minimises cost per token
health             Deprioritises models with recent errors
performance        Favours lower average latency
capability         Matches models to task requirements (vision, tools, JSON mode)
context            Filters by context window size relative to the prompt
budget-remaining   Excludes models that would push a project over its budget
rate-limit         Steers away from rate-limited providers
fairness           Balances load across candidates

Policies are combined into a final score. The top candidate wins. If it fails, Routerly falls back to the next one automatically.
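A weighted-sum combination with ordered fallback can be sketched as follows. The policy names mirror the table above, but the weights, the [0, 1] scoring scale, and the model fields are illustrative assumptions, not Routerly's actual internals.

```python
# Sketch: combine per-policy scores into one ranking, then try candidates
# in order with automatic fallback. All weights and scores are illustrative.
def rank_candidates(models, policies, weights):
    """Sort models by the weighted sum of their policy scores, highest first."""
    def total(model):
        return sum(weights[name] * policy(model) for name, policy in policies.items())
    return sorted(models, key=total, reverse=True)

def route_with_fallback(models, send):
    """Try candidates in ranked order; on failure, fall back to the next one."""
    for model in models:
        try:
            return send(model)
        except RuntimeError:
            continue  # provider error: move on to the next candidate
    raise RuntimeError("all candidates failed")

# Two toy policies scoring in [0, 1]:
policies = {
    "cheapest": lambda m: 1.0 - m["cost"],              # cheaper is better
    "health":   lambda m: 0.0 if m["recent_errors"] else 1.0,
}
weights = {"cheapest": 0.5, "health": 0.5}

models = [
    {"name": "gpt-4o",       "cost": 0.9, "recent_errors": False},
    {"name": "gpt-4.1-nano", "cost": 0.1, "recent_errors": False},
    {"name": "flaky-model",  "cost": 0.0, "recent_errors": True},
]
ranked = rank_candidates(models, policies, weights)
print([m["name"] for m in ranked])  # → ['gpt-4.1-nano', 'gpt-4o', 'flaky-model']
```

The healthy, cheap model wins; the cheapest-but-unhealthy one sinks to the bottom, and route_with_fallback skips past it if it errors at request time.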

Zero infrastructure

Most comparable tools require a database. LiteLLM needs SQLite or PostgreSQL. Routerly stores everything in plain JSON files under ~/.routerly/. No migrations, no schemas, no connection strings to manage.

~/.routerly/
├── config/
│   ├── settings.json
│   ├── models.json       # AES-256 encrypted API keys
│   ├── projects.json     # encrypted project tokens
│   ├── users.json
│   └── roles.json
└── data/
    └── usage.json

Multi-tenant from day one

Each project has its own Bearer token, its own model pool, its own routing configuration, and its own budget envelope. You can run dev, staging, and production traffic through the same gateway without mixing credentials or costs.

Supported providers

OpenAI, Anthropic, Google Gemini, Ollama (local), Mistral, Cohere, xAI (Grok), and any custom HTTP endpoint. Mix cloud and local models in the same project.

Native Anthropic support

Most gateways translate everything to the OpenAI format. If you use the Anthropic SDK directly, that creates subtle incompatibilities: top_k, extended thinking, and other Anthropic-specific parameters silently break. Routerly handles /v1/messages natively, without translation.
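For comparison with the OpenAI path, here is a sketch of an Anthropic-style request going through the gateway with top_k preserved. The gateway address, the token, and the use of the x-api-key header for project-token auth are assumptions; /v1/messages and the anthropic-version header are the standard Anthropic Messages API shape.

```python
# Sketch: a native /v1/messages request with Anthropic-specific parameters
# passed through untouched. Address and auth scheme are assumed, not documented.
GATEWAY = "http://localhost:3000"              # assumed gateway address
PROJECT_TOKEN = "rly-example-project-token"    # hypothetical project token

def build_messages_request(model: str, prompt: str, top_k: int) -> dict:
    """Assemble a native Anthropic /v1/messages request, with no
    translation into the OpenAI format."""
    return {
        "url": f"{GATEWAY}/v1/messages",
        "headers": {
            "x-api-key": PROJECT_TOKEN,            # assumed auth scheme
            "anthropic-version": "2023-06-01",
            "Content-Type": "application/json",
        },
        "json": {
            "model": model,
            "max_tokens": 256,
            "top_k": top_k,   # Anthropic-specific parameter, kept as-is
            "messages": [{"role": "user", "content": prompt}],
        },
    }
```

Under a translating gateway, a field like top_k has no OpenAI equivalent and would be dropped; handling /v1/messages natively means it reaches the provider unchanged.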

Quick start

# macOS / Linux
curl -fsSL https://www.routerly.ai/install.sh | bash

# Or with Docker
docker run -d \
  --name routerly \
  -p 3000:3000 \
  -v routerly_data:/data \
  -e ROUTERLY_HOME=/data \
  --restart unless-stopped \
  inebrio/routerly:latest

After install, open http://localhost:3000/dashboard to register your first model and create a project.

Open source, AGPL-3.0

Routerly is free to self-host under AGPL-3.0. Your prompts and API keys never leave your infrastructure. We plan to offer a commercial license for teams that cannot comply with AGPL copyleft terms.

The source is on GitHub at github.com/Inebrio/Routerly. Issues, PRs, and feedback are welcome.
