Multi-provider model gateway
A gateway in front of multiple model providers - routing on cost, quality and latency, with caching, failover and one place to observe it all.
As soon as you depend on more than one model or provider, you need a control point. The gateway centralises routing, caching, rate limiting, key management and observability - and gives you provider failover instead of a single point of failure.
Use this when
- You depend on more than one model or provider
- You need failover or cost/quality/latency routing
- You want one place to observe and cap spend
Reach for something else when
- A single provider and model fully cover your needs
- The extra network hop’s latency is unacceptable
- You have no routing or fallback policy yet
What's in the box.
Gateway / proxy
Single entry point: caching, rate limiting and request normalisation.
Router
Chooses a provider/model per request on cost, quality or latency rules.
Provider adapters
Normalise each provider’s API behind one interface.
Fallback logic
Retries on another provider when one errors or times out.
Cache
Serves repeated requests without hitting a provider.
Key management
Centralises and rotates provider keys; never exposes them to clients.
Unified observability
One trace, cost and latency view across all providers.
Where it breaks - and the fix.
What good looks like, measured.
- Provider availabilityHealth that drives failover.
- Cache hit rateRequests served without a provider call.
- Cross-provider eval parityQuality consistency across providers.
- Cost per routeWhether routing is actually saving money.
- Fallback rateHow often the primary provider fails.
Don't build everything on day one.
Ship the MVP column to get to users; the production column is what makes it durable. Choose deliberately which gaps you're leaving.
| Aspect | MVP | Production-grade |
|---|---|---|
| Routing | Static default | Cost/quality/latency-aware |
| Failover | None | Cross-provider with health checks |
| Cache | None | Context-keyed with TTLs |
| Keys | In the app | Server-side, rotated |
| Observability | Per provider | Unified traces + cost |
Instrument it in minutes.
A starting point you can paste into your tracing and eval setup - then adapt to your stack.
{
"request_id": "req_5012",
"architecture": "multi-provider-gateway",
"route": "cost",
"provider_chosen": "anthropic",
"model": "haiku-4.5",
"fallback_used": false,
"cache_hit": true,
"input_tokens": 0,
"output_tokens": 0,
"latency_ms": 38,
"cost_usd": 0
} {
"input": "Summarize this ticket in one sentence",
"expected_behavior": "Produce a consistent one-sentence summary regardless of provider",
"must_include": [
"single sentence"
],
"must_not_include": [
"provider-specific formatting artifacts"
],
"risk_category": "cross_provider_consistency"
} Ship-ready when…
- Cross-provider failover is configured and tested
- The eval set runs per provider/model
- Caching keys on full context with sensible TTLs
- Provider keys stay server-side and are rotated
- Routing is budget-aware with per-team cost alerts
- Tracing and cost are unified across providers