← All reference architectures
Reference architecture

Multi-provider model gateway

A gateway in front of multiple model providers - routing on cost, quality and latency, with caching, failover and one place to observe it all.

01 Architecture

As soon as you depend on more than one model or provider, you need a control point. The gateway centralises routing, caching, rate limiting, key management and observability - and gives you provider failover instead of a single point of failure.

02 When to use it

Use this when

  • You depend on more than one model or provider
  • You need failover or cost/quality/latency routing
  • You want one place to observe and cap spend

Reach for something else when

  • A single provider and model fully cover your needs
  • The extra network hop’s latency is unacceptable
  • You have no routing or fallback policy yet
03 Components

What's in the box.

Gateway / proxy

Single entry point: caching, rate limiting and request normalisation.

Router

Chooses a provider/model per request on cost, quality or latency rules.

Provider adapters

Normalise each provider’s API behind one interface.

Fallback logic

Retries on another provider when one errors or times out.

Cache

Serves repeated requests without hitting a provider.

Key management

Centralises and rotates provider keys; never exposes them to clients.

Unified observability

One trace, cost and latency view across all providers.

04 Failure modes

Where it breaks - and the fix.

Provider outage with no fallback
Configure cross-provider failover and health checks.
Inconsistent outputs across providers
Run the eval set per provider/model; pin defaults per use case.
Cache returning wrong/stale results
Key cache on full request context; set sensible TTLs; bypass for volatile inputs.
Key leakage
Keep keys server-side in the gateway; rotate regularly.
Cost blowout from misrouting
Budget-aware routing and per-team alerts; cap expensive models.
05 Metrics to monitor

What good looks like, measured.

  • Provider availability
    Health that drives failover.
  • Cache hit rate
    Requests served without a provider call.
  • Cross-provider eval parity
    Quality consistency across providers.
  • Cost per route
    Whether routing is actually saving money.
  • Fallback rate
    How often the primary provider fails.
06 MVP vs production-grade

Don't build everything on day one.

Ship the MVP column to get to users; the production column is what makes it durable. Choose deliberately which gaps you're leaving.

Aspect MVP Production-grade
Routing Static default Cost/quality/latency-aware
Failover None Cross-provider with health checks
Cache None Context-keyed with TTLs
Keys In the app Server-side, rotated
Observability Per provider Unified traces + cost
07 Copy-paste schemas

Instrument it in minutes.

A starting point you can paste into your tracing and eval setup - then adapt to your stack.

Example trace schema
{
  "request_id": "req_5012",
  "architecture": "multi-provider-gateway",
  "route": "cost",
  "provider_chosen": "anthropic",
  "model": "haiku-4.5",
  "fallback_used": false,
  "cache_hit": true,
  "input_tokens": 0,
  "output_tokens": 0,
  "latency_ms": 38,
  "cost_usd": 0
}
Example eval dataset row
{
  "input": "Summarize this ticket in one sentence",
  "expected_behavior": "Produce a consistent one-sentence summary regardless of provider",
  "must_include": [
    "single sentence"
  ],
  "must_not_include": [
    "provider-specific formatting artifacts"
  ],
  "risk_category": "cross_provider_consistency"
}
08 Checklist

Ship-ready when…

  • Cross-provider failover is configured and tested
  • The eval set runs per provider/model
  • Caching keys on full context with sensible TTLs
  • Provider keys stay server-side and are rotated
  • Routing is budget-aware with per-team cost alerts
  • Tracing and cost are unified across providers
Full production checklist Score your maturity
09 Related
Stack layers
Deep dives