Lead magnet · cost-free

The LLMOps Production Readiness Checklist

50 things to verify before you put an LLM in front of real users. Switch to your team's profile - from minimum viable to regulated - tick what's already true (saved in this browser), and export the result to Markdown, your team inbox or PDF.

Download the PDF ↓ Get a maturity score instead →

Production readiness 0 / 0 ready

Prompt management

We can roll back a prompt, model or config in one step

Every prompt and config change is versioned

Prompt changes are tested against an eval set before production

System prompts are stored outside application code

We can attribute a quality change to a specific prompt version

Prompt templates are reviewed like code (PR + approval)

Evaluation

We have an eval dataset that reflects real traffic

We measure accuracy or task success, not just vibes

Evals run automatically on every prompt or model change

We track hallucination / faithfulness rate

We test safety and refusal behaviour

Releases are gated on eval results

Eval failures page or block the deploy

Observability

We log every request and response

We capture full traces for multi-step chains and agents

We measure p50 / p95 latency in production

We track token usage per request and per feature

We alert on error-rate and latency regressions

We can search and inspect individual production traces

We can replay a production request to reproduce a failure

Cost control

We have budget alerts on token spend

We know our cost per request and per active user

Prompt / context caching is enabled where it helps

We cap max tokens and context size per request

Easy traffic is routed to smaller, cheaper models

Spend is attributable to a team, feature or customer

RAG operations

We measure retrieval quality, not just final answers

We have a fallback when retrieval returns nothing relevant

Chunking and embedding choices are evaluated, not assumed

The index is refreshed on a known schedule

We detect and handle stale or missing context

We monitor embedding / data drift over time

Security

We treat all user and retrieved input as untrusted

We have guardrails for PII and data leakage

Secrets and keys are never exposed to the model context

We test for prompt injection and jailbreaks

Tool and data access is scoped with least privilege

Outputs are validated before they trigger actions

Governance

A human reviews critical or high-risk use cases

We have an incident process for model failures

High-risk actions require an explicit approval step

We keep an audit trail of inputs, outputs and decisions

We can explain and reproduce a given production decision

Data retention and privacy policy is defined and enforced

Deployment

We can revert a release without a code redeploy

LLM changes go through CI before production

We have a staging environment that mirrors production

On-call knows how to triage an LLM incident

Changes roll out progressively (canary / percentage)

Model and provider fallbacks are configured

Checklist v2026.1. Progress is stored locally in your browser - nothing is uploaded. Profiles scope the list to your team; export mirrors exactly what you see.