LLMOps Checklist: From prototype to production
The narrative companion to the production-readiness checklist - the two questions that separate a demo from a system, walked across all eight layers of the LLMOps stack.
This is the narrative companion to the interactive Production Readiness Checklist. A demo proves the model can do the task; production asks a harder set of questions across every layer of the stack. This walks through them.
The two questions, per layer
Each layer is defined by the gap between a prototype question and a production one. If you can answer the right-hand column for each, you’re ready; where you can’t, you at least know what you’re choosing to ship thin.
| Layer | Prototype question | Production question |
|---|---|---|
| Prompt management | Is the prompt good? | Is it versioned, tested and rollbackable? |
| Evaluation | Correct in a demo? | Does it pass evals on real cases? |
| Observability | Did it work when I tried it? | Can I see why a request failed? |
| Cost control | Can we afford the model? | Cost per user, feature and month? |
| RAG operations | Does RAG work today? | Right context retrieved, index fresh? |
| Security | Safe when I test it? | What’s the blast radius under attack? |
| Governance | Good answer? | Who approved it - can you prove it? |
| Deployment | Runs on my machine? | Ship safely, undo in one step? |
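
To make one row concrete, the cost-control production question ("Cost per user, feature and month?") can be answered directly from request logs. The following is a minimal sketch, not the checklist's own method: the `RequestLog` fields, the `cost_by` helper and the per-token prices are all hypothetical and should be replaced with your own schema and your provider's real rates.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical prices in USD per 1,000 tokens; substitute your provider's real rates.
PRICE_PER_1K_INPUT = 0.50
PRICE_PER_1K_OUTPUT = 1.50


@dataclass(frozen=True)
class RequestLog:
    """One logged model call (illustrative schema, not a standard)."""
    user_id: str
    feature: str
    month: str  # for example "2025-01"
    input_tokens: int
    output_tokens: int


def request_cost(log: RequestLog) -> float:
    """Cost of a single request under the hypothetical prices above."""
    return (
        (log.input_tokens / 1000) * PRICE_PER_1K_INPUT
        + (log.output_tokens / 1000) * PRICE_PER_1K_OUTPUT
    )


def cost_by(logs: list[RequestLog], key: str) -> dict[str, float]:
    """Sum request cost grouped by one field: 'user_id', 'feature' or 'month'."""
    totals: dict[str, float] = defaultdict(float)
    for log in logs:
        totals[getattr(log, key)] += request_cost(log)
    return dict(totals)
```

Calling `cost_by(logs, "user_id")`, `cost_by(logs, "feature")` and `cost_by(logs, "month")` on the same logs gives the three views the production question asks for. Without the token counts in the logs, none of them can be computed.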
The non-negotiables
You do not need every box ticked to ship, but do these three things first:
- Log every request and response - without it you are blind.
- Build an eval set and gate changes on it - without it you cannot improve safely.
- Make rollback one step - without it every change is a gamble.
These three cut across the table above and unlock everything else: you can’t evaluate or attribute cost without logs, and you can’t ship confidently without a way back.
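The three non-negotiables above can be sketched together in a few lines. This is an illustrative outline under stated assumptions, not a reference implementation: `PromptRegistry`, `log_call`, `pass_rate` and `gated_promote` are hypothetical names, `model_fn` stands in for whatever function calls your model, the substring check is a deliberately crude pass criterion, and the 0.9 threshold is arbitrary.

```python
import json
import time


class PromptRegistry:
    """Keeps every prompt version so that rollback is a single call."""

    def __init__(self, initial):
        self._versions = [initial]
        self._active = 0

    @property
    def active(self):
        return self._versions[self._active]

    def promote(self, candidate):
        """Store a new version and make it the active one."""
        self._versions.append(candidate)
        self._active = len(self._versions) - 1

    def rollback(self):
        """Step back to the previously stored version, if there is one."""
        if self._active > 0:
            self._active -= 1
        return self.active


def log_call(log_file, prompt, request, response):
    """Append one request/response pair as a JSON line."""
    record = {"ts": time.time(), "prompt": prompt,
              "request": request, "response": response}
    log_file.write(json.dumps(record) + "\n")


def pass_rate(model_fn, prompt, eval_set):
    """Fraction of eval cases whose expected text appears in the output."""
    passed = sum(
        1 for case in eval_set
        if case["expected"] in model_fn(prompt, case["input"])
    )
    return passed / len(eval_set)


def gated_promote(registry, candidate, model_fn, eval_set, threshold=0.9):
    """Promote the candidate prompt only if it clears the eval threshold."""
    if pass_rate(model_fn, candidate, eval_set) >= threshold:
        registry.promote(candidate)
        return True
    return False
```

A change that fails the eval set never becomes active, and a change that passes but misbehaves in production is undone with one `registry.rollback()` call. The logged JSON lines are also the natural source of real cases for growing the eval set.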
Scope it to your team
Not every team needs the same bar. The checklist ships four profiles (Minimum viable, Startup, Enterprise and Regulated) that filter the 50 checks to what fits your stage. A two-person team shipping its first feature and a bank with auditors are answering different versions of the same questions.
Measure where you stand
Take the Maturity Score for a weighted 0–100 read across the stack, then work the checklist at your profile to close the gaps - deliberately, one layer at a time.
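For intuition about what a weighted 0–100 read can look like, here is one possible formulation. The actual weights and formula behind the Maturity Score are not given here, so everything in this sketch is an assumption: the weights are made up, and each layer's input is taken to be the fraction of its checks that pass.

```python
# Hypothetical weights per layer; the real Maturity Score may weight differently.
HYPOTHETICAL_WEIGHTS = {
    "Prompt management": 1,
    "Evaluation": 2,
    "Observability": 2,
    "Cost control": 1,
    "RAG operations": 1,
    "Security": 1,
    "Governance": 1,
    "Deployment": 1,
}


def maturity_score(passed_fraction, weights=HYPOTHETICAL_WEIGHTS):
    """Weighted average of per-layer pass fractions, scaled to 0-100.

    passed_fraction maps a layer name to the share of its checks that pass
    (0.0 to 1.0). Layers missing from the mapping count as 0.0.
    """
    total_weight = sum(weights.values())
    weighted = sum(
        weight * passed_fraction.get(layer, 0.0)
        for layer, weight in weights.items()
    )
    return round(100 * weighted / total_weight)
```

Under this formulation a layer with a higher weight moves the score more, so closing gaps in the heavily weighted layers first gives the largest improvement per check.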