Deployment

LLMOps Checklist: From prototype to production

The narrative companion to the production-readiness checklist - the two questions that separate a demo from a system, walked across all eight layers of the LLMOps stack.

May 27, 2026 · updated June 8, 2026 · 7 min read · checklist · production

This is the narrative companion to the interactive Production Readiness Checklist. A demo proves the model can do the task; production asks a harder set of questions across every layer of the stack. This walks through them.

The two questions, per layer

Each layer has a gap between a prototype question and a production one. If you can answer the production question for every layer, you’re ready; where you can’t, you at least know what you’re choosing to ship thin.

| Layer | Prototype question | Production question |
| --- | --- | --- |
| Prompt management | Is the prompt good? | Is it versioned, tested and rollbackable? |
| Evaluation | Correct in a demo? | Does it pass evals on real cases? |
| Observability | Did it work when I tried it? | Can I see why a request failed? |
| Cost control | Can we afford the model? | Cost per user, feature and month? |
| RAG operations | Does RAG work today? | Right context retrieved, index fresh? |
| Security | Safe when I test it? | What’s the blast radius under attack? |
| Governance | Good answer? | Who approved it - can you prove it? |
| Deployment | Runs on my machine? | Ship safely, undo in one step? |

The non-negotiables

You do not need every box ticked to ship, but if you do only three things first, do these:

  1. Log every request and response - without it you are blind.
  2. Build an eval set and gate changes on it - without it you cannot improve safely.
  3. Make rollback one step - without it every change is a gamble.
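Items 2 and 3 can be sketched together. The following is a minimal illustration, not the checklist's own tooling: `EvalCase`, `pass_rate`, `Release`, the exact-match scoring and the 0.9 threshold are all hypothetical choices made for the example. A candidate version is promoted only if it clears the eval set, and rollback is a single call.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class EvalCase:
    prompt_input: str
    expected: str


def pass_rate(candidate: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Fraction of eval cases whose output matches the expected string exactly."""
    if not cases:
        raise ValueError("eval set is empty; refusing to gate on nothing")
    passed = sum(1 for c in cases if candidate(c.prompt_input).strip() == c.expected)
    return passed / len(cases)


@dataclass
class Release:
    """Keeps the promotion history so the previous version is one step away."""
    history: List[str] = field(default_factory=list)

    @property
    def live(self) -> Optional[str]:
        return self.history[-1] if self.history else None

    def promote(
        self,
        version: str,
        candidate: Callable[[str], str],
        cases: List[EvalCase],
        threshold: float = 0.9,  # assumed bar; pick one that fits your risk
    ) -> bool:
        """Promote only if the candidate clears the eval gate."""
        if pass_rate(candidate, cases) < threshold:
            return False
        self.history.append(version)
        return True

    def rollback(self) -> Optional[str]:
        """Return to the previous version; a no-op if there is none."""
        if len(self.history) > 1:
            self.history.pop()
        return self.live
```

Exact-match scoring is the simplest possible grader and is rarely enough for free-form output; the point of the sketch is the shape of the gate, not the scoring rule.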

These three cut across the table above and unlock everything else: you can’t evaluate or attribute cost without logs, and you can’t ship confidently without a way back.
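To make the dependency on logs concrete, here is a rough sketch of one structured record per request and a cost roll-up computed from those records. The field names, the `make_record` and `cost_by` helpers, and the per-token prices are assumptions for illustration only; substitute your provider's real rates and your own schema.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, List, Optional

# Hypothetical prices in USD per 1,000 tokens, not any provider's real rates.
PRICE_PER_1K = {"input": 0.001, "output": 0.002}


def make_record(
    user: str,
    feature: str,
    prompt: str,
    response: str,
    input_tokens: int,
    output_tokens: int,
    when: Optional[datetime] = None,
) -> dict:
    """Build one log record holding the request, the response and its cost."""
    when = when or datetime.now(timezone.utc)
    cost = (
        input_tokens * PRICE_PER_1K["input"]
        + output_tokens * PRICE_PER_1K["output"]
    ) / 1000
    return {
        "ts": when.isoformat(),
        "user": user,
        "feature": feature,
        "prompt": prompt,
        "response": response,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cost_usd": cost,
    }


def to_jsonl(record: dict) -> str:
    """Serialize a record as one JSON line, ready to append to a log file."""
    return json.dumps(record)


def cost_by(records: List[dict], key: str) -> Dict[str, float]:
    """Total cost grouped by 'user', 'feature' or 'month' (YYYY-MM)."""
    totals: Dict[str, float] = defaultdict(float)
    for r in records:
        group = r["ts"][:7] if key == "month" else r[key]
        totals[group] += r["cost_usd"]
    return dict(totals)
```

Because every record carries user, feature and timestamp, the "cost per user, feature and month" question becomes a group-by over the log rather than a separate system.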

Scope it to your team

Not every team needs the same bar. The checklist ships four profiles (Minimum viable, Startup, Enterprise and Regulated) that filter the 50 checks to what fits your stage. A two-person team shipping its first feature and a bank with auditors are answering different versions of the same questions.

Measure where you stand

Take the Maturity Score for a weighted 0–100 read across the stack, then work the checklist at your profile to close the gaps - deliberately, one layer at a time.
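The article does not state how the Maturity Score is weighted, so the following is only a sketch of how a weighted 0–100 score across layers could be computed. The `maturity_score` function, the per-layer weights and the use of "fraction of checks passed" as the per-layer input are all assumptions.

```python
from typing import Dict


def maturity_score(passed_fraction: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-layer pass fractions, scaled to 0-100.

    passed_fraction maps layer name to the share of its checks passed (0.0-1.0).
    Layers that appear in weights but not in passed_fraction count as 0.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    for layer, fraction in passed_fraction.items():
        if not 0.0 <= fraction <= 1.0:
            raise ValueError(f"fraction for {layer!r} must be between 0 and 1")
    weighted = sum(w * passed_fraction.get(layer, 0.0) for layer, w in weights.items())
    return round(100 * weighted / total_weight, 1)


# Hypothetical weights: observability and evaluation count double.
example_weights = {"Observability": 2, "Evaluation": 2, "Deployment": 1}
example_fractions = {"Observability": 1.0, "Evaluation": 0.5}
example_score = maturity_score(example_fractions, example_weights)  # 60.0
```

In the example, Deployment has no entry and so contributes zero, which pulls the score to 60.0: an unassessed layer is treated the same as a failed one.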
