
What your CTO should ask before approving an LLM launch

The questions a technical leader should ask before signing off on shipping an LLM feature to production - covering evals, observability, cost, security, rollback and governance.

June 13, 2026 · 7 min read · governance · leadership · checklist

A demo earns enthusiasm. A launch earns accountability. Before a CTO (or any technical leader) signs off on putting an LLM feature in front of real users, there’s a short set of questions that separates “it works in the demo” from “we can operate this.” None of them are about the model.

The questions, by layer

Evaluation - “How do we know it’s good enough?”

  • Do we have an eval set built from real cases, and what’s our current pass rate?
  • Do evals run on every prompt and model change, and does a failing run block the change from shipping?
  • What’s our hallucination rate on the workflows where a wrong answer is expensive?

A confident, specific answer to these three questions is the single strongest signal of readiness.
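
To make "block a bad one" concrete, here is a minimal sketch of an eval gate in Python. The names (EvalResult, gate) and the thresholds (a 90% floor, a 2-point allowed drop) are illustrative assumptions, not values from this article; a real gate would read results from your eval runner and fail the CI job when the change is blocked.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class EvalResult:
    case_id: str
    passed: bool


def pass_rate(results: List[EvalResult]) -> float:
    """Fraction of eval cases that passed."""
    if not results:
        raise ValueError("eval set is empty")
    return sum(r.passed for r in results) / len(results)


def gate(
    results: List[EvalResult],
    baseline_rate: float,
    min_rate: float = 0.90,   # assumed absolute floor
    max_drop: float = 0.02,   # assumed tolerated regression vs. baseline
) -> Tuple[bool, str]:
    """Return (allowed, reason) for a prompt or model change."""
    rate = pass_rate(results)
    if rate < min_rate:
        return False, f"pass rate {rate:.2%} is below the floor {min_rate:.2%}"
    if baseline_rate - rate > max_drop:
        return False, f"pass rate dropped from {baseline_rate:.2%} to {rate:.2%}"
    return True, f"pass rate {rate:.2%} is acceptable"
```

The two checks are deliberately separate: the floor catches a feature that was never good enough, while the drop check catches a change that makes a good feature worse.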

Observability - “When it breaks, can we see why?”

  • Do we log every request and response, and can we pull up exactly what the model saw for a given complaint in under a minute?
  • Are multi-step flows traced, not just logged?
  • Do we alert on quality, latency and error-rate regressions?
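
The difference between logging and tracing is easiest to see in code. The sketch below is an assumption-laden illustration, not a real tracing library: RequestLog is a hypothetical in-memory store, and the field names are invented. The point is that every step of a multi-step flow carries the same trace id, so one lookup returns exactly what the model saw, in order.

```python
import json
import time
from typing import Dict, List


class RequestLog:
    """Hypothetical in-memory stand-in for a log store."""

    def __init__(self) -> None:
        self._lines: List[str] = []

    def record(self, trace_id: str, step: str, prompt: str,
               response: str, latency_ms: float) -> None:
        """Append one step of a flow as a JSON line."""
        entry = {
            "ts": time.time(),
            "trace_id": trace_id,   # shared by every step of one user request
            "step": step,           # e.g. "retrieve", "draft", "final"
            "prompt": prompt,       # the exact text sent to the model
            "response": response,
            "latency_ms": latency_ms,
        }
        self._lines.append(json.dumps(entry))

    def trace(self, trace_id: str) -> List[Dict]:
        """Return every recorded step for one request, in recorded order."""
        entries = [json.loads(line) for line in self._lines]
        return [e for e in entries if e["trace_id"] == trace_id]
```

With records shaped like this, answering a complaint becomes a lookup by trace id rather than a search through unrelated log lines.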

Cost - “What does this cost at scale?”

  • What’s our cost per request and per active user, and what happens to the bill if traffic grows 10×?
  • Do we have budget alerts, caching and a plan to route easy traffic to cheaper models?
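
The cost questions reduce to simple arithmetic, sketched below in Python. All figures are hypothetical: the token counts, the prices per million tokens and the traffic volume are placeholders to replace with your own numbers, and the cache model (a cache hit costs nothing) is a deliberate simplification.

```python
def cost_per_request(
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Cost of one request, given prices per million tokens."""
    total = (input_tokens * input_price_per_mtok
             + output_tokens * output_price_per_mtok)
    return total / 1_000_000


def monthly_cost(requests: int, per_request: float,
                 cache_hit_rate: float = 0.0) -> float:
    """Monthly bill, assuming cached responses cost nothing (simplification)."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    return requests * (1.0 - cache_hit_rate) * per_request


# Hypothetical inputs: 1,500 input tokens and 300 output tokens per request,
# priced at $3 and $15 per million tokens, at 200,000 requests per month.
unit = cost_per_request(1_500, 300, 3.0, 15.0)       # about $0.009
today = monthly_cost(200_000, unit)                   # about $1,800
at_10x = monthly_cost(2_000_000, unit)                # about $18,000
at_10x_cached = monthly_cost(2_000_000, unit, 0.30)   # about $12,600
```

Running the same numbers with a cheaper model's prices for the share of traffic you consider "easy" gives a first estimate of what routing would save.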

Security - “What’s the blast radius if it’s abused?”

  • Have we tested for prompt injection and jailbreaks?
  • Can it leak PII, or take actions it shouldn’t? Is tool access scoped to least privilege?
  • Are outputs validated before they trigger anything irreversible?
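
As an illustration of validating outputs before they trigger anything irreversible, the Python sketch below treats the model's tool call as untrusted input. The action names, the refund limit and the JSON shape are all hypothetical; the pattern is an allowlist, argument checks, and a human-approval flag for actions that cannot be undone.

```python
import json
from typing import Any, Dict

# Hypothetical policy: which actions exist, and which cannot be undone.
ALLOWED_ACTIONS = {"lookup_order", "issue_refund"}
IRREVERSIBLE_ACTIONS = {"issue_refund"}
MAX_REFUND = 100.0


def validate_tool_call(raw: str) -> Dict[str, Any]:
    """Parse and check a model-proposed tool call; raise ValueError if unsafe."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("model output is not valid JSON") from exc
    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")

    action = call.get("action")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError(f"action not allowed: {action!r}")

    args = call.get("args", {})
    if not isinstance(args, dict):
        raise ValueError("args must be a JSON object")

    if action == "issue_refund":
        amount = args.get("amount")
        is_number = isinstance(amount, (int, float)) and not isinstance(amount, bool)
        if not is_number or not 0 < amount <= MAX_REFUND:
            raise ValueError(f"refund amount out of bounds: {amount!r}")

    return {
        "action": action,
        "args": args,
        # Irreversible actions are queued for a person, never auto-executed.
        "needs_human_approval": action in IRREVERSIBLE_ACTIONS,
    }
```

The validator runs outside the model, so a successful prompt injection can at most propose an action; it cannot widen the allowlist or raise the limit.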

Reliability - “How fast can we undo a mistake?”

  • Can we roll back a prompt, model or config in one step, without a code redeploy?
  • Do we have provider/model fallback if our primary fails?
  • Has on-call practised the incident runbook?
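
Both one-step rollback and provider fallback can be sketched in a few lines of Python. This is an assumption-based illustration: PromptRegistry and call_with_fallback are hypothetical names, and a real system would persist versions in a config store rather than in memory. What matters is that the active prompt is a pointer that can move back without a code redeploy, and that a failing primary provider is skipped automatically.

```python
from typing import Callable, List, Optional, Sequence, Tuple


class PromptRegistry:
    """Hypothetical versioned prompt store with a movable 'active' pointer."""

    def __init__(self) -> None:
        self._versions: List[str] = []
        self._active: Optional[int] = None

    def publish(self, prompt: str) -> int:
        """Store a new version and make it active."""
        self._versions.append(prompt)
        self._active = len(self._versions) - 1
        return self._active

    def rollback(self) -> int:
        """Move the active pointer back one version; no redeploy involved."""
        if self._active is None or self._active == 0:
            raise RuntimeError("no earlier version to roll back to")
        self._active -= 1
        return self._active

    def active(self) -> str:
        if self._active is None:
            raise RuntimeError("no prompt published")
        return self._versions[self._active]


def call_with_fallback(
    providers: Sequence[Tuple[str, Callable[[str], str]]],
    prompt: str,
) -> Tuple[str, str]:
    """Try providers in order; return (provider_name, response) from the first success."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Returning the provider name alongside the response lets the caller log which path served the request, which is what makes fallback visible in traces and alerts.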

Governance - “Can we stand behind it?”

  • For high-risk use cases, is there human review and an audit trail?
  • Who is accountable for this system, per layer?
  • If a regulator or customer asks “what happened and why,” can we answer?

How to read the answers

You’re not looking for every box ticked - you’re looking for deliberate decisions. “We don’t gate on evals yet, and here’s our plan to add it in two weeks” is a defensible answer. “We hadn’t thought about that” is the one that should pause a launch.

A useful framing for the conversation: which layers are we choosing to ship thin, and do we accept that risk on purpose?

Make it a five-minute ritual

Turn this into a lightweight launch gate:

  1. Run the team through the Maturity Score - a 0–100 read across the stack - a few days before launch.
  2. Walk the Production Checklist for your team’s profile (startup, enterprise or regulated) and export the result.
  3. In the sign-off meeting, review the gaps and decide each one explicitly.

That ritual costs almost nothing, and it often catches the one missing control that would otherwise have become an incident. For the systems most teams are launching, the reference architectures show where each of these controls belongs.
