Reference architecture

Regulated LLM workflow

An LLM workflow for finance, health or the public sector - where audit trails, explainability, human sign-off and data retention are not optional.

01 Architecture

In regulated settings the question is not only “is it correct?” but also “can you prove what happened and why?” This blueprint builds governance into the request path itself: policy gates, full audit logging, human review and version pinning.

Request: authenticated

Policy gate: compliance rules

LLM (pinned): version-locked

Human review: sign-off queue

Decision record: explainable

Audit store: retention policy
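
One way to read the stages above is as a straight-line pipeline in which each stage either passes the request on or stops it, and every stage writes to the audit trail. The sketch below is illustrative only: `run_workflow`, `call_model`, the `purpose` field and the placeholder model identifier are all hypothetical names, and the in-memory `audit_events` list stands in for a real audit store.

```python
from dataclasses import dataclass, field
from typing import Optional

# Placeholder identifier for illustration; not a real model name.
PINNED_MODEL = "example-model-2025-01-01"


@dataclass
class WorkflowResult:
    status: str = "pending_review"   # or "blocked"
    draft: Optional[str] = None
    audit_events: list = field(default_factory=list)


def call_model(text: str) -> str:
    # Stand-in for a call to the version-pinned model (hypothetical).
    return "DRAFT: " + text[:40]


def run_workflow(request: dict, allowed_purposes: set) -> WorkflowResult:
    result = WorkflowResult()
    log = result.audit_events.append   # every stage writes to the audit trail

    # 1. Request: only authenticated callers proceed.
    if not request.get("authenticated"):
        log({"step": "request", "outcome": "rejected"})
        result.status = "blocked"
        return result
    log({"step": "request", "outcome": "accepted"})

    # 2. Policy gate: out-of-policy purposes stop here.
    if request.get("purpose") not in allowed_purposes:
        log({"step": "policy_gate", "outcome": "blocked"})
        result.status = "blocked"
        return result
    log({"step": "policy_gate", "outcome": "passed"})

    # 3. LLM (pinned): produce a draft, never a final decision.
    result.draft = call_model(request["text"])
    log({"step": "llm", "model": PINNED_MODEL})

    # 4. Human review: the draft is queued; nothing takes effect yet.
    log({"step": "human_review", "outcome": "queued"})
    return result
```

The decision record is written only after a reviewer signs off, which is why the sketch ends with the draft in the `pending_review` state.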

02 When to use it

Use this when

Decisions are high-stakes and must be auditable
A regulator or auditor may review them later
Human sign-off is required by policy

Reach for something else when

Tasks are low-risk and high-volume, so the overhead is not justified
You have no human-review capacity
Reproducibility and audit are genuinely not required

03 Components

What's in the box.

Policy / compliance gate

Blocks out-of-policy requests and enforces jurisdiction rules.
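
A minimal sketch of such a gate, assuming a hypothetical rule table keyed by jurisdiction. The jurisdiction codes, use-case names and the `policy_gate` function are illustrative, not taken from any real rule set. The gate fails closed: a jurisdiction with no defined policy is blocked rather than allowed by default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyDecision:
    allowed: bool
    reason: str   # recorded so the block itself can be audited


# Hypothetical rule table: permitted use cases per jurisdiction.
ALLOWED_USE_CASES = {
    "EU": {"loan_summary"},
    "US": {"loan_summary", "claims_triage"},
}


def policy_gate(jurisdiction: str, use_case: str) -> PolicyDecision:
    allowed = ALLOWED_USE_CASES.get(jurisdiction)
    if allowed is None:
        # Unknown jurisdiction: fail closed rather than guess.
        return PolicyDecision(False, f"no policy defined for {jurisdiction!r}")
    if use_case not in allowed:
        return PolicyDecision(False, f"{use_case!r} not permitted in {jurisdiction!r}")
    return PolicyDecision(True, "in policy")
```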

Version-pinned model

Locks model and prompt versions so decisions are reproducible.
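
One possible way to make a pin checkable is to store a fingerprint of everything that defines a run. The sketch below assumes the pinned configuration is a model identifier, a prompt template and a parameter dict; `pin_fingerprint` is a hypothetical helper, not part of any library.

```python
import hashlib
import json


def pin_fingerprint(model_id: str, prompt_template: str, params: dict) -> str:
    # Serialize with sorted keys so the same configuration always hashes the same.
    payload = json.dumps(
        {"model": model_id, "prompt": prompt_template, "params": params},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Storing the fingerprint alongside each decision shows later which exact configuration produced it. Note that pinning fixes what ran; it does not by itself make sampled outputs deterministic.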

Full audit logging

Captures inputs, retrieved context, outputs and decisions immutably.
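
A common technique for making a log tamper-evident is to chain entries by hash, so that altering any earlier entry invalidates everything after it. The `AuditLog` class below is a hypothetical in-memory sketch of that idea. It detects tampering but does not prevent it; real immutability also depends on storage-level controls such as write-once media and access restrictions.

```python
import hashlib
import json

GENESIS = "0" * 64   # hash value used before the first entry


class AuditLog:
    def __init__(self):
        self._entries = []

    def append(self, event: dict) -> str:
        prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS
        body = json.dumps(event, sort_keys=True)
        # Each hash covers the previous hash plus this entry's content.
        entry_hash = hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()
        self._entries.append({"event": body, "prev": prev_hash, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        prev_hash = GENESIS
        for entry in self._entries:
            expected = hashlib.sha256(
                (prev_hash + entry["event"]).encode("utf-8")
            ).hexdigest()
            if entry["prev"] != prev_hash or entry["hash"] != expected:
                return False
            prev_hash = entry["hash"]
        return True
```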

Human review queue

Requires sign-off for high-risk decisions before they take effect.
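
The gate can be sketched as a check at the point where a decision is applied, rather than as a convention reviewers are expected to follow. All names below (`PendingDecision`, `sign_off`, `apply_decision`, `SignOffRequired`) are hypothetical, and the two-level `risk` field is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Optional


class SignOffRequired(Exception):
    """Raised when a high-risk decision is applied without approval."""


@dataclass
class PendingDecision:
    decision_id: str
    risk: str                          # "high" or "low" (simplified)
    reviewer_id: Optional[str] = None
    approved: Optional[bool] = None    # None means not yet reviewed


def sign_off(decision: PendingDecision, reviewer_id: str, approved: bool) -> None:
    decision.reviewer_id = reviewer_id
    decision.approved = approved


def apply_decision(decision: PendingDecision) -> str:
    # Unreviewed and rejected decisions are both blocked for high risk.
    if decision.risk == "high" and decision.approved is not True:
        raise SignOffRequired(f"{decision.decision_id} needs reviewer approval")
    return f"applied {decision.decision_id}"
```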

Explainability record

Stores the rationale and sources behind each decision.
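
As a rough sketch, the record can refuse to exist without a rationale and at least one source, so an unexplained decision cannot be stored by accident. `ExplainabilityRecord` and its field names are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)   # frozen: the record cannot be edited after creation
class ExplainabilityRecord:
    decision_id: str
    rationale: str
    sources: tuple        # identifiers of the documents or passages relied on

    def __post_init__(self):
        if not self.rationale.strip():
            raise ValueError("rationale must not be empty")
        if not self.sources:
            raise ValueError("at least one source is required")
```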

Retention store

Enforces data-retention and deletion policy on all records.
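
A minimal sketch of the deletion side, assuming each record carries a retention class and a creation date. The class names and durations are hypothetical, and years are approximated as 365 days for simplicity.

```python
from datetime import date, timedelta

# Hypothetical retention classes; a year is approximated as 365 days.
RETENTION_DAYS = {"1y": 365, "7y": 7 * 365}


def is_due_for_deletion(created: date, retention_class: str, today: date) -> bool:
    # An unknown class raises KeyError, so unclassified records fail loudly.
    days = RETENTION_DAYS[retention_class]
    return today >= created + timedelta(days=days)
```

A scheduled job could run this check over all records and delete, or queue for deletion, those that return `True`.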

04 Failure modes

Where it breaks - and the fix.

Missing or incomplete audit trail

Log immutably at every step; treat the audit path as a hard dependency.
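
Treating the audit path as a hard dependency means that an output which could not be logged is never released. A small sketch of that fail-closed behavior, where `respond_with_audit`, `generate` and `write_audit` are hypothetical names for the wrapper, the model call and the audit writer:

```python
class AuditUnavailable(Exception):
    """Raised when the audit record could not be written."""


def respond_with_audit(prompt, generate, write_audit):
    output = generate(prompt)
    try:
        write_audit({"prompt": prompt, "output": output})
    except Exception as exc:
        # Fail closed: an unaudited output is never returned to the caller.
        raise AuditUnavailable("audit write failed; output withheld") from exc
    return output
```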

Unexplainable decision

Record sources and rationale; pin versions so results are reproducible.

Unreviewed high-risk output

Enforce a hard approval gate; nothing high-risk takes effect without sign-off.

Data-retention breach

Apply policy-driven retention and deletion; keep stores access-controlled.

Silent model drift between versions

Pin versions; re-validate on any change via the eval set and change management.
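
The re-validation rule can be sketched as a promotion check: any difference between the approved and candidate configuration requires a passing eval run before the change goes live. The function names, the config keys and the default pass-rate threshold below are assumptions for illustration.

```python
def changed_fields(approved: dict, candidate: dict) -> list:
    # Names of pinned fields whose values differ between the two configs.
    keys = approved.keys() | candidate.keys()
    return sorted(k for k in keys if approved.get(k) != candidate.get(k))


def can_promote(approved: dict, candidate: dict, eval_passed: list,
                min_pass_rate: float = 1.0) -> bool:
    if not changed_fields(approved, candidate):
        return True            # nothing changed, nothing to re-validate
    if not eval_passed:
        return False           # a change with no eval run is blocked
    return sum(eval_passed) / len(eval_passed) >= min_pass_rate
```

Here `eval_passed` is a list of booleans, one per eval-set row, so a prompt change from one version to the next is blocked until the eval set has been re-run against it.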

05 Metrics to monitor

What good looks like, measured.

Audit completeness

Every step recorded immutably.

Human review SLA

Time from request to sign-off.

Explainability coverage

Decisions with a recorded rationale.

Version drift

Unintended model or prompt changes.

Retention compliance

Records kept and deleted per policy.
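
Three of these metrics can be computed directly from trace records. The sketch below assumes each trace is a dict with hypothetical `audit_id`, `rationale`, `requested_at` and `signed_off_at` fields, with timestamps in seconds; adapt the field names to whatever your tracing setup actually emits.

```python
from statistics import median


def summarize(traces: list) -> dict:
    total = len(traces)
    if total == 0:
        return {"audit_completeness": None,
                "explainability_coverage": None,
                "median_review_seconds": None}
    audited = sum(1 for t in traces if t.get("audit_id"))
    explained = sum(1 for t in traces if t.get("rationale"))
    # Only traces that were actually signed off contribute to review time.
    review_secs = [t["signed_off_at"] - t["requested_at"]
                   for t in traces if t.get("signed_off_at") is not None]
    return {
        "audit_completeness": audited / total,
        "explainability_coverage": explained / total,
        "median_review_seconds": median(review_secs) if review_secs else None,
    }
```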

06 MVP vs production-grade

Don't build everything on day one.

Ship the MVP column to get to users; the production column is what makes it durable. Choose deliberately which gaps you're leaving.

Aspect	MVP	Production-grade
Audit	Logs	Immutable, access-controlled audit store
Review	Ad-hoc	Mandatory sign-off queue
Versioning	Latest model	Pinned model + prompt versions
Explainability	None	Rationale + sources per decision
Retention	Default	Policy-driven retention & deletion

07 Copy-paste schemas

Instrument it in minutes.

A starting point you can paste into your tracing and eval setup - then adapt to your stack.

Example trace schema

{
  "request_id": "req_9931",
  "architecture": "regulated-llm-workflow",
  "policy_check": "passed",
  "model": "sonnet-4.6",
  "prompt_version": "v12-locked",
  "human_review": "approved",
  "reviewer_id": "r_07",
  "decision_recorded": true,
  "retention_class": "7y",
  "audit_id": "aud_9931"
}
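
A trace is only useful for audit if it is complete, so it can help to validate each record before storing it. The sketch below checks the fields from the example trace above; which fields are mandatory and the cross-field rule about `reviewer_id` are assumptions, and `validate_trace` is a hypothetical helper.

```python
# Fields assumed mandatory for this sketch, with their expected types.
REQUIRED_FIELDS = {
    "request_id": str,
    "policy_check": str,
    "model": str,
    "prompt_version": str,
    "human_review": str,
    "decision_recorded": bool,
    "retention_class": str,
    "audit_id": str,
}


def validate_trace(trace: dict) -> list:
    errors = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in trace:
            errors.append(f"missing field: {name}")
        elif not isinstance(trace[name], expected_type):
            errors.append(f"wrong type for field: {name}")
    # An approval must be attributable to a named reviewer.
    if trace.get("human_review") == "approved" and not trace.get("reviewer_id"):
        errors.append("approved review without reviewer_id")
    return errors
```

An empty list means the trace passed; anything else can be logged and the record rejected.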

Example eval dataset row

{
  "input": "Summarize this loan application for a decision",
  "expected_behavior": "Summarize and cite sources; flag for human decision; never decide",
  "must_include": [
    "source citation",
    "human review required"
  ],
  "must_not_include": [
    "final approval decision",
    "unsupported claims"
  ],
  "risk_category": "regulated_decision"
}
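
The `must_include` and `must_not_include` entries are labels, not literal strings, so a checker needs some way to detect each one in an output. A rough sketch, assuming hypothetical regex detectors and a citation format of `[source: ...]`; labels such as "unsupported claims" have no simple pattern and would need a human or model-based grader, which this sketch reports rather than silently skips.

```python
import re

# Hypothetical detectors; the patterns are assumptions about output format.
DETECTORS = {
    "source citation": re.compile(r"\[source:[^\]]+\]", re.IGNORECASE),
    "human review required": re.compile(r"human review required", re.IGNORECASE),
    "final approval decision": re.compile(
        r"\b(application|loan) is (approved|denied|rejected)\b", re.IGNORECASE
    ),
}


def check_output(output: str, row: dict) -> list:
    failures = []
    for label in row["must_include"]:
        detector = DETECTORS.get(label)
        if detector is None:
            failures.append(f"no detector for: {label}")
        elif not detector.search(output):
            failures.append(f"missing: {label}")
    for label in row["must_not_include"]:
        detector = DETECTORS.get(label)
        if detector is None:
            failures.append(f"no detector for: {label}")
        elif detector.search(output):
            failures.append(f"forbidden: {label}")
    return failures
```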

08 Checklist

Ship-ready when…

Inputs, outputs and decisions are captured in an immutable audit trail
High-risk decisions require human sign-off
Every decision is explainable and reproducible
Model and prompt versions are pinned
Data retention and deletion policy is enforced
Changes follow a documented change-management process

