Writing

Articles & Guides

Deep dives on each layer of running language models in production - from first principles to the operational details.

What is LLMOps?

LLMOps is the discipline of deploying, monitoring and improving large language model applications after the prototype works. A practical guide to what it covers, why it matters, and where to start.

Jun 5, 2026

Evaluation

LLMOps vs MLOps: What changes with large language models?

LLMOps builds on MLOps but adds prompts-as-code, non-determinism, LLM-judged evaluation, prompt-injection security and live token budgets. A practical guide to what actually changes - and what carries over.

Jun 4, 2026

Deployment

The LLMOps Stack: The 8 layers of production LLM systems

Prompt management, evaluation, observability, cost control, RAG operations, security, governance and deployment - a deep dive into the eight layers between an LLM prototype and production, with failure modes and a checklist for each.

Jun 3, 2026

Observability

LLM Observability: What to monitor in production

A production guide to LLM observability - the signals that matter, how to instrument with OpenTelemetry or Langfuse, what breaks without it, and a minimal-vs-mature path.

Jun 2, 2026

Evaluation

LLM Evaluation: How to test prompts, RAG and agents

A production-grade guide to LLM evaluation - what breaks without it, what to measure, how to write an LLM-as-judge and an eval runner, and how to gate releases the way you gate code on tests.

Jun 1, 2026

Prompt management

Prompt Versioning: Why prompts should be treated like code

A one-line prompt edit can move behaviour as much as a model swap. A production guide to treating prompts as versioned artifacts that are change-tested, attributable and can be rolled back in one step.

May 31, 2026

RAG operations

RAGOps: How to monitor retrieval quality

Most "the model is wrong" bugs are retrieval bugs. A production guide to measuring retrieval quality, tuning chunking and embeddings, keeping the index fresh, and failing safe when nothing matches.

May 30, 2026

Cost control

LLM Cost Control: Tokens, caching and model routing

Token spend compounds quietly until a finance review forces a panic. A production guide to unit economics, caching, model routing, right-sizing and budget alerts - before the bill, not after.

May 29, 2026

Security

LLM Security: Prompt injection, data leakage and guardrails

An LLM with tools and data access is an attack surface. A practical guide to prompt injection, data leakage and excessive agency - with guardrail patterns and the controls that contain each risk.

May 28, 2026

Governance

LLM Governance: Audit trails, approvals and explainability

A practical guide to governing LLM applications - audit trails, human-in-the-loop approval gates, explainability and reproducibility, and data retention you could show an auditor.

Jun 8, 2026

Deployment

LLM Deployment: CI/CD, staging and one-step rollback

How to ship LLM changes safely - CI with eval gates, a staging mirror, progressive rollout, config-driven rollback and provider fallback. The deployment layer of the LLMOps stack.

Jun 8, 2026

Deployment

LLMOps Checklist: From prototype to production

The narrative companion to the production-readiness checklist - the two questions that separate a demo from a system, applied across all eight layers of the LLMOps stack.

May 27, 2026

Evaluation

How to build your first eval dataset

A practical, step-by-step guide to building an LLM eval dataset from real traffic - what a row looks like, how to score it, how many cases you need, and how to wire it into CI.

Jun 20, 2026

Observability

What to log in an LLM trace

A field-by-field guide to what belongs in a production LLM trace - request IDs, prompt versions, retrieval, tokens, latency, cost and outcome - plus what to redact.

Jun 19, 2026

Evaluation

How to calculate hallucination rate

A practical method for measuring LLM hallucination (faithfulness) rate in production - how to define it, sample it, judge it, and track it over time.

Jun 18, 2026

Prompt management

Prompt versioning with GitHub

A concrete workflow for versioning LLM prompts in GitHub - file layout, pull-request review, eval gating in CI, and one-step rollback - without buying a dedicated tool.

Jun 17, 2026

RAG operations

RAG freshness monitoring checklist

A focused checklist for keeping a RAG index fresh in production - detecting stale content, missing documents, embedding drift and re-index failures before users do.

Jun 16, 2026

Governance

LLM incident response template

A ready-to-adapt incident response template for LLM applications - severity levels, the first 15 minutes, mitigation levers unique to LLMs, and a post-mortem structure.

Jun 15, 2026

Observability

How to choose between Langfuse, LangSmith, Braintrust and Helicone

A decision framework for picking an LLM observability and evals platform - how Langfuse, LangSmith, Braintrust and Helicone differ, and which fits your team.

Jun 14, 2026

Governance

What your CTO should ask before approving an LLM launch

The questions a technical leader should ask before signing off on shipping an LLM feature to production - covering evals, observability, cost, security, rollback and governance.

Jun 13, 2026