RAG operations

RAG freshness monitoring checklist

A focused checklist for keeping a RAG index fresh in production - detecting stale content, missing documents, embedding drift and re-index failures before users do.

June 16, 2026 · 6 min read · rag · monitoring · checklist

A RAG system degrades silently. The model keeps answering confidently while the index quietly falls behind reality - deleted docs still surface, new policies are missing, embeddings drift. Freshness monitoring is how you catch it before a customer does. Here’s the checklist.

Why freshness is its own problem

Retrieval quality asks “did we fetch the right context?” Freshness asks “is the context still true?” You can have perfect retrieval over a stale corpus and serve confidently outdated answers. The two need separate monitoring.

Pipeline: source change (edit · delete) → re-index (scheduled job) → index (lag + count check) → freshness gate (age / score alert) → serve or suppress (fail safe)

The checklist below walks this pipeline left to right - pipeline health, content staleness, drift, and the outcome signals that tell you it’s working.

The freshness checklist

Indexing pipeline health

Re-index runs on a known schedule, and you alert when a run fails or is skipped.
You track indexing lag - time between a source document changing and the index reflecting it.
Document counts reconcile - index doc count matches the source within tolerance (catches silent drops).
Deletions propagate - removing a source document removes it from the index (orphaned content is a leak and a correctness bug).
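
The lag, reconciliation and deletion checks above can be sketched in a few lines. This is a minimal illustration, assuming you can list document IDs from both the source and the index. The function names and the 1% tolerance are hypothetical, not from any particular vector store.

```python
from datetime import datetime, timezone

def reconcile(source_ids, index_ids, tolerance=0.01):
    """Compare source and index document IDs.

    Returns IDs missing from the index, IDs orphaned in the index
    (deleted at the source but still indexed), and whether the raw
    counts agree within the given relative tolerance.
    """
    source_ids, index_ids = set(source_ids), set(index_ids)
    missing = source_ids - index_ids
    orphaned = index_ids - source_ids
    # Relative count difference; max(..., 1) avoids dividing by zero.
    count_drift = abs(len(source_ids) - len(index_ids)) / max(len(source_ids), 1)
    return {
        "missing": sorted(missing),
        "orphaned": sorted(orphaned),
        "count_within_tolerance": count_drift <= tolerance,
    }

def index_lag_minutes(source_changed_at, indexed_at):
    """Minutes between a source change and the index reflecting it."""
    return (indexed_at - source_changed_at).total_seconds() / 60

# Example: one document was dropped and one deleted document still lingers.
report = reconcile({"a", "b", "c"}, {"b", "c", "d"})
lag = index_lag_minutes(
    datetime(2026, 6, 16, 9, 0, tzinfo=timezone.utc),
    datetime(2026, 6, 16, 9, 22, tzinfo=timezone.utc),
)
```

In this example the counts match exactly even though one document is missing and one is orphaned, which is why comparing IDs is worth doing alongside the count check where the corpus size allows it.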

Content staleness

Each chunk carries a last_updated timestamp you can surface and filter on.
You monitor the age distribution of retrieved chunks and alert if it skews old.
Time-sensitive sources (pricing, policy, inventory) have a tighter refresh SLA than static ones.
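
One way to check retrieved-chunk age against per-source SLAs is sketched below. It assumes each chunk is a dict with a last_updated timestamp plus hypothetical id and source_type fields. The SLA values are placeholders to show the shape, not recommendations.

```python
from datetime import datetime, timedelta, timezone
from statistics import median

# Hypothetical refresh SLAs per source type, in days (not from the article).
FRESHNESS_SLA_DAYS = {"pricing": 1, "policy": 7, "static": 90}

def age_days(chunk, now):
    """Age of a chunk in days, based on its last_updated timestamp."""
    return (now - chunk["last_updated"]).total_seconds() / 86400

def age_report(retrieved_chunks, now):
    """Summarise the age of retrieved chunks and list those past their SLA."""
    if not retrieved_chunks:
        return {"median_age_days": None, "max_age_days": None, "past_sla": []}
    ages = [age_days(c, now) for c in retrieved_chunks]
    past_sla = [
        c["id"] for c in retrieved_chunks
        if age_days(c, now) > FRESHNESS_SLA_DAYS[c["source_type"]]
    ]
    return {
        "median_age_days": median(ages),
        "max_age_days": max(ages),
        "past_sla": past_sla,
    }

now = datetime(2026, 6, 16, tzinfo=timezone.utc)
chunks = [
    {"id": "price-1", "source_type": "pricing", "last_updated": now - timedelta(days=3)},
    {"id": "policy-1", "source_type": "policy", "last_updated": now - timedelta(days=3)},
    {"id": "faq-1", "source_type": "static", "last_updated": now - timedelta(days=30)},
]
report = age_report(chunks, now)
```

Here the three-day-old pricing chunk breaches its SLA while the equally old policy chunk does not, which is the point of keeping SLAs per source type rather than one global age limit.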

Retrieval & embedding drift

You watch the top similarity score distribution - a downward trend signals drift between query and corpus.
Empty / low-score retrievals are tracked and trigger a safe fallback instead of a guess.
Re-embedding after an embedding-model change is treated as a migration, re-evaluated against your eval set.
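
A rough sketch of the first two checks follows, assuming you log the top similarity score of each retrieval. Comparing window means is one simple way to spot a downward trend, not the only one, and the thresholds here are illustrative assumptions.

```python
from statistics import mean

LOW_SCORE = 0.5  # assumed cut-off for a "low-score" retrieval

def score_drift(baseline_top_scores, recent_top_scores, max_drop=0.05):
    """Compare mean top score in a recent window against a baseline window."""
    drop = mean(baseline_top_scores) - mean(recent_top_scores)
    return {"drop": drop, "alert": drop > max_drop}

def needs_fallback(scores, low_score=LOW_SCORE):
    """True when retrieval is empty or its best score is below the cut-off."""
    return not scores or max(scores) < low_score

drift = score_drift([0.80, 0.82, 0.84], [0.70, 0.72, 0.74])
```

With these sample windows the mean top score falls by about 0.10, so the drift alert fires. needs_fallback([]) and needs_fallback([0.3, 0.4]) both return True, while needs_fallback([0.81]) returns False.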

Outcome signals

grounded_answer rate and hallucination rate are watched - a drop in the first or a rise in the second often points to stale content, not the model.
User signals (thumbs-down, “this is outdated”) are routed back as eval cases.

Instrument it with two numbers

If you do nothing else, track these in your traces:

{
  "index_lag_minutes": 22,
  "retrieved_chunk_max_age_days": 3,
  "top_score": 0.81,
  "retrieval_empty": false
}

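index_lag_minutes tells you whether the pipeline is keeping up; retrieved_chunk_max_age_days tells you whether this answer leaned on stale content. Alert on both. top_score and retrieval_empty ride along in the same trace to support the drift checks above.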
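
As a sketch of how such a record could be assembled and checked per request: the field names match the trace sample above, while the function names and alert thresholds are hypothetical and should be tuned to your own SLAs.

```python
def freshness_trace(index_lag_minutes, chunk_ages_days, scores):
    """Build the per-request freshness attributes to attach to a trace."""
    return {
        "index_lag_minutes": index_lag_minutes,
        "retrieved_chunk_max_age_days": max(chunk_ages_days) if chunk_ages_days else None,
        "top_score": max(scores) if scores else None,
        "retrieval_empty": not scores,
    }

def freshness_alerts(trace, max_lag_minutes=60, max_age_days=30):
    """Return the names of the freshness alerts this trace should raise."""
    alerts = []
    if trace["index_lag_minutes"] > max_lag_minutes:
        alerts.append("index_lag")
    age = trace["retrieved_chunk_max_age_days"]
    if age is not None and age > max_age_days:
        alerts.append("stale_chunk")
    return alerts

trace = freshness_trace(22, [1, 3], [0.81, 0.60])
alerts = freshness_alerts(trace)
```

With these inputs the trace matches the sample record and raises no alerts. A lag of 90 minutes with a 45-day-old chunk would raise both.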

Make staleness fail safe

Freshness monitoring is only useful if a stale result degrades gracefully:

When retrieval is empty or low-score, say so or hand off - don’t fabricate.
When content is past its freshness SLA, flag or suppress it rather than serving it as current.
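
The two rules above might look like this as a gate in front of answer generation. It is a minimal sketch, assuming each retrieved chunk carries hypothetical score, age_days and sla_days fields. The action names and the 0.5 cut-off are illustrative.

```python
def freshness_gate(chunks, low_score=0.5):
    """Decide whether to serve, flag, suppress or fall back.

    Each chunk is a dict with "score", "age_days" and "sla_days".
    """
    # Empty or low-score retrieval: say so or hand off, never guess.
    if not chunks or max(c["score"] for c in chunks) < low_score:
        return {"action": "fallback", "chunks": []}

    fresh = [c for c in chunks if c["age_days"] <= c["sla_days"]]
    if not fresh:
        # Everything retrieved is past its SLA: do not serve it as current.
        return {"action": "suppress", "chunks": []}

    # Some chunks were stale: answer from the fresh ones and flag the response.
    action = "serve" if len(fresh) == len(chunks) else "serve_flagged"
    return {"action": action, "chunks": fresh}

decision = freshness_gate([
    {"score": 0.81, "age_days": 3, "sla_days": 7},
    {"score": 0.74, "age_days": 40, "sla_days": 7},
])
```

Here the gate drops the 40-day-old chunk and returns serve_flagged, so the caller can answer from the fresh chunk while telling the user that some sources were out of date.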

Wire this checklist into your observability and you turn a silent failure mode into an alert. For the full picture of operating retrieval, see RAGOps and the RAG chatbot reference architecture.

