
RAG freshness monitoring checklist

A focused checklist for keeping a RAG index fresh in production - detecting stale content, missing documents, embedding drift and re-index failures before users do.

June 16, 2026 · 6 min read · rag · monitoring · checklist

A RAG system degrades silently. The model keeps answering confidently while the index quietly falls behind reality - deleted docs still surface, new policies are missing, embeddings drift. Freshness monitoring is how you catch it before a customer does. Here’s the checklist.

Why freshness is its own problem

Retrieval quality asks “did we fetch the right context?” Freshness asks “is the context still true?” You can have perfect retrieval over a stale corpus and serve confidently outdated answers. The two need separate monitoring.

The checklist below walks the pipeline in order, from indexing through to the answer - pipeline health, content staleness, drift, and the outcome signals that tell you it’s working.

The freshness checklist

Indexing pipeline health

  • Re-index runs on a known schedule, and you alert when a run fails or is skipped.
  • You track indexing lag - time between a source document changing and the index reflecting it.
  • Document counts reconcile - index doc count matches the source within tolerance (catches silent drops).
  • Deletions propagate - removing a source document removes it from the index (orphaned content is a leak and a correctness bug).
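The reconciliation and lag checks above can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: the function names, the 1% default tolerance, and the idea that you can list document IDs from both the source and the index are all assumptions.

```python
from datetime import datetime


def reconcile(source_ids: set[str], index_ids: set[str], tolerance: float = 0.01) -> dict:
    """Compare source and index document IDs; report silent drops and orphans."""
    missing = source_ids - index_ids    # in the source, never indexed (silent drop)
    orphaned = index_ids - source_ids   # deleted at the source, still retrievable
    # Relative count mismatch; max(..., 1) guards against an empty source.
    mismatch = abs(len(source_ids) - len(index_ids)) / max(len(source_ids), 1)
    return {
        "missing": sorted(missing),
        "orphaned": sorted(orphaned),
        "count_ok": mismatch <= tolerance,
    }


def indexing_lag_minutes(source_changed_at: datetime, indexed_at: datetime) -> float:
    """Minutes between a source change and the index reflecting it.

    Both timestamps must be timezone-aware, or both naive.
    """
    return (indexed_at - source_changed_at).total_seconds() / 60


report = reconcile({"a", "b", "c"}, {"a", "b", "d"})
# report["missing"] == ["c"] and report["orphaned"] == ["d"],
# yet report["count_ok"] is True because the totals still match.
```

The example shows why comparing IDs is worth the extra cost when you can afford it: one dropped document and one orphan cancel out in a pure count check.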

Content staleness

  • Each chunk carries a last_updated timestamp you can surface and filter on.
  • You monitor the age distribution of retrieved chunks and alert if it skews old.
  • Time-sensitive sources (pricing, policy, inventory) have a tighter refresh SLA than static ones.
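One way to watch the age distribution is to alert on a high percentile of retrieved-chunk age rather than the mean, so a few very old chunks are not averaged away. The sketch below assumes each retrieved chunk exposes its last_updated timestamp; the function names, the 90th percentile, and the 30-day limit are illustrative choices only.

```python
import math
from datetime import datetime, timedelta, timezone


def age_percentile_days(last_updated: list[datetime], now: datetime, pct: float = 90.0) -> float:
    """Nearest-rank percentile of chunk age, in days."""
    if not last_updated:
        raise ValueError("no retrieved chunks to measure")
    ages = sorted((now - ts).total_seconds() / 86400 for ts in last_updated)
    rank = max(1, math.ceil(pct * len(ages) / 100))  # 1-based nearest rank
    return ages[rank - 1]


def skews_old(last_updated: list[datetime], now: datetime,
              pct: float = 90.0, max_days: float = 30.0) -> bool:
    """True when the chosen percentile of chunk age exceeds the limit."""
    return age_percentile_days(last_updated, now, pct) > max_days


now = datetime(2026, 6, 16, tzinfo=timezone.utc)
stamps = [now - timedelta(days=d) for d in (1, 2, 3, 40, 45)]
alert = skews_old(stamps, now)  # the 90th percentile here is the 45-day-old chunk
```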

Retrieval & embedding drift

  • You watch the top similarity score distribution - a downward trend signals drift between query and corpus.
  • Empty / low-score retrievals are tracked and trigger a safe fallback instead of a guess.
  • Re-embedding after an embedding-model change is treated as a migration, re-evaluated against your eval set.
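A simple way to spot a downward trend in top similarity scores is to compare a recent window against a baseline window. The sketch below is deliberately crude and rests on assumptions: you log the top score per query, and the function name and the 0.05 drop threshold are hypothetical and need tuning against your own score scale.

```python
from statistics import mean


def score_drift(baseline: list[float], recent: list[float], max_drop: float = 0.05) -> dict:
    """Compare mean top-1 similarity between a baseline and a recent window."""
    if not baseline or not recent:
        raise ValueError("both windows need at least one score")
    drop = mean(baseline) - mean(recent)  # positive when recent scores are lower
    return {"drop": drop, "alert": drop > max_drop}


# Example: last month's top scores versus this week's.
result = score_drift([0.80, 0.82, 0.84], [0.70, 0.72, 0.74])
```

Scores from different embedding models are not comparable, so after a re-embedding migration the baseline window should be reset rather than carried over.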

Outcome signals

  • grounded_answer rate and hallucination rate are watched - a drop in the first or a rise in the second often points to freshness, not the model.
  • User signals (thumbs-down, “this is outdated”) are routed back as eval cases.

Instrument it with two numbers

If you do nothing else, track index lag and retrieved-chunk age in your traces, next to the top score and the empty-retrieval flag:

{
  "index_lag_minutes": 22,
  "retrieved_chunk_max_age_days": 3,
  "top_score": 0.81,
  "retrieval_empty": false
}

index_lag_minutes tells you the pipeline is keeping up; retrieved_chunk_max_age_days tells you whether this answer leaned on stale content. Alert on both.
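As a rough sketch, the trace record could be assembled per request like this. The RetrievedChunk type and the freshness_trace function are hypothetical, and index_lag_minutes is assumed to come from your indexing pipeline's own metrics rather than being computed at query time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class RetrievedChunk:
    score: float            # similarity score reported by the retriever
    last_updated: datetime  # timestamp carried on the chunk


def freshness_trace(chunks: list[RetrievedChunk], index_lag_minutes: float,
                    now: datetime) -> dict:
    """Build the freshness fields to attach to a request trace."""
    if not chunks:
        return {
            "index_lag_minutes": index_lag_minutes,
            "retrieved_chunk_max_age_days": None,
            "top_score": None,
            "retrieval_empty": True,
        }
    return {
        "index_lag_minutes": index_lag_minutes,
        # timedelta.days floors to whole days
        "retrieved_chunk_max_age_days": max((now - c.last_updated).days for c in chunks),
        "top_score": max(c.score for c in chunks),
        "retrieval_empty": False,
    }


now = datetime(2026, 6, 16, tzinfo=timezone.utc)
chunks = [
    RetrievedChunk(score=0.81, last_updated=now - timedelta(days=3)),
    RetrievedChunk(score=0.60, last_updated=now - timedelta(days=1)),
]
trace = freshness_trace(chunks, index_lag_minutes=22, now=now)
```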

Make staleness fail safe

Freshness monitoring is only useful if a stale result degrades gracefully:

  • When retrieval is empty or low-score, say so or hand off - don’t fabricate.
  • When content is past its freshness SLA, flag or suppress it rather than serving it as current.
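The two rules above can be combined into a single gate that runs between retrieval and generation. This is a sketch under stated assumptions: the SLA values, the 0.5 score floor, the source_type field, and the gate function are all hypothetical, and this version suppresses stale chunks rather than flagging them.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical per-source freshness SLAs; tune these to your own content.
SLA = {"pricing": timedelta(days=1), "policy": timedelta(days=7)}
DEFAULT_SLA = timedelta(days=90)
MIN_SCORE = 0.5  # assumed low-score cutoff; depends on your similarity metric


@dataclass
class Chunk:
    text: str
    source_type: str
    score: float
    last_updated: datetime


def gate(chunks: list[Chunk], now: datetime) -> tuple[list[Chunk], str]:
    """Drop stale or low-score chunks; return (usable_chunks, action)."""
    usable = [
        c for c in chunks
        if now - c.last_updated <= SLA.get(c.source_type, DEFAULT_SLA)
        and c.score >= MIN_SCORE
    ]
    # Nothing trustworthy left: say so or hand off instead of guessing.
    return (usable, "answer") if usable else ([], "fallback")


now = datetime(2026, 6, 16, tzinfo=timezone.utc)
stale_price = Chunk("Plan costs $10", "pricing", 0.9, now - timedelta(days=3))
fresh_policy = Chunk("Refunds within 30 days", "policy", 0.8, now - timedelta(days=2))
usable, action = gate([stale_price, fresh_policy], now)
```

Here the pricing chunk is suppressed despite its high score, because it is past its one-day SLA; if it were the only hit, the gate would return the fallback action.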

Wire this checklist into your observability and you turn a silent failure mode into an alert. For the full picture of operating retrieval, see RAGOps and the RAG chatbot reference architecture.
