LLMOps
aka Large Language Model Operations
LLMOps (Large Language Model Operations) is the discipline of running LLM
applications in production: evaluation, observability, prompt management, cost
control, RAG operations, security, governance and deployment. It is the set of
practices that turns a working prototype into a system you can run, measure and
trust. See the LLMOps Stack.
Related: MLOps · Evals · LLM observability
Hallucination rate
aka Faithfulness
Hallucination rate measures how often the model states something unsupported
by the facts or by the provided context. Its complement, faithfulness, is a core
eval metric, especially for RAG, where the test is whether the answer is grounded
in what was retrieved. Track it as a first-class number, not an anecdote.
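Once answers have been labelled, the metric itself is simple arithmetic. The sketch below makes several assumptions that are not part of the definition above. It scores at the claim level: each answer has already been split into claims, and a human or judge model has marked each claim as supported or unsupported by the retrieved context. The `ClaimJudgment` schema and both function names are hypothetical. Some teams score whole answers instead of claims, and the arithmetic is the same.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClaimJudgment:
    """One claim from a model answer, labelled by a human or judge model (hypothetical schema)."""
    claim: str
    supported: bool  # True if the claim is grounded in the retrieved context


def hallucination_rate(judgments: List[ClaimJudgment]) -> float:
    """Fraction of claims that are NOT supported by the context."""
    if not judgments:
        raise ValueError("no judgments to score")
    unsupported = sum(1 for j in judgments if not j.supported)
    return unsupported / len(judgments)


def faithfulness(judgments: List[ClaimJudgment]) -> float:
    """Complement of hallucination rate: fraction of claims that are supported."""
    return 1.0 - hallucination_rate(judgments)


# Illustrative input: four claims extracted from one answer, one unsupported.
judgments = [
    ClaimJudgment("claim 1", True),
    ClaimJudgment("claim 2", True),
    ClaimJudgment("claim 3", False),
    ClaimJudgment("claim 4", True),
]
print(hallucination_rate(judgments))  # 0.25
print(faithfulness(judgments))        # 0.75
```

Computing the number over a fixed eval set on every prompt or model change is what turns it into a tracked metric rather than an anecdote.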
Related: Evals · RAG evaluation
Context window
Context window is the token budget for a single request: everything the model
can “see” at once, including the system prompt, history, retrieved context and
the user message. Larger windows let a request carry more context, but filling
them raises token cost and can dilute the model's attention, which is why RAG
retrieves the relevant slice rather than dumping everything in.
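The trade-off above can be made concrete by treating the window as a budget that the fixed parts of the request spend first, with retrieved chunks competing for the remainder. The sketch below is one possible approach under stated assumptions. `count_tokens` is a crude whitespace stand-in, and real tokenizers produce different counts. `build_context` is a hypothetical helper that takes chunks as `(relevance_score, text)` pairs and greedily keeps the most relevant ones that still fit.

```python
from typing import List, Tuple


def count_tokens(text: str) -> int:
    # Assumption: whitespace split as a rough proxy. Real model tokenizers
    # (e.g. BPE-based ones) count differently, usually higher.
    return len(text.split())


def build_context(
    system: str,
    history: List[str],
    chunks: List[Tuple[float, str]],  # (relevance score, chunk text), higher = more relevant
    user_message: str,
    budget: int,
) -> Tuple[List[str], int]:
    """Pick retrieved chunks, most relevant first, that fit in the remaining token budget."""
    used = sum(count_tokens(t) for t in [system, *history, user_message])
    if used > budget:
        raise ValueError("system prompt, history and user message alone exceed the budget")

    selected: List[str] = []
    for _score, text in sorted(chunks, key=lambda c: c[0], reverse=True):
        cost = count_tokens(text)
        if used + cost <= budget:  # skip chunks that don't fit; a smaller one may still fit
            selected.append(text)
            used += cost
    return selected, used


selected, used = build_context(
    system="You are helpful",
    history=["hi there"],
    chunks=[(0.9, "a b c d"), (0.5, "e f g h i j"), (0.7, "k l")],
    user_message="what is RAG",
    budget=15,
)
print(selected, used)  # ['a b c d', 'k l'] 14
```

The greedy loop skips an oversized chunk and keeps going instead of stopping, so a smaller but less relevant chunk can still use the leftover space. Whether that is desirable, as opposed to truncating or summarising chunks, is a design choice.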
Related: Token cost · Embeddings
Agent tracing
Agent tracing records the full path of a request through an agent: each
model call, tool invocation, retrieval and decision, with inputs, outputs and
timing. It is what makes a multi-step agent debuggable. Without it, a wrong
final answer gives no clue about which step failed. It is a core part of
observability.
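The idea fits in a few dozen lines. The sketch below is minimal and makes several assumptions. Each step is a "span" with inputs, outputs, timing and any error. Spans opened inside another span become its children, so one request yields a tree. `Span`, `Tracer` and `walk` are hypothetical names. They are not the API of OpenTelemetry or any vendor SDK, although those tools use a similar parent/child span model. Model and tool calls are stubbed with fixed outputs.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any, Iterator, List, Optional, Tuple


@dataclass
class Span:
    """One step of an agent run (hypothetical schema)."""
    name: str
    kind: str                   # e.g. "agent", "llm", "tool", "retrieval"
    inputs: Any
    outputs: Any = None
    error: Optional[str] = None
    start: float = 0.0
    end: float = 0.0
    children: List["Span"] = field(default_factory=list)

    @property
    def duration_s(self) -> float:
        return self.end - self.start


class Tracer:
    def __init__(self) -> None:
        self.roots: List[Span] = []   # one root span per traced request
        self._stack: List[Span] = []  # spans currently open, innermost last

    @contextmanager
    def span(self, name: str, kind: str, inputs: Any) -> Iterator[Span]:
        s = Span(name=name, kind=kind, inputs=inputs, start=time.perf_counter())
        # Attach to the enclosing span if one is open, otherwise start a new trace.
        if self._stack:
            self._stack[-1].children.append(s)
        else:
            self.roots.append(s)
        self._stack.append(s)
        try:
            yield s
        except Exception as exc:
            s.error = repr(exc)  # record which step failed, then re-raise
            raise
        finally:
            s.end = time.perf_counter()
            self._stack.pop()


def walk(span: Span, depth: int = 0) -> Iterator[Tuple[int, Span]]:
    """Depth-first traversal of a trace, e.g. for printing it as a tree."""
    yield depth, span
    for child in span.children:
        yield from walk(child, depth + 1)


# Usage: one agent run with a retrieval step and a (stubbed) LLM call.
tracer = Tracer()
with tracer.span("answer_question", "agent", {"question": "What is RAG?"}) as root:
    with tracer.span("search_docs", "retrieval", {"query": "RAG"}) as s:
        s.outputs = ["doc-1", "doc-2"]
    with tracer.span("generate", "llm", {"context": ["doc-1", "doc-2"]}) as s:
        s.outputs = "RAG retrieves context before generating."
    root.outputs = s.outputs

for depth, sp in walk(tracer.roots[0]):
    print("  " * depth + f"{sp.kind}:{sp.name} {sp.duration_s * 1000:.2f} ms")
```

Because a failing span stores its error before re-raising, a wrong or crashed final answer can be traced back to the specific tool or model call that caused it.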
Related: LLM observability · Evals