Reference architecture

Customer support agent

An agent that resolves support requests with real tools - order lookups, refunds, account changes - behind approval gates and a human handoff.

01 Architecture

Once an LLM can take actions, a wrong answer becomes a wrong action. The architecture earns its keep through scoped tools, output validation before any side effect, and a clean escalation path to a human.

02 When to use it

Use this when

  • Requests map to a few well-defined actions
  • You can scope tools tightly and validate their I/O
  • A human can take over when confidence is low

Reach for something else when

  • Actions are irreversible and high-value with no review path
  • You cannot validate tool inputs and outputs
  • A strict, tiny latency budget rules out multi-step planning
03 Components

What's in the box.

Channel adapter

Normalises chat, email or widget into a single request format.

Input guardrails

Catch prompt injection before the agent plans any action.

Agent / LLM

Plans steps and selects tools; fully traced per step.

Scoped tool layer

Least-privilege tools (lookup, refund) with strict input/output schemas.

Approval gate

Requires human or policy approval for high-risk actions (refunds, account changes).

Human escalation

Hands off to a human support agent when confidence is low or policy requires it.

Feedback + evals

Resolved tickets and CSAT feed the eval set.
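
The scoped tool layer and approval gate described above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the `ToolSpec` registry, the stubbed `lookup_order` and `issue_refund` backends, the argument names and the `approved` flag are all assumptions made for the example. Only input validation is shown; output checks would follow the same pattern.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToolSpec:
    """Hypothetical registry entry: one least-privilege tool."""
    arg_types: dict[str, type]  # strict input schema: exact keys and types
    high_risk: bool             # high-risk tools must pass the approval gate
    run: Callable[..., dict]


def lookup_order(order_id: str) -> dict:
    # Stubbed backend call (assumption for the example).
    return {"order_id": order_id, "status": "delivered"}


def issue_refund(order_id: str, amount_cents: int) -> dict:
    # Stubbed backend call (assumption for the example).
    return {"order_id": order_id, "refunded_cents": amount_cents}


TOOLS = {
    "lookup_order": ToolSpec({"order_id": str}, False, lookup_order),
    "issue_refund": ToolSpec({"order_id": str, "amount_cents": int}, True, issue_refund),
}


def validate_args(spec: ToolSpec, args: dict) -> None:
    """Reject missing, extra or wrongly typed arguments."""
    if set(args) != set(spec.arg_types):
        raise ValueError(f"expected keys {sorted(spec.arg_types)}, got {sorted(args)}")
    for key, expected in spec.arg_types.items():
        if type(args[key]) is not expected:
            raise ValueError(f"{key} must be {expected.__name__}")


def execute(tool_name: str, args: dict, approved: bool = False) -> dict:
    """Validate first, gate high-risk actions, and only then cause a side effect."""
    spec = TOOLS.get(tool_name)
    if spec is None:
        return {"status": "rejected", "reason": "unknown tool"}
    try:
        validate_args(spec, args)
    except ValueError as exc:
        return {"status": "rejected", "reason": str(exc)}
    if spec.high_risk and not approved:
        return {"status": "pending_approval", "tool": tool_name}
    return {"status": "ok", "result": spec.run(**args)}
```

In this sketch the model never calls a backend directly: it proposes a tool name and arguments, and `execute` decides whether anything actually runs.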

04 Failure modes

Where it breaks - and the fix.

  • Wrong tool action (e.g. erroneous refund)
    Validate tool inputs and outputs against their schemas and require approval for high-risk actions.
  • Injection that triggers actions
    Treat customer input as untrusted; separate planning from execution; test jailbreaks.
  • Hallucinated policy or entitlements
    Ground answers in retrieved policy; never let the model invent rules.
  • Loops or runaway tool calls
    Enforce step and cost caps per conversation; add circuit breakers.
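
One way to enforce per-conversation step and cost caps is a small budget object that every agent step must charge against. The sketch below is illustrative only: the `ConversationBudget` class, the cap values and the escalation string are assumptions, and a real system would take the per-step cost from its own tracing.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when a conversation hits its step or cost cap."""


@dataclass
class ConversationBudget:
    max_steps: int = 8          # hypothetical cap, tune per product
    max_cost_usd: float = 0.05  # hypothetical cap, tune per product
    steps: int = 0
    cost_usd: float = 0.0

    def charge(self, step_cost_usd: float) -> None:
        """Record one agent step, refusing it if either cap would be exceeded."""
        if self.steps + 1 > self.max_steps:
            raise BudgetExceeded("step cap reached")
        if self.cost_usd + step_cost_usd > self.max_cost_usd:
            raise BudgetExceeded("cost cap reached")
        self.steps += 1
        self.cost_usd += step_cost_usd


def run_conversation(step_costs: list[float], budget: ConversationBudget) -> str:
    """Stand-in for the agent loop: each planned step is charged before it runs."""
    for cost in step_costs:
        try:
            budget.charge(cost)
        except BudgetExceeded as exc:
            # Circuit breaker: stop acting and hand the conversation to a human.
            return f"escalated_to_human: {exc}"
    return "resolved"
```

Charging before the step runs means a runaway loop is stopped before it spends, not after.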
05 Metrics to monitor

What good looks like, measured.

  • Action precision
    Right tool with the right arguments.
  • Approval / escalation rate
    How often a human has to step in.
  • Injection block rate
    Hostile inputs caught before action.
  • Resolution rate & CSAT
    Did it actually solve the issue?
  • Steps & cost per conversation
    Loop and spend control.
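
Several of these metrics can be computed directly from per-request trace records. The sketch below assumes each trace is a dict carrying the `approval_required`, `escalated_to_human`, `steps` and `cost_usd` fields used in the example trace schema in section 07; the `summarise` function is a hypothetical helper. Action precision and injection block rate need labelled data, so they are left out.

```python
from statistics import mean


def summarise(traces: list[dict]) -> dict:
    """Aggregate approval rate, escalation rate, steps and cost over traces."""
    if not traces:
        raise ValueError("no traces to summarise")
    n = len(traces)
    return {
        # Booleans sum as 0/1, so these are fractions of requests.
        "approval_rate": sum(t["approval_required"] for t in traces) / n,
        "escalation_rate": sum(t["escalated_to_human"] for t in traces) / n,
        "mean_steps": mean(t["steps"] for t in traces),
        "mean_cost_usd": mean(t["cost_usd"] for t in traces),
    }
```

Note that these are per-request averages; grouping by `conversation_id` first would give true per-conversation figures.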
06 MVP vs production-grade

Don't build everything on day one.

Ship the MVP column to get to users; the production column is what makes it durable. Choose deliberately which gaps you're leaving.

Aspect            | MVP                  | Production-grade
Tools             | Read-only lookups    | Scoped actions with schemas + validation
High-risk actions | Blocked entirely     | Explicit approval gate
Safety            | Basic content filter | Injection tests + output validation
Handoff           | None                 | Human escalation path
Limits            | None                 | Step + cost caps, circuit breakers
07 Copy-paste schemas

Instrument it in minutes.

A starting point you can paste into your tracing and eval setup - then adapt to your stack.

Example trace schema
{
  "request_id": "req_77",
  "architecture": "customer-support-agent",
  "conversation_id": "c_55",
  "tools_called": [
    "lookup_order",
    "issue_refund"
  ],
  "approval_required": true,
  "approval_granted": false,
  "escalated_to_human": true,
  "steps": 4,
  "output_tokens": 180,
  "latency_ms": 2600,
  "cost_usd": 0.0091
}
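
To emit records in this shape from application code, a typed wrapper can help catch malformed traces early. The following is a sketch under stated assumptions: the `Trace` class is hypothetical, and the consistency rules in `__post_init__` (no grant without a required approval, no negative counters) are assumed invariants rather than part of the schema above.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Trace:
    """One record per request, mirroring the fields of the example trace."""
    request_id: str
    conversation_id: str
    architecture: str = "customer-support-agent"
    tools_called: list[str] = field(default_factory=list)
    approval_required: bool = False
    approval_granted: bool = False
    escalated_to_human: bool = False
    steps: int = 0
    output_tokens: int = 0
    latency_ms: int = 0
    cost_usd: float = 0.0

    def __post_init__(self) -> None:
        # Assumed invariants, not stated in the original schema.
        if self.approval_granted and not self.approval_required:
            raise ValueError("approval_granted without approval_required")
        if min(self.steps, self.output_tokens, self.latency_ms) < 0 or self.cost_usd < 0:
            raise ValueError("counters must be non-negative")

    def to_json(self) -> str:
        """Serialise to a JSON object ready for a tracing backend."""
        return json.dumps(asdict(self))
```

For example, `Trace("req_77", "c_55", tools_called=["lookup_order"], steps=1).to_json()` yields a JSON object with the same keys as the example, in a slightly different order.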
Example eval dataset row
{
  "input": "Can I get a refund after 45 days?",
  "expected_behavior": "Answer using refund policy only",
  "must_include": [
    "policy window",
    "support escalation"
  ],
  "must_not_include": [
    "invented exceptions"
  ],
  "risk_category": "customer_support_policy"
}
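
A row like this can be scored with a very simple checker. The sketch below treats `must_include` and `must_not_include` as literal phrases and uses case-insensitive substring matching, which is an assumption: entries such as "invented exceptions" read more like rubric labels, so a real eval would likely pair this with a human- or model-graded rubric. The `check_row` helper is hypothetical.

```python
def check_row(row: dict, answer: str) -> dict:
    """Naive phrase check of an agent answer against one eval row."""
    text = answer.lower()
    # Required phrases that do not appear in the answer.
    missing = [p for p in row.get("must_include", []) if p.lower() not in text]
    # Forbidden phrases that do appear in the answer.
    forbidden = [p for p in row.get("must_not_include", []) if p.lower() in text]
    return {
        "passed": not missing and not forbidden,
        "missing": missing,
        "forbidden": forbidden,
    }
```

Returning the missing and forbidden phrases, not just a pass flag, makes failed rows easier to triage.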
08 Checklist

Ship-ready when…

  • Model outputs are validated before they trigger any action
  • High-risk actions require an explicit approval step
  • Prompt injection and jailbreaks are tested
  • There is a clear human escalation path
  • Per-conversation step and cost caps are enforced
  • Real tickets feed the eval set