Prompt versioning with GitHub
A concrete workflow for versioning LLM prompts in GitHub - file layout, pull-request review, eval gating in CI, and one-step rollback - without buying a dedicated tool.
You don’t need a prompt-management platform to start treating prompts like code. You already have the tool: Git. Here’s a workflow that gives you versioning, review, testing and rollback using GitHub alone.
Get prompts out of code
First move: pull prompts out of inline strings and into versioned files.
prompts/
  rag-answer/
    v4.md            # the current prompt
    metadata.yaml    # model, params, owner, changelog
  support-triage/
    v2.md
evals/
  rag-answer.jsonl   # the eval dataset for this prompt
Plain Markdown files for the prompt body, a small metadata.yaml for model and
parameters. Now every change has a diff, an author and a history - the three things
you lose when prompts live as string literals.
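A minimal sketch of reading one of these files at runtime, assuming the layout above. The names `Prompt` and `load_prompt` are hypothetical, and parsing metadata.yaml is left out because it would need a YAML library outside the standard library.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Prompt:
    name: str     # directory name, e.g. "rag-answer"
    version: str  # file stem, e.g. "v4"
    body: str     # raw Markdown text of the prompt

    @property
    def version_id(self) -> str:
        # Identifier suitable for logs and traces, e.g. "rag-answer-v4"
        return f"{self.name}-{self.version}"


def load_prompt(name: str, version: str, root: Path = Path("prompts")) -> Prompt:
    """Read <root>/<name>/<version>.md and return it with its identity attached."""
    path = root / name / f"{version}.md"
    if not path.is_file():
        raise FileNotFoundError(f"no prompt file at {path}")
    return Prompt(name=name, version=version, body=path.read_text(encoding="utf-8"))
```

Calling `load_prompt("rag-answer", "v4")` from the repo root would return the prompt body together with the identifier `rag-answer-v4`. Failing loudly on a missing file is a deliberate choice here: a typo in a version name should not silently fall back to some other prompt.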
Version with intent, not just commits
Two complementary layers:
- Git history gives you a commit per change automatically.
- A semantic version in the filename or frontmatter (v4) marks meaningful releases you reference from code and traces. Log that exact version on every request (prompt_version: "rag-answer-v4") so a trace can be tied back to the prompt that produced it.
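One way to attach the version to every request is a small helper that builds the structured log fields. This is only a sketch: `trace_record`, `log_request`, the `request_id` field and the logger name `llm` are assumptions for illustration, not part of any particular tracing tool.

```python
import json
import logging

logger = logging.getLogger("llm")  # hypothetical logger name


def trace_record(prompt_name: str, version: str, request_id: str) -> dict:
    """Build the fields logged with each request, including the exact prompt version."""
    return {
        "request_id": request_id,
        "prompt_version": f"{prompt_name}-{version}",
    }


def log_request(prompt_name: str, version: str, request_id: str) -> None:
    # Emit one JSON line per request so traces can be filtered by prompt_version.
    logger.info(json.dumps(trace_record(prompt_name, version, request_id)))
```

With this in place, a quality drop seen in traces can be filtered by `prompt_version` to check whether it started with a specific prompt release.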
Review prompts like code
Make prompt changes go through a pull request:
- A diff shows exactly what changed - invaluable when quality shifts.
- A reviewer (often a domain expert, not just an engineer) approves.
- The PR description records why, which your future self will thank you for.
This single habit prevents the most common cause of mystery regressions: someone quietly editing the live prompt.
Gate the merge on evals
This is where it becomes real LLMOps. Wire your eval dataset into CI so it runs on every prompt PR:
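# .github/workflows/prompt-eval.yml (sketch)
name: prompt-eval
on:
  pull_request:
    paths: ['prompts/**', 'evals/**']
jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      # fail the check if pass rate drops or a critical case regresses
      - run: python run_evals.py --prompts prompts/ --data evals/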
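Now a prompt change can’t merge if it regresses quality - the same guarantee CI gives you for code - as long as the check is marked as required in the branch protection rules. The check’s result appears right on the PR; have run_evals.py print the pass rate so it lands in the job log.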
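The core of a script like run_evals.py could look roughly like the sketch below. Everything specific here is an assumption: the JSONL fields (`id`, `input`, `expected`, `critical`), the substring check used as the pass criterion, the `baseline_pass_rate` argument, and the `generate` callable that stands in for the real model call.

```python
import json
from typing import Callable, Iterable


def load_cases(lines: Iterable[str]) -> list[dict]:
    """Parse JSONL: one eval case per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def run_cases(cases: list[dict], generate: Callable[[str], str]) -> list[dict]:
    """Run each case through `generate` and record whether the expected text appears."""
    results = []
    for case in cases:
        output = generate(case["input"])
        results.append({
            "id": case["id"],
            "critical": case.get("critical", False),
            # Assumed pass criterion: case-insensitive substring match.
            "passed": case["expected"].lower() in output.lower(),
        })
    return results


def gate(results: list[dict], baseline_pass_rate: float) -> tuple[bool, float, list[str]]:
    """Return (ok, pass_rate, failed_critical_ids) for the CI decision."""
    pass_rate = sum(r["passed"] for r in results) / len(results) if results else 0.0
    failed_critical = [r["id"] for r in results if r["critical"] and not r["passed"]]
    # Fail if the pass rate drops below the baseline or any critical case regresses.
    ok = pass_rate >= baseline_pass_rate and not failed_critical
    return ok, pass_rate, failed_critical
```

In CI, the script would print the pass rate and end with something like `sys.exit(0 if ok else 1)`, since a non-zero exit code is what makes the GitHub check fail. A substring match is the crudest possible grader; the gate logic stays the same if it is swapped for a stricter one.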
Make rollback one step
Because prompts are versioned files referenced by version, reverting is a
one-line change (point production back at v3) or a git revert - no changes
to application logic, no archaeology. That reversibility is the entire point of
versioning.
For extra safety, load the active prompt version from config (an env var or a small config file) so you can roll back by flipping a value rather than redeploying at all.
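A sketch of the config-driven variant, assuming an environment variable per prompt. The variable naming scheme (`PROMPT_VERSION_RAG_ANSWER`), the `DEFAULT_VERSIONS` table and the `active_version` helper are all hypothetical.

```python
import os
from typing import Mapping

# Versions shipped with the code; an env var can override each one.
DEFAULT_VERSIONS = {"rag-answer": "v4", "support-triage": "v2"}


def active_version(prompt_name: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the version to serve, preferring an override like PROMPT_VERSION_RAG_ANSWER=v3."""
    env_key = "PROMPT_VERSION_" + prompt_name.upper().replace("-", "_")
    # An unset or empty variable falls back to the shipped default.
    return env.get(env_key) or DEFAULT_VERSIONS[prompt_name]
```

Setting `PROMPT_VERSION_RAG_ANSWER=v3` would then roll that one prompt back. This only works if the older file (here v3.md) is still present in the deployed prompts/ directory, so it assumes previous versions are kept as files rather than only in Git history.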
When to graduate to a platform
This Git workflow takes you a long way. Consider a dedicated tool (PromptLayer, Humanloop, Langfuse) when you need:
- non-engineers editing prompts without touching Git,
- side-by-side A/B comparison and analytics in a UI, or
- prompt changes decoupled from your deploy cycle.
Until then, prompts/ in your repo plus an eval check in CI is a genuinely
production-grade setup. Pair it with the
Production Checklist to see what else your prompt layer needs.