Prompt versioning with GitHub

A concrete workflow for versioning LLM prompts in GitHub - file layout, pull-request review, eval gating in CI, and one-step rollback - without buying a dedicated tool.

You don’t need a prompt-management platform to start treating prompts like code. You already have the tool: Git. Here’s a workflow that gives you versioning, review, testing and rollback using GitHub alone.

Get prompts out of code

First move: pull prompts out of inline strings and into versioned files.

prompts/
  rag-answer/
    v3.md          # the previous version, kept for rollback
    v4.md          # the current prompt
    metadata.yaml  # model, params, owner, changelog
  support-triage/
    v2.md
evals/
  rag-answer.jsonl # the eval dataset for this prompt

Plain Markdown files for the prompt body, a small metadata.yaml for model and parameters. Now every change has a diff, an author and a history - the three things you lose when prompts live as string literals.
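Loading a prompt from this layout takes only a few lines. Below is a minimal sketch that assumes the prompts/<name>/<version>.md convention from the tree above. The `load_prompt` function and `Prompt` class are hypothetical names. Parsing metadata.yaml is left out because it needs a YAML library such as PyYAML, and this sketch sticks to the standard library.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Prompt:
    name: str     # directory name, e.g. "rag-answer"
    version: str  # file stem, e.g. "v4"
    text: str     # the Markdown prompt body

    @property
    def id(self) -> str:
        # Identifier to attach to logs and traces, e.g. "rag-answer-v4"
        return f"{self.name}-{self.version}"


def load_prompt(root: str | Path, name: str, version: str) -> Prompt:
    """Read <root>/<name>/<version>.md (hypothetical helper)."""
    path = Path(root) / name / f"{version}.md"
    if not path.is_file():
        raise FileNotFoundError(f"no prompt file at {path}")
    return Prompt(name=name, version=version, text=path.read_text(encoding="utf-8"))
```

With that in place, `load_prompt("prompts", "rag-answer", "v4").id` evaluates to "rag-answer-v4". Keeping the name and version on the object means the exact identifier travels with the prompt text wherever it is used.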

Version with intent, not just commits

Two complementary layers:

Git history gives you a commit per change automatically.
An explicit version in the filename or frontmatter (v4) marks the meaningful releases you reference from code and traces. Log that exact version on every request (prompt_version: "rag-answer-v4") so each trace can be tied back to the prompt that produced it.
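One straightforward way to attach the version to every request is a structured log line. This is a minimal sketch using Python's standard `logging` and `json` modules. The `log_llm_call` helper and every field except prompt_version are illustrative assumptions. If you use a tracing SDK, the same value would go on the span as an attribute instead.

```python
import json
import logging
import time

logger = logging.getLogger("llm")


def log_llm_call(prompt_version: str, model: str, started: float, ok: bool) -> dict:
    """Emit one JSON log record per LLM request, tagged with the prompt version."""
    record = {
        "prompt_version": prompt_version,  # e.g. "rag-answer-v4"
        "model": model,
        "latency_ms": round((time.monotonic() - started) * 1000),
        "ok": ok,
    }
    logger.info(json.dumps(record))
    return record


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    started = time.monotonic()
    # ... call the model with the rag-answer v4 prompt here ...
    log_llm_call("rag-answer-v4", "example-model", started, ok=True)
```

Emitting JSON rather than free text lets you filter traces by prompt_version later, for example to compare error rates before and after a release.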

Review prompts like code

Make prompt changes go through a pull request:

A diff shows exactly what changed - invaluable when quality shifts.
A reviewer (often a domain expert, not just an engineer) approves.
The PR description records why, which your future self will thank you for.

This single habit prevents the most common cause of mystery regressions: someone quietly editing the live prompt.
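There is one catch with a file-per-version layout. If a PR adds v5.md next to v4.md, GitHub shows v5.md as an entirely new file, not as a diff against v4. A small helper can restore the line-by-line view so you can paste it into the PR description. Below is a sketch using Python's standard `difflib`. The `diff_versions` name and the command-line usage are hypothetical.

```python
from __future__ import annotations

import difflib
from pathlib import Path


def diff_versions(prompt_dir: str | Path, old: str, new: str) -> str:
    """Return a unified diff between <old>.md and <new>.md in one prompt folder."""
    d = Path(prompt_dir)
    old_lines = (d / f"{old}.md").read_text(encoding="utf-8").splitlines()
    new_lines = (d / f"{new}.md").read_text(encoding="utf-8").splitlines()
    # lineterm="" because splitlines() already removed the newlines
    diff = difflib.unified_diff(
        old_lines, new_lines, fromfile=f"{old}.md", tofile=f"{new}.md", lineterm=""
    )
    return "\n".join(diff)


if __name__ == "__main__":
    import sys

    # usage: python diff_prompt.py prompts/rag-answer v4 v5
    print(diff_versions(sys.argv[1], sys.argv[2], sys.argv[3]))
```

Identical versions produce an empty string, so the helper also doubles as a quick check that a "new" version actually changed something.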

Gate the merge on evals

This is where it becomes real LLMOps. Wire your eval dataset into CI so it runs on every prompt PR:

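# .github/workflows/prompt-eval.yml (sketch)
name: prompt-eval
on:
  pull_request:
    paths: ['prompts/**', 'evals/**']
jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      # run_evals.py must exit non-zero if the pass rate drops
      # or a critical case regresses; that is what fails the check
      - run: python run_evals.py --prompts prompts/ --data evals/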

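Mark this check as required in the branch protection rules, and a prompt change can’t merge if it regresses quality - the same guarantee CI gives you for code. Have the script print its pass rate (or append it to the job summary) so reviewers can see the number from the PR’s checks.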
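The workflow only needs run_evals.py to exit non-zero on a regression. Below is a minimal sketch of such a script, and nearly everything in it is an assumption rather than a standard. That includes the JSONL fields (input, must_contain, optional critical and id), the case-insensitive substring scoring, and the --min-pass-rate flag. It also assumes each evals/<name>.jsonl is scored against the highest-numbered vN.md in prompts/<name>/. Most importantly, `generate` is a placeholder you would replace with your real model call. An absolute threshold stands in for "pass rate drops". Comparing against the main branch's score would require storing that baseline somewhere. The script appends its report to the file named by GitHub Actions' GITHUB_STEP_SUMMARY variable, which shows it on the run's summary page.

```python
"""run_evals.py: a minimal, hypothetical eval gate for prompt PRs."""
from __future__ import annotations

import argparse
import json
import os
import sys
from pathlib import Path


def generate(prompt: str, user_input: str) -> str:
    # Placeholder: replace with a real model call. Echoing the inputs keeps
    # the sketch runnable offline, but it makes the scores meaningless.
    return f"{prompt}\n{user_input}"


def latest_prompt(prompt_dir: Path) -> str:
    # Assumption: the highest-numbered vN.md is the version under review.
    # Sort numerically so v10 ranks above v9.
    files = sorted(prompt_dir.glob("v*.md"), key=lambda p: int(p.stem[1:]))
    if not files:
        raise FileNotFoundError(f"no vN.md files in {prompt_dir}")
    return files[-1].read_text(encoding="utf-8")


def evaluate(prompt: str, cases: list[dict], gen=generate) -> tuple[float, list[str]]:
    """Score each case; return (pass rate, ids of failed critical cases)."""
    passed, critical_failures = 0, []
    for i, case in enumerate(cases):
        output = gen(prompt, case["input"])
        ok = case["must_contain"].lower() in output.lower()
        passed += ok
        if not ok and case.get("critical", False):
            critical_failures.append(case.get("id", str(i)))
    rate = passed / len(cases) if cases else 1.0
    return rate, critical_failures


def main(argv: list[str] | None = None) -> int:
    parser = argparse.ArgumentParser()
    parser.add_argument("--prompts", type=Path, required=True)
    parser.add_argument("--data", type=Path, required=True)
    parser.add_argument("--min-pass-rate", type=float, default=0.9)
    args = parser.parse_args(argv)

    failed, lines = False, []
    for data_file in sorted(args.data.glob("*.jsonl")):
        name = data_file.stem  # evals/rag-answer.jsonl -> prompts/rag-answer/
        with data_file.open(encoding="utf-8") as f:
            cases = [json.loads(line) for line in f if line.strip()]
        rate, critical = evaluate(latest_prompt(args.prompts / name), cases)
        lines.append(f"- {name}: pass rate {rate:.1%}, critical failures: {critical or 'none'}")
        if rate < args.min_pass_rate or critical:
            failed = True

    report = "\n".join(lines)
    print(report)
    summary = os.environ.get("GITHUB_STEP_SUMMARY")
    if summary:  # set by GitHub Actions; appended Markdown shows on the run summary
        with open(summary, "a", encoding="utf-8") as f:
            f.write(report + "\n")
    return 1 if failed else 0


if __name__ == "__main__":
    sys.exit(main())
```

Keeping `evaluate` separate from the file handling makes it easy to swap in a stricter scorer, such as exact match or an LLM judge, without touching the CI contract. That contract is simply: exit 1 on a regression.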

Make rollback one step

Because prompts are versioned files referenced by version, rolling back is a one-line change (point production back at v3) or a git revert - no application logic to redeploy, no archaeology. That reversibility is much of the point of versioning.

For extra safety, load the active prompt version from config (an env var or a small config file) so you can roll back by flipping a value rather than redeploying at all.
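Here is a minimal sketch of that config-driven lookup, assuming one environment variable per prompt. The PROMPT_VERSION_<NAME> naming scheme and the `active_version` helper are hypothetical. Bear in mind that most platforms only pick up a changed environment variable after a process restart. A config file re-read on each request avoids even that, at the cost of a file read.

```python
import os
import re


def active_version(prompt_name: str, default: str) -> str:
    """Return the prompt version production should use, e.g. "v4".

    Looks up PROMPT_VERSION_<NAME> (a hypothetical naming scheme), so
    "rag-answer" maps to PROMPT_VERSION_RAG_ANSWER. Falls back to `default`.
    """
    var = "PROMPT_VERSION_" + re.sub(r"[^A-Za-z0-9]", "_", prompt_name).upper()
    version = os.environ.get(var, default)
    if not re.fullmatch(r"v\d+", version):
        # Reject malformed values rather than building an unexpected file path
        raise ValueError(f"{var}={version!r} is not of the form vN")
    return version
```

Setting PROMPT_VERSION_RAG_ANSWER=v3 then sends rag-answer traffic back to v3.md without a code change, and unsetting it returns to the default shipped in code.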

When to graduate to a platform

This Git workflow takes you a long way. Consider a dedicated tool (PromptLayer, Humanloop, Langfuse) when you need:

non-engineers editing prompts without touching Git,
side-by-side A/B comparison and analytics in a UI, or
prompt changes decoupled from your deploy cycle.

Until then, prompts/ in your repo plus an eval check in CI is a genuinely production-grade setup. Pair it with the Production Checklist to see what else your prompt layer needs.
