Course → Module 5: Prompt Engineering
Session 8 of 10

A Prompt Is a Specification

A prompt is not a wish. It is a specification. Specifications require testing. You write a prompt, run it, evaluate the output, identify the gap between output and desired result, modify the prompt, and run it again. This is an engineering feedback loop.

Most people write a prompt once and accept whatever comes back. If it is roughly acceptable, they publish it. If it is bad, they rewrite the prompt from scratch. Neither approach is efficient. The first produces inconsistent quality. The second wastes the information contained in the failed attempt.

A prompt is not done when it works once. A prompt is done when it works every time. Consistency is the standard. If your prompt produces good output 3 out of 5 runs, it has a 40% failure rate. In production, that means 40% of your batch needs regeneration. That is not a working prompt. That is a draft prompt.
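The pass-rate arithmetic is simple enough to automate once you score each run pass/fail against your quality criteria. A minimal sketch (the scoring itself is up to you):

```python
def pass_rate(results: list[bool]) -> float:
    """Fraction of runs that passed quality review."""
    return sum(results) / len(results)

# 3 passes out of 5 runs: a 40% failure rate, i.e. a draft prompt
runs = [True, True, False, True, False]
rate = pass_rate(runs)
print(f"pass rate: {rate:.0%}, failure rate: {1 - rate:.0%}")
# → pass rate: 60%, failure rate: 40%
```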

The Iteration Loop

Each iteration follows the same steps. The discipline is in doing all five steps every time, not skipping the evaluation to save time.

```mermaid
graph TD
    A["1. Run prompt<br/>(5 times, same parameters)"] --> B["2. Evaluate each output<br/>against quality criteria"]
    B --> C["3. Identify the most<br/>common failure mode"]
    C --> D["4. Modify prompt to<br/>address that failure"]
    D --> E["5. Run modified prompt<br/>(5 more times)"]
    E --> F{"4/5 or better<br/>pass rate?"}
    F -->|Yes| G["Prompt is production-ready"]
    F -->|No| A
    style A fill:#222221,stroke:#c8a882,color:#ede9e3
    style C fill:#222221,stroke:#c47a5a,color:#ede9e3
    style G fill:#222221,stroke:#6b8f71,color:#ede9e3
```

Running the prompt five times is not optional. A single successful run tells you nothing about consistency. Five runs reveal patterns: does the output always start with a generic opening? Does paragraph three always contain hedging? Does the AI ignore a specific constraint in 2 out of 5 runs? These patterns guide your modifications.
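Steps 1 and 2 of the loop can be sketched in code. Here `run_prompt` and `passes_criteria` are placeholders for your own model call and quality check; they are assumptions for illustration, not part of any real API:

```python
def measure_pass_rate(prompt, run_prompt, passes_criteria, runs=5):
    """Steps 1-2: run the prompt several times and score each output.

    Returns the pass count plus the failed outputs, which feed step 3
    (identifying the most common failure mode by hand).
    """
    outputs = [run_prompt(prompt) for _ in range(runs)]
    results = [passes_criteria(out) for out in outputs]
    failures = [out for out, ok in zip(outputs, results) if not ok]
    return sum(results), failures

# Usage with stub functions standing in for a real model call:
run_prompt = lambda p: "Output that follows the structure."
passes_criteria = lambda out: "structure" in out
passes, failures = measure_pass_rate("my prompt", run_prompt, passes_criteria)
print(f"{passes}/5 passed")  # → 5/5 passed
```

Steps 3 and 4 stay manual: read the failed outputs, find the shared pattern, and modify the prompt before re-running.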

Common Failure Modes and Fixes

| Failure mode | Symptom | Prompt fix |
| --- | --- | --- |
| Structural drift | Output ignores the requested structure | Add "Follow this structure exactly. Do not add, remove, or reorder sections." |
| Constraint violation | Output uses forbidden phrases | Move constraints to both the beginning and end of the prompt |
| Tone inconsistency | Output shifts tone between sections | Add a few-shot example demonstrating consistent tone throughout |
| Length violation | Output is too long or too short | Specify word count per section, not just total |
| Hedging | Output qualifies every statement | Add "State claims directly. Do not hedge with 'arguably,' 'perhaps,' or 'it could be said.'" |
| Generic opening | Output starts with "In today's..." or similar | Add "Begin with a specific fact, example, or direct statement. Never begin with a generic framing sentence." |
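Several of these symptoms are mechanically detectable, so evaluation need not be entirely manual. A sketch of simple checks; the phrase lists and word-count bounds are illustrative assumptions, not a canonical set:

```python
import re

HEDGES = ("arguably", "perhaps", "it could be said")
GENERIC_OPENINGS = ("in today's", "in the modern", "in an era")

def detect_failures(output: str, min_words: int = 300, max_words: int = 800) -> list[str]:
    """Flag the failure modes that can be checked without human review."""
    found = []
    lowered = output.lower()
    if any(lowered.lstrip().startswith(p) for p in GENERIC_OPENINGS):
        found.append("generic opening")
    if any(h in lowered for h in HEDGES):
        found.append("hedging")
    words = len(re.findall(r"\w+", output))
    if not min_words <= words <= max_words:
        found.append("length violation")
    return found

print(detect_failures("In today's fast-paced world, it could be said..."))
# → ['generic opening', 'hedging', 'length violation']
```

Structural drift and tone inconsistency still need a human (or a second model acting as a grader), but automating the cheap checks makes the five-run evaluation faster.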

The Iteration Journal

Track your iterations. For each prompt, maintain a log with:

- Version number
- Change made in this version
- Pass rate (out of 5 runs)
- Remaining failure modes

This journal serves two purposes. First, it prevents you from repeating failed modifications. Second, it builds a knowledge base of what works for your content type. After iterating on ten prompts, you will notice patterns: certain constraints always need to be repeated at the end, certain failure modes require few-shot examples rather than instructions, certain structures need explicit section-by-section word counts.
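The journal can be as simple as a list of records. A minimal sketch using a dataclass; the field names follow the log described above and are not a prescribed schema:

```python
from dataclasses import dataclass, field

@dataclass
class Iteration:
    version: int
    change: str                    # what was modified in the prompt
    pass_rate: str                 # e.g. "3/5"
    remaining_failures: list[str] = field(default_factory=list)

journal = [
    Iteration(1, "initial prompt", "2/5", ["structural drift", "hedging"]),
    Iteration(2, "added 'Follow this structure exactly'", "4/5", ["hedging"]),
]
```

Once the journal is structured data rather than loose notes, spotting cross-prompt patterns becomes a matter of filtering the records.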

When to Stop Iterating

The target is a 4/5 or better pass rate across five runs. This means the prompt produces acceptable output at least 80% of the time. Pushing for 5/5 is often not worth the effort. The marginal improvement from 4/5 to 5/5 usually requires disproportionate prompt complexity. Accept 4/5 and handle the occasional failure through your human review process.
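The stopping rule can be made explicit. A sketch, assuming pass counts out of five runs and the thresholds described in this section:

```python
def next_action(passes: int, iterations: int) -> str:
    """Decide whether to ship, keep iterating, or rethink the approach."""
    if passes >= 4:
        return "ship: handle the rare failure in human review"
    if passes < 3 and iterations >= 5:
        return "rethink: the task may need prompt chaining or a multi-agent workflow"
    return "iterate: fix the most common failure mode and re-run"

print(next_action(passes=4, iterations=2))
# → ship: handle the rare failure in human review
```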

If you cannot achieve 3/5 after five iterations, the problem may not be the prompt. The task may be too complex for a single prompt, requiring a chain-of-thought approach or a multi-agent workflow (covered in Module 9).

Further Reading

Assignment

Pick a prompt you have been using. Run it five times and evaluate each output against your quality criteria. Identify the most common failure mode. Modify the prompt to address that failure. Run five more times. Did the failure rate decrease? Document each iteration in a log with: version number, change made, pass rate, and remaining failures. Repeat until you achieve 4/5 or better.