Course → Module 9: Multi-Agent Workflows
Session 5 of 7

Neither Workers Nor Oracles

Two failure modes plague multi-agent workflows. The first is abdication: trusting agents to make decisions they should not make, rubber-stamping their output, and publishing whatever comes out. The second is micromanagement: reviewing every intermediate output, rewriting agent results by hand, and defeating the purpose of automation entirely.

The correct mental model sits between these extremes. Think of each agent as a colleague with specific expertise. You respect their competence within their domain. You do not ask the research assistant to make editorial decisions. You do not ask the copy editor to choose topics. And you review their work at defined checkpoints, not constantly.

The Colleague Framework

```mermaid
flowchart TD
    A["You<br/>(Editor-in-Chief)"] --> B["Research Assistant<br/>Agent 1"]
    A --> C["Ghostwriter<br/>Agent 2"]
    A --> D["Copy Editor<br/>Agent 3"]
    B -- "Delivers research brief" --> A
    C -- "Delivers draft" --> A
    D -- "Delivers review" --> A
    A -- "Approves/rejects at each gate" --> E["Published Content"]
    style A fill:#222221,stroke:#c8a882,color:#ede9e3
    style B fill:#222221,stroke:#6b8f71,color:#ede9e3
    style C fill:#222221,stroke:#8a8478,color:#ede9e3
    style D fill:#222221,stroke:#c47a5a,color:#ede9e3
    style E fill:#222221,stroke:#c8a882,color:#ede9e3
```

You are the editor-in-chief. You do not do everything, but everything goes through you. You set the direction (topic, audience, angle). You review the deliverables. You make the final call. The agents execute within the boundaries you define.

Agent Job Descriptions

A job description for each agent clarifies its role, its boundaries, and its handoff responsibilities. This prevents scope creep, where agents start doing things outside their role and producing unpredictable results.

| Agent | Role | Expertise | Limitations | Decisions Allowed |
| --- | --- | --- | --- | --- |
| Research Assistant | Information gathering | Search, filtering, source evaluation | Cannot judge relevance to audience; cannot assess strategic fit | Which sources to include; how to structure the brief |
| Ghostwriter | Prose generation | Voice matching, narrative structure, word economy | Cannot decide what to write about; cannot verify facts | Sentence-level phrasing; paragraph-level structure within the outline |
| Copy Editor | Quality assessment | Pattern detection, rubric scoring, artifact identification | Cannot make editorial judgments about content direction | What to flag; severity scoring |
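A job description like this can also be encoded as data, so the orchestrating code can enforce boundaries instead of trusting agents to respect them. A minimal Python sketch; the field values and decision names are illustrative, not part of any real framework:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentJobDescription:
    """One agent's role, boundaries, and allowed decisions (names illustrative)."""
    name: str
    role: str
    expertise: tuple
    limitations: tuple
    decisions_allowed: frozenset

RESEARCH_ASSISTANT = AgentJobDescription(
    name="Research Assistant",
    role="Information gathering",
    expertise=("search", "filtering", "source evaluation"),
    limitations=("cannot judge audience relevance", "cannot assess strategic fit"),
    decisions_allowed=frozenset({"source_selection", "brief_structure"}),
)

def may_decide(agent: AgentJobDescription, decision: str) -> bool:
    """Guard against scope creep: only decisions in the job description pass."""
    return decision in agent.decisions_allowed
```

With this guard in place, an agent asking to make a decision outside its job description (say, `"topic_selection"`) is rejected before the chain continues, rather than discovered after publication.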

Delegation Levels

Not all tasks within an agent's domain deserve the same level of trust. Some tasks the agent handles autonomously. Others require your approval before the chain continues.

| Level | Description | Example |
| --- | --- | --- |
| Autonomous | Agent executes without review | Research Agent formats output as JSON; Writer uses paragraph transitions |
| Review on exception | Agent executes; you review only flagged items | Editor flags issues; you review only items scored below 5 |
| Review always | Agent executes; you review every output | Writer produces draft; you read every word before it moves forward |
| Human only | Agent is not involved | Topic selection, publication approval, ethical review |
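The four levels can be sketched as a small routing function that decides whether an output stops at a human gate. This is a sketch under assumed names, not a prescribed implementation:

```python
from enum import Enum

class DelegationLevel(Enum):
    AUTONOMOUS = "autonomous"
    REVIEW_ON_EXCEPTION = "review on exception"
    REVIEW_ALWAYS = "review always"
    HUMAN_ONLY = "human only"

def needs_human_review(level: DelegationLevel, flagged: bool = False) -> bool:
    """Return True if this output must stop for human review before the chain continues."""
    if level is DelegationLevel.HUMAN_ONLY:
        # Human-only tasks should never have produced agent output at all.
        raise ValueError("human-only tasks are not delegated to agents")
    if level is DelegationLevel.AUTONOMOUS:
        return False
    if level is DelegationLevel.REVIEW_ON_EXCEPTION:
        return flagged  # only outputs the agent itself flagged stop here
    return True  # REVIEW_ALWAYS
```

The design choice worth noting: "human only" raises rather than returns, because a pipeline that reaches that branch has already violated the delegation model.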

As your agents prove reliable, you can shift tasks from "review always" to "review on exception." This is earned trust, not blind trust. It comes from tracking agent performance over many runs and seeing consistent quality.
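Earned trust can be made concrete: promote a task from "review always" to "review on exception" only after a run of consistently high scores. A sketch assuming a 1–10 scoring scale; the window and threshold are illustrative defaults, not recommendations:

```python
def earned_trust(scores: list, window: int = 20, threshold: float = 8.0) -> bool:
    """True if the last `window` runs all scored at or above `threshold`.

    A single low score inside the window resets the case for promotion.
    """
    recent = scores[-window:]
    return len(recent) == window and min(recent) >= threshold
```

Requiring the minimum of the window, rather than its average, to clear the threshold is deliberate: one bad run is exactly the kind of evidence that should delay a promotion.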

The Feedback Loop

When an agent underperforms, the fix is not to discard the agent. The fix is to improve its instructions. If the Writing Agent consistently produces voice breaks in opening paragraphs, the fix is a more specific opening-paragraph instruction in its system prompt, not a return to manual writing.

Track agent performance per dimension, for example the Writing Agent's voice-match score and the Editing Agent's false-positive rate. These metrics tell you where to invest system-prompt improvements. A Writing Agent with a 6/10 voice score needs voice-fingerprint refinement. An Editing Agent with a 40% false-positive rate needs calibration.
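Finding where to invest reduces to averaging each scored dimension across runs and surfacing the weakest one. A minimal sketch; the dimension names and scores are made up for illustration:

```python
def weakest_dimension(runs: list) -> tuple:
    """Average each scored dimension across runs; return (name, average) of the lowest."""
    totals = {}
    for run in runs:  # each run is a dict of dimension -> score
        for dim, score in run.items():
            totals.setdefault(dim, []).append(score)
    averages = {dim: sum(v) / len(v) for dim, v in totals.items()}
    return min(averages.items(), key=lambda kv: kv[1])

# Hypothetical score log for a Writing Agent across two runs:
runs = [
    {"voice": 6.0, "structure": 8.5},
    {"voice": 6.5, "structure": 9.0},
]
# weakest_dimension(runs) -> ("voice", 6.25): invest in the voice fingerprint first.
```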

The goal is not to remove yourself from the pipeline. The goal is to position yourself where human judgment adds the most value: at decision points and quality gates. Everything else can be delegated to agents whose performance you track and whose instructions you refine.

Assignment

Write a "job description" for each agent in your chain. Include:

  1. Role (one sentence)
  2. Expertise (what it is good at)
  3. Limitations (what it cannot do)
  4. Decisions it can make autonomously
  5. Decisions that require your approval

Then assign a delegation level (autonomous, review on exception, review always, human only) to each task in your pipeline. Be honest about where you trust the agents and where you do not. This framework evolves as you collect performance data.