Why AI Defaults to Mediocre
Session 1.1 · ~5 min read
Ask ChatGPT, Claude, or Gemini to "write a paragraph about coffee" with no system prompt and no additional instructions. Do it three times. You will get three slightly different paragraphs that all sound the same. They will mention aroma. They will mention ritual. They will use the phrase "more than just a beverage" or something functionally identical. They will be competent, inoffensive, and completely forgettable.
This is not a bug. This is how language models work.
Training on the Average
Large language models are trained on massive datasets of text scraped from the internet: books, articles, forums, websites, documentation, social media. The model learns statistical patterns. Given a sequence of words, it predicts the most likely next word. Given a prompt about coffee, it generates the most statistically probable sentences about coffee based on everything it has read.
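The mechanism can be sketched with a toy bigram model. Real LLMs use neural networks over subword tokens, not frequency tables, but the core move is the same: predict the most probable continuation given what came before.

```python
from collections import Counter, defaultdict

# Toy corpus standing in for "everything the model has read" about coffee.
corpus = (
    "coffee is more than just a beverage . "
    "coffee is a daily ritual . "
    "coffee is more than a drink . "
    "the aroma of coffee is rich ."
).split()

# Count how often each word follows each other word (bigram frequencies).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict(word):
    """Return the statistically most likely next word."""
    return following[word].most_common(1)[0][0]

print(predict("coffee"))  # "is"   -- the most frequent continuation
print(predict("is"))      # "more" -- the average wins, every time
```

Whatever appears most often in the training data is what the model reaches for first; nothing in the objective rewards the rare, interesting continuation.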
The internet is mostly mediocre. Not bad, not good. Mediocre. The quality of online writing is roughly bell-shaped, and the bulk of the mass is competent, generic, unremarkable prose. The model learns this distribution, so its default output sits at the peak.
```mermaid
flowchart LR
    A["Billions of web pages"] --> B["Model learns statistical patterns"]
    B --> C["Default output = statistical average"]
    C --> D["Mediocre by design"]
```
A language model's default output is the statistical average of all human writing. The average of everything is nothing in particular.
RLHF: The Smoothing Layer
After initial training, models go through Reinforcement Learning from Human Feedback (RLHF). Human raters evaluate pairs of model outputs and indicate which one is better. The model then adjusts to produce more of what raters prefer.
In theory, RLHF should improve quality. In practice, it optimizes for a specific definition of "better" that favors safety and agreeableness over specificity and originality. The raters are typically not domain experts. They are contractors evaluating whether a response is helpful, harmless, and honest. A response that is cautious, balanced, and covers multiple angles scores well. A response that makes a strong, specific claim scores worse, because strong claims risk being wrong.
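A toy simulation makes the dynamic concrete. The rubric below is a hypothetical caricature of rater behavior, not any lab's actual pipeline, but it shows why the hedged response wins the pairwise comparison:

```python
# Toy model of pairwise preference rating. The marker lists below are a
# hypothetical simplification of "helpful, harmless, honest" scoring.
responses = {
    "bold":   "Dark roast is objectively worse: roasting past second crack destroys origin flavor.",
    "hedged": "There are arguments on both sides; it's important to note that roast preference is subjective.",
}

RISKY_MARKERS = ("objectively", "worse", "destroys")
SAFE_MARKERS = ("arguments on both sides", "important to note", "subjective")

def rater_score(text):
    """A cautious rater: reward hedging, penalize claims that could be wrong."""
    score = 0
    score -= sum(marker in text for marker in RISKY_MARKERS)  # risk of being wrong
    score += sum(marker in text for marker in SAFE_MARKERS)   # safe and balanced
    return score

# Pairwise comparison, as in RLHF preference collection.
winner = max(responses, key=lambda name: rater_score(responses[name]))
print(winner)  # "hedged" -- the cautious response wins the comparison
```

Repeat that comparison across millions of rated pairs and the optimization pressure all points one way: toward the response that cannot be wrong because it never commits to anything.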
| What RLHF Optimizes For | What This Produces | What Gets Lost |
|---|---|---|
| Helpfulness | Comprehensive, covers all angles | Conciseness, decisiveness |
| Harmlessness | Cautious, hedged, qualified | Strong opinions, bold claims |
| Honesty | Acknowledges uncertainty | Confidence, authority |
| Broad appeal | Generic, suitable for any audience | Voice, personality, specificity |
The RLHF process takes a model that already defaults to average and smooths it further. The resulting "helpful assistant" persona is not a creative choice by the model. It is the optimization target that the training process converges toward.
The Helpful Assistant Cage
The default persona of most AI models is a helpful, slightly formal, endlessly patient assistant. This persona exists because it is what the training process selects for. It is not the only possible persona. It is the one that scores highest with the broadest range of raters evaluating the broadest range of queries.
The helpful assistant:
- Never takes a firm position ("There are arguments on both sides...")
- Never admits ignorance directly ("While I don't have specific data on this...")
- Always hedges ("It's important to note that...")
- Always acknowledges complexity ("This is a nuanced topic...")
- Always offers a balanced view, even when one side is clearly wrong
Every one of these patterns is a rational optimization given the training objective. Each one also makes the output less useful for anyone who wants a clear, specific, opinionated answer to a concrete question.
```mermaid
flowchart TD
    A["Learns internet average"] --> B["RLHF: optimizes for safe + helpful"]
    B --> C["Default persona: 'Helpful Assistant'"]
    C --> D["Hedging"]
    C --> E["Generic tone"]
    C --> F["False balance"]
    C --> G["Enthusiasm substituting for specificity"]
```
Why This Matters for Content Production
If you use AI with default settings and no constraints, you will get output that sits precisely at the intersection of "internet average" and "RLHF-smoothed safety." This output will be:
- Grammatically correct
- Topically relevant
- Structurally predictable
- Completely devoid of anything a reader would remember
The model is not being lazy. It is doing exactly what it was trained to do. The problem is not the model. The problem is using the model without overriding its defaults. System prompts, few-shot examples, temperature adjustments, and structured output specifications exist precisely to pull the model away from its gravitational center. Without those interventions, you get the average. The average is mediocre.
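One of those levers, temperature, is easy to demonstrate. Below is a minimal sketch of standard temperature-scaled softmax over toy logits: low temperature concentrates probability on the single most likely token, higher temperature spreads it toward less typical choices.

```python
import math

# Toy logits for four candidate next tokens after "Coffee is ...".
logits = {"more": 3.0, "a": 2.5, "bitter": 1.0, "alchemy": 0.2}

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; temperature reshapes the distribution."""
    scaled = {tok: v / temperature for tok, v in logits.items()}
    total = sum(math.exp(v) for v in scaled.values())
    return {tok: math.exp(v) / total for tok, v in scaled.items()}

for t in (0.2, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    top = max(probs, key=probs.get)
    print(f"T={t}: P({top}) = {probs[top]:.2f}")
```

At low temperature the statistically safest token takes nearly all the probability mass; raising the temperature hands some of that mass back to the long tail, which is where anything surprising lives.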
The next sessions dissect the specific markers of that mediocrity, one pattern at a time.
Further Reading
- Illustrating Reinforcement Learning from Human Feedback (RLHF) (Hugging Face)
- What is RLHF? (AWS)
- Reinforcement Learning from Human Feedback (Wikipedia)
- What Is Reinforcement Learning From Human Feedback? (IBM)
Assignment
- Ask any AI model the same question three times with no system prompt: "Write a paragraph about coffee."
- Compare the three outputs side by side. Highlight every phrase that appears in at least two of the three. Those repeated phrases are the model's gravitational center.
- Count the highlighted phrases and record them in a table with three columns: Repeated Phrase · Appears In (2/3 or 3/3) · Why It's Default (what makes this the "safe" choice).
- Write one paragraph describing what the model's "voice" sounds like when given no direction. What does the average of everything sound like?
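If you want to automate the overlap step, a short script can do the highlighting. The three outputs below are hypothetical stand-ins; paste in your own:

```python
def ngrams(text, n=3):
    """All lowercase n-word phrases in a text, punctuation stripped."""
    words = text.lower().replace(".", "").replace(",", "").replace(";", "").split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

# Stand-ins for three default-settings outputs; replace with your own.
outputs = [
    "Coffee is more than just a beverage, it is a daily ritual.",
    "For many, coffee is more than just a drink; its aroma starts the day.",
    "Coffee, with its rich aroma, is more than just a morning ritual.",
]

phrase_sets = [ngrams(o) for o in outputs]
for phrase in sorted(set.union(*phrase_sets)):
    count = sum(phrase in s for s in phrase_sets)
    if count >= 2:
        print(f"{count}/3  {phrase}")
```

Any phrase the script prints is a candidate for the model's gravitational center; with real outputs, expect "more than just" or a close variant to show up in all three.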