Session 5.6: Temperature, Top-p, and Output Control

Course → Module 5: Prompt Engineering

Session 6 of 10

Quality Control Knobs

Temperature, top-p, and max tokens are not abstract technical parameters. They are quality control knobs that determine the character of your AI's output. Setting them deliberately is the difference between a production tool and a slot machine.

Most people accept the defaults. Defaults are chosen by the API provider to be safe for the widest range of use cases. Safe for everyone means optimized for no one. Your content type has specific requirements that the defaults do not address.

Temperature controls how much risk your AI takes with each word choice. At 0, the model always picks the most probable next token (a word or word fragment), so the output is predictable and often repetitive. At 1, the model samples from its unmodified probability distribution, so less likely words are chosen more often. The output is more varied and more prone to drifting or losing coherence. Your job is to find the value that produces varied but reliable output for your specific content type.
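To make this concrete, here is a minimal Python sketch of how temperature is commonly applied. The model's raw scores for each candidate token (logits) are divided by the temperature before a softmax turns them into probabilities. The logit values are made up for illustration, and treating temperature 0 as pure greedy selection is a simplifying assumption of this sketch. Real APIs apply temperature internally and may differ in the details.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw model scores (logits) into next-token probabilities.

    Every logit is divided by the temperature before the softmax.
    Values below 1 sharpen the distribution toward the top token;
    values above 1 flatten it. In this sketch, temperature 0 is treated
    as greedy decoding: all probability goes to the highest-scoring token.
    """
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for four candidate next tokens.
logits = [2.0, 1.0, 0.5, 0.1]
for t in (0, 0.3, 0.7, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Running it shows the pattern described above: low temperatures pile probability onto the top candidate, while higher ones spread it across alternatives. That spread is the source of both variety and occasional incoherence.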

Temperature in Practice

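Temperature is a non-negative number. Some APIs (OpenAI's, for example) accept values from 0 to 2, while others, such as Anthropic's, cap it at 1. Values above 1 are rarely useful for content production, so the practical range is 0 to 1.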

Temperature	Behavior	Good For	Bad For
0	Always picks the most likely word	Factual summaries, data extraction, code	Creative writing, anything requiring variety
0.2-0.3	Slight variation, mostly predictable	Technical documentation, reports	Content that needs a distinctive voice
0.5-0.7	Balanced variety and coherence	Blog posts, articles, reviews	Highly factual or highly creative tasks
0.8-1.0	High variety, occasional unexpected choices	Brainstorming, creative fiction, ideation	Anything requiring accuracy or consistency

Figure: Temperature scale, from deterministic to experimental, with typical uses at each point.
0 (Deterministic): fact extraction, code generation
0.3 (Conservative): technical docs, reports
0.5 (Balanced): blog posts, articles
0.7 (Creative): marketing copy, fiction
1.0 (Experimental): brainstorming, ideation

Top-p (Nucleus Sampling)

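Top-p controls the pool of words the AI chooses from. The model ranks candidate next tokens by probability and keeps only the smallest set whose probabilities add up to p. At top-p 0.1, the AI considers just the few most likely tokens that together account for 10% of the probability. At top-p 0.9, it considers enough tokens to cover 90% of the probability. A smaller pool means more predictable output. A larger pool means more diverse vocabulary.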
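The filtering step is easier to see in code. This Python sketch follows the usual description of nucleus sampling: sort tokens by probability, keep the smallest prefix whose cumulative probability reaches p, and renormalize what is left. The four-token distribution is invented for illustration. Real implementations run over the model's full vocabulary inside the provider's stack.

```python
def top_p_filter(probs, top_p):
    """Nucleus (top-p) filtering over a next-token distribution.

    Keeps the smallest set of highest-probability tokens whose cumulative
    probability reaches top_p, then renormalizes so the kept tokens sum
    to 1. Dropped tokens get probability 0.
    """
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in (0, 1]")
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]


probs = [0.5, 0.3, 0.15, 0.05]  # hypothetical next-token distribution
print(top_p_filter(probs, 0.1))   # only the top token survives
print(top_p_filter(probs, 0.75))  # top two tokens: 0.5 + 0.3 = 0.8 >= 0.75
print(top_p_filter(probs, 1.0))   # nothing is dropped
```

Note that top-p 1.0 leaves the distribution untouched. Temperature then becomes the only knob shaping randomness.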

The general recommendation: adjust either temperature or top-p, not both simultaneously. They influence the same dimension of output (randomness vs. predictability). Changing both at once makes it difficult to isolate which parameter caused a quality change.

For most content production, set top-p to 1 (usually the default) and control output character entirely through temperature. This simplifies your parameter space without sacrificing control.

Max Tokens

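Max tokens sets the ceiling on output length. In English, one token averages roughly 0.75 words, so a 1000-word article needs about 1,333 tokens (1000 ÷ 0.75), or roughly 1,300 to 1,500 depending on vocabulary and formatting. Setting max tokens too low cuts your output off mid-sentence. Setting it very high does not raise your bill by itself, since per-token pricing charges for the tokens actually generated. It does, however, remove the cap that stops a rambling response from running long.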

Set max tokens to about 1.5 times the token count of your target length. For a 1000-word article, that is roughly 1,333 tokens × 1.5 ≈ 2,000. This gives the AI room to finish its thought while still capping runaway output.
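The arithmetic fits in a small helper. This Python sketch assumes the rough 0.75 words-per-token ratio quoted in this section and a 1.5× headroom factor. Actual token counts depend on the provider's tokenizer, so treat the result as an estimate. The function name `max_tokens_for` is hypothetical.

```python
WORDS_PER_TOKEN = 0.75  # rough English average; varies by tokenizer and text


def max_tokens_for(target_words, headroom=1.5):
    """Suggest a max-tokens ceiling for a target word count (hypothetical helper).

    Converts words to an estimated token count, multiplies by a headroom
    factor so the model can finish its last thought, and rounds to a
    whole number of tokens.
    """
    estimated_tokens = target_words / WORDS_PER_TOKEN
    return round(estimated_tokens * headroom)


print(max_tokens_for(1000))  # roughly 1,333 tokens x 1.5
```

Keeping the ratio and headroom as named values makes it easy to adjust them once you have measured real token counts for your own content.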

Finding Your Parameters

The only way to find optimal parameters for your content type is testing. Generate the same piece at temperature 0, 0.3, 0.5, 0.7, and 1.0. Read all five outputs. Identify the temperature at which the output becomes unreliable (factual errors, incoherent sentences, off-topic tangents). Identify the temperature at which the output becomes too robotic (repetitive phrasing, flat rhythm, no personality). Your production temperature lives between those two boundaries.
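A sweep like this is easy to script. The sketch below assumes a hypothetical `generate(prompt, temperature, top_p, max_tokens)` function that wraps whatever API you use. `fake_generate` is a placeholder so the example runs without network access. Only temperature changes between runs, while top-p and max tokens stay fixed, following the one-knob-at-a-time advice above.

```python
from typing import Callable, Dict, Sequence

SWEEP_TEMPERATURES = (0.0, 0.3, 0.5, 0.7, 1.0)


def temperature_sweep(
    prompt: str,
    generate: Callable[[str, float, float, int], str],
    temperatures: Sequence[float] = SWEEP_TEMPERATURES,
    top_p: float = 1.0,
    max_tokens: int = 2000,
) -> Dict[float, str]:
    """Run one prompt at several temperatures, holding every other parameter fixed."""
    return {t: generate(prompt, t, top_p, max_tokens) for t in temperatures}


def fake_generate(prompt: str, temperature: float, top_p: float, max_tokens: int) -> str:
    # Hypothetical stand-in for a real API call, so the sketch runs offline.
    return f"[temperature={temperature}] draft for: {prompt}"


outputs = temperature_sweep("Write a 1000-word post on composting.", fake_generate)
for t, text in outputs.items():
    print(f"--- temperature {t} ---")
    print(text)
```

Printing the outputs side by side makes it easier to spot the two boundaries: where drafts start to wander and where they go flat.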

Document your findings. "Blog posts: temperature 0.5, top-p 1, max tokens 2000" becomes a production parameter that you set once and reuse for every blog post generation. Different content types may have different optimal parameters.
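One way to make documented parameters reusable is a small preset table in code. This Python sketch is illustrative only. The `GenerationPreset` class and `params_for` helper are hypothetical, and the `temperature`, `top_p`, and `max_tokens` key names follow a common API convention that your provider may name differently. The blog-post values are the example from this section, not a recommendation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerationPreset:
    """One documented set of sampling parameters for a content type (hypothetical)."""
    content_type: str
    temperature: float
    top_p: float
    max_tokens: int
    rationale: str = ""


PRESETS = {
    "blog_post": GenerationPreset(
        content_type="blog_post",
        temperature=0.5,
        top_p=1.0,
        max_tokens=2000,
        rationale="Example values; replace with your own test results.",
    ),
}


def params_for(content_type: str) -> dict:
    """Return the keyword arguments to pass to a generation call."""
    preset = PRESETS[content_type]
    return {
        "temperature": preset.temperature,
        "top_p": preset.top_p,
        "max_tokens": preset.max_tokens,
    }


print(params_for("blog_post"))
```

Freezing the dataclass keeps a preset from being changed by accident mid-run. Adding a new content type is one more entry in `PRESETS`.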

Assignment

Generate the same piece of content at temperatures 0, 0.3, 0.7, and 1.0. Keep all other parameters identical. Compare all four outputs. At what temperature does the output start becoming unreliable? At what temperature is it too robotic? Find your sweet spot for your content type. Document it as: "Content type: [X], optimal temperature: [Y], rationale: [Z]."
