LLM Temperature Explained: Why AI Gives Different Answers Each Time
Temperature controls how random or deterministic an LLM's responses are. Learn when to turn it up for creativity or down for consistency.

LLM Temperature Explained
You ask ChatGPT the same question twice and get two different answers. Is it broken? No. It's working exactly as designed.
Temperature is the parameter that controls this behavior. It determines whether an LLM gives you the same predictable answer every time or surprises you with creative variations.
Understanding temperature is essential for anyone building AI applications or trying to get consistent results from language models.
How LLMs Generate Text
Before we dive into temperature, you need to understand how LLMs produce output.
When generating each word (or token), the model doesn't just pick "the right answer." Instead, it calculates a probability distribution over its entire vocabulary. Every possible next token gets a probability score.
For example, given the prompt "The sky is...", the model might calculate:
| Token | Probability |
|---|---|
| blue | 45% |
| clear | 20% |
| dark | 12% |
| beautiful | 8% |
| cloudy | 6% |
| ... | ... |
Now the question becomes: how do we pick which token to output?
The simplest approach is greedy decoding: always pick the highest probability token ("blue" in this case). But this leads to boring, repetitive text.
Instead, most LLMs use sampling: randomly select a token based on the probability distribution. Higher probability tokens are more likely to be chosen, but lower probability options still have a chance.
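To make the difference concrete, here's a minimal Python sketch using the made-up probabilities from the table above (illustrative numbers, not real model output):

```python
import random

# Illustrative next-token probabilities for the prompt "The sky is..."
# (hand-picked numbers, not real model output)
next_token_probs = {
    "blue": 0.45, "clear": 0.20, "dark": 0.12,
    "beautiful": 0.08, "cloudy": 0.06, "grey": 0.09,
}

# Greedy decoding: always take the single most likely token
greedy_pick = max(next_token_probs, key=next_token_probs.get)

# Sampling: draw one token at random, weighted by its probability
sampled_pick = random.choices(
    list(next_token_probs), weights=list(next_token_probs.values()), k=1
)[0]

print(greedy_pick)   # always "blue"
print(sampled_pick)  # usually "blue", sometimes "clear", "dark", ...
```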
This is where temperature comes in.
What Temperature Actually Does
Temperature is a number (typically between 0 and 2) that reshapes the probability distribution before sampling, making it sharper or flatter.
Low Temperature (0 to 0.3)
Low temperature makes the probability distribution sharper. High-probability tokens become even more likely, while low-probability tokens become nearly impossible.
With temperature = 0.1:
| Token | Original | After Temperature |
|---|---|---|
| blue | 45% | 85% |
| clear | 20% | 10% |
| dark | 12% | 3% |
| beautiful | 8% | 1% |
| cloudy | 6% | 1% |
The model almost always picks "blue." Responses become predictable and consistent.
High Temperature (0.7 to 1.5+)
High temperature flattens the distribution. Lower-probability tokens get a better chance of being selected.
With temperature = 1.5:
| Token | Original | After Temperature |
|---|---|---|
| blue | 45% | 30% |
| clear | 20% | 18% |
| dark | 12% | 15% |
| beautiful | 8% | 13% |
| cloudy | 6% | 10% |
Now "dark," "beautiful," and even "cloudy" have reasonable chances. Responses become more varied and creative.
Temperature = 0
At exactly zero, the scaling formula would divide by zero, so implementations treat it as a special case: greedy decoding. The model always picks the highest-probability token, so the same prompt produces essentially the same response every time. (Hosted APIs can still show minor variation at temperature 0 due to nondeterminism in how requests are batched and computed.)
The Math Behind Temperature
If you want to understand the mechanics, here's the formula.
Without temperature, the model converts raw scores (logits) into probabilities using softmax:
P(token_i) = exp(logit_i) / sum(exp(all_logits))
With temperature T, the formula becomes:
P(token_i) = exp(logit_i / T) / sum(exp(all_logits / T))
When T < 1: dividing the logits by a number less than 1 magnifies the differences between them, creating a sharper distribution.
When T > 1: dividing by a number greater than 1 shrinks the differences, flattening the distribution.
When T = 1: the original distribution is unchanged.
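Here's the same scaling as a small Python sketch, with hand-picked logits that roughly reproduce the example distribution above (illustrative values, not taken from a real model):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities, scaled by temperature."""
    scaled = [l / temperature for l in logits]
    # Subtract the max for numerical stability before exponentiating
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hand-picked logits roughly matching the "The sky is..." example:
# blue, clear, dark, beautiful, cloudy
logits = [2.0, 1.2, 0.7, 0.3, 0.0]

print(softmax_with_temperature(logits, 1.0))   # original distribution (~49%, 22%, 13%, 9%, 7%)
print(softmax_with_temperature(logits, 0.1))   # sharp: "blue" dominates almost completely
print(softmax_with_temperature(logits, 1.5))   # flat: tail tokens gain probability
```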
When to Use Different Temperature Settings
Use Low Temperature (0 to 0.3) For:
| Use Case | Why |
|---|---|
| Code generation | You want correct syntax, not creative bugs |
| Factual Q&A | Consistency matters more than variety |
| Data extraction | Structured output should be predictable |
| Math problems | There's usually one right answer |
| Classification tasks | You want the most likely category |
| Production APIs | Reproducible results are essential |
Example: An AI assistant that answers customer support questions should use low temperature to give consistent, accurate answers and reduce hallucinations.
Use Medium Temperature (0.4 to 0.7) For:
| Use Case | Why |
|---|---|
| General conversation | Natural dialogue has some variation |
| Summarization | Some flexibility in phrasing is okay |
| Translation | Multiple valid phrasings exist |
| Email drafting | Professional but not robotic |
Example: A chatbot for casual conversation can use medium temperature to feel more natural without going off the rails.
Use High Temperature (0.8 to 1.2) For:
| Use Case | Why |
|---|---|
| Creative writing | You want surprising, original content |
| Brainstorming | Diverse ideas are the goal |
| Poetry and fiction | Unconventional word choices work well |
| Generating alternatives | You want multiple different options |
Example: An AI writing assistant helping with a novel should use higher temperature to produce unexpected plot twists and vivid descriptions.
Avoid Very High Temperature (1.5+):
Going too high causes incoherent output. The model starts picking very unlikely tokens, leading to nonsensical text.
Temperature vs Other Sampling Parameters
Temperature isn't the only knob for controlling randomness. Here are the other common ones:
Top-p (Nucleus Sampling)
Instead of scaling probabilities, top-p limits which tokens can be selected.
With top-p = 0.9, the model only samples from the smallest set of top tokens whose cumulative probability reaches 90%. Very unlikely tokens are excluded entirely.
| Parameter | How It Works |
|---|---|
| Temperature | Reshapes the whole distribution, making it sharper or flatter |
| Top-p | Cuts off the long tail of unlikely tokens |
Many practitioners use both together: temperature = 0.7 with top-p = 0.95.
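For example, with the official openai Python package (v1+), both knobs are just request parameters; the model name below is a placeholder, and other providers use similar but not identical parameter names:

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; use whichever model you have access to
    messages=[{"role": "user", "content": "Suggest a name for a hiking blog."}],
    temperature=0.7,  # moderate randomness
    top_p=0.95,       # ignore the long tail of very unlikely tokens
)
print(response.choices[0].message.content)
```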
Top-k
Top-k only considers the k most likely tokens. With top-k = 50, only the top 50 tokens can be selected.
This is simpler but less adaptive than top-p. A fixed k might be too restrictive in some contexts and too loose in others.
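A rough sketch of how the two cutoffs differ, applied to a plain dictionary of token probabilities (the helper functions are illustrative, not from any particular library):

```python
def top_k_filter(probs, k):
    """Keep only the k most likely tokens, then renormalize."""
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}

def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability >= p."""
    kept, cumulative = {}, 0.0
    for token, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[token] = prob
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(kept.values())
    return {token: pr / total for token, pr in kept.items()}

probs = {"blue": 0.45, "clear": 0.20, "dark": 0.12, "beautiful": 0.08,
         "cloudy": 0.06, "grey": 0.09}

print(top_k_filter(probs, 2))    # only "blue" and "clear" survive
print(top_p_filter(probs, 0.9))  # keeps top tokens until 90% of the mass is covered
```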
Frequency Penalty and Presence Penalty
These penalize tokens that have already appeared, reducing repetition:
- Frequency penalty: Reduces probability based on how many times a token has appeared
- Presence penalty: Flat reduction if the token has appeared at all
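Conceptually, both penalties are deductions applied to a token's logit before sampling. Here's a simplified sketch of that idea; the function and parameter names are ours, and real implementations differ by provider:

```python
from collections import Counter

def apply_penalties(logits, generated_tokens, frequency_penalty=0.0, presence_penalty=0.0):
    """Lower the logits of tokens that have already been generated (simplified sketch)."""
    counts = Counter(generated_tokens)
    adjusted = dict(logits)
    for token, count in counts.items():
        if token in adjusted:
            adjusted[token] -= count * frequency_penalty  # grows with each repetition
            adjusted[token] -= presence_penalty           # flat hit for appearing at all
    return adjusted

logits = {"the": 2.1, "cat": 1.4, "sat": 0.9}
print(apply_penalties(logits, ["the", "cat", "the"],
                      frequency_penalty=0.5, presence_penalty=0.3))
```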
Practical Examples
Example 1: Code Generation
Prompt: "Write a Python function to reverse a string"
Temperature 0:

    def reverse_string(s):
        return s[::-1]

Temperature 1.2:

    def flip_text_backwards(input_text):
        reversed_chars = []
        for i in range(len(input_text) - 1, -1, -1):
            reversed_chars.append(input_text[i])
        return ''.join(reversed_chars)

The high temperature version works but uses an unconventional approach. For production code, you want the predictable, idiomatic solution.
Example 2: Creative Writing
Prompt: "Describe a sunset in one sentence"
Temperature 0.2 (run 3 times):
- "The sun dipped below the horizon, painting the sky in shades of orange and pink."
- "The sun dipped below the horizon, painting the sky in shades of orange and pink."
- "The sun dipped below the horizon, painting the sky in shades of orange and pink."
Temperature 1.0 (run 3 times):
- "The sun melted into the sea like a burning coin dropped into mercury."
- "Ribbons of fire stretched across the sky as daylight surrendered to the approaching night."
- "The horizon swallowed the sun whole, leaving only a bruise of purple and gold."
For creative writing, the variety from higher temperature produces more interesting results.
Best Practices
1. Start with the Default
Most APIs default to temperature 0.7 or 1.0. This works reasonably well for general use.
2. Adjust Based on Your Use Case
Ask yourself: "Do I want the same answer every time, or different answers?"
- Same answer → Lower temperature
- Different answers → Higher temperature
3. Use Temperature 0 for Testing
When debugging prompts, set temperature to 0 so you can reliably reproduce issues.
4. Don't Go Too Extreme
- Below 0.1 is rarely necessary (just use 0)
- Above 1.5 usually produces nonsense
5. Combine with Top-p
A common production setting: temperature = 0.7, top-p = 0.95. This gives some creativity while preventing completely random outputs.
Temperature in Popular APIs
| Service | Default | Range | Notes |
|---|---|---|---|
| OpenAI GPT-4 | 1.0 | 0-2 | Also supports top-p |
| Anthropic Claude | 1.0 | 0-1 | Lower max than GPT |
| Google Gemini | 0.9 | 0-2 | Also supports top-k |
| Llama (local) | 0.8 | 0-2 | Varies by interface |
Key Takeaways
| Concept | What to Remember |
|---|---|
| Temperature | Controls randomness in LLM output |
| Low (0-0.3) | Predictable, consistent, deterministic |
| Medium (0.4-0.7) | Balanced, natural variation |
| High (0.8-1.2) | Creative, diverse, surprising |
| Too high (1.5+) | Incoherent, nonsensical |
| Temperature 0 | Greedy decoding, identical outputs |
Temperature is one of the most important parameters for controlling LLM behavior. Master it, and you can tune your AI applications for exactly the right balance of consistency and creativity.
Related Concepts
Understanding temperature connects to several other LLM concepts:
- Context Windows: How much text the model can consider
- Tokenization: How text is broken into pieces before processing
- Embeddings: How tokens become meaningful numbers


