
LLM Temperature Explained: Why AI Gives Different Answers Each Time

Temperature controls how random or deterministic an LLM's responses are. Learn when to turn it up for creativity or down for consistency.


You ask ChatGPT the same question twice and get two different answers. Is it broken? No. It's working exactly as designed.

Temperature is the parameter that controls this behavior. It determines whether an LLM gives you the same predictable answer every time or surprises you with creative variations.

Understanding temperature is essential for anyone building AI applications or trying to get consistent results from language models.


How LLMs Generate Text

Before we dive into temperature, you need to understand how LLMs produce output.

When generating each word (or token), the model doesn't just pick "the right answer." Instead, it calculates a probability distribution over its entire vocabulary. Every possible next token gets a probability score.

For example, given the prompt "The sky is...", the model might calculate:

| Token     | Probability |
|-----------|-------------|
| blue      | 45%         |
| clear     | 20%         |
| dark      | 12%         |
| beautiful | 8%          |
| cloudy    | 6%          |
| ...       | ...         |

Now the question becomes: how do we pick which token to output?

The simplest approach is greedy decoding: always pick the highest probability token ("blue" in this case). But this leads to boring, repetitive text.

Instead, most LLMs use sampling: randomly select a token based on the probability distribution. Higher probability tokens are more likely to be chosen, but lower probability options still have a chance.
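To make this concrete, here is a minimal sketch of greedy decoding versus sampling over the toy distribution above (the tokens and probabilities are illustrative, not real model output):

```python
import random

# Toy next-token distribution for the prompt "The sky is..."
tokens = ["blue", "clear", "dark", "beautiful", "cloudy"]
probs = [0.45, 0.20, 0.12, 0.08, 0.06]  # remaining mass omitted for brevity

# Greedy decoding: always take the single most likely token
greedy_choice = tokens[probs.index(max(probs))]
print(greedy_choice)  # always "blue"

# Sampling: draw a token in proportion to its probability
sampled_choice = random.choices(tokens, weights=probs, k=1)[0]
print(sampled_choice)  # usually "blue", sometimes "clear", "dark", ...
```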

This is where temperature comes in.


What Temperature Actually Does

Temperature is a number (typically between 0 and 2) that scales the probability distribution before sampling.

Low Temperature (0 to 0.3)

Low temperature makes the probability distribution sharper. High-probability tokens become even more likely, while low-probability tokens become nearly impossible.

With temperature = 0.1, the distribution sharpens to something like this (illustrative numbers):

| Token     | Original | After Temperature |
|-----------|----------|-------------------|
| blue      | 45%      | 85%               |
| clear     | 20%      | 10%               |
| dark      | 12%      | 3%                |
| beautiful | 8%       | 1%                |
| cloudy    | 6%       | 1%                |

The model almost always picks "blue." Responses become predictable and consistent.

High Temperature (0.7 to 1.5+)

High temperature flattens the distribution. Lower-probability tokens get a better chance of being selected.

With temperature = 1.5, the distribution flattens to something like this (illustrative numbers):

| Token     | Original | After Temperature |
|-----------|----------|-------------------|
| blue      | 45%      | 30%               |
| clear     | 20%      | 18%               |
| dark      | 12%      | 15%               |
| beautiful | 8%       | 13%               |
| cloudy    | 6%       | 10%               |

Now "dark," "beautiful," and even "cloudy" have reasonable chances. Responses become more varied and creative.

Temperature = 0

At exactly zero, the model uses greedy decoding: it always picks the highest-probability token, so an identical prompt produces, in principle, an identical response every time. (In practice, floating-point and batching effects can still introduce occasional variation.)


The Math Behind Temperature

If you want to understand the mechanics, here's the formula.

Without temperature, the model converts raw scores (logits) into probabilities using softmax:

P(token_i) = exp(logit_i) / Σ_j exp(logit_j)

With temperature T, the formula becomes:

P(token_i) = exp(logit_i / T) / Σ_j exp(logit_j / T)

When T < 1: Dividing by a small number makes differences between logits larger, creating a sharper distribution.

When T > 1: Dividing by a large number makes differences smaller, flattening the distribution.

When T = 1: No change to the original distribution.
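A short sketch makes the effect visible: this applies the temperature-scaled softmax to a few made-up logits and prints the resulting distribution at several values of T:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities, scaled by temperature."""
    scaled = [l / temperature for l in logits]
    # Subtract the max for numerical stability before exponentiating
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.2, 0.7, 0.3, 0.0]  # made-up scores for five candidate tokens

for t in (0.1, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
# T=0.1 concentrates almost all mass on the top token;
# T=1.0 leaves the distribution unchanged;
# T=1.5 flattens it toward the lower-probability tokens.
```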


When to Use Different Temperature Settings

Use Low Temperature (0 to 0.3) For:

| Use Case             | Why                                         |
|----------------------|---------------------------------------------|
| Code generation      | You want correct syntax, not creative bugs  |
| Factual Q&A          | Consistency matters more than variety       |
| Data extraction      | Structured output should be predictable     |
| Math problems        | There's usually one right answer            |
| Classification tasks | You want the most likely category           |
| Production APIs      | Reproducible results are essential          |

Example: An AI assistant that answers customer support questions should use low temperature to give consistent, accurate answers and reduce hallucinations.

Use Medium Temperature (0.4 to 0.7) For:

| Use Case             | Why                                      |
|----------------------|------------------------------------------|
| General conversation | Natural dialogue has some variation      |
| Summarization        | Some flexibility in phrasing is okay     |
| Translation          | Multiple valid phrasings exist           |
| Email drafting       | Professional but not robotic             |

Example: A chatbot for casual conversation can use medium temperature to feel more natural without going off the rails.

Use High Temperature (0.8 to 1.2) For:

| Use Case                | Why                                     |
|-------------------------|-----------------------------------------|
| Creative writing        | You want surprising, original content   |
| Brainstorming           | Diverse ideas are the goal              |
| Poetry and fiction      | Unconventional word choices work well   |
| Generating alternatives | You want multiple different options     |

Example: An AI writing assistant helping with a novel should use higher temperature to produce unexpected plot twists and vivid descriptions.

Avoid Very High Temperature (1.5+):

Going too high causes incoherent output. The model starts picking very unlikely tokens, leading to nonsensical text.


Temperature vs Other Sampling Parameters

Temperature isn't the only knob for controlling randomness. Here are the other common ones:

Top-p (Nucleus Sampling)

Instead of scaling probabilities, top-p limits which tokens can be selected.

With top-p = 0.9, the model only considers the smallest set of tokens whose cumulative probability reaches 90%. Very unlikely tokens are excluded entirely.

| Parameter   | How It Works                                             |
|-------------|----------------------------------------------------------|
| Temperature | Reshapes the whole distribution, sharper or flatter      |
| Top-p       | Cuts off the long tail of unlikely tokens                |

Many practitioners use both together: temperature = 0.7 with top-p = 0.95.
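Here is a minimal sketch of the top-p filter itself, assuming we already have a token-to-probability mapping (the distribution and threshold are illustrative):

```python
def top_p_filter(token_probs, p=0.9):
    """Keep the smallest set of tokens whose cumulative probability reaches p."""
    ranked = sorted(token_probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = {}, 0.0
    for token, prob in ranked:
        kept[token] = prob
        cumulative += prob
        if cumulative >= p:
            break
    # Renormalize so the surviving probabilities sum to 1 before sampling
    total = sum(kept.values())
    return {token: prob / total for token, prob in kept.items()}

print(top_p_filter({"blue": 0.45, "clear": 0.20, "dark": 0.12,
                    "beautiful": 0.08, "cloudy": 0.06, "falling": 0.01}, p=0.9))
# "falling" is cut from the candidate pool entirely
```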

Top-k

Top-k only considers the k most likely tokens. With top-k = 50, only the top 50 tokens can be selected.

This is simpler but less adaptive than top-p. A fixed k might be too restrictive in some contexts and too loose in others.
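For comparison, a top-k filter is a one-step truncation (again with illustrative values):

```python
def top_k_filter(token_probs, k=3):
    """Keep only the k most likely tokens, then renormalize."""
    ranked = sorted(token_probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(prob for _, prob in ranked)
    return {token: prob / total for token, prob in ranked}

print(top_k_filter({"blue": 0.45, "clear": 0.20, "dark": 0.12,
                    "beautiful": 0.08, "cloudy": 0.06}, k=3))
# {'blue': 0.584..., 'clear': 0.259..., 'dark': 0.155...}
```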

Frequency Penalty and Presence Penalty

These penalize tokens that have already appeared, reducing repetition (see the sketch after this list):

  • Frequency penalty: Reduces probability based on how many times a token has appeared
  • Presence penalty: Flat reduction if the token has appeared at all
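A minimal sketch of how the two penalties might adjust logits before sampling. The exact formulas vary by provider, so treat the arithmetic below as an assumption that follows the general shape of the OpenAI-style parameters:

```python
from collections import Counter

def apply_penalties(logits, generated_tokens,
                    frequency_penalty=0.5, presence_penalty=0.3):
    """Lower the logits of tokens that already appeared in the output."""
    counts = Counter(generated_tokens)
    adjusted = dict(logits)
    for token, count in counts.items():
        if token in adjusted:
            adjusted[token] -= count * frequency_penalty  # scales with repeats
            adjusted[token] -= presence_penalty           # flat, once per token
    return adjusted

logits = {"blue": 2.0, "clear": 1.2, "dark": 0.7}
print(apply_penalties(logits, ["blue", "blue", "dark"]))
# "blue" loses 2 * 0.5 (frequency) + 0.3 (presence) = 1.3 from its logit
```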

Practical Examples

Example 1: Code Generation

Prompt: "Write a Python function to reverse a string"

Temperature 0:

```python
def reverse_string(s):
    return s[::-1]
```

Temperature 1.2:

```python
def flip_text_backwards(input_text):
    reversed_chars = []
    for i in range(len(input_text) - 1, -1, -1):
        reversed_chars.append(input_text[i])
    return ''.join(reversed_chars)
```

The high temperature version works but uses an unconventional approach. For production code, you want the predictable, idiomatic solution.

Example 2: Creative Writing

Prompt: "Describe a sunset in one sentence"

Temperature 0.2 (run 3 times):

  1. "The sun dipped below the horizon, painting the sky in shades of orange and pink."
  2. "The sun dipped below the horizon, painting the sky in shades of orange and pink."
  3. "The sun dipped below the horizon, painting the sky in shades of orange and pink."

Temperature 1.0 (run 3 times):

  1. "The sun melted into the sea like a burning coin dropped into mercury."
  2. "Ribbons of fire stretched across the sky as daylight surrendered to the approaching night."
  3. "The horizon swallowed the sun whole, leaving only a bruise of purple and gold."

For creative writing, the variety from higher temperature produces more interesting results.


Best Practices

1. Start with the Default

Most APIs default to temperature 0.7 or 1.0. This works reasonably well for general use.

2. Adjust Based on Your Use Case

Ask yourself: "Do I want the same answer every time, or different answers?"

  • Same answer → Lower temperature
  • Different answers → Higher temperature

3. Use Temperature 0 for Testing

When debugging prompts, set temperature to 0 so you can reliably reproduce issues.

4. Don't Go Too Extreme

  • Below 0.1 is rarely necessary (just use 0)
  • Above 1.5 usually produces nonsense

5. Combine with Top-p

A common production setting: temperature = 0.7, top-p = 0.95. This gives some creativity while preventing completely random outputs.
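In practice, these knobs are set per request. Here is a minimal sketch using the OpenAI Python SDK (the model name and prompt are placeholders; most other providers expose near-identical parameters):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[{"role": "user", "content": "Describe a sunset in one sentence."}],
    temperature=0.7,  # moderate creativity
    top_p=0.95,       # trim the long tail of unlikely tokens
)
print(response.choices[0].message.content)
```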


Temperature in Popular APIs

| Service          | Default | Range | Notes                |
|------------------|---------|-------|----------------------|
| OpenAI GPT-4     | 1.0     | 0-2   | Also supports top-p  |
| Anthropic Claude | 1.0     | 0-1   | Lower max than GPT   |
| Google Gemini    | 0.9     | 0-2   | Also supports top-k  |
| Llama (local)    | 0.8     | 0-2   | Varies by interface  |

Key Takeaways

| Concept          | What to Remember                        |
|------------------|-----------------------------------------|
| Temperature      | Controls randomness in LLM output       |
| Low (0-0.3)      | Predictable, consistent, deterministic  |
| Medium (0.4-0.7) | Balanced, natural variation             |
| High (0.8-1.2)   | Creative, diverse, surprising           |
| Too high (1.5+)  | Incoherent, nonsensical                 |
| Temperature 0    | Greedy decoding, identical outputs      |

Temperature is one of the most important parameters for controlling LLM behavior. Master it, and you can tune your AI applications for exactly the right balance of consistency and creativity.

