LLM Temperature Explained: Why AI Gives Different Answers Each Time
Temperature controls how random or deterministic an LLM's responses are. Learn when to turn it up for creativity or down for consistency.

LLM Temperature Explained
You ask ChatGPT the same question twice and get two different answers. Is it broken? No. It's working exactly as designed.
Temperature is the parameter that controls this behavior. It determines whether an LLM gives you the same predictable answer every time or surprises you with creative variations.
Understanding temperature is essential for anyone building AI applications or trying to get consistent results from language models.
How LLMs Generate Text
Before we dive into temperature, you need to understand how LLMs produce output.
When generating each word (or token), the model doesn't just pick "the right answer." Instead, it calculates a probability distribution over its entire vocabulary. Every possible next token gets a probability score.
For example, given the prompt "The sky is...", the model might calculate:
| Token | Probability |
|---|---|
| blue | 45% |
| clear | 20% |
| dark | 12% |
| beautiful | 8% |
| cloudy | 6% |
| ... | ... |
Now the question becomes: how do we pick which token to output?
The simplest approach is greedy decoding: always pick the highest probability token ("blue" in this case). But this leads to boring, repetitive text.
Instead, most LLMs use sampling: randomly select a token based on the probability distribution. Higher probability tokens are more likely to be chosen, but lower probability options still have a chance.
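To make the difference concrete, here's a minimal Python sketch using the made-up probabilities from the table above (illustrative numbers, not real model output):

```python
import random

# Illustrative next-token probabilities for the prompt "The sky is..."
# (hand-picked numbers, not real model output)
next_token_probs = {
    "blue": 0.45, "clear": 0.20, "dark": 0.12,
    "beautiful": 0.08, "cloudy": 0.06, "grey": 0.09,
}

# Greedy decoding: always take the single most likely token
greedy_pick = max(next_token_probs, key=next_token_probs.get)

# Sampling: draw one token at random, weighted by its probability
sampled_pick = random.choices(
    list(next_token_probs), weights=list(next_token_probs.values()), k=1
)[0]

print(greedy_pick)   # always "blue"
print(sampled_pick)  # usually "blue", sometimes "clear", "dark", ...
```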
This is where temperature comes in.
What Temperature Actually Does
Temperature is a number (typically between 0 and 2) that reshapes the probability distribution before sampling, making it sharper or flatter.
Low Temperature (0 to 0.3)
Low temperature makes the probability distribution sharper. High-probability tokens become even more likely, while low-probability tokens become nearly impossible.
With temperature = 0.1:
| Token | Original | After Temperature |
|---|---|---|
| blue | 45% | 85% |
| clear | 20% | 10% |
| dark | 12% | 3% |
| beautiful | 8% | 1% |
| cloudy | 6% | 1% |
The model almost always picks "blue." Responses become predictable and consistent.
High Temperature (0.7 to 1.5+)
High temperature flattens the distribution. Lower-probability tokens get a better chance of being selected.
With temperature = 1.5:
| Token | Original | After Temperature |
|---|---|---|
| blue | 45% | 30% |
| clear | 20% | 18% |
| dark | 12% | 15% |
| beautiful | 8% | 13% |
| cloudy | 6% | 10% |
Now "dark," "beautiful," and even "cloudy" have reasonable chances. Responses become more varied and creative.
Temperature = 0
At exactly zero, the scaling formula would divide by zero, so implementations treat it as a special case: greedy decoding. The model always picks the highest-probability token, so the same prompt produces essentially the same response every time. (Hosted APIs can still show minor variation at temperature 0 due to nondeterminism in how requests are batched and computed.)
The Math Behind Temperature
If you want to understand the mechanics, here's the formula.
Without temperature, the model converts raw scores (logits) into probabilities using softmax:
P(token_i) = exp(logit_i) / sum(exp(all_logits))
With temperature T, the formula becomes:
P(token_i) = exp(logit_i / T) / sum(exp(all_logits / T))
When T < 1: dividing the logits by a number less than 1 magnifies the differences between them, creating a sharper distribution.
When T > 1: dividing by a number greater than 1 shrinks the differences, flattening the distribution.
When T = 1: the original distribution is unchanged.
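Here's the same scaling as a small Python sketch, with hand-picked logits that roughly reproduce the example distribution above (illustrative values, not taken from a real model):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities, scaled by temperature."""
    scaled = [l / temperature for l in logits]
    # Subtract the max for numerical stability before exponentiating
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hand-picked logits roughly matching the "The sky is..." example:
# blue, clear, dark, beautiful, cloudy
logits = [2.0, 1.2, 0.7, 0.3, 0.0]

print(softmax_with_temperature(logits, 1.0))   # original distribution (~49%, 22%, 13%, 9%, 7%)
print(softmax_with_temperature(logits, 0.1))   # sharp: "blue" dominates almost completely
print(softmax_with_temperature(logits, 1.5))   # flat: tail tokens gain probability
```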
When to Use Different Temperature Settings
Use Low Temperature (0 to 0.3) For:
| Use Case | Why |
|---|---|
| Code generation | You want correct syntax, not creative bugs |
| Factual Q&A | Consistency matters more than variety |
| Data extraction | Structured output should be predictable |
| Math problems | There's usually one right answer |
| Classification tasks | You want the most likely category |
| Production APIs | Reproducible results are essential |
Example: An AI assistant that answers customer support questions should use low temperature to give consistent, accurate answers and reduce hallucinations.
Use Medium Temperature (0.4 to 0.7) For:
| Use Case | Why |
|---|---|
| General conversation | Natural dialogue has some variation |
| Summarization | Some flexibility in phrasing is okay |
| Translation | Multiple valid phrasings exist |
| Email drafting | Professional but not robotic |
Example: A chatbot for casual conversation can use medium temperature to feel more natural without going off the rails.
Use High Temperature (0.8 to 1.2) For:
| Use Case | Why |
|---|---|
| Creative writing | You want surprising, original content |
| Brainstorming | Diverse ideas are the goal |
| Poetry and fiction | Unconventional word choices work well |
| Generating alternatives | You want multiple different options |
Example: An AI writing assistant helping with a novel should use higher temperature to produce unexpected plot twists and vivid descriptions.
Avoid Very High Temperature (1.5+):
Going too high causes incoherent output. The model starts picking very unlikely tokens, leading to nonsensical text.
Temperature vs Other Sampling Parameters
Temperature isn't the only knob for controlling randomness. Here are the other common ones:
Top-p (Nucleus Sampling)
Instead of scaling probabilities, top-p limits which tokens can be selected.
With top-p = 0.9, the model only samples from the smallest set of top tokens whose cumulative probability reaches 90%. Very unlikely tokens are excluded entirely.
| Parameter | How It Works |
|---|---|
| Temperature | Reshapes the whole distribution, making it sharper or flatter |
| Top-p | Cuts off the long tail of unlikely tokens |
Many practitioners use both together: temperature = 0.7 with top-p = 0.95.
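For example, with the official openai Python package (v1+), both knobs are just request parameters; the model name below is a placeholder, and other providers use similar but not identical parameter names:

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; use whichever model you have access to
    messages=[{"role": "user", "content": "Suggest a name for a hiking blog."}],
    temperature=0.7,  # moderate randomness
    top_p=0.95,       # ignore the long tail of very unlikely tokens
)
print(response.choices[0].message.content)
```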
Top-k
Top-k only considers the k most likely tokens. With top-k = 50, only the top 50 tokens can be selected.
This is simpler but less adaptive than top-p. A fixed k might be too restrictive in some contexts and too loose in others.
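A rough sketch of how the two cutoffs differ, applied to a plain dictionary of token probabilities (the helper functions are illustrative, not from any particular library):

```python
def top_k_filter(probs, k):
    """Keep only the k most likely tokens, then renormalize."""
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}

def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability >= p."""
    kept, cumulative = {}, 0.0
    for token, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[token] = prob
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(kept.values())
    return {token: pr / total for token, pr in kept.items()}

probs = {"blue": 0.45, "clear": 0.20, "dark": 0.12, "beautiful": 0.08,
         "cloudy": 0.06, "grey": 0.09}

print(top_k_filter(probs, 2))    # only "blue" and "clear" survive
print(top_p_filter(probs, 0.9))  # keeps top tokens until 90% of the mass is covered
```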
Frequency Penalty and Presence Penalty
These penalize tokens that have already appeared, reducing repetition:
- Frequency penalty: Reduces probability based on how many times a token has appeared
- Presence penalty: Flat reduction if the token has appeared at all
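Conceptually, both penalties are deductions applied to a token's logit before sampling. Here's a simplified sketch of that idea; the function and parameter names are ours, and real implementations differ by provider:

```python
from collections import Counter

def apply_penalties(logits, generated_tokens, frequency_penalty=0.0, presence_penalty=0.0):
    """Lower the logits of tokens that have already been generated (simplified sketch)."""
    counts = Counter(generated_tokens)
    adjusted = dict(logits)
    for token, count in counts.items():
        if token in adjusted:
            adjusted[token] -= count * frequency_penalty  # grows with each repetition
            adjusted[token] -= presence_penalty           # flat hit for appearing at all
    return adjusted

logits = {"the": 2.1, "cat": 1.4, "sat": 0.9}
print(apply_penalties(logits, ["the", "cat", "the"],
                      frequency_penalty=0.5, presence_penalty=0.3))
```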
Practical Examples
Example 1: Code Generation
Prompt: "Write a Python function to reverse a string"
Temperature 0:

    def reverse_string(s):
        return s[::-1]

Temperature 1.2:

    def flip_text_backwards(input_text):
        reversed_chars = []
        for i in range(len(input_text) - 1, -1, -1):
            reversed_chars.append(input_text[i])
        return ''.join(reversed_chars)

The high temperature version works but uses an unconventional approach. For production code, you want the predictable, idiomatic solution.
Example 2: Creative Writing
Prompt: "Describe a sunset in one sentence"
Temperature 0.2 (run 3 times):
- "The sun dipped below the horizon, painting the sky in shades of orange and pink."
- "The sun dipped below the horizon, painting the sky in shades of orange and pink."
- "The sun dipped below the horizon, painting the sky in shades of orange and pink."
Temperature 1.0 (run 3 times):
- "The sun melted into the sea like a burning coin dropped into mercury."
- "Ribbons of fire stretched across the sky as daylight surrendered to the approaching night."
- "The horizon swallowed the sun whole, leaving only a bruise of purple and gold."
For creative writing, the variety from higher temperature produces more interesting results.
Best Practices
1. Start with the Default
Most APIs default to temperature 0.7 or 1.0. This works reasonably well for general use.
2. Adjust Based on Your Use Case
Ask yourself: "Do I want the same answer every time, or different answers?"
- Same answer → Lower temperature
- Different answers → Higher temperature
3. Use Temperature 0 for Testing
When debugging prompts, set temperature to 0 so you can reliably reproduce issues.
4. Don't Go Too Extreme
- Below 0.1 is rarely necessary (just use 0)
- Above 1.5 usually produces nonsense
5. Combine with Top-p
A common production setting: temperature = 0.7, top-p = 0.95. This gives some creativity while preventing completely random outputs.
Temperature in Popular APIs
| Service | Default | Range | Notes |
|---|---|---|---|
| OpenAI GPT-4 | 1.0 | 0-2 | Also supports top-p |
| Anthropic Claude | 1.0 | 0-1 | Lower max than GPT |
| Google Gemini | 0.9 | 0-2 | Also supports top-k |
| Llama (local) | 0.8 | 0-2 | Varies by interface |
Key Takeaways
| Concept | What to Remember |
|---|---|
| Temperature | Controls randomness in LLM output |
| Low (0-0.3) | Predictable, consistent, deterministic |
| Medium (0.4-0.7) | Balanced, natural variation |
| High (0.8-1.2) | Creative, diverse, surprising |
| Too high (1.5+) | Incoherent, nonsensical |
| Temperature 0 | Greedy decoding, identical outputs |
Temperature is one of the most important parameters for controlling LLM behavior. Master it, and you can tune your AI applications for exactly the right balance of consistency and creativity.
Related Concepts
Understanding temperature connects to several other LLM concepts:
- Context Windows: How much text the model can consider
- Tokenization: How text is broken into pieces before processing
- Embeddings: How tokens become meaningful numbers


