Temperature
Last updated: February 16, 2026
Temperature is a parameter that controls the randomness and creativity of a large language model's output. It adjusts the probability distribution over possible next tokens during text generation, directly influencing whether the model produces predictable or surprising responses.
How It Works
When an LLM generates text, it computes a score (logit) for every token in its vocabulary as a candidate for the next token. Temperature divides these logits before they are converted into the probability distribution from which the next token is sampled (a short sketch follows the list):
- Low temperature (0.0 - 0.3): Sharpens the distribution, making the highest-probability tokens far more likely to be chosen. The output becomes more deterministic, focused, and repetitive. A temperature of 0 always selects the single most probable token.
- Medium temperature (0.4 - 0.7): Balances coherence with variety. The model occasionally picks less obvious tokens, producing natural-sounding text with some creative variation.
- High temperature (0.8 - 1.5+): Flattens the distribution, giving lower-probability tokens a better chance. The output becomes more creative, diverse, and unpredictable -- but also more prone to errors or nonsensical text.
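To make the scaling concrete, here is a minimal sketch in Python. It uses NumPy and made-up logits for a tiny five-token vocabulary (both are illustrative assumptions, not values from any real model) to show how dividing the logits by the temperature before the softmax sharpens or flattens the resulting distribution:

```python
import numpy as np

# Illustrative logits for a tiny 5-token vocabulary (made-up values).
logits = np.array([4.0, 3.0, 2.0, 1.0, 0.0])

def softmax_with_temperature(logits, temperature):
    """Convert logits into a probability distribution, scaled by temperature."""
    if temperature == 0:
        # Temperature 0 degenerates to greedy decoding: all mass on the argmax.
        probs = np.zeros_like(logits)
        probs[np.argmax(logits)] = 1.0
        return probs
    scaled = logits / temperature   # divide logits by temperature
    scaled -= scaled.max()          # subtract the max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

for t in (0.2, 0.7, 1.5):
    print(f"temperature={t}: {np.round(softmax_with_temperature(logits, t), 3)}")
# Low temperature concentrates probability on the top-ranked token;
# high temperature spreads it across lower-ranked tokens.
```

Running the loop shows the effect directly: at 0.2 nearly all probability mass sits on the highest-scoring token, while at 1.5 the distribution is much flatter and lower-ranked tokens have a realistic chance of being sampled.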
Why It Matters
Temperature is one of the most accessible levers for tuning AI behavior without changing the model or the prompt. Choosing the right temperature for your use case can significantly affect output quality and user experience.
In Practice
For AI coding assistants, a lower temperature (0.1 - 0.3) is generally preferred because code generation demands precision and consistency: you want the model to produce syntactically correct, reproducible code rather than creative variations. Conversational agents and creative writing tools often benefit from higher temperatures. Most deployment configurations expose temperature as a simple numeric setting, making it easy to experiment and find the right balance for your application, as in the example below.
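As one illustration, here is a minimal sketch assuming the OpenAI Python SDK (other providers expose an equivalent numeric setting; the model name is just an example) of requesting a code completion at a low temperature:

```python
# Minimal sketch assuming the OpenAI Python SDK and an API key in the
# OPENAI_API_KEY environment variable; adjust for your provider.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model name
    messages=[
        {"role": "user", "content": "Write a Python function that reverses a string."}
    ],
    temperature=0.2,  # low temperature: favor precise, consistent code
)
print(response.choices[0].message.content)
```

Raising the `temperature` argument toward 1.0 or above in the same call would be the corresponding choice for a brainstorming or creative-writing use case.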