Temperature
A parameter that controls the randomness of a model's output. Lower values (e.g., 0.0) make responses more deterministic and focused, while higher values (e.g., 1.0) make them more creative and varied.
Temperature is a sampling parameter that scales the probability distribution over the model's vocabulary at each step of text generation. Technically, it divides the logits (raw prediction scores) before applying the softmax function. A temperature of 0 makes the model always pick the highest-probability token, while higher temperatures flatten the distribution, giving lower-probability tokens a better chance of being selected.
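The scaling described above can be sketched in a few lines of plain Python. This is a minimal illustration with made-up logits for a three-token vocabulary, not any particular library's implementation; real samplers work the same way but on tensors over the full vocabulary.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by temperature, then apply softmax."""
    if temperature == 0:
        # By convention, T=0 means greedy decoding: all mass on the argmax.
        probs = [0.0] * len(logits)
        probs[logits.index(max(logits))] = 1.0
        return probs
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative logits for a tiny three-token vocabulary.
logits = [2.0, 1.0, 0.5]
print(softmax_with_temperature(logits, 0.5))  # sharper: top token dominates
print(softmax_with_temperature(logits, 1.0))  # plain softmax
print(softmax_with_temperature(logits, 2.0))  # flatter: probabilities closer together
```

Running this shows the effect directly: at 0.5 the top token takes most of the probability mass, while at 2.0 the three tokens move toward an even split.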
In practice, temperature acts as a creativity dial. For factual tasks like data extraction, code generation, or classification, a low temperature (0.0 to 0.3) produces consistent, predictable outputs. For creative writing, brainstorming, or generating diverse responses, a higher temperature (0.7 to 1.0) adds variety and surprise. Going above 1.0 is possible but often produces incoherent or nonsensical text.
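To see the "creativity dial" in action, the sketch below draws repeated samples from a fixed, hypothetical next-token distribution at several temperatures. The vocabulary and logits are invented for illustration; the point is that 0 always picks the same token while higher values mix in lower-probability choices.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Sample one token index from temperature-scaled logits."""
    if temperature == 0:
        return logits.index(max(logits))  # greedy: always the top token
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights)[0]

# Hypothetical next-token logits for a three-word vocabulary.
vocab = ["blue", "grey", "luminous"]
logits = [3.0, 2.0, 0.1]
rng = random.Random(42)  # seeded so the demo is repeatable

for t in (0.0, 0.3, 1.0):
    picks = [vocab[sample_token(logits, t, rng)] for _ in range(10)]
    print(f"T={t}: {picks}")
```

At temperature 0.0 every draw is "blue"; at 1.0 the rarer words start to appear, which is exactly the variety that creative tasks benefit from and factual tasks do not.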
Temperature interacts with other sampling parameters such as top-p and top-k. At temperature 0, sampling reduces to greedy decoding, so top-p and top-k have no effect and the output is effectively deterministic for a given input. When both temperature and top-p are exposed, most practitioners adjust one and leave the other at its default; OpenAI's documentation likewise recommends changing one or the other, not both at once.
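In typical sampler implementations, temperature is applied to the logits first and top-p then truncates the tail of the resulting distribution before a token is drawn. A minimal sketch of that top-p (nucleus) step, using invented probabilities:

```python
def top_p_filter(probs, p):
    """Zero out all but the smallest set of tokens whose cumulative
    probability reaches p, then renormalize the survivors."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]

# Probabilities already shaped by temperature; top-p now cuts the tail.
probs = [0.6, 0.3, 0.08, 0.02]
print(top_p_filter(probs, 0.8))  # only the top two tokens survive
```

Because both knobs reshape the same distribution, turning them simultaneously makes their effects hard to attribute, which is the practical reason for adjusting one at a time.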
Choosing the right temperature depends on your application. Customer-facing chatbots often use 0.3-0.5 for a balance of reliability and naturalness. Code assistants typically use 0.0-0.2 for precision. Creative applications might use 0.8-1.0. Many developers run the same prompt at multiple temperatures to find the sweet spot for their specific use case.