GPTCrunch
Architecture

Parameters

The learned numerical values (weights) within a neural network that determine the model's behavior. More parameters generally mean greater capacity to store knowledge and handle complex tasks, but also higher computational costs.

Parameters are the trainable weights in a neural network — the numbers that get adjusted during training to minimize prediction errors. When people say a model has "70 billion parameters," they mean it has 70 billion individual numerical values that collectively encode everything the model has learned. These parameters are distributed across the model's layers, attention heads, and feed-forward networks.
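To make this concrete, here is a minimal sketch of how parameter counts add up. The layer sizes below are made-up toy values, not taken from any real model; the point is just that a fully connected layer with `n_in` inputs and `n_out` outputs contributes `n_in * n_out` weights plus `n_out` biases, and these counts sum across layers.

```python
def linear_params(n_in, n_out, bias=True):
    """Parameter count of one fully connected (linear) layer."""
    return n_in * n_out + (n_out if bias else 0)

# Toy two-layer feed-forward block: 512 -> 2048 -> 512
hidden = linear_params(512, 2048)   # 512*2048 + 2048 = 1,050,624
output = linear_params(2048, 512)   # 2048*512 + 512  = 1,049,088
total = hidden + output
print(f"{total:,} parameters")      # 2,099,712
```

Real models repeat blocks like this dozens of times, alongside attention and embedding weights, which is how counts climb into the billions.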

Model size is commonly measured in parameters and loosely correlates with capability. GPT-3 had 175 billion parameters, while estimates for GPT-4 suggest over a trillion (possibly via a mixture-of-experts architecture). Open-source models range from compact (1-3B parameters, runnable on a laptop) through mid-size (7-13B, requiring a good GPU) to large (70B+, requiring multi-GPU setups or cloud infrastructure). Anthropic's Claude and Google's Gemini models are similar in scale to GPT-4, though exact numbers are not always disclosed.

More parameters enable a model to memorize more facts, handle more nuanced reasoning, and perform well across a wider range of tasks. However, the relationship is not linear — a well-trained 7B model can outperform a poorly trained 70B model, and techniques like distillation can compress large-model performance into smaller architectures. Training methodology, data quality, and post-training alignment matter as much as raw parameter count.

From a practical standpoint, parameter count determines hardware requirements. Each parameter in float16 precision takes 2 bytes of memory, so a 70B model needs at least 140 GB of GPU VRAM just to load the weights — before any computation. Quantization techniques (reducing precision to 8-bit or 4-bit) can halve or quarter these requirements, making larger models accessible on consumer hardware. When evaluating models on GPTCrunch, consider whether the performance gains of a larger model justify the additional infrastructure costs for your use case.
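The arithmetic above can be sketched as a back-of-the-envelope estimator. This covers weight storage only; activations, KV cache, and framework overhead add more on top, and the bytes-per-parameter figures assume standard fp16/int8/int4 storage formats.

```python
# Approximate bytes per parameter for common storage precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gb(n_params, precision="fp16"):
    """Memory (in GB, 1 GB = 1e9 bytes) needed just to hold the weights."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9

# A 70B-parameter model at different precisions:
for prec in ("fp16", "int8", "int4"):
    print(f"70B @ {prec}: {weight_memory_gb(70e9, prec):.0f} GB")
# fp16 -> 140 GB, int8 -> 70 GB, int4 -> 35 GB
```

This is why 4-bit quantization brings a 70B model within reach of a two-GPU consumer setup, while fp16 requires data-center hardware.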
