overfitting

Terms from Artificial Intelligence: humans at the heart of algorithms

Overfitting is when a machine learning or statistical model starts to model very particular features of the training set rather than the more general features of the underlying population or phenomenon the training set is intended to represent. In the extreme, the learning process precisely matches the training examples, but nothing else. This can be intentional, as in the case of memoisation, but is usually an accident due to fitting too many model parameters. In general there should be far fewer parameters (e.g. weights in a neural network) than features in the training data set. For example, if you have a thousand training items with ten features per item, then you should have far fewer than ten thousand (1000 items x 10 features per item) parameters. By these metrics, some large language models appear to have too many parameters, and indeed do sometimes show features of overfitting such as directly reproducing passages of training text. However, there are arguments that this is not so critical, as early layers of deep neural networks produce large amounts of non-linear diversity, which is then effectively selected and pruned internally by later layers, especially at pinch points.
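
The following Python sketch (not from the book; the data and settings are purely illustrative) shows the parameter-counting idea in miniature: a degree-9 polynomial has as many coefficients as there are training points, so it typically matches the training data almost exactly while doing noticeably worse on held-out points, whereas a lower-degree fit generalises better.

```python
# Illustrative sketch of overfitting with polynomial regression.
import numpy as np

rng = np.random.default_rng(0)

# Underlying phenomenon: y = sin(x), observed with a little noise.
x_train = np.linspace(0, 3, 10)
y_train = np.sin(x_train) + rng.normal(0, 0.1, size=x_train.shape)
x_test = np.linspace(0.15, 2.85, 10)
y_test = np.sin(x_test) + rng.normal(0, 0.1, size=x_test.shape)

def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on the given data."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

# Degree 2: few parameters relative to the data.
# Degree 9: one coefficient per training point, so it can interpolate
# the noise itself -- the hallmark of overfitting.
for degree in (2, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    print(f"degree {degree}: "
          f"train MSE = {mse(coeffs, x_train, y_train):.4f}, "
          f"test MSE = {mse(coeffs, x_test, y_test):.4f}")
```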

Used on pages 98, 154, 179, 182, 184, 185, 380, 492, 515