DeepSeek is a Mixture-of-Experts (MoE) language model that achieves performance comparable to other large language models, while using far fewer computational resources to train* and execute. It was developed in China in response to US export restrictions on high-end GPU chips, which precluded the brute-force techniques that had dominated the field.
* Note: there is some controversy as to whether DeepSeek was based on distillation from OpenAI models, which would mean its training effectively relies on large-scale and expensive processes. However, even if this turns out to be the case, it is still more computationally efficient during execution, and it has certainly spurred other big AI vendors to look at more efficient models.
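To illustrate why an MoE architecture reduces execution cost, the sketch below shows a generic top-k routed MoE layer. This is an illustrative example only, not DeepSeek's actual implementation; all layer sizes and names (MoELayer, num_experts, top_k) are assumptions chosen for the demo. The key point is that each token activates only top_k of num_experts expert networks, so most parameters stay idle for any given token.

```python
# Minimal sketch of top-k Mixture-of-Experts routing (illustrative only;
# not DeepSeek's actual implementation). Each token is processed by just
# top_k experts out of num_experts, so per-token compute stays small even
# though the total parameter count is large.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=64, d_hidden=256, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, num_experts)  # router over experts
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden),
                          nn.ReLU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                              # x: (tokens, d_model)
        scores = self.gate(x)                          # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1) # pick top_k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e               # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

tokens = torch.randn(10, 64)
print(MoELayer()(tokens).shape)                        # torch.Size([10, 64])
```

With top_k=2 and num_experts=8, only a quarter of the expert parameters are exercised per token, which is the source of the efficiency gain the article refers to.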