sharding

Terms from Artificial Intelligence: humans at the heart of algorithms

Sharding is used in big data applications to divide the data into portions (called shards) so that they can be stored and processed semi-independently. If we think of simple tabular data, the shards may be split along the rows, so that all the data for a single item (row) is always stored together. Alternatively, particular columns (item fields/features) may be stored separately, linked by some form of key or identifier. The nature of the sharding affects the efficiency of algorithms, so there is often a desire to put data that needs to be processed together in the same shard. This is particularly difficult for graph data, such as social network data, as the links between items inevitably cut across shards.
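
As a rough illustration of the three cases above, the following Python sketch splits a tiny table by rows, then by columns, and then counts how many links in a small graph cross shard boundaries. The function names, the sample data, and the choice of assigning an item to a shard by taking its identifier modulo the number of shards are all assumptions made for this example; real systems use many different placement schemes.

```python
def shard_rows(rows, key, num_shards):
    """Row-wise sharding: each whole row goes to exactly one shard."""
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        # Assumed placement rule: identifier modulo the number of shards.
        shards[row[key] % num_shards].append(row)
    return shards


def shard_columns(rows, key, column_groups):
    """Column-wise sharding: each shard keeps some columns plus the linking key."""
    return [
        [{key: row[key], **{col: row[col] for col in group}} for row in rows]
        for group in column_groups
    ]


def cut_edges(edges, num_shards):
    """Count graph links whose two endpoints are placed in different shards."""
    return sum(1 for a, b in edges if a % num_shards != b % num_shards)


# Hypothetical sample data: a small table and a small friendship graph.
people = [
    {"id": 1, "name": "Ana", "city": "Leeds"},
    {"id": 2, "name": "Ben", "city": "Cardiff"},
    {"id": 3, "name": "Caz", "city": "Leeds"},
    {"id": 4, "name": "Dev", "city": "Bath"},
]
friendships = [(1, 2), (1, 3), (2, 4), (3, 4)]

row_shards = shard_rows(people, "id", 2)
col_shards = shard_columns(people, "id", [["name"], ["city"]])
crossing = cut_edges(friendships, 2)
```

With this placement, half of the friendship links join people held in different shards, which hints at why graph data is hard to shard well: any algorithm that follows those links has to fetch data from another shard.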

Used on pages 162, 163

Also known as shard