sharding


Sharding is used in big data applications to divide data into portions (called shards) so that they can be stored and processed semi-independently. For simple tabular data, the shards may be split along the rows, so that all the data for a single item (row) is always stored together. Alternatively, particular columns (item fields/features) may be stored separately, linked by some form of key or identifier. The nature of the sharding affects the efficiency of algorithms, so there is often a desire to put data that needs to be processed together in the same shard. This is particularly difficult for graph data, such as social network data, as the links between items inevitably cut across shards.
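
The three cases described above (row sharding, column sharding, and links cutting across shards) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names (shard_of, shard_rows, shard_columns, count_cut_edges), the sample data, and the choice of a CRC32 hash to assign keys to shards. Real systems may instead use range-based or directory-based assignment, or graph partitioning tuned to reduce the number of cut links.

```python
import zlib
from collections import defaultdict


def shard_of(key, num_shards):
    """Map a key to a shard index using a stable hash (CRC32, an illustrative choice)."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def shard_rows(rows, key_field, num_shards):
    """Row sharding: each whole row goes to the shard chosen by its key."""
    shards = defaultdict(list)
    for row in rows:
        shards[shard_of(row[key_field], num_shards)].append(row)
    return dict(shards)


def shard_columns(rows, key_field, column_groups):
    """Column sharding: each group of columns is stored separately, linked by the key."""
    return [
        [{key_field: row[key_field], **{col: row[col] for col in group}} for row in rows]
        for group in column_groups
    ]


def count_cut_edges(edges, num_shards):
    """Count graph links whose two endpoints land in different shards."""
    return sum(
        1 for a, b in edges if shard_of(a, num_shards) != shard_of(b, num_shards)
    )


# Hypothetical sample data: a small table of users and a friendship graph.
rows = [
    {"id": "alice", "age": 34, "city": "Leeds"},
    {"id": "bob", "age": 27, "city": "Cardiff"},
    {"id": "carol", "age": 45, "city": "Leeds"},
    {"id": "dave", "age": 52, "city": "Glasgow"},
]
edges = [("alice", "bob"), ("bob", "carol"), ("carol", "dave"), ("dave", "alice")]

row_shards = shard_rows(rows, "id", num_shards=2)
column_shards = shard_columns(rows, "id", [["age"], ["city"]])
cut = count_cut_edges(edges, num_shards=2)
print(f"{cut} of {len(edges)} links cross a shard boundary")
```

With row sharding, any computation on a single user touches one shard only. With column sharding, a computation that needs both age and city has to join the two column shards on the id key. For the graph, every link counted by count_cut_edges requires communication between shards when it is followed, which is why sharding graph data is hard.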

Used in Chap. 8: page 111

Also known as shard