MapReduce is a distributed computing framework for dealing with big data. It is inspired by the map and reduce functions found in Lisp and other functional programming languages: the map stage first processes each item of data separately, and the reduce stage then collects and aggregates related intermediate results. However, MapReduce has two crucial additional features: (i) the use of a hash as a means to distribute intermediate results over different computers, thereby reducing the likelihood of Byzantine conditions; and (ii) means to monitor individual computers for failure and to recover when one fails, making the framework as a whole more robust to failure.
MapReduce distributed computation pipeline:

each data item → map → hash + processed data
all data for a hash → reduce → one or more aggregated calculations for the hashed data (possibly including a fresh hash for further reduce steps)
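A minimal, single-process sketch of this pipeline in Python may help; the function names and the toy word-count task here are illustrative, not part of MapReduce itself. In a real deployment each hash bucket's reduce work would run on a different machine, and failed workers would be detected and restarted:

```python
from collections import defaultdict

def map_phase(item):
    # Map: process one data item independently, emitting (key, value)
    # pairs. Here each item is a line of text and we count words.
    for word in item.split():
        yield word, 1

def partition(key, num_workers):
    # Hash the key to choose the worker that will reduce it, so all
    # intermediate results for the same key land on the same machine.
    return hash(key) % num_workers

def reduce_phase(key, values):
    # Reduce: aggregate all intermediate values that share a key.
    return key, sum(values)

def mapreduce(data, num_workers=4):
    # Shuffle: route each mapped pair to a worker's bucket via the hash.
    buckets = [defaultdict(list) for _ in range(num_workers)]
    for item in data:
        for key, value in map_phase(item):
            buckets[partition(key, num_workers)][key].append(value)
    # Each worker reduces its own bucket; here we simply loop in turn.
    results = {}
    for bucket in buckets:
        for key, values in bucket.items():
            k, v = reduce_phase(key, values)
            results[k] = v
    return results

if __name__ == "__main__":
    lines = ["the quick brown fox", "the lazy dog", "the fox"]
    print(mapreduce(lines))  # {'the': 3, 'quick': 1, 'fox': 2, ...}
```

The sketch keeps the three stages separate (map, hash-based shuffle, reduce) so that each could in principle be placed on a different computer; it deliberately omits the failure monitoring and recovery machinery that a production framework adds around them.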
MapReduce was initially developed by engineers at Google, but has since become part of the open-source Apache Hadoop project.
Used in Chap. 8: pages 110, 111, 112, 113, 114, 115, 116; Chap. 23: page 360
Used in glossary entries: Apache Hadoop, big data, Byzantine conditions, Lisp, map, reduce, robust to failure
Links:
ACM Digital Library:
MapReduce: simplified data processing on large clusters
hadoop.apache.org:
Apache Hadoop