MapReduce

Terms from Artificial Intelligence: humans at the heart of algorithms

Page numbers are for the draft copy at present; they will be replaced with correct numbers when the final book is formatted. Chapter numbers are correct and will not change now.

MapReduce is a distributed computing framework for dealing with big data. It is inspired by the map and reduce functions found in Lisp and other functional programming languages: the map stage initially processes each item of data separately, and then the reduce stage collects and aggregates related intermediate results. However, MapReduce has two crucial additional features: (i) the use of a hash as a means to distribute intermediate results over different computers, so reducing the likelihood of Byzantine conditions; and (ii) means to monitor for failure of individual computers and to recover from this.

The MapReduce distributed computation pipeline:

    each data item      -> map    -> hash + processed data
    all data for a hash -> reduce -> one or more aggregated calculations for the
                                     hashed data (possibly including a fresh hash
                                     for further reduce steps)
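
To make the pipeline concrete, here is a minimal single-process sketch in Python of a word count, the classic MapReduce example. The task and the names map_phase, partition, reduce_phase, and mapreduce are illustrative assumptions, not the framework's actual API; the sketch shows only the hash-based routing of intermediate results, and omits the distribution across real machines and the failure monitoring described above.

    from collections import defaultdict

    def map_phase(record):
        # Map stage: process each data item separately,
        # emitting intermediate (key, value) pairs.
        for word in record.split():
            yield word.lower(), 1

    def partition(key, n_workers):
        # The hash of the key decides which worker receives
        # all intermediate results for that key.
        return hash(key) % n_workers

    def reduce_phase(key, values):
        # Reduce stage: collect and aggregate all intermediate
        # values that share the same key.
        return key, sum(values)

    def mapreduce(records, n_workers=4):
        # Shuffle: route each intermediate pair to a worker by hash,
        # grouping values under their key.
        workers = [defaultdict(list) for _ in range(n_workers)]
        for record in records:
            for key, value in map_phase(record):
                workers[partition(key, n_workers)][key].append(value)
        # Each worker then reduces the keys that were routed to it.
        return dict(reduce_phase(key, values)
                    for groups in workers
                    for key, values in groups.items())

    docs = ["the cat sat", "the dog sat"]
    print(mapreduce(docs))  # counts: the=2, cat=1, sat=2, dog=1 (key order may vary)

In a real deployment each element of workers would be a separate computer, and the hash ensures that every intermediate result for a given key lands on the same machine, so each reduce can run independently.
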
MapReduce was initially developed by engineers at Google, but has since become part of the open-source Apache Hadoop project.

Used in Chap. 8: pages 163, 165, 167, 168, 169, 170, 171, 172, 173; Chap. 23: page 559
