MapReduce is a distributed computing framework for dealing with big data. It is inspired by the map and reduce functions found in Lisp and other functional programming languages: the map stage first processes each item of data separately, and the reduce stage then collects and aggregates related intermediate results. However, MapReduce has two crucial additional features: (i) the use of a hash as a means to distribute intermediate results over different computers, so reducing the likelihood of Byzantine conditions; and (ii) means to monitor for failure of individual computers and to recover from such failures.
MapReduce distributed computation pipeline:
each data item → map → hash + processed data
all data for a hash → reduce → one or more aggregated calculations for the hashed data (possibly including fresh hashes for further reduce steps)
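The pipeline above can be sketched in a few lines of Python. This is a minimal, single-process illustration of the map → hash-partition → reduce flow, assuming a word-count job; the names map_fn, partition, reduce_fn, and NUM_REDUCERS are illustrative and do not belong to Hadoop or any other real framework.

```python
# Minimal single-process sketch of the MapReduce pipeline (word count).
# In a real deployment each bucket would live on a different machine.
from collections import defaultdict

NUM_REDUCERS = 4  # assumed number of reduce workers


def map_fn(document):
    """Map stage: process each data item separately, emitting (key, value) pairs."""
    for word in document.split():
        yield word, 1


def partition(key):
    """Hash an intermediate key to choose which reducer receives it.
    A real system would use a stable hash rather than Python's salted hash()."""
    return hash(key) % NUM_REDUCERS


def reduce_fn(key, values):
    """Reduce stage: aggregate all intermediate values sharing a key."""
    return key, sum(values)


def run(documents):
    # Shuffle: group intermediate pairs first by reducer, then by key.
    buckets = [defaultdict(list) for _ in range(NUM_REDUCERS)]
    for doc in documents:  # map each data item independently
        for key, value in map_fn(doc):
            buckets[partition(key)][key].append(value)
    results = {}
    for bucket in buckets:  # each bucket could run on its own computer
        for key, values in bucket.items():
            k, v = reduce_fn(key, values)
            results[k] = v
    return results


print(run(["the map stage", "the reduce stage"]))
# counts: the=2, stage=2, map=1, reduce=1 (ordering varies by hash)
```

The hash partition ensures that every occurrence of a given key lands in the same bucket, so each reducer can aggregate its keys without coordinating with the others.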
MapReduce was initially developed by engineers at Google, but has since become part of the open-source Apache Hadoop project.
Used in Chap. 8: pages 163, 165, 167, 168, 169, 170, 171, 172, 173; Chap. 23: page 559
Links:
ACM Digital Library: MapReduce: simplified data processing on large clusters
hadoop.apache.org: Apache Hadoop