MapReduce is an easy-to-use programming model that supports parallel design; it is highly scalable and works in a distributed way. It is well suited to processing huge data sets, large-scale search, and data analysis in the cloud. It provides its abstraction through two functions, a “mapper” and a “reducer”. The “mapper” is applied to each input key-value pair and produces an arbitrary number of intermediate key-value pairs. Map: takes an input structured as a key (k1) and value (v1) pair and produces a list of (key, value) pairs of a different type, i.e. (k1, v1) → list(k2, v2). The “reducer” is applied to all values associated with the same intermediate key and produces the output key-value pairs. Reduce: takes a key and the list of values associated with that key and produces a list of values, i.e. (k2, list(v2)) → list(v2).
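The two signatures above can be illustrated with a small single-process word count. This is a minimal sketch, not a distributed implementation: the names `mapper`, `reducer`, and `map_reduce` and the sample documents are assumptions made for illustration, and the grouping step stands in for the shuffle that a real framework performs between machines.

```python
from collections import defaultdict

def mapper(key, value):
    # (k1, v1) -> list(k2, v2): k1 is a document name, v1 is its text.
    # Emit one (word, 1) pair per word occurrence.
    return [(word.lower(), 1) for word in value.split()]

def reducer(key, values):
    # (k2, list(v2)) -> list(v2): collapse all counts for one word.
    return [sum(values)]

def map_reduce(records, mapper, reducer):
    # Group intermediate values by key (the "shuffle" step),
    # then apply the reducer once per intermediate key.
    groups = defaultdict(list)
    for k1, v1 in records:
        for k2, v2 in mapper(k1, v1):
            groups[k2].append(v2)
    return {k2: reducer(k2, v2s) for k2, v2s in sorted(groups.items())}

# Hypothetical input: two tiny documents.
docs = [("doc1", "the cat sat"), ("doc2", "the dog sat down")]
counts = map_reduce(docs, mapper, reducer)
print(counts)
```

Each reducer call returns a list, matching the list(v2) output type, so the count for a word is read as `counts["the"][0]`.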
MapReduce is capable of supporting many real-world algorithms and tasks. It divides the input data, schedules the execution of programs over a set of machines, and handles machine failures.
MapReduce also handles inter-machine communication. Map/Reduce: 1) is a programming model derived from Lisp and other functional languages; 2) can express many problems; 3) is easy to distribute across nodes; and 4) has clean retry/failure semantics.
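The input splitting, scheduling, and retry behavior described above can be sketched on a single machine, with a thread pool standing in for a set of worker machines. Everything here is an assumption made for illustration: `split_input`, `run_with_retry`, and `flaky_sum_of_squares` are hypothetical helpers, and the failure is simulated. Re-running a failed task is safe because the task is a pure function of its input split, which is what makes the retry semantics simple.

```python
from concurrent.futures import ThreadPoolExecutor

def split_input(items, n_splits):
    # Divide the input into contiguous chunks of roughly equal size.
    size = (len(items) + n_splits - 1) // n_splits
    return [items[i:i + size] for i in range(0, len(items), size)]

def run_with_retry(task, split, max_attempts=3):
    # Re-execute a failed task on the same split; give up after max_attempts.
    for attempt in range(1, max_attempts + 1):
        try:
            return task(split, attempt)
        except RuntimeError:
            if attempt == max_attempts:
                raise

def flaky_sum_of_squares(split, attempt):
    # Hypothetical map task. It simulates a worker crash on the first
    # attempt for any split that contains the value 3.
    if attempt == 1 and 3 in split:
        raise RuntimeError("simulated worker failure")
    return sum(x * x for x in split)

numbers = list(range(1, 9))
splits = split_input(numbers, 4)

# The pool plays the role of the scheduler assigning splits to workers.
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(lambda s: run_with_retry(flaky_sum_of_squares, s), splits))

# Final reduce step: combine the partial results.
total = sum(partials)
print(splits, partials, total)
```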
MapReduce provides: 1) automatic parallelization and distribution; 2) fault tolerance; 3) I/O scheduling; and 4) monitoring and status updates.
The limitations of MapReduce are: 1) an extremely rigid data flow; 2) operations such as Join, Union, and Split must be hacked in again and again; 3) common operations must be coded by the user; and 4) semantics are hidden inside the map and reduce functions, which makes programs difficult to maintain, extend, and optimize.
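The point about joins can be made concrete with a hand-written reduce-side join, one common way to express a join in this model. This is a sketch under stated assumptions: the `users` and `orders` tables and the functions `join_mapper` and `join_reducer` are invented for illustration. Note that the join logic lives entirely in user code, and nothing outside the two functions reveals that a join is being performed.

```python
from collections import defaultdict

# Hypothetical input tables: (user_id, name) and (user_id, item).
users = [(1, "Ada"), (2, "Linus")]
orders = [(1, "book"), (2, "laptop"), (1, "pen")]

def join_mapper(source, record):
    # Tag every value with the table it came from, so the reducer
    # can tell the two sides of the join apart.
    key, payload = record
    return [(key, (source, payload))]

def join_reducer(key, tagged_values):
    # Separate the tagged values again and pair them up by hand.
    names = [p for s, p in tagged_values if s == "users"]
    items = [p for s, p in tagged_values if s == "orders"]
    return [(name, item) for name in names for item in items]

# Group by join key (the shuffle step), then reduce each group.
groups = defaultdict(list)
for source, table in (("users", users), ("orders", orders)):
    for record in table:
        for k, v in join_mapper(source, record):
            groups[k].append(v)

joined = {k: join_reducer(k, vs) for k, vs in groups.items()}
print(joined)
```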
Learn more in:
NoSQL Databases