Saturday, 12 October 2013

MapReduce: Short Summary


This paper describes MapReduce, a parallel data processing framework. The framework is fairly simple, can accommodate a wide variety of tasks, and hides almost all the complexity of running code in a distributed environment from the user. The paper considers the kinds of failures that can occur and how the framework handles them. In short, it is a small, simple, yet powerful framework capable of processing large amounts of data.

8. Summary:
1) A framework that makes large-scale distributed computing easy. The programmer defines just two functions, map and reduce, without worrying about internal details such as load balancing, fault tolerance, and data locality. Both functions run in parallel across many machines: the map function performs data-parallel operations on the input, and the reduce function aggregates the map results to produce the output.
2) The programming model is simple yet powerful: a small amount of code can do a lot. Some applications of MapReduce are distributed grep, count of URL access frequency, and reverse web-link graph.
3) The fault-tolerance mechanism is quite effective. A single master manages all the workers. A worker failure is handled by re-executing that worker's tasks on other workers, which makes the system resilient to large-scale worker failures. Because there is only a single master, its failure is unlikely, and the MapReduce computation is simply aborted in that case. The paper deals with straggler workers by cutting off the latency tail: duplicate (backup) tasks are started near the end of the job.
4) Many extensions are provided with the framework to make it more convenient for the user, such as the partitioning function, combiner function, skipping of bad records, and counters.
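
The model in points 1, 2 and 4 can be illustrated with a minimal single-process sketch in Python of the "count of URL access frequency" example. This is not the paper's implementation: the log format (URL as the second whitespace-separated field), the helper names `map_fn`, `reduce_fn`, `partition` and `run_mapreduce`, and the byte-sum hash are all assumptions made for illustration. A real framework would run the map and reduce calls on many machines and handle failures itself.

```python
from collections import defaultdict


def map_fn(log_line):
    """Emit (url, 1) for one request line.

    Assumes a hypothetical log format where the URL is the second field.
    """
    fields = log_line.split()
    if len(fields) >= 2:
        yield fields[1], 1


def reduce_fn(url, counts):
    """Aggregate all counts emitted for a single URL."""
    return url, sum(counts)


def partition(key, num_reducers):
    """Pick the reduce partition for a key.

    Stand-in for a "hash(key) mod R" partitioning function; a byte sum
    is used so the result is the same on every run.
    """
    return sum(key.encode()) % num_reducers


def run_mapreduce(records, map_fn, reduce_fn, num_reducers=2):
    """Run map, shuffle and reduce sequentially in one process."""
    # Shuffle: group intermediate values by partition, then by key.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for record in records:
        for key, value in map_fn(record):
            partitions[partition(key, num_reducers)][key].append(value)

    # Reduce: each partition is processed independently, keys in sorted order.
    output = {}
    for part in partitions:
        for key in sorted(part):
            out_key, out_value = reduce_fn(key, part[key])
            output[out_key] = out_value
    return output


logs = ["GET /index.html", "GET /about.html", "POST /index.html", "GET /index.html"]
result = run_mapreduce(logs, map_fn, reduce_fn)
print(result)
```

The user-written part is only `map_fn` and `reduce_fn`; everything inside `run_mapreduce` stands for what the framework hides from the programmer.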

9. Limitations:
There are cases where MapReduce is not a suitable choice:
1) Real-time processing.
2) Cases where the computation of the next value depends on the previous value.
3) If the data is small enough to be processed on a single machine, it is better to process it there than to run it as a distributed MapReduce job.
4) If you don’t want to use Java (as in Hadoop's implementation), you need to use the Streaming API, which is itself a complex process.
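
Limitation 2 can be made concrete with a small sketch. Exponential smoothing is an example chosen here, not one taken from the paper, and the function name `smooth` and the parameter `alpha` are hypothetical. Each output depends on the output before it, so the records cannot simply be handed to independent map tasks.

```python
def smooth(values, alpha=0.5):
    """Exponentially smooth a sequence of numbers.

    Every result needs the previous result, so the loop is inherently
    sequential: there is no per-record work that can run in isolation.
    """
    smoothed = []
    previous = None
    for value in values:
        if previous is None:
            previous = value  # the first value has nothing to depend on
        else:
            previous = alpha * value + (1 - alpha) * previous
        smoothed.append(previous)
    return smoothed


result = smooth([10.0, 20.0, 30.0])
print(result)  # [10.0, 15.0, 22.5]
```

Compare this with a word or URL count, where each record can be mapped without looking at any other record.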

In short, MapReduce is a simple and powerful framework if you are dealing with huge amounts of data.
