Thursday, January 5, 2023

MapReduce Programming Architecture and flow

1. The input dataset is split into multiple pieces (several smaller subsets of the data).
2. The framework creates one master process and several worker processes, and starts the worker processes on remote machines.
3. Several map tasks run simultaneously, each reading the pieces of data assigned to it. A map worker applies the map function only to the data stored on its own server and generates key/value pairs from the extracted data.
4. Each map worker uses a partitioner function to divide its key/value output into regions, one per reducer. The partitioner decides which reducer receives each part of a given mapper's output.
5. When the map workers have completed their work, the master instructs the reduce workers to begin.
6. The reduce workers in turn contact the map workers to fetch the key/value data for their partition (the shuffle). The data received from the various mappers is then merge-sorted by key.
7. Each reduce worker then calls the reduce function once for every unique key. The reduce function writes its output to an output file.
8. When all the reduce workers have completed their work, the master returns control to the user program.
