
MapReduce Work Flow

Here are some of the key concepts related to MapReduce.

The following diagram shows the logical flow of a MapReduce programming model.

The stages depicted above are

Word Count Example

For the purpose of understanding MapReduce, let us consider a simple example. Let us assume that we have a file which contains the following four lines of text.

In this file, we need to count the number of occurrences of each word. For instance, DW appears twice, BI appears once, SSRS appears twice, and so on. Let us see how this counting operation is performed when this file is input to MapReduce.
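As a rough illustration, the counting operation can be sketched in plain Python (no Hadoop involved). The function names `map_words`, `shuffle`, `reduce_counts`, and `word_count` are hypothetical. The `sample_lines` are stand-ins, not the original file; they were chosen only so that DW appears twice, BI once, and SSRS twice, as stated above.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one line."""
    return [(word, 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key (the word)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(word, values):
    """Reduce step: sum the 1s emitted for a single word."""
    return word, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the given lines."""
    pairs = [pair for line in lines for pair in map_words(line)]
    return dict(reduce_counts(word, values)
                for word, values in shuffle(pairs).items())


# Stand-in input (an assumption, not the file from the tutorial).
sample_lines = ["DW SSRS", "BI DW", "SSRS"]
counts = word_count(sample_lines)
print(counts)
```

In a real MapReduce job the map and reduce steps would run in parallel on different machines; here they run one after another in a single process, purely to show the data flow.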

Below is a simplified representation of the data flow for Word Count Example.

Game Example

Say you are processing a large amount of data and want to find out what percentage of your user base was talking about games. First, we identify the keywords that tell us a message is related to games. Next, we write a mapping function to find those keywords in our data. For example, the keywords can be Gold medals, Bronze medals, Silver medals, Olympic football, basketball, cricket, etc.

Let us take the following chunks of a big data set and see how to process them.

“Hi, how are you”

“We love football”

“He is an awesome football player”

“Merry Christmas”

“Olympics will be held in China”

“Records broken today in Olympics”

“Yes, we won 2 Gold medals”

“He qualified for Olympics”

Mapping Phase – The map phase of our algorithm will be as follows:

  1. Declare a function “Map”.
  2. Loop: for each word equal to “football”,
  3. increment the counter.
  4. Return the key-value pair “football”=>counter.
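The mapping steps above could look roughly like this in Python. This is a minimal sketch: the name `map_football`, the case-insensitive matching, and the stripping of punctuation are assumptions made for illustration, not part of the original algorithm.

```python
# The sample chunks from the text above.
chunks = [
    "Hi, how are you",
    "We love football",
    "He is an awesome football player",
    "Merry Christmas",
    "Olympics will be held in China",
    "Records broken today in Olympics",
    "Yes, we won 2 Gold medals",
    "He qualified for Olympics",
]


def map_football(chunks):
    """Count the words equal to "football" and return a (key, counter) pair."""
    counter = 0
    for chunk in chunks:
        for word in chunk.lower().split():
            # Assumption: ignore case and surrounding punctuation when matching.
            if word.strip(".,!?") == "football":
                counter += 1
    return ("football", counter)


football_pair = map_football(chunks)
print(football_pair)
```

For the sample chunks, the mapper finds “football” in two of them, which matches the reduce(“football”=>2) input used in the reducing phase.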

In the same way, we can define any number of mapping functions for other words: “Olympics”, “Gold Medals”, “cricket”, etc.

Reducing Phase – The reducing function accepts the output of all these mappers in the form of key-value pairs and then processes it. So, the input to the reduce function will look like the following:

reduce(“football”=>2)

reduce(“Olympics”=>3)

Our algorithm will continue with the following steps:

  1. Declare a function reduce to accept the values from the map functions.
  2. For each key-value pair, add the value to a counter.
  3. Return “games”=>counter.

At the end, we will get output like “games”=>5, the sum of the two inputs shown above (2 + 3).
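A minimal Python sketch of the reducing steps, assuming the mapper outputs arrive as a list of (keyword, count) tuples. The function name `reduce_games` and the tuple representation are hypothetical choices for illustration.

```python
def reduce_games(mapper_outputs):
    """Add up the counts from all game-related mappers under the key "games"."""
    counter = 0
    for _keyword, value in mapper_outputs:
        counter += value
    return ("games", counter)


# The two reducer inputs shown in the text: "football"=>2 and "Olympics"=>3.
mapper_outputs = [("football", 2), ("Olympics", 3)]
result = reduce_games(mapper_outputs)
print(result)
```

The reducer ignores the individual keywords and keeps only their counts, which is why the result carries the single key “games”.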

Now, looking at the bigger picture, we can write any number of mapper functions here. Let us say that you want to know which users were sending wishes to each other. In this case, you write a mapping function to map words like “Wishing”, “Wish”, “Happy”, and “Merry”, and then write a corresponding reducer function.

Here you will need a shuffling function that distinguishes between the “games” and “wishing” keys returned by the mappers and sends each to its respective reducer function. Similarly, you may need a splitting function at the start to feed the input to the mapper functions in the form of chunks. The following diagram summarizes the flow of the MapReduce algorithm:
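As a complement to the diagram, the split, map, shuffle, and reduce steps can be sketched end to end in Python. Everything here is an illustrative assumption: the function names, the `CATEGORY_KEYWORDS` table, the choice to have mappers emit the category key directly, and the restriction of the “games” keywords to “football” and “Olympics” so that the result matches the “games”=>5 total above.

```python
from collections import defaultdict

# Hypothetical keyword table: which words count toward which key.
CATEGORY_KEYWORDS = {
    "games": {"football", "olympics"},
    "wishing": {"wishing", "wish", "happy", "merry"},
}

chunks = [
    "Hi, how are you",
    "We love football",
    "He is an awesome football player",
    "Merry Christmas",
    "Olympics will be held in China",
    "Records broken today in Olympics",
    "Yes, we won 2 Gold medals",
    "He qualified for Olympics",
]


def split_input(chunks, n_splits):
    """Splitting: deal the chunks out into n_splits groups, one per mapper."""
    return [chunks[i::n_splits] for i in range(n_splits)]


def map_split(split):
    """Mapping: emit a (category, 1) pair for every keyword hit in this split."""
    pairs = []
    for chunk in split:
        for word in chunk.lower().split():
            word = word.strip(".,!?")
            for category, keywords in CATEGORY_KEYWORDS.items():
                if word in keywords:
                    pairs.append((category, 1))
    return pairs


def shuffle(all_pairs):
    """Shuffling: group values by key so each reducer sees only its own key."""
    grouped = defaultdict(list)
    for key, value in all_pairs:
        grouped[key].append(value)
    return grouped


def reduce_category(category, values):
    """Reducing: sum all values that were routed to this category."""
    return category, sum(values)


def run(chunks, n_splits=2):
    """Run split, map, shuffle, and reduce one after another."""
    all_pairs = []
    for split in split_input(chunks, n_splits):
        all_pairs.extend(map_split(split))
    return dict(reduce_category(key, values)
                for key, values in shuffle(all_pairs).items())


totals = run(chunks)
print(totals)
```

The number of splits does not change the totals; it only changes how the work would be divided among mappers if they ran in parallel.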

In the above map reduce flow
