Friday, February 14, 2014

MapReduce in MongoDB

MapReduce is one of the approaches for performing aggregate calculations in MongoDB. The other two are the aggregation pipeline and single-purpose aggregation operations. As its name says, MapReduce has two functions: a map function and a reduce function.

If you are familiar with functional programming, you will recognise map() as the function that applies an operation to every element of a list. In MongoDB, the main aim of the map() function is to take each document from a collection and convert it into a {key, value} pair, which it emits. Each document is thus mapped to a {key, value} pair.

The aim of the reduce() function is to take the {key, value} pairs produced by the map function, grouped by key, and consolidate all the values of each key. The user-defined logic is applied to the consolidated values of each key to produce a single result per key.
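For readers who know map and reduce from functional programming, the same idea can be sketched with ordinary JavaScript arrays. This is only an illustration of the concept, not MongoDB code; the `items` list and its `category` and `amount` fields are made up for the example.

```javascript
// Hypothetical sample items, not taken from any MongoDB collection.
const items = [
  { category: "a", amount: 2 },
  { category: "b", amount: 5 },
  { category: "a", amount: 3 },
];

// Map step: turn every item into a [key, value] pair.
const pairs = items.map((item) => [item.category, item.amount]);

// Grouping step: collect the values that share a key.
const grouped = new Map();
for (const [key, value] of pairs) {
  if (!grouped.has(key)) grouped.set(key, []);
  grouped.get(key).push(value);
}

// Reduce step: apply user-defined logic (here a sum) to each key's values.
const totals = {};
for (const [key, values] of grouped) {
  totals[key] = values.reduce((sum, v) => sum + v, 0);
}

console.log(totals); // { a: 5, b: 5 }
```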

The life cycle of a mapReduce function is shown here.

To show the life cycle, let us take an example collection and see how mapReduce works: how the data gets segregated and processed at each stage.

The documents below are the data of the employee collection. We want the total salary paid by each department.

{ "EmpName" : "Sam Pitroda", "Age" : 35, "Salary" : 3126, "Gender" : "M", "Dept" : 10 }
{ "EmpName" : "Bill Rama", "Age" : 48, "Salary" : 2270, "Gender" : "M", "Dept" : 10 }
{ "EmpName" : "Supriay Khanna", "Age" : 32, "Salary" : 3066, "Gender" : "F", "Dept" : 10 }
{ "EmpName" : "Pappu Kaun", "Age" : 24, "Salary" : 4133, "Gender" : "M", "Dept": 10 }
{ "EmpName" : "Akshay Kumar", "Age" : 22, "Salary" : 2651, "Gender" : "M", "Dept" : 10 }

{ "EmpName" : "Anil Shastri", "Age" : 48, "Salary" : 2724, "Gender" : "M", "Dept" : 20 }
{ "EmpName" : "Ajay Khanna", "Age" : 49, "Salary" : 3711, "Gender" : "M", "Dept" : 20 }
{ "EmpName" : "Steve Allan", "Age" : 29, "Salary" : 4391, "Gender" : "M", "Dept" : 20 }
{ "EmpName" : "Jayant Singh", "Age" : 31, "Salary" : 2931, "Gender" : "M", "Dept" : 20 }
{ "EmpName" : "Saurab Khanna", "Age" : 39, "Salary" : 2566, "Gender" : "M", "Dept" : 20 }

{ "EmpName" : "John Butler", "Age" : 45, "Salary" : 3622, "Gender" : "M", "Dept" : 30 }
{ "EmpName" : "Ismail Paun", "Age" : 32, "Salary" : 3608, "Gender" : "M", "Dept" : 30 }
{ "EmpName" : "Rahul Puri", "Age" : 32, "Salary" : 2111, "Gender" : "M", "Dept": 30 }

{ "EmpName" : "Srini Arya", "Age" : 30, "Salary" : 3966, "Gender" : "F", "Dept": 40 }
Input: the employee documents listed above.
{ "Salary" : 3126, "Dept" : 10 }
{ "Salary" : 2270, "Dept" : 10 }
{ "Salary" : 3066, "Dept" : 10 }
{ "Salary" : 4133, "Dept": 10 }
{ "Salary" : 2651, "Dept" : 10 }

{ "Salary" : 2724, "Dept" : 20 }
{ "Salary" : 3711, "Dept" : 20 }
{ "Salary" : 4391, "Dept" : 20 }
{ "Salary" : 2931, "Dept" : 20 }
{ "Salary" : 2566, "Dept" : 20 }

{ "Salary" : 3622, "Dept" : 30 }
{ "Salary" : 3608, "Dept" : 30 }
{ "Salary" : 2111, "Dept": 30 }


{ "Salary" : 3966, "Dept": 40 }
Map: the Dept and Salary of each document, listed above, are emitted as key and value.
{ "Key" : 10, "Values" : [3126, 2270, 3066, 4133, 2651] }
{ "Key" : 20, "Values" : [2724, 3711, 4391, 2931, 2566] }
{ "Key" : 30, "Values" : [3622, 3608, 2111] }
{ "Key" : 40, "Values" : [3966] }
Reduce: the emitted values, grouped by key as listed above, are passed to the reduce function.
{ "_id" : 10, "value" : 15246 }
{ "_id" : 20, "value" : 16323 }
{ "_id" : 30, "value" : 9341 }
{ "_id" : 40, "value" : 3966 }
Output: the total salary per department, listed above.
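The stages above can be replayed in plain JavaScript to check the totals. This is a rough simulation under stated assumptions, not how MongoDB runs the job internally: the `emit` function, the `emitted` array and the grouping loop are stand-ins written for this sketch, and only the Dept and Salary fields of the sample documents are included.

```javascript
// Dept and Salary of the 14 sample employee documents.
const employee = [
  { Salary: 3126, Dept: 10 }, { Salary: 2270, Dept: 10 },
  { Salary: 3066, Dept: 10 }, { Salary: 4133, Dept: 10 },
  { Salary: 2651, Dept: 10 },
  { Salary: 2724, Dept: 20 }, { Salary: 3711, Dept: 20 },
  { Salary: 4391, Dept: 20 }, { Salary: 2931, Dept: 20 },
  { Salary: 2566, Dept: 20 },
  { Salary: 3622, Dept: 30 }, { Salary: 3608, Dept: 30 },
  { Salary: 2111, Dept: 30 },
  { Salary: 3966, Dept: 40 },
];

// Stand-in for the emit() that MongoDB provides to the map function.
const emitted = [];
function emit(key, value) {
  emitted.push({ key, value });
}

// Map: called once per document, with `this` bound to that document.
function mapFn() {
  emit(this.Dept, this.Salary);
}
employee.forEach((doc) => mapFn.call(doc));

// Group the emitted values by key.
const groups = new Map();
for (const { key, value } of emitted) {
  if (!groups.has(key)) groups.set(key, []);
  groups.get(key).push(value);
}

// Reduce: consolidate the values of one key into a single result.
function reduceFn(key, values) {
  return values.reduce((total, v) => total + v, 0);
}

// Output: one { _id, value } document per key.
const output = [...groups].map(([key, values]) => ({
  _id: key,
  value: reduceFn(key, values),
}));

console.log(output);
// [ { _id: 10, value: 15246 }, { _id: 20, value: 16323 },
//   { _id: 30, value: 9341 }, { _id: 40, value: 3966 } ]
```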

In MongoDB, the syntax of the mapReduce() method is as follows:


db.collection_name.mapReduce(
    function() { ...; emit(key, value); },         // map function
    function(key, values) { ...; return (...); },  // reduce function
    { out: "output_collection_name" }
)

To get the total Salary of each department with mapReduce(), we can issue the following command in the MongoDB shell.

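db.employee.mapReduce(
    function() { emit(this.Dept, this.Salary); },         // map: key = Dept, value = Salary
    function(key, values) { return Array.sum(values); },  // reduce: total Salary per Dept
    { out: "DeptSalary" }                                 // collection that receives the result
)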
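One point worth checking when writing a reduce function: MongoDB may call it more than once for the same key, passing earlier results back in as values, so reducing partial results again should give the same answer as reducing everything in one pass. The sketch below checks this for the salary sum in plain JavaScript. `reduceSalary` is a hypothetical stand-in for the reduce function above, using `Array.prototype.reduce` in place of the shell's `Array.sum` so that it runs outside the MongoDB shell.

```javascript
// Stand-in for the reduce function passed to mapReduce() above.
function reduceSalary(key, values) {
  return values.reduce((total, v) => total + v, 0);
}

// Salaries of department 10 from the sample data.
const dept10 = [3126, 2270, 3066, 4133, 2651];

// One pass over all the values.
const onePass = reduceSalary(10, dept10);

// Two partial passes whose results are then reduced again.
const partialA = reduceSalary(10, dept10.slice(0, 2));
const partialB = reduceSalary(10, dept10.slice(2));
const twoPass = reduceSalary(10, [partialA, partialB]);

console.log(onePass, twoPass); // 15246 15246
```

A sum passes this check. A reduce function that returned a plain average would not, because the average of partial averages generally differs from the average of all the values.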






