{"id":20954,"date":"2013-05-11T11:08:49","date_gmt":"2013-05-11T05:38:49","guid":{"rendered":"http:\/\/vskills.in\/certification\/tutorial\/?p=20954"},"modified":"2024-04-12T14:16:17","modified_gmt":"2024-04-12T08:46:17","slug":"mapreduce-internals","status":"publish","type":"page","link":"https:\/\/www.vskills.in\/certification\/tutorial\/mapreduce-internals\/","title":{"rendered":"Hadoop &#038; Mapreduce Tutorial | MapReduce Internals &#8211; The Map Phase"},"content":{"rendered":"\n<p>A MapReduce program is composed of four program components:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Mapper<\/li>\n\n\n\n<li>Partitioner<\/li>\n\n\n\n<li>Combiner<\/li>\n\n\n\n<li>Reducer<\/li>\n<\/ul>\n\n\n\n<p>These components execute in a distributed environment, across multiple JVMs. The two JVMs that matter most to a MapReduce developer are the one that executes a Mapper instance and the one that executes a Reducer instance. Every component you develop will execute in one of these two kinds of JVM. Unless you have turned on JVM reuse, which is not recommended, each Mapper and Reducer instance will execute in its own JVM. 
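<\/p>\n\n\n\n<p>The following driver is a minimal sketch of how the four components might be wired into a single job. It assumes Hadoop 2.x or later is on the classpath. Instead of custom classes, it uses Hadoop's built-in <code>TokenCounterMapper<\/code>, <code>HashPartitioner<\/code> (the default partitioner, set explicitly here only for illustration) and <code>IntSumReducer<\/code>, which serves as both Combiner and Reducer. The class name <code>WordCountDriver<\/code> and the choice of two Reducers are hypothetical.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import org.apache.hadoop.conf.Configuration;\nimport org.apache.hadoop.fs.Path;\nimport org.apache.hadoop.io.IntWritable;\nimport org.apache.hadoop.io.Text;\nimport org.apache.hadoop.mapreduce.Job;\nimport org.apache.hadoop.mapreduce.lib.input.FileInputFormat;\nimport org.apache.hadoop.mapreduce.lib.map.TokenCounterMapper;\nimport org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;\nimport org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;\nimport org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;\n\npublic class WordCountDriver {\n  public static void main(String[] args) throws Exception {\n    Job job = Job.getInstance(new Configuration(), \"word count\");\n    job.setJarByClass(WordCountDriver.class);\n    \/\/ These three run inside each map task JVM: Mapper, then Partitioner, then Combiner\n    job.setMapperClass(TokenCounterMapper.class);\n    job.setPartitionerClass(HashPartitioner.class);\n    job.setCombinerClass(IntSumReducer.class);\n    \/\/ Runs inside each reduce task JVM; each map task produces one partition per Reducer\n    job.setReducerClass(IntSumReducer.class);\n    job.setNumReduceTasks(2);\n    job.setOutputKeyClass(Text.class);\n    job.setOutputValueClass(IntWritable.class);\n    \/\/ Input and output directories are taken from the command line\n    FileInputFormat.addInputPath(job, new Path(args[0]));\n    FileOutputFormat.setOutputPath(job, new Path(args[1]));\n    System.exit(job.waitForCompletion(true) ? 0 : 1);\n  }\n}<\/code><\/pre>\n\n\n\n<p>The driver itself runs in the client JVM that submits the job. The Mapper, Partitioner and Combiner classes it configures run inside each map task's JVM, while the Reducer class runs inside each reduce task's JVM.<\/p>\n\n\n\n<p>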
The diagram below shows the various components and the nodes on which they execute.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><a ref=\"magnificPopup\" href=\"http:\/\/vskills.in\/certification\/tutorial\/wp-content\/uploads\/2013\/05\/hd1-1.png\"><img loading=\"lazy\" decoding=\"async\" width=\"760\" height=\"590\" src=\"http:\/\/vskills.in\/certification\/tutorial\/wp-content\/uploads\/2013\/05\/hd1-1.png\" alt=\"mapreduce\" class=\"wp-image-55978\" title=\"mapreduce\" srcset=\"https:\/\/www.vskills.in\/certification\/tutorial\/wp-content\/uploads\/2013\/05\/hd1-1.png 760w, https:\/\/www.vskills.in\/certification\/tutorial\/wp-content\/uploads\/2013\/05\/hd1-1-300x233.png 300w, https:\/\/www.vskills.in\/certification\/tutorial\/wp-content\/uploads\/2013\/05\/hd1-1-712x553.png 712w\" sizes=\"auto, (max-width: 760px) 100vw, 760px\" \/><\/a><\/figure>\n\n\n\n<p>The Mapper, Partitioner and Combiner all execute on the Mapper node, in the single JVM designated for the Mapper. This has several implications. Static variables set in one component can be read by the other components, because they share the same JVM. If you use the Spring framework, you can also exchange messages between components by configuring a shared set of objects through dependency injection. As we will see, these components also run concurrently during certain periods, alongside other framework threads that are launched for various housekeeping tasks.<\/p>\n\n\n\n<p>The goal of the Mapper instance, together with its Partitioner and Combiner instances, is to produce a single output file divided into partitions, one per Reducer.<\/p>\n\n\n\n<p>A MapReduce program has two main phases:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Map<\/li>\n\n\n\n<li>Reduce<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><span lang=\"X-NONE\">The Map Phase<\/span><\/h2>\n\n\n\n<p><span lang=\"EN-US\">As a developer, your key interaction with the Hadoop framework is the context.write(\u2026) invocation. 
The sample WordCountMapper below shows a typical invocation:<\/span><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import java.io.IOException;\nimport java.util.StringTokenizer;\nimport org.apache.hadoop.io.*;\nimport org.apache.hadoop.mapreduce.Mapper;\npublic class WordCountMapper\n    extends Mapper&lt;LongWritable, Text, Text, IntWritable&gt; {\n  @Override\n  public void map(LongWritable key, Text value, Context context)\n      throws IOException, InterruptedException {\n    \/\/ Emit (word, 1) for every whitespace-separated word in the input line\n    StringTokenizer tokens = new StringTokenizer(value.toString());\n    while (tokens.hasMoreTokens()) {\n      context.write(new Text(tokens.nextToken()), new IntWritable(1));\n    }\n  }\n}<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image\"><a ref=\"magnificPopup\" href=\"http:\/\/vskills.in\/certification\/tutorial\/wp-content\/uploads\/2013\/05\/hd2.png\"><img loading=\"lazy\" decoding=\"async\" width=\"758\" height=\"518\" src=\"http:\/\/vskills.in\/certification\/tutorial\/wp-content\/uploads\/2013\/05\/hd2.png\" alt=\"mapreduce\" class=\"wp-image-55979\" title=\"mapreduce\" srcset=\"https:\/\/www.vskills.in\/certification\/tutorial\/wp-content\/uploads\/2013\/05\/hd2.png 758w, https:\/\/www.vskills.in\/certification\/tutorial\/wp-content\/uploads\/2013\/05\/hd2-300x205.png 300w, https:\/\/www.vskills.in\/certification\/tutorial\/wp-content\/uploads\/2013\/05\/hd2-712x487.png 712w\" sizes=\"auto, (max-width: 758px) 100vw, 758px\" \/><\/a><\/figure>\n\n\n\n<p><span lang=\"EN-US\">This is the call through which the key and value pairs emitted by the Mapper eventually reach a Reducer instance, which runs in a separate JVM and typically on a separate node. In this section, we will review what happens behind the scenes when you make this call. 
The figure above shows what happens on the Mapper node when you invoke this method.<\/span><\/p>\n\n\n\n<p>The goal of the Mapper is to produce a single output file that is partitioned by Reducer and sorted by the Mapper output key within each partition. Each partition holds the keys that a particular Reducer will process. After the Mapper emits a key and value pair, the pair is passed to a Partitioner instance running in the same JVM as the Mapper. The Partitioner assigns the pair to a partition based on the number of Reducers and any custom partitioning logic; by default, Hadoop hashes the key and takes the result modulo the number of Reducers. The partitioned records are collected in an in-memory buffer, and when the buffer fills past a configurable threshold they are sorted by partition and then by the Mapper output key.<\/p>\n\n\n\n<p>At this point, if a Combiner is configured for the job, it is invoked on the sorted output of each partition. Note that the Combiner runs after the Partitioner, in the same JVM as the Mapper. The framework does not guarantee how many times the Combiner runs, so it must produce correct results whether it is applied zero, one or several times.<\/p>\n\n\n\n<p>Finally, this partitioned, sorted and combined output is spilled to the local disk of the Mapper node. If several spill files are produced, they are merged into a single partitioned and sorted output file when the Mapper finishes. Optionally, the intermediate Mapper output can be compressed. Compression reduces disk I\/O when these output files are written to the local disk of the Mapper node, and it also reduces network I\/O when they are transferred to the Reducer nodes.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A MapReduce program is composed of four program components These components execute in a distributed environment in multiple JVM\u2019s. 
The two JVMs important to a MapReduce developer are the JVM which executes a Mapper and that which executes the Reducer instance. Every component you develop will execute in one of these JVMs. Unless you have&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"categories":[65],"tags":[],"class_list":["post-20954","page","type-page","status-publish","hentry","category-hadoop-and-mapreduce"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v24.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Hadoop &amp; Mapreduce Tutorial | MapReduce Internals - The Map Phase<\/title>\n<meta name=\"description\" content=\"A MapReduce program is composed of four program components:Mapper,Partitioner,Combiner,Reducer. These components execute in a distributed environment in...\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.vskills.in\/certification\/tutorial\/mapreduce-internals\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Hadoop &amp; Mapreduce Tutorial | MapReduce Internals - The Map Phase\" \/>\n<meta property=\"og:description\" content=\"A MapReduce program is composed of four program components:Mapper,Partitioner,Combiner,Reducer. 
These components execute in a distributed environment in...\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.vskills.in\/certification\/tutorial\/mapreduce-internals\/\" \/>\n<meta property=\"og:site_name\" content=\"Tutorial\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/vskills.in\/\" \/>\n<meta property=\"article:modified_time\" content=\"2024-04-12T08:46:17+00:00\" \/>\n<meta property=\"og:image\" content=\"http:\/\/vskills.in\/certification\/tutorial\/wp-content\/uploads\/2013\/05\/hd1-1.png\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"3 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/mapreduce-internals\/\",\"url\":\"https:\/\/www.vskills.in\/certification\/tutorial\/mapreduce-internals\/\",\"name\":\"Hadoop & Mapreduce Tutorial | MapReduce Internals - The Map Phase\",\"isPartOf\":{\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/mapreduce-internals\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/mapreduce-internals\/#primaryimage\"},\"thumbnailUrl\":\"http:\/\/vskills.in\/certification\/tutorial\/wp-content\/uploads\/2013\/05\/hd1-1.png\",\"datePublished\":\"2013-05-11T05:38:49+00:00\",\"dateModified\":\"2024-04-12T08:46:17+00:00\",\"description\":\"A MapReduce program is composed of four program components:Mapper,Partitioner,Combiner,Reducer. 
These components execute in a distributed environment in...\",\"breadcrumb\":{\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/mapreduce-internals\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.vskills.in\/certification\/tutorial\/mapreduce-internals\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/mapreduce-internals\/#primaryimage\",\"url\":\"http:\/\/vskills.in\/certification\/tutorial\/wp-content\/uploads\/2013\/05\/hd1-1.png\",\"contentUrl\":\"http:\/\/vskills.in\/certification\/tutorial\/wp-content\/uploads\/2013\/05\/hd1-1.png\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/mapreduce-internals\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.vskills.in\/certification\/tutorial\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Hadoop &#038; Mapreduce Tutorial | MapReduce Internals &#8211; The Map Phase\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/#website\",\"url\":\"https:\/\/www.vskills.in\/certification\/tutorial\/\",\"name\":\"Tutorial\",\"description\":\"Vskills - A initiative in elearning and 
certification\",\"publisher\":{\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.vskills.in\/certification\/tutorial\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/#organization\",\"name\":\"Vskills\",\"url\":\"https:\/\/www.vskills.in\/certification\/tutorial\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.vskills.in\/certification\/tutorial\/wp-content\/uploads\/2017\/07\/vskills-min-logo.jpg\",\"contentUrl\":\"https:\/\/www.vskills.in\/certification\/tutorial\/wp-content\/uploads\/2017\/07\/vskills-min-logo.jpg\",\"width\":73,\"height\":55,\"caption\":\"Vskills\"},\"image\":{\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/vskills.in\/\",\"https:\/\/x.com\/vskills_in\",\"https:\/\/www.linkedin.com\/company-beta\/1371554\/\",\"https:\/\/www.youtube.com\/channel\/UCMWnscxPwRF_PqXo9B7q_Tw\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Hadoop & Mapreduce Tutorial | MapReduce Internals - The Map Phase","description":"A MapReduce program is composed of four program components:Mapper,Partitioner,Combiner,Reducer. 
These components execute in a distributed environment in...","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.vskills.in\/certification\/tutorial\/mapreduce-internals\/","og_locale":"en_US","og_type":"article","og_title":"Hadoop & Mapreduce Tutorial | MapReduce Internals - The Map Phase","og_description":"A MapReduce program is composed of four program components:Mapper,Partitioner,Combiner,Reducer. These components execute in a distributed environment in...","og_url":"https:\/\/www.vskills.in\/certification\/tutorial\/mapreduce-internals\/","og_site_name":"Tutorial","article_publisher":"https:\/\/www.facebook.com\/vskills.in\/","article_modified_time":"2024-04-12T08:46:17+00:00","og_image":[{"url":"http:\/\/vskills.in\/certification\/tutorial\/wp-content\/uploads\/2013\/05\/hd1-1.png","type":"","width":"","height":""}],"twitter_misc":{"Est. reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.vskills.in\/certification\/tutorial\/mapreduce-internals\/","url":"https:\/\/www.vskills.in\/certification\/tutorial\/mapreduce-internals\/","name":"Hadoop & Mapreduce Tutorial | MapReduce Internals - The Map Phase","isPartOf":{"@id":"https:\/\/www.vskills.in\/certification\/tutorial\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.vskills.in\/certification\/tutorial\/mapreduce-internals\/#primaryimage"},"image":{"@id":"https:\/\/www.vskills.in\/certification\/tutorial\/mapreduce-internals\/#primaryimage"},"thumbnailUrl":"http:\/\/vskills.in\/certification\/tutorial\/wp-content\/uploads\/2013\/05\/hd1-1.png","datePublished":"2013-05-11T05:38:49+00:00","dateModified":"2024-04-12T08:46:17+00:00","description":"A MapReduce program is composed of four program components:Mapper,Partitioner,Combiner,Reducer. 
These components execute in a distributed environment in...","breadcrumb":{"@id":"https:\/\/www.vskills.in\/certification\/tutorial\/mapreduce-internals\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.vskills.in\/certification\/tutorial\/mapreduce-internals\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.vskills.in\/certification\/tutorial\/mapreduce-internals\/#primaryimage","url":"http:\/\/vskills.in\/certification\/tutorial\/wp-content\/uploads\/2013\/05\/hd1-1.png","contentUrl":"http:\/\/vskills.in\/certification\/tutorial\/wp-content\/uploads\/2013\/05\/hd1-1.png"},{"@type":"BreadcrumbList","@id":"https:\/\/www.vskills.in\/certification\/tutorial\/mapreduce-internals\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.vskills.in\/certification\/tutorial\/"},{"@type":"ListItem","position":2,"name":"Hadoop &#038; Mapreduce Tutorial | MapReduce Internals &#8211; The Map Phase"}]},{"@type":"WebSite","@id":"https:\/\/www.vskills.in\/certification\/tutorial\/#website","url":"https:\/\/www.vskills.in\/certification\/tutorial\/","name":"Tutorial","description":"Vskills - A initiative in elearning and 
certification","publisher":{"@id":"https:\/\/www.vskills.in\/certification\/tutorial\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.vskills.in\/certification\/tutorial\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.vskills.in\/certification\/tutorial\/#organization","name":"Vskills","url":"https:\/\/www.vskills.in\/certification\/tutorial\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.vskills.in\/certification\/tutorial\/#\/schema\/logo\/image\/","url":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-content\/uploads\/2017\/07\/vskills-min-logo.jpg","contentUrl":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-content\/uploads\/2017\/07\/vskills-min-logo.jpg","width":73,"height":55,"caption":"Vskills"},"image":{"@id":"https:\/\/www.vskills.in\/certification\/tutorial\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/vskills.in\/","https:\/\/x.com\/vskills_in","https:\/\/www.linkedin.com\/company-beta\/1371554\/","https:\/\/www.youtube.com\/channel\/UCMWnscxPwRF_PqXo9B7q_Tw"]}]}},"_links":{"self":[{"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/pages\/20954","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/comments?post=20954"}],"version-history":[{"count":8,"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/pages\/20954\/revisions"
}],"predecessor-version":[{"id":127248,"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/pages\/20954\/revisions\/127248"}],"wp:attachment":[{"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/media?parent=20954"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/categories?post=20954"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/tags?post=20954"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}