{"id":8813,"date":"2013-04-10T10:16:50","date_gmt":"2013-04-10T10:16:50","guid":{"rendered":"http:\/\/vskills.in\/certification\/tutorial\/?p=8813"},"modified":"2024-04-12T14:14:30","modified_gmt":"2024-04-12T08:44:30","slug":"need-and-requirement-for-hadoop","status":"publish","type":"page","link":"https:\/\/www.vskills.in\/certification\/tutorial\/need-and-requirement-for-hadoop\/","title":{"rendered":"Hadoop and Mapreduce Tutorial | Need and requirement for Hadoop"},"content":{"rendered":"<h1>Need and requirement for Hadoop<\/h1>\n<p>Hadoop is open-source software that enables reliable, scalable, distributed computing on clusters of inexpensive servers. It has become the technology of choice for applications that support petabyte-sized analytics across large numbers of computing nodes. It is:<\/p>\n<ul>\n<li>Reliable &#8211; The software is fault tolerant; it expects and handles hardware and software failures<\/li>\n<li>Scalable &#8211; Designed for massive scale of processors, memory, and locally attached storage<\/li>\n<li>Distributed &#8211; Handles replication and offers a massively parallel programming model, MapReduce<\/li>\n<\/ul>\n<p>It is designed to process terabytes and even petabytes of unstructured and structured data. It breaks large workloads into smaller data blocks that are distributed across a cluster of commodity hardware for faster processing. 
It is particularly useful when:<\/p>\n<ul>\n<li>Complex information processing is needed<\/li>\n<li>Unstructured data needs to be turned into structured data<\/li>\n<li>Queries cannot be reasonably expressed using SQL<\/li>\n<li>Heavily recursive algorithms are involved<\/li>\n<li>Complex but parallelizable algorithms are needed, such as geospatial analysis or genome sequencing<\/li>\n<li>Machine learning is involved<\/li>\n<li>Data sets are too large to fit into database RAM or disks, or need too many cores (TB up to PB)<\/li>\n<li>Data value does not justify the expense of constant real-time availability, as with archives or special-interest information, which can be moved to Hadoop and remain available at lower cost<\/li>\n<li>Results are not needed in real time<\/li>\n<li>Fault tolerance is critical<\/li>\n<li>Significant custom coding would be required to handle job scheduling<\/li>\n<\/ul>\n<p>The problem it addresses is the need for analytical platforms that scale rapidly and support the following:<\/p>\n<ol>\n<li>Detailed, interactive, multivariate statistical analysis<\/li>\n<li>Aggregation, correlation, and analysis of historical and current data<\/li>\n<li>Modeling and simulation, what-if analysis, and forecasting of alternate future states<\/li>\n<li>Semantic mining of unstructured data, streaming information, and multimedia<\/li>\n<\/ol>\n<p>It helps to:<\/p>\n<ul>\n<li>Iterate predictive models more rapidly.<\/li>\n<li>Run models of increasing complexity.<\/li>\n<li>Deliver model-driven decisions to more business processes.<\/li>\n<\/ul>\n<h3>Requirements<\/h3>\n<p>Hadoop clusters have two types of machines: masters (the HDFS NameNode and the MapReduce JobTracker) and slaves (the HDFS DataNodes and the MapReduce TaskTrackers). The DataNodes, TaskTrackers, and HBase RegionServers are co-located or co-deployed for optimal data locality.<\/p>\n<p>Slave nodes make up the majority of the IT hardware infrastructure. 
Disk space, I\/O bandwidth, and computational power are the crucial factors for hardware sizing.<\/p>\n<p>Start with 1U machines and use the following recommendations:<\/p>\n<p>Two quad-core CPUs, 12 GB to 24 GB of memory, and four to six disk drives of 2 terabyte (TB) capacity each are usually enough to start the cluster. The minimum network requirement is 1 GigE, with all nodes connected to a Gigabit Ethernet switch.<\/p>\n<div class=\"apply\">\n<h3><strong>Apply for Big Data and Hadoop Developer Certification<\/strong><\/h3>\n<p><a href=\"https:\/\/www.vskills.in\/certification\/certified-big-data-and-apache-hadoop-developer\">https:\/\/www.vskills.in\/certification\/certified-big-data-and-apache-hadoop-developer<\/a><\/p>\n<h4><a href=\"https:\/\/www.vskills.in\/certification\/tutorial\/certified-hadoop-mapreduce\/\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>Back to Tutorials<\/strong><\/a><\/h4>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Need and requirement for Hadoop Hadoop is open-source software that enables reliable, scalable, distributed computing on clusters of inexpensive servers. It has become the technology of choice to support applications that in turn support petabyte-sized analytics utilizing large numbers of computing nodes. 
It is: Reliable &#8211; The software is fault tolerant, it expects and handles&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"categories":[65,1],"tags":[],"class_list":["post-8813","page","type-page","status-publish","hentry","category-hadoop-and-mapreduce","category-uncategorized"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v24.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Hadoop and Mapreduce Tutorial | Need and requirement for Hadoop<\/title>\n<meta name=\"description\" content=\"Hadoop is open-source software that enables reliable, scalable, distributed computing on clusters of inexpensive servers. It has become the technology of...\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.vskills.in\/certification\/tutorial\/need-and-requirement-for-hadoop\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Hadoop and Mapreduce Tutorial | Need and requirement for Hadoop\" \/>\n<meta property=\"og:description\" content=\"Hadoop is open-source software that enables reliable, scalable, distributed computing on clusters of inexpensive servers. It has become the technology of...\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.vskills.in\/certification\/tutorial\/need-and-requirement-for-hadoop\/\" \/>\n<meta property=\"og:site_name\" content=\"Tutorial\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/vskills.in\/\" \/>\n<meta property=\"article:modified_time\" content=\"2024-04-12T08:44:30+00:00\" \/>\n<meta name=\"twitter:label1\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/need-and-requirement-for-hadoop\/\",\"url\":\"https:\/\/www.vskills.in\/certification\/tutorial\/need-and-requirement-for-hadoop\/\",\"name\":\"Hadoop and Mapreduce Tutorial | Need and requirement for Hadoop\",\"isPartOf\":{\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/#website\"},\"datePublished\":\"2013-04-10T10:16:50+00:00\",\"dateModified\":\"2024-04-12T08:44:30+00:00\",\"description\":\"Hadoop is open-source software that enables reliable, scalable, distributed computing on clusters of inexpensive servers. It has become the technology of...\",\"breadcrumb\":{\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/need-and-requirement-for-hadoop\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.vskills.in\/certification\/tutorial\/need-and-requirement-for-hadoop\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/need-and-requirement-for-hadoop\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.vskills.in\/certification\/tutorial\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Hadoop and Mapreduce Tutorial | Need and requirement for Hadoop\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/#website\",\"url\":\"https:\/\/www.vskills.in\/certification\/tutorial\/\",\"name\":\"Tutorial\",\"description\":\"Vskills - A initiative in elearning and 
certification\",\"publisher\":{\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.vskills.in\/certification\/tutorial\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/#organization\",\"name\":\"Vskills\",\"url\":\"https:\/\/www.vskills.in\/certification\/tutorial\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.vskills.in\/certification\/tutorial\/wp-content\/uploads\/2017\/07\/vskills-min-logo.jpg\",\"contentUrl\":\"https:\/\/www.vskills.in\/certification\/tutorial\/wp-content\/uploads\/2017\/07\/vskills-min-logo.jpg\",\"width\":73,\"height\":55,\"caption\":\"Vskills\"},\"image\":{\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/vskills.in\/\",\"https:\/\/x.com\/vskills_in\",\"https:\/\/www.linkedin.com\/company-beta\/1371554\/\",\"https:\/\/www.youtube.com\/channel\/UCMWnscxPwRF_PqXo9B7q_Tw\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Hadoop and Mapreduce Tutorial | Need and requirement for Hadoop","description":"Hadoop is open-source software that enables reliable, scalable, distributed computing on clusters of inexpensive servers. 
It has become the technology of...","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.vskills.in\/certification\/tutorial\/need-and-requirement-for-hadoop\/","og_locale":"en_US","og_type":"article","og_title":"Hadoop and Mapreduce Tutorial | Need and requirement for Hadoop","og_description":"Hadoop is open-source software that enables reliable, scalable, distributed computing on clusters of inexpensive servers. It has become the technology of...","og_url":"https:\/\/www.vskills.in\/certification\/tutorial\/need-and-requirement-for-hadoop\/","og_site_name":"Tutorial","article_publisher":"https:\/\/www.facebook.com\/vskills.in\/","article_modified_time":"2024-04-12T08:44:30+00:00","twitter_misc":{"Est. reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.vskills.in\/certification\/tutorial\/need-and-requirement-for-hadoop\/","url":"https:\/\/www.vskills.in\/certification\/tutorial\/need-and-requirement-for-hadoop\/","name":"Hadoop and Mapreduce Tutorial | Need and requirement for Hadoop","isPartOf":{"@id":"https:\/\/www.vskills.in\/certification\/tutorial\/#website"},"datePublished":"2013-04-10T10:16:50+00:00","dateModified":"2024-04-12T08:44:30+00:00","description":"Hadoop is open-source software that enables reliable, scalable, distributed computing on clusters of inexpensive servers. 
It has become the technology of...","breadcrumb":{"@id":"https:\/\/www.vskills.in\/certification\/tutorial\/need-and-requirement-for-hadoop\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.vskills.in\/certification\/tutorial\/need-and-requirement-for-hadoop\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.vskills.in\/certification\/tutorial\/need-and-requirement-for-hadoop\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.vskills.in\/certification\/tutorial\/"},{"@type":"ListItem","position":2,"name":"Hadoop and Mapreduce Tutorial | Need and requirement for Hadoop"}]},{"@type":"WebSite","@id":"https:\/\/www.vskills.in\/certification\/tutorial\/#website","url":"https:\/\/www.vskills.in\/certification\/tutorial\/","name":"Tutorial","description":"Vskills - A initiative in elearning and certification","publisher":{"@id":"https:\/\/www.vskills.in\/certification\/tutorial\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.vskills.in\/certification\/tutorial\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.vskills.in\/certification\/tutorial\/#organization","name":"Vskills","url":"https:\/\/www.vskills.in\/certification\/tutorial\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.vskills.in\/certification\/tutorial\/#\/schema\/logo\/image\/","url":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-content\/uploads\/2017\/07\/vskills-min-logo.jpg","contentUrl":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-content\/uploads\/2017\/07\/vskills-min-logo.jpg","width":73,"height":55,"caption":"Vskills"},"image":{"@id":"https:\/\/www.vskills.in\/certification\/tutorial\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\
/www.facebook.com\/vskills.in\/","https:\/\/x.com\/vskills_in","https:\/\/www.linkedin.com\/company-beta\/1371554\/","https:\/\/www.youtube.com\/channel\/UCMWnscxPwRF_PqXo9B7q_Tw"]}]}},"_links":{"self":[{"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/pages\/8813","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/comments?post=8813"}],"version-history":[{"count":7,"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/pages\/8813\/revisions"}],"predecessor-version":[{"id":127216,"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/pages\/8813\/revisions\/127216"}],"wp:attachment":[{"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/media?parent=8813"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/categories?post=8813"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/tags?post=8813"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}