{"id":20952,"date":"2013-05-11T11:08:39","date_gmt":"2013-05-11T05:38:39","guid":{"rendered":"http:\/\/vskills.in\/certification\/tutorial\/?p=20952"},"modified":"2024-04-12T14:16:17","modified_gmt":"2024-04-12T08:46:17","slug":"hdfs-4","status":"publish","type":"page","link":"https:\/\/www.vskills.in\/certification\/tutorial\/hdfs-4\/","title":{"rendered":"Hadoop and Mapreduce Tutorial | HDFS"},"content":{"rendered":"\n<h1 class=\"wp-block-heading\">HDFS<\/h1>\n\n\n\n<p><span lang=\"EN-US\">HDFS is a Java-based file system that provides scalable and reliable data storage, and it was designed to span large clusters of commodity servers. HDFS has demonstrated production scalability of up to 200 PB of storage and a single cluster of 4500 servers, supporting close to a billion files and blocks. When that quantity and quality of enterprise data is available in HDFS, and YARN enables multiple data access applications to process it, Hadoop users can confidently answer questions that eluded previous data platforms.<\/span><\/p>\n\n\n\n<p><span lang=\"EN-US\">In HDP 2.2, the rolling upgrade feature and the underlying HDFS High Availability configuration enable Hadoop operators to upgrade the cluster software and restart upgraded services, without taking the entire cluster down.<\/span><\/p>\n\n\n\n<p><span lang=\"EN-US\">The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high throughput access to application data and is suitable for applications that have large data sets. It relaxes a few POSIX requirements to enable streaming access to file system data. HDFS was originally built as infrastructure for the Apache Nutch web search engine project. It is now an Apache Hadoop subproject.<\/span><\/p>\n\n\n\n<p><span lang=\"EN-US\">Think of a file that contains the phone numbers for everyone in the United States; the people with a last name starting with A might be stored on server 1, B on server 2, and so on. In a Hadoop world, pieces of this phonebook would be stored across the cluster, and to reconstruct the entire phonebook, your program would need the blocks from every server in the cluster. To achieve availability as components fail, HDFS replicates these smaller pieces onto two additional servers by default. (This redundancy can be increased or decreased on a per-file basis or for a whole environment; for example, a development Hadoop cluster typically doesn\u2019t need any data re\u00addundancy.) This redundancy offers multiple benefits, the most obvious being higher availability.<\/span><\/p>\n\n\n\n<p><span lang=\"EN-US\">In addition, this redundancy allows the Hadoop cluster to break work up into smaller chunks and run those jobs on all the servers in the cluster for better scalability. Finally, you get the benefit of data locality, which is critical when working with large data sets. <\/span><\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Apply for Big Data and Hadoop Developer Certification<\/strong><\/h3>\n\n\n\n<p><a href=\"https:\/\/www.vskills.in\/certification\/certified-big-data-and-apache-hadoop-developer\">https:\/\/www.vskills.in\/certification\/certified-big-data-and-apache-hadoop-developer<\/a><\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><a href=\"https:\/\/www.vskills.in\/certification\/tutorial\/certified-hadoop-mapreduce\/\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>Back to Tutorials<\/strong><\/a><\/h4>\n","protected":false},"excerpt":{"rendered":"<p>HDFS HDFS is a Java-based file system that provides scalable and reliable data storage, and it was designed to span large clusters of commodity servers. HDFS has demonstrated production scalability of up to 200 PB of storage and a single cluster of 4500 servers, supporting close to a billion files and blocks. When that quantity&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"categories":[65],"tags":[],"class_list":["post-20952","page","type-page","status-publish","hentry","category-hadoop-and-mapreduce"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v24.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Hadoop and Mapreduce Tutorial | HDFS<\/title>\n<meta name=\"description\" content=\"HDFS is a Java-based file system that provides scalable and reliable data storage, and it was designed to span large clusters of commodity servers. HDFS...\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.vskills.in\/certification\/tutorial\/hdfs-4\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Hadoop and Mapreduce Tutorial | HDFS\" \/>\n<meta property=\"og:description\" content=\"HDFS is a Java-based file system that provides scalable and reliable data storage, and it was designed to span large clusters of commodity servers. HDFS...\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.vskills.in\/certification\/tutorial\/hdfs-4\/\" \/>\n<meta property=\"og:site_name\" content=\"Tutorial\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/vskills.in\/\" \/>\n<meta property=\"article:modified_time\" content=\"2024-04-12T08:46:17+00:00\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/hdfs-4\/\",\"url\":\"https:\/\/www.vskills.in\/certification\/tutorial\/hdfs-4\/\",\"name\":\"Hadoop and Mapreduce Tutorial | HDFS\",\"isPartOf\":{\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/#website\"},\"datePublished\":\"2013-05-11T05:38:39+00:00\",\"dateModified\":\"2024-04-12T08:46:17+00:00\",\"description\":\"HDFS is a Java-based file system that provides scalable and reliable data storage, and it was designed to span large clusters of commodity servers. HDFS...\",\"breadcrumb\":{\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/hdfs-4\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.vskills.in\/certification\/tutorial\/hdfs-4\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/hdfs-4\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.vskills.in\/certification\/tutorial\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Hadoop and Mapreduce Tutorial | HDFS\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/#website\",\"url\":\"https:\/\/www.vskills.in\/certification\/tutorial\/\",\"name\":\"Tutorial\",\"description\":\"Vskills - A initiative in elearning and certification\",\"publisher\":{\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.vskills.in\/certification\/tutorial\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/#organization\",\"name\":\"Vskills\",\"url\":\"https:\/\/www.vskills.in\/certification\/tutorial\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.vskills.in\/certification\/tutorial\/wp-content\/uploads\/2017\/07\/vskills-min-logo.jpg\",\"contentUrl\":\"https:\/\/www.vskills.in\/certification\/tutorial\/wp-content\/uploads\/2017\/07\/vskills-min-logo.jpg\",\"width\":73,\"height\":55,\"caption\":\"Vskills\"},\"image\":{\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/vskills.in\/\",\"https:\/\/x.com\/vskills_in\",\"https:\/\/www.linkedin.com\/company-beta\/1371554\/\",\"https:\/\/www.youtube.com\/channel\/UCMWnscxPwRF_PqXo9B7q_Tw\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Hadoop and Mapreduce Tutorial | HDFS","description":"HDFS is a Java-based file system that provides scalable and reliable data storage, and it was designed to span large clusters of commodity servers. HDFS...","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.vskills.in\/certification\/tutorial\/hdfs-4\/","og_locale":"en_US","og_type":"article","og_title":"Hadoop and Mapreduce Tutorial | HDFS","og_description":"HDFS is a Java-based file system that provides scalable and reliable data storage, and it was designed to span large clusters of commodity servers. HDFS...","og_url":"https:\/\/www.vskills.in\/certification\/tutorial\/hdfs-4\/","og_site_name":"Tutorial","article_publisher":"https:\/\/www.facebook.com\/vskills.in\/","article_modified_time":"2024-04-12T08:46:17+00:00","twitter_misc":{"Est. reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.vskills.in\/certification\/tutorial\/hdfs-4\/","url":"https:\/\/www.vskills.in\/certification\/tutorial\/hdfs-4\/","name":"Hadoop and Mapreduce Tutorial | HDFS","isPartOf":{"@id":"https:\/\/www.vskills.in\/certification\/tutorial\/#website"},"datePublished":"2013-05-11T05:38:39+00:00","dateModified":"2024-04-12T08:46:17+00:00","description":"HDFS is a Java-based file system that provides scalable and reliable data storage, and it was designed to span large clusters of commodity servers. HDFS...","breadcrumb":{"@id":"https:\/\/www.vskills.in\/certification\/tutorial\/hdfs-4\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.vskills.in\/certification\/tutorial\/hdfs-4\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.vskills.in\/certification\/tutorial\/hdfs-4\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.vskills.in\/certification\/tutorial\/"},{"@type":"ListItem","position":2,"name":"Hadoop and Mapreduce Tutorial | HDFS"}]},{"@type":"WebSite","@id":"https:\/\/www.vskills.in\/certification\/tutorial\/#website","url":"https:\/\/www.vskills.in\/certification\/tutorial\/","name":"Tutorial","description":"Vskills - A initiative in elearning and certification","publisher":{"@id":"https:\/\/www.vskills.in\/certification\/tutorial\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.vskills.in\/certification\/tutorial\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.vskills.in\/certification\/tutorial\/#organization","name":"Vskills","url":"https:\/\/www.vskills.in\/certification\/tutorial\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.vskills.in\/certification\/tutorial\/#\/schema\/logo\/image\/","url":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-content\/uploads\/2017\/07\/vskills-min-logo.jpg","contentUrl":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-content\/uploads\/2017\/07\/vskills-min-logo.jpg","width":73,"height":55,"caption":"Vskills"},"image":{"@id":"https:\/\/www.vskills.in\/certification\/tutorial\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/vskills.in\/","https:\/\/x.com\/vskills_in","https:\/\/www.linkedin.com\/company-beta\/1371554\/","https:\/\/www.youtube.com\/channel\/UCMWnscxPwRF_PqXo9B7q_Tw"]}]}},"_links":{"self":[{"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/pages\/20952","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/comments?post=20952"}],"version-history":[{"count":6,"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/pages\/20952\/revisions"}],"predecessor-version":[{"id":127266,"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/pages\/20952\/revisions\/127266"}],"wp:attachment":[{"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/media?parent=20952"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/categories?post=20952"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/tags?post=20952"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}