{"id":20277,"date":"2013-05-10T12:52:30","date_gmt":"2013-05-10T07:22:30","guid":{"rendered":"http:\/\/vskills.in\/certification\/tutorial\/?p=20277"},"modified":"2024-04-12T14:16:38","modified_gmt":"2024-04-12T08:46:38","slug":"architecture","status":"publish","type":"page","link":"https:\/\/www.vskills.in\/certification\/tutorial\/architecture\/","title":{"rendered":"Architecture"},"content":{"rendered":"<h3><strong>Architecture<\/strong><\/h3>\n<p><strong>Cassandra Architecture<\/strong><\/p>\n<p>Cassandra forgoes the widely used Master-Slave setup, in favor of a peer-to-peer cluster. This contributes to Cassandra having no single-point-of-failure, as there is no master-server which, when faced with lots of requests or when breaking, would render all of its slaves useless. Any number of commodity servers can be grouped into a Cassandra cluster.<\/p>\n<p>This architecture is a lot more complex to implement behind the scenes, but we won\u2019t have to deal with that. The nice folks working at the Cassandra core bust their heads against the quirks of distributed systems.<\/p>\n<p>Not having to distinguish between a Master and a Slave node allows you to add any number of machines to any cluster in any datacenter, without having to worry about what type of machine you need at the moment. Every server accepts requests from any client. 
Every server is equal.<\/p>\n<p><span style=\"text-decoration: underline;\"><strong>Architecture Details<\/strong><\/span><\/p>\n<h3 id=\"CAP_theorem\">CAP theorem<\/h3>\n<p>The <strong>CAP<\/strong> theorem (Brewer) states that a distributed system can guarantee at most two of <strong>Consistency<\/strong>, <strong>Availability<\/strong>, and <strong>Partition tolerance<\/strong> at the same time.<\/p>\n<p>Cassandra values Availability and Partition tolerance (<strong>AP<\/strong>). Tradeoffs between consistency and latency are tunable in Cassandra. You can get strong consistency with Cassandra (at the cost of increased latency). However, you can&#8217;t get row locking: that is a definite win for HBase.<\/p>\n<h3 id=\"History_and_approaches\">History and approaches<\/h3>\n<p>Two famous papers<\/p>\n<ul>\n<li>Bigtable: A Distributed Storage System for Structured Data, 2006<\/li>\n<li>Dynamo: Amazon&#8217;s Highly Available Key-value Store, 2007<\/li>\n<\/ul>\n<p>Two approaches<\/p>\n<ul>\n<li>Bigtable: &#8220;How can we build a distributed db on top of GFS?&#8221;<\/li>\n<li>Dynamo: &#8220;How can we build a distributed hash table appropriate for the data center?&#8221;<\/li>\n<\/ul>\n<h3 id=\"Cassandra_10.2C000_ft_summary\">Cassandra 10,000 ft summary<\/h3>\n<ul>\n<li>Dynamo partitioning and replication<\/li>\n<li>Log-structured ColumnFamily data model similar to Bigtable&#8217;s<\/li>\n<\/ul>\n<h3 id=\"Cassandra_highlights\">Cassandra highlights<\/h3>\n<ul>\n<li>High availability<\/li>\n<li>Incremental scalability<\/li>\n<li>Eventually consistent<\/li>\n<li>Tunable tradeoffs between consistency and latency<\/li>\n<li>Minimal administration<\/li>\n<li>No SPOF (Single Point of Failure)<\/li>\n<\/ul>\n<p>The peer-to-peer distribution model, which drives the consistency model, means there is no single point of failure.<\/p>\n<h2 id=\"Keys_distribution_and_Partition\">Key distribution and partitioning<\/h2>\n<p>Dynamo architecture &amp; 
Lookup<\/p>\n<p>In a ring of nodes A, B, C, D, E, F and G, nodes B, C and D store keys in the range (<em>a<\/em>, <em>b<\/em>), including key <em>k<\/em>.<\/p>\n<p>You can decide where the key should go in Cassandra using the <tt>InitialToken<\/tt> parameter for your <tt>Partitioner<\/tt>.<\/p>\n<p>Architecture details<\/p>\n<ul>\n<li>O(1) node lookup<\/li>\n<li>Explicit replication<\/li>\n<li>Eventually consistent<\/li>\n<\/ul>\n<h3 id=\"Architecture_layers\">Architecture layers<\/h3>\n<div>\n<table>\n<tbody>\n<tr>\n<td>Core Layer<\/td>\n<td>Middle Layer<\/td>\n<td>Top Layer<\/td>\n<\/tr>\n<tr>\n<td>Messaging service<br \/>\nGossip Failure detection<br \/>\nCluster state<br \/>\nPartitioner<br \/>\nReplication<\/td>\n<td>Commit log<br \/>\nMemtable<br \/>\nSSTable<br \/>\nIndexes<br \/>\nCompaction<\/td>\n<td>Tombstones<br \/>\nHinted handoff<br \/>\nRead repair<br \/>\nBootstrap<br \/>\nMonitoring<br \/>\nAdmin tools<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<h2 id=\"Writes\">Writes<\/h2>\n<ul>\n<li>Any node<\/li>\n<li>Partitioner<\/li>\n<li>Commit log, memtable<\/li>\n<li>SSTable<\/li>\n<li>Compaction<\/li>\n<li>Wait for W responses<\/li>\n<\/ul>\n<p>Write model:<\/p>\n<p>There are two write modes:<\/p>\n<ul>\n<li><em>Quorum write<\/em>: blocks until a quorum is reached<\/li>\n<li><em>Async write<\/em>: sends the request to any node. That node pushes the data to the appropriate nodes but returns to the client immediately<\/li>\n<\/ul>\n<p>If the target node is down, the write goes to another node with a hint saying where it should have been written. A harvester goes through the hints every 15 minutes and moves the data to the appropriate node.<\/p>\n<h3 id=\"Write_path\">Write path<\/h3>\n<p>At write time,<\/p>\n<ul>\n<li>You first write to a <strong>disk commit log<\/strong> (sequential)<\/li>\n<li>After the write to the log, it is sent to the appropriate nodes<\/li>\n<li>Each node receiving the write first records it in a local log, then updates the appropriate <strong>memtables<\/strong> (one for each column family). 
A Memtable is Cassandra&#8217;s in-memory representation of key\/value pairs before the data gets flushed to disk as an SSTable.<\/li>\n<li><strong>Memtables<\/strong> are flushed to disk when:\n<ul>\n<li>Out of space<\/li>\n<li>Too many keys (128 is default)<\/li>\n<li>Time duration (client provided \u2013 no cluster clock)<\/li>\n<\/ul>\n<\/li>\n<li>When a memtable is written out, two files are produced:\n<ul>\n<li>Data File (<strong>SSTable<\/strong>). An SSTable (terminology borrowed from Google) stands for Sorted Strings Table and is a file of key\/value string pairs, sorted by keys.<\/li>\n<li>Index File (<strong>SSTable Index<\/strong>). (Similar to Hadoop MapFile \/ TFile)\n<ul>\n<li>(Key, offset) pairs (pointing into the data file)<\/li>\n<li><strong>Bloom filter<\/strong> (all keys in data file). A Bloom filter is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. False positives are possible, but false negatives are not. Cassandra uses Bloom filters to save IO when performing a key lookup: each SSTable has a Bloom filter associated with it that Cassandra checks before doing any disk seeks, making queries for keys that don&#8217;t exist almost free. Bloom filters are surprisingly simple: divide a memory area into buckets (one bit per bucket for a standard Bloom filter; more, typically four, for a counting Bloom filter). To insert a key, generate several hashes per key, and mark the bucket for each hash. To check if a key is present, check each of its buckets; if any bucket is empty, the key was never inserted in the filter. If all buckets are non-empty, though, the key was only probably inserted, since other keys&#8217; hashes could have covered the same buckets.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>When a commit log has had all its column families pushed to disk, it is deleted<\/li>\n<li><strong>Compaction<\/strong>: Data files accumulate over time. 
Periodically, data files are merge-sorted into a new file (and a new index is created)\n<ul>\n<li>Merge keys<\/li>\n<li>Combine columns<\/li>\n<li>Discard tombstones<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h3 id=\"Write_properties\">Write properties<\/h3>\n<ul>\n<li>No reads<\/li>\n<li>No seeks<\/li>\n<li><em>Fast<\/em><\/li>\n<li>Atomic within ColumnFamily<\/li>\n<li>Always writable<\/li>\n<\/ul>\n<h2 id=\"Remove\">Remove<\/h2>\n<ul>\n<li>A deletion marker (tombstone) is necessary to suppress data in older SSTables until compaction<\/li>\n<li>Read repair complicates things a little<\/li>\n<li>Eventual consistency complicates things more<\/li>\n<li>Solution: a configurable delay before tombstone GC, after which tombstones are not repaired<\/li>\n<\/ul>\n<h2 id=\"Read\">Read<\/h2>\n<h3 id=\"Read_path\">Read path<\/h3>\n<ul>\n<li>Any node<\/li>\n<li>Partitioner<\/li>\n<li>Wait for R responses<\/li>\n<li>Wait for N - R responses in the background and perform read repair<\/li>\n<\/ul>\n<h3 id=\"Cassandra_read_properties\">Cassandra read properties<\/h3>\n<ul>\n<li>Read multiple SSTables<\/li>\n<li>Slower than writes (but still fast)<\/li>\n<li>Seeks can be mitigated with more RAM<\/li>\n<li>Scales to billions of rows<\/li>\n<\/ul>\n<h2 id=\"Consistency\">Consistency<\/h2>\n<p>Consistency describes how and whether a system is left in a consistent state after an operation. In distributed data systems like Cassandra, this usually means that once a writer has written, all readers will see that write.<\/p>\n<p>In contrast to the strong consistency used in most relational databases (<strong>ACID<\/strong> for <em>Atomicity Consistency Isolation Durability<\/em>), Cassandra is at the other end of the spectrum (<strong>BASE<\/strong> for <em>Basically Available Soft-state Eventual consistency<\/em>). Cassandra&#8217;s weak consistency comes in the form of eventual consistency, which means the database eventually reaches a consistent state. 
As the data is replicated, the latest version of an item sits on some node in the cluster while older versions remain on other nodes, but eventually all nodes will see the latest version.<\/p>\n<p>More specifically: R = read replica count, W = write replica count, N = replication factor, Q = <strong>QUORUM<\/strong> (Q = N \/ 2 + 1, using integer division).<\/p>\n<ul>\n<li>If W + R &gt; N, you will have consistency, because every read set then overlaps every write set in at least one replica. For example:<\/li>\n<li>W=1, R=N<\/li>\n<li>W=N, R=1<\/li>\n<li>W=Q, R=Q where Q = N \/ 2 + 1<\/li>\n<\/ul>\n<p>Cassandra provides consistency when R + W &gt; N (read replica count + write replica count &gt; replication factor).<\/p>\n<p>Here R is the number of replicas read, W is the number of replicas written, and N is the replication factor. A ConsistencyLevel of ONE means R or W is 1. A ConsistencyLevel of QUORUM means R or W is ceiling((N+1)\/2). A ConsistencyLevel of ALL means R or W is N. So if you want to write with a ConsistencyLevel of ONE and then get the same data when you read, you need to read with ConsistencyLevel ALL.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Certify and Increase Opportunity. Be Govt. Certified Apache Cassandra Professional Architecture Cassandra Architecture Cassandra forgoes the widely used Master-Slave setup, in favor of a peer-to-peer cluster. 
This contributes to Cassandra having no single-point-of-failure, as there is no master-server which, when faced with lots of requests or when breaking, would render all of its slaves useless&#8230;.<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"categories":[63],"tags":[],"class_list":["post-20277","page","type-page","status-publish","hentry","category-apache-cassandra"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v24.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Architecture - Tutorial<\/title>\n<meta name=\"description\" content=\"Architecture Vskills Government Certification in Apache Cassandra is very popular in India amongst developers working in the IT Bid data industry.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.vskills.in\/certification\/tutorial\/architecture\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Architecture - Tutorial\" \/>\n<meta property=\"og:description\" content=\"Architecture Vskills Government Certification in Apache Cassandra is very popular in India amongst developers working in the IT Bid data industry.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.vskills.in\/certification\/tutorial\/architecture\/\" \/>\n<meta property=\"og:site_name\" content=\"Tutorial\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/vskills.in\/\" \/>\n<meta property=\"article:modified_time\" content=\"2024-04-12T08:46:38+00:00\" \/>\n<meta name=\"twitter:label1\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"6 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/architecture\/\",\"url\":\"https:\/\/www.vskills.in\/certification\/tutorial\/architecture\/\",\"name\":\"Architecture - Tutorial\",\"isPartOf\":{\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/#website\"},\"datePublished\":\"2013-05-10T07:22:30+00:00\",\"dateModified\":\"2024-04-12T08:46:38+00:00\",\"description\":\"Architecture Vskills Government Certification in Apache Cassandra is very popular in India amongst developers working in the IT Bid data industry.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/architecture\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.vskills.in\/certification\/tutorial\/architecture\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/architecture\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.vskills.in\/certification\/tutorial\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Architecture\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/#website\",\"url\":\"https:\/\/www.vskills.in\/certification\/tutorial\/\",\"name\":\"Tutorial\",\"description\":\"Vskills - A initiative in elearning and 
certification\",\"publisher\":{\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.vskills.in\/certification\/tutorial\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/#organization\",\"name\":\"Vskills\",\"url\":\"https:\/\/www.vskills.in\/certification\/tutorial\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.vskills.in\/certification\/tutorial\/wp-content\/uploads\/2017\/07\/vskills-min-logo.jpg\",\"contentUrl\":\"https:\/\/www.vskills.in\/certification\/tutorial\/wp-content\/uploads\/2017\/07\/vskills-min-logo.jpg\",\"width\":73,\"height\":55,\"caption\":\"Vskills\"},\"image\":{\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/vskills.in\/\",\"https:\/\/x.com\/vskills_in\",\"https:\/\/www.linkedin.com\/company-beta\/1371554\/\",\"https:\/\/www.youtube.com\/channel\/UCMWnscxPwRF_PqXo9B7q_Tw\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Architecture - Tutorial","description":"Architecture Vskills Government Certification in Apache Cassandra is very popular in India amongst developers working in the IT Bid data industry.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.vskills.in\/certification\/tutorial\/architecture\/","og_locale":"en_US","og_type":"article","og_title":"Architecture - Tutorial","og_description":"Architecture Vskills Government Certification in Apache Cassandra is very popular in India amongst developers working in the IT Bid data industry.","og_url":"https:\/\/www.vskills.in\/certification\/tutorial\/architecture\/","og_site_name":"Tutorial","article_publisher":"https:\/\/www.facebook.com\/vskills.in\/","article_modified_time":"2024-04-12T08:46:38+00:00","twitter_misc":{"Est. reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.vskills.in\/certification\/tutorial\/architecture\/","url":"https:\/\/www.vskills.in\/certification\/tutorial\/architecture\/","name":"Architecture - Tutorial","isPartOf":{"@id":"https:\/\/www.vskills.in\/certification\/tutorial\/#website"},"datePublished":"2013-05-10T07:22:30+00:00","dateModified":"2024-04-12T08:46:38+00:00","description":"Architecture Vskills Government Certification in Apache Cassandra is very popular in India amongst developers working in the IT Bid data 
industry.","breadcrumb":{"@id":"https:\/\/www.vskills.in\/certification\/tutorial\/architecture\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.vskills.in\/certification\/tutorial\/architecture\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.vskills.in\/certification\/tutorial\/architecture\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.vskills.in\/certification\/tutorial\/"},{"@type":"ListItem","position":2,"name":"Architecture"}]},{"@type":"WebSite","@id":"https:\/\/www.vskills.in\/certification\/tutorial\/#website","url":"https:\/\/www.vskills.in\/certification\/tutorial\/","name":"Tutorial","description":"Vskills - A initiative in elearning and certification","publisher":{"@id":"https:\/\/www.vskills.in\/certification\/tutorial\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.vskills.in\/certification\/tutorial\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.vskills.in\/certification\/tutorial\/#organization","name":"Vskills","url":"https:\/\/www.vskills.in\/certification\/tutorial\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.vskills.in\/certification\/tutorial\/#\/schema\/logo\/image\/","url":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-content\/uploads\/2017\/07\/vskills-min-logo.jpg","contentUrl":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-content\/uploads\/2017\/07\/vskills-min-logo.jpg","width":73,"height":55,"caption":"Vskills"},"image":{"@id":"https:\/\/www.vskills.in\/certification\/tutorial\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/vskills.in\/","https:\/\/x.com\/vskills_in","https:\/\/www.linkedin.com\/company-beta\/1371554\/","https:\/\/www.y
outube.com\/channel\/UCMWnscxPwRF_PqXo9B7q_Tw"]}]}},"_links":{"self":[{"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/pages\/20277","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/comments?post=20277"}],"version-history":[{"count":8,"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/pages\/20277\/revisions"}],"predecessor-version":[{"id":83090,"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/pages\/20277\/revisions\/83090"}],"wp:attachment":[{"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/media?parent=20277"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/categories?post=20277"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/tags?post=20277"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}