Site icon Tutorial

Architecture

Certify and Increase Opportunity.
Be
Govt. Certified Apache Cassandra Professional

Architecture

Cassandra Architecture

Cassandra forgoes the widely used Master-Slave setup, in favor of a peer-to-peer cluster. This contributes to Cassandra having no single-point-of-failure, as there is no master-server which, when faced with lots of requests or when breaking, would render all of its slaves useless. Any number of commodity servers can be grouped into a Cassandra cluster.

This architecture is a lot more complex to implement behind the scenes, but we won’t have to deal with that. The nice folks working at the Cassandra core bust their heads against the quirks of distributed systems.

Not having to distinguish between a Master and a Slave node allows you to add any number of machines to any cluster in any datacenter, without having to worry about what type of machine you need at the moment. Every server accepts requests from any client. Every server is equal.

Architecture Details

CAP theorem

The CAP theorem (Brewer) states that you have to pick two of Consistency, Availability, Partition tolerance: You can’t have the three at the same time and get an acceptable latency.

Cassandra values Availability and Partitioning tolerance (AP). Tradeoffs between consistency and latency are tunable in Cassandra. You can get strong consistency with Cassandra (with an increased latency). But, you can’t get row locking: that is a definite win for HBase.

History and approaches

Two famous papers

Two approaches

Cassandra 10,000 ft summary

Cassandra highlights

p2p distribution model — which drives the consistency model — means there is no single point of failure.

Keys distribution and Partition

Dynamo architecture & Lookup

In a ring of nodes A, B, C, D, E, F and G Nodes B, C and D store keys in the range (a,b) including key k

You can decide where the key should go in Cassandra using the InitialToken parameter for your Partitioner.

Architecture details

Architecture layers

Core Layer Middle Layer Top Layer
Messaging service
Gossip Failure detection
Cluster state
Partitioner
Replication
Commit log
Memtable
SSTable
Indexes
Compaction
Tombstones
Hinted handoff
Read repair
Bootstrap
Monitoring
Admin tools

Writes

Any node Partitioner Commitlog, memtable SSTable Compaction Wait for W responses

Write model:

There are two write modes:

If the node is down, then write to another node with a hint saying where it should be written to. Harvester every 15 min goes through and find hints and moves the data to the appropriate node

Write path

At write time,

Write properties

Remove

Deletion marker (tombstone) necessary to suppress data in older SSTables, until compaction Read repair complicates things a little Eventually consistent complicates things more Solution: configurable delay before tombstone GC, after which tombstones are not repaired

Read

Read path

Cassandra read properties

Consistency

Consistency describes how and whether a system is left in a consistent state after an operation. In distributed data systems like Cassandra, this usually means that once a writer has written, all readers will see that write.

On the contrary to the strong consistency used in most relational databases (ACID for Atomicity Consistency Isolation Durability) Cassandra is at the other end of the spectrum (BASE for Basically Available Soft-state Eventual consistency). Cassandra weak consistency comes in the form of eventual consistency which means the database eventually reaches a consistent state. As the data is replicated, the latest version of something is sitting on some node in the cluster, but older versions are still out there on other nodes, but eventually all nodes will see the latest version.

More specifically: R=read replica count W=write replica count N=replication factor Q=QUORUM (Q = N / 2 + 1)

Cassandra provides consistency when R + W > N (read replica count + write replica count > replication factor).

You get consistency if R + W > N, where R is the number of records to read, W is the number of records to write, and N is the replication factor. A ConsistencyLevel of ONE means R or W is 1. A ConsistencyLevel of QUORUM means R or W is ceiling((N+1)/2). A ConsistencyLevel of ALL means R or W is N. So if you want to write with a ConsistencyLevel of ONE and then get the same data when you read, you need to read with ConsistencyLevel ALL.

Exit mobile version