Cassandra’s need and advantages

Apache Cassandra

 The need for Apache Cassandra arises from the following features –

  • Elastic scalability – Easily scales to terabytes and petabytes of data (even more if you need it). Can handle billions of columns per row and millions of operations per day. Scales down just as readily.
  • Distributed database design with no single point of failure – Uses a distributed architecture with no single point of failure, unlike traditional master/slave RDBMS and some other NoSQL solutions. The result – continuous availability for business-critical big data applications that can’t afford to ever go down.
  • Blistering linear performance – Enables sub-second response times with linear scalability (double your throughput with two nodes, quadruple it with four, and so on) to deliver the blink-of-an-eye speed your customers have come to expect.
  • Flexible, dynamic schema – Easily accommodates the full range of data formats – structured, semi-structured and unstructured – coursing through today’s big data applications. Also dynamically accommodates changes to schema as your data needs evolve.
  • Multiple datacenter and cloud readiness – Gives you maximum flexibility to distribute data wherever you need it by replicating easily across multiple datacenters, the cloud and even mixed cloud/on-premise environments.
  • Location independence – Enables data to be read from and written to any node in a database no matter where it happens to be on the planet, a common requirement of big data environments when database clusters often span multiple geographies and datacenters.
  • Tunable data consistency – Provides a means of “tuning” the level of consistency required (from very strong to eventual consistency) for individual use cases, set on a per-operation basis.
  • Basic transaction support – Delivers the “D” in ACID (durability) through a commit log that captures all writes and built-in redundancy that protects data in the event of hardware failures. It also provides atomicity and isolation at the row level, while consistency is tunable.
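
The tunable consistency bullet can be made concrete with a little arithmetic. The sketch below is illustrative only and does not call Cassandra: the function names are hypothetical, and only a handful of the level names (ONE, TWO, THREE, QUORUM, ALL) are modelled. It relies on the commonly stated rule that a read is guaranteed to overlap the latest acknowledged write when the read replicas plus the write replicas exceed the replication factor.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that form a quorum: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def replicas_required(level: str, replication_factor: int) -> int:
    """Map a consistency level name to the number of replicas that must respond.

    Only a subset of level names is modelled here (an assumption of this sketch).
    """
    levels = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    if level not in levels:
        raise ValueError(f"unsupported level in this sketch: {level}")
    required = levels[level]
    if required > replication_factor:
        raise ValueError(f"{level} needs {required} replicas, RF is {replication_factor}")
    return required


def is_strongly_consistent(write_level: str, read_level: str, replication_factor: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor


if __name__ == "__main__":
    for write_level, read_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ALL", "ONE")]:
        verdict = is_strongly_consistent(write_level, read_level, 3)
        print(f"RF=3 write={write_level} read={read_level} strong={verdict}")
```

With a replication factor of 3, writing and reading at QUORUM touches 2 + 2 replicas, so the sets must overlap, while ONE/ONE trades that guarantee for lower latency and higher availability.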

Apache Cassandra also offers several advantages for big data processing and querying, including the following –

  • Decentralized – Every node in the cluster has the same role. There is no single point of failure. Data is distributed across the cluster (so each node contains different data), but there is no master as every node can service any request.
  • Supports replication and multi data center replication – Replication strategies are configurable. Cassandra is designed as a distributed system, for deployment of large numbers of nodes across multiple data centers. Key features of Cassandra’s distributed architecture are specifically tailored for multiple-data center deployment, for redundancy, for failover and disaster recovery.
  • Scalability – Read and write throughput both increase linearly as new machines are added, with no downtime or interruption to applications.
  • Fault-tolerant – Data is automatically replicated to multiple nodes for fault-tolerance. Replication across multiple data centers is supported. Failed nodes can be replaced with no downtime.
  • Tunable consistency – Writes and reads offer a tunable level of consistency, all the way from “writes never fail” to “block for all replicas to be readable”, with the quorum level in the middle.
  • MapReduce support – Cassandra has Hadoop integration, with MapReduce support. Apache Pig and Apache Hive are also supported.
  • Query language – CQL (Cassandra Query Language) is a SQL-like alternative to the traditional RPC interface. Language drivers are available for Java (JDBC), Python (DBAPI2) and Node.js (Helenus).
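
The decentralized and replication points above can be illustrated with a toy token ring. This is a simplified sketch, not Cassandra's implementation: the `Ring` class and node names are hypothetical, MD5 from the standard library stands in for the cluster's partitioner, each node gets a single token, and replicas are placed by simply walking clockwise around the ring without regard to racks or data centers.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Hash a key to a 64-bit token (MD5 is a stand-in chosen for this sketch)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class Ring:
    """Toy token ring: every node is a peer, and any node can locate any key."""

    def __init__(self, nodes, replication_factor):
        if not 1 <= replication_factor <= len(nodes):
            raise ValueError("replication factor must be between 1 and the node count")
        self.replication_factor = replication_factor
        # One token per node, sorted so the ring can be searched with bisect.
        self._ring = sorted((token(node), node) for node in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, partition_key: str):
        """Return the nodes that hold a key: the token owner plus its clockwise successors."""
        # First node whose token is >= the key's token; wrap to index 0 past the end.
        start = bisect.bisect_left(self._tokens, token(partition_key)) % len(self._ring)
        return [
            self._ring[(start + i) % len(self._ring)][1]
            for i in range(self.replication_factor)
        ]


if __name__ == "__main__":
    ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
    for key in ("user:1", "user:2", "user:3"):
        print(key, "->", ring.replicas(key))
```

Because placement depends only on the key and the ring, no master is needed to route a request, and losing one node still leaves the other replicas of each key reachable.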
