
Caching and buffer sizing

Tuning Caching in Cassandra

Key Cache
The key cache holds the location of keys in memory on a per-SSTable basis. For column family level read optimizations, turning this value up can have an immediate impact (as soon as the cache warms) when there are large numbers of frequently accessed rows or the size of the columns in the rows makes it impractical to cache the row itself.

Key cache performance can be monitored by running nodetool cfstats and examining the reported ‘Key cache hit rate’. JMX/jconsole can be used to read the same value.

Row Cache
Unlike the key cache, the row cache holds the entire contents of the row in memory. It is best used when you have a small subset of data to keep hot and you frequently need most or all of the columns returned. For these use cases, row cache can have substantial performance benefits.

Row cache performance can be monitored by running nodetool cfstats and examining the reported ‘Row cache hit rate’. JMX/jconsole can be used to read the same value.

Estimating Cache Sizes
Either nodetool cfstats or JMX/jconsole can be used to get the necessary information.

To calculate the approximate row cache size, multiply the reported ‘Row cache size’ (the number of rows in the cache) by the ‘Compacted row mean size’ for each column family, and sum the results over all column families.

To approximate the key cache size, multiply the reported ‘Key cache size’ for each column family by the average size of keys for that column family, and sum the results over all column families.
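The two estimates above are simple sums of products, and can be sketched in Python as follows. This is only an illustration: the dictionary field names and the sample numbers are made up, and in practice the values would be read by hand from nodetool cfstats or JMX. The key cache figure counts key bytes only and ignores any per-entry overhead.

```python
def estimate_row_cache_bytes(column_families):
    """Sum of (rows in cache * compacted row mean size) over all column families."""
    return sum(
        cf["row_cache_size"] * cf["compacted_row_mean_size"]
        for cf in column_families
    )


def estimate_key_cache_bytes(column_families):
    """Sum of (keys in cache * average key size) over all column families."""
    return sum(
        cf["key_cache_size"] * cf["avg_key_size"]
        for cf in column_families
    )


# Hypothetical per-column-family figures (sizes in bytes), not real measurements.
stats = [
    {"name": "users", "row_cache_size": 1000, "compacted_row_mean_size": 2048,
     "key_cache_size": 200000, "avg_key_size": 16},
    {"name": "events", "row_cache_size": 0, "compacted_row_mean_size": 512,
     "key_cache_size": 500000, "avg_key_size": 32},
]

row_bytes = estimate_row_cache_bytes(stats)
key_bytes = estimate_key_cache_bytes(stats)
print("approximate row cache size (bytes):", row_bytes)
print("approximate key cache size (bytes):", key_bytes)
```

Comparing the two totals against the memory available to the node gives a rough idea of how much room is left for the OS buffer cache.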

Maximizing Benefit from Cache Memory
Key cache usually provides the most benefit for the least cost. Keys are typically very small compared to row size, so even caching many of them uses little memory. The key cache potentially eliminates one seek per SSTable that needs to be examined during a read, substantially reducing the number of read seeks. Of course, if keys within a column family are accessed uniformly at random and it is too expensive to keep a large majority of the keys cached, the key cache will not be effective. However, in most applications, the probability of key accesses fits a normal (or Gaussian) distribution, which is conducive to caching. Studying your key access patterns may help you determine whether a key cache is appropriate and what size is optimal.
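The effect of the access pattern on cache effectiveness can be illustrated with a small simulation. The sketch below replays two synthetic access sequences, one uniform and one Gaussian, against a plain LRU cache and compares hit rates. It is not a model of Cassandra's actual cache implementation; the LRU policy, key counts, capacity, and standard deviation are all assumptions chosen for illustration.

```python
import random
from collections import OrderedDict


def lru_hit_rate(accesses, capacity):
    """Replay a key access sequence against an LRU cache and return the hit rate."""
    cache = OrderedDict()
    hits = 0
    total = 0
    for key in accesses:
        total += 1
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used key
    return hits / total if total else 0.0


def uniform_keys(rng, num_keys, count):
    """Every key is equally likely to be read."""
    return [rng.randrange(num_keys) for _ in range(count)]


def gaussian_keys(rng, num_keys, count, stddev):
    """Reads cluster around the middle of the key range."""
    mean = num_keys / 2
    keys = []
    for _ in range(count):
        k = int(rng.gauss(mean, stddev))
        keys.append(min(max(k, 0), num_keys - 1))  # clamp into the valid range
    return keys


rng = random.Random(42)  # fixed seed so repeated runs give the same numbers
NUM_KEYS = 10_000        # hypothetical number of distinct keys
CAPACITY = 1_000         # hypothetical cache capacity (10% of the keys)
COUNT = 50_000           # number of simulated reads

uniform_rate = lru_hit_rate(uniform_keys(rng, NUM_KEYS, COUNT), CAPACITY)
skewed_rate = lru_hit_rate(gaussian_keys(rng, NUM_KEYS, COUNT, stddev=300), CAPACITY)
print("uniform access hit rate:", round(uniform_rate, 3))
print("gaussian access hit rate:", round(skewed_rate, 3))
```

With uniform access, the hit rate stays close to the fraction of keys that fit in the cache, while the clustered pattern is served mostly from the cache. Replaying a sample of real key accesses through the same function is one rough way to explore candidate cache sizes.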

Row cache is typically much more expensive per cached item, as the entire row is cached. For row cache to be effective, a very small set of rows must be very hot; in other words, the probability of row access must follow a normal distribution with a very low standard deviation. Row cache also provides a greater benefit when the entire row is accessed at once. Keep in mind that if rows are frequently evicted from the row cache, the garbage collector will be under more pressure, a problem the OS buffer cache does not suffer from.

The OS buffer cache can perform a role similar to the row cache, but more efficiently. There is no need for garbage collection with the OS buffer cache, and it tends to be very effective at keeping hot blocks in the cache. Additionally, the buffer cache caches blocks on both writes and reads, whereas the row cache and the key cache only add items on reads. For these reasons, it is usually a good idea to keep the row cache small or disabled and leave space for the OS buffer cache.

Tuning Buffer Sizing in Cassandra

The buffer size settings control how much memory is allocated when performing certain operations.

sliced_buffer_size_in_kb
The buffer size (in kilobytes) to use for reading contiguous columns. This should match the size of the columns typically retrieved using query operations involving a slice predicate.

rpc_recv_buff_size_in_bytes
Sets the receiving socket buffer size for remote procedure calls.

-Datastax