Runbook #4 – Disk and CPU Resource Management on a Cassandra Cluster

$10.00

Apache Cassandra is a horizontally scalable database whose read/write throughput increases or decreases linearly as nodes are added to or removed from the cluster. Scale the cluster when you expect the workload it supports to increase or decrease. To check whether the cluster is properly sized or needs scaling up or down, monitor metrics such as CPU usage, average read/write throughput, and data volume. Two methods can be used to scale the cluster: vertical scaling, which adds resources to individual nodes, and horizontal scaling, which adds or removes nodes in the cluster. Horizontal scaling is time-consuming but provides long-term stability, while vertical scaling is suited to short-term stability.

Excerpted from the text – “Apache Cassandra is a horizontally scalable database in which read/write throughput can both be increased linearly as new machines are added to the cluster. […] You need to horizontally scale up/scale down a Cassandra cluster when you are expecting an increase/decrease in the workload that the cluster is supporting.

  1. Garbage collections are triggered when a region of the JVM’s memory is full and space must be freed for the application to continue. These cleanups are ‘stop the world’ events, and all Cassandra transactions are suspended while they run.
  2. Batching in Cassandra combines multiple DML statements to achieve atomicity and isolation when targeting a single partition, or only atomicity when targeting multiple partitions. While a well-constructed batch targeting a single partition can reduce client-server traffic and update a table more efficiently with a single row mutation, batch operations that involve multiple nodes are an anti-pattern. Writing large batches that span several partitions can create a bottleneck on the coordinator node and lead to slow queries and unresponsive nodes. Separately, garbage collection in Java can cause long GC pauses, which affect read/write latencies in Cassandra. Diagnosing GC pauses involves checking system.log and gc.log for events longer than a second and using the “nodetool gcstats” command to evaluate garbage collections.”
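As a rough illustration of the log check described in the excerpt, the sketch below scans log lines for garbage collection events longer than one second. It is a minimal sketch, not the runbook's own procedure: the function name `find_long_gc_pauses`, the regular expression, and the sample lines are all assumptions made for illustration, and the exact wording of GCInspector entries in system.log can differ between Cassandra versions.

```python
import re

# Assumed shape of a GCInspector entry in system.log. The wording may differ
# between Cassandra versions, so adjust the pattern to match your own logs.
GC_LINE = re.compile(r"GCInspector.*?-\s+(?P<collector>.+?) GC in (?P<ms>\d+)ms")


def find_long_gc_pauses(lines, threshold_ms=1000):
    """Return (collector, pause_ms) for each GC event longer than threshold_ms."""
    pauses = []
    for line in lines:
        match = GC_LINE.search(line)
        if match and int(match.group("ms")) > threshold_ms:
            pauses.append((match.group("collector"), int(match.group("ms"))))
    return pauses


# Made-up sample lines for illustration only; they are not real cluster output.
sample = [
    "INFO  [Service Thread] 2024-01-01 10:00:00,000 GCInspector.java:284 - G1 Young Generation GC in 250ms.  G1 Eden Space: 100 -> 0;",
    "WARN  [Service Thread] 2024-01-01 10:05:00,000 GCInspector.java:282 - G1 Old Generation GC in 2311ms.  G1 Old Gen: 500 -> 200;",
    "INFO  [main] 2024-01-01 10:06:00,000 StorageService.java:100 - Node is up",
]
long_pauses = find_long_gc_pauses(sample)
```

In practice the same function could be fed the lines of a node's system.log, with “nodetool gcstats” used alongside it to evaluate garbage collections as the excerpt describes.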

Questions the Runbook Answers:

  1. What are the symptoms of needing to horizontally scale up or scale down a Cassandra cluster?

  2. What metrics should be monitored to check whether a Cassandra cluster is properly sized or needs to be scaled up or down?

  3. How can Cassandra clusters be horizontally scaled up or down and what are the differences between the methods?

  4. What is batching in Cassandra?

  5. What are the symptoms of improper usage of large batches in Cassandra?

  6. How can batch statements larger than the batch_size_warn_threshold_in_kb or batch_size_fail_threshold_in_kb be identified?

  7. What is the method to resolve batch warnings/errors in Cassandra?

  8. What are the best practices to tune batch statements in Cassandra?

  9. What is garbage collection in Java?

  10. How can large GC pauses affect Cassandra read/write latencies?

  11. What are the methods to diagnose large GC pauses in Cassandra?
