Apache Cassandra Lunch #21: Cassandra Stages / Thread Pools

In Cassandra Lunch #21, we discuss Cassandra and Staged Event-Driven Architecture with an emphasis on Cassandra stages/thread pools. A video of Cassandra Lunch #21 is also embedded in this blog. Join Cassandra Lunch weekly at 12 PM EST every Wednesday here and check out our YouTube channel for past Cassandra Lunches.

Cassandra is based on a Staged Event-Driven Architecture (SEDA): it separates different tasks into stages connected by a messaging service, and each stage has a queue and a thread pool. Some stages, however, skip the messaging service and queue tasks immediately on a different stage when that stage exists on the same node. If the next stage is too busy, a queue can back up, which leads to performance bottlenecks.
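To make the stage idea concrete, here is a minimal Python sketch of two chained stages, each with its own bounded queue and pool of worker threads. It is only an illustration of the pattern, not Cassandra's implementation (Cassandra is written in Java); the `Stage` class, its parameters, and the "read" and "response" stage names are all hypothetical.

```python
import queue
import threading


class Stage:
    """Hypothetical SEDA-style stage: one bounded queue feeding a fixed pool of threads."""

    def __init__(self, name, handler, workers=2, capacity=4, next_stage=None):
        self.name = name
        self.handler = handler
        self.next_stage = next_stage
        self.tasks = queue.Queue(maxsize=capacity)
        self.completed = 0
        self._lock = threading.Lock()
        for _ in range(workers):
            threading.Thread(target=self._run, daemon=True).start()

    def submit(self, item):
        # put() blocks while the queue is full, so a busy stage
        # slows down whatever is feeding it.
        self.tasks.put(item)

    def pending(self):
        return self.tasks.qsize()

    def _run(self):
        while True:
            item = self.tasks.get()
            try:
                result = self.handler(item)
                if self.next_stage is not None:
                    # Hand the result directly to the next stage's queue.
                    self.next_stage.submit(result)
                with self._lock:
                    self.completed += 1
            finally:
                self.tasks.task_done()

    def join(self):
        # Wait until every submitted task has been processed.
        self.tasks.join()


results = []
results_lock = threading.Lock()


def respond(value):
    with results_lock:
        results.append(value)
    return value


response_stage = Stage("response", respond, workers=1)
read_stage = Stage("read", lambda key: key.upper(), workers=2, next_stage=response_stage)

for key in ["a", "b", "c"]:
    read_stage.submit(key)

read_stage.join()
response_stage.join()
print(sorted(results))
```

Because each queue is bounded, a slow `response` stage would eventually block the `read` workers on `submit`, which is the same kind of backup between stages described above.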

Example of a read request

Here is a list with quick descriptions of the different stages and thread pools we also discussed in Cassandra Lunch #21.

  • AntiEntropyStage
    • Processing repair messages and streaming
  • CacheCleanupExecutor
    • Clearing the cache
  • CommitlogArchiver
    • Copying or archiving commitlog files for recovery
  • CompactionExecutor
    • Running compaction
  • CounterMutationStage
    • Processing local counter changes. Will back up if the write rate exceeds the mutation rate. A high pending count will be seen if the consistency level is set to ONE and there is a high counter increment workload.
  • GossipStage
    • Distributing node information via Gossip. Out-of-sync schemas can cause issues. You may have to sync using nodetool resetlocalschema.
  • HintedHandoff
    • Sending missed mutations to other nodes. A backlog here is usually a symptom of a problem elsewhere. Use nodetool disablehandoff and run repair.
  • InternalResponseStage
    • Responding to non-client initiated messages, including bootstrapping and schema checking
  • MemtableFlushWriter
    • Writing memtable contents to disk. May back up if the queue overruns the disk I/O, or because of sorting processes. WARNING: nodetool tpstats no longer reports blocked threads in the MemtableFlushWriter pool. Check the Pending Flushes metric reported by nodetool tablestats.
  • MemtablePostFlush
    • Cleaning up after flushing the memtable (discarding commit logs and secondary indexes as needed)
  • MemtableReclaimMemory
    • Making unused memory available
  • MigrationStage
    • Processing schema changes
  • MiscStage
    • Snapshotting and replicating data after a node removal completes
  • MutationStage
    • Performing local inserts/updates, schema merges, commit log replays or hints in progress. A high number of Pending write requests indicates the node is having a problem handling them. Fix this by adding a node, tuning hardware and configuration, and/or updating data models.
  • Native-Transport-Requests
    • Processing CQL requests to the server
  • PendingRangeCalculator
    • Calculating pending ranges for bootstrapping and departed nodes. Reporting by this tool is not useful.
  • ReadRepairStage
    • Performing read repairs. Usually fast, if there is good connectivity between replicas. If Pending grows too large, attempt to lower the rate for high-read tables by altering the table to use a smaller read_repair_chance value, like 0.11.
  • ReadStage
    • Performing local reads. Also includes deserializing data from row cache. Pending values can cause increased read latency. Generally resolved by adding nodes or tuning the system.
  • RequestResponseStage
    • Handling responses from other nodes
  • ValidationExecutor
    • Running validation compactions for repair (building the Merkle trees that replicas compare)
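The troubleshooting notes scattered through the list can be gathered into a small lookup, sketched below in Python. The `HINTS` table and `triage` function are hypothetical names; the hint text only restates the list above, and the threshold is an arbitrary assumption rather than a recommended value.

```python
# Hints restated from the list above; pools without guidance are omitted.
HINTS = {
    "CounterMutationStage": "write rate exceeds mutation rate; check for consistency level ONE with heavy counter increments",
    "GossipStage": "schemas may be out of sync; consider nodetool resetlocalschema",
    "HintedHandoff": "usually a symptom of a problem elsewhere; nodetool disablehandoff, then run repair",
    "MemtableFlushWriter": "check Pending Flushes in nodetool tablestats",
    "MutationStage": "add a node, tune hardware and configuration, and/or update data models",
    "ReadRepairStage": "lower read_repair_chance on high-read tables",
    "ReadStage": "add nodes or tune the system",
}


def triage(pending_by_pool, threshold=0):
    """Return (pool, hint) pairs for pools whose pending count exceeds threshold."""
    return [
        (pool, HINTS.get(pool, "no specific guidance in the list above"))
        for pool, pending in sorted(pending_by_pool.items())
        if pending > threshold
    ]


# Hypothetical pending counts, for illustration only.
report = triage({"ReadStage": 0, "MutationStage": 12, "MiscStage": 1})
for pool, hint in report:
    print(f"{pool}: {hint}")
```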

We also covered nodetool tpstats, which provides usage statistics of thread pools. The nodetool tpstats command reports on each stage of Cassandra operations by thread pool:

  • The number of active threads.
  • The number of pending requests waiting to be executed by this thread pool.
  • The number of tasks completed by this thread pool.
  • The number of requests that are currently blocked because the thread pool for the next step in the service is full.
  • The total number of all-time blocked requests, which are all requests blocked in this thread pool up to now.
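As a rough sketch of how these five columns might be checked in a script, the Python below parses tpstats-style text and lists pools that have pending or blocked work. The `SAMPLE` text uses made-up numbers, the exact column layout can differ between Cassandra versions, and `parse_tpstats` and `backed_up` are hypothetical helper names, so treat this as a starting point rather than a finished monitor.

```python
from typing import NamedTuple


class PoolStats(NamedTuple):
    name: str
    active: int
    pending: int
    completed: int
    blocked: int
    all_time_blocked: int


def parse_tpstats(text):
    """Parse rows shaped like: pool name followed by five integer columns."""
    pools = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header, blank lines, and anything that is not a pool row.
        if len(fields) != 6 or not all(f.isdigit() for f in fields[1:]):
            continue
        pools.append(PoolStats(fields[0], *map(int, fields[1:])))
    return pools


def backed_up(pools, pending_threshold=0):
    """Names of pools with pending work above the threshold or currently blocked requests."""
    return [p.name for p in pools if p.pending > pending_threshold or p.blocked > 0]


# Illustrative input only: these numbers are invented, not captured from a cluster.
SAMPLE = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         0         0            113         0                 0
MutationStage                     4        27           9821         0                 0
MemtableFlushWriter               1         3             54         0                 2
"""

pools = parse_tpstats(SAMPLE)
print(backed_up(pools))
```

In practice the text would come from running nodetool tpstats on a node, and a single snapshot says little: pending counts that keep growing across repeated runs are the more useful signal.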

In addition to nodetool tpstats, see this blog for more resources on monitoring DataStax, Cassandra, Spark, & Solr performance.

As mentioned above, you can join Cassandra Lunch weekly at 12 PM EST every Wednesday here, and check out our YouTube channel for past Cassandra Lunches!

Cassandra Lunch #21 Recording

Resources

Cassandra.Link

Cassandra.Link is a knowledge base that we created for all things Apache Cassandra. Our goal with Cassandra.Link was to not only fill the gap of Planet Cassandra, but to bring the Cassandra community together. Feel free to reach out if you wish to collaborate with us on this project in any capacity.

We are a technology company that specializes in building business platforms. If you have any questions about the tools discussed in this post or about any of our services, feel free to send us an email!