In Cassandra Lunch #20, we discuss Cassandra read and write paths which is how Cassandra stores and retrieves data at high speeds. We won’t cover how Cassandra replicates data because that is its own subject, but we will take a look at these four sub-topics: Write Path, Update / Delete, Maintenance Path, and Read Path.
Write Path
How is Data Written?
Cassandra processes data at several stages on the write path, starting with the immediate logging of a write and ending with a write of data to disk:
- Logging data in the commit log
- Writing data to the memtable
- Flushing data from the memtable
- Storing data on disk in SSTables
More info can be found here.
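As a rough illustration of these four stages, here is a minimal, self-contained Python sketch. The class and attribute names are invented for illustration; this is a toy model, not the real Cassandra implementation:

```python
import time

class WritePath:
    """Toy model of Cassandra's write path (illustrative only)."""

    def __init__(self):
        self.commit_log = []   # append-only log for crash recovery
        self.memtable = {}     # in-memory structure, keyed by partition key
        self.sstables = []     # immutable "on-disk" tables (simulated)

    def write(self, key, value):
        ts = time.time()
        # 1. Log the mutation in the commit log first, for durability.
        self.commit_log.append((key, value, ts))
        # 2. Apply it to the memtable.
        self.memtable[key] = (value, ts)

    def flush(self):
        # 3. Flush the memtable into a new immutable SSTable, and
        # 4. clear the commit-log data covered by the flush.
        if self.memtable:
            self.sstables.append(dict(self.memtable))
            self.memtable = {}
            self.commit_log = []

wp = WritePath()
wp.write("user:1", "alice")
wp.flush()                   # "alice" now lives in an SSTable
wp.write("user:1", "bob")    # newer version lives only in the memtable
```

Note how a write never touches the existing SSTables: flushed data is immutable, and newer versions simply accumulate in the memtable until the next flush.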
Coordinator
When a request is sent to any Cassandra node, that node acts as a proxy between the application and the nodes involved in the request flow. This proxy node is called the coordinator. The coordinator is responsible for managing the entire request path and for responding to the client.
Sometimes, when the coordinator forwards a write request to the replica nodes, one or more of them may be unavailable at that moment. In this case, the coordinator plays an important role in implementing a mechanism called Hinted Handoff.
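The idea behind hinted handoff can be sketched as follows. `Coordinator`, `replay_hints`, and the replica stores are hypothetical names chosen for illustration, not the real implementation:

```python
class Coordinator:
    """Toy hinted-handoff sketch (illustrative names, not real Cassandra)."""

    def __init__(self, replicas):
        self.replicas = replicas   # name -> dict store, or None if down
        self.hints = []            # (target, key, value) hints stored locally

    def write(self, key, value):
        for name, store in self.replicas.items():
            if store is None:
                # Replica unavailable: keep a hint instead of failing.
                self.hints.append((name, key, value))
            else:
                store[key] = value

    def replay_hints(self):
        # When a replica recovers, replay the hints destined for it.
        remaining = []
        for name, key, value in self.hints:
            store = self.replicas[name]
            if store is None:
                remaining.append((name, key, value))
            else:
                store[key] = value
        self.hints = remaining

replicas = {"r1": {}, "r2": None}   # r2 is unavailable during the write
c = Coordinator(replicas)
c.write("k", "v")                   # r1 gets the write, r2 gets a hint
replicas["r2"] = {}                 # r2 comes back online
c.replay_hints()                    # the missed write is delivered
```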
Parallel Sync
The coordinator forwards the write to replicas 1-N in parallel and waits for them to acknowledge the synchronous steps:
- Logging data in the commit log
- Writing data to the memtable
Parallel Async
The remaining steps run on replicas 1-N asynchronously, off the request path:
- Flushing data from the memtable
- Storing data on disk in SSTables
- Cleaning all data in the commit log up to that point
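Assuming a toy model, the sync/async split above might look like this: the coordinator blocks only on the commit-log and memtable steps on each replica, while flushing to SSTables happens later, outside the request path. All names here are illustrative:

```python
import threading, queue

class Replica:
    """Toy replica: sync steps acknowledge, flush runs off the request path."""

    def __init__(self):
        self.commit_log, self.memtable, self.sstables = [], {}, []

    def apply(self, key, value, acks):
        self.commit_log.append((key, value))  # sync: durability first
        self.memtable[key] = value            # sync: in-memory write
        acks.put(self)                        # acknowledge to the coordinator

    def flush(self):
        # async: not part of the client's request/response cycle
        self.sstables.append(dict(self.memtable))
        self.memtable = {}
        self.commit_log = []

replicas = [Replica() for _ in range(3)]
acks = queue.Queue()
for r in replicas:                            # fan out in parallel
    threading.Thread(target=r.apply, args=("k", "v", acks)).start()
for _ in replicas:                            # coordinator waits only for
    acks.get()                                # the synchronous steps
for r in replicas:                            # later: asynchronous flush
    r.flush()
```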
Update / Delete
How is Data Updated?
During a write, Cassandra adds each new row to the database without checking whether a duplicate record exists. This policy makes it possible for many versions of the same row to exist in the database. For more details about writes, see How is data written?
Periodically, the rows stored in memory are streamed to disk into structures called SSTables. At certain intervals, Cassandra compacts smaller SSTables into larger SSTables. If Cassandra encounters two or more versions of the same row during this process, Cassandra only writes the most recent version to the new SSTable. After compaction, Cassandra drops the original SSTables, deleting the outdated rows.
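A compaction of this kind can be sketched as a timestamp-based merge. This is a toy model; real SSTables are sorted on-disk structures, not Python dicts:

```python
def compact(sstables):
    """Merge SSTables, keeping only the newest version of each row
    (the one with the highest timestamp). Illustrative sketch only."""
    merged = {}
    for table in sstables:                    # table: key -> (value, timestamp)
        for key, (value, ts) in table.items():
            if key not in merged or ts > merged[key][1]:
                merged[key] = (value, ts)
    return merged

older = {"row1": ("v1", 100), "row2": ("x", 120)}
newer = {"row1": ("v2", 200)}
result = compact([older, newer])
# row1 keeps only its most recent version; row2 survives unchanged
```

After the merge, the input SSTables would be dropped, which is what finally removes the outdated versions from disk.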
More info can be found here.
How is Data Deleted?
Cassandra’s processes for deleting data are designed to improve performance and to work with Cassandra’s built-in properties for data distribution and fault-tolerance.
Cassandra treats a delete as an insert or upsert. The data being added to the partition in the DELETE command is a deletion marker called a tombstone. The tombstones go through Cassandra’s write path and are written to SSTables on one or more nodes. The key feature of a tombstone is that it has a built-in expiration date/time. At the end of its expiration period (for details see below), the tombstone is deleted as part of Cassandra’s normal compaction process.
You can also mark a Cassandra record (row or column) with a time-to-live value. After this amount of time has ended, Cassandra marks the record with a tombstone and handles it like other tombstoned records.
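A tombstone's lifecycle can be sketched like this. `TOMBSTONE`, `delete`, and `compact` are illustrative names; the default expiry here mirrors the common `gc_grace_seconds` of ten days:

```python
TOMBSTONE = object()   # sentinel deletion marker

def delete(key, table, now, gc_grace_seconds=864000):
    # A delete is just an upsert of a tombstone with an expiry time.
    table[key] = (TOMBSTONE, now + gc_grace_seconds)

def compact(table, now):
    # Compaction drops tombstones that are past their expiry for good;
    # unexpired tombstones are kept so the delete can still propagate.
    return {k: (v, exp) for k, (v, exp) in table.items()
            if not (v is TOMBSTONE and exp <= now)}

table = {"a": ("val", None), "b": ("keep", None)}
delete("a", table, now=0, gc_grace_seconds=100)
kept = compact(table, now=50)    # tombstone not yet expired: it survives
gone = compact(table, now=150)   # tombstone expired: removed for good
```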
More info can be found here.
How are Indexes Stored and Updated?
Secondary indexes can be built for a column in a table. These indexes are stored locally on each node in a hidden table and built in a background process. If a secondary index is used in a query that is not restricted to a particular partition key, the query will have prohibitive read latency because all nodes will be queried. Such a query is only allowed if the ALLOW FILTERING query option is used. This option is not appropriate for production environments. If a query includes both a partition key condition and a secondary index column condition, the query will succeed because it can be directed to a single node.
As with relational databases, keeping indexes up to date uses processing time and resources, so unnecessary indexes should be avoided. When a column is updated, the index is updated as well. If the old column value still exists in the memtable, which typically occurs when updating a small set of rows repeatedly, Cassandra removes the corresponding obsolete index entry; otherwise, the old entry remains to be purged by compaction. If a read sees a stale index entry before compaction purges it, the reader thread invalidates it.
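The hidden-table idea and the obsolete-entry cleanup can be sketched as follows. This is a hypothetical structure, not Cassandra's actual index format:

```python
class IndexedTable:
    """Toy local secondary index: a hidden mapping from an indexed
    column value back to the row keys that hold it (illustrative only)."""

    def __init__(self):
        self.rows = {}    # row key -> indexed column value ("memtable")
        self.index = {}   # column value -> set of row keys (hidden table)

    def upsert(self, key, value):
        old = self.rows.get(key)
        if old is not None and old in self.index:
            # Old value still in the memtable: drop the obsolete
            # index entry immediately rather than waiting for compaction.
            self.index[old].discard(key)
        self.rows[key] = value
        self.index.setdefault(value, set()).add(key)

    def lookup(self, value):
        return sorted(self.index.get(value, set()))

t = IndexedTable()
t.upsert("user:1", "paris")
t.upsert("user:1", "tokyo")   # the stale entry for "paris" is removed
```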
Maintenance Path
How is Data Maintained?
The Cassandra write process stores data in files called SSTables. SSTables are immutable. Instead of overwriting existing rows with inserts or updates, Cassandra writes new timestamped versions of the inserted or updated data in new SSTables. Cassandra does not perform deletes by removing the deleted data: instead, Cassandra marks it with tombstones.
Over time, Cassandra may write many versions of a row in different SSTables. Each version may have a unique set of columns stored with a different timestamp. As SSTables accumulate, the distribution of data can require accessing more and more SSTables to retrieve a complete row.
To keep the database healthy, Cassandra periodically merges SSTables and discards old data. This process is called compaction.
More info can be found here.
Read Path
How is Data Read?
To satisfy a read, Cassandra must combine results from the active memtable and potentially multiple SSTables. Cassandra processes data at several stages on the read path to discover where the data is stored, starting with the data in the memtable and finishing with SSTables:
- Check the memtable
- Check the row cache, if enabled
- Check the Bloom filter
- Check the partition key cache, if enabled
- Go directly to the compression offset map if the partition key is found in the partition key cache, or check the partition summary if not
- If the partition summary is checked, access the partition index
- Locate the data on disk using the compression offset map
- Fetch the data from the SSTable on disk
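A toy version of the Bloom-filter check and the memtable/SSTable merge might look like this. It is illustrative only; the real read path also involves the caches, partition index, and compression offset map listed above:

```python
import hashlib

class SSTable:
    """Toy SSTable with a tiny Bloom filter (illustrative only)."""

    def __init__(self, data, bits=64):
        self.data = data                      # key -> (value, timestamp)
        self.bits = bits
        self.bloom = 0
        for key in data:
            self.bloom |= self._mask(key)

    def _mask(self, key):
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return (1 << (h % self.bits)) | (1 << ((h >> 8) % self.bits))

    def might_contain(self, key):
        # Bloom filters never give false negatives, only false positives.
        m = self._mask(key)
        return (self.bloom & m) == m

def read(key, memtable, sstables):
    candidates = []
    if key in memtable:
        candidates.append(memtable[key])
    for t in sstables:
        # Skip SSTables the Bloom filter rules out.
        if t.might_contain(key) and key in t.data:
            candidates.append(t.data[key])
    # The freshest version (highest timestamp) wins.
    return max(candidates, key=lambda vt: vt[1])[0] if candidates else None

memtable = {"k": ("new", 200)}
tables = [SSTable({"k": ("old", 100)}), SSTable({"other": ("x", 50)})]
```

The timestamp merge is why many row versions across SSTables slow reads down: every candidate version must be fetched before the newest one can be returned, which is exactly what compaction mitigates.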
How Do Write Patterns Affect Reads?
It is important to consider how the write operations will affect the read operations in the cluster. The type of compaction strategy Cassandra performs on your data is configurable and can significantly affect read performance. Using the SizeTieredCompactionStrategy or DateTieredCompactionStrategy tends to cause data fragmentation when rows are frequently updated. The LeveledCompactionStrategy (LCS) was designed to prevent fragmentation under this condition.
Additional Resources
- How Cassandra reads and writes data | Apache Cassandra 3.0
  https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/dml/dmlIntro.html
- Cassandra Architecture FTW
  https://www.slideshare.net/JeffreyCarpenter/cassandra-architecture-ftw
- Introduction to Cassandra
  https://www.slideshare.net/gokhanatil/introduction-to-cassandra-88223524
- Introduction to Apache Cassandra
  https://www.slideshare.net/knoldus/introduction-to-apache-cassandra-66854683
ICYMI
- Cassandra Lunch #15 – Restore & Backup
- Cassandra Lunch #16 – Cassandra Anti-entropy, Repair & Synchronization
- Cassandra Lunch #17 – Tombstones
- Cassandra Lunch #18 – Connecting Cassandra to Kafka
- Cassandra Lunch #19 – Relational DBs and Cassandra
Cassandra.Link
Cassandra.Link is a knowledge base that we created for all things Apache Cassandra. Our goal with Cassandra.Link was not only to fill the gap left by Planet Cassandra, but to bring the Cassandra community together. Feel free to reach out if you wish to collaborate with us on this project in any capacity.
We are a technology company that specializes in building business platforms. If you have any questions about the tools discussed in this post or about any of our services, feel free to send us an email!