Cassandra SSTables Overview

This post gives an overview of SSTables in Cassandra: what they are used for and where they fit in the read and write processes. We will also look at some utilities that let us create Cassandra SSTables and extract data from them, so that we can see what is actually stored inside.

SSTable Overview

First, we need to define what SSTables are and where they fit in the normal functioning of Cassandra. They were previously mentioned in Apache Cassandra Lunch #20 on Cassandra Read and Write Paths. Essentially, SSTables are Cassandra’s on-disk storage format for data. Unlike Cassandra’s other forms of internal data storage, SSTables are immutable: once written, they are only ever read and eventually deleted, never modified. SSTable stands for Sorted String Table. Each SSTable holds serialized rows, ordered by the token that Cassandra derives by hashing the partition key. Several supporting files help the read process determine whether a particular SSTable contains particular data and retrieve that data quickly from the file.

Cassandra SSTable Components

  • Data.db: The actual data, i.e. the contents of rows.
  • Index.db: An index from partition keys to positions in the Data.db file. For wide partitions, this may also include an index to rows within a partition.
  • Summary.db: A sampling of (by default) every 128th entry in the Index.db file.
  • Filter.db: A Bloom Filter of the partition keys in the SSTable.
  • CompressionInfo.db: Metadata about the offsets and lengths of compression chunks in the Data.db file.
  • Statistics.db: Stores metadata about the SSTable, including information about timestamps, tombstones, clustering keys, compaction, repair, compression, TTLs, and more.
  • Digest.crc32: A CRC-32 digest of the Data.db file.
  • TOC.txt: A plain text list of the component files for the SSTable.
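As a small illustration of how these component files sit side by side on disk, here is a minimal Python sketch that splits an SSTable file name into its parts and builds the name of a sibling component. It assumes the name follows a `<version>-<generation>-<format>-<Component>` layout, which matches the `ac-1-bti-Data.db` example later in this post but may differ between Cassandra versions. The helper names `parse_component` and `sibling` are hypothetical and are not part of any Cassandra tool.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class SSTableComponent(NamedTuple):
    version: str      # e.g. "ac"
    generation: str   # e.g. "1"
    fmt: str          # e.g. "bti"
    component: str    # e.g. "Data.db"


def parse_component(path: str) -> SSTableComponent:
    """Split '<version>-<generation>-<format>-<Component>' into its parts.

    Assumption: the file name uses exactly this dash-separated layout.
    """
    name = PurePosixPath(path).name
    version, generation, fmt, component = name.split("-", 3)
    return SSTableComponent(version, generation, fmt, component)


def sibling(path: str, component: str) -> str:
    """Return the path of another component file of the same SSTable."""
    parts = parse_component(path)
    new_name = f"{parts.version}-{parts.generation}-{parts.fmt}-{component}"
    return str(PurePosixPath(path).with_name(new_name))


if __name__ == "__main__":
    data_file = "ac-1-bti-Data.db"
    print(parse_component(data_file))
    print(sibling(data_file, "TOC.txt"))
```

Since TOC.txt lists the component files for the SSTable, locating it from a known Data.db path is a convenient first step when inspecting an SSTable directory by hand.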

Cassandra Reads and SSTables

  • Check the memtable
  • Check the row cache, if enabled
  • Check the Bloom filter
  • Check the partition key cache, if enabled
  • Go directly to the compression offset map if the partition key is found in the partition key cache; otherwise, check the partition summary
  • If the partition summary was checked, access the partition index
  • Locate the data on disk using the compression offset map
  • Fetch the data from the SSTable on disk

In the Cassandra read sequence, SSTables hold any data that is no longer in the memtable, where it would be easily accessible in memory. Since accessing data on disk is slower and more costly than accessing data in memory, Cassandra first checks whether the data exists in an SSTable at all, to avoid searching for data that isn’t there. It then uses several structures to narrow down the location of the data in the SSTable’s Data.db file before fetching that data from disk.
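The narrowing-down steps can be sketched with a toy in-memory model. This is a simplified illustration, not Cassandra's implementation: `ToySSTable` and `read` are hypothetical names, keys are sorted as plain strings rather than by token, and the row cache, partition key cache, and compression offset map are left out. A real read also merges results from the memtable and several SSTables by timestamp, whereas this sketch just returns the first hit.

```python
import bisect
import hashlib


class ToySSTable:
    """In-memory stand-in for one SSTable (not Cassandra's real file layout)."""

    def __init__(self, rows, sample_every=128, bloom_bits=1 << 16):
        self.sample_every = sample_every
        self.bloom_bits = bloom_bits
        # "Data.db": rows kept sorted (by key here, by token in Cassandra).
        self.data = sorted(rows.items())
        # "Index.db": each key mapped to its position in the data.
        self.index = [(key, pos) for pos, (key, _) in enumerate(self.data)]
        # "Summary.db": every Nth index entry, with its offset in the index.
        self.summary = [(self.index[i][0], i)
                        for i in range(0, len(self.index), sample_every)]
        # "Filter.db": a small Bloom filter stored as an integer bitmask.
        self.bloom = 0
        for key, _ in self.data:
            for bit in self._bits(key):
                self.bloom |= 1 << bit

    def _bits(self, key):
        # Derive two bit positions from one SHA-256 digest of the key.
        digest = hashlib.sha256(key.encode()).digest()
        return [int.from_bytes(digest[i:i + 4], "big") % self.bloom_bits
                for i in (0, 4)]

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bloom >> bit) & 1 for bit in self._bits(key))

    def get(self, key):
        if not self.might_contain(key):
            return None  # skip this SSTable without touching index or data
        # The summary narrows the search to one slice of the index.
        summary_keys = [k for k, _ in self.summary]
        slot = bisect.bisect_right(summary_keys, key) - 1
        if slot < 0:
            return None
        start = self.summary[slot][1]
        # Scan that index slice, then read the row at the recorded position.
        for index_key, pos in self.index[start:start + self.sample_every]:
            if index_key == key:
                return self.data[pos][1]
        return None


def read(key, memtable, sstables):
    """Check the memtable first, then each SSTable in the order given."""
    if key in memtable:
        return memtable[key]
    for sstable in sstables:
        value = sstable.get(key)
        if value is not None:
            return value
    return None
```

For example, `read("k0007", {}, [ToySSTable({"k0007": 7})])` consults the Bloom filter, then the summary, then the index, and only then reads the row itself.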

Cassandra Writes and SSTables

  • Logging data in the commit log
  • Writing data to the memtable
  • Flushing data from the memtable
  • Storing data on disk in SSTables

Cassandra processes data at several stages on the write path, starting with the immediate logging of a write and ending with a write of data to disk. The commit log simply records data changes in the order they arrive. Memtables and SSTables store data on a per-table basis. Data written to a memtable is eventually flushed to disk, where it becomes an SSTable.
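The four stages above can be sketched as a toy model in Python. `ToyTable` and its `flush_threshold` parameter are hypothetical and exist only for illustration. In this sketch the commit log belongs to a single table and is cleared on flush, which is a simplifying assumption rather than a description of how Cassandra manages commit log segments.

```python
class ToyTable:
    """Toy write path: commit log -> memtable -> flush -> immutable SSTable."""

    def __init__(self, flush_threshold=3):
        self.commit_log = []   # append-only record of writes, in arrival order
        self.memtable = {}     # key -> (timestamp, value), held in memory
        self.sstables = []     # flushed SSTables as immutable tuples, oldest first
        self.flush_threshold = flush_threshold  # hypothetical flush trigger

    def write(self, key, value, timestamp):
        # 1. Log the change first so it could be replayed after a crash.
        self.commit_log.append((timestamp, key, value))
        # 2. Apply it to the memtable, keeping the newest version per key.
        current = self.memtable.get(key)
        if current is None or timestamp >= current[0]:
            self.memtable[key] = (timestamp, value)
        # 3. Flush once the memtable is "full" (a toy size rule).
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        if not self.memtable:
            return
        # 4. Write the memtable out as a sorted, immutable SSTable.
        sstable = tuple(sorted((k, ts, v)
                               for k, (ts, v) in self.memtable.items()))
        self.sstables.append(sstable)
        self.memtable = {}
        # Assumption: logged writes are no longer needed once they are on disk.
        self.commit_log.clear()
```

Using a tuple for each flushed SSTable mirrors the immutability described earlier: later writes to the same key land in a new memtable and a new SSTable instead of modifying an old one.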

Compaction and Cassandra SSTables

Since SSTables are immutable, they don’t grow with new data after being written. This means that a single table may have many SSTables, each storing a differently timestamped version of the same rows. Updates, deletions, and data expiration all leave versions of the same rows spread across a number of SSTables. The read process deals with this by comparing the timestamps associated with each version and returning the most up-to-date information. As SSTables accumulate, reads of that data start to take longer, so the accumulation eventually triggers a process called compaction. In compaction, SSTables are combined, the versions of each changed row are merged, and a single new SSTable is written to disk before its precursors are deleted.
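The merge step can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: each SSTable is modeled as a tuple of `(key, timestamp, value)` entries, the function name `compact` is hypothetical, and tombstones, TTL expiry, and Cassandra's actual compaction strategies are not modeled.

```python
def compact(sstables):
    """Merge several toy SSTables into one, keeping the newest version per key.

    Each SSTable is an iterable of (key, timestamp, value) tuples.
    """
    newest = {}
    for sstable in sstables:
        for key, timestamp, value in sstable:
            # Keep an entry only if it is newer than what we have seen so far.
            if key not in newest or timestamp > newest[key][0]:
                newest[key] = (timestamp, value)
    # The result is again sorted by key and immutable, like its inputs.
    return tuple(sorted((k, ts, v) for k, (ts, v) in newest.items()))


if __name__ == "__main__":
    older = (("a", 1, "v1"), ("b", 1, "v1"))
    newer = (("a", 5, "v2"), ("c", 3, "v1"))
    print(compact([older, newer]))
```

After a merge like this, a read for key "a" needs to consult one SSTable instead of two, which is the read-latency benefit described above.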

Viewing Cassandra SSTables

Cassandra’s SSTables aren’t stored in a human-readable format, and they are compressed on disk to save space. We can, however, use sstabledump to view SSTables in JSON or in a special format closer to their internal representation. First, we need to create some SSTables. We do this by creating a keyspace and table and loading some data into it. For this example, we will use some made-up test data from another project. We open the CQL shell, create the keyspace and table, and load 24 rows into the table.

Then we exit the CQL shell and use the command nodetool flush to force the Cassandra node to flush the data in the memtable to disk as an SSTable. Next, we need to locate the SSTable on disk. In this case, it is located at /var/lib/cassandra/data/dedupe_test/test_data-2bb75831784311ebb3db9116fc548b6b/ac-1-bti-Data.db. We then run sstabledump with only the SSTable location as an argument in order to view the JSON representation of our file.

We can add the -d option to view what is called the internal format, a more concise but less immediately readable representation that gives another view of the same data.
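The commands in this walkthrough can be assembled with a small Python helper, shown here as a sketch rather than a definitive wrapper. The function names `flush_command` and `sstabledump_command` are hypothetical. Passing a keyspace and table to nodetool flush, and placing -d before the file path, are assumptions about the command-line usage that are worth checking against the tools shipped with your Cassandra version. The helper only builds and prints the commands; it does not run them.

```python
import shlex


def flush_command(keyspace=None, table=None):
    """Build a 'nodetool flush' command, optionally limited to one table."""
    cmd = ["nodetool", "flush"]
    if keyspace:
        # Assumption: nodetool flush accepts an optional keyspace and table.
        cmd.append(keyspace)
        if table:
            cmd.append(table)
    return cmd


def sstabledump_command(data_file, internal=False):
    """Build an 'sstabledump' command for one Data.db file."""
    cmd = ["sstabledump"]
    if internal:
        cmd.append("-d")  # internal format instead of JSON
    cmd.append(data_file)
    return cmd


if __name__ == "__main__":
    data_file = ("/var/lib/cassandra/data/dedupe_test/"
                 "test_data-2bb75831784311ebb3db9116fc548b6b/ac-1-bti-Data.db")
    for cmd in (flush_command("dedupe_test", "test_data"),
                sstabledump_command(data_file),
                sstabledump_command(data_file, internal=True)):
        print(shlex.join(cmd))
```

To execute a command rather than print it, the list can be passed to `subprocess.run(cmd, check=True)` on a machine where the Cassandra tools are on the PATH.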
