This series, a data operations guide for Apache Cassandra, outlines general approaches to data operations in business-critical environments that rely on Cassandra and must maintain high availability under agile development and continuous delivery.
It also explains specific, tactical ways to move information on demand, on an ongoing basis, or as part of a production migration.
In our experience working on global-scale data and analytics platforms built on Apache Cassandra or on Cassandra-based data platforms like DataStax, we see that our clients don’t have a consistent approach to managing their platform.
This series is an adaptation of our Cassandra Data Operations Guide, which we use to train our team and share with clients who subscribe to our advisory, hire us to engineer their projects, or manage their platforms using our playbook.
There are various ways to move data in and out of Apache Cassandra. Here are the main ones we will cover, with examples, so you can copy and paste your way to Apache Cassandra mastery.
- CQL Copy – How to use CQL Copy to import and export CSV Files with Apache Cassandra
- SSTable Files with SSTableloader – How to use SSTableloader to migrate data
- SSTable Files with Apache Spark – How to use Apache Spark in SparkSQL
- Apache Spark SQL – How to use SQL to move data in Apache Cassandra tables
- Apache Spark Scripts in Python or Scala – How to use Apache Spark scripts to quickly manipulate and delete data
- Apache Spark Jobs in Java or Scala – How to use custom Apache Spark jobs to do heavy data management in Apache Cassandra
- Scylla Migrator (pre-compiled Apache Spark Job) – How to use Scylla’s migration tool to move data in Apache Cassandra
- DSBulk with Sed & Awk – How to use DSBulk to extract, transform, and load data in Apache Cassandra
We’re going to cover these topics in no particular order and link them from here. If you want to get updates and eventually the full guide when it’s published, feel free to subscribe to our mailing list here or on our Cassandra.Link site.