Automating Data Operations for Apache Cassandra with Apache Airflow

December 7 @ 11:00 am - 12:00 pm

We’ll cover automating data operations and Spark processes for Cassandra with Airflow, followed by a hands-on demonstration on Gitpod.

Most Cassandra administrators have to import and export data as part of their role, which means knowing at least Spark, DSBulk, and similar tools. Wouldn’t it be cool to automate these processes and offer a self-service option? This talk covers automating data operations and Spark processes for Cassandra with Airflow, with a hands-on demonstration on Gitpod with Astra so everyone can try it out.

Takeaways:

  • Learn how Apache Airflow, Apache Spark, and Apache Cassandra can be used together for DataOps
  • Learn how Airflow can wrap complex Import/Export/ETL Spark jobs in a GUI for users
  • Learn how to delete data in Cassandra with Apache Spark
  • Hands-on: Create Tables/Keyspaces in Cassandra/Astra
  • Hands-on: Extract / Load data from a CSV file into a Cassandra table
  • Hands-on: Transform data from one Cassandra table into another Cassandra table
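
To give a rough idea of what "wrapping a Spark job" for self-service can look like, here is a minimal Python sketch that assembles the spark-submit command an Airflow task (for example a BashOperator) could run. It is not taken from the talk or the repositories below: the script name load_csv.py, its arguments, the host, and the connector version are all illustrative assumptions, and it assumes a self-hosted cluster reachable by host (Astra would need a secure connect bundle instead).

```python
import shlex

# Assumed connector coordinates for illustration; check the version
# that matches your Spark and Scala build before using it.
CONNECTOR = "com.datastax.spark:spark-cassandra-connector_2.12:3.4.1"


def build_spark_submit(job_script, cassandra_host, job_args=()):
    """Return a shell-safe spark-submit command string for a Spark job
    that talks to Cassandra through the Spark Cassandra Connector."""
    cmd = [
        "spark-submit",
        "--packages", CONNECTOR,
        "--conf", f"spark.cassandra.connection.host={cassandra_host}",
        job_script,   # hypothetical PySpark script
        *job_args,    # hypothetical arguments understood by that script
    ]
    return shlex.join(cmd)


# Hypothetical CSV load job; an Airflow task could run this string as-is.
load_cmd = build_spark_submit(
    "load_csv.py",
    "127.0.0.1",
    ["--csv", "data/input.csv", "--keyspace", "demo", "--table", "events"],
)
print(load_cmd)
```

Keeping the command construction in one function means the same task definition can be reused for import, export, and transform jobs by changing only the script and its arguments.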

References:

  • https://github.com/Anant/example-cassandra-etl-with-airflow-and-spark
  • https://github.com/Anant/example-cassandra-presto-airflow

Organizer

Anant