Automating Data Operations for Apache Cassandra with Apache Airflow
December 7, 2022 @ 11:00 am - 12:00 pm
We’ll cover how to automate data operations and Spark processes for Cassandra with Airflow, and provide a hands-on demonstration on Gitpod.
Most Cassandra administrators have to import and export data as part of their database administrator role. For a Cassandra admin, this means knowing at least Spark, DSBulk, and similar tools. Wouldn’t it be cool to automate these processes and offer a self-service option? This talk shows how to automate data operations and Spark processes for Cassandra with Airflow, and includes a hands-on demonstration on Gitpod with Astra so everyone can try it out.
Takeaways:
- Learn how Apache Airflow, Apache Spark, and Apache Cassandra can be used together for DataOps
- Learn how Airflow can wrap complex Import/Export/ETL Spark jobs in a GUI for users
- Learn how to delete data in Cassandra with Apache Spark
- Hands-on: Create Tables/Keyspaces in Cassandra/Astra
- Hands-on: Extract / Load data from a CSV file into a Cassandra table
- Hands-on: Transform data from one Cassandra table into another Cassandra table
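To give a feel for the first two hands-on items, here is a minimal sketch in plain Python that builds the CQL for a keyspace and a table and turns CSV rows into parameters for a prepared INSERT. It is not taken from the talk or the referenced repos: the keyspace `demo`, the table `users`, the columns, and all helper function names are hypothetical, and the sketch only builds statements rather than connecting to Cassandra or Astra. In the talk's setup, steps like these would presumably run as Airflow tasks or Spark jobs instead.

```python
import csv
import io

# Hypothetical names for illustration only; not taken from the talk's repos.
KEYSPACE = "demo"
TABLE = "users"


def create_keyspace_cql(keyspace, replication_factor=1):
    """Build a CREATE KEYSPACE statement (SimpleStrategy, suitable for a dev cluster)."""
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH replication = "
        f"{{'class': 'SimpleStrategy', 'replication_factor': {replication_factor}}};"
    )


def create_table_cql(keyspace, table, columns, primary_key):
    """Build a CREATE TABLE statement from a {column_name: cql_type} mapping."""
    cols = ", ".join(f"{name} {cql_type}" for name, cql_type in columns.items())
    return (
        f"CREATE TABLE IF NOT EXISTS {keyspace}.{table} "
        f"({cols}, PRIMARY KEY ({primary_key}));"
    )


def insert_cql(keyspace, table, column_names):
    """Build a parameterized INSERT with one '?' placeholder per column."""
    cols = ", ".join(column_names)
    marks = ", ".join("?" for _ in column_names)
    return f"INSERT INTO {keyspace}.{table} ({cols}) VALUES ({marks});"


def read_csv_rows(csv_text):
    """Parse CSV text with a header row into a list of dicts."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


# Made-up sample data standing in for a real CSV file.
SAMPLE_CSV = "id,name\n1,Ada\n2,Grace\n"

rows = read_csv_rows(SAMPLE_CSV)
statement = insert_cql(KEYSPACE, TABLE, ["id", "name"])
# Convert CSV strings to the column types before binding them to the statement.
params = [(int(row["id"]), row["name"]) for row in rows]

print(create_keyspace_cql(KEYSPACE))
print(create_table_cql(KEYSPACE, TABLE, {"id": "int", "name": "text"}, "id"))
print(statement)
for values in params:
    print(values)
```

Keeping statement construction separate from execution makes each step easy to wrap in its own task, which is the kind of self-service workflow the talk describes.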
References:
- https://github.com/Anant/example-cassandra-etl-with-airflow-and-spark
- https://github.com/Anant/example-cassandra-presto-airflow