Events : Anant Corporation

ICYMI: Anant Corporation’s events are recorded and available on YouTube!

Upcoming Events

September 26, 2022

Data Engineer’s Lunch #77: Apache Arrow Flight SQL: A Universal Standard for High-Performance Data Transfers from Databases

This talk covers why ODBC & JDBC don’t cut it in today’s data world and the problems solved by Arrow, Arrow Flight, and Arrow Flight SQL. Alex will go through how each of these building blocks works and give an overview of universal ODBC & JDBC drivers built on Arrow Flight SQL, which let clients take advantage of this increased performance with zero application changes.

September 29, 2022

Apache Cassandra Lunch #117: 5 Disciplines of a Cassandra Expert

Apache Cassandra is an exceptionally powerful distributed database used by some of the world’s most popular online services. However, the adoption of Cassandra requires some fundamental disciplines to operate it effectively.

Hayato Shimizu is a veteran Cassandra architect who has had a hand in some of the world’s largest deployments, from media companies to banks.

Hayato will take you through some of the key disciplines you should adopt to ensure Cassandra provides your users and customers the reliability and performance you need.

Past Events

Data Engineer’s Lunch #76: Airflow and Google Dataproc

In Data Engineer’s Lunch #76, Arpan Patel covered how to connect Airflow and Dataproc, with a demo that uses an Airflow DAG to create a Dataproc cluster, submit an Apache Spark job to Dataproc, and destroy the Dataproc cluster upon completion.

Apache Cassandra Lunch #115: Google Dataproc and DataStax Astra

In Cassandra Lunch #115, Arpan Patel discussed how to connect Google Dataproc and DataStax Astra with a demo showing you what configurations you will need to get the connection working!

Data Engineer’s Lunch #72: Introduction to Apache Pinot

In Data Engineer’s Lunch #72, CEO of Anant, Rahul Singh, gave an overview of the up-and-coming Apache Pinot project, which spun out of LinkedIn and is now supported by StarTree as an enterprise offering. This is the first in a series of talks and workshops on why Pinot is important to the future of real-time data.

Apache Cassandra Lunch #114: Cassandra Virtual Tables

In Apache Cassandra Lunch #114, Dipan Shah discussed virtual tables in Apache Cassandra 4.0.

Data Engineer’s Lunch #75: Real-time change data capture, processing, and ingest into OLTP and OLAP databases

In Data Engineer’s Lunch #75, Eric Sammer, CEO of Decodable, discussed real-time change data capture, processing, and ingest into OLTP and OLAP databases!

Apache Cassandra Lunch #113: ScyllaDB V: NoSQL Innovations for Extreme Scale

With the release of ScyllaDB Open Source 5.0 users have a Raft of new capabilities to manage and scale their NoSQL databases — all puns intended. Discover what’s new, and why industry gamechangers are moving their workloads to ScyllaDB.

Data Engineer’s Lunch #74: Table Format Comparison

In Data Engineer’s Lunch #74, Alex Merced, Developer Advocate for Dremio, discussed the three major data lake table formats – Apache Iceberg, Apache Hudi, and Delta Lake – covering how they work, their features, and their limitations so you can make an informed decision when architecting your data lakehouse.

Apache Cassandra Lunch #112: Azure Cassandra Proxy

In Apache Cassandra Lunch #112, Arpan Patel discussed Azure’s Cassandra Dual Write Proxy

Apache Cassandra Lunch #110: Full Query Logging

In Apache Cassandra Lunch #110, Dipan Shah discussed full query logging

Data Engineer’s Lunch #71: Tools for Cloud Data Engineering

In Data Engineer’s Lunch #71, CEO of Anant, Rahul Singh, discussed tools for cloud data engineering!

Apache Cassandra Lunch #109: DataStax cql-proxy

In Apache Cassandra Lunch #109, Arpan Patel discussed DataStax’s cql-proxy tool and showed how you can use it with DataStax Astra

Data Engineer’s Lunch #70: Apache Iceberg

In Data Engineer’s Lunch #70, Alex Merced, Developer Advocate at Dremio, explains the architectural details of where the Hive table format falls short and how the Iceberg table format resolves those shortcomings, as well as the benefits that stem from Iceberg’s approach.

Apache Cassandra Lunch #108: Developing Enterprise Consciousness with Apache Cassandra

In Apache Cassandra Lunch #108, CEO of Anant, Rahul Singh, discussed developing enterprise consciousness with Apache Cassandra.

Data Engineer’s Lunch #69: Great Expectations for Data Engineering

In Data Engineer’s Lunch #69, Arpan Patel discussed Great Expectations and how it can be used for data engineering. This was part one of a series on Great Expectations and primarily focused on introducing the tool. Future talks will feature tools like Spark and Airflow in conjunction with Great Expectations!

Apache Cassandra Lunch #107: Guardrails

In Cassandra Lunch #107, Dipan Shah discussed how Guardrails work in Apache Cassandra.

Apache Cassandra Lunch #106: SSL with Apache Cassandra

In Cassandra Lunch #106, Dipan Shah discussed enabling SSL on an Apache Cassandra cluster.

Data Engineer’s Lunch #68: DevOps Fundamentals

In Data Engineer’s Lunch #68, Will Angel, Technical Product Manager at Caribou Financial, provided an introduction to DevOps practices and tooling including testing, deployment automation, logging, monitoring, and DevOps principles. Additionally, we discussed some of the ways that DevOps for data engineering is different from conventional application development.

Apache Cassandra Lunch #105: Cassandra, Presto, and Airflow

In Cassandra Lunch #105, Arpan Patel discussed how to run read, join, and write queries on Cassandra with Presto, orchestrated via Airflow.

Apache Cassandra Lunch #104: DataOps – Cleaning Data in Apache Cassandra

In Apache Cassandra Lunch #104, CEO of Anant, Rahul Singh, discussed methods and strategies for managing big data in Apache Cassandra after it has already been stored. We discussed how to delete data or apply TTLs after the fact, how to operationalize processes with Apache Airflow and Apache Spark, and how to treat data hygiene as a strategy so that you’re not stuck with bad data later.

Data Engineer’s Lunch #67: Machine Learning – Feature Selection

In Data Engineer’s Lunch #67, Obioma Anomnachi discussed the process of feature selection as part of a Machine Learning process. Feature selection describes the process of picking particular, relevant data features out of a wider data set, to be used to perform model training.
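
As a rough illustration of the idea (not the method shown in the talk), one of the simplest forms of feature selection drops columns that barely vary, since they carry little signal for training. The sketch below uses only the Python standard library; the function name, the sample data, and the variance threshold are all hypothetical.

```python
from statistics import pvariance


def select_features(rows, names, min_variance=0.01):
    """Keep only the columns whose population variance exceeds min_variance.

    rows  -- list of equal-length numeric records
    names -- column names, in the same order as the values in each record
    """
    columns = list(zip(*rows))  # transpose records into columns
    keep = [i for i, col in enumerate(columns) if pvariance(col) > min_variance]
    selected_names = [names[i] for i in keep]
    selected_rows = [[row[i] for i in keep] for row in rows]
    return selected_names, selected_rows


# Hypothetical data set: "constant_flag" never changes, so it is dropped.
names = ["age", "constant_flag", "income"]
rows = [
    [25, 1, 40000],
    [32, 1, 52000],
    [47, 1, 61000],
]
selected_names, selected_rows = select_features(rows, names)
print(selected_names)
```

Real pipelines usually combine a filter like this with methods that look at the relationship between each feature and the training target.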

Apache Cassandra Lunch #103: Cassandra Cluster Architecture in UML and the Azure Digital Twin Domain Language

In Cassandra Lunch #103, Nicholas Brackley discussed how to connect to an Azure Digital Twin resource, view the models in Azure’s environment, and investigate the functions available using the DTDL resources on Azure’s platform.

Data Engineer’s Lunch #66: Airflow and Presto

In Data Engineer’s Lunch #66, Arpan Patel discussed how to connect Airflow and Presto.

Apache Cassandra Lunch #102: Choreography vs Orchestration

In Cassandra Lunch #102, Stefan Nikolovski discussed Choreography vs Orchestration / Google Workflows.

Apache Cassandra Lunch #101: IoT and Cassandra

In Apache Cassandra Lunch #101, Obioma Anomnachi discussed the use of Cassandra for IoT (Internet of Things) workloads. We discussed data modeling for IoT, as well as different ways devices might send data back to the cluster.

Apache Cassandra Lunch #100: Cassandra – Where it fits in your Product or Platform

In Cassandra Lunch #100, CEO of Anant, Rahul Singh, discussed which companies currently use Cassandra and which products use it as their backend. We also took a look at how far Cassandra has come, how players like Scylla and Yugabyte have brought value, and how SaaS and managed service providers can help.

Apache Cassandra Lunch #99: CQL Arithmetic Operators

In Cassandra Lunch #99, Arpan Patel discussed the CQL Arithmetic Operators that are now supported in Cassandra 4.0!

Data Engineer’s Lunch #65: JanusGraph on Jupyter – Using Notebooks with Graph

In Data Engineer’s Lunch #65, Ryan Quey discussed the Graph Notebook tool put out by the AWS team on JanusGraph.

Apache Cassandra Lunch #97: Cassandra on k3s

In Cassandra Lunch #97, Stefan Nikolovski discussed Cassandra on k3s.

Data Engineer’s Lunch #64: Processing Real-time Crypto Transactions

In Data Engineer’s Lunch #64, Eric Sammer, CEO of Decodable, discussed their cloud-based streaming SQL engine and how to mine insights from data in real time. This is part 2 of a series with DataPM on processing real-time crypto transactions fed by DataPM.

Apache Cassandra Lunch #97: Cassandra DataSource for Grafana

In Apache Cassandra Lunch #97, we discussed using the new Cassandra Datasource for Grafana to visualize any time series data stored in Cassandra.

Data Engineer’s Lunch #63: Building a Cryptocurrency Data Catalogue

In Data Engineer’s Lunch #63, Travis Collins, founder of the open source project DataPM, presented DataPM and showed how to get access to cryptocurrency and blockchain data. This is part 1 of a series with Decodable on processing real-time crypto transactions fed by DataPM.

Apache Cassandra Lunch #96: Apache Cassandra Change Data Capture (CDC) Strategies

In Cassandra Lunch #96, Rahul Singh, CEO of Anant, discussed different ways to get change data into and out of Cassandra using a few different strategies which could work out for your platform.

Cassandra Lunch #95: Spark Graph Operations with DSEGraphFrames Scala API

In Cassandra Lunch #95, Obioma Anomnachi discussed the DSEGraphFrames library which allows Spark to perform operations on graph databases.

Data Engineer’s Lunch #61: Kubevirt

In Data Engineer’s Lunch #61, Stefan Nikolovski discussed Kubevirt.

Cassandra Lunch #94: StreamSets and Cassandra

In Cassandra Lunch #94, Arpan Patel discussed how to connect StreamSets and Cassandra.

Data Engineer’s Lunch #60: Series – Developing Enterprise Consciousness

In Data Engineer’s Lunch #60, CEO of Anant, Rahul Singh, discussed modern data processing and pipeline approaches, giving a high-level overview of the different types, frameworks, and workflows in data processing and pipeline design.

Cassandra Lunch #93: K8ssandra on Digital Ocean

In Cassandra Lunch #93, we discussed how to use K8ssandra on Digital Ocean.

Data Engineer’s Lunch #59: Spark Tasks and Distribution

In Data Engineer’s Lunch #59, we discussed the way that Spark splits up and distributes work between nodes. We looked at some example code and viewed in the Spark UI how that work was distributed between nodes.

Cassandra Lunch #92: Securing Apache Cassandra – Managing Roles and Permissions

In Cassandra Lunch #92, CEO of Anant, Rahul Singh, discussed how to design and manage roles and permissions in Apache Cassandra to secure multiple applications and users for a growing platform with new use cases.

Cassandra Lunch #91: Collections in Cassandra

In Cassandra Lunch #91, we discussed the collection types in Cassandra and how the frozen modifier changes the way that Cassandra interacts with them.

Data Engineer’s Lunch #58: InfinyOn

In Data Engineer’s Lunch #58, Sehyo Chang, founder and CTO of InfinyOn, introduced us to Fluvio OSS and InfinyOn.

Cassandra Lunch #90: Securing Apache Cassandra

In Cassandra Lunch #90, CEO of Anant, Rahul Singh, discussed different ways to secure Apache Cassandra. This is an overview of the built-in features as well as other options that can be used.

Cassandra Lunch #89: Semi-Structured Data in Cassandra

In Cassandra Lunch #89, we discussed how to store and parse semi-structured data in Cassandra using Spark

Data Engineer’s Lunch #57: StreamSets for Data Engineering

In Data Engineer’s Lunch #57, we discussed StreamSets and how it can be used for data engineering.

Cassandra Lunch #88: Cadence

In Cassandra Lunch #88, CEO of Anant, Rahul Singh, discussed how Cadence works on top of Cassandra to provide workflow management at scale, and covered Cadence architecture in the context of SAGA patterns.

Cassandra Lunch #87: Recreating Cassandra.api using Astra and Stargate

In Cassandra Lunch #87, we worked on using Astra DB’s included Stargate API layer to substitute for the written Node and Python APIs in our Cassandra.api project.

Cassandra Lunch #86: DataStax Astra Terraform Provider

In Cassandra Lunch #86, we discussed the DataStax Astra Terraform Provider and how it can be used to manage DataStax Astra infrastructure.

Data Engineer’s Lunch #56: Spring Cloud Data Flow with Cassandra

In Data Engineer’s Lunch #56, we went over how to integrate Spring Cloud Data Flow with Cassandra.

Cassandra Lunch #85: Top 10 Open-Source Projects Using Cassandra in 2022

In Cassandra Lunch #85, we discussed some of the most popular open-source projects using Cassandra in 2022.

Data Engineer’s Lunch #55: Get Started in Data Engineering

In Data Engineer’s Lunch #55, CEO of Anant, Rahul Singh, covered 10 resources every data engineer needs to get started or master their game.

Cassandra Lunch #84: Data & Analytics Platform: Cassandra, Spark, Kafka

In Apache Cassandra Lunch #84, CEO of Anant, Rahul Singh, presented on data platform design around Cassandra, Spark, and Kafka.

Data Engineer’s Lunch #55: dbt and Spark

In Data Engineer’s Lunch #55, we discussed dbt (data build tool), a tool for managing data transformations with config files rather than code. We connected it to Apache Spark and used it to perform transformations.

Apache Cassandra Lunch #83: Aiven Managed Cassandra

In Cassandra Lunch #83, we introduced Aiven’s Managed Cassandra offering and showed how to connect to Aiven with Node.js and CQLSH.

Apache Cassandra Lunch #82: Instaclustr Managed Cassandra and Next.js

In Cassandra Lunch #82, we discussed how to set up an Instaclustr managed Cassandra cluster with Next.js.

Data Engineer’s Lunch #53: 2021 in Review

In Data Engineer’s Lunch #53, we discussed some of our most popular webinars from 2021 and received feedback from the audience about what they would like to see in 2022.

Apache Cassandra Lunch #81: Redash and Cassandra

In Cassandra Lunch #81, we discussed how we can use Redash to do BI on Cassandra data!

Data Engineer’s Lunch #52: JupyterHub/JupyterLab on Kubernetes

In Data Engineer’s Lunch #52, we showed how to deploy JupyterHub/JupyterLab on Kubernetes.

Apache Cassandra Lunch #80: How to Use Cassandra for Content Management

In Cassandra Lunch #80, we used DataStax Astra as our database to demonstrate content management in Cassandra.

Data Engineer’s Lunch #51: Comparison of Managed Airflow Options

In Data Engineer’s Lunch #51, guest speaker Andres Namm compared AWS Airflow, GCP Airflow, and Astronomer against self-managed Airflow.

Apache Cassandra Lunch #79: Cassandra API in Cosmos DB

In Cassandra Lunch #79, we discussed how Cosmos DB compares to Cassandra by taking an old project that puts a REST API over a Cassandra table using Cassandra drivers and setting it up to use data stored in Cosmos DB instead. We discussed Cosmos DB’s Cassandra API and its connections to cqlsh.

Data Engineer’s Lunch #50: Airbyte for Data Engineering

In Data Engineer’s Lunch #50, we introduced Airbyte and discussed how it can be used for data engineering

Apache Cassandra Lunch #78: Deploy Cassandra using DSE Operator to Kubernetes

In Cassandra Lunch #78, we showed how to deploy Cassandra to Kubernetes using the DSE Operator.

Data Engineer’s Lunch #49: Meltano for Data Engineering

In Data Engineer’s Lunch #49, we introduced Meltano and how it can be used for ELT in data engineering.

Apache Cassandra Lunch #77: Connect to DataStax Astra via Standalone CQLSH

In Cassandra Lunch #77, we showed how you can connect to your DataStax Astra database using standalone CQLSH.

Data Engineer’s Lunch #48: Veezoo – João Pedro Monteiro

In Data Engineer’s Lunch #48, João Pedro Monteiro (JP), co-founder and CTO of Veezoo, introduced Veezoo and showed how natural language interfaces are the key to enabling data democratization at companies.

Apache Cassandra Lunch #76: Tombstone Mitigation Strategies – Aaron Ploetz

In Cassandra Lunch #76, Aaron Ploetz, Tech Author at DataStax, presented on tombstone mitigation strategies.

Data Engineer’s Lunch #47: Airflow on Kubernetes

In Data Engineer’s Lunch #47, we used Kubernetes to deploy Airflow.

Apache Cassandra Lunch #75: Getting Started with DataStax Enterprise on Docker

In Cassandra Lunch #75, we looked at getting started with DataStax Enterprise on Docker.

Data Engineer’s Lunch #46: Node.js and API calls

In Data Engineer’s Lunch #46, we discussed the architecture of Node.js and used it to initiate an API call and harvest some data from it.

Apache Cassandra Lunch #74: ScyllaDB – Peter Corless

In Cassandra Lunch #74, Technical Marketing Manager at ScyllaDB, Peter Corless, presented on ScyllaDB and some of the advantages of using ScyllaDB over open-source Cassandra.

Data Engineer’s Lunch #45: Apache Livy

In Data Engineer’s Lunch #45, we discussed the use of Apache Livy, which creates a REST API for interacting with Spark.

Apache Cassandra Lunch #73: An Overview and Comparison of Datastax Dependencies for Cassandra, Spark and Graph

In Cassandra Lunch #73, we discussed an overview and comparison of Datastax dependencies for Cassandra, Spark, and Graph.

Data Engineer’s Lunch #44: Prefect

In Data Engineer’s Lunch #44, we discussed Prefect and how it compares to Airflow when scheduling tasks.

Apache Cassandra Lunch #72: Databricks and Cassandra

In Cassandra Lunch #72, we discussed how we can use Databricks with Cassandra.

Data Engineer’s Lunch #43: Bodo.ai – Karthik Narayanan

In Data Engineer’s Lunch #43, Karthik Narayanan, Principal Solutions Architect at Bodo.ai, demonstrated what Bodo.ai is and what it can do.

Apache Cassandra Lunch #71: Creating a User Profile Using DataStax Astra and React

In Cassandra Lunch #71, we discussed how DataStax Astra can be used as a back-end for a React client. We demoed a small application with a user profile.

Apache Cassandra Lunch #70: Basics of Apache Cassandra

In Cassandra Lunch #70, we discussed the basics of Apache Cassandra and set up a standalone Apache Cassandra instance.

True Cloud Transformation

Anant and DataStax team up for a 3 part workshop series. In Part 3, we discussed how to continue delivering value now that the transition to the cloud is complete. We discussed Astra native protocols, how to leverage Vercel or Netlify to deploy serverless apps, and how to wire up APIs in a low code serverless framework.

Data Engineer’s Lunch #42: Introduction to Databricks

In Data Engineer’s Lunch #42, we introduced Databricks and how it can be used for data engineering.

Apache Cassandra Lunch #69: k8ssandra

In Apache Cassandra Lunch #69, we discussed getting started with K8ssandra.

Migrating Data to the Cloud

Anant and DataStax team up for a 3-part workshop series. In Part 2, we discussed migrating your data to the cloud with zero downtime. Learn about migration patterns for zero downtime and common tools for ETL and ongoing migration to the cloud. Learn about Airflow and why it is a quintessential tool for data practitioners.

Data Engineer’s Lunch #41: PygramETL

In Data Engineer’s Lunch #41, we discussed pygrametl as part of our discussion of python ETL tools.

Apache Cassandra Lunch #68: DataStax Apache Kafka Connector

In Apache Cassandra Lunch #68, we introduced the DataStax Apache Kafka Connector and discussed how we can use it to connect Apache Kafka and Cassandra.

Designing & Planning a Cloud Migration

Anant and DataStax team up for a 3-part workshop series. In Part 1, learn how to transform your legacy platform to use a serverless cloud database & API. Learn to think like an architect and understand what you are getting into when planning a cloud migration.

Data Engineer’s Lunch #40: Streaming Real Time vs Batch for ETL

In Data Engineer’s Lunch #40: Streaming Real Time vs Batch for ETL, we discussed use cases for real-time stream processing versus processing in batches.

Apache Cassandra Lunch #67: Moving Data from Cassandra to Datastax Astra

In Apache Cassandra Lunch #67, we discussed how to move data from Open Source Cassandra to Datastax Astra using dsbulk/scylla migrator.

Data Engineer’s Lunch #39: Dapr Cloud

In Data Engineer’s Lunch #39: Dapr Cloud, we discussed how to use Dapr to make a cloud application.

Apache Cassandra Lunch #66: Using DBeaver with Cassandra

In Cassandra Lunch #66, we discussed how DBeaver can be used with a Cassandra database.

Apache Cassandra Lunch #65: Spark Cassandra Connector Pushdown

In Apache Cassandra Lunch #65 we discussed how the Spark Cassandra Connector pushes some parts of a query down to Cassandra, and what that has to do with normal Spark SQL predicate pushdown.

Data Engineer’s Lunch #37: Pipedream: Serverless Integration and Compute Platform

In Data Engineer’s Lunch #37, we discussed Pipedream, a serverless integration and compute platform that is free for individual developers to use.

Apache Cassandra Lunch #64: Cassandra for .NET Developers

In Cassandra Lunch #64, Eric Ramseur, Co-founder and Customer Experience Architect at Anant and Sitecore MVP, presented on Cassandra for .NET developers.

Data Engineer’s Lunch #36: Amundsen/DSE + Airflow

In Data Engineer’s Lunch #36, we discussed data discovery with Amundsen.

Apache Cassandra Lunch #63: How To Install Cassandra 4.0 From a Tarball On Linux

In Apache Cassandra Lunch #63, CEO of Anant, Rahul Singh, demonstrated how to install Cassandra 4.0 on Linux from a tarball.

Data Engineer’s Lunch #35: Introduction to Snowflake

In Data Engineer’s Lunch #35, we introduced Snowflake and discussed how it can be used for data engineering.

Apache Cassandra Lunch #62: Grafana Dashboard for Apache Cassandra

In Apache Cassandra Lunch #62, we had guest speaker Sarma Pydipally present on the Grafana Dashboard for Cassandra.

Data Engineer’s Lunch #34: DBeaver

In Data Engineer’s Lunch #34: DBeaver, we discuss what DBeaver is and how it can be used in data engineering.

Apache Cassandra Lunch #61: Elassandra

In Apache Cassandra Lunch #61, we discuss different ways of indexing and working with Elassandra.

Data Engineer’s Lunch #33: Using Spark, Cassandra, and Elasticsearch for Data Processing

In Data Engineer’s Lunch #33, we discuss how you can use Spark and Spark jobs to load data from a CSV file, and save + load the data into Cassandra and Elasticsearch.

Apache Cassandra Lunch #60: Apache Cassandra and Apache Nifi

In Apache Cassandra Lunch #60, we discuss how we can use Apache Nifi with Apache Cassandra.

Data Engineer’s Lunch #32: Converting JSON to CSV

In Data Engineer’s Lunch #32, we discussed different ways to convert JSON files into CSV files.
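
As one possible approach (not necessarily one covered in the session), a flat JSON array of objects can be converted to CSV with nothing but the Python standard library. The function name and sample input below are hypothetical, and the sketch assumes the records are not nested.

```python
import csv
import io
import json


def json_to_csv(json_text):
    """Convert a JSON array of flat objects into CSV text.

    The header is the union of all keys, in first-seen order; keys missing
    from a record are written as empty cells.
    """
    records = json.loads(json_text)
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical input: the two records do not share all of their keys.
sample = '[{"id": 1, "name": "a"}, {"id": 2, "city": "x"}]'
csv_text = json_to_csv(sample)
print(csv_text)
```

Nested objects would need to be flattened first, which is where the different conversion strategies start to diverge.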

Apache Cassandra Lunch #59: Functions in Cassandra

In Cassandra Lunch #59, we discussed the use of default functions as well as User Defined Functions (UDFs) in Cassandra and demonstrated some of these functions in action.

Apache Cassandra Lunch #58: Cassandra.Toolkit Tools for Cassandra Titans

In Cassandra Lunch #58, Rahul Singh covered some of the tools that help teams run their Cassandra databases better, faster, and smarter.

Apache Cassandra Lunch #57: Using Secondary Indexes in Cassandra

In Cassandra Lunch #57, we had guest speaker Anil Mittana present on using Secondary Indexes in Cassandra.

Data Engineer’s Lunch #30: Databand

In Data Engineer’s Lunch #30 we discuss the differences between the open-source and paid versions of Databand and have Databand CEO Josh Benamram walk us through a demo of the paid version.

Apache Cassandra Lunch #56: Using Spark SQL Parquet Tables in DSEFS / DSE Analytics

In Cassandra Lunch #56, we discuss using Spark Parquet tables in DSEFS and DSE Analytics.

Data Engineer’s Lunch #29: Introduction to Apache Nifi

In Data Engineer’s Lunch #29, we introduce Apache Nifi and discuss how we can use it for data engineering.

Apache Cassandra Lunch #55: Migrating PostgreSQL to Cassandra

In Cassandra Lunch #55, we discuss the process and reasons for migrating your database from SQL (PostgreSQL) to NoSQL (Cassandra).

Data Engineer’s Lunch #28: Petl for Data Engineering

In Data Engineer’s Lunch #28, we continue our discussion of Python ETL tools with a more in-depth look at Petl.

Apache Cassandra Lunch #54: Machine Learning with Spark + Cassandra Part 2

In Cassandra Lunch #54, Nikita Torosyan, a junior engineer at Anant, continued his discussion covering machine learning using Spark and Cassandra.

Data Engineer’s Lunch #27: Data Processing with Containers: Docker & Kubernetes Tools for Data Engineering

In Data Engineer’s Lunch #27, Rahul Singh covered tools used for data processing in containers like Docker and Kubernetes along with many more.

Apache Cassandra Lunch #53: Cassandra ETL with Airflow and Spark

In Cassandra Lunch #53, we discussed how we can set up a Cassandra ETL pipeline using Airflow and Spark.

Data Engineer’s Lunch #26: Akka Actors for Data Processing

In Data Engineer’s Lunch #26, we discussed how to use Akka Actors for concurrent data processing operations.

Apache Cassandra Lunch #51: Cassandra Cluster Design & Architecture

In Cassandra Lunch #51, we discussed an overview of Cassandra cluster architecture, not to be confused with the Cassandra database architecture. Specifically, we covered using Cassandra datacenters to isolate workloads.

Data Engineer’s Lunch #25: Airflow and Spark

In Data Engineer’s Lunch #25, we discussed how we can use Airflow to schedule Spark jobs.

Apache Cassandra Lunch #50: Machine Learning with Spark + Cassandra

In Apache Cassandra Lunch #50, we discussed how you can use Apache Spark and Apache Cassandra to perform basic Machine Learning tasks.

Data Engineer’s Lunch #24: Pandas for Data Engineering

In Data Engineer’s Lunch #24, we continued our discussion of Python ETL tools with a more in-depth look at Pandas.

Apache Cassandra Lunch #49: Spark SQL for Cassandra Data Operations

In Cassandra Lunch #49, we discussed how to use Spark SQL for Cassandra data operations such as moving data in Apache Cassandra tables.

Data Engineer’s Lunch #23: Thanos/Cortex

In Data Engineer’s Lunch #23, Rahul Singh covered the topics of Thanos and Cortex.

Apache Cassandra Lunch #48: Airflow and Cassandra

In Cassandra Lunch #48, we will discuss using Airflow and Cassandra together. Airflow provides a Cassandra connection type and a Cassandra operator. We will explore what we can do to manage a Cassandra cluster via Airflow.

Data Engineer’s Lunch #22: Prometheus

In this week’s edition of Data Engineer’s Lunch, guest speaker Will Angel covers the topic of using Prometheus for data engineering. Prometheus is a monitoring system and time series database.

Data Engineer’s Lunch #21: Python ETL Tools

This week, on Data Engineer’s Lunch #21, we will discuss, compare, and contrast a number of ETL tools for Python. We will discuss using base Python, Python packages, and outside ETL tools to do ETL.

Apache Cassandra Lunch #46: Apache Spark Jobs in Scala for Cassandra Data Operations

On this installment of our weekly Cassandra Lunch, we will discuss how we can use Apache Spark jobs written in Scala to do Cassandra data operations, which will include a live walkthrough!

Data Engineer’s Lunch #20: DataOps vs. DevOps

This week, on Data Engineer’s Lunch #20, we covered what DataOps and DevOps are and how they play into data engineering.

Apache Cassandra Lunch #45: Alpakka Cassandra and Twitter

On this installment of our weekly Cassandra Lunch, we will discuss how you can stream tweets using Twitter4S (Scala Twitter client) and save them to Cassandra using Alpakka Cassandra.

Data Engineer’s Lunch #19: Introduction to jq for Data Engineering

This week, on Data Engineer’s Lunch #19, we will introduce jq and how we can use it for data engineering. jq is a command-line tool like sed for JSON data and can be used to slice, filter, map, and transform structured data. If you missed our talk on sed, find it on youtube.com/anantcorp.

Apache Cassandra Lunch #44: Cassandra on Kubernetes Part 2

On this installment of our weekly Cassandra Lunch, we continue our discussion of Cassandra on Kubernetes, covering Docker, Kubernetes, and Helm.

Data Engineer’s Lunch #18: Luigi For Scheduling

This week, on Data Engineer’s Lunch #18, we will discuss Luigi as a scheduling platform alongside our previous discussions of Jenkins and Airflow. Luigi is a Python package that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, failure handling, and command-line integration.

Apache Cassandra Lunch #43: DSBulk with Sed and Awk

On this installment of our weekly Cassandra Lunch, we will introduce DSBulk, the DataStax Bulk Loader, and show how we can use it with tools like sed and awk to do ETL on Cassandra data.

Data Engineer’s Lunch #17: NoSQL Part 3: Data Store Types

This week, on Data Engineer’s Lunch #17, we will talk about different NoSQL Data Store Types.

Apache Cassandra Lunch #42: SSTable Files with SSTableloader

On this installment of our weekly Cassandra Lunch, we will cover using SSTableLoader to handle different types of data transfer in Cassandra. SSTableLoader works with SSTables directly, bypassing the Cassandra read/write processes, just like the Spark approach covered in a previous Cassandra Lunch.

Data Engineer’s Lunch #16: Introduction to awk

This week, on Data Engineer’s Lunch #16, we will introduce awk, a program that you can use to select particular records in a file and perform operations upon them. We will also have a demonstration to show some examples of how we can use a command-line tool like awk.
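
To make the "select records, then operate on them" idea concrete, here is a minimal Python sketch of the kind of job a one-line awk program typically does: keep the rows whose third field exceeds a threshold and sum that field. The function name, field layout, and sample records are hypothetical and are not taken from the session's demo.

```python
def sum_matching(lines, column, threshold, sep=","):
    """Sum the numeric field at `column` over lines where it exceeds `threshold`.

    Lines whose field is missing or non-numeric (for example a header row)
    are skipped rather than treated as errors.
    """
    total = 0.0
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        try:
            value = float(fields[column])
        except (IndexError, ValueError):
            continue  # header or malformed record
        if value > threshold:
            total += value
    return total


# Hypothetical records: order id, region, amount.
records = [
    "order,region,amount",
    "1,east,250",
    "2,west,75",
    "3,east,120",
]
total = sum_matching(records, column=2, threshold=100)
print(total)
```

The same pattern-then-action structure is what awk expresses directly on the command line.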

Apache Cassandra Lunch #41: Cassandra on Kubernetes – Docker/Kubernetes/Helm – Part 1

On this installment of our weekly Cassandra Lunch, we discuss Cassandra on Kubernetes and give introductions to Docker, Kubernetes, and Helm.

Data Engineer’s Lunch #15: Introduction to Jenkins

This week, on Data Engineer’s Lunch #15, as a follow-up to a recent discussion of Airflow, we will cover the use of Jenkins as a scheduling tool. We will have a general overview of Jenkins’s capabilities and a comparison of how it stacks up against Airflow as a scheduling tool.

Apache Cassandra Lunch #40: Scylla Migrator for Cassandra Data Operations

On this installment of our weekly Cassandra Lunch, we will introduce the Scylla Migrator and show you how you can move Cassandra data with the Scylla Spark Migrator.

Data Engineer’s Lunch #14: NoSQL Databases Part 2 – CAP Theorem

This week, on Data Engineer’s Lunch #14, we cover the fundamental difference between relational and most non-relational databases: ACID vs. BASE.

Data Engineer’s Lunch #13: Introduction to Airflow

This week, on Data Engineer’s Lunch #13, we will cover some resources for getting started with Airflow. Airflow is a Python-based scheduling tool with the ability to connect to a number of different data management tools. We had an overview recently from Will Angel in Data Engineer’s Lunch #4. This session will help beginners learn to use Airflow.

Apache Cassandra Lunch #38: Reading Cassandra SSTables in Apache Spark

On this installment of our weekly Cassandra Lunch, we will discuss a new utility that connects Spark and Cassandra. This tool works by reading Cassandra SSTables directly into a Spark context. This is one of the Data Operations in Cassandra mentioned in Cassandra Lunch 35.

Data Engineer’s Lunch #12: Introduction to sed for Data Engineering

At this week’s edition of our Data Engineer’s Lunch, we will introduce sed, a stream editor, for data engineering. A stream editor is used to perform basic text transformations on an input stream (a file or input from a pipeline). We will also have a demonstration to show some examples of how we can use a command-line tool like sed.

Apache Cassandra Lunch #37: CQL Copy for Data Operations

In this week’s edition of our weekly Cassandra Lunch, we will cover CQL COPY for data operations, including a demo of how to export Cassandra data as CSV and import data from CSV. We will also touch on JSON.

Apache Cassandra Lunch #36: Specialized Databases On Cassandra

In this week’s edition of our weekly Cassandra Lunch, we cover 18 different databases over 3 categories (Timeseries, Graph, Miscellaneous) that use Apache Cassandra for storage.

Data Engineer’s Lunch #11 – Apache Spark Companion Technologies: MLFlow

Join us for another installment of our weekly Data Engineer’s Lunch, hosted every Monday at 12 PM EST. This week’s edition is Part 5 of our series on technologies that work together with Apache Spark. We will be covering MLFlow, a tool by Databricks for managing and cataloging machine learning workflows. We hope you will be able to join us!

Apache Cassandra Lunch #35: Data Operations with Spark and Cassandra

In this week’s edition of our weekly Cassandra Lunch, we cover a few different data operations, which include: using Spark to take data from one Cassandra table, transforming it, and writing it into another Cassandra table, and deleting data from Cassandra tables using Spark.

Data Engineer’s Lunch #10 – NoSQL Databases – Part 1

In Data Engineer’s Lunch #10: NoSQL – Part 1, we discussed NoSQL datastores. Specifically, we discussed different types of key-value stores.

Apache Cassandra Lunch #34: Liquibase and Cassandra

In this week’s edition of our weekly Cassandra Lunch, we discuss how to use Liquibase with Cassandra and DataStax Astra including a live demo.

Data Engineer’s Lunch #9 – Open Source & Cloud Data Catalogs

In Data Engineer’s Lunch #9: Open Source & Cloud Data Catalogs, we discussed data catalogs, which help users keep track of data.

Apache Cassandra Lunch #33: Cassandra Deployment – Ansible and Terraform with Cassandra

In Cassandra Lunch #33, we will be covering automation for Cassandra deployment. We will discuss using Terraform and Ansible to set up the infrastructure for and handle the provisioning of a new Cassandra cluster.

Data Engineer’s Lunch #8 – SQL Databases Part 2

In Data Engineer’s Lunch #8: SQL Databases part 2, we continued our discussion of relational concepts, popular SQL databases, and advantages and disadvantages. We also discuss Cloud Databases and database tools compatible with SQL databases.

Apache Cassandra Lunch #32: Cassandra Data Operations – Common Ways to Move Data in Cassandra

In Cassandra Lunch #32 we revisit Cassandra Lunch #30, covering the basics of Cassandra Data Operations. We discuss the various ways of moving data into and out of Cassandra clusters.

Data Engineer’s Lunch #7 – SQL Databases

In Data Engineer’s Lunch #7: SQL Databases, we discuss relational concepts including the history of RDBMS, the general need for SQL databases, rules of design, and normalization. We also discuss popular SQL databases, and their advantages and disadvantages.

Apache Cassandra Lunch #31: Business Intelligence with Cassandra

In Apache Cassandra Lunch #31: Business Intelligence with Cassandra, we discuss open-source tools that can be used for BI with Cassandra, which include Metabase, Redash, and Superset.

Data Engineer’s Lunch #6: Common Data Formats Used in Data Engineering

In Data Engineer’s Lunch #6: Common Data Formats Used in Data Engineering, we discuss common data formats used in data engineering including text/file and binary formats.

Apache Cassandra Lunch #30: Cassandra & Spark Foundations

In Cassandra Lunch #30, we cover the basics of using Spark and Cassandra together. We discuss the advantages of each and then cover the advantages and potential drawbacks of using them together.

Data Engineer’s Lunch #5: What is a Data Lake?

In Data Engineer’s Lunch #5, we discuss what data lakes are, why we need them, how we get data in and out, and different implementations of data lakes.

Apache Cassandra Lunch #29: Cassandra & Kubernetes Update

In Apache Cassandra Lunch #29: Cassandra & Kubernetes Update, we cover updates regarding Cassandra and Kubernetes after the recent KubeCon event. We also cover DataStax’s Cass-Operator updates, Orange Telecom’s CassKop updates, and the new K8ssandra.

Data Engineer’s Lunch #4: Airflow for Data Engineering

In Data Engineer’s Lunch #4, guest speaker Will Angel covers the topic of using Airflow for data engineering. Airflow is a scheduling tool for managing data pipelines.

Apache Cassandra Lunch #28: Cassandra Backup/Restore Scenarios

In Apache Cassandra Lunch #28, we discussed some methods for restoring data to a Cassandra cluster. We also covered how factors like the topology of a cluster or the need for constant uptime can affect the backup/restore process.

Data Engineer’s Lunch #3: Scripting / Shell Automation for Data Engineering

In Data Engineer’s Lunch #3, we discuss a multitude of tools you can use to do scripting and shell automation for data engineering. We cover different shells, cron, and various command-line tools with resources and examples.

Apache Cassandra Lunch #27: Cassandra on Baremetal / Virtual Machines / Containers

In Apache Cassandra Lunch #27: Cassandra on Baremetal / Virtual Machines / Containers, we cover the different ways to deploy Cassandra, whether on bare metal, VMs, or containers, as well as the pros, cons, and deployment tools for each.

Data Engineer’s Lunch #2: Common ETL Frameworks

In Data Engineer’s Lunch #2, we discuss common ETL frameworks. We discuss different tools and frameworks for different languages including Python, Java, Scala, .NET, and Node.

Apache Cassandra Lunch #26: Cassandra Troubleshooting with Logs

In Cassandra Lunch #26 we discussed common Cassandra log warnings and errors. We discussed the various resources that a Cassandra cluster needs, and how we can find problems with those resources via the logs generated by the cluster.

Data Engineer’s Lunch #1: Data Engineering Road-map

In the first Data Engineer’s Lunch, we cover the data engineering roadmap. We cover the general path, which includes various technologies for programming, scripting/automation, databases, data processing, scheduling, clouds, and infrastructure.

Apache Cassandra Lunch #25: Cassandra Use Cases – Reference Architectures

In Cassandra Lunch #25: Cassandra Use Cases – Reference Architectures, we cover how Cassandra is used for real-time data platforms, as well as different reference architectures in which Cassandra is and can be used.

Apache Cassandra Lunch #24: Cassandra Use Cases

In Apache Cassandra Lunch #24, we discuss a number of use cases for Cassandra, focusing on Cassandra’s place in running a digital business technology platform.

Apache Cassandra Lunch #23: Lucene Based Indexes on Cassandra

In Cassandra Lunch #23: Lucene Based Indexes on Cassandra, we cover packaged and DIY methods for Lucene-based indexes on Cassandra and give some pros and cons of using them.

Apache Cassandra Lunch #22: Cassandra Deployment and Administration Tools

In Cassandra Lunch #22, we cover deployment and administration tools for Cassandra. We also discuss a number of tools for the installation, configuration, monitoring, and administration of Cassandra clusters.

Apache Cassandra Lunch #21: Cassandra Stages / Thread Pools

In Cassandra Lunch #21, we discuss Cassandra and Staged Event Driven Architecture with an emphasis on Cassandra stages / thread pools. A video of Cassandra Lunch #21 is also embedded in this blog.

Apache Cassandra Lunch #20: Cassandra Read / Write Path

In Cassandra Lunch #20, we discuss Cassandra’s read and write paths, which are how Cassandra stores and retrieves data at high speeds. We won’t cover how Cassandra replicates data because that is its own subject, but we will take a look at these four sub-topics: Write Path, Update / Delete, Maintenance Path, and Read Path.

Apache Cassandra Lunch #19: Combined Use of Relational Databases and Cassandra

In Cassandra Lunch #19, we cover the combined use of relational databases and Cassandra. We will discuss the advantages of using relational databases and Cassandra separately, before covering the advantages and methods for using both concurrently.

Apache Cassandra Lunch #18: Connecting Cassandra to Kafka

In Cassandra Lunch #18, we have our guest speaker, Ryan Quey, a full-stack data engineer, who specializes in managing and manipulating data at scale and integrating that into apps, from front to back. Ryan discusses a personal project he has been working on called java-podcast-processor, which is a tool to find podcast metadata over an external API, store them, get their RSS feeds, and run ETL using Airflow, Kafka, Spark, and Cassandra. The particular Cassandra distribution used is Elassandra, which allows seamless integration with Elasticsearch. The data is also displayed using a Gatsby app and served using Flask.

Apache Cassandra Lunch #17: Tombstones

In Cassandra Lunch #17, we discussed deletion and tombstones in Cassandra.

Apache Cassandra Lunch #16: Cassandra Anti-Entropy, Repair, and Synchronization

In Cassandra Lunch #16, we discuss Cassandra anti-entropy, which is the process of comparing the data of all replicas and updating each replica to the newest version. We also look at repair and synchronization in Cassandra and how you can prepare for the unexpected.
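The idea of comparing replicas and bringing each one up to the newest version can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Cassandra’s implementation: each replica is a hypothetical dictionary mapping a key to a `(timestamp, value)` pair, the newest timestamp wins, and every row is compared directly, whereas Cassandra’s repair compares Merkle trees built over token ranges to avoid shipping all the data.

```python
from typing import Dict, List, Tuple

Versioned = Tuple[int, str]      # (write timestamp, value); hypothetical layout
Replica = Dict[str, Versioned]   # key -> newest version this replica has seen


def reconcile(replicas: List[Replica]) -> None:
    """Update every replica in place so it holds the newest version of each key."""
    newest: Replica = {}
    # Pass 1: find the version with the highest timestamp for each key.
    for replica in replicas:
        for key, versioned in replica.items():
            if key not in newest or versioned[0] > newest[key][0]:
                newest[key] = versioned
    # Pass 2: overwrite stale or missing entries on every replica.
    for replica in replicas:
        replica.update(newest)


replica_a = {"user:1": (10, "alice"), "user:2": (5, "bob")}
replica_b = {"user:1": (12, "alicia")}
replica_c = {"user:2": (7, "robert"), "user:3": (1, "carol")}

reconcile([replica_a, replica_b, replica_c])
print(replica_a == replica_b == replica_c)
```

After the call, all three dictionaries hold the same data: the newest write for `user:1` and `user:2`, plus the row that only one replica had.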

Apache Cassandra Lunch #15: Cassandra Backup / Restoration

In Cassandra Lunch #15, we discuss Backup / Restoration. We discuss disaster avoidance, disaster recovery, and different tools that can be used for backup and restoration of your Cassandra data. Also, we discuss an example scenario of how someone has set up multi-node clusters and how they go about data backup and restoration.

Apache Cassandra Lunch #14: Basic Log Diagnostics with ELK/FEK/BEK

In Cassandra Lunch #14, we discuss methods for finding and diagnosing issues in Cassandra clusters.

Apache Cassandra Lunch #13: Cassandra JumpStart Projects

In Cassandra Lunch #13, we go over a number of projects and platforms that you can use to jumpstart your Cassandra projects. They make useful educational resources as well as good starting codebases for new projects. We also discuss a recent article on the Yugabyte blog about Cassandra.

Apache Cassandra Lunch #12: Cassandra & Kubernetes

In Cassandra Lunch #12, we go over Kubernetes, discussing what it is and how it works with Docker and Cassandra. We also looked at some of Kubernetes’ competitors and a variety of open-source tools for Kubernetes which will give you an insight as to why we picked Kubernetes to be a worthwhile investment when working with databases.

Apache Cassandra Lunch #11: Different Cassandra Distributions and Variants

In Cassandra Lunch #11, we discuss various Cassandra distributions ranging from Cassandra / Cassandra Compliant Databases on JVM, Cassandra Compliant Databases on C++, Cassandra as a Service / Managed Cassandra Based on Open Source Cassandra, and Cassandra as a Service / Managed Cassandra Based on Proprietary Technology.

Apache Cassandra Lunch #10: Cassandra 4.0

In Cassandra Lunch #10, we dive into the upcoming features of Cassandra 4.0.

WEBINAR: Databricks and DataStax Astra: Running a Databricks Notebook Against Astra

This webinar is Part 2 of our Databricks and DataStax Astra series. In Part 1, we discussed how to connect Databricks and DataStax Astra using the Databricks Community Edition. In Part 2, we will expand upon Part 1 and take a deeper look at Databricks notebooks and features. We will create a notebook that will extract data from our Astra database, transform it, and write it back into Astra while also exploring features that Databricks provides in the Community Edition.

WEBINAR: Databricks and DataStax Astra: Connecting Databricks and DataStax Astra

WEBINAR: Databricks and DataStax Astra: Connecting Databricks and DataStax Astra | Tuesday December 22 at 6:00 PM | Online Event

This webinar is Part 1 of our Databricks and DataStax Astra series. In Part 1, we will discuss how to connect Databricks and DataStax Astra using the Databricks Community Edition, and also verify the connection with a quick check using a Databricks notebook. We will discuss the nuances that occur in connecting Databricks and DataStax Astra that you may not encounter when connecting open-source Apache Spark and DataStax Astra, so be sure to tune in.

WEBINAR: Apache Spark Companion Technologies: Distributed ML Frameworks

WEBINAR: Apache Spark Companion Technologies: Distributed ML Frameworks | Thursday December 17 at 10:30 AM | Online Event

This webinar is Part 4 in our series on technologies that work together with Apache Spark. We will be covering distributed versions of popular machine learning libraries. Databricks offers a way to make use of Spark’s distributed computing capabilities alongside familiar machine learning libraries, and we will try to connect those with base Apache Spark instead.

WEBINAR: Apache Spark Companion Technologies: Data Lakes

WEBINAR: Apache Spark Companion Technologies: Data Lakes | Tuesday December 15 at 10:30 AM | Online Event

In the third part of our discussion of technologies that work with Apache Spark, we will be covering data lakes. Data lakes are for long-term data storage. Databricks offers data lake integration as well as Delta Lake, a utility for managing connections to data lakes. In Part 2, we discussed connecting Spark to notebooks. Part 1 was an overview of the different Databricks features that we will try to replicate in this series.

WEBINAR: Counting in Datastax Enterprise

WEBINAR: Counting in Datastax Enterprise | Monday November 30 at 12:00 PM | Online Event

In this webinar, we will discuss a number of different ways to get row counts in DataStax Enterprise. We will review built-in aggregation and Spark SQL integrations, as well as potential counters and UDAs. This webinar will include a short demo of this material. We hope to see you there!

WEBINAR: Running a Spark Job on DataStax Astra

WEBINAR: Running a Spark Job on DataStax Astra | Tuesday November 24 at 6:00 PM | Online Event

Following our previous webinar in this series, Connecting Apache Spark and DataStax Astra, we will be discussing how to run a Spark job on DataStax Astra with a live demo using Gitpod, Spark-Submit, SBT, and Scala. We hope to see you there!

WEBINAR: Apache Spark Companion Technologies: Notebooks

WEBINAR: Apache Spark Companion Technologies: Notebooks | Thursday November 19 at 6:00 PM | Online Event

In this webinar, we will discuss connecting to Apache Spark using notebooks. This is part of our series on attempting to replicate the functionality of Databricks using open source technologies. This webinar will contain a discussion of notebook tools and also some discussion of PySpark and the functionality of Spark Core. We hope to see you there!

WEBINAR: Connecting Apache Spark and DataStax Astra

WEBINAR: Connecting Apache Spark and DataStax Astra | Tuesday November 17 at 6:00 PM | Online Event

Join us for a live demo where we will review how to connect Apache Spark and DataStax Astra using Gitpod and Spark-Shell.

WEBINAR: Databricks and Apache Spark

WEBINAR: Databricks and Apache Spark | Thursday October 29 at 6:00 PM | Online Event

Databricks is a cloud analytics platform built around Apache Spark. It provides many resources for data exploration and management. As well as providing access to Apache Spark’s Spark Streaming and MLlib capabilities, Databricks notebook functionality also provides access through Spark’s interfaces with various programming languages, including R, Python, and SQL. In this webinar, we will discuss some of the functionality that Databricks provides and how that compares to available Apache Spark libraries or cluster functionality.

WEBINAR: Exploring DataStax Astra’s GraphQL API

WEBINAR: Exploring DataStax Astra’s GraphQL API | Tuesday October 27 at 6:00 PM | Online Event

In this webinar, we will introduce and explore DataStax Astra’s GraphQL API. The DataStax Astra GraphQL API allows us to easily interact with our data using GraphQL types, queries, and mutations. For every table in our keyspace, a series of GraphQL objects are generated, along with queries and mutations that allow us to search and modify the table data.

WEBINAR: Exploring DataStax Astra’s REST API

WEBINAR: Exploring DataStax Astra’s REST API | Tuesday October 20 at 6:00 PM | Online Event

In this webinar, we will introduce and explore DataStax Astra’s REST API. We can use the REST API that Astra provides to create, update, and delete tables, columns, and rows in our Astra keyspace.

#Boss: Going It Alone in Data Science

#Boss: Going It Alone in Data Science | Thursday October 15 at 5:30 PM | Online Event

Have you considered striking out on your own as an independent consultant, solopreneur, or founder? Join us during Data Week DC 2020 (https://www.dataweekdc.com/) to learn from four Data Community DC leaders who answered this question with a resounding yes! They’ll share their origin stories, professional highlights, lessons learned and advice for those considering taking the leap—including how DC2 can be a tool in your arsenal as you build your business and brand.

WEBINAR: Monitoring Cassandra with DSE OpsCenter

WEBINAR: Monitoring Cassandra with DSE OpsCenter | Thursday October 15 at 6:00 PM | Online Event

DSE OpsCenter monitors and manages instances of DataStax Enterprise. It is composed of two parts: OpsCenter Monitoring and OpsCenter Lifecycle Manager. In this webinar, we will cover some of the monitoring tools provided by OpsCenter Monitoring, including the monitoring dashboard, alerts, and historical metrics.

WEBINAR: Cassandra as a Service in the Cloud: Astra, Keyspaces, and Cosmos DB

WEBINAR: Cassandra as a Service in the Cloud: Astra, Keyspaces, and Cosmos DB | Thursday September 24 at 6:00 PM | Online Event

In this webinar, we will discuss 3 different ways you can do Cassandra in the cloud. We will introduce and discuss the differences between DataStax Astra, AWS Keyspaces, and Microsoft Cosmos DB. We will also demo how to get started with these services if you want to learn how to quickly spin up a database in the cloud!

Intro to Datastax Enterprise Graph

WEBINAR: Intro to Datastax Enterprise Graph | Wednesday September 23 at 6:00 PM | Online Event

DSE Graph is a NoSQL database tool that is optimized for storing objects and their relationships, similar to the graph data structure used in graph theory. In this webinar, we will introduce and learn some of the basics of DSE Graph.

WEBINAR: Getting Started with Terraform and Kubernetes on AWS Part 2

WEBINAR: Getting Started with Terraform and Kubernetes on AWS Part 2 | Tuesday September 22 at 6:00 PM | Online Event

In the first part of this series, we introduced you to Terraform and walked you through how to get started using the tool on AWS. In this webinar, we’ll continue that demo and walk you through launching a Kubernetes cluster.

WEBINAR: Open Source BI Tools and Cassandra: Doing SQL and Reporting on Cassandra

WEBINAR: Open Source BI Tools and Cassandra: Doing SQL and Reporting on Cassandra | Thursday September 17 at 6:00 PM | Online Event

In this webinar, we will wrap up our “Doing SQL and Reporting on Apache Cassandra with Open Source Tools” series. We will cover BI tools like Metabase, Redash, and Apache Superset with a demonstration using Metabase with the Presto connector to connect with Cassandra. If you missed parts 1-3 of this series, they are linked below. Part 1 may be helpful since we will be using Presto in this webinar’s demo!

WEBINAR: Spark and Cassandra for Machine Learning: Model Deployment

WEBINAR: Spark and Cassandra for Machine Learning: Model Deployment | Wednesday September 16 at 6:00 PM | Online Event

In Part 6 of our series on Machine Learning with Spark and Cassandra, we will be discussing model deployment. At the end of our machine learning process, we end up with a trained and tested model that we are sure performs to our requirements. Deployment is the process by which that model is made to do the actual processing that it was designed for.

WEBINAR: Introduction to Ansible

WEBINAR: Introduction to Ansible | Tuesday September 15 at 6:00 PM | Online Event

In this webinar, we are going to introduce you to Ansible, an automation tool that can transform how your company operates. We will cover what it is and how and why it is used, then walk you through a quick demo to demonstrate its usefulness and simplicity.

WEBINAR: Open Source Notebooks and Cassandra

WEBINAR: Open Source Notebooks and Cassandra | Thursday August 27 at 6:00 PM | Online Event

Join us on August 27th 2020, for a webinar focused on open source notebooks that you can use to do SQL on Cassandra. If you are working with Cassandra in any capacity, we believe you will find this webinar very informative. Registration is limited so please register ASAP. We hope to see you there!

WEBINAR: Spark and Cassandra for Machine Learning: Model Selection Tests

WEBINAR: Spark and Cassandra for Machine Learning: Model Selection Tests | Wednesday August 26 at 6:00 PM | Online Event

This is Part 5 of our series on Machine Learning with Spark and Cassandra. This time, we will be discussing model selection tests, which are used to determine which of two trained machine learning models performs better on our datasets. The point of model selection tests is to predict which model will generalize better to unseen data, and thus comparisons of single test results are not enough.

WEBINAR: Running Cassandra on Kubernetes with Terraform

WEBINAR: Running Cassandra on Kubernetes with Terraform | Tuesday August 25 at 6:00 PM | Online Event

In this webinar, I’ll be combining my two previous webinars together and walking you through how to run Cassandra on Kubernetes. If you would like to take some time and view previously hosted webinars, please visit our Anant YouTube playlists and check out “Webinars”. We hope you’ll be able to join us!

WEBINAR: Getting Started with Terraform and Kubernetes on AWS

WEBINAR: Getting Started with Terraform and Kubernetes on AWS | Thursday August 20 at 6:00 PM | Online Event

In this webinar, I’ll introduce you to using Terraform to spin up a Kubernetes cluster on Amazon Web Services. We will be creating an EKS cluster and an auto-scaling group of workers for the cluster. If you’re interested in Terraform, this webinar will be beneficial to you. We hope to see you there!

WEBINAR: Spark and Cassandra

WEBINAR: Spark and Cassandra | Wednesday August 19 at 6:00 PM | Online Event

In this webinar, we will introduce Spark and give some background information about what it is and what it can do. We will also discuss how to connect Spark to Cassandra for users who want to use Spark to do SQL queries on a NoSQL database like Cassandra.

WEBINAR: Spark and Cassandra for Machine Learning: Cross-Validation

WEBINAR: Spark and Cassandra for Machine Learning: Cross-Validation | Tuesday August 18 at 6:00 PM | Online Event

Cross validation is a collection of methods for repeated training and testing of our machine learning models, in order to learn more than simple testing can tell us. These tests can help us tune our model parameters before any final evaluation takes place and we try to move forwards to deployment. We are planning to cover: Train/Test/Validation Split, K-Fold Cross-Validation, Leave One Out Cross-Validation, and Nested Cross-Validation.
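To make the K-Fold idea concrete, here is a minimal pure-Python sketch of how the sample indices can be partitioned into folds. The helper name `k_fold_indices` is hypothetical and is not the API of Spark or any other library. It does not shuffle the data, which a real workflow usually would, and setting `k` equal to the number of samples gives Leave One Out Cross-Validation.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first `extra` folds so sizes differ by at most 1.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(10, 3))
for train, test in folds:
    # Each sample appears in exactly one test fold across the k rounds.
    print(len(train), test)
```

A model would be trained on each `train` list and scored on the matching `test` list, and the k scores are then averaged to estimate how well the model generalizes.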

WEBINAR: Spark and Cassandra for Machine Learning: Testing/Validation

WEBINAR: Spark and Cassandra for Machine Learning: Testing/Validation | Thursday July 30 at 6:00 PM | Online Event

Testing and validation are important parts of any machine learning process. The basics may seem obvious, but specific testing schemes and validation strategies can help ensure that you are getting the most out of your data while preventing you from creating models that are useless on your real data. The previous installments in this webinar series (found at blog.anant.us) discussed data pre-processing methods. This installment focuses on how we test the efficacy of our machine learning models and estimate how well they might generalize to real data.

WEBINAR: Presto and Cassandra

WEBINAR: Presto and Cassandra | Wednesday July 29 at 6:00 PM | Online Event

In this webinar, we will introduce Presto and give some background information about what it is and what it can do. We will also discuss how to connect Presto to Cassandra for users who want to use Presto to do SQL queries on a NoSQL database like Cassandra.

WEBINAR: Getting Started with Terraform & Cassandra on AWS

WEBINAR: Getting Started with Terraform & Cassandra on AWS | Tuesday July 28 at 6:00 PM | Online Event

This webinar will introduce you to Terraform and how you can use it to spin up an Apache Cassandra instance on AWS. We’ll be going over the basics of this infrastructure software to show you the potential it has when it comes to maintaining and deploying your environments.

WEBINAR: Netlify Serverless Lambda Functions Vs. Cloud Functions for Firebase

WEBINAR: Netlify Serverless Lambda Functions Vs. Cloud Functions for Firebase | Friday June 26 at 6:00 PM | Online Event

During this webinar, we will survey and compare Netlify Serverless Lambda functions with Firebase cloud functions. We hope to see you there!

WORKSHOP: Building a REST API with Cassandra using Python and Node

WORKSHOP: Building a REST API with Cassandra using Python and Node | Wednesday June 24 12:00 PM – 2:00 PM | Online Event

REST APIs have been used for years to create easy-to-use and flexible application interfaces that don’t require verbose language or heavy objects to operate. In this 2-hour workshop, we’ll combine the ease of REST with the power of Apache Cassandra and bring you through the process of creating a REST API for Cassandra using Python and Node.js. Packed with fully online hands-on exercises, it will let you jump right in and level up your game.

WEBINAR: Generating CRUD Swagger API Docs on Gitpod with Node.js, and Express

WEBINAR: Generating CRUD Swagger API Docs on Gitpod with Node.js, and Express | Tuesday June 23 at 6:00 PM | Online Event

In this webinar, we will discuss how to generate Swagger API docs for CRUD operations using basic endpoints written in Node.js with the Express.js framework. We will demo this on Gitpod, which is an online open-source IDE based on Eclipse Theia.

WEBINAR: Spark and Cassandra for Machine Learning: Data Preprocessing

WEBINAR: Spark and Cassandra for Machine Learning: Data Preprocessing | Monday June 22 at 6:00 PM | Online Event

Join us as we walk through the first actual step on the road to a complete machine learning model: dealing with our data. We will learn how to load data into our training environment, as well as some of the different ways that data may have to be manipulated before we can use it to train our models.

WEBINAR: Building a Marketing Site Using ButterCMS Pt. 1

WEBINAR: Building a Marketing Site Using ButterCMS Pt. 1 | Thursday Jun 18 at 6:00 PM | Online Event

In this webinar, we’ll cover what a headless CMS is, and why it’s increasingly popular. We’ll take a look at how it stacks up against a traditional CMS, so you can weigh which kind of CMS is right for your business needs.

WEBINAR: Become a Sitecore Entrepreneur by Partnering with Studios

WEBINAR: Become a Sitecore Entrepreneur by Partnering with Studios | Thursday May 21 at 6:00 PM | Online Event

Join Sitecore MVP, Eric Ramseur, in a discussion on becoming a Sitecore Entrepreneur. Learn from past and current engagement experiences to discover new methods to elevate your career. Ask questions to plan for the future of working in Sitecore.

WEBINAR: Building Node & Python REST APIs w/ Cassandra on Gitpod and Datastax Astra

WEBINAR: Building Node & Python REST APIs w/ Cassandra on Gitpod and Datastax Astra | Thursday May 07 at 6:30 PM | Online Event

DataStax Astra provides the ability to develop and deploy data-driven applications with a cloud-native service, without the hassles of database and infrastructure administration. In this webinar, we are going to walk you through creating a REST API and exposing that to your Cassandra database. We hope to see you there!

WEBINAR: Creating a REST API with Datastax Astra Cassandra as a Service

WEBINAR: Creating a REST API with Datastax Astra Cassandra as a Service | Thursday April 30 at 6:30 PM | Online Event

WEBINAR: Deliver the Sitecore Platform like an Award Winning Production

WEBINAR: Deliver the Sitecore Platform like an Award Winning Production | Friday April 24 at 6:00 PM | Online Event

Join Sitecore MVP Eric Ramseur from ANANT Corp for a discussion on selling and delivering Sitecore using a Movie Studio Approach. Attendees will review current Sitecore Business Development strategies and discuss ways to chart the Future of Work in the Sitecore Industry. Please RSVP to ensure your spot in our webinar!

WEBINAR: How to Recruit a Sitecore Professional

WEBINAR: How to Recruit a Sitecore Professional | Thursday March 19 at 6:00 PM | Online Event

Join Sitecore MVP, Eric Ramseur, in an important discussion on effectively recruiting a Sitecore Professional. We will discuss talent acquisition scenarios from the Sitecore Community and identify ways Recruiters can provide the best value for our members and their clients. The primary audience for this webinar is Recruiters; however, Sitecore Professionals, Consultants, and Community members are welcome to add value to the discussion.

WEBINAR: How Sitecore Helps Organizations with their Digital Transformation

WEBINAR: How Sitecore Helps Organizations with their Digital Transformation | Thursday February 27 at 6:00 PM | Online Event

Join Sitecore MVP Eric Ramseur for an online discussion on the ways in which Sitecore can improve customer experience and lead your organization’s Digital Transformation. Learn from past experiences on licensing, feature development budgets, cloud deployments and more!! Get your questions answered so that your company can get the most return out of your investment.

Moving from a Relational Database to Cassandra: Why, Where, When and How

Moving from a Relational Database to Cassandra: Why, Where, When and How | Wednesday November 20 at 5:30 PM | Anant DC Office, 2315 Pennsylvania Ave NW, Suite 301, Washington, D.C. 20037

Join us on Wednesday, November 20th at 5:30 PM for an informative event focused on everything you need to know about moving from a relational database to Cassandra. We will be covering the why, where, when, and how.

Business Platform Strategy Breakfast

Business Platform Strategy Breakfast | Friday November 08 at 7:30 AM | Anant DC Office, 2315 Pennsylvania Ave NW, Suite 301, Washington, D.C. 20037

We’d love to have you join us on November 8th, 2019 from 7:30 AM to 8:45 AM for our monthly Business Platform Strategy Breakfast. This is a great opportunity to connect with other technology professionals and discuss ideas. We hope to see you there!

Moving from a Relational Database to Cassandra: Why, Where, When and How

Moving from a Relational Database to Cassandra: Why, Where, When and How | Tuesday October 22 at 5:30 PM | Live Webinar

Join us on Tuesday, October 22nd at 5:30 CT to learn everything you need to know about moving from a relational database to Cassandra. We will be covering the why, where, when, and how.

DataStax Graph Workshop and Hands-On Lab Got Graph?

DataStax Graph Workshop and Hands-On Lab Got Graph? Get More from Your Connected Data | Thursday October 3 at 12:00 PM | George Washington University, 1957 E Street NW, Washington, DC 20052, Room 113

DataStax experts are hosting a half-day workshop to learn how to get more value from your data by leveraging a relationship-first approach with a distributed graph database. Get an introduction to graph technology, followed by a technical hands-on lab session.

Business Platform Strategy Breakfast – Data Edition

Business Platform Strategy Breakfast – Data Edition | Friday October 4 at 7:30 AM | Anant DC Office, 3 Washington Circle NW, Suite 301, Washington, D.C. 20037

We’d love for you to join us for the Data Edition of our monthly Business Platform Strategy Breakfast! This is a great opportunity to meet other platform architects and discuss issues you’re encountering.

How to Build a Multi-DC Cassandra Cluster in AWS w/ OpsCenter LCM

How to Build a Multi-DC Cassandra Cluster in AWS w/ OpsCenter LCM | Wednesday September 25 at 5:30 PM | Online Event

Learn how to implement the powerful NoSQL database Cassandra in the AWS cloud using the DataStax distribution and its OpsCenter Lifecycle Manager (LCM) software.

Survey of Real-time Data Platforms (Cassandra, Spark, Akka, Kafka, etc.)

Survey of Real-time Data Platforms (Cassandra, Spark, Akka, Kafka, etc.) | Thursday September 19 at 6:30 PM | Anant DC Office, 3 Washington Circle NW, Suite 301, Washington, D.C. 20037

Learn how to efficiently manage your data using the following real-time data platforms: Cassandra, Spark, Akka, and Kafka.

DataStax – Automation Training + Lunch

DataStax – Automation Training + Lunch | Tuesday September 17 at 12:00 PM | Anant DC Office, 3 Washington Circle NW, Suite 301, Washington, D.C. 20037

Interested in getting some hands-on experience with DataStax? Join us for our DataStax Automation Training!

Business Platform Strategy Breakfast

Business Platform Strategy Breakfast | Friday September 6 at 7:30 AM | Anant DC Office, 3 Washington Circle NW, Suite 301, Washington, D.C. 20037

We’d love for you to join us on September 6, 2019, from 7:30 AM to 8:45 AM for our monthly Business Platform Strategy Breakfast.

An Overview of Spreadsheet and CRM Integrations with Zapier

An Overview of Spreadsheet and CRM Integrations with Zapier | Wednesday August 21 at 3:00 PM | Online Event

Join us for a quick demonstration on how to leverage Zapier to automate tasks that cost your company time and money.

Survey of Real-time Data Platforms (Cassandra, Spark, Akka, Kafka, etc.)

Survey of Real-time Data Platforms (Cassandra, Spark, Akka, Kafka, etc.) | Wednesday August 7 at 5:30 PM | Wavicle Data Solutions HQ

Learn how to efficiently manage your data using the following real-time data platforms: Cassandra, Spark, Akka, and Kafka.

How to Build a Multi-DC Cassandra Cluster in AWS w/ OpsCenter LCM

How to Build a Multi-DC Cassandra Cluster in AWS w/ OpsCenter LCM | Thursday August 1 at 6:30 PM | Phillips Hall

Learn how to implement the powerful NoSQL database Cassandra in the AWS cloud using the DataStax distribution and its OpsCenter Lifecycle Manager (LCM) software.

DataStax Accelerate

DataStax Accelerate | Thursday May 23 at 9:00 AM | Gaylord National Resort & Convention Center, Maryland

Join us at the world’s premier Apache Cassandra™ conference to learn from your peers, industry experts, and community leaders about how you can transform modern enterprise applications on any cloud at scale.

Intelligent Graphs: A Look at the Future of Enterprise AI

Webinar – C# Corner Chicago Chapter | Saturday December 22 at 10:00 AM | Online Webinar

Rahul Singh has been working in the Internet IT industry for over 20 years and specializes in designing Digital Business Technology Platforms. His company specializes in creating and integrating Data & Analytics Platforms with Information Systems and Customer Experience Platforms. He has consulted for several large organizations, such as McDonald’s, Kroger, USPS, and USPTO, that use Apache Cassandra, Apache Spark, Apache Kafka, and Apache Solr. Rahul will show the audience how to build the large-scale global data & analytics platforms needed by Fortune 500 companies and fast-growing technology startups with millions or even billions of customers worldwide.

Intelligent Graphs: A Look at the Future of Enterprise AI

Meetup | Tuesday April 10 at 6:30 PM | George Washington University, Funger Hall, Room 108, 2201 G St. NW, Washington, DC

DVDC is happy to host its 1Q Roundup in partnership with our friends at Data Science DC (https://www.meetup.com/Data-Science-DC/), Columbia GraphDB (www.meetup.com/Columbia-GraphDB-MeetUp), and GraphDB Baltimore-Washington (www.meetup.com/graphdb-baltimore/).

Data Modeling in Cassandra: Avoiding Tombstones, Wide Partitions, and Data Skew

Meetup | Tuesday February 20 at 7:00 PM | Eastern Foundry Rosslyn – 1100 Wilson Blvd, 10th Floor, Arlington, VA

Cassandra is the easiest big data (truly big data) open source database to get started with. Whether you are using Apache Cassandra or DataStax Enterprise Cassandra, there are some pitfalls that new Cassandra developers, admins, and architects tend to fall into, and they are easy to avoid.

Data Wranglers DC – Data Processing for Machine Learning / AI / Natural Language Processing

Meetup | Tuesday February 13 at 6:00 PM | Eastern Foundry Rosslyn – 1100 Wilson Blvd, 10th Floor, Arlington, VA

In this meetup, three speakers will cover topics ranging from data wrangling for Natural Language Processing (NLP) to processing data for time-series analysis using machine learning and harnessing crowdsourced data.

Co-Meetup – Data Wranglers/Cassandra DC – Indexing Options in Cassandra

Meetup | Thursday September 14 at 6:30 PM | Eastern Foundry Rosslyn – 1100 Wilson Blvd, 10th Floor, Arlington, VA

In this Meetup, we will look at the indexing options available in Apache Cassandra and how indexes are created. Indexing options have greatly improved in the most recent versions of Cassandra. We will discuss when to use indexes and when not to use them, along with performance considerations.

COO Training – Supporting The Modern Business – Cloud / IoT / Big Data / Data Science

Workshop | Saturday September at 10:30 AM | Anant DC Office

Join Tom Meylan of Digital Clones and Rahul Singh of Anant to learn more about bringing people and technology together within an organization. Digital Clones develops system-based training materials for high-performance leadership teams. Anant provides modern enterprise consulting and data engineering experience, knowledge, and innovation to Internet Teams and Internet Software Teams.

Scaling Cloud Applications Using Docker

Meetup | Tuesday September 4 at 6:30 PM | Solution Street Herndon Office

In this talk, Rahul Singh, CEO at Anant Corporation, will give a presentation on Scaling Cloud Applications using Docker. He will cover the current popular technologies of Kafka, Spark, Cassandra, and Docker as a part of a growing landscape of distributed computing tools.

Weekend Cowork

Cowork | Saturday August 5 at 9:00 AM to 4:00 PM | Anant Labs, 1010 Wisconsin Avenue NW, Suite 250, Washington, DC

Want to get work done on your startup on a Saturday but don’t have a subscription to a coworking space or don’t like working in a Starbucks? Come by our office for a cowork session!

Data Processing for Search and Information Retrieval

Meetup | Tuesday July 18 at 6:30 PM | George Washington University, Funger Hall, Room 108, 2201 G St. NW, Washington, DC

Deploying machine learning onto enterprise search required an entire team of data scientists, developers and database experts to manage the complex machine learning algorithms and deploy them in search systems. Until now.

Data Wranglers DC – Data Wrangling & Visualization of Public / Government Data

Meetup | Tuesday May 9 at 6:30 PM | Eastern Foundry Rosslyn – 1100 Wilson Blvd, 10th Floor, Arlington, VA

This presentation provides an overview of how tools such as the National Water Information System (NWIS) can be used for continuous and discrete hydrologic data maintenance and exploratory analysis to better provide scientific views and insights at various resolutions.

Baltimore IIBA Chapter – Software Algebra

In-Person Presentation | Tuesday April 11 at 6:00 PM | UMBC Training Center, 6996 Columbia Gateway Drive, Suite #100

How to define and create business applications and systems in the modern era by connecting and assembling available online and open source tools.

WebTech Conference – Software Algebra

In-Person Presentation | Thursday March 30 at 5:30 PM | Iron Yard DC, 1341 G Street, Floor 2

WebTech is a 4-hour conference on web technology trends and techniques, specifically focused on the people who build, manage and design web properties.

Learning Docker – Containerizing Software, Platform & Infrastructure

In-Person Workshop | Saturday December 17 2016 at 9:30 AM | Anant Offices, 1010 Wisconsin Ave. NW #250, Washington, DC 20007

Data Wranglers DC – Data Processing with Zeppelin / Solr / Spark / NiFi

Meetup | Wednesday December 14 2016 at 6:30 PM | George Washington University, Funger Hall, Room 108, 2201 G St. NW, Washington, DC

Data Wranglers DC Meetup – Integrating Real-Time Data – Spark and Kafka for Video and Data Stream Analysis

Meetup | Wednesday November 9 2016 from 6:30 PM to 8:00 PM (EDT) | ByteCubed Offices 2231 Crystal Drive, Suite 401, Arlington, VA 22202

Webinar: Organizing Business Information with Enterprise Search & Knowledge Management (B2B)

Webinar | Friday November 4 2016 from 10:00 AM to 10:50 AM (EDT) | Online Webinar

Webinar: Unifying Business Information with Portals and Dashboards (B2B)

Webinar | Friday October 14 2016 from 10:00 AM to 10:50 AM (EDT) | Online Webinar

Data Wranglers DC Meetup – The Business Value of Data Wrangling

Meetup | Tuesday October 11 2016 from 6:30 PM to 9:00 PM | George Washington University, Funger Hall, Room 108, 2201 G St. NW, Washington, DC

Business Strategy Breakfast: Successes and Failures with Big Data Analysis

Webinar: Connecting Online Business Software 101 (B2B)

Webinar | Friday September 16 2016 from 10:00 AM to 10:50 AM (EDT) | Online Webinar

Machine Learning and Graph Processing on Accumulo w/ Spark

Meetup | Wednesday August 10 2016 from 6:30 PM to 9:00 PM | ByteCubed Offices 2231 Crystal Drive, Suite 401, Arlington, VA 22202

The goal of this presentation was to add new tools to the data scientist’s toolbox. We went through the benefits of using Accumulo and Spark as a custom data analytics platform and provided some simple examples to ease processing.

Business Strategy Breakfast: Sweet Spot for Big Data

Business Strategy Breakfast | Friday August 26 2016 from 8:00 AM to 9:30 AM | 700 12th St NW #700, Washington, DC 20005

Webinar: Building Online Business Software 101 (B2B)

Webinar | Friday August 19 2016 from 10:00 AM to 10:50 AM (EDT) | Online Webinar

Native American Contractors Association – Emerging Native Leaders Summit

Educational Program | Tuesday August 16 to Thursday August 18 2016 | Blank Rome Government Relations LLC Offices, Watergate 600 New Hampshire Avenue, NW, Washington, DC 20037

Data Cleansing with Spark / Scala

Meetup | Wednesday August 10 2016 from 6:30 PM to 9:00 PM | GWU, Funger Hall, Room 108 2201 G St. NW, Washington, DC

Asynchronous Data Processing w/ Message Oriented Architecture

Meetup | Wednesday July 13 2016 from 6:30 PM to 9:00 PM | GWU, Funger Hall, Room 108 2201 G St. NW, Washington, DC

Protecting User Privacy with Fuzzy-feeling Test Data

Meetup | Wednesday June 15 2016 from 6:30 PM to 9:00 PM | GWU, Funger Hall, Room 108 2201 G St. NW, Washington, DC

Leveraging Big Data Technologies at the Government

Meetup | Wednesday May 11 2016 at 6:30 PM | GWU, Funger Hall, Room 108 2201 G St. NW, Washington, DC

Organizing People, Process, Information, and Systems for the Modern Era

Meetup | Tuesday May 10 2016 at 6 PM | UMBC Training Center, 6996 Columbia Gateway Drive, Suite #100, Columbia, MD

Is it easier today to start a business, create and market products and provide services to the masses than ever before? It’s a question worth asking. We at Anant believe so.

Private Presentation | Friday May 6 | Fairfax County, VA

Hybrid Approach to Project Management

Meetup | Thursday May 5 2016 at 6:30 PM | 1900 Gallows Rd, Vienna, VA 22182

Business Applications Demo DC Enterprise Tech Meetup

Meetup | Thursday April 28 2016 at 5:30 PM | 1010 Wisconsin Ave. NW #250, Washington, DC 20007

Getting to Continuous, One Iteration at a Time

Private Presentation | Monday April 18 | Washington, DC

Building Enterprise Search with Open Source Components

Meetup | Wednesday March 30 2016 at 6:30 PM | 1445 New York Ave NW, Suite 200, Washington, DC

Mastering Services in the Service of Others

Meetup | Wednesday March 23 2016 at 6:00 PM | 1300 17th St N #1800, Arlington, VA

Comparing Solr/Lucene/Elasticsearch vs. Cloud-based Search Providers

Meetup | Monday February 29 2016 at 6:15 PM | 727 Elden Street, 2nd Floor, Herndon, VA
