Data Engineer's Lunch #50: Airbyte

In Data Engineer’s Lunch #50: Airbyte for Data Engineering, we discussed Airbyte and how it can be used for data engineering. The live recording, which includes a more in-depth discussion, is embedded below in case you were not able to attend. Data Engineer’s Lunch is hosted live every Monday at noon EST. Register here now!

The session included a live demo. Airbyte is an open-source data integration tool that focuses on EL(T): extracting and loading data, with transformation as an optional later step. Some of the features that Airbyte includes are:

140+ out-of-the-box connectors
Custom or new connectors, access to CDK
Database replication with Change Data Capture
Normalization and custom transformations via dbt
Full-grade scheduler
Real-time monitoring
Incremental updates
Manual full refresh
Integration with Kubernetes and Airflow
Cloud hosting & management
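
To make the difference between "incremental updates" and "manual full refresh" concrete, here is a minimal Python sketch of the general idea. It is not Airbyte's implementation: the function names, the `updated_at` cursor field, and the sample rows are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, object]


def full_refresh(source: List[Record]) -> List[Record]:
    """Re-read every record from the source, ignoring any saved state."""
    return list(source)


def incremental_sync(
    source: List[Record], cursor_field: str, last_cursor: Optional[str]
) -> Tuple[List[Record], Optional[str]]:
    """Return only records newer than the saved cursor, plus the cursor to save next."""
    new_records = [
        r for r in source
        if last_cursor is None or r[cursor_field] > last_cursor
    ]
    # If nothing new arrived, keep the previous cursor unchanged.
    new_cursor = max((r[cursor_field] for r in new_records), default=last_cursor)
    return new_records, new_cursor


# Hypothetical source table with an ISO-formatted "updated_at" column.
rows = [
    {"id": 1, "updated_at": "2022-01-01"},
    {"id": 2, "updated_at": "2022-01-05"},
    {"id": 3, "updated_at": "2022-01-09"},
]

first_batch, state = incremental_sync(rows, "updated_at", None)
rows.append({"id": 4, "updated_at": "2022-01-12"})
second_batch, state = incremental_sync(rows, "updated_at", state)
```

In this sketch the first sync has no saved state, so it behaves like a full refresh. The second sync moves only the row added afterwards, which is what makes incremental runs cheaper on large sources.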

Airbyte exposes all of the streams an API offers and lets you select the specific ones that you want to replicate. Furthermore, you can opt for normalized schemas or raw JSON, and either explode nested API objects into separate tables or keep them as serialized JSON. As mentioned above, Airbyte focuses on the extract and load aspects of ETL, but it also supports data transformations through dbt. Additionally, Airbyte has an API and many recipes to help you get started.
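
The two output shapes mentioned above, serialized JSON versus nested objects exploded into separate tables, can be sketched in plain Python. This is only an illustration of the concept: the `explode` helper, the table naming scheme, and the sample `order` record are assumptions, and Airbyte's own normalization uses its own naming and column conventions.

```python
import json
from typing import Dict, List


def as_serialized_json(record: dict) -> dict:
    """Keep the whole record as one JSON string column."""
    return {"_data": json.dumps(record, sort_keys=True)}


def explode(record: dict, table: str, key: str = "id") -> Dict[str, List[dict]]:
    """Split a record into one parent row plus child tables for nested lists."""
    tables: Dict[str, List[dict]] = {table: []}
    parent: dict = {}
    for field, value in record.items():
        if isinstance(value, list) and all(isinstance(v, dict) for v in value):
            # A list of objects becomes a child table linked by the parent key.
            child_table = f"{table}_{field}"
            tables[child_table] = [
                {f"{table}_{key}": record[key], **child} for child in value
            ]
        elif isinstance(value, dict):
            # A single nested object is flattened into prefixed parent columns.
            for sub_field, sub_value in value.items():
                parent[f"{field}_{sub_field}"] = sub_value
        else:
            parent[field] = value
    tables[table].append(parent)
    return tables


# Hypothetical API record with one nested object and one nested list.
order = {
    "id": 7,
    "customer": {"name": "Ada", "country": "UK"},
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B4", "qty": 1}],
}

raw_row = as_serialized_json(order)
normalized = explode(order, "orders")
```

Here `normalized` holds an `orders` table with one flattened row and an `orders_items` table with one row per item, each carrying an `orders_id` column that points back to its parent.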

In addition to running pipelines, Airbyte provides pipeline visibility in the form of real-time monitoring with error logging, notifications for failed syncs, and debugging autonomy, which lets you modify and debug pipelines yourself without waiting.

Airbyte offers several deployment options:

Local -> Docker (note: some users on Macs with an M1 chip have reported problems running Airbyte)
Airbyte Cloud
AWS -> EC2
GCP -> Compute Engine
Azure -> VM
Kubernetes
DigitalOcean
Oracle -> Cloud Infrastructure VM

As mentioned above, the live recording of Data Engineer’s Lunch #50: Airbyte for Data Engineering includes a demo. In this demo, we spin up Airbyte on Gitpod and build two simple E+L pipelines. The first gets a CSV file from GitHub and stores it locally as JSON. The second does E+L from one instance of PostgreSQL to another instance of PostgreSQL. Be sure to watch the video below!
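
For readers who want a feel for what the first pipeline does conceptually, the following Python sketch extracts rows from CSV text and loads them into a local file as JSON lines. It is not how Airbyte is configured or run. The inline `sample_csv` stands in for the file fetched from GitHub, and the function names and output path are assumptions chosen for illustration.

```python
import csv
import io
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def extract_csv(csv_text: str) -> List[Dict[str, str]]:
    """Extract step: parse CSV text into one dict per row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def load_as_json_lines(records: List[Dict[str, str]], destination: Path) -> int:
    """Load step: write each record as one JSON object per line."""
    with destination.open("w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")
    return len(records)


# Stand-in for the CSV file that the demo pulls from GitHub.
sample_csv = "id,name\n1,alice\n2,bob\n"

records = extract_csv(sample_csv)
out_path = Path(tempfile.gettempdir()) / "airbyte_demo_sketch.jsonl"
written = load_as_json_lines(records, out_path)
```

No transformation happens between the two steps: the rows land in the destination as they were read, which is the E+L pattern the demo shows.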

Data Engineer's Lunch #50: Airbyte for Data Engineering from Anant Corporation

Cassandra.Link

Cassandra.Link is a knowledge base that we created for all things Apache Cassandra. Our goal with Cassandra.Link was not only to fill the gap left by Planet Cassandra but also to bring the Cassandra community together. Feel free to reach out if you wish to collaborate with us on this project in any capacity.

We are a technology company that specializes in building business platforms. If you have any questions about the tools discussed in this post or about any of our services, feel free to send us an email!

2022 Anant Corporation, All Rights Reserved.
All logos, trademarks and registered trademarks are the property of their respective owners.
