Data lakes are a tool for long-term data storage. They can be implemented on-premises for use cases that require high security, or in the cloud for more accessible solutions. The Databricks runtime includes code specifically for easing the connection between Spark and data lake technologies, as well as its own companion technology, Delta Lake. Delta Lake makes interacting with data in data lakes easier and more consistent, but it is possible to work with data lakes without it, as we will see today.
In Data Engineer’s Lunch #5: What is a Data Lake?, we discuss what data lakes are, why we need them, how to get data in and out of them, and the different ways a data lake can be implemented. The live recording of the Data Engineer’s Lunch, which includes a more in-depth discussion, is also embedded below in case you were not able to attend live. If you would like to attend Data Engineer’s Lunch in person, it is hosted every Monday at 12 PM EST. Register here now!