One of Apache Spark’s core features is Spark MLlib, a library for doing machine learning in Spark. Most data science education relies on specific machine learning libraries, like scikit-learn. Having data scientists retrain to use Spark MLlib can be an extra cost on top of the data engineering work that needs to be done in the first place just to use Spark. Databricks offers distributed versions of some of these machine learning frameworks as part of the Databricks platform.[Read more…] about Apache Spark Companion Technologies: Distributed Machine Learning Frameworks
What is model deployment?
Model deployment is the process of putting our trained models to work. It involves moving a model somewhere with the resources to do serious processing, and that place also needs the ability to receive or retrieve the data to be processed. We place the trained model within an architecture that delivers data to it for processing, then retrieves and delivers or stores the results so that they can be used or seen by users. Choices also need to be made about whether the model gets retrained, updated, or replaced during operation.[Read more…] about Machine Learning with Spark and Cassandra: Model Deployment
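The deployment loop described above can be sketched in a few lines of plain Python: receive records, run them through a trained model, and store the results for downstream use. This is an illustrative sketch only; the model here is a stand-in function, and in practice the pieces would be a Spark MLlib model, a data source, and a results store such as Cassandra.

```python
def trained_model(features):
    """Stand-in for a trained model: 'predicts' the sum of the features."""
    return sum(features)

def serve(model, incoming_records, results_store):
    """Pull records from a source, score them, and persist the predictions."""
    for record_id, features in incoming_records:
        results_store[record_id] = model(features)
    return results_store

# Score two hypothetical records and keep the predictions for users to query.
store = serve(trained_model, [("a", [1, 2]), ("b", [3, 4])], {})
```

In a real deployment, `incoming_records` would be a stream or batch read and `results_store` a database table, but the shape of the loop is the same.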
Model selection tests are used to determine which of two trained machine learning models performs better. The point of these tests is to predict which model will generalize better to unseen data, so comparing single test results is not enough. Today we will run through a number of different model selection tests, discuss how they work, and explain how to interpret their results.[Read more…] about Spark and Cassandra For Machine Learning: Model Selection Tests
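To illustrate why a single test result is not enough, here is a sketch of one common model selection test: score both models on the same resampled splits and compare the distribution of paired score differences rather than one number. The per-fold scores below are illustrative values, not results from real models.

```python
import statistics

# Hypothetical per-fold accuracy for two models on the same five splits.
model_a_scores = [0.81, 0.79, 0.83, 0.80, 0.82]
model_b_scores = [0.78, 0.80, 0.77, 0.79, 0.76]

# Paired differences: each split scores both models on identical data.
diffs = [a - b for a, b in zip(model_a_scores, model_b_scores)]
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)

# A simple paired t statistic; a larger |t| suggests the gap will
# generalize rather than being split-to-split noise.
t_stat = mean_diff / (sd_diff / len(diffs) ** 0.5)
```

The t statistic would then be compared against a t distribution to decide whether the difference is significant; that interpretation step is what the tests in the post walk through.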
Machine learning is increasingly becoming a part of business platforms. To make full use of it, we need a tool with characteristics similar to our database tools: it needs to be distributed and scalable, and to integrate nearly seamlessly with our data store. Luckily, Spark is a great tool for this purpose. In this post and future ones, we will set up an environment for performing machine learning with Apache Spark and Cassandra, and learn more about machine learning in general.[Read more…] about Spark and Cassandra For Machine Learning: Setup
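As a taste of the setup involved, connecting Spark to Cassandra typically means pointing Spark at the DataStax spark-cassandra-connector and at your cluster. A minimal configuration sketch (the connector version and host below are placeholders for your environment, not values from the post):

```
# spark-defaults.conf — sketch of a Spark-to-Cassandra connection
spark.jars.packages              com.datastax.spark:spark-cassandra-connector_2.12:3.1.0
spark.cassandra.connection.host  127.0.0.1
spark.cassandra.connection.port  9042
```

The same settings can also be passed as `--conf` flags to `spark-submit` or set on the `SparkSession` builder.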
The first part of any machine learning project is gathering data. This sounds easy, and you may think it puts you in the perfect position to work with data you already have in relational databases. In some circumstances that may be correct. However, the way we usually store data in databases for business platforms is suboptimal for machine learning: it requires more work to gain the insights we want from our data.[Read more…] about Database Aggregations for Machine Learning
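The extra work mentioned above is usually aggregation: transactional tables store one row per event, while most models expect one row per entity with summary features. A plain-Python sketch of that reshaping (in the post this would be done with Spark SQL aggregations; the table and column names here are illustrative):

```python
from collections import defaultdict

# One row per order, as a transactional table would store it.
orders = [  # (customer_id, amount)
    ("c1", 20.0), ("c1", 35.0), ("c2", 10.0), ("c1", 5.0),
]

# Aggregate to one feature row per customer: order count and total spend.
features = defaultdict(lambda: {"order_count": 0, "total_spend": 0.0})
for customer_id, amount in orders:
    features[customer_id]["order_count"] += 1
    features[customer_id]["total_spend"] += amount
```

Each aggregated row can then be fed to a model directly, whereas the raw event rows could not.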
This is the fourth part of our “Diving deep into Gartner’s Top 10 Data and Analytics Technology Trends for 2019” blog series. In this series, we explore the top 10 Data & Analytics trends for 2019 identified by Gartner. If you haven’t yet seen the third post, on Continuous Intelligence, check it out here.[Read more…] about Data & Analytics Trend 4 – Explainable AI