Scaling Machine Learning with Spark: Distributed ML with MLlib, TensorFlow, and PyTorch

Publication date:
April 23, 2023
Adi Polak

VP of Developer Experience at Treeverse & Contributing to lakeFS OSS

Learn how to build end-to-end scalable machine learning solutions with Apache Spark. With this practical guide, author Adi Polak introduces data and ML practitioners to creative solutions that supersede today's traditional methods. You'll learn a more holistic approach that takes you beyond specific requirements and organizational goals, allowing data and ML practitioners to collaborate and understand each other better.

Scaling Machine Learning with Spark examines several technologies for building end-to-end distributed ML workflows based on the Apache Spark ecosystem with Spark MLlib, MLflow, TensorFlow, and PyTorch. If you're a data scientist who works with machine learning, this book shows you when and why to use each technology.

You will:

Explore machine learning, including distributed computing concepts and terminology
Manage the ML lifecycle with MLflow
Ingest data and perform basic preprocessing with Spark
Explore feature engineering, and use Spark to extract features
Train a model with MLlib and build a pipeline to reproduce it
Build a data system to combine the power of Spark with deep learning
Get a step-by-step example of working with distributed TensorFlow
Use PyTorch to scale machine learning and its internal architecture

BOOK EPISODE

Scaling Machine Learning with Spark: Distributed ML with MLlib, TensorFlow, and PyTorch

There are several tools for building end-to-end distributed ML workflows based on the Apache Spark ecosystem, such as Spark MLlib, MLflow, TensorFlow, and PyTorch, but building those workflows is far from easy. Adi Polak, author of 'Scaling Machine Learning with Spark', spoke with Holden Karau about creative solutions that supersede traditional methods, as well as when and why to use each technology. Apart from exploring machine learning and distributed computing concepts, you'll learn a holistic approach that goes beyond specific requirements and organizational goals.
