Spark MLlib on AWS Glue

Distributed ML on AWS that’s ready to go

AWS promotes SageMaker as its machine learning platform. However, Spark’s MLlib is a comprehensive library that runs distributed ML natively on AWS Glue, which makes it a viable alternative to SageMaker.

One of the big benefits of SageMaker is that it easily supports experimentation via its Jupyter notebooks. But operationalising your SageMaker ML can be difficult, particularly if you need to include ETL processing at the start of your pipeline. In this situation, Apache Spark’s MLlib running on AWS Glue can be a good option: because the model runs as a Glue job, it is operationalised from the start, integrated with ETL pre-processing and ready to be used in production as an end-to-end machine learning pipeline.


AWS Glue is a managed Spark ETL platform for processing large volumes of data across a cluster of machines. MLlib ships as part of Spark 2.4, which is the default version on AWS Glue, so there is no need to add libraries to use MLlib within a Glue job.



towardsdatascience.com
