Setting up your Amazon Web Services (AWS) Elastic MapReduce (EMR) Cluster with XGBoost

Introduction

This article assumes you are already familiar with what XGBoost/CatBoost/etc. do and that you are here to actually get them to work.

Installing packages on a local machine or a single node is easy. Doing the same across a cluster in order to work with big data is harder, and that difficulty is the motivation for this article. I will share commands and screenshots to help you follow along.

This article is split into two parts and will teach you how to set up packages so that they are available across all nodes in a cluster environment. In this example, I demonstrate with an installation of XGBoost (eXtreme Gradient Boosting) on an Amazon Web Services (AWS) EMR cluster; however, these instructions generalize to other packages such as CatBoost and PyOD, and to other cloud computing environments.
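To make "available across all nodes" concrete, here is a minimal sketch of one common approach: an EMR bootstrap action, which is a shell script that EMR runs on every node when the cluster launches. This is an assumption for illustration and not necessarily the method this article walks through. The helper name `build_bootstrap_script` is hypothetical, and the `sudo python3 -m pip install` line assumes the nodes have `python3` with pip available.

```python
import shlex


def build_bootstrap_script(packages):
    """Return the text of a shell script that pip-installs `packages`.

    Hypothetical helper: the script is meant to be saved to a file,
    uploaded to S3, and registered as a bootstrap action so that it
    runs on every node in the cluster.
    """
    if not packages:
        raise ValueError("at least one package is required")
    # Quote each name so an unexpected character cannot break the command.
    quoted = " ".join(shlex.quote(p) for p in packages)
    lines = [
        "#!/bin/bash",
        "set -euo pipefail",  # stop at the first failing command
        # Assumption: python3 and pip are present on each node.
        f"sudo python3 -m pip install {quoted}",
    ]
    return "\n".join(lines) + "\n"


script = build_bootstrap_script(["xgboost", "catboost", "pyod"])
print(script)
```

Because the same script runs on every node, the packages end up importable on the workers as well as on the master, which is the property a single-machine install does not give you.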

#aws #machine-learning #xgboost #aws emr

Install XGBoost for AWS EMR Notebook Environment