Predicting Customer Churn using Spark

**Sparkify** is a fictitious music streaming platform. In this project I used **Apache Spark** on IBM Cloud to analyse two months' worth of data from the platform and build a machine learning model that helps predict customer churn.

Spark is a fast, unified analytics engine for big data and machine learning. When dealing with extremely large datasets, it is very effective to run your analysis on Spark, because Spark uses cluster computing both for its computation and for holding working data in memory. This means it can draw on the processors and memory of many linked computers at once.

Problem Statement

In this project, I set out to build a machine learning model that helps predict customer churn for **Sparkify**. If effective, the model will identify customers at risk of churning, so that targeted marketing can be directed at them to keep them from churning and thus save the company a lot of money.

Data Understanding

The Sparkify data consists of **286,500** rows and **18** columns and spans **2** months: September to October 2018.


Dimensions of Sparkify data

The columns record user activity, for example Artist, Status (whether the user is logged in or not), First Name, Gender, **Level** (paid or free account), Location and Length of time spent on the app per session.



medium.com