**Sparkify** is a fictitious music streaming platform. In this project, using **Apache Spark** on IBM Cloud, I analysed two months' worth of data from the platform to build a machine learning model that helps predict customer churn.

Spark is a lightning-fast unified analytics engine for big data and machine learning. When dealing with extremely large datasets, it is very effective to run your analysis on Spark. Spark uses cluster computing for its computational (analytics) power as well as its storage. This means it can draw on the resources of many computer processors linked together for its analytics.

Problem Statement

In this project, I sought to build a machine learning model that helps predict customer churn for **Sparkify**. If effective, this model will help identify customers at risk of churning, so that targeted marketing can be directed at them to prevent churn and save the company a lot of money.

Data Understanding

The Sparkify data consists of **286,500** rows and **18** columns, and spans **2** months: September to October 2018.


Dimensions of Sparkify data
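As a rough sketch of how these dimensions could be checked in PySpark: the helper names `load_events` and `shape` are hypothetical, and both the file path and the JSON format are assumptions, since the post does not say how the data is stored. Swap in `spark.read.csv` or another reader as appropriate.

```python
from typing import Tuple


def load_events(path: str):
    """Read an event log into a Spark DataFrame.

    Assumes the log is stored as JSON; `path` is a placeholder.
    """
    # Imported lazily so `shape` below can be used without a Spark install.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("Sparkify").getOrCreate()
    return spark.read.json(path)


def shape(df) -> Tuple[int, int]:
    """Return (rows, columns), similar to pandas' DataFrame.shape."""
    # count() triggers a Spark job over the data;
    # columns is read from the schema and is cheap.
    return df.count(), len(df.columns)
```

Calling `shape(load_events("path/to/sparkify_events.json"))` on the full dataset would then be expected to return the row and column counts quoted above.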

The columns contain user activity data such as Artist, Status (logged in or not), First Name, Gender, **Level** (paid or free account), Location, Length of time spent on the app per session, etc.



Predicting Customer Churn using Spark