“We’re starting the telemarketing campaign on Monday and have the budget for 500 calls. Who should we contact to maximise revenue?”

In this scenario, Bank XYZ had on-boarded 2000 new customers by acquiring a smaller bank, but due to resourcing and budget constraints, only 500 of them could be contacted. Using data from Bank XYZ’s existing customer base and the results of the campaign run on those existing customers, my goal was to identify which 500 of the new customers to contact to maximise revenue, and to provide recommendations for future campaigns.

Business Value

A successful subscription translates to revenue for Bank XYZ. Based on domain knowledge, we estimated the value of each subscription at $100. In general, we know that uptake for this type of campaign is small, with around 10–15% of calls being successful. As such, by contacting 500 customers at random, we would expect revenue to be between $5,000 and $7,500.
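The baseline range follows directly from the figures above. As a quick check, here is a minimal sketch of the arithmetic; the function name `expected_revenue` is only illustrative and does not come from the project code.

```python
# Expected revenue from calling customers at random.
# Figures from the text: 500 calls, $100 per subscription, 10-15% uptake.
CALLS = 500
VALUE_PER_SUBSCRIPTION = 100  # dollars


def expected_revenue(calls: int, success_rate: float, value: float) -> float:
    """Expected revenue = calls x probability of success x value per success."""
    return calls * success_rate * value


low = expected_revenue(CALLS, 0.10, VALUE_PER_SUBSCRIPTION)
high = expected_revenue(CALLS, 0.15, VALUE_PER_SUBSCRIPTION)
print(f"Random calling: ${low:,.0f} to ${high:,.0f}")
```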

However, by using predictive analysis and selecting the 500 customers carefully, we demonstrated that more revenue can be generated for Bank XYZ.

**Spoiler:** By using an XGBoost classifier, we were able to increase revenue to $14,500. Read on to find out how.
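One common way to turn a classifier's output into a call list is to rank customers by predicted probability of subscribing and keep as many as the budget allows. The sketch below assumes that rule; the selection method is not spelled out here, and the scores are randomly generated placeholders standing in for what a fitted model's `predict_proba` would return.

```python
import heapq
import random

BUDGET = 500                   # number of calls we can afford
VALUE_PER_SUBSCRIPTION = 100   # dollars, as estimated above

# Hypothetical scores: customer id -> predicted probability of subscribing.
# In practice these would come from the trained classifier, not random draws.
random.seed(0)
scores = {customer_id: random.betavariate(1, 8) for customer_id in range(2000)}

# Keep the BUDGET customers with the highest predicted probability.
call_list = heapq.nlargest(BUDGET, scores, key=scores.get)

# Expected revenue of the list, valid only if the probabilities are well calibrated.
expected = sum(scores[c] for c in call_list) * VALUE_PER_SUBSCRIPTION
print(f"Calling {len(call_list)} customers, expected revenue ${expected:,.0f}")
```

Ranking by probability is enough here because every subscription is assumed to be worth the same amount; if values differed per customer, the ranking would need to use probability times value instead.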


I have chosen to keep the explanations high-level but do check out my GitHub repository to view the detailed analysis and code in my Jupyter Notebook. There is also an executive summary presentation for stakeholders.

Obtain Data

The data used for this project is the famous bank marketing dataset from the UCI Machine Learning Repository. My first step was to do a train-test split and set aside 2000 entries to represent the 2000 new customers.
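A hold-out of exactly 2000 rows can be sketched with scikit-learn's `train_test_split`, which accepts an integer `test_size`. The data below is a synthetic stand-in rather than the real dataset, and the use of `stratify` and the variable names are my assumptions, not necessarily what the notebook does.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the bank marketing data (assumption: 41,188 rows,
# 20 features, roughly 11% positive class). Swap in the real data here.
rng = np.random.default_rng(42)
X = rng.normal(size=(41_188, 20))
y = (rng.random(41_188) < 0.11).astype(int)

# An integer test_size sets aside exactly that many rows.
# stratify=y keeps the subscription rate similar in both parts (an assumption).
X_existing, X_new, y_existing, y_new = train_test_split(
    X, y, test_size=2000, stratify=y, random_state=42
)

print(X_existing.shape, X_new.shape)
```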

After this, I was left with just over 39,000 data points and 20 predictive features including:

  • Personal attributes (age, job, education level etc.)
  • Financial (housing loan, personal loan etc.)
  • Campaign (previous campaign results, means of contact etc.)
  • Economic indicators (consumer confidence index, euribor3m etc.)

Clean Data

The dataset was mostly clean, so this step was quicker than expected. There were some missing values in the categorical data, denoted by ‘unknown’. I chose to replace these with the mode in the first instance, and then refined my work by using the K-Nearest Neighbours imputer from sklearn instead.
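To illustrate the difference between the two approaches, here is a minimal sketch on a toy frame with made-up values. `KNNImputer` only works on numeric input, so the categorical column is converted to integer codes first and the imputed value is rounded back to the nearest code. That encoding step is my simplification (it is crude for columns with more than two categories) and may differ from the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Toy data (hypothetical values); 'unknown' marks a missing category.
df = pd.DataFrame({
    "age": [22, 25, 27, 60, 63, 24, 61],
    "job": ["student", "student", "student", "retired", "retired",
            "unknown", "unknown"],
})

# Treat 'unknown' as a proper missing value.
job = df["job"].mask(df["job"] == "unknown")

# Approach 1: fill every gap with the most frequent category.
df["job_mode"] = job.fillna(job.mode()[0])

# Approach 2: KNN imputation on integer category codes.
cats = pd.Categorical(job)  # missing values get code -1
codes = np.where(cats.codes == -1, np.nan, cats.codes)
features = np.column_stack([df["age"].to_numpy(dtype=float), codes])
imputed = KNNImputer(n_neighbors=2).fit_transform(features)

# Round the averaged neighbour codes back to a valid category.
knn_codes = np.rint(imputed[:, 1]).astype(int)
df["job_knn"] = np.asarray(cats.categories)[knn_codes]

print(df)
```

On this toy data the mode labels both unknown customers as students, while the KNN version uses age to assign the 61-year-old to the retired group, which is the kind of refinement the imputer is meant to provide.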
