About the project

This article is the second of three in a machine learning project on binary classification. The goal is to train the best model for predicting the optimal number of candidates to target in a marketing campaign, so that costs are kept to a minimum and efficiency is maximized.

To determine the costs of the campaign, the marketing team has concluded:

  • For each customer identified as a good candidate and therefore targeted, but who does not subscribe to the deposit, the bank incurs a cost of 500 EUR.
  • For each customer identified as a bad candidate and excluded from the target, but who would have subscribed to the product, the bank incurs a cost of 2,000 EUR.

The metric used for evaluation is the **total cost**, since the objective is to minimize the cost of the marketing campaign.
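As a rough illustration of how this metric could be computed, here is a minimal sketch in plain Python. It assumes labels are encoded as 1 (subscribes / targeted) and 0 (does not subscribe / excluded). The function name `total_cost` and the constant names are hypothetical, chosen for this example only, and the toy labels are made up rather than taken from the dataset.

```python
from typing import Sequence

# Costs stated by the marketing team (EUR).
COST_FALSE_POSITIVE = 500    # targeted, but did not subscribe
COST_FALSE_NEGATIVE = 2000   # excluded, but would have subscribed


def total_cost(y_true: Sequence[int], y_pred: Sequence[int]) -> int:
    """Return the campaign cost in EUR for a set of predictions.

    Assumption: 1 = subscribes / targeted, 0 = does not subscribe / excluded.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    false_positives = sum(1 for t, p in pairs if t == 0 and p == 1)
    false_negatives = sum(1 for t, p in pairs if t == 1 and p == 0)
    return (false_positives * COST_FALSE_POSITIVE
            + false_negatives * COST_FALSE_NEGATIVE)


# Toy labels, invented for illustration only.
y_true = [0, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
print(total_cost(y_true, y_pred))  # one false positive + one false negative
```

Because a missed subscriber costs four times as much as a wasted contact, a model scored this way is pushed towards targeting borderline candidates rather than excluding them.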

This article focuses only on the second section, Cleaning & Feature Selection.

In the first post, we conducted the Exploratory Data Analysis (EDA), which allowed us to look beyond the initial dataset. EDA can be very time-consuming and is rarely a one-time walk-through: we often find ourselves going back to earlier sections to change things and try different approaches. Still, the detailed analysis usually pays off and tells us a great deal about the data and the behavior of the variables.

Let’s start with a brief overview of the first section.

[0] Number of clients that haven’t subscribed to the term deposit: 36548

[1] Number of clients that have subscribed to the term deposit: 4640
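Counts like these can be obtained by tallying the target column. The sketch below is only one way of doing it, using the standard library instead of whatever tooling the original analysis used. The helper name `class_summary` is hypothetical, and the label list is rebuilt from the two counts reported above rather than loaded from the real data.

```python
from collections import Counter


def class_summary(labels):
    """Map each class label to a (count, share of total) pair."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, n / total) for label, n in sorted(counts.items())}


# Stand-in for the target column, rebuilt from the counts reported above.
labels = [0] * 36548 + [1] * 4640

summary = class_summary(labels)
for label, (n, share) in summary.items():
    print(f"[{label}] {n} clients ({share:.1%} of {len(labels)})")
```

The positive class makes up roughly 11% of the clients, so the dataset is clearly imbalanced. This is one more reason to evaluate models on total cost rather than on plain accuracy.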


