Santander Case — Part A: Classification

Santander Case — Part A: Classification

Here you will find: Data Cleaning, Feature Selection, Bayesian Optimization, Classification, and Model Validation. We want to minimize the rate of FP and FN as well as maximize the rate of TP. To do so, we will use the metric AUC (area under the curve) of ROC Curve (receiver operating characteristic), because it returns us the best model as well as the best threshold.

The Problem

The Santander Group is a global banking group, led by Banco Santander S.A., the largest bank in the euro area. It has its origin in Santander, Cantabria, Spain. As every bank, they have a retention program that should be applied to unsatisfied customers.

To be able to use this program properly, we need to develop a machine learning model to classify if the customer is satisfied or not. Customers classified as unsatisfied should be the target of the retention program.

The retention program cost $10 for each customer and an effective application (in really unsatisfied customers) returns a profit of $100. In the classification task we can have the following scenarios:

  1. False Positive(FP): classify the customer as UNSATISFIED but he is SATISFIED. Cost: $ 10, Earn: $ 0;
  2. False Negative(FN): classify the customer as SATISFIED but he is DISSATISFIED. Cost: $ 0, Earn: $ 0;
  3. True Positive(TP): classify the customer as UNSATISFIED and he is UNSATISFIED. Cost: $ 10, Earn: $ 100;
  4. True Negative(TN): classify the customer as SATISFIED and he is SATISFIED. Cost: $ 0, Earn: $ 0.

In summary, we want to minimize the rate of FP and FN as well as maximize the rate of TP. To do so, we will use the metric AUC (area under the curve) of ROC Curve (receiver operating characteristic), because it returns us the best model as well as the best threshold.

You can check the complete notebook with this solution on my [Github_](https://github.com/PedroHCouto/Santander-Case)._

This Case was made as a parte of tht prize for winning the Santander Data Masters Competition. I explain more about the competition itself and the hard skills I learned and soft skills I used in my way to winning it in this [article_](https://towardsdatascience.com/how-i-won-the-santander-data-master-competition-3-central-soft-skills-i-used-and-the-hard-skills-7f989012e0e2)._

Let’s go.

programming artificial-intelligence data-science machine-learning classification

Bootstrap 5 Complete Course with Examples

Bootstrap 5 Tutorial - Bootstrap 5 Crash Course for Beginners

Nest.JS Tutorial for Beginners

Hello Vue 3: A First Look at Vue 3 and the Composition API

Building a simple Applications with Vue 3

Deno Crash Course: Explore Deno and Create a full REST API with Deno

How to Build a Real-time Chat App with Deno and WebSockets

Convert HTML to Markdown Online

HTML entity encoder decoder Online

Most popular Data Science and Machine Learning courses — July 2020

Most popular Data Science and Machine Learning courses — August 2020. This list was last updated in August 2020 — and will be updated regularly so as to keep it relevant

Artificial Intelligence vs Machine Learning vs Data Science

Artificial Intelligence, Machine Learning, and Data Science are amongst a few terms that have become extremely popular amongst professionals in almost all the fields.

Pipelines in Machine Learning | Data Science | Machine Learning | Python

Machine Learning Pipelines performs a complete workflow with an ordered sequence of the process involved in a Machine Learning task. The Pipelines can also

Data Science Projects | Data Science | Machine Learning | Python

Practice your skills in Data Science with Python, by learning and then trying all these hands-on, interactive projects, that I have posted for you.

Data Science Projects | Data Science | Machine Learning | Python

Practice your skills in Data Science with Python, by learning and then trying all these hands-on, interactive projects, that I have posted for you.