The cost of acquiring new customers is high, so companies are spending more on customer loyalty and retention. Knowing the total value a customer generates over the entire customer life cycle helps companies target their business campaigns and other activities. As a result, Customer Relationship Management (CRM) has become a key element of modern marketing strategies.

If we can predict a score that projects quantifiable information onto a given population (for example, each customer's propensity to churn), the information system (IS) can use that score to personalize the customer relationship.

The KDD (Knowledge Discovery and Data Mining) Cup 2009 challenge consists of three tasks: predicting churn, appetency and upselling from data provided by the telecom company Orange. The business idea is to:

  • Identify churning customers before they switch operator (churn),
  • Identify new potential customers for the brand (appetency),
  • Identify customers who may buy additional or more profitable items from the brand (upselling).

The challenge is to beat the in-house system developed by Orange Labs. On the large dataset, the in-house AUC scores are as follows:

  1. Churn: 0.7435
  2. Appetency: 0.8522
  3. Upselling: 0.8975
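Because the challenge rules rank entries by the average AUC over the three tasks, the three in-house numbers collapse into a single baseline to beat. The short sketch below only redoes that arithmetic; the variable names are illustrative and not part of the challenge material.

```python
# In-house AUCs on the large dataset, as quoted in the list above.
in_house_auc = {"churn": 0.7435, "appetency": 0.8522, "upselling": 0.8975}

# The challenge score is the unweighted mean of the three task AUCs.
baseline_score = sum(in_house_auc.values()) / len(in_house_auc)

print(f"In-house baseline score: {baseline_score:.4f}")  # prints 0.8311
```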

Two versions of the data are available. Both have 50,000 samples, but the large version contains 15,000 features while the small version contains only 230. The target variable takes the value +1 or -1, indicating the positive and negative class respectively.

In the small version of the dataset, 40 features are categorical with high cardinality and the rest are numerical. Under the challenge rules, predictions are evaluated by the average area under the ROC curve (AUC) across the three tasks (churn, appetency and upselling); this average is called the score.
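To make the evaluation metric concrete, here is a minimal sketch of how such a score could be computed for +1/-1 labels. It uses the rank-sum (Mann-Whitney) identity for AUC in plain Python; the function names `auc` and `challenge_score` and the toy data are invented for illustration and are not the organizers' evaluation code.

```python
from statistics import mean


def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: sequence of +1 / -1 class labels.
    scores: sequence of model outputs; higher means "more likely positive".
    Tied scores share the average of the ranks they span.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to the end of the group of equal scores.
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")

    pos_rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def challenge_score(per_task):
    """Mean AUC over tasks; per_task maps task name -> (labels, scores)."""
    return mean(auc(y, s) for y, s in per_task.values())


# Toy labels and scores, made up purely to exercise the functions.
toy = {
    "churn": ([1, -1, -1, 1], [0.9, 0.2, 0.4, 0.3]),
    "appetency": ([-1, -1, 1, -1], [0.1, 0.3, 0.8, 0.2]),
    "upselling": ([1, 1, -1, -1], [0.6, 0.4, 0.5, 0.1]),
}
print(f"Toy challenge score: {challenge_score(toy):.4f}")
```

In practice a library routine such as scikit-learn's `roc_auc_score` would usually replace the hand-written `auc`; the pure-Python version is shown only so the averaging step is visible end to end.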


Building Knowledge on the Customer Through Machine Learning