Humberto Ratke

Predict Customer Churn in Python

Customer attrition (a.k.a. customer churn) is one of the biggest costs any organization faces. If we could figure out, with reasonable accuracy, why and when a customer is likely to leave, it would immensely help the organization to sharpen its retention initiatives. Let's make use of a customer transaction dataset from Kaggle to understand the key steps involved in predicting customer attrition in Python.

Supervised machine learning is simply learning a function that maps an input to an output from example input-output pairs. A supervised machine learning algorithm analyzes the training data and produces an inferred function, which can then be used to map new examples. Given that we have transaction data on both current and prior customers in the telecom dataset, this is a standard supervised classification problem that tries to predict a binary outcome (Y/N).

By the end of this article, we will have attempted to answer some of the key business questions pertaining to customer attrition: (1) what is the likelihood of an active customer leaving the organization? (2) what are the key indicators of customer churn? (3) what retention strategies can be implemented, based on the results, to diminish prospective customer churn?

In the real world, we need to go through seven major stages to successfully predict customer churn:

Section A: Data Preprocessing

Section B: Data Evaluation

Section C: Model Selection

Section D: Model Evaluation

Section E: Model Improvement

Section F: Future Predictions

Section G: Model Deployment

To understand the business challenge and the proposed solution, I recommend downloading the dataset and coding along with me. Feel free to ask me any questions as you work through it. Let's look into each of these steps in detail below.

Section A: Data Preprocessing

If you had asked the 20-year-old me, I would have jumped straight into model selection, as it's the coolest thing to do in machine learning. But, as in life, wisdom kicks in at a later stage! After witnessing real-world machine learning business challenges, I can't stress enough the importance of data preprocessing and data evaluation.

Always remember the following golden rule in predictive analytics:

“Your model is only as good as your data”

Understanding the end-to-end structure of your dataset and reshaping the variables is the gateway to a high-quality predictive modelling initiative.

**Step 0: Restart the session:** It's good practice to restart the session and remove all temporary variables from the interactive development environment before we start coding. So let's restart the session, clear the cache and start afresh!

try:
    from IPython import get_ipython
    get_ipython().run_line_magic('clear', '')    # clear the console output
    get_ipython().run_line_magic('reset', '-f')  # force-remove all variables from the namespace
except (ImportError, AttributeError):
    pass  # not running inside an IPython session

**Step 1: Import relevant libraries:** Import all the relevant Python libraries for building supervised machine learning algorithms.

# Standard libraries for data analysis:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
from scipy.stats import norm, skew
import statsmodels.api as sm

# sklearn modules for data preprocessing:
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split

# sklearn modules for model selection:
from sklearn import svm, tree, linear_model, neighbors
from sklearn import naive_bayes, ensemble, discriminant_analysis, gaussian_process
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from xgboost import XGBClassifier

# sklearn modules for model evaluation & improvement:
from sklearn import feature_selection, metrics, model_selection
from sklearn.model_selection import cross_val_score, GridSearchCV, KFold, ShuffleSplit
from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn.metrics import f1_score, precision_score, recall_score, fbeta_score
from sklearn.metrics import classification_report, precision_recall_curve
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.metrics import make_scorer, log_loss, average_precision_score
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Standard libraries for data visualization:
import matplotlib
import matplotlib.pylab as pylab
import matplotlib.ticker as mtick
import seaborn as sn
from pandas.plotting import scatter_matrix
from IPython.display import display
%matplotlib inline
color = sn.color_palette()
pd.options.display.max_columns = None

# Miscellaneous utility libraries:
import os
import random
import re
import string
import sys
import time
import timeit
from datetime import datetime
from dateutil.parser import parse
import joblib

**Step 2: Set up the current working directory:**

os.chdir(r"C:/Users/srees/Propensity Scoring Models/Predict Customer Churn/")

**Step 3: Import the dataset:** Let's load the input dataset into the Python notebook from the current working directory.

dataset = pd.read_csv('1.Input/customer_churn_data.csv')

**Step 4: Evaluate data structure:** In this section, we look at the dataset in general, and at each column in detail, to get a better understanding of the input data so that we can aggregate the fields when needed.

From the head and columns methods, we can see that this is a telco customer churn dataset in which each record captures the nature of the subscription, tenure, frequency of payment and churn (signifying the customer's current status).

dataset.head()

Snapshot of Input Dataset (Image by Author)

dataset.columns

List of Column Names (Image by Author)

A quick describe method reveals that the telecom customers stay for 32 months on average and pay $64 per month. However, these averages may mask differences between customers on different contract types.

dataset.describe()

Describe Method (Image by Author)
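
To test that hunch, we can compare the averages across contract types with a quick groupby. This is a minimal sketch, assuming the standard Telco column names 'tenure' and 'MonthlyCharges' for the tenure and monthly-charge fields:

# Average tenure and monthly charge per contract type
# ('tenure' and 'MonthlyCharges' are assumed Telco column names):
dataset.groupby('Contract')[['tenure', 'MonthlyCharges']].mean()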

From the look of it, we can see that the dataset contains several numerical and categorical columns providing various information on the customer transactions.

dataset.dtypes

Column Data Types (Image by Author)

**Re-validate column data types and missing values:** Always keep an eye on the missing values in a dataset. Missing values can mess up model building and accuracy, so we need to take care of them (if any) before we compare and select a model.

dataset.columns.to_series().groupby(dataset.dtypes).groups

Aggregated Column Data Types (Image by Author)

The dataset contains 7043 rows and 21 columns, and there seem to be no missing values.

dataset.info()

Data Structure (Image by Author)

dataset.isna().any()

Check NA’s (Image by Author)

**Identify unique values:** 'PaymentMethod' and 'Contract' are two of the multi-valued categorical variables in the dataset. When we look into the unique values of each, we learn that customers are either on a month-to-month rolling contract or on a fixed contract for one or two years, and that they pay their bills via credit card, bank transfer or electronic check.

#Unique values in each categorical variable:

dataset["PaymentMethod"].nunique()
dataset["PaymentMethod"].unique()
dataset["Contract"].nunique()
dataset["Contract"].unique()

**Step 5: Check target variable distribution:** Let's look at the distribution of churn values. This is a simple yet crucial step to check whether the dataset suffers from class imbalance. As you can see below, the dataset is imbalanced, with a high proportion of active customers compared to their churned counterparts.

dataset["Churn"].value_counts()

Distribution of Churn Values (Image by Author)
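
To quantify the imbalance, normalized value counts give the class proportions directly. (How the imbalance is handled, for example via class weights or stratified sampling, belongs to the later modelling sections.)

# Share of active vs. churned customers as proportions:
dataset["Churn"].value_counts(normalize=True)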

**Step 6: Clean the dataset:** The 'TotalCharges' column is read in as text, typically because it contains a few blank entries, so we coerce it to a numeric type; errors='coerce' turns any non-numeric entry into NaN.

dataset['TotalCharges'] = pd.to_numeric(dataset['TotalCharges'],errors='coerce')

dataset['TotalCharges'] = dataset['TotalCharges'].astype("float")
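
As a quick sanity check, we can count how many entries (if any) the coercion turned into NaN; depending on the dataset version, a handful of blank 'TotalCharges' values may show up here:

# Count entries coerced to NaN by pd.to_numeric:
print(dataset['TotalCharges'].isna().sum())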

**Step 7: Take care of missing data:** As we saw earlier, the data provided has no missing values, so this step is not strictly required for the chosen dataset. I would still like to showcase the steps here for future reference.

dataset.info()

Data Structure (Image by Author)

dataset.isna().any()

Check NA’s (Image by Author)

**Find the average and fill missing values programmatically:** If any numeric columns in the dataset had missing values, we would find the average of each of those columns and fill their missing values with it. Here's a snippet of code to do this step programmatically.

na_cols = dataset.isna().any()
na_cols = na_cols[na_cols == True].reset_index()
na_cols = na_cols["index"].tolist()

# Fill missing values in the numeric columns with the column mean:
for col in dataset.columns[1:]:
    if col in na_cols:
        if dataset[col].dtype != 'object':
            dataset[col] = dataset[col].fillna(dataset[col].mean()).round(0)
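
Since SimpleImputer was already imported in Step 1, here is an equivalent sketch that mean-fills the numeric columns with scikit-learn's imputer instead of a manual loop; it assumes the same goal as the loop above:

# A minimal SimpleImputer sketch (same mean-fill as the loop above):
num_cols = dataset.select_dtypes(include=[np.number]).columns
imputer = SimpleImputer(strategy='mean')
dataset[num_cols] = imputer.fit_transform(dataset[num_cols])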

**Revalidate NA’s:** It's always good practice to revalidate and ensure that we don't have any more null values in the dataset.

dataset.isna().any()

Re-validate NA’s (Image by Author)

**Step 8: Label encode binary data:** Machine learning algorithms can typically only take numerical values as their independent variables, so label encoding is quite pivotal: it encodes categorical labels as appropriate numerical values. Here we label encode all categorical variables that have only two unique values. Any categorical variable with more than two unique values is dealt with via label encoding and one-hot encoding in the subsequent sections.

# Create a label encoder object:
le = LabelEncoder()

# Label encoding will be used for columns with 2 or fewer unique values:
le_count = 0
for col in dataset.columns[1:]:
    if dataset[col].dtype == 'object':
        if len(list(dataset[col].unique())) <= 2:
            le.fit(dataset[col])
            dataset[col] = le.transform(dataset[col])
            le_count += 1
print('{} columns were label encoded.'.format(le_count))
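
As a preview of how the multi-valued categoricals flagged above can be one-hot encoded, here is a minimal sketch using pandas' get_dummies on the 'Contract' and 'PaymentMethod' columns identified earlier. It is illustrative only, not necessarily the exact code the later sections use:

# Illustrative one-hot encoding of multi-valued categorical columns;
# drop_first=True drops one level per column to avoid the dummy-variable trap:
encoded = pd.get_dummies(dataset, columns=['Contract', 'PaymentMethod'], drop_first=True)
encoded.head()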
