The aim is to develop an end-to-end web application that helps users estimate their chances of getting a loan. Several statistical learning techniques are implemented to estimate the probability of default for each loan. The resulting machine learning models classify a set of unseen data, and statistical metrics are used to compare their results. The final model is then deployed with Flask on Heroku, and a website is created where the user can predict their chances of getting a loan. The structure of the post is as follows:

1. Data Overview

2. Data Science and Exploratory Data Analysis

3. Machine Learning and Deployment

4. Tableau Visualization

5. Summary

Data Overview

The Lending Club data consists of 2,195,670 rows and 151 columns. The data is available from the Lending Club website, and Kaggle hosts the latest updated datasets. The target column is loan_status. For this analysis, only Fully Paid loans and Charged Off/Defaulted loans are considered, which reduces the data set to 1,344,251 rows and 151 columns. After an initial inspection, the data was divided into two subsets based on the application type. The reason is that the features associated with the second applicant are null for Individual loans and might therefore be deleted during the data cleaning process.
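The filtering and grouping described above can be sketched with pandas. This is a minimal illustration and not the exact code used for the project: the column name application_type, the status spellings, the helper name filter_and_split, and the derived target column are all assumptions, and the toy rows stand in for the real file.

```python
import pandas as pd

# Status labels kept for the analysis (assumed spellings; check the actual file).
KEEP_STATUSES = ["Fully Paid", "Charged Off", "Default"]


def filter_and_split(df):
    """Keep finished loans, then return one subset per application type."""
    finished = df[df["loan_status"].isin(KEEP_STATUSES)].copy()
    # Hypothetical binary target: 1 = charged off or defaulted, 0 = fully paid.
    finished["target"] = (finished["loan_status"] != "Fully Paid").astype(int)
    return {
        app_type: subset.reset_index(drop=True)
        for app_type, subset in finished.groupby("application_type")
    }


# Toy rows standing in for the real Lending Club file.
toy = pd.DataFrame(
    {
        "loan_status": [
            "Fully Paid", "Current", "Charged Off",
            "Default", "Fully Paid", "Late (31-120 days)",
        ],
        "application_type": [
            "Individual", "Individual", "Joint App",
            "Individual", "Joint App", "Joint App",
        ],
    }
)

subsets = filter_and_split(toy)
for name, subset in subsets.items():
    print(name, len(subset))
```

Splitting before cleaning means a null-ratio threshold can be applied to each subset separately, so columns that are empty only for one application type do not affect the other.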

The issue date is the feature that records when a loan was issued to the applicant. For both subsets of data, the most recently issued 15% of loans are held out as the test data on which the final model is evaluated.
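A time-based hold-out like this one could look roughly as follows. The sketch assumes the issue date is stored in a column named issue_d as strings such as "Dec-2018"; the function name latest_fraction_split and the toy data are hypothetical.

```python
import pandas as pd


def latest_fraction_split(df, date_col="issue_d", test_frac=0.15):
    """Hold out the most recently issued fraction of loans as the test set."""
    # Assumed date format: abbreviated month and year, e.g. "Dec-2018".
    dates = pd.to_datetime(df[date_col], format="%b-%Y")
    # Row labels ordered from the oldest to the newest issue date.
    order = dates.sort_values(kind="stable").index
    n_test = int(round(len(df) * test_frac))
    cut = len(df) - n_test
    return df.loc[order[:cut]], df.loc[order[cut:]]


# Toy data: 20 loans issued in consecutive months, in shuffled row order.
months = pd.date_range("2017-01-01", periods=20, freq="MS").strftime("%b-%Y")
toy = pd.DataFrame({"issue_d": months, "loan_amnt": range(20)})
toy = toy.sample(frac=1, random_state=0)

train, test = latest_fraction_split(toy)
print(len(train), len(test))
```

With real data, many loans share the same issue month, so the cut can fall inside a month. Holding out the newest loans instead of a random sample keeps the evaluation closer to how the model would be used, that is, on loans issued after the training period.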

