I am writing up my brief approach to the **"India ML Hiring Hackathon 2019 – Loan Delinquency Prediction"** problem. It is a highly imbalanced dataset, with the two target classes split roughly 99.9% to 0.05%.

Because of this imbalance, I hypothesized that classifiers such as Logistic Regression, K-Nearest Neighbors, and Naive Bayes would not work well, so I focused on tree-based ensemble algorithms such as Random Forest, AdaBoost, CatBoost, and Gradient Boosting. Among all of these, the Random Forest classifier performed best.
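This intuition can be checked with a quick sketch on synthetic data (not the competition dataset); the class ratio, feature count, and model settings below are illustrative assumptions, not the author's exact setup:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the loan data: roughly 95% / 5% class split (illustrative).
X, y = make_classification(n_samples=4000, n_features=20, weights=[0.95, 0.05],
                           random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

# Linear baseline vs. a tree ensemble, compared on minority-class (defaulter) recall.
logreg = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_tr, y_tr)

logreg_recall = recall_score(y_te, logreg.predict(X_te))
forest_recall = recall_score(y_te, forest.predict(X_te))
print(f"Logistic Regression minority recall: {logreg_recall:.2f}")
print(f"Random Forest minority recall:       {forest_recall:.2f}")
```

Accuracy alone is misleading here (predicting all zeros scores ~95%), which is why recall on the minority class is the quantity worth comparing.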

To tackle the imbalanced dataset I used **ADASYN** (Adaptive Synthetic Sampling), an oversampling technique that balances the dataset and keeps the model from simply predicting the majority class. I then tuned my Random Forest classifier by varying the depth over [6, 15] and the minimum samples per split over [2, 5]. I used **"entropy"** as the split **criterion**, since entropy can have an edge in some cases involving high class imbalance.

I also set the class weights to { 0: 0.8, 1: 1 } to minimize the **class-1 error**, which means: "it is okay if I deny loans to two or three good applicants, but I do not want to lend to people who may turn out to be defaulters."
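In scikit-learn this cost asymmetry is expressed through the `class_weight` parameter of the classifier; a minimal sketch on synthetic data (the dataset shape is an illustrative assumption):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Illustrative imbalanced data; class 1 plays the role of "defaulter".
X, y = make_classification(n_samples=1000, n_features=10, weights=[0.9, 0.1],
                           random_state=1)

# Weight class 1 more heavily than class 0, so misclassifying a defaulter
# costs more than rejecting a good applicant.
clf = RandomForestClassifier(class_weight={0: 0.8, 1: 1.0}, random_state=1)
clf.fit(X, y)
preds = clf.predict(X)
```

The weights scale each class's contribution to the split-impurity computation, nudging the trees toward catching more of class 1 at the price of some false alarms on class 0.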

#hackathons #random-forest #machine-learning #deep-learning #artificial-intelligence

How I Ranked 2ⁿᵈ in Analytics Vidhya's "All India ML Hiring Hackathon 2019"