In this tutorial, we'll learn how to use SMOTE to detect fraud in Python, and why the technique is so widely used for imbalanced datasets.
I have been entering Kaggle’s monthly tabular competitions to improve my programming skills, and I recently earned a bronze medal in the May 2021 tabular competition by using SMOTE to correct class imbalances. My post on how I won a bronze medal in a Kaggle competition can be found here: https://medium.com/codex/how-i-won-a-bronze-medal-on-a-kaggle-competition-4abd0424c479
SMOTE is an acronym for Synthetic Minority Oversampling Technique. It is a way to correct class imbalances so that a model can make more accurate predictions on the minority class, and it is one of the most commonly used techniques for addressing this problem. Rather than simply duplicating existing minority examples, SMOTE synthesises new minority instances between existing ones: it takes a minority sample, picks one of its nearest minority-class neighbours, and creates a new point at a random position on the line segment joining the two.
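To make the interpolation step concrete, here is a minimal NumPy sketch of the core SMOTE idea. The function name `smote_sketch` and its parameters are hypothetical, chosen only for illustration; it assumes numeric features and at least two minority rows, and uses brute-force distances, so it is only suitable for small inputs. In practice a library implementation such as imbalanced-learn's `SMOTE` class (via `fit_resample`) is the usual choice.

```python
import numpy as np


def smote_sketch(X_min, n_new, k=5, seed=0):
    """Hypothetical helper: create n_new synthetic rows from minority rows X_min."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, n - 1)  # cannot have more neighbours than other minority points

    # Brute-force pairwise Euclidean distances between minority samples
    diff = X_min[:, None, :] - X_min[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour

    # Indices of the k nearest minority neighbours of each minority point
    neighbours = np.argsort(dists, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)  # random base point for each new sample
    pick = rng.integers(0, k, size=n_new)  # which of its k neighbours to use
    nbr = neighbours[base, pick]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)

    # New point lies on the segment between the base point and its neighbour
    return X_min[base] + gap * (X_min[nbr] - X_min[base])


X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X_syn = smote_sketch(X_min, n_new=6, k=2)
print(X_syn.shape)  # (6, 2)
```

Because every synthetic row is a convex combination of two real minority rows, the new points stay inside the region the minority class already occupies instead of being exact copies of existing rows.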
When I was researching SMOTE, I read that it is common to undersample the majority class and then oversample the minority class. I decided not to take this route, however, because I also read that one of the disadvantages of undersampling the majority class is the loss of valuable data.
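The data loss from undersampling is easy to see with plain random undersampling, shown below as a minimal sketch. The helper `random_undersample` is hypothetical, and it assumes a binary label where the majority class is larger than the minority class; it keeps every minority row and only as many randomly chosen majority rows as there are minority rows.

```python
import numpy as np


def random_undersample(X, y, majority_label=0, seed=0):
    """Hypothetical helper: drop random majority rows until both classes are equal."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    y = np.asarray(y)
    maj_idx = np.flatnonzero(y == majority_label)
    min_idx = np.flatnonzero(y != majority_label)

    # Keep only as many majority rows as there are minority rows
    kept_maj = rng.choice(maj_idx, size=len(min_idx), replace=False)
    keep = np.sort(np.concatenate([kept_maj, min_idx]))

    n_dropped = len(maj_idx) - len(kept_maj)  # majority rows thrown away
    return X[keep], y[keep], n_dropped


X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 95 + [1] * 5)
X_bal, y_bal, n_dropped = random_undersample(X, y)
print(n_dropped)  # 90
```

With 95 majority rows and only 5 minority rows, 90 of the 95 majority rows are discarded, which is the loss of information that oversampling with SMOTE avoids.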
Since SMOTE is an important tool for handling class imbalances, I decided it would be a good idea to try the technique on a dataset with a very severe class imbalance: the credit card fraud detection dataset. It can be found on the Kaggle website here: https://www.kaggle.com/mlg-ulb/creditcardfraud
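A quick way to confirm how severe the imbalance is, is to count each label. The sketch below assumes the label column is named `Class` with 1 marking fraud, as in the Kaggle dataset's description (which reports 492 frauds out of 284,807 transactions, about 0.172%). The helper `class_balance` and the file path are assumptions for illustration; the path is only read if it exists.

```python
import os

import pandas as pd


def class_balance(df, target="Class"):
    """Hypothetical helper: count and share of each label in the target column."""
    counts = df[target].value_counts().sort_index()
    share = counts / counts.sum()
    return pd.DataFrame({"count": counts, "share": share})


# Assumed location of the file in a Kaggle notebook; adjust if it differs
path = "/kaggle/input/creditcardfraud/creditcard.csv"
if os.path.exists(path):
    print(class_balance(pd.read_csv(path)))
```

A minority share this small means a model that always predicts "not fraud" would still score over 99.8% accuracy, which is why the imbalance has to be addressed before accuracy means anything.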
I wrote the program to predict credit card fraud in Kaggle’s free online Jupyter Notebook. This is a good environment to use because the dataset is stored in the platform’s working directory, so users do not have to download it or rely on their own computer’s storage and memory. In addition, the commonly used libraries are already installed on Kaggle, which makes it much easier to write programs in Python.
After creating the Jupyter Notebook, I imported the libraries I would need to execute the program. The first three were pandas, numpy and os: pandas is used to create and manipulate dataframes, numpy is used to perform numerical operations on arrays, and os is used to locate the dataset files in Kaggle’s working directory.
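A first cell along these lines is a common way to start a Kaggle notebook. This is a sketch rather than the author's exact code: it assumes datasets are mounted under `/kaggle/input`, and the helper `list_input_files` is hypothetical. pandas and numpy are imported here only so they are ready for later cells.

```python
import os

import numpy as np  # numerical operations on arrays (used in later cells)
import pandas as pd  # dataframes (used in later cells)


def list_input_files(root="/kaggle/input"):
    """Hypothetical helper: return every file path under root (empty if root is missing)."""
    paths = []
    for dirname, _, filenames in os.walk(root):
        for filename in filenames:
            paths.append(os.path.join(dirname, filename))
    return paths


# Print the available dataset files so the exact CSV path can be copied
for path in list_input_files():
    print(path)
```

Listing the files first avoids guessing the dataset path, since the folder name under `/kaggle/input` depends on how the dataset was added to the notebook.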