In this article, we’ll use deep learning to tackle a problem that credit card companies routinely face: detecting fraudulent transactions. To do this, we’ll train a Deep Neural Network (DNN). We’ll walk through the following steps:

  1. Data Overview
  2. Data Preprocessing
  3. DNN model building
  4. DNN model evaluation
  5. Conclusion

Without further ado, let’s get started!

1. Data Overview

The dataset comes from Kaggle and contains two days’ worth of transactions made by European cardholders. Because the data is confidential, 28 of the features have been transformed with PCA, and their original names are not disclosed (they appear simply as V1 to V28). The only features that were not transformed, and whose meaning we know, are ‘Time’, ‘Amount’, and ‘Class’. ‘Time’ is the number of seconds elapsed between each transaction and the first transaction in the dataset, ‘Amount’ is the transaction amount, and ‘Class’ is our target variable, where 0 denotes a normal transaction and 1 a fraudulent one. In total, the dataset contains 284,807 rows.
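To make the PCA step more concrete, here is a minimal sketch on synthetic data (the sizes and the random mixing matrix are arbitrary assumptions, not properties of the Kaggle file). It projects correlated columns onto their principal axes using NumPy’s SVD. A useful side effect is that principal components are uncorrelated with one another on the data the PCA was fit to, which is worth keeping in mind when reading a correlation matrix of anonymized features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for raw, correlated transaction attributes
n, d = 1000, 5
mixing = rng.normal(size=(d, d))
raw = rng.normal(size=(n, d)) @ mixing  # mixing makes the columns correlated

# PCA via SVD of the mean-centred data
centred = raw - raw.mean(axis=0)
_, _, vt = np.linalg.svd(centred, full_matrices=False)
components = centred @ vt.T  # projections onto the principal axes (like V1..Vd)

# Compare off-diagonal correlations before and after the transformation
off_diag = ~np.eye(d, dtype=bool)
raw_corr = np.corrcoef(raw, rowvar=False)
pc_corr = np.corrcoef(components, rowvar=False)
print("max |corr| before PCA:", np.abs(raw_corr[off_diag]).max())
print("max |corr| after PCA: ", np.abs(pc_corr[off_diag]).max())
```

The second value comes out at floating-point noise level, while the first is clearly non-zero.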

One thing we’d expect (and hope) to see in this dataset is a heavily imbalanced target variable. This makes sense: fraudulent transactions should be a small minority of all transactions. We can confirm this with the code below, which also renames the target column to something more intuitive.

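import pandas as pd

# Load the Kaggle dataset (file name assumed to be creditcard.csv)
data = pd.read_csv("creditcard.csv")

# Rename the target column so its meaning is obvious
data.rename(columns={"Class": "isFraud"}, inplace=True)

# Percentage of fraudulent transactions (mean of a 0/1 column = fraction of 1s)
fraud_per = data["isFraud"].mean() * 100
print(f"{fraud_per:.2f}% of transactions are fraudulent")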

This indeed turned out to be the case: only 0.17% of our transactions are fraudulent (whew)! A low fraud rate is good news for a credit card company, but it is a problem for our network. With so few positive examples, the model can learn to favor the majority class; a network that labeled every transaction as normal would already be 99.83% accurate while catching no fraud at all. We’ll remedy this later using SMOTE (Synthetic Minority Over-sampling Technique). Next, let’s check whether the dataset contains missing data.
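Before checking for missing data, it may help to see the core idea behind SMOTE. The sketch below is an illustrative re-implementation, not the imbalanced-learn library’s code, and the function name `smote_sketch` and the toy data are hypothetical. Each synthetic row is placed at a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbours.

```python
import numpy as np


def smote_sketch(minority, n_new, k=5, seed=0):
    """Hypothetical helper: SMOTE-style oversampling of minority-class rows.

    Assumes k < len(minority). Each synthetic row is an interpolation between a
    randomly chosen minority sample and one of its k nearest minority neighbours.
    """
    rng = np.random.default_rng(seed)

    # Pairwise Euclidean distances among minority samples
    diffs = minority[:, None, :] - minority[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    # Pick a base sample and one of its k neighbours for every new row
    base_idx = rng.integers(0, len(minority), size=n_new)
    nn_idx = neighbours[base_idx, rng.integers(0, k, size=n_new)]

    # Interpolation factor in [0, 1) places the new row on the segment
    gap = rng.random((n_new, 1))
    return minority[base_idx] + gap * (minority[nn_idx] - minority[base_idx])


# Toy example: 20 synthetic "fraud" rows with 3 features
rng = np.random.default_rng(1)
fraud = rng.normal(size=(20, 3))
synthetic = smote_sketch(fraud, n_new=100, k=5)
print(synthetic.shape)  # (100, 3)
```

In practice you would use `SMOTE` from `imblearn.over_sampling` and call its `fit_resample(X_train, y_train)` method. Resampling only the training split keeps synthetic fraud rows out of the data used for evaluation.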

# Look for missing data: number of columns containing at least one null
print(data.isnull().any().sum())

The output shows that no column contains missing values. Next, a correlation matrix gives us an overview of how the variables in our dataset relate to one another.

import matplotlib.pyplot as plt
import seaborn as sns

# Correlation plot of every pair of columns
plt.figure(figsize=(14, 10))
plt.title("Correlation Plot", size=20)
corr = data.corr()
sns.heatmap(corr, xticklabels=corr.columns, yticklabels=corr.columns,
            linewidths=.1, cmap="Blues", fmt=".1f", annot=True)
plt.show()
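With 31 columns, an annotated heatmap is dense, so it can help to pull out just the column that matters most: each feature’s correlation with the target. The sketch below uses a small synthetic DataFrame (hypothetical data in which fraud is driven mainly by `V1`) so that it runs on its own; on the real data, the same ranking line can be applied to the `corr` matrix computed above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in with similar columns, not the Kaggle data
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "V1": rng.normal(size=n),
    "V2": rng.normal(size=n),
    "Amount": rng.exponential(scale=50, size=n),
})
# Make the target depend mostly on V1 so the ranking has a clear leader
df["isFraud"] = (df["V1"] + 0.1 * rng.normal(size=n) > 2).astype(int)

corr = df.corr()
# Absolute correlation of each feature with the target, strongest first
target_corr = corr["isFraud"].drop("isFraud").abs().sort_values(ascending=False)
print(target_corr)
```

Keep in mind that Pearson correlation only captures linear, one-feature-at-a-time relationships, which is part of why a neural network is still worth training.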

Detecting Credit Card Fraud using Tensorflow