Credit Card Fraudulent Transaction Detection As a part of CSC 219: Final Project Presentation Team Members: (Group #10) - Darshit Pandya - Sreeteja K. Guided By: - Dr. Meiliu Lu
Abstract • Financial fraud is a growing threat with serious consequences for the finance industry, corporations and government organizations. • Among the many criminal activities occurring in the financial industry, credit card fraud is the most prevalent. • It is important for credit card companies to detect fraudulent transactions so that customers are not charged for items they did not purchase.
Why important? • Credit card fraud detection is challenging for the following reasons: • The profiles of genuine users and fraudulent behaviors change constantly. • The rate of online transactions has grown exponentially. • Credit card fraud data sets are highly skewed. • Detecting fraudulent transactions using the traditional method of manual detection is time-consuming and inefficient. Hence, it is necessary to develop a credit card fraud detection technique as a countermeasure against illegal activities.
What will we be doing? • In this term project, we analyze 280K transactions with different attributes. (The names of the attributes are kept secret to maintain the privacy of the user data.) • Itinerary: • Analyze the correlation between attributes • Analyze the effect of the attributes’ values on the target • Feature engineering • Balancing/sampling the skewed dataset • Application of machine learning algorithms • Trying deep neural nets • Comparing the designed models • Improvement techniques
What does the data look like? • The original data has 280K instances and 33 attributes. • The Class attribute identifies a transaction as Fraud [1] or Normal [0]. • The class distribution is, as expected, highly skewed.
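A quick way to see how skewed the Class attribute is: count the rows per class and report each class's share. The sketch below uses only the Python standard library and a made-up toy label list; `class_distribution` and `toy_labels` are hypothetical names, and the 995/5 split is illustrative, not the project's real distribution.

```python
from collections import Counter


def class_distribution(labels):
    """Return {label: (count, share_of_total)} for a sequence of class labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, n / total) for label, n in counts.items()}


# Toy labels, made up for illustration: 1 = Fraud, 0 = Normal.
toy_labels = [0] * 995 + [1] * 5

distribution = class_distribution(toy_labels)
for label, (count, share) in sorted(distribution.items()):
    print(f"class {label}: {count} rows ({share:.2%})")
```

On the real data the same count would be taken over the Class column of the loaded dataset.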
Step 1: Data Visualization • The original data has 280K instances and 33 attributes. • In this step, we visualized the data by examining: • Correlation • Target value impact • Distribution of the attribute values • Density distribution • Outliers
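The correlation analysis can be illustrated with a plain Pearson coefficient between one attribute and the target. This is a minimal sketch, assuming Pearson correlation is the measure of interest (the slides do not say which one was used); `pearson`, `toy_attribute` and `toy_class` are hypothetical names and the values are invented.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented values: an attribute that is large for fraud rows (class 1).
toy_attribute = [0.1, 0.2, 0.15, 2.5, 0.05, 3.0]
toy_class = [0, 0, 0, 1, 0, 1]

r = pearson(toy_attribute, toy_class)
print(f"correlation with Class: {r:.3f}")
```

A coefficient close to +1 or -1 suggests the attribute carries signal about the target; values near 0 suggest little linear relationship.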
Step 2: Data Preprocessing • For the data preprocessing step, we performed: • A missing-values check, with removal if any were found • Removal of unnecessary features • Removal of outliers • Scaling of the values of attributes like Time and Amount
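Two of the preprocessing steps above, outlier removal and scaling, can be sketched as follows. The slides do not state which methods were used, so this example assumes the common 1.5 × IQR rule for outliers and z-score standardization for scaling; `iqr_bounds`, `standardize` and the `amounts` values are hypothetical.

```python
import statistics


def standardize(values):
    """Scale values to zero mean and unit variance (z-scores)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def iqr_bounds(values, k=1.5):
    """Return the (low, high) limits of the k * IQR outlier rule."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Invented transaction amounts with one obvious outlier.
amounts = [12.0, 15.5, 9.99, 20.0, 18.25, 11.0, 14.0, 5000.0]

low, high = iqr_bounds(amounts)
kept = [a for a in amounts if low <= a <= high]  # drop values outside the limits
scaled = standardize(kept)                       # then scale what remains
```

Removing outliers before scaling keeps a single extreme amount from dominating the mean and standard deviation used for the scaling.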
Step 3: Application of Naïve ML Algorithms • Working with the unbalanced dataset, we apply naive machine learning algorithms such as: • Logistic Regression • K-Nearest Neighbors • Support Vector Machine • Decision Tree • Random Forest • GridSearchCV (a hyperparameter search utility rather than a model in its own right) For model evaluation, we use the confusion matrix and the F1-score.
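To make the evaluation metrics concrete, here is a small from-scratch sketch of a binary confusion matrix and the F1-score, F1 = 2·TP / (2·TP + FP + FN). The function names and toy labels are hypothetical; scikit-learn offers equivalents in `sklearn.metrics`, which is presumably what a notebook would use.

```python
def confusion_matrix(y_true, y_pred):
    """Count (tn, fp, fn, tp) for binary labels where 1 = Fraud, 0 = Normal."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def f1_score(y_true, y_pred):
    """F1 for the positive (Fraud) class; 0.0 when nothing positive is involved."""
    _, fp, fn, tp = confusion_matrix(y_true, y_pred)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Invented labels: 4 fraud rows among 10 transactions.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]
always_normal = [0] * len(y_true)

print(confusion_matrix(y_true, y_pred))  # (tn, fp, fn, tp)
print(f1_score(y_true, y_pred))
print(f1_score(y_true, always_normal))
```

The `always_normal` predictor shows why F1 suits skewed data: it never flags fraud, so its F1 is zero even though most of its predictions are correct.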
Step 4: Deep Neural Networks • We used a dense neural network, also known as an artificial neural network (ANN), built with the Keras framework. • For this approach, we used only the unbalanced dataset. • We used only Dense layers, trying several optimizers such as Adam and varying the number of neurons in the layers.
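For readers unfamiliar with dense layers, the sketch below shows what one computes: each neuron takes a weighted sum of all inputs plus a bias and applies an activation. This is not the project's Keras model and does no training (so no Adam); the layer sizes, the ReLU and sigmoid activations, the random weights and every name here are assumptions chosen for illustration.

```python
import math
import random


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b) for every neuron."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for a layer with n_out neurons."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
w1, b1 = init_layer(4, 8, rng)   # hidden layer: 4 features -> 8 neurons
w2, b2 = init_layer(8, 1, rng)   # output layer: 8 neurons -> 1 score

x = [0.2, -1.3, 0.7, 0.05]       # one invented, already scaled transaction
hidden = dense(x, w1, b1, relu)
fraud_probability = dense(hidden, w2, b2, sigmoid)[0]
```

Varying the number of neurons, as the slide describes, corresponds to changing the `n_out` argument of the hidden layer.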
Step 5: Data Balancing • Because the data is unbalanced, the predictions tend to be biased. • Naïve machine learning algorithms tend to be affected by skewed data. • For random sampling, the equation below was implemented: • value_count = minimum_dist_value + ((total_count_cat / minimum_dist_value * 2) - 2)
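The sampling equation can be sketched as a function, with random sampling applied per class. This is one reading of the slide, with several assumptions: the expression is evaluated left to right, so the division happens before the multiplication by 2; the result is truncated to an integer; and rows are drawn without replacement. `sample_balanced` and the toy data are hypothetical.

```python
import random


def value_count(total_count_cat, minimum_dist_value):
    """Rows to keep for one class, reading the slide's equation left to right."""
    return int(minimum_dist_value + ((total_count_cat / minimum_dist_value * 2) - 2))


def sample_balanced(rows_by_class, seed=0):
    """Randomly sample each class down to the size given by value_count."""
    rng = random.Random(seed)
    minimum = min(len(rows) for rows in rows_by_class.values())
    sampled = {}
    for label, rows in rows_by_class.items():
        k = min(len(rows), value_count(len(rows), minimum))
        sampled[label] = rng.sample(rows, k)  # sampling without replacement
    return sampled


# Invented data: 1000 normal rows (class 0) and 100 fraud rows (class 1).
toy = {0: list(range(1000)), 1: list(range(1000, 1100))}
balanced = sample_balanced(toy)
print({label: len(rows) for label, rows in balanced.items()})
```

Under this reading the minority class keeps all of its rows, while the majority class is cut to slightly more than the minority size, so the result is far less skewed without being an exact 50/50 split.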
Step 5 (contd.) • The balanced dataset looks as below: [Figure: balanced dataset]
Project Demo • We will demo a notebook created on Google Colab that contains all the details implemented in the project. • Let’s GO!
Conclusion • Data imbalance can cause bias in the predictions. • SVM, K-NN and Random Forest perform comparatively better. • The best F1-score (our evaluation metric), 0.99, was obtained with Random Forest. • Applying a dense neural network helps even in the case of the unbalanced dataset. • Applying data sampling techniques can help remove the bias in the predictions.