BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES



Similar documents
Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data

A Survey on Outlier Detection Techniques for Credit Card Fraud Detection

Dimensionality Reduction: Principal Components Analysis

Network Intrusion Detection using Semi Supervised Support Vector Machine

Object Recognition and Template Matching

A Content based Spam Filtering Using Optical Back Propagation Technique

Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression

The Data Mining Process

Principal Component Analysis

Review Jeopardy. Blue vs. Orange. Review Jeopardy

Introduction to Principal Components and FactorAnalysis

A Logistic Regression Approach to Ad Click Prediction

Anomaly detection. Problem motivation. Machine Learning

Supervised Feature Selection & Unsupervised Dimensionality Reduction

AUTOMATION OF ENERGY DEMAND FORECASTING. Sanzad Siddique, B.S.

APPM4720/5720: Fast algorithms for big data. Gunnar Martinsson The University of Colorado at Boulder

Knowledge Discovery from patents using KMX Text Analytics

EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set

Classifying Large Data Sets Using SVMs with Hierarchical Clusters. Presented by :Limou Wang

DATA MINING TECHNIQUES AND APPLICATIONS

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Support Vector Machines with Clustering for Training with Very Large Datasets

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches

AUTOMATIC THEFT SECURITY SYSTEM (SMART SURVEILLANCE CAMERA)

Mehtap Ergüven Abstract of Ph.D. Dissertation for the degree of PhD of Engineering in Informatics

Predicting Solar Generation from Weather Forecasts Using Machine Learning

Using Data Mining for Mobile Communication Clustering and Characterization

Introduction: Overview of Kernel Methods

Index Terms: Face Recognition, Face Detection, Monitoring, Attendance System, and System Access Control.

Prof. Pietro Ducange Students Tutor and Practical Classes Course of Business Intelligence

Adaptive Face Recognition System from Myanmar NRC Card

Classification of Bad Accounts in Credit Card Industry

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES

STA 4273H: Statistical Machine Learning

Data Mining Part 5. Prediction

Data Quality Mining: Employing Classifiers for Assuring consistent Datasets

Rachel J. Goldberg, Guideline Research/Atlanta, Inc., Duluth, GA

Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j

A Survey on Pre-processing and Post-processing Techniques in Data Mining

Knowledge Discovery and Data Mining. Bootstrap review. Bagging Important Concepts. Notes. Lecture 19 - Bagging. Tom Kelsey. Notes

How To Cluster

Azure Machine Learning, SQL Data Mining and R

Social Media Mining. Data Mining Essentials

Nonlinear Iterative Partial Least Squares Method

4. GPCRs PREDICTION USING GREY INCIDENCE DEGREE MEASURE AND PRINCIPAL COMPONENT ANALYIS

Machine Learning in Statistical Arbitrage

Multiple Kernel Learning on the Limit Order Book

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

Part 2: Analysis of Relationship Between Two Variables

Principal components analysis

Efficient Attendance Management: A Face Recognition Approach

Content-Based Recommendation

Linear Threshold Units

How To Identify A Churner

Principal Component Analysis

Analysis Tools and Libraries for BigData

Volume 2, Issue 9, September 2014 International Journal of Advance Research in Computer Science and Management Studies

Introduction to Support Vector Machines. Colin Campbell, Bristol University

Credit Card Fraud Detection Using Self Organised Map

High-Dimensional Data Visualization by PCA and LDA

The Scientific Data Mining Process

THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS

10426: Large Scale Project Accounting Data Migration in E-Business Suite

Detecting Network Anomalies. Anant Shah

Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu

Data Mining Analytics for Business Intelligence and Decision Support

A Survey of Classification Techniques in the Area of Big Data.

Maschinelles Lernen mit MATLAB

Data Mining and Pattern Recognition for Large-Scale Scientific Data

Data, Measurements, Features

High Productivity Data Processing Analytics Methods with Applications

Factor analysis. Angela Montanari

Cleaned Data. Recommendations

Towards better accuracy for Spam predictions

CS Introduction to Data Mining Instructor: Abdullah Mueen

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

A Regression Approach for Forecasting Vendor Revenue in Telecommunication Industries

Making Sense of the Mayhem: Machine Learning and March Madness

Joint models for classification and comparison of mortality in different countries.

Multivariate Analysis (Slides 13)

Dan French Founder & CEO, Consider Solutions

Data Mining for Knowledge Management. Classification

Factor Analysis. Chapter 420. Introduction

T-test & factor analysis

Final Project Report

Prescriptive Analytics. A business guide

Scalable Developments for Big Data Analytics in Remote Sensing

Partial Least Squares (PLS) Regression.

Handling missing data in large data sets. Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza

Understanding Web personalization with Web Usage Mining and its Application: Recommender System

Association rules for improving website effectiveness: case analysis

Subspace Analysis and Optimization for AAM Based Face Alignment

Chapter 6. The stacking ensemble approach

DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.

Feature Selection vs. Extraction

Using reporting and data mining techniques to improve knowledge of subscribers; applications to customer profiling and fraud management

DATA MINING AND REPORTING IN HEALTHCARE

Employer Health Insurance Premium Prediction Elliott Lui

How To Use Data Mining For Loyalty Based Management

Transcription:

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123

CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents promising ways to fraud detection, the number of attributes provided to it is huge. This generally leads to large processing times. Some attributes might even have the probability of creating a negative impact on the result. Hence identification of attributes plays a crucial role in determining the accuracy of the final result. Raw Data Figure 7.1 Data Processing in Behavior Based SVM. Further processing the attributes that are dormant in the decision making process is leads to unnecessary wastage of the processing time. All these problems 124

can be overcome by selecting the appropriate attributes initially and then passing them to the SVM. Figure 7.1 shows the Data processing used in this approach. 7.2 Data Pre-Processing When the input data to an algorithm is too large to be processed and it is suspected to be redundant then the input data will be transformed into a reduced representation set of features. Transforming the input data into the set of features is called feature extraction. If the features extracted are carefully chosen it is expected that the features set will extract the relevant information from the input data in order to perform the desired task using this reduced representation instead of the full size input. Feature extraction is special form dimensionality reduction. Here, the features represent the relevant characteristics of the input data are chosen. Instead of using full size input one may use this reduced representation set. If it is properly chosen then it will give successful task. Best results are achieved when an expert constructs a set of application-dependent features. Nevertheless, if no such expert knowledge is available general dimensionality reduction techniques may help. In this process, feature extraction is performed as the initial process. The Principal Component Analysis (PCA) is used for feature compression. PCA is a mathematical procedure that uses an orthogonal transformation to translate a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components [97]. The number of principal components is less than or equal to the number of original variables. This conversion is defined in such a way that the first principal component has the largest promising variance, and each succeeding component in turn has the highest variance possible 125

under the constraint that it be orthogonal to the preceding components. Principal components are assured to be independent if the data set is jointly normally distributed. PCA is responsive to the relative scaling of the original variables. Principal Component Analysis (PCA) is a suitable tool for feature compression. The original feature space is reduced to low dimensional spaces but it will not affect the solution. The computational cost also less for training and testing the SVM because of using PCA. PCA can be done by eigen value decomposition of a data covariance matrix, usually after mean centering the data for each feature. The covariance matrix is calculated by the following equation 7.1 covariance(x,y) = 1 1 (x x)(y y) (7.1) where, x i and y i are the values of variables, x and y are the mean variables, n is number of objects. The results of PCA come in the form of component scores. By using this formulation the principal components are calculated and the training dataset is reduced. This method used to reduce the training time. The adoption of this method makes the classifier to perform in an efficient time manner. 7.3 Behavior Based Approach Analyzing the spending behavior pattern of the customer is a promising way to detect the credit card frauds. If any new transactions deviate from this behavior, then those transactions are considered as fraudulent transactions. Each person has a 126

different spending behavior pattern. Behavior based fraud detection model means that the data use in the model are from the transactional behavior of cardholder directly or derived from them. Fraud detection based on the analysis of existing spending behavior of cardholder is a promising way to find the credit card frauds. Based on the spending pattern the customer s usual activities such as transaction amount, billing address etc are learned. Some of the anomalous behaviors include variation of billing address and shipping address, maximum amount of purchase, large transaction done far away from the living place etc. Figure 7.2 shows the behavior based fraud detection process using SVM. Figure 7.2 Behavior Based Fraud Detection using SVM 127

Initially, the attributes used in the dataset are converted into numerical data. Feature selection is a very important stage in fraud detection. The features in the data efficiently portray the usage behavior of an individual. In this model, the features which interpret the behavior of the customer are selected for detection. Adding irreverent features make the classifier inefficient. Transaction amount is the most important behavior it varies from person to person. Frequency of card usage is calculated from the Date and Time Attributes. Average amount of transactions are calculated from each transactions. After the completion of the feature selection process, the categorical attributes available in the data are converted to numerical attributes. These numerical attributes are then normalized. The normalized data falls within a defined boundary. This boundary is usually (0, 1) or (-1, +1) for SVM. The appropriate boundary is defined by the SVM designer. After the normalization process, the training and testing data sets are created. The training set is passed to the SVM. C and values are tuned in order to obtain the appropriate pair for the current dataset. The training data is then passed to the SVM for final verification. After obtaining satisfying results, the SVM can be deployed for real-time data processing. 7.4 Summary Using SVM alone cannot guarantee accurate results. Further, SVM has a high false positive rate and SVM processing is intensive due to the availability of a large number of attributes. To improve this, a behavior based pruning technique is used. After the preprocessing is complete, an analysis is performed to detect attributes with 128

respect to their importance. All the available important attributes are alone passed into the SVM for further processing. This helps in faster processing and improved accuracy. 129