Meta Learning Algorithms for Credit Card Fraud Detection



Similar documents
Credit Card Fraud Detection Using Hidden Markov Model

Fraud Detection in Credit Card Using DataMining Techniques Mr.P.Matheswaran 1,Mrs.E.Siva Sankari ME 2,Mr.R.Rajesh 3

A Study of Detecting Credit Card Delinquencies with Data Mining using Decision Tree Model

A COMPARATIVE ASSESSMENT OF SUPERVISED DATA MINING TECHNIQUES FOR FRAUD PREVENTION

A Novel Approach for Credit Card Fraud Detection Targeting the Indian Market

Electronic Payment Fraud Detection Techniques

A COMPARATIVE STUDY OF DECISION TREE ALGORITHMS FOR CLASS IMBALANCED LEARNING IN CREDIT CARD FRAUD DETECTION

Detecting Credit Card Fraud by Decision Trees and Support Vector Machines

CREDIT CARD FRAUD DETECTION BASED ON ONTOLOGY GRAPH

The Credit Card Fraud Detection Analysis With Neural Network Methods

REVIEW OF ENSEMBLE CLASSIFICATION

DATA MINING TECHNIQUES AND APPLICATIONS

Application of Hidden Markov Model in Credit Card Fraud Detection

To improve the problems mentioned above, Chen et al. [2-5] proposed and employed a novel type of approach, i.e., PA, to prevent fraud.

Data Mining Practical Machine Learning Tools and Techniques

Credit Card Fraud Detection Using Meta-Learning: Issues 1 and Initial Results

PROBLEM REDUCTION IN ONLINE PAYMENT SYSTEM USING HYBRID MODEL

Chapter 6. The stacking ensemble approach

A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier

Survey on Credit Card Fraud Detection Techniques

Credit Card Fraud Detection Using Meta-Learning: Issues and Initial Results 1

A Review On Credit Card Fraud Detection Using BLAST-SSAHA Method

Data Mining. Nonlinear Classification

AUTO CLAIM FRAUD DETECTION USING MULTI CLASSIFIER SYSTEM

AN UPDATE RESEARCH ON CREDIT CARD ON-LINE TRANSACTIONS

A Hybrid Approach to Learn with Imbalanced Classes using Evolutionary Algorithms

DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES

Fraud Detection in Online Banking Using HMM

BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL

How To Detect Credit Card Fraud

ANALYSIS OF FEATURE SELECTION WITH CLASSFICATION: BREAST CANCER DATASETS

Random forest algorithm in big data environment

Fraud Risk Prediction in Merchant-Bank Relationship using Regression Modeling

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

Towards applying Data Mining Techniques for Talent Mangement

Model Combination. 24 Novembre 2009

Knowledge Discovery and Data Mining

Comparison of Data Mining Techniques used for Financial Data Analysis

Top Top 10 Algorithms in Data Mining

CI6227: Data Mining. Lesson 11b: Ensemble Learning. Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore.

A Secured Approach to Credit Card Fraud Detection Using Hidden Markov Model

Social Media Mining. Data Mining Essentials

Unsupervised Profiling Methods for Fraud Detection

Data Mining - Evaluation of Classifiers

DATA MINING APPLICATION IN CREDIT CARD FRAUD DETECTION SYSTEM

Utility-Based Fraud Detection

An Overview of Knowledge Discovery Database and Data mining Techniques

A Novel Classification Approach for C2C E-Commerce Fraud Detection

IDENTIFYING BANK FRAUDS USING CRISP-DM AND DECISION TREES

Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification

Top 10 Algorithms in Data Mining

An Introduction to Data Mining

Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) ( ) Roman Kern. KTI, TU Graz

Customer Classification And Prediction Based On Data Mining Technique

Dr. U. Devi Prasad Associate Professor Hyderabad Business School GITAM University, Hyderabad

EFFICIENT DATA PRE-PROCESSING FOR DATA MINING

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

Classification and Prediction techniques using Machine Learning for Anomaly Detection.

E-Banking Integrated Data Utilization Platform WINBANK Case Study

Artificial Neural Network and Location Coordinates based Security in Credit Cards

Data Mining. Concepts, Models, Methods, and Algorithms. 2nd Edition

Performance Evaluation of Requirements Engineering Methodology for Automated Detection of Non Functional Requirements

Probabilistic Credit Card Fraud Detection System in Online Transactions

Constrained Classification of Large Imbalanced Data by Logistic Regression and Genetic Algorithm

E-commerce Transaction Anomaly Classification

Performance Analysis of Decision Trees

Decision Tree Learning on Very Large Data Sets

(M.S.), INDIA. Keywords: Internet, SQL injection, Filters, Session tracking, E-commerce Security, Online shopping.

How To Use Neural Networks In Data Mining

Credit card fraud and detection techniques: a review

International Journal of World Research, Vol: I Issue XIII, December 2008, Print ISSN: X DATA MINING TECHNIQUES AND STOCK MARKET

Feature vs. Classifier Fusion for Predictive Data Mining a Case Study in Pesticide Classification

On the effect of data set size on bias and variance in classification learning

Effective Data Mining Using Neural Networks

Data Mining Classification: Decision Trees

Class #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris

Question 2 Naïve Bayes (16 points)

Consolidated Tree Classifier Learning in a Car Insurance Fraud Detection Domain with Class Imbalance

Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100

Impelling Heart Attack Prediction System using Data Mining and Artificial Neural Network

Extension of Decision Tree Algorithm for Stream Data Mining Using Real Data

Research Article FraudMiner: A Novel Credit Card Fraud Detection Model Based on Frequent Itemset Mining

Data Mining Methods: Applications for Institutional Research

Comparison of K-means and Backpropagation Data Mining Algorithms

IMPLEMENTING CLASSIFICATION FOR INDIAN STOCK MARKET USING CART ALGORITHM WITH B+ TREE

Equity forecast: Predicting long term stock price movement using machine learning

Ensemble Data Mining Methods

A Perspective Analysis of Traffic Accident using Data Mining Techniques

A Survey on Outlier Detection Techniques for Credit Card Fraud Detection

ENSEMBLE DECISION TREE CLASSIFIER FOR BREAST CANCER DATA

Web Mining as a Tool for Understanding Online Learning

Machine Learning: Overview

Internet Banking Fraud Detection using HMM and BLAST-SSAHA Hybridization

COMP3420: Advanced Databases and Data Mining. Classification and prediction: Introduction and Decision Tree Induction

Exploration of Data mining techniques in Fraud Detection: Credit Card Khyati Chaudhary

Transcription:

International Journal of Engineering Research and Development e-issn: 2278-67X, p-issn: 2278-8X, www.ijerd.com Volume 6, Issue 6 (March 213), PP. 16-2 Meta Learning Algorithms for Credit Card Fraud Detection 1 Sanjay Kumar Sen, 2 Prof. Dr Sujata Dash, 1 Asst. Professor (Comp Sc & Engg.), OEC, Bhubaneswar 2 DEAN(R & D) & PROFESSOR(CSE), GIFT,Bhubaneswar Abstract:- Due to the rapid advancement of electronic commerce technology, there is a great and dramatic increase in credit card transactions. As credit card becomes the most popular mode of payment for both online as well as regular purchase, cases of fraud associated with it are also rising; to detect credit card frauds in electronic transactions becomes the focus of risk of control of banks. The proposed work in this paper is the combination of five supervised machine learning algorithms, and Regression Tree (CART), Adaboost and Logitboost,Bagging and Dagging are proposed for classification of credit card data. These resulted forms help researchers to detect fraud in credit card. The experimental result shows the performance analysis of different meta-learning algorithms and also compared on the basis of misclassification and correct classification rate. Smaller misclassification reveals that bagging algorithm performs better classification of Credit card fraud detection technique. Keywords:-, Fraud detection, meta-learning algorithm, Machine Learning Algorithm, misclassification, classification. I. INTRODUCTION Data mining helps the use of complicated data analysis to discover valid patterns which are previously unknown and relationships among large data sets. These tools have mathematical algorithms, statistical models and machine learning methods. Data mining comprises of more than collection and management of data, representation in textual, qualitative or multimedia forms. Data mining applications uses a range of parameters to observe and analyse the data which includes association, classification, sequence or path analysis, clustering and forecasting. The popularity of online shopping is growing day by day. According to an ACNielsen study conducted in 25, one-tenth of the world s population is shopping online [1].Germany and Great Britain have the largest number of online shoppers, and credit card is the most popular mode of payment (59 percent). About 35 million transactions per year were reportedly carried out by Barclaycard, the largest credit card company in the United Kingdom, toward the end of the last century [2]. Retailers like Wal-Mart typically handle much larger number of credit card transactions including online and regular purchases. As the number of credit card users rises world-wide, the opportunities for attackers to steal credit card details and, subsequently, commit fraud are also increasing. The total credit card fraud in the United States itself is reported to be $2.7 billion in 25 and estimated to be $3. billion in 26, out of which $1.6 billion and $1.7 billion, respectively, are the estimates of online fraud [3]. The use of credit cards is common in modern day society. Fraud is a millions dough business and it is rising every year. Fraud presents significant cost to our financial prudence measure world wide.modern techniques based on Data mining, Machine learning, Sequence Alignment technique, Fuzzy Logic, Genetic Programming, Artificial Intelligence (AI) etc.,has been introduced for detecting & preventing credit/atm card, CHEQUE book type of fraudulent transactions. With great increase in credit cards, fraud has increasing excessively now a days. Since credit card becomes the most general mode of payment or both online as line as regular not only capturing the deceptive events but capturing such activities as rapidly as possible. II. RELATED WORK ON CREDIT CARD FRAUD DETECTION Credit card fraud detection has drawn a lot of research interest and a number of techniques, with special emphasis on data mining and neural networks, have been suggested. Ghosh and Reilly [4] have proposed credit card fraud detection with a neural network. They have built a detection system, which is trained on a large 16

sample of labelled credit card account transactions. These transactions contain example fraud cases due to lost cards, stolen cards, application fraud, counterfeit fraud, mail-order fraud, and non received issue (NRI) fraud. Recently, Syeda et al. [5] have used parallel granular neural networks (PGNNs) for improving the speed of data mining and knowledge discovery process in credit card fraud detection. A complete system has been implemented for this purpose. Stolfo et al. [6] suggest a credit card fraud detection system (FDS) using meta learning techniques to learn models of fraudulent credit card transactions. Meta learning is a general strategy that provides a means for combining and integrating a number of separately built classifiers or models. Aleskerov et al. [7] present CARDWATCH, a database mining system used for credit card fraud detection. The system, based on a neural learning module, provides an interface to a variety of commercial databases. Kim and Kim have identified skewed distribution of data and mix of legitimate and fraudulent transactions as the two main reasons for the complexity of credit card fraud detection [8]. Fan et al. [9] suggest the application of distributed data mining in credit card fraud detection. Brause et al. [1] have developed an approach that involves advanced data mining techniques and neural network algorithms to obtain high fraud coverage. Chiu and Tsai [11] have proposed Web services and data mining techniques to establish a collaborative scheme for fraud detection in the banking industry. [12] have done an extensive survey of existing data-mining-based FDSs and published a comprehensive report. Prodromidis and Stolfo [13] use an agent-based approach with distributed learning for detecting frauds in credit card transactions. It is based on artificial intelligence and combines inductive learning algorithms and meta learning methods for achieving higher accuracy. Phua et al. [14] suggest the use of meta classifier similar to [6] in fraud detection problems. They consider naive Bayesian, C4.5, and Back Propagation neural networks as the base classifiers. Vatsa et al. [15] have recently proposed a game-theoretic approach to credit card fraud detection. They model the interaction between an attack erandan FDS as a multistage game between two players, each trying to maximize his payoff. III. MACHINE LEARNING ALGORITHMS i. and Regression Tree(CART) It was introduce by Breimann 1984.It builds both classification and regression tree (Gini index measure is used for selecting splitting attribute. Pruning is done on training data set. It can deal with both numeric and categorical attributes and can also handle missing attributes. [16]. The CART monograph focus on the Gini rule which is similar to the better know entropy or information gain criterion [17]. For binary (/1) target the 'Gini measure of impurity" of a node t is: and regression free provide automatic construction of new features within each node and for the binary target. ii. Adaboost AdaBoost, short for Adaptive boosting is a machine algorithm, formulated by Yeave Freud and Robert Scapire. It is a meta-algorithm and can be used in conjunction with many other learning algorithms to improve their performance. AdaBoost is an algorithm for constructing a strong classifier as linear combination. AdaBoost is adaptive in the sense that subsequent classifiers built are tweaked in favour of those instances misclassified by previous classifiers. AdaBoost is sensitive to noisy data and outliers. In some problems, however, it can be less susceptible to the over fitting problem than most learning algorithms. The classifiers it uses can be weak (i.e., display a substantial error rate), but as long as their performance is not random (resulting in an error rate of.5 for binary classification), they will improve the final model. Even classifiers with an error rate higher than would be expected from a random classifier will be useful, since they will have negative coefficients in the final linear combination of classifiers and hence behave like their inverses. iii. Bagging The way of combining the decisions of different models means amalgamating the various outputs into a single prediction. The way of doing to do this is to calculate the average. In bagging the models receives equal weights. In case of bagging suppose that several training datasets of the same size are chosen at random from the problem domain. Suppose using a particular machine learning technique to build a decision dtree for each dataset, we might expect these trees to be practically identically and to make the same prediction for each new test instance. This is a disturbing fact and seems to cast a shadow over the whole enterprise. In bagging the models receive equal weights, whereas in boosting weighting is used to give more influence to the more successful one just as an executive might place different values on the advice of different experts depending on how experienced they are. To introduce bagging, several training datasets of the same size are chosen at random from the problem domain. Suppose using a particular machine learning technique to build a decision tree for each dataset, we might expect these trees to be practically identically and to make the same prediction for each new test instance. 17

iv. Logitboost LogitBoost is a boosting algorithm formulated by Jerome Friedmome, Trevor Hastie, and Robert Tibshirani. The original paper casts the Adaboost algorithm into a statistical framework. Specifically, if one considers AdaBoost as a generalized additive model and then applies the cost functional of logistic regression one can derive the LogitBoost algorithm. v. Grading We investigate another technique, which we call grading. The basic idea is to learn to predict for each of the original learning algorithms whether its prediction for a particular example is correct or not. We therefore train one classifier for each of the original learning algorithms on a training set that consists of the original examples with class labels that encode whether the prediction of this learner was correct on this particular example. The algorithm may also be viewed as an attempt to extend the work of Bay and Pazzani (2)[18] who propose to use a meta-classification scheme for characterizing model errors. Hence in contrast to stacking we leave the original examples unchanged, but instead modify the class labels. The algorithm may also be viewed as an attempt to extend the work of Bay and Pazzani (2)[19] who propose to use a metaclassification scheme for characterizing model errors. Their suggestion is to learn a comprehensible theory that describes the regions of errors of a given classifier. While the step of constructing the training set for the meta classifier is basically the same as in our approach,1 their approach is restricted to learning descriptive characterizations but cannot be directly used for improving classifier performance. The reason is that negative feedback when the meta classifier predicts that the base classifier is wrong only rules out the class predicted by the base classifier, but does not help to choose among the remaining classes (except, of course, for two-class problems). IV. PERFORMANCE COMPARISONS In literature classification algorithm, classifier performance can be measured on the same data. On the basis of results obtained Bagging algorithm is found better than other four algorithms. Comparisons of machine learning algorithms have been done on the basis of the misclassification and correct classification rate. It is observed that BAGGING machine learning classifier performance is better than and Regression Technique, Adaboost, Logitboost, Bagging and Grading linear and quadratic dis-criminant analysis classifier in context of misclassification rate and correct classification rate. Table 1 Algorithm misclassification Mis- CART.834.166 ADABOOST.847.153 LOGITBOOST.855.145 BAGGING.877.123 GRADING.536 64 According to obtained results of classification in table 1 following graph can be drawn. Graph for Mis-classification rate 18

CART ADABOO LOGITBO BAGGING GRADING CART ADABOOST LOGITBOOST BAGGING GRADING Fraud Detection using Meta Learning Algorithms in Credit Card System Mis-.9 1.8.7.5.3.1 Fig.1.Performace comparision graph between between CART,ADABOOST,LOGITBOOST,BAGGING, GRADING with respect to Mis-classification error rate Graph for correct classification rate.9 1.8.7.5.3.1 Fig.2.Performace comparision graph between between CART,ADABOOST,LOGIBOOST,BAGGING, GRADING with respect to correct classification error rate Graph for and Misclassification rate.8 1 Mis- Figure 3. Performance comparison graph between CART, ADABOOST, LOGIBOOST, BAGGING AND GRADING algorithm with respect to misclassification error rate and correct classification error rate. V. CONCLUSIONS This paper represents computational issues of five supervised machine learning algorithms i.e. Adaboost algorithm, Logitboost algorithm, and Regression technique algorithm, Bagging algorithm and Dagging algorithm with dedicating role of detection of credit fraud rating on the basis of classification rule. Among five algorithms, Bagging algorithm is the best because the Bagging algorithm is easier to interpret and understand as compared to Adaboost,Llogitboost, and Regression Tree 19

algorithm and Dagging algorithm. In order to compare the classification performance of three machine learning algorithm, classifiers are applied on same data and results are compared on the basis of misclassification and correct classification rate and according to experimental results in table 1, it can be concluded that bagging algorithm is the best as compared to classification and regression tree, is best as compared to adaboost, logitboost, classification and regression algorithm and Grading algorithm. REFERENCES [1]. Global Consumer Attitude Towards On-Line Shopping, http://www2.acnielsen.com/reports/documents/25_cc_online shopping.pdf, Mar. 27. [2]. D.J. Hand, G. Blunt, M.G. Kelly, and N.M. Adams, Data Mining for Fun and Profit, Statistical Science, vol. 15, no. 2, pp. 111-131,2. [3]. Statistics for General and On-Line Card Fraud, http://www.epaynews.com/statistics/fraud.html, Mar. 27. [4]. S. Ghosh and D.L. Reilly, Credit Card Fraud Detection with a Neural-Network, Proc. 27th Hawaii Int l Conf. System Sciences:Information Systems: Decision Support and Knowledge-Based Systems, vol. 3, pp. 621-63, 1994. [5]. M. Syeda, Y.Q. Zhang, and Y. Pan, Parallel Granular Networks for Fast Credit Card Fraud Detection, Proc. IEEE Int l Conf. Fuzzy Systems, pp. 572-577, 22. [6]. S.J. Stolfo, D.W. Fan, W. Lee, A.L. Prodromidis, and P.K. Chan, Credit Card Fraud Detection Using Meta-Learning: Issues and Initial Results, Proc. AAAI Workshop AI Methods in Fraud and Risk Management, pp. 83-9, 1997. [7]. E. Aleskerov, B. Freisleben, and B. Rao, CARDWATCH: A Neural Network Based Database Mining System for Credit Card Fraud Detection, Proc. IEEE/IAFE: Computational Intelligence for Financial Eng., pp. 22-226, 1997. [8]. M.J. Kim and T.S. Kim, A Neural Classifier with Fraud Density Map for Effective Credit Card Fraud Detection, Proc. Int l Conf. Intelligent Data Eng. and Automated Learning, pp. 378-383, 22. [9]. W. Fan, A.L. Prodromidis, and S.J. Stolfo, Distributed Data Mining in Credit Card Fraud Detection, IEEE Intelligent Systems, vol. 14, no. 6, pp. 67-74, 1999. [1]. R. Brause, T. Langsdorf, and M. Hepp, Neural Data Mining for Credit Card Fraud Detection, Proc. IEEE Int l Conf. Tools with Artificial Intelligence, pp. 13-16, 1999. [11]. C. Chiu and C. Tsai, A Web Services-Based Collaborative Scheme for Credit Card Fraud Detection, Proc. IEEE Int l Conf.e-Technology, e-commerce and e-service, pp. 177-181, 24. [12]. C. Phua, V. Lee, K. Smith, and R. Gayler, A Comprehensive Survey of Data Mining-Based Fraud Detection Research, http://www.bsys.monash.edu.au/people/cphua/, Mar. 27. [13]. S. Stolfo and A.L. Prodromidis, Agent-Based Distributed Learning Applied to Fraud Detection, Technical Report CUCS-14-99, Columbia Univ., 1999. [14]. C. Phua, D. Alahakoon, and V. Lee, Minority Report in Fraud Detection: of Skewed Data, ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 5-59, 24. [15]. V. Vatsa, S. Sural, and A.K. Majumdar, A Game-theoretic Approach to Credit Card Fraud Detection, Proc. First Int l Conf. Information Systems Security, pp. 263-276, 25. [16]. Matthew N. Anyanwu and Sajjan G. Shiva, Comparative Analysis of Serial Decision Tree Algorithms, International Journal of Computer Science and Security, (IJCSS) Volume (3) : Issue (3),.pp:23-24. [17]. XindongWu,Vipin Kumar, J. Ross Quinlan,Joydeep Ghosh, Qiang Yang,Hiroshi Motoda, et al Top 1 algorithm in data mining according to the survey paper of Xindong wu et al know (inf syst (28) @springer verlog London limited 27,pp:1-37, Dec. 27. [18]. Bay, S. D., & Pazzani, M. J. (2). Characterizing model errors and differences. In Proceedings of the 17th International Conference on Machine Learning (ICML- 2). Morgan Kaufmann. [19]. Bay, S. D., & Pazzani, M. J. (2). Characterizing model errors and differences. In Proceedings of the 17th International Conference on Machine Learning (ICML- 2). Morgan Kaufmann. 2