A COMPARATIVE ASSESSMENT OF SUPERVISED DATA MINING TECHNIQUES FOR FRAUD PREVENTION
- Alyson Benson
- 10 years ago
Sherly K.K
Department of Information Technology, Toc H Institute of Science & Technology, Ernakulam, Kerala, India.

Abstract- With the extensive growth of the Internet and the vast financial possibilities it opens up, more and more systems are subject to attack by intruders. In a competitive environment, fraud becomes a business-critical problem. It is very important to prevent unauthorized access to system resources and data. To build a completely secure system, behavior analysis is required in addition to the authentication process before a transaction is completed. Many companies interact with millions of external parties, and it is cost-prohibitive to manually check the identities and activities of all these parties. Instead, the riskiest ones can be identified through data mining techniques. This study evaluates three classification methods for solving fraud detection problems with data mining and shows how advanced techniques can be combined successfully to obtain high fraud coverage with maximum confidence and a minimum false alarm rate.

Keywords- Bayesian classifier, Data mining, Decision tree, Fraud detection, Neural network

I. INTRODUCTION
Due to the extensive growth of e-commerce, fraud detection has become a necessity. The term fraud here refers to the abuse of a profit organization's system without necessarily leading to direct legal consequences. Fraud detection is a continuously evolving discipline, because the tactics used to commit fraud are ever changing. It is in the interest of the company and the card issuer to prevent fraud or, failing this, to detect fraud as soon as possible. Otherwise, consumer trust in both the company and the card decreases, and revenue is lost in addition to the direct losses made through fraudulent sales. Fraud is an adaptive crime and it is increasing every year, so special methods of intelligent data analysis are needed to detect and prevent it.
Classification and prediction are two forms of data analysis that can be used to extract models. Many classification and prediction methods have been proposed by researchers in data mining. This paper discusses three main classification techniques used in data mining to prevent fraud. One main objective is to evaluate the use of data mining methods in differentiating fraud and non-fraud observations. This paper is organized as follows. A brief description of related work is given in Section 2. Decision tree model functionalities in fraud detection are given in Section 3. Section 4 describes the neural network approach to card fraud detection. The application of the Bayesian classifier to credit card fraud detection is described in Section 5. Model testing and evaluation are discussed in Section 6, and Section 7 concludes the paper.

II. RELATED WORK
Credit card fraud detection has drawn a lot of research interest, and a number of techniques, with special emphasis on data mining, have been suggested. Ghosh and Reilly [1] have proposed credit card fraud detection with a neural network. They built a detection system trained on a large sample of labeled credit card account transactions. These transactions contain example fraud cases due to lost cards, stolen cards, application fraud, counterfeit fraud, mail-order fraud, and non-received issue (NRI) fraud. Aleskerov et al. [2] present CARDWATCH, a database mining system used for credit card fraud detection. The system, based on a neural learning module, provides an interface to a variety of commercial databases. Syeda et al. [3] have used parallel granular neural networks (PGNNs) to improve the speed of the data mining and knowledge discovery process in credit card fraud detection; a complete system has been implemented for this purpose. Fan et al. [7] suggest the application of distributed data mining in credit card fraud detection. Brause et al. [8] have developed an approach that involves advanced data mining techniques and neural network algorithms to obtain high fraud coverage. Stolfo et al. [9] suggest a credit card fraud detection system (FDS) using metalearning techniques to learn models of fraudulent credit card transactions. Metalearning is a general strategy that provides a means for combining and integrating a number of separately built classifiers or models. They consider naïve Bayesian, C4.5, and backpropagation neural networks as the base classifiers. A metaclassifier is used to determine which classifier should be considered based on the skewness of the data. Phua et al. [10] have done an extensive survey of existing data-mining-based FDSs and published a comprehensive report. Prodromidis and Stolfo [11] use an agent-based approach with distributed learning for detecting frauds in credit card transactions. It is based on artificial intelligence and combines inductive learning algorithms and metalearning methods to achieve higher accuracy. The following sections describe three classification techniques used in fraud detection.
III. DECISION TREE CLASSIFIER
Decision trees are powerful and popular tools for classification and prediction. A decision tree can be used to predict a pattern or to classify data items. It is a decision support tool that uses a tree-like graph in which each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node holds a class label. A decision tree produces a sequence of rules (or series of questions) that can be used to recognize the class. Rules can readily be expressed so that humans can understand them, or even used directly in a database access language such as SQL so that records falling into a particular category can be retrieved.
Decision tree programs construct a decision tree from a set of training cases. The most popular decision tree algorithms include ID3, C4.5 and CART (classification and regression trees). The central focus of the tree-growing algorithm is selecting which attribute to test at each node in the tree. The goal is to select the attribute that is most useful for classifying examples. A good quantitative measure of the worth of an attribute is a statistical property called information gain, which measures how well a given attribute separates the training examples according to their target classification. This measure is used to select among the candidate attributes at each step while growing the tree. In order to define information gain precisely, we need a measure commonly used in information theory, called entropy, that characterizes the (im)purity of an arbitrary collection of examples.

A. Entropy
Entropy is a quantitative measure of disorder in a system. In decision tree construction, entropy is used to determine which node to split next; it measures the degree of impurity. If the target attribute takes on c different values, then the entropy of a set D is defined as
$$\mathrm{Entropy}(D) = -\sum_{i=1}^{c} p_i \log_2 p_i \qquad (1)$$
where $p_i$ is the proportion of examples in D that belong to class i.

B. Information Gain
Information gain is a measure of the effectiveness of an attribute A in classifying the training data. It is defined as the difference between the original information requirement (based on just the proportion of classes) and the new requirement obtained after partitioning on A, that is, the entropy of the parent table minus the weighted sum of the entropies of the subset tables:
$$\mathrm{Gain}(D, A) = \mathrm{Entropy}(D) - \sum_{k} \frac{n_k}{n}\,\mathrm{Entropy}(D_k)$$
where $D_k$ is the subset of D in which A takes its k-th value, $n_k = |D_k|$ and $n = |D|$. The attribute with the maximum gain (the information gain, or its normalized form, the gain ratio, used by C4.5) is selected as the splitting attribute.

C. Decision Tree approach in card fraud prediction
We use the past few years of credit card transaction data of different customers to construct the decision tree. Sample transaction data is shown in Table 1. Four transaction attributes that are relevant for identifying the user's spending behavior are considered for the tree construction: transaction amount, time, merchant, and city of purchase/order placing. Each individual transaction amount usually depends on the type of item purchased, which also plays a large role in identifying the spending nature of the cardholder. The type of each purchase is linked to the type of business of the corresponding merchant.
These four parameters (transaction amount, transaction time/frequency, type/quality of item purchased, and billing/order placing city) are converted into categorical parameters. The transaction time attribute is categorized by dividing a month into four weeks and each week into two slots, weekday (wd) and weekend (we); the transaction date is thus categorized into eight groups: wd1, we1, wd2, we2, wd3, we3, wd4, we4. The transaction amount is quantized into three levels: Low, Medium and High. The purchased item is categorized into five groups: Textile items (Ti), Electronic items (EI), Gold (Gl), Medical (MD) and Miscellaneous (Mi). Some merchants sell a variety of items; for convenience, items purchased from such merchants are treated as miscellaneous.
TABLE 1: Card Transaction Data
TABLE 2: Categorized Data
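To make the categorization and the attribute-selection step concrete, the following sketch (a minimal illustration under stated assumptions, not the paper's implementation) quantizes raw transactions into the categories described above and ranks the attributes by information gain. The Low/Medium/High thresholds, the rule that folds days 29-31 into week 4, and the eight toy transactions are hypothetical; the item codes (Ti, EI, Gl, MD, Mi) and the city codes (LC, NC, IC) used in the paper are taken as already categorized.

```python
from collections import Counter
from math import log2

# Hypothetical cut-offs: the paper defines Low/Medium/High amount levels
# but does not give the thresholds.
LOW_LIMIT = 100.0
HIGH_LIMIT = 1000.0


def amount_level(amount):
    """Quantize a transaction amount into Low, Medium or High."""
    if amount < LOW_LIMIT:
        return "Low"
    if amount < HIGH_LIMIT:
        return "Medium"
    return "High"


def time_slot(day_of_month, weekday_index):
    """Map a date to one of wd1..wd4 / we1..we4.

    weekday_index follows date.weekday(): 0 = Monday ... 6 = Sunday.
    Days 29-31 are folded into week 4 (an assumption).
    """
    week = min((day_of_month - 1) // 7 + 1, 4)
    return ("we" if weekday_index >= 5 else "wd") + str(week)


def entropy(labels):
    """-sum p_i * log2(p_i) over the class proportions in labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attr, target="class"):
    """Parent entropy minus the size-weighted entropy of each subset."""
    n = len(rows)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy([r[target] for r in rows]) - remainder


# Hypothetical raw transactions:
# (amount, day_of_month, weekday_index, item, city, label)
RAW = [
    (40, 3, 1, "Ti", "LC", "legit"),
    (250, 5, 3, "Mi", "LC", "legit"),
    (60, 6, 5, "MD", "LC", "legit"),
    (300, 10, 2, "Ti", "NC", "legit"),
    (2500, 7, 6, "EI", "IC", "fraud"),
    (1800, 12, 4, "Gl", "IC", "fraud"),
    (700, 2, 0, "EI", "IC", "fraud"),
    (1500, 4, 5, "Gl", "NC", "legit"),
]

rows = [
    {"amount": amount_level(a), "time": time_slot(d, w),
     "item": item, "city": city, "class": label}
    for a, d, w, item, city, label in RAW
]
gains = {attr: information_gain(rows, attr)
         for attr in ("amount", "time", "item", "city")}
best = max(gains, key=gains.get)
print(best)  # city
```

With these toy rows the city category separates the classes perfectly, so it is chosen for the root split; an ID3-style learner would repeat the same selection recursively on each subset.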
The fourth attribute, city of purchase/order placing, is also an important parameter that can readily assist fraud detection. For transactions made with a physical card, the city of purchase is considered; for online transactions, the order placing city plays the relevant role. In online transactions, the order placing city can be identified from the IP address. Transactions coming from dynamic IPs show irregular behavior. Therefore, to identify fraud, the city of purchase/order placing is classified into three categories: local (LC), national (NC) and international (IC). All these converted attributes, shown in Table 2, can be used as the input data to create a decision tree for identifying fraud. A sample decision tree is shown in Fig. 1.
Fig. 1. Sample decision tree

IV. NEURAL NETWORK CLASSIFIER
Neural networks resemble the human brain: they can acquire knowledge through learning. The knowledge is stored within inter-neuron connection strengths known as synaptic weights. The network is composed of a large number of highly interconnected processing elements (neurons) working in parallel to solve a specific problem. The disadvantage is that, because the network finds out how to solve the problem by itself, its operation can be unpredictable.

A. Artificial neuron
An artificial neuron is a device with many inputs and one output. The neuron has two modes of operation: the training mode and the using mode. In the training mode, the neuron can be trained to fire (or not) for particular input patterns. In the using mode, when a taught input pattern is detected at the input, its associated output becomes the current output. If the input pattern does not belong to the taught list of input patterns, the firing rule is used to determine whether to fire or not.

B. Neural network in the context of card fraud detection
There are different kinds of neural networks and neural network algorithms. The most popular neural network algorithm is backpropagation, which works on multilayer feed-forward networks and is well suited to card fraud detection. A multilayer feed-forward neural network consists of an input layer, one or more hidden layers, and an output layer. Each output unit takes as input a weighted sum of the outputs from units in the previous layer and applies an activation function to the weighted input.
Fig. 2. A multilayer feed-forward neural network (input layer x1 ... xn, hidden layer, output layer with output Ok; weights w1i ... wni and wjk)
Backpropagation learns by iteratively processing a data set of training tuples, comparing the network's prediction for each tuple with the actual known target value. The target value may be the known class label of the training tuple or a continuous value. For each training tuple, the weights are modified so as to minimize the mean squared error between the network's prediction and the actual target value. These modifications are made in the backward direction, that is, from the output layer, through each hidden layer, down to the first hidden layer. In general, the weights eventually converge and the learning process stops. A neural network is thus just a function with a number of weights that produces a score based on the data within card transactions. If random values were assigned to the weights, this function would be unlikely to generate meaningful scores and good detection performance, so the weights need to be optimized. This optimization process is often referred to as learning or training. It involves an iterative process of passing through a historical database of card transactions (with fraudulent and legitimate transactions clearly identified) and systematically adjusting the weights so that the score discriminates well between fraudulent and legitimate transactions.
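The backpropagation procedure described above (forward pass, error terms computed from the output layer backwards, per-tuple weight updates that reduce the squared error) can be sketched in pure Python for a network with one hidden layer of sigmoid units. The two input features (a normalized transaction amount and an international-purchase flag), the toy labels, the layer sizes, the learning rate and the number of epochs are illustrative assumptions, not values from the paper.

```python
import math
import random


def sigmoid(z):
    """Logistic activation applied to each unit's weighted input."""
    return 1.0 / (1.0 + math.exp(-z))


class FeedForwardNet:
    """Input layer -> one hidden layer -> one output unit, all sigmoid."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # w_ih[i][j]: weight from input i to hidden unit j (last row = bias).
        self.w_ih = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                     for _ in range(n_in + 1)]
        # w_ho[j]: weight from hidden unit j to the output (last entry = bias).
        self.w_ho = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        xb = list(x) + [1.0]  # constant 1.0 input acts as the bias
        hidden = [sigmoid(sum(xb[i] * self.w_ih[i][j] for i in range(len(xb))))
                  for j in range(len(self.w_ho) - 1)]
        hb = hidden + [1.0]
        out = sigmoid(sum(hb[j] * self.w_ho[j] for j in range(len(hb))))
        return xb, hb, out

    def train_tuple(self, x, target, lr):
        """One backpropagation step on a single training tuple."""
        xb, hb, out = self.forward(x)
        # Output error term (T - O) * O * (1 - O): squared error with a sigmoid.
        err_out = (target - out) * out * (1.0 - out)
        # Hidden error terms, propagated backwards using the old w_ho values.
        err_hidden = [hb[j] * (1.0 - hb[j]) * err_out * self.w_ho[j]
                      for j in range(len(hb) - 1)]
        for j in range(len(hb)):
            self.w_ho[j] += lr * err_out * hb[j]
        for i in range(len(xb)):
            for j in range(len(err_hidden)):
                self.w_ih[i][j] += lr * err_hidden[j] * xb[i]

    def mse(self, data):
        return sum((t - self.forward(x)[2]) ** 2 for x, t in data) / len(data)


# Hypothetical toy data: (normalized amount, international flag) -> 1 = fraud.
DATA = [
    ((0.05, 0.0), 0), ((0.20, 0.0), 0), ((0.10, 1.0), 0), ((0.30, 0.0), 0),
    ((0.85, 0.0), 0), ((0.90, 1.0), 1), ((0.80, 1.0), 1), ((0.95, 1.0), 1),
]

net = FeedForwardNet(n_in=2, n_hidden=3, seed=42)
before = net.mse(DATA)
for _ in range(2000):
    for x, target in DATA:
        net.train_tuple(x, target, lr=0.5)
after = net.mse(DATA)
print(f"MSE before training: {before:.3f}, after: {after:.3f}")
```

The trained weights are the only thing that changes between the untrained and trained network, which is the sense in which the knowledge about fraud patterns resides in the weights.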
The term intelligence, used in connection with neural networks, refers to the knowledge about fraud patterns that is reflected in the values of the weights of the trained network. In short, intelligence = the values of the weights.
The network can be simplified by removing the weighted links that have the least effect on the trained network; this is called network pruning. Once the trained network has been pruned, clustering is used to find the set of common activation values for each hidden unit in a given trained two-layer neural network. The combinations of these activation values for each hidden unit are analyzed, and rules are derived relating combinations of activation values to the corresponding output unit values. Similarly, the sets of input values and activation values are studied to derive rules describing the relationship between the input and hidden unit layers. Finally, the two sets of rules may be combined to form IF-THEN rules. A major disadvantage of neural networks lies in their knowledge representation: acquired knowledge in the form of a network of units connected by weighted links is difficult for humans to interpret.

V. BAYESIAN CLASSIFIER
Bayesian classifiers are statistical classifiers. They can predict class membership probabilities, such as the probability that a given tuple belongs to a particular class. Bayesian classification is based on Bayes' theorem.

A. Naïve Bayesian approach in card fraud prediction
Naïve Bayesian classifiers assume that the effect of an attribute value on a given class is independent of the values of the other attributes. Let D be the transaction history of a cardholder, a set of tuples $X_1, X_2, \ldots, X_{|D|}$ with associated class labels $C_i$ (fraudulent or legitimate). Each tuple is represented by an n-dimensional attribute vector $X = (x_1, x_2, \ldots, x_n)$ giving the values of the attributes $A_1, A_2, \ldots, A_n$ respectively. Some attributes that can be derived from transaction records are individually highly predictive of fraud. The classifier predicts that a transaction belongs to the class having the highest posterior probability $P(C_i \mid X)$, which by Bayes' theorem is
$$P(C_i \mid X) = \frac{P(X \mid C_i)\,P(C_i)}{P(X)} \qquad (2)$$
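A minimal sketch of this naïve Bayesian decision rule, assuming one categorical attribute (the city category) and one continuous attribute (the transaction amount): categorical likelihoods are estimated as class-conditional relative frequencies, continuous ones with a per-class Gaussian density, and the predicted class is the one that maximizes $P(X \mid C_i)P(C_i)$, as described in the rest of this section. The attribute names and toy transactions are hypothetical, and no Laplace correction is applied, so an attribute value never seen with a class gives that class a score of zero.

```python
import math
from collections import Counter, defaultdict


def gaussian(x, mu, sigma):
    """Gaussian density g(x, mu, sigma) used for continuous attributes."""
    return (math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))
            / (math.sqrt(2 * math.pi) * sigma))


class NaiveBayes:
    def __init__(self, categorical, continuous):
        self.categorical = list(categorical)
        self.continuous = list(continuous)

    def fit(self, rows, labels):
        self.n = len(rows)
        self.class_counts = Counter(labels)  # |C_i,D| for each class
        # counts[c][attr][value]: number of class-c tuples having attr == value
        self.counts = {c: defaultdict(Counter) for c in self.class_counts}
        values = defaultdict(lambda: defaultdict(list))
        for row, c in zip(rows, labels):
            for a in self.categorical:
                self.counts[c][a][row[a]] += 1
            for a in self.continuous:
                values[c][a].append(row[a])
        # Per-class mean and standard deviation of each continuous attribute
        self.stats = {}
        for c in self.class_counts:
            for a in self.continuous:
                xs = values[c][a]
                mu = sum(xs) / len(xs)
                sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))
                self.stats[(c, a)] = (mu, sigma or 1e-6)  # guard sigma == 0
        return self

    def score(self, row, c):
        """P(X|C_i) * P(C_i) under class conditional independence."""
        p = self.class_counts[c] / self.n
        for a in self.categorical:
            p *= self.counts[c][a][row[a]] / self.class_counts[c]
        for a in self.continuous:
            mu, sigma = self.stats[(c, a)]
            p *= gaussian(row[a], mu, sigma)
        return p

    def predict(self, row):
        return max(self.class_counts, key=lambda c: self.score(row, c))


# Hypothetical training transactions for one cardholder.
ROWS = [
    {"city": "LC", "amount": 40.0}, {"city": "LC", "amount": 120.0},
    {"city": "NC", "amount": 80.0}, {"city": "LC", "amount": 60.0},
    {"city": "IC", "amount": 2400.0}, {"city": "IC", "amount": 1900.0},
    {"city": "NC", "amount": 2100.0},
]
LABELS = ["legit"] * 4 + ["fraud"] * 3

model = NaiveBayes(categorical=["city"], continuous=["amount"]).fit(ROWS, LABELS)
print(model.predict({"city": "NC", "amount": 2000.0}))  # fraud
```

Because the per-attribute likelihoods are simply multiplied, training reduces to counting and computing means and standard deviations, which is why this classifier trains quickly compared with a neural network.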
In order to reduce the computation involved in evaluating $P(X \mid C_i)$ for transaction sets with many attributes (transaction amount, time, merchant, country, etc.), the naïve assumption of class conditional independence is made, that is, there are no dependence relationships among the attributes:
$$P(X \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i) = P(x_1 \mid C_i) \times P(x_2 \mid C_i) \times \cdots \times P(x_n \mid C_i) \qquad (3)$$
The probabilities $P(x_1 \mid C_i), P(x_2 \mid C_i), \ldots$ can easily be estimated from the training tuples, where $x_k$ refers to the value of attribute $A_k$ for tuple X. Card fraud detection is a complex problem domain involving many different input variables (for example transaction amount, time, merchant, merchant category code (MCC), country, etc.) arising from multiple transactions in a sequence. Some of these variables are continuous (e.g., amount, time), whereas others are categorical (e.g., MCC, country). A classifier computes a fraud score based on a multiplicity of continuous and categorical variables:
a) If $A_k$ is categorical, then $P(x_k \mid C_i)$ is the number of tuples of class $C_i$ in D having the value $x_k$ for $A_k$, divided by $|C_{i,D}|$, the number of training tuples of class $C_i$ in D.
b) If $A_k$ is continuous-valued, it is assumed to follow a Gaussian distribution with mean $\mu$ and standard deviation $\sigma$, defined by
$$g(x, \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \qquad (4)$$
so that
$$P(x_k \mid C_i) = g(x_k, \mu_{C_i}, \sigma_{C_i})$$
where $\mu_{C_i}$ and $\sigma_{C_i}$ are the mean and standard deviation of the values of $A_k$ for the training tuples of class $C_i$.
The predicted class label is the class $C_i$ for which $P(X \mid C_i)P(C_i)$ is the maximum.

B. Bayesian Belief Networks
When the assumption of class conditional independence holds true, the naïve Bayesian classifier is, in theory, the most accurate in comparison with all other classifiers. In practice, however, dependencies can exist between variables. Bayesian belief networks specify joint conditional probability distributions. This approach computes the probability distributions of each of the features and uses a process called evidence integration to compute a consolidated fraud probability from the individual feature probabilities. A belief network is defined by two components: a directed acyclic graph and a set of conditional probability tables (CPTs). Each node in the graph represents a random variable.
The variables may be discrete or continuous-valued. Each arc represents a probabilistic dependence. If an arc is drawn from a node Y to a node Z, then Y is a parent or immediate predecessor of Z, and Z is a descendant of Y. Each variable is conditionally independent of its nondescendants in the graph, given its parents. A belief network has one conditional probability table (CPT) for each variable. The CPT for a variable Y specifies the conditional distribution $P(Y \mid \mathrm{Parents}(Y))$. Let $X = (x_1, \ldots, x_n)$ be a transaction tuple described by the variables or attributes $Y_1, \ldots, Y_n$ respectively. An example of a directed acyclic graph and a CPT are shown in Fig. 3 and Table 3 respectively. The joint probability distribution is
$$P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid \mathrm{Parents}(Y_i))$$
where $P(x_i \mid \mathrm{Parents}(Y_i))$ is the corresponding entry in the CPT for $Y_i$.
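The factorization of the joint probability into CPT entries can be illustrated with a hypothetical three-node network in which HighAmount and Foreign are both parents of Fraud. The structure and every CPT number below are invented for illustration and are not taken from the paper. The sketch computes joint probabilities from the CPTs and answers a query such as P(Fraud = true | Foreign = true) by summing the joint over the unobserved variables (exact inference by enumeration, practical only for small networks).

```python
from itertools import product

# Hypothetical network: HighAmount -> Fraud <- Foreign.
PARENTS = {"HighAmount": (), "Foreign": (), "Fraud": ("HighAmount", "Foreign")}

# CPT[var][parent values] = P(var = True | parents); all numbers are invented.
CPT = {
    "HighAmount": {(): 0.2},
    "Foreign": {(): 0.1},
    "Fraud": {
        (True, True): 0.30,
        (True, False): 0.05,
        (False, True): 0.02,
        (False, False): 0.001,
    },
}


def joint(assignment):
    """P(x1, ..., xn) = product over i of P(x_i | Parents(Y_i))."""
    p = 1.0
    for var, parents in PARENTS.items():
        p_true = CPT[var][tuple(assignment[q] for q in parents)]
        p *= p_true if assignment[var] else 1.0 - p_true
    return p


def query(target, evidence):
    """P(target = True | evidence), summing the joint over hidden variables."""
    hidden = [v for v in PARENTS if v != target and v not in evidence]
    totals = {True: 0.0, False: 0.0}
    for t in (True, False):
        for values in product((True, False), repeat=len(hidden)):
            assignment = {**evidence, **dict(zip(hidden, values)), target: t}
            totals[t] += joint(assignment)
    return totals[True] / (totals[True] + totals[False])


print(round(query("Fraud", {"Foreign": True}), 3))  # 0.076
```

Unlike the naïve Bayesian sketch, the dependence of Fraud on both parents is represented explicitly in its CPT rather than assumed away.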
Fig. 3. Directed acyclic graph
TABLE 3: CONDITIONAL PROBABILITY TABLE

C. Training Bayesian Belief Networks
Several algorithms exist for learning the network topology from the training data. Experts must specify conditional probabilities for the nodes that participate in direct dependencies; these probabilities can then be used to compute the remaining probability values. If the network topology is known and the variables are observable, then training the network consists of computing the CPT entries, as is similarly done in naïve Bayesian classification. When the network topology is given and some of the variables are hidden, the gradient descent method can be used to train the belief network. Let D be a training set of data tuples $X_1, X_2, \ldots, X_{|D|}$, and let $w_{ijk}$ be the CPT entry for the variable $Y_i = y_{ij}$ having the parents $U_i = u_{ik}$, where $w_{ijk} \equiv P(Y_i = y_{ij} \mid U_i = u_{ik})$. The $w_{ijk}$ are viewed as weights and are initialized to random probability values. The gradient descent strategy performs greedy hill climbing: at each iteration the weights are updated, and they eventually converge to a locally optimal solution. We maximize
$$P_w(D) = \prod_{d=1}^{|D|} P_w(X_d) \qquad (8)$$
This can be done by the following steps:
1. Compute the gradients: for each i, j, k,
$$\frac{\partial \ln P_w(D)}{\partial w_{ijk}} = \sum_{d=1}^{|D|} \frac{P(Y_i = y_{ij}, U_i = u_{ik} \mid X_d)}{w_{ijk}} \qquad (9)$$
2. Update the weights:
$$w_{ijk} \leftarrow w_{ijk} + l\,\frac{\partial \ln P_w(D)}{\partial w_{ijk}} \qquad (10)$$
where l is the learning rate, which is set to a small constant.
3. Renormalize the weights, so that $0 \le w_{ijk} \le 1$ and $\sum_j w_{ijk} = 1$ for all i, k.

VI. TESTING AND MODEL EVALUATION
Three alternative models can be built, each based on a different method, and tested against the training set. The decision tree model is built using a splitting algorithm. The neural network and the belief network can be trained using the whole sample as the training set and then tested against the training set. The key differences between the decision tree, neural network and Bayesian approaches to card fraud detection lie in the areas of transparency of reasoning, handling of sparsity, model training time and the amount of data required, as summarized in Table 4.

TABLE 4: CLASSIFIERS COMPARISON
| Sl. no | Feature | Neural network | Bayesian classifier | Decision tree |
|---|---|---|---|---|
| 1 | Transparency of reasoning | The acquired knowledge, in the form of a network of units connected by weighted links, is difficult for humans to interpret. | The fraud score is calculated from the probabilities of attributes and is transparent to the user. | A statistical property called information gain is used to measure the purity of training samples with respect to their target classification, which is transparent to the user. |
| 2 | Sparsity | Effective classifiers for continuous attributes, but inaccurate when the data are sparse. | Accurate fraud scores in the presence of sparsity. | Accurate fraud scores in the presence of sparsity. |
| 3 | Size of training set | Produces the best results for large transaction sets. | Effective even for small to medium-sized transaction sets. | The model overfits the data for large data sets; pruning techniques are required to correct the overfitting problem. |
| 4 | Training time | Very long training time. | Training times are short. | Training time is longer than for the Bayesian classifier and shorter than for the neural network. |

The disadvantage of
decision tree induction is that it considers only one attribute at a time, which reduces its performance. Neural network classifiers are suitable only for larger databases and take a long time to train. Bayesian classifiers are more accurate, much faster to train, and suitable for small, medium and large databases, but they are slower when applied to new instances. A comparative assessment of the models' performance leads to the conclusion that the Bayesian belief network outperforms the other two models and achieves outstanding classification accuracy. The neural network achieves a satisfactorily high performance. Finally, the decision tree's performance is considered rather low.

VII. CONCLUSION
Intrusion detection is important in today's computing environment. The combination of factors such as the extensive growth of the Internet, the vast financial possibilities opening up in electronic trade and the lack of truly secure systems creates more opportunities for criminals to attack systems. A hybrid of anomaly and misuse detection models can improve fraud detection and the security of systems. The inclusion of biometric identifiers, such as fingerprint or retinal scans, DNA sequences, signatures or voice, can help build more secure systems. The specially designed fraud pattern mining algorithm reduces the detection delay. The parallel architecture of neural networks is very well suited to real-time applications, but when fraud patterns change, retraining the model takes a long time, so neural networks are better suited to large data sets. The Bayesian classifier offers improved detection performance at reduced cost. This enables small and mid-size institutions to implement cost-effective intelligent fraud solutions, which would not be possible with neural-network-based applications. Popular supervised algorithms such as neural networks, Bayesian networks and decision trees have been combined or applied in a sequential fashion to improve results.

VIII. REFERENCES
[1] S. Ghosh and D.L. Reilly, "Credit Card Fraud Detection with a Neural-Network," Proceedings of the International Conference on System Science, pp. , 1994.
[2] Aleskerov, B. Freisleben and B. Rao, "CARDWATCH: A Neural Network Based Database Mining System for Credit Card Fraud Detection," Proceedings of IEEE/IAFE Conference on Computational Intelligence for Financial Engineering (CIFEr), pp. , 1997.
[3] M. Syeda, Y.Q. Zhang and Y. Pan, "Parallel Granular Neural Networks for Fast Credit Card Fraud Detection," Proceedings of the IEEE International Conference on Fuzzy Systems, pp. , .
[4] Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Second Edition, Morgan Kaufmann Publishers.
[5] card processing » 2007.
[6] NeuroDimension, "Fraud Detection Using Neural Networks and Sentinel Solutions (Smartsoft)."
[7] Wei Fan, Haixun Wang, Philip S. Yu and Salvatore J. Stolfo, "A Fully Distributed Framework for Cost-sensitive Data Mining," Proceedings of the 22nd International Conference on Distributed Computing Systems (ICDCS '02).
[8] R. Brause, T. Langsdorf and M. Hepp, "Neural Data Mining for Credit Card Fraud Detection."
[9] S.J. Stolfo, D.W. Fan, W. Lee, A.L. Prodromidis and P.K. Chan, "Credit Card Fraud Detection Using Meta-Learning: Issues and Initial Results," Proc. AAAI Workshop on AI Methods in Fraud and Risk Management, pp. 83-90.
[10] Clifton Phua, Vincent Lee, Kate Smith and Ross Gayler, "A Comprehensive Survey of Data Mining-based Fraud Detection Research," final version 2, 9/02/2005.
[11] Philip K. Chan, Wei Fan, Andreas L. Prodromidis and Salvatore J. Stolfo, "Distributed Data Mining in Credit Card Fraud Detection," IEEE, November/December.
[12] Chun Wei Clifton Phua, "Investigative Data Mining in Fraud Detection," thesis, submitted November.
Credit Card Fraud Detection using Hidden Morkov Model and Neural Networks
Credit Card Fraud Detection using Hidden Morkov Model and Neural Networks R.RAJAMANI Assistant Professor, Department of Computer Science, PSG College of Arts & Science, Coimbatore. Email: [email protected]
Data Mining Solutions for the Business Environment
Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania [email protected] Over
A Review of Anomaly Detection Techniques in Network Intrusion Detection System
A Review of Anomaly Detection Techniques in Network Intrusion Detection System Dr.D.V.S.S.Subrahmanyam Professor, Dept. of CSE, Sreyas Institute of Engineering & Technology, Hyderabad, India ABSTRACT:In
Comparison of Data Mining Techniques used for Financial Data Analysis
Comparison of Data Mining Techniques used for Financial Data Analysis Abhijit A. Sawant 1, P. M. Chawan 2 1 Student, 2 Associate Professor, Department of Computer Technology, VJTI, Mumbai, INDIA Abstract
Classification and Prediction
Classification and Prediction Slides for Data Mining: Concepts and Techniques Chapter 7 Jiawei Han and Micheline Kamber Intelligent Database Systems Research Lab School of Computing Science Simon Fraser
A Review of Data Mining Techniques
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,
How To Detect Credit Card Fraud
Card Fraud Howard Mizes December 3, 2013 2013 Xerox Corporation. All rights reserved. Xerox and Xerox Design are trademarks of Xerox Corporation in the United States and/or other countries. Outline of
Role of Neural network in data mining
Role of Neural network in data mining Chitranjanjit kaur Associate Prof Guru Nanak College, Sukhchainana Phagwara,(GNDU) Punjab, India Pooja kapoor Associate Prof Swami Sarvanand Group Of Institutes Dinanagar(PTU)
Impelling Heart Attack Prediction System using Data Mining and Artificial Neural Network
General Article International Journal of Current Engineering and Technology E-ISSN 2277 4106, P-ISSN 2347-5161 2014 INPRESSCO, All Rights Reserved Available at http://inpressco.com/category/ijcet Impelling
Artificial Neural Network and Location Coordinates based Security in Credit Cards
Artificial Neural Network and Location Coordinates based Security in Credit Cards 1 Hakam Singh, 2 Vandna Thakur Department of Computer Science Career Point University Hamirpur Himachal Pradesh,India Abstract
Data Mining: Concepts and Techniques. Jiawei Han. Micheline Kamber. Simon Fräser University К MORGAN KAUFMANN PUBLISHERS. AN IMPRINT OF Elsevier
Data Mining: Concepts and Techniques Jiawei Han Micheline Kamber Simon Fräser University К MORGAN KAUFMANN PUBLISHERS AN IMPRINT OF Elsevier Contents Foreword Preface xix vii Chapter I Introduction I I.
ENSEMBLE DECISION TREE CLASSIFIER FOR BREAST CANCER DATA
ENSEMBLE DECISION TREE CLASSIFIER FOR BREAST CANCER DATA D.Lavanya 1 and Dr.K.Usha Rani 2 1 Research Scholar, Department of Computer Science, Sree Padmavathi Mahila Visvavidyalayam, Tirupati, Andhra Pradesh,
Data Mining Approach For Subscription-Fraud. Detection in Telecommunication Sector
Contemporary Engineering Sciences, Vol. 7, 2014, no. 11, 515-522 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ces.2014.4431 Data Mining Approach For Subscription-Fraud Detection in Telecommunication
Introduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski [email protected]
Introduction to Machine Learning and Data Mining Prof. Dr. Igor Trakovski [email protected] Neural Networks 2 Neural Networks Analogy to biological neural systems, the most robust learning systems
Sanjeev Kumar. contribute
RESEARCH ISSUES IN DATAA MINING Sanjeev Kumar I.A.S.R.I., Library Avenue, Pusa, New Delhi-110012 [email protected] 1. Introduction The field of data mining and knowledgee discovery is emerging as a
A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE
A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE Kasra Madadipouya 1 1 Department of Computing and Science, Asia Pacific University of Technology & Innovation ABSTRACT Today, enormous amount of data
CREDIT CARD FRAUD DETECTION BASED ON ONTOLOGY GRAPH
CREDIT CARD FRAUD DETECTION BASED ON ONTOLOGY GRAPH Ali Ahmadian Ramaki 1, Reza Asgari 2 and Reza Ebrahimi Atani 3 1 Department of Computer Engineering, Guilan University, Rasht, Iran [email protected]
The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2
2nd International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2016) The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2 1 School of
A Decision Tree- Rough Set Hybrid System for Stock Market Trend Prediction
A Decision Tree- Rough Set Hybrid System for Stock Market Trend Prediction Binoy.B.Nair Department Electronics and Communication Engineering, Amrita Vishwa Vidaypeetham, Ettimadai, Coimbatore, 641105,
APPLICATION OF ARTIFICIAL NEURAL NETWORKS USING HIJRI LUNAR TRANSACTION AS EXTRACTED VARIABLES TO PREDICT STOCK TREND DIRECTION
LJMS 2008, 2 Labuan e-journal of Muamalat and Society, Vol. 2, 2008, pp. 9-16 Labuan e-journal of Muamalat and Society APPLICATION OF ARTIFICIAL NEURAL NETWORKS USING HIJRI LUNAR TRANSACTION AS EXTRACTED
Principles of Dat Da a t Mining Pham Tho Hoan [email protected] [email protected]. n
Principles of Data Mining Pham Tho Hoan [email protected] References [1] David Hand, Heikki Mannila and Padhraic Smyth, Principles of Data Mining, MIT press, 2002 [2] Jiawei Han and Micheline Kamber,
Learning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal
Learning Example Chapter 18: Learning from Examples 22c:145 An emergency room in a hospital measures 17 variables (e.g., blood pressure, age, etc) of newly admitted patients. A decision is needed: whether
Data Mining for Knowledge Management. Classification
1 Data Mining for Knowledge Management Classification Themis Palpanas University of Trento http://disi.unitn.eu/~themis Data Mining for Knowledge Management 1 Thanks for slides to: Jiawei Han Eamonn Keogh
A Review On Credit Card Fraud Detection Using BLAST-SSAHA Method
A Review On Credit Card Fraud Detection Using BLAST-SSAHA Method Mr Yogesh M Narekar 1, Mr Sushil Kumar Chavan 2 Department of Information Technology, RGCER, Nagpur, India 1 Department of Information Technology,
Performance Analysis of Decision Trees
Performance Analysis of Decision Trees Manpreet Singh Department of Information Technology, Guru Nanak Dev Engineering College, Ludhiana, Punjab, India Sonam Sharma CBS Group of Institutions, New Delhi,India
Lecture 6. Artificial Neural Networks
Lecture 6 Artificial Neural Networks 1 1 Artificial Neural Networks In this note we provide an overview of the key concepts that have led to the emergence of Artificial Neural Networks as a major paradigm
DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES
DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES Vijayalakshmi Mahanra Rao 1, Yashwant Prasad Singh 2 Multimedia University, Cyberjaya, MALAYSIA 1 [email protected]
IDENTIFYING BANK FRAUDS USING CRISP-DM AND DECISION TREES
IDENTIFYING BANK FRAUDS USING CRISP-DM AND DECISION TREES Bruno Carneiro da Rocha 1,2 and Rafael Timóteo de Sousa Júnior 2 1 Bank of Brazil, Brasília-DF, Brazil [email protected] 2 Network Engineering
Decision Support System For A Customer Relationship Management Case Study
61 Decision Support System For A Customer Relationship Management Case Study Ozge Kart 1, Alp Kut 1, and Vladimir Radevski 2 1 Dokuz Eylul University, Izmir, Turkey {ozge, alp}@cs.deu.edu.tr 2 SEE University,
Data Mining Practical Machine Learning Tools and Techniques
Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea
International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015
RESEARCH ARTICLE OPEN ACCESS Data Mining Technology for Efficient Network Security Management Ankit Naik [1], S.W. Ahmad [2] Student [1], Assistant Professor [2] Department of Computer Science and Engineering
Healthcare Measurement Analysis Using Data mining Techniques
www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 03 Issue 07 July, 2014 Page No. 7058-7064 Healthcare Measurement Analysis Using Data mining Techniques 1 Dr.A.Shaik
Keywords data mining, prediction techniques, decision making.
Volume 5, Issue 4, April 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Analysis of Datamining
HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION
HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION Chihli Hung 1, Jing Hong Chen 2, Stefan Wermter 3, 1,2 Department of Management Information Systems, Chung Yuan Christian University, Taiwan
An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content
Survey on Credit Card Fraud Detection Techniques
www.ijecs.in International Journal Of Engineering And Computer Science ISSN: 2319-7242 Volume 4 Issue 11 Nov 2015, Page No. 15010-15015 Survey on Credit Card Fraud Detection Techniques Priya Ravindra Shimpi,
A New Approach For Estimating Software Effort Using RBFN Network
IJCSNS International Journal of Computer Science and Network Security, VOL.8 No.7, July 008 37 A New Approach For Estimating Software Using RBFN Network Ch. Satyananda Reddy, P. Sankara Rao, KVSVN Raju,
Data Mining Techniques Chapter 7: Artificial Neural Networks
Data Mining Techniques Chapter 7: Artificial Neural Networks Artificial Neural Networks.................................................. 2 Neural network example...................................................
How To Solve The Kd Cup 2010 Challenge
A Lightweight Solution to the Educational Data Mining Challenge Kun Liu Yan Xing Faculty of Automation Guangdong University of Technology Guangzhou, 510090, China [email protected] [email protected]
Classification and Prediction
Classification and Prediction 1. Objectives...2 2. Classification vs. Prediction...3 2.1. Definitions...3 2.2. Supervised vs. Unsupervised Learning...3 2.3. Classification and Prediction Related Issues...4
Practical Applications of DATA MINING. Sang C Suh Texas A&M University Commerce JONES & BARTLETT LEARNING
Practical Applications of DATA MINING Sang C Suh Texas A&M University Commerce r 3 JONES & BARTLETT LEARNING Contents Preface xi Foreword by Murat M.Tanik xvii Foreword by John Kocur xix Chapter 1 Introduction
Chapter 12 Discovering New Knowledge Data Mining
Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to
TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM
TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM Thanh-Nghi Do College of Information Technology, Cantho University 1 Ly Tu Trong Street, Ninh Kieu District Cantho City, Vietnam
Mobile Phone APP Software Browsing Behavior using Clustering Analysis
Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis
Neural Networks and Back Propagation Algorithm
Neural Networks and Back Propagation Algorithm Mirza Cilimkovic Institute of Technology Blanchardstown Blanchardstown Road North Dublin 15 Ireland [email protected] Abstract Neural Networks (NN) are important
REVIEW OF ENSEMBLE CLASSIFICATION
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IJCSMC, Vol. 2, Issue.
A New Approach for Evaluation of Data Mining Techniques
181 A New Approach for Evaluation of Data Mining s Moawia Elfaki Yahia 1, Murtada El-mukashfi El-taher 2 1 College of Computer Science and IT King Faisal University Saudi Arabia, Alhasa 31982 2 Faculty
Predict Influencers in the Social Network
Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, [email protected] Department of Electrical Engineering, Stanford University Abstract Given two persons
degrees of freedom and are able to adapt to the task they are supposed to do [Gupta].
1.3 Neural Networks 19 Neural Networks are large structured systems of equations. These systems have many degrees of freedom and are able to adapt to the task they are supposed to do [Gupta]. Two very
Chapter 20: Data Analysis
Chapter 20: Data Analysis Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 20: Data Analysis Decision Support Systems Data Warehousing Data Mining Classification
Data Mining Classification: Decision Trees
Data Mining Classification: Decision Trees Classification Decision Trees: what they are and how they work Hunt s (TDIDT) algorithm How to select the best split How to handle Inconsistent data Continuous
Learning is a very general term denoting the way in which agents:
What is learning? Learning is a very general term denoting the way in which agents: Acquire and organize knowledge (by building, modifying and organizing internal representations of some external reality);
Design call center management system of e-commerce based on BP neural network and multifractal
Available online www.jocpr.com Journal of Chemical and Pharmaceutical Research, 2014, 6(6):951-956 Research Article ISSN : 0975-7384 CODEN(USA) : JCPRC5 Design call center management system of e-commerce
AN UPDATE RESEARCH ON CREDIT CARD ON-LINE TRANSACTIONS
AN UPDATE RESEARCH ON CREDIT CARD ON-LINE TRANSACTIONS Falaki S. O. Alese B. K. Department of Computer Science, Federal University of Technology, Akure, Ondo State, Nigeria. Ismaila W. O. Department of
Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100
Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100 Erkan Er Abstract In this paper, a model for predicting students performance levels is proposed which employs three
NEURAL NETWORKS IN DATA MINING
NEURAL NETWORKS IN DATA MINING 1 DR. YASHPAL SINGH, 2 ALOK SINGH CHAUHAN 1 Reader, Bundelkhand Institute of Engineering & Technology, Jhansi, India 2 Lecturer, United Institute of Management, Allahabad,
Intrusion Detection via Machine Learning for SCADA System Protection
Intrusion Detection via Machine Learning for SCADA System Protection S.L.P. Yasakethu Department of Computing, University of Surrey, Guildford, GU2 7XH, UK. [email protected] J. Jiang Department
EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH
EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH SANGITA GUPTA 1, SUMA. V. 2 1 Jain University, Bangalore 2 Dayanada Sagar Institute, Bangalore, India Abstract- One
Credit Card Fraud Detection and Concept-Drift Adaptation with Delayed Supervised Information
Credit Card Fraud Detection and Concept-Drift Adaptation with Delayed Supervised Information Andrea Dal Pozzolo, Giacomo Boracchi, Olivier Caelen, Cesare Alippi, and Gianluca Bontempi 15/07/2015 IEEE IJCNN
Neural Networks in Data Mining
IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vol. 04, Issue 03 (March. 2014), V6 PP 01-06 www.iosrjen.org Neural Networks in Data Mining Ripundeep Singh Gill, Ashima Department
LVQ Plug-In Algorithm for SQL Server
LVQ Plug-In Algorithm for SQL Server Licínia Pedro Monteiro Instituto Superior Técnico [email protected] I. Executive Summary In this Resume we describe a new functionality implemented
Exploration of Data mining techniques in Fraud Detection: Credit Card Khyati Chaudhary
International Journal of Electronics and Computer Science Engineering 1765 Available Online at www.ijecse.org ISSN- 2277-1956 Exploration of Data mining techniques in Fraud Detection: Credit Card Khyati
Data Mining - Evaluation of Classifiers
Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
DATA MINING USING INTEGRATION OF CLUSTERING AND DECISION TREE
DATA MINING USING INTEGRATION OF CLUSTERING AND DECISION TREE 1 K.Murugan, 2 P.Varalakshmi, 3 R.Nandha Kumar, 4 S.Boobalan 1 Teaching Fellow, Department of Computer Technology, Anna University 2 Assistant
A Data Mining Study of Weld Quality Models Constructed with MLP Neural Networks from Stratified Sampled Data
A Data Mining Study of Weld Quality Models Constructed with MLP Neural Networks from Stratified Sampled Data T. W. Liao, G. Wang, and E. Triantaphyllou Department of Industrial and Manufacturing Systems
