# DNA: An Online Algorithm for Credit Card Fraud Detection for Games Merchants

Save this PDF as:

Size: px
Start display at page:

Download "DNA: An Online Algorithm for Credit Card Fraud Detection for Games Merchants"

## Transcription

3 be seen the parameter minimum alone, for example, is not able to sufficiently segregate both labels (the average of both labels yield more segregation). Adequate attributes have a great distance between the average fraud and average genuine value. The corresponding standard deviation boundaries should, if possible, not overlap. Figure 2 shows this based on the example attribute sumcreditcardtoken. The attribute was part of the data set described in Section IV. It was created by counting up all distinct credit cards, which were used by a certain user_ . For the given example case, we calculated the values for the aggregation parameters (as depicted in Table 1 column two to four). combinations can either be done by functions b i f(t i (a j )) or by b i f a i, a j. The attributes are normalized with the min-max normalization [3, p. 114] to bring the different attributes on the same numerical level (ranging from 0 to 1). We distinguish between two types of attributes, as briefly described above. These attributes are referred as b i F, where F is the set of derived attributes that tend to 1 if normalized and are summed up in the nominator of (4). The denominator, in contrary, is composed of a second type of attributes b i E, which tend to 0 if normalized with minmax normalization. If the quotient of the normalization expression is not defined, it will be discarded. T = {t i (a 1, a 2,, a n ) i = 1,, n: a i A} (2) S = s(b 1, b 2,, b m ) k 1,,m : b k f a i,a j k 1,,m : b k f(t i (a j )) i,j 1,,n i 1,, T j 1,,n (3) Figure 2. Example distribution for used credit cards per user for two classes of labels (fraud, genuine) As it can be seen in Figure 2, the depicted attribute is not perfect, since the boundaries of the standard deviation of fraudulent transactions overlap with the area of standard deviation of genuine transactions. However, the distance between the average fraud value and average genuine value is suitable for classification. We can also see that the average fraud value is on the right hand side of the total average value and therefore tends to larger values in a fraudulent case. This indicates that the attribute sumcreditcardtoken should be placed in the numerator of (4), since it will tend to its maximum if a sequence is fraudulent. For our work, we were able to identify six suitable attributes: - sumcreditcard: # of distinct credit cards per user - sumtransactionstats: # of transaction status rejected - sumsuccessfulltrans # of completed transactions - avgdensity: average distance between dates - sumcountries: # of distinct countries involved - sumdates: # of distinct dates involved C. Formula and Weighting The above-described process is repeated for each of the n available attributes in the given data set [see also (2)] apart from the aggregation attribute and the label attribute. The training data set T consists of transactions t i which are described by attributes a i A, see also Formula (2). Thereby A consists of all original attributes of a transaction a i and T denotes the set of available transactions with attributes from A. These transactions are grouped into sequences S, which consist of various combined attributes b i, see also (3). These risklevel = b i F b i min S b i S b i min S b i α bi b i E b i min S b i max S b i min S b i β bi The next step is to weight the normalized attributes according to their importance for fraud detection. We use parameter α bi > 0 for F attributes and β bi > 0 for E attributes to achieve this task. Two sets of parameters are necessary since attributes used for the denominator need to be scaled down, in order to increase their significance. The attributes in the nominator are increased in significance, if the α bi is scaled up. The weight parameters α bi and β bi are determined empirically. D. Threshold Selection Formula (4) is an indicator on how prevalent fraudulent attributes are for a particular transaction. The last step for applying the DNA approach for fraud detection is to determine a threshold value whose violation will lead to the classification fraudulent transaction. As mentioned above, this threshold is determined empirically by undertaking a series of tests with a set of thresholds (e.g., from 0 to 100). Accuracy metrics such as Precision P, Recall R and score F1 (5) are then determined for each assumed threshold. Thereby F1 represents the harmonic mean of Precision and Recall and is used to rank the performance of different methods in the experimental evaluation: F1 = 2 P R (5) P+T The development of these performance measures over different threshold values is depicted Figure 3 for an example case: (4) 3

4 TABLE II. FULL DATA SET SCHEMA column Name created user_signuptime creditcard_token description timestamp of the payment transaction identifies credit card, hashed Figure 3. Determining threshold value for DNA approach Accuracy indicators (Figure 3) are increasing fast until threshold value 5. It is not reasonable to select a threshold lower than 5, since the F1 is far from the global optimum. From threshold 5 on, there is an intersection point, which will keep the F1 near the global optimum. This second range is called trade-off range and spans up to threshold value 12, in the case depicted in Figure 3. Within this range the merchant can choose between detecting more fraudsters, including a higher rate of false positives or catching less fraudsters, but increase Precision and therefore avoid false positives. This choice can depend on the ability of the merchant to deal with false positives and on the merchants specific total fraud costs. In the context of fraud detection the term total fraud costs means the sum of lost value, scanning cost as well as reimbursement fees associated with a fraud case. After a certain threshold value, in the shown case 12, the Precision is almost 1 and will only increase insignificantly. The Recall and consecutively F1, will decrease from that point. The Reason for this is the intrinsic mechanic in the DNA approach. Fraudulent transactions with a comparable low fraud profile will be assigned a lower risk level. This however, is still higher level than the risk level of genuine users. If however, the threshold is set high enough these lower profile fraud cases will be classified incorrectly as genuine, causing the Recall and F1 to drop. Therefore it makes no sense to choose a threshold greater than 12. IV. EXPERIMENTAL EVALUATION The performance of the proposed DNA approach is compared to the standard techniques, mentioned in the related work. This comparison is based on real credit card fraud data, which was thankworthy provided by a successful gaming company on the online games market. A. Data Set The given data set (referred as full data set) comprises of 156,883 credit card transactions from 63,933 unique users. The records in the data set have the schema as it can be seen in Table II. Due to the high number of occurrences in several columns as well as the lack of distinctive attributes, most of the standard techniques were not applicable on that data set. To overcome these obstacles and to get a fair comparison, several adaptations to the data set have been done. The resulting prepared data has a minimum sequence length of three (smaller sequences have been discarded) and four derived attributes (see Table III) were added. card_bin user_country user_id user_ transaction_amount order_payment_status TABLE III. column Name bin_country Bank Identification Number user s land of origin hashed for privacy compliance ADDITIONS PREPARED DATA SET description 2 letter country code derived from card_bin days_since_signup integer attribute calculated as difference from signup_time to created total_count package denotes total transaction figure for a particular user_ a single letter attribute ranging from A to E. It was derived from the offer_price attribute to reduce the cardinality of the offer_price attribute The prepared data set comprised of 13,298 unique users which are accompanied by 46,516 transactions. The last transaction of each user was cut out in order to form the test data set. This procedure segmented the prepared data set into 71.4 % train data and % test data. B. Fraud Detection Performance All tests in this section were performed using the prepared data set. We used the F1 score in order to rank the compared methods. As shown in Figure 4, the DNA approach is able to perform % better than the best standard method, which is the Bayesian Net. Figure 4. Fraud detection performance comparison The DNA approach is also able to achieve an almost perfect % Precision, which is especially valuable for online gaming merchants, since it reduces the risk of punish 4

5 genuine users and consecutively reduces the risk of reputation loss. C. Scalability The previous subsection showed the impressive fraud detection accuracy of the DNA approach. The next step was to take a deeper look into the run time behavior of the compared methods. Therefore, we prepared scaled data sets by multiplying the original prepared data (Subsection IV A) by 4, 8 and 16 times. The corresponding total execution times in seconds of the used methods can be seen in Figure 5. record the execution time per sequence. This enables us to measure the average execution time for each sequence length. Constant Sequence Length: In the first round of experiments, we measured the execution time for the fixed sequence length = 1. This allows measuring the influence of the data set size on the execution time. Figure 7. DNA results for fixed length sequence Figure 5. Execution time of all implemented methods (the used DTs produced out of memory exceptions for data set 8x and 16x) The DNA approach uses simple operations and data structures like HashMaps or simple additions for calculating the risk level per sequence. This lowers the computational intensiveness and improves scalability significantly. The Bayesian Belief Networks obtain consistently the best results for static analysis (all transactions are classified in one batch). However, the BBN is an offline algorithm; therefore it cannot be applied in a real time online payment system on a transaction basis. To use offline algorithms in real applications, further timeslots need to be considered which were neglected in this work (data collection, transfer and preparation time). Experiments In Figure 7, the grey line indicates the linear scaled value, based on the results of the normal data set. This means that a linear scaling algorithm, which needs 4.23 [ms] for a sequence of length 1 on the normal table, would need 16 times as long for the same calculation on a 16 times bigger table. As it can be seen the execution time for the DNA approach is less than that. Higher Sequence Length: Throughout the second round of experiments the average execution times for all sequence lengths were averaged. Since the above mentioned data sets were created by multiplication of the full data set, not only the data set size but also the sequence lengths were increased. Figure 8 shows the experimental results. Set1: prepared data set Set2: full data set A: length = 1 B: var length Figure 6. Overview experimental evaluation The second set of experiments measures the execution time of the DNA approach using the full data set and varying the sequence length. Multiplications of the full data set (Section IV, A) were used for these experiments. In order to perform the measurements, the algorithm was adapted to Figure 8. DNA results over higher sequence length It is visible that the performance stays close to the expected linear response time behavior for the first three data 5

### Application of Hidden Markov Model in Credit Card Fraud Detection

Application of Hidden Markov Model in Credit Card Fraud Detection V. Bhusari 1, S. Patil 1 1 Department of Computer Technology, College of Engineering, Bharati Vidyapeeth, Pune, India, 400011 Email: vrunda1234@gmail.com

### Fraud Detection in Online Banking Using HMM

2012 International Conference on Information and Network Technology (ICINT 2012) IPCSIT vol. 37 (2012) (2012) IACSIT Press, Singapore Fraud Detection in Online Banking Using HMM Sunil Mhamane + and L.M.R.J

### Chapter 6. The stacking ensemble approach

82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described

### A Secured Approach to Credit Card Fraud Detection Using Hidden Markov Model

A Secured Approach to Credit Card Fraud Detection Using Hidden Markov Model Twinkle Patel, Ms. Ompriya Kale Abstract: - As the usage of credit card has increased the credit card fraud has also increased

### Detecting Credit Card Fraud

Case Study Detecting Credit Card Fraud Analysis of Behaviometrics in an online Payment environment Introduction BehavioSec have been conducting tests on Behaviometrics stemming from card payments within

### Credit Card Fraud Detection Using Self Organised Map

International Journal of Information & Computation Technology. ISSN 0974-2239 Volume 4, Number 13 (2014), pp. 1343-1348 International Research Publications House http://www. irphouse.com Credit Card Fraud

### Invited Applications Paper

Invited Applications Paper - - Thore Graepel Joaquin Quiñonero Candela Thomas Borchert Ralf Herbrich Microsoft Research Ltd., 7 J J Thomson Avenue, Cambridge CB3 0FB, UK THOREG@MICROSOFT.COM JOAQUINC@MICROSOFT.COM

### Credit Card Fraud Detection Using Hidden Markov Model

International Journal of Soft Computing and Engineering (IJSCE) Credit Card Fraud Detection Using Hidden Markov Model SHAILESH S. DHOK Abstract The most accepted payment mode is credit card for both online

### Card Fraud. Howard Mizes December 3, 2013

Card Fraud Howard Mizes December 3, 2013 2013 Xerox Corporation. All rights reserved. Xerox and Xerox Design are trademarks of Xerox Corporation in the United States and/or other countries. Outline of

### Techniques for Fraud Detection

Analysis on Credit Card Fraud Detection Methods 1 Renu HCE Sonepat 2 Suman HCE Sonepat Abstract Due to the theatrical increase of fraud which results in loss of dollars worldwide each year, several modern

### Feature Construction for Time Ordered Data Sequences

Feature Construction for Time Ordered Data Sequences Michael Schaidnagel School of Computing University of the West of Scotland Email: B00260359@studentmail.uws.ac.uk Fritz Laux Faculty of Computer Science

### Credit Card Fraud Detection using Hidden Morkov Model and Neural Networks

Credit Card Fraud Detection using Hidden Morkov Model and Neural Networks R.RAJAMANI Assistant Professor, Department of Computer Science, PSG College of Arts & Science, Coimbatore. Email: rajamani_devadoss@yahoo.co.in

### not possible or was possible at a high cost for collecting the data.

Data Mining and Knowledge Discovery Generating knowledge from data Knowledge Discovery Data Mining White Paper Organizations collect a vast amount of data in the process of carrying out their day-to-day

### A Fast Fraud Detection Approach using Clustering Based Method

pp. 33-37 Krishi Sanskriti Publications http://www.krishisanskriti.org/jbaer.html A Fast Detection Approach using Clustering Based Method Surbhi Agarwal 1, Santosh Upadhyay 2 1 M.tech Student, Mewar University,

### A Content based Spam Filtering Using Optical Back Propagation Technique

A Content based Spam Filtering Using Optical Back Propagation Technique Sarab M. Hameed 1, Noor Alhuda J. Mohammed 2 Department of Computer Science, College of Science, University of Baghdad - Iraq ABSTRACT

### Figure 1. The cloud scales: Amazon EC2 growth [2].

- Chung-Cheng Li and Kuochen Wang Department of Computer Science National Chiao Tung University Hsinchu, Taiwan 300 shinji10343@hotmail.com, kwang@cs.nctu.edu.tw Abstract One of the most important issues

### Probabilistic Credit Card Fraud Detection System in Online Transactions

Probabilistic Credit Card Fraud Detection System in Online Transactions S. O. Falaki 1, B. K. Alese 1, O. S. Adewale 1, J. O. Ayeni 2, G. A. Aderounmu 3 and W. O. Ismaila 4 * 1 Federal University of Technology,

### Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier

International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-1, Issue-6, January 2013 Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing

### PROBLEM REDUCTION IN ONLINE PAYMENT SYSTEM USING HYBRID MODEL

PROBLEM REDUCTION IN ONLINE PAYMENT SYSTEM USING HYBRID MODEL Sandeep Pratap Singh 1, Shiv Shankar P. Shukla 1, Nitin Rakesh 1 and Vipin Tyagi 2 1 Department of Computer Science and Engineering, Jaypee

### FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS

FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS Breno C. Costa, Bruno. L. A. Alberto, André M. Portela, W. Maduro, Esdras O. Eler PDITec, Belo Horizonte,

### FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM

International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT

### Error Log Processing for Accurate Failure Prediction. Humboldt-Universität zu Berlin

Error Log Processing for Accurate Failure Prediction Felix Salfner ICSI Berkeley Steffen Tschirpke Humboldt-Universität zu Berlin Introduction Context of work: Error-based online failure prediction: error

### To improve the problems mentioned above, Chen et al. [2-5] proposed and employed a novel type of approach, i.e., PA, to prevent fraud.

Proceedings of the 5th WSEAS Int. Conference on Information Security and Privacy, Venice, Italy, November 20-22, 2006 46 Back Propagation Networks for Credit Card Fraud Prediction Using Stratified Personalized

### Intrusion Detection via Machine Learning for SCADA System Protection

Intrusion Detection via Machine Learning for SCADA System Protection S.L.P. Yasakethu Department of Computing, University of Surrey, Guildford, GU2 7XH, UK. s.l.yasakethu@surrey.ac.uk J. Jiang Department

### Social Media Mining. Data Mining Essentials

Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

### BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123 CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents

### Azure Machine Learning, SQL Data Mining and R

Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:

### Mobile Phone APP Software Browsing Behavior using Clustering Analysis

Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis

### Biometric Authentication using Online Signatures

Biometric Authentication using Online Signatures Alisher Kholmatov and Berrin Yanikoglu alisher@su.sabanciuniv.edu, berrin@sabanciuniv.edu http://fens.sabanciuniv.edu Sabanci University, Tuzla, Istanbul,

### Internet Banking Fraud Detection using HMM and BLAST-SSAHA Hybridization

Internet Banking Fraud Detection using HMM and BLAST-SSAHA Hybridization Avanti H. Vaidya 1, S. W. Mohod 2 1, 2 Computer Science and Engineering Department, R.T.M.N.U University, B.D. College of Engineering

### Why is Internal Audit so Hard?

Why is Internal Audit so Hard? 2 2014 Why is Internal Audit so Hard? 3 2014 Why is Internal Audit so Hard? Waste Abuse Fraud 4 2014 Waves of Change 1 st Wave Personal Computers Electronic Spreadsheets

### DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.

DATA MINING TECHNOLOGY Georgiana Marin 1 Abstract In terms of data processing, classical statistical models are restrictive; it requires hypotheses, the knowledge and experience of specialists, equations,

### Data Mining Part 5. Prediction

Data Mining Part 5. Prediction 5.1 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Classification vs. Numeric Prediction Prediction Process Data Preparation Comparing Prediction Methods References Classification

### Document Image Retrieval using Signatures as Queries

Document Image Retrieval using Signatures as Queries Sargur N. Srihari, Shravya Shetty, Siyuan Chen, Harish Srinivasan, Chen Huang CEDAR, University at Buffalo(SUNY) Amherst, New York 14228 Gady Agam and

### Report on the Dagstuhl Seminar Data Quality on the Web

Report on the Dagstuhl Seminar Data Quality on the Web Michael Gertz M. Tamer Özsu Gunter Saake Kai-Uwe Sattler U of California at Davis, U.S.A. U of Waterloo, Canada U of Magdeburg, Germany TU Ilmenau,

### Email Spam Detection Using Customized SimHash Function

International Journal of Research Studies in Computer Science and Engineering (IJRSCSE) Volume 1, Issue 8, December 2014, PP 35-40 ISSN 2349-4840 (Print) & ISSN 2349-4859 (Online) www.arcjournals.org Email

### Data Mining Solutions for the Business Environment

Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania ruxandra_stefania.petre@yahoo.com Over

### Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100

Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100 Erkan Er Abstract In this paper, a model for predicting students performance levels is proposed which employs three

### Flexible mobility management strategy in cellular networks

Flexible mobility management strategy in cellular networks JAN GAJDORUS Department of informatics and telecommunications (161114) Czech technical university in Prague, Faculty of transportation sciences

### Optimization of Search Results with Duplicate Page Elimination using Usage Data A. K. Sharma 1, Neelam Duhan 2 1, 2

Optimization of Search Results with Duplicate Page Elimination using Usage Data A. K. Sharma 1, Neelam Duhan 2 1, 2 Department of Computer Engineering, YMCA University of Science & Technology, Faridabad,

### ANALYTIC HIERARCHY PROCESS (AHP) TUTORIAL

Kardi Teknomo ANALYTIC HIERARCHY PROCESS (AHP) TUTORIAL Revoledu.com Table of Contents Analytic Hierarchy Process (AHP) Tutorial... 1 Multi Criteria Decision Making... 1 Cross Tabulation... 2 Evaluation

### Key Components of WAN Optimization Controller Functionality

Key Components of WAN Optimization Controller Functionality Introduction and Goals One of the key challenges facing IT organizations relative to application and service delivery is ensuring that the applications

### Categorical Data Visualization and Clustering Using Subjective Factors

Categorical Data Visualization and Clustering Using Subjective Factors Chia-Hui Chang and Zhi-Kai Ding Department of Computer Science and Information Engineering, National Central University, Chung-Li,

### A Game Theoretical Framework for Adversarial Learning

A Game Theoretical Framework for Adversarial Learning Murat Kantarcioglu University of Texas at Dallas Richardson, TX 75083, USA muratk@utdallas Chris Clifton Purdue University West Lafayette, IN 47907,

### International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer

### Electronic Payment Fraud Detection Techniques

World of Computer Science and Information Technology Journal (WCSIT) ISSN: 2221-0741 Vol. 2, No. 4, 137-141, 2012 Electronic Payment Fraud Detection Techniques Adnan M. Al-Khatib CIS Dept. Faculty of Information

### Automatic Bank Fraud Detection Using Support Vector Machines

Automatic Bank Fraud Detection Using Support Vector Machines Djeffal Abdelhamid 1, Soltani Khaoula 1, Ouassaf Atika 2 1 Computer science department, LESIA Laboratory, Biskra University, Algeria 2 Economic

### Utility-Based Fraud Detection

Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence Utility-Based Fraud Detection Luis Torgo and Elsa Lopes Fac. of Sciences / LIAAD-INESC Porto LA University of

### Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be

### Data Mining - Evaluation of Classifiers

Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010

### Classification On The Clouds Using MapReduce

Classification On The Clouds Using MapReduce Simão Martins Instituto Superior Técnico Lisbon, Portugal simao.martins@tecnico.ulisboa.pt Cláudia Antunes Instituto Superior Técnico Lisbon, Portugal claudia.antunes@tecnico.ulisboa.pt

### Extension of Decision Tree Algorithm for Stream Data Mining Using Real Data

Fifth International Workshop on Computational Intelligence & Applications IEEE SMC Hiroshima Chapter, Hiroshima University, Japan, November 10, 11 & 12, 2009 Extension of Decision Tree Algorithm for Stream

### Support Vector Machines for Dynamic Biometric Handwriting Classification

Support Vector Machines for Dynamic Biometric Handwriting Classification Tobias Scheidat, Marcus Leich, Mark Alexander, and Claus Vielhauer Abstract Biometric user authentication is a recent topic in the

### Survey on Credit Card Fraud Detection Techniques

www.ijecs.in International Journal Of Engineering And Computer Science ISSN: 2319-7242 Volume 4 Issue 11 Nov 2015, Page No. 15010-15015 Survey on Credit Card Fraud Detection Techniques Priya Ravindra Shimpi,

### Random forest algorithm in big data environment

Random forest algorithm in big data environment Yingchun Liu * School of Economics and Management, Beihang University, Beijing 100191, China Received 1 September 2014, www.cmnt.lv Abstract Random forest

### RANKING WEB PAGES RELEVANT TO SEARCH KEYWORDS

ISBN: 978-972-8924-93-5 2009 IADIS RANKING WEB PAGES RELEVANT TO SEARCH KEYWORDS Ben Choi & Sumit Tyagi Computer Science, Louisiana Tech University, USA ABSTRACT In this paper we propose new methods for

### Enhanced data mining analysis in higher educational system using rough set theory

African Journal of Mathematics and Computer Science Research Vol. 2(9), pp. 184-188, October, 2009 Available online at http://www.academicjournals.org/ajmcsr ISSN 2006-9731 2009 Academic Journals Review

### Data quality in Accounting Information Systems

Data quality in Accounting Information Systems Comparing Several Data Mining Techniques Erjon Zoto Department of Statistics and Applied Informatics Faculty of Economy, University of Tirana Tirana, Albania

### Prediction of Stock Performance Using Analytical Techniques

136 JOURNAL OF EMERGING TECHNOLOGIES IN WEB INTELLIGENCE, VOL. 5, NO. 2, MAY 2013 Prediction of Stock Performance Using Analytical Techniques Carol Hargreaves Institute of Systems Science National University

### IBM SPSS Direct Marketing 23

IBM SPSS Direct Marketing 23 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 23, release

### Principles of Data Mining by Hand&Mannila&Smyth

Principles of Data Mining by Hand&Mannila&Smyth Slides for Textbook Ari Visa,, Institute of Signal Processing Tampere University of Technology October 4, 2010 Data Mining: Concepts and Techniques 1 Differences

### IBM SPSS Neural Networks 22

IBM SPSS Neural Networks 22 Note Before using this information and the product it supports, read the information in Notices on page 21. Product Information This edition applies to version 22, release 0,

### A Survey on Outlier Detection Techniques for Credit Card Fraud Detection

IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 16, Issue 2, Ver. VI (Mar-Apr. 2014), PP 44-48 A Survey on Outlier Detection Techniques for Credit Card Fraud

### Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification

Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification Tina R. Patil, Mrs. S. S. Sherekar Sant Gadgebaba Amravati University, Amravati tnpatil2@gmail.com, ss_sherekar@rediffmail.com

### Machine Learning using MapReduce

Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous

### Neural Networks and Back Propagation Algorithm

Neural Networks and Back Propagation Algorithm Mirza Cilimkovic Institute of Technology Blanchardstown Blanchardstown Road North Dublin 15 Ireland mirzac@gmail.com Abstract Neural Networks (NN) are important

### Evaluating Online Payment Transaction Reliability using Rules Set Technique and Graph Model

Evaluating Online Payment Transaction Reliability using Rules Set Technique and Graph Model Trung Le 1, Ba Quy Tran 2, Hanh Dang Thi My 3, Thanh Hung Ngo 4 1 GSR, Information System Lab., University of

### IBM SPSS Direct Marketing 22

IBM SPSS Direct Marketing 22 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 22, release

### Making Sense of the Mayhem: Machine Learning and March Madness

Making Sense of the Mayhem: Machine Learning and March Madness Alex Tran and Adam Ginzberg Stanford University atran3@stanford.edu ginzberg@stanford.edu I. Introduction III. Model The goal of our research

### Data Mining Approach For Subscription-Fraud. Detection in Telecommunication Sector

Contemporary Engineering Sciences, Vol. 7, 2014, no. 11, 515-522 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ces.2014.4431 Data Mining Approach For Subscription-Fraud Detection in Telecommunication

### 6.2.8 Neural networks for data mining

6.2.8 Neural networks for data mining Walter Kosters 1 In many application areas neural networks are known to be valuable tools. This also holds for data mining. In this chapter we discuss the use of neural

### Can we Analyze all the Big Data we Collect?

DBKDA/WEB Panel 2015, Rome 28.05.2015 DBKDA Panel 2015, Rome, 27.05.2015 Reutlingen University Can we Analyze all the Big Data we Collect? Moderation: Fritz Laux, Reutlingen University, Germany Panelists:

### Learning Classifiers for Misuse Detection Using a Bag of System Calls Representation

Learning Classifiers for Misuse Detection Using a Bag of System Calls Representation Dae-Ki Kang 1, Doug Fuller 2, and Vasant Honavar 1 1 Artificial Intelligence Lab, Department of Computer Science, Iowa

### Enhancing Quality of Data using Data Mining Method

JOURNAL OF COMPUTING, VOLUME 2, ISSUE 9, SEPTEMBER 2, ISSN 25-967 WWW.JOURNALOFCOMPUTING.ORG 9 Enhancing Quality of Data using Data Mining Method Fatemeh Ghorbanpour A., Mir M. Pedram, Kambiz Badie, Mohammad

### Towards better accuracy for Spam predictions

Towards better accuracy for Spam predictions Chengyan Zhao Department of Computer Science University of Toronto Toronto, Ontario, Canada M5S 2E4 czhao@cs.toronto.edu Abstract Spam identification is crucial

### Introduction to Data Mining

Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association

### Cross-Validation. Synonyms Rotation estimation

Comp. by: BVijayalakshmiGalleys0000875816 Date:6/11/08 Time:19:52:53 Stage:First Proof C PAYAM REFAEILZADEH, LEI TANG, HUAN LIU Arizona State University Synonyms Rotation estimation Definition is a statistical

### Real Time Fraud Detection With Sequence Mining on Big Data Platform. Pranab Ghosh Big Data Consultant IEEE CNSV meeting, May 6 2014 Santa Clara, CA

Real Time Fraud Detection With Sequence Mining on Big Data Platform Pranab Ghosh Big Data Consultant IEEE CNSV meeting, May 6 2014 Santa Clara, CA Open Source Big Data Eco System Query (NOSQL) : Cassandra,

### A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE

A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE Kasra Madadipouya 1 1 Department of Computing and Science, Asia Pacific University of Technology & Innovation ABSTRACT Today, enormous amount of data

### Discovering process models from empirical data

Discovering process models from empirical data Laura Măruşter (l.maruster@tm.tue.nl), Ton Weijters (a.j.m.m.weijters@tm.tue.nl) and Wil van der Aalst (w.m.p.aalst@tm.tue.nl) Eindhoven University of Technology,

### An Order-Invariant Time Series Distance Measure [Position on Recent Developments in Time Series Analysis]

An Order-Invariant Time Series Distance Measure [Position on Recent Developments in Time Series Analysis] Stephan Spiegel and Sahin Albayrak DAI-Lab, Technische Universität Berlin, Ernst-Reuter-Platz 7,

### TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM Thanh-Nghi Do College of Information Technology, Cantho University 1 Ly Tu Trong Street, Ninh Kieu District Cantho City, Vietnam

### Email Spam Detection A Machine Learning Approach

Email Spam Detection A Machine Learning Approach Ge Song, Lauren Steimle ABSTRACT Machine learning is a branch of artificial intelligence concerned with the creation and study of systems that can learn

### Using News Articles to Predict Stock Price Movements

Using News Articles to Predict Stock Price Movements Győző Gidófalvi Department of Computer Science and Engineering University of California, San Diego La Jolla, CA 9237 gyozo@cs.ucsd.edu 21, June 15,

### Local outlier detection in data forensics: data mining approach to flag unusual schools

Local outlier detection in data forensics: data mining approach to flag unusual schools Mayuko Simon Data Recognition Corporation Paper presented at the 2012 Conference on Statistical Detection of Potential

### Chapter 20: Data Analysis

Chapter 20: Data Analysis Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 20: Data Analysis Decision Support Systems Data Warehousing Data Mining Classification

### Learning is a very general term denoting the way in which agents:

What is learning? Learning is a very general term denoting the way in which agents: Acquire and organize knowledge (by building, modifying and organizing internal representations of some external reality);

### Inner Classification of Clusters for Online News

Inner Classification of Clusters for Online News Harmandeep Kaur 1, Sheenam Malhotra 2 1 (Computer Science and Engineering Department, Shri Guru Granth Sahib World University Fatehgarh Sahib) 2 (Assistant

### Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics

Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data. Descriptive statistics are distinguished from inferential statistics (or inductive statistics),

### Static Data Mining Algorithm with Progressive Approach for Mining Knowledge

Global Journal of Business Management and Information Technology. Volume 1, Number 2 (2011), pp. 85-93 Research India Publications http://www.ripublication.com Static Data Mining Algorithm with Progressive

### The Credit Card Fraud Detection Analysis With Neural Network Methods

The Credit Card Fraud Detection Analysis With Neural Network Methods 1 M.Jeevana Sujitha, 2 K. Rajini Kumari, 3 N.Anuragamayi 1,2,3 Dept. of CSE, A.S.R College of Engineering & Tech., Tetali, Tanuku, AP,

### Tracking Groups of Pedestrians in Video Sequences

Tracking Groups of Pedestrians in Video Sequences Jorge S. Marques Pedro M. Jorge Arnaldo J. Abrantes J. M. Lemos IST / ISR ISEL / IST ISEL INESC-ID / IST Lisbon, Portugal Lisbon, Portugal Lisbon, Portugal

### Performance Metrics for Graph Mining Tasks

Performance Metrics for Graph Mining Tasks 1 Outline Introduction to Performance Metrics Supervised Learning Performance Metrics Unsupervised Learning Performance Metrics Optimizing Metrics Statistical

### Standardization and Its Effects on K-Means Clustering Algorithm

Research Journal of Applied Sciences, Engineering and Technology 6(7): 399-3303, 03 ISSN: 040-7459; e-issn: 040-7467 Maxwell Scientific Organization, 03 Submitted: January 3, 03 Accepted: February 5, 03

### CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES

CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES Claus Gwiggner, Ecole Polytechnique, LIX, Palaiseau, France Gert Lanckriet, University of Berkeley, EECS,

### A Novel Approach for Credit Card Fraud Detection Targeting the Indian Market

www.ijcsi.org 172 A Novel Approach for Credit Card Fraud Detection Targeting the Indian Market Jaba Suman Mishra 1, Soumyashree Panda 2, Ashis Kumar Mishra 3 1 Department Of Computer Science & Engineering,

### International Journal of Electronics and Computer Science Engineering 1449

International Journal of Electronics and Computer Science Engineering 1449 Available Online at www.ijecse.org ISSN- 2277-1956 Neural Networks in Data Mining Priyanka Gaur Department of Information and

### Healthcare Measurement Analysis Using Data mining Techniques

www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 03 Issue 07 July, 2014 Page No. 7058-7064 Healthcare Measurement Analysis Using Data mining Techniques 1 Dr.A.Shaik