Credit Card Fraud Detection and Concept-Drift Adaptation with Delayed Supervised Information
1 Credit Card Fraud Detection and Concept-Drift Adaptation with Delayed Supervised Information
Andrea Dal Pozzolo, Giacomo Boracchi, Olivier Caelen, Cesare Alippi, and Gianluca Bontempi
15/07/2015, IEEE IJCNN 2015, Killarney, Ireland
2 INTRODUCTION
Fraud detection is a notably challenging problem because of:
- concept drift (i.e. customers' habits evolve)
- class unbalance (i.e. genuine transactions far outnumber frauds)
- uncertain class labels (i.e. some frauds are not reported or are reported with a large delay, and only a few transactions can be investigated in time)
3 INTRODUCTION II
Fraud-detection systems (FDSs) differ from a standard classification task:
- only a small set of supervised samples is provided by human investigators (they check only a few alerts).
- the labels of the majority of transactions become available only several days later (after customers have reported unauthorized transactions).
4 PROBLEM FORMULATION
We formalise FD as a classification problem:
- At day $t$, classifier $\mathcal{K}_{t-1}$ (trained on $t-1$) associates to each feature vector $x \in \mathbb{R}^n$ a score $P_{\mathcal{K}_{t-1}}(+ \mid x)$.
- The $k$ transactions with the largest $P_{\mathcal{K}_{t-1}}(+ \mid x)$ define the alerts $A_t$ reported to the investigators.
- Investigators provide feedbacks $F_t$ about the alerts in $A_t$, defining a set of $k$ supervised couples $(x, y)$:
$$F_t = \{(x, y),\ x \in A_t\} \qquad (1)$$
- $F_t$ are the only immediate supervised samples.
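The alert-feedback loop above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `select_alerts` and `collect_feedbacks`, the `investigate` callback standing in for the human investigators, and the example scores are all hypothetical and do not come from the slides.

```python
from typing import Callable, List, Sequence, Tuple

Features = Sequence[float]


def select_alerts(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k transactions with the largest fraud score.

    scores[i] plays the role of P_{K_{t-1}}(+ | x_i); the result plays the role of A_t.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def collect_feedbacks(
    transactions: Sequence[Features],
    alerts: Sequence[int],
    investigate: Callable[[Features], str],
) -> List[Tuple[Features, str]]:
    """Build F_t = {(x, y), x in A_t} by asking the (hypothetical) investigator for y."""
    return [(transactions[i], investigate(transactions[i])) for i in alerts]


# Made-up example: four transactions with one feature each, budget k = 2.
transactions = [[10.0], [950.0], [40.0], [700.0]]
scores = [0.1, 0.9, 0.4, 0.8]
alerts = select_alerts(scores, k=2)
feedbacks = collect_feedbacks(
    transactions, alerts, investigate=lambda x: "+" if x[0] > 900 else "-"
)
```

Only the transactions in `alerts` receive a label on day $t$; all the others stay unlabelled until the delay has elapsed.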
5 PROBLEM FORMULATION II
At day $t$, delayed supervised couples $D_{t-\delta}$ are transactions that have not been checked by investigators, but whose labels are assumed to be correct once $\delta$ days have elapsed.
[Figure: timeline from $t-\delta$ to $t$ showing the supervised samples, split into delayed samples $D_{t-\delta}$ and feedbacks $F_t$. Legend: all fraudulent transactions of a day; all genuine transactions of a day; fraudulent transactions in the feedback; genuine transactions in the feedback.]
Figure: The supervised samples available at day $t$ include: i) feedbacks of the first $\delta$ days and ii) delayed couples that occurred before the $\delta$-th day.
6
- $F_t$ are a small set of risky transactions according to the FDS.
- $D_{t-\delta}$ contains all the transactions that occurred in a day (99% genuine transactions).
[Figure: timelines for Day 1, Day 2 and Day 3. Each row shows the delayed sets ($D_{t-7}$ on Day 1; $D_{t-8}, D_{t-7}$ on Day 2; $D_{t-9}, D_{t-8}, D_{t-7}$ on Day 3) followed by the feedbacks $F_{t-6}, \ldots, F_t$. Legend: fraudulent transactions in $S_t$; genuine transactions in $S_t$; fraudulent feedback in $F_t$; genuine feedback in $F_t$.]
Figure: Every day we have a new set of feedbacks $(F_t, F_{t-1}, \ldots, F_{t-(\delta-1)})$ from the first $\delta$ days and a new set of delayed transactions that occurred on the $\delta$-th day ($D_{t-\delta}$). In this figure we assume $\delta = 7$.
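The split between recent feedbacks and delayed samples can be sketched as follows. It is a minimal illustration, assuming that samples are stored in dictionaries keyed by integer day index; `supervised_days` and `training_sets` are hypothetical names, not part of the original work.

```python
from typing import Dict, List, Tuple

Sample = Tuple[List[float], str]  # (feature vector x, label y)


def supervised_days(t: int, delta: int) -> Tuple[List[int], int]:
    """Return the days whose feedbacks are used (t-(delta-1) .. t) and the most
    recent day for which delayed labels are available (t-delta)."""
    feedback_days = list(range(t - delta + 1, t + 1))
    latest_delayed_day = t - delta
    return feedback_days, latest_delayed_day


def training_sets(
    feedbacks: Dict[int, List[Sample]],
    delayed: Dict[int, List[Sample]],
    t: int,
    delta: int,
) -> Tuple[List[Sample], List[Sample]]:
    """Collect the two supervised sets available at day t, kept separate."""
    feedback_days, latest_delayed_day = supervised_days(t, delta)
    recent = [s for d in feedback_days for s in feedbacks.get(d, [])]
    old = [s for d in sorted(delayed) if d <= latest_delayed_day for s in delayed[d]]
    return recent, old


# With delta = 7 and t = 10, feedbacks come from days 4..10 and delayed
# samples from day 3 or earlier.
days, last_delayed = supervised_days(t=10, delta=7)
```

With a budget of 100 alerts per day and $\delta = 7$, the feedback set built this way holds $100 \cdot 7 = 700$ samples.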
7 ACCURACY MEASURE FOR A FDS
The goal of a FDS is to return accurate alerts, i.e. to achieve the highest precision in $A_t$. This precision can be measured by the quantity
$$p_k(t) = \frac{\#\{(x, y) \in F_t \text{ s.t. } y = +\}}{k} \qquad (2)$$
where $p_k(t)$ is the proportion of frauds in the top $k$ transactions with the highest likelihood of fraud [1].
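As a small worked instance of Eq. (2), the measure reduces to counting the positive labels among the $k$ feedbacks. The function name `precision_at_k` and the label encoding (`"+"` for fraud, `"-"` for genuine) are assumptions made for this sketch.

```python
from typing import Sequence


def precision_at_k(feedback_labels: Sequence[str], k: int) -> float:
    """p_k(t): fraction of the k alerted transactions that investigators marked as fraud."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for y in feedback_labels if y == "+") / k


# Made-up day: 4 alerts were checked and 1 of them was a fraud.
p = precision_at_k(["+", "-", "-", "-"], k=4)
```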
8 LEARNING STRATEGY
Learning from feedbacks $F_t$ is a different problem from learning from delayed samples in $D_{t-\delta}$:
- $F_t$ provides recent, up-to-date information, while $D_{t-\delta}$ might already be obsolete by the time it arrives.
- The percentage of frauds in $F_t$ differs from that in $D_{t-\delta}$.
- Supervised couples in $F_t$ are not independently drawn, but are instead selected by $\mathcal{K}_{t-1}$.
- A classifier trained on $F_t$ learns how to label transactions that are most likely to be fraudulent.
Feedbacks and delayed transactions have to be treated separately.
9 CONCEPT DRIFT ADAPTATION
Two conventional solutions for CD adaptation are $\mathcal{W}_t$ and $\mathcal{E}_t$ [6, 5]. To learn separately from feedbacks and delayed transactions we propose $\mathcal{F}_t$, $\mathcal{W}^D_t$ and $\mathcal{E}^D_t$.
[Figure: two timelines over $D_{t-8}, D_{t-7}, F_{t-6}, \ldots, F_t$. Sliding window: $\mathcal{W}^D_t$, $\mathcal{F}_t$, $\mathcal{W}_t$. Ensemble: $\mathcal{M}_1$, $\mathcal{M}_2$, $\mathcal{F}_t$, $\mathcal{E}^D_t$, $\mathcal{E}_t$. Legend: all fraudulent transactions of a day; all genuine transactions of a day; fraudulent transactions in the feedback; genuine transactions in the feedback.]
Figure: Supervised information used by different classifiers in the ensemble and sliding window approach.
10 CLASSIFIER AGGREGATIONS
$\mathcal{W}^D_t$ and $\mathcal{E}^D_t$ have to be aggregated with $\mathcal{F}_t$ to exploit the information provided by feedbacks. We combine these classifiers by averaging the posterior probabilities.
- Sliding window:
$$P_{\mathcal{A}^W_t}(+ \mid x) = \frac{P_{\mathcal{F}_t}(+ \mid x) + P_{\mathcal{W}^D_t}(+ \mid x)}{2}$$
- Ensemble:
$$P_{\mathcal{A}^E_t}(+ \mid x) = \frac{P_{\mathcal{F}_t}(+ \mid x) + P_{\mathcal{E}^D_t}(+ \mid x)}{2}$$
$\mathcal{A}^E_t$ and $\mathcal{A}^W_t$ give larger influence to feedbacks on the probability estimates w.r.t. $\mathcal{E}_t$ and $\mathcal{W}_t$.
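The averaging rule can be written as a one-line function. The sketch below assumes each classifier exposes its posterior as a plain float per transaction; `aggregate_posterior` and `aggregate_scores` are hypothetical names and the numbers are made up.

```python
from typing import List, Sequence


def aggregate_posterior(p_feedback: float, p_delayed: float) -> float:
    """Average of the posterior from the feedback classifier and the posterior
    from the classifier trained on delayed samples."""
    return (p_feedback + p_delayed) / 2


def aggregate_scores(
    feedback_scores: Sequence[float], delayed_scores: Sequence[float]
) -> List[float]:
    """Apply the average to every transaction of the day."""
    if len(feedback_scores) != len(delayed_scores):
        raise ValueError("both classifiers must score the same transactions")
    return [aggregate_posterior(f, d) for f, d in zip(feedback_scores, delayed_scores)]


# Made-up posteriors for three transactions.
combined = aggregate_scores([0.75, 0.5, 0.0], [0.25, 0.5, 1.0])
```

In the average each component carries weight 1/2 regardless of how many samples it was trained on, so the feedback classifier is not swamped by the much larger set of delayed samples.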
11 TWO RANDOM FORESTS
We used two different Random Forest (RF) classifiers depending on the fraud prevalence in the training set:
- for classifiers on delayed samples we used a Balanced RF [3] (undersampling before training each tree).
- for $\mathcal{F}_t$ we adopted a standard RF [2] (no undersampling).
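The "undersampling before training each tree" step can be illustrated without any machine-learning library. The sketch below is one simple variant and an assumption on my part: it keeps every fraud and draws an equal number of genuine transactions without replacement for each tree. The exact sampling scheme of Balanced RF in [3] may differ, and `balanced_sample` and `per_tree_training_sets` are hypothetical names.

```python
import random
from typing import List, Tuple

Sample = Tuple[List[float], str]  # (feature vector x, label y)


def balanced_sample(samples: List[Sample], rng: random.Random) -> List[Sample]:
    """Return a class-balanced subset: n frauds and n genuine transactions,
    where n is the size of the smaller class."""
    frauds = [s for s in samples if s[1] == "+"]
    genuine = [s for s in samples if s[1] == "-"]
    n = min(len(frauds), len(genuine))
    return rng.sample(frauds, n) + rng.sample(genuine, n)


def per_tree_training_sets(
    samples: List[Sample], n_trees: int, seed: int = 0
) -> List[List[Sample]]:
    """Draw an independent balanced subset for each tree of the forest."""
    rng = random.Random(seed)
    return [balanced_sample(samples, rng) for _ in range(n_trees)]


# Made-up delayed set: 5 frauds among 100 transactions.
data = [([float(i)], "+" if i < 5 else "-") for i in range(100)]
subsets = per_tree_training_sets(data, n_trees=3)
```

Each tree would then be fitted on its own subset, so every tree sees a balanced class distribution while the forest as a whole still covers many different genuine transactions.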
12 DATASETS
We considered two datasets of credit card transactions:
Table: Datasets. Columns: Id, Start day, End day, # Days, # Instances, # Features, % Fraud. Rows: 2013, 2014.
In the 2013 dataset there is an average of 160k transactions per day and about 304 frauds per day, while in the 2014 dataset there is a daily average of 173k transactions and 380 frauds.
13 EXPERIMENTS
Settings:
- We assume that after $\delta = 7$ days all the transaction labels are provided (delayed supervised information).
- A budget of $k = 100$ alerts can be checked by the investigators ($\mathcal{F}_t$ is trained on a window of 700 feedbacks, i.e. $k \cdot \delta = 100 \cdot 7$).
- A window of $\alpha = 16$ days is used to train $\mathcal{W}^D_t$ (16 models in $\mathcal{E}^D_t$).
- Each experiment is repeated 10 times and the performance is assessed using $p_k$.
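14
In both the 2013 and 2014 datasets, the aggregations $\mathcal{A}^W_t$ and $\mathcal{A}^E_t$ outperform the other FDSs in terms of $p_k$.
Table: Average $p_k$ in all the batches for the sliding window. Columns: classifier, then mean and sd for Dataset 2013 and for Dataset 2014. Rows: $\mathcal{F}$, $\mathcal{W}^D$, $\mathcal{W}$, $\mathcal{A}^W$.
Table: Average $p_k$ in all the batches for the ensemble. Columns: classifier, then mean and sd for Dataset 2013 and for Dataset 2014. Rows: $\mathcal{F}$, $\mathcal{E}^D$, $\mathcal{E}$, $\mathcal{A}^E$.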
15
[Figure panels: (a) Sliding window 2013 and (b) Sliding window 2014, comparing $\mathcal{A}^W$, $\mathcal{F}$, $\mathcal{W}$, $\mathcal{W}^D$; (c) Ensemble 2013 and (d) Ensemble 2014, comparing $\mathcal{A}^E$, $\mathcal{F}$, $\mathcal{E}$, $\mathcal{E}^D$.]
Figure: Sum of ranks from the Friedman test [4]. Classifiers having the same letter are not significantly different (paired t-test based on the ranks).
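The sum-of-ranks statistic behind these plots can be sketched as follows. This is an illustration only: it assumes that within each batch the classifier with the highest $p_k$ receives the highest rank and that ties share the average rank; the function names are hypothetical and the $p_k$ values are made up, not results from the experiments. The significance test itself is not reproduced.

```python
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values in ascending order (smallest gets rank 1); ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def sum_of_ranks(pk_by_batch: Sequence[Dict[str, float]]) -> Dict[str, float]:
    """Rank the classifiers inside every batch and add up the ranks per classifier."""
    names = list(pk_by_batch[0])
    totals = {name: 0.0 for name in names}
    for batch in pk_by_batch:
        for name, rank in zip(names, average_ranks([batch[name] for name in names])):
            totals[name] += rank
    return totals


# Made-up p_k values for two batches and three classifiers.
batches = [
    {"F": 0.6, "WD": 0.5, "AW": 0.7},
    {"F": 0.6, "WD": 0.6, "AW": 0.8},
]
totals = sum_of_ranks(batches)
```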
16 EXPERIMENTS ON ARTIFICIAL DATASET WITH CD
In the second part we artificially introduce CD on specific days by juxtaposing transactions acquired at different times of the year.
Table: Datasets with Artificially Introduced CD. Columns: Id, Start 2013, End 2013, Start 2014, End 2014. Rows: CD1, CD2, CD3.
17
Table: Average $p_k$ in the month before and after CD for the sliding window approach. Sub-tables: (a) Before CD, (b) After CD. Columns: classifier, then mean and sd for CD1, CD2 and CD3. Rows: $\mathcal{F}$, $\mathcal{W}^D$, $\mathcal{W}$, $\mathcal{A}^W$.
18
[Figure panels: (e) Sliding window strategies on dataset CD1; (f) Sliding window strategies on dataset CD2; (g) Sliding window strategies on dataset CD3, each comparing $\mathcal{A}^W$ and $\mathcal{W}$; (h) Ensemble strategies on dataset CD3, comparing $\mathcal{A}^E$ and $\mathcal{E}$.]
Figure: Average $p_k$ per day (the higher the better) for classifiers on datasets with artificial concept drift, smoothed using a moving average of 15 days. The vertical bar denotes the date of the concept drift.
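The 15-day smoothing of the daily $p_k$ series can be sketched as below. The slides do not say whether the window is trailing or centred, nor how the first days are handled; this sketch assumes a trailing window that simply averages whatever days are available at the start. `moving_average` is a hypothetical name.

```python
from typing import List, Sequence


def moving_average(series: Sequence[float], window: int = 15) -> List[float]:
    """Trailing moving average: each output is the mean of the current value and
    up to window-1 preceding values."""
    if window <= 0:
        raise ValueError("window must be positive")
    smoothed: List[float] = []
    running = 0.0
    for i, value in enumerate(series):
        running += value
        if i >= window:
            running -= series[i - window]  # drop the value that left the window
        smoothed.append(running / min(i + 1, window))
    return smoothed


# Made-up daily values smoothed with a short window for readability.
smoothed = moving_average([1.0, 2.0, 3.0, 4.0], window=2)
```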
19 CONCLUDING REMARKS
We notice that:
- $\mathcal{F}_t$ outperforms classifiers on delayed samples (trained on obsolete couples).
- $\mathcal{F}_t$ outperforms classifiers trained on the entire supervised dataset (dominated by delayed samples).
- Aggregation gives larger influence to feedbacks.
20 CONCLUSION
We formalise a real-world FDS framework that meets realistic working conditions.
- In a real-world scenario, there is a strong alert-feedback interaction that has to be explicitly considered.
- Feedbacks and delayed samples should be handled separately when training a FDS.
- Aggregating two distinct classifiers is an effective strategy that enables a prompter adaptation in concept-drifting environments.
21 FUTURE WORK
Future work will focus on:
- Adaptive aggregation of $\mathcal{F}_t$ and the classifier trained on delayed samples.
- Studying the sample selection bias in $F_t$ introduced by the alert-feedback interaction.
22 BIBLIOGRAPHY
[1] S. Bhattacharyya, S. Jha, K. Tharakunnel, and J. C. Westland. Data mining for credit card fraud: A comparative study. Decision Support Systems, 50(3).
[2] L. Breiman. Random forests. Machine Learning, 45(1):5–32.
[3] C. Chen, A. Liaw, and L. Breiman. Using random forest to learn imbalanced data. University of California, Berkeley.
[4] M. Friedman. The use of ranks to avoid the assumption of normality implicit in the analysis of variance. Journal of the American Statistical Association, 32(200).
[5] J. Gao, B. Ding, W. Fan, J. Han, and P. S. Yu. Classifying data streams with skewed class distributions and concept drifts. Internet Computing, 12(6):37–49.
[6] D. K. Tasoulis, N. M. Adams, and D. J. Hand. Unsupervised clustering in streaming data. In ICDM Workshops.
