CI6227: Data Mining. Lesson 11b: Ensemble Learning. Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore.


1 CI6227: Data Mining, Lesson 11b: Ensemble Learning
Sinno Jialin PAN, Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore
Acknowledgements: slides are adapted from the lecture notes of the book Introduction to Data Mining (Chap. 5) and from On the Power of Ensemble: Supervised and Unsupervised Methods Reconciled (a tutorial at SDM 2010 by Jing Gao et al.).
2 Ensemble Methods
Objective: to improve model performance in terms of accuracy by aggregating the predictions of multiple models.
How to do it:
- Construct a set of base models from the training data.
- Make predictions by combining the predictions of the base models.
3 General Idea
4 Stories of Success
Million-dollar prize:
- Improve the baseline movie recommendation approach of Netflix by 10% in accuracy.
- The top submissions all combine several algorithms as an ensemble.
Data mining competitions on Kaggle:
- Winning teams employ ensembles of classifiers.
5 Netflix Prize
Supervised learning task:
- Training data is a set of users and movies, and a set of ratings (1, 2, 3, 4, or 5 stars) on movies given by users.
- Construct a classifier that, given a user and an unrated movie, correctly predicts the user's rating of the movie: 1, 2, 3, 4, or 5 stars.
- $1 million prize for a 10% improvement over Netflix's current movie recommender.
Competition:
- At first, single-model methods were developed, and performance improved.
- However, improvements slowed down.
- Later, individuals and teams merged their results, and significant improvements were observed.
6 Leaderboard
Our final solution consists of blending 107 individual results.
Predictive accuracy is substantially improved when blending multiple predictors. Our experience is that most efforts should be concentrated in deriving substantially different approaches, rather than refining a single technique.
7 Why Do Ensembles Work?
Suppose there are 3 base classifiers C_1, C_2, C_3 and a test instance x.
- Each classifier has error rate ε = 0.35, or accuracy acc = 0.65.
- Given a test instance, if we choose any one of these classifiers to make the prediction, the probability that the classifier makes a wrong prediction is 35%.
8 Why Do Ensembles Work?
- Combine the 3 base classifiers to predict the class label of a test instance using a majority vote on the predictions made by the base classifiers.
- Assuming the classifiers are independent, the ensemble makes a wrong prediction only if at least 2 of the base classifiers predict incorrectly.
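9 Why Do Ensembles Work?
[Figure: predictions of C_1, C_2, and C_3 on a test instance x with true label 1, each marked as a wrong or a correct prediction. Each base classifier has error rate 35% and accuracy 65%.]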
10 Why Do Ensembles Work?
Therefore, the probability that the ensemble classifier makes a wrong prediction is:
$\sum_{i=2}^{3} \binom{3}{i} \varepsilon^{i} (1 - \varepsilon)^{3 - i} = 0.2817$
That is, the accuracy of the ensemble classifier is 71.83%.
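11 Why Do Ensembles Work?
Suppose there are 25 independent base classifiers. The majority vote is wrong only if at least 13 of them predict incorrectly, so the probability that the ensemble classifier makes a wrong prediction is:
$\sum_{i=13}^{25} \binom{25}{i} \varepsilon^{i} (1 - \varepsilon)^{25 - i} = 0.06$
That is, the accuracy of the ensemble classifier is 94%.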
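The two error probabilities above can be checked numerically. The following is a minimal sketch, assuming independent base classifiers with a common error rate and an odd number of voters; the function name `ensemble_error` is only illustrative.

```python
from math import comb


def ensemble_error(n_classifiers: int, eps: float) -> float:
    """Probability that a majority of n independent base classifiers,
    each with error rate eps, predict incorrectly (n assumed odd)."""
    majority = n_classifiers // 2 + 1  # smallest number of wrong votes that wins
    return sum(
        comb(n_classifiers, i) * eps**i * (1 - eps) ** (n_classifiers - i)
        for i in range(majority, n_classifiers + 1)
    )


# Error of the 3-classifier and the 25-classifier ensembles at eps = 0.35.
print(f"{ensemble_error(3, 0.35):.4f}")
print(f"{ensemble_error(25, 0.35):.4f}")
```

The same function works for any odd ensemble size, which makes it easy to see how quickly the error shrinks as independent classifiers are added.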
12 Why Do Ensembles Work?
[Figure: six models (Model 1 to Model 6), each describing part of some unknown distribution.]
Ensemble gives the global picture!
13 Why Is Independence Necessary?
[Figure: predictions of the base classifiers C_1, C_2, ...]
14 Necessary Conditions
[Figure: error rate of an ensemble of 25 binary classifiers for different base classifier error rates. One curve shows the case where the base classifiers are identical (perfectly correlated); the other shows the case where the base classifiers are independent.]
Observation: the ensemble classifier performs worse than the base classifiers when the base classifier error rate is larger than 0.5.
15 Necessary Conditions
Two necessary conditions for an ensemble classifier to perform better than a single classifier:
- The base classifiers should be independent of each other. In practice, this condition can be relaxed: the base classifiers can be slightly correlated.
- The base classifiers should do better than a classifier that performs random guessing (e.g., for binary classification, accuracy should be better than 0.5).
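Both conditions can be illustrated with a small sweep over base error rates. This is a sketch under the idealized assumptions of the slides (25 binary classifiers, either fully independent or identical); the helper names are hypothetical.

```python
from math import comb


def independent_ensemble_error(n: int, eps: float) -> float:
    """Majority-vote error of n independent classifiers with error rate eps."""
    return sum(
        comb(n, i) * eps**i * (1 - eps) ** (n - i)
        for i in range(n // 2 + 1, n + 1)
    )


def correlated_ensemble_error(eps: float) -> float:
    """Identical (perfectly correlated) classifiers always vote together,
    so the ensemble error equals the base error."""
    return eps


for eps in (0.1, 0.3, 0.5, 0.6, 0.7):
    print(
        eps,
        round(correlated_ensemble_error(eps), 4),
        round(independent_ensemble_error(25, eps), 4),
    )
```

Below a base error of 0.5 the independent ensemble beats its members; above 0.5 voting amplifies the errors instead.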
16 Ensemble Learning Methods
- Supervised ensemble learning methods: classification, regression
- Unsupervised ensemble learning methods: clustering
17 Supervised Ensemble Methods
How to generate an ensemble of classifiers?
- By manipulating the training set: multiple training sets are created by resampling the original data according to some sampling distribution. A classifier is then built from each training set. Examples: bagging, boosting.
- By manipulating the input features: a subset of input attributes is chosen to form each training set. The subset can be chosen either randomly or using domain knowledge. Example: random forest.
- By manipulating the learning algorithm: the algorithm is applied several times to the same training data using different parameters.
18 Supervised Ensemble Method: General Procedure
1. Let D denote the original training data, k denote the number of base classifiers, and T be the test data.
2. for i = 1 to k do
3.   Create training set D_i from D.
4.   Build a base classifier C_i from D_i.
5. end for
6. for each test record x ∈ T do
7.   C*(x) = Vote(C_1(x), C_2(x), ..., C_k(x))
8. end for
Vote is majority voting here (other schemes can be used).
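The general procedure can be sketched in a few lines. Everything specific below is an assumption made for illustration: the toy 1-D data, the subsampling rule used to create each D_i, and the threshold base learner are hypothetical stand-ins, since the procedure itself leaves these choices open.

```python
import random
from collections import Counter


def train_ensemble(D, k, create_training_set, build_classifier):
    """Steps 2-5: build k base classifiers, each from its own training set."""
    return [build_classifier(create_training_set(D)) for _ in range(k)]


def predict_ensemble(classifiers, x):
    """Step 7: majority vote over the base predictions."""
    votes = Counter(C(x) for C in classifiers)
    return votes.most_common(1)[0][0]


# --- hypothetical ingredients, for illustration only ---
rng = random.Random(0)


def subsample(D):
    """Create D_i by dropping two random records (keeps both classes here)."""
    return rng.sample(D, len(D) - 2)


def build_threshold_classifier(D_i):
    """Toy base learner: threshold halfway between the two class means.
    Assumes both classes occur and class 1 has the larger values."""
    pos = [x for x, y in D_i if y == 1]
    neg = [x for x, y in D_i if y == 0]
    t = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x > t else 0


D = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0), (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1)]
classifiers = train_ensemble(D, k=5, create_training_set=subsample,
                             build_classifier=build_threshold_classifier)
print([predict_ensemble(classifiers, x) for x in (0.05, 0.95)])
```

Passing the sampling rule and the base learner as functions mirrors the procedure: bagging, boosting, and feature subsampling differ mainly in how D_i is created.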
19 Bagging
- Also known as bootstrap aggregating: repeatedly sample with replacement according to a uniform probability distribution.
- Build a classifier on each bootstrap sample, which has the same size as the original data.
- Use majority voting to determine the class label of the ensemble classifier.
20 Bagging
[Table: indices of the instances in the original data and in the bootstrap samples of Rounds 1 to 3, which are used to build the classifiers C_1, C_2, and C_3.]
21 Bagging
- In a single draw, a training example has a probability of 1 - 1/N of not being selected.
- Its probability of not ending up in a training set D_i is $(1 - 1/N)^N \approx 1/e = 0.368$.
- A bootstrap sample D_i therefore contains approximately 63.2% of the original training data.
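A short simulation makes the 63.2% figure concrete. This is a sketch; the seed and the values of N are arbitrary choices.

```python
import random


def prob_not_selected(n: int) -> float:
    """Probability that a given example is missing from a bootstrap sample of size n."""
    return (1 - 1 / n) ** n


def bootstrap_unique_fraction(n: int, rng: random.Random) -> float:
    """Draw n indices with replacement and return the fraction of distinct ones."""
    sample = [rng.randrange(n) for _ in range(n)]
    return len(set(sample)) / n


rng = random.Random(0)
for n in (10, 100, 1000):
    print(n, round(prob_not_selected(n), 4), round(bootstrap_unique_fraction(n, rng), 3))
```

As N grows, the first column of results approaches 1/e and the simulated fraction of distinct examples settles near 0.632.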
22 Boosting
Principles:
- Boost a set of weak learners to a strong learner.
- Make records that are currently misclassified more important.
Generally, boosting is an iterative procedure that adaptively changes the distribution of the training data so that the base classifiers focus more on previously misclassified records.
23 Boosting
Specifically:
- Initially, all N records are assigned equal weights.
- Unlike bagging, weights may change at the end of each boosting round.
- In each boosting round, after the weights are assigned to the training examples, we can either
  - draw a bootstrap sample from the original data, using the weights as a sampling distribution, to build a model, or
  - learn a model that is biased toward higher-weighted examples.
24 Boosting: Procedure (Resampling Based on Instance Weights)
1. Initially, the examples are assigned equal weights 1/N, so that they are equally likely to be chosen for training. A sample is drawn uniformly to obtain a new training set.
2. A classifier is induced from the training set and used to classify all the examples in the original training set.
3. The weights of the training examples are updated at the end of each boosting round:
   - Records that are wrongly classified have their weights increased.
   - Records that are classified correctly have their weights decreased.
4. Repeat Steps 2 and 3 until the stopping condition is met.
5. Finally, the ensemble is obtained by aggregating the base classifiers obtained from each boosting round.
25 Boosting: Example
Initially, all the examples are assigned the same weight:
1/10 1/10 1/10 1/10 1/10 1/10 1/10 1/10 1/10 1/10
Round 1: draw a bootstrap sample uniformly at random from the original data, build a classifier C_1 from it, and apply C_1 to all instances of the original data.
26 Boosting: Example
Adjust the weights based on whether each instance was chosen in the previous round or misclassified by the classifier trained in the previous round. E.g., instance 4 was misclassified in Round 1, so its weight is increased; instance 5 was not chosen in Round 1, so its weight is increased as well.
Weights of instances 1 to 10: 1/10 1/5 1/20 1/5 1/5 1/20 1/20 1/20 1/20 1/20
Round 2: randomly sample based on the weights of the instances, build a classifier C_2 from the sample, and apply C_2 to all instances of the original data.
27 Boosting: Example
Adjust the weights again based on whether each instance was chosen in the previous round or misclassified by the classifier trained in the previous round, then randomly sample based on the weights of the instances.
As the boosting rounds proceed, examples that are the hardest to classify tend to become even more prevalent, e.g., instance 4.
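The weight-based resampling step of this example can be sketched with the standard library. The weights are those listed for the second round of the example; the seed and the number of repetitions are arbitrary, and the helper name is hypothetical.

```python
import random
from collections import Counter

# Weights of instances 1..10 before the second boosting round (from the example).
weights = [1/10, 1/5, 1/20, 1/5, 1/5, 1/20, 1/20, 1/20, 1/20, 1/20]
instances = list(range(1, 11))


def draw_boosting_sample(instances, weights, rng):
    """Sample len(instances) items with replacement, using the weights
    as the sampling distribution."""
    return rng.choices(instances, weights=weights, k=len(instances))


rng = random.Random(0)
sample = draw_boosting_sample(instances, weights, rng)
print(sorted(sample))

# Over many rounds, heavily weighted instances dominate the samples.
counts = Counter()
for _ in range(10_000):
    counts.update(draw_boosting_sample(instances, weights, rng))
print(counts.most_common(3))
```

Instances 2, 4, and 5 carry weight 1/5 each, so together they make up about 60% of every sample on average.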
28 Alternative: Weighted Classifier
Learn a model that is biased toward higher-weighted examples by minimizing the weighted error:
$E = \frac{1}{N} \sum_{j=1}^{N} w_j \, \delta\big(f(x_j) \neq y_j\big)$
where w_j is the weight of the instance x_j, and δ(p) = 1 if the predicate p is true, and 0 otherwise.
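The weighted error can be computed directly from its definition. In this sketch the classifier `f`, the data, and the weights are hypothetical toy values.

```python
def weighted_error(f, X, y, w):
    """E = (1/N) * sum_j w_j * delta(f(x_j) != y_j)."""
    n = len(X)
    return sum(w_j * (f(x_j) != y_j) for x_j, y_j, w_j in zip(X, y, w)) / n


# Toy classifier that misclassifies only x = 2.
f = lambda x: int(x >= 2)
X = [1, 2, 3, 4]
y = [0, 0, 1, 1]

print(weighted_error(f, X, y, [1, 1, 1, 1]))  # equal weights
print(weighted_error(f, X, y, [1, 3, 1, 1]))  # the misclassified record weighs more
```

Raising the weight of the misclassified record raises the error, which is what pushes a learner that minimizes E to fix that record first.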
29 Boosting: AdaBoost
Let D = {(x_i, y_i) | i = 1, 2, ..., N} be the set of training examples.
In AdaBoost, let f_1, f_2, ..., f_T be the base classifiers of the boosting rounds.
Error rate of each classifier:
$\varepsilon_i = \frac{1}{N} \sum_{j=1}^{N} w_j \, \delta\big(f_i(x_j) \neq y_j\big)$
Importance of a classifier:
$\alpha_i = \frac{1}{2} \ln\!\left(\frac{1 - \varepsilon_i}{\varepsilon_i}\right)$
30 Boosting: AdaBoost
Weight update:
$w_i^{(j+1)} = \frac{w_i^{(j)}}{Z_j} \times \begin{cases} e^{-\alpha_j} & \text{if } f_j(x_i) = y_i \\ e^{\alpha_j} & \text{if } f_j(x_i) \neq y_i \end{cases}$
where $w_i^{(j)}$ denotes the weight assigned to example (x_i, y_i) during the j-th boosting round, and Z_j is the normalization factor that ensures $\sum_i w_i^{(j+1)} = 1$.
31 Boosting: AdaBoost
If any intermediate round produces an error rate higher than 50%, the weights are reverted to 1/N and the resampling procedure is repeated.
Classification:
$f^*(x) = \arg\max_{y} \sum_{j=1}^{T} \alpha_j \, \delta\big(f_j(x) = y\big)$
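The AdaBoost rules can be put together in a compact sketch. Several choices here are assumptions rather than part of the slides: the base learner is a 1-D decision stump trained directly on the weights (no resampling), labels are -1 and +1, the error is taken as the weighted sum with weights that sum to 1, and the ten data points are a hypothetical toy set.

```python
import math


def train_stump(X, y, w):
    """Return (weighted error, stump) for the best 1-D threshold stump under w."""
    xs = sorted(set(X))
    thresholds = [xs[0] - 1] + [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    best = None
    for t in thresholds:
        for sign in (+1, -1):  # sign is the label predicted for x > t
            err = sum(w_j for x_j, y_j, w_j in zip(X, y, w)
                      if (sign if x_j > t else -sign) != y_j)
            if best is None or err < best[0]:
                best = (err, t, sign)
    err, t, sign = best
    return err, (lambda x, t=t, sign=sign: sign if x > t else -sign)


def adaboost(X, y, rounds):
    n = len(X)
    w = [1 / n] * n                      # equal initial weights
    ensemble = []                        # list of (alpha, classifier)
    for _ in range(rounds):
        eps, f = train_stump(X, y, w)
        if eps > 0.5:                    # revert the weights to 1/N and try again
            w = [1 / n] * n
            continue
        eps = max(eps, 1e-10)            # guard against division by zero
        alpha = 0.5 * math.log((1 - eps) / eps)
        ensemble.append((alpha, f))
        # Decrease weights of correct records, increase weights of wrong ones.
        w = [w_i * math.exp(-alpha if f(x_i) == y_i else alpha)
             for x_i, y_i, w_i in zip(X, y, w)]
        Z = sum(w)                       # normalization factor
        w = [w_i / Z for w_i in w]
    return ensemble


def predict(ensemble, x, labels=(-1, +1)):
    """argmax over labels of the summed importance of classifiers voting for it."""
    return max(labels, key=lambda c: sum(a for a, f in ensemble if f(x) == c))


X = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
y = [+1, +1, +1, -1, -1, -1, -1, +1, +1, +1]
ensemble = adaboost(X, y, rounds=3)
print([round(alpha, 3) for alpha, _ in ensemble])
print([predict(ensemble, x) for x in X])
```

No single stump can separate this data, yet the weighted vote of three stumps does, which is the sense in which boosting turns weak learners into a strong one.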
32 Illustrating AdaBoost
Example: 1-dimensional examples (1 attribute) with binary classes.
[Figure: the original data with the initial weight of each data point, and Boosting Round 1 (B1) with the data points used for training, the decision boundary, the misclassified examples, and the importance α of the classifier.]
33 Illustrating AdaBoost
[Figure: Boosting Rounds 1 to 3 (B1, B2, B3), each with the importance α of the corresponding classifier, followed by the overall ensemble result.]
34 Random Forests
- A class of ensemble methods specifically designed for decision tree classifiers.
- A random forest grows many trees.
- Each tree is generated based on the values of an independent set of random vectors, which are generated from a fixed probability distribution.
- Final result on classifying a new instance: voting. The forest chooses the classification result with the most votes (over all the trees in the forest).
35 Random Forests
[Figure: illustration of random forests.]
36 Random Forests: Algorithm
- Choose T: the number of trees to grow.
- Choose m < M (M is the total number of features): the number of features used to calculate the best split at each node (typically 20%).
- For each tree:
  - Choose a training set by bootstrapping.
  - For each node, randomly choose m features and calculate the best split.
  - The tree is fully grown and not pruned.
- Use majority vote among all the trees.
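The algorithm can be sketched without any library. The split criterion (weighted Gini impurity), the tuple-based tree representation, and the toy data are assumptions made for this illustration; class labels are assumed not to be tuples.

```python
import random
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, features):
    """Best (score, feature, threshold) among the given features by weighted Gini."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows})[:-1]:   # keep both sides non-empty
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def grow_tree(rows, labels, m, rng):
    """Fully grown, unpruned tree; m random features are tried at each node."""
    if len(set(labels)) == 1:
        return labels[0]
    features = rng.sample(range(len(rows[0])), m)
    split = best_split(rows, labels, features)
    if split is None:                                  # chosen features are constant
        return Counter(labels).most_common(1)[0][0]
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            grow_tree([r for r, _ in left], [y for _, y in left], m, rng),
            grow_tree([r for r, _ in right], [y for _, y in right], m, rng))


def predict_tree(node, x):
    while isinstance(node, tuple):                     # internal node: (f, t, left, right)
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def random_forest(rows, labels, T, m, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(T):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        forest.append(grow_tree([rows[i] for i in idx],
                                [labels[i] for i in idx], m, rng))
    return forest


def predict_forest(forest, x):
    """Majority vote among all the trees."""
    return Counter(predict_tree(tree, x) for tree in forest).most_common(1)[0][0]


rows = [(1, 10), (2, 11), (3, 12), (7, 20), (8, 21), (9, 22)]
labels = [0, 0, 0, 1, 1, 1]
forest = random_forest(rows, labels, T=25, m=1)
print(predict_forest(forest, (0, 0)), predict_forest(forest, (10, 30)))
```

Because each node only searches m of the M features, trees built from similar bootstrap samples still differ, which is where the extra diversity comes from.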
37 Random Forests: Discussions
- Bagging + random features
- Improve accuracy: incorporate more diversity
- Improve efficiency: searching among subsets of features is much faster than searching among the complete set
38 Combination Methods
- Average: simple average, weighted average
- Voting: majority voting, plurality voting, weighted voting
- Combining by learning
39 Average
Simple average:
$f^*(x) = \frac{1}{T} \sum_{i=1}^{T} f_i(x)$
Weighted average:
$f^*(x) = \sum_{i=1}^{T} w_i f_i(x), \quad \text{where } w_i \geq 0 \text{ and } \sum_{i=1}^{T} w_i = 1$
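Both averaging rules fit in a few lines. This sketch takes the weighted average as the sum of w_i f_i(x) with non-negative weights that sum to 1 (an assumption of the sketch); the base outputs are hypothetical numbers.

```python
def simple_average(outputs):
    """f*(x) = (1/T) * sum_i f_i(x)."""
    return sum(outputs) / len(outputs)


def weighted_average(outputs, weights):
    """f*(x) = sum_i w_i * f_i(x), with w_i >= 0 and sum_i w_i = 1."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * f for w, f in zip(weights, outputs))


outputs = [2.0, 4.0, 6.0]          # outputs f_1(x), f_2(x), f_3(x) of three base models
print(simple_average(outputs))
print(weighted_average(outputs, [0.5, 0.25, 0.25]))
```

The simple average is the special case in which every weight equals 1/T.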
40 Voting
- Majority voting: every classifier votes for one class label, and the final output class label is the one that receives more than half of the votes. If none of the class labels receives more than half of the votes, a rejection option is given and the combined classifier makes no prediction.
- Plurality voting: takes the class label that receives the largest number of votes as the final winner.
- Weighted voting: a generalized version of plurality voting that introduces a weight for each classifier.
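The three voting schemes differ only in how the winner is decided, as the following sketch shows. Returning `None` for the rejection option is a choice made for this illustration.

```python
from collections import Counter


def majority_vote(votes):
    """Label with more than half of the votes, or None (rejection)."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


def plurality_vote(votes):
    """Label with the largest number of votes."""
    return Counter(votes).most_common(1)[0][0]


def weighted_vote(votes, weights):
    """Label with the largest total classifier weight."""
    totals = {}
    for label, w in zip(votes, weights):
        totals[label] = totals.get(label, 0.0) + w
    return max(totals, key=totals.get)


votes = ["a", "a", "b", "c"]
print(majority_vote(votes))      # "a" has exactly half of the votes, so no prediction
print(plurality_vote(votes))
print(weighted_vote(["a", "a", "b"], [0.2, 0.2, 0.6]))
```

With equal weights, weighted voting reduces to plurality voting.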
41 Combining by Learning
Stacking: a general procedure where a learner is trained to combine the individual learners.
- Individual learners: first-level learners
- Combiner: second-level learner, or meta-learner
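42 Combining by Learning: Illustration
Suppose we are given a binary classification problem with N instances and T base classifiers.
[Table: rows are the instances x_1, ..., x_N; columns are the base classifiers C_1, C_2, ..., C_T and the labels Y. Each entry is the predicted value of a classifier on an instance, e.g., the predicted value of C_1 on x_1. Each row gives a new vector of features for the instance x_i.]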
43 Combining by Learning: Illustration
Given D = {(x_i, y_i) | i = 1, 2, ..., N}, where x_i = [C_1(x_i), ..., C_T(x_i)], learn a model w = [w_1, ..., w_T] such that the difference between y_i and t_i = w · x_i is as small as possible.
The entries of w = [w_1, ..., w_T] are the weights of the respective base classifiers.
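One way to learn such weights is least squares. The sketch below minimizes the mean squared difference between y_i and w · x_i by plain gradient descent; the loss, the optimizer settings, and the toy predictions of two base classifiers are all assumptions made for illustration.

```python
def fit_combiner(X, y, lr=0.1, steps=2000):
    """Learn w minimizing (1/N) * sum_i (w . x_i - y_i)^2 by gradient descent."""
    n, T = len(X), len(X[0])
    w = [0.0] * T
    for _ in range(steps):
        residuals = [sum(w_k * x_k for w_k, x_k in zip(w, x_i)) - y_i
                     for x_i, y_i in zip(X, y)]
        grad = [2 / n * sum(r * x_i[k] for r, x_i in zip(residuals, X))
                for k in range(T)]
        w = [w_k - lr * g for w_k, g in zip(w, grad)]
    return w


# Each row is x_i = [C_1(x_i), C_2(x_i)]: C_1 always agrees with the label,
# C_2 is right only half of the time.
y = [1, 0, 1, 1, 0, 0, 1, 0]
X = [(1, 1), (0, 1), (1, 0), (1, 1), (0, 0), (0, 1), (1, 0), (0, 0)]

w = fit_combiner(X, y)
print([round(w_k, 3) for w_k in w])
```

The learned weights put essentially all trust in the reliable classifier, which is the behaviour a meta-learner is meant to discover from data.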
44 Combining by Learning: Avoid Overfitting
[Figure: the whole training dataset divided into parts labeled "used for training" and "used for evaluation", and into parts "used for first-level learners" and "used for meta-learner".]
45 Unsupervised Ensemble Methods
Clustering ensembles: given an unlabeled data set D = {x_1, x_2, ..., x_N}, an ensemble approach computes:
- a set of clustering solutions {C_1, C_2, ..., C_T}, each of which maps data to a cluster: C_j(x) = m;
- a unified clustering solution C* that combines the base clustering solutions by their consensus.
46 Clustering Ensembles
[Table: 7 instances x_1, ..., x_7 (rows) against 4 base clusterings C_1 to C_4 and the ensemble clustering C* (columns); each entry is the index of a cluster.]
47 Clustering Ensembles: Challenges
- Unsupervised
- The correspondence between the clusters in different clustering solutions is unknown
- The combinatorial optimization problem is NP-complete
48 Clustering Ensembles: Challenges
[Table: 7 instances x_1, ..., x_7 against the base clusterings C_1 to C_4 and the ensemble clustering C*.]
- Identical clustering results: {{x_1, x_2}, {x_3, x_4, x_5}, {x_6, x_7}}. The same cluster index in different clusterings may not represent the same cluster!
- The numbers of clusters in different base clusterings can be different.
49 Clustering Ensembles: Similarity-based Methods
Input:
- Data set D = {x_1, x_2, ..., x_N}
- Base clustering algorithms: {C_1, C_2, ..., C_T}
- A base clustering algorithm C for generating the final results
Process:
1. For i = 1, ..., T
2.   Form a base clustering from D with k^{(i)} clusters.
3.   Derive an N × N similarity matrix M^{(i)} based on the clustering result.
4. End
5. Form the consensus similarity matrix $M = \frac{1}{T} \sum_{i=1}^{T} M^{(i)}$.
6. Perform C on M to generate k clusters.
Output: ensemble clustering results obtained by C
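Steps 5 and 6 can be sketched as follows. The three small similarity matrices are hypothetical, and the final algorithm C is replaced by a simple stand-in (connected components of the pairs whose consensus similarity exceeds a threshold), which is an assumption of this sketch rather than a prescribed choice.

```python
def consensus_matrix(matrices):
    """M = (1/T) * sum_i M^(i), computed entry by entry."""
    T, n = len(matrices), len(matrices[0])
    return [[sum(M[i][j] for M in matrices) / T for j in range(n)]
            for i in range(n)]


def threshold_clusters(M, threshold=0.5):
    """Stand-in for algorithm C: group instances connected by similarity > threshold."""
    n = len(M)
    labels = [None] * n
    current = 0
    for start in range(n):
        if labels[start] is not None:
            continue
        current += 1
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] is None and M[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
    return labels


# Three hypothetical base similarity matrices over 3 instances.
M1 = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
M2 = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
M3 = [[1, 0, 0], [0, 1, 1], [0, 1, 1]]

M = consensus_matrix([M1, M2, M3])
print(M)
print(threshold_clusters(M))
```

Averaging removes the need to match cluster indices across base clusterings: only the question of whether two instances were grouped together matters.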
50 Constructing Similarity Matrix
Crisp clustering: if x_i and x_j belong to the same cluster, then M^{(1)}(i, j) = 1, otherwise 0.
Cluster assignments by C_1:
x_1: 1
x_2: 1
x_3: 2
x_4: 2
x_5: 2
x_6: 3
x_7: 3
[Matrix: the resulting 7 × 7 similarity matrix M^{(1)} over x_1, ..., x_7.]
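For the crisp case, the similarity matrix follows directly from the cluster assignments. A minimal sketch, using the assignments of C_1 listed above (the function name is illustrative):

```python
def crisp_similarity(labels):
    """M(i, j) = 1 if instances i and j share a cluster, otherwise 0."""
    return [[1 if a == b else 0 for b in labels] for a in labels]


# Cluster indices assigned by C_1 to x_1, ..., x_7.
c1 = [1, 1, 2, 2, 2, 3, 3]
M1 = crisp_similarity(c1)
for row in M1:
    print(*row)
```

The matrix is block-diagonal here because instances of the same cluster are listed next to each other.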
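51 Constructing Similarity Matrix
Soft clustering: C_1 gives the probabilities P(l | x_i) of the clusters l = 1, 2, 3 for each instance:
       l=1   l=2   l=3
x_1    ...
x_2    1/2   1/2   0
x_3    1/3   1/3   1/3
x_4    1/4   1/2   1/4
x_5    3/5   1/5   1/5
x_6    2/5   2/5   1/5
x_7    ...
The similarity matrix is
$M^{(1)}(i, j) = \sum_{l=1}^{3} P(l \mid x_i) \, P(l \mid x_j)$
For example, M^{(1)}(1, 1) = 1, M^{(1)}(2, 1) = 1/2, and M^{(1)}(2, 2) = 1/2.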
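The soft similarity is a dot product of the two probability vectors. The sketch below uses exact fractions and only the rows for x_2 to x_6 that are given in the table; the dictionary layout and function name are illustrative.

```python
from fractions import Fraction as F

# P(l | x_i) for l = 1, 2, 3, as listed in the table for x_2 ... x_6.
P = {
    "x2": [F(1, 2), F(1, 2), F(0)],
    "x3": [F(1, 3), F(1, 3), F(1, 3)],
    "x4": [F(1, 4), F(1, 2), F(1, 4)],
    "x5": [F(3, 5), F(1, 5), F(1, 5)],
    "x6": [F(2, 5), F(2, 5), F(1, 5)],
}


def soft_similarity(p_i, p_j):
    """M(i, j) = sum_l P(l | x_i) * P(l | x_j)."""
    return sum(a * b for a, b in zip(p_i, p_j))


print(soft_similarity(P["x2"], P["x2"]))
print(soft_similarity(P["x2"], P["x3"]))
print(soft_similarity(P["x5"], P["x6"]))
```

Unlike the crisp case, the diagonal entries are not always 1: an instance spread evenly over several clusters is only partly similar to itself under this measure.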
More informationClustering UE 141 Spring 2013
Clustering UE 141 Spring 013 Jing Gao SUNY Buffalo 1 Definition of Clustering Finding groups of obects such that the obects in a group will be similar (or related) to one another and different from (or
More informationBIDM Project. Predicting the contract type for IT/ITES outsourcing contracts
BIDM Project Predicting the contract type for IT/ITES outsourcing contracts N a n d i n i G o v i n d a r a j a n ( 6 1 2 1 0 5 5 6 ) The authors believe that data modelling can be used to predict if an
More informationDecompose Error Rate into components, some of which can be measured on unlabeled data
BiasVariance Theory Decompose Error Rate into components, some of which can be measured on unlabeled data BiasVariance Decomposition for Regression BiasVariance Decomposition for Classification BiasVariance
More informationClassification algorithm in Data mining: An Overview
Classification algorithm in Data mining: An Overview S.Neelamegam #1, Dr.E.Ramaraj *2 #1 M.phil Scholar, Department of Computer Science and Engineering, Alagappa University, Karaikudi. *2 Professor, Department
More informationData Mining  Evaluation of Classifiers
Data Mining  Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
More informationInductive Learning in Less Than One Sequential Data Scan
Inductive Learning in Less Than One Sequential Data Scan Wei Fan, Haixun Wang, and Philip S. Yu IBM T.J.Watson Research Hawthorne, NY 10532 {weifan,haixun,psyu}@us.ibm.com ShawHwa Lo Statistics Department,
More informationOn the application of multiclass classification in physical therapy recommendation
RESEARCH Open Access On the application of multiclass classification in physical therapy recommendation Jing Zhang 1,PengCao 1,DouglasPGross 2 and Osmar R Zaiane 1* Abstract Recommending optimal rehabilitation
More informationOverview. Evaluation Connectionist and Statistical Language Processing. Test and Validation Set. Training and Test Set
Overview Evaluation Connectionist and Statistical Language Processing Frank Keller keller@coli.unisb.de Computerlinguistik Universität des Saarlandes training set, validation set, test set holdout, stratification
More informationEnvironmental Remote Sensing GEOG 2021
Environmental Remote Sensing GEOG 2021 Lecture 4 Image classification 2 Purpose categorising data data abstraction / simplification data interpretation mapping for land cover mapping use land cover class
More informationFast Analytics on Big Data with H20
Fast Analytics on Big Data with H20 0xdata.com, h2o.ai Tomas Nykodym, Petr Maj Team About H2O and 0xdata H2O is a platform for distributed in memory predictive analytics and machine learning Pure Java,
More informationWelcome. Data Mining: Updates in Technologies. Xindong Wu. Colorado School of Mines Golden, Colorado 80401, USA
Welcome Xindong Wu Data Mining: Updates in Technologies Dept of Math and Computer Science Colorado School of Mines Golden, Colorado 80401, USA Email: xwu@ mines.edu Home Page: http://kais.mines.edu/~xwu/
More informationCLASS distribution, i.e., the proportion of instances belonging
IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS PART C: APPLICATIONS AND REVIEWS 1 A Review on Ensembles for the Class Imbalance Problem: Bagging, Boosting, and HybridBased Approaches Mikel Galar,
More informationData Mining with R. Decision Trees and Random Forests. Hugh Murrell
Data Mining with R Decision Trees and Random Forests Hugh Murrell reference books These slides are based on a book by Graham Williams: Data Mining with Rattle and R, The Art of Excavating Data for Knowledge
More informationInsurance Analytics  analýza dat a prediktivní modelování v pojišťovnictví. Pavel Kříž. Seminář z aktuárských věd MFF 4.
Insurance Analytics  analýza dat a prediktivní modelování v pojišťovnictví Pavel Kříž Seminář z aktuárských věd MFF 4. dubna 2014 Summary 1. Application areas of Insurance Analytics 2. Insurance Analytics
More informationA New Ensemble Model for Efficient Churn Prediction in Mobile Telecommunication
2012 45th Hawaii International Conference on System Sciences A New Ensemble Model for Efficient Churn Prediction in Mobile Telecommunication Namhyoung Kim, Jaewook Lee Department of Industrial and Management
More informationSelecting Data Mining Model for Web Advertising in Virtual Communities
Selecting Data Mining for Web Advertising in Virtual Communities Jerzy Surma Faculty of Business Administration Warsaw School of Economics Warsaw, Poland email: jerzy.surma@gmail.com Mariusz Łapczyński
More informationAn Ensemble Method for Large Scale Machine Learning with Hadoop MapReduce
An Ensemble Method for Large Scale Machine Learning with Hadoop MapReduce by Xuan Liu Thesis submitted to the Faculty of Graduate and Postdoctoral Studies In partial fulfillment of the requirements For
More informationSVM Ensemble Model for Investment Prediction
19 SVM Ensemble Model for Investment Prediction Chandra J, Assistant Professor, Department of Computer Science, Christ University, Bangalore Siji T. Mathew, Research Scholar, Christ University, Dept of
More informationCredit Card Fraud Detection and ConceptDrift Adaptation with Delayed Supervised Information
Credit Card Fraud Detection and ConceptDrift Adaptation with Delayed Supervised Information Andrea Dal Pozzolo, Giacomo Boracchi, Olivier Caelen, Cesare Alippi, and Gianluca Bontempi 15/07/2015 IEEE IJCNN
More informationMonday Morning Data Mining
Monday Morning Data Mining Tim Ruhe Statistische Methoden der Datenanalyse Outline:  data mining  IceCube  Data mining in IceCube Computer Scientists are different... Fakultät Physik Fakultät Physik
More informationRoulette Sampling for CostSensitive Learning
Roulette Sampling for CostSensitive Learning Victor S. Sheng and Charles X. Ling Department of Computer Science, University of Western Ontario, London, Ontario, Canada N6A 5B7 {ssheng,cling}@csd.uwo.ca
More informationData Mining Clustering (2) Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining
Data Mining Clustering (2) Toon Calders Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining Outline Partitional Clustering Distancebased Kmeans, Kmedoids,
More informationSupervised Learning (Big Data Analytics)
Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used
More informationTensor Methods for Machine Learning, Computer Vision, and Computer Graphics
Tensor Methods for Machine Learning, Computer Vision, and Computer Graphics Part I: Factorizations and Statistical Modeling/Inference Amnon Shashua School of Computer Science & Eng. The Hebrew University
More informationAcrossModel Collective Ensemble Classification
AcrossModel Collective Ensemble Classification Hoda Eldardiry and Jennifer Neville Computer Science Department Purdue University West Lafayette, IN 47907 (hdardiry neville)@cs.purdue.edu Abstract Ensemble
More informationApplied Multivariate Analysis  Big data analytics
Applied Multivariate Analysis  Big data analytics Nathalie VillaVialaneix nathalie.villa@toulouse.inra.fr http://www.nathalievilla.org M1 in Economics and Economics and Statistics Toulouse School of
More informationPerformance Metrics for Graph Mining Tasks
Performance Metrics for Graph Mining Tasks 1 Outline Introduction to Performance Metrics Supervised Learning Performance Metrics Unsupervised Learning Performance Metrics Optimizing Metrics Statistical
More informationData Mining Techniques for Prognosis in Pancreatic Cancer
Data Mining Techniques for Prognosis in Pancreatic Cancer by Stuart Floyd A Thesis Submitted to the Faculty of the WORCESTER POLYTECHNIC INSTITUE In partial fulfillment of the requirements for the Degree
More informationAn Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
More informationData Mining in Direct Marketing with Purchasing Decisions Data.
Data Mining in Direct Marketing with Purchasing Decisions Data. Randy Collica Sr. Business Analyst Database Mgmt. & Compaq Computer Corp. Database Mgmt. & Overview! Business Problem to Solve.! Data Layout
More informationARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING)
ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING) Gabriela Ochoa http://www.cs.stir.ac.uk/~goc/ OUTLINE Preliminaries Classification and Clustering Applications
More informationHomework Assignment 7
Homework Assignment 7 36350, Data Mining Solutions 1. Base rates (10 points) (a) What fraction of the emails are actually spam? Answer: 39%. > sum(spam$spam=="spam") [1] 1813 > 1813/nrow(spam) [1] 0.3940448
More information