Boosted Decision Trees for Word Recognition in Handwritten Document Retrieval


1 5 February 2009, Padova, Italy. Boosted Decision Trees for Word Recognition in Handwritten Document Retrieval. Howe, N.R., Rath, T.M. and Manmatha, R., Department of Computer Science, University of Massachusetts. SIGIR 2005, published by ACM, New York. Information Management Research Group (IMS), Department of Information Engineering, University of Padua, Italy.

2 Outline: Introduction to recognition and retrieval of handwritten documents; Classification Algorithm: AdaBoost and Decision Trees; Classification Experiments; Language Models for Retrieval; Conclusions.

3 Introduction. Recognition and retrieval of off-line handwritten documents based upon word classification. Decision trees with normalized pixels as features form the basis for AdaBoost. The main difficulty is the skewed distribution of class frequencies. Experiments are done on the GW20 and GW100 corpora, and retrieval is done using a language model over the recognized words.

4 Introduction. The main goal is to offer access to the world's historical handwritten documents. Handwriting recognition often works on limited vocabularies (e.g. postal addresses); historical documents add complexity due to ink bleeding or dirt on the paper. The approach uses pixels of the normalized word image at multiple scales (image pyramids) as features, and proposes an innovative procedure to create additional training data.

5 The Boosting Approach. Boosting is a classification technique that determines its prediction via the weighted vote of a diverse set of base classifiers, each of which has been trained on a different weighting of the training data. AdaBoost trains successive versions of its base classifier, focusing on hard-to-classify examples. It can use a simple base classifier, but stronger classifiers get better results.

6 AdaBoost in brief. Introduced in 1995 by Freund and Schapire in "A decision-theoretic generalization of on-line learning and an application to boosting", Journal of Computer and System Sciences, 55(1):119-139, 1997.

7 Reference: Freund, Y. and Schapire, R. E. "A Short Introduction to Boosting", Journal of Japanese Society for Artificial Intelligence, 14(5):771-780, 1999.

8 The algorithm is presented for the binary case.

9 Initially, all example weights are set equally.

10 At each round t, find a weak hypothesis ht appropriate for the distribution Dt.

11 The error εt measures the goodness of the hypothesis.

12 AdaBoost chooses the parameter αt, which measures the importance assigned to ht; αt ≥ 0 if εt ≤ 1/2.

13 Dt is updated by increasing the weight of misclassified examples, so that the algorithm concentrates on hard examples.

14 The final hypothesis H is a weighted majority vote of the T weak hypotheses, where αt is the weight assigned to ht.
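
A minimal sketch of this binary AdaBoost loop in Python (assuming NumPy and scikit-learn, labels in {-1, +1}, and decision stumps standing in for the weak learner; the paper boosts full pixel-grid decision trees instead):

```python
# Minimal binary AdaBoost sketch following the steps annotated above;
# X is an (n_samples, n_features) array, y holds labels in {-1, +1}.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, T=200):
    n = len(y)
    D = np.full(n, 1.0 / n)                  # all weights set equally
    hypotheses, alphas = [], []
    for t in range(T):
        # weak hypothesis h_t trained on the current distribution D_t
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=D)
        pred = h.predict(X)
        eps = D[pred != y].sum()             # weighted error of h_t
        if eps >= 0.5:                       # weak learner must beat chance
            break
        alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-12))
        D *= np.exp(-alpha * y * pred)       # raise weight of misclassified examples
        D /= D.sum()
        hypotheses.append(h)
        alphas.append(alpha)
    return hypotheses, alphas

def adaboost_predict(X, hypotheses, alphas):
    # final hypothesis H: weighted majority vote of the T weak hypotheses
    votes = sum(a * h.predict(X) for h, a in zip(hypotheses, alphas))
    return np.sign(votes)
```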

15 AdaBoost in brief. Schapire, R. E. and Singer, Y., "Improved boosting algorithms using confidence-rated predictions", Machine Learning, 37(3), 1999, show how AdaBoost can handle weak hypotheses that output real values: for an example x, ht outputs ht(x) ∈ R, whose sign is the predicted label (-1 or +1) and whose magnitude |ht(x)| gives the measure of confidence in the prediction. AdaBoost.M1 is the extension to the multi-class case; it is adequate when the weak learner is strong enough to achieve an accuracy of at least 50%. The extensions AdaBoost.MH and AdaBoost.MR reduce the multi-class problem to a larger binary problem.

16 Choices and Problems. The recognition process uses values sampled directly from the word image at varying resolutions. The choice is to segment word images rather than individual letters: recognizing letters becomes a limiting step, while segmentation of individual word images is easier (it becomes an image classification problem). The remaining problems are the skewed distribution of class frequencies (a Zipfian distribution) and the paucity of training data for most word classes.

17 Classification Algorithm. Handwritten words belonging to a single class have similar (but not identical) ink distributions; the position of individual features within the word shifts from example to example. The pixel representation contains information about word identity that can be amplified by boosting: clearer areas will contain more reliable features, while blurring indicates areas of inconsistency.

18 Figure: composite image of 21 examples of the word "Instructions"; straightforward use of the raw pixels is ineffective.

19 Common framework. Pixels are used as features for word image classification, with each word image mapped into a common pixel grid. Images are scaled and translated so that a horizontal reference line through the word spans from (0,0) to (1,0); resampling each image onto a common grid then produces a common pixel representation. Words vary in length (horizontal and vertical dimensions), and full-resolution grids would lead to astronomic data sizes, hence the pyramid approach.

20 Pyramid Approach. Define a family of standard grids: the base grid Φ0 covers ([0,1], [-0.5, 0.5]) and is broken into 32x32 pixels; refined grids cover the same square region at double resolution (64x64 pixels, and so on). This is like a tree in which each pixel of Φk has 4 children in Φk+1. The standard image usually doesn't cover the full vertical extent of the grid, so the portions above and below the edges of the standard image may be represented using a single default value. Data need only be stored for Φk with resolution up to that of the reference image.

21 Note: this square grid area captures all the detail of interest for most words.
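
A rough sketch of how such a grid pyramid might be built (an assumption about the data layout, not the authors' code): the word image is assumed already scaled so its width fills the grid, skimage handles the resampling, levels above the image's native resolution are not stored, and rows outside the image's vertical extent keep a single default value.

```python
# Sketch: family of standard grids Phi_0 (32x32), Phi_1 (64x64), ...
# `word` is a 2-D grayscale array (floats in [0, 1] assumed).
import numpy as np
from skimage.transform import resize

def grid_pyramid(word, base=32, default=1.0):
    levels, size = [], base
    while not levels or size <= max(word.shape):  # always build Phi_0, then finer levels
        grid = np.full((size, size), default)     # default value outside the image
        # resample to `size` columns, keeping the word's aspect ratio
        rows = min(size, max(1, round(word.shape[0] * size / word.shape[1])))
        img = resize(word, (rows, size), anti_aliasing=True)
        top = (size - rows) // 2                  # place the word inside the square
        grid[top:top + rows, :] = img
        levels.append(grid)
        size *= 2
    return levels
```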

23 Boosting and Decision Trees. Word image recognition has many potential classes; to use AdaBoost, a base classifier with at least 50% accuracy is needed, and decision trees are the foremost option: they are well understood and in practice can achieve arbitrary accuracy on the training data. At each node the training examples are split into 2 sub-groups by comparing the value of a chosen pixel to a chosen threshold. Growth of a tree branch is stopped when the contained subset is dominated by a majority class. (Note: if growth continues until there is a single training example per leaf, 100% training accuracy is reached; such a tree overfits and must be pruned by removing statistically weak branches.)

24 C4.5. C4.5 provides the algorithm for building the decision tree, with some modifications designed to support the grid pyramid data structure. C4.5 builds decision trees from a set of training data using the concept of information entropy. The training data is a set S = {s1, s2, ..., sn} of already classified samples, where each si = (x1, x2, ..., xm) is a vector of feature values xj. The training data is augmented with a vector C = (c1, c2, ..., cn), where ci is the class each sample belongs to. C4.5 uses the fact that each attribute of the data can be used to make a decision that splits the data into smaller subsets.

25 Reference: Quinlan, J. R. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
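
The entropy-based split criterion described above can be written down in a few lines; a sketch (not taken from the paper) of entropy and information gain for a single pixel/threshold split, assuming integer class labels in a NumPy array:

```python
import numpy as np

def entropy(labels):
    # Shannon entropy (in bits) of the class distribution of `labels`
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(feature_values, labels, threshold):
    # entropy reduction from splitting on feature <= threshold vs > threshold
    left = labels[feature_values <= threshold]
    right = labels[feature_values > threshold]
    n = len(labels)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - children
```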

26 C4.5 for the image pyramid. At each node a feature (i.e. a pixel location) and a threshold value must be chosen as the split; an exhaustive search is not possible. Only Φ0 is exhaustively examined, and the location and threshold offering the greatest information gain is retained. The search then proceeds selectively to its children in Φ1, from there to the children of the best of those locations, and so on until the maximum resolution available is reached. The grid level, location and threshold with the highest information gain becomes the decision criterion for the node.
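
How that coarse-to-fine search might look in code; a speculative sketch reusing information_gain() from the previous block, assuming `pyramids` is a list with one grid_pyramid() output per training example (all with the same number of levels) and 8-bit pixel values, hence the example thresholds:

```python
import numpy as np

def best_at(values, labels, thresholds=(64, 128, 192)):
    # best (gain, threshold) pair for one pixel location over all examples
    return max((information_gain(values, labels, t), t) for t in thresholds)

def selective_split_search(pyramids, labels):
    n_levels = len(pyramids[0])
    level0 = np.stack([p[0] for p in pyramids])      # shape (n_examples, 32, 32)
    # level 0 is examined exhaustively
    gains = {(0, r, c): best_at(level0[:, r, c], labels)
             for r in range(level0.shape[1]) for c in range(level0.shape[2])}
    _, r, c = max(gains, key=lambda key: gains[key])
    # descend only into the 4 children of the best location at each finer level
    for k in range(1, n_levels):
        levelk = np.stack([p[k] for p in pyramids])
        for rr in (2 * r, 2 * r + 1):
            for cc in (2 * c, 2 * c + 1):
                gains[(k, rr, cc)] = best_at(levelk[:, rr, cc], labels)
        _, r, c = max((key for key in gains if key[0] == k),
                      key=lambda key: gains[key])
    # the (level, row, col) and threshold with the highest gain become the split
    split = max(gains, key=lambda key: gains[key])
    return split, gains[split][1], gains[split][0]
```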

27 Boosting. Single trees do not generalize well for handwritten word images. (1) The base classifier is generated from the training data in the normal way. (2) AdaBoost raises the weights of misclassified examples, forcing the base classifier to work harder on them. (3) After many rounds of boosting, a weighted vote classifies the training set perfectly and shows good generalization to unseen examples. (4) In practice, after a certain number of rounds (here: 200) the results don't improve significantly.
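
For the boosting loop itself, an off-the-shelf equivalent (a generic multi-class stand-in, not the authors' C4.5/pyramid implementation) would look roughly like this, assuming X_train/y_train hold flattened pixel-grid features and word-class labels:

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# `estimator=` is called `base_estimator=` in scikit-learn versions before 1.2
clf = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=8),  # decision tree base learner
    n_estimators=200,                               # rounds of boosting
    algorithm="SAMME",                              # multi-class AdaBoost variant
)
clf.fit(X_train, y_train)              # X_train, y_train assumed to exist
predicted_labels = clf.predict(X_test) # most likely word class per image
```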

28 Supplementary Training Examples. Problem: the paucity of training examples for many classes makes generalization difficult; by Zipf's law there are few examples for most words, and 57% of the words appear only once in the test collection. Solution: generate new training examples for low-frequency classes via stochastic distortion of the available examples, improving overall word classification accuracy.

29 Supplementary Training Examples. Sample from the original image using a grid of points whose positions have been perturbed from a uniform lattice; nearby points should be perturbed by similar amounts. The new image is a distortion of the old one.
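
One plausible way to implement this perturbed-grid sampling (a sketch under the stated assumptions, not the authors' exact procedure): draw a random displacement field, smooth it so nearby points move by similar amounts, and resample the original image along the displaced grid with SciPy.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def distort(image, max_shift=2.0, smoothness=8.0, rng=None):
    # Resample `image` on a grid whose points are smoothly perturbed
    # from a uniform lattice, yielding a synthetic training example.
    rng = np.random.default_rng(rng)
    rows, cols = np.mgrid[0:image.shape[0], 0:image.shape[1]].astype(float)
    # random displacements, smoothed so nearby points shift by similar amounts
    dr = gaussian_filter(rng.uniform(-1, 1, image.shape), smoothness)
    dc = gaussian_filter(rng.uniform(-1, 1, image.shape), smoothness)
    dr *= max_shift / (np.abs(dr).max() + 1e-12)
    dc *= max_shift / (np.abs(dc).max() + 1e-12)
    return map_coordinates(image, [rows + dr, cols + dc], order=1, mode="nearest")
```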

30 Classification Experiments. Test collections: GW20 (previously used) and GW100 (non-overlapping with GW20, written by multiple hands). Pages are manually segmented to extract images of individual words (4856 in GW20 and in GW100), and all images are labeled with their ASCII equivalent. GW20 experiments: 19 pages for training and 1 for testing.

31 Single decision tree: standard C4.5 grown to completion, then pruned.

32 AdaBoost + decision tree as base learner.

33 AdaBoost + decision tree + synthetic data. No experiments were run with AdaBoost and a simple base classifier, because 50% accuracy cannot be achieved.

34 GW100: performance is lower, due to out-of-vocabulary (OOV) words and image quality.

35 Retrieval. Language modeling approach to retrieval (Ref: Ponte, J. and Croft, W.B., A language modeling approach to Information Retrieval, SIGIR 1998). The query likelihood formulation is used, where documents are ranked according to P(Q|D). AdaBoost provides classifications rather than probabilities: only the most likely label for each word image is preserved. One approach is to set word probabilities equal to their frequencies in each recognized document, but many words can be misclassified.
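
Query likelihood ranks documents by P(Q|D) = ∏ P(q|D) over the query terms q; a minimal sketch with Jelinek-Mercer smoothing against a collection model, assuming each recognized document is just a list of word labels (Lemur's exact smoothing may differ):

```python
import math
from collections import Counter

def query_likelihood(query_terms, doc_terms, collection_terms, lam=0.5):
    # log P(Q|D), smoothing the document model with the collection model
    doc, coll = Counter(doc_terms), Counter(collection_terms)
    n_doc, n_coll = max(len(doc_terms), 1), max(len(collection_terms), 1)
    score = 0.0
    for q in query_terms:
        p = lam * doc[q] / n_doc + (1 - lam) * coll[q] / n_coll
        score += math.log(p) if p > 0 else float("-inf")
    return score
```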

36 Retrieval: Regularization Scheme. A regularization scheme based upon classification rank information. Hypothesis: rank information may be more important than the actual probabilities (the top terms are very important, some are moderately important, and so on). Probabilities are inferred from the rank-ordered output of the AdaBoost classification algorithm: rank the top n classes according to their scores, then associate a probability with each class by fitting a Zipfian distribution to the ranked classes.
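
A sketch of that rank-based regularization (the exact Zipfian fit used by the authors may differ): keep the top-n classes by boosted score and assign each rank a probability proportional to 1/rank.

```python
import numpy as np

def rank_to_probabilities(class_scores, top_n=20, s=1.0):
    # class_scores: 1-D array of AdaBoost scores for one word-image position.
    # Keep the top_n classes by score, give rank r the weight 1 / r**s,
    # and normalize to a probability distribution over those classes.
    order = np.argsort(class_scores)[::-1][:top_n]
    weights = 1.0 / np.arange(1, len(order) + 1) ** s
    weights /= weights.sum()
    return dict(zip(order.tolist(), weights.tolist()))
```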

37 Retrieval: Regularization Scheme. Instead of having one possible word at each position, a document now contains a probability distribution at each position. Tests use Lemur with the query-likelihood ranking method. Because of the limited size of GW20, line retrieval is performed (a relevant line is one containing all query terms; stop-words are removed). GW100 allows for full-page retrieval, with GW20 used as training examples.

39 Conclusions. Learning algorithms are typically not designed to deal with training data that exhibits a highly skewed distribution of class frequencies. The methodology described does not always work well, because the synthetic training data are not truly independent of the originals. Performance is good on GW20; the problem remains challenging on GW100 (larger dataset, more noise). Using soft classification decisions can improve the results for shorter queries.
