Active Learning for Multi-Class Logistic Regression




Active Learning for Multi-Class Logistic Regression

Andrew I. Schein and Lyle H. Ungar
The University of Pennsylvania, Department of Computer and Information Science
3330 Walnut Street, Philadelphia, PA 19104-6389 USA
{ais, ungar}@cis.upenn.edu

Goals

- Discover best practices for active learning with logistic regression.
- Develop loss function methods for logistic regression:
  - squared loss
  - log loss
- Evaluate heuristic methods:
  - uncertainty sampling
  - query by bagging
  - classifier certainty
- Explain method underperformance, when it occurs.

Logistic Regression: A Method for Class Probability Estimation

Binary:

    p(y = 1 | x) = exp(x . w) / (1 + exp(x . w))                 (1)

Multiple classes:

    p(y = c | x) = exp(x . w_c) / sum_{c'} exp(x . w_{c'})       (2)

- p(y | x) is an inexact model for the true class distribution.
- x is a vector of predictors for an observation.
- w_c is a vector of weights indexed by class c.
- A constant predictor appended to x introduces a bias (intercept) term.
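A minimal NumPy sketch of equation (2), assuming a design matrix X of predictors and a weight matrix W with one column per class (the function name and shapes are our illustration, not from the talk):

```python
import numpy as np

def softmax_probs(X, W):
    """p(y = c | x) = exp(x . w_c) / sum_c' exp(x . w_c') for each row x of X.
    X: (n_obs, n_pred) predictors; W: (n_pred, n_classes) weights."""
    scores = X @ W                                   # x . w_c for each class
    scores -= scores.max(axis=1, keepdims=True)      # guard against overflow
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)
```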

Pool-based Active Learning for Classification

We focus on pool-based settings: observations x are given without their corresponding class labels.

Our goal: sequentially pick and label examples to best improve the classifier.

Example: we have a pool of 2000 documents with no topic label. Which documents do we label to build the best topic classifier?
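A hypothetical skeleton of this loop in Python (scikit-learn's LogisticRegression stands in for the model; oracle_label, score_candidates, and the other names are our own illustration, not the authors' code):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def pool_based_active_learning(X_pool, oracle_label, score_candidates,
                               seed_idx, budget):
    """Fit, score the unlabeled pool, query the top candidate, repeat."""
    y = {i: oracle_label(i) for i in seed_idx}        # initial labeled seed
    labeled = list(seed_idx)
    unlabeled = [i for i in range(len(X_pool)) if i not in y]
    clf = LogisticRegression(max_iter=1000)
    for _ in range(budget):
        clf.fit(X_pool[labeled], [y[i] for i in labeled])
        scores = score_candidates(clf, X_pool[unlabeled])
        pick = unlabeled.pop(int(np.argmax(scores)))  # most informative point
        y[pick] = oracle_label(pick)                  # pay one labeling cost
        labeled.append(pick)
    return clf.fit(X_pool[labeled], [y[i] for i in labeled])
```

Each selection strategy below differs only in the score_candidates criterion plugged into this loop.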

A Variance Reduction Approach Based on A-Optimality

Using a Taylor expansion of the squared prediction error over the pool:

    sum_{x in Pool} E[ (p_hat(x) - p(x))^2 ]  ~  tr(A F^{-1})

- A encodes the distribution of predictors in the pool.
- F^{-1} encodes the variance of the model parameter estimates.
- The expectation is over training sets of fixed size.
- The objective is to make the estimated probabilities p_hat close to the true probabilities p in squared loss. We also explore a variant for log loss.
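To make the criterion concrete, here is a rough binary-case sketch (our simplification under a standard logistic model, not the authors' multi-class derivation): F is the Fisher information of the training set augmented with the candidate, A sums outer products of the prediction gradients g(x) = p(x)(1 - p(x)) x over the pool, and a candidate is better when tr(A F^{-1}) is smaller.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def a_optimality_score(X_train, X_pool, w, candidate, ridge=1e-6):
    """tr(A F^{-1}) after hypothetically adding one pool candidate to the
    training set (binary case). Lower = lower expected prediction variance."""
    X_aug = np.vstack([X_train, X_pool[candidate:candidate + 1]])
    p = sigmoid(X_aug @ w)
    # Fisher information of the augmented design: sum_i p_i (1 - p_i) x_i x_i^T
    F = (X_aug * (p * (1 - p))[:, None]).T @ X_aug
    F += ridge * np.eye(F.shape[0])                   # keep F invertible
    # A encodes the pool: sum_x g(x) g(x)^T with g(x) = p(x) (1 - p(x)) x
    q = sigmoid(X_pool @ w)
    G = X_pool * (q * (1 - q))[:, None]
    return float(np.trace(np.linalg.solve(F, G.T @ G)))  # tr(F^{-1} A)
```

Since the pool loop above queries the argmax of its scores, this value would be negated before use.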

Methods Evaluated

Baselines:
- random instance selection
- bagging (interesting since it is used in QBB methods)

Loss function approaches:
- variance reduction (A-optimality)
- log loss reduction

Heuristics:
- CC: Classifier Certainty
- QBB-MN: Query by Bagging, KL-divergence measure
- QBB-AM: Query by Bagging, ensemble margin
- entropy-based uncertainty sampling
- margin-based uncertainty sampling
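For the two uncertainty-sampling heuristics, minimal scoring functions (names are ours) might look like:

```python
import numpy as np

def entropy_scores(probs):
    """Entropy of each predicted class distribution; higher = more uncertain."""
    return -np.sum(probs * np.log(probs + 1e-12), axis=1)

def margin_scores(probs):
    """Negated gap between the two most probable classes, so the most
    ambiguous examples (smallest margin) score highest."""
    top2 = np.sort(probs, axis=1)[:, -2:]
    return -(top2[:, 1] - top2[:, 0])
```

Either plugs into the earlier loop, e.g. score_candidates = lambda clf, X: margin_scores(clf.predict_proba(X)).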

Data Sets and Evaluation

Data Set     Classes   Obs      Pred     Maj    Data Type
Art            20      20,000       5    3635   artificial
ArtNoisy       20      20,000       5    3047   artificial
ArtConf        20      20,000       5    3161   artificial
Comp2a          2       1989     6191     997   document
Comp2b          2       2000     8617    1000   document
LetterDB       26      20,000      16     813   char. image
NewsGroups     20      18,808  16,400     997   document
OptDigits      10       5620       64    1611   char. image
TIMIT          20      10,080      12    1239   voice
WebKB           4       4199     7543    1641   document

(Classes = number of classes; Obs = observations; Pred = predictors; Maj = size of the majority class.)

Split the data into an equal-size pool and test set for 10-fold cross-validation. Start training from a small labeled seed, stop at a fixed labeling budget, and run significance tests.
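One way to realize this protocol (the authors' exact fold construction may differ; run_learner is a placeholder wrapping an active learning run such as the loop sketched earlier):

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit

def evaluate_active_learner(X, y, run_learner, n_runs=10, seed=0):
    """Repeatedly split the data into an equal-size pool and test set,
    run the active learner on the pool, and score it on the test set."""
    splitter = ShuffleSplit(n_splits=n_runs, test_size=0.5, random_state=seed)
    accs = []
    for pool_idx, test_idx in splitter.split(X):
        clf = run_learner(X[pool_idx], y[pool_idx])   # queries labels itself
        accs.append(np.mean(clf.predict(X[test_idx]) == y[test_idx]))
    return float(np.mean(accs)), float(np.std(accs))
```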

Results - Accuracies

Data Set     random   bagging  variance  log loss
Art          0.809    0.792    0.862     0.867
ArtNoisy     0.565    0.557    0.579     0.579
ArtConf      0.837    0.830    0.842     0.840
Comp2a       0.821    0.794    0.805     0.821
Comp2b       0.799    0.793    0.807     0.796
LetterDB     0.609    0.593    0.644     0.646
NewsGroups   0.483    0.422    --        --
OptDigits    0.927    0.931    0.937     0.944
TIMIT        0.413    0.397    0.405     0.423
WebKB        0.830    0.803    --        --

Data Set     CC       QBB-MN   QBB-AM   entropy   margin
Art          0.821    0.848    0.861    0.832     0.867
ArtNoisy     0.567    0.577    0.571    0.536     0.572
ArtConf      0.845    0.843    0.835    0.723     0.749
Comp2a       0.788    0.814    0.818    0.826     0.818
Comp2b       0.796    0.804    0.808    0.805     0.800
LetterDB     0.625    0.599    0.637    0.548     0.633
NewsGroups   --       0.464    0.444    0.356     0.438
OptDigits    0.942    0.941    0.949    0.951     0.952
TIMIT        0.395    0.408    0.438    0.327     0.440
WebKB        --       0.844    0.860    0.855     0.860

(-- indicates no result for that method/data set pair.)

Results - equivalent random samples required

Data Set     random   bagging  variance  log loss
Art          300      220      600       600
ArtNoisy     300      240      450       450
ArtConf      120      100      130       120
Comp2a       150      110      130       210
Comp2b       150      130      170       140
LetterDB     300      250      380       380
NewsGroups   300      230      --        --
OptDigits    300      310      350       430
TIMIT        300      240      290       310
WebKB        300      220      --        --

Data Set     CC       QBB-MN   QBB-AM   entropy   margin
Art          330      480      600      370       600
ArtNoisy     310      420      350      160       350
ArtConf      140      140      110      50        50
Comp2a       90       150      150      190       150
Comp2b       140      160      170      160       150
LetterDB     340      250      360      180       360
NewsGroups   --       290      280      170       260
OptDigits    400      400      600      600       600
TIMIT        230      290      420      90        380
WebKB        --       360      570      460       530

(-- indicates no result for that method/data set pair.)

Analysis of Heuristic Method Performance

- Entropy sampling fails on the noisier data sets.
  - Noise is defined as training-set-independent squared error, estimated using very large training sets.
- Margin sampling fails on data with hierarchical category structure.
  - The NewsGroups data set contains 5 computer and 3 politics topics.
  - ArtConf is an artificial data set constructed with this property.
- The failures of QBB-MN and Classifier Certainty are difficult to explain.

NewsGroups Hierarchy of Topics

- comp: comp.graphics, comp.os.ms-windows.misc, comp.sys.ibm.pc.hardware, comp.sys.mac.hardware, comp.windows.x
- religion: talk.religion.misc, alt.atheism, soc.religion.christian
- sci: sci.crypt, sci.electronics, sci.med, sci.space
- rec: rec.autos, rec.motorcycles, rec.sport.baseball, rec.sport.hockey
- misc: misc.forsale
- politics: talk.politics.misc, talk.politics.guns, talk.politics.mideast

Evaluation Conclusions

- Loss function approaches are robust but relatively slow.
  - Variance and log loss perform comparably; log loss possibly better.
- Heuristic methods often fail to beat random selection.
  - Noise level and type predicts some of the failures.
- Best heuristic: margin-based uncertainty sampling.
  - Very fast.
  - Failures are predictable: an opportunity to improve the method.
- Effect of bagging: mostly negative.