Multi-label Classification

Similar documents
Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval

Chapter 6. The stacking ensemble approach

Machine Learning Final Project Spam Filtering

Multi-Class and Structured Classification

Statistical Machine Learning

Data Mining Practical Machine Learning Tools and Techniques

Making Sense of the Mayhem: Machine Learning and March Madness

Lecture 2: The SVM classifier

Data Mining - Evaluation of Classifiers

LCs for Binary Classification

Bayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University

Classification Problems

Supervised Learning (Big Data Analytics)

Feature Subset Selection in Spam Detection

Anti-Spam Filter Based on Naïve Bayes, SVM, and KNN model

Social Media Mining. Data Mining Essentials

IDENTIFICATION OF SOFTWARE EROSION USING LOGISTIC REGRESSION

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

MACHINE LEARNING IN HIGH ENERGY PHYSICS

Machine Learning in Spam Filtering

Big Data Analytics CSCI 4030

A Content based Spam Filtering Using Optical Back Propagation Technique

Content-Based Recommendation

Project group. Categorization of text documents via classification

Challenges of Cloud Scale Natural Language Processing

Predict Influencers in the Social Network

E-commerce Transaction Anomaly Classification

Microsoft Azure Machine learning Algorithms

T : Classification as Spam or Ham using Naive Bayes Classifier. Santosh Tirunagari :

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

Example: Credit card default; we may be more interested in predicting the probability of a default than classifying individuals as default or not.

Data Mining. Nonlinear Classification

Chapter ML:XI (continued)

Machine learning for algo trading

Data Mining Techniques for Prognosis in Pancreatic Cancer

Predicting Flight Delays

CSE 473: Artificial Intelligence Autumn 2010

Support Vector Machines with Clustering for Training with Very Large Datasets

L25: Ensemble learning

Data Mining as a tool to Predict the Churn Behaviour among Indian bank customers

Consistent Binary Classification with Generalized Performance Metrics

Reference Books. Data Mining. Supervised vs. Unsupervised Learning. Classification: Definition. Classification k-nearest neighbors

Mining a Corpus of Job Ads

Class #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris

A new Approach for Intrusion Detection in Computer Networks Using Data Mining Technique

Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing Classifier

Machine Learning CS Lecture 01. Razvan C. Bunescu School of Electrical Engineering and Computer Science

Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification

SVM Ensemble Model for Investment Prediction

Artificial Neural Networks and Support Vector Machines. CS 486/686: Introduction to Artificial Intelligence

MAXIMIZING RETURN ON DIRECT MARKETING CAMPAIGNS

Comparison of machine learning methods for intelligent tutoring systems

Survey on Multiclass Classification Methods

An Introduction to Data Mining

A fast multi-class SVM learning method for huge databases

BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics

Football Match Winner Prediction

Machine Learning. CUNY Graduate Center, Spring Professor Liang Huang.

Introduction to Data Mining

An Approach to Detect Spam E-mails by Using Majority Voting

Linear smoother. ŷ = Sy, where s_ij = s_ij(x), e.g. s_ij = diag(l_i(x)). To go the other way, you need to diagonalize S

A Two-Pass Statistical Approach for Automatic Personalized Spam Filtering

TRTML - A Tripleset Recommendation Tool based on Supervised Learning Algorithms

Programming Exercise 3: Multi-class Classification and Neural Networks

MS1b Statistical Data Mining

Statistical Data Mining. Practical Assignment 3 Discriminant Analysis and Decision Trees

Machine Learning using MapReduce

Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) ( ) Roman Kern. KTI, TU Graz

Classification algorithm in Data mining: An Overview

Classification using Logistic Regression

Classifying Large Data Sets Using SVMs with Hierarchical Clusters. Presented by :Limou Wang

BIDM Project. Predicting the contract type for IT/ITES outsourcing contracts

ACQUA : Application-level Quality of Experience at Internet Access

1. Classification problems

Machine Learning. Mausam (based on slides by Tom Mitchell, Oren Etzioni and Pedro Domingos)

Azure Machine Learning, SQL Data Mining and R

Neural Networks for Sentiment Detection in Financial Text

Linear Threshold Units

Cross-Validation. Synonyms: Rotation estimation

Towards better accuracy for Spam predictions

Analysis Tools and Libraries for BigData

Question 2 Naïve Bayes (16 points)

The Impact of Big Data on Classic Machine Learning Algorithms. Thomas Jensen, Senior Business Expedia

Evaluation and Comparison of Data Mining Techniques Over Bank Direct Marketing

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data

Introduction to Machine Learning Using Python. Vikram Kamath

DATA MINING TECHNIQUES AND APPLICATIONS

Data Mining Part 5. Prediction

CSCI567 Machine Learning (Fall 2014)

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus

Data Mining in CRM & Direct Marketing. Jun Du The University of Western Ontario jdu43@uwo.ca

ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION

COPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments

Is a Data Scientist the New Quant? Stuart Kozola MathWorks

Supervised Feature Selection & Unsupervised Dimensionality Reduction

Support Vector Machine (SVM)

How To Cluster

Predicting outcome of soccer matches using machine learning

CSC 411: Lecture 07: Multiclass Classification

Supervised Learning Evaluation (via Sentiment Analysis)!

Transcription:

Multi-label Classification. Prof. Dr. Dr. Lars Schmidt-Thieme. Tutor: Leandro Marinho. Information Systems and Machine Learning Lab (ISMLL)

Definitions
In single-label classification, a set of examples is associated with a single label l from a set of disjoint labels L, $|L| > 1$. If $|L| = 2$ it is called a binary classification problem, while if $|L| > 2$ it is called a multi-class classification problem.
In multi-label classification, the examples are associated with a set of labels $Y \subseteq L$. For an input x the corresponding output is a vector $Y = [y_1, \ldots, y_{|L|}]^T$.
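To make the distinction concrete, a minimal sketch in Python (the label names and examples are hypothetical, chosen to match the toy data set used in the slides below):

```python
# Binary classification: |L| = 2, one label per example.
y_binary = ["spam", "ham", "spam"]

# Multi-class classification: |L| > 2, still one label per example.
y_multiclass = ["sports", "religion", "science"]

# Multi-label classification: each example gets a subset Y of L.
L = ["sports", "religion", "science", "politics"]
y_multilabel = [{"sports", "politics"},
                {"science", "politics"},
                {"religion", "science"}]

def to_indicator(labels, label_set=L):
    """Encode a label set as the 0/1 vector [y_1, ..., y_|L|]^T."""
    return [1 if l in labels else 0 for l in label_set]

print([to_indicator(y) for y in y_multilabel])
# [[1, 0, 0, 1], [0, 0, 1, 1], [0, 1, 1, 0]]
```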

Multi-label Classification
Multi-class classification, direct approaches:
- Nearest Neighbor
- Generative approach & Naïve Bayes
- Linear classification: geometry
- Perceptron
- K-class (polychotomous) logistic regression
- K-class SVM
Multi-class classification through binary classification:
- One-vs-All
- All-vs-All
Others:
- Calibration

Related Tasks
- Ranking: order the most related labels with respect to the new instance.
- Hierarchical classification: the labels in a data set belong to a hierarchical structure.
- Hierarchical multi-label classification: each example is labelled with more than one node of the hierarchical structure.
- Multiple-label problems: only one class is the true class.

Multi-Label Classification Methods
- Problem transformation methods: methods that transform the multi-label classification problem into one or more single-label classification or regression problems.
- Algorithm adaptation methods: methods that extend specific learning algorithms in order to handle multi-label data directly.

Problem Transformation Methods
Example: Original Data

Ex. | Sports | Religion | Science | Politics
 1  |   X    |          |         |    X
 2  |        |          |    X    |    X
 4  |        |    X     |    X    |

Problem Transformation Methods
Transformed data set using PT1 (one possible outcome of the random selection):

Ex. | Sports | Religion | Science | Politics
 1  |   X    |          |         |
 2  |        |          |    X    |
 4  |        |    X     |         |

PT1 randomly selects one of the multiple labels of each multi-label instance and discards the rest.
Disadvantage: loss of information.
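A minimal sketch of PT1, using the toy data set above (label sets hardcoded for illustration):

```python
import random

# Toy multi-label data set: example id -> set of labels.
data = {
    1: {"Sports", "Politics"},
    2: {"Science", "Politics"},
    4: {"Religion", "Science"},
}

def pt1(data):
    """PT1: keep one randomly chosen label per instance, discard the rest."""
    return {ex: random.choice(sorted(labels)) for ex, labels in data.items()}

print(pt1(data))  # e.g. {1: 'Sports', 2: 'Science', 4: 'Religion'}
```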

Problem Transformation Methods
Transformed data set using PT2 (empty here, since every instance in the example data set is multi-label):

Ex. | Sports | Religion | Science | Politics

PT2 discards every multi-label instance from the multi-label data set.
Disadvantage: even more loss of information.

Transformed data set using PT3:

Ex. | Sports | (Sports ∧ Politics) | (Science ∧ Politics) | (Science ∧ Religion)
 1  |        |          X          |                      |
 2  |        |                     |          X           |
 4  |        |                     |                      |          X

PT3 considers each different set of labels that exists in the multi-label data set as a single label. It thus learns one single-label classifier $H: X \to P(L)$, where $P(L)$ is the power set of L.
Disadvantage: can lead to a large number of classes, each with a small number of examples.
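Minimal sketches of PT2 and PT3 on the same toy data set:

```python
# Toy multi-label data set: example id -> set of labels.
data = {
    1: {"Sports", "Politics"},
    2: {"Science", "Politics"},
    4: {"Religion", "Science"},
}

def pt2(data):
    """PT2: discard every multi-label instance."""
    return {ex: labels for ex, labels in data.items() if len(labels) == 1}

def pt3(data):
    """PT3 (label powerset): treat each distinct label set as one class."""
    return {ex: " ^ ".join(sorted(labels)) for ex, labels in data.items()}

print(pt2(data))  # {} -- every instance here is multi-label
print(pt3(data))  # {1: 'Politics ^ Sports', 2: 'Politics ^ Science',
                  #  4: 'Religion ^ Science'}
```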

Problem Transformation Methods
Transformed data sets using PT4, one binary data set per label:

(a) Ex. | Sports | ¬Sports
     1  |   X    |
     2  |        |   X
     4  |        |   X

(b) Ex. | Religion | ¬Religion
     1  |          |    X
     2  |          |    X
     4  |    X     |

(c) Ex. | Politics | ¬Politics
     1  |    X     |
     2  |    X     |
     4  |          |    X

(d) Ex. | Science | ¬Science
     1  |         |    X
     2  |    X    |
     4  |    X    |

PT4 learns |L| binary classifiers $H_l: X \to \{l, \neg l\}$, one for each different label l in L. For the classification of a new instance x, this method outputs as a set of labels the union of the labels that are output by the |L| classifiers:

$H_{PT4}(x) = \bigcup_{l \in L} \{\, l : H_l(x) = l \,\}$
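A minimal sketch of PT4 on the toy data set (the binary classifiers themselves are left abstract; any binary learner could be plugged in):

```python
# Toy multi-label data set: example id -> set of labels.
data = {
    1: {"Sports", "Politics"},
    2: {"Science", "Politics"},
    4: {"Religion", "Science"},
}
L = ["Sports", "Religion", "Science", "Politics"]

def pt4_targets(data, L):
    """Build |L| binary target mappings: label -> {example: True/False}."""
    return {l: {ex: (l in labels) for ex, labels in data.items()} for l in L}

def pt4_predict(classifiers, x, L):
    """Union of the labels output by the |L| binary classifiers H_l."""
    return {l for l in L if classifiers[l](x)}

print(pt4_targets(data, L)["Sports"])  # {1: True, 2: False, 4: False}
```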

How much multi-label is a data set?
Not all data sets are equally multi-label. In some applications the number of labels of each example is small compared to |L|, while in others it is large. This can be a parameter that influences the performance of the different multi-label methods.
Let D be a multi-label evaluation data set, consisting of |D| multi-label examples $(x_i, Y_i)$, $i = 1..|D|$, $Y_i \subseteq L$.
Label cardinality of D is the average number of labels of the examples in D:
$LC(D) = \frac{1}{|D|} \sum_{i=1}^{|D|} |Y_i|$
Label density of D is the average number of labels of the examples in D divided by |L|:
$LD(D) = \frac{1}{|D|} \sum_{i=1}^{|D|} \frac{|Y_i|}{|L|}$
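A minimal sketch computing both quantities for the toy data set (|L| = 4):

```python
data = {
    1: {"Sports", "Politics"},
    2: {"Science", "Politics"},
    4: {"Religion", "Science"},
}
num_labels = 4  # |L|

label_sets = list(data.values())
lc = sum(len(y) for y in label_sets) / len(label_sets)  # label cardinality
ld = lc / num_labels                                    # label density

print(lc, ld)  # 2.0 0.5
```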

Evaluation Metrics
Let D be a multi-label evaluation data set, consisting of |D| multi-label examples $(x_i, Y_i)$, $i = 1..|D|$, $Y_i \subseteq L$. Let H be a multi-label classifier and $Z_i = H(x_i)$ be the set of labels predicted by H for example $x_i$.

$HammingLoss(H, D) = \frac{1}{|D|} \sum_{i=1}^{|D|} \frac{|Y_i \,\triangle\, Z_i|}{|L|}$, where $\triangle$ denotes the symmetric difference of two sets.

$Recall(H, D) = \frac{1}{|D|} \sum_{i=1}^{|D|} \frac{|Y_i \cap Z_i|}{|Y_i|}$: the proportion of examples which were classified as label x, among all examples which truly have label x.

$Precision(H, D) = \frac{1}{|D|} \sum_{i=1}^{|D|} \frac{|Y_i \cap Z_i|}{|Z_i|}$: the proportion of examples which truly have class x, among all those which were classified as class x.
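A minimal sketch of the three metrics; the predictions Z below are made up for illustration:

```python
num_labels = 4  # |L|
Y = {1: {"Sports", "Politics"}, 2: {"Science", "Politics"}, 4: {"Religion", "Science"}}
Z = {1: {"Sports"}, 2: {"Science", "Politics"}, 4: {"Religion", "Science", "Politics"}}

n = len(Y)
hamming = sum(len(Y[i] ^ Z[i]) / num_labels for i in Y) / n  # ^ = symmetric difference
recall = sum(len(Y[i] & Z[i]) / len(Y[i]) for i in Y) / n
precision = sum(len(Y[i] & Z[i]) / len(Z[i]) for i in Y) / n

print(round(hamming, 3), round(recall, 3), round(precision, 3))
# 0.167 0.833 0.889
```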

Precision-Recall
[Figure: two overlapping sets, the correct answers and the questions answered. Their intersection holds the objects correctly classified (TP); the answered-but-incorrect region holds the misclassified objects (FP); the remaining objects are unclassified.]
Recall = TP / (TP + FN) = fraction of all objects correctly classified.
Precision = TP / (TP + FP) = fraction of all questions correctly answered.

Data representation
Labels as dummy variables. Ex: X = {att1, att2, att3}, L = {class1, class2, class3, class4}

att1, att2, att3, class1, class2, class3, class4
0.5,  0.3,  0.2,  1, 0, 0, 1
0.3,  0.2,  0.5,  0, 0, 1, 1
0.5,  0.1,  0.2,  1, 1, 1, 0
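The same dummy-variable encoding can be produced with scikit-learn (assuming scikit-learn is available; the class order is fixed to match L):

```python
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer(classes=["class1", "class2", "class3", "class4"])
Y = mlb.fit_transform([{"class1", "class4"},
                       {"class3", "class4"},
                       {"class1", "class2", "class3"}])
print(Y)
# [[1 0 0 1]
#  [0 0 1 1]
#  [1 1 1 0]]
```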

Linear classification
Each class has a parameter vector $(w_k, b_k)$. x is assigned to class k iff $w_k^T x + b_k \geq w_{k'}^T x + b_{k'}$ for all $k'$, i.e. to the class with the highest score.
Note that we can break the symmetry and choose $(w_1, b_1) = 0$.
For simplicity set $b_k = 0$ (add a dimension and include it in $w_k$).
So the learning goal, given separable data: choose $w_k$ such that $w_{y_i}^T x_i > w_k^T x_i$ for all $k \neq y_i$, for every training example $(x_i, y_i)$.
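A minimal sketch of the resulting decision rule (random, untrained weights, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
K, d = 4, 3                      # hypothetical: 4 classes, 3 features
W = rng.normal(size=(K, d + 1))  # one w_k per class, bias folded in

def predict(W, x):
    """Assign x to the class k with the highest score w_k^T x."""
    x1 = np.append(x, 1.0)       # add the constant dimension for the bias
    return int(np.argmax(W @ x1))

print(predict(W, np.array([0.5, 0.3, 0.2])))
```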

Three discriminative algorithms