Algorithmic Crowdsourcing. Denny Zhou, Microsoft Research, Redmond. Dec 9, NIPS'13, Lake Tahoe

Similar documents
Globally Optimal Crowdsourcing Quality Management

large-scale machine learning revisited Léon Bottou Microsoft Research (NYC)

Visipedia: collaborative harvesting and organization of visual knowledge

Statistical Machine Learning

STA 4273H: Statistical Machine Learning

Likelihood: Frequentist vs Bayesian Reasoning

A Simple Introduction to Support Vector Machines

Learning Gaussian process models from big data. Alan Qi Purdue University Joint work with Z. Xu, F. Yan, B. Dai, and Y. Zhu

Probabilistic Models for Big Data. Alex Davies and Roger Frigola University of Cambridge 13th February 2014

The equivalence of logistic regression and maximum entropy models

Class #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris

CSE 473: Artificial Intelligence Autumn 2010

Tensor Methods for Machine Learning, Computer Vision, and Computer Graphics

Introduction to Support Vector Machines. Colin Campbell, Bristol University

Data Mining - Evaluation of Classifiers

CS 2750 Machine Learning. Lecture 1. Machine Learning. CS 2750 Machine Learning.

The Artificial Prediction Market

Monotonicity Hints. Abstract

Basics of Statistical Machine Learning

Making Sense of the Mayhem: Machine Learning and March Madness

Support Vector Machines with Clustering for Training with Very Large Datasets

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Big Data - Lecture 1 Optimization reminders

Xi Chen Curriculum Vitae

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

CHAPTER 2 Estimating Probabilities

Logistic Regression. Vibhav Gogate The University of Texas at Dallas. Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld.

Course: Model, Learning, and Inference: Lecture 5

CI6227: Data Mining. Lesson 11b: Ensemble Learning. Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore.

Linear smoother. ŷ = S y. where s ij = s ij (x) e.g. s ij = diag(l i (x)) To go the other way, you need to diagonalize S

Social Media Mining. Data Mining Essentials

Classification algorithm in Data mining: An Overview

Predict Influencers in the Social Network

Inflation. Chapter Money Supply and Demand

CS 688 Pattern Recognition Lecture 4. Linear Models for Classification

The primary goal of this thesis was to understand how the spatial dependence of

Sparse modeling: some unifying theory and word-imaging

GLM, insurance pricing & big data: paying attention to convergence issues.

Part III: Machine Learning. CS 188: Artificial Intelligence. Machine Learning This Set of Slides. Parameter Estimation. Estimation: Smoothing

Penalized regression: Introduction

Adequacy of Biomath. Models. Empirical Modeling Tools. Bayesian Modeling. Model Uncertainty / Selection

Model Validation Techniques

Supervised Learning (Big Data Analytics)

Load Balancing and Switch Scheduling

Comparison of frequentist and Bayesian inference. Class 20, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom

Multiclass Classification Class 06, 25 Feb 2008 Ryan Rifkin

Support Vector Machines Explained

Efficient Curve Fitting Techniques

Performance Metrics for Graph Mining Tasks

Semi-Supervised Support Vector Machines and Application to Spam Filtering

Crowdclustering with Sparse Pairwise Labels: A Matrix Completion Approach

Simple and efficient online algorithms for real world applications

MapReduce Approach to Collective Classification for Networks

Predicting borrowers chance of defaulting on credit loans

The Gravity Model: Derivation and Calibration

Some Essential Statistics The Lure of Statistics

Efficient and Robust Allocation Algorithms in Clouds under Memory Constraints

Chapter 4. Probability and Probability Distributions

Nonlinear Optimization: Algorithms 3: Interior-point methods

Tail-Dependence an Essential Factor for Correctly Measuring the Benefits of Diversification

BIG DATA PROBLEMS AND LARGE-SCALE OPTIMIZATION: A DISTRIBUTED ALGORITHM FOR MATRIX FACTORIZATION

Christfried Webers. Canberra February June 2015

Support Vector Machine. Tutorial. (and Statistical Learning Theory)

Large Margin DAGs for Multiclass Classification

Question: What is the probability that a five-card poker hand contains a flush, that is, five cards of the same suit?

Introduction to. Hypothesis Testing CHAPTER LEARNING OBJECTIVES. 1 Identify the four steps of hypothesis testing.

Bayesian Predictive Profiles with Applications to Retail Transaction Data

Collaborative Filtering. Radek Pelánek

Finding the M Most Probable Configurations Using Loopy Belief Propagation

Distributed Structured Prediction for Big Data

Optimal Income Taxation with Multidimensional Types

Linear Classification. Volker Tresp Summer 2015

Moral Hazard. Itay Goldstein. Wharton School, University of Pennsylvania

Linear Models for Classification

Linear Programming Notes VII Sensitivity Analysis

Maximum Margin Clustering

Big Data Analytics CSCI 4030

What Is Probability?

How To Check For Differences In The One Way Anova

Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression

False Discovery Rates

Subordinating to the Majority: Factoid Question Answering over CQA Sites

L4: Bayesian Decision Theory

Classification by Pairwise Coupling

BALANCE LEARNING TO RANK IN BIG DATA. Guanqun Cao, Iftikhar Ahmad, Honglei Zhang, Weiyi Xie, Moncef Gabbouj. Tampere University of Technology, Finland

Big Data for Security: Challenges, Opportunities, and Experiments

E-commerce Transaction Anomaly Classification

Learning with Local and Global Consistency

Foundations of Machine Learning On-Line Learning. Mehryar Mohri Courant Institute and Google Research

Economics 1011a: Intermediate Microeconomics

Survey on Multiclass Classification Methods

Learning with Local and Global Consistency

Machine Learning. CS 188: Artificial Intelligence Naïve Bayes. Example: Digit Recognition. Other Classification Tasks

Bayesian Factorization Machines

13.0 Central Limit Theorem

Several Views of Support Vector Machines

CS570 Data Mining Classification: Ensemble Methods

Pattern Recognition and Machine Learning. Chapter 4: Linear Models for Classification

Project Planning and Project Estimation Techniques. Naveen Aggarwal

Using Mixtures-of-Distributions models to inform farm size selection decisions in representative farm modelling. Philip Kostov and Seamus McErlean

Transcription:

Algorithmic Crowdsourcing. Denny Zhou, Microsoft Research, Redmond. Dec 9, NIPS'13, Lake Tahoe

Main collaborators: John Platt (Microsoft), Xi Chen (UC Berkeley), Chao Gao (Yale), Nihar Shah (UC Berkeley), Qiang Liu (UC Irvine)

Machine learning + crowdsourcing. Almost all machine learning applications need training labels. By crowdsourcing, we can obtain many labels in a short time at very low cost.

Repeated labeling: Orange (O) vs. Mandarin (M). [Three build slides: photos of fruit, each labeled O or M by multiple workers]

How to make assumptions? Intuitively, label quality depends on worker ability and item difficulty. But: How do we measure worker ability? How do we measure item difficulty? How do we combine worker ability and item difficulty? How do we infer worker ability and item difficulty? How do we infer the ground truth?

A single assumption for all questions. Our assumption: measurement objectivity. [Figure: A weighs twice as much as B on a scale] Invariance: no matter which scale is used, A is twice as large as B.

A single assumption for all questions. Our assumption: measurement objectivity. [Figure: A weighs twice as much as B] How do we formulate this invariance for mental measurement?

A single assumption for all questions. Our assumption: measurement objectivity. Assume a set ℵ of equally difficult questions. For worker A, let R_A be the number of right answers and W_A the number of wrong answers, and consider the odds R_A/W_A and R_B/W_B.

A single assumption for all questions. Our assumption: measurement objectivity. Assume another set ℵ′ of equally difficult questions, with R′_A the number of right answers and W′_A the number of wrong answers on this set, and consider the odds R′_A/W′_A and R′_B/W′_B.

A single assumption for all questions. Our assumption: measurement objectivity. Invariance across the two sets:

(R_A/W_A) / (R_B/W_B) = (R′_A/W′_A) / (R′_B/W′_B)
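
To illustrate with made-up counts: on ℵ, suppose A scores 8 right / 2 wrong (odds 4) and B scores 4 right / 4 wrong (odds 1), giving an odds ratio of 4. On a harder set ℵ′, A might drop to 6 right / 3 wrong (odds 2) and B to 2 right / 4 wrong (odds 1/2); the odds ratio is still 4. Measurement objectivity asserts exactly this invariance of the odds ratio to the question set.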

A single assumption for all questions. Our assumption: measurement objectivity. For multiclass labeling, we count the number of misclassifications from one class to another.

The measurement objectivity assumption leads to a unique model! For worker i, item j, labels c and k, data matrix X_ij, and true label Y_j:

P(X_ij = k | Y_j = c) = (1/Z) exp[σ_i(c,k) + τ_j(c,k)]

where σ_i is the worker confusion matrix and τ_j is the item confusion matrix.
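
For concreteness, a minimal sketch of how this distribution can be evaluated; the array names and shapes are assumptions of mine, not from the talk:

```python
import numpy as np

def label_distribution(sigma_i, tau_j, c):
    """P(X_ij = k | Y_j = c) for all k, under the exponential-family model
    P(X_ij = k | Y_j = c) = exp(sigma_i[c, k] + tau_j[c, k]) / Z.

    sigma_i, tau_j: (num_classes, num_classes) score matrices for
    worker i and item j; c: the (hypothesized) true class.
    """
    scores = sigma_i[c] + tau_j[c]      # unnormalized log-probabilities over k
    scores -= scores.max()              # stabilize the exponentials
    p = np.exp(scores)
    return p / p.sum()                  # Z normalizes over the observed label k
```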

Estimation procedure. First, estimate the worker and item confusion matrices by maximizing the marginal likelihood. Then, estimate the labels by applying Bayes' rule with the estimated confusion matrices. The two steps can be seamlessly unified in EM!

Expectation-Maximization (EM). Initialize the label estimates via majority vote. Iterate until convergence: given the label estimates, estimate the worker and item confusion matrices; given the confusion matrix estimates, re-estimate the labels.
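
A minimal EM sketch in the spirit of this slide, simplified to worker confusion matrices only (Dawid-Skene style); dropping the item confusion matrices and the regularization is my simplification, not the talk's full method:

```python
import numpy as np

def em_label_aggregation(labels, num_classes, iters=50):
    """labels: (num_workers, num_items) array of observed labels, -1 = missing.
    Assumes every item has at least one label. Returns (hard labels, posteriors).
    """
    W, N = labels.shape
    # Initialize soft labels q[j, c] = P(Y_j = c) from a majority vote.
    q = np.zeros((N, num_classes))
    for j in range(N):
        for l in labels[:, j]:
            if l >= 0:
                q[j, l] += 1.0
    q /= q.sum(axis=1, keepdims=True)

    for _ in range(iters):
        # M-step: per-worker confusion C[i, c, k] = P(worker i says k | truth c).
        C = np.full((W, num_classes, num_classes), 1e-6)   # small smoothing
        for i in range(W):
            for j in range(N):
                if labels[i, j] >= 0:
                    C[i, :, labels[i, j]] += q[j]
        C /= C.sum(axis=2, keepdims=True)

        # E-step: Bayes rule with a uniform prior over the true label.
        logq = np.zeros((N, num_classes))
        for i in range(W):
            for j in range(N):
                if labels[i, j] >= 0:
                    logq[j] += np.log(C[i, :, labels[i, j]])
        logq -= logq.max(axis=1, keepdims=True)
        q = np.exp(logq)
        q /= q.sum(axis=1, keepdims=True)

    return q.argmax(axis=1), q
```

For example, y_hat, posteriors = em_label_aggregation(labels, 2) aggregates binary labels; the majority-vote initialization matches the slide's first step.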

Preventing overfitting. Equivalently reformulate our solution as minimax conditional entropy, which prevents overfitting through a natural regularization.

Minimax conditional entropy. True label distribution: Q(Y). Define two 4-dimensional tensors: the empirical confusion tensor and the expected confusion tensor.

Minimax conditional entropy. Jointly estimate P and Q by a minimax program: minimize over Q the maximum over P of the conditional entropy, subject to matching the empirical and expected confusion tensors.
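
The formulas on this slide appear as images; as a sketch of the program along the lines of the companion NIPS 2012 paper, with the hat notation assumed here for empirical quantities:

```latex
\min_{Q}\ \max_{P}\ H(X \mid Y)
\quad\text{subject to}\quad
\widehat{\phi}_{i}(c,k) = \phi_{i}(c,k)\ \ \forall\, i,c,k,
\qquad
\widehat{\psi}_{j}(c,k) = \psi_{j}(c,k)\ \ \forall\, j,c,k
```

Here φ_i counts worker i's confusions (true class c labeled as k) and ψ_j counts item j's confusions; the hatted versions are empirical counts under Q, and the unhatted versions are their expectations under P.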

Minimax conditional entropy. We exactly recover the previous model via the dual of maximum entropy:

P(X_ij = k | Y_j = c) = (1/Z) exp[σ_i(c,k) + τ_j(c,k)]

The confusion matrices are nothing but Lagrange multipliers of the matching constraints! Estimate the true labels by minimizing the maximum entropy (equivalently, maximizing the likelihood).

Regularization. Move from exact matching to approximate matching: allow slack in the worker constraints (for all i, k, c) and in the item constraints (for all j, k, c).

Regularization. Move from exact matching to approximate matching, and penalize large fluctuations of the slack.

Experimental results: Bluebirds data (error rates)
o Belief propagation: Variational Inference for Crowdsourcing (Liu et al., NIPS 2012)
o Other methods in the literature cannot outperform Dawid & Skene (1979)
o Some are even worse than majority voting
o Data: The Multidimensional Wisdom of Crowds (Welinder et al., NIPS 2010)

Experimental results: web search data (error rates)
o Latent trait analysis code from: http://www.machinedlearnings.com
o Data: Learning from the Wisdom of Crowds by Minimax Entropy (Zhou et al., NIPS 2012)

Crowdsourced ordinal labeling. Ordinal labels arise in web search relevance judgments and product ratings. Our assumption: adjacency confusability. Adjacent ratings (e.g., 1 vs. 2) are likely to be confused; distant ratings (e.g., 1 vs. 5) are unlikely to be confused.

Ordinal minimax conditional entropy. Minimax conditional entropy with ordinal-based worker and item constraints, for all Δ and ∇ taking values from {≥, <}.

Constraints: indirect label comparison

Ordinal labeling model. We obtain the same model, except that the confusion matrices are subtly structured by the ordinal constraints: fewer parameters, lower model complexity.

Experimental results: web search data
o Latent trait analysis code from: http://www.machinedlearnings.com
o Entropy(M): regularized minimax conditional entropy for multiclass labels
o Entropy(O): regularized minimax conditional entropy for ordinal labels
o Data: Learning from the Wisdom of Crowds by Minimax Entropy (Zhou et al., NIPS 2012)

Experimental results: price estimation data
o Latent trait analysis code from: http://www.machinedlearnings.com
o Entropy(M): regularized minimax conditional entropy for multiclass labels
o Entropy(O): regularized minimax conditional entropy for ordinal labels
o Data: 7 price ranges from least expensive to most expensive (Liu et al., NIPS 2013)

Why doesn't the latent trait model work? "When solving a given problem, try to avoid solving a more general problem as an intermediate step." - Vladimir Vapnik. An ordinal label is only a score range, so fitting a continuous latent score solves a more general problem than the labeling task requires.

Error bounds: problem setup. Observed data; unknown true labels; unknown workers' accuracies. A simplified model of Dawid and Skene (1979) and of minimax conditional entropy.

Dawid-Skene estimator (1979). Complete likelihood and marginal likelihood. Estimate the workers' accuracies by maximizing the marginal likelihood; estimate the true labels by plugging in the estimated accuracies.
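
Under the simplified one-coin model used in the analysis (worker i answers correctly with probability p_i), the plug-in step for binary labels reduces to a log-odds weighted majority vote; a minimal sketch, with names assumed here:

```python
import numpy as np

def plugin_labels(labels, p_hat):
    """labels: (num_workers, num_items) array with entries in {+1, -1}, 0 = missing.
    p_hat: (num_workers,) estimated worker accuracies.
    Returns the plug-in MAP label estimate in {+1, -1} per item.
    """
    p_hat = np.clip(p_hat, 1e-6, 1 - 1e-6)    # keep the log-odds finite
    w = np.log(p_hat / (1 - p_hat))           # each worker votes with weight log(p / (1 - p))
    scores = w @ labels                        # missing entries (0) contribute nothing
    return np.where(scores >= 0, 1, -1)
```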

Theorem (lower bound). For any estimator, there exists a least favorable configuration in the parameter space.

Theorem (upper bound). Under mild assumptions, the Dawid-Skene estimator is optimal in Wald's sense.

Budget-optimal crowdsourcing. We propose the following formulation: given n biased coins, we want to know which are biased toward heads and which toward tails. We have a budget of m tosses in total. Our goal is to maximize the accuracy of the prediction based on the observed tossing outcomes. Optimistic Knowledge Gradient Policy for Optimal Budget Allocation in Crowdsourcing (Chen et al., ICML 2013).
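
A minimal sketch of an optimistic knowledge-gradient style policy under Beta-Bernoulli posteriors; the stage reward h(a, b) = max(a, b)/(a + b) and all names here are simplifying assumptions for illustration, not necessarily the exact objective of Chen et al.:

```python
import numpy as np

def okg_allocate(toss, n, budget):
    """toss(i) -> 0/1 simulates one toss of coin i.
    Returns +1 (biased to heads) or -1 (biased to tails) per coin.
    """
    a = np.ones(n)                                # Beta(a, b) posterior per coin,
    b = np.ones(n)                                # starting from a uniform prior
    for _ in range(budget):
        h = np.maximum(a, b) / (a + b)            # assumed stage reward: confidence in the call
        gain_h = np.maximum(a + 1, b) / (a + b + 1) - h   # reward gain if next toss is heads
        gain_t = np.maximum(a, b + 1) / (a + b + 1) - h   # reward gain if next toss is tails
        i = int(np.argmax(np.maximum(gain_h, gain_t)))    # optimistic: assume the better outcome
        if toss(i):
            a[i] += 1
        else:
            b[i] += 1
    return np.where(a >= b, 1, -1)
```

For example, okg_allocate(lambda i: np.random.rand() < true_p[i], n, m) concentrates tosses on coins whose posteriors remain close to even, which is where extra tosses improve prediction accuracy the most.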

Summary:
Measurement objectivity
Maximum conditional entropy (= maximum likelihood) and minimum conditional entropy
Minimax conditional entropy
Regularized minimax conditional entropy
Minimax optimal rates
Budget-optimal crowdsourcing

Chao Gao and Dengyong Zhou. Minimax Optimal Convergence Rates for Estimating Ground Truth from Crowdsourced Labels. Microsoft Research Technical Report MSR-TR-2013-110, October 2013.
Dengyong Zhou, Qiang Liu, John Platt, and Chris Meek. Aggregating Ordinal Labels from Crowds by Minimax Conditional Entropy. Submitted.
Xi Chen, Qihang Lin, and Dengyong Zhou. Optimistic Knowledge Gradient Policy for Optimal Budget Allocation in Crowdsourcing. In Proceedings of the 30th International Conference on Machine Learning (ICML), 2013.
Dengyong Zhou, John Platt, Sumit Basu, and Yi Mao. Learning from the Wisdom of Crowds by Minimax Entropy. In Advances in Neural Information Processing Systems (NIPS), December 2012.
http://research.microsoft.com/en-us/projects/crowd/