Algorithmic Crowdsourcing. Denny Zhou, Microsoft Research, Redmond. Dec 9, NIPS 2013, Lake Tahoe
Main collaborators: John Platt (Microsoft), Xi Chen (UC Berkeley), Chao Gao (Yale), Nihar Shah (UC Berkeley), Qiang Liu (UC Irvine). 12/9/2013 NIPS 2013 Workshop on Crowdsourcing
Machine learning + crowdsourcing. Almost all machine learning applications need training labels. With crowdsourcing we can obtain many labels in a short time at very low cost.
Repeated labeling: Orange (O) vs. Mandarin (M). [Figure: a set of items, each labeled O or M by several workers.]
How to make assumptions? Intuitively, label quality depends on worker ability and item difficulty. But: How to measure worker ability? How to measure item difficulty? How to combine worker ability and item difficulty? How to infer worker ability and item difficulty? How to infer the ground truth?
A single assumption for all questions. Our assumption: measurement objectivity. Invariance: no matter which scale we measure with, A is twice as large as B.
A single assumption for all questions. Our assumption: measurement objectivity. How do we formulate invariance for mental measurement?
A single assumption for all questions. Our assumption: measurement objectivity. Assume a set ℵ of equally difficult questions. R_A: number of right answers; W_A: number of wrong answers. Compare the odds R_A/W_A against R_B/W_B.
A single assumption for all questions. Our assumption: measurement objectivity. Assume another set of equally difficult questions. R′_A: number of right answers; W′_A: number of wrong answers. Compare the odds R′_A/W′_A against R′_B/W′_B.
A single assumption for all questions. Our assumption: measurement objectivity. Objectivity means the relative odds are the same on both question sets: (R_A/W_A) / (R_B/W_B) = (R′_A/W′_A) / (R′_B/W′_B)
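The invariance can be checked with a toy calculation; all of the numbers below are made up purely for illustration:

```python
# Objectivity assumption: on any set of equally difficult questions,
# worker A's right/wrong odds relative to worker B's stay the same.

# Set 1 (easier questions): A gets 18 right / 2 wrong, B gets 9 right / 4 wrong
R_A1, W_A1, R_B1, W_B1 = 18, 2, 9, 4
# Set 2 (harder questions): both workers do worse in absolute terms
R_A2, W_A2, R_B2, W_B2 = 12, 3, 8, 8

odds_ratio_1 = (R_A1 / W_A1) / (R_B1 / W_B1)  # A vs. B on set 1
odds_ratio_2 = (R_A2 / W_A2) / (R_B2 / W_B2)  # A vs. B on set 2

# Under objectivity the two ratios coincide
assert odds_ratio_1 == odds_ratio_2 == 4.0
```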
A single assumption for all questions. Our assumption: measurement objectivity. For multiclass labeling, we count the number of misclassifications from one class to another.
Measurement objectivity assumption leads to a unique model! Worker i, item j; labels c, k; data matrix X_ij; true label Y_j. P(X_ij = k | Y_j = c) = (1/Z) exp[σ_i(c, k) + τ_j(c, k)], where σ_i is the worker confusion matrix and τ_j is the item confusion matrix.
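A minimal sketch of how label probabilities follow from this model; the σ and τ values below are made up for a 2-class toy case:

```python
import numpy as np

def label_prob(sigma_i, tau_j, c):
    """P(X_ij = k | Y_j = c) for every label k, given worker confusion
    matrix sigma_i and item confusion matrix tau_j (both C x C)."""
    logits = sigma_i[c] + tau_j[c]       # row for true class c
    logits = logits - logits.max()       # numerical stability
    p = np.exp(logits)
    return p / p.sum()                   # dividing by the normalizer Z

# toy 2-class example with made-up parameters
sigma_i = np.array([[2.0, -1.0], [-1.0, 2.0]])   # a reliable worker
tau_j = np.array([[0.5, -0.5], [-0.5, 0.5]])     # an easy item
p = label_prob(sigma_i, tau_j, c=0)
assert abs(p.sum() - 1.0) < 1e-12
assert p[0] > p[1]   # the correct label is most likely
```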
Estimation procedure. First, estimate the worker and item confusion matrices by maximizing the marginal likelihood. Then, estimate the labels by applying Bayes' rule with the estimated confusion matrices. The two steps can be seamlessly unified in EM!
Expectation-Maximization (EM). Initialize the label estimates via majority vote. Iterate until convergence: given the label estimates, estimate the worker and item confusion matrices; given the confusion matrix estimates, re-estimate the labels.
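The EM loop can be sketched as follows. This simplified version estimates worker confusion matrices only (no item matrices), which is enough to show the alternating structure:

```python
import numpy as np

def em_crowd(X, num_classes, iters=50):
    """Simplified EM for aggregating crowd labels (worker confusion
    matrices only). X[i, j] is worker i's label for item j, or -1 if
    worker i did not label item j."""
    W, N, C = X.shape[0], X.shape[1], num_classes
    # Initialize soft labels q[j] via majority vote
    q = np.zeros((N, C))
    for j in range(N):
        for i in range(W):
            if X[i, j] >= 0:
                q[j, X[i, j]] += 1
    q /= q.sum(axis=1, keepdims=True)
    for _ in range(iters):
        # M-step: estimate each worker's confusion matrix from soft labels
        conf = np.full((W, C, C), 1e-2)          # small smoothing
        for i in range(W):
            for j in range(N):
                if X[i, j] >= 0:
                    conf[i, :, X[i, j]] += q[j]
        conf /= conf.sum(axis=2, keepdims=True)
        # E-step: re-estimate labels given the confusion matrices
        logq = np.zeros((N, C))
        for i in range(W):
            for j in range(N):
                if X[i, j] >= 0:
                    logq[j] += np.log(conf[i, :, X[i, j]])
        q = np.exp(logq - logq.max(axis=1, keepdims=True))
        q /= q.sum(axis=1, keepdims=True)
    return q.argmax(axis=1)
```

For example, with two accurate workers and one adversarial worker labeling four items, EM recovers the majority view and learns that the third worker's confusion matrix is flipped.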
Preventing overfitting. Equivalently, our solution can be formulated as minimax conditional entropy, which prevents overfitting through a natural regularization.
Minimax conditional entropy. True label distribution: Q(Y). Define two 4-dimensional tensors: the empirical confusion tensor and the expected confusion tensor.
Minimax conditional entropy. Jointly estimate P and Q by minimizing over Q the maximum over P of the conditional entropy, subject to the expected confusion tensor matching the empirical confusion tensor for every worker and every item.
Minimax conditional entropy. We exactly recover the previous model via the dual of maximum entropy: P(X_ij = k | Y_j = c) = (1/Z) exp[σ_i(c, k) + τ_j(c, k)]. The confusion matrices are nothing but Lagrange multipliers! Estimate the true labels by minimizing the maximum entropy (= maximizing the likelihood).
Regularization. Move from exact matching to approximate matching: each worker constraint and each item constraint, over all i, j, k, c, is allowed to hold only up to a slack.
Regularization. Move from exact matching to approximate matching, and penalize large fluctuations: a penalty on the slack is added to the minimax objective.
Experimental results: Bluebirds data (error rates). Belief propagation: Variational Inference for Crowdsourcing (Liu et al., NIPS 2012). Other methods in the literature cannot outperform Dawid & Skene (1979); some are even worse than majority voting. Data: The Multidimensional Wisdom of Crowds (Welinder et al., NIPS 2010).
Experimental results: Web search data (error rates). Latent trait analysis code from: http://www.machinedlearnings.com. Data: Learning from the Wisdom of Crowds by Minimax Entropy (Zhou et al., NIPS 2012).
Crowdsourced ordinal labeling. Ordinal labels arise in web search relevance and product rating. Our assumption: adjacency confusability. Adjacent levels (e.g. 1 and 2) are likely to be confused; distant levels (e.g. 1 and 5) are unlikely to be confused.
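One simple way to encode adjacency confusability is a confusion matrix whose entries decay with the distance between ordinal levels. This is an illustrative parameterization, not necessarily the one used in the paper:

```python
import numpy as np

def adjacent_confusion(num_levels, spread=1.0):
    """Illustrative confusion matrix for ordinal labels: the probability
    of reporting level k when the true level is c decays with |c - k|."""
    c = np.arange(num_levels)
    logits = -((c[:, None] - c[None, :]) ** 2) / (2 * spread ** 2)
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)   # rows are distributions

M = adjacent_confusion(5)
# Confusing level 2 with level 3 is far more likely than with level 5
assert M[1, 2] > M[1, 4]
```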
Ordinal minimax conditional entropy. Minimax conditional entropy with ordinal-based worker and item constraints: the constraints are defined over threshold events, for all cutoffs Δ, with each comparison operator taken from {≥, <}.
Constraints: indirect label comparison
Ordinal labeling model. We obtain the same model, except that the confusion matrices are subtly structured by the ordinal constraints. Fewer parameters, lower model complexity.
Experimental results: Web search data. Latent trait analysis code from: http://www.machinedlearnings.com. Entropy(M): regularized minimax conditional entropy for multiclass labels. Entropy(O): regularized minimax conditional entropy for ordinal labels. Data: Learning from the Wisdom of Crowds by Minimax Entropy (Zhou et al., NIPS 2012).
Experimental results: Price estimation data. Latent trait analysis code from: http://www.machinedlearnings.com. Entropy(M): regularized minimax conditional entropy for multiclass labels. Entropy(O): regularized minimax conditional entropy for ordinal labels. Data: 7 price ranges from least expensive to most expensive (Liu et al., NIPS 2013).
Why doesn't the latent trait model work? "When solving a given problem, try to avoid solving a more general problem as an intermediate step." - Vladimir Vapnik. An ordinal label is just a score range; inferring a full latent score is a more general problem than necessary.
Error bounds: problem setup. Observed data; unknown true labels; unknown worker accuracies. A simplified model of Dawid and Skene (1979) and of minimax conditional entropy.
Dawid-Skene estimator (1979). From the complete likelihood, form the marginal likelihood. Estimate the worker accuracies by maximizing the marginal likelihood, then estimate the true labels by plugging the estimated accuracies back in (plug-in estimator).
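Under the simplified one-coin model (each worker i is correct with probability p_i on every item), the plug-in step reduces to a majority vote weighted by the log-odds of worker accuracy. A sketch, assuming the accuracies have already been estimated:

```python
import numpy as np

def plugin_labels(X, p_hat):
    """Plug-in label estimate for the one-coin model. Labels X[i, j]
    take values in {-1, +1}; p_hat[i] is worker i's estimated accuracy.
    The Bayes-optimal rule is a log-odds weighted majority vote."""
    w = np.log(p_hat / (1 - p_hat))   # log-odds weight per worker
    return np.sign(w @ X)             # sign of the weighted vote

# 3 workers with assumed (already estimated) accuracies;
# the third worker is nearly random, so its vote counts little
p_hat = np.array([0.9, 0.8, 0.55])
X = np.array([[+1, -1, +1],
              [+1, -1, -1],
              [-1, -1, +1]])
y_hat = plugin_labels(X, p_hat)
assert list(y_hat) == [1, -1, 1]
```

On item 1, the near-random worker disagrees with the accurate pair, but its small log-odds weight cannot overturn them.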
Theorem (lower bound). For any estimator, there exists a least favorable configuration in the parameter space.
Theorem (upper bound). Under mild assumptions, the Dawid-Skene estimator is optimal in Wald's sense.
Budget-optimal crowdsourcing. We propose the following formulation: given n biased coins, we want to know which are biased toward heads and which toward tails. We have a budget of m tosses in total. Our goal is to maximize the accuracy of the prediction based on the observed tossing outcomes. Optimistic Knowledge Gradient Policy for Optimal Budget Allocation in Crowdsourcing (Chen et al., ICML 2013).
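The setup can be simulated with a simple adaptive policy. Note this is a plain uncertainty-driven heuristic for illustration, not the paper's optimistic knowledge gradient policy:

```python
import numpy as np

rng = np.random.default_rng(0)

def allocate_budget(true_bias, m):
    """Simplified adaptive policy (not the optimistic knowledge
    gradient): with Beta(1, 1) priors on each coin, spend every toss
    on the coin whose posterior mean is closest to 1/2, i.e. whose
    heads-vs-tails verdict is currently least certain."""
    n = len(true_bias)
    heads = np.ones(n)   # Beta posterior pseudo-counts
    tails = np.ones(n)
    for _ in range(m):
        mean = heads / (heads + tails)
        i = np.argmin(np.abs(mean - 0.5))   # most uncertain coin
        if rng.random() < true_bias[i]:
            heads[i] += 1
        else:
            tails[i] += 1
    return heads / (heads + tails) > 0.5    # predict heads-biased

# coins with made-up biases; the 0.55 coin soaks up most of the budget
true_bias = np.array([0.9, 0.8, 0.3, 0.2, 0.55])
pred = allocate_budget(true_bias, m=500)
accuracy = np.mean(pred == (true_bias > 0.5))
```

The point of the simulation is that an adaptive policy concentrates tosses on the hard, near-fair coins, which is exactly the budget-allocation trade-off the formulation optimizes.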
Summary: Measurement Objectivity; Maximum Conditional Entropy = Maximum Likelihood; Minimum Conditional Entropy; Minimax Conditional Entropy; Regularized Minimax Conditional Entropy; Minimax optimal rates; Budget-optimal crowdsourcing.
Chao Gao and Dengyong Zhou. Minimax Optimal Convergence Rates for Estimating Ground Truth from Crowdsourced Labels, MSR-TR-2013-110, October 2013. Dengyong Zhou, Qiang Liu, John Platt, and Chris Meek. Aggregating Ordinal Labels from Crowds by Minimax Conditional Entropy. Submitted. Xi Chen, Qihang Lin, and Dengyong Zhou. Optimistic Knowledge Gradient Policy for Optimal Budget Allocation in Crowdsourcing, in Proceedings of the 30th International Conference on Machine Learning (ICML), 2013. Dengyong Zhou, John Platt, Sumit Basu, and Yi Mao. Learning from the Wisdom of Crowds by Minimax Entropy, in Advances in Neural Information Processing Systems (NIPS), December 2012. http://research.microsoft.com/en-us/projects/crowd/