Generative Models for Crowdsourced Data
Outline • What is Crowdsourcing? • Modeling the labeling process • Example with real data • Extensions • Future Directions
What is Crowdsourcing? • Human-based computation: outsourcing certain steps of a computation to humans. • "Artificial artificial intelligence." • Data science uses: • Making an immediate decision. • Creating a labeled data set for learning.
Funny enough … • Not everybody agrees on the gender of a Twitter profile. • Sources of disagreement: • Difficult instances • Worker ability / motivation • Worker bias • Adversarial behaviour
Disagreements • When some workers say “male” and some workers say “female”, what to do?
Majority Rules Heuristic • Assign label l to item x if a majority of workers agree. • Otherwise item x remains unlabeled. • Ignores prior worker data. • Introduces bias in the labeled data.
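The heuristic above can be sketched in a few lines (a minimal illustration; the function name and the strict-majority tie handling are assumptions):

```python
from collections import Counter

def majority_label(worker_labels):
    """Return the strict-majority label, or None if no label wins a majority."""
    counts = Counter(worker_labels)
    label, top = counts.most_common(1)[0]
    if top > len(worker_labels) / 2:
        return label
    return None  # item remains unlabeled

majority_label(["male", "male", "female"])  # "male"
majority_label(["male", "female"])          # None: a tie leaves the item unlabeled
```

Note the failure mode from the slide: a reliable worker outvoted by two unreliable ones loses, because no prior worker data enters the count.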
Train on all labels • For the labeled-data-set workflow. • Add all item-label pairs to the data set. • Equivalent to a cost vector of: • P(l | {l_w}) = (1/n_w) Σ_w 1{l = l_w} • Ignores prior worker data. • Models the crowd, not the "ground truth."
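The empirical distribution above can be computed directly (a minimal sketch; labels are assumed to be encoded as integers 0..L-1, and the function name is an assumption):

```python
import numpy as np

def soft_label(worker_labels, num_classes):
    """Empirical distribution over classes from the raw worker votes:
    P(l | {l_w}) = (1/n_w) * sum_w 1{l = l_w}."""
    counts = np.bincount(worker_labels, minlength=num_classes)
    return counts / len(worker_labels)

soft_label([0, 0, 1], num_classes=2)  # [2/3, 1/3]
```

This treats every worker identically, which is exactly why it models the crowd rather than the ground truth.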
What is ground truth? • Different theoretical approaches: • PAC learning with noisy labels. • Fully-adversarial active learning. • Bayesians have been very active. • "Easy" to posit a functional form and quickly develop inference algorithms. • Issue of model correctness is ultimately empirical.
Bayesian Literature • (1979) Dawid and Skene. Maximum Likelihood Estimation of Observer Error-Rates Using the EM Algorithm. • (2009) Whitehill et al. GLAD framework. • (2010) Welinder et al. The Multidimensional Wisdom of Crowds. • (2010) Raykar et al. Learning from Crowds.
Bayesian Approach • Define ground truth via a generative model which describes how “ground truth” is related to the observed output of crowdsource workers. • Fit to observed data. • Extract posterior over ground truth. • Make decision or train classifier.
Example: Binary Classification • Each worker has a confusion-parameter matrix with a fixed diagonal: α = [ −1 α01 ; α10 −1 ] • Each item has a scalar difficulty β > 0. • P(l_w = j | z = i) = e^{−β α_ij} / Σ_k e^{−β α_ik} • α_ij ~ N(μ_ij, 1); μ_ij ~ N(0, 1) • log β ~ N(ρ, 1); ρ ~ N(0, 1)
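A hedged simulation of this generative model for one item (variable names follow the slide; the number of workers, the uniform prior on z, and all function names are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_worker():
    """Worker confusion matrix: diagonal fixed at -1, off-diagonals Gaussian."""
    mu = rng.normal(0.0, 1.0, size=2)        # mu_01, mu_10 ~ N(0, 1)
    a01, a10 = rng.normal(mu, 1.0)           # alpha_ij ~ N(mu_ij, 1)
    return np.array([[-1.0, a01], [a10, -1.0]])

def sample_label(alpha, z, beta):
    """P(l_w = j | z = i) = exp(-beta * alpha_ij) / sum_k exp(-beta * alpha_ik)."""
    logits = -beta * alpha[z]
    p = np.exp(logits - logits.max())        # stable softmax
    p /= p.sum()
    return rng.choice(2, p=p)

rho = rng.normal(0.0, 1.0)                   # rho ~ N(0, 1)
beta = np.exp(rng.normal(rho, 1.0))          # log beta ~ N(rho, 1), so beta > 0
z = rng.integers(2)                          # latent true label (assumed uniform)
labels = [sample_label(sample_worker(), z, beta) for _ in range(5)]
```

Note how β modulates difficulty: as β grows, the softmax sharpens toward each worker's preferred response; as β → 0, labels approach uniform noise.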
Other Problems • Multiclass classification: • Same as binary with a larger confusion matrix. • Ordinal classification: ("Hot or not") • Confusion matrix has a special form. • O(L) parameters instead of O(L²). • Multilabel classification: • Reduce to multiclass on the power set. • Assume a low-rank confusion matrix.
EM • Initially all workers are assumed moderately accurate and without bias. • Implies the initial estimate of the ground-truth distribution favors consensus: disagreeing with the majority is a likely error. • Workers consistently in the minority have their confusion probabilities increase. • Workers with higher confusion probabilities contribute less to the distribution of ground truth.
“Different” workers are marginalized • Workers that are consistently in the minority will not contribute strongly to the posterior distribution over ground truth, even if they are actually more accurate. • Can correct when one accurate worker is paired with several inaccurate workers. • Good for breaking ties. • Raykar et al.
Online EM • Given a set of worker-label pairs for a single item: • (Inference) Using current α, find most likely β* and distribution q* over ground truth. • (Training) Do SGD update of α with respect to EM auxiliary function evaluated at β* and q*.
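The two steps above can be sketched for the binary model from the earlier slide (a hedged illustration, not the authors' implementation: the uniform prior, learning rate, and all function names are assumptions):

```python
import numpy as np

def e_step(alpha_list, labels, beta, prior=(0.5, 0.5)):
    """Inference: posterior q(z) over the true label given current parameters.
    alpha_list[w] is worker w's 2x2 confusion parameter; labels[w] its label."""
    log_q = np.log(np.asarray(prior))
    for alpha, l in zip(alpha_list, labels):
        logits = -beta * alpha                                # row z, column j
        log_p = logits - np.logaddexp.reduce(logits, axis=1, keepdims=True)
        log_q += log_p[:, l]                                  # log P(l | z)
    return np.exp(log_q - np.logaddexp.reduce(log_q))         # normalize

def sgd_step(alpha, l, q, beta, lr=0.1):
    """Training: one gradient-ascent step on the EM auxiliary function
    Q(alpha) = sum_z q[z] * log P(l | z; alpha), w.r.t. one worker's alpha."""
    grad = np.zeros_like(alpha)
    for z in (0, 1):
        logits = -beta * alpha[z]
        p = np.exp(logits - np.logaddexp.reduce(logits))      # P(j | z)
        # d/d alpha_{z,j} of log P(l | z) is -beta * (1{j=l} - p_j)
        grad[z] = -beta * q[z] * ((np.arange(2) == l) - p)
    return alpha + lr * grad
```

Here the E-step plays the role of "find the distribution q* over ground truth" and the SGD step updates α toward making the observed labels more likely under q*.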
Things to do with q* • Take an immediate cost-sensitive decision: • d* = argmin_d E_{z~q*}[f(z, d)] • Train an (importance-weighted) classifier with cost vector: • c_d = E_{z~q*}[f(z, d)] • e.g. 0/1 loss: c_d = 1 − q*_d • e.g. binary 0/1 loss: |c_1 − c_0| = |1 − 2 q*_1| • No need to decide what the true label is! • Raykar et al.: why not jointly estimate the classifier and worker confusion?
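Both uses of q* reduce to one expectation (a minimal sketch; encoding the cost function as a matrix with `costs[z, d] = f(z, d)` is an assumption):

```python
import numpy as np

def cost_vector(q, costs):
    """c_d = E_{z~q*}[f(z, d)], for training an importance-weighted classifier."""
    return q @ costs

def decide(q, costs):
    """d* = argmin_d E_{z~q*}[f(z, d)]: the immediate cost-sensitive decision."""
    return int(np.argmin(cost_vector(q, costs)))

q = np.array([0.8, 0.2])      # posterior over binary ground truth
zero_one = 1.0 - np.eye(2)    # 0/1 loss: f(z, d) = 1{z != d}
cost_vector(q, zero_one)      # [0.2, 0.8], i.e. c_d = 1 - q*_d
decide(q, zero_one)           # 0
```

With the binary 0/1 loss above, |c_1 − c_0| = |1 − 2 q*_1| = 0.6, the importance weight from the slide.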
Raykar et al. insight • The cost vector is constructed by estimating worker confusion matrices. • Subsequently, a classifier is trained; it will sometimes disagree with the workers. • It would be nice to use that disagreement to inform the worker confusion matrices. • The circular dependency suggests joint estimation.
Online Joint Estimation • Initially the classifier will output an uninformative prior and therefore will be trained to follow consensus of workers. • Eventually workers which disagree with the classifier will have their confusion probabilities increase. • Workers consistently in the minority can contribute strongly to the posterior if they tend to agree with the classifier.
Additional Resources • Software • http://code.google.com/p/nincompoop • Blog • http://machinedlearnings.com/