Amazon Mechanical Turk: Artificial Artificial Intelligence Presenter: Chien-Ju Ho 2009.4.21
Outline • Introduction to Amazon Mechanical Turk • Applications • Demographics and statistics • The value of using MTurk • Repeated labeling • A machine-learning perspective
The Turk An automaton chess player built in the 1770s by Wolfgang von Kempelen; a hidden human operator actually played the games.
Amazon Mechanical Turk • Human Intelligence Task (HIT) • Tasks that are hard for computers but easy for humans • Developer • Prepay the rewards • Publish HITs • Get results • Worker • Complete HITs • Get paid
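As a hedged illustration of the developer workflow above (prepay, publish HITs, collect results), here is a minimal sketch using the modern boto3 SDK; the endpoint, task URL, and all HIT parameters are illustrative assumptions, not values from the slides:

```python
import boto3

# Sandbox endpoint so no real money is spent; URLs below are placeholders.
mturk = boto3.client(
    "mturk",
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
)

question_xml = """<ExternalQuestion
  xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://example.com/my-task</ExternalURL>
  <FrameHeight>400</FrameHeight>
</ExternalQuestion>"""

# Publish a HIT: the reward is drawn from the requester's prepaid balance.
hit = mturk.create_hit(
    Title="Tag this image",
    Description="Provide keywords describing the image.",
    Reward="0.05",                      # USD, passed as a string
    MaxAssignments=3,                   # number of workers per HIT
    LifetimeInSeconds=24 * 3600,
    AssignmentDurationInSeconds=600,
    Question=question_xml,
)

# Later: collect submitted work for review and payment.
results = mturk.list_assignments_for_hit(HITId=hit["HIT"]["HITId"])
```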
Sample Applications (1) User Survey
Sample Applications (2) Image Tagging
Sample Applications (3) Data Collection
Sample Applications (4) • Audio Transcription • Split the audio into 30-second pieces • Image Filtering • Filter out pornographic or otherwise inappropriate images • Many other applications
How much should I pay? • It depends on the task. • Some statistics (number of tasks at each payment level): • Payment ≥ $0.01: 586 • Payment ≥ $0.05: 357 • Payment ≥ $0.10: 264 • Payment ≥ $0.50: 74 • Payment ≥ $1.00: 48 • Payment ≥ $5.00: 5
The Demographics of MTurk • Survey of 1,000 Turkers • Survey conducted twice (Oct. 2008 and Dec. 2008) with consistent statistics • Blog post: A Computer Scientist in a Business School • Where are Turkers from? • United States 76.25% • India 8.03% • United Kingdom 3.34% • Canada 2.34%
Other Statistics • Charts of respondents' age, gender, degree, and yearly income (figures omitted)
Comparison with General Internet Demographics • Using data from comScore • In summary, Turkers are • younger • 21–35 year olds: 51% vs. 22% of Internet users • mostly female • 70% female vs. 50% female • lower income • 65% of Turkers earn < $60k/year vs. 45% of Internet users • smaller families • 55% of Turkers have no children vs. 40% of Internet users
Get Another Label? Improving Data Quality and Data Mining Using Multiple, Noisy Labelers Victor S. Sheng, Foster Provost, and Panagiotis G. Ipeirotis New York University KDD 2008
Repeated Labeling • Imperfect labeling sources • Amazon Mechanical Turk • Games with a purpose (GWAP) • Repeated labeling • Improves supervised induction • Increases single-label accuracy • Decreases the cost of acquiring training data
Repeated Labeling • Increases single-label accuracy • Decreases the cost of training data • Labeling is cheap (using MTurk or GWAP) • Obtaining a new data sample can be expensive (e.g., taking new pictures, feature extraction)
Issues Discussed • How repeated labeling influences • the quality of the labels • the accuracy of the model • the cost of acquiring data and labels • Selection of data points to label repeatedly
Label Quality • Uniform labeler quality • All labelers exhibit the same quality p • p is the probability that a labeler labels an example correctly • For 2N+1 labelers under majority voting, the label quality q is q = \sum_{i=N+1}^{2N+1} \binom{2N+1}{i} p^i (1-p)^{2N+1-i} • (Plot omitted: label quality q for different settings of p as the number of labelers grows)
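A minimal Python sketch (not from the slides; the function name is illustrative) of this majority-vote quality q:

```python
from math import comb

def majority_vote_quality(p: float, num_labelers: int) -> float:
    """Probability that a majority vote of `num_labelers` independent
    labelers, each correct with probability p, yields the correct label.
    num_labelers is assumed odd (2N+1)."""
    n = num_labelers
    # The majority is correct when at least N+1 of the 2N+1 labelers are.
    k_min = n // 2 + 1
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k_min, n + 1))

# Example: with p = 0.7, quality rises from 0.7 (1 labeler) to ~0.84 (5 labelers).
for n in (1, 3, 5, 11):
    print(n, round(majority_vote_quality(0.7, n), 3))
```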
Label Quality • Different labeler qualities • Repeated labeling is helpful in some cases • An example: three labelers with qualities p, p+d, p−d • Majority voting over all three is preferable to the single best labeler (quality p+d) when the setting lies in the blue region of the paper's figure • No detailed analysis in the paper
Label Quality • Majority voting (MV) • Simple and intuitive • Drawback: information is lost • Uncertainty-preserving labeling • Multiplied Examples procedure (ME) • Uses the label frequencies as weights on copies of the example (see the sketch below)
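A minimal sketch of the ME idea under the frequency-as-weight reading above (the function name is illustrative):

```python
from collections import Counter

def multiplied_examples(x, labels):
    """Multiplied Examples (ME): instead of collapsing repeated labels into
    a single majority label, emit one weighted copy of x per observed class,
    weighted by that class's empirical frequency."""
    counts = Counter(labels)
    total = len(labels)
    return [(x, label, count / total) for label, count in counts.items()]

# Example: labels {+, +, -} yield (x, '+', 2/3) and (x, '-', 1/3),
# which a learner supporting instance weights can consume directly.
print(multiplied_examples("example_1", ["+", "+", "-"]))
```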
Repeated Labeling Procedure • Round-robin strategy • Label the example that currently has the fewest labels • i.e., repeatedly label the examples in a fixed order
Model Accuracy and Cost • The definition of the cost • CU: the cost of the unlabeled portion (acquiring the example itself) • CL: the cost of one label • Single labeling (SL) • Acquire a new training example • cost CU + CL • Repeated labeling with majority vote (MV) • Get another label for an existing example • cost CL
Model Accuracy and Cost • Round-robin strategy, CU << CL • CU << CL means CU + CL ≈ CL • So the per-acquisition cost is similar for SL and MV • Which strategy (SL or MV) is better? • It depends
Model Accuracy and Cost • Round-robin strategy, general cost • CD: the cost of data acquisition • Tr: number of examples • NL: number of labels • Experiment settings • NL = k·Tr: each example is labeled k times • ρ = CU / CL (see the cost sketch below)
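Under these settings, one natural cost decomposition (an assumption consistent with the slide's definitions, not stated explicitly here) is total cost = Tr·CU + NL·CL = Tr·CL·(ρ + k); a minimal sketch:

```python
def total_cost(tr: int, k: int, rho: float, cl: float = 1.0) -> float:
    """Total cost of acquiring `tr` examples, each labeled k times,
    with per-label cost cl and per-example data cost cu = rho * cl.
    Assumed form: cost = Tr * CU + NL * CL, with NL = k * Tr."""
    cu = rho * cl
    return tr * cu + tr * k * cl  # equals tr * cl * (rho + k)

# Example matching the experiment setting rho = 3, k = 5:
# each example effectively costs 8 label-units.
print(total_cost(tr=100, k=5, rho=3.0))  # 800.0
```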
Model Accuracy and Cost • Experiment result (p = 0.6, ρ = 3, k = 5) • The paper reports 12 dataset experiments (learning curves omitted here)
Selected Repeated Labeling • Select the data points with the highest uncertainty • Which data point should be selected for repeated labeling? • {+, −, +} • {+, +, +, +, +} • Three approaches • Entropy • Label uncertainty • Model uncertainty
Selected Repeated Labeling • Entropy • Pick the most impure label set for repeated labeling • Entropy is not a good measure once noisy labelers are considered • E.g., 6,000 positive and 4,000 negative labels with p = 0.6: the entropy is high, yet the example is almost certainly positive (see the sketch below)
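A minimal sketch of the entropy computation behind this example (illustrative function name):

```python
from math import log2

def entropy(pos: int, neg: int) -> float:
    """Shannon entropy (in bits) of the observed label frequencies."""
    total = pos + neg
    h = 0.0
    for c in (pos, neg):
        if c:
            p = c / total
            h -= p * log2(p)
    return h

# 6000 vs. 4000 labels: entropy stays high (~0.971 bits), flagging the
# example as "uncertain" -- even though with p = 0.6 its true label is
# almost certainly positive. Entropy measures impurity, not posterior
# uncertainty about the true label.
print(entropy(6000, 4000))  # ~0.971
```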
Selected Repeated Labeling • Label uncertainty (LU) • Lpos: number of positive labels observed • Lneg: number of negative labels observed • Posterior label probability • p(y) follows the Beta distribution B(Lpos+1, Lneg+1) • The uncertainty can be estimated from the CDF of this Beta distribution around 0.5
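One natural reading of this score, sketched with SciPy (the min-of-tails form around 0.5 is an assumption; consult the paper for the exact definition):

```python
from scipy.stats import beta

def label_uncertainty(l_pos: int, l_neg: int) -> float:
    """Score in [0, 0.5]: the smaller tail of the Beta(Lpos+1, Lneg+1)
    posterior on either side of 0.5. High when the posterior straddles 0.5."""
    cdf_at_half = beta.cdf(0.5, l_pos + 1, l_neg + 1)
    return min(cdf_at_half, 1.0 - cdf_at_half)

# {+, -, +} is far more uncertain than {+, +, +, +, +}:
print(label_uncertainty(2, 1))  # ~0.31
print(label_uncertainty(5, 0))  # ~0.016
```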
Selected Repeated Labeling • Model uncertainty (MU) • The uncertainty of the models' predictions for the example • Estimated from a set of learned models Hi • Label and model uncertainty (LMU) • Combines both the label and model uncertainty
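A heavily hedged sketch of how MU and LMU might be computed: it assumes MU measures how close the ensemble's mean P(+|x) is to 0.5 and LMU is the geometric mean of LU and MU; both forms are assumptions, not formulas quoted from the slides:

```python
from math import sqrt

def model_uncertainty(probs_positive: list[float]) -> float:
    """Assumed form: how close the ensemble's mean P(+|x) is to 0.5.
    probs_positive holds one model's P(+|x) per learned model Hi."""
    mean_p = sum(probs_positive) / len(probs_positive)
    return 0.5 - abs(0.5 - mean_p)  # in [0, 0.5], maximal at P(+|x) = 0.5

def lmu(label_unc: float, model_unc: float) -> float:
    """Label-and-model uncertainty, combined here as a geometric mean
    (one plausible combination rule; check the paper for the exact one)."""
    return sqrt(label_unc * model_unc)

# Example: an example whose labels and model predictions both sit near
# the decision boundary gets the highest combined score.
print(lmu(label_unc=0.31, model_unc=model_uncertainty([0.45, 0.55, 0.5])))
```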
Selected Repeated Labeling • Results, for p = 0.6 (comparison plots omitted) • Notation • GRR: general round-robin strategy • MU: model uncertainty • LU: label uncertainty • LMU: label and model uncertainty
Conclusion • Under a wide range of conditions: • Repeated labeling can improve the quality of both labels and models • Selective repeated labeling can improve quality further • Repeated labeling can reduce the cost of acquiring examples and labels • Assumptions • Fixed labeler quality and cost • Experiments were conducted with only one learning algorithm
Conclusion (2) Amazon Mechanical Turk provides a platform for easily collecting non-expert opinions. The collected data can be useful when combined with proper data-integration algorithms, such as repeated labeling.