Explore how to improve crowdsourced data quality by leveraging redundancy and rejecting spammers. Learn cost-efficient methods to ensure accurate results and combat biases in worker classifications.
Spam? No, thanks! Panos Ipeirotis – New York University “Crowdsourcing Work” Meetup
Panos Ipeirotis - Introduction • New York University, Stern School of Business “A Computer Scientist in a Business School” http://behind-the-enemy-lines.blogspot.com/ Email: panos@nyu.edu
Example: Build an Adult Web Site Classifier • Need a large number of hand-labeled sites • Get people to look at sites and classify them as: G (general), PG (parental guidance), R (restricted), X (porn) • Cost/Speed Statistics • Undergrad intern: 200 websites/hr, cost: $15/hr • MTurk: 2500 websites/hr, cost: $12/hr
Bad news: Spammers! • Worker ATAMRO447HWJQ • labeled X (porn) sites as G (general audience)
Improve Data Quality through Repeated Labeling • Get multiple, redundant labels using multiple workers • Pick the correct label based on majority vote • 1 worker: 70% correct; 11 workers: 93% correct • Probability of correctness increases with number of workers • Probability of correctness increases with quality of workers
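The two claims on this slide (more workers help, better workers help) can be checked with a minimal sketch. It assumes a binary task, independent workers who are each correct with the same probability p, and an odd number of votes; `majority_vote` and `majority_accuracy` are illustrative names, not part of any tool mentioned in the talk.

```python
from collections import Counter
from math import comb


def majority_vote(labels):
    """Return the most common label; ties go to the label seen first."""
    counts = Counter(labels)
    best = max(counts.values())
    for label in labels:  # first label (in input order) reaching the top count
        if counts[label] == best:
            return label


def majority_accuracy(p, n):
    """Probability that a strict majority of n independent workers, each correct
    with probability p, picks the correct label (binary task, odd n)."""
    if n % 2 == 0:
        raise ValueError("use an odd number of workers to avoid ties")
    need = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


print(majority_vote(["G", "X", "X", "G", "X"]))  # X
for n in (1, 3, 5, 11):
    print(n, round(majority_accuracy(0.7, n), 3))
```

Under this simple model, 11 workers at 70% accuracy give about 0.922, close to the 93% on the slide; the real task has four classes and workers of varying quality, so the match is only approximate.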
But Majority Voting is Expensive • Single-vote statistics • MTurk: 2500 websites/hr, cost: $12/hr • Undergrad: 200 websites/hr, cost: $15/hr • 11-vote statistics • MTurk: 227 websites/hr, cost: $12/hr • Undergrad: 18 websites/hr, cost: $15/hr
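The 11-vote figures follow from dividing throughput by the number of votes per site: the hourly cost stays the same, but each finished site costs 11 times as much. A small sketch of that arithmetic, assuming labor is the only cost (`redundancy_stats` is a hypothetical helper):

```python
def redundancy_stats(sites_per_hour, cost_per_hour, votes):
    """Throughput and cost per finished site when each site needs `votes` labels.
    Assumes labor is the only cost and labelers work at the rates given."""
    finished_per_hour = sites_per_hour / votes
    cost_per_site = cost_per_hour / finished_per_hour
    return finished_per_hour, cost_per_site


for name, rate, wage in [("MTurk", 2500, 12.0), ("Undergrad", 200, 15.0)]:
    for votes in (1, 11):
        per_hour, per_site = redundancy_stats(rate, wage, votes)
        print(f"{name:<9} {votes:>2} vote(s): {per_hour:6.1f} sites/hr, ${per_site:.4f}/site")
```

For MTurk the cost per site rises from $0.0048 to $0.0528; for the undergrad it rises from $0.075 to $0.825.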
Using redundant votes, we can infer worker quality • Look at our spammer friend ATAMRO447HWJQ together with 9 other workers • We can compute error rates for each worker (P[X → G] is the probability that the worker labels a true X site as G) • Error rates for ATAMRO447HWJQ • P[X → X]=9.847% P[X → G]=90.153% • P[G → X]=0.053% P[G → G]=99.947% • Our “friend” ATAMRO447HWJQ mainly marked sites as G. Obviously a spammer…
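One way to get such error rates is sketched below, under a simplifying assumption: the majority vote of the *other* workers on each item stands in for the true label, and each worker's labels are tallied against it in a single pass. Fuller approaches iterate between estimating true labels and worker error rates (EM in the style of Dawid and Skene); this sketch does not. `estimate_confusion` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_confusion(labels):
    """Estimate per-worker error rates from redundant labels.

    labels: dict mapping (worker, item) -> assigned label.
    Returns {worker: {(reference, assigned): rate}}, where `reference` is the
    majority vote of the *other* workers on the item (a stand-in for the truth).
    """
    by_item = defaultdict(dict)  # item -> {worker: label}
    for (worker, item), label in labels.items():
        by_item[item][worker] = label

    counts = defaultdict(Counter)  # worker -> Counter of (reference, assigned)
    for votes in by_item.values():
        for worker, assigned in votes.items():
            others = [lab for w, lab in votes.items() if w != worker]
            if not others:
                continue  # nobody else labeled this item
            reference = Counter(others).most_common(1)[0][0]
            counts[worker][(reference, assigned)] += 1

    confusion = {}
    for worker, pair_counts in counts.items():
        row_totals = Counter()  # how often each reference class was seen
        for (ref, _), c in pair_counts.items():
            row_totals[ref] += c
        confusion[worker] = {
            pair: c / row_totals[pair[0]] for pair, c in pair_counts.items()
        }
    return confusion


# Toy data: three careful workers and one who answers G for everything.
truth = {"i1": "X", "i2": "X", "i3": "G", "i4": "G"}
labels = {(w, item): t for item, t in truth.items() for w in ("w1", "w2", "w3")}
labels.update({("spammer", item): "G" for item in truth})
print(estimate_confusion(labels)["spammer"])  # {('X', 'G'): 1.0, ('G', 'G'): 1.0}
```

The spammer's profile (every X marked as G, every G marked as G) has the same shape as ATAMRO447HWJQ's error rates above.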
Rejecting spammers and benefits • Random answers: error rate = 50% • Average error rate for ATAMRO447HWJQ: 45.2% • P[X → X]=9.847% P[X → G]=90.153% • P[G → X]=0.053% P[G → G]=99.947% • Action: REJECT and BLOCK • Results: • Over time you block all spammers • Spammers learn to avoid your HITs • You can decrease redundancy, as quality of workers is higher
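A minimal sketch of turning a worker's error rates into a single average error rate and comparing it with random guessing. It assumes uniform class priors (the slide does not say how its figure is weighted), and the `margin` threshold in `looks_like_spammer` is a hypothetical rule, not one given in the talk.

```python
def average_error_rate(confusion, priors=None):
    """confusion: {true: {assigned: prob}}, each row summing to 1.
    Returns the prior-weighted probability that the assigned label is wrong.
    `priors` defaults to uniform over the true classes (an assumption)."""
    classes = list(confusion)
    if priors is None:
        priors = {c: 1 / len(classes) for c in classes}
    return sum(priors[t] * (1 - confusion[t].get(t, 0.0)) for t in classes)


def looks_like_spammer(confusion, margin=0.1):
    """Hypothetical rule: flag workers whose error rate is within `margin`
    of random guessing, which errs with probability 1 - 1/K for K classes."""
    random_error = 1 - 1 / len(confusion)
    return average_error_rate(confusion) >= random_error - margin


ATAMRO447HWJQ = {
    "X": {"X": 0.09847, "G": 0.90153},
    "G": {"X": 0.00053, "G": 0.99947},
}
print(f"{average_error_rate(ATAMRO447HWJQ):.1%}")  # 45.1%
print(looks_like_spammer(ATAMRO447HWJQ))  # True
```

With uniform priors this gives 45.1% rather than the slide's 45.2%; the slide's class weights are not given. The same formula gives 45.0% for the biased worker ATLJIK76YH1TF discussed next, which is why a raw error rate is only a crude spam signal.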
After rejecting spammers, quality goes up • Spam keeps quality down • Without spam, workers are of higher quality • Need less redundancy for same quality • Same quality of results for lower cost • With spam: 1 worker 70% correct, 11 workers 93% correct • Without spam: 1 worker 80% correct, 5 workers 94% correct
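"Need less redundancy for same quality" can be made concrete with the same binomial majority-vote model, assuming independent workers of equal accuracy on a binary task. The spam-free figure fits it well: 5 workers at 80% accuracy give 0.942. `min_workers` is a hypothetical helper that searches odd team sizes.

```python
from math import comb


def majority_accuracy(p, n):
    """P(strict majority of n independent workers is correct), binary task, odd n."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n // 2 + 1, n + 1))


def min_workers(p, target, max_n=101):
    """Smallest odd n whose majority vote reaches `target` accuracy, or None."""
    for n in range(1, max_n + 1, 2):
        if majority_accuracy(p, n) >= target:
            return n
    return None


print(min_workers(0.8, 0.93))  # 5
print(round(majority_accuracy(0.8, 5), 3))  # 0.942
```

Under this model, reaching 90% accuracy takes 9 workers at p = 0.7 but only 5 at p = 0.8, so blocking spammers directly cuts the number of paid labels per site.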
Correcting biases • Classifying sites as G, PG, R, X • Sometimes workers are careful but biased • Example: worker ATLJIK76YH1TF classifies G → P and P → R (P = PG) • Average error rate for ATLJIK76YH1TF: 45.0% • Error rates for worker ATLJIK76YH1TF: • P[G → G]=20.0% P[G → P]=80.0% P[G → R]=0.0% P[G → X]=0.0% • P[P → G]=0.0% P[P → P]=0.0% P[P → R]=100.0% P[P → X]=0.0% • P[R → G]=0.0% P[R → P]=0.0% P[R → R]=100.0% P[R → X]=0.0% • P[X → G]=0.0% P[X → P]=0.0% P[X → R]=0.0% P[X → X]=100.0% • Is ATLJIK76YH1TF a spammer?
Correcting biases • Error rates for worker ATLJIK76YH1TF: • P[G → G]=20.0% P[G → P]=80.0% P[G → R]=0.0% P[G → X]=0.0% • P[P → G]=0.0% P[P → P]=0.0% P[P → R]=100.0% P[P → X]=0.0% • P[R → G]=0.0% P[R → P]=0.0% P[R → R]=100.0% P[R → X]=0.0% • P[X → G]=0.0% P[X → P]=0.0% P[X → R]=0.0% P[X → X]=100.0% • For ATLJIK76YH1TF, we simply need to compute the “non-recoverable” error rate (technical details omitted) • Non-recoverable error rate for ATLJIK76YH1TF: 9% • Technical hint: the “condition number” of the matrix (how easy it is to invert the matrix) is a good indicator of spamminess
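To see why a biased worker is still useful, the worker's labels can be "inverted" with Bayes' rule: P[true = t | assigned = a] ∝ prior(t) · P[t → a]. The sketch below does this for ATLJIK76YH1TF under an assumed uniform prior over the four classes. It illustrates the idea only; it is not the talk's (omitted) non-recoverable error computation and does not reproduce the 9% figure. `posterior_true_class` is a hypothetical name.

```python
def posterior_true_class(confusion, priors, assigned):
    """Bayes' rule: P[true = t | worker assigned `assigned`] ∝ priors[t] * P[t → assigned]."""
    weights = {t: priors[t] * row.get(assigned, 0.0) for t, row in confusion.items()}
    total = sum(weights.values())
    if total == 0:
        raise ValueError(f"worker never assigns {assigned!r}")
    return {t: w / total for t, w in weights.items() if w > 0}


# Error rates from the slide; "P" stands for PG, zero entries omitted.
ATLJIK76YH1TF = {
    "G": {"G": 0.2, "P": 0.8},
    "P": {"R": 1.0},
    "R": {"R": 1.0},
    "X": {"X": 1.0},
}
uniform = {c: 0.25 for c in ATLJIK76YH1TF}  # assumed priors, not from the talk

for label in ("G", "P", "R", "X"):
    print(label, "->", posterior_true_class(ATLJIK76YH1TF, uniform, label))
```

A G or P from this worker always means G, and an X always means X, so those labels are fully recoverable. An R could be either PG or R: that ambiguity cannot be undone, and it corresponds to the two identical rows that make the matrix hard to invert.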
Too much theory? Open source implementation available at: http://code.google.com/p/get-another-label/ • Input: • Labels from Mechanical Turk • Cost of incorrect labelings (e.g., X → G costlier than G → X) • Output: • Corrected labels • Worker error rates • Ranking of workers according to their quality • Alpha version, more improvements to come! • Suggestions and collaborations welcome!
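Misclassification costs matter once a label is uncertain: with a posterior over true classes, one can pick the label with the lowest expected cost rather than the most likely one. The sketch below shows that decision rule with made-up costs; it is not get-another-label's input format or API, and the 10:1 cost ratio is purely hypothetical.

```python
def min_cost_label(posterior, cost):
    """posterior: {true: prob}; cost: {(true, chosen): cost}.
    Unlisted pairs cost 0 if correct and 1 if wrong (an assumption).
    Returns (label, expected_cost) minimizing expected misclassification cost."""
    candidates = {chosen for (_, chosen) in cost} | set(posterior)

    def expected(chosen):
        return sum(
            p * cost.get((t, chosen), 0.0 if t == chosen else 1.0)
            for t, p in posterior.items()
        )

    best = min(sorted(candidates), key=expected)
    return best, expected(best)


# Hypothetical costs: showing porn as G is 10x worse than the reverse.
cost = {("X", "G"): 10.0, ("G", "X"): 1.0}
print(min_cost_label({"X": 0.2, "G": 0.8}, cost))  # ('X', 0.8)
```

Even though G is the more likely class here, labeling the site G risks an expected cost of 2.0 versus 0.8 for X, so the cost-aware choice is X.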
Thank you! Questions? “A Computer Scientist in a Business School” http://behind-the-enemy-lines.blogspot.com/ Email: panos@nyu.edu