Amazon Mechanical Turk
New York City Meet Up
WELCOME!
September 1, 2009
AGENDA
• Welcoming Statements
• Introductions
• Dolores Labs – Video Directory Use Case
• Knewton – Adaptive Learning Use Case
• FreedomOSS – Enterprise Integration
• New York University – Worker Quality Solution
• Panel Questions and Answers
Dolores Labs Introduction
• Founded in 2008 by Lukas Biewald, Senior Scientist, Powerset (MSFT); Yahoo! Search; Stanford AI Lab
• Recognized enormous potential of AMT platform
• Dolores Labs develops quality control technology (CrowdControl™) to make AMT more accessible and reliable
Case Study
A large video directory needed to select relevant thumbnails for 200k+ videos
Why Mechanical Turk?
• Size of project and turnover speed made MTurk the obvious solution
  • Given the needs of the client, traditional outsourcing or hiring employees was not an option
• However, the client was concerned about quality of results
• Inherent variability of Mechanical Turk workers
  • Unlike other Amazon marketplaces, workers are not a perfect commodity
  • Significant variations in quality (accuracy)
• Need to ensure workers diligently completed work
• Intelligently aggregate multiple responses to find the single best thumbnail for a video
High Quality on Mechanical Turk: Best Practices
• Statistical inference algorithms to dynamically assess quality
  • …of each worker, of each result
  • …while the task is live
• Smart allocation of worker resources
  • Blindly increasing redundancy is expensive
• Aggregating all responses from workers with varying quality into a single “best” answer
White paper with Stanford AI Lab about quality on AMT: http://bit.ly/DLpaper
Other Insights
• Clear task instructions are crucial for good results
  • Garbage in, garbage out
• Intuitive and efficient task interface makes the task faster (read: cheaper) and more fun!
• Mechanical Turk is an unprecedented, hyper-efficient labor marketplace
  • Need to understand its dynamics through experience in order to harness its power
Amazon Mechanical Turk Requester Meetup
Dahn Tamir, Knewton Inc.
Knewton - Introduction
• Live online GMAT and LSAT prep courses customized for each student, powered by the world’s most advanced adaptive learning engine.
• Selected to the 2009 AlwaysOn Global 250 List. Named Category Winner in the Digital Education field.
How we use MTurk
• Calibration for computer-adaptive testing
• Quality assurance
• Focus groups and surveys
• Database building
• Marketing
Why MTurk?
• Speed
• Cost
• Appropriate worker population for each task
• Quality
What We Learned
• Turkers are a diverse and capable population
• Use qualification tests
• Invest in building good HITs
• Hesitate to reject work (but not cheaters)
• Meet Turker Nation
Amazon Mechanical Turk Requester Meet-up
(Max Yankelevich, Chief Architect – Freedom OSS)
Freedom OSS - Introduction
• Freedom OSS is a professional services organization with a focus on Practical Implementations using Cloud Computing & Open Source Technologies
• International Firm
  • US Offices: PA, NYC, GA, KC, NV, WA, NC
  • 4 Large Solution Centers in Eastern Europe (Russia, Belarus, Ukraine and Lithuania)
• Practical Approach to Cloud Computing – most successfully completed Enterprise Cloud Computing projects in the Industry
• Key Cloud Computing Partnerships
  • Top Amazon AWS Enterprise System Integrator
  • Top Eucalyptus Enterprise Partner
• Key Open Source Partnerships
  • Top Red Hat Advanced Business Partner
  • #1 JBoss Advanced Business Partner in US
  • 2008 “JBoss SOA Innovation” Award Winner
  • 2007-08 “Practical SOA” Award Winner
  • 2008 “Red Hat Extensive Ecosystem” Award Winner
• Leading technology partner for many Fortune 2000 companies
• Freedom is a privately held corporation
MTurk and Enterprise Integration
• Most legacy systems are not architected to include human intervention
• Providing a technological interface to maintain the workflow while inserting human intelligence and building self-adjudicating business flows
• Leveraging Mechanical Turk programmatically in your everyday systems
• Freedom OSS has leveraged the power of Enterprise Service Bus (ESB) & Practical Service Oriented Architecture (SOA) to make on-boarding and managing MTurk workers rapid and cost-effective
• Using its Professional Open Source ESB, freeESB, Freedom has developed many powerful Connectors for some of the most used Enterprise Systems and Technologies such as SAP, Mainframe, Siebel, Java/J2EE, Oracle, IBM MQ, etc.
Master Data Cleansing & Validation Use Case
• Keeping Master Customer Data File (Master Data Management)
  • Record de-duping
  • Contact information validation
• Traditional MDM tactics
  • Expensive software
  • Big Bang approach
  • Invasive Code Changes to Legacy Applications
• Clean and consistent customer data
[Architecture diagram; components shown:]
• AWS Cloud
• Business Applications API
• Real-time access; real-time events; master data
• First Turk Task – Simple Data Checking
• Second Turk Task – Deeper Data Checking
• Third Turk Task – Data Edit/Trusted Task
• Business Process Orchestration & Workflow
• freeESB: Routing, Transformation, Connectivity, QoS
• Business Rules Engine
• Legacy Applications: Mainframe, Client-Server, Oracle, .NET, SAP, Siebel, etc.
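The three tiered Turk tasks named in the diagram suggest an escalation flow. A minimal Python sketch of one way such routing could work is below. Everything in it is an assumption for illustration: the function names, the record fields, and the escalation rule itself (a cheap check flags records, a deeper check confirms them, and only confirmed problems reach the trusted edit task). In a real system each function would post a HIT and wait for the answer rather than run a local rule.

```python
from typing import Dict

Record = Dict[str, str]

def looks_suspicious(record: Record) -> bool:
    """Stand-in for the first task (simple data checking). Hypothetical rule."""
    return "@" not in record.get("email", "") or not record.get("phone")

def confirmed_bad(record: Record) -> bool:
    """Stand-in for the second task (deeper data checking). Hypothetical rule."""
    return "@" not in record.get("email", "")

def trusted_edit(record: Record) -> Record:
    """Stand-in for the third task (data edit by a trusted worker)."""
    fixed = dict(record)
    fixed["needs_edit"] = "yes"  # a real task would return the corrected record
    return fixed

def route(record: Record) -> Record:
    """Escalate a record only as far as it needs to go (assumed routing rule)."""
    if not looks_suspicious(record):
        return record            # passed the cheap check
    if not confirmed_bad(record):
        return record            # flagged, but the deeper check cleared it
    return trusted_edit(record)  # confirmed problem: send to the trusted task

clean = {"email": "ann@example.com", "phone": "555-0100"}
broken = {"email": "ann.example.com", "phone": "555-0100"}
print(route(clean))
print(route(broken))
```

The point of the tiers in this sketch is cost: most records stop at the cheapest task, and only a small fraction reach the most expensive one.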
Outcome
• Low operational costs
• Non-invasive data integration
• High degree of accuracy due to multi-task distribution
• Some Best Practices when integrating MTurk within an Enterprise
  • Deliver value incrementally
  • Inversion of Control
Amazon Mechanical Turk Requester Meetup
(Panos Ipeirotis – New York University)
Panos Ipeirotis - Introduction
• New York University, Stern School of Business
“A Computer Scientist in a Business School”
http://behind-the-enemy-lines.blogspot.com/
Email: panos@nyu.edu
Example: Build an Adult Web Site Classifier
• Need a large number of hand-labeled sites
• Get people to look at sites and classify them as: G (general), PG (parental guidance), R (restricted), X (porn)
• Cost/Speed Statistics
  • Undergrad intern: 200 websites/hr, cost: $15/hr
  • MTurk: 2500 websites/hr, cost: $12/hr
Bad news: Spammers!
• Worker ATAMRO447HWJQ
  • labeled X (porn) sites as G (general audience)
Improve Data Quality through Repeated Labeling
• Get multiple, redundant labels using multiple workers
• Pick the correct label based on majority vote
• Probability of correctness increases with number of workers
• Probability of correctness increases with quality of workers
[Chart data points: 1 worker, 70% correct; 11 workers, 93% correct]
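The effect of redundancy on majority voting can be sketched with a simple binomial model. The Python below assumes binary labels, an odd number of workers, and workers who are independent and equally accurate; these are simplifying assumptions, not necessarily the model behind the figures on the slide.

```python
from math import comb

def majority_correct(p: float, n: int) -> float:
    """Probability that a strict majority of n independent workers,
    each correct with probability p, picks the right binary label."""
    need = n // 2 + 1  # smallest number of correct votes that wins
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))

for n in (1, 3, 5, 11):
    print(n, round(majority_correct(0.7, n), 3))
```

Under these assumptions, 11 workers at 70% individual accuracy give a majority that is correct roughly 92% of the time, in the same range as the 93% shown on the slide. Raising `p` has a similar effect to raising `n`, which matches the two bullets above.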
But Majority Voting is Expensive
• Single Vote Statistics
  • MTurk: 2500 websites/hr, cost: $12/hr
  • Undergrad: 200 websites/hr, cost: $15/hr
• 11-vote Statistics
  • MTurk: 227 websites/hr, cost: $12/hr
  • Undergrad: 200 websites/hr, cost: $15/hr
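The 11-vote figure follows from dividing the single-vote throughput by the number of votes per site. A short Python check of that arithmetic, using only the numbers on the slide (the helper name `throughput_and_cost` is illustrative):

```python
def throughput_and_cost(sites_per_hour: float, dollars_per_hour: float, votes: int = 1):
    """Return (fully labeled sites per hour, dollars per labeled site)
    when every site needs `votes` separate labels."""
    effective_rate = sites_per_hour / votes
    return effective_rate, dollars_per_hour / effective_rate

mturk_rate, mturk_cost = throughput_and_cost(2500, 12, votes=11)
intern_rate, intern_cost = throughput_and_cost(200, 15)

print(f"MTurk, 11 votes: {mturk_rate:.0f} sites/hr, ${mturk_cost:.4f} per site")
print(f"Undergrad, 1 vote: {intern_rate:.0f} sites/hr, ${intern_cost:.4f} per site")
```

With 11 votes per site, MTurk throughput drops to about the same level as a single intern, so most of the speed advantage is spent on redundancy.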
Using redundant votes, we can infer worker quality
• Look at our spammer friend ATAMRO447HWJQ together with 9 other workers
• We can compute error rates for each worker
• Error rates for ATAMRO447HWJQ
  • P[X → X]=9.847%   P[X → G]=90.153%
  • P[G → X]=0.053%   P[G → G]=99.947%
Our “friend” ATAMRO447HWJQ mainly marked sites as G. Obviously a spammer…
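One simple way to estimate per-worker error rates from redundant votes is to treat the majority label as a stand-in for the truth and count how each worker's labels compare to it. The Python sketch below does exactly that on made-up toy data; it is a simplification for illustration and not necessarily the estimation method used in the talk, which may refine the estimates iteratively. Worker IDs and votes here are hypothetical.

```python
from collections import Counter, defaultdict

def majority_labels(votes):
    """votes: {item: {worker: label}} -> {item: most common label}."""
    return {
        item: Counter(by_worker.values()).most_common(1)[0][0]
        for item, by_worker in votes.items()
    }

def worker_error_rates(votes):
    """Return {worker: {(proxy_true, given): fraction}}, using the
    majority label as a proxy for the true class."""
    truth = majority_labels(votes)
    counts = defaultdict(Counter)  # counts[worker][(proxy_true, given)]
    totals = defaultdict(Counter)  # totals[worker][proxy_true]
    for item, by_worker in votes.items():
        for worker, label in by_worker.items():
            counts[worker][(truth[item], label)] += 1
            totals[worker][truth[item]] += 1
    return {
        w: {pair: c / totals[w][pair[0]] for pair, c in counts[w].items()}
        for w in counts
    }

# Toy data: two careful workers and one who always answers "G".
votes = {
    "site1": {"w1": "X", "w2": "X", "always_g": "G"},
    "site2": {"w1": "G", "w2": "G", "always_g": "G"},
    "site3": {"w1": "X", "w2": "X", "always_g": "G"},
}
rates = worker_error_rates(votes)
print(rates["always_g"])
```

A worker whose estimated P[X → G] is close to 100%, like `always_g` here, shows the same pattern as the spammer on the slide.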
Rejecting spammers and Benefits
Random answers error rate = 50%
Average error rate for ATAMRO447HWJQ: 45.2%
• P[X → X]=9.847%   P[X → G]=90.153%
• P[G → X]=0.053%   P[G → G]=99.947%
Action: REJECT and BLOCK
Results:
• Over time you block all spammers
• Spammers learn to avoid your HITs
• You can decrease redundancy, as quality of workers is higher
After rejecting spammers, quality goes up
• Spam keeps quality down
• Without spam, workers are of higher quality
• Need less redundancy for same quality
• Same quality of results for lower cost
[Chart data points]
• With spam: 1 worker, 70% correct; 11 workers, 93% correct
• Without spam: 1 worker, 80% correct; 5 workers, 94% correct
Correcting biases
• Classifying sites as G, PG, R, X
• Sometimes workers are careful but biased
• Classifies G → P and P → R
• Average error rate for ATLJIK76YH1TF: 45.0%
Error Rates for Worker: ATLJIK76YH1TF
P[G → G]=20.0%   P[G → P]=80.0%   P[G → R]=0.0%     P[G → X]=0.0%
P[P → G]=0.0%    P[P → P]=0.0%    P[P → R]=100.0%   P[P → X]=0.0%
P[R → G]=0.0%    P[R → P]=0.0%    P[R → R]=100.0%   P[R → X]=0.0%
P[X → G]=0.0%    P[X → P]=0.0%    P[X → R]=0.0%     P[X → X]=100.0%
Is ATLJIK76YH1TF a spammer?
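The 45.0% figure is consistent with taking the unweighted mean of the per-class error rates in the matrix above. The slide does not state how the average is computed, so the Python check below assumes that definition.

```python
# Confusion matrix for worker ATLJIK76YH1TF, copied from the slide.
# confusion[true_class][assigned_label] = probability
confusion = {
    "G": {"G": 0.2, "P": 0.8, "R": 0.0, "X": 0.0},
    "P": {"G": 0.0, "P": 0.0, "R": 1.0, "X": 0.0},
    "R": {"G": 0.0, "P": 0.0, "R": 1.0, "X": 0.0},
    "X": {"G": 0.0, "P": 0.0, "R": 0.0, "X": 1.0},
}

def average_error_rate(confusion):
    """Unweighted mean over true classes of P[assigned label != true class].
    Assumed definition; all classes are treated as equally likely."""
    per_class = [1.0 - row[true] for true, row in confusion.items()]
    return sum(per_class) / len(per_class)

print(f"{average_error_rate(confusion):.1%}")
```

By this measure the worker looks almost as bad as someone answering at random, even though the errors follow a consistent pattern.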
Correcting biases
Error Rates for Worker: ATLJIK76YH1TF
P[G → G]=20.0%   P[G → P]=80.0%   P[G → R]=0.0%     P[G → X]=0.0%
P[P → G]=0.0%    P[P → P]=0.0%    P[P → R]=100.0%   P[P → X]=0.0%
P[R → G]=0.0%    P[R → P]=0.0%    P[R → R]=100.0%   P[R → X]=0.0%
P[X → G]=0.0%    P[X → P]=0.0%    P[X → R]=0.0%     P[X → X]=100.0%
• For ATLJIK76YH1TF, we simply need to compute the “non-recoverable” error-rate (technical details omitted)
• Non-recoverable error-rate for ATLJIK76YH1TF: 9%
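The slide omits the technical details, so the following is only an illustration of why a consistent bias is partly recoverable, not the computation behind the 9% figure. If a worker's error rates are known, Bayes' rule turns each assigned label into a distribution over true classes. The Python sketch assumes uniform class priors, which the slide does not specify.

```python
# Confusion matrix for worker ATLJIK76YH1TF, copied from the slide.
confusion = {
    "G": {"G": 0.2, "P": 0.8, "R": 0.0, "X": 0.0},
    "P": {"G": 0.0, "P": 0.0, "R": 1.0, "X": 0.0},
    "R": {"G": 0.0, "P": 0.0, "R": 1.0, "X": 0.0},
    "X": {"G": 0.0, "P": 0.0, "R": 0.0, "X": 1.0},
}
priors = {c: 0.25 for c in confusion}  # assumption: all classes equally likely

def posterior_true_class(confusion, priors, assigned):
    """P[true class | worker assigned this label], by Bayes' rule."""
    joint = {t: priors[t] * confusion[t][assigned] for t in confusion}
    total = sum(joint.values())
    if total == 0:
        return {}  # this worker never assigns that label
    return {t: v / total for t, v in joint.items()}

for label in ("G", "P", "R", "X"):
    print(label, posterior_true_class(confusion, priors, label))
```

Under these assumptions a "P" from this worker always means the site is really G, so that bias can be fully undone. An "R" is ambiguous between true P and true R, and that ambiguity is the kind of error no correction can remove.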
Too much theory?
Open source implementation available at: http://code.google.com/p/get-another-label/
• Input:
  • Labels from Mechanical Turk
  • Cost of incorrect labelings (e.g., X → G costlier than G → X)
• Output:
  • Corrected labels
  • Worker error rates
  • Ranking of workers according to their quality
• Alpha version, more improvements to come!
• Suggestions and collaborations welcomed!
Thank you! Questions?
“A Computer Scientist in a Business School”
http://behind-the-enemy-lines.blogspot.com/
Email: panos@nyu.edu