
Amazon Mechanical Turk New York City Meet Up


Presentation Transcript


  1. Amazon Mechanical Turk New York City Meet Up WELCOME! September 1, 2009

  2. AGENDA • Welcoming Statements • Introductions • Dolores Labs – Video Directory Use Case • Knewton – Adaptive Learning Use Case • FreedomOSS – Enterprise Integration • New York University – Worker Quality Solution • Panel Questions and Answers

  3. Amazon Mechanical Turk Requester Meetup (Howie Liu, Dolores Labs)

  4. Dolores Labs Introduction • Founded in 2008 by Lukas Biewald (previously Senior Scientist at Powerset/MSFT; Yahoo! Search; Stanford AI Lab) • Recognized the enormous potential of the AMT platform • Dolores Labs develops quality control technology (CrowdControl™) to make AMT more accessible and reliable

  5. Case Study A large video directory needed to select relevant thumbnails for 200k+ videos

  6. Why Mechanical Turk? • Size of the project and turnover speed made MTurk the obvious solution • Given the needs of the client, traditional outsourcing or hiring employees was not an option • However, the client was concerned about the quality of results • Inherent variability of Mechanical Turk workers: unlike other Amazon marketplaces, workers are not a perfect commodity • Significant variations in quality (accuracy) • Need to ensure workers diligently completed work • Intelligently aggregate multiple responses to find the single best thumbnail for a video

  7. 3 Step Process for Optimizing the Task

  8. High Quality on Mechanical Turk: Best Practices • Statistical inference algorithms to dynamically assess quality • …of each worker, of each result • …while the task is live • Smart allocation of worker resources • Blindly increasing redundancy is expensive • Aggregating all responses from workers with varying quality into a single “best” answer • White paper with Stanford AI Lab about quality on AMT: http://bit.ly/DLpaper
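The aggregation idea in slide 8 can be sketched as quality-weighted voting: instead of counting every worker equally, weight each response by the worker's estimated accuracy. This is a minimal sketch, not Dolores Labs' actual CrowdControl algorithm; the worker IDs and accuracy figures are invented.

```python
from collections import defaultdict

def weighted_vote(responses, accuracy):
    """Pick the label with the highest accuracy-weighted support.

    responses: list of (worker_id, label) pairs for one item
    accuracy:  estimated per-worker accuracy; unknown workers get a neutral 0.5
    """
    scores = defaultdict(float)
    for worker, label in responses:
        scores[label] += accuracy.get(worker, 0.5)
    return max(scores, key=scores.get)

# Two historically reliable workers outvote one unreliable worker
# on which thumbnail best represents a video.
responses = [("w1", "thumb_A"), ("w2", "thumb_A"), ("w3", "thumb_B")]
accuracy = {"w1": 0.9, "w2": 0.8, "w3": 0.3}
best = weighted_vote(responses, accuracy)  # "thumb_A"
```

With weights like these, a single high-accuracy worker can also outvote two low-accuracy workers, which is exactly why blindly increasing redundancy is unnecessary once per-worker quality is known.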

  9. Other Insights • Clear task instructions are crucial for good results • Garbage in, garbage out • An intuitive and efficient task interface makes the task faster (read: cheaper) and more fun! • Mechanical Turk is an unprecedented, hyper-efficient labor marketplace • Need to understand its dynamics through experience in order to harness its power

  10. Amazon Mechanical Turk Requester Meetup (Dahn Tamir, Knewton Inc.)

  11. Knewton - Introduction • Live online GMAT and LSAT prep courses customized for each student, powered by the world’s most advanced adaptive learning engine. • Selected to the 2009 AlwaysOn Global 250 List. Named Category Winner in the Digital Education field.

  12. How we use MTurk • Calibration for computer-adaptive testing • Quality assurance • Focus Groups and Surveys • Database building • Marketing

  13. Why MTurk? • Speed • Cost • Appropriate worker population for each task • Quality

  14. What We Learned • Turkers are a diverse and capable population • Use qualification tests • Invest in building good HITs • Hesitate to reject work (but not cheaters) • Meet Turker Nation

  15. Thank you! --- Questions? dahn@knewton.com 978-KNEWTON

  16. Amazon Mechanical Turk Requester Meet-up (Max Yankelevich, Chief Architect, Freedom OSS)

  17. Freedom OSS - Introduction • Freedom OSS is a professional services organization with a focus on practical implementations using Cloud Computing & Open Source Technologies • International firm • US offices: PA, NYC, GA, KC, NV, WA, NC • 4 large solution centers in Eastern Europe (Russia, Belarus, Ukraine, and Lithuania) • Practical approach to Cloud Computing: most successfully completed Enterprise Cloud Computing projects in the industry • Key Cloud Computing partnerships • Top Amazon AWS Enterprise System Integrator • Top Eucalyptus Enterprise Partner • Key Open Source partnerships • Top Red Hat Advanced Business Partner • #1 JBoss Advanced Business Partner in the US • 2008 “JBoss SOA Innovation” Award Winner • 2007-08 “Practical SOA” Award Winner • 2008 “Red Hat Extensive Ecosystem” Award Winner • Leading technology partner for many Fortune 2000 companies • Freedom is a privately held corporation

  18. MTurk and Enterprise Integration • Most legacy systems are not architected to include human intervention • Providing a technological interface to maintain the workflow while inserting human intelligence and building self-adjudicating business flows • Leveraging Mechanical Turk programmatically in your everyday systems • Freedom OSS has leveraged the power of the Enterprise Service Bus (ESB) & practical Service Oriented Architecture (SOA) to make the process of on-boarding and managing MTurk workers rapid and cost-effective • Using its professional open source ESB, freeESB, Freedom has developed many powerful connectors for some of the most used enterprise systems and technologies such as SAP, Mainframe, Siebel, Java/J2EE, Oracle, IBM MQ, etc.
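"Leveraging Mechanical Turk programmatically" in slide 18 boils down to creating HITs through the MTurk API. A hedged illustration using today's boto3 SDK (the talk predates boto3, but the CreateHIT operation is the same idea); the reward, timings, and the duplicate-record question are invented for this sketch and are not Freedom OSS's actual integration:

```python
# Minimal QuestionForm XML for a record de-duping HIT (illustrative content).
QUESTION_XML = """<QuestionForm xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2005-10-01/QuestionForm.xsd">
  <Question>
    <QuestionIdentifier>dup_check</QuestionIdentifier>
    <QuestionContent><Text>Are these two customer records the same person?</Text></QuestionContent>
    <AnswerSpecification>
      <SelectionAnswer>
        <Selections>
          <Selection><SelectionIdentifier>yes</SelectionIdentifier><Text>Yes</Text></Selection>
          <Selection><SelectionIdentifier>no</SelectionIdentifier><Text>No</Text></Selection>
        </Selections>
      </SelectionAnswer>
    </AnswerSpecification>
  </Question>
</QuestionForm>"""

def build_hit_request(reward="0.02", assignments=3):
    """Assemble CreateHIT parameters; all values here are illustrative."""
    return {
        "Title": "Check whether two customer records are duplicates",
        "Description": "Compare two records and decide if they refer to the same customer.",
        "Reward": reward,                    # dollars, passed as a string
        "MaxAssignments": assignments,       # redundancy for later adjudication
        "AssignmentDurationInSeconds": 300,
        "LifetimeInSeconds": 86400,
        "Question": QUESTION_XML,
    }

# With AWS credentials configured, this request would be submitted via:
#   boto3.client("mturk").create_hit(**build_hit_request())
```

An ESB-driven flow would call something like this from a workflow step whenever a business rule flags a record, then route the worker answers back into the legacy system.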

  19. Master Data Cleansing & Validation Use Case • Keeping Master Customer Data File (Master Data Management) • Record de-duping • Contact information validation • Traditional MDM tactics • Expensive software • Big Bang approach • Invasive Code Changes to Legacy Applications • Clean and consistent customer data

  20. [Architecture diagram] Legacy applications (Mainframe, Client-Server, Oracle, .NET, SAP, Siebel, etc.) connect through freeESB (routing, transformation, connectivity, QoS) and a Business Rules Engine to Business Process Orchestration & Workflow, which dispatches three escalating Turk tasks against the Master Data — a first simple data-checking task, a second deeper data-checking task, and a third data-edit/trusted task — with real-time API access and real-time events flowing to business applications in the AWS cloud.

  21. Outcome • Low operational costs • Non-invasive data integration • High degree of accuracy due to multi-task distribution • Some best practices when integrating MTurk within an enterprise: • Deliver value incrementally • Inversion of Control

  22. Thank you! --- Questions?

  23. Amazon Mechanical Turk Requester Meetup (Panos Ipeirotis, New York University)

  24. Panos Ipeirotis - Introduction • New York University, Stern School of Business “A Computer Scientist in a Business School” http://behind-the-enemy-lines.blogspot.com/ Email: panos@nyu.edu

  25. Example: Build an Adult Web Site Classifier • Need a large number of hand-labeled sites • Get people to look at sites and classify them as: G (general), PG (parental guidance), R (restricted), X (porn) • Cost/Speed Statistics • Undergrad intern: 200 websites/hr, cost: $15/hr • MTurk: 2500 websites/hr, cost: $12/hr

  26. Bad news: spammers! • Worker ATAMRO447HWJQ labeled X (porn) sites as G (general audience)

  27. Improve Data Quality through Repeated Labeling • Get multiple, redundant labels from multiple workers • Pick the correct label based on majority vote: 1 worker → 70% correct; 11 workers → 93% correct • Probability of correctness increases with the number of workers • Probability of correctness increases with the quality of workers
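The jump from 70% to roughly 93% on slide 27 is the standard binomial majority-vote calculation, under the assumption of independent workers with equal accuracy:

```python
from math import comb

def p_majority_correct(p, n):
    """Probability that a majority of n independent workers,
    each correct with probability p, picks the right label (n odd)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

p_majority_correct(0.7, 1)   # 0.70
p_majority_correct(0.7, 11)  # ≈ 0.92, roughly the 93% quoted on the slide
```

The same formula reproduces slide 31's no-spam numbers: with 80%-accurate workers, `p_majority_correct(0.8, 5)` is about 0.94.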

  28. But Majority Voting is Expensive • Single Vote Statistics • MTurk: 2500 websites/hr, cost: $12/hr • Undergrad: 200 websites/hr, cost: $15/hr • 11-vote Statistics • MTurk: 227 websites/hr, cost: $12/hr • Undergrad: 200 websites/hr, cost: $15/hr
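The economics behind slide 28, worked out as cost per labeled site using the throughput and hourly-cost figures from the slide:

```python
def cost_per_label(sites_per_hr, dollars_per_hr):
    """Effective cost of labeling one site at a given throughput."""
    return dollars_per_hr / sites_per_hr

mturk_single = cost_per_label(2500, 12)       # $0.0048 per site
mturk_11vote = cost_per_label(2500 / 11, 12)  # $0.0528 per site (~227 sites/hr)
undergrad    = cost_per_label(200, 15)        # $0.0750 per site
# Even with 11-fold redundancy, MTurk stays cheaper per site than the intern,
# but the 11x multiplier is what makes blind redundancy expensive.
```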

  29. Using redundant votes, we can infer worker quality • Look at our spammer friend ATAMRO447HWJQ together with the other 9 workers • We can compute error rates for each worker • Error rates for ATAMRO447HWJQ: P[X → X]=9.847%, P[X → G]=90.153%; P[G → X]=0.053%, P[G → G]=99.947% • Our “friend” ATAMRO447HWJQ mainly marked sites as G. Obviously a spammer…
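Per-worker error rates like those on slide 29 can be tallied once each site has a reference label (e.g., the majority vote of the other workers). A sketch with invented data, not the talk's actual estimation procedure:

```python
from collections import Counter, defaultdict

def error_rates(annotations, truth, classes=("G", "X")):
    """Per-worker confusion rates P[true -> given], measured against
    inferred true labels (e.g., majority votes).

    annotations: list of (item, worker, given_label) triples
    truth:       item -> inferred true label
    """
    counts = defaultdict(Counter)
    for item, worker, label in annotations:
        counts[worker][(truth[item], label)] += 1
    rates = {}
    for worker, c in counts.items():
        rates[worker] = {}
        for t in classes:
            total = sum(c[(t, g)] for g in classes)
            for g in classes:
                rates[worker][(t, g)] = c[(t, g)] / total if total else 0.0
    return rates

# A worker who labels every site G looks exactly like the slide's spammer:
annotations = [("s1", "spam", "G"), ("s2", "spam", "G"),
               ("s3", "spam", "G"), ("s4", "spam", "G")]
truth = {"s1": "X", "s2": "X", "s3": "X", "s4": "G"}
r = error_rates(annotations, truth)
# r["spam"][("X", "G")] == 1.0 and r["spam"][("G", "G")] == 1.0
```

High P[X → G] alongside high P[G → G] is precisely the always-answer-G signature the next slide uses to reject the spammer.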

  30. Rejecting spammers, and the benefits • Random answers would give an error rate of 50%; average error rate for ATAMRO447HWJQ: 45.2% • P[X → X]=9.847%, P[X → G]=90.153% • P[G → X]=0.053%, P[G → G]=99.947% • Action: REJECT and BLOCK • Results: • Over time you block all spammers • Spammers learn to avoid your HITs • You can decrease redundancy, as the quality of workers is higher

  31. After rejecting spammers, quality goes up • Spam keeps quality down • Without spam, workers are of higher quality • Need less redundancy for the same quality • Same quality of results at lower cost • With spam: 1 worker 70% correct; 11 workers 93% correct • Without spam: 1 worker 80% correct; 5 workers 94% correct

  32. Correcting biases • Classifying sites as G, PG, R, X (PG written as P below) • Sometimes workers are careful but biased • Classifies G → P and P → R • Average error rate for ATLJIK76YH1TF: 45.0% • Error rates for worker ATLJIK76YH1TF: P[G → G]=20.0%, P[G → P]=80.0%, P[G → R]=0.0%, P[G → X]=0.0%; P[P → G]=0.0%, P[P → P]=0.0%, P[P → R]=100.0%, P[P → X]=0.0%; P[R → G]=0.0%, P[R → P]=0.0%, P[R → R]=100.0%, P[R → X]=0.0%; P[X → G]=0.0%, P[X → P]=0.0%, P[X → R]=0.0%, P[X → X]=100.0% • Is ATLJIK76YH1TF a spammer?

  33. Correcting biases • Error rates for worker ATLJIK76YH1TF: P[G → G]=20.0%, P[G → P]=80.0%, P[G → R]=0.0%, P[G → X]=0.0%; P[P → G]=0.0%, P[P → P]=0.0%, P[P → R]=100.0%, P[P → X]=0.0%; P[R → G]=0.0%, P[R → P]=0.0%, P[R → R]=100.0%, P[R → X]=0.0%; P[X → G]=0.0%, P[X → P]=0.0%, P[X → R]=0.0%, P[X → X]=100.0% • For ATLJIK76YH1TF, we simply need to compute the “non-recoverable” error rate (technical details omitted) • Non-recoverable error rate for ATLJIK76YH1TF: 9%
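Why a careful-but-biased worker is still valuable: applying Bayes' rule to the worker's confusion matrix can undo a consistent shift. The slide omits the actual "non-recoverable error rate" computation, so this is only an illustration of the idea, using rounded rates from ATLJIK76YH1TF's table and a uniform prior over classes:

```python
def posterior(observed, confusion, prior):
    """P(true class | worker reported `observed`), by Bayes' rule
    from the worker's confusion matrix P[true -> observed]."""
    unnorm = {t: prior[t] * confusion[t].get(observed, 0.0) for t in prior}
    z = sum(unnorm.values())
    return {t: v / z for t, v in unnorm.items()}

# The biased worker: shifts G -> P and P -> R, but does so consistently.
confusion = {
    "G": {"G": 0.2, "P": 0.8},
    "P": {"R": 1.0},
    "R": {"R": 1.0},
    "X": {"X": 1.0},
}
prior = {"G": 0.25, "P": 0.25, "R": 0.25, "X": 0.25}
post = posterior("P", confusion, prior)  # this worker's "P" implies true class G
```

A reported "P" can only have come from a true G under this matrix, so it is fully recoverable; a reported "R", by contrast, is genuinely ambiguous between true P and true R, and residual ambiguity like that is what the slide's non-recoverable error rate measures.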

  34. Too much theory? Open source implementation available at: http://code.google.com/p/get-another-label/ • Input: • Labels from Mechanical Turk • Cost of incorrect labelings (e.g., X → G costlier than G → X) • Output: • Corrected labels • Worker error rates • Ranking of workers according to their quality • Alpha version, more improvements to come! • Suggestions and collaborations welcome!

  35. Thank you! Questions? “A Computer Scientist in a Business School” http://behind-the-enemy-lines.blogspot.com/ Email: panos@nyu.edu
