How Crowdsourcable is Your Task?
Carsten Eickhoff · Arjen P. de Vries
WSDM 2011 Workshop on Crowdsourcing for Search and Data Mining (CSDM 2011), Hong Kong, China, February 9–12, 2011.
Outline • The Crowdsourcing Boom • Crowdsourcing, a Tale of Great Romance • A Journey to the Dark Side of Crowdsourcing • Is all Lost? • Conclusions
The Crowdsourcing Boom • Billions of judgements are being crowdsourced each year • CrowdFlower – judgement volume doubled (2009–2010) • A significant number of research publications rely on crowdsourcing to create scientific resources • ...but is it actually reliable?
Crowdsourcing – A Tale of Great Romance • Summer 2008 • How do I quickly get a large number of judgements? • Task: message grouping for discourse understanding • Crowdsourcing produced very reliable results
Crowdsourcing – A Tale of Great Romance • Fall 2008 • Crowdsourcing has become a standard data source • The excitement wears off
Crowdsourcing – A Tale of Great Romance • A dark and cold day in late autumn 2009 • You need judgements for yet another experiment • You get cheated! • Again and again...
A Journey to the Dark Side • Task-based overview • What is it that malicious workers do? • Do we have remedies?
A Journey to the Dark Side • Task: closed-class questions • Possible cheat: uniform answering (always yes or always no) • Possible cheat: arbitrary answers • Remedy: good gold-standard data helps • Pitfall: cheaters who think about the task at hand can cause a lot of trouble (e.g. relevance judgements)
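The slide names gold-standard data as the remedy but leaves the mechanics open. A minimal sketch of the idea, under assumed data layouts and an illustrative 0.7 accuracy threshold: hidden trap questions with known answers are mixed into the task, and workers whose accuracy on those traps falls below the threshold are filtered out. Note this catches uniform and arbitrary answering, but, as the pitfall warns, not cheaters who actually reason about the traps.

```python
# Sketch: filter workers via gold-standard (trap) questions.
# Data layout and the 0.7 threshold are illustrative assumptions.

def gold_accuracy(worker_answers, gold_labels):
    """Fraction of the gold questions this worker answered correctly."""
    hits = sum(1 for q, a in worker_answers.items()
               if q in gold_labels and gold_labels[q] == a)
    checked = sum(1 for q in worker_answers if q in gold_labels)
    return hits / checked if checked else 0.0

def filter_workers(all_answers, gold_labels, threshold=0.7):
    """Keep only workers whose gold accuracy meets the threshold."""
    return {w: ans for w, ans in all_answers.items()
            if gold_accuracy(ans, gold_labels) >= threshold}

gold = {"q1": "yes", "q2": "no"}
answers = {
    "worker_a": {"q1": "yes", "q2": "no", "q3": "yes"},   # consistent with gold
    "worker_b": {"q1": "yes", "q2": "yes", "q3": "yes"},  # uniform answering
}
kept = filter_workers(answers, gold)  # worker_b is filtered out
```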
A Journey to the Dark Side • Task: open-class questions • Possible cheat (1): copy and paste standard text • Possible cheat (2): copy and paste domain-specific text • Remedy: (1) is easy to detect; (2) is problematic
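For cheat (1), a simple string-similarity pass over the free-text answers is enough, which is why the slide calls it easy to detect; the same check says nothing about plausible domain-specific text, which is why (2) stays problematic. A sketch using the standard-library `difflib`, with an illustrative 0.9 similarity threshold:

```python
# Sketch: flag open-ended answers that look copy-pasted across submissions.
# The 0.9 similarity threshold is an illustrative assumption.
from difflib import SequenceMatcher

def near_duplicates(answers, threshold=0.9):
    """Return index pairs of answers that are (almost) identical strings."""
    flagged = []
    for i in range(len(answers)):
        for j in range(i + 1, len(answers)):
            ratio = SequenceMatcher(None, answers[i], answers[j]).ratio()
            if ratio >= threshold:
                flagged.append((i, j))
    return flagged

texts = [
    "The page discusses medieval trade routes in detail.",
    "The page discusses medieval trade routes in detail.",  # pasted again
    "A short review of the site's navigation and layout.",
]
pairs = near_duplicates(texts)  # only the pasted pair is flagged
```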
A Journey to the Dark Side • Task: internal quality control • Possible cheat: artificially boost your own confidence • Possible cheat: even worse, do so in a network • Remedy: we need a better confidence measure than prior acceptance rate • Pitfall: due to the large scale of HITs it is hard to find a reliable confidence measure
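The slide asks for a confidence measure better than the platform's prior acceptance rate without proposing one. One common candidate, sketched here under an assumed data layout, is agreement with the majority vote on redundantly judged items; it is computed from the requester's own data, so it cannot be inflated by accepting one's own cheap HITs (though a large enough colluding network could still skew the majority).

```python
# Sketch: per-worker agreement with the majority vote as a confidence
# signal, instead of the platform's prior acceptance rate.
# The data layout is an illustrative assumption.
from collections import Counter

def majority_labels(votes_by_item):
    """Most frequent answer per item across all workers."""
    return {item: Counter(votes).most_common(1)[0][0]
            for item, votes in votes_by_item.items()}

def worker_agreement(worker_votes, majority):
    """Fraction of a worker's answers that match the majority answer."""
    matches = sum(1 for item, a in worker_votes.items()
                  if majority.get(item) == a)
    return matches / len(worker_votes) if worker_votes else 0.0

votes_by_item = {"d1": ["rel", "rel", "irr"], "d2": ["irr", "irr", "irr"]}
majority = majority_labels(votes_by_item)
score = worker_agreement({"d1": "rel", "d2": "irr"}, majority)  # 1.0
```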
A Journey to the Dark Side • Task: external quality control • Setup: redirect workers to your own site and let them do the HITs there • Possible cheat: make up a confirmation token • Possible cheat: re-use a genuine token • Possible cheat: claim that you did not get a token • Remedy: all of the above are easy to detect
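The slide doesn't say how the tokens are issued; one way to make all three cheats detectable, sketched here as an assumption rather than the authors' setup, is a keyed HMAC over the worker and HIT identifiers. A made-up token fails verification, a genuine token re-used by another worker fails because it is bound to one worker/HIT pair, and "I never got a token" can be checked against the requester's issue log.

```python
# Sketch: per-worker, per-HIT confirmation tokens verifiable by the
# requester. The secret and identifiers are illustrative assumptions.
import hmac
import hashlib

SECRET = b"requester-side-secret"  # kept server-side, never shown to workers

def issue_token(worker_id, hit_id):
    """Derive a short token bound to one worker/HIT pair."""
    msg = f"{worker_id}:{hit_id}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()[:16]

def verify_token(worker_id, hit_id, token):
    """Check a submitted token in constant time."""
    return hmac.compare_digest(issue_token(worker_id, hit_id), token)

t = issue_token("worker_a", "hit_42")
ok = verify_token("worker_a", "hit_42", t)        # genuine token: True
reused = verify_token("worker_b", "hit_42", t)    # re-used token: False
```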
Is all Lost? • Posterior detection and filtering of cheaters works reliably • But we waste resources (money, time, nerves...) • Can we discourage cheaters from doing our HITs in the first place?
Is all Lost? • Which HIT types do cheaters like? • The Summer 2008 HIT hardly attracted any cheaters • The Autumn 2009 one was swamped by them • The Summer task required a lot of creativity, whereas the Autumn one was a straightforward relevance judgement
Is all Lost? • Hypothesis: "If the HIT conveys the impression of requiring creativity, cheaters are less likely to take it." • Two HIT types • Suitability for children • Standard relevance judgements
Conclusion • The share of malicious workers can be significantly reduced by making your task innovative, creative and non-repetitive • Crowd filtering can help to reduce the share of malicious workers, at the cost of higher completion time • Previous acceptance rate is not a robust predictor of worker reliability
Questions, Remarks, Concerns? c.eickhoff@tudelft.nl