CROWDSOURCING Massimo Poesio Part 2: Games with a Purpose
GAMES WITH A PURPOSE • Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks ‘computers are unable to perform’ (sic)
GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK • GWAPs do not rely on altruism or financial incentives to entice people to perform certain actions • The key property of games is that PEOPLE WANT TO PLAY THEM
EXAMPLES OF GWAP • Games at www.gwap.com • ESP • Verbosity • TagATune • Other games • Peekaboom • Phetch
ESP • The first GWAP, developed by von Ahn and his group (2003/2004) • The problem: obtain accurate descriptions of images to be used • To train image search engines • To develop machine learning approaches to vision • The goal: label the majority of the images on the Web
ESP: THE GAME • Two partners are picked at random from the large number of players online • They are not told who their partner is, and cannot communicate with them • They are both shown the same image • The goal: guess how their partner will describe the image, and type that description • Hence, the ESP game • If any string typed by one player matches a string typed by the other, both score points (see the matching sketch below)
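A minimal sketch, in Python, of how this output-agreement check might work. The normalization and function names are illustrative and are not taken from the actual ESP implementation.

```python
# Hypothetical sketch of the ESP agreement check (not the real implementation).
# Labels are normalized before comparison; the round is won as soon as the two
# players have typed any label in common.

def normalize(label: str) -> str:
    """Lower-case and strip whitespace so 'Dog ' and 'dog' match."""
    return label.strip().lower()

def agreed_label(guesses_a: list[str], guesses_b: list[str]) -> str | None:
    """Return the first label typed by both players, or None if no match yet."""
    seen_b = {normalize(g) for g in guesses_b}
    for g in guesses_a:
        if normalize(g) in seen_b:
            return normalize(g)
    return None

# Example: the players agree on 'dog', so they score for this image.
print(agreed_label(["puppy", "Dog"], ["animal", "dog", "pet"]))  # -> 'dog'
```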
THE CHALLENGE: SCORES • One of the motivating factors is to try to score as many points as possible • Hourly, daily, weekly, and monthly scores are shown
THE CHALLENGE: TIMING • Partners try to agree on as many images as they can during 2½ minutes • The thermometer on the side indicates how many images they have agreed on • If they agree on 15 images they score bonus points
TABOO WORDS • To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed • Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image
GOOD LABELS, COMPLETING AN IMAGE • A label is considered “good” when more than N players produce it (with N a parameter of the game) • An image is “done” when its list of taboo words is so extensive that most players pass on it (a sketch of both mechanisms follows)
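An illustrative sketch, assuming simple in-memory dictionaries, of how the taboo-word and good-label bookkeeping could be combined; the function names, the threshold value, and the "done" heuristic are hypothetical, not von Ahn's code.

```python
from collections import defaultdict

# Hypothetical bookkeeping: count how often each label is produced for an image,
# turn agreed-upon labels into taboo words, and treat frequent labels as "good".

N = 2  # agreement threshold; a tunable parameter of the game

label_counts = defaultdict(lambda: defaultdict(int))  # image_id -> label -> count
taboo_words = defaultdict(set)                        # image_id -> taboo labels

def record_label(image_id, label):
    """Called whenever a player types a (non-taboo) label for an image."""
    if label not in taboo_words[image_id]:
        label_counts[image_id][label] += 1

def record_agreement(image_id, label):
    """Called when a pair agrees on a label: it becomes taboo for that image."""
    taboo_words[image_id].add(label)

def good_labels(image_id):
    """Labels produced by more than N players are treated as reliable."""
    return [l for l, c in label_counts[image_id].items() if c > N]

def looks_done(image_id, pass_fraction):
    """An image can be retired once most players pass on it."""
    return pass_fraction > 0.5  # illustrative threshold
```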
IMPLEMENTATION • Pre-recorded game play • Especially at the beginning, and at quiet times, there won’t always be players to pair with • In these cases a player is paired against a recorded ‘hand’ from a previous game with the same picture • Cheating • Players could cheat in a number of ways, including agreeing on labels in advance or playing against themselves • A number of mechanisms are in place to counter these cases • Selecting images
SOME STATISTICS • In the 4 months between August 9th 2003 and December 10th 2003 • 13,630 players • 1.2 million labels for 293,760 images • 80% of players played more than once • By 2008: • 200,000 players • 50 million labels
ANALYSIS • The numbers indicate that the game is fun to play • Exciting factors: • Playing with a partner • Playing against time
QUALITY OF THE LABELS • For IMAGE SEARCH: • choose 10 of the labels produced and check which images a search for them returns • Compare labels produced by players with labels produced by participants in an experiment • 15 participants, 20 images drawn from the 1,000 with more than 5 labels • 83% of game labels were also produced by participants • Manual assessment of labels (‘would you use these labels to describe this image?’) • 15 participants, 20 images • 85% of words rated useful
VERBOSITY • … or, the game approach to collecting commonsense knowledge • Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)
THE GAME • Based on an existing game, TABOO: • Players have to guess a word • One of the players gives hints concerning the word • In Verbosity, there are two players, the DESCRIBER and the GUESSER, and a SECRET WORD
TEMPLATES IN VERBOSITY • As in Open Mind Commonsense, templates are used to ensure that the relations / properties of interest are collected • The Describer produces hints by filling in a template
TEMPLATES • _ is a kind of _ • _ is used for _ • _ is typically near/in/on _ • _ is the opposite of _ / _ is related to _
EMULATION • As in the ESP game, pre-recorded games are used when a player cannot be paired with another player • The asymmetry of the game causes a problem not encountered in the ESP game • Describer: can simply replay the behavior of a previous describer • Guesser: not so easy
RESULTS • The only published results I’m aware of predate the actual release of the game, so I don’t know about the QUANTITY • Quality: • Six raters were asked whether 200 facts collected using Verbosity are ‘true’ • Around 85% success
PEEKABOOM • Objective: collect data about the presence of objects in images in order to train vision algorithms for object detection
THE GAME • Two players • They take turns at playing ‘Peek’ and ‘Boom’ • ‘Boom’ gets a picture with an associated word; ‘Peek’ has to guess the associated word • ‘Boom’ reveals parts of the picture to ‘Peek’ by clicking on it (each click reveals a circular area with a 20-pixel radius; see the sketch below)
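A small sketch of how clicks might translate into revealed pixels, assuming the image is a simple pixel grid. The 20-pixel radius comes from the slide; the data structures and names are illustrative, not Peekaboom's actual code.

```python
import numpy as np

RADIUS = 20  # pixels revealed around each click (from the slide)

def reveal_mask(height: int, width: int, clicks: list[tuple[int, int]]) -> np.ndarray:
    """Return a boolean mask where True marks pixels visible to 'Peek'."""
    ys, xs = np.mgrid[0:height, 0:width]
    mask = np.zeros((height, width), dtype=bool)
    for cy, cx in clicks:
        # reveal every pixel within RADIUS of the click position
        mask |= (ys - cy) ** 2 + (xs - cx) ** 2 <= RADIUS ** 2
    return mask

# Example: two nearby clicks reveal roughly 2 * pi * 20**2 pixels, minus overlap.
m = reveal_mask(480, 640, [(100, 200), (110, 210)])
print(m.sum(), "pixels revealed")
```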
IMPLEMENTATION • Images and their labels come from ESP • Cheating: • Player queue (wait until next ‘matching interval’ – one every 10 seconds – to start playing) • IP address checks (to make sure players are not paired with themselves) • Blocking bots: ‘seed images’ (previously annotated) and blacklist
EVALUATION: USER STATISTICS • Usage: • 1 month in 2005 • 14,153 players • 1,122,998 completed rounds • The average player played around 158 images (about 72 minutes)
EVALUATION: ACCURACY OF DATA • Accuracy of bounding boxes • Choose 50 images played by at least two pairs • Have four volunteers draw bounding boxes • OVERLAP(A,B) = AREA(A∩B) / AREA(A∪B) • Average: 0.75 • Accuracy of pings • 50 images as above • Three subjects decide if a ping is ‘inside the object’ • Result: 100% (a sketch of the OVERLAP measure follows)
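A hedged sketch of the overlap score on the slide, OVERLAP(A,B) = AREA(A∩B)/AREA(A∪B), i.e. intersection over union for axis-aligned bounding boxes; the (x_min, y_min, x_max, y_max) representation is an assumption for illustration.

```python
def overlap(a, b):
    """Intersection-over-union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # intersection rectangle (zero area if the boxes do not overlap)
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Identical boxes score 1.0; disjoint boxes score 0.0.
print(overlap((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.14
```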
SOME GENERAL LESSONS • von Ahn &amp; Dabbish (2008) discuss the general approach and some lessons they drew from their work
THREE TEMPLATES • OUTPUT AGREEMENT GAMES • Generalization of ESP • INVERSION-PROBLEM GAMES • INPUT-AGREEMENT GAMES
OUTPUT AGREEMENT GAMES • Two strangers are chosen among all potential players. They cannot see each other or communicate with each other. • In each round, both are given the same input • Game instructions say that players should produce the same output as their partners • Winning condition: they produce the same output, possibly after a few attempts. E.g.: the ESP GAME.
INVERSION PROBLEM GAMES • Two strangers are chosen among all potential players. They cannot see each other or communicate with each other. • In each round, one player is designated as the DESCRIBER whereas the other is designated as the GUESSER. The output from the describer should help the guesser guess the original input • WINNING CONDITION: The guesser correctly guesses the input originally assigned to the describer. E.g.: VERBOSITY. Based on ‘20 Questions’.
INPUT AGREEMENT GAMES • Two strangers are chosen among all potential players. They cannot see each other or communicate with each other. • In each round, both are given input that is known by the game (but not by the players) to be the same or different • Game instructions say that players should produce output describing their input so that they can decide whether their inputs are the same or different • Winning condition: the playing partners correctly decide whether their inputs are the same or different. E.g.: TagATune.
INCREASE ENJOYMENT • Games designed so as to make the task enjoyable • GWAPs by von Ahn et al. attempt to do this by giving players a CHALLENGE: • TIMED RESPONSE • SCORE KEEPING • SKILL LEVELS • HIGH-SCORE LISTS
OUTPUT ACCURACY • Mechanisms to ensure correctness and avoid collusion (e.g., players always producing the same label) • Random matching (players don’t know each other’s identity) • Player testing (assess the quality of a particular player’s input by matching their output against already annotated data; see the sketch below) • Repetition (output only considered correct if many players produce it) • Taboo
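An illustrative sketch of the player-testing idea: occasionally give a player an already-annotated "seed" item and compare their output against the known-good labels to estimate their reliability. The function and the acceptance threshold are hypothetical, not from the paper.

```python
def player_accuracy(outputs: dict[str, str], gold: dict[str, set[str]]) -> float:
    """Fraction of seed items on which the player's label matches a gold label.

    outputs: item_id -> label produced by the player
    gold:    item_id -> set of labels already known to be correct (seed data)
    """
    tested = [item for item in outputs if item in gold]
    if not tested:
        return 1.0  # no seed items seen yet; give the benefit of the doubt
    correct = sum(outputs[item] in gold[item] for item in tested)
    return correct / len(tested)

# A player whose accuracy falls below some threshold (e.g. 0.5) could have
# their contributions discarded or down-weighted.
```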
MISCELLANEOUS • Other useful ideas • Evaluation: • Efficiency: THROUGHPUT (T) • ‘Enjoyability’: AVERAGE LIFETIME PLAY (ALP) • Combined measure: EXPECTED CONTRIBUTION = T × ALP (a toy calculation follows)
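A toy calculation of the combined measure, with made-up numbers rather than figures from von Ahn &amp; Dabbish: throughput T in labels per human-hour, average lifetime play ALP in hours per player, and their product as the expected contribution of a single player.

```python
# Hypothetical figures, for illustration only.
throughput = 200.0        # labels produced per human-hour of play
avg_lifetime_play = 1.5   # hours an average player spends on the game over its lifetime

expected_contribution = throughput * avg_lifetime_play
print(f"Expected contribution: {expected_contribution:.0f} labels per player")  # 300
```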
OTHER GAMES • On gwap.com • TagATune • Elsewhere: • FoldIt • Karaoke Callout • Phetch • Spectral Game