Data Annotation using Human Computation HOANG Cong Duy Vu 07/10/2009
Outline • Introduction • Data Annotation with GWAP • Data Annotation with AMT • Characterization of Multiple Dimensions • Correlation Analysis • Future Directions • Conclusion
Introduction • Data annotation refers to the task of adding specific information to raw data • In computational linguistics, this includes information such as morphology, POS, syntax, semantics, discourse, … • In computer vision, it includes information such as image labels, regions, video descriptions, …
Introduction • Annotated data is extremely important for computational and learning problems and for training AI algorithms • However, it is also very non-trivial to obtain, due to • Ambiguity in the information being processed • An expensive, time-consuming, labor-intensive and error-prone annotation process
Introduction • Motivating facts: • Gaming data [1] • Each day, more than 200 million hours are spent playing games in the U.S. • By age 21, the average American has spent more than 10,000 hours playing games, equivalent to five years of working a full-time job at 40 hours per week • With the explosion of web services, it is natural to take advantage of this community's popularity. How can we leverage this for annotation?
Introduction • Human computation emerges as a viable approach to data annotation • Its main idea is to harness what humans are good at but machines are poor at • Use the ability and speed of a community to solve particular tasks • Computer programs can simultaneously serve other purposes (e.g. educational entertainment)
Introduction • What is human computation? • A CS technique in which a computational process performs its function by outsourcing certain steps to humans (from Wikipedia) • Also called human-based computation or human computing • More general term: “crowdsourcing” • Typical frameworks: • Games With A Purpose (GWAP) • Amazon Mechanical Turk (AMT)
Data Annotation with GWAP • GWAP - Game With A Purpose • Pioneered by Luis von Ahn at CMU in his PhD thesis in 2005 • GWAPs are online games with a special mechanism • Humans enjoy playing games provided by computers • Humans help computers perform annotation tasks implicitly integrated into such games
Data Annotation with GWAP • How does GWAP work? Developers: build everything for both server and clients Bots: sometimes developers create bots that play the role of real players, since the number of players in a GWAP is limited at any given time Players: people who play the game, with pairwise interaction GUI: the graphical user interface Data sources: the data that needs to be annotated
Data Annotation with GWAP • Input-output mechanism of GWAP [1]
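As a concrete illustration of this input-output mechanism, below is a minimal sketch (not from the original slides; the function names and identifiers are hypothetical) of the output-agreement idea behind ESP-style GWAPs: two players see the same image and type labels independently, and a label is accepted as an annotation only when both players produce it and it is not on the taboo list.

```python
from collections import defaultdict

def play_round(labels_player_a, labels_player_b, taboo_words=()):
    """Return the labels both players agreed on, excluding taboo words."""
    agreed = set(labels_player_a) & set(labels_player_b)
    return {label for label in agreed if label not in taboo_words}

# Accumulate agreed labels per image over many player pairs; a label that enough
# independent pairs produce can later be promoted to a taboo word to elicit new labels.
label_counts = defaultdict(lambda: defaultdict(int))

def record_round(image_id, agreed_labels):
    for label in agreed_labels:
        label_counts[image_id][label] += 1

agreed = play_round({"dog", "grass", "ball"}, {"dog", "ball", "park"})
record_round("img_042", agreed)
print(agreed)  # {'dog', 'ball'} (set ordering may vary)
```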
Data Annotation with GWAP • Example 1: Image labeling Computer Vision Game Captured from http://www.gwap.com
Data Annotation with GWAP • Example 2: Video description tagging Computer Vision Game Captured from http://www.gwap.com
Data Annotation with GWAP • Example 3: Online word games Natural Language Processing Game Captured from http://wordgame.stanford.edu/freeAssociation.html
Data Annotation with GWAP • Recently, various GWAP games have been developed across a wide range of AI domains • Computer Vision: ESPGame, Peekaboom, TagATune, Google Image Labeler, … • Semantic web: OntoGame, Verbosity • Natural Language Processing: online word games (Categorilla, Categodzilla, and Free Association), Phrase Detectives
Data Annotation with GWAP • Results obtained so far: • ESP Game Dataset [1] (CMU): 100,000 images with English labels from ESPGame (1.8 GB) • Online word games [2] (Stanford): 800,000 data instances for semantic processing • TagATune music data1 (CMU): 25,863 clips from 5,405 source mp3s, with 188 unique tags 1 from http://musicmachinery.com/2009/04/01/magnatagatune-a-new-research-data-set-for-mir/
Data Annotation with GWAP • Advantages • Free • People always love playing games • Fun, attractive and sometimes addictive • Disadvantages • A highly visual game design requires much effort • Integrating annotation tasks into games is hard, comparable to designing algorithms • Very hard to design GWAP games for complex processing tasks
Data Annotation with GWAP • Players have fun and enjoy the games • The more players play the game, the more annotated data we obtain • Question: if games are not fun, can we still attract many people to join? • Viable answer: Amazon Mechanical Turk (AMT) ???
Data Annotation with AMT • AMT – Amazon Mechanical Turk • One of the tools of Amazon Web Services • A wide-ranging marketplace for work • Utilizes human intelligence to complete tasks that computers are unable to do but humans can do effectively • Located at https://www.mturk.com/mturk/
Data Annotation with AMT Captured from https://www.mturk.com/mturk/
Data Annotation with AMT • How does AMT work? Requesters: define tasks, known as HITs (Human Intelligence Tasks), through the interactive GUI or the APIs provided by AMT HIT: each HIT lets requesters specify task instructions, required qualifications, duration, and monetary reward Broker: the web service that plays the intermediary role of supplying and supporting everything Workers: people who solve HIT tasks to earn money
Data Annotation with AMT • An example of a HIT:
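To make the requester side of this workflow concrete, here is a hypothetical sketch of posting an image-labeling HIT programmatically. The talk (2009) would have used the original MTurk SOAP/REST API; this sketch instead uses today's boto3 client, so the client setup, parameter names, sandbox endpoint and the HTML question body are illustrative assumptions, not the interface described in the slides.

```python
import boto3

# Connect to the MTurk requester sandbox (no real money is spent there).
client = boto3.client(
    "mturk",
    region_name="us-east-1",
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
)

# A minimal HTMLQuestion: the worker sees a short instruction and a form (elided).
question_xml = """
<HTMLQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2011-11-11/HTMLQuestion.xsd">
  <HTMLContent><![CDATA[
    <html><body>
      <p>Type the main object you see in the image.</p>
      <!-- task form goes here -->
    </body></html>
  ]]></HTMLContent>
  <FrameHeight>450</FrameHeight>
</HTMLQuestion>
"""

response = client.create_hit(
    Title="Label the main object in an image",
    Description="Look at one image and type the object you see.",
    Keywords="image, labeling, annotation",
    Reward="0.05",                    # reward per assignment in USD, as a string
    MaxAssignments=3,                 # number of distinct workers per HIT
    AssignmentDurationInSeconds=600,  # time a worker has to finish one assignment
    LifetimeInSeconds=86400,          # how long the HIT stays on the marketplace
    Question=question_xml,
)
print(response["HIT"]["HITId"])
```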
Data Annotation with AMT • Statistics related to AMT: • The AMT service was first launched publicly in 2005 • According to a report1 from March 2007, there were more than 100,000 workers in over one hundred countries Why can it attract so many participants? 1 from http://en.wikipedia.org/wiki/Amazon_Mechanical_Turk
Data Annotation with AMT • AMT seems to have a wider reach due to its ease and simplicity • Results obtained so far: • Statistics from the Amazon website • 69,452 HITs currently available • Some requesters make their annotated data public • Sorokin [3] (UIUC): 25,000 annotated images at a cost of $800 • Snow [4] (Stanford): linguistic annotations (WSD, temporal ordering, word similarity, …)
Data Annotation with AMT • Advantages • For users • HITs are not very hard to solve • Easy money while still serving as relaxation • For developers • The APIs provided by AMT make building HITs easy • Diverse demographics of users on the Amazon website • Can hopefully obtain large-scale annotated data very quickly over time
Data Annotation with AMT • Disadvantages • Hard to control the tradeoff between data quantity and quality • Effective quality-control strategies are needed
Data Annotation with AMT • Example 1: Word Similarity NLP task Captured from http://nlpannotations.googlepages.com/wordsim_sample.html
Data Annotation with AMT • Example 2: Image Labeling CV task Captured from http://visionpc.cs.uiuc.edu/~largescale/protocols/4/index.html
Characterization of Multiple Dimensions • Overview of the dimensions considered: • Setup effort - creating the interfaces through which people participate in the annotation process • The system should be designed to ensure the objective of obtaining large, clean and useful data • Quality of annotation - the quality, or accuracy, of the annotated outputs • Data selection - figuring out where and which data needs to be annotated • Annotation participation - factors relating to the participants in the annotation process
Characterization of Multiple Dimensions • Setup Effort • UI design/visual impact • The graphical characteristics of the user interface design • A substantial factor that determines the efficiency of the annotation process • GWAPs need much effort focused mainly on the GUI, to make the game entertaining enough to motivate players • AMT needs less effort to build HIT tasks, but they should still be designed to be fun, easy and attractive • Scale - none/low/basic/average/distinctive/excellent
Characterization of Multiple Dimensions • Fun • A very significant factor, because players simply will not join if GWAP games are not fun • Making games fun is comparable to designing algorithms • Some ways: • Timing (GWAP & AMT) • Scores, top scores, top players (GWAP) • Levels (GWAP & AMT) • Money (AMT) • Scale - none/low/fair/high/very high
Characterization of Multiple Dimensions • Payment • Makes the annotation process more motivating • For example, • In GWAP, player pairs earn scores • In AMT, workers receive monetary payment or bonuses from requesters • Scale - none/score/monetary payment
Characterization of Multiple Dimensions • Cheating • Sometimes, unmotivated or lazy participants use tricks when doing annotation tasks • Some ways to avoid this: • Filter players by IP address, location, or training (GWAP) • Use qualifications (AMT) • Scale - none/low/possible/high (-)
Characterization of Multiple Dimensions • Implementation Cost • Various costs • Designing annotation tasks • Creating timing controllers • Game mechanism (online or offline) • Network infrastructure (client/server, peer-to-peer) • Records and statistics (user scores, player skill, qualification) • Building intelligent bots • Scale - none/low/fair/high/very high
Characterization of Multiple Dimensions • Exposure • Relates to social impact; letting people know about the system is very important • A GWAP must do this itself, by publicizing on social websites, contributor sites and gaming sites • AMT sits under the umbrella of Amazon's web services -> higher impact • Scale - none/low/fair/high
Characterization of Multiple Dimensions • Centralization • Measures whether there is a single entity or owner that defines which tasks are presented to workers • In the case of GWAP, there are currently only 5 games available; for AMT, anyone can define their own tasks for their own evaluation purposes • Scale - yes/no (-)
Characterization of Multiple Dimensions • Scale • A metric of how many tasks the system will be able to accomplish • GWAP can produce extremely large volumes of data, because the operating costs are low • AMT scales really well, but it costs money (a rough estimate follows below) • For example • If we have many millions of tasks to accomplish, GWAP is the better approach • At 10,000 tasks, AMT will do well (and requires less setup effort and less effort to approve submitted tasks) • Scale - none/low/fair/high/very high
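To make the comparison concrete, here is a back-of-the-envelope estimate; the $0.05 reward per assignment and the 3-worker redundancy are illustrative assumptions, not figures from the slides: 10,000 tasks × 3 assignments × $0.05 = $1,500, whereas 1,000,000 tasks × 3 assignments × $0.05 = $150,000. This is why paid AMT labor is attractive at the ten-thousand-task scale, while free GWAP play becomes the better option at millions of tasks.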
Characterization of Multiple Dimensions • Annotation participation • Number of participants • Utilize people with different skills to improve the diversity and quality of the annotated data • A small study6 indicated that the demographics of AMT currently correlate with the demographics of Internet users • Scale - none/low/fair/high/very high 6http://behind-the-enemy-lines.blogspot.com/2008/03/mechanical-turk-demographics.html
Characterization of Multiple Dimensions • Motivation • Reflects the attractiveness of annotation systems • Some reasons people participate: • For money • For entertainment/fun • For killing free time • For challenge/self-competition • Scale - none/low/fair/high
Characterization of Multiple Dimensions • Interaction • Different ways to create interaction among participants • Scale - none/multiple without interaction/multiple with pair-wise interaction/multiple with multiple interaction
Characterization of Multiple Dimensions • Qualification • Restricts workers to ensure that only qualified workers can do the tasks (a sketch follows below) • Scale - none/low/fair/high
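On AMT, a qualification is typically expressed as a requirement attached to a HIT. A hypothetical sketch (using the modern boto3 parameter names, an assumption relative to the 2009 API; the 95% threshold is also illustrative) restricting a HIT to workers with a high approval rate:

```python
# System qualification "percent of assignments approved"; attach it via
# create_hit(..., QualificationRequirements=qualification_requirements)
qualification_requirements = [{
    "QualificationTypeId": "000000000000000000L0",  # built-in approval-rate qualification
    "Comparator": "GreaterThanOrEqualTo",
    "IntegerValues": [95],                          # only workers with >= 95% approval
    "ActionsGuarded": "Accept",                     # unqualified workers cannot accept the HIT
}]
```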
Characterization of Multiple Dimensions • Data Selection • Size • choose which data resources will be annotated • Scale - none/small/fair/large/very large
Characterization of Multiple Dimensions • Coverage • Coverage means whether the selected data covers the expected real population and distribution of the data • Scale - none/low/fair/high
Characterization of Multiple Dimensions • Quality of Annotation • Annotation accuracy • Use different strategies to control the quality of annotation • Use repetition: an output is not considered correct until a certain number of players have entered it (see the sketch below) • Use post-processing steps to re-evaluate the annotated data • Scale - none/low/fair/high/very high
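A minimal sketch of the repetition strategy mentioned above (the threshold of 3 and all identifiers are illustrative assumptions): an item's label is only accepted once enough independent participants have produced it.

```python
from collections import Counter, defaultdict

votes = defaultdict(Counter)      # item_id -> label -> number of annotators who entered it

def submit(item_id, label, threshold=3):
    """Record one annotator's label; return it once enough annotators agree."""
    votes[item_id][label] += 1
    if votes[item_id][label] >= threshold:
        return label              # accepted as a (provisionally) correct annotation
    return None                   # not yet confirmed

submit("img_7", "cat"); submit("img_7", "dog"); submit("img_7", "cat")
print(submit("img_7", "cat"))     # 'cat' reaches the threshold on the third agreeing entry
```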
Characterization of Multiple Dimensions • Inter-annotator agreement • A means of measuring agreement among data annotators • Scale - none/low/fair/high
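The slides do not name a specific agreement measure; one common choice for two annotators is Cohen's kappa, shown here as an illustration:

```latex
\kappa = \frac{p_o - p_e}{1 - p_e}
% p_o: observed proportion of items the two annotators label identically
% p_e: agreement expected by chance from each annotator's label distribution
```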
Characterization of Multiple Dimensions • Quality control • Filter bad data out and integrate a correction model to minimize errors during the annotation process • For example: • In AMT, requesters review all submitted HIT tasks, using a voting threshold to approve answers (see the sketch below) • In GWAP, check all data contributed by players after a fixed time • Scale - none/low/fair/high/very high
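A minimal sketch of such a voting threshold on the requester side (an assumed procedure for illustration, not the slides' exact method): collect the assignments for one HIT, take the majority answer as the accepted label, and approve only the workers who matched it.

```python
from collections import Counter

def review_hit(assignments, min_votes=2):
    """assignments: list of (worker_id, answer). Return (label, approved, rejected)."""
    counts = Counter(answer for _, answer in assignments)
    label, votes = counts.most_common(1)[0]
    if votes < min_votes:
        return None, [], []                            # no consensus: hold or republish the HIT
    approved = [w for w, a in assignments if a == label]
    rejected = [w for w, a in assignments if a != label]
    return label, approved, rejected

print(review_hit([("W1", "cat"), ("W2", "cat"), ("W3", "dog")]))
# ('cat', ['W1', 'W2'], ['W3'])
```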
Characterization of Multiple Dimensions • Usability • The annotated data should prove useful and have real-world impact • Scale - none/low/fair/high
Characterization of Multiple Dimensions • Annotation Speed • Measures how many labels can be obtained per day/hour/minute • Scale - none/slow/fair/fast
Characterization of Multiple Dimensions • Annotation Cost • Measures the total cost paid to obtain the annotated data • Scale - none/cheap/fair/expensive
Correlation Analysis • Analyze correlations between the dimensions • Collect information about the human computation systems available so far • 28 popular systems covering 4 types of human computation
Correlation Analysis • Human computation systems