Harnessing manpower for creating semantics (doctoral dissertation)

Institute of Informatics and Software Engineering • Faculty of Informatics and Information Technologies • Slovak University of Technology in Bratislava
Jakub Šimko (jsimko@fiit.stuba.sk)
Supervised by: prof. Mária Bieliková
July 4th, 2013

Games with a purpose (GWAP) for semantics acquisition

Games with a purpose
• Cheap (once they are created)
• Difficult to create
[Quinn & Bederson. Human computation: a survey and taxonomy of a growing field. CHI’11, 2011]

ESP Game: image metadata acquisition
• Both players see the same image and answer: “What is in the image?”
• Player 1: water, sky, bridge, Mostar
• Player 2: night, river, bridge, Bosnia
• The players must match a word without seeing each other's guesses (here: “bridge”)
• Banned words: blue, towers
[Von Ahn & Dabbish: Designing games with a purpose. Commun. ACM, 2008.]
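The output-agreement rule on this slide can be sketched in a few lines of Python. This is a simplified illustration, not the ESP Game's implementation: it assumes both players' guesses are available as complete lists, compares them case-insensitively, and returns the first non-banned word (in player 1's order) that both players typed. The function name `esp_match` is hypothetical.

```python
def esp_match(guesses_a, guesses_b, banned):
    """Return the first non-banned word typed by both players, or None.

    Simplifying assumption: the real game matches guesses as they are typed;
    here both guess lists are complete and compared case-insensitively.
    """
    banned_set = {word.lower() for word in banned}
    typed_by_b = {word.lower() for word in guesses_b}
    for word in guesses_a:
        candidate = word.lower()
        if candidate not in banned_set and candidate in typed_by_b:
            return candidate
    return None


player1 = ["water", "sky", "bridge", "Mostar"]
player2 = ["night", "river", "bridge", "Bosnia"]
print(esp_match(player1, player2, banned=["blue", "towers"]))  # bridge
```

The banned (taboo) words are labels the game already holds for the image, so they push players toward new, less obvious labels.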

Motivation
• Open issues in semantics acquisition
  • Modelling of specific domains
  • Personal multimedia metadata acquisition
  • Metadata upkeep
• Games with a purpose (GWAPs): design issues
  • In general: no design methodology (young problem area)
  • Cold start problems
  • Quality management, effectiveness of work allocation

Thesis goals
• Create new GWAP-based approaches to semantics creation, particularly for specific domains
• Bring generally applicable improvements to GWAP design, focusing on selected problems

Work overview
State of the art:
• GWAP taxonomy and design space
GWAPs we created:
• Little Search Game: term network acquisition
• PexAce: (personal) image tag acquisition
• CityLights: validation of music metadata
General GWAP design improvements:
• Helper artifacts: cold start problem reduction
• Player competences: improving GWAP output quality

Our taxonomy of GWAPs

GWAP design
• A relatively new area (less than 10 years old)
• No holistic design methodology exists
• GWAPs are created ad hoc
• Few works address particular design issues
  • [Von Ahn & Dabbish, 2008]: player agreement schemes
  • [Chiou & Hsu, 2011]: suggested considering player skills in GWAPs
• Our contribution: GWAP design dimensions
  • following the idea of design lenses [Schell, 2008]
[Von Ahn & Dabbish: Designing games with a purpose. Commun. ACM, 2008.]
[Chiou & Hsu. Capability-aligned matching: improving quality of games with a purpose. AAMAS ’11]
[J. Schell. The art of game design: a book of lenses. Elsevier/Morgan Kaufmann, 2008.]

Our GWAP design dimensions

Existing GWAPs in our design space

PexAce
Goal: acquire (personal) image tags
• New artifact validation model
• Quality management through player modelling
Publications:
• International Journal of Human-Computer Studies [in press]: Šimko, J., Tvarožek, M., Bieliková, M.: Human Computation: Single-player Annotation Game for Image Metadata.
• SMAP 2011 (IEEE CS Press): Šimko, J., Bieliková, M.: Games with a Purpose: User Generated Valid Metadata for Personal Archives.
• I-Semantics 2012 (ACM): Šimko, J., Bieliková, M.: Personal Image Tagging: a Game-based Approach.

PexAce: acquisition of image metadata
• Cards: a memory game in which players seek pairs of matching images
• Players write image annotations to aid their memory

PexAce: general domain deployment
• (Standard) Corel 5K dataset: photos + tags + our tags
• 107 players, 814 games, 2 792 images
• 22 176 annotations, 5 723 tags
• Gold standard comparison: 73% precision
• A posteriori evaluation: 94% precision
• Automated methods: ~70%*
• Limited set of tags
*[Duygulu et al. Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary. Springer-Verlag, 2002.]

PexAce for personal images
• Personal image metadata: virtually impossible to obtain
• PexAce played with personal images instead of general ones
  • Players enjoy this more
  • They provide specific annotations (metadata)
• Experiments: two 2-player groups, 50 images each
• Correctness: 94%
• 44% of tags were specific: persons (53%), events (21%), places (15%), other (11%)

“Benevolent” artifact validation model
• Annotations are decomposed into votes (P: players, T: terms, I: images)
• Original model: mutual player supervision
• Benevolent model: less strict heuristics
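The decomposition of annotations into votes can be illustrated with a minimal Python sketch. It assumes that an annotation is free text split into lowercase, whitespace-separated terms, and that a term is accepted for an image once at least `min_players` distinct players voted for it; `min_players=2` mimics strict mutual supervision, while a lower value stands in for a more lenient rule. The actual heuristics of the benevolent model are not reproduced here, and all names are hypothetical.

```python
from collections import defaultdict


def decompose(annotations):
    """Turn (player, image, text) annotations into a set of (player, term, image) votes.

    Assumption: terms are lowercase whitespace-separated tokens of the annotation.
    """
    votes = set()
    for player, image, text in annotations:
        for term in text.lower().split():
            votes.add((player, term, image))
    return votes


def accepted_tags(votes, min_players):
    """Accept (image, term) pairs supported by at least min_players distinct players."""
    supporters = defaultdict(set)
    for player, term, image in votes:
        supporters[(image, term)].add(player)
    return {pair for pair, players in supporters.items() if len(players) >= min_players}


annotations = [("p1", "img1", "red car"), ("p2", "img1", "car street"), ("p3", "img2", "dog")]
votes = decompose(annotations)
print(sorted(accepted_tags(votes, min_players=2)))  # strict: only agreed terms
print(sorted(accepted_tags(votes, min_players=1)))  # lenient: every voted term
```

Lowering the threshold trades precision for coverage, which is why a lenient rule needs additional heuristics to keep incorrect terms out.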

Artifact validation and the cold start problem: a general GWAP issue
“How can the result of a human intelligence task be automatically evaluated?”
GWAPs use:
• Approximate or exact automated evaluation (case dependent)
• Mutual player supervision
Threat to multiplayer validation schemes: cold start
“The requirement is to have multiple players online at the same time, sometimes with a requirement that they cannot communicate.”
Remedy: keep the games single-player

Helper artifacts: a new artifact validation principle
• Decouple scoring from task solving; instead, motivate players to solve tasks because doing so helps them progress in the game
• E.g. in PexAce, a player may win the game well enough even without the annotations
• Potentially applicable in general (to any existing game)

Quality management in GWAPs: considering differences in player competences
• Quantify player skills in a player model (e.g. the player’s task-solving expertise in each subdomain)
• Apply the model in
  • “post-processing”: solution filtering (e.g. vote weighting)
  • “pre-processing”: task assignment (e.g. matching task subdomains to players’ areas of expertise)
• Speed up the process and/or retrieve higher-quality results
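Both uses of a player model can be sketched in Python. The sketch assumes that competence is a single number per player, used as a vote weight (with a hypothetical default of 0.5 for players without a model yet), and that expertise is a per-subdomain score used to route a task to the best-matching player. Function names and data layout are illustrative only.

```python
from collections import defaultdict


def weighted_support(votes, competence, default=0.5):
    """Post-processing: sum the competence weights of players voting for each (task, answer)."""
    scores = defaultdict(float)
    for player, task, answer in votes:
        scores[(task, answer)] += competence.get(player, default)
    return dict(scores)


def assign_task(subdomain, expertise):
    """Pre-processing: pick the player with the highest expertise in the task's subdomain."""
    return max(expertise, key=lambda player: expertise[player].get(subdomain, 0.0))


votes = [("alice", "img1", "bridge"), ("bob", "img1", "tower"), ("carol", "img1", "tower")]
competence = {"alice": 0.9, "bob": 0.3, "carol": 0.2}
scores = weighted_support(votes, competence)
print({key: round(value, 2) for key, value in scores.items()})

expertise = {"alice": {"music": 0.2, "sport": 0.8}, "bob": {"music": 0.7}}
print(assign_task("music", expertise))
```

With raw counting, “tower” would win two votes to one; with competence weights, the single vote of the more competent player outweighs them (0.9 against 0.5).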

Measuring player competences: PexAce data
• Usefulness (delivery of correct artifacts)
• Consensus ratio (agreement with other players)
• Correlation between usefulness and consensus ratio: 0.496
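One possible way to compute both measures and their correlation is sketched below in Python. The definitions are assumptions made for illustration: usefulness is taken as the share of a player’s artifacts found in a set of artifacts judged correct, consensus ratio as the share of a player’s artifacts also produced by at least one other player, and the correlation is Pearson’s coefficient across players. The toy data are invented and do not reproduce the 0.496 figure from the PexAce experiment.

```python
from math import sqrt


def usefulness(artifacts, correct):
    """Share of a player's artifacts that are judged correct (assumed definition)."""
    return sum(artifact in correct for artifact in artifacts) / len(artifacts)


def consensus_ratio(player, artifacts_by_player):
    """Share of a player's artifacts also produced by another player (assumed definition)."""
    own = artifacts_by_player[player]
    others = set().union(*(a for p, a in artifacts_by_player.items() if p != player))
    return sum(artifact in others for artifact in own) / len(own)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


# Toy data: artifacts are (image, term) pairs.
artifacts_by_player = {
    "p1": {("i1", "car"), ("i1", "red"), ("i2", "dog")},
    "p2": {("i1", "car"), ("i2", "dog"), ("i2", "cat")},
    "p3": {("i1", "tree"), ("i2", "cat")},
}
correct = {("i1", "car"), ("i1", "red"), ("i2", "dog")}

players = sorted(artifacts_by_player)
use = [usefulness(artifacts_by_player[p], correct) for p in players]
cons = [consensus_ratio(p, artifacts_by_player) for p in players]
print(pearson(use, cons))
```

The consensus ratio can be computed without any gold standard, so a positive correlation with usefulness suggests it can serve as a cheap proxy for player quality.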

Little Search Game
Goal: acquire a lightweight term network
• Statistically unsupported, yet valid term relationships
• Use in specific domains
Publications:
• Int. J. on Semantic Web and Information Systems: Šimko, J., Tvarožek, M., Bieliková, M.: Semantics Discovery via Human Computation Games. International Journal on Semantic Web and Information Systems (2011)
• Hypertext 2011 (ACM): Šimko, J., Tvarožek, M., Bieliková, M.: Little Search Game: Term Network Acquisition via a Human Computation Game. Hypertext, 2011

Little Search Game (negative search game)
• Creation of a lightweight term network
• Player’s task: reduce the number of search results by adding negative terms to the query
Search query: “Star -movie -war -death”
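The query mechanics can be sketched in Python over a toy in-memory corpus (an assumption made for illustration; the corpus and all function names are hypothetical). The sketch counts the documents that match a query with negated terms and, following the idea that the negative terms a player picks are words associated with the query term, records each (query term, negative term) pair as a candidate relationship.

```python
def parse_query(query):
    """Split 'star -movie -war' into positive terms and negated terms."""
    tokens = query.lower().split()
    positive = [t for t in tokens if not t.startswith("-")]
    negative = [t[1:] for t in tokens if t.startswith("-") and len(t) > 1]
    return positive, negative


def count_results(query, corpus):
    """Count documents containing all positive terms and none of the negated ones."""
    positive, negative = parse_query(query)
    hits = 0
    for document in corpus:
        words = set(document.lower().split())
        if all(p in words for p in positive) and not any(n in words for n in negative):
            hits += 1
    return hits


def candidate_relations(query):
    """Pair every positive term with every negated term as a candidate relationship."""
    positive, negative = parse_query(query)
    return [(p, n) for p in positive for n in negative]


corpus = [
    "star wars movie",
    "star death war",
    "star in the night sky",
    "movie star interview",
    "north star navigation",
]
query = "Star -movie -war -death"
print(count_results("star", corpus), "->", count_results(query, corpus))
print(candidate_relations(query))
```

A player is rewarded for negative terms that remove many results, which in turn favors terms that frequently co-occur with the query term.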

LSG term network evaluation
• A posteriori evaluation: 91% correctness
• Potential to add term relationships to existing knowledge bases
  • 59% of LSG relationships do not exist in the ConceptNet* corpus
  • …including much-needed non-taxonomic relationships
*[Liu & Singh. ConceptNet: A Practical Commonsense Reasoning Tool-Kit. BT Technology Journal, 2004]

Hidden term relationships – hard for automated discovery (40% of LSG term network)

LSG modification: TermBlaster (harvesting relationships for the software design domain)
• Specific domain
• No text typing
• 71% correct, 21% “hidden relationships”

CityLights
Goal: validate existing music tags
• Quality management through confidence expression
Publication:
• I-Semantics 2012 (ACM): Dulačka, P., Šimko, J., Bieliková, M.: Validation of Music Metadata via Game with a Purpose. I-Semantics 2012

CityLights: music tag validation (the concept of a validation question)
Validation question: “Which of these tag groups characterizes the music track you hear?”
• Rockabilly, USA, 60ties
• Seasonal, rich oldies, xmas
• February 08 love, oldies, 60 musik
Tag support value:
• increases (+) when the player selects the group containing the tag
• decreases (-) when the player does not select the group, or when the player rules out the tag
• Over many answers, wrong and correct tags bubble out
• Positive and negative thresholds decide whether a tag is confirmed or rejected
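The support-value mechanism can be sketched as a small Python class. Every constant here (step sizes, thresholds, the extra penalty for ruling a tag out) is a hypothetical value, not the tuned configuration from the experiments. The sketch also assumes that every tag shown in a question is a candidate tag of the track being played, and that not selecting a group is weaker feedback than explicitly ruling a tag out.

```python
from dataclasses import dataclass, field


@dataclass
class TagValidator:
    """Tracks a support value per (track, tag) pair; all constants are hypothetical."""
    positive_threshold: float = 3.0
    negative_threshold: float = -3.0
    ruled_out_penalty: float = 2.0
    support: dict = field(default_factory=dict)

    def _add(self, track, tag, delta):
        key = (track, tag)
        self.support[key] = self.support.get(key, 0.0) + delta

    def answer(self, track, groups, selected, ruled_out=()):
        """Apply one answer to a validation question about `track`.

        Tags in the selected group gain support, tags in groups the player did
        not select lose support, and explicitly ruled-out tags lose extra support.
        """
        for index, group in enumerate(groups):
            delta = 1.0 if index == selected else -1.0
            for tag in group:
                self._add(track, tag, delta)
        for tag in ruled_out:
            self._add(track, tag, -self.ruled_out_penalty)

    def status(self, track, tag):
        """Classify a tag once its support crosses one of the thresholds."""
        value = self.support.get((track, tag), 0.0)
        if value >= self.positive_threshold:
            return "valid"
        if value <= self.negative_threshold:
            return "invalid"
        return "undecided"


validator = TagValidator()
groups = [["rockabilly", "USA", "60ties"], ["seasonal", "rich oldies", "xmas"]]
for _ in range(3):
    validator.answer("track-1", groups, selected=0)
print(validator.status("track-1", "rockabilly"), validator.status("track-1", "xmas"))
```

Tags stay “undecided” until enough answers accumulate, which is how correct and wrong tags gradually bubble out of the crowd of candidates.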

CityLights: experiments
• Last.fm dataset
• 875 games, 4933 questions, 1492 tags
• Feedback actions per tag:
  • 17.75 implicit
  • 5.29 explicit
• With an optimized parameter configuration: 68% correctness

Betting mechanism: measuring competence through confidence
• A betting mechanism within a GWAP
• Through bet height, the player expresses confidence in their task solution
• CityLights case: bet height scales the answer’s impact on the tag validity value
• Helps with the cold start problem associated with user modeling
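The betting idea can be sketched in Python as a vote weighted by bet height, plus settlement once the tag is decided. The bet range (1 to 3), the symmetric win/lose rule, and the “share of stake won back” competence estimate are all assumptions for illustration; the function names are hypothetical.

```python
def bet_weighted_update(support, key, agrees, bet, max_bet=3):
    """Add a vote whose weight equals the bet height (assumed range 1..max_bet)."""
    if not 1 <= bet <= max_bet:
        raise ValueError(f"bet must be between 1 and {max_bet}")
    support[key] = support.get(key, 0) + (bet if agrees else -bet)
    return support[key]


def settle_bet(bet, agrees, final_status):
    """Score a bet after the tag is decided: win the bet if the vote matched, lose it otherwise."""
    if final_status == "undecided":
        return 0
    matched = agrees == (final_status == "valid")
    return bet if matched else -bet


def short_term_competence(settled):
    """Share of the total staked amount that the player won back (hypothetical measure)."""
    staked = sum(abs(x) for x in settled)
    won = sum(x for x in settled if x > 0)
    return won / staked if staked else None


support = {}
bet_weighted_update(support, ("track-1", "xmas"), agrees=True, bet=3)
bet_weighted_update(support, ("track-1", "xmas"), agrees=False, bet=1)
print(support)
print(short_term_competence([settle_bet(3, True, "valid"), settle_bet(1, True, "invalid")]))
```

Because the estimate needs only a handful of settled bets, it gives a usable competence signal for new players long before a long-term player model is available.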

Main contributions
• Definition of the GWAP design space
• GWAPs for semantics acquisition
  • For specific domains (personal images, software engineering)
  • For otherwise hardly discoverable semantics (hidden relationships)
• New GWAP design principles
  • Helper artifacts for cold start reduction
  • Metrics for long-term player competence modeling
  • Betting mechanism for short-term player competence acquisition
  • Metadata validation GWAP concept

Summary
• GWAP taxonomy and design dimensions
  • [survey paper prepared]
• Little Search Game: lightweight term network acquisition
  • Hidden term relationships
  • Hypertext 2011, ACM
  • Int. J. on Semantic Web and Information Systems, 2011 (CC, IGI)
• PexAce: personal image metadata acquisition
  • Helper artifacts
  • Competence measures
  • SMAP 2011, IEEE
  • I-Semantics 2012, ACM
  • Int. J. of Human-Computer Studies, 2013 (CC, Elsevier)
• CityLights: music metadata validation
  • Betting mechanism: player competence through confidence
  • I-Semantics 2012b, ACM

Selected publications
• Semantics Discovery via Human Computation Games. International Journal on Semantic Web and Information Systems, 2011
• Human Computation: Single-player Annotation Game for Image Metadata. International Journal of Human-Computer Studies, 2012 [in press]
• Validation of Music Metadata via Game with a Purpose. I-Semantics 2012 (ACM)
• Games with a Purpose: User Generated Valid Metadata for Personal Archives. SMAP 2011 (IEEE CS)
• Little Search Game: Term Network Acquisition via a Human Computation Game. Hypertext 2011 (ACM)
• Personal Image Tagging: a Game-based Approach. I-Semantics 2012 (ACM)
