Topics in AI: Applied Natural Language Processing. Information Extraction and Recommender Systems for Video Games: Gameplay. Krishna Achuthan, Stephanie Hasz, Carl Staab. November 23, 2009.
Initial Tasks • Research prior work • Video game review analysis • Other product review analysis • Recommender methods • Create a lexicon of domain-specific terms for named entity recognition • Crawling sites, existing lexicons
Previous Research • Jose Zagal's paper • Reviews include different commentary types • Found that NLP on game reviews was largely unexplored • One paper finding the polarity of adjectives using review scores • A couple of papers using the presence of feature nouns in user reviews for search
NER & Recommender Research • Reviewed allgame, GameFly, GameSpot, GameSpy, GiantBomb, IGN, IMDB, MobyGames • GiantBomb: API for retrieving metadata • IGN: lexicon of video game terminology • Most sites had no “similar games” feature • Those that did used page views, genre, or user-submitted data
GiantBomb Extraction • Crawled the GiantBomb game database and extracted entity names and types for each game • Necessary for efficient tagging • Established a fixed dataset to avoid unexpected errors from edits to the live database • Entity types: games, franchises and their games, platforms, companies, genres, characters, locations, concepts
Named Entity Tagging • Used GiantBomb data to identify named entities (NEs) in review text and their types • Tagger underwent several iterations • Resulting tagger is flexible: capitalization and level of abbreviation can be specified for different starting strings and NE types • Most effective strategy: prioritize-but-overwrite-shorter
Named Entity Tagging • Example: occurrence of “Super Mario World” in review text for “Mario Galaxy” • Super <Mario CHARACTER> World • <Super Mario FRANCHISE> World • <Mario TITLE_PART> tag rejected - not longer than <Super Mario FRANCHISE> • <Super Mario FRANCHISE> <World LOCATION> • <Super Mario World OTHER_GAME>
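The example above can be reproduced with a small sketch of one possible reading of prioritize-but-overwrite-shorter: candidate tags are tried in priority order, and a candidate is accepted only if it is strictly longer than every tag it overlaps, in which case it replaces them. The function names, the lexicon ordering, and the plain substring matching (no word-boundary or capitalization handling) are assumptions for illustration, not the project's actual tagger.

```python
def tag_entities(text, lexicon):
    """Tag entity names found in text.

    lexicon is a list of (name, ne_type) pairs in priority order
    (hypothetical ordering; the slides do not specify one).
    Returns sorted, non-overlapping (start, end, ne_type) spans.
    """
    spans = []
    for name, ne_type in lexicon:
        start = text.find(name)
        while start != -1:
            end = start + len(name)
            # Existing tags that share at least one character with the candidate.
            overlapping = [s for s in spans if s[0] < end and start < s[1]]
            # Accept only if strictly longer than every overlapping tag.
            if all(end - start > s[1] - s[0] for s in overlapping):
                spans = [s for s in spans if s not in overlapping]
                spans.append((start, end, ne_type))
            start = text.find(name, start + 1)
    return sorted(spans)


def render(text, spans):
    """Rewrite text with each tagged span shown as <name TYPE>."""
    out, pos = [], 0
    for start, end, ne_type in spans:
        out.append(text[pos:start])
        out.append(f"<{text[start:end]} {ne_type}>")
        pos = end
    out.append(text[pos:])
    return "".join(out)


lexicon = [
    ("Mario", "CHARACTER"),
    ("Super Mario", "FRANCHISE"),
    ("Mario", "TITLE_PART"),
    ("World", "LOCATION"),
    ("Super Mario World", "OTHER_GAME"),
]
tagged = render("Super Mario World", tag_entities("Super Mario World", lexicon))
```

With the full lexicon the final tag is the longest match, OTHER_GAME; stopping after the first four entries leaves the intermediate FRANCHISE and LOCATION tags from the example.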
Defining Gameplay • Read reviews, looking for sentences describing gameplay • Age of Empires III, Legend of Zelda: Twilight Princess, Animal Crossing, Gauntlet: Dark Legacy, Tony Hawk’s Pro Skater 3, Mario & Luigi: Partners in Time • Lack of emotional content in user reviews • Flaws described in more detail than strengths • Reviews focus on plot description • Categories emerged • Purchasing advice, story/structure, staying power/replay value, non-emotional and emotional gameplay experience, external factors
Gameplay Adjectives • Google bigram dataset gave us 531 adjectives describing gameplay • Separated review files into sentences, extracted sentences containing Google adjectives • Also extracted adjectives from GameSpot reviews • Needed domain-specific data • Adjectives might show that users are describing things we haven't considered • Later used for noun extraction
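The sentence-extraction step might look roughly like the sketch below, which splits review text into sentences and keeps those containing at least one adjective from a given list. The regex-based sentence splitter and the function names are illustrative assumptions; the slides do not say which splitter was used.

```python
import re


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentences_with_adjectives(text, adjectives):
    """Return (sentence, matched_adjectives) for sentences containing any listed adjective."""
    adjectives = {a.lower() for a in adjectives}
    hits = []
    for sentence in split_sentences(text):
        # Lowercased word tokens, keeping hyphenated words together.
        words = set(re.findall(r"[a-z]+(?:-[a-z]+)*", sentence.lower()))
        matched = sorted(words & adjectives)
        if matched:
            hits.append((sentence, matched))
    return hits


review = "The combat is fast and addictive. I bought it on sale. Later levels feel tedious!"
hits = sentences_with_adjectives(review, ["addictive", "tedious", "fast"])
```

Matching on whole tokens rather than substrings avoids false hits such as "fast" inside "breakfast".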
Review Adjectives • Using the Stanford POS tagger, extracted adjectives from a subset of 3,074 reviews • Review subset taken from all genres with more than 200 games • 60,000+ “adjectives” • Manually analyzed the list for gameplay words • Eliminated: • Words with fewer than 20 occurrences • Generic qualitative adjectives • Personality descriptors • Kept: action and experience words
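The frequency cut-off can be sketched as follows, assuming the reviews have already been tagged into (word, tag) pairs with Penn Treebank adjective tags (JJ, JJR, JJS). The function name and the `excluded` set, which stands in for the manually removed generic and personality words, are hypothetical; the manual review itself is not automated here.

```python
from collections import Counter

# Penn Treebank adjective tags: base, comparative, superlative.
ADJ_TAGS = {"JJ", "JJR", "JJS"}


def candidate_adjectives(tagged_tokens, min_count=20, excluded=frozenset()):
    """Count adjective tokens and drop rare or manually excluded words."""
    counts = Counter(
        word.lower() for word, tag in tagged_tokens if tag in ADJ_TAGS
    )
    return {
        word: count
        for word, count in counts.items()
        if count >= min_count and word not in excluded
    }


# Toy input: counts chosen only to exercise the filter.
tagged = (
    [("addictive", "JJ")] * 25
    + [("good", "JJ")] * 40
    + [("clunky", "JJ")] * 3
    + [("game", "NN")] * 50
)
kept = candidate_adjectives(tagged, min_count=20, excluded={"good"})
```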
Resultant Adjective List • 1,141 adjectives, ranging from 20 to 16,094 occurrences • Words describing: • Size: massive/tiny • Pace: quick/slow • Ease: easy/impossible • Uniqueness: innovative/uninspired • Experience: addictive/tedious • Aesthetics: gorgeous/ugly
Towards Using Adjectives • Extracted sentences with potentially interesting adjectives from a sample of reviews and parsed them with the Minipar parser • This will allow us to further refine our lists of adjectives and especially nouns of interest • Eventually, will also use the MK-means clustering algorithm implemented this quarter to determine which adjectives are most useful
Interface • Back-end functionality for a basic interface coded by Krishna • Utilizes a different database, but the ASP code might be portable • Its database contains all GiantBomb data, rather than only the GameSpot subset that has review data
Next Steps • Cluster gameplay adjectives using MK-means • Description vs. experience? • Derive categories of gameplay • Assign games to gameplay categories • Extract sentences with both a gameplay adjective and noun • Assign games to their adjectives' categories • Incorporate gameplay features into database • Back-end coding of the website
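As a rough illustration of assigning games to their adjectives' categories, the sketch below counts how often each category's adjectives appear for a game and returns the most frequent categories. The mapping reuses the example word pairs from the adjective list as a stand-in for the clustered categories, which had not yet been derived; the function names and the choice of the top two categories are assumptions.

```python
from collections import Counter

# Stand-in mapping built from the example adjective pairs;
# the real categories were to come from clustering.
ADJECTIVE_CATEGORY = {
    "massive": "size", "tiny": "size",
    "quick": "pace", "slow": "pace",
    "easy": "ease", "impossible": "ease",
    "innovative": "uniqueness", "uninspired": "uniqueness",
    "addictive": "experience", "tedious": "experience",
    "gorgeous": "aesthetics", "ugly": "aesthetics",
}


def category_profile(adjectives_seen):
    """Count category occurrences for the adjectives found in a game's reviews."""
    return Counter(
        ADJECTIVE_CATEGORY[a] for a in adjectives_seen if a in ADJECTIVE_CATEGORY
    )


def top_categories(adjectives_seen, n=2):
    """Return the n categories mentioned most often for a game."""
    return [cat for cat, _ in category_profile(adjectives_seen).most_common(n)]


seen = ["quick", "slow", "quick", "addictive", "massive", "tedious", "shiny"]
profile = category_profile(seen)
best = top_categories(seen, n=2)
```

Adjectives outside the mapping (here "shiny") are simply ignored, so the profile depends entirely on how complete the category list is.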