
Extracting Local Understandings from User-Generated Reviews on City Guide Websites




  1. Extracting Local Understandings from User-Generated Reviews on City Guide Websites
     Andrea Moed
     IS256 Applied Natural Language Processing
     Professor Marti Hearst
     December 6, 2006

  2. Overview
     • Motivations
     • Corpus
     • Processing
     • Nickname discovery
     • Ongoing experiments
        • Attraction extraction
        • Review classification
     • Future work
     Andrea Moed | IS256 ANLP

  3. Motivations
     • Local knowledge of well-known places… for locals
        • “Nobody goes there anymore, it’s too crowded”
     • Major draws (views, dishes, people…)
     • Best times/seasons/modes of transport?
     • Places to combine in one excursion
     • “A good place for X” vs. a Great Good Place*
     *Ray Oldenburg, The Great Good Place: Cafes, Coffee Shops, Bookstores, Bars, Hair Salons, and Other Hangouts at the Heart of a Community, 1999

  4. Corpus
     • Yelp San Francisco
        • Social site organized around cities, launched 2004
        • Thousands of SF places, reviews and reviewers
        • Largely local interest (Mass Media, Pets)
        • Some areas useful for visitors (Night Life, Shopping)
        • Writerly culture, hence high structural and stylistic variation in the text
     • Categories: Restaurants, Night Life, Shopping, Active Life, Local Flavor
     • Destinations
        • Frequently reviewed places: 20+ reviews

  5. Processing
     • Used Dappit to build page scrapers
     • Generated XML; parsed in Python
     • Place objects consisting of location info + reviews
     • Corpus collects place objects from various categories
     • Challenges of screen scraping
        • Tradeoff between more places and places with most reviews (optimization requires exhaustive search)
        • TripAdvisor proved too difficult
     • Analysis with Python and NLTK Lite
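The slide does not show the scraper output or the place class. A minimal sketch, assuming a hypothetical XML layout (`<place>` elements with a `category` attribute and `<name>`, `<address>` and `<review>` children), of parsing scraped XML into place objects and keeping only frequently reviewed places (the 20+ review cutoff from the Corpus slide). `SAMPLE_XML`, `Place`, `parse_places` and `frequently_reviewed` are illustrative names, not the project's code.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical scraper output; the real XML schema is not given in the slides.
SAMPLE_XML = """<places>
  <place category="Restaurants">
    <name>Example Cafe</name>
    <address>100 Example St, San Francisco</address>
    <review>Great coffee.</review>
    <review>Go early on weekends.</review>
  </place>
  <place category="Active Life">
    <name>Sample Park</name>
    <address>200 Sample Ave, San Francisco</address>
    <review>Windy but worth it for the view.</review>
  </place>
</places>"""


@dataclass
class Place:
    """Location info plus the text of every review of that place."""
    name: str
    category: str
    address: str
    reviews: list[str] = field(default_factory=list)


def parse_places(xml_text: str) -> list[Place]:
    """Build one Place per <place> element in the scraped XML."""
    root = ET.fromstring(xml_text)
    places = []
    for elem in root.iter("place"):
        places.append(Place(
            name=(elem.findtext("name") or "").strip(),
            category=elem.get("category", ""),
            address=(elem.findtext("address") or "").strip(),
            reviews=[r.text.strip() for r in elem.findall("review") if r.text],
        ))
    return places


def frequently_reviewed(places: list[Place], min_reviews: int = 20) -> list[Place]:
    """Keep places with at least `min_reviews` reviews (20+ per the Corpus slide)."""
    return [p for p in places if len(p.reviews) >= min_reviews]
```

For example, `frequently_reviewed(parse_places(SAMPLE_XML), min_reviews=2)` keeps only Example Cafe; with the default cutoff of 20, neither sample place qualifies.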

  6. Place Nickname Discovery
     • Goal: Discover alternate search terms to surface more diverse local results in web search
     • Method: Regular expression matching

  7. Place Nickname Discovery
     • Steps
        • Counted frequency of Yelp-given place name in reviews of that place
        • Tokenized name on whitespace
        • Rule-based generation of candidate nicknames: acronym, subsets of tokens
        • Compared frequencies of given name and each nickname
        • Potentially useful nicknames are those that occur at least half as often as the given name
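The nickname steps above can be sketched in Python with regular expressions. This is an illustration under assumptions, not the project's code: the helper names are hypothetical, matching is case-insensitive and whole-term, and full-name mentions are blanked out before nicknames are counted, so that a token inside the full name is not counted as a nickname use (the slides do not say how this overlap was handled). Without that step, every single-token subset would trivially occur at least as often as the full name.

```python
import re
from itertools import combinations


def _term_pattern(term: str) -> re.Pattern:
    # Whole-term, case-insensitive match; lookarounds instead of \b so that
    # names starting or ending with punctuation still match.
    return re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE)


def count_mentions(term: str, reviews: list[str]) -> int:
    """Total occurrences of `term` across all reviews of a place."""
    pattern = _term_pattern(term)
    return sum(len(pattern.findall(text)) for text in reviews)


def candidate_nicknames(name: str) -> set[str]:
    """Rule-based candidates: acronym plus proper, order-preserving token subsets."""
    tokens = name.split()  # tokenize the given name on whitespace
    candidates = set()
    if len(tokens) > 1:
        candidates.add("".join(t[0] for t in tokens).upper())  # acronym
        for size in range(1, len(tokens)):
            for combo in combinations(tokens, size):
                candidates.add(" ".join(combo))
    return candidates


def useful_nicknames(name: str, reviews: list[str], ratio: float = 0.5) -> dict[str, int]:
    """Candidates used at least `ratio` times as often as the given name."""
    base = count_mentions(name, reviews)
    # Assumption: blank out full-name mentions before counting nicknames.
    full = _term_pattern(name)
    stripped = [full.sub(" ", text) for text in reviews]
    useful = {}
    for nick in candidate_nicknames(name):
        n = count_mentions(nick, stripped)
        # Assumption: also require at least one mention, so a name that never
        # appears does not make every candidate "useful".
        if n > 0 and n >= ratio * base:
            useful[nick] = n
    return useful
```

With reviews that mention "Example Street Cafe" twice, "ESC" once and "Example" on its own twice, both "ESC" and "Example" pass the half-frequency threshold, while "Street" and "Cafe" do not.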

  8. Place Nickname Discovery
     • Results
        • From 61 places (Restaurants, Active Life, Local Flavor), 38 reviews each
        • 23 of 61 places appeared to have frequently used nicknames
        • BUT in 9 cases this was due to common words in names
        • In the remaining 14 cases, the first word of the name was the most commonly used nickname
        • Hypothesis: Long tail of less predictable nicknames

  9. Ongoing Work
     • Attraction extraction
        • TF-IDF calculation to find the concepts most distinctively associated with a place
        • Further text analysis to collect understandings of key concepts
           • Specificity
           • Sentiment
           • Temporality
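One way to run the TF-IDF step, sketched in plain Python under stated assumptions: all reviews of a place are pooled into one document, single words stand in for "concepts", and the stopword list and function names are hypothetical. Terms that appear in every place's reviews get an IDF of zero and drop out, which is what makes the remaining top terms distinctive to a place rather than merely frequent.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration.
STOPWORDS = {"a", "an", "the", "and", "but", "of", "at", "in", "is", "are", "was", "were", "it", "to"}


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def top_terms(place_reviews: dict[str, list[str]], k: int = 5) -> dict[str, list[str]]:
    """Rank each place's terms by TF-IDF, treating all of its reviews as one document."""
    docs = {
        place: Counter(tok for review in reviews for tok in tokenize(review))
        for place, reviews in place_reviews.items()
    }
    n_docs = len(docs)
    df = Counter()  # number of places whose reviews use each term
    for counts in docs.values():
        df.update(counts.keys())

    ranked_terms = {}
    for place, counts in docs.items():
        total = sum(counts.values()) or 1  # guard against places with no tokens
        scores = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        ranked_terms[place] = [term for term, score in ranked if score > 0][:k]
    return ranked_terms
```

For instance, if "great" appears in the reviews of every place, it scores zero everywhere, while a word such as "view" that is repeated only in one park's reviews rises to the top for that park.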


  11. Ongoing Work
      • Classification of reviews: recommendation vs. narrative
         • Recommendations help people “use” a city
         • Narrative is associated with memorable and unique locations
      • Features for classification
         • Verb tense distribution
         • Paragraph breaks
         • Opinion words at beginning and end (recommendation)
         • Memory and relationship words (narrative)
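The classification features listed above could be computed roughly as follows. This sketch is hypothetical throughout: the word lists and the `review_features` function are illustrative, and past tense is approximated by an "-ed" suffix plus a few irregular forms rather than by a part-of-speech tagger, which a fuller implementation (for example with NLTK) would more likely use.

```python
import re

# Hypothetical word lists for illustration only; not from the original project.
OPINION_WORDS = {"best", "great", "love", "worth", "recommend", "avoid", "amazing", "terrible", "must"}
MEMORY_WORDS = {"remember", "first", "years", "ago", "used", "when", "once"}
RELATIONSHIP_WORDS = {"friend", "friends", "boyfriend", "girlfriend", "husband", "wife", "mom", "dad", "family", "we"}
# Crude past-tense proxy: "-ed" words plus a few common irregular forms.
IRREGULAR_PAST = {"was", "were", "had", "went", "came", "saw", "took", "got", "ate", "felt", "made"}


def review_features(text: str, edge: int = 10) -> dict[str, float]:
    """Compute simple recommendation-vs-narrative features for one review."""
    toks = re.findall(r"[a-z']+", text.lower())
    n = len(toks) or 1
    past = sum(1 for t in toks if t.endswith("ed") or t in IRREGULAR_PAST)
    # Paragraphs are separated by blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]
    # First and last `edge` tokens, without double-counting short reviews.
    head = toks[:edge]
    tail = toks[max(edge, len(toks) - edge):]
    return {
        "past_tense_ratio": past / n,
        "paragraph_count": len(paragraphs),
        "opinion_words_at_edges": sum(t in OPINION_WORDS for t in head + tail),
        "memory_relationship_ratio": sum(
            t in MEMORY_WORDS or t in RELATIONSHIP_WORDS for t in toks
        ) / n,
    }
```

Under these assumptions, a short tip such as "Best burritos in the Mission... Highly recommend." scores high on opinion words at its edges and zero on past tense, while a two-paragraph memory of a visit with friends scores higher on past tense, paragraph count, and memory/relationship words.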

  12. Future Work
      • Relating understanding about location features to external data (geocoding, weather)
      • Visualization of extracted concepts
      • Development of a training set for classification
