Opinion Mapping Travelblogs Efthymios Drymonas, Alexandros Efentakis, Dieter Pfoser Research Center Athena, Institute for the Management of Information Systems, Athens, Greece http://www.imis.athena-innovation.gr
Introduction Users create vast amounts of “geospatial” narratives …travel diaries, travel blogs… How to quickly assess them?
Motivation • Simple assessment of user-generated geospatial content • Visualization • Geospatial opinion maps
Opinion Mapping generating steps • Relating text to location – Geocoding • Relating user sentiment to text – Opinion Coding • Relating opinions to location – Opinion Mapping
1. Relating text to location – Geocoding • Web crawling • Geoparsing • Geocoding
1a. Web Crawling • Crawled for travel blog articles • Parsed ~ 150k HTML documents
1b. Geoparsing - Processing Pipeline Overview • GATE • Cafetiere IE system • YAHOO! API (Placemaker, Placefinder)
1b. Linguistic Preprocessing • Tokeniser & Orthographic Analyser • Sentence Splitter • POS Tagger • Morphological Analysis, WordNet • Ex. “went south”, “goes south” = “go south”
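The morphological normalization in the example above can be illustrated with a minimal sketch. The lemma table and the `normalize` helper are hypothetical and hand-written for this illustration; they stand in for the morphological analyser and WordNet lookup named on the slide and are not the pipeline's actual code.

```python
# Hypothetical, hand-written lemma table for illustration only.
# A real pipeline would consult a morphological analyser and a lexicon.
LEMMAS = {"went": "go", "goes": "go", "going": "go", "gone": "go"}


def normalize(phrase: str) -> str:
    """Lowercase the phrase, split on whitespace, map each token to its lemma."""
    return " ".join(LEMMAS.get(token, token) for token in phrase.lower().split())


print(normalize("went south"))
print(normalize("goes south"))
```

Both inputs collapse to the same normalized phrase, which is what lets later rules match a spatial expression regardless of verb inflection.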
1b. Semantic Analysis: i. Ontology Lookup Ontology access to retrieve potential semantic class information
1b. Semantic Analysis: ii. Feature Extraction (IE engine) • Compilation of semantic analysis rules • IE engine uses all previous info • Linguistic information (POS tags, orthographic info etc.) • Semantic and context information • Extraction of spatial objects
1c. PostProcessor - Geocoding • Collecting semantic analysis results and annotating them to the original text • Preparing the input to the geocoder module
1c. Geocoding • Place name info from semantic analysis transformed to coordinates • YAHOO! Placemaker for disambiguation • YAHOO! Placefinder geocoder
Output XML file • From plain text to structured information • Also global document info extracted
2. Relating user sentiment to text – Opinion Coding 1/2 • OpinionFinder tool • Annotates text with positive or negative sentiments • Retain only paragraphs containing spatial info • Total positive and negative sentiments for each paragraph
2. Relating user sentiment to text – Opinion Coding 2/2 • Score for this paragraph: +2
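A paragraph score like the one above can be sketched as follows. The slides do not state the exact formula, so this assumes the score is the number of positive annotations minus the number of negative ones; the `paragraph_score` function and the sample annotations are hypothetical and do not reflect OpinionFinder's actual output format.

```python
from typing import Iterable


def paragraph_score(polarities: Iterable[str]) -> int:
    """Net sentiment (assumed formula): +1 per 'positive', -1 per 'negative'."""
    score = 0
    for polarity in polarities:
        if polarity == "positive":
            score += 1
        elif polarity == "negative":
            score -= 1
    return score


# Hypothetical annotations for one paragraph: three positive, one negative.
annotations = ["positive", "positive", "negative", "positive"]
print(paragraph_score(annotations))  # 2
```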
3. Mapping opinions to location - Opinion Mapping • Scoring method • Spatial grid • Aggregation method
Opinion Mapping (Scoring) • Each paragraph is characterized by an MBR (minimum bounding rectangle) • A visualized paragraph MBR does not exceed 0.5° x 0.5° • Each paragraph’s MBR is mapped to a sentiment color according to users’ opinions
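The scoring step can be sketched in a few lines, assuming each paragraph yields a list of geocoded (longitude, latitude) points. The `MBR` type, the helper names, the color choices, and the sample coordinates are all assumptions made for illustration; the slides do not specify the color scale.

```python
from typing import NamedTuple, Sequence, Tuple


class MBR(NamedTuple):
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float


MAX_SIDE_DEG = 0.5  # size limit for visualized MBRs, from the slide


def mbr_of(points: Sequence[Tuple[float, float]]) -> MBR:
    """Smallest axis-aligned rectangle containing all (lon, lat) points."""
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    return MBR(min(lons), min(lats), max(lons), max(lats))


def is_drawable(box: MBR) -> bool:
    """True if neither side of the MBR exceeds the 0.5 degree limit."""
    return (box.max_lon - box.min_lon <= MAX_SIDE_DEG
            and box.max_lat - box.min_lat <= MAX_SIDE_DEG)


def sentiment_color(score: int) -> str:
    """Assumed three-way color scale: positive, negative, neutral."""
    if score > 0:
        return "green"
    if score < 0:
        return "red"
    return "gray"


# Approximate, illustrative coordinates for two places in one paragraph.
box = mbr_of([(23.7275, 37.9838), (23.6470, 37.9420)])
print(box, is_drawable(box), sentiment_color(2))
```

This sketch ignores rectangles that cross the antimeridian, which would need separate handling.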
Opinion Mapping (Issues) Problem: • Multiple paragraphs may partially target the same area (overlapping areas) • How to visualize partially overlapping MBRs of different paragraphs and sentiments?
Opinion Mapping (Spatial grid) Solution: • We split the Earth into small tiles of 0.0045° x 0.0045° (~500 m x 500 m) • Each paragraph’s MBR consists of several such small tiles
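One way to realize such a grid is to index each tile by integer column and row, obtained by dividing longitude and latitude by the tile size. The sketch below assumes that scheme; the function names and the sample rectangle are hypothetical, and the slides do not describe the authors' actual indexing.

```python
import math
from typing import Set, Tuple

TILE_DEG = 0.0045  # tile side in degrees, from the slide


def tile_index(lon: float, lat: float) -> Tuple[int, int]:
    """Integer (column, row) of the tile containing the given point."""
    return (math.floor(lon / TILE_DEG), math.floor(lat / TILE_DEG))


def tiles_for_mbr(min_lon: float, min_lat: float,
                  max_lon: float, max_lat: float) -> Set[Tuple[int, int]]:
    """All tiles touched by the rectangle, corner tiles included."""
    col0, row0 = tile_index(min_lon, min_lat)
    col1, row1 = tile_index(max_lon, max_lat)
    return {(col, row)
            for col in range(col0, col1 + 1)
            for row in range(row0, row1 + 1)}


# Hypothetical small rectangle spanning 3 columns and 2 rows of tiles.
tiles = tiles_for_mbr(0.001, 0.001, 0.010, 0.006)
print(len(tiles))  # 6
```

The ~500 m figure follows from one degree of latitude being roughly 111 km, so 0.0045° is about 0.5 km north to south; the east-west extent of a tile shrinks with the cosine of the latitude.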
Opinion Mapping (Aggregation Method) 1/2 • Partially overlapping paragraph MBRs are translated to sets of tiles, some of which they share • Sentiment is aggregated per tile (for drawing purposes) instead of per MBR
Opinion Mapping (Aggregation Method) 2/2 An example: • For one cell/tile there are four scores: -1, -2, 1, 0 • Resulting score is their sum: -2
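The per-tile summation can be sketched as below, reproducing the example of four scores on one tile. It assumes each paragraph has already been reduced to a set of tile indices plus a score; the `aggregate` function, the `Tile` alias, and the tile indices are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Tile = Tuple[int, int]


def aggregate(paragraphs: Iterable[Tuple[Set[Tile], int]]) -> Dict[Tile, int]:
    """Sum the scores of all paragraphs covering each tile."""
    totals: Dict[Tile, int] = defaultdict(int)
    for tiles, score in paragraphs:
        for tile in tiles:
            totals[tile] += score
    return dict(totals)


shared = (10, 20)  # hypothetical tile covered by all four paragraphs
paragraphs = [
    ({shared, (10, 21)}, -1),
    ({shared}, -2),
    ({shared, (11, 20)}, 1),
    ({shared}, 0),
]
totals = aggregate(paragraphs)
print(totals[shared])  # -2
```

Tiles covered by a single paragraph simply keep that paragraph's score, so non-overlapping regions are drawn exactly as they would be per MBR.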
Opinion Mapping examples Original MBRs of paragraphs
Opinion Mapping examples Paragraph MBRs divided in tiles – Aggregation per tile
Opinion Mapping examples Final result
Conclusions • Aggregating opinions is important for utilizing and assessing user-generated content • A total of more than 150k web pages/articles were processed • Sentiment information from various articles is aggregated and visualized • Portions of text are related to locations • Geospatial opinion map based on user-contributed information
Future Work • Better approach to sentiment analysis • More in-depth analysis of the results • Examine microblogging content streams • Live-updated sentiment information