Adaptive Context Features for Toponym Resolution in Streaming News

Adaptive Context Features for Toponym Resolution in Streaming News • Group 12 • Hari Kishan Bandaru • V S P V S K Kumar Parimi • Sneha Anand Yeluguri

Paper • Adaptive Context Features for Toponym Resolution in Streaming News • Michael D. Lieberman, Hanan Samet • Venue: In SIGIR’12: Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval

Outline • Motivation • Related work • Problem Definition • Key concepts • Method • Validation/Results • Conclusion

Motivation • Demand for ever-growing volumes of news and information. • People strive to stay up to date. • Internet-enabled mobile devices require location-based services.

Related work • Several commercial products for geotagging text are available, such as • MetaCarta’s Geotagger • Thomson Reuters’s OpenCalais • Yahoo!’s Placemaker

Problem Definition • Geotagging must assign each toponym its correct lat/long values; this step is called toponym resolution. • It is treated as a classification problem: each possible interpretation of each toponym is classified as correct or incorrect. • The authors address it with adaptive context features.

Introduction • News itself often has a strong geographic component. • Articles describe events that are relevant to geographic locations of interest to their readers. • Geotagging aims to understand the geographic content present in the articles.

Geotagging Steps • Toponym recognition • finding all textual references to geographic locations. • Toponym Resolution • choosing the correct location interpretation for each toponym.

Key concepts • GEOTAGGING FRAMEWORK • Toponym Recognition • Toponym Resolution • Resolution Features • ADAPTIVE CONTEXT FEATURES • Proximity Features • Sibling Features • Feature Computation • Feature Propagation

Toponym Recognition • The toponym recognition procedure is a multifaceted process that combines rule-based and statistics-based methods. • It performs lookups into various tables of entity names, including location names, abbreviations, business names, and person names, as well as cue words.

Toponym Recognition • NLP tools are applied: an NER package recognizes toponyms and other entities, and extensive post-processing of its output ensures higher quality. • Part-of-speech (POS) tagging is also performed to find phrases of proper nouns, since names of locations (and other types of entities) tend to be composed of proper nouns.

Toponym Resolution • Toponym resolution is implemented with methods from supervised machine learning. • For a given toponym/interpretation pair (t, lt), the classifier decides whether the interpretation is correct or incorrect. • Location interpretations are drawn from a gazetteer.

Toponym Resolution • The classifier is random forests, a decision tree-based ensemble method. • The random forests method constructs many decision trees based on different random subsets of the dataset, sampled with replacement. • Each decision tree is constructed using random subsets of features from the training feature vectors.
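The two sources of randomness described above can be sketched in plain Python. This is only an illustration of bootstrap sampling and random feature selection, not the authors' implementation: the function names, the seed, and the toy rows are assumptions, and the majority vote is the usual way a random forest combines its trees for classification rather than something stated on the slides.

```python
import random

def bootstrap_sample(rows, rng):
    """Draw len(rows) training rows with replacement (one bootstrap sample)."""
    return [rng.choice(rows) for _ in rows]

def random_feature_subset(feature_names, k, rng):
    """Pick k distinct features for a tree to consider."""
    return rng.sample(feature_names, k)

def majority_vote(votes):
    """Combine per-tree correct/incorrect votes into one decision."""
    return sum(votes) > len(votes) / 2

# Hypothetical usage: each tree would be trained on its own sample and subset.
rng = random.Random(0)
rows = ["pair_1", "pair_2", "pair_3", "pair_4"]  # toy (t, lt) training rows
sample = bootstrap_sample(rows, rng)
subset = random_feature_subset(["I", "P", "A", "D", "L"], 2, rng)
```

Because sampling is done with replacement, some rows typically appear more than once in a sample and others not at all, which is what makes the trees differ from one another.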

Previous Methods • One early proposed method considered the use of SVM regression to estimate a distance function based on feature vector values that is intended to capture the distance between a given lt, and t’s ground truth interpretation.

Resolution Features • Several baseline toponym resolution features are used • I: Number of interpretations for t. • P: The population of lt, where a larger population indicates that lt is more well-known. • A: Number of alternate names for lt in various languages. More names indicate greater renown of lt. • D: Geographic distance of lt from an interpretation of a dateline toponym, which establishes a general location context for a news article. • L: Geographic distance of lt from the newspaper’s local lexicon, the expected location of its primary audience, expressed as a lat/long point.
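As a rough sketch, the five baseline features could be computed as follows. The `Interpretation` record, the `baseline_features` function, and the use of the haversine formula for geographic distance are assumptions made for illustration; the slides do not say how distance is measured, and the dateline is simplified here to a single lat/long point.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

@dataclass
class Interpretation:
    """Hypothetical gazetteer record for one candidate location lt."""
    name: str
    lat: float
    lon: float
    population: int
    alt_names: int

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    h = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))

def baseline_features(candidates, lt, dateline, local_lexicon):
    """Return I, P, A, D, L for interpretation lt of a toponym t.

    candidates: all interpretations of t; dateline and local_lexicon
    are (lat, lon) points (an assumption of this sketch).
    """
    return {
        "I": len(candidates),
        "P": lt.population,
        "A": lt.alt_names,
        "D": haversine_km(lt.lat, lt.lon, *dateline),
        "L": haversine_km(lt.lat, lt.lon, *local_lexicon),
    }

# Toy values, not gazetteer data.
a = Interpretation("A", 0.0, 0.0, 1000, 3)
b = Interpretation("B", 0.0, 1.0, 50, 1)
features = baseline_features([a, b], a, dateline=(0.0, 0.0), local_lexicon=(0.0, 1.0))
```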

Adaptive Context Features • The features reflect two aspects of toponym co-occurrence and the evidence that interpretations impart to each other • Proximate interpretations • Sibling interpretations

Proximity Features • These are based on geographic distance. • For each other toponym o in the window around t, find the interpretation lo closest to lt. • The authors compute the proximity feature for (t, lt) as the average of the geographic distances to these closest interpretations. • The learning procedure can learn appropriate distance thresholds from its training data.
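The proximity computation described above can be sketched as below. This is a minimal illustration under stated assumptions: interpretations are plain (lat, lon) tuples, distance is haversine distance in km, and an empty window yields `None`; none of these choices is specified on the slides, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))

def proximity_feature(lt, window):
    """Average distance from lt to the closest interpretation of each other toponym.

    lt: (lat, lon) of the interpretation being scored.
    window: one list of candidate (lat, lon) interpretations per other toponym o.
    """
    nearest = [min(haversine_km(lt, lo) for lo in cands) for cands in window if cands]
    if not nearest:
        return None  # no context toponyms in the window (assumed convention)
    return sum(nearest) / len(nearest)

# Toy window: two other toponyms, each with two candidate interpretations.
value = proximity_feature((0.0, 0.0), [[(0.0, 1.0), (0.0, 10.0)],
                                        [(0.0, 2.0), (50.0, 50.0)]])
```

A small value means that lt lies close to at least one plausible reading of each neighbouring toponym, which is the kind of evidence the classifier can threshold on.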

Sibling Features • These capture the relationships between textually proximate toponyms that share the same country, state, or other administrative division. • For each toponym/interpretation pair (t, lt), the sibling feature value is the number of other toponyms o in the window around t with an interpretation that is a sibling of lt at a given resolution.
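A minimal sketch of the sibling count follows, assuming each interpretation is represented by its administrative path, for example ("US", "TX"), and that "resolution" means how many leading levels of that path must match. The representation and the name `sibling_feature` are assumptions for illustration.

```python
def sibling_feature(lt_path, window, depth):
    """Count other toponyms that have a sibling of lt at the given resolution.

    lt_path: administrative path of lt, e.g. (country, state).
    window: one list of candidate paths per other toponym o.
    depth: 1 compares countries, 2 compares country and state, and so on.
    """
    prefix = lt_path[:depth]
    return sum(1 for cands in window
               if any(lo[:depth] == prefix for lo in cands))

# Toy window of three other toponyms and their candidate interpretations.
window = [
    [("US", "TX"), ("US", "GA")],
    [("FR", "IDF")],
    [("US", "OK")],
]
country_siblings = sibling_feature(("US", "TX"), window, depth=1)
state_siblings = sibling_feature(("US", "TX"), window, depth=2)
```

Each toponym is counted at most once, however many of its interpretations are siblings of lt, which matches the "number of other toponyms" wording above.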

Adaptive Features

Feature Accuracy • Window breadth corresponds to the size of the window around t. • Window depth is the maximum number of interpretations to be considered for each toponym in the window. • These interpretations are ranked using various factors such as GeoNames, the population of the location, and geographic distance.
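The two parameters can be illustrated with a short sketch. It assumes that breadth counts toponyms on each side of t and that candidates are ranked by population alone; the slides list several ranking factors without saying how they are combined, so both choices, along with the name `context_window`, are simplifications for illustration.

```python
def context_window(toponyms, index, breadth, depth):
    """Select the context used to score the toponym at position `index`.

    toponyms: candidate lists in document order, one list per toponym;
              each candidate is a dict with a "population" key.
    breadth: number of toponyms taken on each side of t (assumed unit).
    depth: maximum number of interpretations kept per toponym.
    """
    start = max(0, index - breadth)
    stop = min(len(toponyms), index + breadth + 1)
    window = []
    for i in range(start, stop):
        if i == index:
            continue  # t itself is not part of its own context
        ranked = sorted(toponyms[i], key=lambda c: c["population"], reverse=True)
        window.append(ranked[:depth])
    return window

# Toy document with five toponyms; the one at index 2 is being resolved.
doc = [
    [{"name": "a", "population": 1}],
    [{"name": "b1", "population": 5}, {"name": "b2", "population": 9},
     {"name": "b3", "population": 7}],
    [{"name": "t", "population": 3}],
    [{"name": "c", "population": 2}],
    [{"name": "d", "population": 4}],
]
window = context_window(doc, index=2, breadth=1, depth=2)
```

Larger breadth and depth give the features more evidence but increase the number of distance and sibling comparisons, which is the trade-off between accuracy and computation time examined in the results.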

Compute adaptive context features.

Validation/Results • Geotagging is generally difficult because of the large gazetteer and the large amount of toponym ambiguity. • Extensive experiments were performed on the adaptive method and competing geotagging methods: • Thomson Reuters’s OpenCalais, and • Yahoo!’s Placemaker • The adaptive context parameters (window breadth and depth) were varied to measure their effect on • feature computation time • accuracy of the Adaptive method

Gazetteer Ambiguity Toponyms and the number of interpretations

Datasets Breakdown of location types within each of test corpora

Resolution Accuracy Resolution accuracy of various methods

Resolution Accuracy (Contd.) Importance of features used in the Adaptive method

Adaptive Parameters

Conclusion And Future Work • Adaptive context features serve as a flexible, useful addition to geotagging algorithms for streaming news and other textual domains. • Future work: test different toponym weightings in the window to judge their effect on resolution accuracy. • Future work: consider clusters of news articles about the same topic and design other features using these clusters.

Thank You

Queries
