Finding Support Sentences for Entities Roi Blanco, Hugo Zaragoza (SIGIR '10) Advisor: Jia-Ling Koh Speaker: Yu-Cheng Hsieh
Outline • Introduction • Notations • Features for Ranking Support Sentences • Entity Ranking • Sentence Ranking with Entity Ranking Information • Experiment
Introduction • Ranking entities (e.g. experts, locations, companies) has become a standard information retrieval task. • Showing only the entities, without any explanation, is not enough for users to identify the relevance between the query and the entities. • The task: retrieve and rank entity support sentences that explain the relevance of an entity with respect to a query.
Notations • : a collection of sentences (paragraphs or text window of fixed size) • : contexts surround • : sentence-entity matrix, , if entity j is present in sentence i , otherwise • :
Notations • R_K(q): the top-K relevant sentences for query q • R_K^+(q): R_K(q) augmented by adding the context of each sentence in it • S(q, e): the candidate support sentences in R_K(q) for an entity e • S^+(q, e): the candidate support sentences in R_K^+(q) for an entity e
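A minimal Python sketch of these structures over a toy in-memory corpus (identifiers such as candidate_support_sentences are illustrative, not from the paper):

```python
from collections import defaultdict

# Toy corpus: each sentence carries the named entities detected in it.
# In the paper the entities come from a semantically annotated Wikipedia snapshot.
sentences = [
    {"id": 0, "text": "Apple was founded by Steve Jobs in Cupertino.",
     "entities": {"Apple", "Steve Jobs", "Cupertino"}},
    {"id": 1, "text": "Steve Jobs introduced the iPhone in 2007.",
     "entities": {"Steve Jobs", "iPhone"}},
]

# Sparse sentence-entity matrix M: M[i][j] = 1 iff entity j occurs in sentence i.
M = defaultdict(dict)
for s in sentences:
    for e in s["entities"]:
        M[s["id"]][e] = 1

def candidate_support_sentences(top_k_ids, entity):
    """Candidate support sentences for `entity` within the top-K set R_K(q)."""
    return [i for i in top_k_ids if M[i].get(entity, 0) == 1]

# Given the ids of the top-K sentences retrieved for a query,
# keep only those that actually mention the entity.
print(candidate_support_sentences([0, 1], "Steve Jobs"))  # -> [0, 1]
```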
Features for Ranking Support Sentences • BM25(q, s): uses the original retrieval score of the sentence (measured by BM25) • BM25F(q, s, c_s): a context-aware model that also scores the surrounding context, using BM25F These features only consider the relevance between the query and the sentences
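A compact sketch of the plain BM25 feature over sentences (parameter defaults and helper names are assumptions, not the paper's exact formulation):

```python
import math
from collections import Counter

def bm25_sentence_score(query_terms, sentence_terms, doc_freq, n_sentences,
                        avg_len, k1=1.2, b=0.75):
    """Plain BM25 score of one sentence for a query (the sentence is treated
    as a tiny document; doc_freq counts in how many sentences a term occurs)."""
    tf = Counter(sentence_terms)
    score = 0.0
    for t in set(query_terms):
        if t not in tf:
            continue
        df = doc_freq.get(t, 0)
        idf = math.log(1 + (n_sentences - df + 0.5) / (df + 0.5))
        norm = tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(sentence_terms) / avg_len))
        score += idf * norm
    return score

# Example: score one candidate sentence against the query terms.
score = bm25_sentence_score(
    ["apple", "founded"], "apple was founded by steve jobs".split(),
    doc_freq={"apple": 100, "founded": 50}, n_sentences=75_000_000, avg_len=25.0)
```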
Entity Ranking • Frequency: the number of sentences containing the entity e (like tf) • Rarity: penalizes very frequent entities (like idf) • KLD: a Kullback-Leibler-divergence-based score that discovers "special" entities
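A rough sketch of how these three entity features could be computed (the exact KLD formulation in the paper may differ; names and statistics here are illustrative):

```python
import math

def entity_features(entity, candidate_ids, M, collection_entity_prob):
    """Three entity-ranking features from the slide: Frequency (tf-like),
    Rarity (idf-like) and a KL-divergence-style score for 'special' entities.
    `collection_entity_prob[e]` is the entity's relative frequency in the whole
    collection, assumed to be precomputed."""
    # Frequency: number of retrieved sentences containing the entity.
    freq = sum(1 for i in candidate_ids if M[i].get(entity, 0) == 1)

    # Rarity: penalise entities that are very frequent in the whole collection.
    p_coll = collection_entity_prob.get(entity, 1e-9)
    rarity = math.log(1.0 / p_coll)

    # KLD-style term: how much more likely the entity is in the retrieved set
    # than in the collection, weighted by its local probability.
    p_local = freq / max(len(candidate_ids), 1)
    kld = p_local * math.log(p_local / p_coll) if p_local > 0 else 0.0

    return {"frequency": freq, "rarity": rarity, "kld": kld}
```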
Position Consideration • Take into account the distance between the last match of a query term and the entity mention in the sentence
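A simple token-level approximation of this distance feature (the paper's exact definition may differ; helper names are illustrative):

```python
def query_entity_distance(tokens, query_terms, entity_tokens):
    """Token distance between the last query-term match in a sentence and the
    nearest occurrence of the entity; a smaller distance suggests a tighter link."""
    query_positions = [i for i, tok in enumerate(tokens) if tok.lower() in query_terms]
    entity_positions = [i for i in range(len(tokens) - len(entity_tokens) + 1)
                        if tokens[i:i + len(entity_tokens)] == entity_tokens]
    if not query_positions or not entity_positions:
        return None  # no match: the feature is undefined for this sentence
    last_query = max(query_positions)
    return min(abs(last_query - p) for p in entity_positions)

tokens = "Steve Jobs founded Apple in Cupertino".split()
print(query_entity_distance(tokens, {"founded"}, ["Apple"]))  # -> 1
```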
Experiment • Uses the Semantically Annotated Snapshot of the English Wikipedia, which contains - 1.5M documents - 75M sentences - 20.3M unique named entities (annotated with the 12 first-level Wall Street Journal entity types) • A dataset of 226 (query, entity) pairs with 45 unique queries was built manually
Experiment • Assessors produce queries about topics they know well • The system produces a set of candidate entities for each query • Assessors eliminate the entities that are not relevant to the query • The system produces candidate sentences for each remaining (query, entity) pair
Experiment • Assessors judge four levels of relevance: 1. Non-relevant 2. Fairly relevant 3. Relevant 4. Very relevant • A (query, entity, sentence) triple is considered relevant iff it is judged above non-relevant
Experiment • Measures: MRR, NDCG, P@1, MAP • Tie-aware versions of these measures are used
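For intuition, a tie-aware P@1 sketch: when several sentences share the top score, the measure averages over all ways of breaking the tie (one common tie-aware formulation, not necessarily the exact one used in the paper):

```python
def tie_aware_p_at_1(scored_items):
    """Tie-aware P@1: if several items share the top score, average over all
    ways of breaking the tie, i.e. the fraction of relevant items in the top
    tie group. `scored_items` is a list of (score, is_relevant) pairs."""
    if not scored_items:
        return 0.0
    top = max(score for score, _ in scored_items)
    tie_group = [rel for score, rel in scored_items if score == top]
    return sum(tie_group) / len(tie_group)

print(tie_aware_p_at_1([(2.0, True), (2.0, False), (1.0, True)]))  # -> 0.5
```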
Experiment • The ranking functions operate on a top-k set of sentences for a given query, which can be augmented with context • The context of a sentence is defined as - the surrounding four sentences - the title of its Wikipedia entry • For BM25F, each sentence is represented in three fields - first field: the sentence s itself - second field: the surrounding sentences - third field: the Wikipedia title
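A sketch of a three-field BM25F scorer over this representation (field weights, parameters, and collection statistics are illustrative assumptions):

```python
import math

def bm25f_score(query_terms, fields, field_weights, doc_freq, n_docs,
                avg_field_len, k1=1.2, b=0.75):
    """BM25F over the three fields from the slide: the sentence itself, its
    surrounding sentences, and the Wikipedia page title. Per-field term
    frequencies are length-normalised, combined with field weights, then passed
    through a single BM25-style saturation."""
    score = 0.0
    for t in set(query_terms):
        weighted_tf = 0.0
        for f, tokens in fields.items():
            tf = tokens.count(t)
            if tf == 0:
                continue
            norm = 1 - b + b * len(tokens) / avg_field_len[f]
            weighted_tf += field_weights[f] * tf / norm
        if weighted_tf == 0:
            continue
        df = doc_freq.get(t, 0)
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        score += idf * weighted_tf / (k1 + weighted_tf)
    return score

# One candidate sentence represented in the three fields.
fields = {
    "sentence": "steve jobs founded apple".split(),
    "context": "he later introduced the iphone to the world".split(),
    "title": "steve jobs".split(),
}
score = bm25f_score(
    ["apple", "jobs"], fields,
    field_weights={"sentence": 3.0, "context": 1.0, "title": 2.0},
    doc_freq={"apple": 10, "jobs": 5}, n_docs=1000,
    avg_field_len={"sentence": 20.0, "context": 60.0, "title": 3.0})
```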
Results • Combination > KLD > Frequency > Rarity • Sum > Average
The Role of Context • Given a fixed query q and a fixed entity e, the context plays two roles: - it may itself contain the correct support sentence for (q, e) - it can be used inside the ranking function itself (via BM25F)
Conclusions & Future work • Developed several features embracing different paradigms to tackle the problem • The context of a sentence which can be effectively exploited using the BM25F • The methods might have a bias for longer sentences – apply sentence normalization • Pursuing other linguistic features of sentences