Finding Support Sentences for Entities Roi Blanco, Hugo Zaragoza (SIGIR '10) Advisor: Jia-Ling Koh Speaker: Yu-Cheng Hsieh
Outline • Introduction • Notations • Features for Ranking Support Sentences • Entity Ranking • Sentence Ranking with Entity Ranking Information • Experiment
Introduction • Ranking entities (e.g. experts, locations, companies) has become a standard information retrieval task. • Showing only the entities, without any explanation, is not enough for users to identify the relevance between the query and the entities. • The task: retrieve and rank entity support sentences that explain the relevance of an entity with respect to a query.
Notations • : a collection of sentences (paragraphs or text window of fixed size) • : contexts surround • : sentence-entity matrix, , if entity j is present in sentence i , otherwise • :
Notations • R_K(q): the top-K relevant sentences for query q • R_K^+(q): R_K(q) augmented by adding the context of each sentence in it • S(q, e): the candidate support sentences in R_K(q) for an entity e • S^+(q, e): the candidate support sentences in R_K^+(q) for an entity e
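A minimal Python sketch of these structures over a toy in-memory corpus (identifiers such as candidate_support_sentences are illustrative, not from the paper):

```python
from collections import defaultdict

# Toy corpus: each sentence carries the named entities detected in it.
# In the paper the entities come from a semantically annotated Wikipedia snapshot.
sentences = [
    {"id": 0, "text": "Apple was founded by Steve Jobs in Cupertino.",
     "entities": {"Apple", "Steve Jobs", "Cupertino"}},
    {"id": 1, "text": "Steve Jobs introduced the iPhone in 2007.",
     "entities": {"Steve Jobs", "iPhone"}},
]

# Sparse sentence-entity matrix M: M[i][j] = 1 iff entity j occurs in sentence i.
M = defaultdict(dict)
for s in sentences:
    for e in s["entities"]:
        M[s["id"]][e] = 1

def candidate_support_sentences(top_k_ids, entity):
    """Candidate support sentences for `entity` within the top-K set R_K(q)."""
    return [i for i in top_k_ids if M[i].get(entity, 0) == 1]

# Given the ids of the top-K sentences retrieved for a query,
# keep only those that actually mention the entity.
print(candidate_support_sentences([0, 1], "Steve Jobs"))  # -> [0, 1]
```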
Features for Ranking Support Sentences • BM25(q, s): uses the original retrieval score of the sentence (measured by BM25) • BM25F(q, s, c_s): a context-aware model that also scores the surrounding context, using BM25F These features only consider the relevance between the query and the sentences
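A compact sketch of the plain BM25 feature over sentences (parameter defaults and helper names are assumptions, not the paper's exact formulation):

```python
import math
from collections import Counter

def bm25_sentence_score(query_terms, sentence_terms, doc_freq, n_sentences,
                        avg_len, k1=1.2, b=0.75):
    """Plain BM25 score of one sentence for a query (the sentence is treated
    as a tiny document; doc_freq counts in how many sentences a term occurs)."""
    tf = Counter(sentence_terms)
    score = 0.0
    for t in set(query_terms):
        if t not in tf:
            continue
        df = doc_freq.get(t, 0)
        idf = math.log(1 + (n_sentences - df + 0.5) / (df + 0.5))
        norm = tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(sentence_terms) / avg_len))
        score += idf * norm
    return score

# Example: score one candidate sentence against the query terms.
score = bm25_sentence_score(
    ["apple", "founded"], "apple was founded by steve jobs".split(),
    doc_freq={"apple": 100, "founded": 50}, n_sentences=75_000_000, avg_len=25.0)
```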
Entity Ranking • Frequency: the number of sentences containing the entity e (like tf) • Rarity: penalizes very frequent entities (like idf) • KLD: a Kullback-Leibler-divergence-based score that discovers "special" entities
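A rough sketch of how these three entity features could be computed (the exact KLD formulation in the paper may differ; names and statistics here are illustrative):

```python
import math

def entity_features(entity, candidate_ids, M, collection_entity_prob):
    """Three entity-ranking features from the slide: Frequency (tf-like),
    Rarity (idf-like) and a KL-divergence-style score for 'special' entities.
    `collection_entity_prob[e]` is the entity's relative frequency in the whole
    collection, assumed to be precomputed."""
    # Frequency: number of retrieved sentences containing the entity.
    freq = sum(1 for i in candidate_ids if M[i].get(entity, 0) == 1)

    # Rarity: penalise entities that are very frequent in the whole collection.
    p_coll = collection_entity_prob.get(entity, 1e-9)
    rarity = math.log(1.0 / p_coll)

    # KLD-style term: how much more likely the entity is in the retrieved set
    # than in the collection, weighted by its local probability.
    p_local = freq / max(len(candidate_ids), 1)
    kld = p_local * math.log(p_local / p_coll) if p_local > 0 else 0.0

    return {"frequency": freq, "rarity": rarity, "kld": kld}
```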
Position Consideration • Take into account the distance between the last match of a query term and the entity mention in the sentence
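A simple token-level approximation of this distance feature (the paper's exact definition may differ; helper names are illustrative):

```python
def query_entity_distance(tokens, query_terms, entity_tokens):
    """Token distance between the last query-term match in a sentence and the
    nearest occurrence of the entity; a smaller distance suggests a tighter link."""
    query_positions = [i for i, tok in enumerate(tokens) if tok.lower() in query_terms]
    entity_positions = [i for i in range(len(tokens) - len(entity_tokens) + 1)
                        if tokens[i:i + len(entity_tokens)] == entity_tokens]
    if not query_positions or not entity_positions:
        return None  # no match: the feature is undefined for this sentence
    last_query = max(query_positions)
    return min(abs(last_query - p) for p in entity_positions)

tokens = "Steve Jobs founded Apple in Cupertino".split()
print(query_entity_distance(tokens, {"founded"}, ["Apple"]))  # -> 1
```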
Experiment • Uses the Semantically Annotated Snapshot of the English Wikipedia, which contains - 1.5M documents - 75M sentences - 20.3M unique named entities (annotated with the 12 first-level Wall Street Journal entity types) • A dataset of 226 (query, entity) pairs with 45 unique queries was built manually
Experiment • Assessors produce queries about topics they know well • The system produces a set of candidate entities for each query • Assessors eliminate the entities that are not relevant to the query • The system produces candidate sentences for each remaining (query, entity) pair
Experiment • Assessors judge four levels of relevance: 1. Non-relevant 2. Fairly relevant 3. Relevant 4. Very relevant • A (query, entity, sentence) triple is considered relevant iff it is judged above non-relevant
Experiment • Measures: MRR, NDCG, P@1, MAP • Tie-aware versions of these measures are used
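For intuition, a tie-aware P@1 sketch: when several sentences share the top score, the measure averages over all ways of breaking the tie (one common tie-aware formulation, not necessarily the exact one used in the paper):

```python
def tie_aware_p_at_1(scored_items):
    """Tie-aware P@1: if several items share the top score, average over all
    ways of breaking the tie, i.e. the fraction of relevant items in the top
    tie group. `scored_items` is a list of (score, is_relevant) pairs."""
    if not scored_items:
        return 0.0
    top = max(score for score, _ in scored_items)
    tie_group = [rel for score, rel in scored_items if score == top]
    return sum(tie_group) / len(tie_group)

print(tie_aware_p_at_1([(2.0, True), (2.0, False), (1.0, True)]))  # -> 0.5
```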
Experiment • The ranking functions operate on a top-k set of sentences for a given query, which can be augmented with context • The context of a sentence is defined as - the surrounding four sentences - the title of its Wikipedia entry • For BM25F, each sentence is represented in three fields - first field: the sentence s itself - second field: the surrounding sentences - third field: the Wikipedia title
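A sketch of a three-field BM25F scorer over this representation (field weights, parameters, and collection statistics are illustrative assumptions):

```python
import math

def bm25f_score(query_terms, fields, field_weights, doc_freq, n_docs,
                avg_field_len, k1=1.2, b=0.75):
    """BM25F over the three fields from the slide: the sentence itself, its
    surrounding sentences, and the Wikipedia page title. Per-field term
    frequencies are length-normalised, combined with field weights, then passed
    through a single BM25-style saturation."""
    score = 0.0
    for t in set(query_terms):
        weighted_tf = 0.0
        for f, tokens in fields.items():
            tf = tokens.count(t)
            if tf == 0:
                continue
            norm = 1 - b + b * len(tokens) / avg_field_len[f]
            weighted_tf += field_weights[f] * tf / norm
        if weighted_tf == 0:
            continue
        df = doc_freq.get(t, 0)
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        score += idf * weighted_tf / (k1 + weighted_tf)
    return score

# One candidate sentence represented in the three fields.
fields = {
    "sentence": "steve jobs founded apple".split(),
    "context": "he later introduced the iphone to the world".split(),
    "title": "steve jobs".split(),
}
score = bm25f_score(
    ["apple", "jobs"], fields,
    field_weights={"sentence": 3.0, "context": 1.0, "title": 2.0},
    doc_freq={"apple": 10, "jobs": 5}, n_docs=1000,
    avg_field_len={"sentence": 20.0, "context": 60.0, "title": 3.0})
```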
Results • Combination > KLD > Frequency > Rarity • Sum > Average
The Role of Context • Given a fixed query q and a fixed entity e, the context plays two roles: - it may itself contain the correct support sentence for (q, e) - it can be used inside the ranking function itself (via BM25F)
Conclusions & Future work • Developed several features embracing different paradigms to tackle the problem • The context of a sentence which can be effectively exploited using the BM25F • The methods might have a bias for longer sentences – apply sentence normalization • Pursuing other linguistic features of sentences