Entity Ranking Using Wikipedia as a Pivot

Entity Ranking Using Wikipedia as a Pivot (CIKM '10) • Rianne Kaptein, Pavel Serdyukov, Arjen de Vries, Jaap Kamps • 2010/12/14 • Yu-wen Hsu

Outline • Introduction • From Wikipedia Entities to Web Entities and back • Entity Ranking on Wikipedia • Entity Ranking on the Web • Conclusion

Introduction • Entity ranking is the task of finding documents that represent entities of the correct type and are relevant to a query. • The result is a ranked list of entities, rather than a list of web pages containing relevant but potentially redundant information about these entities.

Entity ranking differs from document retrieval in at least three respects: • i) returned documents have to represent an entity • ii) this entity should belong to a specified entity type • iii) to create a diverse result list, each entity should be returned only once.

Main Goal • To rank Web entities: • 1. Associate target entity types with the query • 2. Rank Wikipedia pages according to their similarity with the query and the target entity types • 3. Find the web entities corresponding to these Wikipedia entities

Using Wikipedia as a pivot • entities: Wikipedia pages • the name of the entity: the title of the page • the content of the page: the representation of the entity • Each Wikipedia page is assigned to a number of categories: topical, type, and administrative categories.

From Wikipedia Entities to Web Entities and back • From Web to Wikipedia • Do these repositories provide enough clues to find the corresponding entities on the Web? • Do they contain enough entities to cover the complete range of entities needed to satisfy all kinds of information needs?

From Wikipedia to Web • Use the external links on the Wikipedia page

Entity Ranking on Wikipedia: Entity Types • Entity type assignment • Exploit the existing Wikipedia categorization of documents • Pseudo-relevance feedback on the top retrieved documents: • Extract the categories most frequently assigned to them • Take the top 10 results and keep the 2 most frequently occurring categories of these documents
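A minimal sketch of the pseudo-relevance feedback step, assuming each document's categories are available as a list of strings. The function and parameter names are hypothetical, and counting each category at most once per document is an assumption the slides do not state.

```python
from collections import Counter

def assign_target_categories(ranked_docs, doc_categories, top_docs=10, top_cats=2):
    """Return the top_cats categories occurring most often among the top_docs results.

    ranked_docs: document ids in retrieval order.
    doc_categories: hypothetical mapping {doc_id: [category, ...]}.
    Each category is counted at most once per document (assumption)."""
    counts = Counter()
    for doc in ranked_docs[:top_docs]:
        counts.update(set(doc_categories.get(doc, ())))
    return [cat for cat, _ in counts.most_common(top_cats)]

ranked = ["d1", "d2", "d3"]
categories = {
    "d1": ["Countries in Europe", "Member states of the European Union"],
    "d2": ["Countries in Europe"],
    "d3": ["Countries in Europe", "Member states of the European Union", "Island countries"],
}
print(assign_target_categories(ranked, categories))
# ['Countries in Europe', 'Member states of the European Union']
```

The two returned categories then serve as the automatically assigned target entity types for the query.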

The scoring formulas are defined over the query terms, the document, the entire Wikipedia document collection, the name of the category, and the category.

Entity Types: Scoring Entities • Estimate background probabilities • Smooth the probability of a term occurring in a category name with the background collection
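A minimal sketch of this smoothing step. It assumes Jelinek-Mercer (linear interpolation) smoothing with a hypothetical weight lam = 0.9; the slides name neither the smoothing method nor its parameter.

```python
from collections import Counter

def ml_model(tokens):
    """Maximum-likelihood term distribution P(t|X) estimated from a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def smoothed_category_model(category_name_tokens, background, lam=0.9):
    """Jelinek-Mercer smoothing (assumed): P(t|C) = lam * P_ML(t|C) + (1 - lam) * P(t|W).

    background is P(t|W) over the whole collection; the result covers the
    background vocabulary, assumed to contain every category-name term."""
    p_cat = ml_model(category_name_tokens)
    return {t: lam * p_cat.get(t, 0.0) + (1 - lam) * p_w
            for t, p_w in background.items()}

# Toy background collection and a category name such as "European country".
background = ml_model("europe country park national country".split())
model = smoothed_category_model(["country", "europe"], background)
```

Terms absent from the category name, such as "park", still receive a small nonzero probability from the background, which keeps later divergence computations finite.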

Similarity between two categories • The entity type score for a document in relation to a query topic • Score Normalization
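A sketch of one way to turn category similarity into a normalized entity type score. It assumes KL divergence as the distance between smoothed category language models, takes the closest pair of target and document categories, and applies min-max normalization. All three choices and the function names are assumptions, not the exact formulas of the paper.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q); q should be smoothed so it is nonzero wherever p is."""
    return sum(pt * math.log(pt / max(q.get(t, 0.0), eps))
               for t, pt in p.items() if pt > 0)

def type_distance(target_models, doc_category_models):
    """Smallest KL divergence over all (target category, document category) pairs.
    Returns math.inf for a document without categories."""
    return min((kl_divergence(t, d) for t in target_models for d in doc_category_models),
               default=math.inf)

def normalized_type_scores(distances):
    """Map {doc: distance} to {doc: score in [0, 1]}, where 1 = closest to the target types.
    Documents with infinite distance get 0 (assumed min-max normalization)."""
    finite = [v for v in distances.values() if math.isfinite(v)]
    if not finite:
        return {doc: 0.0 for doc in distances}
    lo, hi = min(finite), max(finite)
    span = hi - lo

    def score(v):
        if not math.isfinite(v):
            return 0.0
        return 1.0 if span == 0 else (hi - v) / span

    return {doc: score(v) for doc, v in distances.items()}
```

Converting distances to similarities in [0, 1] makes the entity type score comparable in scale to a normalized retrieval score, which matters if the two are later combined.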

Entity Ranking on Wikipedia: Experimental Setup • Data sets: • INEX: specific entity types, e.g., countries, national parks • TREC: people, organizations, products • Advantage: clear, few options, easy to select • Disadvantage: they cover only a small part of all possible entity ranking queries • More specific entity types were therefore assigned manually

Rerank the top 2,500 results of the baseline, with target entity types • manually assigned (by the authors) or • automatically assigned (by PRF) • Evaluation • TREC 2009: P10 and NDCG@20 • INEX: P10 and MAP • INEX 2006-2008: 79 topics • INEX 2009: a selection of 55 topics from the 2006-2008 topics • Only the so-called ‘primary’ pages are counted
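For reference, a sketch of the evaluation measures named above: P10, average precision (averaged over topics to give MAP), and NDCG@20. This NDCG variant uses linear gains and a log2(rank + 1) discount; the official evaluation tools may use a different gain function, so treat it as illustrative.

```python
import math

def precision_at_k(ranking, relevant, k=10):
    """Fraction of the top-k results that are relevant (P10 when k=10)."""
    return sum(1 for d in ranking[:k] if d in relevant) / k

def average_precision(ranking, relevant):
    """Mean of the precision values at the rank of each relevant document;
    MAP is the mean of this value over all topics."""
    hits, total = 0, 0.0
    for i, d in enumerate(ranking, start=1):
        if d in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0

def ndcg_at_k(ranking, gains, k=20):
    """NDCG@k with graded gains {doc: gain} and a log2(rank + 1) discount."""
    dcg = sum(gains.get(d, 0) / math.log2(i + 1)
              for i, d in enumerate(ranking[:k], start=1))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 1) for i, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0
```

For example, with ranking ["a", "b", "c", "d"] and relevant pages {"a", "c"}, P10 is 0.2, since the denominator stays at 10 even when fewer results are returned.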

Entity Ranking on the Web • We use three approaches for finding web pages associated with Wikipedia pages: • 1. External links: • the External links section of the Wikipedia page • 2. Anchor text: • use the Wikipedia page title as the query • retrieve pages from the anchor text index • 3. Combined: • not all Wikipedia pages have external links • not all external links of Wikipedia pages are part of the ClueWeb collection • if fewer than 3 web pages are found, we fill up the results to 3 pages with the top pages retrieved using anchor text
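A sketch of the combined approach. It assumes that the external links, the anchor-text ranking, and the set of URLs in the ClueWeb collection are given as plain Python collections; all names are hypothetical. When the external links already yield 3 or more pages, this sketch keeps all of them, which the slides do not specify.

```python
def combined_homepages(external_links, anchor_results, collection, n=3):
    """Keep the external links that exist in the collection; if fewer than n
    remain, fill up to n pages with the top anchor-text results, skipping duplicates."""
    pages = [url for url in external_links if url in collection]
    for url in anchor_results:
        if len(pages) >= n:
            break
        if url not in pages:
            pages.append(url)
    return pages

collection = {"http://a", "http://b", "http://c", "http://d"}
result = combined_homepages(
    ["http://a", "http://x"],                           # "http://x" is not in the collection
    ["http://b", "http://a", "http://c", "http://d"],   # anchor-text ranking
    collection,
)
```

Here only one external link survives the collection filter, so two anchor-text pages are added, giving ["http://a", "http://b", "http://c"].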

Conclusion • Our experiments show that our Wikipedia-as-a-pivot approach outperforms a full-text search baseline. • Both external links on Wikipedia pages and searching an anchor text index of the web are effective approaches for finding homepages of entities represented by Wikipedia pages.
