Entity Ranking Using Wikipedia as a Pivot (CIKM '10)
Rianne Kaptein, Pavel Serdyukov, Arjen de Vries, Jaap Kamps
2010/12/14, Yu-wen Hsu
Outline
• Introduction
• From Wikipedia Entities to Web Entities and back
• Entity Ranking on Wikipedia
• Entity Ranking on Web
• Conclusion
Introduction
• Entity ranking is the task of finding documents representing entities of a correct type that are relevant to a query.
• The goal is to present a ranked list of entities directly, rather than a list of web pages with relevant but potentially redundant information about these entities.
Entity ranking differs from document retrieval on at least three points:
• i) returned documents have to represent an entity
• ii) this entity should belong to a specified entity type
• iii) to create a diverse result list, an entity should only be returned once
Main Goal
• To rank web entities:
• 1. Associate target entity types with the query
• 2. Rank Wikipedia pages according to their similarity with the query and target entity types
• 3. Find web entities corresponding to the Wikipedia entities
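
The three steps can be read as a simple composition of functions. The sketch below is only an illustration of that structure: the function names (`rank_web_entities`, `assign_types`, `rank_wikipedia`, `find_web_pages`) and the cut-off `k` are assumptions made here, not names or settings from the paper.

```python
def rank_web_entities(query, assign_types, rank_wikipedia, find_web_pages, k=10):
    """Compose the three steps; each step is supplied by the caller as a function.

    All parameter names are hypothetical placeholders for the three steps.
    """
    target_types = assign_types(query)                 # step 1: query -> entity types
    wiki_pages = rank_wikipedia(query, target_types)   # step 2: ranked Wikipedia pages
    results = []
    for page in wiki_pages[:k]:
        # step 3: map each Wikipedia entity to its web pages
        results.append((page, find_web_pages(page)))
    return results
```

Each step can then be swapped independently, for example manual versus automatic type assignment in step 1, or external links versus anchor text in step 3.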
Using Wikipedia as a pivot
• Entities: Wikipedia pages
• The name of the entity: the title of the page
• The representation of the entity: the content of the page
• Each Wikipedia page is assigned to a number of categories: topical, type, and administrative categories.
From Wikipedia Entities to Web Entities and back
• From Web to Wikipedia
• Do repositories such as Wikipedia provide enough clues to find the corresponding entities on the Web?
• Do they contain enough entities to cover the complete range of entities needed to satisfy all kinds of information needs?
From Wikipedia to Web
• Use the external links of the Wikipedia page
Entity Ranking on Wikipedia: Entity Types
• Entity type assignment
• Exploit the existing Wikipedia categorization of documents
• Pseudo-relevance feedback on the top retrieved documents
• We extract the categories that are most frequently assigned to them
• We take the top 10 results and look at the 2 most frequently occurring categories belonging to these documents
Entity Ranking on Wikipedia: Entity Types, Scoring Entities
• Estimate background probabilities
• Smooth the probabilities of a term occurring in a category name with the background collection
• Quantities defined on the slide (their symbols are missing from this transcript): the query terms; the document; the entire Wikipedia document collection; the name of the category; the category
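
Smoothing a term probability with the background collection is commonly done by linear interpolation. The sketch below assumes that form; the function name, the argument layout, and the weight `lam=0.9` are illustrative assumptions, since the slide does not give the exact formula or parameter values.

```python
from collections import Counter

def smoothed_term_prob(term, category_name_terms, collection_counts,
                       collection_size, lam=0.9):
    """Interpolate P(term | category name) with P(term | collection).

    lam is a hypothetical smoothing weight, not a value from the paper.
    """
    name_counts = Counter(category_name_terms)
    if category_name_terms:
        p_name = name_counts[term] / len(category_name_terms)
    else:
        p_name = 0.0
    # Background probability estimated from collection-wide term counts.
    p_background = collection_counts.get(term, 0) / collection_size
    return lam * p_name + (1 - lam) * p_background
```

The background term keeps the probability above zero for any term that occurs in the collection, even if it does not occur in the category name.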
Similarity between two categories • The entity type score for a document in relation to a query topic • Score Normalization
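
The formulas for these three quantities are not reproduced in this transcript. The sketch below shows one plausible reading, under stated assumptions: KL divergence between smoothed category language models as the category distance, the best-matching category as the document's entity type score, and min-max normalization of scores. All function names and all three choices are assumptions for illustration, not confirmed details of the paper.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) over the terms of p; q must be smoothed so q[t] > 0."""
    return sum(pt * math.log(pt / q[t]) for t, pt in p.items() if pt > 0)

def entity_type_score(doc_category_models, target_model):
    """Score a document by its closest category to the target category.

    The distance is negated so that a higher score means a better match.
    """
    return max(-kl_divergence(target_model, model)
               for model in doc_category_models)

def min_max_normalize(scores):
    """Rescale a dict of scores to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}
```

Normalizing puts the entity type score on the same scale as a retrieval score, which matters if the two are later combined.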
Entity Ranking on Wikipedia: Experimental Setup
• Data sets:
• INEX: specific entity types, e.g. countries, national parks
• TREC: general entity types: people, organizations, products
• Advantage of the general types: clear, few options, easily selected
• Disadvantage: they cover only a small part of all possible entity ranking queries
• More specific entity types were therefore assigned manually
• Rerank the top 2,500 results of the baseline using entity types that are:
• Manually assigned (author)
• Automatically assigned (PRF)
• Evaluation:
• TREC 2009: P10 and NDCG@20
• INEX: P10 and MAP
• INEX 2006-2008: 79 topics
• INEX 2009: a selection of 55 topics from the 2006-2008 topics
• Only the so-called "primary" pages are counted
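
For reference, the evaluation measures named above can be sketched as follows. These are textbook definitions written for illustration; the function names are assumptions, and the NDCG variant (linear gain, log2 discount) may differ from the one used by the official TREC evaluation. MAP is the mean of `average_precision` over all topics.

```python
import math

def precision_at_k(ranking, relevant, k=10):
    """Fraction of the top-k results that are relevant (P10 when k=10)."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k

def average_precision(ranking, relevant):
    """Average of the precision values at each rank where a relevant doc occurs."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def ndcg_at_k(ranking, gains, k=20):
    """NDCG@k with linear gain and log2 discount (an assumed variant)."""
    dcg = sum(gains.get(doc, 0) / math.log2(rank + 1)
              for rank, doc in enumerate(ranking[:k], start=1))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(gain / math.log2(rank + 1)
               for rank, gain in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0
```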
Entity Ranking on the Web
• We have three approaches for finding web pages associated with Wikipedia pages.
• 1. External links:
• the External links section of the Wikipedia page
• 2. Anchor text:
• use the Wikipedia page title as the query
• retrieve pages from the anchor text index
• 3. Combined:
• not all Wikipedia pages have external links
• not all external links of Wikipedia pages are part of the ClueWeb collection
• if fewer than 3 web pages are found, we fill up the results to 3 pages using the top pages retrieved using anchor text
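
The combined approach can be sketched as a simple fallback. The function and parameter names below are hypothetical, and the sketch assumes that external links are kept in their original order and that duplicates are dropped; the slide only specifies filling up to 3 pages from the anchor text results.

```python
def combined_pages(external_links, anchor_ranked, collection_urls, min_pages=3):
    """Use external links found in the collection, then fill up with anchor text hits.

    external_links: URLs from the Wikipedia page's External links section.
    anchor_ranked: URLs retrieved from the anchor text index, best first.
    collection_urls: set of URLs that exist in the web collection.
    """
    pages = []
    for url in external_links:
        # Keep only links that are actually part of the web collection.
        if url in collection_urls and url not in pages:
            pages.append(url)
    for url in anchor_ranked:
        if len(pages) >= min_pages:
            break
        if url not in pages:
            pages.append(url)
    return pages
```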
Conclusion
• Our experiments show that our Wikipedia-as-a-pivot approach outperforms a baseline of full-text search.
• Both external links on Wikipedia pages and searching an anchor text index of the web are effective approaches to find homepages for entities represented by Wikipedia pages.