Assigning Global Relevance Scores to DBpedia Facts
Philipp Langer, Patrick Schulze, Stefan George, Tobias Metzke, Ziawasch Abedjan, Gjergji Kasneci
DESWeb 03/31/2014
Structured Data
• Advantages of structured data over unstructured data:
  • Search for explicit facts
  • Summarization of possibly interesting information
  • Automated knowledge discovery
• Google Knowledge Graph
• RDF knowledge bases
  • DBpedia, YAGO/NAGA
• A handful of salient facts about the query entity
Querying YAGO
• Asking for classes to which Albert Einstein belongs
Querying DBpedia
• Asking for classes to which Albert Einstein belongs
Challenge
• SELECT DISTINCT ?p ?o WHERE { dbpedia:Barack_Obama ?p ?o }
• Web Documents
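A minimal sketch of how the query on this slide could be addressed to a SPARQL endpoint over HTTP GET. The endpoint URL and the `dbpedia:` prefix declaration are assumptions, since the slide shows neither. The `query` parameter name follows the SPARQL Protocol. The helper names `property_query` and `query_url` are hypothetical, and the code only builds the URL without sending a request.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed public endpoint; the slides do not state which URL was used.
ENDPOINT = "http://dbpedia.org/sparql"


def property_query(entity: str) -> str:
    """Return a SPARQL query for every predicate/object pair of `entity`."""
    return (
        "PREFIX dbpedia: <http://dbpedia.org/resource/>\n"
        "SELECT DISTINCT ?p ?o WHERE { dbpedia:%s ?p ?o }" % entity
    )


def query_url(entity: str, endpoint: str = ENDPOINT) -> str:
    """Encode the query as the `query` parameter of a GET request URL."""
    return endpoint + "?" + urlencode({"query": property_query(entity)})


if __name__ == "__main__":
    print(query_url("Barack_Obama"))
```

The result of such a query is an unordered set of predicate/object pairs, which is why a ranking step is needed.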
Challenges
• Big Data: DBpedia 3.8, ClueWeb corpus
• Architecture: text extraction, score computation/ranking, query processing
• Ranking strategies: improve the ranking results
• Evaluation: conducting user studies
Overview
• Languages: Python, Java, SPARQL, JavaScript
• Frameworks: Django, Lucene
• Architecture (diagram labels):
  • Web application (Django)
  • Ranking strategies: Intra DBpedia strategies, Web Corpus strategies
  • User Studies
  • Querying
  • Web corpus (Lucene Index)
  • Application Data (Postgres)
  • DBpedia Endpoint (Apache Jena)
Ranking Facts
• Query types:
  • Subject queries - return all physicists: SELECT ?s { ?s type Physicist }
  • Property queries - return all facts related to Einstein: SELECT ?p ?o { Albert_Einstein ?p ?o }
• Ranking strategies
  • Ranking by frequency and document frequency
  • Ranking by information diversity
  • Random walk
  • Web-based co-occurrence statistics
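The two query types can be illustrated on a small in-memory triple list. This is a sketch only: the toy triples, the `match` helper and the function names are assumptions made for illustration, not part of the presented system, which queries a DBpedia endpoint.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# Toy facts (illustrative only).
TRIPLES: List[Triple] = [
    ("Albert_Einstein", "type", "Physicist"),
    ("Albert_Einstein", "spouse", "Mileva Maric"),
    ("Isaac_Newton", "type", "Physicist"),
    ("Barack_Obama", "type", "Politician"),
]


def match(triples: List[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return all triples matching the pattern; None acts as a variable."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def subject_query(triples: List[Triple], cls: str) -> List[str]:
    """SELECT ?s { ?s type cls }"""
    return [s for s, _, _ in match(triples, p="type", o=cls)]


def property_query(triples: List[Triple], subject: str) -> List[Tuple[str, str]]:
    """SELECT ?p ?o { subject ?p ?o }"""
    return [(p, o) for _, p, o in match(triples, s=subject)]


if __name__ == "__main__":
    print(subject_query(TRIPLES, "Physicist"))
    print(property_query(TRIPLES, "Albert_Einstein"))
```

Both functions return results in storage order; the ranking strategies decide how that order should change.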
Ranking by frequency and document frequency [Shady et al ESWC’11]
Subject document of "Albert Einstein":
<Albert_Einstein>
  <topic> <Nobel_laureates>;
  <topic> <Theoretical_physicists>;
  <topic> <German_physicists>;
  <topic> <American_inventors>;
  <type> <Scientist>;
  <type> <Person>;
  <type> <Thing>;
  <residence> "Switzerland";
  <residence> "Austria-Hungary";
  <residence> "German Empire";
  <spouse> "Mileva Maric";
  ...
Predicate document of "topic":
<Newton> <topic> <Theoretical_physicists>.
<Newton> <topic> <Nobel_laureates>.
<Newton> <topic> <Mathematicians>.
<Newton> <topic> <Optical_physicists>.
<Newton> <topic> <History_of_calculus>.
<Newton> <topic> <English_alchemists>.
<Einstein> <topic> <Theoretical_physicists>.
<Einstein> <topic> <Nobel_laureates>.
<Einstein> <topic> <German_physicists>.
<Einstein> <topic> <American_inventors>.
Object document of "Theoretical physicists":
<Isaac_Newton> <topic> <Theoretical_physicists>.
<Albert_Einstein> <topic> <Theoretical_physicists>.
<Bruno_Coppi> <topic> <Theoretical_physicists>.
<Ravi_Gomatam> <topic> <Theoretical_physicists>.
...
Ranking by frequency and document frequency
Isaac Newton
  academicAdvisor ...; birthDate ...; birthPlace ...; comment ...; ethnicity ...; field ...; influenced ...; influencedBy ...; knownFor ...; label ...; notableStudent ...; subject ...; subject ...; type ...;
Ravi Gomatam
  subject ...; subject ...; subject ...; subject ...; subject ...;
• Subject queries:
  • Global relevance
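The comparison of the two subject documents can be sketched as follows. The scoring formula from the slide is not preserved, so this example assumes the simplest possible reading: a subject with a larger subject document (more facts) is treated as globally more relevant. The function name `rank_subjects` is hypothetical, and the objects stay elided as "..." exactly as on the slide.

```python
from collections import Counter
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Predicates listed on the slide; the objects are elided there as well.
NEWTON_PREDICATES = [
    "academicAdvisor", "birthDate", "birthPlace", "comment", "ethnicity",
    "field", "influenced", "influencedBy", "knownFor", "label",
    "notableStudent", "subject", "subject", "type",
]

TRIPLES: List[Triple] = (
    [("Isaac_Newton", p, "...") for p in NEWTON_PREDICATES]
    + [("Ravi_Gomatam", "subject", "...")] * 5
)


def rank_subjects(triples: List[Triple], candidates: List[str]) -> List[str]:
    """Order candidates by the size of their subject document (assumption)."""
    sizes = Counter(s for s, _, _ in triples)
    # Larger documents first; ties are broken alphabetically.
    return sorted(candidates, key=lambda s: (-sizes[s], s))


if __name__ == "__main__":
    print(rank_subjects(TRIPLES, ["Ravi_Gomatam", "Isaac_Newton"]))
```

Under this assumption Isaac Newton (14 facts) is ranked ahead of Ravi Gomatam (5 facts) in a subject query for theoretical physicists.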
Limitations for Property Queries
• Property queries:
  • Facts should be globally relevant but also distinctive for the given subject
  • type Person vs. type Scientist
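One standard way to capture "distinctive for the given subject" is inverse document frequency over subject documents. The sketch below is an assumption for illustration, not the formula used in the talk: each (predicate, object) pair is a term, each subject is a document, and a pair shared by every subject gets a score of zero. The toy triples and function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Fact = Tuple[str, str]

# Toy facts (illustrative only).
TRIPLES: List[Triple] = [
    ("Albert_Einstein", "type", "Person"),
    ("Albert_Einstein", "type", "Scientist"),
    ("Isaac_Newton", "type", "Person"),
    ("Isaac_Newton", "type", "Scientist"),
    ("Barack_Obama", "type", "Person"),
]


def idf_scores(triples: List[Triple]) -> Dict[Fact, float]:
    """idf(p, o) = log(N / number of subjects that have the fact (p, o))."""
    holders: Dict[Fact, Set[str]] = defaultdict(set)
    for s, p, o in triples:
        holders[(p, o)].add(s)
    n = len({s for s, _, _ in triples})
    return {fact: math.log(n / len(subs)) for fact, subs in holders.items()}


def rank_properties(triples: List[Triple], subject: str) -> List[Fact]:
    """Order the facts of `subject` from most to least distinctive."""
    idf = idf_scores(triples)
    facts = {(p, o) for s, p, o in triples if s == subject}
    return sorted(facts, key=lambda f: (-idf[f], f))


if __name__ == "__main__":
    print(rank_properties(TRIPLES, "Albert_Einstein"))
```

With these toy facts, type Scientist is ranked above type Person for Albert Einstein, because every subject is a Person.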
Ranking by diversity
• Following a probabilistic model
• Property queries:
  • Properties and objects that are as discriminative as possible
• Subject queries:
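The probabilistic model itself is not preserved on this slide. As a hedged illustration of what "discriminative" can mean, the sketch below computes the Shannon entropy of each predicate's object distribution: a predicate whose objects differ from subject to subject carries more information than one whose object is nearly always the same. This is an assumed stand-in, not the authors' model, and the toy triples and the name `predicate_entropy` are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Toy facts (illustrative only).
TRIPLES: List[Triple] = [
    ("Albert_Einstein", "type", "Person"),
    ("Isaac_Newton", "type", "Person"),
    ("Barack_Obama", "type", "Person"),
    ("Marie_Curie", "type", "Person"),
    ("Albert_Einstein", "birthPlace", "Ulm"),
    ("Isaac_Newton", "birthPlace", "Woolsthorpe"),
    ("Barack_Obama", "birthPlace", "Honolulu"),
    ("Marie_Curie", "birthPlace", "Warsaw"),
]


def predicate_entropy(triples: List[Triple]) -> Dict[str, float]:
    """H(p) = sum over objects o of P(o | p) * log2(1 / P(o | p))."""
    objects: Dict[str, Counter] = defaultdict(Counter)
    for _, p, o in triples:
        objects[p][o] += 1
    entropy: Dict[str, float] = {}
    for p, counts in objects.items():
        total = sum(counts.values())
        entropy[p] = sum((c / total) * math.log2(total / c)
                         for c in counts.values())
    return entropy


if __name__ == "__main__":
    for predicate, h in sorted(predicate_entropy(TRIPLES).items()):
        print(predicate, h)
```

Here birthPlace has four equally likely objects (2 bits), while type always has the same object (0 bits), so birthPlace would be preferred in a property query.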
Random Walk Model
• Consider the knowledge base as a directed graph
• Already applied in [Kasneci CIKM’09]
• Problem: literals have no outgoing links
• Use Wiki Pagelinks and Infobox Property Mappings
• Entities with high indegree, such as countries, are favored
• Good for subject queries
• Bad for property queries
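A random walk over a directed graph can be sketched with a PageRank-style power iteration. This is a generic textbook formulation, assumed here for illustration: nodes without outgoing links (such as literals) spread their score uniformly over all nodes, which is a different remedy from the Pagelinks and Infobox mappings named on the slide. The damping factor, iteration count and toy graph are assumptions.

```python
from typing import Dict, List


def random_walk_scores(graph: Dict[str, List[str]],
                       damping: float = 0.85,
                       iterations: int = 50) -> Dict[str, float]:
    """PageRank-style stationary scores for a directed graph."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(nodes)
    score = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Score held by nodes without outgoing links (e.g. literals).
        dangling = sum(score[v] for v in nodes if not graph.get(v))
        nxt = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v, targets in graph.items():
            for t in targets:
                nxt[t] += damping * score[v] / len(targets)
        score = nxt
    return score


# Toy graph: two entities link to a country, one also links to a literal.
GRAPH: Dict[str, List[str]] = {
    "Albert_Einstein": ["Germany", "Switzerland", "1879-03-14"],
    "Max_Planck": ["Germany"],
    "Germany": [],
}

if __name__ == "__main__":
    for node, value in sorted(random_walk_scores(GRAPH).items(),
                              key=lambda kv: -kv[1]):
        print(node, round(value, 3))
```

In the toy graph the country node collects the highest score, which mirrors the slide's observation that entities with high indegree are favored.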
Co-occurrence statistics
Web Documents
• Lemur Project ClueWeb09 Category-B web corpus
  • 50 million web documents (1.5 TB)
  • Only English-language documents
  • Includes approx. 2.7 million Wikipedia articles
• Create an inverted index
• Consider different word distance limits as documents
• Rank subject-object pairs
  • "Albert Einstein" and "Physicist"
• Store only pairwise co-occurrence:
• Compute frequency of s:
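Counting co-occurrences within a word distance limit can be sketched on a few in-memory documents. The real system uses a Lucene index over ClueWeb09; the tokenizer, the function names and the toy documents below are assumptions for illustration, and the formulas that follow the colons on the slide are not reconstructed.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"\w+", text.lower())


def phrase_positions(tokens: List[str], phrase: List[str]) -> List[int]:
    """Start positions at which `phrase` occurs in `tokens`."""
    k = len(phrase)
    return [i for i in range(len(tokens) - k + 1) if tokens[i:i + k] == phrase]


def phrase_frequency(docs: List[str], phrase: str) -> int:
    """Number of documents that contain `phrase`."""
    p = tokenize(phrase)
    return sum(1 for d in docs if phrase_positions(tokenize(d), p))


def cooccurrence_count(docs: List[str], subject: str, obj: str,
                       max_distance: int) -> int:
    """Number of documents where subject and object start within
    `max_distance` tokens of each other."""
    s, o = tokenize(subject), tokenize(obj)
    count = 0
    for d in docs:
        tokens = tokenize(d)
        s_pos, o_pos = phrase_positions(tokens, s), phrase_positions(tokens, o)
        if any(abs(i - j) <= max_distance for i in s_pos for j in o_pos):
            count += 1
    return count


# Toy documents (illustrative only).
DOCS = [
    "Albert Einstein was a physicist born in Ulm",
    "Albert Einstein played the violin; much later a physicist recalled it",
    "A physicist studies matter",
]

if __name__ == "__main__":
    for limit in (3, 5, 10):
        print(limit, cooccurrence_count(DOCS, "Albert Einstein", "Physicist", limit))
```

Varying `max_distance` corresponds to the different word distance limits mentioned on the slide: a tighter limit counts fewer, but more strongly related, subject-object pairs.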
Evaluation
• User study 1
  • 8 queries
  • All results
  • 12 users
  • 19 approaches/configurations
  • Ratings from 1 (irrelevant) to 4 (highly relevant)
• User study 2
  • 8+20 queries
  • Top-10 results of best 4 approaches side-by-side
  • 10 users
  • Best 3 approaches from user study 1
Top 4 Approaches in User Study 1
User Study 2
Results Example: Theoretical Physicists
• DBpedia
• Random Walk Model
Results Example: Albert Einstein
• DBpedia
• Co-occurrence statistics
Conclusions
• Investigated multiple approaches to rank DBpedia facts
  • Information theory, statistical reasoning, random walk, and co-occurrence statistics in web documents
• DBpedia knowledge base already provides enough information to improve the ranking of results
• Improvement of property queries through web-based co-occurrence statistics
• We provide the annotated datasets at https://www.hpi.uni-potsdam.de/naumann/sites/dbpedia/