YAGO: Yet Another Great Ontology. PhD Defense, Fabian M. Suchanek (Max-Planck Institute for Informatics, Saarbrücken). Overview. Motivation: Why would anybody need Ontologies? Building a Core Ontology: YAGO. Extending the Core Ontology: SOFIE. Santa Claus in Need.
YAGO: Yet Another Great Ontology PhD Defense Fabian M. Suchanek (Max-Planck Institute for Informatics, Saarbrücken) YAGO - A Core of Semantic Knowledge
Overview • Motivation: Why would anybody need Ontologies? • Building a Core Ontology: YAGO • Extending the Core Ontology: SOFIE
Santa Claus in Need: World population
The Search for a Second Santa Claus. Query: strong, tall guy, Australian. Result: "Seeking strong, tall Australian man. I'm 27, blue eyes, looking for a tall strong Australian man." girls-seek-guys.com/london/42
The Search for a Second Santa Claus. Query: strong person, > 1.90, Australian. Result: "Seeking strong, tall Australian man. I'm 27, blue eyes, looking for a tall strong Australian man. ... I'm 190 kg" girls-seek-guys.com/london/42
The Search for a Second Santa Claus. "Hi Larry, it's me, Santa Claus. I think you misunderstood wh" Result: "Seeking strong, tall Australian man. I'm 27, blue eyes, looking for a tall strong Australian man." girls-seek-guys.com/london/42
Solution: An Ontology. Diagram nodes: physical entity, person, continent, Australia, 1.90m. Diagram edges: is a, isFrom, height.
Solution: An Ontology. Classes: physical entity, person, continent. Relations: is a, isFrom. Individuals: Australia.
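The ontology on this slide can be read as a set of (subject, relation, object) triples. The following is a minimal sketch under that assumption, not YAGO's actual storage format. The individuals Claus_K and Bob_S, the relation names type and subClassOf, and the helper names are hypothetical and only serve the illustration.

```python
# Hypothetical toy triples mirroring the slide; not real YAGO data.
TRIPLES = {
    ("person", "subClassOf", "physical_entity"),
    ("continent", "subClassOf", "physical_entity"),
    ("Australia", "type", "continent"),
    ("Claus_K", "type", "person"),
    ("Claus_K", "isFrom", "Australia"),
    ("Claus_K", "height", 1.90),
    ("Bob_S", "type", "person"),
    ("Bob_S", "isFrom", "Australia"),
    ("Bob_S", "height", 1.75),
}


def objects(subject, relation):
    """All objects o such that (subject, relation, o) is a stored triple."""
    return {o for (s, r, o) in TRIPLES if s == subject and r == relation}


def is_a(entity, cls):
    """True if entity is an instance of cls or of one of its subclasses."""
    frontier = list(objects(entity, "type"))
    seen = set()
    while frontier:
        current = frontier.pop()
        if current == cls:
            return True
        if current not in seen:
            seen.add(current)
            frontier.extend(objects(current, "subClassOf"))
    return False


def tall_people_from(place, min_height):
    """Persons from `place` whose height is at least `min_height` metres."""
    subjects = {s for (s, _, _) in TRIPLES}
    return sorted(
        s for s in subjects
        if is_a(s, "person")
        and place in objects(s, "isFrom")
        and any(h >= min_height for h in objects(s, "height"))
    )


print(tall_people_from("Australia", 1.90))
```

Unlike the keyword search on the earlier slides, the query is evaluated against typed relations, so "190 kg" in a text snippet cannot be mistaken for a height.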
Vision: Gathering the knowledge of this world in a structured ontology. • Semantic Search • Question answering • Machine Translation • Document classification • … Sample text on the slide: "The world, I'd like to say, even though some may contradict, is not as it seems. It rather seems as if the world seems not what it seems"
Plan of Attack • Motivation • Building a Core Ontology: YAGO • Extending the Core Ontology: SOFIE
YAGO: Goal. Goal: Build a Large Ontology. Previous Approaches: • Assemble the ontology manually (WordNet, SUMO, Cyc, GeneOntology). Problem: Usually low coverage (MPI is in none of these) • Use community work (Semantic Wikipedia, Freebase). Problem: We don't know yet whether it will take off
YAGO: Goal. Goal: Build a Large Ontology. Our Approach: • Extract knowledge from Wikipedia and WordNet (securing high coverage) • Use extensive quality control techniques (securing high consistency)
YAGO: Infoboxes • Exploit infoboxes: the infobox entry "Born in: Sydney ..." yields Claus K bornIn Sydney. (The article body shown on the slide is placeholder text.)
YAGO: Categories • Exploit infoboxes: Claus K bornIn Sydney • Exploit relational categories: the category "1980_births" yields Claus K born 1980. (The article body shown on the slide is placeholder text.)
YAGO: Categories • Exploit infoboxes: Claus K bornIn Sydney • Exploit relational categories: Claus K born 1980 • Exploit conceptual categories: the category "Australian Boxers" yields Claus K isA Australian Boxer. (The article body shown on the slide is placeholder text.)
YAGO: Categories • Exploit infoboxes: Claus K bornIn Sydney • Exploit relational categories: Claus K born 1980 • Exploit conceptual categories: Claus K isA Australian Boxer • Avoid thematic categories: the category "Kick boxing" must not yield Claus K isA Kick boxing. (The article body shown on the slide is placeholder text.)
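Relational categories such as "1980_births" follow a fixed naming scheme, so they can be matched with patterns. Here is a minimal sketch, assuming a hand-written pattern table. The table, the "_deaths" pattern, the relation names bornInYear and diedInYear, and the function name are illustrative assumptions, not the extractor's real rule set.

```python
import re

# Hypothetical pattern table: category-name regex -> relation name.
RELATIONAL_PATTERNS = [
    (re.compile(r"^(\d{4})_births$"), "bornInYear"),
    (re.compile(r"^(\d{4})_deaths$"), "diedInYear"),
]


def facts_from_categories(entity, categories):
    """Turn relational category names of one article into (entity, relation, year) facts."""
    facts = []
    for category in categories:
        for pattern, relation in RELATIONAL_PATTERNS:
            match = pattern.match(category)
            if match:
                facts.append((entity, relation, int(match.group(1))))
    return facts


print(facts_from_categories("Claus_K", ["1980_births", "Australian_boxers"]))
```

Categories that match no pattern, such as "Australian_boxers", are simply ignored here; deciding whether they are conceptual or thematic is a separate step.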
YAGO: Upper Model entity ? person Australian boxer is a born 1980
YAGO: Upper Model Business Social_group ? People_by_occupation Australian boxer is a born 1980
YAGO: Upper Model. Australian boxer (Wikipedia) is a subclass of Boxer (WordNet), which is a subclass of Person (WordNet). The individual (is a Australian boxer, born 1980) comes from Wikipedia. [Suchanek et al.: WWW 2007]
YAGO: Quality Control 1. Canonicalization 1.1 ... of entities (e.g. Santa Klaus, Santa Clause, Santa Claus, Santa)
YAGO: Quality Control 1. Canonicalization 1.1 ... of entities
YAGO: Quality Control 1. Canonicalization 1.1 ... of entities 1.2 ... of facts (e.g. born 1980 and born 1980-12-19)
YAGO: Quality Control 1. Canonicalization 1.1 ... of entities 1.2 ... of facts 2. Type Checks 2.1 Reductive Type Checking (e.g. range(bornOnDate, timepoint) versus bornOnDate(Claus_Kent, Sydney))
YAGO: Quality Control 1. Canonicalization 1.1 ... of entities 1.2 ... of facts 2. Type Checks 2.1 Reductive Type Checking 2.2 Type Coherence Checking (diagram: Entity with subclasses Person and Artifact; types Boxer, Swimmer, Flight instructor, Airplane)
YAGO: Quality Control 1. Canonicalization 1.1 ... of entities 1.2 ... of facts 2. Type Checks 2.1 Reductive Type Checking 2.2 Type Coherence Checking. Result: Every fact and every entity occurs exactly once. Every fact fulfills its type constraints. [Suchanek et al.: JWS 2008]
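Two of the quality-control steps lend themselves to a small illustration: mapping name variants to one canonical entity, and dropping facts whose object does not match the declared range of the relation. The sketch below assumes a redirect table and a flat type table, both hypothetical. It does not cover type coherence checking or the merging of "born 1980" with "born 1980-12-19".

```python
# Hypothetical lookup tables, for illustration only.
REDIRECTS = {
    "Santa_Klaus": "Santa_Claus",
    "Santa_Clause": "Santa_Claus",
    "Santa": "Santa_Claus",
}
RANGE = {"bornOnDate": "timepoint", "bornIn": "city"}
TYPE_OF = {"Sydney": "city", "1980-12-19": "timepoint"}


def canonical(entity):
    """Map a name variant to its canonical entity (identity if unknown)."""
    return REDIRECTS.get(entity, entity)


def clean(facts):
    """Canonicalize entities, drop ill-typed facts, and remove duplicates."""
    kept = set()
    for subject, relation, obj in facts:
        subject, obj = canonical(subject), canonical(obj)
        # Reductive type check: the object must have the relation's range type.
        if relation in RANGE and TYPE_OF.get(obj) != RANGE[relation]:
            continue
        kept.add((subject, relation, obj))  # a set keeps each fact exactly once
    return kept


raw = [
    ("Claus_Kent", "bornOnDate", "Sydney"),      # violates range(bornOnDate, timepoint)
    ("Claus_Kent", "bornOnDate", "1980-12-19"),
    ("Santa_Klaus", "bornIn", "Sydney"),
    ("Santa", "bornIn", "Sydney"),               # same fact after canonicalization
]
print(clean(raw))
```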
YAGO: Numbers • Relations: 100 (bornIn, actedIn, hasInflation, ...) • Entities: 2 million • Facts: 19 million • Accuracy: 95%. One of the largest free public ontologies. Unprecedented quality among automatically constructed ontologies.
YAGO: Model. #1 (ClausKent, is_a, boxer) #2 (#1, since, 1990) #3 (#1, source, Wikipedia). The diagram shows the is a edge to boxer annotated with since 1990 and source Wikipedia.
YAGO: Model
• A YAGO ontology over a set of relations R, a set of common entities C, and a set of fact identifiers I is a function $I \to (R \cup C \cup I) \times R \times (R \cup I \cup C)$
• Example: #1 (ClausKent, is_a, boxer), #2 (#1, since, 1990), #3 (#1, source, Wikipedia)
• We can talk about facts (#1, source, Wikipedia), additional arguments (#1, since, 1990), and relations (since, hasRange, time_interval)
• Still: decidable consistency
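The model on this slide maps each fact identifier to a triple, and a triple may itself mention a fact identifier. A minimal sketch of that idea, assuming a plain dictionary as the function from identifiers to triples; the helper names are hypothetical.

```python
# The ontology as a function from fact identifiers to triples.
ONTOLOGY = {
    "#1": ("ClausKent", "is_a", "boxer"),
    "#2": ("#1", "since", 1990),          # a fact about fact #1
    "#3": ("#1", "source", "Wikipedia"),  # another fact about fact #1
}


def statements_about(fact_id):
    """Relation -> object for every fact whose subject is the given fact identifier."""
    return {rel: obj for (subj, rel, obj) in ONTOLOGY.values() if subj == fact_id}


def describe(fact_id):
    """The fact itself together with everything stated about it."""
    return {"fact": ONTOLOGY[fact_id], **statements_about(fact_id)}


print(describe("#1"))
```

Because the annotations are ordinary facts, no special n-ary relation syntax is needed for "since" or "source".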
YAGO: Summary. YAGO is an ontology that is • large (combining Wikipedia and WordNet) • accurate (using extensive quality control) • computationally tractable (with decidable consistency)
Plan of Attack • Motivation • Building a Core Ontology: YAGO • Extending the Core Ontology: SOFIE
SOFIE: Goal. Goal: Extending the ontology. Sentence: "Saint Nicholas was born in Patara." Statement: Saint Nicholas bornIn Patara.
SOFIE: Goal. Goal: Extending the ontology. Sentence: "Saint Nicholas се е родил в Patara." (Bulgarian for "Saint Nicholas was born in Patara.") Statement: Saint Nicholas bornIn Patara.
SOFIE: Goal. Goal: Extending the ontology. Sentence: "Saint Nicholas was born in Patara." Statement: Saint Nicholas bornIn Patara. Previous Approaches: • Extract knowledge from corpora (e.g. the Web) (Text2Onto, Espresso, Snowball, TextRunner). Problems: Low accuracy, non-canonicity. Examples on the slide: recoverWithout(most_people, medication), areUnder(0%, the_age_of_18), support(these_findings, the_notion)
SOFIE: Goal. Goal: Extending the ontology. Sentence: "Saint Nicholas was born in Patara." Statement: Saint Nicholas bornIn Patara. Our Approach (1): • LEILA - Combining Linguistic and Statistical Analysis [Suchanek et al.: KDD 2006]. Has high accuracy, but does not deliver canonicity
SOFIE: Goal. Goal: Extending the ontology. Sentence: "Saint Nicholas was born in Patara." Statement: Saint Nicholas bornIn Patara. Our Approach (2): • SOFIE: Use logical reasoning to guarantee canonicity
SOFIE: Example • YAGO: bornInYear 1935 • Text "Worshipped People": "Saint Nicholas was born in the year 1417. Elvis Presley was born in the year 1935." • Pattern occurrence ~~> pattern meaning: "was born in the year" expresses bornInYear
SOFIE: Example • YAGO: bornInYear 1935 • Text "Worshipped People": "Saint Nicholas was born in the year 1417. Elvis Presley was born in the year 1935." • Pattern occurrence ~~> pattern meaning: "was born in the year" expresses bornInYear • Pattern occurrence ~~> sentence meaning: bornInYear 1417
SOFIE: Example • YAGO: bornInYear 1935, diedInYear 347 • Text "Worshipped People": "Saint Nicholas was born in the year 1417. Elvis Presley was born in the year 1935." • Pattern occurrence ~~> pattern meaning: "was born in the year" expresses bornInYear • Pattern occurrence ~~> sentence meaning: bornInYear 1417 • People should be born before they die.
SOFIE: Example • YAGO: bornInYear 1935, diedInYear 347 • Text: "Saint Nicholas was born in the year 1417. Elvis Presley was born in the year 1935." • Task 1: Find Patterns (pattern occurrence ~~> pattern meaning) • Task 2: Use semantic reasoning (people should be born before they die) • Task 3: Disambiguate entities (pattern occurrence ~~> sentence meaning: bornInYear 1417)
SOFIE: It's all logical formulae!
• YAGO: bornInYear(ElvisPresley,1935), diedInYear(NicholasOfMyra,347)
• Task 1: Find Patterns: occurs("was born in the year", SaintNicholas,1417), occurs("was born in the year", ElvisPresley,1935), occurs(P,X,Y) /\ expresses(P,R) => R(X,Y)
• Task 2: Use semantic reasoning: bornInYear(X,B) /\ diedInYear(X,D) => B<D
• Task 3: Disambiguate entities: means(SaintNicholas,NicholasOfMyra) 0.8, means(SaintNicholas,NicholasOfFlüe) 0.2, refersTo(SaintNicholas,NicholasOfFlüe) ?, bornOnDate(NicholasOfFlüe, 1417) ?
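To see how the three kinds of formulae interact, the example can be played through by hand. The sketch below is a greedy illustration under simplifying assumptions, not SOFIE's actual procedure, which casts the problem as weighted MAX SAT. It assumes the spelling NicholasOfFlüe and hypothetical function names, and it hard-codes the single consistency rule from the slide.

```python
KNOWN = {
    ("bornInYear", "ElvisPresley", 1935),
    ("diedInYear", "NicholasOfMyra", 347),
}
OCCURS = [
    ("was born in the year", "SaintNicholas", 1417),
    ("was born in the year", "ElvisPresley", 1935),
]
# means(word, entity) with prior weights, as on the slide.
MEANS = {
    "SaintNicholas": {"NicholasOfMyra": 0.8, "NicholasOfFlüe": 0.2},
    "ElvisPresley": {"ElvisPresley": 1.0},
}


def learn_patterns():
    """expresses(P, R) holds if an occurrence of P coincides with a known fact R(X, Y)."""
    expresses = set()
    for pattern, word, value in OCCURS:
        for entity in MEANS[word]:
            for relation, x, y in KNOWN:
                if (x, y) == (entity, value):
                    expresses.add((pattern, relation))
    return expresses


def born_before_death(entity, year):
    """bornInYear(X,B) /\\ diedInYear(X,D) => B<D, checked against known death years."""
    return all(year < d for (r, x, d) in KNOWN if r == "diedInYear" and x == entity)


def extract():
    """Apply occurs(P,X,Y) /\\ expresses(P,R) => R(X,Y), picking the best consistent entity."""
    new_facts = set()
    for pattern, relation in learn_patterns():
        for seen_pattern, word, value in OCCURS:
            if seen_pattern != pattern:
                continue
            ranked = sorted(MEANS[word], key=MEANS[word].get, reverse=True)
            for entity in ranked:
                if relation == "bornInYear" and not born_before_death(entity, value):
                    continue  # e.g. born 1417 but died 347: try the next candidate
                if (relation, entity, value) not in KNOWN:
                    new_facts.add((relation, entity, value))
                break
    return new_facts


print(extract())
```

The more likely reading NicholasOfMyra is rejected because a birth in 1417 would come after the known death in 347, so the new fact is attached to the less likely candidate.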
SOFIE: Information Extraction as MAX SAT. We have a Weighted MAX SAT Problem: r(x,y) /\ s(x,z) => t(x,z) [w] ... Problem: • The Weighted MAX SAT Problem is NP-hard • Our instance contains YAGO (19 million facts) and textual facts (e.g. 10,000 facts) • The best-known approximation algorithm cannot deal well with our specific instance
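A tiny instance makes the weighted MAX SAT formulation concrete. The sketch below solves it by exhaustive search, which is exponential in the number of variables and therefore only viable for toy inputs; it is not the algorithm used in SOFIE. The variable names, the integer weights (priors scaled by 10, hard rules set to 100), and the clause encoding are assumptions made for the example.

```python
from itertools import product

# Hypothetical propositional variables for the Saint Nicholas example.
VARIABLES = ["refers_myra", "refers_flue", "born_myra_1417", "born_flue_1417"]

# Each clause: (weight, [(variable, wanted truth value), ...]); satisfied if any literal holds.
CLAUSES = [
    (8, [("refers_myra", True)]),                               # prior means(...) 0.8
    (2, [("refers_flue", True)]),                               # prior means(...) 0.2
    (100, [("refers_myra", False), ("refers_flue", False)]),    # at most one referent
    (100, [("refers_myra", False), ("born_myra_1417", True)]),  # refers => born fact
    (100, [("refers_flue", False), ("born_flue_1417", True)]),  # refers => born fact
    (100, [("born_myra_1417", False)]),                         # born 1417 contradicts died 347
]


def max_sat(variables, clauses):
    """Return (best total weight, assignment) by trying every truth assignment."""
    best_weight, best_assignment = -1, None
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        weight = sum(
            w for w, literals in clauses
            if any(assignment[var] == wanted for var, wanted in literals)
        )
        if weight > best_weight:
            best_weight, best_assignment = weight, assignment
    return best_weight, best_assignment


weight, assignment = max_sat(VARIABLES, CLAUSES)
print(weight, assignment)
```

With four variables there are only 16 assignments to try; an instance with millions of facts rules this approach out, which is the problem the slide points to.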
SOFIE: A Unifying Framework • Task 1: Find Patterns • Task 2: Use semantic reasoning • Task 3: Disambiguate entities. Diagram labels: r(a,b) => s(x,y); Functional MAX SAT; Polynomial time Algorithm; Approximation Guarantee; 1417, NicholasOfFlüe. [Suchanek et al.: TR 2009]
SOFIE: Experiments
SOFIE: Summary. SOFIE unifies 3 tasks in a single framework: • Task 1: Find Patterns • Task 2: Use semantic reasoning • Task 3: Disambiguate entities. SOFIE delivers • canonicalized facts • of high precision
But back to the original question... Is there any Australian guy taller than 1.90m who could help me out?
Conclusion: Good News • We made a great step towards gathering the knowledge of this world in a structured ontology (YAGO, SOFIE) • Christmas is safe!
References
[Suchanek et al.: KDD 2006] Fabian M. Suchanek, Georgiana Ifrim and Gerhard Weikum: "Combining Linguistic and Statistical Analysis to Extract Relations from Web Documents". Conference on Knowledge Discovery and Data Mining (KDD 2006)
[Suchanek et al.: WWW 2007] Fabian M. Suchanek, Gjergji Kasneci and Gerhard Weikum: "YAGO - A Core of Semantic Knowledge". International World Wide Web conference (WWW 2007)
[Suchanek et al.: JWS 2008] Fabian M. Suchanek, Gjergji Kasneci and Gerhard Weikum: "YAGO - A Large Ontology from Wikipedia and WordNet". Journal of Web Semantics (JWS 2008)
[Suchanek et al.: TR 2009] Fabian M. Suchanek, Mauro Sozio and Gerhard Weikum: "SOFIE - A Self-Organizing Framework for Information Extraction". Submitted to the International World Wide Web conference (WWW 2009). See Technical Report or my PhD Thesis on http://mpii.de/~suchanek
Acronyms • LEILA: Learning to Extract Information by Linguistic Analysis • YAGO: Yet Another Great Ontology • SOFIE: Self-Organizing Framework for Information Extraction • NAGA: Not another Google Answer
YAGO: Thematic vs Conceptual Categories • conceptual: "Australian boxers of German origin" • thematic: "Kick boxing in Australia" • Shallow linguistic noun phrase parsing: Premodifier, Head, Postmodifier • Heuristics: If the head is a plural word, the category is conceptual
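The plural-head heuristic can be approximated in a few lines. This is a deliberately crude sketch: it assumes the postmodifier starts at the first preposition from a small hand-picked list and that a plural is a word ending in "s". Real noun phrase parsing and plural detection would be more careful, and all names below are hypothetical.

```python
# Hypothetical, incomplete preposition list that marks the start of the postmodifier.
PREPOSITIONS = {"of", "in", "by", "from", "at", "for", "on", "with"}


def split_category(name):
    """Split a category name into (premodifier words, head word, postmodifier words)."""
    words = name.replace("_", " ").split()
    cut = next(
        (i for i, word in enumerate(words) if word.lower() in PREPOSITIONS),
        len(words),
    )
    before, post = words[:cut], words[cut:]
    if not before:
        return [], "", post
    return before[:-1], before[-1], post


def looks_plural(word):
    """Very rough plural test: ends in 's' but not in 'ss', 'us' or 'is'."""
    w = word.lower()
    return len(w) > 2 and w.endswith("s") and not w.endswith(("ss", "us", "is"))


def is_conceptual(name):
    """A category counts as conceptual if its head noun looks plural."""
    _, head, _ = split_category(name)
    return looks_plural(head)


for category in ["Australian boxers of German origin", "Kick boxing in Australia"]:
    print(category, "->", "conceptual" if is_conceptual(category) else "thematic")
```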