YAGO-QA: Answering Questions by Structured Knowledge Queries
Peter Adolphs, Martin Theobald, Ulrich Schäfer, Hans Uszkoreit, Gerhard Weikum
ICSC, Stanford University, September 19, 2011
Jeopardy!
A big US city with two airports, one named after a World War II hero, and one named after a World War II battlefield?
Deep-QA in NL
• William Wilkinson's "An Account of the Principalities of Wallachia and Moldavia" inspired this author's most famous novel
• This town is known as "Sin City" & its downtown is "Glitter Gulch"
• As of 2010, this is the only former Yugoslav republic in the EU
• 99 cents got me a 4-pack of Ytterlig coasters from this Swedish chain
Question classification & decomposition, knowledge backends (YAGO)
D. Ferrucci et al.: Building Watson: An Overview of the DeepQA Project. AI Magazine, 2010.
www.ibm.com/innovation/us/watson/index.htm
Structured Knowledge Queries
A big US city with two airports, one named after a World War II hero, and one named after a World War II battlefield?

Select Distinct ?c Where {
  ?c type City .
  ?c locatedIn USA .
  ?a1 type Airport .
  ?a2 type Airport .
  ?a1 locatedIn ?c .
  ?a2 locatedIn ?c .
  ?a1 namedAfter ?p .
  ?p type WarHero .
  ?a2 namedAfter ?b .
  ?b type BattleField .
}

In this work: focus on factoid and list questions
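To make the join semantics of such a query concrete, here is a minimal sketch of a triple-pattern evaluator over a toy in-memory fact set. The entity names (OHare_Airport, Edward_OHare, and so on) are illustrative assumptions, not actual YAGO identifiers, and the nested-loop evaluation is only a stand-in for a real SPARQL engine.

```python
# Toy triple store. Entity names are illustrative, not real YAGO identifiers.
TRIPLES = {
    ("Chicago", "type", "City"),
    ("Chicago", "locatedIn", "USA"),
    ("OHare_Airport", "type", "Airport"),
    ("Midway_Airport", "type", "Airport"),
    ("OHare_Airport", "locatedIn", "Chicago"),
    ("Midway_Airport", "locatedIn", "Chicago"),
    ("OHare_Airport", "namedAfter", "Edward_OHare"),
    ("Edward_OHare", "type", "WarHero"),
    ("Midway_Airport", "namedAfter", "Battle_of_Midway"),
    ("Battle_of_Midway", "type", "BattleField"),
    ("Boston", "type", "City"),
    ("Boston", "locatedIn", "USA"),
    ("Logan_Airport", "type", "Airport"),
    ("Logan_Airport", "locatedIn", "Boston"),
}

# The ten triple patterns of the query from the slide.
QUERY = [
    ("?c", "type", "City"),
    ("?c", "locatedIn", "USA"),
    ("?a1", "type", "Airport"),
    ("?a2", "type", "Airport"),
    ("?a1", "locatedIn", "?c"),
    ("?a2", "locatedIn", "?c"),
    ("?a1", "namedAfter", "?p"),
    ("?p", "type", "WarHero"),
    ("?a2", "namedAfter", "?b"),
    ("?b", "type", "BattleField"),
]


def is_var(term):
    return term.startswith("?")


def match(pattern, triple, binding):
    """Return binding extended so that pattern equals triple, or None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if new.setdefault(p, t) != t:
                return None  # variable already bound to a different value
        elif p != t:
            return None      # constant mismatch
    return new


def evaluate(patterns, triples):
    """Join all patterns with a naive nested loop over the triples."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for t in triples
                    if (b2 := match(pattern, t, b)) is not None]
    return bindings


def select_distinct(var, patterns, triples):
    return sorted({b[var] for b in evaluate(patterns, triples)})


print(select_distinct("?c", QUERY, TRIPLES))
```

On this toy data only Chicago satisfies all ten patterns, because Boston has no airport with a namedAfter fact.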
Agenda
• YAGO Server & API
  • Wikipedia-based information extraction
  • Searching & ranking in large RDF graphs
• Names, Surface Patterns & Paraphrases
  • Named entity disambiguation
  • Mapping surface patterns onto semantic relations
  • Crowdsourcing for question paraphrases
• YAGO-QA Architecture
  • Template-based mapping of NL questions onto SPARQL
• Conclusions & Future Work
Information Extraction from Wikipedia
YAGO Knowledge Base
• Combine knowledge from WordNet & Wikipedia
• Additional Gazetteers (geonames.org)
• Part of the Linked-Data cloud
YAGO-2 Numbers
Estimated precision > 95% (for base relations excl. space, time & provenance)
www.mpi-inf.mpg.de/yago-naga/
Searching & Ranking RDF Graphs in NAGA
Ranking based on confidence, compactness and relevance
[Figure: three example query graphs]
• Discovery queries: graph patterns with variables ($x, $y, $a, $b) over edges such as hasWon, bornIn, type, hasSon and diedOn, with nodes such as Nobel prize, Kiel and scientist
• Connectedness queries: how Thomas Mann and Goethe are connected (edge labels: type, *; node: German novelist)
• Queries with regular expressions over edge labels: hasFirstName | hasLastName, (coAuthor | advisor)*, worksFor, locatedIn* (nodes: Ling, scientist, Zhejiang, Beng Chin Ooi)
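A starred edge expression such as (coAuthor | advisor)* can be read as "zero or more hops along any of these edge labels". The sketch below evaluates that reading with a breadth-first search over a toy edge list. The node names A to E and U are placeholders, and this is not NAGA's actual query processor or ranking.

```python
from collections import deque

# Toy labeled graph as (subject, label, object); node names are placeholders.
EDGES = [
    ("A", "coAuthor", "B"),
    ("B", "advisor", "C"),
    ("C", "worksFor", "U"),
    ("D", "coAuthor", "E"),
]


def reachable(start, labels, edges):
    """Nodes reachable from start via zero or more directed edges
    whose label is in labels, i.e. the expression (l1 | l2 | ...)*."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for subj, label, obj in edges:
            if subj == node and label in labels and obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return seen


print(sorted(reachable("A", {"coAuthor", "advisor"}, EDGES)))
```

The search follows edges in their stored direction only; a symmetric relation such as coAuthor would need both directions stored or checked.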
YAGO Server: UI & API
YAGO-UI
• Interactive online demo
• RDF with time, space & provenance annotations
• SPARQL + keywords
YAGO-API
Two basic WebServices:
• processQuery (String query)
• getYagoEntitiesByNames (String[] names)
• …
www.mpi-inf.mpg.de/yago-naga/demo.html
Names, Surface Patterns & Paraphrases
Which chemist was born in London?
• (I) Named entity disambiguation
  • chemist → wordnet_chemist, wordnet_pharmacist
  • born → Bertran_de_Born, Born_Identity_(Movie), Born_(Album)
  • London → London_UK, London_Arkansas, Antonio_London
• (II) Mapping surface patterns onto semantic relations
  • <person> was_born_in <location> → bornIn(<person>, <location>)
  • <person> was_born_in <date> → bornOn(<person>, <date>)
• (III) Paraphrases of questions
  • <person> [was] born in <location>
  • <location>-born <person>
  • NN VBD VBN IN NNP/LOC
  → bornIn(<person>, <location>)
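Step (II) can be illustrated with a small sketch in which one surface pattern maps to bornIn or bornOn depending on the type of its second argument. The regular expressions and the crude date test are assumptions made for illustration; the system described here relies on NER and typed patterns rather than on this heuristic.

```python
import re

# Surface pattern "<person> was born in <arg>"; an optional final period is dropped.
SURFACE = re.compile(r"^(?P<person>.+?) was born in (?P<arg>.+?)\.?$")

# Very crude date test (assumption): a bare year or "Month D, YYYY".
DATE = re.compile(r"^\d{4}$|^[A-Z][a-z]+ \d{1,2}, \d{4}$")


def extract(sentence):
    """Map a sentence to (relation, person, argument), or None if no match."""
    m = SURFACE.match(sentence.strip())
    if m is None:
        return None
    person, arg = m.group("person"), m.group("arg")
    relation = "bornOn" if DATE.match(arg) else "bornIn"
    return (relation, person, arg)


print(extract("Michael Faraday was born in London."))
print(extract("Michael Faraday was born in 1791."))
```

The argument type decides the relation, which mirrors the <location> versus <date> distinction in the two patterns above.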
(I) Named Entity Disambiguation

#inlinks with anchor “Paris”:
  Paris                              32,362
  Paris, France                         570
  Paris Masters                         134
  Paris (mythology)                     118
  University of Paris                    79
  Paris, Texas                           56
  Paris, Ontario                         45
  Paris (rapper)                         29
  Open Gaz de France                     26
  Paris, Kentucky                        20
  Paris (2008 film)                      19
  Gare Saint-Lazare                      18
  Paris, Tennessee                       17
  BNP Paribas Masters                    16
  Paris, Maine                           14
  Paris Hilton                           12
  Paris, Arkansas                        11
  Paris (Supertramp album)               10
  Gare du Nord                            9
  Paris (1979 TV series)                  8
  Count Paris                             7
  Palais Omnisports de Paris-Bercy        6
  Paris, Virginia                         5
  Paris 2012 Olympic bid                  4
  Paris (2003 film)                       3

• Wikipedia link structure
  • 65,872,435 intra-wiki links
  • 2,782,297 disambiguation pages & 328,372 redirects
  • 2,886,027 distinct link anchor texts
• YAGO “means” relation
  • 18,470,099 mappings of names to entities
  • 6.2 distinct names per entity (on avg.)
• Individual name disambiguation vs. joint disambiguation
• AIDA tool for graph-based disambiguation in YAGO-2: J. Hoffart et al.: “Robust Disambiguation of Named Entities in Text”. In EMNLP, Edinburgh, Scotland, 2011.
www.mpi-inf.mpg.de/yago-naga/aida/
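The inlink counts above suggest a simple popularity prior for individual name disambiguation: the share of links with anchor “Paris” that point to each candidate. The sketch below computes that prior for the six most frequent candidates from the slide. It covers only the individual-name case and is not a description of AIDA's joint, graph-based method.

```python
# Inlink counts for the anchor text "Paris", copied from the slide (top six).
ANCHOR_COUNTS = {
    "Paris": 32362,
    "Paris, France": 570,
    "Paris Masters": 134,
    "Paris (mythology)": 118,
    "University of Paris": 79,
    "Paris, Texas": 56,
}


def prior(counts):
    """Relative link frequency per candidate entity."""
    total = sum(counts.values())
    return {entity: n / total for entity, n in counts.items()}


def most_likely(counts):
    """Candidate with the highest anchor-link count."""
    return max(counts, key=counts.get)


for entity, p in prior(ANCHOR_COUNTS).items():
    print(f"{entity:22s} {p:.4f}")
print("best:", most_likely(ANCHOR_COUNTS))
```

A prior this skewed picks the same entity regardless of context, which is the limitation that joint disambiguation addresses.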
(II) From Patterns to Semantic Relations
• PROSPERA – statistical pattern mining from free-text
• Domain-oriented extraction of patterns for known relations (POS-enhanced n-grams)
  X carried out his doctoral research in math under the supervision of Y
  X { carried out PRP doctoral research [IN NP] [DET] supervision [IN] } Y
• Confidence & support based on seeds & counter seeds
• Pattern/fact-duality & consistency reasoning
10s to 100s of typed patterns per relation
Pattern-fact duality:
  occurs(p,x,y) ∧ expresses(p,R) ⇒ R(x,y)
  occurs(p,x,y) ∧ R(x,y) ⇒ expresses(p,R)
Type constraints: Spouse ⊆ Person × Person
Inclusion dependencies: capitalOfCountry ⊆ cityOfCountry
Functional dependencies: Spouse(x,y): x → y, y → x
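The two directions of the pattern-fact duality can be sketched as one bootstrapping round: known facts (seeds and counter seeds) score the patterns, and sufficiently confident patterns then propose new facts. The confidence formula pos / (pos + neg), the threshold and the placeholder names s1, a1, etc. are assumptions for illustration, not PROSPERA's actual weighting.

```python
def pattern_confidence(occurrences, seeds, counter_seeds):
    """Facts to patterns: score each pattern by how often it co-occurs
    with seed pairs versus counter-seed pairs."""
    pos, neg = {}, {}
    for pattern, x, y in occurrences:
        if (x, y) in seeds:
            pos[pattern] = pos.get(pattern, 0) + 1
        elif (x, y) in counter_seeds:
            neg[pattern] = neg.get(pattern, 0) + 1
    patterns = set(pos) | set(neg)
    return {p: pos.get(p, 0) / (pos.get(p, 0) + neg.get(p, 0)) for p in patterns}


def infer_facts(occurrences, confidence, threshold):
    """Patterns to facts: every pair seen with a confident pattern
    becomes a candidate fact of the relation."""
    return {(x, y) for p, x, y in occurrences
            if confidence.get(p, 0.0) >= threshold}


# Toy occurrences for a hypothetical hasAdvisor relation.
OCCURRENCES = [
    ("under the supervision of", "s1", "a1"),
    ("under the supervision of", "s2", "a2"),
    ("met", "s1", "a1"),
    ("met", "s3", "a9"),
]
SEEDS = {("s1", "a1")}
COUNTER_SEEDS = {("s3", "a9")}

conf = pattern_confidence(OCCURRENCES, SEEDS, COUNTER_SEEDS)
facts = infer_facts(OCCURRENCES, conf, threshold=0.8)
print(conf, facts)
```

Here "met" is seen once with a seed and once with a counter seed, so it falls below the threshold and contributes no new facts, while the supervision pattern proposes the pair (s2, a2).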
PROSPERA Architecture
• Gathering: Enhanced Hearst patterns
  • POS-enhanced n-grams
  • Pattern-fact duality & constraints
• Analysis: Refined pattern weights
  • Carefully chosen seeds and counter seeds
  • Thresholds for pattern confidence & support
• Reasoning: Scalable extraction & consistency reasoning
  • MapReduce functions for pattern extraction & statistics gathering
  • Distributed MaxSat solver (MAP inference)
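Consistency reasoning as weighted MaxSat can be illustrated on a tiny scale: candidate facts carry weights, constraints such as the functional dependency on Spouse are hard clauses, and the goal is the consistent subset of facts with maximum total weight. The brute-force search, the weights and the placeholder names below are assumptions for illustration; the architecture above uses a distributed solver.

```python
from itertools import product


def max_sat(weights, hard_constraints):
    """Return the set of facts with maximum total weight among all
    truth assignments that satisfy every hard constraint.
    Exhaustive search, so only suitable for a handful of facts."""
    facts = list(weights)
    best, best_score = None, float("-inf")
    for values in product([False, True], repeat=len(facts)):
        assignment = dict(zip(facts, values))
        if not all(check(assignment) for check in hard_constraints):
            continue
        score = sum(weights[f] for f in facts if assignment[f])
        if score > best_score:
            best, best_score = assignment, score
    if best is None:
        raise ValueError("hard constraints are unsatisfiable")
    return {f for f in facts if best[f]}


# Candidate facts with hypothetical confidence weights.
WEIGHTS = {
    ("Spouse", "x", "y1"): 0.9,
    ("Spouse", "x", "y2"): 0.4,
    ("Spouse", "z", "w"): 0.7,
}

# Functional dependency: x has at most one spouse.
HARD = [
    lambda a: not (a[("Spouse", "x", "y1")] and a[("Spouse", "x", "y2")]),
]

print(max_sat(WEIGHTS, HARD))
```

The lower-weighted of the two conflicting Spouse facts is dropped, and the unrelated fact is kept.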
(III) Crowdsourcing for Question Paraphrases
• Pattern acquisition from the crowd
  • Annotators paraphrase natural-language seed questions
  • Seed questions are associated with their semantic arguments and functions
  • Gold resource for pattern acquisition and system evaluation
• Preliminary results
  • 4,620 paraphrases for 254 seed questions with 7 annotators
  • Total annotation time: ~49 hours, ~1 work-day per annotator
YAGO-QA Architecture
• Input analysis
  • SProUT for tokenization, stemming & NER (http://sprout.dfki.de/)
  • NE gazetteer extended by YAGO entities
• Input interpretation
  • Named-entity disambiguation based on YAGO statistics
  • Vague matching against the gathered question paraphrases
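Vague matching against gathered paraphrases can be approximated with a token-level similarity score. The sketch below uses difflib.SequenceMatcher from the Python standard library as a stand-in; the actual matcher, the slot notation, the 0.6 threshold and the diedIn relation are assumptions for illustration.

```python
from difflib import SequenceMatcher

# Paraphrase templates mapped to relations (slots already abstracted by NER).
PARAPHRASES = {
    "which <class> was born in <location>": "bornIn",
    "an <class> whose place of birth is <location>": "bornIn",
    "which <class> died in <location>": "diedIn",
}


def best_match(question, paraphrases, min_ratio=0.6):
    """Return (template, relation, ratio) for the most similar template,
    or None if no template reaches min_ratio."""
    tokens = question.lower().split()
    best = None
    for template, relation in paraphrases.items():
        ratio = SequenceMatcher(None, tokens, template.split()).ratio()
        if best is None or ratio > best[2]:
            best = (template, relation, ratio)
    if best is None or best[2] < min_ratio:
        return None
    return best


print(best_match("Which <class> was actually born in <location>", PARAPHRASES))
print(best_match("what time is it", PARAPHRASES))
```

Comparing token lists rather than characters keeps slot markers such as <location> atomic, so an extra word in the question lowers the score only slightly.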
YAGO-QA Architecture (cont'd)
• Input interpretation / Answer retrieval
  • An actor whose place of birth is Chicago.
  • Which actor was born in Chicago?
  • Which <actor> was_born_in <Chicago>?
  • ?x type ARG1 . ?x bornIn ARG2 .
• Template-based answer generation
  • Who/what is/are <?x>?
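The last interpretation step, filling the ARG slots of a query template with the disambiguated arguments, can be sketched as plain string substitution. The SELECT wrapper around the template and the identifier wordnet_actor are assumptions for illustration.

```python
import re

# Query template from the slide, wrapped in a SELECT clause (assumption).
QUERY_TEMPLATE = "SELECT DISTINCT ?x WHERE { ?x type ARG1 . ?x bornIn ARG2 . }"


def instantiate(template, args):
    """Replace each ARGn placeholder with its value from args."""
    return re.sub(r"\bARG\d+\b", lambda m: args[m.group(0)], template)


query = instantiate(QUERY_TEMPLATE, {"ARG1": "wordnet_actor", "ARG2": "Chicago"})
print(query)
```

Matching whole ARGn tokens with a word boundary avoids replacing ARG1 inside a longer placeholder such as ARG10.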
YAGO-QA Example
• Multiple named entity annotations: all names are annotated
• Interpretation picks suitable NE readings
• Vague matching against surface templates
Conclusions & Future Work
• QA based on structured knowledge queries (beyond IR-style retrieval of matching sentences/paragraphs)
• Wikipedia as rich knowledge backend
• Entities, semantic classes & typed relations
• Large-scale statistics for entity disambiguation & surface patterns
• Crowdsourcing for question paraphrases
• Predefined question templates translated into join queries
• Future work
  • “Open-QA” via open-domain information extraction
  • Dynamic learning of template structures from grammars
  • More modular template structures