Open IE to KBP Relations in 3 Hours
Stephen Soderland, John Gilmer, Rob Bart, Oren Etzioni, Daniel S. Weld
Turing Center, University of Washington
TAC-KBP Workshop
Open IE
“Steve Jobs, the co-founder of Apple, died of cancer in his Palo Alto home.”
(Arg1, Rel, Arg2)
(Steve Jobs, died of, cancer)
(Steve Jobs, died in, his Palo Alto home)
(Steve Jobs, is co-founder of, Apple)
“Hamas denied responsibility for the attacks, which threaten to derail ongoing peace talks.”
(Hamas, denied responsibility for, the attacks)
(the attacks, threatened to derail, ongoing peace talks)
“Ribosomes, which are complexes made of ribosomal RNA and protein, are the cellular components that carry out protein synthesis.”
(Ribosomes, are complexes made of, ribosomal RNA and protein)
(Ribosomes, are, the cellular components)
(Ribosomes, carry out, protein synthesis)
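The (Arg1, Rel, Arg2) tuples on this slide can be held in a very small data structure. The following is a minimal sketch, not the actual Open IE output format: the `Extraction` type and the `index_by_arg1` helper are hypothetical names introduced here for illustration, and the tuples are copied from the slide.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Extraction(NamedTuple):
    """One Open IE tuple: (Arg1, Rel, Arg2). Hypothetical type for illustration."""
    arg1: str
    rel: str
    arg2: str


# Tuples taken from the example sentences on the slide.
EXTRACTIONS = [
    Extraction("Steve Jobs", "died of", "cancer"),
    Extraction("Steve Jobs", "died in", "his Palo Alto home"),
    Extraction("Steve Jobs", "is co-founder of", "Apple"),
    Extraction("Hamas", "denied responsibility for", "the attacks"),
    Extraction("the attacks", "threatened to derail", "ongoing peace talks"),
    Extraction("Ribosomes", "are complexes made of", "ribosomal RNA and protein"),
    Extraction("Ribosomes", "are", "the cellular components"),
    Extraction("Ribosomes", "carry out", "protein synthesis"),
]


def index_by_arg1(extractions: List[Extraction]) -> Dict[str, List[Extraction]]:
    """Group tuples by their first argument so an entity's tuples can be looked up."""
    index: Dict[str, List[Extraction]] = defaultdict(list)
    for ex in extractions:
        index[ex.arg1].append(ex)
    return dict(index)


INDEX = index_by_arg1(EXTRACTIONS)
```

Looking up `INDEX["Steve Jobs"]` returns the three tuples extracted from the first sentence, which is the kind of lookup a query-entity-driven task needs.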
Advantages of Open IE
• Robust
• Massively scalable
• Works out of the box
• Finds whatever relations are expressed in the text
• Not tied to an ontology of relations
Disadvantages
• Finds whatever relations are expressed in the text
• Not tied to an ontology of relations
Challenge
• Map Open IE to an ontology of relations
• Minimum of user effort
github/knowitall/openie
per:cause_of_death:
Head: high frequency
(Steve Jobs, died of, cancer)
(Steve Jobs, died from, cancer)
(Steve Jobs, passed away from, cancer)
(Steve Jobs, succumbed to, cancer)
(cancer, killed, Steve Jobs)
…
Long tail: low frequency
(cancer, claimed the life of, Steve Jobs)
(Steve Jobs, lost his battle to, cancer)
(Steve Jobs, was a victim of, cancer)
(Steve Jobs, could not beat, cancer)
(Steve Jobs, could not have prevented, his death from cancer)
(Steve Jobs, joins the ranks of, cancer fatalities)
…
Outline
• Rules to map to target relations
  • Rule language
  • Semantic taggers
• KBP system
  • Architecture
  • 3 hour rule set vs. 12 hour rule set
• Results and discussion
• Future work
Desiderata for Target Relation Mapping
• Works even without annotated training data
• User may have limited skill in NLP and ML
• Rules are understandable to the user
• High precision and good generalization
Approach:
• Manually created rules based on Open IE tuples
• Simple rule language
• Rules combine lexical and semantic type constraints
• Extensible semantic types based on a keyword tagger
Rule language
Tuple: (Smith, was appointed, Acting Director of Acme Corporation)
  entity: Smith
  slotfill: Acme Corporation
Result: per:employee_or_member_of(Smith, Acme Corporation)
Terms in Rule: Example
  Target relation: per:employee_or_member_of
  Query entity in: Arg1
  Slotfill in: Arg2
  Slotfill type: Organization
  Arg1 terms: -
  Relation terms: appointed
  Arg2 terms: <JobTitle> of
  Functional? no
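To make the rule on this slide concrete, here is a minimal sketch of how such a rule might be applied to a single tuple. It is not the authors' rule applier: the `Rule` class, `strip_job_title`, `apply_rule`, and the three-entry `JOB_TITLES` list are all assumptions made for illustration, the `<JobTitle> of` pattern is hard-coded, and the "Slotfill type: Organization" check is omitted because it would need an NER tagger.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical stand-in for the <JobTitle> keyword list.
JOB_TITLES = ("acting director", "director", "president")


@dataclass(frozen=True)
class Rule:
    relation: str   # target relation, e.g. per:employee_or_member_of
    rel_term: str   # word that must appear in the tuple's relation phrase
    arg2_term: str  # literal term that must follow the <JobTitle> in Arg2


def strip_job_title(text: str) -> Optional[str]:
    """Return the text after a leading job title, or None if there is none."""
    lowered = text.lower()
    # Try longer titles first so "acting director" wins over "director".
    for title in sorted(JOB_TITLES, key=len, reverse=True):
        if lowered.startswith(title + " "):
            return text[len(title) + 1:]
    return None


def apply_rule(rule: Rule, tup: Tuple[str, str, str]) -> Optional[Tuple[str, str, str]]:
    """Return (relation, entity, slotfill) if the rule matches, else None."""
    arg1, rel, arg2 = tup
    if rule.rel_term not in rel.lower().split():
        return None
    rest = strip_job_title(arg2)
    if rest is None or not rest.lower().startswith(rule.arg2_term + " "):
        return None
    # Query entity is in Arg1; the slotfill is what follows "<JobTitle> of".
    # NOTE: the Organization type check from the slide is not implemented here.
    return (rule.relation, arg1, rest[len(rule.arg2_term) + 1:])


RULE = Rule("per:employee_or_member_of", rel_term="appointed", arg2_term="of")
RESULT = apply_rule(RULE, ("Smith", "was appointed", "Acting Director of Acme Corporation"))
```

With the slide's tuple, `RESULT` is `("per:employee_or_member_of", "Smith", "Acme Corporation")`; a tuple whose Arg2 does not begin with a listed job title yields `None`.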
Semantic Tagging
• General types
  • Person, Organization, Location, Date
  • NER tagger
  • WordNet
• User-specified types
  • Keyword tagger
  • User creates a file of terms for the semantic type
  • Tagger takes the file as input
  • Used lists from CMU’s NELL for KBP
github/knowitall/taggers
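A keyword tagger of the kind described here can be sketched in a few lines. This is an illustration under stated assumptions, not the knowitall/taggers implementation: `load_terms` and `tag` are hypothetical names, the term file is assumed to hold one term per line, and matching is a simple case-insensitive longest match over whitespace tokens.

```python
from typing import Iterable, List, Set, Tuple

Term = Tuple[str, ...]


def load_terms(lines: Iterable[str]) -> Set[Term]:
    """Read one term per line (assumed file format) into lowercase token tuples."""
    return {tuple(line.lower().split()) for line in lines if line.strip()}


def tag(tokens: List[str], terms: Set[Term], label: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, label) spans, preferring the longest match at each position."""
    max_len = max((len(t) for t in terms), default=0)
    lowered = [t.lower() for t in tokens]
    spans = []
    i = 0
    while i < len(lowered):
        match_end = None
        # Try the longest candidate first, then shrink.
        for n in range(min(max_len, len(lowered) - i), 0, -1):
            if tuple(lowered[i:i + n]) in terms:
                match_end = i + n
                break
        if match_end is None:
            i += 1
        else:
            spans.append((i, match_end, label))
            i = match_end
    return spans


# A tiny illustrative term list; a real list would be loaded from a user-created file.
TERMS = load_terms(["acting chief director", "director", "vice-director"])
TOKENS = "Smith became acting chief director of Acme".split()
SPANS = tag(TOKENS, TERMS, "JobTitle")
```

Here `SPANS` holds a single span covering tokens 2 to 5 ("acting chief director"), rather than the shorter "director", because the longest match is tried first.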
Semantic Types from CMU’s NELL
• 4K Job titles: academic coordinator … zonal underwriting manager
• 182 Head job titles: acting chief director … vice-director
• 47 Religions: Adventist … Zoroastrianism
• 114 Nationalities: Akkadian … Zambian
• 5K Cities: Aachen … Zwolle
• 536 State-provinces: Ad Dali … Zlitan
• 241 Countries: Afghanistan … Zimbabwe
Outline
• Rules to map to target relations
  • Rule language
  • Semantic taggers
• KBP system
  • Architecture
  • 3 hour rule set vs. 12 hour rule set
  • Co-reference
• Results and discussion
• Future work
KBP Architecture
[Architecture diagram; surviving label: 200M tuples]
What We Did Not Handle
• Entity disambiguation needed for KBP precision
  • Good extraction for “Paul Gray”, but wrong Paul Gray
• Mostly ignored this in our system
  • Find any tuple that matched entity string
  • Detect ambiguous entities if linked to multiple KB entries
  • Discard all results for ambiguous entities
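The "discard all results for ambiguous entities" policy can be sketched as a simple filter. This is an assumption-laden illustration: `drop_ambiguous`, the `kb_links` mapping, and the KB identifiers below are hypothetical, and how entity strings are actually linked to KB entries is not described on the slide.

```python
from typing import Dict, List, Set, Tuple

Result = Tuple[str, str, str]  # (relation, entity, slotfill)


def drop_ambiguous(results: Dict[str, List[Result]],
                   kb_links: Dict[str, Set[str]]) -> Dict[str, List[Result]]:
    """Keep results only for entity strings linked to at most one KB entry."""
    return {
        entity: found
        for entity, found in results.items()
        if len(kb_links.get(entity, set())) <= 1
    }


# Hypothetical KB identifiers, for illustration only.
KB_LINKS = {
    "Paul Gray": {"kb:0001", "kb:0002"},  # two different people share the name
    "Tantawi": {"kb:0003"},
}
RESULTS = {
    "Paul Gray": [("per:title", "Paul Gray", "bassist"),
                  ("per:title", "Paul Gray", "president")],
    "Tantawi": [("per:title", "Tantawi", "sheik")],
}
KEPT = drop_ambiguous(RESULTS, KB_LINKS)
```

In this example everything found for "Paul Gray" is dropped, trading recall for precision, while the "Tantawi" result is kept.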
Creating Rule Sets
• 3-hour rule set
  • Avg 3 rules per relation
  • Light editing of NELL keyword lists
  per:cause_of_death = “died of”, “died from”, “died as a result of”, “died due to”
• 12-hour rule set (over a two-week period)
  • Avg 16 rules per relation
  • Refined rules, testing on 2012 KBP answer key
  • Further editing of NELL keyword lists
  per:cause_of_death = “die of”, “dies of”, “dying of”, … “succumbed to”, “succumbs to”, …
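The effect of a larger term list on coverage can be illustrated with the phrases quoted in this deck. This is a sketch under assumptions: the 12-hour list on the slide is elided, so `TERMS_12H` below contains only the terms actually shown plus the 3-hour terms (assuming the longer rule set keeps them), and matching is reduced to exact comparison of the relation phrase.

```python
from typing import List, Set, Tuple

# Terms quoted on the slide for the 3-hour rule set.
TERMS_3H: Set[str] = {"died of", "died from", "died as a result of", "died due to"}

# ASSUMPTION: the 12-hour set keeps the 3-hour terms; only the terms visible
# on the slide are added, since the rest of the list is elided there.
TERMS_12H: Set[str] = TERMS_3H | {
    "die of", "dies of", "dying of", "succumbed to", "succumbs to",
}

# Tuples from the per:cause_of_death example earlier in the deck.
TUPLES: List[Tuple[str, str, str]] = [
    ("Steve Jobs", "died of", "cancer"),
    ("Steve Jobs", "died from", "cancer"),
    ("Steve Jobs", "passed away from", "cancer"),
    ("Steve Jobs", "succumbed to", "cancer"),
]


def matches(terms: Set[str], tuples: List[Tuple[str, str, str]]) -> List[Tuple[str, str, str]]:
    """Return tuples whose relation phrase is exactly one of the rule terms."""
    return [t for t in tuples if t[1].lower() in terms]


MATCHED_3H = matches(TERMS_3H, TUPLES)
MATCHED_12H = matches(TERMS_12H, TUPLES)
```

On these four tuples the shorter list matches two and the longer list matches three; "passed away from" is caught by neither, which is the long-tail problem in miniature.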
KBP Results
• 35% recall boost from the 12-hour rule set
Extractor Precision:
per:title(Paul Gray, bassist)
per:title(Paul Gray, president)
KBP Precision:
per:title(Paul Gray, bassist)
per:title(Paul Gray, president)
Error Analysis
• 31% “Looked right to me”
  “Tantawi was the grand sheik” => per:title(Tantawi, sheik)
  “ETA's political wing Batasuna” => org:subsidiary(ETA, Batasuna)
• 23% Overgeneralized rules
  “Ginzburg was an outspoken critic” => per:title(Ginzburg, critic)
  “Meredith led the NFL in scoring” => per:employee_or_member_of(Meredith, NFL)
• 19% Rules matched on non-head terms
  “Kahn’s younger sister married Shankar” => per:spouse(Kahn, Shankar)
• 15% Open IE errors
• 12% Coref errors
Ceiling for Recall from Open IE
• 42% Extracts all information for KBP relation
• 16% Extractor truncates an argument
  Omits appositive or parenthetical
  “Sheikh Tantawi, the top Egyptian cleric who died on Wednesday…”
  (the top Egyptian cleric, died on, Wednesday)
• 10% Extractor misses “relational noun”
  “Tantawi, the Grand Imam of Al-Azhar”
• 10% No extraction of relevant part of sentence
  Syntactic complexity
• 4% Extraction error
• 18% Other
68%
Future Work
• Increase recall of Open IE
• Increase precision of rule applier
• General method not tied to KBP task
  • Plug in any ontology of relations
  • Results not tied to query entity
• Release as open-source software
Conclusion
• Novel approach for KBP Slot Filling
  • Run Open IE extractor on corpus
  • Semantic taggers based on user-written keyword lists
  • User-written rules to map target relations to Open IE
• Results
  • High extraction precision: 0.80
  • Moderate recall: 0.10 (comparable to all but top sites)
• Low human effort
  • Requires no NLP or ML experience
  • Only 3 hours of effort gives high precision
Thank you
github/knowitall/openie
github/knowitall/taggers