Discovery of Inference Rules for Question Answering Dekang Lin and Patrick Pantel (Univ Alberta, CA) [ Lin now at Google, Pantel now at ISI ] as (mis-)interpreted by Peter Clark (Nov 2007)
Preview • This is a most remarkable piece of work! • Extent: • Hasegawa paraphrase database: 4,600 rules • Sekine paraphrase database: 211,000 rules • DIRT: 12,000,000 rules • Quality: ~50% are sensible (personal opinion) • Content: not just paraphrases but world knowledge
Overview • Problem: There are many ways of saying the same thing
“Who is the author of the Star Spangled Banner?”
…Francis Scott Key wrote the “Star Spangled Banner” in 1814.
…comedian-actress Roseanne Barr sang her famous shrieking rendition of the “Star Spangled Banner” before a San Diego Padres-Cincinnati Reds game.
Problem… there are lots of paraphrases! Can we learn them automatically?
DIRT Method • Take a binary relation, e.g., X buys Y. • Collect examples of “X buys…”, and “…buys Y”
Method • Take a binary relation, e.g., X buys Y. • Collect examples of “X buys…”, and “…buys Y” - build a “tuple” database
The following can buy: things (116), people (37), companies (34), Chicago (33), buyers (18), groups (6), …
The following can be bought: shares (145), managers (43), stakes (27), power (25), securities (17), people (13), amounts (12), …
Method • Take a binary relation, e.g., X buys Y. • Collect examples of “X buys…”, and “…buys Y” • Find other relations with a similar distribution
The following can buy: things (116), people (37), companies (34), Chicago (33), buyers (18), groups (6), …
The following can be bought: shares (145), managers (43), stakes (27), power (25), securities (17), people (13), amounts (12), …
The following can purchase: things (97), companies (23), people (21), businesses (13), governments (5), …
The following can be purchased: shares (56), companies (23), securities (21), power (20), dogs (14), Macs (10), firms (2), …
• 4. Collect as rules: “X buys Y” ↔ “X purchases Y”
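To make these steps concrete, here is a minimal sketch (my own illustration, not Lin and Pantel's code) of how the tuple database might be built, assuming a dependency parser has already reduced the corpus to (X, relation, Y) triples:

```python
# Minimal sketch (not the authors' implementation): build the slot-filler
# database from (X, relation, Y) triples already extracted by a parser.
from collections import Counter, defaultdict

def build_slot_database(triples):
    """For each relation, count what fills its X slot and its Y slot."""
    slot_x = defaultdict(Counter)   # e.g. slot_x["buy"] = {"things": 116, "people": 37, ...}
    slot_y = defaultdict(Counter)   # e.g. slot_y["buy"] = {"shares": 145, "managers": 43, ...}
    for x, rel, y in triples:
        slot_x[rel][x] += 1
        slot_y[rel][y] += 1
    return slot_x, slot_y

# Relations whose filler distributions look alike (e.g. "buy" and "purchase")
# then become candidate rules such as "X buys Y" <-> "X purchases Y".
```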
The Math • Can’t simply compare counts • as some words are more frequent than others
The following can buy: things (116), people (37), companies (34), Chicago (33), buyers (18), groups (6), …
The following can purchase: things (97), companies (23), people (21), businesses (13), governments (5), …
“people” counts not very indicative of similarity, as “people” occurs with many relations
The Math • Mutual information gives better notion of “importance”
mi(x,y) = log [ p(x,y) / ( p(x) p(y) ) ]
e.g., p(“company buy”) / ( p(“company”) p(“buy”) )
The following can buy: things (116) 0.7, people (37) 0.5, companies (34) 8.4, Chicago (33) 8.1, buyers (18) 4.5, groups (6) 2.3, …
Can compute from frequency counts:
mi(“company buy”) = log [ freq(“company buy *”) × freq(“* * *”) / ( freq(“company * *”) × freq(“* buy *”) ) ]
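A small sketch of how that frequency-count version of mi could be computed over a slot-filler table like the one above (the function and variable names are my own, not the paper's):

```python
import math

def mutual_information(filler, relation, slot_counts):
    """Pointwise mutual information of a filler word with one slot of a relation,
    computed from frequency counts as on the slide:
        mi = log( freq(w, rel) * freq(*, *) / (freq(w, *) * freq(*, rel)) )
    slot_counts: dict relation -> Counter of filler frequencies for that slot
    (e.g. the slot_x table from build_slot_database above)."""
    freq_w_rel   = slot_counts[relation][filler]                        # freq("company buy *")
    freq_w_any   = sum(c[filler] for c in slot_counts.values())         # freq("company * *")
    freq_any_rel = sum(slot_counts[relation].values())                  # freq("* buy *")
    freq_any_any = sum(sum(c.values()) for c in slot_counts.values())   # freq("* * *")
    if freq_w_rel == 0:
        return float("-inf")
    return math.log(freq_w_rel * freq_any_any / (freq_w_any * freq_any_rel))
```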
The Math • Now use a similarity measure, and sum over all words…
sim(“X buy”, “X purchase”) =
  Σ over Xs that can “buy” AND “purchase” [ mi(“X buy”) + mi(“X purchase”) ]
  / [ Σ over Xs that can “buy” mi(“X buy”) + Σ over Xs that can “purchase” mi(“X purchase”) ]
= [ (.7+.3) “things” + (8.4+8.2) “companies” + (.5+.2) “people” ]
  / [ (.7+.5+8.4+8.1+4.5+2.3) + (.3+8.2+.2+4.3+7.6) ]
= 0.41 (ignoring other words not shown on this page)
The following can buy: things (116) 0.7, people (37) 0.5, companies (34) 8.4, Chicago (33) 8.1, buyers (18) 4.5, groups (6) 2.3, …
The following can purchase: things (97) 0.3, companies (23) 8.2, people (21) 0.2, businesses (13) 4.3, governments (5) 7.6, …
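The same computation, expressed as a sketch over precomputed mi values (again my own illustration; the mi dictionaries below just restate the numbers on this slide):

```python
def slot_similarity(mi_a, mi_b):
    """Slot similarity between two (relation, slot) pairs, given dicts mapping
    filler word -> mutual information with that slot.
    Numerator: fillers shared by both slots; denominator: all fillers of each."""
    shared = set(mi_a) & set(mi_b)
    numerator = sum(mi_a[w] + mi_b[w] for w in shared)
    denominator = sum(mi_a.values()) + sum(mi_b.values())
    return numerator / denominator if denominator else 0.0

# The worked example from this slide:
mi_buy = {"things": 0.7, "people": 0.5, "companies": 8.4,
          "Chicago": 8.1, "buyers": 4.5, "groups": 2.3}
mi_purchase = {"things": 0.3, "companies": 8.2, "people": 0.2,
               "businesses": 4.3, "governments": 7.6}
print(round(slot_similarity(mi_buy, mi_purchase), 2))   # 0.41
```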
The Math • Then combine “X verb *” and “* verb Y” scores (take geometric average):
sim(“X buy Y”, “X purchase Y”) = √[ sim(“X buy”, “X purchase”) × sim(“buy Y”, “purchase Y”) ]
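And the combination step, as a sketch (the 0.60 Y-slot score in the usage example is made up purely for illustration):

```python
import math

def path_similarity(sim_slot_x, sim_slot_y):
    """Geometric mean of the two slot similarities, e.g.
    sim("X buy Y", "X purchase Y") =
        sqrt( sim("X buy", "X purchase") * sim("buy Y", "purchase Y") )."""
    return math.sqrt(sim_slot_x * sim_slot_y)

# With the X-slot score of 0.41 from the previous slide and a hypothetical
# Y-slot score of 0.60:
print(round(path_similarity(0.41, 0.60), 2))   # ~0.5
```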
Use for Question-Answering
T: …China provided Iran with decontamination materials…
Question: “Iran received decontamination materials?”
Answer: Yes! I have general knowledge that: IF X is provided with Y THEN X obtains Y [DIRT paraphrase rule]
Here: X = Iran, Y = materials
Thus, here: We are told in T: Iran is provided with materials
Thus it follows that: Iran obtains materials
In addition, I know: “obtain” and “receive” mean roughly the same thing [WordNet]
Hence: Iran received decontamination materials.
;;; T: "William Doyle works for an auction house in Manhattan." ;;; H: "The auction house employs William Doyle?" ;;; DIRT rule "N:for:V<work>V:subj:N -> N:subj:V<hire>V:obj:N" fired Yes! I have general knowledge that: IF Y works for X THEN X hires Y Here: X = the house, Y = Doyle Thus, here: We are told in T: Doyle works for the house Thus it follows that: the house hires Doyle In addition, I know: "hire" and "employ" mean roughly the same thing Hence: The auction house employs William Doyle. SCORE!!!!
;;; T: "The technician cooled the room." ;;; H: "The technician was cooled by the room?" ;;; DIRT rule "N:obj:V<cool>V:subj:N -> N:subj:V<cool>V:obj:N" fired Yes! I have general knowledge that: IF Y cools X THEN X cools Y Here: X = the room, Y = the technician Thus, here: We are told in T: the technician cools the room Thus it follows that: the room cools the technician Hence: The technician was cooled by the room. Rats, WRONG again
Comments: The good… • I simplified to just “X verb Y” patterns • In fact DIRT handles general “X <syntactic path> Y” patterns • (uses the Minipar parser to find paths) • Finds world knowledge, not just paraphrases • Also finds junk… • Amazingly large and (relatively) high quality • It helped us (a bit) in our question-answering task • In our (limited) system on 800 test questions • DIRT rules were used 59 times (41 right + 18 wrong)
Critique
• Is limited to “X pattern1 Y” → “X pattern2 Y” rules
• E.g., out of scope: “John sold a car” → “John received money”
• No word senses: “X turns Y” → “X transforms Y”, “X rolls Y”, “X sells Y”
• No constraints on the classes (types) of X and Y: “driven to X by Y” → “Y has a child in X”, “Y provides medical care for Y”
• e.g., “The president was driven to the memorial by the general.”
• Lin and Pantel point to this as a possible future direction
• ~50% is noisy: antonyms are often found to correlate, e.g., “X loves Y” → “X hates Y”
• No notion of time, tense, etc.: “Fred bought a car” → “Fred owns a car”, but also “Fred sold a car” → “Fred owns a car”
Summary and Comments • Is this The Answer to AI/Machine Reading? • Highly impressive knowledge collection • Can suggest implicit (unstated) knowledge in text • but: Learns only one inference pattern • (can imagine extending to others) • Noisy • (can imagine using more data for more reliability) • More significantly…. • Rules only tell us atomic things which might happen…
Summary and Comments (cont) • Rules only tell us atomic things which might happen • Can reduce the noise, but in the end the rules are only possibilities
“Illinois Senator Barack Obama visited the state's most diverse high school to deliver a comprehensive K-12 education plan…”
“Obama arrived in the high school”
“Obama toured the high school” ?
“Obama flew to the high school” ?
“Obama told a reporter in the high school” !!
“Obama inaugurated the high school” ?!
“Obama headed a delegation to the high school” !!
“Obama was arrested in the high school” ?!
“Obama thanked the high school’s government”
“Obama stopped in the high school” !!
“Obama held a summit in the high school” !!
“Obama was evacuated from the high school”
“Obama was accompanied to the high school”
…
Summary and Comments (cont) • What is still needed: • Requires notions of: coherence; macro-sized structures of how things combine (e.g., “scripts”); reasoning about the implications; knowledge integration; search; plausibility assessment and uncertainty reasoning; i.e., constructing a “most coherent representation” (model) from the pieces which the knowledge base suggests.
Summary and Comments (cont) The bottom line: • Rule learning from text: • A key bit of the puzzle of AI, overcomes part of the knowledge acquisition bottleneck • Provides much-needed bottom-up input to language understanding • BUT: only part of the solution: • provides low-level “knowledge fodder” but • not the mechanism for processing it into coherent and useful representations • Not the larger-level structures (scripts, typical scenarios) also required for understanding and “organizing the fodder”