Advanced Techniques for Answer Extraction and Formulation
Language Computer Corporation, www.languagecomputer.com, Dallas, Texas
PI: Dan Moldovan, moldovan@languagecomputer.com
Tasks
• Task 1. QA System Taxonomy
• Task 2. Answer fusion
• Task 3. Develop methods for on-line ontology construction
• Task 4. Develop an inference engine capable of providing answer justification
• Task 5. Formulate concise and coherent answers
• Task 6. Explore new QA System Architectures
Performance Analysis: Serial System Architecture
Question → M1 → M2 → M3 → M4 → M5 → M6 → M7 → M8 → M9 → M10 → Answer
• M1: Keyword pre-processing (split/bind/spell)
• M2: Construction of question representation
• M3: Derivation of expected answer type
• M4: Keyword selection
• M5: Keyword expansion
• M6: Actual retrieval of documents and passages
• M7: Passage post-filtering
• M8: Identification of candidate answers
• M9: Answer ranking
• M10: Answer formulation
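The serial architecture can be sketched as a fold over an ordered list of stage functions. This is a minimal sketch, assuming each module reads and extends a shared state dictionary; the toy versions of M1, M3 and M4 (and the `STOPWORDS` and `ANSWER_TYPES` tables) are hypothetical stand-ins, not the system's actual modules. What it illustrates is structural: each module sees only what earlier modules produced, so an early error propagates to every later stage.

```python
from functools import reduce
from typing import Callable, Dict, List

# Hypothetical stage signature: take the state so far, return it with new fields added.
Stage = Callable[[Dict], Dict]

STOPWORDS = {"what", "where", "who", "when", "will", "does", "do", "the", "a", "an", "of"}
ANSWER_TYPES = {"where": "LOCATION", "who": "PERSON", "when": "DATE"}  # toy table

def m1_preprocess(state: Dict) -> Dict:
    # Toy M1: drop the question mark and split on whitespace.
    return {**state, "tokens": state["question"].rstrip("?").split()}

def m3_answer_type(state: Dict) -> Dict:
    # Toy M3: derive the expected answer type from the wh-word.
    wh = state["tokens"][0].lower()
    return {**state, "answer_type": ANSWER_TYPES.get(wh, "UNKNOWN")}

def m4_select_keywords(state: Dict) -> Dict:
    # Toy M4: keep every token that is not a stopword.
    keywords = [t for t in state["tokens"] if t.lower() not in STOPWORDS]
    return {**state, "keywords": keywords}

def run_serial(question: str, stages: List[Stage]) -> Dict:
    # Apply the stages strictly in order; there is no way to revisit an earlier stage.
    return reduce(lambda state, stage: stage(state), stages, {"question": question})

result = run_serial("Where will Al Qaeda strike next?",
                    [m1_preprocess, m3_answer_type, m4_select_keywords])
print(result["answer_type"], result["keywords"])  # LOCATION ['Al', 'Qaeda', 'strike', 'next']
```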
Performance Analysis: Distribution of Errors
Performance Analysis: Impact of System Parameters
• Nd: maximum number of documents retrieved
• Np: maximum number of passages processed
Performance Analysis: Impact of System Parameters
[Figure: precision (MRR) and time (sec) plotted against Sp, the size of the retrieved passage, measured as the number of extra lines (±3, ±6, ±10, ±20, ±40)]
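Precision on this slide is reported as MRR (mean reciprocal rank): each question scores 1/r, where r is the rank of the first correct answer, or 0 if no correct answer is returned, and the scores are averaged over all questions. A minimal sketch follows; the optional `cutoff` parameter is an assumption added to mirror evaluations that judge only the top few answers (TREC QA tracks of that period judged at most five).

```python
from typing import Optional, Sequence

def mean_reciprocal_rank(ranks: Sequence[Optional[int]], cutoff: Optional[int] = None) -> float:
    """ranks[i] is the 1-based rank of the first correct answer to question i,
    or None if no correct answer was returned. Ranks beyond `cutoff` score 0."""
    total = 0.0
    for r in ranks:
        if r is not None and (cutoff is None or r <= cutoff):
            total += 1.0 / r
    return total / len(ranks)

print(mean_reciprocal_rank([1, 2, None, 4]))            # 0.4375
print(mean_reciprocal_rank([1, 2, None, 4], cutoff=3))  # 0.375
```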
Performance Analysis: Architecture with Feedback Loops
Question → M1+M2+M3+M4 → M5 + lexico-semantic alternations → M6 → M7+M8 → Logic Proving → M9+M10 → Answer
Feedback: Loop 1, Loop 2, Loop 3
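The slide does not spell out what triggers each loop, so the following is only a generic sketch of a feedback architecture, assuming a loop fires whenever no answer can be justified and that each loop rewrites the query (here, a lexical alternation) before re-entering retrieval. The names `retrieve`, `prove` and the alternation function are hypothetical placeholders for M6 and the logic-proving step.

```python
from typing import Callable, List, Optional, Tuple

Retriever = Callable[[List[str]], List[str]]     # stands in for M6
Prover = Callable[[List[str]], Optional[str]]    # stands in for logic proving
Alternation = Callable[[List[str]], List[str]]   # hypothetical query rewrite

def answer_with_feedback(keywords: List[str], alternations: List[Alternation],
                         retrieve: Retriever, prove: Prover) -> Tuple[Optional[str], int]:
    """Return (answer, number of feedback loops used); answer is None if all loops fail."""
    query = list(keywords)
    for loop in range(len(alternations) + 1):
        if loop > 0:
            # Feedback: rewrite the query, then go back through retrieval.
            query = alternations[loop - 1](query)
        answer = prove(retrieve(query))
        if answer is not None:
            return answer, loop
    return None, len(alternations)

# Toy usage with a one-sentence corpus taken from the terrorist-group examples.
corpus = ["al-Qaida is believed to have bombed the U.S. Embassies in Africa."]
retrieve = lambda q: [p for p in corpus if all(t.lower() in p.lower() for t in q)]
prove = lambda passages: passages[0] if passages else None
swap = lambda q: ["bombed" if t == "attacked" else t for t in q]
print(answer_with_feedback(["Qaida", "attacked"], [swap], retrieve, prove))
```

Without a feedback loop the same query returns no answer, which is the motivation for the loops on this slide.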
On-line Ontology Construction: Discover Concepts
• Step 1: Pick a set of related seed concepts.
• Step 2: Form a corpus of N sentences, each containing at least one of the seeds.
• Step 3: Parse the sentences in the corpus and extract the NPs that contain the seeds.
• Step 4: Apply filtering procedures that accept or reject new concepts.
• Step 5: Form an ontology by classifying the new concepts using subsumption.
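Steps 2 and 4 can be sketched as two small functions. This is a minimal sketch under stated assumptions: the corpus is formed by case-insensitive substring matching on the seeds, and the filter is a hypothetical frequency threshold (`min_count`) combined with a set of concepts rejected interactively; the actual filtering procedures are not specified on the slide.

```python
from collections import Counter
from typing import AbstractSet, Iterable, List

def seed_corpus(sentences: Iterable[str], seeds: Iterable[str]) -> List[str]:
    """Step 2: keep sentences that mention at least one seed (case-insensitive)."""
    lowered = [s.lower() for s in seeds]
    return [s for s in sentences if any(seed in s.lower() for seed in lowered)]

def accept_concepts(candidates: Iterable[str], min_count: int = 2,
                    rejected: AbstractSet[str] = frozenset()) -> List[str]:
    """Step 4 (hypothetical filter): keep candidates seen at least min_count times
    that a human reviewer has not rejected."""
    counts = Counter(candidates)
    return sorted(c for c, n in counts.items() if n >= min_count and c not in rejected)

sents = ["All the suicide terrorist groups have support infrastructures in Europe.",
         "The weather was mild.",
         "Hamas and other terrorist groups rejected the plan."]
print(len(seed_corpus(sents, ["terrorist group"])))  # 2
print(accept_concepts(["suicide terrorist group", "suicide terrorist group",
                       "domestic terrorist group"]))  # ['suicide terrorist group']
```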
On-line Ontology Construction: Discover Semantic Relations
• Step 1: Select the semantic relation R.
• Step 2: Pick pairs of concepts between which R holds.
• Step 3: Form a corpus in which each sentence contains one pair of concepts.
• Step 4: Extract the lexico-syntactic patterns Ci P Cj between the concepts.
• Step 5: Apply semantic constraints determined a priori and decide whether or not the pattern Ci P Cj expresses the semantic relation R.
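Step 4 can be approximated by taking the text P that lies between Ci and Cj in each sentence and counting how often each P occurs. A minimal sketch, assuming plain substring matching in place of parsing; the example sentences and cause/effect pairs are illustrative only.

```python
from collections import Counter
from typing import Iterable, Tuple

def extract_patterns(sentences: Iterable[str], pairs: Iterable[Tuple[str, str]]) -> Counter:
    """Count the pattern P in 'Ci P Cj' for every known pair (Ci, Cj) found in a sentence."""
    patterns: Counter = Counter()
    pairs = list(pairs)
    for sentence in sentences:
        low = sentence.lower()
        for ci, cj in pairs:
            i = low.find(ci)
            if i < 0:
                continue
            j = low.find(cj, i + len(ci))  # Cj must follow Ci
            if j >= 0:
                patterns[low[i + len(ci):j].strip()] += 1
    return patterns

sentences = ["Smoking causes hypertension.",
             "Obesity leads to hypertension.",
             "Smoking often causes cancer."]
pairs = [("smoking", "hypertension"), ("obesity", "hypertension"), ("smoking", "cancer")]
print(extract_patterns(sentences, pairs))
```

The resulting candidate patterns are what Step 5 then accepts or rejects using semantic constraints.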
Extracting Concepts: “is a” terrorist group
Methods:
1. From NPs that contain the seed:
“Many of his fellow writer friends have been assassinated by islamist fundamentalist terrorist groups during the same years, in the nineties.”
“All the suicide terrorist groups have support infrastructures in Europe and in North America.”
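Method 1 needs a parser to find the NP around the seed. As a crude stand-in, the sketch below collects up to `max_mods` words directly before the seed and stops at a determiner, quantifier or preposition. The stoplist and the modifier limit are assumptions for illustration, not the system's NP extractor.

```python
import re
from typing import List

# Words that end a run of modifiers (toy list; an assumption).
STOP = {"the", "a", "an", "all", "some", "many", "by", "of", "for", "and", "other",
        "his", "her", "their"}

def modifier_concepts(sentence: str, seed: str = "terrorist group", max_mods: int = 3) -> List[str]:
    """Return the seed extended by the modifiers preceding it (singular form)."""
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", sentence.lower())
    *seed_mods, head = seed.split()
    n = len(seed_mods)
    concepts = []
    for k in range(n, len(tokens)):
        # Match the seed, allowing a plural head noun.
        if tokens[k] in (head, head + "s") and tokens[k - n:k] == seed_mods:
            mods: List[str] = []
            j = k - n - 1
            while j >= 0 and tokens[j] not in STOP and len(mods) < max_mods:
                mods.insert(0, tokens[j])
                j -= 1
            if mods:
                concepts.append(" ".join(mods + [seed]))
    return concepts

print(modifier_concepts("All the suicide terrorist groups have support infrastructures "
                        "in Europe and in North America."))  # ['suicide terrorist group']
```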
Extracting Concepts (cont.): “is a” terrorist group
2. From lexico-syntactic patterns containing the seed
2.1 Via subsumption:
“Some domestic U.S. terrorist groups, including the Aryan Nation and the Phineas Priesthood, and some militia members are also religiously motivated in addition to being driven by a hatred of the federal government.”
“Terrorist groups including bin Laden's, Hamas, Hizbollah, etc. in concert with Sudan, Iran and Iraq, form alliance, to be called 'Jerusalem Foundation', to coordinate global activities.”
“Religiously motivated terrorist groups, such as Usama bin Ladin's group, al-Qaida, which is believed to have bombed the U.S. Embassies in Africa, represent a growing trend toward hatred of the United States.”
Extracting Concepts (cont.): “is a” terrorist group
2.2 Via lexical parallelism:
“During the same period, Erbakan and Refah leaders pledged their support for Hamas and other fundamentalist terrorist groups seeking to halt the Middle East peace process and to overthrow Egypt's secular government.”
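Both pattern families (2.1 subsumption with "including"/"such as", and 2.2 parallelism with "X and other ... terrorist groups") can be approximated with regular expressions. A toy sketch, assuming instances are runs of capitalized words; it will miss lowercase names such as "bin Laden's" and is no substitute for the parsed patterns on the slides.

```python
import re
from typing import List

CAP_NP = r"[A-Z][\w'-]*(?:\s+[A-Z][\w'-]*)*"            # run of capitalized words
ITEM = re.compile(r"\s*(?:the\s+)?(" + CAP_NP + r")")     # one list item, optional "the"
SEP = re.compile(r"\s*(?:,\s*(?:and\s+)?|and\s+)")       # ", " or ", and " or "and "

def is_a_instances(text: str, seed: str = "terrorist groups") -> List[str]:
    """Toy extraction of capitalized instances of `seed`."""
    found: List[str] = []
    # 2.1 Subsumption: "<seed>, including X and Y" / "<seed> such as X, Y"
    trigger = re.compile(re.escape(seed) + r",?\s+(?:including|such as)\s+", re.IGNORECASE)
    for m in trigger.finditer(text):
        pos = m.end()
        while (item := ITEM.match(text, pos)):
            found.append(item.group(1))
            sep = SEP.match(text, item.end())
            if not sep:
                break
            pos = sep.end()
    # 2.2 Lexical parallelism: "X and other <modifiers> <seed>"
    parallel = re.compile("(" + CAP_NP + r")\s+and\s+other\s+(?:[a-z]+\s+)*?" + re.escape(seed))
    found.extend(m.group(1) for m in parallel.finditer(text))
    return found

print(is_a_instances("Some domestic U.S. terrorist groups, including the Aryan Nation and "
                     "the Phineas Priesthood, and some militia members are also religiously motivated."))
```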
Ontology Snapshot
[Figure: taxonomy rooted at terrorist group; concepts shown: fundamentalist terrorist group, Islamic terrorist group, American terrorist group, Hamas, Hizbollah, islamist fundamentalist terrorist group, national Islamic terrorist group, Palestinian Islamic terrorist group]
• Number of concepts automatically identified: 107
• Number of concepts rejected interactively: 25
• Number of concepts collected and classified: 107 - 25 = 82
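One simple way to classify multiword concepts by subsumption (Step 5 of concept discovery) is a head-noun heuristic: a concept's parent is the longest other concept that forms a proper word-level suffix of it. This is an assumed approximation for illustration; instance names such as Hamas and Hizbollah come from the "is a" patterns instead and are left out.

```python
from typing import Dict, List, Optional

def build_taxonomy(concepts: List[str]) -> Dict[str, Optional[str]]:
    """Map each concept to its most specific subsuming concept, or None for a root."""
    words = {c: tuple(c.lower().split()) for c in concepts}
    parent: Dict[str, Optional[str]] = {}
    for c, w in words.items():
        best: Optional[str] = None
        for other, ow in words.items():
            # `other` subsumes `c` if it is a shorter word-level suffix of `c`.
            if len(ow) < len(w) and w[-len(ow):] == ow:
                if best is None or len(ow) > len(words[best]):
                    best = other
        parent[c] = best
    return parent

concepts = ["terrorist group", "fundamentalist terrorist group", "Islamic terrorist group",
            "American terrorist group", "islamist fundamentalist terrorist group",
            "national Islamic terrorist group", "Palestinian Islamic terrorist group"]
for child, par in build_taxonomy(concepts).items():
    print(f"{child} -> {par}")
```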
Overall Results: Building a Corpus from the Web
(1) Total time
(2) Number of hits returned by the search engine
(3) Number of sentences retained
(4) Number of base NPs containing the seed identified in documents (including duplicates)
(5) Number of collected concepts
Semantic Constraints Imposed by Causation
• Greenspan makes a recession
• Greenspan makes a mistake
Semantic Constraints Imposed by Causation
Focus on <NP1 verb NP2>:
• NP1: a hyponym of causal agent
• verb: senses of verbs that mean causation
• NP2: a hyponym of a causation class
  - Human action
  - Phenomenon
  - State
  - Psychological feature: Event
Semantic Constraints Imposed by Causation
• Greenspan makes a recession: Greenspan (causal agent); make v#5: state (cause to, do, make)
• Greenspan makes a mistake: Greenspan (causal agent); make v#1: state (make, do)
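The three constraints on <NP1 verb NP2> can be checked as a conjunction. A minimal sketch, assuming the verb sense has already been disambiguated and using small hypothetical hypernym tables instead of WordNet; only the sense labels make v#5 and make v#1 and the causation classes come from the slides.

```python
from typing import Dict, List, Set, Tuple

# Toy hypernym chains (hypothetical; a real system would walk WordNet).
HYPERNYMS: Dict[str, List[str]] = {
    "greenspan": ["person", "causal agent"],
    "recession": ["economic condition", "state"],
    "mistake": ["error", "human action"],
}
# Verb senses that mean causation, keyed by (lemma, sense number) as on the slide.
CAUSAL_VERB_SENSES: Set[Tuple[str, int]] = {("make", 5)}
# Causation classes for NP2 as listed on the slide.
CAUSATION_CLASSES = {"human action", "phenomenon", "state", "psychological feature", "event"}

def is_causation(np1: str, verb: str, sense: int, np2: str) -> bool:
    """True only if NP1, the verb sense and NP2 all satisfy their constraints."""
    agent_ok = "causal agent" in HYPERNYMS.get(np1, [])
    verb_ok = (verb, sense) in CAUSAL_VERB_SENSES
    effect_ok = any(h in CAUSATION_CLASSES for h in HYPERNYMS.get(np2, []))
    return agent_ok and verb_ok and effect_ok

print(is_causation("greenspan", "make", 5, "recession"))  # True
print(is_causation("greenspan", "make", 1, "mistake"))    # False: make v#1 is not causal
```

In this toy setup both NP2s pass their class check, so the verb sense alone separates the two examples, which is the contrast the slide draws.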
Answer Fusion
Study answer fusion at various levels of complexity:
• Questions asking simple facts
  - What countries import sugar from Cuba?
• Questions that require on-line ontology development
  - What software products does Microsoft sell?
  - What causes asthma?
  - What are the effects of alcohol on the brain?
• Speculative questions about future events
  - Where will Al Qaeda strike next?
Answer Fusion
• Answers are extracted by building an ontology on-line
• Cause/effect ontology
• Q: What causes hypertension?
[Figure: cause/effect ontology for hypertension (high blood pressure); nodes: overwork, virus, fat, overindulgence, obesity, TV watching, environmental factors, exhaustion, chronic fatigue syndrome, alcohol, dehydration, laxative abuse, bacteria, atherosclerosis, caffeine, food poisoning, viruses, Salmonella, anger, high salt intake, smoking]
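Fusing the answer to "What causes X?" amounts to collecting every direct and indirect cause reachable from X in the cause/effect ontology. A minimal sketch using depth-first traversal; the edges in `CAUSED_BY` are hypothetical and chosen for illustration only, not read off the figure.

```python
from typing import Dict, List, Set

# Hypothetical effect -> causes edges (illustrative, not the slide's ontology).
CAUSED_BY: Dict[str, List[str]] = {
    "hypertension": ["obesity", "high salt intake", "smoking", "alcohol"],
    "obesity": ["overindulgence", "TV watching"],
}

def all_causes(effect: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Collect direct and indirect causes; the `seen` set guarantees termination on cycles."""
    seen: Set[str] = set()
    stack = list(graph.get(effect, []))
    while stack:
        cause = stack.pop()
        if cause not in seen:
            seen.add(cause)
            stack.extend(graph.get(cause, []))
    return seen

print(sorted(all_causes("hypertension", CAUSED_BY)))
```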
Answer Fusion
• Cause/effect ontology
• Q: What are the effects of stress?
[Figure: cause/effect ontology for stress (tension); nodes: hair loss, absenteeism, gastrointestinal tract disorders, illness, nerve damage, headache, physical problems, hyperactive behavior, reading inability, drug abuse (substance abuse), money spending, depression, homelessness, suicide attempt, weight loss, fatigue, reduced resistance to disease]
Answer Fusion
• Part-whole (meronymy) ontology, built from patterns such as:
  - <NP1 have NP2>: car has clutch
  - <NP2’s NP1>: John’s hand
  - <NP1 of NP2>: leg of a table
• Q: What does the AH-64A Apache helicopter consist of?
[Figure: part-whole ontology for the AH-64A Apache helicopter; parts: Hellfire air-to-surface missile, millimeter wave seeker, 70mm Folding Fin Aerial rocket, 30mm Cannon, camera, Armaments, General Electric 1700-GE engine, 4-rail launchers, Four-bladed main rotor, Anti-tank laser guided missile, Longbow millimeter wave fire control radar, integrated radar frequency interferometer, Rotating turret, Tandem cockpit, Kevlar seats]
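The three part-whole patterns can be sketched as regular expressions that return (part, whole) pairs. A toy version, assuming single-word NPs and an optional article; real NPs such as "Four-bladed main rotor" would need a chunker.

```python
import re
from typing import List, Tuple

# (regex, group index of the whole, group index of the part); single-word NPs only.
PATTERNS = [
    (re.compile(r"\b(\w+) (?:has|have) (?:an? |the )?(\w+)"), 1, 2),  # <NP1 have NP2>
    (re.compile(r"\b(\w+)['’]s (\w+)"), 1, 2),                       # <NP2's NP1>
    (re.compile(r"\b(\w+) of (?:an? |the )?(\w+)"), 2, 1),           # <NP1 of NP2>
]

def part_whole(text: str) -> List[Tuple[str, str]]:
    """Return (part, whole) pairs matched by the toy patterns."""
    pairs = []
    for regex, whole, part in PATTERNS:
        for m in regex.finditer(text):
            pairs.append((m.group(part), m.group(whole)))
    return pairs

for example in ["car has clutch", "John's hand", "leg of a table"]:
    print(example, "->", part_whole(example))
```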
Answer Fusion
• Questions with multiple ontologies
• Q: What terrorist groups are in Asia?
  - Build an ontology for terrorist groups
  - Build an ontology for Asian countries
  - Generate specific queries with combinations of concepts from the two ontologies
[Figure: terrorist groups ontology combined with Asian countries ontology]
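Generating the specific queries is a Cartesian product over the members of the two ontologies. A minimal sketch; the quoted-phrase query syntax and the example members are assumptions for illustration, and each query is only a search string, not a claim that the pair is related.

```python
from itertools import product
from typing import List

def combined_queries(groups: List[str], countries: List[str]) -> List[str]:
    """One query per (terrorist group, Asian country) pair drawn from the two ontologies."""
    return [f'"{g}" "{c}"' for g, c in product(groups, countries)]

queries = combined_queries(["Hamas", "Hizbollah"], ["Pakistan", "Philippines"])
print(queries)
```

The number of queries grows as the product of the two ontology sizes, so in practice the ontologies would likely need pruning before combination.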
Thank you! Papers: Moldovan, Pasca, Surdeanu, Harabagiu, “Performance Issues and Error Analysis in an Open-Domain QA System”, ACL 2002. Girju, Moldovan, “Mining Answers for Causation Questions”, AAAI Spring Symposium 2002. Moldovan, Novischi, “Lexical Chains for Question Answering”, COLING 2002.