Answering Questions through Understanding and Analysis (AQUA). Ralph Weischedel and Scott Miller, BBN Technologies, 3 December 2001.
Answering Questions through Understanding and Analysis (AQUA) Ralph Weischedel and Scott Miller BBN Technologies 3 December 2001 1
Vision / Goal: Toward a Total System
Initial Focus: English text
[Figure: system diagram. Components: Analysts, Question Interpretation, User Preferences, Catalog of Query Types, Analyzed Query History, Analyzed Archive, Answer Determination, Answer Formulation. Answer forms shown: Tables, Maps, Time Line, Text ("Mary Smith is fluent in Mythian and Legendian").]
Processing steps listed in the diagram: • Linguistic Processor • Entity Determination • Proposition Recognition • Proposition Matching
Foreign Language Payoff: Treebanks and Proposition Banks under development at UPenn 2
Approach • Implement all core algorithms as • Statistical language modeling, augmented by • Lexical knowledge (e.g., COMLEX, WordNet, gazetteers) • Interpret all source documents and questions as • Entities (which are mentioned in various documents) and • Propositions (which are relations among entities) • Given a proposition over entities, find related propositions • Formulate answer from retrieved entities and propositions • Support user definition of terms via examples 3
Existing Elements for Document Understanding
• Name finding via hidden Markov models
• Parsing via lexicalized probabilistic CFGs
• Co-reference via Bayesian classifier
[Figure: parse tree of an example sentence (node labels S, SBAR, VP, NP, NPA, PP, WHNP) with name tags Person, ORG, and GPE.]
Records shown with the example:
• Person: Slobodan Milosevic; Position: president; Organization: Yugoslavia
• Person: Milos Milosevic; Position: president; Organization: Association of Yugoslav Banks
• Person: Milos Milosevic; Position: general director; Organization: JugoBanka 4
In-Document Entity Tracking from ACE
BAGHDAD, Iraq (AP) - Iraq's deputy foreign minister attacked U.S. National Security Advisor Sandy Berger Friday, accusing him of "lies and deception." Riyadh al-Qaysi picked his way through Berger's press conference in Washington hours earlier, criticizing the security advisor's assertion that Iraq had been repeatedly in "material breach" of U.N. Security Council resolutions.
Entities tracked in this document:
• Person: Al-Qaysi, Berger
• Geo-Political Entity: Baghdad, Iraq, US, Washington
• Organization: AP, UN Security Council 5
To be Investigated • Recognition of all relations (the propositions) in the text • Identification of entities across documents • Recognition of related propositions • Efficient inference (from evolving state of the art) 6
From Trees to Propositions
To be added to UPenn Treebank:
• Pronoun co-reference: completed by BBN
• Predicate-argument structure: underway at UPenn
[Figure: parse tree (node labels S, VP, NP, PP) of "The United States and China have reached an agreement for the release of the American spy plane crew", annotated with predicates and arguments and with argument co-reference.]
Propositions shown in the figure:
• Act: reach; tense: present perfect; slots: Log-Subj, Log-Obj
• Act: agree; tense: noun; slots: Log-Subj, Log-Obj
• Act: release; tense: noun; slots: Log-Subj, Log-Obj
To be developed:
• A proposition recognition algorithm (both predicate-argument and co-reference) 7
Cross-Document Entity Tracking
Given the mentions of an entity within a document, either:
• Connect those mentions to a known database entity, or
• Create a new database entity
Highly abridged new document:
<NAME type=PERSON; aliasid="1">Bush</NAME> announced that <NAME type=PERSON; aliasid="2">Tom Ridge</NAME> will head a new <NAME type=ORGANIZATION; aliasid="3">Department of Homeland Defense</NAME>. …
Highly abridged view of two entities in the entity database:
• DB Entity 100347: Names: George Bush, Bush, … Descriptions: President (1989), President (1990), President (1991), President (1992), … Relation pointers:
• DB Entity 110300: Names: George W. Bush, Bush, Descriptions: President (2001), Governor (2000), the former governor of Texas (2001), … Relation pointers: 8
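The link-or-create decision on this slide can be illustrated with a minimal Python sketch. It is not BBN's method: the slides call for statistical models, while the scoring here is a hand-written heuristic (shared name plus shared description) chosen only to make the decision concrete. The `Entity` class, `link_score`, `link_or_create`, the threshold, and the assumption that the new document describes Bush as President in 2001 are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    names: set
    descriptions: set = field(default_factory=set)  # (role, year) pairs


def link_score(doc_entity, db_entity):
    """Heuristic stand-in for a statistical model: 0 without a shared name,
    otherwise 1 plus the number of shared (role, year) descriptions."""
    if not (doc_entity.names & db_entity.names):
        return 0
    return 1 + len(doc_entity.descriptions & db_entity.descriptions)


def link_or_create(doc_entity, database, threshold=2):
    """Attach the document entity to the best database entity if it scores at
    least `threshold` (an assumed value); otherwise create a new entry."""
    best_id = max(database, key=lambda i: link_score(doc_entity, database[i]),
                  default=None)
    if best_id is not None and link_score(doc_entity, database[best_id]) >= threshold:
        database[best_id].names |= doc_entity.names
        database[best_id].descriptions |= doc_entity.descriptions
        return best_id
    new_id = max(database, default=0) + 1
    database[new_id] = Entity(set(doc_entity.names), set(doc_entity.descriptions))
    return new_id


# The two database entities from the slide.
database = {
    100347: Entity({"George Bush", "Bush"},
                   {("President", year) for year in range(1989, 1993)}),
    110300: Entity({"George W. Bush", "Bush"},
                   {("President", 2001), ("Governor", 2000),
                    ("the former governor of Texas", 2001)}),
}

# Assumed context for the new document: Bush is described as President in 2001.
bush = Entity({"Bush"}, {("President", 2001)})
ridge = Entity({"Tom Ridge"})

bush_id = link_or_create(bush, database)
ridge_id = link_or_create(ridge, database)
```

Under these assumptions "Bush" is linked to entity 110300 because the name alone is ambiguous and only the description separates the two candidates, while "Tom Ridge" matches nothing and gets a new entry.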
Entity-Relation Example
Iraq’s deputy foreign minister attacked U.S. National Security Advisor Sandy Berger Friday, accusing him of “lies and deception.”
Entities:
• E1: Type: person; Name: Sandy Berger
• E2: Type: person; Name: Riyadh al-Qaysi
• E3: Type: GPE nation; Name: Iraq
• E4: Type: GPE nation; Name: United States
• DATE: Day: Friday; Date: 12/27/91
Relations (with the argument slots labeled on the slide):
• Attack: a1, a2, date
• Accuse: a1, a2, a3, date
• lie: a1
• deceive: a1
• National Security Advisor: a1, a2
• Deputy Foreign Minister: a1, a2 9
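One way to hold this example in memory is sketched below in Python. The entity records come from the slide; the argument assignments (who fills a1, a2, and so on) are one reading of the example sentence, since the links between slots and entities do not survive in this transcript. The class names and the `propositions_about` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    entity_id: str
    entity_type: str
    name: str


@dataclass
class Proposition:
    predicate: str
    args: dict  # slot name -> entity id or literal value


entities = {
    "E1": Entity("E1", "person", "Sandy Berger"),
    "E2": Entity("E2", "person", "Riyadh al-Qaysi"),
    "E3": Entity("E3", "GPE nation", "Iraq"),
    "E4": Entity("E4", "GPE nation", "United States"),
}

# Slot fillers below are an assumed reading of the sentence, not slide content.
propositions = [
    Proposition("attack", {"a1": "E2", "a2": "E1", "date": "Friday"}),
    Proposition("accuse", {"a1": "E2", "a2": "E1", "a3": "lies and deception"}),
    Proposition("National Security Advisor", {"a1": "E1", "a2": "E4"}),
    Proposition("Deputy Foreign Minister", {"a1": "E2", "a2": "E3"}),
]


def propositions_about(entity_id):
    """Return the predicates of every proposition that mentions the entity."""
    return [p.predicate for p in propositions if entity_id in p.args.values()]


berger_facts = propositions_about("E1")
iraq_facts = propositions_about("E3")
```

Keeping job titles as two-place relations, as the slide does, means a question such as "who is Iraq's deputy foreign minister?" becomes a lookup over the same proposition list as "who attacked Berger?".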
Component Dependency
[Figure: component stack: Example-driven Definition (EDD); Proposition Matching (PM); Cross-document Entity Detection & Tracking (CEDT); Proposition Recognition (PR).]
• Foundation of interpretation: cross-document entity detection and tracking, and proposition recognition
• Proposition matching estimates the probability that one proposition may be expressed as another
• Example-driven definition uses proposition matching as a basis for finding related examples given user-supplied ones 10
Proposition Matching
• Text 1: [China said Wednesday] it would free the crew of a US spy plane…
[Figure: proposition graph with nodes China, free, crew, plane (Nationality: US).]
• Text 2: The US and China have reached an agreement for the release of the American spy plane crew…
[Figure: proposition graph with nodes US, China, reached, agreement, release, crew, plane (Nationality: US).] 11
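As a rough illustration of matching the two texts above, the Python sketch below scores how well one proposition is covered by another. It is a crude stand-in, not the probabilistic model the slides propose: the relatedness table, its 0.8 value, the role names, and the argument assignments are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Proposition:
    predicate: str
    args: dict  # role -> filler


# Hypothetical table of related predicates; the weight is an assumed value.
RELATED_PREDICATES = {frozenset({"free", "release"}): 0.8}


def predicate_similarity(p, q):
    """1.0 for identical predicates, a table value for related ones, else 0.0."""
    if p == q:
        return 1.0
    return RELATED_PREDICATES.get(frozenset({p, q}), 0.0)


def match_score(source, target):
    """Predicate similarity times the fraction of source arguments
    that the target fills with the same value in the same role."""
    if not source.args:
        return 0.0
    shared = sum(1 for role, filler in source.args.items()
                 if target.args.get(role) == filler)
    return predicate_similarity(source.predicate, target.predicate) * shared / len(source.args)


# Text 1: China would free the crew.
free_crew = Proposition("free", {"agent": "China", "object": "crew"})
# Text 2: an agreement was reached for the release of the crew.
release_crew = Proposition("release", {"object": "crew"})
reach_agreement = Proposition("reach", {"agent": "US and China", "object": "agreement"})

score_release = match_score(free_crew, release_crew)
score_reach = match_score(free_crew, reach_agreement)
```

Here "free" and "release" share the crew as object, so the pair scores above zero, while "reach" shares neither predicate nor arguments and scores zero. A trained model would replace both the table and the overlap count.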
Example-Driven Definition (today)
[Figure: training and extraction diagram. Labels: Seed Examples; Trainer; Automatically Annotated Examples; Generalize Patterns & Find Examples; Model; Extractor; Language Input; Answers.]
• Preliminary rapid training strategy successfully demonstrated for description classification: • Job titles • Locations • Nuclear materials 12
Example-Driven Definition (proposed)
Define: a coup
Example: Nawaz Sharif, who led Pakistan, was ousted October 12 by Pervez Musharraf, Pakistani Army General. Troops loyal to Musharraf surrounded the home of Sharif, and seized the state television network.
User Review, shown with the machine-internal proposition behind each pattern (internal to machine only; the user does not see the propositions):
• Person-1 leads country: YES. Proposition led-3; Agent: Nawaz Sharif; Object: Pakistan
• Person-2 ousts Person-1 on date: YES. Proposition ousted-1; Agent: Pervez Musharraf; Object: Nawaz Sharif; Time: October 12
• Military group surrounds facility-1: SOMETIMES. Proposition surrounded-1; Agent: Troops; Object: home of Sharif
• Military group surrounds facility-2: SOMETIMES. Proposition seized-4; Agent: Troops; Object: state television 13
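The reviewed definition can be applied to new text along the lines of the Python sketch below. It assumes that YES marks a pattern every instance must contain and SOMETIMES marks an optional one, which is an interpretation of the slide rather than something it states. It also reduces each pattern to a bare predicate, whereas the proposed system would match full propositions. The names `coup_definition` and `evaluate` are hypothetical.

```python
REQUIRED, OPTIONAL = "YES", "SOMETIMES"

# Patterns from the user review, reduced to predicate lemmas for illustration.
coup_definition = [
    ("lead", REQUIRED),
    ("oust", REQUIRED),
    ("surround", OPTIONAL),
    ("seize", OPTIONAL),
]


def evaluate(definition, observed_predicates):
    """Return (is_match, optional_hits): every required pattern must be
    observed, and optional patterns are counted as supporting evidence."""
    observed = set(observed_predicates)
    required = [p for p, status in definition if status == REQUIRED]
    optional = [p for p, status in definition if status == OPTIONAL]
    is_match = all(p in observed for p in required)
    optional_hits = sum(1 for p in optional if p in observed)
    return is_match, optional_hits


# A passage with a leader, an ouster, and a seizure matches the definition.
full_case = evaluate(coup_definition, ["lead", "oust", "seize"])
# A passage with no ouster does not, whatever else it contains.
partial_case = evaluate(coup_definition, ["lead", "surround"])
```

Separating required from optional patterns lets the user's YES and SOMETIMES answers act as the only supervision, which fits the slide's point that the internal propositions stay hidden from the user.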
Conclusion • Goal: Develop comprehensive system • Tested on English text • Portable to other languages • Key to approach • Use statistical language models, augmented by (lexical) knowledge sources and simple formal reasoning • Represent questions and text as (cross-document) entities and relations (propositions) • Find answers (and related questions) via proposition matching • Answers will be drawn from across documents and sources 14