Advanced QUestion Answering for INTelligence
Grammatical processing with LFG and XLE
Ron Kaplan
ARDA Symposium, August 2004
Layered Architecture for Question Answering
[Diagram: Text → XLE/LFG Parsing → F-structure → KR Mapping → Conceptual semantics → Target KRR (with KR Sources and Assertions); Question → Query; Answers, Explanations, Subqueries → Composed F-Structure Templates → XLE/LFG Generation → Text to user]
Layered Architecture for Question Answering (continued)
• Infrastructure: XLE, MaxEnt models, linear deduction, term rewriting
• Theories: Lexical Functional Grammar, ambiguity management, Glue Semantics
• Resources: English grammar, Glue lexicon, KR mapping
[Same architecture diagram as above]
Deep analysis matters…if you care about the answer
Example: A delegation led by Vice President Philips, head of the chemical division, flew to Chicago a week after the incident.
Question: Who flew to Chicago?
Candidate answers:
• division (closest noun): shallow but wrong
• head (next closest): shallow but wrong
• V.P. Philips (next): shallow but wrong
• delegation (furthest away, but Subject of flew, a "grammatical function"): deep and right
F-structure: localizes arguments ("lexical dependency")
Was John pleased?
"John was easy to please": Yes
  [PRED easy(SUBJ, COMP), SUBJ John, COMP [PRED please(SUBJ, OBJ), SUBJ someone, OBJ John]]
"John was eager to please": Unknown
  [PRED eager(SUBJ, COMP), SUBJ John, COMP [PRED please(SUBJ, OBJ), SUBJ John, OBJ someone]]
Topics • Basic LFG architecture • Ambiguity management in XLE • Pargram project: Large scale grammars • Robustness • Stochastic disambiguation • [Shallow markup] • [Semantic interpretation] Focus on the language end, not knowledge
The Language Mapping: LFG & XLE
[Diagram: Sentence ("Tony decided to go.") → Tokens/Morphology → Parse → Functional structures → Knowledge, with Generate running the reverse direction; Parse and Generate draw on the LFG Grammar (English, German, etc.), Named Entities, and a Stochastic Model, all inside XLE]
XLE: efficient ambiguity management
Why deep analysis is difficult • Languages are hard to describe • Meaning depends on complex properties of words and sequences • Different languages rely on different properties • Errors and disfluencies • Languages are hard to compute • Expensive to recognize complex patterns • Sentences are ambiguous • Ambiguities multiply: explosion in time and space
Different patterns code same meaning
The small children are chasing the dog.
English: group, order
  [Tree: S → NP (Det the, Adj small, N children) V' (Aux are, V chasing) NP (Det the, N dog)]
Japanese: group, mark
  [Tree: S → NP (inu 'dog' + o Obj) NP (tiisai 'small' kodomotati 'children' + ga Sbj) V (oikaketeiru 'are chasing')]
Different patterns code same meaning
The small children are chasing the dog.  chase(small(children), dog)
English: group, order.  Japanese: group, mark.  Warlpiri: mark only
  (kurdujarrarlu 'children-Sbj' kapala 'Present' maliki 'dog-Obj' wajilipinyi 'chase' witajarrarlu 'small-Sbj')
All map to the same f-structure:
  [PRED 'chase<Subj, Obj>', TENSE Present, SUBJ [PRED children, MOD small], OBJ [PRED dog]]
LFG theory: minor adjustments on a universal theme
Modularity: the nearly-decomposable LFG architecture
C(onstituent)-structures and F(unctional)-structures, related by a piecewise correspondence
C-structure: formal encoding of order and grouping
  [Tree: S → NP (John) VP (V likes, NP Mary)]
F-structure: formal encoding of grammatical relations
  [PRED 'like<SUBJ,OBJ>', TENSE PRESENT, SUBJ [PRED 'John', NUM SG], OBJ [PRED 'Mary', NUM SG]]
LFG grammar: rules and lexical entries
• Context-free rules define valid c-structures (trees).
• Annotations on rules give constraints that corresponding f-structures must satisfy.
• Satisfiability of constraints determines grammaticality.
• F-structure is the solution of the constraints (if they are satisfiable).
Rules:
  S  →  NP              VP
        (↑ SUBJ)=↓      ↑=↓
  VP →  V               (NP)
        ↑=↓             (↑ OBJ)=↓
  NP →  (Det)           N
        ↑=↓             ↑=↓
Lexical entries:
  John   N   (↑ PRED)='John'   (↑ NUM)=SG
  likes  V   (↑ PRED)='like<SUBJ, OBJ>'   (↑ SUBJ NUM)=SG   (↑ SUBJ PERS)=3
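To make the rule and lexicon format concrete, here is a minimal Python sketch (not XLE's actual notation) that encodes the two rules and two lexical entries above as plain data; the "up"/"down" strings stand in for the ↑/↓ metavariables, and every name is invented for illustration.

```python
# A minimal sketch (not XLE notation): annotated CFG rules and lexical entries
# as plain Python data. "up"/"down" stand for the ↑/↓ metavariables; each
# annotation pairs a daughter category with constraints on the f-structures.

RULES = {
    "S":  [("NP", ["(up SUBJ) = down"]),      # S  -> NP VP, (↑ SUBJ)=↓ on NP
           ("VP", ["up = down"])],            #               ↑=↓ on VP
    "VP": [("V",  ["up = down"]),             # VP -> V (NP), ↑=↓ on V
           ("NP", ["(up OBJ) = down"])],      #               (↑ OBJ)=↓ on NP
}

LEXICON = {
    "John":  ("N", ["(up PRED) = 'John'", "(up NUM) = SG"]),
    "likes": ("V", ["(up PRED) = 'like<SUBJ,OBJ>'",
                    "(up SUBJ NUM) = SG", "(up SUBJ PERS) = 3"]),
}

for word, (cat, constraints) in LEXICON.items():
    print(word, cat, constraints)
```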
Rules as well-formedness conditions
  S  →  NP              VP
        (↑ SUBJ)=↓      ↑=↓
If * denotes a particular daughter node:
  ↑ is the f-structure of the mother, M(*)
  ↓ is the f-structure of the daughter, *
A tree containing S over NP VP is OK if:
  the f-unit corresponding to the NP node is the SUBJ of the f-unit corresponding to the S node, and
  the same f-unit corresponds to both the S and VP nodes.
Result: the S's f-structure has the form [SUBJ [ ]].
What's wrong with "They walks"?
  S  →  NP              VP
        (↑ SUBJ)=↓      ↑=↓
  they:  (↑ NUM)=PL        walks:  (↑ SUBJ NUM)=SG
Let f be the (unknown) f-structure of the S,
    s be the f-structure of the NP,
    v be the f-structure of the VP.
Then (substituting equals for equals):
  (f SUBJ) = s and (s NUM)=PL  =>  (f SUBJ NUM)=PL
  f = v and (v SUBJ NUM)=SG  =>  (f SUBJ NUM)=SG
  (f SUBJ NUM)=PL and (f SUBJ NUM)=SG  =>  SG=PL  =>  FALSE
Inconsistent equations = ungrammatical.
If a valid inference chain yields FALSE, the premises are unsatisfiable: no f-structure.
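The inconsistency check itself is tiny. A minimal sketch, assuming f-structures are nested Python dicts and each equation is a dotted attribute path with an atomic value (the names are hypothetical, not XLE's API):

```python
# A minimal sketch: install each equation (path = value) into a nested dict
# and report failure on a clash such as SG = PL.

def install(fstruct, path, value):
    """Set fstruct at path to value; return False (inconsistent) on a clash."""
    keys = path.split(".")
    node = fstruct
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    old = node.get(keys[-1])
    if old is not None and old != value:
        return False          # e.g. SG = PL: the equations are unsatisfiable
    node[keys[-1]] = value
    return True

f = {}
equations = [("SUBJ.NUM", "PL"),   # from "they"
             ("SUBJ.NUM", "SG")]   # from "walks"
ok = all(install(f, path, value) for path, value in equations)
print("grammatical" if ok else "ungrammatical (inconsistent equations)")
```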
English and Japanese
English: one NP before the verb, one after: Subject and Object
  S  →  NP              V        NP
        (↑ SUBJ)=↓      ↑=↓      (↑ OBJ)=↓
Japanese: any number of NPs before the verb; the particle on each defines its grammatical function
  S  →  NP*                   V
        (↑ (↓ GF))=↓          ↑=↓
  ga:  (↑ GF)=SUBJ        o:  (↑ GF)=OBJ
Warlpiri: discontinuous constituents
Like Japanese: any number of NPs; the particle on each defines its grammatical function.
Unlike Japanese, the head Noun is optional in NP.
  S  →  … NP* …
          (↑ (↓ GF))=↓
  NP →  A*              (N)
        (↑ MOD)=↓       ↑=↓
  rlu:  (↑ GF)=SUBJ        ki:  (↑ GF)=OBJ
kurdujarrarlu 'children-Sbj' kapala 'Present' maliki 'dog-Obj' wajilipinyi 'chase' witajarrarlu 'small-Sbj'
  maps to  [PRED 'chase<Subj, Obj>', TENSE Present, SUBJ [PRED children, MOD small], OBJ [PRED dog]]
English: discontinuity in questions
Who did Mary see?    Who did Bill think Mary saw?    Who did Bill think saw Mary?
Who is understood as the subject/object of a distant verb. Uncertainty: which function of which verb?
(Who fills OBJ, COMP OBJ, and COMP SUBJ, respectively.)
  S' →  NP                           S
        (↑ Q)=↓                      ↑=↓
        (↑ COMP* {SUBJ|OBJ})=↓
F-structure for "Who did Bill think Mary saw?":
  [Q Who, PRED 'think<SUBJ, COMP>', TENSE past, SUBJ Bill,
   COMP [PRED 'see<SUBJ, OBJ>', TENSE past, SUBJ Mary, OBJ Who]]
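Functional uncertainty is essentially a regular expression over attribute paths. Here is a minimal sketch of that idea, assuming a dict-encoded f-structure and using Python's re module (XLE resolves uncertainty quite differently; all names below are invented for illustration):

```python
import re

def paths(fs, prefix=()):
    """Yield every attribute path in a dict-encoded f-structure."""
    for attr, val in fs.items():
        yield prefix + (attr,), val
        if isinstance(val, dict):
            yield from paths(val, prefix + (attr,))

fs = {"PRED": "think<SUBJ,COMP>",
      "SUBJ": {"PRED": "Bill"},
      "COMP": {"PRED": "see<SUBJ,OBJ>",
               "SUBJ": {"PRED": "Mary"},
               "OBJ":  {"PRED": "who"}}}

# (↑ COMP* {SUBJ|OBJ}) = ↓ read as a regex over space-separated attribute names
uncertainty = re.compile(r"(COMP )*(SUBJ|OBJ)")

matches = [p for p, _ in paths(fs) if uncertainty.fullmatch(" ".join(p))]
print(matches)   # [('SUBJ',), ('COMP', 'SUBJ'), ('COMP', 'OBJ')]
```

Each matching path is a place where who could be integrated: the matrix subject or object, or one inside an embedded COMP.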
Summary: Lexical Functional Grammar (Kaplan and Bresnan, 1982)
• Modular: c-structure and f-structure in correspondence
• Mathematically simple, computationally transparent
  Combination of context-free grammar and the quantifier-free theory of equality
  Closed under composition with regular relations: finite-state morphology
• Grammatical functions are universal primitives
  Subject and Object are expressed differently in different languages
    English: Subject is the first NP.  Japanese: Subject has ga.
  But Subject and Object behave similarly in all languages
    Active to Passive: Object becomes Subject (English: move words; Japanese: move ga)
• Adopted by a world-wide community of linguists
  Large literature: papers, (text)books, conferences; reference theory
  (Relatively) easy to describe all languages
  Linguists contribute to practical computation
• Stable: only minor changes in 25 years
Efficient computation with LFG grammars: Ambiguity Management in XLE
Computational challenge: pervasive ambiguity
(Tokenization → Morphology → Syntax → Semantics → Knowledge)
• I like Jan.  |Jan|.| or |Jan.|.|  (sentence end or abbreviation?)
• walks  Noun or Verb?
• untieable knot  (untie)able or un(tieable)?
• bank  river or financial?
• The duck is ready to eat.  Cooked or hungry?
• Every proposer wants an award.  The same award or each their own?
• The sheet broke the beam.  Atoms or photons?
Coverage vs. Ambiguity
  I fell in the park.  (PP attaches to the verb)
+ I know the girl in the park.  (PP attaches to the noun)
⇒ I see the girl in the park.  (covering both attachment patterns makes this sentence ambiguous)
Ambiguity can be explosive
If alternatives multiply within or across components (Tokenize → Morphology → Syntax → Semantics → Knowledge)…
Computational consequences of ambiguity
• Serious problem for computational systems
  Broad-coverage, hand-written grammars frequently produce thousands of analyses, sometimes millions
  Machine-learned grammars easily produce hundreds of thousands of analyses if allowed to parse to completion
• Three approaches to ambiguity management:
  Prune: block unlikely analysis paths early
  Procrastinate: do not expand alternative analysis paths until something else requires them (also known as underspecification)
  Manage: compact representation and computation of all possible analyses
Pruning ⇒ Premature Disambiguation
Conventional approach: use statistics and heuristics to kill alternatives as soon as possible
(pruning at each stage: Tokenize → Morphology → Syntax → Semantics → Knowledge)
Oops: strong constraints may reject the so-far-best (= only) option
Fast computation, wrong result
Procrastination: Passing the Buck • Chunk parsing as an example: • Collect noun groups, verb groups, PP groups • Leave it to later processing to put these together • Some combinations are nonsense • Later processing must either: • Call (another) parser to check constraints • Have its own model of constraints (= grammar) • Solve constraints that chunker includes with output
Computational Complexity of LFG
• LFG is a simple combination of two simple theories
  Context-free grammars for trees
  Quantifier-free theory of equality for f-structures
• Both theories are easy to compute
  Cubic CFG parsing
  Linear equation solving
• The combination is difficult: the parsing problem is NP-complete
  Exponential/intractable in the worst case (but computable, unlike some other linguistic theories)
• Can we avoid the worst case?
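For the first, easy half of that combination, a toy CKY recognizer shows where the cubic bound comes from. This is standard CKY, not XLE's parser, and the grammar, lexicon, and sentence are invented for illustration:

```python
from itertools import product

# A minimal CKY recognizer sketch for a grammar in Chomsky Normal Form,
# illustrating the cubic cost of CFG parsing.

GRAMMAR = {("NP", "VP"): "S", ("V", "NP"): "VP", ("Det", "N"): "NP"}
LEXICON = {"the": "Det", "dog": "N", "children": "NP", "chase": "V"}

def recognize(words, start="S"):
    n = len(words)
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1].add(LEXICON[w])
    for span in range(2, n + 1):                 # O(n) span lengths
        for i in range(n - span + 1):            # O(n) start positions
            j = i + span
            for k in range(i + 1, j):            # O(n) split points: cubic overall
                for b, c in product(chart[i][k], chart[k][j]):
                    if (b, c) in GRAMMAR:
                        chart[i][j].add(GRAMMAR[(b, c)])
    return start in chart[0][n]

print(recognize("children chase the dog".split()))   # True
```

The three nested loops over span, start position, and split point give the n^3 behavior; it is only when each chart edge also carries f-structure constraints that the worst case becomes exponential.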
Some syntactic dependencies
• Local dependencies: These dogs / *This dogs  (agreement)
• Nested dependencies: The dogs [in the park] bark  (agreement)
• Cross-serial dependencies: Jan Piet Marie zag helpen zwemmen  (predicate/argument map)
  see(Jan, help(Piet, swim(Marie)))
• Long-distance dependencies: The girl who John says that Bob believes … likes Henry left.
  left(girl);  says(John, believes(Bob, … likes(girl, Henry)))
Expressiveness vs. complexity: the Chomsky Hierarchy
(n is the length of the sentence)
[Diagram: regular languages parse in linear time, context-free languages in cubic time, richer classes in exponential time: intractable!]
But languages have mostly local and nested dependencies... so (mostly) cubic performance should be possible.
NP-Complete Problems
• Problems that can be solved by a Nondeterministic Turing Machine in polynomial time
• General characterization: generate and test
  Lots of candidate solutions that need to be verified for correctness (n elements, 2^n candidates)
  Every candidate is easy to confirm or disconfirm
• A nondeterministic TM has an oracle that provides only the right candidates to test; it doesn't search.
• A deterministic TM has no oracle and must test all (exponentially many) candidates.
Polynomial search problems
• Subparts of a candidate are independent of the other parts: the outcome is not influenced by the other parts (context-free)
• The same independent subparts appear in many candidates
• We can (easily) determine that this is the case
• Consequence: test subparts independently of context, and share the results
Why is LFG parsing NP-Complete?
A classic generate-and-test search problem:
• Exponentially many tree candidates
  A CFG chart parser quickly produces a packed representation of all trees,
  but a CFG can be exponentially ambiguous
• Each tree must be tested for f-structure satisfiability
  Boolean combinations of per-tree constraints
  E.g. English base verbs are "not 3rd singular":  (↑ SUBJ NUM)≠SG ∨ (↑ SUBJ PERS)≠3   Disjunction!
Exponentially many exponential problems
XLE Ambiguity Management: the intuition
The sheep saw the fish.  How many sheep? How many fish?
Options multiplied out:
  The sheep-sg saw the fish-sg.
  The sheep-pl saw the fish-sg.
  The sheep-sg saw the fish-pl.
  The sheep-pl saw the fish-pl.
In principle, a verb might require agreement of Subject and Object: have to check it out.
But English doesn't do that: the subparts are independent.
Options packed:
  The sheep{sg|pl} saw the fish{sg|pl}
The packed representation is a "free choice" system:
• Encodes all dependencies without loss of information
• Common items represented, computed once
• Key to practical efficiency
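A minimal Python sketch of the free-choice idea: two independent packed choices are stored once each, while the four readings exist only implicitly until something asks for them (the names are illustrative, not XLE's data structures):

```python
from itertools import product

# Packed representation: each ambiguous feature stores its alternatives once.
packed = {"SUBJ.NUM": ["SG", "PL"],   # the sheep
          "OBJ.NUM":  ["SG", "PL"]}   # the fish

# Enumeration is deferred; when needed, any combination may be chosen freely.
readings = [dict(zip(packed, combo)) for combo in product(*packed.values())]
for reading in readings:
    print(reading)

# With n independent two-way choices: 2*n stored values, 2**n implicit readings.
```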
Dependent choices
Das Mädchen sah die Katze.  (Both nouns are ambiguous between nom and acc.)
  The girl saw the cat  /  The cat saw the girl
Again, packing avoids duplication:
  Das Mädchen{nom|acc} sah die Katze{nom|acc}
… but it's wrong: it doesn't encode all the dependencies, and the choices are not free.
  Das Mädchen-nom sah die Katze-nom    bad
  Das Mädchen-nom sah die Katze-acc    The girl saw the cat
  Das Mädchen-acc sah die Katze-nom    The cat saw the girl
  Das Mädchen-acc sah die Katze-acc    bad
Another case: Who do you want to succeed?
  I want to succeed John   (want intrans, succeed trans)
  I want John to succeed   (want trans, succeed intrans)
Solution: label dependent choices
• Label each choice with distinct Boolean variables p, q, etc.
• Record the acceptable combinations as a Boolean expression φ
• Each analysis corresponds to a satisfying truth-value assignment
  (a free choice from the true lines of φ's truth table)
Das Mädchen sah die Katze:   p: Mädchen-nom, ¬p: Mädchen-acc;   q: Katze-nom, ¬q: Katze-acc
  Das Mädchen-nom sah die Katze-nom    (p ∧ q)     bad
  Das Mädchen-nom sah die Katze-acc    (p ∧ ¬q)    The girl saw the cat
  Das Mädchen-acc sah die Katze-nom    (¬p ∧ q)    The cat saw the girl
  Das Mädchen-acc sah die Katze-acc    (¬p ∧ ¬q)   bad
  φ = (p ∧ ¬q) ∨ (¬p ∧ q)
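A minimal sketch of the labeling idea in Python: p and q are the Boolean choice variables and phi is the acceptability formula above; enumerating the true lines of its truth table recovers exactly the two good readings (purely illustrative code):

```python
from itertools import product

def phi(p, q):
    """Acceptability formula: p is Mädchen-nom, q is Katze-nom."""
    return (p and not q) or (not p and q)

# Free choice among the satisfying assignments of phi.
for p, q in product([True, False], repeat=2):
    if phi(p, q):
        maedchen = "nom" if p else "acc"
        katze = "nom" if q else "acc"
        print(f"Das Mädchen-{maedchen} sah die Katze-{katze}")
```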
Boolean Satisfiability
Can solve Boolean formulas by multiplying out to Disjunctive Normal Form:
  (a ∨ b) ∧ (c ∨ d)   ⇒   (a ∧ c) ∨ (a ∧ d) ∨ (b ∧ c) ∨ (b ∧ d)
• Produces simple conjunctions of literal propositions ("facts": equations)
• Easy checks for satisfiability: if a ∧ d ⇒ FALSE, replace any conjunction containing a and d by FALSE.
• But: blow-up of the disjunctive structure before fact processing
• Individual facts are replicated (and re-processed): exponential
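The blow-up is easy to see in code: multiplying k two-way disjunctions into DNF yields 2^k conjunctions, each repeating the same facts. A small illustrative sketch:

```python
from itertools import product

def dnf(disjunctions):
    """Multiply a conjunction of disjunctions out into its DNF conjunctions."""
    return [sum(combo, ()) for combo in product(*disjunctions)]

disjunctions = [(("a",), ("b",)), (("c",), ("d",))]
print(dnf(disjunctions))            # [('a','c'), ('a','d'), ('b','c'), ('b','d')]
print(len(dnf(disjunctions * 5)))   # 1024 conjunctions for just 10 disjunctions
```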
Alternative: "Contexted" normal form
  (a ∨ b) ∧ (c ∨ d)   ⇒   (p→a) ∧ (¬p→b) ∧ (q→c) ∧ (¬q→d)
Produce a flat conjunction of contexted facts:  context → fact
• Each fact is labeled with its position (p, ¬p, q, ¬q) in the disjunctive structure
• The Boolean hierarchy is discarded
No blow-up, no duplicates: each fact appears and can be processed once
• Claims:
  Checks for satisfiability are still easy
  Facts can be processed first, disjunctions deferred
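A minimal sketch of the conversion: each two-way disjunction gets a fresh Boolean variable, and the result is a flat list of (context, fact) pairs with no nesting and no duplicated facts (illustrative only; XLE's representation is richer):

```python
import itertools

_fresh = itertools.count()

def contexted(disjunction):
    """[[factsA], [factsB]]  ->  [(p, a) for a in factsA] + [(~p, b) for b in factsB]."""
    var = f"p{next(_fresh)}"
    left, right = disjunction
    return ([(var, fact) for fact in left] +
            [(f"~{var}", fact) for fact in right])

# (a or b) and (c or d) as a flat conjunction of contexted facts:
facts = contexted([["a"], ["b"]]) + contexted([["c"], ["d"]])
print(facts)   # [('p0', 'a'), ('~p0', 'b'), ('p1', 'c'), ('~p1', 'd')]
```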
A sound and complete method (Maxwell & Kaplan, 1987, 1991)
Conversion to a logically equivalent contexted form.
Lemma:  φ1 ∨ φ2  iff  (p→φ1) ∧ (¬p→φ2)   (p a new Boolean variable)
Proof: (⇐) If φ1 is true, let p be true; if φ2 is true, let p be false; either way (p→φ1) ∧ (¬p→φ2) is satisfied.
(⇒) If p is true then φ1 is true; if p is false then φ2 is true; either way φ1 ∨ φ2 is true.
Ambiguity-enabled inference (by trivial logic):
If  φ1, φ2 ⊢ φ3  is a rule of inference, then so is  C1→φ1, C2→φ2 ⊢ (C1∧C2)→φ3.   Valid for any theory.
E.g. substitution of equals for equals:  x=y, φ ⊢ φ[x/y]  is a rule of inference.
Therefore:  (C1→x=y), (C2→φ)  ⊢  (C1∧C2)→φ[x/y]
Test for satisfiability
Suppose R → FALSE is deduced from a contexted formula φ. Then φ is satisfiable only if ¬R.
E.g. R → SG=PL  ⇒  R → FALSE.   R is called a "nogood" context.
• Perform all fact-inferences, conjoining contexts
• If FALSE is inferred, add its context to the nogoods
• Solve the conjunction of the nogoods (each nogood context must be false)
• Boolean satisfiability: exponential only in the context-Booleans that appear in nogoods
• Independent facts: no FALSE, no nogoods
• Implicitly notices independence/context-freeness
Example 1
"They walk"
• No disjunction: all facts are in the default "True" context
• No change to inference:
  T → (f SUBJ NUM)=PL     T → (f SUBJ NUM)=PL     ⊢   T → PL=PL
  reduces to:  (f SUBJ NUM)=PL     (f SUBJ NUM)=PL     PL=PL      Consistent: grammatical.
"They walks"
• No disjunction: all facts still in the default "True" context
• No change to inference:
  T → (f SUBJ NUM)=PL     T → (f SUBJ NUM)=SG     ⊢   T → PL=SG   ⊢   T → FALSE
  Satisfiable iff ¬T, so unsatisfiable.
Example 2
"The sheep walks"
• Disjunction of the NUM feature from sheep:  (f SUBJ NUM)=SG ∨ (f SUBJ NUM)=PL
• Contexted facts:
  p → (f SUBJ NUM)=SG      ¬p → (f SUBJ NUM)=PL      (f SUBJ NUM)=SG  (from walks)
• Inferences:
  p → (f SUBJ NUM)=SG  and  (f SUBJ NUM)=SG   ⊢   p → SG=SG
  ¬p → (f SUBJ NUM)=PL  and  (f SUBJ NUM)=SG  ⊢   ¬p → PL=SG   ⊢   ¬p → FALSE
¬p → FALSE is true iff ¬p is false, i.e. iff p is true.
Conclusion: the sentence is grammatical in context p: only 1 sheep.
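A minimal sketch of the nogood test, worked on this example: the only clash between contexted NUM facts lives in the ~p context, so ~p becomes a nogood and the analysis survives exactly where p is true (the helper names and the T/~ encoding are invented for illustration):

```python
# Contexted facts for "The sheep walks".
facts = [("p",  ("SUBJ.NUM", "SG")),   # sheep, singular reading
         ("~p", ("SUBJ.NUM", "PL")),   # sheep, plural reading
         ("T",  ("SUBJ.NUM", "SG"))]   # from "walks"

def conjoin(c1, c2):
    """Conjoin two contexts; return None if the conjunction is contradictory."""
    ctx = {c for c in (c1, c2) if c != "T"}
    if any("~" + c in ctx for c in ctx if not c.startswith("~")):
        return None
    return " & ".join(sorted(ctx)) or "T"

nogoods = set()
for i, (c1, (path1, v1)) in enumerate(facts):
    for c2, (path2, v2) in facts[i + 1:]:
        if path1 == path2 and v1 != v2:          # FALSE derived from this pair
            ctx = conjoin(c1, c2)
            if ctx is not None:
                nogoods.add(ctx)

print("nogoods:", nogoods)   # {'~p'}: solutions exist wherever ~p is false
```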
Contexts and packing: index by facts
The sheep saw the fish.
[Packed f-structure: SUBJ [NUM p:SG / ¬p:PL],  OBJ [NUM q:SG / ¬q:PL]]
Contexted unification ≈ concatenation, when choices don't interact.
Compare: DNF unification
The sheep saw the fish.
  ([SUBJ [NUM SG]] ∨ [SUBJ [NUM PL]])  unified with  ([OBJ [NUM SG]] ∨ [OBJ [NUM PL]])
  = [SUBJ [NUM SG], OBJ [NUM SG]] ∨ [SUBJ [NUM SG], OBJ [NUM PL]]
    ∨ [SUBJ [NUM PL], OBJ [NUM SG]] ∨ [SUBJ [NUM PL], OBJ [NUM PL]]
DNF cross-product of alternatives: exponential
The XLE wager (for real sentences of real languages)
• Alternatives from distant choice-sets can be freely chosen without affecting satisfiability
• FALSE is unlikely to appear
• The contexted method optimizes for independence
• No FALSE ⇒ no nogoods ⇒ nothing to solve
Bet: the worst case 2^n reduces to k·2^m where m << n
Ambiguity-enabled inference: choice-logic common to all modules
If  φ1, φ2 ⊢ φ3  is a rule of inference, then so is  C1→φ1, C2→φ2 ⊢ (C1∧C2)→φ3
1. Substitution of equals for equals (e.g. for LFG syntax):
   x=y, φ ⊢ φ[x/y]
   Therefore:  C1→x=y, C2→φ ⊢ (C1∧C2)→φ[x/y]
2. Reasoning:
   Cause(x,y), Prevent(y,z) ⊢ Prevent(x,z)
   Therefore:  C1→Cause(x,y), C2→Prevent(y,z) ⊢ (C1∧C2)→Prevent(x,z)
3. Log-linear disambiguation:
   Prop1(x), Prop2(x) ⊢ Count(Feature_n)
   Therefore:  C1→Prop1(x), C2→Prop2(x) ⊢ (C1∧C2)→Count(Feature_n)
Ambiguity-enabled components propagate choices and can defer choosing and enumerating.
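A minimal sketch of the lifting schema itself: any rule that maps premise facts to a conclusion can be wrapped so that it maps contexted facts to a contexted conclusion whose context is the conjunction of the premise contexts. The chain rule below encodes the Cause/Prevent example as tuples; all names are illustrative:

```python
def lift(rule):
    """rule(f1, f2) -> f3 or None  becomes a rule over (context, fact) pairs."""
    def contexted_rule(premise1, premise2):
        (c1, f1), (c2, f2) = premise1, premise2
        f3 = rule(f1, f2)
        if f3 is None:
            return None
        return (f"({c1} & {c2})", f3)      # conjoin the premise contexts
    return contexted_rule

def chain(f1, f2):
    """Cause(x,y), Prevent(y,z) |- Prevent(x,z), on tuple-encoded facts."""
    if f1[0] == "Cause" and f2[0] == "Prevent" and f1[2] == f2[1]:
        return ("Prevent", f1[1], f2[2])
    return None

contexted_chain = lift(chain)
print(contexted_chain(("C1", ("Cause", "x", "y")),
                      ("C2", ("Prevent", "y", "z"))))
# ('(C1 & C2)', ('Prevent', 'x', 'z'))
```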
Summary: Contexted constraint satisfaction
• Packed
  facts not duplicated
  facts not hidden in Boolean structure
• Efficient
  deductions not duplicated
  fast fact processing (e.g. equality) can prune slow disjunctive processing
  optimized for independence
• General and simple
  applies to any deductive system, uniform across modules
  not limited to special-case disjunctions
  mathematically trivial
• Compositional free-choice system
  enumeration of (exponentially many?) valid solutions deferred across module boundaries
  enables backtrack-free, linear-time, on-demand enumeration
  enables packed refinement by cross-module constraints: new nogoods
The remaining exponential • Contexted constraint satisfaction (typically) avoids the Boolean explosion in solving f-structure constraints for single trees • How can we suppress tree enumeration? (and still determine satisfiability)
Ordering strategy: easy things first
• Do all c-structure processing before any f-structure processing
• The chart is a free-choice representation and guarantees valid trees
• Only produce/solve f-structure constraints for constituents in complete, well-formed trees
[NB: Interleaved, bottom-up pruning is a bad idea]
This strategy bets on inconsistency, not independence.
Asking the right question
• How can we make it faster?
  More efficient unifier: undoable operations, better indexing, clever data structures, compiling
  Reordering for more effective pruning
• Why not cubic?
  Intuitively, the problem isn't that hard
  GPSG: natural language is nearly context-free
  Surely for context-free-equivalent grammars!