
Data-Oriented Natural Language Processing using Lexical-Functional Grammar

Explore Data-Oriented Parsing (DOP) using Lexical-Functional Grammar (LFG) and understand the challenges involved in LFG-based models. Learn about Experience-Based Parsing strategies in the NCLT Seminar Series of November 2005.





Presentation Transcript


  1. Data-Oriented Natural Language Processing using Lexical-Functional Grammar Mary Hearne School of Computing, Dublin City University NCLT Seminar Series – November 2005

  2. Data-Oriented Natural Language Processing using Lexical-Functional Grammar
  • Data-Oriented Parsing (DOP): a review
  • Parsing with Lexical-Functional Grammar: LFG-DOP
  • LFG-based models: what are the challenges?

  3. Experience-Based Parsing: PCFGs
  Basic strategy: extract grammar rules and their relative frequencies from a treebank.
  ‘Vanilla’ PCFG (from the parse tree for “john loves mary”):
  S → NP VP (1)   VP → V NP (1)   V → loves (1)   NP → john (1/2)   NP → mary (1/2)
  Parent-annotated PCFG:
  S → NP^S VP^S (1)   VP^S → V^VP NP^VP (1)   V^VP → loves (1)   NP^S → john (1)   NP^VP → mary (1)
  Head-lexicalised PCFG:
  S,loves → NP VP,loves (1)   VP,loves → V,loves NP (1)   V,loves → loves (1)   NP → john (1/2)   NP → mary (1/2)
  [Diagram: the parse tree S(NP(john) VP(V(loves) NP(mary))) annotated under each of the three schemes]
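The ‘vanilla’ extraction step can be sketched in a few lines of Python. This is a minimal illustration, assuming trees are encoded as nested tuples; the tree and function names are invented for the example:

```python
from collections import defaultdict

# A toy treebank tree as a nested tuple (label, child, ...); leaves are strings.
TREE = ("S", ("NP", "john"), ("VP", ("V", "loves"), ("NP", "mary")))

def count_rules(tree, counts):
    """Record one CFG rule (lhs, rhs) per internal node of the tree."""
    lhs, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    counts[(lhs, rhs)] += 1
    for c in children:
        if not isinstance(c, str):
            count_rules(c, counts)

def pcfg(treebank):
    """Relative-frequency estimate: P(lhs -> rhs) = count(rule) / count(lhs)."""
    counts = defaultdict(int)
    for t in treebank:
        count_rules(t, counts)
    lhs_totals = defaultdict(int)
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}
```

Run on the single tree above, this reproduces the ‘vanilla’ probabilities on the slide: the two NP rules each get relative frequency 1/2, and every other rule gets 1.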

  4. Experience-Based Parsing: DOP
  Basic strategy: extract tree fragments and their relative frequencies from a treebank.
  [Diagram: all 17 fragments of the parse tree for “john loves mary” — ten S-rooted fragments (relative frequency 1/10 each), four VP-rooted (1/4 each), two NP-rooted (1/2 each) and one V-rooted (1)]
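The fragment set on the slide can be enumerated mechanically: at each internal node, every child is either cut off (left as an open substitution site) or expanded further. A sketch under the same nested-tuple encoding as before (names invented; an open site is written as a bare label string, which is sufficient for counting):

```python
from itertools import product

TREE = ("S", ("NP", "john"), ("VP", ("V", "loves"), ("NP", "mary")))

def fragments(tree):
    """Enumerate every DOP fragment (depth >= 1) of the tree."""
    result = []
    def rooted(node):
        label, *children = node
        options = []
        for c in children:
            if isinstance(c, str):
                options.append([c])                  # terminal: always kept
            else:
                options.append([c[0]] + rooted(c))   # cut here, or grow deeper
        here = [(label, *combo) for combo in product(*options)]
        result.extend(here)
        return here
    rooted(tree)
    return result
```

For the tree above this yields exactly the counts shown on the slide: ten S-rooted, four VP-rooted, two NP-rooted and one V-rooted fragment, 17 in all.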

  5. Non-local dependencies
  “keep an eye on NP”
  “from last N to first N”
  [Diagram: parse trees for “keep an eye on the LEDs” and “from last page to first page”, showing that these discontiguous patterns are captured by fragments larger than single rules]

  6. Tree-DOP: Decomposition
  Root operation:
  • Select any non-frontier non-terminal node to be root
  • Delete all except this new root and the subtree it dominates
  Frontier operation:
  • Select a (possibly empty) set of non-root non-terminal nodes in the newly-created subtree
  • Delete all subtrees dominated by these nodes
  [Diagram: the tree for “john loves mary” decomposed into fragments, e.g. fw with root(fw)=VP and frontiers(fw)={V,NP}; fy with root(fy)=NP and frontiers(fy)={}; fz with root(fz)=S and frontiers(fz)={}]
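The two operations can be sketched directly on the nested-tuple trees used earlier. This is an illustrative implementation, not from the slides; nodes are addressed by paths of child indices, and an open substitution site is a label-only tuple:

```python
TREE = ("S", ("NP", "john"), ("VP", ("V", "loves"), ("NP", "mary")))

def root_op(tree, path):
    """Root operation: keep only the subtree dominated by the node at
    `path` (a sequence of child indices from the top)."""
    node = tree
    for i in path:
        node = node[i + 1]   # +1 skips the label at position 0
    return node

def frontier_op(tree, frontier_paths):
    """Frontier operation: delete the subtrees dominated by the selected
    nodes, leaving each as an open substitution site (label-only tuple)."""
    def prune(node, path):
        if isinstance(node, str):
            return node
        if path in frontier_paths:
            return (node[0],)
        label, *kids = node
        return (label, *(prune(k, path + (i,)) for i, k in enumerate(kids)))
    return prune(tree, ())
```

For example, taking the VP node as root and both of its children as frontier nodes yields the fragment fw from the slide, VP with open V and NP sites.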

  7. Tree-DOP: Parsing new input
  Composition operation (x ∘ y):
  • Identify the leftmost open non-terminal LNT in x
  • Substitute y at LNT(x) if root(y) = LNT(x)
  [Diagram: derivations of the parse tree for “john loves mary” built by successive leftmost substitution of fragments]
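Composition is leftmost substitution with a category check. A sketch in the same tuple encoding (names invented; an open site is a label-only tuple):

```python
def compose(x, y):
    """x ∘ y: substitute fragment y at the leftmost open non-terminal of x,
    provided root(y) matches its label. Returns None when undefined."""
    state = {"done": False, "failed": False}
    def walk(node):
        if isinstance(node, str) or state["done"] or state["failed"]:
            return node
        if len(node) == 1:                  # first open site found = leftmost
            if node[0] == y[0]:
                state["done"] = True
                return y
            state["failed"] = True          # category mismatch at the LNT
            return node
        label, *kids = node
        return (label, *(walk(k) for k in kids))
    result = walk(x)
    return result if state["done"] and not state["failed"] else None
```

Note that composition fails outright if the categories clash at the leftmost site, even when a later open site would match: substitution is defined at the LNT only.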

  8. Tree-DOP: Ranking output parses
  Relative frequency:
  • |e| - the number of occurrences of subtree e in the set of fragments
  • r(e) - the root node category of subtree e
  fragment probability: P(e) = |e| / Σ_{u : r(u) = r(e)} |u|
  parse probability: P_DOP(T) = Σ_{d ∈ D(T)} Π_{e ∈ d} P(e)
  Parse probability:
  • Multiply fragment probabilities to calculate derivation probability
  • Sum derivation probabilities to calculate parse probability
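The three probability levels compose straightforwardly. A sketch with hypothetical fragment counts (the fragment names and counts below are invented for illustration; in Tree-DOP they come from counting subtrees over the whole fragment set):

```python
from fractions import Fraction

# Hypothetical fragment counts |e|, keyed by (root category, fragment name).
COUNTS = {
    ("S", "S(NP VP)"): 1,
    ("S", "S(NP(john) VP)"): 1,
    ("NP", "NP(john)"): 1,
    ("NP", "NP(mary)"): 1,
    ("VP", "VP(V(loves) NP)"): 1,
}

def fragment_prob(e):
    """P(e) = |e| / sum of |u| over fragments u with r(u) = r(e)."""
    same_root = sum(n for u, n in COUNTS.items() if u[0] == e[0])
    return Fraction(COUNTS[e], same_root)

def derivation_prob(derivation):
    """Multiply the probabilities of the fragments used."""
    p = Fraction(1)
    for e in derivation:
        p *= fragment_prob(e)
    return p

def parse_prob(derivations):
    """Sum the probabilities of all derivations of the parse."""
    return sum(derivation_prob(d) for d in derivations)
```

With these counts, a derivation using the bare S fragment has probability 1/2 · 1/2 · 1 · 1/2 = 1/8, while one starting from the lexicalised S fragment has 1/2 · 1 · 1/2 = 1/4; the parse probability sums the two.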

  9. Tree-DOP: Some results
  English; full parses only (90.83%):
               F-Score           Exact Match
               MPP      MPD      MPP      MPD
  d = 1        94.78    94.79    69.14    76.54
  d = 2        97.55    97.83    86.42    87.65
  d = 3        96.86    98.17    83.95    88.89
  d = 4        95.43    95.83    72.84    76.54
  French; full parses only (92.36%):
               F-Score           Exact Match
               MPP      MPD      MPP      MPD
  d = 1        92.68    92.22    52.94    55.29
  d = 2        96.13    96.10    72.94    68.24
  d = 3        96.09    97.06    70.59    76.47
  d = 4        96.62    96.65    70.59    74.12

  10. LFG-DOP
  Lexical Functional Grammar (LFG): a constraint-based theory of language
  • c-structure: context-free phrase structure trees
  • f-structure: attribute-value matrix
  • φ-links: mapping from c- to f-structure
  [Diagram: c-structure, φ-links and f-structure for “the yellow LED is flashing”: PRED ‘flash<SUBJ>’; TNS-ASP with MOOD indicative, PERF -, PROG +, TENSE pres; SUBJ with PRED ‘LED’, CASE nom, NUM sing, PERS 3, SPEC (SPEC-FORM the, SPEC-TYPE def) and ADJUNCT ‘yellow<SUBJ>’]

  11. LFG-DOP: Fragmentation
  Root: select any non-frontier non-terminal c-structure node as root
  • C-structure: delete all except this new root and the subtree it dominates
  Frontier: select a (possibly empty) set of non-root non-terminal nodes in the root-created c-structure
  • C-structure: delete all c-structure subtrees dominated by frontier nodes
  For both Root and Frontier:
  • φ-links: delete all φ-links corresponding to deleted c-structure nodes
  • F-structure: delete all f-structure units not φ-accessible from the remaining c-structure nodes
  • Forms: delete all semantic forms corresponding to deleted terminals
  φ-accessibility: f-structure unit f is φ-accessible from node n iff
  • n is φ-linked to f, i.e. φ(n) = f, or
  • f is contained within φ(n), i.e. there is a chain of attributes leading from φ(n) to f
  [Diagram: the representation for “john loves mary” (SUBJ ‘john’ NUM sg, PRED ‘love<SUBJ,OBJ>’, TNS pres, OBJ ‘mary’ NUM sg) before and after fragmentation]

  12. LFG-DOP: Parsing new input
  LFG-DOP composition (∘):
  • C-structure: left-most substitution, subject to category matching
  • F-structure: unification, subject to uniqueness, completeness and coherence
  [Diagram: composition of two fragments yielding the full c-structure/f-structure pair for “john loves mary”]
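The f-structure side of composition is unification of attribute-value matrices. A minimal sketch, encoding f-structures as nested dicts with atomic values; it enforces only the uniqueness condition (completeness and coherence additionally need the subcategorisation frames of the PRED values, and re-entrancies are not modelled):

```python
def unify(f, g):
    """Unify two f-structures (nested dicts). Returns the unified structure,
    or None when the uniqueness condition fails, i.e. one attribute would
    receive two distinct atomic values."""
    out = dict(f)
    for attr, val in g.items():
        if attr not in out:
            out[attr] = val
        elif isinstance(out[attr], dict) and isinstance(val, dict):
            sub = unify(out[attr], val)
            if sub is None:
                return None
            out[attr] = sub
        elif out[attr] != val:
            return None
    return out
```

Compatible structures merge attribute by attribute; a clash such as SUBJ NUM sg against SUBJ NUM pl makes the whole composition fail.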

  13. LFG-DOP: Computing probabilities
  fragment probability: P(e) = |e| / Σ_{ex ∈ CS(e)} |ex|, where CS(e) is the competition set of e
  parse probability: P_DOP(T) = Σ_{d ∈ D(T)} Π_{e ∈ d} P(e)
  Parse probability:
  • Multiply fragment probabilities to calculate derivation probability
  • Sum derivation probabilities to calculate parse probability
  • Normalise over the probabilities of valid parses:
  P_LFG-DOP(T | T is valid) = P_DOP(T) / Σ_{Tx is valid} P_DOP(Tx)
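The extra normalisation step is a one-liner over the valid parses. A sketch with invented parse names and probabilities:

```python
from fractions import Fraction

def normalise(parse_probs, valid):
    """P_LFG-DOP(T | T valid) = P_DOP(T) / sum over valid T' of P_DOP(T')."""
    z = sum(p for t, p in parse_probs.items() if t in valid)
    return {t: p / z for t, p in parse_probs.items() if t in valid}
```

Invalid parses are simply excluded and the remaining mass is rescaled so the valid parses sum to 1.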

  14. LFG-DOP: Robustness via Discard
  Discard operation:
  • Delete attribute-value pairs from the f-structure while keeping c-structure and φ-links constant
  • Restriction: pairs whose values are φ-linked to remaining c-structure nodes are not deleted
  [Diagram: composing a VP fragment whose SUBJ carries NUM pl with NP “john” (NUM sg) fails; Discard generates variants of the f-structure for “john see mary” with the offending pair removed, so that composition can succeed]
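A deliberately simplified sketch of Discard, using the nested-dict f-structures from before. It only deletes top-level pairs of one unit (the full operation ranges over pairs at any depth), and the φ-link restriction is approximated by a `protected` set naming attributes whose values are φ-linked to remaining c-structure nodes:

```python
def discards(fstruct, protected=frozenset()):
    """Generate the f-structures obtained by deleting a single top-level
    attribute-value pair. Attributes in `protected` approximate pairs whose
    values are phi-linked to remaining c-structure nodes: never deleted."""
    for attr in fstruct:
        if attr not in protected:
            yield {a: v for a, v in fstruct.items() if a != attr}
```

Dropping the conflicting NUM pair from the fragment's SUBJ is exactly what rescues the clashing composition on the slide.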

  15. LFG-DOP: What are the challenges?
  • To define fragmentation operations such that:
    • phenomena such as recursion and re-entrancy are handled
    • constraints are applied appropriately
    • Discard is used only to handle ill-formed input
  • To somehow distinguish between constraining and informative features
    • translation vs. parsing
  • To address the fact that substitution is local but unification is global, i.e.
    • to enforce LFG well-formedness conditions in an accurate and efficient manner
    • to sample for the best parse in an accurate and efficient manner
    • to define a probability model that doesn’t ‘leak’

  16. LFG-DOP: Recursion & Re-entrancy
  • How can we adequately express the constraints on the composition of fragments such as (Adj yellow) and (V tomber)?
  [Diagram: f-structures for “the yellow LED” (ADJUNCT ‘yellow<SUBJ>’ re-entrant with the noun’s f-structure) and for French “jean vient de tomber” (PRED ‘venir<SUBJ,XCOMP>’ whose XCOMP ‘tomber<SUBJ>’ shares its SUBJ with the matrix clause), together with the fragments extracted for yellow and tomber]

  17. LFG-DOP: Constraint over-specification
  • Is it appropriate to insist that the subject of flashing have an adjunct?
  • Is it appropriate to be forced to use Discard to allow the subject of flashing to have an indefinite specifier?
  • Important: we also want to remain language-independent…
  [Diagram: the fragment for V flashing retains the full f-structure of “the yellow LED is flashing”, including the subject’s SPEC-TYPE def and ADJUNCT]

  18. LFG-DOP: An alternative fragmentation process
  • Determine a c-structure fragment using Root and Frontier as for Tree-DOP, but retain the full f-structure given in the original representation.
  • Delete all f-structure units (and the attributes with which they are associated) which are not φ-linked from one or more remaining c-structure nodes, unless that unit is the value of an attribute subcategorised for by a PRED value whose corresponding terminal is dominated by the current fragment root node in the original representation.
  • Where we have floating f-structure units, also retain the minimal f-structure unit which contains them both. By minimal unit we mean the unit containing both floating f-structures along with their (nested sequence of) attributes.
  • Delete all semantic forms (including PRED attributes and their values) not associated with one of the remaining c-structure terminals.

  19. LFG-DOP: Constraints vs. information
  • To prune attribute-value pairs based on a language-specific algorithm
    • e.g. English: subj-verb agreement but not obj-verb agreement
  • To automatically learn which attribute-value pairs should be pruned for a particular dataset
  • To do ‘soft’ pruning: distinguish between constraining features and informative features
    • Account for the difference during unification
  Which approach is best suited to translation? Which is best suited to parsing?

  20. LFG-DOP: Substitution vs. unification
  Substitution is local but unification is global.
  To be enforced: category matching, uniqueness, coherence and completeness.
  • Model M1: enforce category matching during parsing
  • Model M2: enforce category matching and uniqueness during parsing
  • Model M3: enforce category matching, uniqueness and coherence during parsing
  • There is no Model M4: completeness can never be checked until a complete parse has been obtained
  [Diagram: a derivation for “john loves mary” in which an OBJ NUM pl feature introduced by an early fragment clashes with NUM sg contributed later; the conflict only surfaces when the f-structures are unified]

  21. LFG-DOP: Sampling
  The exact probability of sampling fragment fx at chart position [i][j],VP is:
  • P_DOP(fx)
  • multiplied by the sampling probability mass available at each of its substitution sites [i][k],V and [i+k][j-k],NP
  • and divided by the sampling probability mass available at [i][j],VP
  Problem for computing the exact probability of sampling fx at [i][j],VP:
  • We cannot know the sampling probability mass available at substitution site [i+k][j-k],NP until [i+k][j-k],NP is the leftmost substitution site, unless we stick with Model M1
  Problem for establishing when enough samples have been taken:
  • We cannot know how many valid parses there are until all constraints have been resolved
  [Diagram: fragment fx = VP(V NP) spanning [i][j], with substitution sites at [i][k] and [i+k][j-k], and its associated f-structure with SUBJ (NUM sg), TNS pres, NUM sg and OBJ]

  22. LFG-DOP: ‘Leaked’ probability mass
  Example: composing fragments for “john left”, where the S fragment specifies SUBJ NUM pl but NP(john) contributes SUBJ NUM sg:
  0.05 × 0.007 × 0.001 = 0.00000035
  • This derivation will be thrown out because it does not satisfy the uniqueness condition
  • Its probability is thrown out with it → ‘leaked’ probability mass
  • Normalisation camouflages the problem but does not solve it
  [Diagram: c-structures and f-structures for the failed composition, with PRED ‘leave<SUBJ>’, TNS pres, and the clashing SUBJ NUM values]

  23. Data-Oriented Natural Language Processing using Lexical-Functional Grammar Questions?
