
Data-Oriented Natural Language Processing using Lexical-Functional Grammar

Explore Data-Oriented Parsing (DOP) using Lexical-Functional Grammar (LFG) and understand the challenges involved in LFG-based models. Learn about Experience-Based Parsing strategies in the NCLT Seminar Series of November 2005.





Presentation Transcript


  1. Data-Oriented Natural Language Processing using Lexical-Functional Grammar Mary Hearne School of Computing, Dublin City University NCLT Seminar Series – November 2005

  2. Data-Oriented Natural Language Processing using Lexical-Functional Grammar
  • Data-Oriented Parsing (DOP): a review
  • Parsing with Lexical-Functional Grammar: LFG-DOP
  • LFG-based models: what are the challenges?

  3. Experience-Based Parsing: PCFGs
  Basic strategy: extract grammar rules and their relative frequencies from a treebank.
  ‘Vanilla’ PCFG (from the parse tree for “john loves mary”):
  S → NP VP (1)   VP → V NP (1)   V → loves (1)   NP → john (1/2)   NP → mary (1/2)
  Parent-annotated PCFG:
  S → NP^S VP^S (1)   VP^S → V^VP NP^VP (1)   V^VP → loves (1)   NP^S → john (1)   NP^VP → mary (1)
  Head-lexicalised PCFG:
  S,loves → NP VP,loves (1)   VP,loves → V,loves NP (1)   V,loves → loves (1)   NP → john (1/2)   NP → mary (1/2)
  [Diagram: the parse tree S(NP(john) VP(V(loves) NP(mary))) annotated under each of the three schemes]
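The ‘vanilla’ extraction step can be sketched in a few lines of Python. This is a minimal illustration, assuming trees are encoded as nested tuples; the tree and function names are invented for the example:

```python
from collections import defaultdict

# A toy treebank tree as a nested tuple (label, child, ...); leaves are strings.
TREE = ("S", ("NP", "john"), ("VP", ("V", "loves"), ("NP", "mary")))

def count_rules(tree, counts):
    """Record one CFG rule (lhs, rhs) per internal node of the tree."""
    lhs, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    counts[(lhs, rhs)] += 1
    for c in children:
        if not isinstance(c, str):
            count_rules(c, counts)

def pcfg(treebank):
    """Relative-frequency estimate: P(lhs -> rhs) = count(rule) / count(lhs)."""
    counts = defaultdict(int)
    for t in treebank:
        count_rules(t, counts)
    lhs_totals = defaultdict(int)
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}
```

Run on the single tree above, this reproduces the ‘vanilla’ probabilities on the slide: the two NP rules each get relative frequency 1/2, and every other rule gets 1.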

  4. Experience-Based Parsing: DOP
  Basic strategy: extract tree fragments and their relative frequencies from a treebank.
  [Diagram: all 17 fragments of the parse tree for “john loves mary” — ten S-rooted fragments (relative frequency 1/10 each), four VP-rooted (1/4 each), two NP-rooted (1/2 each) and one V-rooted (1)]
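The fragment set on the slide can be enumerated mechanically: at each internal node, every child is either cut off (left as an open substitution site) or expanded further. A sketch under the same nested-tuple encoding as before (names invented; an open site is written as a bare label string, which is sufficient for counting):

```python
from itertools import product

TREE = ("S", ("NP", "john"), ("VP", ("V", "loves"), ("NP", "mary")))

def fragments(tree):
    """Enumerate every DOP fragment (depth >= 1) of the tree."""
    result = []
    def rooted(node):
        label, *children = node
        options = []
        for c in children:
            if isinstance(c, str):
                options.append([c])                  # terminal: always kept
            else:
                options.append([c[0]] + rooted(c))   # cut here, or grow deeper
        here = [(label, *combo) for combo in product(*options)]
        result.extend(here)
        return here
    rooted(tree)
    return result
```

For the tree above this yields exactly the counts shown on the slide: ten S-rooted, four VP-rooted, two NP-rooted and one V-rooted fragment, 17 in all.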

  5. Non-local dependencies
  “keep an eye on NP”
  “from last N to first N”
  [Diagram: parse trees for “keep an eye on the LEDs” and “from last page to first page”, showing that these discontiguous patterns are captured by fragments larger than single rules]

  6. Tree-DOP: Decomposition
  Root operation:
  • Select any non-frontier non-terminal node to be root
  • Delete all except this new root and the subtree it dominates
  Frontier operation:
  • Select a (possibly empty) set of non-root non-terminal nodes in the newly-created subtree
  • Delete all subtrees dominated by these nodes
  [Diagram: the tree for “john loves mary” decomposed into fragments, e.g. fw with root(fw)=VP and frontiers(fw)={V,NP}; fy with root(fy)=NP and frontiers(fy)={}; fz with root(fz)=S and frontiers(fz)={}]
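The two operations can be sketched directly on the nested-tuple trees used earlier. This is an illustrative implementation, not from the slides; nodes are addressed by paths of child indices, and an open substitution site is a label-only tuple:

```python
TREE = ("S", ("NP", "john"), ("VP", ("V", "loves"), ("NP", "mary")))

def root_op(tree, path):
    """Root operation: keep only the subtree dominated by the node at
    `path` (a sequence of child indices from the top)."""
    node = tree
    for i in path:
        node = node[i + 1]   # +1 skips the label at position 0
    return node

def frontier_op(tree, frontier_paths):
    """Frontier operation: delete the subtrees dominated by the selected
    nodes, leaving each as an open substitution site (label-only tuple)."""
    def prune(node, path):
        if isinstance(node, str):
            return node
        if path in frontier_paths:
            return (node[0],)
        label, *kids = node
        return (label, *(prune(k, path + (i,)) for i, k in enumerate(kids)))
    return prune(tree, ())
```

For example, taking the VP node as root and both of its children as frontier nodes yields the fragment fw from the slide, VP with open V and NP sites.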

  7. Tree-DOP: Parsing new input
  Composition operation (x ∘ y):
  • Identify the leftmost open non-terminal LNT in x
  • Substitute y at LNT(x) if root(y) = LNT(x)
  [Diagram: derivations of the parse tree for “john loves mary” built by successive leftmost substitution of fragments]
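Composition is leftmost substitution with a category check. A sketch in the same tuple encoding (names invented; an open site is a label-only tuple):

```python
def compose(x, y):
    """x ∘ y: substitute fragment y at the leftmost open non-terminal of x,
    provided root(y) matches its label. Returns None when undefined."""
    state = {"done": False, "failed": False}
    def walk(node):
        if isinstance(node, str) or state["done"] or state["failed"]:
            return node
        if len(node) == 1:                  # first open site found = leftmost
            if node[0] == y[0]:
                state["done"] = True
                return y
            state["failed"] = True          # category mismatch at the LNT
            return node
        label, *kids = node
        return (label, *(walk(k) for k in kids))
    result = walk(x)
    return result if state["done"] and not state["failed"] else None
```

Note that composition fails outright if the categories clash at the leftmost site, even when a later open site would match: substitution is defined at the LNT only.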

  8. Tree-DOP: Ranking output parses
  Relative frequency:
  • |e| - the number of occurrences of subtree e in the set of fragments
  • r(e) - the root node category of subtree e
  fragment probability: P(e) = |e| / Σ_{u : r(u) = r(e)} |u|
  parse probability: P_DOP(T) = Σ_{d ∈ D(T)} Π_{e ∈ d} P(e)
  Parse probability:
  • Multiply fragment probabilities to calculate derivation probability
  • Sum derivation probabilities to calculate parse probability
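The three probability levels compose straightforwardly. A sketch with hypothetical fragment counts (the fragment names and counts below are invented for illustration; in Tree-DOP they come from counting subtrees over the whole fragment set):

```python
from fractions import Fraction

# Hypothetical fragment counts |e|, keyed by (root category, fragment name).
COUNTS = {
    ("S", "S(NP VP)"): 1,
    ("S", "S(NP(john) VP)"): 1,
    ("NP", "NP(john)"): 1,
    ("NP", "NP(mary)"): 1,
    ("VP", "VP(V(loves) NP)"): 1,
}

def fragment_prob(e):
    """P(e) = |e| / sum of |u| over fragments u with r(u) = r(e)."""
    same_root = sum(n for u, n in COUNTS.items() if u[0] == e[0])
    return Fraction(COUNTS[e], same_root)

def derivation_prob(derivation):
    """Multiply the probabilities of the fragments used."""
    p = Fraction(1)
    for e in derivation:
        p *= fragment_prob(e)
    return p

def parse_prob(derivations):
    """Sum the probabilities of all derivations of the parse."""
    return sum(derivation_prob(d) for d in derivations)
```

With these counts, a derivation using the bare S fragment has probability 1/2 · 1/2 · 1 · 1/2 = 1/8, while one starting from the lexicalised S fragment has 1/2 · 1 · 1/2 = 1/4; the parse probability sums the two.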

  9. Tree-DOP: Some results
  English; full parses only (90.83%):
               F-Score           Exact Match
               MPP      MPD      MPP      MPD
  d = 1        94.78    94.79    69.14    76.54
  d = 2        97.55    97.83    86.42    87.65
  d = 3        96.86    98.17    83.95    88.89
  d = 4        95.43    95.83    72.84    76.54
  French; full parses only (92.36%):
               F-Score           Exact Match
               MPP      MPD      MPP      MPD
  d = 1        92.68    92.22    52.94    55.29
  d = 2        96.13    96.10    72.94    68.24
  d = 3        96.09    97.06    70.59    76.47
  d = 4        96.62    96.65    70.59    74.12

  10. LFG-DOP
  Lexical Functional Grammar (LFG): a constraint-based theory of language
  • c-structure: context-free phrase structure trees
  • f-structure: attribute-value matrix
  • φ-links: mapping from c- to f-structure
  [Diagram: c-structure, φ-links and f-structure for “the yellow LED is flashing”: PRED ‘flash<SUBJ>’; TNS-ASP with MOOD indicative, PERF -, PROG +, TENSE pres; SUBJ with PRED ‘LED’, CASE nom, NUM sing, PERS 3, SPEC (SPEC-FORM the, SPEC-TYPE def) and ADJUNCT ‘yellow<SUBJ>’]

  11. LFG-DOP: Fragmentation
  Root: select any non-frontier non-terminal c-structure node as root
  • C-structure: delete all except this new root and the subtree it dominates
  Frontier: select a (possibly empty) set of non-root non-terminal nodes in the root-created c-structure
  • C-structure: delete all c-structure subtrees dominated by frontier nodes
  For both Root and Frontier:
  • φ-links: delete all φ-links corresponding to deleted c-structure nodes
  • F-structure: delete all f-structure units not φ-accessible from the remaining c-structure nodes
  • Forms: delete all semantic forms corresponding to deleted terminals
  φ-accessibility: f-structure unit f is φ-accessible from node n iff
  • n is φ-linked to f, i.e. φ(n) = f, or
  • f is contained within φ(n), i.e. there is a chain of attributes leading from φ(n) to f
  [Diagram: the representation for “john loves mary” (SUBJ ‘john’ NUM sg, PRED ‘love<SUBJ,OBJ>’, TNS pres, OBJ ‘mary’ NUM sg) before and after fragmentation]

  12. LFG-DOP: Parsing new input
  LFG-DOP composition (∘):
  • C-structure: left-most substitution, subject to category matching
  • F-structure: unification, subject to uniqueness, completeness and coherence
  [Diagram: composition of two fragments yielding the full c-structure/f-structure pair for “john loves mary”]
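The f-structure side of composition is unification of attribute-value matrices. A minimal sketch, encoding f-structures as nested dicts with atomic values; it enforces only the uniqueness condition (completeness and coherence additionally need the subcategorisation frames of the PRED values, and re-entrancies are not modelled):

```python
def unify(f, g):
    """Unify two f-structures (nested dicts). Returns the unified structure,
    or None when the uniqueness condition fails, i.e. one attribute would
    receive two distinct atomic values."""
    out = dict(f)
    for attr, val in g.items():
        if attr not in out:
            out[attr] = val
        elif isinstance(out[attr], dict) and isinstance(val, dict):
            sub = unify(out[attr], val)
            if sub is None:
                return None
            out[attr] = sub
        elif out[attr] != val:
            return None
    return out
```

Compatible structures merge attribute by attribute; a clash such as SUBJ NUM sg against SUBJ NUM pl makes the whole composition fail.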

  13. LFG-DOP: Computing probabilities
  fragment probability: P(e) = |e| / Σ_{ex ∈ CS(e)} |ex|, where CS(e) is the competition set of e
  parse probability: P_DOP(T) = Σ_{d ∈ D(T)} Π_{e ∈ d} P(e)
  Parse probability:
  • Multiply fragment probabilities to calculate derivation probability
  • Sum derivation probabilities to calculate parse probability
  • Normalise over the probabilities of valid parses:
  P_LFG-DOP(T | T is valid) = P_DOP(T) / Σ_{Tx is valid} P_DOP(Tx)
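The extra normalisation step is a one-liner over the valid parses. A sketch with invented parse names and probabilities:

```python
from fractions import Fraction

def normalise(parse_probs, valid):
    """P_LFG-DOP(T | T valid) = P_DOP(T) / sum over valid T' of P_DOP(T')."""
    z = sum(p for t, p in parse_probs.items() if t in valid)
    return {t: p / z for t, p in parse_probs.items() if t in valid}
```

Invalid parses are simply excluded and the remaining mass is rescaled so the valid parses sum to 1.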

  14. LFG-DOP: Robustness via Discard
  Discard operation:
  • Delete attribute-value pairs from the f-structure while keeping c-structure and φ-links constant
  • Restriction: pairs whose values are φ-linked to remaining c-structure nodes are not deleted
  [Diagram: composing a VP fragment whose SUBJ carries NUM pl with NP “john” (NUM sg) fails; Discard generates variants of the f-structure for “john see mary” with the offending pair removed, so that composition can succeed]
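A deliberately simplified sketch of Discard, using the nested-dict f-structures from before. It only deletes top-level pairs of one unit (the full operation ranges over pairs at any depth), and the φ-link restriction is approximated by a `protected` set naming attributes whose values are φ-linked to remaining c-structure nodes:

```python
def discards(fstruct, protected=frozenset()):
    """Generate the f-structures obtained by deleting a single top-level
    attribute-value pair. Attributes in `protected` approximate pairs whose
    values are phi-linked to remaining c-structure nodes: never deleted."""
    for attr in fstruct:
        if attr not in protected:
            yield {a: v for a, v in fstruct.items() if a != attr}
```

Dropping the conflicting NUM pair from the fragment's SUBJ is exactly what rescues the clashing composition on the slide.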

  15. LFG-DOP: What are the challenges?
  • To define fragmentation operations such that:
    • phenomena such as recursion and re-entrancy are handled
    • constraints are applied appropriately
    • Discard is used only to handle ill-formed input
  • To somehow distinguish between constraining and informative features
    • translation vs. parsing
  • To address the fact that substitution is local but unification is global, i.e.
    • to enforce LFG well-formedness conditions in an accurate and efficient manner
    • to sample for the best parse in an accurate and efficient manner
    • to define a probability model that doesn’t ‘leak’

  16. LFG-DOP: Recursion & Re-entrancy
  • How can we adequately express the constraints on the composition of fragments such as (Adj yellow) and (V tomber)?
  [Diagram: f-structures for “the yellow LED” (ADJUNCT ‘yellow<SUBJ>’ re-entrant with the noun’s f-structure) and for French “jean vient de tomber” (PRED ‘venir<SUBJ,XCOMP>’ whose XCOMP ‘tomber<SUBJ>’ shares its SUBJ with the matrix clause), together with the fragments extracted for yellow and tomber]

  17. LFG-DOP: Constraint over-specification
  • Is it appropriate to insist that the subject of flashing have an adjunct?
  • Is it appropriate to be forced to use Discard to allow the subject of flashing to have an indefinite specifier?
  • Important: we also want to remain language-independent…
  [Diagram: the fragment for V flashing retains the full f-structure of “the yellow LED is flashing”, including the subject’s SPEC-TYPE def and ADJUNCT]

  18. LFG-DOP: An alternative fragmentation process
  • Determine a c-structure fragment using Root and Frontier as for Tree-DOP, but retain the full f-structure given in the original representation.
  • Delete all f-structure units (and the attributes with which they are associated) which are not φ-linked from one or more remaining c-structure nodes, unless that unit is the value of an attribute subcategorised for by a PRED value whose corresponding terminal is dominated by the current fragment root node in the original representation.
  • Where we have floating f-structure units, also retain the minimal f-structure unit which contains them both. By minimal unit we mean the unit containing both floating f-structures along with their (nested sequence of) attributes.
  • Delete all semantic forms (including PRED attributes and their values) not associated with one of the remaining c-structure terminals.

  19. LFG-DOP: Constraints vs. information
  • To prune attribute-value pairs based on a language-specific algorithm
    • e.g. English: subj-verb agreement but not obj-verb agreement
  • To automatically learn which attribute-value pairs should be pruned for a particular dataset
  • To do ‘soft’ pruning: distinguish between constraining features and informative features
    • Account for the difference during unification
  Which approach is best suited to translation? Which is best suited to parsing?

  20. LFG-DOP: Substitution vs. unification
  Substitution is local but unification is global.
  To be enforced: category matching, uniqueness, coherence and completeness.
  • Model M1: enforce category matching during parsing
  • Model M2: enforce category matching and uniqueness during parsing
  • Model M3: enforce category matching, uniqueness and coherence during parsing
  • There is no Model M4: completeness can never be checked until a complete parse has been obtained
  [Diagram: a derivation for “john loves mary” in which an OBJ NUM pl feature introduced by an early fragment clashes with NUM sg contributed later; the conflict only surfaces when the f-structures are unified]

  21. LFG-DOP: Sampling
  The exact probability of sampling fragment fx at chart position [i][j],VP is:
  • P_DOP(fx)
  • multiplied by the sampling probability mass available at each of its substitution sites [i][k],V and [i+k][j-k],NP
  • and divided by the sampling probability mass available at [i][j],VP
  Problem for computing the exact probability of sampling fx at [i][j],VP:
  • We cannot know the sampling probability mass available at substitution site [i+k][j-k],NP until [i+k][j-k],NP is the leftmost substitution site, unless we stick with Model M1
  Problem for establishing when enough samples have been taken:
  • We cannot know how many valid parses there are until all constraints have been resolved
  [Diagram: fragment fx = VP(V NP) spanning [i][j], with substitution sites at [i][k] and [i+k][j-k], and its associated f-structure with SUBJ (NUM sg), TNS pres, NUM sg and OBJ]

  22. LFG-DOP: ‘Leaked’ probability mass
  Example: composing fragments for “john left”, where the S fragment specifies SUBJ NUM pl but NP(john) contributes SUBJ NUM sg:
  0.05 × 0.007 × 0.001 = 0.00000035
  • This derivation will be thrown out because it does not satisfy the uniqueness condition
  • Its probability is thrown out with it → ‘leaked’ probability mass
  • Normalisation camouflages the problem but does not solve it
  [Diagram: c-structures and f-structures for the failed composition, with PRED ‘leave<SUBJ>’, TNS pres, and the clashing SUBJ NUM values]

  23. Data-Oriented Natural Language Processing using Lexical-Functional Grammar Questions?
