1 / 28

CSA350: NLP Algorithms

CSA350: NLP Algorithms. Sentence Parsing 2 Top Down Bottom-Up Left Corner BUP Implementation in Prolog. Sources. Jurafsky & Martin Chapter 10 Covington Chapter 6. Derivation top down, left-to-right, depth first. Bottom Up Filtering.

benito
Download Presentation

CSA350: NLP Algorithms

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CSA350: NLP Algorithms Sentence Parsing 2 Top Down Bottom-Up Left Corner BUP Implementation in Prolog csa3050: Sentence Parsing II

  2. Sources • Jurafsky & Martin Chapter 10 • Covington Chapter 6 csa3050: Sentence Parsing II

  3. Derivation top down, left-to-right, depth first

  4. Bottom Up Filtering • We know the current input word must serve as the first word in the derivation of the unexpanded node the parser is currently processing. • Therefore the parser should not consider grammar rule for which the current word cannot serve as the "left corner" • The left corner is the first preterminal node along the left edge of a derivation. csa3050: Sentence Parsing II

  5. Left Corner fl fl The node marked Verb is a left corner of VP csa3050: Sentence Parsing II

  6. Left Corner • B is a left corner of A iffA * Bαfor non-terminal A, pre-terminal B and symbol string α. • Possible left corners of all non-terminal categories can be determined in advance and placed in a table. csa3050: Sentence Parsing II

  7. Left Corner (Operational Definition) • A nonterminal B is a left-corner of another nonterminal A if: • A=B (reflexive case); • there exists a rule A → Bα for non-terminal A, pre-terminal B and symbol string α; (immediate case) • there exists a rule C such that A → Cαand B is a left-corner of C (transitive case). csa3050: Sentence Parsing II

  8. s --> np, vp. s --> aux, np, vp. s --> vp. np --> det nom. nom --> noun. nom --> noun, nom. nom --> nom, pp pp --> prep, np. np --> pn. vp --> v. vp --> v np What are the left corners of S? DCG-style Grammar/Lexicon csa3050: Sentence Parsing II

  9. Example of Left Corner Table csa3050: Sentence Parsing II

  10. How to use the Left Corner Table • If attempting to parse category A, only consider rules A → Bαfor which category(current input)  LeftCorners(B) S → NP VPS → Aux NP VPS → VP csa3050: Sentence Parsing II

  11. Prolog Implementations • Top Down: depth first recursive descent • Bottom Up: shift/reduce • Left Corner • BUP csa3050: Sentence Parsing II

  12. Top Down Implementationin Prolog • Parser takes form of predicate parse(C,S1,S) : parse a constitutent C starting with input string S1 and ending with input string S.?- parse(s,[the,dog,barked],[]). • If C is a pre-terminal category, check, use lexicon to determine that current input word has that category. • Otherwise expand C using grammar rules and parse rhs constitutents. csa3050: Sentence Parsing II

  13. % Grammar rule(s,[np,vp]). rule(np,[d,n]). rule(vp,[v]). rule(vp,[v,np]). % Lexicon word(d,the). word(n,dog). word(n,cat). word(n,dogs). word(n,cats). word(v,chase). word(v,chases). Recoding the Grammar/Lexicon csa3050: Sentence Parsing II

  14. Top Down Parser parse(C,[Word|S],S) :- word(C,Word). parse(C,S1,S) :- rule(C,Cs), parse_list(Cs,S1,S). parse_list([],S,S). parse_list([C|Cs],S1,S) :- parse(C,S1,S2), parse_list(Cs,S2,S). csa3050: Sentence Parsing II

  15. Shift/Reduce Algorithm • Two data structures • input string • stack • Repeat until input is exhausted • Shift word to stack • Reduce stack using grammar and lexicon until no further reductions • Unlike top down, algorithm does not require category to be specified in advance. It simply finds all possible trees. csa3050: Sentence Parsing II

  16. Shift/Reduce Operation →| Step Action Stack Input 0 (start) the dog barked 1 shift the dog barked 2 reduce d dog barked 3 shift dog d barked 4 reduce n d barked 5 reduce np barked 6 shift barked np 7 reduce v np 8 reduce vp np 9 reduce s csa3050: Sentence Parsing II

  17. parse(S,Res) :- sr(S,[],Res). sr(S,Stk,Res) :- shift(Stk,S,NewStk,S1), reduce(NewStk,RedStk), sr(S1,RedStk,Res). sr([],Res,Res). shift(X,[H|Y],[H|X],Y). reduce(Stk,RedStk) :- brule(Stk,Stk2), reduce(Stk2,RedStk). reduce(Stk,Stk). %grammar brule([vp,np|X],[s|X]). brule([n,d|X],[np|X]). brule([np,v|X],[vp|X]). brule([np,v|X],[vp|X]). %interface to lexicon brule([Word|X],[C|X]) :- word(C,Word). Shift/Reduce Implementationin Prolog csa3050: Sentence Parsing II

  18. Shift/Reduce Operation • Words are shifted to the beginning of the stack, which ends up in reverse order. • The reduce step is simplified if we also store the rules backward, so that the rule s → np vp is stored as the factbrule([vp,np|X],[s|X]). • The term [a,b|X] matches any list whose first and second elements are a and b respectively. • The first argument directly matches the stack to which this rule applies • The second argument is what the stack becomes after reduction. csa3050: Sentence Parsing II

  19. Left Corner Parsing • Key Idea: accept a word, identify the constituent it marks the beginning of, and parse the rest of the constituent top down. • Main Advantages: • Like a bottom-up parser, can handle left recursion without looping, since it starts each constituent by accepting a word from the input string. • Like a top-down parser, is always expecting a particular category for which only a few of the grammar rules are relevant. It is therefore more efficient than a plain shift-reduce algorithm. csa3050: Sentence Parsing II

  20. Left Corner Algorithm To parse a constituent of type C: • Accept a word W from input and determine K, its category. • Complete C: • If K=C, exit with success; otherwise • Find a constituent whose expansion begins with K. Call that CC. For instance, if K=d (determiner), CC could be Np, since we have rule(np,[d,n]) • Recursively left-corner parse all the remaining elements of the expansion of CC (in this case, [n]). • Put CC in place of K, and return to step 2 csa3050: Sentence Parsing II

  21. Left Corner Implementation parse(C,[W|Rest],P) :- word(K,W), complete(K,C,Rest,P). parse_list([],P,P). parse_list(([C|Cs],P1,P) :- parse(C,P1,P2), parse_list(Cs,P2,P). complete(C,C,P,P). % if C=W, do nothing complete(K,C,P1,P) :- rule(CC,[K|Rest]), parse_list(Rest,P1,P2), complete(CC,C,P2,P). csa3050: Sentence Parsing II

  22. Trace of Left Corner Call: ( 7) parse(np, [the, cat], []) ? creep Call: ( 8) word(_L128, the) ? creep Exit: ( 8) word(d, the) ? creep Call: ( 8) complete(d, np, [cat], []) ? creep Call: ( 9) rule(_L153, [d|_G306]) ? creep Exit: ( 9) rule(np, [d, n]) ? creep Call: ( 9) parse_list([n], [cat], _L155) ? creep Call: ( 10) parse(n, [cat], _L181) ? creep Call: ( 11) word(_L196, cat) ? creep Exit: ( 11) word(n, cat) ? creep Call: ( 11) complete(n, n, [], _L181) ? creep Exit: ( 11) complete(n, n, [], []) ? creep Exit: ( 10) parse(n, [cat], []) ? creep Call: ( 10) parse_list([], [], _L155) ? creep Exit: ( 10) parse_list([], [], []) ? creep Exit: ( 9) parse_list([n], [cat], []) ? creep Call: ( 9) complete(np, np, [], []) ? creep Exit: ( 9) complete(np, np, [], []) ? creep Exit: ( 8) complete(d, np, [cat], []) ? creep Exit: ( 7) parse(np, [the, cat], []) ? creep csa3050: Sentence Parsing II

  23. BUP: Bottom Up Parser(Matsumoto et. al. 1983) • Each PS rule goes into Prolog as a clause whose head is not the mother node but the leftmost daughter. The rule np → d n pp is translated as:d(C,S1,S) :- parse(n,s1,s2), parse(pp,S2,S3), np(C,S3,S). • i.e. if you have just completed a d, parse an n, then a pp, then call the procedure for a completed np. • In addition to a clause for each PS rule, BUP needs a terminating clause for every kind of constitutent, e.g.np(np,S,S). • i.e. if you have just accepted an np and np is what you are looking for, you are done. csa3050: Sentence Parsing II

  24. BUP - Remarks • BUP is efficient because the hard part of the search – what to do with a newly completed leftmost daughter – is handled by Prolog’s fastest search mechanism – finding a clause given the predicate. csa3050: Sentence Parsing II

  25. BUP Implementation - Parser % parse(+C,+S1,-S) % Parse a constituent of category C % starting with input string S1 and % ending up with input string S. parse(C,S1,S) :- word(W,S1,S2), P =.. [W,C,S2,S], call(P). csa3050: Sentence Parsing II

  26. BUP Implementation - Rules % PS-rules and terminating clauses np(C,S1,S) :- parse(vp,S1,S2), s(C,S2,S). % S --> NP VP np(C,S1,S) :- parse(conj,S1,S2), parse(np,S2,S3), np(C,S3,S). % NP --> NP Conj NP np(np,X,X). d(C,S1,S) :- parse(n,S1,S2), np(C,S2,S). % NP --> D N d(d,X,X). v(C,S1,S) :- parse(np,S1,S2), vp(C,S2,S). % VP --> V NP v(C,S1,S) :- parse(np,S1,S2), parse(pp,S2,S3), vp(C,S3,S). % VP --> V NP PP v(v,X,X). p(C,S1,S) :- parse(np,S1,S2), pp(C,S2,S). % PP --> P NP p(p,X,X). % Terminating clauses for all other categories s(s,X,X). vp(vp,X,X). pp(pp,X,X). n(n,X,X). conj(conj,X,X). csa3050: Sentence Parsing II

  27. BUP Implementation - Lexicon % Lexicon word(conj,[and|X],X). word(p,[near|X],X). word(d,[the|X],X). word(n,[dog|X],X). word(n,[dogs|X],X). word(n,[cat|X],X). word(n,[cats|X],X). word(n,[elephant|X],X). word(n,[elephants|X],X). word(v,[chase|X],X). word(v,[chases|X],X). word(v,[see|X],X). word(v,[sees|X],X). word(v,[amuse|X],X). word(v,[amuses|X],X). csa3050: Sentence Parsing II

  28. Principles for success • Left recursive structures must be found, not predicted • Empty categories must be predicted, not found • An alternative way to fix things is to tranform the grammar into an equivalent grammar. • Grammar transformations can fix both left-recursion and epsilon productions • But then you parse the same language but with different trees csa3050: Sentence Parsing II

More Related