Chapter 4: Top-Down Parsing


Presentation Transcript


  1. Chapter 4: Top-Down Parsing

  2. Objectives of Top-Down Parsing • An attempt to find a leftmost derivation for an input string. • An attempt to construct a parse tree for the input string, starting from the root and creating the nodes of the parse tree in preorder.

  3. Input String: [Figure not preserved in the transcript: a leftmost derivation of the input string, with each step marked =>lm.]

  4. Approaches of Top-Down Parsing • 1. With backtracking (a general form of top-down parsing that may make repeated scans of the input) • Method: create a procedure for each nonterminal.

  5. L = { cabd, cad }
     e.g. S -> cAd
          A -> ab | a

     S( ) {
       if input-symbol == 'c' {
         Advance();
         if A( )
           if input-symbol == 'd' {
             Advance();
             return true;
           }
       }
       return false;
     }

     A( ) {
       isave = input-pointer;
       if input-symbol == 'a' {
         Advance();
         if input-symbol == 'b' {
           Advance();
           return true;
         }
       }
       input-pointer = isave;   /* backtrack, then try A -> a */
       if input-symbol == 'a' {
         Advance();
         return true;
       }
       else return false;
     }

     Example input: c a d
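The two procedures above can be sketched in runnable form. This is a minimal Python sketch, not the slide author's code: the `a_alternatives` parameter (the order in which the alternatives of A are tried) and the final end-of-input check are assumptions added for illustration.

```python
def parse(text, a_alternatives=("ab", "a")):
    """Return True if text is derivable from S -> c A d, A -> ab | a.

    a_alternatives is a hypothetical parameter: the order in which
    the alternatives of A are tried.
    """
    pos = 0  # plays the role of the input pointer

    def match(s):
        # Consume s if it appears at the current position.
        nonlocal pos
        if text.startswith(s, pos):
            pos += len(s)
            return True
        return False

    def A():
        nonlocal pos
        isave = pos
        for alt in a_alternatives:
            pos = isave          # backtrack before trying each alternative
            if match(alt):
                return True
        pos = isave
        return False

    def S():
        return match("c") and A() and match("d")

    # Assumption: the whole input must be consumed for acceptance.
    return S() and pos == len(text)


print(parse("cad"), parse("cabd"))        # both strings of L are accepted
print(parse("cabd", ("a", "ab")))         # trying A -> a first rejects cabd
```

As in the slide's pseudocode, once A() has returned true it is never re-entered, so the order of the alternatives matters: with A -> a tried first, cabd is rejected even though it belongs to L.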

  6. Problems for top-down parsing with backtracking:
     (1) Left recursion (can cause a top-down parser to go into an infinite loop).
         Def. A grammar is said to be left-recursive if it has a nonterminal A such that there is a derivation A =>+ Aα for some string α.
     (2) Backtracking: the parser must undo not only the movement of the input pointer but also any semantic actions, such as entries made in the symbol table.
     (3) The order in which the alternatives are tried (for the grammar shown above, try w = cabd where A -> a is applied first).

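  7. Elimination of Left-Recursion
     With immediate left recursion: A -> Aα | β
     ==> transform into
          A -> βA'
          A' -> αA' | ε
     [Figure: the parse tree for β α ... α under the original grammar, which grows down the left through repeated A nodes, and the equivalent tree under the transformed grammar, which grows down the right through A' nodes and ends in ε.]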

  8. e.g. E -> E + T | T
          T -> T * F | F
          F -> (E) | id
     After transformation:
          E -> TE'
          E' -> +TE' | ε
          T -> FT'
          T' -> *FT' | ε
          F -> (E) | id

  9. General form (with left recursion):
          A -> Aα1 | Aα2 | ... | Aαn | β1 | β2 | ... | βm
     After transformation: ==>
          A -> β1A' | β2A' | ... | βmA'
          A' -> α1A' | α2A' | ... | αnA' | ε
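The general transformation for immediate left recursion can be sketched in a few lines of Python. The representation is an assumption made for illustration: each alternative is a tuple of symbols, the empty tuple stands for ε, and the function name `eliminate_immediate` is hypothetical.

```python
def eliminate_immediate(nt, alternatives):
    """Rewrite A -> A a1 | ... | A an | b1 | ... | bm
    as A -> b1 A' | ... | bm A' and A' -> a1 A' | ... | an A' | epsilon.

    Assumes no alternative is just A itself (the grammar has no cycles).
    """
    # Tails alpha_i of the left-recursive alternatives A -> A alpha_i.
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nt]
    # Alternatives beta_j that do not start with A.
    others = [alt for alt in alternatives if not alt or alt[0] != nt]
    if not recursive:
        return {nt: list(alternatives)}   # nothing to do
    new_nt = nt + "'"
    return {
        nt: [beta + (new_nt,) for beta in others],
        new_nt: [alpha + (new_nt,) for alpha in recursive] + [()],
    }


print(eliminate_immediate("E", [("E", "+", "T"), ("T",)]))
print(eliminate_immediate("T", [("T", "*", "F"), ("F",)]))
```

Applied to E -> E + T | T, this yields E -> TE' and E' -> +TE' | ε, in agreement with the expression-grammar example.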

  10. What about left recursion that arises through a derivation of two or more steps?
      e.g., S -> Aa | b
            A -> Ac | Sd | e
      where S => Aa => Sda

  11. Algorithm: Eliminating left recursion
      Input: a context-free grammar G with no cycles (i.e., no derivation A =>+ A) and no ε-productions.
      Method:
      1. Arrange the nonterminals in some order A1, A2, ... , An.
      2. for i = 1 to n do {
           for j = 1 to i-1 do
             replace each production of the form Ai -> Ajγ by the productions Ai -> δ1γ | δ2γ | ... | δkγ, where Aj -> δ1 | δ2 | ... | δk are all the current Aj-productions;
           eliminate the immediate left recursion among the Ai-productions;
         }

  12. An Example
      e.g. S -> Aa | b
           A -> Ac | Sd | e
      Step 1: ==> S -> Aa | b
      Step 2: ==> A -> Ac | Aad | bd | e
      Step 3: ==> A -> bdA' | eA'
                  A' -> cA' | adA' | ε
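The three steps of this example can be reproduced mechanically. The following Python sketch makes some assumptions for illustration: alternatives are tuples of symbols, the empty tuple stands for ε, the new nonterminal is named by appending a prime, and both function names are hypothetical.

```python
def eliminate_immediate(nt, alts):
    """A -> A alpha | beta  becomes  A -> beta A', A' -> alpha A' | epsilon."""
    recursive = [alt[1:] for alt in alts if alt and alt[0] == nt]
    others = [alt for alt in alts if not alt or alt[0] != nt]
    if not recursive:
        return {nt: list(alts)}
    new_nt = nt + "'"
    return {
        nt: [beta + (new_nt,) for beta in others],
        new_nt: [alpha + (new_nt,) for alpha in recursive] + [()],
    }


def eliminate_left_recursion(grammar, order):
    """Remove all left recursion, processing nonterminals in the given order."""
    g = {nt: list(alts) for nt, alts in grammar.items()}
    for i, ai in enumerate(order):
        for aj in order[:i]:
            expanded = []
            for alt in g[ai]:
                if alt and alt[0] == aj:
                    # Substitute every current Aj-production for the leading Aj.
                    expanded += [delta + alt[1:] for delta in g[aj]]
                else:
                    expanded.append(alt)
            g[ai] = expanded
        g.update(eliminate_immediate(ai, g[ai]))
    return g


GRAMMAR = {
    "S": [("A", "a"), ("b",)],
    "A": [("A", "c"), ("S", "d"), ("e",)],   # e is an ordinary terminal here
}
result = eliminate_left_recursion(GRAMMAR, ["S", "A"])
for nt, alts in result.items():
    print(nt, "->", " | ".join("".join(alt) or "ε" for alt in alts))
```

With the order S, A, the result is S -> Aa | b, A -> bdA' | eA', A' -> cA' | adA' | ε, matching Step 3.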

  13. 2. Non-backtracking (recursive-descent) parsing
      Recursive descent: use a collection of mutually recursive routines to perform the syntax analysis.
      Left Factoring: A -> αβ1 | αβ2 ==> A -> αA'
                                         A' -> β1 | β2
      Method:
      1. For each nonterminal A, find the longest prefix α common to two or more of its alternatives. If α ≠ ε, replace all the A-productions A -> αβ1 | αβ2 | ... | αβn | others by
           A -> αA' | others
           A' -> β1 | β2 | ... | βn
      2. Repeat the transformation until no two alternatives for a nonterminal have a common prefix.
      e.g. S -> iCtS | iCtSeS | a
           C -> b
      ==>  S -> iCtSS' | a
           S' -> eS | ε
           C -> b
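The left-factoring method can be sketched as follows. This is an illustrative Python sketch under stated assumptions: alternatives are tuples of symbols, the empty tuple stands for ε, fresh nonterminals are named by appending primes, and the function names are hypothetical.

```python
from itertools import combinations


def common_prefix(x, y):
    """Longest common prefix of two alternatives (tuples of symbols)."""
    n = 0
    while n < len(x) and n < len(y) and x[n] == y[n]:
        n += 1
    return x[:n]


def left_factor(grammar):
    """Repeat left factoring until no two alternatives share a prefix."""
    g = {nt: list(alts) for nt, alts in grammar.items()}
    changed = True
    while changed:
        changed = False
        for nt in list(g):
            alts = g[nt]
            # Longest prefix shared by at least two alternatives of nt.
            prefix = max((common_prefix(x, y) for x, y in combinations(alts, 2)),
                         key=len, default=())
            if not prefix:
                continue
            new_nt = nt + "'"
            while new_nt in g:        # keep the new name unique
                new_nt += "'"
            k = len(prefix)
            g[new_nt] = [alt[k:] for alt in alts if alt[:k] == prefix]
            g[nt] = [prefix + (new_nt,)] + [alt for alt in alts if alt[:k] != prefix]
            changed = True
    return g


GRAMMAR = {
    "S": [("i", "C", "t", "S"), ("i", "C", "t", "S", "e", "S"), ("a",)],
    "C": [("b",)],
}
factored = left_factor(GRAMMAR)
for nt, alts in factored.items():
    print(nt, "->", " | ".join("".join(alt) or "ε" for alt in alts))
```

For the if-then-else grammar this produces S -> iCtSS' | a and S' -> ε | eS, the same productions as in the example (the alternatives of S' appear in a different order).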

  14. Predictive Parsing Features: - maintains a stack rather than making recursive calls - table-driven Components: 1. An input buffer with an endmarker ($) 2. A stack with an endmarker ($) on the bottom 3. A parsing table, a two-dimensional array M[A,a], where 'A' is a nonterminal symbol and 'a' is the current input symbol (terminal/token).

  15. Parsing Table
      M[A,a] |  (              |  )        |  $
      S      |  S -> ( S ) S   |  S -> ε   |  S -> ε

  16. Algorithm: Input: An input string w and a parsing table M for grammar G. Output: A leftmost derivation of w or an error indication.

  17. Initially w$ is in the input buffer and S$ is on the stack, where S is the start symbol of the grammar.
      Method:
      let X be the top stack symbol;
      while (X ≠ $) {
        let a be the next input symbol;
        if X is a terminal {
          if X == a then pop X from the stack and remove a from the input;
          else ERROR();
        }
        else {
          if M[X, a] = X -> Y1Y2...Yn then
            1. pop X from the stack;
            2. push Yn Yn-1 ... Y1 onto the stack, with Y1 on top;
          else ERROR();
        }
        let X be the top stack symbol;
      }
      if (the next input symbol == $) then accept else ERROR();
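The table-driven method can be tried out on the balanced-parentheses grammar S -> (S)S | ε. The sketch below is illustrative only: the function name `predictive_parse`, the dictionary encoding of M, and the use of SyntaxError for ERROR() are assumptions, not part of the slides.

```python
def predictive_parse(tokens, table, start, nonterminals):
    """Return the productions used, in order (a leftmost derivation).

    table maps (nonterminal, input symbol) to a right-hand side, given
    as a tuple of symbols; the empty tuple stands for epsilon.
    Raises SyntaxError where the slide's method calls ERROR().
    """
    stack = ["$", start]                 # the top of the stack is the list's end
    buf = list(tokens) + ["$"]
    pos = 0
    output = []
    while stack[-1] != "$":
        x, a = stack[-1], buf[pos]
        if x not in nonterminals:        # terminal on top of the stack
            if x != a:
                raise SyntaxError(f"expected {x!r}, got {a!r}")
            stack.pop()
            pos += 1
        else:
            rhs = table.get((x, a))
            if rhs is None:
                raise SyntaxError(f"no entry M[{x}, {a}]")
            stack.pop()
            stack.extend(reversed(rhs))  # Y1 ends up on top
            output.append((x, rhs))
    if buf[pos] != "$":
        raise SyntaxError("input remains after the stack was emptied")
    return output


# Parsing table for S -> ( S ) S | epsilon.
TABLE = {
    ("S", "("): ("(", "S", ")", "S"),
    ("S", ")"): (),
    ("S", "$"): (),
}

for lhs, rhs in predictive_parse("()", TABLE, "S", {"S"}):
    print(lhs, "->", " ".join(rhs) or "ε")
```

For the input `()` the parser applies S -> ( S ) S once and S -> ε twice; an unbalanced input such as `(` or `)` raises SyntaxError.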

  18. An Example

  19. Construction of the parsing table for a predictive parser: First and Follow
      Def. First(α), where α denotes a string of grammar symbols, is the set of terminals that begin the strings derived from α. If α =>* ε, then ε is also in First(α).
      Def. Follow(A), where A is a nonterminal, is the set of terminals a that can appear immediately to the right of A in some sentential form, that is, the set of terminals a such that there exists a derivation of the form S =>* αAaβ for some α and β. If A can be the rightmost symbol in some sentential form, then $ is in Follow(A).

  20. Compute First(X) for all grammar symbols X:
      1. If X is a terminal, then First(X) = {X}.
      2. If X -> ε is a production, then ε is in First(X).
      3. If X is a nonterminal and X -> Y1Y2...Yk is a production, then place a in First(X) if for some i, a is in First(Yi) and ε is in all of First(Y1), ... , First(Yi-1); that is, Y1 ... Yi-1 =>* ε. If ε is in First(Yj) for all j = 1, 2, ..., k, then add ε to First(X).

  21. An Example
      E -> TE'
      E' -> +TE' | ε
      T -> FT'
      T' -> *FT' | ε
      F -> (E) | id
      First(E) = First(T) = First(F) = { (, id }
      First(E') = { +, ε }
      First(T') = { *, ε }
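The First sets of this example can be computed by applying the three rules repeatedly until nothing changes. The Python sketch below is one possible fixed-point formulation; the tuple encoding of alternatives (the empty tuple stands for ε) and the name `first_sets` are assumptions made for illustration.

```python
EPS = "ε"


def first_sets(grammar):
    """Compute First(X) for every nonterminal X by fixed-point iteration."""
    nonterminals = set(grammar)
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                before = len(first[nt])
                for sym in alt:
                    # Rule 1: a terminal's First set is the terminal itself.
                    sym_first = first[sym] if sym in nonterminals else {sym}
                    first[nt] |= sym_first - {EPS}
                    if EPS not in sym_first:
                        break
                else:
                    # Every symbol can derive epsilon, or the alternative is empty.
                    first[nt].add(EPS)
                if len(first[nt]) != before:
                    changed = True
    return first


GRAMMAR = {
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T": [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F": [("(", "E", ")"), ("id",)],
}
FIRST = first_sets(GRAMMAR)
for nt in GRAMMAR:
    print(f"First({nt}) =", sorted(FIRST[nt]))
```

The computed sets agree with the ones listed for the expression grammar.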

  22. Compute Follow(A) for all nonterminals A
      1. Place $ in Follow(S), where S is the start symbol and $ is the input buffer endmarker.
      2. If there is a production A -> αBβ, then everything in First(β) except for ε is placed in Follow(B).
      3. If there is a production A -> αB, or a production A -> αBβ where First(β) contains ε, then everything in Follow(A) is in Follow(B).

  23. An Example
      E -> TE'
      E' -> +TE' | ε
      T -> FT'
      T' -> *FT' | ε
      F -> (E) | id
      /* E is the start symbol */
      Follow(E) = { $, ) }          // rules 1 & 2
      Follow(E') = { $, ) }         // rule 3
      Follow(T) = { +, $, ) }       // rules 2 & 3
      Follow(T') = { +, $, ) }      // rule 3
      Follow(F) = { *, +, $, ) }    // rules 2 & 3
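The Follow sets above can be checked with a small fixed-point computation. In this Python sketch the First sets of the expression grammar are supplied as input rather than computed; the encoding (tuples of symbols, the empty tuple for ε) and the function names are assumptions made for illustration.

```python
EPS = "ε"


def first_of_string(symbols, first):
    """First set of a string of grammar symbols, given First for nonterminals."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})   # a terminal's First set is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)                          # the whole string can derive epsilon
    return result


def follow_sets(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")                               # rule 1
    changed = True
    while changed:
        changed = False
        for a, alts in grammar.items():
            for alt in alts:
                for i, b in enumerate(alt):
                    if b not in grammar:                 # only nonterminals have Follow
                        continue
                    beta_first = first_of_string(alt[i + 1:], first)
                    new = beta_first - {EPS}             # rule 2
                    if EPS in beta_first:                # rule 3 (covers empty beta)
                        new |= follow[a]
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow


GRAMMAR = {
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T": [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F": [("(", "E", ")"), ("id",)],
}
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}
FOLLOW = follow_sets(GRAMMAR, FIRST, "E")
for nt in GRAMMAR:
    print(f"Follow({nt}) =", sorted(FOLLOW[nt]))
```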

  24. E -> TE'
      E' -> +TE' | ε
      T -> FT'
      T' -> *FT' | ε
      F -> (E) | id
      First(E) = First(T) = First(F) = { (, id }
      First(E') = { +, ε }
      First(T') = { *, ε }

  25. Construct a Predictive Parsing Table
      1. For each production A -> α of the grammar, do steps 2 and 3.
      2. For each terminal a in First(α), add A -> α to M[A, a].
      3. If ε is in First(α), add A -> α to M[A, b] for each terminal b in Follow(A). If ε is in First(α) and $ is in Follow(A), add A -> α to M[A, $].
      4. Make each undefined entry of M be error.
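These four steps translate almost directly into code. The Python sketch below builds M for the expression grammar, taking the First and Follow sets from the earlier examples as given. The encoding is an assumption made for illustration: each table entry is a list of right-hand sides (so that multiply-defined entries would be visible), $ is stored inside the Follow sets, and a missing key plays the role of an error entry.

```python
EPS = "ε"


def first_of_string(symbols, first):
    """First set of a string of grammar symbols, given First for nonterminals."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})   # a terminal's First set is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def build_table(grammar, first, follow):
    table = {}
    for a, alts in grammar.items():                  # step 1
        for alt in alts:
            alt_first = first_of_string(alt, first)
            targets = alt_first - {EPS}              # step 2
            if EPS in alt_first:                     # step 3 ($ is in Follow)
                targets |= follow[a]
            for t in targets:
                table.setdefault((a, t), []).append(alt)
    return table                                     # step 4: absent key = error


GRAMMAR = {
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T": [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F": [("(", "E", ")"), ("id",)],
}
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}
FOLLOW = {"E": {"$", ")"}, "E'": {"$", ")"}, "T": {"+", "$", ")"},
          "T'": {"+", "$", ")"}, "F": {"*", "+", "$", ")"}}

TABLE = build_table(GRAMMAR, FIRST, FOLLOW)
for (nt, tok), alts in sorted(TABLE.items()):
    print(f"M[{nt}, {tok}] =", [" ".join(alt) or EPS for alt in alts])
```

Every entry produced for this grammar holds exactly one production, so the table has no conflicts.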

  26. LL(1) grammar A grammar whose parsing table has no multiply-defined entries is said to be LL(1). First 'L': scan the input from left to right. Second 'L': produce a leftmost derivation. '1': use one input symbol of lookahead to determine the parsing action. * No ambiguous or left-recursive grammar can be LL(1).

  27. Properties of LL(1) grammar
      A grammar G is LL(1) iff whenever A -> α | β are two distinct productions of G, the following conditions hold:
      (1) For no terminal a do both α and β derive strings beginning with a (based on step 2 of the table construction): First(α) ∩ First(β) = ∅.
      (2) At most one of α and β can derive the empty string ε (based on step 3).
      (3) If β =>* ε, then α does not derive any string beginning with a terminal in Follow(A) (based on steps 2 and 3): First(α) ∩ Follow(A) = ∅ (i.e., if First(A) contains ε, then First(A) ∩ Follow(A) = ∅).

  28. Def. for Multiply-defined entry
      If G is left-recursive or ambiguous, then M will have at least one multiply-defined entry.
      e.g. S -> iCtSS' | a
           S' -> eS | ε
           C -> b
      generates: M[S', e] = { S' -> ε, S' -> eS }, a multiply-defined entry.
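The conflict at M[S', e] can also be detected without building the table, by testing the pairwise conditions on First and Follow for every pair of alternatives. The Python sketch below does this for the if-then-else grammar. The First and Follow sets are worked out by hand here and supplied as input (Follow(S) = Follow(S') = { $, e }, Follow(C) = { t }); the function names and the format of the reported violations are assumptions made for illustration.

```python
from itertools import combinations

EPS = "ε"


def first_of_string(symbols, first):
    """First set of a string of grammar symbols, given First for nonterminals."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})   # a terminal's First set is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def ll1_violations(grammar, first, follow):
    """List (nonterminal, kind, offending symbols) for each violated condition."""
    problems = []
    for a, alts in grammar.items():
        for x, y in combinations(alts, 2):
            fx, fy = first_of_string(x, first), first_of_string(y, first)
            if (fx & fy) - {EPS}:                       # condition (1)
                problems.append((a, "First/First", (fx & fy) - {EPS}))
            if EPS in fx and EPS in fy:                 # condition (2)
                problems.append((a, "both derive ε", {EPS}))
            for nullable, other in ((fx, fy), (fy, fx)):
                clash = (other - {EPS}) & follow[a]     # condition (3)
                if EPS in nullable and clash:
                    problems.append((a, "First/Follow", clash))
    return problems


GRAMMAR = {
    "S": [("i", "C", "t", "S", "S'"), ("a",)],
    "S'": [("e", "S"), ()],
    "C": [("b",)],
}
FIRST = {"S": {"i", "a"}, "S'": {"e", EPS}, "C": {"b"}}
FOLLOW = {"S": {"$", "e"}, "S'": {"$", "e"}, "C": {"t"}}

VIOLATIONS = ll1_violations(GRAMMAR, FIRST, FOLLOW)
print(VIOLATIONS)
```

The single violation reported is for S' on the symbol e: S' -> ε is applicable because e is in Follow(S'), and S' -> eS is applicable because e is in First(eS), which is exactly the multiply-defined entry M[S', e].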

  29. Parsing table with multiply-defined entry

  30. Difficulty in predictive parsing • Left-recursion elimination and left factoring make the resulting grammar hard to read and difficult to use for translation purposes. Thus: * Use a predictive parser for control constructs * Use operator precedence for expressions.

  31. Assignment #3b Do exercises 4.3, 4.10, 4.13, 4.15
