1 / 92

Understanding Top-Down Parsing for Compiler Design

Learn the implementation of parsers, distinctions between top-down and bottom-up approaches, and how top-down parsing constructs parse trees. Explore recursive descent parsing and predictive parsers.

Download Presentation

Understanding Top-Down Parsing for Compiler Design

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Lecture 6: Top-Down Parsing CS5363 Programming Languages and Compilers Xiaoyin Wang

  2. Lecture Outline • Implementation of parsers • Two approaches • Top-down • Bottom-up • Today: Top-Down • Easier to understand and program manually • Then: Bottom-Up • More powerful and used by most parser generators

  3. Token Stream Input Abstract Syntax Tree Parser Example IF LPAREN ID(x) ifStmt RPAREN GEQ ID(y) >= assign ID(y) BECOMES INT(42) SCOLON ID(x) ID(y) ID(y) INT(42)

  4. 1 t2 3 t9 4 7 t6 t5 t8 Intro to Top-Down Parsing • The parse tree is constructed • From the top • From left to right • Terminals are seen in order of appearance in the token stream: t2 t5 t6 t8 t9

  5. Recursive Descent Parsing • Consider the grammar E  T + E | T T  int | int * T | ( E ) • Token stream is: int5 * int2 • Start with top-level non-terminal E • Try the rules for E in order

  6. Recursive Descent Parsing. Example (Cont.) • Try E0 T1 + E2 => (E3)+ E2 • Then try a rule for T1  ( E3 ) • But ( does not match input token int5 • TryT1  int . Token matches. => int + E2 • But + after T1 does not match input token * • Try T1  int * T2 • This will match but + after T1 will be unmatched • Has exhausted the choices for T1 • Backtrack to choice for E0

  7. E0 T1 int5 T2 * int2 Recursive Descent Parsing. Example (Cont.) • Try E0 T1 • Follow same steps as before for T1 • And succeed with T1  int * T2 and T2  int • Withthe following parse tree

  8. A Recursive Descent Parser. Preliminaries • Let TOKEN be the type of tokens • Special tokens INT, LPARA, RPARA, PLUS, TIMES • Let the global next point to the next token

  9. A Recursive Descent Parser (2) • Define boolean functions that check the token string for a match of • A given token terminal bool term(TOKEN tok) { return *next++ == tok; } • A given production of S (the nth) bool Sn() { … } • Any production of S: bool S() { … } • These functions advance next

  10. A Recursive Descent Parser (3) • For production E  T + E bool E1() { return T() && term(PLUS) && E(); } • For production E  T bool E2() { return T(); } • For all productions of E (with backtracking) bool E() { TOKEN *save = next; return (next = save, E1()) || (next = save, E2()); }

  11. A Recursive Descent Parser (4) • Functions for non-terminal T bool T1() { return term(INT); } bool T2() { return term(INT) && term(TIMES) && T(); } bool T3() { return term(LPARA) && E() && term(RPARA); } bool T() { TOKEN *save = next; return (next = save, T1()) || (next = save, T2()) || (next = save, T3()); }

  12. Recursive Descent Parsing. Notes. • To start the parser • Initialize next to point to first token • Invoke E() • Notice how this simulates our previous example • Easy to implement by hand • But does not always work …

  13. When Recursive Descent Does Not Work • Consider a production S  S a | a bool S1() { return S() && term(a); } bool S() { return S1(); } • S() will get into an infinite loop • A left-recursive grammar has a non-terminal S S + S for some  • Recursive descent does not work in such cases

  14. Elimination of Left Recursion • Consider the left-recursive grammar S S  |  • S generates all strings starting with a  and followed by a number of  • Can rewrite using right-recursion S  S’ S’   S’ | 

  15. More Elimination of Left-Recursion • In general S S 1 | … | S n | 1 | … | m • All strings derived from S start with one of 1,…,mand continue with several instances of1,…,n • Rewrite as S  1 S’ | … | m S’ S’  1 S’ | … | n S’ | 

  16. General Left Recursion • The grammar S  A  |  A  S  is also left-recursive because S+ S  

  17. Summary of Recursive Descent • Simple and general parsing strategy • Left-recursion must be eliminated first • … but that can be done automatically • Used in gcc, chrome, roslyn, … • Limitation: backtracking • Thought to be too inefficient • In practice, backtracking is eliminated by restricting the grammar

  18. Predictive Parsers • Like recursive-descent but parser can “predict” which production to use • By looking at the next few tokens • No backtracking • Predictive parsers accept LL(k) grammars • L means “left-to-right” scan of input • L means “leftmost derivation” • k means “predict based on k tokens of lookahead” • In practice, LL(1) is usually used

  19. LL(1) Languages • In recursive-descent, for each non-terminal and input token there may be a choice of production • LL(1) means that for each non-terminal and token there is only one production • Can be specified via 2D tables • One dimension for current non-terminal to expand • One dimension for next token • A table entry contains one production

  20. An example • S -> *BC | abc • B -> bC | * • C -> aCc | c • input: *baccc Step (1): S -> *BC Step (2): B -> bC Step (3): C -> aCc Step (4): C -> c Step (5): C -> c *BC (*baccc) *bCC (*baccc) *baCcC (*baccc) *baccC (*baccc) *baccc (*baccc)

  21. Left Factoring • Recall the grammar E  T + E | T T  int | int * T | ( E ) • Hard to predict because • For T two productions start with int • For E it is not clear how to predict • A grammar must be left-factored before use for predictive parsing

  22. Left-Factoring Example • Recall the grammar E  T + E | T T  int | int * T | ( E ) • Factor out common prefixes of productions E  T X X  + E |  T  ( E ) | int Y Y  * T | 

  23. Checking whether we are fine • E  T X Only 1 production • X  + E |  Different 1st token • T  ( E ) | int Y Different 1st token • Y  * T |  Different 1st token Then we can simply perform predictive parsing as in the example before.

  24. We may still have some issues: Non-Terminals • Consider this grammar E  T X | X * X Which one to choose? X  + E |  T  ( E ) | int Y Y  * T |  if +... ? if (... ? if int ... ?

  25. We may still have some issues: Epsilon • Consider this grammar E  T X | X * X Which one to choose? X  + E |  T  ( E ) | int Y Y  * T |  What about if *…. ?

  26. Constructing Parsing Tables • If A , whether we should select  ? • Given a token t where t can start a string derived from  •  * t  • We say that t  First() • Given a token t if  is  and t can follow an A • S *  A t  • We say t  Follow(A)

  27. Computing First Sets Definition: First(X) = { t | X * t}  { | X * } Algorithm sketch (see book for details): • for all terminals t do First(t)  { t } • for each production X   do First(X)  {  } • if X  A1 … An  and   First(Ai), 1  i  n do • add First() - {} to First(X) • for each X  A1 … An s.t.   First(Ai), 1  i  n do • add  to First(X) • repeat steps 3 & 4 until no First set can be grown

  28. First Sets. Example • Recall the grammar E  X T X  + E |  T  ( E ) | int Y Y  * T |  • First sets First( ( ) = { ( } First( T ) = {int, ( } First( ) ) = { ) } First( E ) = {+, int, (} First( int) = { int } First( X ) = {+,  } First( + ) = { + } First( Y ) = {*,  } First( * ) = { * }

  29. Computing Follow Sets • Definition: Follow(X) = { t | S *  X t  } • Intuition • If S is the start symbol then $  Follow(S) • If X  A B then First(B)  Follow(A) and Follow(X)  Follow(B) • Also if B *  then Follow(X)  Follow(A)

  30. Computing Follow Sets (Cont.) Algorithm sketch: • Follow(S)  { $ } • For each production A   X  • add First() - {} to Follow(X) • For each A   X  where   First() • add Follow(A) to Follow(X) • repeat step(s) until no Follow set grows

  31. Follow Sets. Example • Recall the grammar E  T X X  + E |  T  ( E ) | int Y Y  * T |  • Follow sets Follow( + ) = { int, ( } Follow( * ) = { int, ( } Follow( ( ) = { int, ( } Follow( E ) = {), $} Follow( X ) = {$, ) } Follow( T ) = {+, ) , $} Follow( ) ) = {+, ) , $} Follow( Y ) = {+, ) , $} Follow( int) = {*, +, ) , $}

  32. Constructing LL(1) Parsing Tables • Construct a parsing table T for CFG G • For each production A   in G do: • For each terminal t  First() do • T[A, t] =  • If   First(), for each t  Follow(A) do • T[A, t] =  • If   First() and $  Follow(A) do • T[A, $] = 

  33. LL(1) Parsing Table Example • Left-factored grammar E  T X | X X X  + E |  T  ( E ) | int Y Y  * T |  • The LL(1) parsing table:

  34. LL(1) Parsing Tables. Errors • Blank entries indicate error situations • Consider the [E,*] entry • “There is no way to derive a string starting with * from non-terminal E”

  35. Notes on LL(1) Parsing Tables • If any entry is multiply defined then G is not LL(1) • If G is ambiguous • If G is left recursive • If G is not left-factored • And in other cases as well • Most programming language grammars are not LL(1) • There are tools that build LL(1) tables

  36. More Examples on First Set and Follow Set {'x', ε} A -> ‘x’ B | ε B-> ‘y’ | ε C -> ‘z’ E -> ABC F -> CBA S -> E | F First (A) = ? {‘y', ε} First (B) = ? {‘z’} First (C) = ? First (E) = ? First (A)/{ε} U First (B)/{ε} U First (C) ={‘x’, ‘y’, ‘z’} First (F) = ? First (C) ={‘z’} First (E) U First (F) = {‘x’, ‘y’, ‘z’} First (S) = ?

  37. C -> ‘z’ B-> ‘y’ | ε A -> ‘x’ B | ε Example: Follow Set • S -> E | F Follow (S) = {$}, Follow (E) = Follow (F) = Follow (S) = {$} • F -> CBA Follow (A) = Follow (F) = {$} Follow (B) = First (A) U Follow (F) / {ε} = {‘x’, $} Follow (C) = First (A) U First (B) U Follow (F) / {ε} = {‘x’, ‘y’, $} • E -> ABC Follow (A) U= (First (B) U First (C) / {ε}) = {‘y’, ‘z’, ‘$’} Follow (B) U= First (C) / {ε} = {‘x’, ‘z’, $} Follow (C) U= Follow (E) = {‘x’, ‘y’, $}

  38. S -> E | F E -> ABC F -> CBA Example: Follow Set • C -> ‘z’ • B-> ‘y’ | ε • A -> ‘x’ B | ε Follow (B) U= Follow (A) = {‘x’, ‘y’, ‘z’, $}

  39. A -> ‘x’ B | ε E -> ABC B-> ‘y’ | ε F -> CBA C -> ‘z’ S -> E | F Summary • First (A) = {‘x’, ε} • First (B) = {‘y’, ε} • First (C) = {‘z’} • First (E) = {‘x’, ‘y’, ‘z’} • First (F) = {‘z’} • First (S) = {‘x’, ‘y’, ‘z’} Follow (A) = {‘y’, ‘z’, ‘$’} Follow (B) = {‘x’, ‘y’, ‘z’, $} Follow (C) = {‘x’, ‘y’, $} Follow (E) = {$} Follow (F) = {$} Follow (S) = {$}

  40. A -> ‘x’ B | ε E -> ABC B-> ‘y’ | ε F -> CBA C -> ‘z’ S -> E | F Table Construction S -> E, First (E) = {'x', 'y', 'z'} and ε not in First (S)

  41. A -> ‘x’ B | ε E -> ABC B-> ‘y’ | ε F -> CBA C -> ‘z’ S -> E | F Table Construction S -> F, First (F) = {'z'} and ε not in First (S)

  42. A -> ‘x’ B | ε E -> ABC B-> ‘y’ | ε F -> CBA C -> ‘z’ S -> E | F Table Construction F -> CBA, First (CBA) = {'z'} and ε not in First (F)

  43. A -> ‘x’ B | ε E -> ABC B-> ‘y’ | ε F -> CBA C -> ‘z’ S -> E | F Table Construction E -> ABC, First (ABC) = {‘x’, ‘y’, 'z'} and ε not in First (F)

  44. A -> ‘x’ B | ε E -> ABC B-> ‘y’ | ε F -> CBA C -> ‘z’ S -> E | F Table Construction C -> ‘z’, First (‘z’) = {'z'} and ε not in First (C)

  45. A -> ‘x’ B | ε E -> ABC B-> ‘y’ | ε F -> CBA C -> ‘z’ S -> E | F Table Construction B -> ‘y’, First (‘y’) = {‘y'} and ε in First (B)

  46. A -> ‘x’ B | ε E -> ABC B-> ‘y’ | ε F -> CBA C -> ‘z’ S -> E | F Table Construction B -> ε, Follow (B) = {‘x’, ‘y‘, ‘z’, $} and ε in First (B)

  47. A -> ‘x’ B | ε E -> ABC B-> ‘y’ | ε F -> CBA C -> ‘z’ S -> E | F Table Construction A -> ‘x’ B, First (A) = {‘x’} and ε in First (A)

  48. A -> ‘x’ B | ε E -> ABC B-> ‘y’ | ε F -> CBA C -> ‘z’ S -> E | F Table Construction A -> ε, Follow (A) = {‘y’, ‘z’, $} and ε in First (A)

  49. A -> ‘x’ B | ε E -> ABC B-> ‘y’ | ε F -> CBA C -> ‘z’ S -> E | F Table Construction Not a LL(1), Conflicts!!!

  50. The process of Recursive Descent Parsing • Consider the grammar E  T X X  + E |  T  ( E ) | int Y Y  * T |  • Token stream is: int* int

More Related