1 / 35

Mastering Top-Down Parsing in Compilers

Understand top-down parsing, LL(1) grammars, eliminating ambiguity, and transforming grammars for efficient parsing with recursive-descent methods. Learn how to generate abstract syntax trees.

bhibbard
Download Presentation

Mastering Top-Down Parsing in Compilers

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 4: Top-down parsing

  2. Outline • Eliminating ambiguity in CFGs • Top-down parsing • LL(1) grammars • Transforming a grammar into LL form • Recursive-descent parsing - parsing made simple CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  3. Where we are Source code (character stream) Lexical analysis if ( b == 0 ) a = b ; Token stream Syntactic Analysis Parsing/build AST if ; == = Abstract syntax tree (AST) b 0 a b Semantic Analysis CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  4. Review of CFGs • Context-free grammars can describe programming-language syntax • Power of CFG needed to handle common PL constructs (e.g., parens) • String is in language of a grammar if derivation from start symbol to string • Top-down and bottom-up parsing correspond to left-most and right-most derivations • Ambiguous grammars a problem CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  5. if-then-else • How to write a grammar for if stmts? S  if (E) S S if (E) S else S S other Is this grammar ok? CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  6. No—Ambiguous! S  if (E) S S if (E) S else S S other • How to parse if (E) if (E)Selse S Which “if” is the “else” attached to? S if (E)S  if (E)if (E) S else S S if (E)Selse S  if (E)if (E) S else S CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  7. Grammar for Closest-if Rule • Want to rule out if (E)if (E) S else S • Problem: unmatched if may not occur as the “then” clause of a containing “if” statement  matched | unmatched matched  if (E) matched else matched | other unmatched  if (E) statement | if (E) matched else unmatched CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  8. Top-down Parsing • Grammars for top-down parsing • Implementing a top-down parser (recursive descent parser) • Generating an abstract syntax tree CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  9. S  S + E | E E number | ( S ) Parsing a String Top-down Partly-derived String Lookahead String S( (1+2+(3+4))+5  S+E ( (1+2+(3+4))+5  E+E ( (1+2+(3+4))+5  (S)+E 1 (1+2+(3+4))+5  (S+E)+E 1 (1+2+(3+4))+5  (S+E+E)+E 1 (1+2+(3+4))+5  (E+E+E)+E 1 (1+2+(3+4))+5  (1+E+E)+E 2 (1+2+(3+4))+5  (1+2+E)+E ( (1+2+(3+4))+5 parsed partunparsed part CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  10. S  S + E | E E number | ( S ) Problem • Want to decide which production to apply based on next symbol (1) S E (S) (E) (1) (1)+2 S S+E E+E  (S)+E (E)+E  (E)+E  (1)+E (1)+2 • Why is this hard? CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  11. S  S + E | E E number | ( S ) Top-down parsing (1+2+(3+4))+5 S S+E E+E  (S)+E (S+E)+E (S+E+E)+E (E+E+E)+E (1+E+E)+E(1+2+E)+E ... (1+2+(3+4))+5 • Entire tree above a token (2) has been expanded when encountered S S + E E 5 ( S ) S + E ( S ) S + E E S + E 2 4 1 E 3 CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  12. Grammar is Problem • This grammar cannot be parsed top-down with only a single look-ahead symbol • NotLL(1) • Left-to-right-scanning, Left-most derivation, 1 look-ahead symbol • Can rewrite grammar to allow top-down parsing: create LL(1) grammar for same language CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  13. Making an LL(1) grammar S  S + E S  E E number E ( S ) • Problem: can’t decide which S production to apply until we see symbol after first expression • Solution: Add new non-terminal S’ at decision point. S’ derives (+E)* S  ES’ S’   S’ + S E number E ( S ) CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  14. S  E S’ S’   | + S E number | ( S ) Parsing with new grammar S( (1+2+(3+4))+5  ES’ ( (1+2+(3+4))+5  (S) S’ 1 (1+2+(3+4))+5  (ES’) S’1 (1+2+(3+4))+5  (1 S’) S’ + (1+2+(3+4))+5  (1+ES’) S’2 (1+2+(3+4))+5  (1+2 S’) S’ + (1+2+(3+4))+5  (1+2 + S) S’ ( (1+2+(3+4))+5  (1+2 + ES’) S’ ( (1+2+(3+4))+5  (1+2 + (S) S’ ) S’ 3 (1+2+(3+4))+5  (1+2 + (ES’) S’ ) S’ 3 (1+2+(3+4))+5  (1+2 + (3 S’) S’ ) S’ + (1+2+(3+4))+5  (1+2 + (3 + E) S’ ) S’ 4 (1+2+(3+4))+5 CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  15. Predictive Parsing Table • LL(1) grammar: • for a given non-terminal, the look-ahead symbol uniquely determines the production to apply • Can write as a table of non-terminals x input symbols  productions • predictive parsing CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  16. S  ES’ S’   | + S E number | ( S ) Using Table S( (1+2+(3+4))+5  ES’ ( (1+2+(3+4))+5  (S) S’ 1 (1+2+(3+4))+5  (ES’) S’1 (1+2+(3+4))+5  (1 S’) S’ + (1+2+(3+4))+5  (1 + S) S’ 2 (1+2+(3+4))+5  (1+ES’) S’2 (1+2+(3+4))+5  (1+2 S’) S’ + (1+2+(3+4))+5 number + ( ) $ S E S’ E S’ S’  +S  E  number  ( S ) EOF CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  17. How to Implement? • Table can be converted easily into a recursive-descent parser number + ( ) $ S E S’ E S’ S’  +S  E  number  ( S ) • Three procedures: parse_S, parse_S’, parse_E CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  18. Recursive-Descent Parser void parse_S () { switch (token) { case number: parse_E(); parse_S’(); return; case ‘(’: parse_E(); parse_S’(); return; default: throw new ParseError(); } } number + ( ) $ S ES’ ES’ S’  +S  E  number  ( S ) CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  19. Recursive-Descent Parser void parse_S’() { switch (token) { case ‘+’: token = input.read(); parse_S(); return; case ‘)‘: return; case EOF: return; default: throw new ParseError(); } } number + ( ) $ S ES’ ES’ S’  +S  E  number  ( S ) CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  20. Recursive-Descent Parser void parse_E() { switch (token) { case number: token = input.read(); return; case ‘(‘: token = input.read(); parse_S(); if (token != ‘)’) throw new ParseError(); token = input.read(); return; default: throw new ParseError(); } } number + ( ) $ S ES’ ES’ S’  +S  E  number  ( S ) CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  21. N + ( ) $ S ES’ ES’ S’  +S  E  N  ( S ) Call Tree = Parse Tree S  ES’ S’   | + S E number | ( S ) S (1 + 2 + (3 + 4)) + 5 E S’ ( S ) + S E S’ 5 1 + S E S’ 2 + S E S’  ( S ) E S’ + S 3 E 4 CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  22. N + ( ) $ S ES’ ES’ S’  +S  E  N  ( S ) How to Construct Parsing Tables • Needed: algorithm for automatically generating a predictive parse table from a grammar ? S  ES’ S’   | + S E number | ( S ) CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  23. Constructing Parse Tables • Can construct predictive parser if: For every non-terminal, every look-ahead symbol can be handled by at most one production • FIRST() for arbitrary string of terminals and non-terminals  is: • set of symbols that might begin the fully expanded version of  • FOLLOW(X) for a non-terminal X is: • set of symbols that might follow the derivation of X in the input stream CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  24. N+( ) $ S ES’ ES’ S’ +S  E N  ( S ) Parse Table Entries • Consider a production X   • Add   to the X row for each symbol in FIRST() • If  can derive  ( is nullable), add   for each symbol in FOLLOW(X) • Grammar is LL(1) if no conflicts CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  25. Computing nullable, FIRST • X is nullable if • it derives  directly • it has a production X YZ... where all RHS symbols (Y, Z) are nullable • Algorithm: assume not nullable, apply rules repeatedly until no change in status • Determining FIRST() • FIRST(a ) = { a } • FIRST(X)  FIRST(X) • FIRST(X)  FIRST() if X is nullable • Algorithm: Assume FIRST() = {} for all , apply rules repeatedly CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  26. Computing FOLLOW • FOLLOW(S)  { $ } • If X  Y, FOLLOW(Y)  FIRST() • If X  Y and  is nullable (or non-existent), FOLLOW(Y)  FOLLOW(X) • Algorithm: Assume FOLLOW(X) = { } for all X, apply rules repeatedly • Common theme: iterative analysis. Start with initial assignment, apply rules until no change CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  27. num+( ) $ S ES’ ES’ S’ +S  E num  ( S ) S  ES’ S’   | + S E number | ( S ) Applying Rules • nullable • only S’ is nullable • FIRST • FIRST(E S’ ) = { + , ( } • FIRST(+S) = { + } • FIRST(number) = { number } • FIRST( (S) ) = { ( } • FOLLOW • FOLLOW(S) = { $, ), + } • FOLLOW(S’) = { ), $} • FOLLOW(E) = { +, ) } CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  28. Completing the parser Now we know how to construct a recursive-descent parser for an LL(1) grammar. Can we use recursive descent to build an abstract syntax tree too? CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  29. Creating the AST Expr abstract class Expr { } class Add extends Expr { Expr left, right; Add(Expr L, Expr R) { left = L; right = R; } } class Num extends Expr { int value; Num (int v) { value = v); } Add Num CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  30. AST Representation (1 + 2 + (3 + 4)) + 5 + Add + 5 Add Num (5) 1 + 2 + Num(1) Add 3 4 Num(2) Add Num(3) Num(4) How can we generate this structure during recursive-descent parsing? CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  31. Creating the AST • Just add code to each parsing routine to create the appropriate nodes! • Works because parse tree and call tree have same shape • parse_S, parse_S’, parse_E all return an Expr CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  32. AST creation code Expr parse_E() { switch(token) { // E  number case number: Expr result = Num (token.value); token = input.read(); return result; case ‘(‘: // E  ( S ) token = input.read(); Exprresult= parse_S(); if (token != ‘)’) throw new ParseError(); token = input.read(); return result; default: throw new ParseError(); } } CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  33. parse_S S  ES’ S’   | + S E number | ( S ) Expr parse_S() { switch (token) { case number: case ‘(’: Exprleft=parse_E(); Expr right =parse_S’(); if (right == null) return left; else return new Add(left, right); default: throw new ParseError(); } } CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  34. An Interpreter! int parse_E() { switch(token) { case number: int result = token.value; token = input.read(); return result; case ‘(‘: token = input.read(); int result = parse_S(); if (token != ‘)’) throw new ParseError(); token = input.read(); return result; default: throw new ParseError(); }} int parse_S() { switch (token) { case number: case ‘(’: int left =parse_E(); int right =parse_S’(); if (right == 0) return left; else return left + right; default: throw new ParseError(); } } CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

  35. Summary • We can build a recursive-descent parser for LL(1) grammars • Construct parsing table using FIRST, &c • Translate to recursive-descent code • Systematic approach avoids errors, detects ambiguities • Next time: converting a grammar to LL(1) form, bottom-up parsing CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

More Related