Understand top-down parsing, LL(1) grammars, eliminating ambiguity, and transforming grammars for efficient parsing with recursive-descent methods. Learn how to generate abstract syntax trees.
CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 4: Top-down parsing
Outline • Eliminating ambiguity in CFGs • Top-down parsing • LL(1) grammars • Transforming a grammar into LL form • Recursive-descent parsing - parsing made simple CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
Where we are
Source code (character stream), e.g. if ( b == 0 ) a = b ;
  ↓ Lexical analysis
Token stream
  ↓ Syntactic analysis (parsing / build AST)
Abstract syntax tree (AST), e.g. an "if" node over == (b, 0) and = (a, b)
  ↓ Semantic analysis
Review of CFGs
• Context-free grammars can describe programming-language syntax
• The power of CFGs is needed to handle common PL constructs (e.g., nested parentheses)
• A string is in the language of a grammar if there is a derivation from the start symbol to that string
• Top-down and bottom-up parsing correspond to left-most and right-most derivations
• Ambiguous grammars are a problem
if-then-else
• How to write a grammar for if stmts?
S → if (E) S
S → if (E) S else S
S → other
Is this grammar ok?
No: ambiguous!
S → if (E) S
S → if (E) S else S
S → other
• How to parse if (E) if (E) S else S? Which "if" is the "else" attached to?
S ⇒ if (E) S ⇒ if (E) if (E) S else S
S ⇒ if (E) S else S ⇒ if (E) if (E) S else S
Grammar for Closest-if Rule
• Want to rule out the parse of if (E) if (E) S else S that attaches the else to the outer if
• Problem: an unmatched if must not occur as the "then" clause of a containing if that has an else
statement → matched | unmatched
matched → if (E) matched else matched | other
unmatched → if (E) statement | if (E) matched else unmatched
Top-down Parsing
• Grammars for top-down parsing
• Implementing a top-down parser (recursive descent parser)
• Generating an abstract syntax tree
Parsing a String Top-down
S → S + E | E
E → number | ( S )
Input: (1+2+(3+4))+5 (the slide marks the parsed and unparsed parts of the input at each step)

Partly-derived string    Lookahead
S                        (
S+E                      (
E+E                      (
(S)+E                    1
(S+E)+E                  1
(S+E+E)+E                1
(E+E+E)+E                1
(1+E+E)+E                2
(1+2+E)+E                (
Problem
S → S + E | E
E → number | ( S )
• Want to decide which production to apply based on the next symbol
Input (1):   S ⇒ E ⇒ (S) ⇒ (E) ⇒ (1)
Input (1)+2: S ⇒ S+E ⇒ E+E ⇒ (S)+E ⇒ (E)+E ⇒ (1)+E ⇒ (1)+2
• Why is this hard? Both inputs start with the same symbol, yet they need different first productions.
Top-down parsing
S → S + E | E
E → number | ( S )
Input: (1+2+(3+4))+5
S ⇒ S+E ⇒ E+E ⇒ (S)+E ⇒ (S+E)+E ⇒ (S+E+E)+E ⇒ (E+E+E)+E ⇒ (1+E+E)+E ⇒ (1+2+E)+E ⇒ ... ⇒ (1+2+(3+4))+5
• The entire tree above a token (e.g., 2) has been expanded by the time the token is encountered
[Figure: parse tree for (1+2+(3+4))+5]
Grammar is the Problem
• This grammar cannot be parsed top-down with only a single look-ahead symbol
• It is not LL(1)
• LL(1): Left-to-right scanning, Left-most derivation, 1 look-ahead symbol
• Can rewrite the grammar to allow top-down parsing: create an LL(1) grammar for the same language
Making an LL(1) grammar
S → S + E
S → E
E → number
E → ( S )
• Problem: can’t decide which S production to apply until we see the symbol after the first expression
• Solution: add a new non-terminal S’ at the decision point. S’ derives (+E)*
S → E S’
S’ → ε
S’ → + S
E → number
E → ( S )
Parsing with new grammar
S → E S’
S’ → ε | + S
E → number | ( S )
Input: (1+2+(3+4))+5

Partly-derived string          Lookahead
S                              (
E S’                           (
(S) S’                         1
(E S’) S’                      1
(1 S’) S’                      +
(1+E S’) S’                    2
(1+2 S’) S’                    +
(1+2+S) S’                     (
(1+2+E S’) S’                  (
(1+2+(S) S’) S’                3
(1+2+(E S’) S’) S’             3
(1+2+(3 S’) S’) S’             +
(1+2+(3+E) S’) S’              4
Predictive Parsing Table
• LL(1) grammar: for a given non-terminal, the look-ahead symbol uniquely determines the production to apply
• Can write this as a table: non-terminals × input symbols → productions
• This is predictive parsing
Using Table
S → E S’
S’ → ε | + S
E → number | ( S )
Input: (1+2+(3+4))+5

Partly-derived string    Lookahead
S                        (
E S’                     (
(S) S’                   1
(E S’) S’                1
(1 S’) S’                +
(1+S) S’                 2
(1+E S’) S’              2
(1+2 S’) S’              +

        number      +        (          )      $ (EOF)
S       → E S’               → E S’
S’                  → + S               → ε    → ε
E       → number             → ( S )
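The table lookup traced above can also be run mechanically with an explicit stack. The following is a minimal sketch under stated assumptions, not the course's implementation: the class name `TableDrivenParser`, the token spelling `"num"`, and the `tokenize` helper are all hypothetical, and the table entries are copied from the slide.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a stack-driven predictive parser for the slide's table.
// Terminals: "num", "+", "(", ")", "$". Non-terminals: "S", "S'", "E".
class TableDrivenParser {
    static final List<String> EPSILON = List.of(); // empty right-hand side

    // TABLE.get(nonTerminal).get(lookahead) is the right-hand side to expand.
    // A missing entry is an empty table cell, i.e. a parse error.
    static final Map<String, Map<String, List<String>>> TABLE = Map.of(
        "S",  Map.of("num", List.of("E", "S'"), "(", List.of("E", "S'")),
        "S'", Map.of("+", List.of("+", "S"), ")", EPSILON, "$", EPSILON),
        "E",  Map.of("num", List.of("num"), "(", List.of("(", "S", ")")));

    // Splits the input into tokens; every run of digits becomes "num".
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (Character.isDigit(c)) {
                while (i < text.length() && Character.isDigit(text.charAt(i))) i++;
                tokens.add("num");
            } else if (c == '+' || c == '(' || c == ')') {
                tokens.add(String.valueOf(c));
                i++;
            } else if (Character.isWhitespace(c)) {
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        tokens.add("$"); // end-of-input marker
        return tokens;
    }

    // Returns true if the input is in the language of the grammar.
    static boolean accepts(String text) {
        List<String> tokens;
        try {
            tokens = tokenize(text);
        } catch (IllegalArgumentException e) {
            return false;
        }
        Deque<String> stack = new ArrayDeque<>();
        stack.push("$");
        stack.push("S");
        int pos = 0;
        while (!stack.isEmpty()) {
            String top = stack.pop();
            String lookahead = tokens.get(pos);
            if (TABLE.containsKey(top)) {
                List<String> rhs = TABLE.get(top).get(lookahead);
                if (rhs == null) return false;             // empty table cell
                for (int k = rhs.size() - 1; k >= 0; k--)  // push right to left
                    stack.push(rhs.get(k));
            } else {
                if (!top.equals(lookahead)) return false;  // terminal mismatch
                pos++;
            }
        }
        return pos == tokens.size();
    }
}
```

Calling `TableDrivenParser.accepts` on the slide's input string exercises the same sequence of table lookups as the trace; the stack contents at each step correspond to the not-yet-matched suffix of the partly-derived string.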
How to Implement?
• Table can be converted easily into a recursive-descent parser

        number      +        (          )      $
S       → E S’               → E S’
S’                  → + S               → ε    → ε
E       → number             → ( S )

• Three procedures: parse_S, parse_S’, parse_E
Recursive-Descent Parser
    void parse_S() {
        switch (token) {
            case number: parse_E(); parse_S’(); return;
            case '(':    parse_E(); parse_S’(); return;
            default:     throw new ParseError();
        }
    }
Table row for S: → E S’ on number and on (
Recursive-Descent Parser
    void parse_S’() {
        switch (token) {
            case '+': token = input.read(); parse_S(); return;
            case ')': return;
            case EOF: return;
            default:  throw new ParseError();
        }
    }
Table row for S’: → + S on +, → ε on ) and on $
Recursive-Descent Parser
    void parse_E() {
        switch (token) {
            case number:
                token = input.read();
                return;
            case '(':
                token = input.read();
                parse_S();
                if (token != ')') throw new ParseError();
                token = input.read();
                return;
            default:
                throw new ParseError();
        }
    }
Table row for E: → number on number, → ( S ) on (
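The three procedures can be combined into one runnable recognizer. This is a minimal sketch, assuming a toy scanner that is not part of the slides: the class name `RecursiveDescentRecognizer`, the `read` method, and the sentinel characters used for the number and end-of-input tokens are hypothetical. `parseSPrime` stands in for parse_S’, since a Java identifier cannot contain a prime.

```java
// Hypothetical sketch: the slide's parse_S, parse_S', parse_E in one class.
class RecursiveDescentRecognizer {
    static class ParseError extends RuntimeException {
        ParseError(String message) { super(message); }
    }

    // Assumed token encoding: punctuation is its own character, and two
    // control characters stand for "end of input" and "any integer literal".
    static final char EOF = '\0';
    static final char NUMBER = '\u0001';

    private final String text;
    private int pos = 0;
    private char token;

    RecursiveDescentRecognizer(String text) {
        this.text = text;
        this.token = read();
    }

    // Toy scanner: skips spaces and collapses a run of digits into NUMBER.
    private char read() {
        while (pos < text.length() && text.charAt(pos) == ' ') pos++;
        if (pos >= text.length()) return EOF;
        char c = text.charAt(pos);
        if (Character.isDigit(c)) {
            while (pos < text.length() && Character.isDigit(text.charAt(pos))) pos++;
            return NUMBER;
        }
        pos++;
        return c;
    }

    private void parseS() {                       // S -> E S'
        switch (token) {
            case NUMBER:
            case '(':
                parseE();
                parseSPrime();
                return;
            default:
                throw new ParseError("unexpected token in S");
        }
    }

    private void parseSPrime() {                  // S' -> epsilon | + S
        switch (token) {
            case '+':
                token = read();
                parseS();
                return;
            case ')':
            case EOF:
                return;                           // epsilon: consume nothing
            default:
                throw new ParseError("unexpected token in S'");
        }
    }

    private void parseE() {                       // E -> number | ( S )
        switch (token) {
            case NUMBER:
                token = read();
                return;
            case '(':
                token = read();
                parseS();
                if (token != ')') throw new ParseError("expected )");
                token = read();
                return;
            default:
                throw new ParseError("unexpected token in E");
        }
    }

    // Returns true if the whole input is derivable from S.
    static boolean accepts(String text) {
        try {
            RecursiveDescentRecognizer parser = new RecursiveDescentRecognizer(text);
            parser.parseS();
            return parser.token == EOF;           // reject trailing input such as "1)"
        } catch (ParseError e) {
            return false;
        }
    }
}
```

The final check in `accepts` matters: parse_S’ returns on `)` as well as on end of input, so a caller at the top level has to confirm that nothing is left over.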
Call Tree = Parse Tree
S → E S’
S’ → ε | + S
E → number | ( S )
Input: (1 + 2 + (3 + 4)) + 5
[Figure: parse tree for the input; the tree of calls to parse_S, parse_S’ and parse_E has the same shape]
How to Construct Parsing Tables
• Needed: an algorithm for automatically generating a predictive parse table from a grammar
S → E S’, S’ → ε | + S, E → number | ( S )   →   ?   →   predictive parse table
Constructing Parse Tables
• Can construct a predictive parser if: for every non-terminal, every look-ahead symbol can be handled by at most one production
• FIRST(γ) for an arbitrary string γ of terminals and non-terminals is: the set of symbols that might begin the fully expanded version of γ
• FOLLOW(X) for a non-terminal X is: the set of symbols that might follow the derivation of X in the input stream
Parse Table Entries
• Consider a production X → γ
• Add → γ to the X row for each symbol in FIRST(γ)
• If γ can derive ε (γ is nullable), add → γ for each symbol in FOLLOW(X)
• Grammar is LL(1) if there are no conflicts (no cell receives two productions)
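The two table-filling rules can be sketched as a short loop over the productions. This is an illustration only: the class `ParseTableBuilder` and the `Production` record-like class are hypothetical, and FIRST, nullable, and FOLLOW for the lecture's LL(1) grammar are supplied by hand rather than computed.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: fill a predictive parse table from FIRST and FOLLOW.
class ParseTableBuilder {
    // One production X -> rhs, with FIRST(rhs) and nullability given by hand.
    static final class Production {
        final String lhs;
        final String rhs;
        final Set<String> first;
        final boolean nullable;

        Production(String lhs, String rhs, Set<String> first, boolean nullable) {
            this.lhs = lhs;
            this.rhs = rhs;
            this.first = first;
            this.nullable = nullable;
        }
    }

    // Grammar: S -> E S' ; S' -> epsilon | + S ; E -> number | ( S )
    static final List<Production> PRODUCTIONS = List.of(
        new Production("S", "E S'", Set.of("number", "("), false),
        new Production("S'", "epsilon", Set.of(), true),
        new Production("S'", "+ S", Set.of("+"), false),
        new Production("E", "number", Set.of("number"), false),
        new Production("E", "( S )", Set.of("("), false));

    // FOLLOW sets for this grammar, worked out by hand (an assumption of the sketch).
    static final Map<String, Set<String>> FOLLOW = Map.of(
        "S", Set.of("$", ")"),
        "S'", Set.of("$", ")"),
        "E", Set.of("+", ")", "$"));

    // Returns table.get(X).get(lookahead) = right-hand side to apply.
    // Throws if two productions land in the same cell (grammar not LL(1)).
    static Map<String, Map<String, String>> build() {
        Map<String, Map<String, String>> table = new TreeMap<>();
        for (Production p : PRODUCTIONS) {
            Set<String> lookaheads = new TreeSet<>(p.first);        // rule 1: FIRST(rhs)
            if (p.nullable) lookaheads.addAll(FOLLOW.get(p.lhs));   // rule 2: FOLLOW(X)
            Map<String, String> row = table.computeIfAbsent(p.lhs, k -> new TreeMap<>());
            for (String a : lookaheads) {
                String previous = row.put(a, p.rhs);
                if (previous != null)
                    throw new IllegalStateException("conflict at [" + p.lhs + ", " + a + "]");
            }
        }
        return table;
    }
}
```

For this grammar `build()` fills exactly the cells of the lecture's table. Feeding it the original left-recursive grammar (S → S + E | E) with matching FIRST sets would trigger the conflict exception, which is the "not LL(1)" diagnosis.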
Computing nullable, FIRST
• X is nullable if
  • it derives ε directly
  • it has a production X → Y Z ... where all RHS symbols (Y, Z, ...) are nullable
• Algorithm: assume all non-terminals are not nullable, apply rules repeatedly until no change in status
• Determining FIRST(γ)
  • FIRST(a γ) = { a }
  • FIRST(X γ) ⊇ FIRST(X)
  • FIRST(X γ) ⊇ FIRST(γ) if X is nullable
• Algorithm: assume FIRST(γ) = {} for all γ, apply rules repeatedly until no change
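Both fixed-point algorithms fit in a few lines. The sketch below is one possible encoding, not the course's code: the class `FirstSets`, the map-of-lists grammar representation, and the helper `firstOf` are hypothetical, and the grammar is the LL(1) expression grammar S → E S’, S’ → ε | + S, E → number | ( S ).

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: iterative computation of nullable and FIRST.
class FirstSets {
    // Non-terminals are the map keys; an empty right-hand side is epsilon.
    static final Map<String, List<List<String>>> GRAMMAR = new LinkedHashMap<>();
    static {
        GRAMMAR.put("S", List.of(List.of("E", "S'")));
        GRAMMAR.put("S'", List.of(List.<String>of(), List.of("+", "S")));
        GRAMMAR.put("E", List.of(List.of("number"), List.of("(", "S", ")")));
    }

    // Start with nothing nullable and apply the rules until no change.
    static Set<String> nullable() {
        Set<String> nullable = new HashSet<>();
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String x : GRAMMAR.keySet())
                for (List<String> rhs : GRAMMAR.get(x))
                    // allMatch is true for an empty RHS, which covers X -> epsilon
                    if (rhs.stream().allMatch(nullable::contains) && nullable.add(x))
                        changed = true;
        }
        return nullable;
    }

    // FIRST of a symbol string gamma, given the current FIRST of each non-terminal.
    static Set<String> firstOf(List<String> gamma, Set<String> nullable,
                               Map<String, Set<String>> first) {
        Set<String> result = new TreeSet<>();
        for (String symbol : gamma) {
            if (GRAMMAR.containsKey(symbol)) result.addAll(first.get(symbol)); // FIRST(X gamma) includes FIRST(X)
            else result.add(symbol);                                           // FIRST(a gamma) = { a }
            if (!nullable.contains(symbol)) break; // look further only past nullable symbols
        }
        return result;
    }

    // Start with empty FIRST sets and apply the rules until no change.
    static Map<String, Set<String>> first(Set<String> nullable) {
        Map<String, Set<String>> first = new LinkedHashMap<>();
        for (String x : GRAMMAR.keySet()) first.put(x, new TreeSet<>());
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String x : GRAMMAR.keySet())
                for (List<String> rhs : GRAMMAR.get(x))
                    if (first.get(x).addAll(firstOf(rhs, nullable, first)))
                        changed = true;
        }
        return first;
    }
}
```

The loops terminate because each pass either adds at least one symbol to some set or stops, and the sets are bounded by the finite alphabet of the grammar.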
Computing FOLLOW
• FOLLOW(S) ⊇ { $ }
• If X → α Y β, FOLLOW(Y) ⊇ FIRST(β)
• If X → α Y β and β is nullable (or non-existent), FOLLOW(Y) ⊇ FOLLOW(X)
• Algorithm: assume FOLLOW(X) = { } for all X, apply rules repeatedly until no change
• Common theme: iterative analysis. Start with an initial assignment, apply rules until no change
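The FOLLOW rules can be iterated in the same style. In this sketch the class `FollowSets` and the grammar encoding are hypothetical, and the nullable set and FIRST sets for the grammar S → E S’, S’ → ε | + S, E → number | ( S ) are written in by hand so the block stays self-contained.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: iterative computation of FOLLOW for every non-terminal.
class FollowSets {
    // Non-terminals are the map keys; an empty right-hand side is epsilon.
    static final Map<String, List<List<String>>> GRAMMAR = new LinkedHashMap<>();
    static {
        GRAMMAR.put("S", List.of(List.of("E", "S'")));
        GRAMMAR.put("S'", List.of(List.<String>of(), List.of("+", "S")));
        GRAMMAR.put("E", List.of(List.of("number"), List.of("(", "S", ")")));
    }

    // Assumed inputs, worked out by hand for this grammar.
    static final Set<String> NULLABLE = Set.of("S'");
    static final Map<String, Set<String>> FIRST = Map.of(
        "S", Set.of("number", "("),
        "S'", Set.of("+"),
        "E", Set.of("number", "("));

    static Map<String, Set<String>> follow(String start) {
        Map<String, Set<String>> follow = new LinkedHashMap<>();
        for (String x : GRAMMAR.keySet()) follow.put(x, new TreeSet<>());
        follow.get(start).add("$");                     // FOLLOW(S) includes $
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String x : GRAMMAR.keySet())
                for (List<String> rhs : GRAMMAR.get(x))
                    for (int i = 0; i < rhs.size(); i++) {
                        String y = rhs.get(i);
                        if (!GRAMMAR.containsKey(y)) continue;  // only non-terminals have FOLLOW
                        Set<String> toAdd = new TreeSet<>();
                        boolean restNullable = true;            // is beta (the rest) nullable so far?
                        for (int j = i + 1; j < rhs.size() && restNullable; j++) {
                            String symbol = rhs.get(j);
                            if (GRAMMAR.containsKey(symbol)) toAdd.addAll(FIRST.get(symbol));
                            else toAdd.add(symbol);             // FOLLOW(Y) includes FIRST(beta)
                            restNullable = NULLABLE.contains(symbol);
                        }
                        if (restNullable) toAdd.addAll(follow.get(x)); // beta nullable or absent
                        if (follow.get(y).addAll(toAdd)) changed = true;
                    }
        }
        return follow;
    }
}
```

Because S → E S’ ends in a nullable S’, everything that can follow S also follows E, which is how `$` and `)` reach FOLLOW(E) alongside `+`.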
Applying Rules
S → E S’
S’ → ε | + S
E → number | ( S )
• nullable
  • only S’ is nullable
• FIRST
  • FIRST(E S’) = { number, ( }
  • FIRST(+ S) = { + }
  • FIRST(number) = { number }
  • FIRST( ( S ) ) = { ( }
• FOLLOW
  • FOLLOW(S) = { $, ) }
  • FOLLOW(S’) = { $, ) }
  • FOLLOW(E) = { +, ), $ }
Completing the parser
Now we know how to construct a recursive-descent parser for an LL(1) grammar. Can we use recursive descent to build an abstract syntax tree too?
Creating the AST
    abstract class Expr { }

    class Add extends Expr {
        Expr left, right;
        Add(Expr L, Expr R) { left = L; right = R; }
    }

    class Num extends Expr {
        int value;
        Num(int v) { value = v; }
    }
Class hierarchy: Expr is the base class; Add and Num extend it.
AST Representation
Input: (1 + 2 + (3 + 4)) + 5
[Figure: the abstract tree of + nodes over 1, 2, 3, 4, 5 next to its object representation, Add(Add(Num(1), Add(Num(2), Add(Num(3), Num(4)))), Num(5))]
How can we generate this structure during recursive-descent parsing?
Creating the AST
• Just add code to each parsing routine to create the appropriate nodes!
• Works because the parse tree and the call tree have the same shape
• parse_S, parse_S’, parse_E all return an Expr
AST creation code
    Expr parse_E() {
        switch (token) {
            case number: {                      // E → number
                Expr result = new Num(token.value);
                token = input.read();
                return result;
            }
            case '(': {                         // E → ( S )
                token = input.read();
                Expr result = parse_S();
                if (token != ')') throw new ParseError();
                token = input.read();
                return result;
            }
            default:
                throw new ParseError();
        }
    }
parse_S
S → E S’
S’ → ε | + S
E → number | ( S )
    Expr parse_S() {
        switch (token) {
            case number:
            case '(':
                Expr left = parse_E();
                Expr right = parse_S’();    // null when S’ → ε
                if (right == null) return left;
                else return new Add(left, right);
            default:
                throw new ParseError();
        }
    }
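Putting the AST routines together gives a small runnable parser. This is a sketch under assumptions rather than the lecture's code: the wrapper class `AstParser`, its toy scanner `read`, the `toString` methods, and the convention that `parseSPrime` (standing in for parse_S’) returns `null` for ε are all illustrative choices.

```java
// Hypothetical sketch: recursive descent that returns an AST.
class AstParser {
    abstract static class Expr { }

    static final class Num extends Expr {
        final int value;
        Num(int value) { this.value = value; }
        @Override public String toString() { return "Num(" + value + ")"; }
    }

    static final class Add extends Expr {
        final Expr left, right;
        Add(Expr left, Expr right) { this.left = left; this.right = right; }
        @Override public String toString() { return "Add(" + left + ", " + right + ")"; }
    }

    static class ParseError extends RuntimeException {
        ParseError(String message) { super(message); }
    }

    static final char EOF = '\0';        // assumed end-of-input token
    static final char NUMBER = '\u0001'; // assumed token kind for integer literals

    private final String text;
    private int pos = 0;
    private int value;                   // value of the most recent NUMBER token
    private char token;

    AstParser(String text) { this.text = text; this.token = read(); }

    // Toy scanner: skips spaces, reads a run of digits as one NUMBER.
    private char read() {
        while (pos < text.length() && text.charAt(pos) == ' ') pos++;
        if (pos >= text.length()) return EOF;
        char c = text.charAt(pos);
        if (Character.isDigit(c)) {
            int start = pos;
            while (pos < text.length() && Character.isDigit(text.charAt(pos))) pos++;
            value = Integer.parseInt(text.substring(start, pos));
            return NUMBER;
        }
        pos++;
        return c;
    }

    private Expr parseS() {                              // S -> E S'
        if (token != NUMBER && token != '(') throw new ParseError("unexpected token in S");
        Expr left = parseE();
        Expr right = parseSPrime();
        return right == null ? left : new Add(left, right);
    }

    private Expr parseSPrime() {                         // S' -> epsilon | + S
        switch (token) {
            case '+': token = read(); return parseS();
            case ')': case EOF: return null;             // epsilon: nothing to add
            default: throw new ParseError("unexpected token in S'");
        }
    }

    private Expr parseE() {                              // E -> number | ( S )
        switch (token) {
            case NUMBER: { Expr result = new Num(value); token = read(); return result; }
            case '(': {
                token = read();
                Expr result = parseS();
                if (token != ')') throw new ParseError("expected )");
                token = read();
                return result;
            }
            default: throw new ParseError("unexpected token in E");
        }
    }

    static Expr parse(String text) {
        AstParser parser = new AstParser(text);
        Expr result = parser.parseS();
        if (parser.token != EOF) throw new ParseError("trailing input");
        return result;
    }
}
```

Because S’ → + S recurses on the right, `parse("1+2+3")` nests as Add(Num(1), Add(Num(2), Num(3))). That is harmless for addition, but a left-associative operator such as subtraction would need the tree built differently.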
An Interpreter!
    int parse_E() {
        switch (token) {
            case number: {
                int result = token.value;
                token = input.read();
                return result;
            }
            case '(': {
                token = input.read();
                int result = parse_S();
                if (token != ')') throw new ParseError();
                token = input.read();
                return result;
            }
            default:
                throw new ParseError();
        }
    }

    int parse_S() {
        switch (token) {
            case number:
            case '(':
                int left = parse_E();
                int right = parse_S’();     // 0 when S’ → ε
                if (right == 0) return left;
                else return left + right;
            default:
                throw new ParseError();
        }
    }
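The same skeleton can evaluate the expression directly instead of building a tree. A minimal sketch, assuming a toy scanner and the hypothetical names `CalcInterpreter`, `read`, and `eval`; `parseSPrime` stands in for parse_S’ and returns 0 for ε, so the sum needs no special case.

```java
// Hypothetical sketch: recursive descent that evaluates while it parses.
class CalcInterpreter {
    static class ParseError extends RuntimeException {
        ParseError(String message) { super(message); }
    }

    static final char EOF = '\0';        // assumed end-of-input token
    static final char NUMBER = '\u0001'; // assumed token kind for integer literals

    private final String text;
    private int pos = 0;
    private int value;                   // value of the most recent NUMBER token
    private char token;

    CalcInterpreter(String text) { this.text = text; this.token = read(); }

    // Toy scanner: skips spaces, reads a run of digits as one NUMBER.
    private char read() {
        while (pos < text.length() && text.charAt(pos) == ' ') pos++;
        if (pos >= text.length()) return EOF;
        char c = text.charAt(pos);
        if (Character.isDigit(c)) {
            int start = pos;
            while (pos < text.length() && Character.isDigit(text.charAt(pos))) pos++;
            value = Integer.parseInt(text.substring(start, pos));
            return NUMBER;
        }
        pos++;
        return c;
    }

    private int parseS() {                               // S -> E S'
        if (token != NUMBER && token != '(') throw new ParseError("unexpected token in S");
        int left = parseE();
        int right = parseSPrime();
        return left + right;                             // right is 0 when S' -> epsilon
    }

    private int parseSPrime() {                          // S' -> epsilon | + S
        switch (token) {
            case '+': token = read(); return parseS();
            case ')': case EOF: return 0;
            default: throw new ParseError("unexpected token in S'");
        }
    }

    private int parseE() {                               // E -> number | ( S )
        switch (token) {
            case NUMBER: { int result = value; token = read(); return result; }
            case '(': {
                token = read();
                int result = parseS();
                if (token != ')') throw new ParseError("expected )");
                token = read();
                return result;
            }
            default: throw new ParseError("unexpected token in E");
        }
    }

    static int eval(String text) {
        CalcInterpreter interpreter = new CalcInterpreter(text);
        int result = interpreter.parseS();
        if (interpreter.token != EOF) throw new ParseError("trailing input");
        return result;
    }
}
```

For example, `CalcInterpreter.eval("(1+2+(3+4))+5")` returns 15. Using 0 as the "nothing parsed" value works only because 0 is the identity for addition; a grammar with other operators would need a different signal.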
Summary
• We can build a recursive-descent parser for LL(1) grammars
• Construct the parsing table using nullable, FIRST, and FOLLOW
• Translate the table to recursive-descent code
• The systematic approach avoids errors and detects ambiguities (as table conflicts)
• Next time: converting a grammar to LL(1) form, bottom-up parsing