1 / 64

LL(k) Parsing

Learn about context-free grammars for parsing in compilers, derivation techniques, parse trees, ambiguous grammars, and parser implementation with a focus on LL(k) parsing.

jaddison
Download Presentation

LL(k) Parsing

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. LL(k) Parsing Compiler Baojian Hua bjhua@ustc.edu.cn

  2. Front End lexical analyzer source code tokens abstract syntax tree parser semantic analyzer IR

  3. Parsing • The parser translates the source program into abstract syntax trees • Token sequence: • from the lexer • abstract syntax trees: • compiler internal data structures for programs • check (syntactic) validity of programs • Must take account the program syntax

  4. Conceptually parser token sequence abstract syntax tree language syntax

  5. Syntax: Context-free Grammar • Context-free grammars are (often) given by BNF expressions (Backus-Naur Form) • read dragon-book sec 2.2 • More powerful than RE in theory • Good for defining language syntax

  6. Context-free Grammar (CFG) • A CFG consists of 4 components: • a set of terminals (tokens): T • a set of nonterminals: N • a set of production rules: P • s -> t1 t2 … tn • with sN, and t1, …, tn (T ∪N) • a unique start nonterminal: S

  7. Example // SLP as in Tiger book chap. 1 (simplified): N = {S, E} T = {SEMICOLON, ID, IF, ASSIGN, …} S = S S -> S SEMICOLON S | ID ASSIGN E | PRINT LPAREN E RPAREN E -> ID | NUM | E PLUS E | E TIMES E

  8. Derivation • A derivation: • Starts with the unique start nonterminal S • repeatedly replacing a right-hand nonterminal s by the body of a production rule of the nonterminal s • stop when right-hand are all terminals • The final string consists of terminals only and is called a sentence (program)

  9. Example S -> S ; S | id := E | print (E) E -> … S -> … (a choice) derive me x := 5; print (x)

  10. Example S -> S ; S | id := E | print (E) E -> id | num | E + E | E * E S -> S ; S -> x := E ; S -> x := 5 ; S -> x := 5 ; print (E) -> x := 5 ; print (x) derive me x := 5; print (x)

  11. Another Try to Derive the same Program S -> S ; S | id := E | print (E) E -> … S -> x := E -> x := 5 -> // stuck! :-( derive me x := 5; print (x)

  12. Derivation • For same string, there may exist many different derivations • left-most derivation • right-most derivation • Parsing is the problem of taking a string of terminals and figure out whether it could be derived from a CFG • error-detection

  13. Parse Trees • Derivation can also be represented as trees • useful to understand AST (discussed later) • Idea: • each internal node is labeled with a nonterminal • each leaf node is labeled with a terminal • each use of a rule in a derivation explains how to generate children in the parse tree from the parent

  14. Example S -> S ; S | … S S S ; derive me := E print ( E ) x x := 5; print (x) 5 x

  15. Parse Tree has Meanings:post-order traversal S -> S ; S | … S S S ; derive me := E print ( E ) x x := 5; print (x) 5 x

  16. Ambiguous Grammars • A grammar is ambiguous if the same sequence of tokens can give rise to two or more different parse trees

  17. Example E -> num | id | E + E | E * E E -> E + E -> 3 + E -> 3 + E * E -> 3 + 4 * E -> 3 + 4 * 5 derive me E -> E * E -> E + E * E -> 3 + E * E -> 3 + 4 * E -> 3 + 4 * 5 3+4*5

  18. Example E E + E 3 E * E E -> num | id | E + E | E * E 4 5 E -> E + E -> 3 + E -> 3 + E * E -> 3 + 4 * E -> 3 + 4 * 5 E E * E E -> E * E -> E + E * E -> 3 + E * E -> 3 + 4 * E -> 3 + 4 * 5 5 E + E 3 4

  19. Ambiguous Grammars • Problem: compilers make use of parse trees to interpret the meaning of parsed programs • different parse trees have different meanings • eg: 4 + 5 * 6 is not (4 + 5) * 6 • languages with ambiguous grammars are DISASTROUS; the meaning of programs isn’t well-defined! You can’t tell what your program might do! • Solution: rewrite grammar to equivalent forms

  20. Eliminating ambiguity • In programming language syntax, ambiguity often arises from missing operator precedence or associativity • * is of high precedence than + • both + and * are left-associative • Why or why not? • Rewrite grammar to take account of this

  21. Example E -> num | id | E + E | E * E E -> E + T | T T -> T * F | F F -> num | id Q: is the right grammar ambiguous? Why or why not?

  22. Parser • A program to check whether a program is derivable from a given grammar • expensive in general • must be fast • to compile a 2000k lines of kernel • even for small application code, speed may be a concern • Theorists have developed specialized kind of grammar which may be parsed efficiently • LL(k) and LR(k)

  23. Recursive Decedent Parsing

  24. Predictive parsing • A.K.A: Recursive descent parsing, top-down parsing • simple to code by hand • efficient • can parse a large set of grammar • your Tiger compiler will use this • Key idea: • one (recursive) function for each nonterminal • one clause for each right-hand production rule

  25. Connecting with the lexer (* step #1: represent tokens *) token = ID | IF | NUM | ASSIGN | SEMICOLON | LPAREN | RPAREN | … (* step #2: connect with lexer *) token current_token; /* external var */ void eat (token t) = if (current_token = t) current_token = Lex_nextToken (); else error (“want “, t, “but got”, current_token)

  26. parse_stm() parse_exp() (* step #1: cook a lexer, including tokens *) struct token current_token = lex (); (* step #2: build the parser *) void parse_stm () = switch (current_token) case ID => eat (ID); eat (ASSIGN); parse_exp (); case PRINT => eat (PRINT); eat (LPAREN); parse_exp (); eat (RPAREN); default =>error(“want ID, PRINT”); void parse_exp () = switch (current_token) case ID: ??? case NUM: ??? // backtracking!! stm -> stm ; stm | id := exp | print (exp) exp -> ID | NUM | exp + exp | exp * exp

  27. How to handle precedence? void parse_stm_all () parse_stm(); while (current_token == “;”) eat (;); parse_stm (); void parse_exp_plus () parse_exp_times(); while (current_token == “+”) eat (+); parse_exp_times(); stm ; stm ; stm ; stm 2+3*4+5*7 Generally: if there are n level of precedence, one may write n parsing functions.

  28. Moral • The key point in predicative parsing is to determine the production rule to use (recursive function to call) • must know the “start” symbols of each rule • “start” symbol must not overlap • e.g.: • exp -> NUM | ID • This motivates the idea of first and follow sets

  29. First & Follow

  30. Moral • For nonterminal S, and current input token t • if wk starts with t, then choose wk, or • if wk derives empty string, and the string follow S starts with t • First symbol sets of wi (1<=i<=n) don’t overlap to avoid backtracking S -> w1 -> w2 -> … -> wn

  31. Nullable, First and Follow sets • To use predicative parsing, we must compute: • Nullable: nonterminals that derive empty string • First(ω) : set of terminals that can begin any string derivable from ω • Follow(X): set of terminals that can immediately follow any string derivable from nonterminal X • Read tiger sec 3.2 • Fixpoint algorithms

  32. Nullable, First and Follow sets Z -> d -> X Y Z Y -> c -> X -> Y -> a • Which symbol X, Y and Z can derive empty string? • What terminals may the string derived from X, Y and Z begin with? • What terminals may follow X, Y and Z?

  33. Nullable • If X can derive an empty string, iff: • base case: • X -> • inductive case: • X -> Y1 … Yn • Y1, …, Yn are n nonterminals and may all derive empty strings

  34. Computing Nullable /* Nullable: a set of nonterminals */ Nullable <- {}; while (Nullable still changes) for (each production X -> α) switch (α) case : Nullable ∪= {X}; break; case Y1 … Yn: if (Y1Nullable && … && YnNullable) Nullable ∪= {X}; break;

  35. Example: Nullables Z -> d -> X Y Z Y -> c -> X -> Y -> a

  36. Example: Nullables Z -> d -> X Y Z Y -> c -> X -> Y -> a

  37. Example: Nullables Z -> d -> X Y Z Y -> c -> X -> Y -> a

  38. First(X) • Set of terminals that X begins with: • X => a… • Rules • base case: • X -> a • First (X) ∪= {a} • inductive case: • X -> Y1 Y2 … Yn • First (X) ∪= First(Y1) • if Y1Nullable, First (X) ∪= First(Y2) • if Y1,Y2 Nullable, First (X) ∪= First(Y3) • …

  39. Computing First // Suppose Nullable set has been computed foreach (nonterminal X) First(X) <- {}; while (some First set still changes) for (each production X -> α) switch (α) case a: First(X) ∪= {a}; break; case Y1 … Yn: First(X) ∪= First(Y1); if (Y1 \not\in Nullable) break; First(X) ∪= First(Y2); …; // Similar as above

  40. Example: First Z -> d -> X Y Z Y -> c -> X -> Y -> a Nullable = {X, Y}

  41. Example: First Z -> d -> X Y Z Y -> c -> X -> Y -> a Nullable = {X, Y}

  42. Example: First Z -> d -> X Y Z Y -> c -> X -> Y -> a Nullable = {X, Y}

  43. Example: First Z -> d -> X Y Z Y -> c -> X -> Y -> a Nullable = {X, Y}

  44. Parsing with First Z -> d {d} -> X Y Z {a, c, d} Y -> c {c} -> {} X -> Y {c} -> a {a} Now consider this string: d Suppose we choose the production: Z -> X Y Z But we get stuck at: X -> Y -> a But neither can accept d! What’s the problem? Nullable = {X, Y}

  45. Follow(X) • Set of terminals that may follow X: • S => … X a… • Rules: • Base case: • Follow (X) = {} • inductive case: • Y -> ω1 X ω2 • Follow(X) ∪= Fisrt(ω2) • if ω2 is Nullable, Follow(X) ∪= Follow(Y)

  46. Computing Follow(X) foreach (nonterminal X) Follow(X) <- {}; while (some Follow still changes) { for (each production Y -> ω1 X ω2 ) Follow(X) ∪= First (ω2); if (ω2 is Nullable) Follow(X) ∪= Follow (Y);

  47. Example: Follow Z -> d -> X Y Z Y -> c -> X -> Y -> a Nullable = {X, Y}

  48. Example: Follow Z -> d -> X Y Z Y -> c -> X -> Y -> a Nullable = {X, Y}

  49. Example: Follow Z -> d -> X Y Z Y -> c -> X -> Y -> a Nullable = {X, Y}

  50. Predicative Parsing Table • With Nullables, First(), and Follow(), we can make a parsing table P(N,T) • each entry contains a set of productions t1 t2 t3 t4 … $(EOF) N1 ri N2 rk N3 rj …

More Related