Compiler Construction

Compiler Construction Syntax Analysis Top-down parsing

Syntax Analysis, continued

Syntax analysis • Last week we covered • The goal of syntax analysis • Context-free grammars • Top-down parsing (a simple but weak parsing method) • Today, we will • Wrap up top-down parsing, including LL(1) • Start on bottom-up parsing • Shift-reduce parsers • LR parsers: SLR(1), LR(1), LALR(1)

Top-Down Parsing

Recursive descent (Last Week) • Recursive descent parsers simply try to build a parse tree, top-down, and BACKTRACK on failure. • Recursion and backtracking are inefficient. • It would be better if we always knew the correct action to take. • It would be better if we could avoid recursive procedure calls during parsing. • PREDICTIVE PARSERS can solve both problems.

Predictive parsers • A predictive parser always knows which production to use, so backtracking is not necessary. • Example: for the productionsstmt -> if ( expr ) stmt else stmt | while ( expr ) stmt | for ( stmt expr stmt ) stmt • a recursive descent parser would always know which production to use, depending on the input token.

Transition diagrams • Transition diagrams can describe recursive parsers, just like they can describe lexical analyzers, but the diagrams are slightly different. • Construction: • Eliminate left recursion from G • Left factor G • For each non-terminal A, do • Create an initial and final (return) state • For each production A -> X1 X2 … Xn, create a path from the initial to the final state with edges X1 X2 … Xn.

Using transition diagrams • Begin in the start state for the start symbol • When we are in state s with edge labeled by terminal a to state t, if the next input symbol is a, move to state t and advance the input pointer. • For an edge to state t labeled with non-terminal A, jump to the transition diagram for A, and when finished, return to state t • For an edge labeled ε, move immediately to t. • Example (4.15 in text): parse the string “id + id * id”

Example transition diagrams • An expression grammar with left recursion and ambiguity removed: • E -> T E’ • E’ -> + T E’ | ε • T -> F T’ • T’ -> * F T’ | ε • F -> ( E ) | id Corresponding transition diagrams:

Predictive parsing without recursion • To get rid of the recursive procedure calls, we maintain our own stack.

The parsing table and parsing program • The table is a 2D array M[A,a] where A is a nonterminal symbol and a is a terminal or $. • At each step, the parser considers the top-of-stack symbol X and input symbol a: • If both are $, accept • If they are the same (nonterminals), pop X, advance input • If X is a nonterminal, consult M[X,a]. If M[X,a] is “ERROR” call an error recovery routine. Otherwise, if M[X,a] is a production of he grammar X -> UVW, replace X on the stack with WVU (U on top)

Example • Use the table-driven predictive parser to parseid + id * id • Assuming parsing table Initial stack is $E Initial input is id + id * id $

Building a predictive parse table • We still don’t know how to create M, the parse table. • The construction requires two functions: FIRST and FOLLOW. • For a string of grammar symbols α, FIRST(α) is the set of terminals that begin all possible strings derived from α. If α =*> ε, then ε is also in FIRST(α). • FOLLOW(A) for nonterminal A is the set of terminals that can appear immediately to the right of A in some sentential form. If A can be the last symbol in a sentential form, then $ is also in FOLLOW(A).

How to compute FIRST(α) • If X is a terminal, FIRST(X) = X. • Otherwise (X is a nonterminal), • 1. If X -> ε is a production, add ε to FIRST(X) • 2. If X -> Y1… Yk is a production, then place a in FIRST(X) if for some i, a is in FIRST(Yi) and Y1…Yi-1 =*> ε. • Given FIRST(X) for all single symbols X, • Let FIRST(X1…Xn) = FIRST(X1) • If ε ∈ FIRST(X1), then add FIRST(X2), and so on…

How to compute FOLLOW(A) • Place $ in FOLLOW(S) (for S the start symbol) • If A -> α B β, then FIRST(β)-ε is placed in FOLLOW(B) • If there is a production A -> α B or a production A -> α B β where β =*> ε, then everything in FOLLOW(A) is in FOLLOW(B). • Repeatedly apply these rules until no FOLLOW set changes.

Example FIRST and FOLLOW • For our favorite grammar:E -> TE’E’ -> +TE | εT -> FT’T’ -> *FT’ | εF -> (E) | id • What is FIRST() and FOLLOW() for all nonterminals?

Parse table construction withFIRST/FOLLOW • Basic idea: if A -> α and a is in FIRST(α), then we expand A to α any time the current input is a and the top of stack is A. • Algorithm: • For each production A -> α in G, do: • For each terminal a in FIRST(α) add A -> α to M[A,a] • If ε ∈ FIRST(α), for each terminal b in FOLLOW(A), do: • add A -> α to M[A,b] • If ε ∈ FIRST(α) and $ is in FOLLOW(A), add A -> α to M[A,$] • Make each undefined entry in M[ ] an ERROR

Example predictive parse table construction • For our favorite grammar:E -> TE’E’ -> +TE | εT -> FT’T’ -> *FT’ | εF -> (E) | id • What the predictive parsing table?

LL(1) grammars • The predictive parser algorithm can be applied to ANY grammar. • But sometimes, M[ ] might have multiply defined entries. • Example: for if-else statements and left factoring:stmt -> if ( expr ) stmt optelseoptelse -> else stmt | ε • When we have “optelse” on the stack and “else” in the input, we have a choice of how to expand optelse (“else” is in FOLLOW(optelse) so either rule is possible)

LL(1) grammars • If the predictive parsing construction for G leads to a parse table M[ ] WITHOUT multiply defined entries,we say “G is LL(1)” 1 symbol of lookahead Leftmost derivation Left-to-right scan of the input

LL(1) grammars • Necessary and sufficient conditions for G to be LL(1): • If A -> α | β • There does not exist a terminal a such thata ∈ FIRST(α) and a ∈ FIRST(β) • At most one of α and β derive ε • If β =*> ε, then FIRST(α) does not intersect with FOLLOW(β). This is the same as saying the predictive parser always knows what to do!

Fig. 4.13. Model of a nonrecursive predictive parser.

Nonrecursive Predictive Parsing • 1. If X = a = $, the parser halts and announces successful completion of parsing. • 2. If X = a $, the parser pops X off the stack and advances the input pointer to the next input symbol. • 3. If X is a nonterminal, the program consults entry M[X, a] of the parsing table M. This entry will be either an X-production of the grammar or an error entry. If, for example, M[X, a] = {X UVW}, the parser replaces X on top of the stack by WVU (with U on top). As output, we shall assume that the parser just prints the production used; any other code could be executed here. If M[X, a] = error, the parser calls an error recovery routine.

Fig. 4.15. Parsing table M for grammar (4.11).

Fig. 4.16. Moves made by predictive parser on input id + id * id.

Fig. 4.17. Parsing table M for grammar (4.13). (Grammar 4.13 )

Fig. 4.18. Synchronizing tokens added to parsing table of Fig. 4.15.

Fig. 4.19. Parsing and error recovery moves made by predictive parser.

Top-down parsing summary • RECURSIVE DESCENT parsers are easy to build, but inefficient, and might require backtracking. • TRANSITION DIAGRAMS help us build recursive descent parsers. • For LL(1) grammars, it is possible to build PREDICTIVE PARSERS with no recursion automatically. • Compute FIRST() and FOLLOW() for all nonterminals • Fill in the predictive parsing table • Use the table-driven predictive parsing algorithm

Bottom-Up Parsing

Bottom-up parsing • Now, instead of starting with the start symbol and working our way down, we will start at the bottom of the parse tree and work our way up. • The style of parsing is called SHIFT-REDUCE • SHIFT refers to pushing input symbols onto a stack. • REDUCE refers to “reduction steps” during a parse: • We take a substring matching the RHS of a rule • Then replace it with the symbol on the LHS of the rule • If you can reduce until you have just the start symbol, you have succeeded in parsing the input string.

Reduction example • S -> aABe • Grammar: A -> Abc | b Input: abbcbcde • B -> d • Reduction steps: abbcbcde • aAbcbcde • aAbcde • aAde • aABe • S <-- SUCCESS! In reverse, the reduction traces out a rightmost derivation.

Handles • The HANDLE is the part of a sentential form that gets reduced in a backwards rightmost derivation. • Sometimes part of a sentential form will match a RHS in G, but if that string is NOT reduced in the backwards rightmost derivation, it is NOT a handle. • Shift-reduce parsing, then, is really all about finding the handle at each step then reducing the handle. • If we can always find the handle, we never have to backtrack. • Finding the handle is called HANDLE PRUNING.

Fig. 4.23. Operator-precedence relations.

Operator-Precedence Relations from Associativity and Precedence (1/2) • 1. If operator θ1 has higher precedence than operator θ2, make θ1·> θ2 and θ2 <·θ1 . For example, if * has higher precedence than +, make * ·> + and + <· *. These relations ensure that, in an expression of the form E+E*E+E, the central E*E is the handle that will be reduced first. • 2. If θ1 and θ2 are operators of equal precedence (they may in fact be the same operator), then make θ1·> θ2 and θ2·> θ1 if the operators are left-associative, or make θ1 <·θ2 and θ2 <·θ1 if they are right-associative. For example, if + and – are left-associative, then make + ·> +, + ·> -, - ·> - and - ·> +. If  is right associative, then make  <· . These relations ensure that E-E+E will have handle E-E selected and EEE will have the last EE selected.

Operator-Precedence Relations from Associativity and Precedence (2/2) • 3. Make θ <·id, id·> θ, θ <· (, ( <·θ , ) ·> θ, θ·> ), θ·> $, and $ <·θ for all operators θ. Also, let • These rules ensure that both id and (E) will be reduced to E. Also, $ serves as both the left and right endmarker, causing handles to be found between $’s wherever possible. ·

· Fig. 4.25. Operator-precedence relations.

Precedence Functions • Example 4.29 The Precedence table of Fig. 4.25 has the following pair of precedence functions, • For example, * <· id and, f(*) < g(id). Note that f(id) > g(id) suggests that id ·> id; but, in fact, no precedence relation holds between id and id. Other error entries in Fig. 4.25 are similarly replaced by one or another precedence relation.

Fig. 4.26. Graph representing precedence functions.

Fig. 4.28. Operator-precedence matrix with error entries ·

Shift-reduce parsing with a stack • A stack helps us find the handle for each reduction step. • The stack holds grammar symbols. • An input buffer holds the input string. • $ marks the bottom of the stack and the end of input. • Algorithm: • Shift 0 or more input symbols onto the stack, until a handle β is on top of the stack. • Reduce β to the LHS of the appropriate production. • Repeat until we see $S on stack and $ in input.

Shift-reduce example • E -> E + E • Grammar: E -> E * E w = id + id * id • E -> ( E ) • E -> id • STACK INPUT ACTION • 1. $ id+id*id$ shift

Shift-reduce parsing actions • SHIFT: The next input symbol is pushed onto the stack. • REDUCE: When the parser knows the right end of a handle is on the stack, the handle is replaced with the corresponding LHS. • ACCEPT: Announce success (input is $, stack is $S) • ERROR: The input contained a syntax error; call an error recovery routine.

Conflicts during shift/reduce parsing • Like predictive parsers, sometimes a shift-reduce parser won’t know what to do. • A SHIFT/REDUCE conflict occurs when the parser can’t decide whether to shift the input symbol or reduce the current top of stack. • A REDUCE/REDUCE conflict occurs when the parser doesn’t know which of two or more rules to use for reduction. • A grammar whose shift-reduce parser contains errors is said to be “Not LR”

Example shift/reduce conflict • Ambiguous grammars are NEVER LR. • stmt -> if ( expr ) stmt • | if ( expr ) stmt else stmt • | other • If we have a shift-reduce parser in configuration • STACK INPUT • … if ( expr ) stmt else … $ • what to do? • We could reduce “if ( expr ) stmt” to “stmt” (assuming the else is part of a different surrounding if-else statement) • We could also shift the “else” (assuming this else goes with the current if)

Example reduce/reduce conflict • Some languages use () for function calls AND array refs. • stmt -> id ( parameter_list ) • stmt -> expr := expr • parameter_list -> parameter_list , parameter • parameter_list -> parameter • parameter -> id • expr -> id ( expr_list ) • expr -> id • expr_list -> expr_list , expr • expr_list -> expr

Example reduce/reduce conflict • For input A(I,J) we would get token stream id(id,id) • The first three tokens would certainly be shifted: • STACK INPUT • … id ( id , id ) … • The id on top of the stack needs to be reduced, but we have two choices: parameter -> id OR expr -> id • The stack gives no clues. To know which rule to use, we need to look up the first ID in the symbol table to see if it is a procedure name or an array name. • One solution is to have the lexer return “procid” for procedure names. Then the shift-reduce parser can look into the stack to decide which reduction to use.

LR (Bottom-Up) Parsers

Relationship between parser types

LR parsing • A major type of shift-reduce parsing is called LR(k). • “L” means left-to-right scanning of the input • “R” means rightmost derivation • “k” means lookahead of k characters (if omitted, assume k=1) • LR parsers have very nice properties: • They can recognize almost all programming language constructs for which we can write a CFG • They are the most powerful type of shift-reduce parser, but they never backtrack, and are very efficient • They can parse a proper superset of the languages parsable by predictive parsers • They tell you as soon as possible when there’s a syntax error. • DISADVANTAGE: hard to build by hand (we need something like yacc)

Compiler Construction