Learn about top-down parsing in compiler construction, including LL(1) parsing and the transition from top-down to bottom-up parsing. Explore shift-reduce and LR parsers: SLR(1), LR(1), and LALR(1).
Compiler Construction Syntax Analysis Top-down parsing
Syntax analysis • Last week we covered • The goal of syntax analysis • Context-free grammars • Top-down parsing (a simple but weak parsing method) • Today, we will • Wrap up top-down parsing, including LL(1) • Start on bottom-up parsing • Shift-reduce parsers • LR parsers: SLR(1), LR(1), LALR(1)
Recursive descent (Last Week) • Recursive descent parsers simply try to build a parse tree, top-down, and BACKTRACK on failure. • Recursion and backtracking are inefficient. • It would be better if we always knew the correct action to take. • It would be better if we could avoid recursive procedure calls during parsing. • PREDICTIVE PARSERS can solve both problems.
Predictive parsers • A predictive parser always knows which production to use, so backtracking is not necessary. • Example: for the productions stmt -> if ( expr ) stmt else stmt | while ( expr ) stmt | for ( stmt expr stmt ) stmt • a recursive descent parser would always know which production to use from the input token alone, because each alternative begins with a different keyword (if, while, or for).
Transition diagrams • Transition diagrams can describe recursive parsers, just like they can describe lexical analyzers, but the diagrams are slightly different. • Construction: • Eliminate left recursion from G • Left factor G • For each non-terminal A, do • Create an initial and final (return) state • For each production A -> X1 X2 … Xn, create a path from the initial to the final state with edges X1 X2 … Xn.
Using transition diagrams • Begin in the start state for the start symbol • When we are in state s with edge labeled by terminal a to state t, if the next input symbol is a, move to state t and advance the input pointer. • For an edge to state t labeled with non-terminal A, jump to the transition diagram for A, and when finished, return to state t • For an edge labeled ε, move immediately to t. • Example (4.15 in text): parse the string “id + id * id”
Example transition diagrams • An expression grammar with left recursion and ambiguity removed: • E -> T E’ • E’ -> + T E’ | ε • T -> F T’ • T’ -> * F T’ | ε • F -> ( E ) | id Corresponding transition diagrams:
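Each transition diagram above maps onto one procedure per nonterminal: a terminal edge becomes "match this token", a nonterminal edge becomes a call, and an ε-edge becomes "return without consuming input". Here is a minimal Python sketch of that idea. It assumes tokens arrive as a list of strings such as "id" and "+", and the class and function names are illustrative only. One token of lookahead picks the path, so the parser never backtracks.

```python
# Recursive-descent predictive parser sketch for
#   E -> T E'    E' -> + T E' | ε
#   T -> F T'    T' -> * F T' | ε
#   F -> ( E ) | id
# Tokens are plain strings (an assumption of this sketch).

class ParseError(Exception):
    pass

class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks the end of input
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def match(self, terminal):
        # Terminal edge: consume the token or fail (no backtracking).
        if self.peek() != terminal:
            raise ParseError(f"expected {terminal!r}, found {self.peek()!r}")
        self.pos += 1

    def E(self):            # E -> T E'
        self.T()
        self.E_prime()

    def E_prime(self):      # E' -> + T E' | ε
        if self.peek() == "+":
            self.match("+")
            self.T()
            self.E_prime()
        # otherwise follow the ε-edge and return

    def T(self):            # T -> F T'
        self.F()
        self.T_prime()

    def T_prime(self):      # T' -> * F T' | ε
        if self.peek() == "*":
            self.match("*")
            self.F()
            self.T_prime()

    def F(self):            # F -> ( E ) | id
        if self.peek() == "(":
            self.match("(")
            self.E()
            self.match(")")
        else:
            self.match("id")

def parse(tokens):
    """Return True if tokens form an E followed by end of input."""
    p = Parser(tokens)
    try:
        p.E()
        p.match("$")
    except ParseError:
        return False
    return True

print(parse(["id", "+", "id", "*", "id"]))
print(parse(["id", "+", "*", "id"]))
```

The mutual recursion between E, T, and F is exactly what the table-driven parser replaces with an explicit stack.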
Predictive parsing without recursion • To get rid of the recursive procedure calls, we maintain our own stack.
The parsing table and parsing program • The table is a 2D array M[A,a] where A is a nonterminal symbol and a is a terminal or $. • At each step, the parser considers the top-of-stack symbol X and the current input symbol a: • If both are $, accept • If they are the same terminal, pop X and advance the input • If X is a terminal that does not match a, report a syntax error • If X is a nonterminal, consult M[X,a]. If M[X,a] is “ERROR”, call an error recovery routine. Otherwise, if M[X,a] is a production of the grammar X -> UVW, replace X on the stack with WVU (U on top)
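The driver loop just described can be sketched in Python as follows. The table M is written out by hand for the expression grammar E -> T E’ | E’ -> + T E’ | ε | T -> F T’ | T’ -> * F T’ | ε | F -> ( E ) | id; this sketch does not construct it. An empty list stands for an ε right-hand side, a missing key is an ERROR entry, and the parser records each production it applies.

```python
# Table-driven (nonrecursive) predictive parser sketch.
# M is hand-written for the expression grammar; [] means an ε right-hand side.
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

M = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}

def predictive_parse(tokens, start="E"):
    """Return the productions used (a leftmost derivation) or raise SyntaxError."""
    stack = ["$", start]                 # the end of the list is the top of stack
    inp = list(tokens) + ["$"]
    i = 0
    output = []
    while True:
        X, a = stack[-1], inp[i]
        if X == "$" and a == "$":
            return output                # accept
        if X not in NONTERMINALS:        # X is a terminal (or $)
            if X != a:
                raise SyntaxError(f"expected {X!r}, found {a!r}")
            stack.pop()
            i += 1
        else:
            rhs = M.get((X, a))
            if rhs is None:              # ERROR entry
                raise SyntaxError(f"no entry M[{X}, {a}]")
            stack.pop()
            stack.extend(reversed(rhs))  # push W V U so that U ends up on top
            output.append(f"{X} -> {' '.join(rhs) or 'ε'}")

for step in predictive_parse(["id", "+", "id", "*", "id"]):
    print(step)
```

On id + id * id this prints eleven productions, starting with E -> T E' and ending with E' -> ε.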
Example • Use the table-driven predictive parser to parse id + id * id • Assume the parsing table M is given • Initial stack is $E • Initial input is id + id * id $
Building a predictive parse table • We still don’t know how to create M, the parse table. • The construction requires two functions: FIRST and FOLLOW. • For a string of grammar symbols α, FIRST(α) is the set of terminals that can begin the strings derived from α. If α =*> ε, then ε is also in FIRST(α). • FOLLOW(A) for nonterminal A is the set of terminals that can appear immediately to the right of A in some sentential form. If A can be the last symbol in a sentential form, then $ is also in FOLLOW(A).
How to compute FIRST(α) • If X is a terminal, FIRST(X) = {X}. • Otherwise (X is a nonterminal), • 1. If X -> ε is a production, add ε to FIRST(X) • 2. If X -> Y1… Yk is a production, then place a in FIRST(X) if for some i, a is in FIRST(Yi) and Y1…Yi-1 =*> ε. If Y1…Yk =*> ε, add ε to FIRST(X). • Repeat until no FIRST set changes. • Given FIRST(X) for all single symbols X, • Let FIRST(X1…Xn) = FIRST(X1) − {ε} • If ε ∈ FIRST(X1), then add FIRST(X2) − {ε}, and so on… • If ε ∈ FIRST(Xi) for every i, add ε to FIRST(X1…Xn)
How to compute FOLLOW(A) • Place $ in FOLLOW(S) (for S the start symbol) • If there is a production A -> α B β, then FIRST(β) − {ε} is placed in FOLLOW(B) • If there is a production A -> α B or a production A -> α B β where β =*> ε, then everything in FOLLOW(A) is in FOLLOW(B). • Repeatedly apply these rules until no FOLLOW set changes.
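Both sets are fixed-point computations: keep applying the rules until nothing changes. Here is a Python sketch. It assumes a grammar encoded as a dict from each nonterminal to its right-hand sides (lists of symbols, with [] for ε) and uses the string "ε" as the empty-string marker. Any symbol that is not a key of the dict is treated as a terminal. The function names are illustrative. It is run here on the E, E', T, T', F expression grammar.

```python
EPS = "ε"

def first_of_string(symbols, first, nonterminals):
    """FIRST(X1...Xn) from the FIRST sets of single symbols."""
    result = set()
    for X in symbols:
        fx = first[X] if X in nonterminals else {X}
        result |= fx - {EPS}
        if EPS not in fx:
            return result
    result.add(EPS)            # every Xi derives ε (or the string is empty)
    return result

def compute_first(grammar):
    nts = set(grammar)
    first = {A: set() for A in nts}
    changed = True
    while changed:             # iterate until no FIRST set grows
        changed = False
        for A, rhss in grammar.items():
            for rhs in rhss:
                new = first_of_string(rhs, first, nts)
                if not new <= first[A]:
                    first[A] |= new
                    changed = True
    return first

def compute_follow(grammar, start, first):
    nts = set(grammar)
    follow = {A: set() for A in nts}
    follow[start].add("$")
    changed = True
    while changed:             # iterate until no FOLLOW set grows
        changed = False
        for A, rhss in grammar.items():
            for rhs in rhss:
                for i, B in enumerate(rhs):
                    if B not in nts:
                        continue
                    beta_first = first_of_string(rhs[i + 1:], first, nts)
                    new = beta_first - {EPS}      # FIRST(beta) minus ε
                    if EPS in beta_first:         # beta =*> ε, or B is last
                        new |= follow[A]
                    if not new <= follow[B]:
                        follow[B] |= new
                        changed = True
    return follow

grammar = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
first = compute_first(grammar)
follow = compute_follow(grammar, "E", first)
for A in grammar:
    print(A, sorted(first[A]), sorted(follow[A]))
```

Applying the same rules by hand to the example grammar should reproduce these sets.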
Example FIRST and FOLLOW • For our favorite grammar: E -> TE’ • E’ -> +TE’ | ε • T -> FT’ • T’ -> *FT’ | ε • F -> (E) | id • What are FIRST() and FOLLOW() for all nonterminals?
Parse table construction with FIRST/FOLLOW • Basic idea: if A -> α and a is in FIRST(α), then we expand A to α any time the current input is a and the top of stack is A. • Algorithm: • For each production A -> α in G, do: • For each terminal a in FIRST(α) add A -> α to M[A,a] • If ε ∈ FIRST(α), for each terminal b in FOLLOW(A), do: • add A -> α to M[A,b] • If ε ∈ FIRST(α) and $ is in FOLLOW(A), add A -> α to M[A,$] • Make each undefined entry in M[ ] an ERROR
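Here is a Python sketch of this construction. To keep it short, it takes FIRST and FOLLOW for the expression grammar as given inputs. Those sets are the ones the FIRST/FOLLOW rules produce for that grammar, hard-coded here as an assumption. Because the FOLLOW sets already contain $ where applicable, the separate $ step is folded into the FOLLOW loop. A table cell holding more than one right-hand side is a multiply defined entry.

```python
from collections import defaultdict

EPS = "ε"

grammar = [                      # (A, alpha); [] is an ε right-hand side
    ("E", ["T", "E'"]),
    ("E'", ["+", "T", "E'"]), ("E'", []),
    ("T", ["F", "T'"]),
    ("T'", ["*", "F", "T'"]), ("T'", []),
    ("F", ["(", "E", ")"]), ("F", ["id"]),
]
# Assumed inputs: FIRST/FOLLOW of the nonterminals of this grammar.
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"+", "*", ")", "$"}}

def first_of(alpha):
    """FIRST of a string of symbols; a terminal's FIRST is the terminal itself."""
    out = set()
    for X in alpha:
        fx = FIRST.get(X, {X})
        out |= fx - {EPS}
        if EPS not in fx:
            return out
    return out | {EPS}

def build_table(grammar):
    M = defaultdict(list)        # keys never filled in are ERROR entries
    for A, alpha in grammar:
        fa = first_of(alpha)
        for a in fa - {EPS}:
            M[(A, a)].append(alpha)
        if EPS in fa:
            for b in FOLLOW[A]:  # includes $ when $ is in FOLLOW(A)
                M[(A, b)].append(alpha)
    return M

M = build_table(grammar)
conflicts = {k: v for k, v in M.items() if len(v) > 1}
print("no multiply defined entries" if not conflicts else f"conflicts: {conflicts}")
```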
Example predictive parse table construction • For our favorite grammar: E -> TE’ • E’ -> +TE’ | ε • T -> FT’ • T’ -> *FT’ | ε • F -> (E) | id • What is the predictive parsing table?
LL(1) grammars • The predictive parse table construction can be applied to ANY grammar. • But sometimes, M[ ] might have multiply defined entries. • Example: for if-else statements after left factoring: stmt -> if ( expr ) stmt optelse • optelse -> else stmt | ε • When we have “optelse” on the stack and “else” in the input, we have a choice of how to expand optelse: “else” is in FIRST(else stmt) and also in FOLLOW(optelse), so both rules end up in M[optelse, else].
LL(1) grammars • If the predictive parsing construction for G leads to a parse table M[ ] WITHOUT multiply defined entries, we say “G is LL(1)” • First L: Left-to-right scan of the input • Second L: Leftmost derivation • 1: 1 symbol of lookahead
LL(1) grammars • Necessary and sufficient conditions for G to be LL(1): for every pair of productions A -> α | β • There does not exist a terminal a such that a ∈ FIRST(α) and a ∈ FIRST(β) • At most one of α and β derives ε • If β =*> ε, then FIRST(α) does not intersect with FOLLOW(A). This is the same as saying the predictive parser always knows what to do!
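The three conditions can be checked directly on FIRST and FOLLOW sets. Here is a minimal Python sketch. It assumes the caller supplies FIRST of each alternative of one nonterminal A (including "ε" when that alternative can derive ε) together with FOLLOW(A). The name ll1_ok is hypothetical. For the optelse example, we assume FOLLOW(optelse) = {else, $}.

```python
from itertools import combinations

EPS = "ε"

def ll1_ok(first_alts, follow_a):
    """Check the LL(1) conditions for every pair of alternatives of one nonterminal A.

    first_alts: FIRST set of each alternative (contains EPS if it can derive ε)
    follow_a:   FOLLOW(A)
    """
    for fa, fb in combinations(first_alts, 2):
        if (fa & fb) - {EPS}:            # some terminal begins both alternatives
            return False
        if EPS in fa and EPS in fb:      # both alternatives derive ε
            return False
        if EPS in fb and fa & follow_a:  # beta =*> ε and FIRST(alpha) meets FOLLOW(A)
            return False
        if EPS in fa and fb & follow_a:  # the same check with the roles swapped
            return False
    return True

# optelse -> else stmt | ε, assuming FOLLOW(optelse) = {else, $}
print(ll1_ok([{"else"}, {EPS}], {"else", "$"}))
# E' -> + T E' | ε, with FOLLOW(E') = {), $}
print(ll1_ok([{"+"}, {EPS}], {")", "$"}))
```

The first call fails the third condition because "else" is in both FIRST(else stmt) and FOLLOW(optelse). That is the same multiply defined entry seen in the table.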
Nonrecursive Predictive Parsing • 1. If X = a = $, the parser halts and announces successful completion of parsing. • 2. If X = a ≠ $, the parser pops X off the stack and advances the input pointer to the next input symbol. • 3. If X is a nonterminal, the program consults entry M[X, a] of the parsing table M. This entry will be either an X-production of the grammar or an error entry. If, for example, M[X, a] = {X -> UVW}, the parser replaces X on top of the stack by WVU (with U on top). As output, we shall assume that the parser just prints the production used; any other code could be executed here. If M[X, a] = error, the parser calls an error recovery routine.
Fig. 4.16. Moves made by predictive parser on input id + id * id.
Fig. 4.17. Parsing table M for grammar (4.13). (Grammar 4.13 )
Fig. 4.18. Synchronizing tokens added to parsing table of Fig. 4.15.
Fig. 4.19. Parsing and error recovery moves made by predictive parser.
Top-down parsing summary • RECURSIVE DESCENT parsers are easy to build, but inefficient, and might require backtracking. • TRANSITION DIAGRAMS help us build recursive descent parsers. • For LL(1) grammars, it is possible to build PREDICTIVE PARSERS with no recursion automatically. • Compute FIRST() and FOLLOW() for all nonterminals • Fill in the predictive parsing table • Use the table-driven predictive parsing algorithm
Bottom-up parsing • Now, instead of starting with the start symbol and working our way down, we will start at the bottom of the parse tree and work our way up. • This style of parsing is called SHIFT-REDUCE • SHIFT refers to pushing input symbols onto a stack. • REDUCE refers to “reduction steps” during a parse: • We take a substring matching the RHS of a rule • Then replace it with the symbol on the LHS of the rule • If you can reduce until you have just the start symbol, you have succeeded in parsing the input string.
Reduction example • Grammar: S -> aABe • A -> Abc | b • B -> d • Input: abbcbcde • Reduction steps: abbcbcde • aAbcbcde • aAbcde • aAde • aABe • S <-- SUCCESS! In reverse, the reduction traces out a rightmost derivation.
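The reduction sequence can be replayed mechanically. In this Python sketch, the handle positions are copied from the steps listed above rather than discovered by a parser; finding them automatically is the job of a shift-reduce parser. The helper name reduce_at is hypothetical.

```python
# Replay the reductions for S -> aABe, A -> Abc | b, B -> d on "abbcbcde".
# Each step is (index where the replaced substring starts, LHS, RHS); the
# indices are read off the slide's sequence, not computed by a parser.

def reduce_at(form, pos, lhs, rhs):
    """Replace the occurrence of rhs that starts at index pos with lhs."""
    if form[pos:pos + len(rhs)] != rhs:
        raise ValueError(f"{rhs!r} does not occur at index {pos} of {form!r}")
    return form[:pos] + lhs + form[pos + len(rhs):]

steps = [(1, "A", "b"), (1, "A", "Abc"), (1, "A", "Abc"), (2, "B", "d"), (0, "S", "aABe")]
form = "abbcbcde"
forms = [form]
for pos, lhs, rhs in steps:
    form = reduce_at(form, pos, lhs, rhs)
    forms.append(form)
print("\n".join(forms))
```

Reducing the other b (at index 2) first would give abAcbcde instead. That b matches the right side of A -> b, but it is not the substring the successful sequence reduces.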
Handles • The HANDLE is the part of a sentential form that gets reduced in a backwards rightmost derivation. • Sometimes part of a sentential form will match a RHS in G, but if that string is NOT reduced in the backwards rightmost derivation, it is NOT a handle. • Shift-reduce parsing, then, is really all about finding the handle at each step and then reducing it. • If we can always find the handle, we never have to backtrack. • Repeatedly finding and reducing the handle is called HANDLE PRUNING.
Operator-Precedence Relations from Associativity and Precedence (1/2) • 1. If operator θ1 has higher precedence than operator θ2, make θ1 ·> θ2 and θ2 <· θ1. For example, if * has higher precedence than +, make * ·> + and + <· *. These relations ensure that, in an expression of the form E+E*E+E, the central E*E is the handle that will be reduced first. • 2. If θ1 and θ2 are operators of equal precedence (they may in fact be the same operator), then make θ1 ·> θ2 and θ2 ·> θ1 if the operators are left-associative, or make θ1 <· θ2 and θ2 <· θ1 if they are right-associative. For example, if + and - are left-associative, then make + ·> +, + ·> -, - ·> - and - ·> +. If ↑ is right-associative, then make ↑ <· ↑. These relations ensure that E-E+E will have handle E-E selected and E↑E↑E will have the last E↑E selected.
Operator-Precedence Relations from Associativity and Precedence (2/2) • 3. Make θ <· id, id ·> θ, θ <· (, ( <· θ, ) ·> θ, θ ·> ), θ ·> $, and $ <· θ for all operators θ. Also, let ( ≐ ), $ <· (, $ <· id, ( <· (, id ·> $, ) ·> $, ( <· id, id ·> ), and ) ·> ). • These rules ensure that both id and (E) will be reduced to E. Also, $ serves as both the left and right endmarker, causing handles to be found between $’s wherever possible.
Fig. 4.25. Operator-precedence relations.
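Precedence Functions • Precedence functions f and g encode the relations as numbers: f(a) < g(b) whenever a <· b, f(a) = g(b) whenever a ≐ b, and f(a) > g(b) whenever a ·> b. • Example 4.29: The precedence table of Fig. 4.25 has the following pair of precedence functions: • f(+) = f(-) = 2, f(*) = f(/) = f(↑) = 4, f(() = 0, f()) = f(id) = 6, f($) = 0 • g(+) = g(-) = 1, g(*) = g(/) = 3, g(↑) = g(() = g(id) = 5, g()) = 0, g($) = 0 • For example, * <· id and f(*) < g(id). Note that f(id) > g(id) suggests that id ·> id; but, in fact, no precedence relation holds between id and id. Other error entries in Fig. 4.25 are similarly replaced by one or another precedence relation.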
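Precedence functions make an operator-precedence parser very compact: shift while f(top) ≤ g(next), otherwise pop a handle. Here is a Python sketch. The f and g values are an assumption: one pair consistent with the relations from rules 1 to 3, with +, -, *, / left-associative and ↑ right-associative. The stack holds only terminals because nonterminals do not affect the precedence decisions, so each handle is reported by its terminals. Like the functions themselves, this sketch cannot detect every error that the full relation table would catch.

```python
# Operator-precedence parsing driven by precedence functions:
#   a <· b iff f(a) < g(b),  a ≐ b iff f(a) == g(b),  a ·> b iff f(a) > g(b).
# Assumed function values (consistent with the precedence/associativity rules).
f = {"+": 2, "-": 2, "*": 4, "/": 4, "↑": 4, "(": 0, ")": 6, "id": 6, "$": 0}
g = {"+": 1, "-": 1, "*": 3, "/": 3, "↑": 5, "(": 5, ")": 0, "id": 5, "$": 0}

def op_precedence_parse(tokens):
    """Return the handles reduced, in order (terminals only)."""
    stack = ["$"]
    inp = list(tokens) + ["$"]
    i = 0
    handles = []
    while True:
        a, b = stack[-1], inp[i]
        if a == "$" and b == "$":
            return handles                      # accept
        if f[a] <= g[b]:                        # a <· b or a ≐ b: shift
            if b == "$":
                raise SyntaxError("unexpected end of input")
            stack.append(b)
            i += 1
        else:                                   # a ·> b: reduce
            handle = []
            while True:
                t = stack.pop()
                if t == "$":
                    raise SyntaxError("no handle found")
                handle.append(t)
                if f[stack[-1]] < g[t]:         # stop when top <· last popped
                    break
            handles.append(" ".join(reversed(handle)))

print(op_precedence_parse("id + id * id".split()))
```

For id + id * id, the * handle is reduced before the + handle, as rule 1 requires. For id ↑ id ↑ id, the rightmost ↑ is reduced first.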
Shift-reduce parsing with a stack • A stack helps us find the handle for each reduction step. • The stack holds grammar symbols. • An input buffer holds the input string. • $ marks the bottom of the stack and the end of input. • Algorithm: • Shift 0 or more input symbols onto the stack, until a handle β is on top of the stack. • Reduce β to the LHS of the appropriate production. • Repeat until we see $S on stack and $ in input.
Shift-reduce example • Grammar: E -> E + E | E * E | ( E ) | id • Input: w = id + id * id • STACK | INPUT | ACTION • 1. $ | id+id*id$ | shift
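A Python sketch can generate a full STACK/INPUT/ACTION trace for this example. Because the grammar is ambiguous, the handle-recognition rules below are an assumption. They resolve the choice between reducing E+E and shifting * by giving * higher precedence than +, with both operators left-associative. A real LR parser would make these decisions from a parsing table instead. The function names are illustrative.

```python
# Shift-reduce parsing with an explicit stack for the ambiguous grammar
#   E -> E + E | E * E | ( E ) | id
# Assumption: handles are recognized by hand-written rules, and the ambiguity
# is resolved by giving * higher precedence than + (both left-associative).
PREC = {"+": 1, "*": 2}

def binary_handle_on_top(stack, lookahead):
    """True if E op E is on top of the stack and should be reduced now."""
    if len(stack) < 4:                          # need $ below E op E
        return False
    left, op, right = stack[-3:]
    if left != "E" or op not in PREC or right != "E":
        return False
    # Shift instead when the lookahead operator binds tighter (+ on stack, * ahead).
    return not (lookahead in PREC and PREC[lookahead] > PREC[op])

def shift_reduce(tokens):
    """Return (STACK, INPUT, ACTION) rows; raise SyntaxError on failure."""
    stack, inp = ["$"], list(tokens) + ["$"]
    trace = []
    while True:
        la = inp[0]
        row = ("".join(stack), "".join(inp))
        if stack[-1] == "id":
            stack[-1:] = ["E"]
            trace.append(row + ("reduce by E -> id",))
        elif stack[-3:] == ["(", "E", ")"]:
            stack[-3:] = ["E"]
            trace.append(row + ("reduce by E -> ( E )",))
        elif binary_handle_on_top(stack, la):
            op = stack[-2]
            stack[-3:] = ["E"]
            trace.append(row + (f"reduce by E -> E {op} E",))
        elif stack == ["$", "E"] and la == "$":
            trace.append(row + ("accept",))
            return trace
        elif la == "$":
            raise SyntaxError("no handle on the stack and no input left")
        else:
            stack.append(inp.pop(0))            # shift the next input symbol
            trace.append(row + ("shift",))

for n, (stk, rest, action) in enumerate(shift_reduce(["id", "+", "id", "*", "id"]), 1):
    print(f"{n:>2}. {stk:<10} {rest:>12}  {action}")
```

In row 6 the stack is $E+E with * next, and the parser shifts rather than reducing. That is the kind of shift/reduce decision that has to be settled when the grammar itself is ambiguous.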
Shift-reduce parsing actions • SHIFT: The next input symbol is pushed onto the stack. • REDUCE: When the parser knows the right end of a handle is on the stack, the handle is replaced with the corresponding LHS. • ACCEPT: Announce success (input is $, stack is $S) • ERROR: The input contained a syntax error; call an error recovery routine.
Conflicts during shift/reduce parsing • Like predictive parsers, sometimes a shift-reduce parser won’t know what to do. • A SHIFT/REDUCE conflict occurs when the parser can’t decide whether to shift the input symbol or reduce the current top of stack. • A REDUCE/REDUCE conflict occurs when the parser doesn’t know which of two or more rules to use for reduction. • A grammar whose shift-reduce parser contains conflicts is said to be “not LR”
Example shift/reduce conflict • Ambiguous grammars are NEVER LR. • stmt -> if ( expr ) stmt • | if ( expr ) stmt else stmt • | other • If we have a shift-reduce parser in configuration • STACK INPUT • … if ( expr ) stmt else … $ • what to do? • We could reduce “if ( expr ) stmt” to “stmt” (assuming the else is part of a different surrounding if-else statement) • We could also shift the “else” (assuming this else goes with the current if)
Example reduce/reduce conflict • Some languages use () for function calls AND array refs. • stmt -> id ( parameter_list ) • stmt -> expr := expr • parameter_list -> parameter_list , parameter • parameter_list -> parameter • parameter -> id • expr -> id ( expr_list ) • expr -> id • expr_list -> expr_list , expr • expr_list -> expr
Example reduce/reduce conflict • For input A(I,J) we would get token stream id(id,id) • The first three tokens would certainly be shifted: • STACK INPUT • … id ( id , id ) … • The id on top of the stack needs to be reduced, but we have two choices: parameter -> id OR expr -> id • The stack gives no clues. To know which rule to use, we need to look up the first id in the symbol table to see whether it is a procedure name or an array name. • One solution is to have the lexer return “procid” for procedure names. Then the shift-reduce parser can look into the stack to decide which reduction to use.
LR parsing • A major type of shift-reduce parsing is called LR(k). • “L” means left-to-right scanning of the input • “R” means constructing a rightmost derivation in reverse • “k” means k input symbols of lookahead (if omitted, assume k=1) • LR parsers have very nice properties: • They can recognize virtually all programming language constructs for which we can write a CFG • They are the most general nonbacktracking shift-reduce parsing method, and they can be implemented very efficiently • The class of grammars they can parse is a proper superset of the class parsable by predictive parsers • They detect a syntax error as soon as possible in a left-to-right scan of the input. • DISADVANTAGE: hard to build by hand (we need a parser generator such as yacc)