700 likes | 921 Views
Bottom-up Parsing. Reading Sections 4.5 and 4.7 from ASU. Predictive Parsing Summary. First and Follow sets are used to construct predictive tables For non-terminal A and input t, use a production A a where t First( a )
E N D
Bottom-up Parsing Reading Sections 4.5 and 4.7 from ASU CPSC4600
Predictive Parsing Summary • First and Follow sets are used to construct predictive tables • For non-terminal A and input t, use a production A a where t First(a) • For non-terminal A and input t, if e First(A) and t Follow(a), then use a production A a where e First(a) • Recursive-descent without backtracking do not need the parse table explicitly CPSC4600
Bottom-Up Parsing(1) • Bottom-up parsing is more general than top-down parsing • And just as efficient • Builds on ideas in top-down parsing • Bottom-up is the preferred method in practice CPSC4600
Bottom-Up Parsing(2) • Table-driven using an explicit stack (non-recursive) • Stack can be viewed as containing both terminals and nonterminals • Basic operations: • Shift: Move terminals from input stream to the stack until the right-hand side of an appropriate production rule has been identified in the stack • Reduce: Replace the sentential form appearing on the stack (considered from top that matched the right-hand side of an appropriate production rule) with the nonterminal appearing on the left-hand side of the production. CPSC4600
An Introductory Example • Bottom-up parsers don’t need left-factored grammars • Hence we can revert to the “natural” grammar for our example: E T + E | T T num * T | num | (E) • Consider the string: num * num + num CPSC4600
num * num + num T num num * T + num T num * T T + num T num T + T E T T + E E T + E E The Idea Bottom-up parsing reduces a string to the start symbol by inverting productions: Input Productions Used CPSC4600
Right-most Derivation • In a right-most derivation, the rightmost nonterminal of a sentential form is replaced at each derivation step. • Question: find the rightmost derivation of the string num* num + num CPSC4600
num * num + num T num num * T + num T num * T T + num T num T + T E T T + E E T + E E Observation • Read the productions found by bottom-up parse in reverse (i.e., from bottom to top) • This is a rightmost derivation! CPSC4600
Important Facts A bottom-up parser traces a rightmost derivation in reverse CPSC4600
num * num + num num * T + num T + num T + T T + E E A Bottom-up Parse E T E T T + num num num * CPSC4600
num * num + num A Bottom-up Parse in Detail (1) + num num num * CPSC4600
num * num + num num * T + num A Bottom-up Parse in Detail (2) T + num num num * CPSC4600
num * num + num num * T + num T + num A Bottom-up Parse in Detail (3) T T + num num num * CPSC4600
num * num + num num * T + num T + num T + T A Bottom-up Parse in Detail (4) T T T + num num num * CPSC4600
num * num + num num * T + num T + num T + T T + E A Bottom-up Parse in Detail (5) T E T T + num num num * CPSC4600
num * num + num num * T + num T + num T + T T + E E A Bottom-up Parse in Detail (6) E T E T T + num num num * CPSC4600
Bottom-up Parsing A trivial bottom-up parsing algorithm Let I = input string repeat pick a non-empty substring of I where X is a production if no such , backtrack replace one by X in I until I = “S” (the start symbol) or all possibilities are exhausted CPSC4600
Observations • The termination of the algorithm (when/if) • Running time of the algorithm • If there are more than one choices for the sub-string to be replaced (reduce) which one to choose? CPSC4600
Where Do Reductions Happen Recall A bottom-up parser traces a rightmost derivation in reverse • Let be a rightmost sentential form • Assume the next reduction is by X • Then is a string of terminals Why? Because X is a step in a right-most derivation CPSC4600
Shift-Reduce Parsing Bottom-up parsing uses only two kinds of actions: • Shift • Reduce CPSC4600
Shift • Shift: Move #(marking the part of the input that has been processed) one place to the right • Shifts a terminal to the left string ABC#xyz ABCx#yz CPSC4600
Reduce • Apply an inverse production at the right end of the left string • If A xy is a production, then Cbxy#ijk CbA#ijk CPSC4600
#num * num + num shift num # * num + num shift num * # num + num shift num * num # + num reduce T num num * T # + num reduce T num * T T # + num shift T + # num shift T + num # reduce T num T + T # reduce E T T + E # reduce E T + E E # The Example with Shift-Reduce Parsing CPSC4600
#num * num + num A Shift-Reduce Parse in Detail (1) + num num num * CPSC4600
#num * num + num num # * num + num A Shift-Reduce Parse in Detail (2) + num num num * CPSC4600
#num * num + num num # * num + num num * # num + num A Shift-Reduce Parse in Detail (3) num + num num * CPSC4600
#num * num + num num # * num + num num * # num + num num * num # + num A Shift-Reduce Parse in Detail (4) + num num num * CPSC4600
#num * num + num num # * num + num num * # num + num num * num # + num num * T # + num A Shift-Reduce Parse in Detail (5) T + num num * num CPSC4600
#num * num + num num # * num + num num * # num + num num * num # + num num * T # + num T # + num A Shift-Reduce Parse in Detail (6) T T + num num num * CPSC4600
#num * num + num num # * num + num num * # num + num num * num # + num num * T # + num T # + num T + # num A Shift-Reduce Parse in Detail (7) T T num + num num * CPSC4600
#num * num + num num # * num + num num * # num + num num * num # + num num * T # + num T # + num T + # num T + num # A Shift-Reduce Parse in Detail (8) T T num + num num * CPSC4600
#num * num + num num # * num + num num * # num + num num * num # + num num * T # + num T # + num T + # num T + num # T + T # A Shift-Reduce Parse in Detail (9) T T T + num num num * CPSC4600
#num * num + num num # * num + num num * # num + num num * num # + num num * T # + num T # + num T + # num T + num # T + T # T + E # A Shift-Reduce Parse in Detail (10) T E T T num + num num * CPSC4600
#num * num + num num # * num + num num * # num + num num * num # + num num * T # + num T # + num T + # num T + num # T + T # T + E # E # A Shift-Reduce Parse in Detail (11) E T E T T + num num num * CPSC4600
The Stack • Left string can be implemented by a stack • Top of the stack is the # • Shift pushes a terminal on the stack • Reduce pops 0 or more symbols off of the stack (production rhs) and pushes a non-terminal on the stack (production lhs) CPSC4600
Key Issue (will be resolved by algorithms) • How do we decide when to shift or reduce? • Consider step: num # * num + num • We could reduce by T num giving T # * num + num • A fatal mistake: No way to reduce to the start symbol E CPSC4600
Conflicts • Generic shift-reduce strategy: • If there is a handle on top of the stack, reduce • Otherwise, shift • But what if there is a choice? • If it is legal to shift or reduce, there is a shift-reduce conflict • If it is legal to reduce by two different productions, there is a reduce-reduce conflict CPSC4600
E E + E | E * E | (E) | num Conflict Example Consider the ambiguous grammar: CPSC4600
#num * num + num shift . . . . . . E * E # + num reduce E E * E E # + num shift E + # num shift E + num# reduce E num E + E # reduce E E + E E # One Shift-Reduce Parse Input Action CPSC4600
#num * num + num shift . . . . . . E * E # + num shift E * E + # num shift E * E + num # reduce E num E * E + E# reduce E E + E E * E # reduce E E * E E # Another Shift-Reduce Parse Action Input CPSC4600
Observations • In the second step E * E # + numwe can either shift or reduce by E E * E • Choice determines associativity of + and * • As noted previously, grammar can be rewritten to enforce precedence • Precedence declarations are an alternative CPSC4600
Overview • LR(k) parsing • L: scan input Left to right • R: produce rightmost derivation • k tokens of lookahead • LR(0) • zero tokens of look-ahead • SLR • Simple LR: like LR(0) but uses FOLLOW sets to build more “precise” parsing tables CPSC4600
Basic Terminologies • Handle • A substring that matches the right side of a production whose reductionwith that production’s left side constitutes one step of the rightmost derivation of the string from the start nonterminal of the grammar CPSC4600
Model of Shift-Reduce Parsing • Stack + input = current right-sentential form. • Locate the handle during parsing: • shift zero or more terminals (tokens) onto the stack until a handle b is on top of the stack. • Replace the handle with a proper non-terminal (Handle Pruning): • reduceb to A where A b CPSC4600
Model of an LR Parser CPSC4600
Problem: when to shift, when to reduce? • Recall grammar: E T + E | T T num * T | num | (E) • how to know when to reduce and when to shift? CPSC4600
Model of Shift-Reduce Parsing • Stack + input = current right-sentential form. • Locate the handle during the parsing: • shift zero or terminals onto the stack until a handle is b on top of the stack. • Replace the handle with a proper non-terminal (Handle Pruning) CPSC4600
What we need to know to do LR parsing • LR(0) states • describe states in which the parser can be • Note: LR(0) states are used by both LR(0) and SLR parsers • Parsing tables • transitions between LR(0) states, • actions to take at transition: • shift, reduce, accept, error • How to construct LR(0) states • How to construct parsing tables • How to drive the parser CPSC4600
An LR(0) state = a set of LR(0) items • An LR(0) item [X --> a.b] says that • the parser is looking for an X • it has an aon top of the stack • expects to find in the input a string derived from b. • Notes: • [X --> a.ab] means that if a is on the input, it can be shifted. That is: • a is a correct token to see on the input, and • shifting a would not “over-shift” (still a viable prefix). • [X -->a.] means that we could reduce X CPSC4600
E --> T + . E E --> .T E --> .T + E T --> .(E) T --> .num * T T --> .num LR(0) states E --> T + E. E T + E --> T. E --> T. + E S’ --> E . ( T num E T --> (. E) E --> .T E --> .T + E T --> .(E) T --> .num * T T --> .num T --> num. * T T --> num. ( S’ --> . E E --> . T E --> .T + E T --> .(E) T --> .num * T T --> .num num num num * T --> num * T. E T --> num * .T T --> .(E) T --> .num * T T --> .num T T --> (E.) ) T --> (E). ( CPSC4600 (