200 likes | 359 Views
Lecture #8, Feb. 7, 2007. Shift-reduce parsing, Shift-reduce and reduce-reduce conflicts, Precedence parsing, LR parsers, LR parsers and FSAs, LR parsing Tables, Sets of Items construction. Invitation. If you have not handed in a homework.
E N D
Lecture #8, Feb. 7, 2007 • Shift-reduce parsing, • Shift-reduce and reduce-reduce conflicts, • Precedence parsing, • LR parsers, • LR parsers and FSAs, • LR parsing Tables, • Sets of Items construction.
Invitation • If you have not handed in a homework. • And you would like another chance (a sort of amnesty) • Please see me about making arrangements.
Assignments • Reading • Chapter 3, Sections 3.4 (Bottom up parsing) and 3.5 (LR(1) tables) • Page 107-133 • Possible Quiz on Monday • Homework • Homework 7, will be assigned Monday, but it is posted already. • Please take a look if you want to. • Project 1 • Recall Project #1, the scanner is due Feb. 14th • Midterm (sort of) Exam – Monday Feb 19
Bottom Up Parsing Shift-Reduce parsing • data structures: input-string and stack. • look at symbols on top of stack, and the input-string and decide: • shift (move first input to stack) • reduce (replace top n symbols on stack by a non-terminal) • accept (declare victory) • error (be gracious in defeat)
Example Bottom up Parse Consider the grammar: (note: left recursion is NOT a problem, but the grammar is still layered to prevent ambiguity) E ::= E + T1 | T2 T ::= T * F3 | F4 F ::= ( E )5 | id6 stack Input Action x + y shift x + y reduce 6 F + y reduce 4 T + y reduce 2 E + y shift E + y shift E + y reduce 6 E + F reduce 4 E + T reduce 1 E accept
Problems (ambiguous grammars) 1) shift reduce conflicts: stack Input Action x + y + z ? stack Input Action if x t if y t s2 e s3 ? 2) reduce reduce conflicts: suppose both procedure call and array reference have similar syntax: • x(2) := 6 • f(x) stack Input Action id ( id ) id ? Should id reduce to a parameter or an expression. Depends on whether the bottom most id is an array or a procedure.
Using ambiguity to your advantage • Shift-Reduce and Reduce-Reduce errors are caused by ambiguous grammars. • We can use resolution mechanisms to our advantage. Use an ambiguous grammar (smaller more concise, more natural parse trees) but resolve ambiguity using rules. • Operator Precedence • Every operator is given a precedence • Precedence of the operator closest to the top of the stack and the precedence of operator next on the input decide shift or reduce. • Sometimes the precedence is the same. Need more information: Associativity information.
( ) id * + $ : > < : < : : > < : : > + : > : > < : < : < : : > * = < : < : < : < : ( ) : > : > : > : > id : > : > : > : > < : $ < : < : < : Example Precedence Parser accept input : x * x + y stack Input Action $ E * E + y $ reduce! topmost terminal next input
Precedence parsers • Precedence parsers have limitations • No production can have two consecutive non-terminals • Parse only a small subset of the Context Free Grammars • Need a more robust version of shift- reduce parsing. • LR - parsers • State based - finite state automatons (w / stack) • Accept the widest range of grammars • Easily constructed (by a machine) • Can be modified to accept ambiguous grammars by using precedence and associativity information.
LR Parsers • Table Driven Parsers • Table is indexed by state and symbols (both term and non-term) • Table has two components. • ACTION part • GOTO part terminals non-terminals state id + * ( ) $ E T F 0 shift (state = 5) 1 2 goto(state = 2) reduce(prod = 12) ACTION GOTO
+ 1 T 6 9 id E F 5 ( + id 0 id * F id 3 ( F T ) 8 11 E ( 4 ( 7 T * 2 F 10 LR Table encodes FSA transition on terminal is a shift in action table, on nonterminal is a goto entry E -> E + T | T T -> T * F | F F -> ( E ) | id
Table vs FSA • The Table encodes the FSA • The action part encodes • Transitions on terminal symbols (shift) • Finding the end of a production (reduce) • The goto part encodes • Tracing backwards the symbols on the RHS • Transition on non-terminal, the LHS • Tables can be quite compact
LR Table terminals non-terminals state F T E ( ) $ + id * 2 1 s5 s4 3 0 1 s6 acc 2 r2 r2 r2 s7 3 r4 r4 r4 r4 s5 2 3 4 s4 8 r6 5 r6 r6 r6 s5 9 6 s4 3 7 10 s5 s4 8 s6 s11 9 r1 r1 s7 r1 10 r3 r3 r3 r3 11 r5 r5 r5 r5
Reduce Action • If the top of the stack is the rhs for some production n • And the current action is “reduce n” • We pop the rhs, then look at the state on the top of the stack, and index the goto-table with this state and the LHS non-terminal. • Then push the lhs onto the stack in the new s found in the goto-table. (?,0)(id,5) * id + id $ Where: Action(5,*) = reduce 6 Production 6 is: F ::= id And: GOTO(0,F) = 3 (?,0)(F,3) * id + id $
Example Parse • E -> E + T • E -> T • T -> T * F • T -> F • F -> ( E ) • F -> id Stack Input (?,0) id * id + id $ (?,0)(id,5) * id + id $ (?,0)(F,3) * id + id $ (?,0)(T,2) * id + id $ (?,0)(T,2)(*,7) id + id $ (?,0)(T,2)(*,7)(id,5) + id $ (?,0)(T,2)(*,7)(F,10) + id $ (?,0)(T,2) + id $ (?,0)(E,1) + id $ (?,0)(E,1)(+,6) id $ (?,0)(E,1)(+,6)(id,5) $ (?,0)(E,1)(+,6)(F,3) $ (?,0)(E,1)(+,6)(T,9) $ (?,0)(E,1) $
Constructing LR Tables • A state represents some local state of the parse. • It encodes “how far” we have progressed in recognizing some RHS of a production. • We will represent this by an “item” • An item is a production with a dot • A -> . X Y Z • A -> X . Y Z • A -> X Y . Z • A -> X Y Z. • An item with a dot at the end is called a completed item • An item with a dot at the beginning is called an initial item • A state in the FSA is represented by a set of items.
Constructing Sets of items • To construct the set of items that a state represents we need to know if we’re in some state how could we get there? • This is done by a procedure called taking the closure of an item. • Algorithm for computing the closure of an item. • let I be a set of items then • every item in I is in closure(I) • if (A -> a . B b ) is in closure(I) and (B -> c ) is a production, then add the item (B -> . c ) if it is not already there.
Example closure construction Start of with the single item derived from the start symbol. I = { E’ -> . E } Then the closure of I contains: { E’ -> . E, E -> . E + T, E -> .T T -> . T * F T -> . F F -> . ( E ) F -> . id } • 0) E’ -> E • 1) E -> E + T • E -> T • T -> T * F • T -> F • F -> ( E ) • F -> id
The GOTO Construction • Goto(I,X) • I is a set of items • X is a grammar symbol • This discovers new sets of items in the FSA given an old set. • GOTO(I,X) = closure(J) such that • [A -> a X . b ] in J when [A -> a . X b ] is in I • GOTO(I,X) will be {} if there is no item in I with a dot before an X • Example: I = { E’-> . E, E-> E . + T} GOTO(I,+) = { E-> E + . T, (initial item) T-> . T * F, (plus the closure of it) T-> . F, F-> . ( E ), F-> . id }
The Sets of items algorithm • Let G’ be an augmented grammar • G’ = G U S’ -> S The productionwhere S is the start symbol of G procedure items(G’) = begin C := closure( { S’ -> . S } ); repeat for each set of items I in C for each symbol X in G’ Do if GOTO(I,X) is not empty and is not already in C then add it to C until no more sets of items can be added to C.