Chap6 LR Parsing • Recall some terminology: • phrase: α ∈ (Vt ∪ Vn)*, A ∈ Vn, A ⇒+ α • simple phrase: α ∈ (Vt ∪ Vn)*, A ∈ Vn, A ⇒ α • handle of a sentential form: the leftmost simple phrase of the sentential form. • e.g. consider the sentential form a A b a b B, whose parse tree is S → A a B with A → a A b and B → b B: a A b is the handle, and b B is a simple phrase.
shift-reduce parser: two operations: • shift input onto the parse stack until the handle is identified; then • reduce the handle to the LHS nonterminal • e.g. Given the grammar • P → begin S end $ • S → a ; S • S → begin S end ; S • S → λ • and the input begin a ; begin a ; end ; end $ • the handles, in the order they are reduced, are: λ, a ; S, λ, begin S end ; S, a ; S, begin S end $ • (trace this example with a stack and the remaining input)
LOOKING AT THE TREE • P → begin S end $ • that S → a ; S • that S → begin S end ; S • the inner S → a ; S, whose S → λ • the trailing S → λ • (bottom-up tree construction)
LOOKING AT THE DERIVATION • P ⇒ begin S end $ • ⇒ begin a ; S end $ • ⇒ begin a ; begin S end ; S end $ • ⇒ begin a ; begin S end ; λ end $ • ⇒ begin a ; begin a ; S end ; end $ • ⇒ begin a ; begin a ; λ end ; end $ • The substring introduced at each step is the handle of that sentential form. • (This is a rightmost derivation.)
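As a rough illustration of the derivation above, the following Python sketch replays it by always expanding the rightmost nonterminal. The production numbers and the name `rightmost_derivation` are assumptions made for this example; they do not come from the slides.

```python
# Grammar of the begin/end example. The numbering 1..4 is an assumption.
PRODUCTIONS = {
    1: ("P", ["begin", "S", "end", "$"]),
    2: ("S", ["a", ";", "S"]),
    3: ("S", ["begin", "S", "end", ";", "S"]),
    4: ("S", []),  # S -> lambda
}
NONTERMINALS = {"P", "S"}


def rightmost_derivation(start, steps):
    """Apply each production in `steps` to the rightmost nonterminal.

    Returns the list of sentential forms, starting with [start].
    """
    form = [start]
    forms = [list(form)]
    for number in steps:
        lhs, rhs = PRODUCTIONS[number]
        # Position of the rightmost nonterminal in the current form.
        index = max(i for i, sym in enumerate(form) if sym in NONTERMINALS)
        if form[index] != lhs:
            raise ValueError(f"production {number} does not apply here")
        form[index:index + 1] = rhs
        forms.append(list(form))
    return forms


# The derivation from the slide, as a sequence of production numbers.
forms = rightmost_derivation("P", [1, 2, 3, 4, 2, 4])
for f in forms:
    print(" ".join(f))
```

Reading the production sequence backwards (4, 2, 4, 3, 2, 1) gives the order in which a shift-reduce parser reduces the handles.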
Two questions: • 1. Have we reached the end of a handle, and how long is the handle? • 2. Which nonterminal does the handle reduce to? • We use tables to answer these questions: • ACTION table • GOTO table • We first show how to use the tables, then how to construct them.
1. How does the LR(0) parser work? • 2. How are the action and goto tables constructed? • 3. Is LR(0) parsing correct? • 4. LR(1) • 5. SLR(1) • 6. LALR(1)
LR parsers are driven by: • the action table, which specifies what action to take (shift, reduce, accept or error) • the goto table, which specifies state transitions • We push states, rather than symbols, onto the stack. • Each state represents a subtree of the parse tree.
shift-reduce-driver /* look ahead 1 token */
{
  push( start_state );
  T := scanner();
  do {
    S := state on top of stack;
    switch( action(S,T) )
      case shift:
        push( state(S,T) );
        T := scanner();
        break;
      case reduce i:
        m := length of RHS of prod. i;
        pop( m );
        S := state on top of stack after popping;
        X := LHS of prod. i;
        push( state(S,X) );
        break;
      case accept: ...
      case error: ...
    end
  } forever
}
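The driver loop can be sketched in runnable form. The Python below is a minimal sketch, not the slides' implementation: it assumes the grammar S → E $, E → E + T | T, T → id | ( E ), with LR(0) tables written out by hand, so the action depends on the state alone and the token only selects the shift target. The table layout, the production numbering and the name `parse` are assumptions for illustration.

```python
# Productions as (LHS, length of RHS). Numbering is an assumption.
PRODUCTIONS = {
    1: ("E", 3),  # E -> E + T
    2: ("E", 1),  # E -> T
    3: ("T", 1),  # T -> id
    4: ("T", 3),  # T -> ( E )
}

# LR(0) action table: one entry per state.
ACTION = {
    0: "shift", 1: "shift", 2: "accept", 3: "shift", 6: "shift", 7: "shift",
    4: ("reduce", 1), 5: ("reduce", 3), 8: ("reduce", 4), 9: ("reduce", 2),
}

# Goto table: (state, symbol) -> next state, for tokens and nonterminals.
GOTO = {
    (0, "E"): 1, (0, "T"): 9, (0, "id"): 5, (0, "("): 6,
    (1, "$"): 2, (1, "+"): 3,
    (3, "T"): 4, (3, "id"): 5, (3, "("): 6,
    (6, "E"): 7, (6, "T"): 9, (6, "id"): 5, (6, "("): 6,
    (7, ")"): 8, (7, "+"): 3,
}


def parse(tokens):
    """Run the shift-reduce driver; return the productions used, in order."""
    stack = [0]          # stack of states, start state on the bottom
    pos = 0              # index of the next unread token
    reductions = []
    while True:
        state = stack[-1]
        act = ACTION[state]
        if act == "accept":
            return reductions
        if act == "shift":
            if pos >= len(tokens) or (state, tokens[pos]) not in GOTO:
                raise SyntaxError(f"unexpected input at position {pos}")
            stack.append(GOTO[(state, tokens[pos])])
            pos += 1
        else:
            _, number = act
            lhs, length = PRODUCTIONS[number]
            del stack[-length:]                   # pop one state per RHS symbol
            stack.append(GOTO[(stack[-1], lhs)])  # goto on the LHS
            reductions.append(number)


print(parse(["id", "+", "id", "$"]))  # -> [3, 2, 3, 1]
```

The returned list is the rightmost derivation in reverse: T → id, E → T, T → id, E → E + T.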
Shift-Reduce Parsers • The grammar G0 • A shift-reduce parser
Shift-Reduce Parsers • The driver utilizes a parse stack that contains parse states, usually coded as integers. • The driver uses two tables, action and go_to. • The action tells the parser whether to shift, reduce, terminate successfully, or signal a syntax error. • The go_to table defines successor states after a token or LHS is matched and shifted.
Shift-Reduce Parsers • The action
Shift-Reduce Parsers • The go_to
6.2 LR parsers • LR(1): left-to-right scanning, rightmost derivation (in reverse), 1-token lookahead • LR parsers are deterministic: no backup, no retry • LR(k) parsers decide the next action by examining the tokens already shifted and at most k lookahead tokens. • LR(k) is the most powerful class of deterministic bottom-up parsers with at most k lookahead tokens.
G is LR(k) iff the three conditions • (1) S ⇒*rm αAw ⇒rm αβw • (2) S ⇒*rm γBx ⇒rm αβy • (3) Firstk(w) = Firstk(y) • imply that αAy = γBx. • That is, • α = γ • A = B • x = y • (⇒rm denotes one rightmost derivation step, ⇒*rm zero or more.)
Suppose that in some LR(k) grammar there are two rightmost sentential forms αβw and αβy (w and y are strings of terminals) such that Firstk(w) = Firstk(y). • If αAw ⇒rm αβw, then αAy ⇒rm αβy. • (That is, the same reduction of β is applied to both αβw and αβy.) • Conversely, if the same reduction of β is always applied to such αβw and αβy in the rightmost derivation, then the grammar is LR(k).
Use the four small grammars to motivate the construction of LR tables.
6.2.1 LR(0) tables • A production has the form A → X1 X2 ... Xj. • By adding a dot, we get an item (configuration), e.g. • A → ·X1 X2 ... Xj • A → X1 ·X2 ... Xj • ... • A → X1 X2 ... Xj · • The · indicates how much of the RHS has been shifted onto the stack.
An item with the · at the end of the RHS, • A → X1 X2 ... Xj · • indicates that the RHS has been recognized and should be reduced to the LHS. • An item with the · at the beginning of the RHS, i.e. • A → ·X1 X2 ... Xj • predicts that the RHS will be shifted onto the stack.
An LR(0) state is a set of items. • This means that the actual state of an LR(0) parser is denoted by one of the items. • The close operation: • if there is an item A → α·Bβ in the set, then add all items of the form B → ·γ to the set (repeat until nothing new is added). • The initial state is • close( { S → ·α$ } ) • where S is the start symbol. • Show the construction for the grammar • S → E $ • E → E + T • E → T • T → id • T → ( E )
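The close operation can be sketched as a worklist algorithm. This is a minimal Python sketch under stated assumptions: an item is encoded as a pair (production index, dot position), and the names `GRAMMAR`, `NONTERMINALS` and `close` are chosen for this example only.

```python
# The example grammar; production 0 is S -> E $.
GRAMMAR = [
    ("S", ("E", "$")),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("id",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def close(items):
    """Return the closure of a set of LR(0) items (production, dot)."""
    result = set(items)
    worklist = list(items)
    while worklist:
        production, dot = worklist.pop()
        rhs = GRAMMAR[production][1]
        # Only an item with a nonterminal right after the dot adds items.
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for index, (lhs, _) in enumerate(GRAMMAR):
                new_item = (index, 0)
                if lhs == rhs[dot] and new_item not in result:
                    result.add(new_item)
                    worklist.append(new_item)
    return frozenset(result)


initial_state = close({(0, 0)})  # close({ S -> .E $ })
print(sorted(initial_state))
```

For this grammar the initial state contains one item per production, each with the dot at the start of the RHS.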
Constructing the LR(0) machine • S → E $ • E → E + T • E → T • T → id • T → ( E ) • The initial state is • close( { S → ·E $ } )
The state diagram is called the characteristic finite state machine (CFSM) of the grammar. • The CFSM is the goto table of LR(0) parsers. • To construct the action table of LR(0) parsers, we use the following rules:
Action table of LR(0) • 1. If state S contains an item B → ρ·, then action[S] = reduce with B → ρ • 2. If state S contains an item B → α·aβ where a ∈ Vt, then action[S] = shift • 3. If state S contains the item S → α$· (S here is the start symbol), then action[S] = accept • 4. otherwise, action[S] = error • * Show the action table for the previous example.
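The four rules can be tried out on the grammar S → E $, E → E + T | T, T → id | ( E ). The Python below is a sketch under stated assumptions: items are (production index, dot position) pairs, states are numbered in discovery order (so the numbers need not match any state diagram), and all function names are invented for this example.

```python
GRAMMAR = [
    ("S", ("E", "$")),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("id",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def close(items):
    """Closure of a set of LR(0) items."""
    result, worklist = set(items), list(items)
    while worklist:
        p, d = worklist.pop()
        rhs = GRAMMAR[p][1]
        if d < len(rhs) and rhs[d] in NONTERMINALS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[d] and (i, 0) not in result:
                    result.add((i, 0))
                    worklist.append((i, 0))
    return frozenset(result)


def goto(state, symbol):
    """Move the dot over `symbol` in every item that allows it, then close."""
    moved = {(p, d + 1) for p, d in state
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return close(moved)


def build_cfsm():
    """Return (states, edges); edges maps (state number, symbol) to a state."""
    start = close({(0, 0)})
    states, index, edges = [start], {start: 0}, {}
    for state in states:  # the list grows while we iterate over it
        symbols = sorted({GRAMMAR[p][1][d] for p, d in state
                          if d < len(GRAMMAR[p][1])})
        for symbol in symbols:
            target = goto(state, symbol)
            if target not in index:
                index[target] = len(states)
                states.append(target)
            edges[(index[state], symbol)] = index[target]
    return states, edges


def lr0_action(state):
    """Apply rules 1 to 3; an empty result means error (rule 4)."""
    actions = set()
    for p, d in state:
        rhs = GRAMMAR[p][1]
        if d == len(rhs):
            actions.add("accept" if rhs[-1] == "$" else ("reduce", p))
        elif rhs[d] not in NONTERMINALS:
            actions.add("shift")
    return actions


states, edges = build_cfsm()
for number, state in enumerate(states):
    print(number, sorted(lr0_action(state), key=str))
```

A state whose action set has more than one element would be a conflict; for this grammar every state has exactly one action.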
Parsing id + id $ with the LR(0) machine:

state stack   remaining input   action
0             id + id $         shift id, go to 5
0 5           + id $            reduce T → id, then goto(0, T) = 9
0 9           + id $            reduce E → T, then goto(0, E) = 1
0 1           + id $            shift +, go to 3
0 1 3         id $              shift id, go to 5
0 1 3 5       $                 reduce T → id, then goto(3, T) = 4
0 1 3 4       $                 reduce E → E + T, then goto(0, E) = 1
0 1           $                 shift $, go to 2
0 1 2                           accept
Two kinds of conflicts: • 1. shift-reduce conflict: there exists a state S such that action(S) can be either shift or reduce. In this case, the parser does not know whether to shift or to reduce. • 2. reduce-reduce conflict: there exists a state S such that action(S) contains two reduce entries. In this case, the parser does not know which production to use in the reduction. • A grammar is LR(0) iff there is no conflict in the action table.
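The two kinds of conflict can be detected directly from the items of a state. The following Python sketch assumes items written as (LHS, RHS, dot position) triples; the function name `classify_conflicts` and the second example state (A → a·, B → a·) are hypothetical. The accept item is not treated specially here.

```python
def classify_conflicts(state, nonterminals):
    """Return the kinds of LR(0) conflict present in one state.

    `state` is a set of items (lhs, rhs, dot); rhs is a tuple of symbols.
    """
    # Complete items (dot at the end) ask for a reduction.
    reduces = {(lhs, rhs) for lhs, rhs, dot in state if dot == len(rhs)}
    # Items with a terminal after the dot ask for a shift.
    wants_shift = any(dot < len(rhs) and rhs[dot] not in nonterminals
                      for lhs, rhs, dot in state)
    conflicts = []
    if wants_shift and reduces:
        conflicts.append("shift-reduce")
    if len(reduces) > 1:
        conflicts.append("reduce-reduce")
    return conflicts


# State reached on T in a grammar with E -> T and T -> T * P.
after_t = {("E", ("T",), 1), ("T", ("T", "*", "P"), 1)}
print(classify_conflicts(after_t, {"E", "T", "P"}))

# Hypothetical state with two complete items.
two_reductions = {("A", ("a",), 1), ("B", ("a",), 1)}
print(classify_conflicts(two_reductions, {"A", "B"}))
```

The first state shows why an expression grammar with two precedence levels is not LR(0): after T the parser cannot choose between reducing E → T and shifting * without looking at the next token.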
Few practical grammars are LR(0). • 1. For instance, consider any λ-production A → λ. • If A can also generate some nonempty terminal string, then there must be a shift-reduce conflict. • Suppose b ∈ First(A): • α·by becomes αA·by (reduce λ to A), or • α·by becomes αb·y (shift b) • 2. Also consider operator precedence: • id + id · + id becomes E · + id (reduce), but • id + id · * id becomes id + id * · id (shift) • (remember: no lookahead!)
Few practical grammars are LR(0). • Every practical language has an LR(0) grammar. • Contradiction? • Show LR(0) and LR(1) grammars for Pascal.
6.2.2 Correctness of LR(0) • viable prefix • The CFSM accepts all and only viable prefixes. • The action table is correct.
correctness of LR(0) • viable prefix: • Let αβγ be a right sentential form (i.e. S ⇒*rm αβγ) in which β is the handle. • Then any prefix of αβ is a viable prefix. • Lemma 1 (Viable Prefix Property). • The LR(0) machine accepts all and only viable prefixes. • Lemma 2. • The parser performs correct shift and reduction actions at all steps.
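Proof. Consider how the LR(0) machine is constructed for the grammar • S → A a B $ • A → d D e • B → g G h • Start state: S → ·AaB$, A → ·dDe • on d: A → d·De • on D: A → dD·e • on e: A → dDe· (reduction to A) • on A from the start state: S → A·aB$ • on a: S → Aa·B$, B → ·gGh • on g: B → g·Gh • on G: B → gG·h • on h: B → gGh· (reduction to B) • on B: S → AaB·$ • on $: S → AaB$·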
6.3 LR(1) parsing • An LR(1) item has the form • A → X1 X2 ... Xi · Xi+1 ... Xj, l • where l ⊆ {λ} ∪ Vt • l is the set of terminals that may follow A in some context. • close(S) • { for each item in S do • if the item is B → δ·Aρ, l • then add A → ·γ, First(ρ l) • for each production with LHS A (i.e. A → γ) • }
Ex. • S → E $ • E → E + T • E → T • T → ID • T → ( E ) • First(S) = First(E) = First(T) = { ID, ( } • close( S → ·E$, { λ } ) = { • S → ·E$, { λ } • E → ·E+T, { $, + } • E → ·T, { $, + } • T → ·ID, { $, + } • T → ·(E), { $, + } • }
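The closure in this example can be reproduced with a short program. The Python sketch below makes two simplifying assumptions that are not in the slides: the grammar has no λ-productions (so First of a string is determined by its first symbol), and an LR(1) item is stored as (production index, dot position, single lookahead), with items that differ only in lookahead kept separately rather than merged into a set. The names `first_sets`, `first_of` and `close1` are invented for this example.

```python
GRAMMAR = [
    ("S", ("E", "$")),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("ID",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
LAMBDA = "λ"  # lookahead of the initial item


def first_sets():
    """First sets of the nonterminals (assumes no lambda-productions)."""
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            head = rhs[0]
            new = first[head] if head in NONTERMINALS else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


FIRST = first_sets()


def first_of(symbols, lookahead):
    """First(symbols followed by lookahead), assuming nothing is nullable."""
    if not symbols:
        return {lookahead}
    head = symbols[0]
    return set(FIRST[head]) if head in NONTERMINALS else {head}


def close1(items):
    """Closure of a set of LR(1) items (production, dot, lookahead)."""
    result, worklist = set(items), list(items)
    while worklist:
        p, d, lookahead = worklist.pop()
        rhs = GRAMMAR[p][1]
        if d < len(rhs) and rhs[d] in NONTERMINALS:
            for token in first_of(rhs[d + 1:], lookahead):
                for i, (lhs, _) in enumerate(GRAMMAR):
                    item = (i, 0, token)
                    if lhs == rhs[d] and item not in result:
                        result.add(item)
                        worklist.append(item)
    return result


state = close1({(0, 0, LAMBDA)})
for p in range(len(GRAMMAR)):
    print(GRAMMAR[p], sorted(la for q, d, la in state if q == p))
```

Grouping the items by production gives the lookahead sets of the example: { λ } for S → ·E$ and { $, + } for each of the other four items.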
Constructing the LR(1) machine • A state is a set of items. • Transition on symbol x: every item A → β·xρ, l in the state yields A → βx·ρ, l in the successor state (then close the set). • The starting state is close( S → ·E$, { λ } ) • Ex. • S → E $ • E → E + T • E → T • T → T * P • T → P • P → ID • P → ( E )
Action table of LR(1) • 1. If state S contains an item B → ρ·, { a }, then action[S,a] = reduce with B → ρ • 2. If state S contains an item B → α·aβ, l where a ∈ Vt, then action[S,a] = shift • 3. If state S contains the item S → α·$, { λ } (S here is the start symbol), then action[S,$] = accept • 4. otherwise, action[S,x] = error • * Show the action table for the previous example.
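To see how the lookahead removes conflicts, the rules can be applied to a single state. The Python sketch below assumes items written as (LHS, RHS, dot position, lookahead) with one lookahead token per item; the function name `lr1_actions` is invented, and the two states were worked out by hand for the grammar S → E $, E → E + T | T, T → T * P | P, P → ID | ( E ).

```python
TERMINALS = {"ID", "(", ")", "+", "*", "$"}


def lr1_actions(state):
    """Map each lookahead token to the set of actions the state allows."""
    table = {}
    for lhs, rhs, dot, lookahead in state:
        if dot == len(rhs):
            # Rule 1: reduce only on the item's own lookahead.
            table.setdefault(lookahead, set()).add(("reduce", lhs, rhs))
        elif rhs[dot] == "$":
            # Rule 3: the end marker after the dot means accept.
            table.setdefault("$", set()).add(("accept",))
        elif rhs[dot] in TERMINALS:
            # Rule 2: shift on the terminal after the dot.
            table.setdefault(rhs[dot], set()).add(("shift",))
    return table  # tokens missing from the table are errors (rule 4)


# State reached from the start state on T.
STATE_AFTER_T = {
    ("E", ("T",), 1, "$"),
    ("E", ("T",), 1, "+"),
    ("T", ("T", "*", "P"), 1, "$"),
    ("T", ("T", "*", "P"), 1, "+"),
    ("T", ("T", "*", "P"), 1, "*"),
}
actions = lr1_actions(STATE_AFTER_T)
conflicts = {token for token, acts in actions.items() if len(acts) > 1}
print(actions)
print(conflicts)

# State reached from the start state on E.
STATE_AFTER_E = {
    ("S", ("E", "$"), 1, "λ"),
    ("E", ("E", "+", "T"), 1, "$"),
    ("E", ("E", "+", "T"), 1, "+"),
}
print(lr1_actions(STATE_AFTER_E))
```

In the state after T, an LR(0) parser would face a shift-reduce conflict; with lookahead, reduce E → T is chosen on $ and +, and shift is chosen on *.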
6.4.1 correctness of LR(1) • Lemma. • A state S contains an item A → α·, a iff there exists a derivation S ⇒*rm βAaw ⇒rm βαaw, where the state S is the state of the LR(1) machine reached when βα has been shifted (the S in the derivation is the start symbol). • Implication: • The lookahead in LR(1) is exact.
Proof of the Lemma. • S0 contains S → ·β1X1γ1, λ • after β1: S → β1·X1γ1, λ and X1 → ·β2X2γ2, l1 • after β2: X1 → β2·X2γ2, l1 and X2 → ·β3X3γ3, l2 • after β3: X2 → β3·X3γ3, l2 and X3 → ·β4X4γ4, l3 • ... • after βk+1: Xk → βk+1·Aγk+1, lk and A → ·α, lk+1 • after α: A → α·, lk+1
lk+1 = First(γk+1 lk) • = First(γk+1 γk lk-1) • = First(γk+1 γk γk-1 lk-2) • ... • = First(γk+1 γk ... γ1 λ) • S ⇒rm β1X1γ1 • ⇒*rm β1β2X2γ2ρ1 • ⇒*rm β1β2β3X3γ3ρ2ρ1 • ... • ⇒*rm β1β2 ... βk+1Aρk+1ρk ... ρ1 (that is, β1β2 ... βk+1Aaw) • ⇒rm β1β2 ... βk+1αaw (that is, βαaw) • (each ρi is the terminal string derived from γi)
Theorem (Correctness of LR(1)). • Let G be an LR(1) grammar. • z is a sentence of G iff z can be parsed by the LR(1) parser. • pf. (⇒) Initially the stack holds S0 and the input is z. • After a few steps, the top state is S, the stack spells ρ, and the remaining input is y. • We may write y as X1 ... Xj w. • We may write ρ X1 ... Xj as αβ, where β is the handle of αβw. • By the viable prefix lemma, the parser will shift X1 ... Xj onto the stack, so the stack spells ρ X1 ... Xj = αβ and the remaining input is w. • β is the handle. The parser is about to perform a reduction.
Consider one reduction step of the parser. • Assume S ⇒*rm αAw ⇒rm αβw ⇒*rm z
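We may write the configuration as follows (assume w = aw′): • the stack spells αβ (since β is the handle) with top state S′, and the remaining input is aw′ • the state reached from S0 on α contains B → u·As, b and A → ·β, a • after β, state S′ contains A → β·, a, so the parser performs a reduction • after the reduction, the transition on A leads to a state containing B → uA·s, b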