Chap6 LR Parsing

Recall some terminology:
• phrase: α ∈ (Vt ∪ Vn)*, A ∈ Vn, A ⇒+ α
• simple phrase: α ∈ (Vt ∪ Vn)*, A ∈ Vn, A → α
• handle of a sentential form: the leftmost simple phrase of the sentential form.
e.g. consider the sentential form aAb a bB, with parse tree
  S
    A
      a
      A
      b
    a
    B
      b
      B
• handle: aAb
• simple phrase: bB

shift-reduce parser: two operations:
• shift input onto the parse stack until the handle is identified; then
• reduce the handle to the LHS nonterminal
e.g. Given the grammar
  P → begin S end $
  S → a ; S
  S → begin S end ; S
  S → λ
and the input
  begin a ; begin a ; end ; end $
the handles, in the order they are reduced, are (trace this example):
  λ
  a ; S
  λ
  begin S end ; S
  a ; S
  begin S end $

LOOKING AT THE TREE (bottom-up tree construction)
  P
    begin
    S
      a
      ;
      S
        begin
        S
          a
          ;
          S
            λ
        end
        ;
        S
          λ
    end
    $

LOOKING AT THE DERIVATION
  P ⇒ begin S end $
    ⇒ begin a ; S end $
    ⇒ begin a ; begin S end ; S end $
    ⇒ begin a ; begin S end ; λ end $
    ⇒ begin a ; begin a ; S end ; end $
    ⇒ begin a ; begin a ; λ end ; end $
(This is a rightmost derivation. The right-hand side introduced at each step is a handle.)

Two questions:
• 1. Have we reached the end of a handle, and how long is the handle?
• 2. Which nonterminal does the handle reduce to?
We use tables to answer these questions:
• ACTION table
• GOTO table
We first show how to use the tables, then how to construct them.

• 1. How does the LR(0) parser work?
• 2. How are the action and goto tables constructed?
• 3. Is LR(0) parsing correct?
• 4. LR(1)
• 5. SLR(1)
• 6. LALR(1)

LR parsers are driven by:
• the action table, which specifies what action to take (shift, reduce, accept, or error)
• the goto table, which specifies state transitions
and we push states, rather than symbols, onto the stack.
Each state represents a subtree of the parse tree.

shift-reduce-driver /* look ahead 1 token */
{
  push( start_state );
  T := scanner();
  do {
    S := state on top of stack;
    switch( action(S,T) )
      case shift:
        push( state(S,T) );
        T := scanner();
        break;
      case reduce i:
        m := length of RHS of prod. i;
        pop( m );
        S := state on top of stack after popping;
        X := LHS of prod. i;
        push( state(S,X) );
        break;
      case accept: ...
      case error: ...
    end
  } forever
}
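The driver above can be sketched in Python. This is only an illustration under stated assumptions: the grammar (S → ( S ) | x with end marker $) and the hand-built ACTION and GOTO tables are hypothetical and do not come from the slides, and the input is assumed to end with "$".

```python
# Hypothetical grammar (not from the slides), with end marker $:
#   production 1: S -> ( S )      production 2: S -> x
PRODUCTIONS = {1: ("S", 3), 2: ("S", 1)}   # number -> (LHS, length of RHS)
TOKENS = ["(", ")", "x", "$"]

# Hand-built tables for the grammar above (an assumption of this sketch).
ACTION = {
    (0, "("): ("shift", 2), (0, "x"): ("shift", 3),
    (1, "$"): ("accept", None),
    (2, "("): ("shift", 2), (2, "x"): ("shift", 3),
    (4, ")"): ("shift", 5),
}
for t in TOKENS:
    ACTION[(3, t)] = ("reduce", 2)   # state 3 holds S -> x .
    ACTION[(5, t)] = ("reduce", 1)   # state 5 holds S -> ( S ) .
GOTO = {(0, "S"): 1, (2, "S"): 4}


def parse(tokens):
    """Run the shift-reduce driver; tokens must end with "$".

    Returns the production numbers in the order they are reduced.
    """
    stack = [0]                      # the stack holds states, not symbols
    pos = 0
    reductions = []
    while True:
        state, token = stack[-1], tokens[pos]
        kind, arg = ACTION.get((state, token), ("error", None))
        if kind == "shift":
            stack.append(arg)
            pos += 1
        elif kind == "reduce":
            lhs, m = PRODUCTIONS[arg]
            del stack[len(stack) - m:]            # pop m states
            stack.append(GOTO[(stack[-1], lhs)])  # goto on the LHS
            reductions.append(arg)
        elif kind == "accept":
            return reductions
        else:
            raise SyntaxError(f"unexpected {token!r} in state {state}")
```

For example, `parse(["(", "(", "x", ")", ")", "$"])` reduces by production 2 once and then by production 1 twice, while `parse(["(", "x", "$"])` raises `SyntaxError`.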

Shift-Reduce Parsers • The grammar G0 • A shift-reduce parser


Shift-Reduce Parsers
• The driver utilizes a parse stack that contains parse states, usually coded as integers.
• The driver uses two tables, action and go_to.
• The action table tells the parser whether to shift, reduce, terminate successfully, or signal a syntax error.
• The go_to table defines successor states after a token or LHS is matched and shifted.

Shift-Reduce Parsers • The action table

Shift-Reduce Parsers • The go_to table

6.2 LR parsers
• LR(1): left-to-right scanning, rightmost derivation (in reverse), 1-token lookahead
• LR parsers are deterministic: no backup, no retry
• LR(k) parsers decide the next action by examining the tokens already shifted and at most k lookahead tokens.
• LR(k) is the most powerful of the deterministic bottom-up parsers with at most k lookahead tokens.

LR Parsers: LR(0)

G is LR(k) iff the three conditions
• (1) S ⇒*rm αAw ⇒rm αβw
• (2) S ⇒*rm γBx ⇒rm αβy
• (3) Firstk(w) = Firstk(y)
imply that αAy = γBx. That is,
• α = γ
• A = B
• y = x

Suppose in some LR(k) grammar there are two rightmost sentential forms
  αβw
  αβy
(w and y are strings of terminals) such that Firstk(w) = Firstk(y).
• If αβw is reduced to αAw, then αβy is reduced to αAy. (That is, the same reduction of β is applied to both αβw and αβy.)
• On the other hand, if the same reduction of β is always applied to such αβw and αβy in the rightmost derivation, then the grammar is LR(k).

Use the four small grammars to motivate the construction of LR tables.

6.2.1 LR(0) tables
A production has the form A → X1 X2 ... Xj.
By adding a dot, we get an item (configuration), e.g.
  A → · X1 X2 ... Xj
  A → X1 · X2 ... Xj
  ... ...
  A → X1 X2 ... Xj ·
The · indicates how much of a RHS has been shifted onto the stack.

An item with the · at the end of the RHS,
  A → X1 X2 ... Xj ·
indicates that the RHS has been recognized and should be reduced to the LHS.
An item with the · at the beginning of the RHS, i.e.
  A → · X1 X2 ... Xj
predicts that the RHS will be shifted onto the stack.

An LR(0) state is a set of items.
This means that the actual state of the LR(0) parser is denoted by one of the items in the set.
The close operation:
• if there is an item A → α·Bβ in the set, then add all items of the form B → ·γ to the set.
The initial state is
  close( { S → ·α$ } )
where S is the start symbol.
Show the construction for the grammar
  S → E $
  E → E + T
  E → T
  T → id
  T → ( E )
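The close operation can be sketched in Python for the grammar above. The representation is an assumption of this sketch, not something the slides prescribe: an item is a pair (production index, dot position), and the helper names `close` and `show` are illustrative.

```python
# The expression grammar from the slide; production 0 is S -> E $.
GRAMMAR = [
    ("S", ("E", "$")),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("id",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def close(items):
    """Close a set of LR(0) items, each a (production index, dot position)."""
    result = set(items)
    worklist = list(items)
    while worklist:
        prod, dot = worklist.pop()
        rhs = GRAMMAR[prod][1]
        # Item A -> alpha . B beta with B a nonterminal: add every B -> . gamma
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    worklist.append((i, 0))
    return frozenset(result)


def show(item):
    """Render an item as text, e.g. 'E -> E + . T'."""
    prod, dot = item
    lhs, rhs = GRAMMAR[prod]
    return f"{lhs} -> {' '.join(rhs[:dot] + ('.',) + rhs[dot:])}"


INITIAL_STATE = close({(0, 0)})   # close({ S -> . E $ })
```

Here `INITIAL_STATE` contains one item for every production, each with the dot at the far left, because E and T are both reachable from the position after the dot in S → ·E $.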

Constructing the LR(0) machine
  S → E $
  E → E + T
  E → T
  T → id
  T → ( E )
The initial state is
  close( { S → ·E $ } )


The state diagram is called the characteristic finite state machine (CFSM) of the grammar.
• The CFSM is the goto table of LR(0) parsers.
• To construct the action table of LR(0) parsers, we use the following rules:

Action table of LR(0)
• 1. If state S contains an item B → ρ·, then action[S] = reduce with B → ρ
• 2. If state S contains an item B → α·aβ, where a ∈ Vt, then action[S] = shift
• 3. If state S contains the item S → α$· (S here being the start symbol), then action[S] = accept
• 4. otherwise, action[S] = error
* Show the action table for the previous example.
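The rules above can be tried out on the expression grammar with a short Python sketch that builds the CFSM and then reads one action off each state. The item encoding (production index, dot position) and the function names are assumptions of this sketch, and its state numbers follow discovery order, so they need not match the numbering in the figures.

```python
GRAMMAR = [
    ("S", ("E", "$")),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("id",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def close(items):
    """Close a set of LR(0) items, each a (production index, dot position)."""
    result, worklist = set(items), list(items)
    while worklist:
        prod, dot = worklist.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    worklist.append((i, 0))
    return frozenset(result)


def goto(state, symbol):
    """Move the dot over `symbol` in every item that allows it, then close."""
    moved = {(p, d + 1) for p, d in state
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return close(moved)


def build_cfsm():
    """Return (states, edges); edges maps (state number, symbol) -> state number."""
    states, edges = [close({(0, 0)})], {}
    for state in states:                       # the list grows while we iterate
        symbols = {GRAMMAR[p][1][d] for p, d in state if d < len(GRAMMAR[p][1])}
        for x in sorted(symbols):
            target = goto(state, x)
            if target not in states:
                states.append(target)
            edges[(states.index(state), x)] = states.index(target)
    return states, edges


def lr0_actions(state):
    """Apply rules 1-3; an empty set means error, more than one entry a conflict."""
    actions = set()
    for p, d in state:
        lhs, rhs = GRAMMAR[p]
        if d == len(rhs):
            actions.add("accept" if p == 0 else f"reduce {lhs} -> {' '.join(rhs)}")
        elif rhs[d] not in NONTERMINALS:
            actions.add("shift")
    return actions


STATES, EDGES = build_cfsm()
ACTIONS = [lr0_actions(s) for s in STATES]
```

Every entry of `ACTIONS` holds exactly one action for this grammar, which is what makes it LR(0); `EDGES` plays the role of the goto table.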

Parsing id+id$ (state path on the left, remaining input on the right):
  0                          id+id$
  0 --id--> 5                +id$      reduce id to T
  0                          T+id$
  0 --T--> 9                 +id$      reduce T to E
  0                          E+id$
  0 --E--> 1                 +id$
  0 --E--> 1 --+--> 3        id$
  0 --E--> 1 --+--> 3 --id--> 5   $    reduce id to T
  0 --E--> 1 --+--> 3        T$
  0 --E--> 1 --+--> 3 --T--> 4    $    reduce E+T to E
  0                          E$
  0 --E--> 1                 $
  0 --E--> 1 --$--> 2                  accept

Two kinds of conflicts:
• 1. shift-reduce conflict: there exists a state S such that action(S) can be either shift or reduce. In this case, the parser does not know whether to shift or to reduce.
• 2. reduce-reduce conflict: there exists a state S such that action(S) contains two reduce entries. In this case, the parser does not know which production to use in the reduction.
A grammar is LR(0) iff there is no conflict in the action table.

Few practical grammars are LR(0).
• 1. For instance, consider any λ-production A → λ. If A can also generate some non-empty terminal string, then there must be a shift-reduce conflict. Suppose b ∈ First(A):
    α · by  ⟶  αA · by    (reduce λ to A), or
    α · by  ⟶  αb · y     (shift)
• 2. Also consider operator precedence:
    id + id · + id  ⟶  E · + id          (reduce), or
    id + id · * id  ⟶  id + id * · id    (shift)
  (remember: no lookahead!)
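The λ-production case can be checked mechanically. The Python sketch below uses a hypothetical grammar (S → A $, A → λ, A → b A) that is not taken from the slides, closes its initial LR(0) state, and reports which kinds of conflict that state has; the names `close` and `conflicts` are illustrative.

```python
# Hypothetical grammar with a λ-production (empty right-hand side).
GRAMMAR = [
    ("S", ("A", "$")),
    ("A", ()),            # A -> λ
    ("A", ("b", "A")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def close(items):
    """Close a set of LR(0) items, each a (production index, dot position)."""
    result, worklist = set(items), list(items)
    while worklist:
        prod, dot = worklist.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    worklist.append((i, 0))
    return frozenset(result)


def conflicts(state):
    """List the kinds of LR(0) conflict present in one state."""
    # Items with the dot at the end call for a reduction ...
    reduces = [p for p, d in state if d == len(GRAMMAR[p][1])]
    # ... and items with the dot before a terminal call for a shift.
    shifts = {GRAMMAR[p][1][d] for p, d in state
              if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] not in NONTERMINALS}
    found = []
    if reduces and shifts:
        found.append("shift-reduce")
    if len(reduces) > 1:
        found.append("reduce-reduce")
    return found


INITIAL_STATE = close({(0, 0)})
INITIAL_CONFLICTS = conflicts(INITIAL_STATE)
```

The initial state holds both A → · (a complete item, since the right-hand side is empty) and A → ·bA (which wants to shift b), so without lookahead the parser cannot choose between them.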

Few practical grammars are LR(0). • Every practical language has • an LR(0) grammar. • Contradiction? • Show LR(0) and LR(1) grammars for • Pascal.

6.2.2 Correctness of LR(0)
• viable prefix
• The CFSM accepts all and only viable prefixes.
• The action table is correct.

correctness of LR(0)
• viable prefix: Let αβγ be a right sentential form (i.e. S ⇒*rm αβγ) in which β is the handle. Then any prefix of αβ is a viable prefix.
• Lemma 1 (Viable Prefix Property). The LR(0) machine accepts all and only viable prefixes.
• Lemma 2. The parser performs correct shift and reduction actions at all steps.

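Proof. Consider how the LR(0) machine is constructed:
  S → A a B $
  A → d D e
  B → g G h
State transitions:
  { S → ·AaB$, A → ·dDe }  --d-->  { A → d·De }  --D-->  { A → dD·e }  --e-->  { A → dDe· }  (reduction)
  { S → ·AaB$, A → ·dDe }  --A-->  { S → A·aB$ }  --a-->  { S → Aa·B$, B → ·gGh }
  { S → Aa·B$, B → ·gGh }  --g-->  { B → g·Gh }  --G-->  { B → gG·h }  --h-->  { B → gGh· }  (reduction)
  { S → Aa·B$, B → ·gGh }  --B-->  { S → AaB·$ }  --$-->  { S → AaB$· }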

6.3 LR(1) parsing
• An LR(1) item has the form
    A → X1 X2 ... Xi · Xi+1 ... Xj, l
  where l ⊆ {λ} ∪ Vt
• l is the set of terminals that may follow A in some context.
close(S)
{
  for each item in S do
    if the item is B → δ·Aρ, l
    then add A → ·γ, First(ρl)
         for each production with LHS A (i.e. A → γ)
}

Ex.
  S → E $
  E → E + T
  E → T
  T → ID
  T → ( E )
First(S) = First(E) = First(T) = { ID, ( }
close( S → ·E$, { λ } ) = {
  S → ·E$, { λ }
  E → ·E+T, { $ + }
  E → ·T, { $ + }
  T → ·ID, { $ + }
  T → ·(E), { $ + }
}
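The example can be reproduced with a short Python sketch of the LR(1) close operation. Several things here are assumptions of the sketch rather than part of the slides: an item is a triple (production index, dot position, single lookahead), items with the same core are grouped into lookahead sets only for display, and the First computation assumes the grammar has no λ-productions (true for this grammar).

```python
GRAMMAR = [
    ("S", ("E", "$")),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("ID",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_sets():
    """First sets of the nonterminals; assumes no λ-productions."""
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            x = rhs[0]
            new = first[x] if x in NONTERMINALS else {x}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


FIRST = first_sets()


def first_of(symbols, lookahead):
    """First(rho l): First of the string rho followed by the lookahead l."""
    if not symbols:
        return {lookahead}
    x = symbols[0]
    return set(FIRST[x]) if x in NONTERMINALS else {x}


def close1(items):
    """Close a set of LR(1) items (production index, dot, lookahead)."""
    result, worklist = set(items), list(items)
    while worklist:
        prod, dot, la = worklist.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            # Item B -> delta . A rho, l: add A -> . gamma, First(rho l)
            for t in first_of(rhs[dot + 1:], la):
                for i, (lhs, _) in enumerate(GRAMMAR):
                    item = (i, 0, t)
                    if lhs == rhs[dot] and item not in result:
                        result.add(item)
                        worklist.append(item)
    return result


def group_lookaheads(items):
    """Collect lookaheads per core: (production index, dot) -> set of lookaheads."""
    grouped = {}
    for p, d, la in items:
        grouped.setdefault((p, d), set()).add(la)
    return grouped


START = group_lookaheads(close1({(0, 0, "λ")}))
```

`START` maps the core of S → ·E$ to {λ} and the cores of the four E and T items to {$, +}, matching the sets listed in the example.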

Constructing the LR(1) machine
• A state is a set of items.
• Transition on a symbol x:
    A → β·xρ, l   --x-->   A → βx·ρ, l
    ......                 ......
  (then close the set)
• Starting state is close( S → ·E$, {λ} )
Ex.
  S → E $
  E → E + T
  E → T
  T → T * P
  T → P
  P → ID
  P → ( E )

Fig. 6.16, p. 160: LR(1) machine for G3

Figure 6.17 LR(1) action table for G3

Action table of LR(1)
• 1. If state S contains an item B → ρ·, { a }, then action[S,a] = reduce with B → ρ
• 2. If state S contains an item B → α·aβ, l, where a ∈ Vt, then action[S,a] = shift
• 3. If state S contains the item S → α·$, { λ } (S here being the start symbol), then action[S,$] = accept
• 4. otherwise, action[S,x] = error
* Show the action table for the previous example.

6.4.1 correctness of LR(1)
• Lemma. A state S contains an item A → α·, a iff there exists a derivation
    S ⇒*rm βAaw ⇒rm βαaw
  where the state S is the state of the LR(1) machine when βα has been shifted.
• Implication: The lookahead in LR(1) is exact.

Proof of the Lemma.
  S0:  S → ·β1 X1 γ1, λ
       ...
  --β1-->  S → β1·X1 γ1, λ
           X1 → ·β2 X2 γ2, l1
           ...
  --β2-->  X1 → β2·X2 γ2, l1
           X2 → ·β3 X3 γ3, l2
           ...
  --β3-->  X2 → β3·X3 γ3, l2
           X3 → ·β4 X4 γ4, l3
           ...
  ... ...
  --βk+1-->  Xk → βk+1·A γk+1, lk
             A → ·α, lk+1
  --α-->  A → α·, lk+1

a ∈ lk+1 = First(γk+1 lk)
         = First(γk+1 γk lk-1)
         = First(γk+1 γk γk-1 lk-2)
         ... ...
         = First(γk+1 γk ... γ1 λ)
S ⇒rm  β1 X1 γ1
  ⇒*rm β1 β2 X2 γ2 ρ1
  ⇒*rm β1 β2 β3 X3 γ3 ρ2 ρ1
  ... ...
  ⇒*rm β1 β2 ... βk+1 A ρk+1 ρk ... ρ1    (that is, β1 β2 ... βk+1 A a w)
  ⇒rm  β1 β2 ... βk+1 α a w               (that is, β α a w)
(each ρi is a terminal string derived from γi)

Theorem (Correctness of LR(1)).
Let G be an LR(1) grammar. z is a sentence of G iff z can be parsed by the LR(1) parser.
pf.()
  stack: S0                          input: z
  ...
after a few steps
  stack: ρ (top state S)             input: y
• We may write y as X1 ... Xj w.
• We may write ρ X1 ... Xj as αβ, where β is the handle of αβw.
• Due to the viable prefix lemma, the parser will shift X1 ... Xj onto the stack.
  stack: ρ X1 ... Xj = αβ (top state S)    input: w
• β is the handle. The parser is about to perform a reduction.

Consider one reduction step of the parser.
Assume S ⇒*rm αAw ⇒rm αβw ⇒*rm z

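We may write this as (assume w = a w′):
  stack: S0 ... S′, spelling αβ          input: a w′
  (β is on top of the stack since β is the handle)
• The state reached from S0 on α contains
    B → υ·Aσ, b
    ... ...
    A → ·β, a
• From there, β leads to a state containing A → β·, a, so the parser performs a reduction.
• After the reduction, A leads from the state reached on α to a state containing
    B → υA·σ, b
    ... ...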
