More on LR Parsing

Aggelos Kiayias
Computer Science & Engineering Department
The University of Connecticut
191 Auditorium Road, Box U-155
Storrs, CT 06269-3155
akiayias@cse.uconn.edu
http://www.cse.uconn.edu/~akiayias

Picture So Far
• SLR construction: based on the canonical collection of LR(0) items; it gives rise to the SLR parsing table.
• No multiply defined entries => the grammar is called “SLR(1)”.
• More general class: LR(1) grammars, which use the notion of an LR(1) item and the canonical LR(1) parsing table.

LR(1) Items
• DEF. An LR(1) item is a production with a marker, together with a terminal. E.g. [S → aA.Be, c]
  – Intuition: it indicates how much of a certain production we have seen already (aA), what we could expect next (Be), and a lookahead that agrees with what should follow in the input if we ever reduce by the production S → aABe.
  – By incorporating such lookahead information into the item concept we can make wiser reduce decisions.
• The lookahead of an LR(1) item is used directly only when considering reduce actions (i.e. when the marker is at the rightmost position).
• The core of an LR(1) item [S → aA.Be, c] is the LR(0) item S → aA.Be
• Different LR(1) items may share the same core.

Usefulness of LR(1) items
• E.g. if we have two LR(1) items of the form [A → γ. , a] and [B → δ. , b], we take advantage of the lookahead to decide which reduction to use (the same setting could produce a reduce/reduce conflict in the SLR approach).
• How the notion of validity changes: an item [A → β₁.β₂ , a] is valid for a viable prefix αβ₁ if there is a rightmost derivation that yields αAaw, which in one step yields αβ₁β₂aw.

Constructing the Canonical Collection of LR(1) items
• Initial item: [S’ → .S , $]
• Closure (more refined): if [A → α.Bβ , a] belongs to the set of items and B → γ is a production of the grammar, then we add the item [B → .γ , b] for all b ∈ FIRST(βa).
• Goto (the same): a state containing [A → α.Xβ , a] moves to a state containing [A → αX.β , a] with label X.
• Every state is closed according to Closure.
• Every state has transitions according to Goto.
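The closure and goto rules above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the item encoding (lhs, body, dot position, lookahead) and the names closure, goto, first_of are assumptions made for the example. It uses the grammar of Example I (S’ → S, S → CC, C → cC | d), which has no ε-productions, so FIRST of a string is determined by its first symbol.

```python
# Grammar of Example I (augmented): S' -> S, S -> C C, C -> c C | d
GRAMMAR = {
    "S'": [("S",)],
    "S": [("C", "C")],
    "C": [("c", "C"), ("d",)],
}
NONTERMINALS = set(GRAMMAR)

def first_sets():
    """FIRST for every nonterminal (valid because there are no epsilon-productions)."""
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for nt, bodies in GRAMMAR.items():
            for body in bodies:
                head = body[0]
                new = first[head] if head in NONTERMINALS else {head}
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

FIRST = first_sets()

def first_of(symbols):
    """FIRST of a non-empty symbol string; only its first symbol matters here."""
    head = symbols[0]
    return FIRST[head] if head in NONTERMINALS else {head}

def closure(items):
    """Close a set of LR(1) items, each encoded as (lhs, body, dot, lookahead)."""
    result = set(items)
    work = list(items)
    while work:
        lhs, body, dot, la = work.pop()
        if dot < len(body) and body[dot] in NONTERMINALS:
            b = body[dot]
            # New lookaheads are FIRST(beta a), beta being what follows B.
            for look in first_of(body[dot + 1:] + (la,)):
                for prod in GRAMMAR[b]:
                    item = (b, prod, 0, look)
                    if item not in result:
                        result.add(item)
                        work.append(item)
    return frozenset(result)

def goto(state, symbol):
    """Move the marker over `symbol` (lookaheads unchanged) and close the result."""
    moved = {(l, b, d + 1, a) for (l, b, d, a) in state
             if d < len(b) and b[d] == symbol}
    return closure(moved)

I0 = closure({("S'", ("S",), 0, "$")})
for item in sorted(I0):
    print(item)
```

Here I0 contains the C-items only with lookaheads c and d, while goto(I0, "C") contains them only with lookahead $; this is the refinement that FOLLOW-based SLR tables cannot express.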

Constructing the LR(1) Parsing Table
• Shift actions (same): if [A → α.bβ , a] is in state Ik and Ik moves to state Im with label b, then we add the action action[k, b] = “shift m”.
• Reduce actions (more refined): if [A → α. , a] is in state Ik, then we add the action “Reduce A → α” into action[k, a]. Observe that we don’t use information from FOLLOW(A) anymore.
• The goto part of the table is as before.

Example I construction
S’ → S
S → C C
C → c C | d
FIRST(S) = {c, d}
FIRST(C) = {c, d}

Example II
S’ → S
S → L = R | R
L → * R | id
R → L
FIRST(S) = {*, id}
FIRST(L) = {*, id}
FIRST(R) = {*, id}

LR(1) is more general than SLR(1)
S’ → S
S → L = R | R
L → * R | id
R → L
I0 = { [S’ → .S , $] [S → .L = R , $] [S → .R , $] [L → .* R , = / $] [L → .id , = / $] [R → .L , $] }
I1 = { [S’ → S. , $] }
I2 = { [S → L. = R , $] [R → L. , $] }
I3 = { [S → R. , $] }
I4 = { [L → *.R , = / $] [R → .L , = / $] [L → .* R , = / $] [L → .id , = / $] }
I5 = { [L → id. , = / $] }
I6 = { [S → L = .R , $] [R → .L , $] [L → .* R , $] [L → .id , $] }
I7 = { [L → *R. , = / $] }
I8 = { [R → L. , = / $] }
I9 = { [L → *.R , $] [R → .L , $] [L → .* R , $] [L → .id , $] }
I10 = { [L → *R. , $] }
I11 = { [L → id. , $] }
I12 = { [R → L. , $] }
I13 = { [S → L = R. , $] }
action[2, =] = s6 (because of [S → L. = R , $]). There is no conflict anymore: [R → L. , $] calls for a reduction only on $, whereas SLR(1) would also reduce on = because = is in FOLLOW(R).
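To see why state I2 conflicts under SLR(1) but not under LR(1), one can compute FOLLOW(R) for the grammar of Example II and compare it with the lookahead carried by the item [R → L. , $]. The sketch below is only an illustration under that reading; the helper names first_sets and follow_sets are made up for the example, and the code relies on the grammar having no ε-productions.

```python
# Grammar of Example II (augmented).
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("L", "=", "R")),
    ("S", ("R",)),
    ("L", ("*", "R")),
    ("L", ("id",)),
    ("R", ("L",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def first_sets():
    """FIRST for every nonterminal (no epsilon-productions in this grammar)."""
    first = {nt: set() for nt in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, body in GRAMMAR:
            head = body[0]
            new = first[head] if head in NONTERMINALS else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first

def follow_sets(first):
    """FOLLOW for every nonterminal, iterated to a fixed point."""
    follow = {nt: set() for nt in NONTERMINALS}
    follow["S'"].add("$")
    changed = True
    while changed:
        changed = False
        for lhs, body in GRAMMAR:
            for i, sym in enumerate(body):
                if sym not in NONTERMINALS:
                    continue
                if i + 1 < len(body):
                    nxt = body[i + 1]
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:
                    new = follow[lhs]  # sym ends the body: inherit FOLLOW(lhs)
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow

FOLLOW = follow_sets(first_sets())

shift_on = {"="}              # from [S -> L. = R , $] in I2
slr_reduce_on = FOLLOW["R"]   # SLR reduces R -> L on all of FOLLOW(R)
lr1_reduce_on = {"$"}         # lookahead of [R -> L. , $] in I2

print("SLR(1) conflict symbols:", sorted(slr_reduce_on & shift_on))
print("LR(1) conflict symbols: ", sorted(lr1_reduce_on & shift_on))
```

The first intersection is non-empty (it contains =), the second is empty, which matches the claim that the conflict disappears in the LR(1) table.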

LALR Parsing
• Canonical sets of LR(1) items
  – Number of states much larger than in the SLR construction
  – LR(1): on the order of thousands of states for a standard programming language
  – SLR(1): on the order of hundreds of states for a standard programming language
• LALR(1) (lookahead-LR)
  – A tradeoff: collapse states of the LR(1) table that have the same core (the “LR(0)” part of each state)
  – LALR never introduces a shift/reduce conflict if LR(1) doesn’t.
  – It might introduce a reduce/reduce conflict (that did not exist in the LR(1) table)…
  – Still much better than SLR(1) (handles a larger class of grammars)…
  – … but a smaller class than LR(1); the table size is about that of SLR(1)
  – What Yacc and most compilers employ.

Collapsing states with the same core.
• E.g., if I3 and I6 collapse, then whenever the LALR(1) parser puts I36 on the stack, the LR(1) parser would have either I3 or I6.
• A shift/reduce conflict is not introduced by the LALR “collapse”.
  – Indeed, if the LALR(1) table has a shift/reduce conflict, this conflict must also exist in the LR(1) version: shift moves depend only on the core, so two states with the same core have the same outgoing arrows.
• On the other hand, a reduce/reduce conflict may be introduced.
• Still, LALR(1) is preferred: its table size is proportional to that of SLR(1).
• Direct construction is also possible.
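The collapse itself is mechanical: group the LR(1) states by core and take the union of the items in each group. The following is a minimal sketch, assuming states are frozensets of (lhs, body, dot, lookahead) tuples; the function names core and merge_by_core and the three sample states are illustrative only. The sample states are taken from the grammar S → CC, C → cC | d of Example I.

```python
from collections import defaultdict

def core(state):
    """The LR(0) part of a state: its items with the lookaheads dropped."""
    return frozenset((lhs, body, dot) for lhs, body, dot, _ in state)

def merge_by_core(states):
    """Union all LR(1) states that share a core (the LALR collapse)."""
    groups = defaultdict(set)
    for state in states:
        groups[core(state)] |= set(state)
    return [frozenset(items) for items in groups.values()]

# Two states with the same core {C -> d.} but different lookaheads ...
state_a = frozenset({("C", ("d",), 1, "c"), ("C", ("d",), 1, "d")})
state_b = frozenset({("C", ("d",), 1, "$")})
# ... and one state with a different core.
state_c = frozenset({
    ("S", ("C", "C"), 1, "$"),
    ("C", ("c", "C"), 0, "$"),
    ("C", ("d",), 0, "$"),
})

merged = merge_by_core([state_a, state_b, state_c])
print(len(merged), "states after merging")
```

A full LALR construction would also redirect the goto transitions to the merged states; that is well defined because, as noted above, states with the same core have the same outgoing arrows.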

Error Recovery in LR Parsing
• An error is detected when, for the current stack $...Ii and remaining input s…s’…$, it holds that action[i, s] = empty.
• Panic-mode error recovery.

Panic Recovery Strategy I
• Scan down the stack until a state Ij is found such that:
  – Ij moves with the non-terminal A to some state Ik
  – Ik moves with s’ to some state Ik’
• Proceed as follows:
  – Pop all states above Ij
  – Push A and state Ik
  – Discard all symbols from the input up to s’
• There may be many choices as above.
• [Essentially, the parser determines in this way that a string produced by A contains an error; it assumes the string is correct and advances.]
• Error message: construct of type “A” has error at location X

Panic Recovery Strategy II
• Scan down the stack until a state Ij is found such that:
  – Ij moves with the terminal t to some state Ik
  – Ik with s’ has a valid action.
• Proceed as follows:
  – Pop all states above Ij
  – Push t and state Ik
  – Discard all symbols from the input up to s’
• There may be many choices as above.
• Error message: “missing t”

Example
E’ → E
E → E + E | E * E | ( E ) | id

Collection of LR(0) items
E’ → E
E → E + E | E * E | ( E ) | id
I0: E’ → .E   E → .E + E   E → .E * E   E → .( E )   E → .id
I1: E’ → E.   E → E . + E   E → E . * E
I2: E → (. E )   E → .E + E   E → .E * E   E → .( E )   E → .id
I3: E → id.
I4: E → E + . E   E → .E + E   E → .E * E   E → .( E )   E → .id
I5: E → E * . E   E → .E + E   E → .E * E   E → .( E )   E → .id
I6: E → ( E . )   E → E . + E   E → E . * E
I7: E → E + E .   E → E . + E   E → E . * E
I8: E → E * E .   E → E . + E   E → E . * E
I9: E → ( E ) .
Follow(E’) = {$}
Follow(E) = {+, *, ), $}

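The parsing table
state id     +      *      (      )      $      E
0     s3                   s2                   1
1            s4     s5                   acc
2     s3                   s2                   6
3            r4     r4            r4     r4
4     s3                   s2                   7
5     s3                   s2                   8
6            s4     s5            s9
7            s4/r1  s5/r1         r1     r1
8            s4/r2  s5/r2         r2     r2
9            r3     r3            r3     r3
(rN = reduce by the N-th alternative of E: 1. E → E + E, 2. E → E * E, 3. E → ( E ), 4. E → id)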

Error-handling
state id     +      *      (      )      $      E
0     s3     e1            s2                   1
1            s4     s5                   acc
2     s3                   s2                   6
3            r4     r4            r4     r4
4     s3                   s2                   7
5     s3                   s2                   8
6            s4     s5            s9
7            s4/r1  s5/r1         r1     r1
8            s4/r2  s5/r2         r2     r2
9            r3     r3            r3     r3

Error-handling
I0: E’ → .E   E → .E + E   E → .E * E   E → .( E )   E → .id
I2: E → (. E )   E → .E + E   E → .E * E   E → .( E )   E → .id
I5: E → E * . E   E → .E + E   E → .E * E   E → .( E )   E → .id
I8: E → E * E .   E → E . + E   E → E . * E
e1: Push E onto the stack and move to state 1; “missing operand”
e1, alternatively: Push id onto the stack and change to state 3; “missing operand”

Error-handling
state id     +      *      (      )      $      E
0     s3     e1     e1     s2            e1     1
1            s4     s5                   acc
2     s3                   s2                   6
3            r4     r4            r4     r4
4     s3                   s2                   7
5     s3                   s2                   8
6            s4     s5            s9
7            s4/r1  s5/r1         r1     r1
8            s4/r2  s5/r2         r2     r2
9            r3     r3            r3     r3

Error-handling
state id     +      *      (      )      $      E
0     s3     e1     e1     s2     e2     e1     1
1            s4     s5            e2     acc
2     s3                   s2                   6
3            r4     r4            r4     r4
4     s3     e1            s2                   7
5     s3                   s2                   8
6            s4     s5            s9
7            s4/r1  s5/r1         r1     r1
8            s4/r2  s5/r2         r2     r2
9            r3     r3            r3     r3

Error-handling
e2: Remove “)” from the input; “unbalanced right parenthesis”
Try the input id+)
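The suggested input id+) can be traced with a small table-driven parser. The sketch below is hedged in three ways, none of which comes from the slides: (1) the shift/reduce conflicts in states 7 and 8 are resolved by assuming the usual precedence (* binds tighter than +, both left-associative); (2) e1 and e2 are assumed to be entered in every state that expects an operand (0, 2, 4, 5), on +, *, $ and on ) respectively; (3) e1 uses the variant that pushes state 3, as if an id had been read. The names ACTION, GOTO, PRODUCTIONS and parse are illustrative.

```python
# Productions numbered as in the table: 1: E->E+E, 2: E->E*E, 3: E->(E), 4: E->id
PRODUCTIONS = {1: ("E", 3), 2: ("E", 3), 3: ("E", 3), 4: ("E", 1)}

ACTION = {
    0: {"id": "s3", "(": "s2"},
    1: {"+": "s4", "*": "s5", ")": "e2", "$": "acc"},
    2: {"id": "s3", "(": "s2"},
    3: {"+": "r4", "*": "r4", ")": "r4", "$": "r4"},
    4: {"id": "s3", "(": "s2"},
    5: {"id": "s3", "(": "s2"},
    6: {"+": "s4", "*": "s5", ")": "s9"},
    # Assumed conflict resolution: * binds tighter than +, both left-associative.
    7: {"+": "r1", "*": "s5", ")": "r1", "$": "r1"},
    8: {"+": "r2", "*": "r2", ")": "r2", "$": "r2"},
    9: {"+": "r3", "*": "r3", ")": "r3", "$": "r3"},
}
# Assumption: every state that expects an operand gets e1 on +, *, $ and e2 on ")".
for state in (0, 2, 4, 5):
    ACTION[state].update({"+": "e1", "*": "e1", "$": "e1", ")": "e2"})

GOTO = {0: 1, 2: 6, 4: 7, 5: 8}  # goto[state, E]

def parse(tokens):
    """Parse a token list (without the trailing $); return (accepted, messages)."""
    tokens = list(tokens) + ["$"]
    stack, pos, messages = [0], 0, []
    while True:
        act = ACTION[stack[-1]].get(tokens[pos])
        if act is None:                      # blank entry with no error routine
            messages.append("unrecoverable error")
            return False, messages
        if act == "acc":
            return True, messages
        if act[0] == "s":                    # shift: push state, consume token
            stack.append(int(act[1:]))
            pos += 1
        elif act[0] == "r":                  # reduce: pop |body| states, then goto
            _, length = PRODUCTIONS[int(act[1:])]
            del stack[len(stack) - length:]
            stack.append(GOTO[stack[-1]])
        elif act == "e1":                    # behave as if an id had been shifted
            stack.append(3)
            messages.append("missing operand")
        elif act == "e2":                    # drop the ")" from the input
            pos += 1
            messages.append("unbalanced right parenthesis")

print(parse(["id", "+", ")"]))
```

On id+) the parser shifts id and +, then in state 4 sees ")": e2 discards it, e1 supplies the missing operand on $, and the parse completes with both messages recorded.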

Error-handling state 1
state id     +      *      (      )      $      E
0     s3     e1     e1     s2     e2     e1     1
1     e3     s4     s5                   acc
2     s3                   s2                   6
3            r4     r4            r4     r4
4     s3                   s2                   7
5     s3                   s2                   8
6            s4     s5            s9
7            s4/r1  s5/r1         r1     r1
8            s4/r2  s5/r2         r2     r2
9            r3     r3            r3     r3

Error-Handling
I1: E’ → E.   E → E . + E   E → E . * E
I3: E → id.
I4: E → E + . E   E → .E + E   E → .E * E   E → .( E )   E → .id
I6: E → ( E . )   E → E . + E   E → E . * E
I7: E → E + E .   E → E . + E   E → E . * E
I9: E → ( E ) .
e3: Push + onto the stack and change to state 4; “missing operator”

Intro to Translation
• Side-effects and Translation Schemes.
• Do the construction as before but:
  – A side-effect in front of a symbol is executed in a state when we make the move following that symbol to another state.
  – Side-effects at the rightmost end are executed during reduce actions.
• Side-effects are attached to the symbols to the right of them.
E’ → E
E → E + E {print(+)}
  | E * E {print(*)}
  | {parenthesis++} ( E ) {parenthesis--}
  | id { print(id); print(parenthesis); }
Do for example id*(id+id)$
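The exercise id*(id+id)$ can be checked with a small simulation. This is a sketch under stated assumptions rather than the lecture's implementation: it reuses the expression table with the conflicts in states 7 and 8 resolved by the usual precedence (an assumption), runs {parenthesis++} when the parser shifts “(”, and runs the rightmost side-effects on the corresponding reduce. The name translate and the choice to collect the printed values in a list are illustrative.

```python
ACTION = {
    0: {"id": "s3", "(": "s2"},
    1: {"+": "s4", "*": "s5", "$": "acc"},
    2: {"id": "s3", "(": "s2"},
    3: {"+": "r4", "*": "r4", ")": "r4", "$": "r4"},
    4: {"id": "s3", "(": "s2"},
    5: {"id": "s3", "(": "s2"},
    6: {"+": "s4", "*": "s5", ")": "s9"},
    # Assumed conflict resolution: * binds tighter than +, both left-associative.
    7: {"+": "r1", "*": "s5", ")": "r1", "$": "r1"},
    8: {"+": "r2", "*": "r2", ")": "r2", "$": "r2"},
    9: {"+": "r3", "*": "r3", ")": "r3", "$": "r3"},
}
GOTO = {0: 1, 2: 6, 4: 7, 5: 8}  # goto[state, E]

def translate(tokens):
    """Run the translation scheme; return the printed values as a list of strings."""
    tokens = list(tokens) + ["$"]
    stack, pos = [0], 0
    out, parenthesis = [], 0
    while True:
        act = ACTION[stack[-1]].get(tokens[pos])
        if act is None:
            raise SyntaxError(f"unexpected {tokens[pos]!r} at position {pos}")
        if act == "acc":
            return out
        if act[0] == "s":
            if tokens[pos] == "(":
                parenthesis += 1          # {parenthesis++} sits in front of "("
            stack.append(int(act[1:]))
            pos += 1
        else:
            rule = int(act[1:])
            length = 1 if rule == 4 else 3
            del stack[len(stack) - length:]
            stack.append(GOTO[stack[-1]])
            if rule == 1:
                out.append("+")           # E -> E + E {print(+)}
            elif rule == 2:
                out.append("*")           # E -> E * E {print(*)}
            elif rule == 3:
                parenthesis -= 1          # E -> ( E ) {parenthesis--}
            else:
                out.extend(["id", str(parenthesis)])  # E -> id {print(id); print(parenthesis)}

print(" ".join(translate(["id", "*", "(", "id", "+", "id", ")"])))
```

The operators come out in postfix order, and each id is followed by the nesting depth at which it was reduced.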
