More on LR Parsing

Aggelos Kiayias, Computer Science & Engineering Department, The University of Connecticut, 191 Auditorium Road, Box U-155, Storrs, CT 06269-3155. akiayias@cse.uconn.edu http://www.cse.uconn.edu/~akiayias

Presentation Transcript


  1. More on LR Parsing
     Aggelos Kiayias, Computer Science & Engineering Department, The University of Connecticut, 191 Auditorium Road, Box U-155, Storrs, CT 06269-3155. akiayias@cse.uconn.edu http://www.cse.uconn.edu/~akiayias

  2. Picture So Far
     • SLR construction: based on the canonical collection of LR(0) items; it gives rise to the SLR(1) parsing table.
     • No multiply defined entries => the grammar is called “SLR(1)”.
     • More general class: LR(1) grammars, using the notion of an LR(1) item and the canonical LR(1) parsing table.

  3. LR(1) Items
     • DEF. An LR(1) item is a production with a marker, together with a terminal. E.g. [S → aA.Be, c]
     • Intuition: it indicates how much of a certain production we have seen already (aA), what we could expect next (Be), and a lookahead (c) that agrees with what should follow in the input if we ever reduce by the production S → aABe.
     • By incorporating such lookahead information into the item concept we can make wiser reduce decisions.
     • The lookahead of an LR(1) item is used directly only when considering reduce actions (i.e. when the marker is at the rightmost position).
     • The core of an LR(1) item [S → aA.Be, c] is the LR(0) item S → aA.Be
     • Different LR(1) items may share the same core.

  4. Usefulness of LR(1) items
     • E.g. if we have two LR(1) items of the form [A → α., a] and [B → β., b], we will take advantage of the lookahead to decide which reduction to use (the same setting would perhaps produce a reduce/reduce conflict in the SLR approach).
     • How the notion of validity changes: an item [A → β1.β2, a] is valid for a viable prefix αβ1 if there is a rightmost derivation that yields αAaw, which in one step yields αβ1β2aw.

  5. Constructing the Canonical Collection of LR(1) items
     • Initial item: [S’ → .S, $]
     • Closure (more refined): if [A → α.Bβ, a] belongs to the set of items and B → γ is a production of the grammar, then we add the item [B → .γ, b] for all b ∈ FIRST(βa).
     • Goto (the same): a state containing [A → α.Xβ, a] will move to a state containing [A → αX.β, a] with label X.
     • Every state is closed according to Closure.
     • Every state has transitions according to Goto.
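
The Closure and Goto rules above can be sketched in a few lines of Python. This is only an illustration, not the lecture's own code: the grammar is the one from Example I (S’ → S, S → C C, C → c C | d), the tuple encoding of items and all function names are assumptions, and FIRST is simplified because this grammar has no ε-productions.

```python
from collections import deque

# Assumed encoding of Example I: S' -> S, S -> C C, C -> c C | d
GRAMMAR = {
    "S'": [("S",)],
    "S": [("C", "C")],
    "C": [("c", "C"), ("d",)],
}
NONTERMINALS = set(GRAMMAR)


def first_sets():
    """FIRST of every non-terminal (valid only without epsilon-productions)."""
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for head, bodies in GRAMMAR.items():
            for body in bodies:
                x = body[0]
                new = first[x] if x in NONTERMINALS else {x}
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first


FIRST = first_sets()


def first_of(symbols):
    """FIRST of a non-empty string; without epsilon only its first symbol matters."""
    x = symbols[0]
    return FIRST[x] if x in NONTERMINALS else {x}


def closure(items):
    """Items are tuples (head, body, marker position, lookahead)."""
    result = set(items)
    work = deque(items)
    while work:
        head, body, dot, la = work.popleft()
        if dot < len(body) and body[dot] in NONTERMINALS:
            b = body[dot]
            # [A -> alpha . B beta, a]: add [B -> . gamma, b] for b in FIRST(beta a)
            for look in first_of(body[dot + 1:] + (la,)):
                for prod in GRAMMAR[b]:
                    item = (b, prod, 0, look)
                    if item not in result:
                        result.add(item)
                        work.append(item)
    return frozenset(result)


def goto(state, symbol):
    """Advance the marker over `symbol` and close the resulting set."""
    moved = {(h, b, d + 1, la) for (h, b, d, la) in state
             if d < len(b) and b[d] == symbol}
    return closure(moved)


def canonical_collection():
    start = closure({("S'", ("S",), 0, "$")})
    states, work = [start], deque([start])
    while work:
        state = work.popleft()
        for x in sorted({b[d] for (_, b, d, _) in state if d < len(b)}):
            target = goto(state, x)
            if target not in states:
                states.append(target)
                work.append(target)
    return states


if __name__ == "__main__":
    print(len(canonical_collection()))  # 10 states for this grammar
```

Each lookahead is stored in its own item here, so the slide notation [C → .d, c/d] corresponds to two tuples that differ only in the last component.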

  6. Constructing the LR(1) Parsing Table
     • Shift actions (same): if [A → α.bβ, a] is in state Ik and Ik moves to state Im with label b, then we add the action action[k, b] = “shift m”.
     • Reduce actions (more refined): if [A → α., a] is in state Ik, then we add the action “reduce A → α” into action[k, a] (for [S’ → S., $] the action is “accept”). Observe that we don’t use information from FOLLOW(A) anymore.
     • The goto part of the table is as before.

  7. Example I construction
     • Grammar: S’ → S, S → C C, C → c C | d
     • FIRST(S) = { c, d }, FIRST(C) = { c, d }

  8. Example II
     • Grammar: S’ → S, S → L = R | R, L → * R | id, R → L
     • FIRST(S) = { *, id }, FIRST(L) = { *, id }, FIRST(R) = { *, id }

  9. LR(1) is more general than SLR(1)
     • Grammar: S’ → S, S → L = R | R, L → * R | id, R → L
     • I0 = { [S’ → .S, $], [S → .L = R, $], [S → .R, $], [L → .* R, =/$], [L → .id, =/$], [R → .L, $] }
     • I1 = { [S’ → S., $] }
     • I2 = { [S → L. = R, $], [R → L., $] }
     • I3 = { [S → R., $] }
     • I4 = { [L → *.R, =/$], [R → .L, =/$], [L → .* R, =/$], [L → .id, =/$] }
     • I5 = { [L → id., =/$] }
     • I6 = { [S → L = .R, $], [R → .L, $], [L → .* R, $], [L → .id, $] }
     • I7 = { [L → *R., =/$] }
     • I8 = { [R → L., =/$] }
     • I9 = { [L → *.R, $], [R → .L, $], [L → .* R, $], [L → .id, $] }
     • I10 = { [L → *R., $] }
     • I11 = { [L → id., $] }
     • I12 = { [R → L., $] }
     • The state { [S → L = R., $] }, reached from I6 with label R, completes the collection.
     • action[2, =] = s6 (because of [S → L. = R, $]). The item [R → L., $] calls for a reduction only on $, whereas SLR would also reduce on = because = is in FOLLOW(R). THERE IS NO CONFLICT ANYMORE.
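
To make the difference in state I2 concrete, the following sketch derives the action entries of that single state twice: once with the SLR rule (reduce on every symbol of FOLLOW) and once with the LR(1) rule (reduce only on the item's own lookahead). The item encoding and the helper name `actions` are assumptions made for this illustration; FOLLOW(R) = { =, $ } is hard-coded rather than computed.

```python
# State I2 of Example II; items are (head, body, marker position, lookahead).
I2 = [
    ("S", ("L", "=", "R"), 1, "$"),
    ("R", ("L",), 1, "$"),
]
TERMINALS = {"=", "*", "id", "$"}
FOLLOW = {"R": {"=", "$"}}  # FOLLOW(R) for this grammar, hard-coded


def actions(state, use_follow):
    """Map each terminal to the set of actions the state asks for."""
    table = {}
    for head, body, dot, la in state:
        if dot < len(body):
            # Marker in front of a terminal: shift on that terminal.
            if body[dot] in TERMINALS:
                table.setdefault(body[dot], set()).add("shift")
        else:
            # Marker at the right end: reduce, on FOLLOW (SLR) or on the lookahead (LR(1)).
            lookaheads = FOLLOW[head] if use_follow else {la}
            for a in lookaheads:
                table.setdefault(a, set()).add(f"reduce {head} -> {' '.join(body)}")
    return table


slr = actions(I2, use_follow=True)
lr1 = actions(I2, use_follow=False)

if __name__ == "__main__":
    print(slr["="])  # two competing actions: a shift/reduce conflict
    print(lr1["="])  # only the shift remains
```

An entry holding more than one action is exactly a "multiply defined" table entry in the sense used earlier in the slides.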

  10. LALR Parsing
      • Canonical sets of LR(1) items: the number of states is much larger than in the SLR construction.
        • LR(1): on the order of thousands of states for a standard programming language.
        • SLR(1): on the order of hundreds of states for a standard programming language.
      • LALR(1) (lookahead LR) is a tradeoff:
        • Collapse states of the LR(1) table that have the same core (the “LR(0)” part of each state).
        • LALR never introduces a shift/reduce conflict if LR(1) doesn’t have one.
        • It might introduce a reduce/reduce conflict (that did not exist in the LR(1) table).
        • Still much better than SLR(1): it handles a larger class of grammars ...
        • ... with a table much smaller than that of LR(1), in fact about the size of the SLR(1) table.
      • What Yacc and most compilers employ.

  11. Collapsing states with the same core
      • E.g., if I3 and I6 collapse, then whenever the LALR(1) parser puts I36 onto the stack, the LR(1) parser would have either I3 or I6.
      • A shift/reduce conflict would not be introduced by the LALR “collapse”.
      • Indeed, if the LALR(1) table has a shift/reduce conflict, this conflict must also exist in the LR(1) version: two states with the same core have the same outgoing arrows.
      • On the other hand, a reduce/reduce conflict may be introduced.
      • Still, LALR(1) is preferred: its table is proportional in size to the SLR(1) table.
      • Direct construction is also possible.
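
The collapse can be sketched as grouping LR(1) states by their core and taking the union of the items. The two sample states below are not from the slides: they come from a standard textbook grammar (S → a A d | b B d | a B e | b A e, A → c, B → c), chosen here because it is LR(1) but merging introduces a reduce/reduce conflict. Function names and the item encoding are assumptions.

```python
from collections import defaultdict


def core(state):
    """The LR(0) part of a state: items without their lookaheads."""
    return frozenset((h, b, d) for (h, b, d, _) in state)


def merge_by_core(states):
    """Collapse LR(1) states that share a core, as in the LALR(1) construction."""
    groups = defaultdict(set)
    for state in states:
        groups[core(state)] |= set(state)
    return [frozenset(items) for items in groups.values()]


def reduce_reduce_conflicts(state):
    """Lookaheads on which two different productions ask for a reduction."""
    by_lookahead = defaultdict(set)
    for h, b, d, la in state:
        if d == len(b):
            by_lookahead[la].add((h, b))
    return {la for la, prods in by_lookahead.items() if len(prods) > 1}


# State reached after reading "a c" and state reached after reading "b c".
after_ac = frozenset({("A", ("c",), 1, "d"), ("B", ("c",), 1, "e")})
after_bc = frozenset({("A", ("c",), 1, "e"), ("B", ("c",), 1, "d")})

merged = merge_by_core([after_ac, after_bc])

if __name__ == "__main__":
    print(reduce_reduce_conflicts(after_ac))   # set(): no conflict before merging
    print(reduce_reduce_conflicts(merged[0]))  # conflicts on d and e after merging
```

Shift actions depend only on the core, which is why this merge cannot create a new shift/reduce conflict.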

  12. Error Recovery in LR Parsing
      • An error is detected when, for the current stack $...Ii and remaining input symbols s…s’…$, it holds that action[i, s] is empty.
      • Panic-mode error recovery.

  13. Panic Recovery Strategy I
      • Scan down the stack until a state Ij is found such that:
        • Ij moves with the non-terminal A to some state Ik
        • Ik moves with s’ to some state Ik’
      • Proceed as follows:
        • Pop all states down to Ij
        • Push A and state Ik
        • Discard all symbols from the input up to s’
      • There may be many choices as above.
      • Essentially, the parser determines in this way that a string produced by A has an error; it assumes the string is correct and advances.
      • Error message: construct of type “A” has error at location X
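
A minimal sketch of Strategy I follows. It assumes the stack holds state numbers only and that the tables are dictionaries (`goto[state][nonterminal]`, `action[state][terminal]`); both the function name and this layout are hypothetical. Among the "many choices", it picks the state nearest the top of the stack and, for that state, the earliest usable input symbol s’. The small tables are an excerpt made up for the demonstration.

```python
def panic_recover(stack, tokens, pos, action, goto):
    """Return (new_stack, new_pos, nonterminal), or None if no recovery point exists."""
    for depth in range(len(stack) - 1, -1, -1):        # scan down the stack
        state_j = stack[depth]
        for nonterminal, state_k in goto.get(state_j, {}).items():
            for p in range(pos, len(tokens)):          # look for a usable s'
                if tokens[p] in action.get(state_k, {}):
                    # Pop down to Ij, push Ik (standing for A), skip input up to s'.
                    return stack[:depth + 1] + [state_k], p, nonterminal
    return None


# Hypothetical excerpt of an expression-grammar table, for illustration only.
GOTO = {0: {"E": 1}, 4: {"E": 7}}
ACTION = {
    1: {"+": "s4", "*": "s5", "$": "acc"},
    7: {"+": "r1", "*": "s5", ")": "r1", "$": "r1"},
}

if __name__ == "__main__":
    # Stack after reading "id +", error detected on ")".
    print(panic_recover([0, 1, 4], ["id", "+", ")", "$"], 2, ACTION, GOTO))
```

With the full goto table the state on top of the stack already qualifies, so nothing is popped; if only state 0 had a goto entry, the same call would pop down to state 0 and discard input up to `$`.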

  14. Panic Recovery Strategy II
      • Scan down the stack until a state Ij is found such that:
        • Ij moves with the terminal t to some state Ik
        • Ik with s’ has a valid action
      • Proceed as follows:
        • Pop all states down to Ij
        • Push t and state Ik
        • Discard all symbols from the input up to s’
      • There may be many choices as above.
      • Error message: “missing t”

  15. Example
      • Grammar: E’ → E, E → E + E | E * E | ( E ) | id
      • The parsing table has an action part and a goto part.

  16. Collection of LR(0) items
      • Grammar: E’ → E, E → E + E | E * E | ( E ) | id
      • I0 = { E’ → .E, E → .E + E, E → .E * E, E → .( E ), E → .id }
      • I1 = { E’ → E., E → E. + E, E → E. * E }
      • I2 = { E → (.E ), E → .E + E, E → .E * E, E → .( E ), E → .id }
      • I3 = { E → id. }
      • I4 = { E → E + .E, E → .E + E, E → .E * E, E → .( E ), E → .id }
      • I5 = { E → E * .E, E → .E + E, E → .E * E, E → .( E ), E → .id }
      • I6 = { E → ( E. ), E → E. + E, E → E. * E }
      • I7 = { E → E + E., E → E. + E, E → E. * E }
      • I8 = { E → E * E., E → E. + E, E → E. * E }
      • I9 = { E → ( E ). }
      • Follow(E’) = { $ }, Follow(E) = { +, *, ), $ }

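  17. The parsing table
      Productions are numbered 1: E → E + E, 2: E → E * E, 3: E → ( E ), 4: E → id.

      | state | id | +     | *     | (  | )  | $   | E |
      |-------|----|-------|-------|----|----|-----|---|
      | 0     | s3 |       |       | s2 |    |     | 1 |
      | 1     |    | s4    | s5    |    |    | acc |   |
      | 2     | s3 |       |       | s2 |    |     | 6 |
      | 3     |    | r4    | r4    |    | r4 | r4  |   |
      | 4     | s3 |       |       | s2 |    |     | 7 |
      | 5     | s3 |       |       | s2 |    |     | 8 |
      | 6     |    | s4    | s5    |    | s9 |     |   |
      | 7     |    | s4/r1 | s5/r1 |    | r1 | r1  |   |
      | 8     |    | s4/r2 | s5/r2 |    | r2 | r2  |   |
      | 9     |    | r3    | r3    |    | r3 | r3  |   |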

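  18. Error-handling

      | state | id | +     | *     | (  | )  | $   | E |
      |-------|----|-------|-------|----|----|-----|---|
      | 0     | s3 | e1    |       | s2 |    |     | 1 |
      | 1     |    | s4    | s5    |    |    | acc |   |
      | 2     | s3 |       |       | s2 |    |     | 6 |
      | 3     |    | r4    | r4    |    | r4 | r4  |   |
      | 4     | s3 |       |       | s2 |    |     | 7 |
      | 5     | s3 |       |       | s2 |    |     | 8 |
      | 6     |    | s4    | s5    |    | s9 |     |   |
      | 7     |    | s4/r1 | s5/r1 |    | r1 | r1  |   |
      | 8     |    | s4/r2 | s5/r2 |    | r2 | r2  |   |
      | 9     |    | r3    | r3    |    | r3 | r3  |   |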

  19. Error-handling
      • I0 = { E’ → .E, E → .E + E, E → .E * E, E → .( E ), E → .id }
      • I2 = { E → (.E ), E → .E + E, E → .E * E, E → .( E ), E → .id }
      • I5 = { E → E * .E, E → .E + E, E → .E * E, E → .( E ), E → .id }
      • I8 = { E → E * E., E → E. + E, E → E. * E }
      • e1: push E onto the stack and move to state 1; message “missing operand”.
      • Alternatively, e1: push id onto the stack and change to state 3; message “missing operand”.

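  20. Error-handling

      | state | id | +     | *     | (  | )  | $   | E |
      |-------|----|-------|-------|----|----|-----|---|
      | 0     | s3 | e1    | e1    | s2 |    | e1  | 1 |
      | 1     |    | s4    | s5    |    |    | acc |   |
      | 2     | s3 |       |       | s2 |    |     | 6 |
      | 3     |    | r4    | r4    |    | r4 | r4  |   |
      | 4     | s3 |       |       | s2 |    |     | 7 |
      | 5     | s3 |       |       | s2 |    |     | 8 |
      | 6     |    | s4    | s5    |    | s9 |     |   |
      | 7     |    | s4/r1 | s5/r1 |    | r1 | r1  |   |
      | 8     |    | s4/r2 | s5/r2 |    | r2 | r2  |   |
      | 9     |    | r3    | r3    |    | r3 | r3  |   |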

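  21. Error-handling

      | state | id | +     | *     | (  | )  | $   | E |
      |-------|----|-------|-------|----|----|-----|---|
      | 0     | s3 | e1    | e1    | s2 | e2 | e1  | 1 |
      | 1     |    | s4    | s5    |    | e2 | acc |   |
      | 2     | s3 |       |       | s2 |    |     | 6 |
      | 3     |    | r4    | r4    |    | r4 | r4  |   |
      | 4     | s3 | e1    |       | s2 |    |     | 7 |
      | 5     | s3 |       |       | s2 |    |     | 8 |
      | 6     |    | s4    | s5    |    | s9 |     |   |
      | 7     |    | s4/r1 | s5/r1 |    | r1 | r1  |   |
      | 8     |    | s4/r2 | s5/r2 |    | r2 | r2  |   |
      | 9     |    | r3    | r3    |    | r3 | r3  |   |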

  22. Error-handling
      • e2: remove “)” from the input; message “unbalanced right parenthesis”.
      • Try the input id+)

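  23. Error-handling: state 1

      | state | id | +     | *     | (  | )  | $   | E |
      |-------|----|-------|-------|----|----|-----|---|
      | 0     | s3 | e1    | e1    | s2 | e2 | e1  | 1 |
      | 1     | e3 | s4    | s5    |    |    | acc |   |
      | 2     | s3 |       |       | s2 |    |     | 6 |
      | 3     |    | r4    | r4    |    | r4 | r4  |   |
      | 4     | s3 |       |       | s2 |    |     | 7 |
      | 5     | s3 |       |       | s2 |    |     | 8 |
      | 6     |    | s4    | s5    |    | s9 |     |   |
      | 7     |    | s4/r1 | s5/r1 |    | r1 | r1  |   |
      | 8     |    | s4/r2 | s5/r2 |    | r2 | r2  |   |
      | 9     |    | r3    | r3    |    | r3 | r3  |   |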

  24. Error-Handling
      • I1 = { E’ → E., E → E. + E, E → E. * E }
      • I3 = { E → id. }
      • I4 = { E → E + .E, E → .E + E, E → .E * E, E → .( E ), E → .id }
      • I6 = { E → ( E. ), E → E. + E, E → E. * E }
      • I7 = { E → E + E., E → E. + E, E → E. * E }
      • I9 = { E → ( E ). }
      • e3: push + onto the stack and change to state 4; message “missing operator”.
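
The error routines e1, e2 and e3 can be tried out with a small table-driven parser. The sketch below is not the lecture's code and rests on several assumptions that go beyond the slides: the s/r conflicts in states 7 and 8 are resolved by giving * precedence over + and making both left-associative; every blank entry is filled in, following the pattern of the slides for e1 to e3; a fourth routine e4 (“missing right parenthesis”) is added for state 6 on $; and states 3, 7, 8, 9 simply reduce where the slides leave entries blank. e1 uses the variant that pushes id (state 3). The stack holds state numbers only.

```python
TOKENS = ["id", "+", "*", "(", ")", "$"]
# One row per state, one entry per token, in the order of TOKENS (assumed fill-in).
ROWS = {
    0: "s3 e1 e1 s2 e2 e1",
    1: "e3 s4 s5 e3 e2 acc",
    2: "s3 e1 e1 s2 e2 e1",
    3: "r4 r4 r4 r4 r4 r4",
    4: "s3 e1 e1 s2 e2 e1",
    5: "s3 e1 e1 s2 e2 e1",
    6: "e3 s4 s5 e3 s9 e4",
    7: "r1 r1 s5 r1 r1 r1",
    8: "r2 r2 r2 r2 r2 r2",
    9: "r3 r3 r3 r3 r3 r3",
}
ACTION = {s: dict(zip(TOKENS, row.split())) for s, row in ROWS.items()}
GOTO_E = {0: 1, 2: 6, 4: 7, 5: 8}       # goto on the non-terminal E
RHS_LEN = {1: 3, 2: 3, 3: 3, 4: 1}      # length of each production's right side


def parse(tokens):
    """Parse a token list and return the list of error messages issued."""
    stack, pos, errors = [0], 0, []
    tokens = list(tokens) + ["$"]
    while True:
        entry = ACTION[stack[-1]][tokens[pos]]
        if entry == "acc":
            return errors
        kind, n = entry[0], int(entry[1:])
        if kind == "s":                  # shift: push the state, consume the token
            stack.append(n)
            pos += 1
        elif kind == "r":                # reduce: pop the right side, follow goto on E
            del stack[len(stack) - RHS_LEN[n]:]
            stack.append(GOTO_E[stack[-1]])
        elif n == 1:                     # e1: pretend an id was read
            stack.append(3)
            errors.append("missing operand")
        elif n == 2:                     # e2: drop the ")" from the input
            pos += 1
            errors.append("unbalanced right parenthesis")
        elif n == 3:                     # e3: pretend a + was read
            stack.append(4)
            errors.append("missing operator")
        else:                            # e4 (assumption): pretend a ")" was read
            stack.append(9)
            errors.append("missing right parenthesis")


if __name__ == "__main__":
    print(parse(["id", "+", ")"]))
    # ['unbalanced right parenthesis', 'missing operand']
```

On the input id+) suggested earlier, the parser first discards the “)” in state 4 and then, facing $ in the same state, supplies the missing operand before accepting.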

  25. Intro to Translation
      • Side-effects and translation schemes.
      • Do the construction as before, but:
        • A side-effect in front of a symbol will be executed in a state when we make the move following that symbol to another state.
        • Side-effects at the rightmost end are executed during reduce actions.
        • Side-effects are attached to the symbols to the right of them.
      • Grammar with side-effects:
        E’ → E
        E → E + E {print(+)}
          | E * E {print(*)}
          | {parenthesis++} ( E ) {parenthesis--}
          | id {print(id); print(parenthesis);}
      • Do for example id*(id+id)$
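
The scheme can be traced on id*(id+id)$ with a small table-driven parser that runs the side-effects at the points described above. This is a sketch under assumptions, not the lecture's implementation: the parsing table is the one for this grammar with the s/r conflicts resolved so that * binds tighter than + and both are left-associative, "--" marks an empty entry, and `print` is replaced by appending to a list so the result can be inspected.

```python
TOKENS = ["id", "+", "*", "(", ")", "$"]
# Rows follow the order of TOKENS; conflicts in states 7 and 8 resolved (assumption).
ROWS = {
    0: "s3 -- -- s2 -- --",
    1: "-- s4 s5 -- -- acc",
    2: "s3 -- -- s2 -- --",
    3: "-- r4 r4 -- r4 r4",
    4: "s3 -- -- s2 -- --",
    5: "s3 -- -- s2 -- --",
    6: "-- s4 s5 -- s9 --",
    7: "-- r1 s5 -- r1 r1",
    8: "-- r2 r2 -- r2 r2",
    9: "-- r3 r3 -- r3 r3",
}
ACTION = {s: dict(zip(TOKENS, row.split())) for s, row in ROWS.items()}
GOTO_E = {0: 1, 2: 6, 4: 7, 5: 8}
RHS_LEN = {1: 3, 2: 3, 3: 3, 4: 1}


def translate(tokens):
    """Parse the tokens and return what the side-effects would have printed."""
    out, parenthesis = [], 0
    stack, pos = [0], 0
    tokens = list(tokens) + ["$"]
    while True:
        entry = ACTION[stack[-1]][tokens[pos]]
        if entry == "acc":
            return out
        if entry == "--":
            raise SyntaxError(f"unexpected {tokens[pos]!r} at position {pos}")
        n = int(entry[1:])
        if entry[0] == "s":
            if tokens[pos] == "(":
                parenthesis += 1         # {parenthesis++} sits in front of "("
            stack.append(n)
            pos += 1
        else:
            # Side-effects at the right end of a production run on reduce.
            if n == 1:
                out.append("+")
            elif n == 2:
                out.append("*")
            elif n == 3:
                parenthesis -= 1
            else:
                out += ["id", parenthesis]
            del stack[len(stack) - RHS_LEN[n]:]
            stack.append(GOTO_E[stack[-1]])


if __name__ == "__main__":
    print(translate(["id", "*", "(", "id", "+", "id", ")"]))
    # ['id', 0, 'id', 1, 'id', 1, '+', '*']
```

The printed sequence is the postfix form of the expression, with each id followed by the parenthesis nesting depth at which it was reduced.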
