CSE P501 – Compilers • LR ~ Bottom-Up ~ Shift-Reduce - concluded • Dotted Items • Building the Handles DFA • Building Action & Goto Tables • Building the Handles DFA: Theory • Nullable, FIRST & FOLLOW • SLR, LALR, LR • Next Jim Hogg - UW - CSE - P501
Recap: LR State Machine
• Problem: when to reduce? when to shift?
  1. Clairvoyance - foretelling the future - tricky algorithm!
  2. Backtrack and try another path - slow and cumbersome
  3. DFA for prefixes - great solution, but still magical
  4. Action & Goto tables - encoding of 3, so still magical
• Idea: devise an algorithm for 3 & 4 - a DFA to recognize handles
• The language generated by a CFG is generally not regular (cannot be generated by a regex, as we saw earlier)
• But the language of handles for a CFG is regular
• So use a DFA to recognize those handles
• if (DFA accepts) then REDUCE else SHIFT
Recap: Prefixes, Handles, Items
• S is the start symbol of a grammar G
• If S =>* α then α is a sentential form of G
• γ is a viable prefix of G if there is some derivation S =>*rm αAw =>rm αβw and γ is a prefix of αβ
• A viable prefix is a prefix of a sentential form in a rightmost derivation that does not extend past the handle
• The occurrence of β in αβw is a handle of αβw
Dotted Items
• An item is a production containing a dot (.) at some position in its RHS
• eg: [A → . X Y]   [A → X . Y]   [A → X Y .]
• The dot shows how much of the token stream we have seen so far
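The idea of one item per dot position can be sketched in a few lines of Python. The function name `items_of` and the tuple encoding of the right-hand side are assumptions made for this illustration, not notation from the slides.

```python
def items_of(lhs, rhs):
    """List every dotted item of the production lhs -> rhs (rhs is a tuple of symbols)."""
    items = []
    for dot in range(len(rhs) + 1):          # the dot can sit in len(rhs) + 1 places
        dotted = rhs[:dot] + (".",) + rhs[dot:]
        items.append(f"[{lhs} -> {' '.join(dotted)}]")
    return items


for item in items_of("A", ("X", "Y")):
    print(item)
```

A production with n symbols on its right-hand side therefore yields n + 1 items.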
Example Grammar
0. Z → S $      ($ denotes end-of-file)
1. S → ( L )
2. S → x
3. L → S
4. L → L , S
• We have added a production for Z: the original start symbol followed by the end-of-file marker, $
• What language does this grammar generate?
Handles DFA - Opening Skirmish
Grammar: 0. Z → S $   1. S → ( L )   2. S → x   3. L → S   4. L → L , S
Initially
• Stack is empty
• Input is the right-hand side of Z (ie, S $)
• Initial configuration is [Z → . S $]
• As before, the dot marks what we've seen so far - initially, nothing
• But, since the dot is just before S, we might expect to see next anything that can be derived from S. So the dot is also just before ( or x
Initial Items State
Grammar: 0. Z → S $   1. S → ( L )   2. S → x   3. L → S   4. L → L , S
• A state is just a set of items
• Start: an initial set of items
• Completion (or closure): additional productions whose LHS appears immediately to the right of the dot in some item already in the state
State contents:
  Z → . S $        (start item, or core)
  S → . ( L )      (completion, or closure, of core)
  S → . x          (completion, or closure, of core)
Shift Action on x
Grammar: 0. Z → S $   1. S → ( L )   2. S → x   3. L → S   4. L → L , S
• If we shift and see x, we will transition to a state containing the item [S → x .]
• So, add a new state with that item
• In this case, closure of [S → x .] adds nothing
• If we hit this state during the parse, we will reduce - no alternative!
Diagram: { Z → . S $, S → . ( L ), S → . x } --x--> { S → x . }
Shift Action on (
Grammar: 0. Z → S $   1. S → ( L )   2. S → x   3. L → S   4. L → L , S
• If, instead, we shift and see ( we obtain the item [S → ( . L )]
• Its closure adds another 4 items, as shown
Diagram: { Z → . S $, S → . ( L ), S → . x } --(--> { S → ( . L ), L → . L , S, L → . S, S → . ( L ), S → . x }
Goto Actions
Grammar: 0. Z → S $   1. S → ( L )   2. S → x   3. L → S   4. L → L , S
• If, instead, we have just reduced to S, we pop the RHS from the stack, exposing the first state, and must then move past S. Add a Goto transition on S for this.
Diagram: { Z → . S $, S → . ( L ), S → . x } --S--> { Z → S . $ }
Building the Handles DFA
Grammar: 0. S’ → S $   1. S → ( L )   2. S → x   3. L → S   4. L → L , S
[Figure sequence: the DFA is built up one state or edge per slide]
• Step 1: state 1 = { S’ → . S $, S → . ( L ), S → . x }
• Step 2: edge 1 --S--> 4; state 4 = { S’ → S . $ }
• Step 3: edge 1 --x--> 2; state 2 = { S → x . }
• Step 4: edge 1 --(--> 3; state 3 = { S → ( . L ), L → . S, L → . L , S, S → . ( L ), S → . x }
• Step 5: edge 3 --x--> 2
• Step 6: edge 3 --S--> 7; state 7 = { L → S . }
• Step 7: edge 3 --(--> 3
• Step 8: edge 3 --L--> 5; state 5 = { S → ( L . ), L → L . , S }
• Step 9: edge 5 --)--> 6; state 6 = { S → ( L ) . }
• Step 10: edge 5 --,--> 8; state 8 = { L → L , . S, S → . ( L ), S → . x }
• Step 11: edge 8 --S--> 9; state 9 = { L → L , S . }
• Step 12: edge 8 --x--> 2
• Step 13: edge 8 --(--> 3
Building Action & Goto Tables
Grammar: 0. S’ → S $   1. S → ( L )   2. S → x   3. L → S   4. L → L , S
[Figure: the completed Handles DFA, states 1-9, from which the Action and Goto tables are built]
Edges: 1 --x--> 2, 1 --(--> 3, 1 --S--> 4, 3 --x--> 2, 3 --(--> 3, 3 --S--> 7, 3 --L--> 5, 5 --)--> 6, 5 --,--> 8, 8 --x--> 2, 8 --(--> 3, 8 --S--> 9
Building the Handles DFA: Theory
• Closure(S)
  • S is a state in the Handles-DFA
  • Closure adds all items implied by items already in S
• Goto(S, X)
  • S is a state in the Handles-DFA (set-of-items)
  • X is a single grammar symbol (terminal or non-terminal)
  • Goto moves the dot past symbol X in all appropriate items in S
Closure Algorithm
  fun Closure(S) =                          // S is a state in the Handles-DFA
    repeat
      foreach item = [A → α . B β] in S do  // B ∈ N
        foreach production B → γ do
          S ∪= [B → . γ]
        enddo
      enddo
    until S does not change
    return S                                // update-in-place
  endfun
• Adds all the items to S that it should have: given [A → α . B β], we are about to see a token that starts B; equally well, a token that starts any production of B
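The closure loop can be sketched in Python on the example grammar. This is a minimal sketch under stated assumptions: an item is encoded as a pair (production number, dot position), and the names `GRAMMAR`, `closure` and `show` are invented for this illustration.

```python
# Example grammar from the slides: production number -> (LHS, RHS).
GRAMMAR = [
    ("S'", ("S", "$")),          # 0
    ("S", ("(", "L", ")")),      # 1
    ("S", ("x",)),               # 2
    ("L", ("S",)),               # 3
    ("L", ("L", ",", "S")),      # 4
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Closure of a set of items; an item is (production number, dot position)."""
    result = set(items)
    changed = True
    while changed:                           # repeat ... until S does not change
        changed = False
        for prod, dot in list(result):
            rhs = GRAMMAR[prod][1]
            if dot == len(rhs) or rhs[dot] not in NONTERMINALS:
                continue                     # dot at the end, or before a terminal
            for j, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (j, 0) not in result:
                    result.add((j, 0))       # add [B -> . gamma]
                    changed = True
    return frozenset(result)


def show(item):
    """Render an item in dotted form, e.g. S -> . ( L )."""
    lhs, rhs = GRAMMAR[item[0]]
    return f"{lhs} -> {' '.join(rhs[:item[1]] + ('.',) + rhs[item[1]:])}"


start_state = closure({(0, 0)})
for item in sorted(start_state):
    print(show(item))
```

Returning a `frozenset` makes a state hashable, so states can later be compared or used as dictionary keys.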
Goto Algorithm
  fun Goto(S, X) =                          // S is node in DFA = set-of-items
                                            // X is a terminal or non-terminal
    new = {}
    foreach item = [A → α . X β] in S do
      new ∪= [A → α X . β]
    enddo
    return Closure(new)
  endfun
• How to find the new, or existing, DFA state we reach on starting in state S and reading X
• We view S as a set-of-items
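A hedged Python sketch of Goto on the example grammar follows. Items are assumed to be (production number, dot position) pairs, and a compact worklist version of closure is restated so the block runs on its own; all names are illustrative.

```python
GRAMMAR = [
    ("S'", ("S", "$")),          # 0
    ("S", ("(", "L", ")")),      # 1
    ("S", ("x",)),               # 2
    ("L", ("S",)),               # 3
    ("L", ("L", ",", "S")),      # 4
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Worklist form of the closure algorithm."""
    result, work = set(items), list(items)
    while work:
        prod, dot = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for j, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (j, 0) not in result:
                    result.add((j, 0))
                    work.append((j, 0))
    return frozenset(result)


def goto(state, symbol):
    """Move the dot past `symbol` in every item that allows it, then close."""
    moved = {
        (prod, dot + 1)
        for prod, dot in state
        if dot < len(GRAMMAR[prod][1]) and GRAMMAR[prod][1][dot] == symbol
    }
    return closure(moved)


state1 = closure({(0, 0)})
state2 = goto(state1, "x")       # { S -> x . }
state3 = goto(state1, "(")       # S -> ( . L ) plus its four closure items
print(len(state2), len(state3))
```

Reading `(` again from the state reached on `(` gives back the same set of items, which is why the DFA has a self-loop there.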
LR(0) Construction
• Our LR parse table and algorithm are based only on what we can see on the parse stack; so far, we ignore lookahead
• So our grammar is restricted to LR(0)
• Augment the grammar with an extra start production S’ → S $
• Let N be the set of states (nodes in the DFA)
• Let E be the set of edges
• Initialize N to { Closure( [S’ → . S $] ) }
• Initialize E to empty
LR(0) Construction Algorithm
  repeat
    foreach state I in N do
      foreach item [A → α . X β] in I do
        new = Goto(I, X)
        N ∪= new
        E ∪= (I --X--> new)
      enddo
    enddo
  until E and N do not change
• View each state I in the DFA as a set-of-items, so N is a set of sets-of-items
• For symbol $, don’t compute Goto(I, $); instead, make this an accept action.
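Putting Closure and Goto together, the construction loop can be sketched as below. This is an illustration under assumptions: items are (production number, dot position) pairs, states are numbered from 0 in discovery order (so the numbers differ from the 1-9 labels in the figures), and the function names are invented here.

```python
GRAMMAR = [
    ("S'", ("S", "$")),          # 0
    ("S", ("(", "L", ")")),      # 1
    ("S", ("x",)),               # 2
    ("L", ("S",)),               # 3
    ("L", ("L", ",", "S")),      # 4
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    result, work = set(items), list(items)
    while work:
        prod, dot = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for j, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (j, 0) not in result:
                    result.add((j, 0))
                    work.append((j, 0))
    return frozenset(result)


def goto(state, symbol):
    return closure({(p, d + 1) for p, d in state
                    if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol})


def build_dfa():
    """Return (states, edges); edges maps (state index, symbol) -> state index."""
    states = [closure({(0, 0)})]             # N
    edges = {}                               # E
    i = 0
    while i < len(states):                   # newly found states get visited too
        after_dot = {GRAMMAR[p][1][d] for p, d in states[i]
                     if d < len(GRAMMAR[p][1])}
        for symbol in sorted(after_dot - {"$"}):   # $ becomes accept, not an edge
            target = goto(states[i], symbol)
            if target not in states:
                states.append(target)
            edges[(i, symbol)] = states.index(target)
        i += 1
    return states, edges


states, edges = build_dfa()
print(len(states), "states,", len(edges), "edges")   # 9 states, 12 edges
```

The counts agree with the nine states and twelve edges of the DFA built step by step in the slides.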
Building the Parse Tables (1)
  foreach edge I --X--> J do
    if X ∈ T then Action[I, X] = sJ   // shift and goto state J
    if X ∈ N then Goto[I, X] = gJ     // goto state J
  enddo
Key: T = set of Terminals; N = set of NonTerminals
Building the Parse Tables (2)
  foreach state I do
    foreach item [S’ → S . $] in I do
      Action[I, $] = accept
    enddo
    foreach item [A → γ .] in I do
      Action[I, *] = rj, where j is the production number for A → γ   // * means every terminal
    enddo
  enddo
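Both table-building passes can be sketched together in Python. This is a minimal sketch, not the course's implementation: the grammar is passed in as a list of (LHS, RHS) pairs with production 0 as the augmented start production, each Action entry is kept as a set so that a conflict shows up as an entry with more than one action, and states are numbered from 0 in discovery order rather than as in the figures.

```python
from collections import defaultdict

GRAMMAR = [
    ("S'", ("S", "$")),          # 0
    ("S", ("(", "L", ")")),      # 1
    ("S", ("x",)),               # 2
    ("L", ("S",)),               # 3
    ("L", ("L", ",", "S")),      # 4
]


def closure(grammar, items):
    nonterminals = {lhs for lhs, _ in grammar}
    result, work = set(items), list(items)
    while work:
        prod, dot = work.pop()
        rhs = grammar[prod][1]
        if dot < len(rhs) and rhs[dot] in nonterminals:
            for j, (lhs, _) in enumerate(grammar):
                if lhs == rhs[dot] and (j, 0) not in result:
                    result.add((j, 0))
                    work.append((j, 0))
    return frozenset(result)


def goto(grammar, state, symbol):
    return closure(grammar, {(p, d + 1) for p, d in state
                             if d < len(grammar[p][1]) and grammar[p][1][d] == symbol})


def build_tables(grammar):
    nonterminals = {lhs for lhs, _ in grammar}
    terminals = {s for _, rhs in grammar for s in rhs} - nonterminals
    states = [closure(grammar, {(0, 0)})]
    action = defaultdict(set)    # (state, terminal) -> {"s3", "r2", "accept", ...}
    goto_table = {}              # (state, non-terminal) -> state
    i = 0
    while i < len(states):
        for prod, dot in sorted(states[i]):
            rhs = grammar[prod][1]
            if dot == len(rhs):                      # [A -> gamma .]
                for t in terminals:
                    action[(i, t)].add(f"r{prod}")   # LR(0): reduce on every terminal
            elif rhs[dot] == "$":                    # [S' -> S . $]
                action[(i, "$")].add("accept")
            else:
                target = goto(grammar, states[i], rhs[dot])
                if target not in states:
                    states.append(target)
                j = states.index(target)
                if rhs[dot] in terminals:
                    action[(i, rhs[dot])].add(f"s{j}")
                else:
                    goto_table[(i, rhs[dot])] = j
        i += 1
    return states, action, goto_table


states, action, goto_table = build_tables(GRAMMAR)
conflicts = {key: acts for key, acts in action.items() if len(acts) > 1}
print(len(states), "states,", len(conflicts), "conflicts")
```

For the example grammar no Action entry holds more than one action, which is what it means for the grammar to be LR(0).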
Where have we reached?
• We have built the LR(0) DFA and Parser Tables (Action & Goto)
• No lookahead yet
• Variations of LR parsers add lookahead info, but the basic idea of states, closures, and edges remains the same
A Grammar that is not LR(0)
• We cannot parse the grammar below using LR(0), because it produces a Shift-Reduce conflict:
  0. S → E $
  1. E → T + E
  2. E → T
  3. T → x
(This grammar generates strings: x, x + x, x + x + x, etc)
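LR(0) Parser for the grammar
Grammar: 0. S → E $   1. E → T + E   2. E → T   3. T → x
[Figure: LR(0) DFA, states 1-6]
• State 1 = { S → . E $, E → . T + E, E → . T, T → . x }
• State 2 = { S → E . $ }
• State 3 = { E → T . + E, E → T . }
• State 4 = { E → T + . E, E → . T + E, E → . T, T → . x }
• State 5 = { T → x . }
• State 6 = { E → T + E . }
Edges: 1 --E--> 2, 1 --T--> 3, 1 --x--> 5, 3 --+--> 4, 4 --T--> 3, 4 --x--> 5, 4 --E--> 6
• SR conflict in State 3: on + we could shift to state 4 or reduce by E → T
• So the grammar is not LR(0)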
SLR Parsers
• Idea: use information about what can follow a non-terminal to decide if we should perform a reduction
• Easiest form is SLR – "Simple LR"
• We need to compute FOLLOW(A) – the set of terminals that can immediately follow A in any possible derivation
  • ie, terminal t is in FOLLOW(A) if any derivation contains At
• To compute this, we need to compute FIRST(γ) for strings γ that can follow A
Nullable, FIRST & FOLLOW
• Nullable(A) is true if A can derive the empty string
  • ie, we can find A =>* ε
• FIRST(A) = set of terminals that can begin strings derived from A
• FIRST(γ) = set of terminals that can begin strings derived from γ
• FOLLOW(A) is the set of terminals that can immediately follow A
• Recall standard notation:
  • A ∈ N            // Non-terminal
  • γ ∈ (N ∪ T)*     // String of Non-terminals and Terminals
Computing FIRST: Intuition
• To calculate FIRST(γ):
  • if γ = Aβ then FIRST(γ) = FIRST(A), right?
  • Yes, but what if A has an epsilon production: A → ε
  • Then we need to union-in FOLLOW(A) (in this case, equal to FIRST(β))
Example
  Z → d
  Z → X Y Z
  Y → ε
  Y → c
  X → Y
  X → a
Example
Grammar: Z → d   Z → X Y Z   Y → ε   Y → c   X → Y   X → a
Table to fill in, one row per non-terminal (X, Y, Z): Nullable? | FIRST | FOLLOW
• Iterate through the productions until the tables stop changing
Nullable, FIRST & FOLLOW: Intuition
• To compute, start by:
  • ∀ A ∈ N, FIRST(A) = FOLLOW(A) = {}
  • ∀ A ∈ N, Nullable(A) = false
  • ∀ a ∈ T, FIRST(a) = {a}
• Rules:
  • if A → a then FIRST(A) ∪= {a}
  • if A → Y1 and Y1 not Nullable then FIRST(A) ∪= FIRST(Y1)
  • if A → Y1Y2 and Y1 is Nullable then FIRST(A) ∪= FIRST(Y2)
  • if A → Y1Y2Y3 and Y1, Y2 both Nullable then FIRST(A) ∪= FIRST(Y3)
  • if A → Y1..Yn and all Yi are Nullable then FIRST(A) ∪= {ε}
Key:
  a ∈ T            // Terminal
  A ∈ N            // Non-terminal
  Y ∈ (N ∪ T)      // Terminal or Non-terminal
  γ ∈ (N ∪ T)*     // String of Terminals and Non-terminals
Computing Nullable
Nullable(A) is true if A can derive the empty string. ie, we can find A =>* ε
  Nullable :: N → Bool
  foreach A ∈ N do Nullable(A) = false enddo
  repeat
    foreach A → Y1..Yk in P do
      if Y1..Yk are all Nullable then Nullable(A) = true   // vacuously true for A → ε, where k = 0
    enddo
  until Nullable does not change
• Intuition:
  • Suppose A → XY; then we can derive A =>* ε only if both X =>* ε AND Y =>* ε
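The fixed-point loop for Nullable can be sketched in Python on the X, Y, Z example grammar. The encoding is an assumption of this sketch: a production is a (LHS, RHS) pair and the empty tuple stands for an ε right-hand side.

```python
# Example grammar from the slides; () is the empty right-hand side (epsilon).
PRODUCTIONS = [
    ("Z", ("d",)),
    ("Z", ("X", "Y", "Z")),
    ("Y", ()),
    ("Y", ("c",)),
    ("X", ("Y",)),
    ("X", ("a",)),
]


def compute_nullable(productions):
    """Iterate to a fixed point: A is nullable if some RHS is entirely nullable."""
    nullable = {lhs: False for lhs, _ in productions}
    changed = True
    while changed:                           # repeat ... until Nullable does not change
        changed = False
        for lhs, rhs in productions:
            # all() over an empty RHS is True; terminals are never nullable.
            if not nullable[lhs] and all(nullable.get(y, False) for y in rhs):
                nullable[lhs] = True
                changed = True
    return nullable


print(compute_nullable(PRODUCTIONS))
```

Here Y becomes nullable directly from Y → ε, X follows from X → Y, and Z stays non-nullable because each of its productions needs d or another Z.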
Computing FIRST
  FIRST :: N → {T}
  foreach A ∈ N do FIRST(A) = {} enddo
  repeat
    foreach A → Y1..Yk in P do
      FIRST(A) ∪= FIRST(Y1)
      foreach i in [1..k-1] do
        if Nullable(Yi) then FIRST(A) ∪= FIRST(Yi+1) else break
      enddo
      if all Y1..Yk are Nullable then FIRST(A) ∪= {ε}
    enddo
  until FIRST does not change
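A hedged Python sketch of the FIRST computation on the same example grammar follows. One deliberate assumption: emptiness is tracked only through Nullable, so ε is never placed in a FIRST set here. The function names are invented for the illustration.

```python
PRODUCTIONS = [
    ("Z", ("d",)),
    ("Z", ("X", "Y", "Z")),
    ("Y", ()),                   # Y -> epsilon
    ("Y", ("c",)),
    ("X", ("Y",)),
    ("X", ("a",)),
]


def compute_nullable(productions):
    nullable = {lhs: False for lhs, _ in productions}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in productions:
            if not nullable[lhs] and all(nullable.get(y, False) for y in rhs):
                nullable[lhs] = True
                changed = True
    return nullable


def compute_first(productions, nullable):
    nonterminals = {lhs for lhs, _ in productions}
    first = {a: set() for a in nonterminals}

    def first_of(symbol):
        # FIRST(a) = {a} for a terminal a
        return first[symbol] if symbol in nonterminals else {symbol}

    changed = True
    while changed:                           # repeat ... until FIRST does not change
        changed = False
        for lhs, rhs in productions:
            for y in rhs:
                new = first_of(y) - first[lhs]
                if new:
                    first[lhs] |= new
                    changed = True
                if not nullable.get(y, False):
                    break                    # later symbols cannot start the string
    return first


nullable = compute_nullable(PRODUCTIONS)
first = compute_first(PRODUCTIONS, nullable)
for a in sorted(first):
    print(a, sorted(first[a]))
```

For Z → X Y Z the loop walks past X and Y because both are nullable, so FIRST(Z) picks up the terminals of FIRST(X), FIRST(Y) and FIRST(Z) itself.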
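Computing FOLLOW
  FOLLOW :: N → {T}
  foreach A ∈ N do FOLLOW(A) = {} enddo
  repeat
    foreach A → Y1..Yk in P do
      foreach i in [1..k-1] do
        if Yi ∈ N then FOLLOW(Yi) ∪= FIRST(Yi+1)
      enddo
    enddo
  until FOLLOW does not change
• Intuition:
  • We cannot tell anything about FOLLOW(A) by looking at A's productions
  • But, given A → Y1..Yk we can infer FOLLOW(Yi)
  • Eg: A → aBcdEFg => FOLLOW(B) ∪= {c}; FOLLOW(E) ∪= FIRST(F); etc
• Note: as written, the loop covers only the symbol directly after Yi. A complete computation also adds FIRST of later symbols when the symbols in between are Nullable, and adds FOLLOW(A) to FOLLOW(Yi) when everything after Yi is Nullable (or Yi is the last symbol)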
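The sketch below computes Nullable, FIRST and FOLLOW in one fixed-point loop. It is not a transcription of the slide's loop: it uses the fuller textbook rules, skipping over nullable symbols and propagating FOLLOW(A) to the tail of A's productions. The name `analyse` and the data encoding are assumptions, and ε is kept out of FIRST.

```python
PRODUCTIONS = [
    ("Z", ("d",)),
    ("Z", ("X", "Y", "Z")),
    ("Y", ()),                   # Y -> epsilon
    ("Y", ("c",)),
    ("X", ("Y",)),
    ("X", ("a",)),
]


def analyse(productions):
    """Return (nullable, first, follow) for a list of (LHS, RHS) productions."""
    nts = {lhs for lhs, _ in productions}
    nullable = {a: False for a in nts}
    first = {a: set() for a in nts}
    follow = {a: set() for a in nts}

    def is_nullable(sym):
        return nullable.get(sym, False)      # terminals are never nullable

    def first_of(sym):
        return first[sym] if sym in nts else {sym}

    def size():                              # the tables only ever grow
        return (sum(nullable.values()),
                sum(len(s) for s in first.values()),
                sum(len(s) for s in follow.values()))

    before = None
    while before != size():                  # iterate until the tables stop changing
        before = size()
        for lhs, rhs in productions:
            if all(is_nullable(y) for y in rhs):
                nullable[lhs] = True
            for i, y in enumerate(rhs):
                if all(is_nullable(p) for p in rhs[:i]):
                    first[lhs] |= first_of(y)
                if y not in nts:
                    continue
                for z in rhs[i + 1:]:
                    follow[y] |= first_of(z)
                    if not is_nullable(z):
                        break
                else:                        # everything after y is nullable
                    follow[y] |= follow[lhs]
    return nullable, first, follow


nullable, first, follow = analyse(PRODUCTIONS)
for a in sorted(follow):
    print(a, nullable[a], sorted(first[a]), sorted(follow[a]))
```

The `for ... else` branch runs only when the inner loop finishes without a `break`, which is exactly the case where FOLLOW of the left-hand side must flow into FOLLOW(Yi).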
SLR Construction
• Identical to LR(0) states. But refine the reduce actions
• Algorithm:
  foreach state S in DFA do
    foreach item [A → γ .] in S do
      reduce by A → γ only if lookahead ∈ FOLLOW(A)
    enddo
  enddo
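LR(0) Parser for the grammar
Grammar: 0. S → E $   1. E → T + E   2. E → T   3. T → x
[Figure: the LR(0) DFA, states 1-6]
• State 1 = { S → . E $, E → . T + E, E → . T, T → . x }
• State 2 = { S → E . $ }
• State 3 = { E → T . + E, E → T . }
• State 4 = { E → T + . E, E → . T + E, E → . T, T → . x }
• State 5 = { T → x . }
• State 6 = { E → T + E . }
Edges: 1 --E--> 2, 1 --T--> 3, 1 --x--> 5, 3 --+--> 4, 4 --T--> 3, 4 --x--> 5, 4 --E--> 6
• By taking account of lookahead, we can eliminate 5 reduce actions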
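The count of eliminated reduce actions can be checked with a small Python sketch. The FOLLOW sets below are worked out by hand for this grammar (FOLLOW(E) = {$}, FOLLOW(T) = {+, $}) and the complete items are read off states 3, 5 and 6 of the DFA; both are inputs assumed by the sketch rather than computed by it.

```python
TERMINALS = {"x", "+", "$"}

# Hand-computed FOLLOW sets for the grammar S -> E $, E -> T + E | T, T -> x.
FOLLOW = {"E": {"$"}, "T": {"+", "$"}}

# Complete items [A -> gamma .] per state: (state, LHS, production number).
COMPLETE_ITEMS = [
    (3, "E", 2),                 # E -> T .
    (5, "T", 3),                 # T -> x .
    (6, "E", 1),                 # E -> T + E .
]

# LR(0): reduce on every terminal.
lr0_reduces = {(state, t): f"r{prod}"
               for state, _, prod in COMPLETE_ITEMS for t in TERMINALS}

# SLR: reduce by A -> gamma only if the lookahead is in FOLLOW(A).
slr_reduces = {(state, t): f"r{prod}"
               for state, lhs, prod in COMPLETE_ITEMS for t in FOLLOW[lhs]}

print(len(lr0_reduces) - len(slr_reduces))   # prints 5
print((3, "+") in slr_reduces)               # prints False
```

With the reduce on + gone from state 3, only the shift to state 4 remains there, so the Shift-Reduce conflict disappears under SLR.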
On To LR(1)
• Many practical grammars are SLR
• LR(1) is even more powerful
• Similar construction for LR(1), but the notion of an item is more complex, incorporating lookahead information
LR(1) Items
• An LR(1) item [A → α . β, a] is
  • A grammar production (A → αβ)
  • A right-hand side position (the dot)
  • A lookahead symbol (a)
• Idea: this item indicates that α is on top of the stack and the next input is derivable from βa
• Full construction: see Cooper & Torczon
LR(1) Tradeoffs
• LR(1)
  • Pro: extremely precise; largest set of grammars / languages
  • Con: potentially very large parse tables with many states
LALR(1)
• Variation of LR(1), but merge any two states that differ only in lookahead (ie, states whose items match once the lookahead symbols are ignored)
• Example: these two would be merged
  [A ::= x . , a]
  [A ::= x . , b]
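Merging states that differ only in lookahead can be sketched as grouping by "core", the set of items with lookaheads stripped. The encoding of an LR(1) item as (production, dot position, lookahead) and the name `merge_by_core` are assumptions of this sketch; a real generator would also have to redirect the DFA edges to the merged states.

```python
from collections import defaultdict


def merge_by_core(lr1_states):
    """Merge LR(1) states whose items agree once lookaheads are ignored."""
    merged = defaultdict(set)
    for state in lr1_states:
        core = frozenset((prod, dot) for prod, dot, _ in state)
        merged[core] |= state                # union the items, keeping all lookaheads
    return [frozenset(items) for items in merged.values()]


# The two single-item states from the slide, plus one with a different core.
state_a = frozenset({("A ::= x", 1, "a")})   # [A ::= x . , a]
state_b = frozenset({("A ::= x", 1, "b")})   # [A ::= x . , b]
state_c = frozenset({("A ::= x", 0, "a")})   # [A ::= . x , a]

lalr_states = merge_by_core([state_a, state_b, state_c])
print(len(lalr_states))                      # prints 2
```

The first two states share the core { A ::= x . } and collapse into one state carrying both lookaheads; the third has its dot elsewhere and stays separate.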
LALR(1) vs LR(1)
• LALR(1) tables can have many fewer states than LR(1)
• LALR(1) may have reduce-reduce conflicts where LR(1) would not (but in practice this doesn’t happen often)
• Most practical bottom-up parser tools are LALR(1) (eg yacc, bison, CUP)