780 likes | 976 Views
CSE P501 – Compilers. LR Parsing Build Bottom-Up Parse Tree Handles Writing a Shift-Reduce Parser ACTION & GOTO Parse Tables Dotted Items SR & RR conflicts Next. LR Parsing. ‘Middle End’. Back End. Target. Source. Front End. chars. IR. IR. Scan. Select Instructions. Optimize.
E N D
CSE P501 – Compilers LR Parsing Build Bottom-Up Parse Tree Handles Writing a Shift-Reduce Parser ACTION & GOTO Parse Tables Dotted Items SR & RR conflicts Next Jim Hogg - UW - CSE - P501
LR Parsing ‘Middle End’ Back End Target Source Front End chars IR IR Scan Select Instructions Optimize tokens IR Allocate Registers Parse IR AST Emit Convert IR IR Machine Code AST = Abstract Syntax Tree IR = Intermediate Representation Jim Hogg - UW - CSE P501
S aABe A Abc| b B d a b b c d e • Build Bottom-Up Parse Tree Black dot marks how much we've read of the input token stream so far (none) Shift dot to right Jim Hogg - UW - CSE - P501
S aABe A Abc| b B d a b b c d e abbcde – done wrong – step 2 Can we reduce a ? No. Shift dot to right Jim Hogg - UW - CSE - P501
S aABe A Abc| b B d a b b c d e abbcde – done wrong – step 3 Can we reduce a or ab? Yes, using A b. Note: a b (in red) marks current frontier Jim Hogg - UW - CSE - P501
S aABe A Abc| b B d A | a b b c d e abbcde – done wrong – step 4 Note: frontier is now aA Can we reduce aA or A ? No. Shift dot Jim Hogg - UW - CSE - P501
S aABe A Abc| b B d A | a b b c d e abbcde – done wrong – step 5 Can we reduce aAb or Ab or b? Yes, using A b Jim Hogg - UW - CSE - P501
AA | | a b b c d e abbcde – done wrong – step 6 S aABe A Abc| b B d Can we reduce aAA or AA or A? No. Shift dot Jim Hogg - UW - CSE - P501
AA | | a b bc d e abbcde – done wrong – step 7 S aABe A Abc| b B d Can we reduce aAAc or AAc or Ac or c? No. Shift dot Jim Hogg - UW - CSE - P501
AA | | a b bcd e abbcde – done wrong – step 8 S aABe A Abc| b B d Can we reduce aAAcdor AAcdor Acdor ddord? No. Shift dot Jim Hogg - UW - CSE - P501
S a A B e A A b c| b B d AA | | a b bcde abbcde – done wrong – step 9 Can we reduce aAAcdeor AAcdeor Acdeor cde or de or e? No. We are stuck! Jim Hogg - UW - CSE - P501
Rewind • Rewind to step 5 • Let's not reduce using A b. Shift instead • Try again! Jim Hogg - UW - CSE - P501
A | a b b c d e abbcde – done right – step 5 S aABe A Abc| b B d Can we reduce aAb or Ab or b? Yes, we could, using A b. But it led to a dead-end, first time. So do not reduce. Instead, shift Jim Hogg - UW - CSE - P501
A | a b bc d e abbcde – done right – step 6 S aABe A Abc| b B d Can we reduce aAbc or Abc or bc or c? Yes, using A Abc. Jim Hogg - UW - CSE - P501
A | A | a b b c d e abbcde – done right – step 7 S aABe A Abc| b B d Can we reduce aAorA? No. Shift Jim Hogg - UW - CSE - P501
A | A | a b b c d e abbcde – done right – step 8 S aABe A Abc| b B d Can we reduce aAd or Ad ord? Yes, using B d Jim Hogg - UW - CSE - P501
A | A B | | a b b c d e abbcde – done right – step 9 S aABe A Abc| b B d Can we reduce aAB or AB? No. Shift Jim Hogg - UW - CSE - P501
A | A B | | a b b c d e abbcde – done right – step 10 S aABe A Abc| b B d Can we reduce aABe or ABe or Be or e ? Yes, using S aABe Jim Hogg - UW - CSE - P501
S A | A B | | a b b c d e abbcde – done right – step 11 S aABe A Abc| b B d We just executed a rightmost derivation, backwards: S => a AB e => a A d e => a A b c d e => a b b c d e Jim Hogg - UW - CSE - P501
LR(1) Parsing • Left to right scan; Rightmost derivation; 1 token lookahead • "Bottom-Up" approach. Also called Shift-Reduce parser • The syntax of almost all practical programming languages can be specified by an LR(1) grammar • LALR(1) and SLR are subsets of LR(1) • LALR(1) can parse most real languages, has a smaller memory footprint, and is used by CUP • All variants (SLR, LALR, LR) use same algorithm – but different driver tables Jim Hogg - UW - CSE - P501
LR Parsing in Greek • Bottom-up parser builds a rightmost derivation, backwards • Given the rightmost derivation: • S =>1=>2=>…=>n-2=>n-1=>n = w parser will first discover n-1=>n , then n-2=>n-1 , etc • But it discovers n-1=>n before seeing all of n: • S => a AB e => a A d e => a A b c d e => a b b c d e • X denotes rightmost terminal to derive • S <= a A B e <= a Ad e <= a A b c d e <= a b b c d e • denotes handle • denotes top-of-stack = end of input so far seen • Parsing terminates when • 1 reduced to S (start symbol, success), or • No match can be found (syntax error) Jim Hogg - UW - CSE - P501
Terminology : Sentential Forms • If S =>* , the string is called a sentential form of the of the grammar (not yet a sentence, but on its way) • In the derivation S =>1=>2=>…=>n-2=>n-1=>n = w each of the iare sentential forms • A sentential form in a rightmost derivation is called a rightsentential form Jim Hogg - UW - CSE - P501
Handles Informally, a handle is a substring of the frontier that matches the RHS of the "correct " production • Even ifA is a production, is a handle only if it matches the frontier at a point where A was used in the derivation. So, it’s a handle if we should reduce by it (yes, this definition is circular) • may appear in many other places in the frontier without being a handle for that particular production Jim Hogg - UW - CSE - P501
Handle – the Dragon Definition Formally, a handleof a right-sentential form is a production A and a position in where may be replaced by A to produce the previous right-sentential form in the rightmost derivation of Jim Hogg - UW - CSE - P501
Handle Examples In the derivation: S => a A B e => a Ad e => a Ab c d e => abbcde • abbcde is a right sentential form whose handle is Ab at position 2 • aAbcde is a right sentential form whose handle is AAbc at position 4 • Note: some books take the left of the match as the position – but it really doesn't matter Jim Hogg - UW - CSE - P501
Writing a Shift-Reduce Parser Key Data structures • A stack holding the frontier of the tree • The token-stream of remaining input Jim Hogg - UW - CSE - P501
Shift-Reduce Parser Operations • Reduce– if the top of stack is a handle (RHS of some A that we should use to reduce), pop , push A • Shift– push the next input symbol onto the stack • Accept– announce success • Error– syntax error discovered Jim Hogg - UW - CSE - P501
S aABe A Abc| b B d Shift-Reduce Example – Step 0 Stack Input Action $ abbcde$ shift • Note: • “$” marks bottom-of-stack • “$” also marks end-of-input • Neither one takes part in the parse. They don’t move. Jim Hogg - UW - CSE - P501
S aABe A Abc| b B d Shift-Reduce Example – Step 1 Stack Input Action $ abbcde$ shift $a bbcde$ shift • At each step, look for a handle at top-of-stack • If we find a handle, then reduce • If we don’t find a handle, then shift • We are relying on clairvoyance - foretelling the future - to decide which RHSs at top-of-stack are handles Jim Hogg - UW - CSE - P501
S aAB e A Abc| b B d Shift-Reduce Example Stack Input Action 0 $abbcde$ shift 1 $abbcde$ shift 2 $abbcde$ reduce 3 $aAbcde$ shift 4 $aAbcde$ shift 5 $aAbcde$ reduce 6 $aAde$ shift 7 $aAde$ reduce 8 $aABe$ shift 9 $aABe$ reduce 10 $S$ accept Jim Hogg - UW - CSE - P501
How Do We Automate This? • Lacking a clairvoyance function, we could resort to back-tracking. But it's too slow. It's a non-starter • Viable prefix– a prefix of a right-sentential form that can appear on the stack of the shift-reduce parser (on its way to a successful parse) • Idea: Construct a DFA to recognize viable prefixes given the stack and (one or two tokens from) remaining input • Perform reductions when we recognize handles Jim Hogg - UW - CSE - P501
S' S$ S aABe A Abc| b B d DFA for prefixes for: e accept 8 9 S a A B e B $ a A b c start 1 2 3 6 7 A A b c b d 4 5 A b B d • We have augmented the grammar with a unique start symbol, S’ • This DFA replaces our clairvoyance function – equally magical at this point! • States 4,5,7,9 of this DFA define the handles • Eg: if stack is …aAbc then reduce using A Abc (always at top of stack); then unwind, back to state 1 Jim Hogg - UW - CSE - P501
Stack Input $ abbcde$ e accept 8 9 S aABe B $ a start A b c 1 2 3 6 7 A Abc b d 4 5 A b B d Trace – Step 1 S' S$ S aABe A Abc| b B d • Stack = { } • DFA = state 1; not a “reduce” state, {4,5,7,9}, so shift Jim Hogg - UW - CSE - P501
Stack Input $a bbcde$ e accept 8 9 S aABe B $ a start A b c 1 2 3 6 7 A Abc b d 4 5 A b B d Trace – Step 2 S' S$ S aABe A Abc| b B d • Stack = a • DFA = 2; not a “reduce” state, so shift Jim Hogg - UW - CSE - P501
Stack Input $ab bcde$ e accept 8 9 S aABe B $ a start A b c 1 2 3 6 7 A Abc b d 4 5 A b B d Trace – Step 3 S' S$ S aABe A Abc| b B d • Stack = ab • DFA = 4 => reduce, using production A b • So, pop b (RHS of production); and push A (LHS or production) • Retrace to DFA state 1 Jim Hogg - UW - CSE - P501
Stack Input $aAbcde$ e accept 8 9 S aABe B $ a start A b c 1 2 3 6 7 A Abc b d 4 5 A b B d Trace – Step 4 S' S$ S aABe A Abc| b B d • Stack = aA • So transition states 1 2 3 • DFA = 3 => shift Jim Hogg - UW - CSE - P501
Stack Input $aAbcde$ e accept 8 9 S aABe B $ a start A b c 1 2 3 6 7 A Abc b d 4 5 A b B d Trace – Step 5 S' S$ S aABe A Abc| b B d • Stack = aAb • DFA = 6 => shift Jim Hogg - UW - CSE - P501
Stack Input $aAbcde$ e accept 8 9 S aABe B $ a start A b c 1 2 3 6 7 A Abc b d 4 5 A b B d Trace – Step 6 S' S$ S aABe A Abc| b B d • Stack = aAbc • DFA = 7 => reduce by A Abc • So, pop Abc push A • Retreat to state 1 Jim Hogg - UW - CSE - P501
Stack Input $aAde$ e accept 8 9 S aABe B $ a start A b c 1 2 3 6 7 A Abc b d 4 5 A b B d Trace – Step 7 S' S$ S aABe A Abc| b B d • Stack = aA • DFA = 3 => shift Jim Hogg - UW - CSE - P501
Stack Input $aAde$ e accept 8 9 S aABe B $ a start A b c 1 2 3 6 7 A Abc b d 4 5 A b B d Trace – Step 8 S' S$ S aABe A Abc| b B d • Stack = aAd • DFA = 5 => reduce by B d • So, pop d, push B • Retreat to DFA = 1 Jim Hogg - UW - CSE - P501
Stack Input $aABe$ e accept 8 9 S aABe B $ a start A b c 1 2 3 6 7 A Abc b d 4 5 A b B d Trace – Step 9 S' S$ S aABe A Abc| b B d • Stack = aAB • So transition states 1 2 3 8 • DFA = 8 => shift Jim Hogg - UW - CSE - P501
Stack Input $aABe$ e accept 8 9 S aABe B $ a start A b c 1 2 3 6 7 A Abc b d 4 5 A b B d Trace – Step 10 S' S$ S aABe A Abc| b B d • Stack = aABe • DFA = 9 => reduce by S aABe • So pop aABe, push S • Retreat to DFA = 1 Jim Hogg - UW - CSE - P501
Stack Input $S $ e accept 8 9 S aABe B $ a start A b c 1 2 3 6 7 A Abc b d 4 5 A b B d Trace – Step 11 S' S$ S aABe A Abc| b B d • Stack = S • DFA = 1 => shift Jim Hogg - UW - CSE - P501
Stack Input $S$ e accept 8 9 S aABe B $ a start A b c 1 2 3 6 7 A Abc b d 4 5 A b B d Trace – Step 12 S' S$ S aABe A Abc| b B d • Stack = S$ • => accept Jim Hogg - UW - CSE - P501
Cast out the Magic • We started with a magical clairvoyance function • We replaced this with, an equally magical, DFA • The DFA approach included too much repetition: • retreat to DFA = 1, then rescan the stack to find the new DFA state • we only replaced the handle with its NonTerminal LHS, so first part of stack is unchanged • Want the parser to run in linear time – proportional to total number of tokens • How do avoid repetition? • How to construct the magic DFA, for any grammar? Jim Hogg - UW - CSE - P501
Avoiding DFA Rescanning • Observe: after a reduce, the contents of the stack are little altered: we replaced handle at top-of-stack with its LHS non-terminal • So, re-scanning the stack will step thru same DFA transitions as before, until the last one • So, record trace of DFA state numbers on stack to avoid the rescan Jim Hogg - UW - CSE - P501
LR Stack • DFA pictures are nice, but we want a program to do it • Could change the stack to hold <state, token> pairs. Perhaps easier to understand and/or debug? • $ <s0,X0> <s1,X1> . . . <sn,Xn> • But, all we need are the states (think about it!) • on a reduce, pop top states - reduce rule tells us how many • then push corresponding LHS, non-terminal Jim Hogg - UW - CSE - P501
S aABe A Abc A b B d Reminder - DFA for: e accept 8 9 S a A B e B $ a A b c start 1 2 3 6 7 A A b c b d 4 5 A b B d => shift => reduce Jim Hogg - UW - CSE - P501
e accept 8 9 S aABe B $ a start A b c 1 2 3 6 7 A Abc b d 4 5 A b B d ACTION & GOTO Parse Tables S aABe A Abc A b B d Key sN = shift; transition to state number N rR = reduce using rule R gN = goto state number N acc = accept blank = syntax error in program
DFA Transition Tables: Summary • ACTION => what to do after a shift • Row = current state • Column = next token (terminal) • sN = shift; move to DFA state number N • rR = reduce, using Rule number R • acc = accept the input program • blank = syntax error in input program • report, recover, continue • for P501, just report and stop! • GOTO=> what to do after a reduce • Row = current state (top-of-stack, after pushing non-terminal) – think of this as the uncovered state • Column = LHS of reduction (non-terminal) • gN = goto DFA state number N • blank = bug in the GOTO table Jim Hogg - UW - CSE - P501