
Cairo University FCI, Compilers CS419
Lecture 16: Syntax Analysis: Bottom-Up Parsing
Dr. Mohammad Nassef, Department of Computer Science, Faculty of Computers and Information, Cairo University

Hierarchy of grammar classes
• LL(k): Left-to-right scan, Leftmost derivation, k tokens of lookahead
• LR(k): Left-to-right scan, Rightmost derivation (in reverse), k tokens of lookahead
• SLR: Simple LR (uses “follow sets”)
• LALR: LookAhead LR (uses “lookahead sets”)
Parsing: http://en.wikipedia.org/wiki/LL_parser …

Introduction(1)
• Overview
  • Top-down parsers
    • Start constructing the parse tree at the top (root) and move down towards the leaves.
    • Easy to implement by hand, but work only with restricted grammars.
    • Example: predictive parsers (LL(k) parsers)
  • Bottom-up parsers
    • Build the nodes at the bottom of the parse tree first.
    • Suitable for automatic parser generation; handle a larger class of grammars.
    • Example: shift-reduce parsers (LR(k) parsers)

Introduction(2)
• Bottom-up parsers
  • A bottom-up parser, or shift-reduce parser, begins at the leaves and works up to the root of the tree.
  • The reduction steps trace a rightmost derivation in reverse.
Example (the following slides walk through it to explain the main idea):
Grammar:
S → aABe
A → Abc | b
B → d
Input string: abbcde
Rightmost derivation: S ⇒ aABe ⇒ aAde ⇒ aAbcde ⇒ abbcde. The parser discovers these steps from right to left.

Introduction(3) Bottom-Up Parser Example. Productions: S → aABe, A → Abc | b, B → d. Action: shift a. INPUT: a b b c d e $

Introduction(4) Bottom-Up Parser Example. Action: shift b, then reduce b to A. INPUT: a b b c d e $

Introduction(5) Bottom-Up Parser Example. Action: shift A. INPUT: a A b c d e $

Introduction(6) Bottom-Up Parser Example. Action: shift b. INPUT: a A b c d e $. Question: why is the production A → b ignored here?

Introduction(7) Bottom-Up Parser Example. Action: shift c, then reduce Abc to A. INPUT: a A b c d e $

Introduction(8) Bottom-Up Parser Example. Action: shift A. INPUT: a A d e $

Introduction(9) Bottom-Up Parser Example. Action: shift d, then reduce d to B. INPUT: a A d e $

Introduction(10) Bottom-Up Parser Example. Action: shift B. INPUT: a A B e $

Introduction(11) Bottom-Up Parser Example. Action: shift e, then reduce aABe to S. INPUT: a A B e $

Introduction(12) Bottom-Up Parser Example. Action: shift S; the target $ is reached. INPUT: S $
This parser is known as an LR parser because it scans the input from Left to right and constructs a Rightmost derivation in reverse order.

Introduction(13)
• Conclusion
  • The parser repeatedly scans the productions to match their right-hand sides against handles in the input string.
  • Backtracking makes the method used in the previous example very inefficient.
Can we do better? (Figure: previous architecture vs. better architecture.)
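The naive method from the walk-through can be sketched as below. This is only an illustration under stated assumptions: every grammar symbol is a single character, the stack is kept as a string, and the names `GRAMMAR` and `parse` are hypothetical (not from the lecture). The sketch tries every matching reduction before shifting and backtracks on dead ends.

```python
# Hypothetical encoding of the example grammar as (LHS, RHS) pairs.
GRAMMAR = [("S", "aABe"), ("A", "Abc"), ("A", "b"), ("B", "d")]

def parse(stack, rest, steps):
    """Backtracking shift-reduce recognizer; returns the list of steps or None."""
    if stack == "S" and not rest:
        return steps
    # Try each production whose RHS sits on top of the stack.
    for lhs, rhs in GRAMMAR:
        if stack.endswith(rhs):
            reduced = stack[:-len(rhs)] + lhs
            found = parse(reduced, rest, steps + [f"reduce {rhs} to {lhs}"])
            if found is not None:
                return found
    # No reduction led to success: shift the next input symbol instead.
    if rest:
        return parse(stack + rest[0], rest[1:], steps + [f"shift {rest[0]}"])
    return None  # dead end, the caller backtracks

for step in parse("", "abbcde", []) or []:
    print(step)
```

On the input abbcde the sketch first tries to reduce the second b by A → b, reaches a dead end, and only then shifts c so that Abc can be reduced. That wasted work is the inefficiency the conclusion refers to.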

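Designing a bottom-up parser: steps
• Eliminate ambiguity
• Remove left recursion
• Apply left factoring
  (Note: these two transformations are strictly required only for top-down parsers; LR parsers accept left-recursive grammars such as E → E + T.)
• Compute the FIRST and FOLLOW sets
• Build the transition diagram: get the canonical states/items
• Build the parsing table
• Parse the given statements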

Shift-Reduce Parsers(1)
• A shift-reduce (bottom-up) parser is known as an LR parser
  • It scans the input from Left to right (SHIFT)
  • It builds a Rightmost derivation in reverse order (REDUCE)
• Kinds of LR
  • LR(k): the most powerful deterministic bottom-up parsing method, using k lookahead tokens
  • SLR(k)
  • LALR(k)
• Components
  • Parse stack
  • Shift-reduce driver
  • Action table
  • Goto table

Shift-Reduce Parsers(2)
• Parse stack
  • Initially empty; during parsing it holds the symbols already parsed
  • Elements in the stack are terminal or non-terminal symbols
  • The parse stack concatenated with the remaining input always represents a right sentential form; the handle, when present, is the production RHS on top of the stack whose reduction is the next step of the reverse rightmost derivation

Shift-Reduce Parsers(3)
• Shift-Reduce driver
  • Shift: when the top of the stack doesn't contain a handle (RHS)
    • push the input token (with contextual information) onto the stack
  • Reduce: when the top of the stack contains a handle (RHS)
    • pop the handle (RHS)
    • push the reduced non-terminal (LHS) (with contextual information)

Shift-Reduce Parsers(4)
• Two questions
  • Have we reached the end of a handle, and how long is the handle?
  • Which non-terminal does the handle reduce to?
• We use two tables to answer these questions:
  • ACTION table
  • GOTO table

Shift-Reduce Parsers(5)
• LR parsers are driven by two tables:
  • Action table, which specifies the action to take
    • Shift, reduce, accept, or error
  • Goto table, which specifies the state transitions
    • It encodes the transitions of a finite state machine
• We push states, rather than symbols, onto the stack
• Each state represents a possible subtree of the parse tree

Bottom-up Parsing (Cont.)
Given the grammar:
E → T
T → T * F
T → F
F → id

A Shift-reduce Example • E → T • T → T * F • T → F • F → id

LR Parsers (cont.)
The Go_to table defines the next state after a shift.
The Action table tells the parser whether to:
1) shift (S),
2) reduce (R),
3) accept (A) the source code, or
4) signal a syntactic error (E).
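A table-driven driver for the grammar E → T, T → T * F, T → F, F → id might look like the sketch below. The tables were built by hand for this illustration: the state numbers, the production numbers, and the added start production S' → E are assumptions, not the lecture's own table. The sketch also follows one common convention in which shift targets live in ACTION and GOTO is consulted only for non-terminals after a reduce.

```python
# Production number -> (LHS, length of RHS). Numbering is this sketch's own.
PRODUCTIONS = {1: ("E", 1), 2: ("T", 3), 3: ("T", 1), 4: ("F", 1)}

# Hand-built SLR(1) tables; states 0..6 are hypothetical labels.
ACTION = {
    (0, "id"): ("shift", 4), (5, "id"): ("shift", 4), (2, "*"): ("shift", 5),
    (1, "$"): ("accept", None),
    (2, "$"): ("reduce", 1),                           # E -> T
    (3, "*"): ("reduce", 3), (3, "$"): ("reduce", 3),  # T -> F
    (4, "*"): ("reduce", 4), (4, "$"): ("reduce", 4),  # F -> id
    (6, "*"): ("reduce", 2), (6, "$"): ("reduce", 2),  # T -> T * F
}
GOTO = {(0, "E"): 1, (0, "T"): 2, (0, "F"): 3, (5, "F"): 6}

def parse(tokens):
    """Return the list of actions taken, or None on a syntax error."""
    tokens = list(tokens) + ["$"]          # append the end marker
    stack, pos, trace = [0], 0, []         # the stack holds states, not symbols
    while True:
        kind, arg = ACTION.get((stack[-1], tokens[pos]), ("error", None))
        if kind == "shift":
            trace.append(f"shift {tokens[pos]}")
            stack.append(arg)
            pos += 1
        elif kind == "reduce":
            lhs, length = PRODUCTIONS[arg]
            del stack[-length:]                    # pop one state per RHS symbol
            stack.append(GOTO[(stack[-1], lhs)])   # follow the goto on the LHS
            trace.append(f"reduce by production {arg}")
        elif kind == "accept":
            return trace
        else:
            return None

print(parse(["id", "*", "id"]))
```

A missing table entry is treated as the error action, so an input such as `id *` is rejected without any backtracking.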

SLR Parser An SLR(1) parser makes shift-reduce decisions by maintaining states to keep track of where we are in a parse. States represent sets of items.

SLR Item
LR(0) and all other LR-style parsing methods are based on the idea of an item of the form:
A → X1 … Xi ‧ Xi+1 … Xj
The dot symbol ‧ in an item may appear anywhere in the right-hand side of a production. It marks how much of the production has already been matched.

SLR Item (Cont.)
An SLR item (item for short) of a grammar G is a production of G with a dot at some position of the RHS.
The production A → XYZ yields the four items:
A → ‧ XYZ
A → X ‧ YZ
A → XY ‧ Z
A → XYZ ‧
The production A → λ generates only one item, A → ‧.
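The item definition above can be sketched in a few lines. The representation is an assumption made for illustration: an item is a `(lhs, rhs, dot)` triple, the helper names `items` and `show` are hypothetical, and an ASCII `.` stands in for the dot symbol.

```python
def items(lhs, rhs):
    """Return every item of the production lhs -> rhs, one per dot position."""
    rhs = tuple(rhs)
    return [(lhs, rhs, dot) for dot in range(len(rhs) + 1)]

def show(item):
    """Render an item as text, writing the dot as an ASCII '.'."""
    lhs, rhs, dot = item
    body = list(rhs[:dot]) + ["."] + list(rhs[dot:])
    return f"{lhs} -> {' '.join(body)}"

for item in items("A", ["X", "Y", "Z"]):
    print(show(item))
print(show(items("A", [])[0]))   # the empty production has a single item
```

A production with n symbols on its right-hand side always yields n + 1 items, which is why the empty production yields exactly one.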

SLR Item Closure
If I is a set of items for a grammar G, then CLOSURE(I) is the set of items constructed from I by the 2 rules:
1) Initially, add every item in I to CLOSURE(I).
2) If A → α ‧ B β is in CLOSURE(I) and B → γ is a production, then add B → ‧ γ to CLOSURE(I), if it is not already there. Apply this until no more new items can be added.

SLR Closure Example
E’ → E
E → E + T | T
T → T * F | F
F → (E) | id
I is the set of one item {E’ → ‧E}. Find CLOSURE(I).

SLR Closure Example (Cont.)
First, E’ → ‧E is put in CLOSURE(I) by rule 1.
Then, because E is immediately to the right of the dot, rule 2 adds the E-productions with dots at the left end: E → ‧E + T and E → ‧T.
Now, there is a T immediately to the right of a dot in E → ‧T, so we add T → ‧T * F and T → ‧F.
Next, T → ‧F forces us to add F → ‧(E) and F → ‧id.
No other items can be added, so CLOSURE(I) contains these seven items.

Another Closure Example
S → E $
E → E + T | T
T → ID | (E)
closure(S → ‧E$) = {S → ‧E$, E → ‧E+T, E → ‧T, T → ‧ID, T → ‧(E)}
The five items above form an item set called state s0.

Closure (I)
SetOfItems Closure(I) {
    J = I
    repeat
        for (each item A → α ‧ B β in J)
            for (each production B → γ of G)
                if (B → ‧ γ is not in J)
                    add B → ‧ γ to J;
    until no more items are added to J;
    return J;
} // end of Closure (I)
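The pseudocode above translates almost directly into Python. The following is a minimal sketch, assuming items are `(lhs, rhs, dot)` triples and the grammar is a dictionary from non-terminal to its right-hand sides; the names `GRAMMAR` and `closure` are illustrative. It uses a worklist instead of the repeat/until loop, which visits each item once but computes the same set.

```python
# Expression grammar from the closure example; right-hand sides are tuples of symbols.
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}

def closure(items):
    """Return CLOSURE(items) as a frozenset of (lhs, rhs, dot) triples."""
    result = set(items)        # rule 1: every item of I is in the closure
    worklist = list(items)
    while worklist:
        _, rhs, dot = worklist.pop()
        # Rule 2 applies when the dot sits immediately before a non-terminal B.
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for gamma in GRAMMAR[rhs[dot]]:
                new_item = (rhs[dot], gamma, 0)    # B -> . gamma
                if new_item not in result:
                    result.add(new_item)
                    worklist.append(new_item)
    return frozenset(result)

state0 = closure({("E'", ("E",), 0)})
print(len(state0))
```

For the one-item set {E’ → ‧E}, the result contains the seven items derived step by step in the closure example.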

Goto Next State Given an item set (state) s, we can compute its next state, s’, under a symbol X, that is, Go_to (s, X) = s’

Goto Next State (Cont.)
E’ → E
E → E + T | T
T → T * F | F
F → (E) | id
Example: suppose S is the item set (state) containing:
E → E ‧ + T

Goto Next State (Cont.)
S’ is the next state that Goto(S, +) goes to:
E → E + ‧ T
T → ‧T * F
T → ‧F
F → ‧(E)
F → ‧id
We can build all the states of the Transition Diagram this way.
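The Goto computation can be sketched as "move the dot over X, then take the closure". The sketch below assumes the same hypothetical representation as before (items as `(lhs, rhs, dot)` triples, grammar as a dictionary) and repeats a compact closure so that it runs on its own; the names `goto` and `closure` are illustrative.

```python
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}

def closure(items):
    """Add B -> . gamma for every non-terminal B that follows a dot."""
    result, worklist = set(items), list(items)
    while worklist:
        _, rhs, dot = worklist.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for gamma in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], gamma, 0)
                if item not in result:
                    result.add(item)
                    worklist.append(item)
    return frozenset(result)

def goto(state, symbol):
    """Advance the dot over `symbol` wherever possible, then close the result."""
    moved = {(lhs, rhs, dot + 1)
             for lhs, rhs, dot in state
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure(moved)

s = {("E", ("E", "+", "T"), 1)}            # the state containing E -> E . + T
for lhs, rhs, dot in sorted(goto(s, "+")):
    print(lhs, "->", " ".join(rhs[:dot]), ".", " ".join(rhs[dot:]))
```

Applied to the state containing E → E ‧ + T with the symbol +, the sketch yields the five items listed on the slide.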

An SLR Complete Example
Grammar:
S’ → S $
S → id

SLR Transition Diagram
State 0: S’ → ‧S$, S → ‧id
State 1: S → id‧ (reached from State 0 on id)
State 2: S’ → S‧$ (reached from State 0 on S)
State 3: S’ → S$‧ (reached from State 2 on $)

SLR Transition Diagram (Cont.)
Each state in the Transition Diagram either signals a shift (the ‧ moves to the right of a terminal) or signals a reduce (reducing the RHS handle to the LHS).
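Reading shift and reduce signals off the items of a state can be sketched as follows. The state contents are those of the diagram for S’ → S $, S → id; the `(lhs, rhs, dot)` item encoding and the names `STATES` and `signals` are assumptions made for this illustration.

```python
# States of the transition diagram; each item is (LHS, RHS, dot position).
STATES = {
    0: [("S'", ("S", "$"), 0), ("S", ("id",), 0)],
    1: [("S", ("id",), 1)],
    2: [("S'", ("S", "$"), 1)],
    3: [("S'", ("S", "$"), 2)],
}
NONTERMINALS = {"S'", "S"}

def signals(state_items):
    """Return the moves a state offers: shifts on terminals and reductions."""
    moves = set()
    for lhs, rhs, dot in state_items:
        if dot == len(rhs):
            # Dot at the far right: the whole RHS has been matched.
            moves.add(f"reduce {' '.join(rhs)} to {lhs}")
        elif rhs[dot] not in NONTERMINALS:
            # Dot before a terminal: that terminal can be shifted.
            moves.add(f"shift {rhs[dot]}")
    return moves

for number, state_items in STATES.items():
    print(number, sorted(signals(state_items)))
```

In this small grammar every state offers exactly one kind of move. When a state offers both a shift and a reduce, the items alone cannot decide, and a lookahead is needed.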

SLR(1) Look-ahead
SLR(1) parsers are built by first constructing the Transition Diagram, then computing the Follow sets to use as SLR(1) look-aheads.
The idea is: a handle (RHS) should NOT be reduced to non-terminal N if the look-ahead token is NOT in Follow(N).
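Computing the Follow sets used as look-aheads can be sketched as a fixed-point iteration. This is a simplified sketch that assumes the grammar has no empty (λ) productions, which holds for the expression grammar used here; the names `first_sets` and `follow_sets` are illustrative.

```python
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def first_sets():
    """FIRST of each non-terminal (assumes no empty productions)."""
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            head = rhs[0]
            new = first[head] if head in NONTERMINALS else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first

def follow_sets(start="E'"):
    """FOLLOW of each non-terminal, with $ following the start symbol."""
    first = first_sets()
    follow = {n: set() for n in NONTERMINALS}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, sym in enumerate(rhs):
                if sym not in NONTERMINALS:
                    continue
                if i + 1 < len(rhs):       # something follows sym in this RHS
                    nxt = rhs[i + 1]
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:                      # sym ends the RHS: inherit FOLLOW(lhs)
                    new = follow[lhs]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow

follow = follow_sets()
print(sorted(follow["E"]), sorted(follow["T"]))
```

Because * is not in Follow(E), a state containing both E → T ‧ and T → T ‧ * F shifts on * rather than reducing T to E. This is the SLR(1) rule stated above.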

Transition Diagram Construction Example • Construction of states …

Building the Initial State: State I0

Passing the dot over E and T: States I1, I2 (figure legend: current state to process; repeated or newly generated state(s))

Passing the dot over F: State I3

Passing the dot over ‘(’: State I4

Passing the dot over id: State I5

Passing the dot over ‘$’: State I6 ACCept

Passing the dot over ‘+’: State I6

Passing the dot over ‘*’: State I7

Passing the dot over E, T, F, (, id: States I8, I2, I3, I4, I5
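The state-by-state construction can be automated by repeatedly applying Goto to every state until no new state appears. The sketch below makes several assumptions: it uses the expression grammar with the start production E’ → E (no explicit $), the `(lhs, rhs, dot)` item encoding, and its own discovery order, so its state numbers need not match the I-numbers on the slides.

```python
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}

def closure(items):
    result, worklist = set(items), list(items)
    while worklist:
        _, rhs, dot = worklist.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for gamma in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], gamma, 0)
                if item not in result:
                    result.add(item)
                    worklist.append(item)
    return frozenset(result)

def goto(state, symbol):
    moved = {(l, r, d + 1) for l, r, d in state if d < len(r) and r[d] == symbol}
    return closure(moved)

def canonical_collection():
    """Return (states, transitions) for the LR(0) transition diagram."""
    states = [closure({("E'", ("E",), 0)})]      # the initial state
    transitions = {}
    for state in states:                         # the list grows while we iterate
        symbols = {r[d] for _, r, d in state if d < len(r)}
        for symbol in sorted(symbols):           # pass the dot over each symbol
            target = goto(state, symbol)
            if target not in states:             # newly generated state
                states.append(target)
            transitions[(states.index(state), symbol)] = states.index(target)
    return states, transitions

states, transitions = canonical_collection()
print(len(states), "states,", len(transitions), "transitions")
```

Each step either reuses a repeated state or appends a newly generated one, which mirrors the distinction the slides draw while passing the dot over each symbol.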
