Bottom-up parsing

• Synthesize the tree from fragments.
• The automaton performs two actions:
  • shift: push the next input symbol on the stack
  • reduce: replace the symbols on top of the stack that match the right-hand side of a production with its left-hand side
• The automaton synthesizes (reduces) when the end of a production is recognized.
• States of the automaton encode the synthesis so far and the expectation of pending non-terminals.
• The automaton has a potentially large set of states.
• The technique is more general than LL(k).
(C) Edmond Schonberg, New-York University

LR(k) parsing
• Left-to-right scan that builds a rightmost derivation in reverse, with k tokens of lookahead.
• Most general parsing technique for deterministic grammars.
• In general, not practical: tables too large (on the order of 10^6 states for C++, Ada).
• Common subsets: SLR, LALR(1).

The states of the LR(0) automaton
• An item is a point within a production, indicating that part of the production has been recognized:
  • A → α . B β
  • we have seen the expansion of α and expect to see the expansion of B
• A state is a set of items.
• Transitions between states are determined by terminals and non-terminals.
• Parsing tables are built from the automaton:
  • action: shift / reduce depending on the next symbol
  • goto: change state depending on the synthesized non-terminal

Building LR(0) states
• If a state includes the item A → α . B β
  • it also includes every item that starts a production for B: B → . X Y Z
• Informally: if I expect to see B next, I expect to see anything that B can start with, and so on: X → . G H I
• States are built by closure from individual items.

A grammar of expressions: initial state
• E' → E
• E → E + T | T    -- left recursion is fine here
• T → T * F | F
• F → id | ( E )
• S0 = { E' → . E, E → . E + T, E → . T, T → . T * F, T → . F, F → . id, F → . ( E ) }
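
The closure rule can be tried out on this grammar. The following is a minimal sketch, not the course's own code: the representation of an item as a (production index, dot position) pair and the names GRAMMAR, closure and show are assumptions made for illustration.

```python
# Expression grammar from the slide; production 0 is the augmented start E' -> E.
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("id",)),
    ("F", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Return the LR(0) closure of a set of (production index, dot) items."""
    result = set(items)
    worklist = list(items)
    while worklist:
        prod, dot = worklist.pop()
        rhs = GRAMMAR[prod][1]
        # Dot in front of a non-terminal: add every production of that
        # non-terminal with the dot at the far left.
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    worklist.append((i, 0))
    return frozenset(result)


def show(item):
    """Format an item in the usual dotted notation."""
    prod, dot = item
    lhs, rhs = GRAMMAR[prod]
    return f"{lhs} -> {' '.join(rhs[:dot] + ('.',) + rhs[dot:])}"


S0 = closure({(0, 0)})  # start from the single item E' -> . E
for item in sorted(S0):
    print(show(item))
```

Starting from the single item E' → . E, the closure pulls in all seven items listed for S0.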

Adding states
• If a state has the item A → α . a β and the next symbol in the input is a, we shift a on the stack and enter a state that contains the item
  • A → α a . β (as well as all other items brought in by closure)
• If a state has the item A → α . , this indicates the end of a production: reduce action.
• If a state has the item A → α . N β, then after a reduction that finds an N, go to a state with A → α N . β

The LR(0) states for expressions
• S1 = { E' → E . , E → E . + T }
• S2 = { E → T . , T → T . * F }
• S3 = { T → F . }
• S4 = { F → ( . E ) } plus every item of S0 except E' → . E (by closure)
• S5 = { F → id . }
• S6 = { E → E + . T, T → . T * F, T → . F, F → . id, F → . ( E ) }
• S7 = { T → T * . F, F → . id, F → . ( E ) }
• S8 = { F → ( E . ), E → E . + T }
• S9 = { E → E + T . , T → T . * F }
• S10 = { T → T * F . }, S11 = { F → ( E ) . }
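
The state list can be reproduced mechanically by advancing the dot over a symbol, taking the closure, and repeating until no new state appears. This is a sketch under the same assumed item encoding, (production index, dot position); the function names are illustrative, and the states are numbered in discovery order, which differs from the S1..S11 numbering on the slide.

```python
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("id",)),
    ("F", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """LR(0) closure of a set of (production index, dot) items."""
    result = set(items)
    worklist = list(items)
    while worklist:
        prod, dot = worklist.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    worklist.append((i, 0))
    return frozenset(result)


def goto(state, symbol):
    """Advance the dot over `symbol` wherever possible, then close."""
    moved = {(p, d + 1) for p, d in state
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved)


def canonical_collection():
    """Build all LR(0) states reachable from the initial state."""
    states = [closure({(0, 0)})]
    transitions = {}
    i = 0
    while i < len(states):  # the list grows while we walk it
        state = states[i]
        symbols = {GRAMMAR[p][1][d] for p, d in state if d < len(GRAMMAR[p][1])}
        for sym in sorted(symbols):
            target = goto(state, sym)
            if target not in states:
                states.append(target)
            transitions[i, sym] = states.index(target)
        i += 1
    return states, transitions


STATES, TRANSITIONS = canonical_collection()
print(len(STATES))        # number of LR(0) states, S0 included
after_paren = goto(STATES[0], "(")
print(len(after_paren))   # items in the state entered after shifting "("
```

The state reached on "(" holds F → ( . E ) plus six closure items. The item E' → . E is not among them, because nothing in that state has the dot in front of E'.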

Building SLR tables
• An arc between two states labeled with a terminal is a shift action.
• An arc between two states labeled with a non-terminal is a goto action.
• If a state contains an item A → α . (a reduce item)
  • the action is to reduce by this production, for all terminals in Follow(A).
• If there are shift-reduce conflicts or reduce-reduce conflicts, more elaborate techniques are needed.
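
A compact sketch of these rules for the expression grammar follows. It makes several assumptions for illustration: the Follow sets are written out by hand rather than computed, "$" stands for end of input, the accept action on E' → E . is added explicitly, and all function names are hypothetical. The driver returns the production indices in the order they are reduced.

```python
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("id",)),
    ("F", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
# Follow sets worked out by hand for this grammar ("$" = end of input).
FOLLOW = {"E": {"+", ")", "$"}, "T": {"+", "*", ")", "$"}, "F": {"+", "*", ")", "$"}}


def closure(items):
    result, worklist = set(items), list(items)
    while worklist:
        prod, dot = worklist.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    worklist.append((i, 0))
    return frozenset(result)


def goto(state, symbol):
    return closure({(p, d + 1) for p, d in state
                    if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol})


def build_slr():
    """Return (states, action table, goto table); raise on any conflict."""
    states, action, goto_table = [closure({(0, 0)})], {}, {}

    def set_action(key, value):
        if action.get(key, value) != value:
            raise ValueError(f"conflict at {key}: {action[key]} vs {value}")
        action[key] = value

    i = 0
    while i < len(states):
        state = states[i]
        symbols = {GRAMMAR[p][1][d] for p, d in state if d < len(GRAMMAR[p][1])}
        for sym in sorted(symbols):
            target = goto(state, sym)
            if target not in states:
                states.append(target)
            j = states.index(target)
            if sym in NONTERMINALS:
                goto_table[i, sym] = j               # arc on a non-terminal
            else:
                set_action((i, sym), ("shift", j))   # arc on a terminal
        for p, d in state:
            if d == len(GRAMMAR[p][1]):              # reduce item A -> alpha .
                if p == 0:
                    set_action((i, "$"), ("accept",))
                else:
                    for t in FOLLOW[GRAMMAR[p][0]]:
                        set_action((i, t), ("reduce", p))
        i += 1
    return states, action, goto_table


def parse(tokens, action, goto_table):
    """Run the shift-reduce driver; return the list of reduced productions."""
    stack, pos, reductions = [0], 0, []
    tokens = list(tokens) + ["$"]
    while True:
        act = action.get((stack[-1], tokens[pos]))
        if act is None:
            raise SyntaxError(f"unexpected {tokens[pos]!r} at position {pos}")
        if act[0] == "shift":
            stack.append(act[1])
            pos += 1
        elif act[0] == "reduce":
            lhs, rhs = GRAMMAR[act[1]]
            del stack[len(stack) - len(rhs):]        # pop one state per rhs symbol
            stack.append(goto_table[stack[-1], lhs])
            reductions.append(act[1])
        else:
            return reductions


STATES, ACTION, GOTO = build_slr()
print(parse(["id", "+", "id", "*", "id"], ACTION, GOTO))
```

build_slr raises no conflict for this grammar, so the grammar is SLR(1). For id + id * id, the product is reduced before the sum, which gives * the higher precedence.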

LR(k) parsing
• Canonical LR(1): annotate each item with its own follow (lookahead) set:
  • (A → α . β, f)
  • f is a subset of the follow set of A, because it is derived from a single specific production for A
• A state that includes the complete item (A → α . , f) reduces only if the next symbol is in f: fewer reduce actions, fewer conflicts, so the technique is more powerful than SLR(1).
• Generalization: use sequences of k symbols in f.
• Disadvantage: state explosion makes it impractical in general, even for LR(1).
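
To see that an item's lookahead set can be strictly smaller than the Follow set, here is a sketch of the LR(1) closure of the initial item (E' → . E, $) for the expression grammar. It assumes an item is encoded as (production index, dot position, lookahead terminal) and that the grammar has no empty productions, which keeps the FIRST computation short. All names are illustrative.

```python
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("id",)),
    ("F", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_sets():
    """FIRST of each non-terminal; valid only without empty productions."""
    first = {nt: set() for nt in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            head = rhs[0]
            new = first[head] if head in NONTERMINALS else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


FIRST = first_sets()


def closure1(items):
    """LR(1) closure of a set of (production, dot, lookahead) items."""
    result, worklist = set(items), list(items)
    while worklist:
        prod, dot, la = worklist.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            rest = rhs[dot + 1:]
            # Lookaheads for the new items: FIRST of what follows the
            # non-terminal, or the inherited lookahead if nothing follows.
            if rest:
                lookaheads = FIRST[rest[0]] if rest[0] in NONTERMINALS else {rest[0]}
            else:
                lookaheads = {la}
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot]:
                    for b in lookaheads:
                        if (i, 0, b) not in result:
                            result.add((i, 0, b))
                            worklist.append((i, 0, b))
    return result


# Group lookaheads by production to compare them with Follow sets.
LOOKAHEADS = {}
for prod, dot, la in closure1({(0, 0, "$")}):
    LOOKAHEADS.setdefault(prod, set()).add(la)
for prod in sorted(LOOKAHEADS):
    lhs, rhs = GRAMMAR[prod]
    print(f"{lhs} -> . {' '.join(rhs)}   {sorted(LOOKAHEADS[prod])}")
```

In the initial state, the items for E carry only $ and + as lookaheads, while Follow(E) also contains ")". A reduction in this context is therefore never attempted on ")".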

LALR(1)
• Compute lookahead (follow) sets for a small set of items.
• Tables no bigger than SLR(1).
• Almost the power of LR(1): merging states that share the same core can introduce reduce-reduce conflicts that canonical LR(1) avoids. Slightly worse error diagnostics.
• Incorporated into yacc, bison, etc.
