Ambiguity, LL1 Grammars and Table-driven Parsing

Ambiguity, LL1 Grammars andTable-driven Parsing

Problems with Grammars • Not all grammars are usable! • Ambiguous • Unproductive non-terminals • Unreachable rules

E E E E E * E + D E * E E + E D 3 1 D D D D 2 3 1 2 Ambiguous Grammar • = { E  D | ( E ) | E + E| E – E | E * E| E / E , • D  0 | 1 | … | 9 } 1 + 2 * 3 G is ambiguous if there exists S inL(G), such that there are two different parse trees for S Multiple meanings: Precedence (1+2)*3≠1+(2*3) Associativity (1-2)-3≠1-(2-3)

= { E  D | ( E ) | E + E| E – E | E * E| E / E , D  0 | 1 | … | 9 } Fixing Precedence Ambiguity E  T | E + T | E – T T  F | T * F | T / F F  D | ( E ) D  0 | 1 | … | 9 • Observe: • Operators lower in the parse tree are executed first Operators executed first have higher precedence • Fix: • Introduce a new non-terminal symbol for each precedence level E E T + T T * F D F F 3 D D 1 2

Adding the Power Operator E  T | E+T | ET T  P | T*P | T/P P  F | FP F  D | (E) D  0 | 1 | … | 9 E  T | E + T | E – T T  F | T * F | T / F F  D | ( E ) D  0 | 1 | … | 9

E  D | E  D Fixing Associative Ambiguity Left recursion/Left associativity Right recursion/Right associativity E  D | D  E 2 (3 2) 1 23 = 2 (3 2) E E E  D D  E E  D 2 D  E 1 2 D D 3 3 2

Unreachable Rules • Initialize the set of reachable non-terminals R with the start symbol. • For each round, if R includes the lhs of a production rule, add the non-terminals in the rhs to R. • Loop on #2 until there are no changes to R. • Rules whose lhs’sare non-terminals in VNminusthe non-terminals in Rare the set of unreachable rules.  = {S  aABb , A a | aA , B  b | bBD , C cD , D  e } Initialize: R = {S} Round 1: R = {S, A, B} Round 2: R = {S, A, B, D} Round 3: R = {S, A, B, D} Done: no change: VN – {S, A, B, D} = {C} • = { S  aABb , A a | aA , B  b | bBD , C cD , D  e } Least-fixed point algorithm

Unproductive Non-terminals Start with the set of terminals T. For each round, if T covers a rhs of a production rule, add the lhs to T. Loop on #2 until there are no changes to T. The alphabet of terminals and non-terminals, V, minus T is the set of unproductive non-terminals.  = { S  aABb , A bC , B  d | dB , C eC } C never produces all terminals. C  eC  eeC  … enC A also because it always produces C A  bC  beC  … benC S also because it always produce A S  aABb  aAbb  abCbb  …  = { S  aABb , A bC , B  d | dB , C eC } Initialize: T = {a, b, d, e} Round 1: T = {a, b, d, e, B} Round 2: T = {a, b, d, e, B} Done: no change: {a, b, d, e, A, B, C, S} – T = {A, C, S} Least-fixed point algorithm

Top-down Parsing with Backtracking E  N | OEE O  + |  | * | / N  0 | 1 | 2 | 3 | 4 *+342 Prefix expressions associate an operator with the next two operands E.g., *+324=(2+3)*4, *2+34=2*(3+4)

LL(1) Parsers • Problem: • Never know what production to try (and very inefficient) • Solution: • LL parser: parses input from Left to right, and constructs a Leftmost derivation of the sentence • LL(k) parser uses k tokens of look-ahead • LL(1) parsers: • Somewhat restrictive, BUT • Only need current non-terminal and next token to make parsing decision • LL(1) parsers require LL(1) grammars

Simple LL(1) Grammars All rules have the form: A a11 | a22 | … | ann where ai(1 ≤ i ≤ n) is a terminal ai ajfor i  j i (1 ≤ i ≤ n) is a sequence of terminals and non-terminals, or is empty

By making all production rules of the form: A  a11 | a22 | … | ann Thus, E  0 | 1 | 2 | 3 | 4 | +EE | EE | *EE | /EE Why is this not a simple LL(1) grammar? E  N | OEE O  + |  | * | / N  0 | 1 | 2 | 3 | 4 How can we change it to simple LL(1)? Creating Simple LL(1) Grammars

8 7 * E E  E E 6 5 3 8 + E E 4 2 * E E 3 4 4 2 3 3 ? LL(1) Parsing E (1)0 | (2)1 | (3)2 | (4)3 | (5)4 | (6)+EE | (7)EE | (8)*EE | (9)/EE * + 2 3 4  2 * 3 E E Success! Fail!

Simple LL(1) Parse Table A parse table is defined as follows: (V  {#})  (VT  {#})  {(, i), pop, accept, error} where •  is the right side of production number i • # marks the end of the input string (#  V) If A  (V  {#}) is the symbol on top of the stack and a  (VT  {#}) is the current input symbol, then: ACTION(A, a) = pop if A = a for a  VT accept if A = # and a = # (a, i) which means “pop, then push a and output i”(A  a is the ith production) error otherwise

Simple LL(1) Parse Table ExampleE (1)0 | (2)1 | (3)2 | (4)3 | (5)+EE | (6)*EE VT {#} V{#} All blank entries are error

Parse Table Execution: *+123

Relaxing Simple LL(1) Restrictions • Consider the following grammar E  (1)N | (2)OEE O  (3)+ | (4)* N  (5)0 | (6)1 | (7)2 | (8)3 • Not simple LL(1): rules (1) & (2) • However: • N leads only to {0, 1, 2, 3} • O leads only to {+, *} • {0, 1, 2, 3}  {+, *} =  We can distinguish between rules (1) and (2): • If we see 0, 1, 2, or 3, we choose (1) • If we see + or *, we choose (2)

LL(1) Grammars • For any , define FIRST() = { |  * and   VT} • A grammar is LL(1) if for all rules of the form A  1 | 2 | … | n then, FIRST(i)  FIRST(j) =  for i  j (i.e., the sets FIRST(1), FIRST(2), …, and FIRST(n) are pairwise disjoint)

LL(1) Parse Table E (1)N | (2)OEEO  (3)+ | (4)*N  (5)0 | (6)1 | (7)2 | (8)3 For (A, a), we select (, i) if a  FIRST() and  is the right hand side of rule i. VT{#} V{#}

Parse Table Execution Revisited: *+123

(2)OEE (4)* (2)OEE (1)N (3)+ (1)N (1)N (8)3 (6)1 (7)2 What does 2 4 2 3 1 6 1 7 1 8 mean? E (1)N | (2)OEEO  (3)+ | (4)*N  (5)0 | (6)1 | (7)2 | (8)3 E 2 4 2 3 1 6 1 7 1 8 defines a parse tree via a preorder traversal

Ambiguity, LL1 Grammars and Table-driven Parsing