230 likes | 436 Views
Ambiguity, LL1 Grammars and Table-driven Parsing. Problems with Grammars. Not all grammars are usable! Ambiguous Unproductive non-terminals Unreachable rules. E. E. E. E. E. *. E. +. D. E. *. E. E. +. E. D. 3. 1. D. D. D. D. 2. 3. 1. 2. Ambiguous Grammar.
E N D
Problems with Grammars • Not all grammars are usable! • Ambiguous • Unproductive non-terminals • Unreachable rules
E E E E E * E + D E * E E + E D 3 1 D D D D 2 3 1 2 Ambiguous Grammar • = { E D | ( E ) | E + E| E – E | E * E| E / E , • D 0 | 1 | … | 9 } 1 + 2 * 3 G is ambiguous if there exists S inL(G), such that there are two different parse trees for S Multiple meanings: Precedence (1+2)*3≠1+(2*3) Associativity (1-2)-3≠1-(2-3)
= { E D | ( E ) | E + E| E – E | E * E| E / E , D 0 | 1 | … | 9 } Fixing Precedence Ambiguity E T | E + T | E – T T F | T * F | T / F F D | ( E ) D 0 | 1 | … | 9 • Observe: • Operators lower in the parse tree are executed first Operators executed first have higher precedence • Fix: • Introduce a new non-terminal symbol for each precedence level E E T + T T * F D F F 3 D D 1 2
Adding the Power Operator E T | E+T | ET T P | T*P | T/P P F | FP F D | (E) D 0 | 1 | … | 9 E T | E + T | E – T T F | T * F | T / F F D | ( E ) D 0 | 1 | … | 9
E D | E D Fixing Associative Ambiguity Left recursion/Left associativity Right recursion/Right associativity E D | D E 2 (3 2) 1 23 = 2 (3 2) E E E D D E E D 2 D E 1 2 D D 3 3 2
Unreachable Rules • Initialize the set of reachable non-terminals R with the start symbol. • For each round, if R includes the lhs of a production rule, add the non-terminals in the rhs to R. • Loop on #2 until there are no changes to R. • Rules whose lhs’sare non-terminals in VNminusthe non-terminals in Rare the set of unreachable rules. = {S aABb , A a | aA , B b | bBD , C cD , D e } Initialize: R = {S} Round 1: R = {S, A, B} Round 2: R = {S, A, B, D} Round 3: R = {S, A, B, D} Done: no change: VN – {S, A, B, D} = {C} • = { S aABb , A a | aA , B b | bBD , C cD , D e } Least-fixed point algorithm
Unproductive Non-terminals Start with the set of terminals T. For each round, if T covers a rhs of a production rule, add the lhs to T. Loop on #2 until there are no changes to T. The alphabet of terminals and non-terminals, V, minus T is the set of unproductive non-terminals. = { S aABb , A bC , B d | dB , C eC } C never produces all terminals. C eC eeC … enC A also because it always produces C A bC beC … benC S also because it always produce A S aABb aAbb abCbb … = { S aABb , A bC , B d | dB , C eC } Initialize: T = {a, b, d, e} Round 1: T = {a, b, d, e, B} Round 2: T = {a, b, d, e, B} Done: no change: {a, b, d, e, A, B, C, S} – T = {A, C, S} Least-fixed point algorithm
Top-down Parsing with Backtracking E N | OEE O + | | * | / N 0 | 1 | 2 | 3 | 4 *+342 Prefix expressions associate an operator with the next two operands E.g., *+324=(2+3)*4, *2+34=2*(3+4)
LL(1) Parsers • Problem: • Never know what production to try (and very inefficient) • Solution: • LL parser: parses input from Left to right, and constructs a Leftmost derivation of the sentence • LL(k) parser uses k tokens of look-ahead • LL(1) parsers: • Somewhat restrictive, BUT • Only need current non-terminal and next token to make parsing decision • LL(1) parsers require LL(1) grammars
Simple LL(1) Grammars All rules have the form: A a11 | a22 | … | ann where ai(1 ≤ i ≤ n) is a terminal ai ajfor i j i (1 ≤ i ≤ n) is a sequence of terminals and non-terminals, or is empty
By making all production rules of the form: A a11 | a22 | … | ann Thus, E 0 | 1 | 2 | 3 | 4 | +EE | EE | *EE | /EE Why is this not a simple LL(1) grammar? E N | OEE O + | | * | / N 0 | 1 | 2 | 3 | 4 How can we change it to simple LL(1)? Creating Simple LL(1) Grammars
8 7 * E E E E 6 5 3 8 + E E 4 2 * E E 3 4 4 2 3 3 ? LL(1) Parsing E (1)0 | (2)1 | (3)2 | (4)3 | (5)4 | (6)+EE | (7)EE | (8)*EE | (9)/EE * + 2 3 4 2 * 3 E E Success! Fail!
Simple LL(1) Parse Table A parse table is defined as follows: (V {#}) (VT {#}) {(, i), pop, accept, error} where • is the right side of production number i • # marks the end of the input string (# V) If A (V {#}) is the symbol on top of the stack and a (VT {#}) is the current input symbol, then: ACTION(A, a) = pop if A = a for a VT accept if A = # and a = # (a, i) which means “pop, then push a and output i”(A a is the ith production) error otherwise
Simple LL(1) Parse Table ExampleE (1)0 | (2)1 | (3)2 | (4)3 | (5)+EE | (6)*EE VT {#} V{#} All blank entries are error
Relaxing Simple LL(1) Restrictions • Consider the following grammar E (1)N | (2)OEE O (3)+ | (4)* N (5)0 | (6)1 | (7)2 | (8)3 • Not simple LL(1): rules (1) & (2) • However: • N leads only to {0, 1, 2, 3} • O leads only to {+, *} • {0, 1, 2, 3} {+, *} = We can distinguish between rules (1) and (2): • If we see 0, 1, 2, or 3, we choose (1) • If we see + or *, we choose (2)
LL(1) Grammars • For any , define FIRST() = { | * and VT} • A grammar is LL(1) if for all rules of the form A 1 | 2 | … | n then, FIRST(i) FIRST(j) = for i j (i.e., the sets FIRST(1), FIRST(2), …, and FIRST(n) are pairwise disjoint)
LL(1) Parse Table E (1)N | (2)OEEO (3)+ | (4)*N (5)0 | (6)1 | (7)2 | (8)3 For (A, a), we select (, i) if a FIRST() and is the right hand side of rule i. VT{#} V{#}
(2)OEE (4)* (2)OEE (1)N (3)+ (1)N (1)N (8)3 (6)1 (7)2 What does 2 4 2 3 1 6 1 7 1 8 mean? E (1)N | (2)OEEO (3)+ | (4)*N (5)0 | (6)1 | (7)2 | (8)3 E 2 4 2 3 1 6 1 7 1 8 defines a parse tree via a preorder traversal