Fall 2010
The Chinese University of Hong Kong
CSCI 3130: Automata theory and formal languages
LR(1) grammars
Andrej Bogdanov
http://www.cse.cuhk.edu.hk/~andrejb/csc3130

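LR(0) parsing review
Grammar: A → aAb | ab
A parser generator takes a CFG G and outputs a “PDA” for parsing G, or reports an error if G is not LR(0).
States of the parsing automaton (figure):
  1: A → •aAb, A → •ab
  2: A → a•Ab, A → a•b, A → •aAb, A → •ab
  3: A → aA•b
  4: A → aAb•
  5: A → ab•
Transitions: 1 --a--> 2, 2 --a--> 2, 2 --A--> 3, 2 --b--> 5, 3 --b--> 4
Motivation: Fast parsing for programming languages
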
Parsing computer programs
Program: if (n == 0) { return x; } else { return x + 1; }
Parse tree (figure): Statement → if ParExpression Statement else Statement, where ParExpression → (Expression) and each inner Statement → Block → ...
Most programming language CFGs are not LR(0)!

LR(0) parsing review
Grammar: A → aAb | ab
Input: a a b b
States: 1: A → •aAb, A → •ab; 2: A → a•Ab, A → a•b, A → •aAb, A → •ab; 3: A → aA•b; 4: A → aAb•; 5: A → ab•
  stack     state   action
  (empty)   1       S
  1         2       S
  12        2       S
  122       5       R (A → ab)
  12        3       S
  123       4       R (A → aAb)

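The trace above can be replayed with a small table-driven sketch. The tables below are written by hand from the five states on the slide; the names GOTO, REDUCE and lr0_parse are illustrative, not from the lecture. Each trace entry shows the stack together with the current state as one string, so the slide's row "122 | 5 | R" appears as ("1225", "R").

```python
# Hand-written LR(0) tables for A -> aAb | ab (state numbers follow the slide).
GOTO = {
    (1, "a"): 2, (2, "a"): 2, (2, "A"): 3, (2, "b"): 5, (3, "b"): 4,
}
# States holding a complete item: state -> (variable, length of right-hand side)
REDUCE = {4: ("A", 3), 5: ("A", 2)}


def lr0_parse(text):
    """Return the list of (stack, action) steps; raise ValueError on bad input."""
    stack = [1]          # stack of DFA states, current state on top
    pos = 0              # number of input symbols consumed so far
    trace = []
    while True:
        state = stack[-1]
        shown = "".join(map(str, stack))
        if state in REDUCE:
            lhs, rhs_len = REDUCE[state]
            trace.append((shown, "R"))
            del stack[-rhs_len:]             # pop one state per symbol of the body
            if stack == [1] and pos == len(text):
                return trace                 # whole input reduced to A
            target = GOTO.get((stack[-1], lhs))
            if target is None:
                raise ValueError("no transition on " + lhs)
            stack.append(target)
        else:
            if pos == len(text) or (state, text[pos]) not in GOTO:
                raise ValueError("cannot shift at position %d" % pos)
            trace.append((shown, "S"))
            stack.append(GOTO[(state, text[pos])])
            pos += 1


for step in lr0_parse("aabb"):
    print(step)
```

Because the grammar is LR(0), the parser never looks at the next input symbol to decide between shifting and reducing; the current state alone determines the action.
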
Meaning of LR(0) items
In the item A → α•Xβ, the dot marks the focus; Xβ is the undiscovered part.
NFA transitions to:
  X → •γ: shift focus to the subtree rooted at X (if X is a nonterminal)
  A → αX•β: move past the subtree rooted at X

Outline of LR(0) parsing algorithm
LR(0) parser has two kinds of actions:
  no complete item is valid: shift (S)
  there is one valid item, and it is complete: reduce (R)
What if:
  some valid items are complete, some not: S / R conflict
  more than one valid complete item: R / R conflict

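The four cases on this slide can be written as a small decision function. This is a sketch only: items are represented as hypothetical (variable, body, dot position) triples, and the function classify is not part of the lecture. It follows the slide's wording literally and assumes at least one item is valid.

```python
def classify(items):
    """Classify a non-empty set of valid LR(0) items as on the slide.

    Each item is (variable, body, dot), e.g. ("A", "aA", 1) for A -> a.A
    """
    complete = [it for it in items if it[2] == len(it[1])]
    incomplete = [it for it in items if it[2] < len(it[1])]
    if not complete:
        return "shift"                     # no complete item is valid
    if len(complete) == 1 and not incomplete:
        return "reduce"                    # one valid item, and it is complete
    problems = []
    if incomplete:
        problems.append("S/R conflict")    # some complete, some not
    if len(complete) > 1:
        problems.append("R/R conflict")    # more than one complete item
    return ", ".join(problems)


# Valid items after reading "a" in the grammar S -> A | Bc, A -> aA | a, B -> a | ab
after_a = [
    ("A", "aA", 1), ("A", "a", 1), ("B", "a", 1),
    ("B", "ab", 1), ("A", "aA", 0), ("A", "a", 0),
]
print(classify(after_a))
```
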
Hierarchy of context-free grammars
  context-free grammars: CYK algorithm (slow)
    LR(1) grammars: allow some conflicts; conflicts can be resolved by lookahead
      LR(0) grammars: LR(0) parsing algorithm

A CFG that is not LR(0)
  S → A (1) | Bc (2)
  A → aA (3) | a (4)
  B → a (5) | ab (6)
input: a
valid LR(0) items before reading a: S → •A, S → •Bc, A → •aA, A → •a, B → •a, B → •ab
These items are updated as the input is read.

A CFG that is not LR(0)
  S → A (1) | Bc (2)
  A → aA (3) | a (4)
  B → a (5) | ab (6)
input: a (peek inside!)
valid LR(0) items after reading a: A → a•A, A → a•, B → a•, B → a•b, A → •aA, A → •a
S/R, R/R conflicts! Candidate actions: R(4), R(5), S(6)
(Figure: the possible parse trees.)

Lookahead
  S → A (1) | Bc (2)
  A → aA (3) | a (4)
  B → a (5) | ab (6)
In each step below, the last input symbol shown is the lookahead (peek inside!).

input: a a
valid LR(0) items: A → a•A, A → a•, B → a•, B → a•b, A → •aA, A → •a
action: shift
parse tree must look like this (figure): S → A, A → aA, A → a ...

input: a a a
valid LR(0) items: A → a•A, A → a•, A → •aA, A → •a
action: shift
parse tree must look like this (figure): S → A, A → aA, A → aA, A → a ...

input: a a a ε
valid LR(0) items: A → a•A, A → a•, A → •aA, A → •a
action: reduce
parse tree must look like this (figure): S → A, A → aA, A → aA, A → a

LR(0) items vs. LR(1) items
Grammar: A → aAb | ab
  LR(0) item: A → a•Ab
  LR(1) item: [A → a•Ab, b]
(Figures: the corresponding parse trees with the focus •.)

LR(1) items
  [A → α•β, x]: the subtree rooted at A is followed by the terminal x
  [A → α•β, ε]: the subtree rooted at A is at the end of the input

Generating an LR(1) parser
  S → A (1) | Bc (2)
  A → aA (3) | a (4)
  B → a (5) | ab (6)
CFG → NFA (states are LR(1) items) → DFA + stack (may have S/R, R/R conflicts)
A CFG is LR(1) if conflicts can always be resolved with one symbol lookahead

NFA for LR(0) parsing
Notation: a, b: terminals; A, B, C: variables; α, β, δ: mixed strings; X: terminal or variable
  q0 --ε--> S → •α                for every LR(0) item S → •α
  A → α•Xβ --X--> A → αX•β        for every LR(0) item A → α•Xβ
  A → α•Cβ --ε--> C → •δ          for every pair of LR(0) items A → α•Cβ, C → •δ

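The three rules above can be applied mechanically. The sketch below enumerates the LR(0) items and NFA transitions for the review grammar A → aAb | ab. The item encoding (variable, body, dot position), the label "ε" for ε-moves, and the function name lr0_nfa are choices made for this illustration.

```python
EPS = "ε"  # label used here for ε-transitions


def lr0_nfa(grammar, start):
    """Return (items, edges) of the LR(0) NFA.

    grammar maps each variable to a list of bodies (strings of one-letter symbols).
    An item is (variable, body, dot); an edge is (source, label, target).
    """
    items = [
        (lhs, body, dot)
        for lhs, bodies in grammar.items()
        for body in bodies
        for dot in range(len(body) + 1)
    ]
    edges = []
    # Rule 1: q0 --ε--> S -> .α for every production of the start variable
    for body in grammar[start]:
        edges.append(("q0", EPS, (start, body, 0)))
    for lhs, body, dot in items:
        if dot == len(body):
            continue                      # complete item: no outgoing edges
        x = body[dot]
        # Rule 2: move the dot past X
        edges.append(((lhs, body, dot), x, (lhs, body, dot + 1)))
        # Rule 3: if X is a variable C, ε-move to every item C -> .δ
        if x in grammar:
            for delta in grammar[x]:
                edges.append(((lhs, body, dot), EPS, (x, delta, 0)))
    return items, edges


items, edges = lr0_nfa({"A": ["aAb", "ab"]}, "A")
for edge in edges:
    print(edge)
```

For this grammar the construction yields 7 items, 5 symbol-labelled transitions and 4 ε-transitions.
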
NFA for LR(1) parsing
Notation: a, b: terminals; A, B, C: variables; α, β, δ: mixed strings; X: terminal or variable
  q0 --ε--> [S → •α, ε]                  for every item S → •α
  [A → α•Xβ, x] --X--> [A → αX•β, x]     for every LR(1) item [A → α•Xβ, x]
  [A → α•Cβ, x] --ε--> [C → •δ, y]       for every LR(1) item [A → α•Cβ, x], every production C → δ, and every y in FIRST(βx)

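Explaining the transitions
  [A → α•Xβ, x] --X--> [A → αX•β, x]
    (Figure: the focus moves past X; the symbol x that follows A's subtree is unchanged.)
  [A → α•Cβ, x] --ε--> [C → •δ, y], y ∈ FIRST(βx)
    (Figure: the focus moves into the subtree rooted at C; y is the terminal that follows that subtree, so it is a leftmost terminal of βx.)
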
FIRST sets
  S → A (1) | cB (2)
  A → aA (3) | a (4)
  B → a (5) | ab (6)
Used in the transition [A → α•Cβ, x] --ε--> [C → •δ, y] for every y in FIRST(βx).
FIRST(γ) are all leftmost terminals in derivations γ ⇒ ...
  γ     FIRST(γ)
  a     {a}
  A     {a}
  S     {a, c}
  cA    {c}
  BA    {a}
  ε     ∅

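The table above can be recomputed with a short fixed-point iteration. This sketch assumes the grammar has no ε-productions (true for the grammar on this slide), so FIRST of a string is determined by its first symbol alone. The names first_sets and first_of_string are illustrative.

```python
# Grammar from this slide: S -> A | cB, A -> aA | a, B -> a | ab
GRAMMAR = {
    "S": ["A", "cB"],
    "A": ["aA", "a"],
    "B": ["a", "ab"],
}


def first_sets(grammar):
    """FIRST of every variable, assuming no ε-productions."""
    first = {var: set() for var in grammar}
    changed = True
    while changed:                       # repeat until nothing new is added
        changed = False
        for var, bodies in grammar.items():
            for body in bodies:
                head = body[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[var]:
                    first[var] |= new
                    changed = True
    return first


def first_of_string(gamma, grammar, first):
    """FIRST(γ) for a mixed string γ; the empty string gives the empty set."""
    if gamma == "":
        return set()
    head = gamma[0]
    return set(first[head]) if head in grammar else {head}


FIRST = first_sets(GRAMMAR)
for gamma in ["a", "A", "S", "cA", "BA", ""]:
    print(repr(gamma), sorted(first_of_string(gamma, GRAMMAR, FIRST)))
```

With ε-productions the second function would also have to look past nullable variables; that case is deliberately left out here.
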
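Example: Constructing the NFA
  S → A (1) | Bc (2)
  A → aA (3) | a (4)
  B → a (5) | ab (6)
First steps:
  q0 --ε--> [S → •A, ε]
  q0 --ε--> [S → •Bc, ε]
  [S → •A, ε] --A--> [S → A•, ε]
  [S → •A, ε] --ε--> [A → •aA, ε]
  [S → •A, ε] --ε--> [A → •a, ε]
  [S → •Bc, ε] --B--> [S → B•c, ε]
  [S → •Bc, ε] --ε--> [B → •a, c]
  [S → •Bc, ε] --ε--> [B → •ab, c]
  ...
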
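Example: Constructing the NFA
  S → A (1) | Bc (2)
  A → aA (3) | a (4)
  B → a (5) | ab (6)
Complete NFA:
  q0 --ε--> [S → •A, ε]
  q0 --ε--> [S → •Bc, ε]
  [S → •A, ε] --A--> [S → A•, ε]
  [S → •A, ε] --ε--> [A → •aA, ε]
  [S → •A, ε] --ε--> [A → •a, ε]
  [A → •aA, ε] --a--> [A → a•A, ε]
  [A → a•A, ε] --A--> [A → aA•, ε]
  [A → a•A, ε] --ε--> [A → •aA, ε]
  [A → a•A, ε] --ε--> [A → •a, ε]
  [A → •a, ε] --a--> [A → a•, ε]
  [S → •Bc, ε] --B--> [S → B•c, ε]
  [S → B•c, ε] --c--> [S → Bc•, ε]
  [S → •Bc, ε] --ε--> [B → •a, c]
  [S → •Bc, ε] --ε--> [B → •ab, c]
  [B → •a, c] --a--> [B → a•, c]
  [B → •ab, c] --a--> [B → a•b, c]
  [B → a•b, c] --b--> [B → ab•, c]
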
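Example: Convert NFA to DFA
Grammar: S → A | Bc, A → aA | a, B → a | ab
Legend: shift variable, shift terminal, reduce
DFA states:
  1: [S → •A, ε], [S → •Bc, ε], [A → •aA, ε], [A → •a, ε], [B → •a, c], [B → •ab, c]
  2: [A → a•A, ε], [A → •aA, ε], [A → •a, ε], [A → a•, ε], [B → a•b, c], [B → a•, c]
  3: [A → a•A, ε], [A → •aA, ε], [A → •a, ε], [A → a•, ε]
  6: [S → B•c, ε]
  7: [S → Bc•, ε]
  8: [B → ab•, c]
  The two remaining states, numbered 4 and 5 in the figure: [S → A•, ε] and [A → aA•, ε]
Transitions:
  1 --a--> 2, 1 --B--> 6, 1 --A--> [S → A•, ε]
  2 --a--> 3, 2 --b--> 8, 2 --A--> [A → aA•, ε]
  3 --a--> 3, 3 --A--> [A → aA•, ε]
  6 --c--> 7
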
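One way to check the DFA is to build it directly from the grammar by taking ε-closures of LR(1) item sets. The sketch below makes two assumptions that should be read as such: the grammar has no ε-productions, and the end-of-input lookahead ε is treated as an ordinary marker symbol, so that FIRST(βx) = {ε} when β is empty and x = ε (as in the example NFA). All function names are illustrative.

```python
END = "ε"  # end-of-input lookahead, handled as a marker symbol (assumption)
GRAMMAR = {"S": ["A", "Bc"], "A": ["aA", "a"], "B": ["a", "ab"]}


def first_of(beta, lookahead):
    """FIRST(βx) for a grammar without ε-productions."""
    if not beta:
        return {lookahead}
    if beta[0] not in GRAMMAR:
        return {beta[0]}
    result, seen, todo = set(), set(), [beta[0]]
    while todo:                          # collect leftmost terminals of the variable
        var = todo.pop()
        if var in seen:
            continue
        seen.add(var)
        for body in GRAMMAR[var]:
            if body[0] in GRAMMAR:
                todo.append(body[0])
            else:
                result.add(body[0])
    return result


def closure(items):
    """Follow ε-transitions [A -> α.Cβ, x] --> [C -> .δ, y], y in FIRST(βx)."""
    items = set(items)
    todo = list(items)
    while todo:
        lhs, body, dot, la = todo.pop()
        if dot < len(body) and body[dot] in GRAMMAR:
            for y in first_of(body[dot + 1:], la):
                for delta in GRAMMAR[body[dot]]:
                    new = (body[dot], delta, 0, y)
                    if new not in items:
                        items.add(new)
                        todo.append(new)
    return frozenset(items)


def goto(state, symbol):
    """Move the dot past symbol in every item that allows it, then close."""
    moved = {(l, b, d + 1, la) for (l, b, d, la) in state
             if d < len(b) and b[d] == symbol}
    return closure(moved)


def build_dfa():
    start = closure({("S", body, 0, END) for body in GRAMMAR["S"]})
    states, edges, todo = {start}, {}, [start]
    while todo:
        state = todo.pop()
        for symbol in {b[d] for (_, b, d, _) in state if d < len(b)}:
            target = goto(state, symbol)
            edges[(state, symbol)] = target
            if target not in states:
                states.add(target)
                todo.append(target)
    return start, states, edges


start, states, edges = build_dfa()
print(len(states), "states,", len(edges), "transitions")
```

For this grammar the construction produces 8 states and 9 transitions, matching the figure. The states come out as sets of items rather than numbers, so the slide's numbering 1 to 8 is not reproduced.
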
Example: Resolving conflicts by lookahead
  S → A (1) | Bc (2)
  A → aA (3) | a (4)
  B → a (5) | ab (6)
Legend: shift variable, shift terminal, reduce

State 2: [A → a•A, ε], [A → •aA, ε], [A → •a, ε], [A → a•, ε], [B → a•b, c], [B → a•, c]
  next   action
  a      shift
  b      shift
  c      reduce B
  ε      reduce A

State 3: [A → a•A, ε], [A → •aA, ε], [A → •a, ε], [A → a•, ε]
  next   action
  a      shift
  b      error
  c      error
  ε      reduce A

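The two tables can be derived from the item sets with a rule of thumb: shift if some item has the next symbol right after its dot, reduce by a complete item whose lookahead equals the next symbol. The sketch below encodes that rule. The item encoding (variable, body, dot, lookahead), the marker END for ε, and the function name action are assumptions of this illustration.

```python
END = "ε"  # end-of-input marker used as a lookahead (assumption)


def action(items, nxt):
    """Decide the parser action in a DFA state given the next input symbol."""
    choices = set()
    for lhs, body, dot, lookahead in items:
        if dot < len(body):
            if body[dot] == nxt:
                choices.add(("shift",))
        elif lookahead == nxt:
            choices.add(("reduce", lhs, body))   # complete item, lookahead matches
    if not choices:
        return ("error",)
    if len(choices) > 1:
        return ("conflict",)                     # not resolvable with one symbol
    return choices.pop()


STATE_2 = {
    ("A", "aA", 1, END), ("A", "aA", 0, END), ("A", "a", 0, END),
    ("A", "a", 1, END), ("B", "ab", 1, "c"), ("B", "a", 1, "c"),
}
STATE_3 = {
    ("A", "aA", 1, END), ("A", "aA", 0, END),
    ("A", "a", 0, END), ("A", "a", 1, END),
}

for name, state in [("state 2", STATE_2), ("state 3", STATE_3)]:
    for nxt in ["a", "b", "c", END]:
        print(name, nxt, action(state, nxt))
```

A grammar is LR(1) exactly when this function never has to return ("conflict",) for any state of the DFA and any next symbol.
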
Example: Reconstruct the parse tree
Input: a b c
  stack     state   action
  (empty)   1       S
  1         2       S
  12        8       R (B → ab)
  1         6       S
  16        7       R (S → Bc)
Parse tree: S → Bc, B → ab
(Figure: the DFA from the NFA-to-DFA conversion example.)

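The trace can be reproduced with a hand-written LR(1) table for this DFA. States 1, 2, 3, 6, 7 and 8 are numbered as in the example. The assignment 4 = {[S → A•, ε]} and 5 = {[A → aA•, ε]} is an assumption made for this sketch, as are the names SHIFT, GOTO, REDUCE and lr1_parse. Each trace entry shows the stack together with the current state as one string.

```python
END = "ε"  # end-of-input marker used as a lookahead

# Assumption: state 4 = {[S -> A., ε]}, state 5 = {[A -> aA., ε]}
SHIFT = {(1, "a"): 2, (2, "a"): 3, (2, "b"): 8, (3, "a"): 3, (6, "c"): 7}
GOTO = {(1, "A"): 4, (1, "B"): 6, (2, "A"): 5, (3, "A"): 5}
# (state, lookahead) -> (variable, length of right-hand side)
REDUCE = {
    (2, "c"): ("B", 1), (2, END): ("A", 1), (3, END): ("A", 1),
    (4, END): ("S", 1), (5, END): ("A", 2),
    (7, END): ("S", 2), (8, "c"): ("B", 2),
}


def lr1_parse(text):
    """Return the list of (stack, action) steps; raise ValueError on bad input."""
    stack, pos, trace = [1], 0, []
    while True:
        state = stack[-1]
        nxt = text[pos] if pos < len(text) else END   # one symbol of lookahead
        shown = "".join(map(str, stack))
        if (state, nxt) in SHIFT:
            trace.append((shown, "S"))
            stack.append(SHIFT[(state, nxt)])
            pos += 1
        elif (state, nxt) in REDUCE:
            lhs, rhs_len = REDUCE[(state, nxt)]
            trace.append((shown, "R"))
            del stack[-rhs_len:]
            if lhs == "S":
                return trace          # S is only reduced at the end of the input
            stack.append(GOTO[(stack[-1], lhs)])
        else:
            raise ValueError("error in state %d on %r" % (state, nxt))


for step in lr1_parse("abc"):
    print(step)
```

Unlike the LR(0) parser, the action here depends on the pair (state, next symbol): in state 2 the same stack leads to a shift on b, a reduction by B → a on c, and a reduction by A → a at the end of the input.
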