Learn about LR(0) grammars, LR(0) parsing algorithm, and how to convert NFA to DFA for efficient programming language parsing.
Fall 2011 The Chinese University of Hong Kong CSCI 3130: Formal languages and automata theory LR(1) grammars Andrej Bogdanov http://www.cse.cuhk.edu.hk/~andrejb/csc3130
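LR(0) parsing review
Grammar: A → aAb | ab
A parser generator takes a CFG G and outputs a "PDA" for parsing G, or reports an error if G is not LR(0).
States of the "PDA":
  1: A → •aAb, A → •ab
  2: A → a•Ab, A → a•b, A → •aAb, A → •ab
  3: A → aA•b
  4: A → aAb•
  5: A → ab•
Transitions: 1 --a--> 2, 2 --a--> 2, 2 --A--> 3, 2 --b--> 5, 3 --b--> 4
Motivation: Fast parsing for programming languages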
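Parsing computer programs
Example program: if (n == 0) { return x; } else { return x + 1; }
[Figure: parse tree of the program. The root Statement expands to if ParExpression Statement else Statement; ParExpression expands to (Expression); each inner Statement expands to a Block ...]
CFGs of programming languages are not LR(0).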
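LR(0) parsing review
Grammar: A → aAb | ab
Input: a a b b
  stack    state   action
  (empty)  1       S
  1        2       S
  12       2       S
  122      5       R
  12       3       S
  123      4       R
States:
  1: A → •aAb, A → •ab
  2: A → a•Ab, A → a•b, A → •aAb, A → •ab
  3: A → aA•b
  4: A → aAb•
  5: A → ab•
[Figure: parse tree of a a b b, built bottom-up by the two reductions]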
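The trace on this slide can be replayed with a short table-driven sketch. The transition table below is read off the five states of the review DFA; the function name lr0_parse and the table names GOTO and REDUCE are illustrative choices, not part of the lecture. The sketch assumes the convention that the parse is accepted when a reduction returns to state 1 with all input consumed.

```python
# Transitions of the LR(0) DFA for A -> aAb | ab (states 1-5 as on the slide).
GOTO = {(1, "a"): 2, (2, "a"): 2, (2, "b"): 5, (2, "A"): 3, (3, "b"): 4}
# Reducing states: state -> (variable, length of the right-hand side).
REDUCE = {4: ("A", 3), 5: ("A", 2)}

def lr0_parse(word):
    """Return the (stack, state, action) steps for word, or raise ValueError."""
    path, pos, trace = [1], 0, []          # path = stack followed by current state
    while True:
        state = path[-1]
        stack = "".join(map(str, path[:-1]))
        if state in REDUCE:
            var, n = REDUCE[state]
            trace.append((stack, state, "R"))
            del path[-n:]                  # pop one state per right-hand-side symbol
            if path == [1] and pos == len(word):
                return trace               # whole input reduced to A: accept
            nxt = GOTO.get((path[-1], var))
            if nxt is None:
                raise ValueError("no transition on " + var)
            path.append(nxt)
        else:
            if pos == len(word) or (state, word[pos]) not in GOTO:
                raise ValueError("cannot shift at position %d" % pos)
            trace.append((stack, state, "S"))
            path.append(GOTO[(state, word[pos])])
            pos += 1

for step in lr0_parse("aabb"):
    print(step)
```

Running it on aabb prints the six rows of the table above, in the same order.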
Meaning of LR(0) items
Item A → α•Xβ: the dot • marks the focus; β is the undiscovered part.
PDA transitions:
  A → α•Xβ --X--> A → αX•β : move past the subtree rooted at X
  A → α•Xβ --ε--> X → •γ : shift focus to the subtree rooted at X
Outline of LR(0) parsing algorithm
• LR(0) parser has two kinds of actions:
  - shift (S): no complete item is valid
  - reduce (R): there is one valid item, and it is complete
• What if:
  - some valid items are complete, some not: S / R conflict
  - more than one valid complete item: R / R conflict
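The case analysis above can be written as a small classifier over a set of valid items. This is a minimal sketch: the tuple encoding (variable, right-hand side, dot position) and the name classify are assumptions made for illustration.

```python
def classify(items):
    """items: iterable of (variable, right-hand side, dot position)."""
    items = list(items)
    complete = [it for it in items if it[2] == len(it[1])]
    incomplete = [it for it in items if it[2] < len(it[1])]
    if not complete:
        return "shift"                     # no complete item is valid
    conflicts = []
    if incomplete:
        conflicts.append("S/R")            # some items complete, some not
    if len(complete) > 1:
        conflicts.append("R/R")            # more than one complete item
    return ", ".join(conflicts) + " conflict" if conflicts else "reduce"

# State reached after reading a in the grammar A -> aAb | ab.
print(classify([("A", "aAb", 1), ("A", "ab", 1), ("A", "aAb", 0), ("A", "ab", 0)]))
# Items valid after reading a in S -> A | Bc, A -> aA | a, B -> a | ab.
print(classify([("A", "aA", 1), ("A", "a", 1), ("B", "a", 1),
                ("B", "ab", 1), ("A", "aA", 0), ("A", "a", 0)]))
```

The first call reports a shift; the second reports both kinds of conflict, so that grammar is not LR(0).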
Hierarchy of context-free grammars
Nested classes, from largest to smallest: context-free grammars ⊃ LR(1) grammars ⊃ LR(0) grammars
LR(1) grammars allow some conflicts, as long as the conflicts can be resolved by lookahead.
A CFG that is not LR(0)
Grammar: S → A | Bc, A → aA | a, B → a | ab
input: a (nothing read yet)
valid LR(0) items: S → •A, S → •Bc, A → •aA, A → •a, B → •a, B → •ab
A CFG that is not LR(0)
Grammar: S → A | Bc, A → aA | a, B → a | ab
input: a (read)
valid LR(0) items: A → a•A, A → a•, B → a•, B → a•b, A → •aA, A → •a
S/R, R/R conflicts!
[Figure: possible parse trees for inputs beginning with a: S ⇒ A ⇒ aA ⇒ ..., S ⇒ Bc ⇒ ac, S ⇒ Bc ⇒ abc]
Lookahead
Grammar: S → A | Bc, A → aA | a, B → a | ab
input: a a (first a read; peek at the next symbol, a)
valid LR(0) items: A → a•A, A → a•, B → a•, B → a•b, A → •aA, A → •a
action: shift
[Figure: possible parse trees]
Lookahead
Grammar: S → A | Bc, A → aA | a, B → a | ab
input: a a a (two a's read; peek at the next symbol, a)
valid LR(0) items: A → a•A, A → a•, A → •aA, A → •a
S/R conflict
action: shift
[Figure: possible parse trees]
Lookahead
Grammar: S → A | Bc, A → aA | a, B → a | ab
input: a a a ε (three a's read; the lookahead ε marks the end of the input)
valid LR(0) items: A → a•A, A → a•, A → •aA, A → •a
action: reduce
[Figure: possible parse trees]
LR(0) items vs. LR(1) items
Grammar: A → aAb | ab
LR(0) item: A → a•Ab
LR(1) item: [A → a•Ab, b]
[Figure: parse trees of a a b b illustrating the LR(0) item and the LR(1) item]
LR(1) items
[A → α•β, x]: the subtree rooted at A is followed by the terminal x
[A → α•β, ε]: the subtree rooted at A ends the input
Generating an LR(1) parser
CFG (S → A | Bc, A → aA | a, B → a | ab) → NFA → DFA with stack
NFA: states are LR(1) items
DFA with stack: may have S/R, R/R conflicts
In an LR(1) CFG, conflicts can always be resolved with one symbol of lookahead.
NFA for LR(0) parsing
Notation: a, b: terminals; A, B, C: variables; α, β, δ: mixed strings; X: terminal or variable
Transitions:
  q0 --ε--> S → •α : for every LR(0) item S → •α
  A → α•Xβ --X--> A → αX•β : for every LR(0) item A → α•Xβ
  A → α•Cβ --ε--> C → •δ : for every pair of LR(0) items A → α•Cβ, C → •δ
NFA for LR(1) parsing
Notation: a, b: terminals; A, B, C: variables; α, β, δ: mixed strings; X: terminal or variable
Transitions:
  q0 --ε--> [S → •α, ε] : for every item S → •α
  [A → α•Xβ, x] --X--> [A → αX•β, x] : for every LR(1) item [A → α•Xβ, x]
  [A → α•Cβ, x] --ε--> [C → •δ, y] : for every LR(1) item [A → α•Cβ, x], every production C → δ, and every y in FIRST(βx)
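Explaining the transitions
[A → α•Xβ, x] --X--> [A → αX•β, x]: moving the focus past X leaves the lookahead x unchanged.
[A → α•Cβ, x] --ε--> [C → •δ, y], y ∈ FIRST(βx): the terminal y that follows the subtree rooted at C is a leftmost terminal of βx.
[Figure: parse trees rooted at A showing α, X or C, β, the focus •, and the lookahead symbols x and y]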
FIRST sets
Grammar: S → A | cB, A → aA | a, B → a | ab
Used in the transition [A → α•Cβ, x] --ε--> [C → •δ, y], for every y in FIRST(βx).
FIRST(γ) are all leftmost terminals in derivations γ ⇒ ...
  γ     FIRST(γ)
  a     {a}
  A     {a}
  S     {a, c}
  cA    {c}
  BA    {a}
  ε     ∅
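The table of FIRST sets can be reproduced with a small fixed-point computation. This is a sketch under a stated assumption: the grammar has no ε-productions (true for the grammar on this slide), so FIRST of a string is determined by its first symbol. The names first_sets and first_of are illustrative.

```python
GRAMMAR = {"S": ["A", "cB"], "A": ["aA", "a"], "B": ["a", "ab"]}

def first_sets(grammar):
    """FIRST of every variable; assumes the grammar has no ε-productions."""
    first = {var: set() for var in grammar}
    changed = True
    while changed:                         # repeat until no set grows
        changed = False
        for var, bodies in grammar.items():
            for body in bodies:
                head = body[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[var]:
                    first[var] |= new
                    changed = True
    return first

def first_of(string, grammar, first):
    """FIRST of a mixed string; FIRST of the empty string is the empty set."""
    if not string:
        return set()
    head = string[0]
    return set(first[head]) if head in grammar else {head}

FIRST = first_sets(GRAMMAR)
for gamma in ["a", "A", "S", "cA", "BA", ""]:
    print(repr(gamma), sorted(first_of(gamma, GRAMMAR, FIRST)))
```

A grammar with ε-productions would also need to track which variables can derive the empty string; that case is left out here.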
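Example: Construct the NFA
Grammar: S → A (1) | Bc (2), A → aA (3) | a (4), B → a (5) | ab (6)
First transitions:
  q0 --ε--> [S → •A, ε]
  q0 --ε--> [S → •Bc, ε]
  [S → •A, ε] --A--> [S → A•, ε]
  [S → •A, ε] --ε--> [A → •aA, ε]
  [S → •A, ε] --ε--> [A → •a, ε]
  [S → •Bc, ε] --B--> [S → B•c, ε]
  [S → •Bc, ε] --ε--> [B → •a, c]
  [S → •Bc, ε] --ε--> [B → •ab, c]
  . . .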
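Example: Construct the NFA
Grammar: S → A (1) | Bc (2), A → aA (3) | a (4), B → a (5) | ab (6)
Complete NFA:
  q0 --ε--> [S → •A, ε]
  q0 --ε--> [S → •Bc, ε]
  [S → •A, ε] --A--> [S → A•, ε]
  [S → •A, ε] --ε--> [A → •aA, ε]
  [S → •A, ε] --ε--> [A → •a, ε]
  [A → •aA, ε] --a--> [A → a•A, ε]
  [A → a•A, ε] --A--> [A → aA•, ε]
  [A → a•A, ε] --ε--> [A → •aA, ε]
  [A → a•A, ε] --ε--> [A → •a, ε]
  [A → •a, ε] --a--> [A → a•, ε]
  [S → •Bc, ε] --B--> [S → B•c, ε]
  [S → B•c, ε] --c--> [S → Bc•, ε]
  [S → •Bc, ε] --ε--> [B → •a, c]
  [S → •Bc, ε] --ε--> [B → •ab, c]
  [B → •a, c] --a--> [B → a•, c]
  [B → •ab, c] --a--> [B → a•b, c]
  [B → a•b, c] --b--> [B → ab•, c]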
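Example: Convert NFA to DFA
Grammar: S → A | Bc, A → aA | a, B → a | ab
Legend: transitions are of three kinds: shift variable, shift terminal, reduce
DFA states:
  1: [S → •A, ε], [S → •Bc, ε], [A → •aA, ε], [A → •a, ε], [B → •a, c], [B → •ab, c]
  2: [A → a•A, ε], [A → •aA, ε], [A → •a, ε], [A → a•, ε], [B → a•, c], [B → a•b, c]
  3: [A → a•A, ε], [A → •aA, ε], [A → •a, ε], [A → a•, ε]
  4 and 5: the reduce states [A → aA•, ε] and [S → A•, ε]
  6: [S → B•c, ε]
  7: [S → Bc•, ε]
  8: [B → ab•, c]
Transitions:
  1 --a--> 2, 1 --A--> [S → A•, ε], 1 --B--> 6
  2 --a--> 3, 2 --b--> 8, 2 --A--> [A → aA•, ε]
  3 --a--> 3, 3 --A--> [A → aA•, ε]
  6 --c--> 7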
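The DFA can also be built programmatically. The sketch below uses the closure/goto formulation, which follows ε-transitions and symbol transitions of the NFA in one step and so yields the same states as converting the NFA to a DFA. It is a minimal sketch under stated assumptions: items are encoded as tuples (variable, right-hand side, dot position, lookahead), the empty string stands for the lookahead ε, the FIRST sets are entered by hand, and the grammar has no ε-productions. All function names are illustrative.

```python
GRAMMAR = {"S": ["A", "Bc"], "A": ["aA", "a"], "B": ["a", "ab"]}
FIRST = {"S": {"a"}, "A": {"a"}, "B": {"a"}}   # entered by hand for this grammar
END = ""                                        # stands for the lookahead ε

def first_of(beta, x):
    """FIRST(βx), assuming no ε-productions; x alone if β is empty."""
    if not beta:
        return {x}
    return FIRST[beta[0]] if beta[0] in GRAMMAR else {beta[0]}

def closure(items):
    """Follow ε-transitions [A → α•Cβ, x] --ε--> [C → •δ, y], y in FIRST(βx)."""
    items, todo = set(items), list(items)
    while todo:
        lhs, rhs, dot, x = todo.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for body in GRAMMAR[rhs[dot]]:
                for y in first_of(rhs[dot + 1:], x):
                    item = (rhs[dot], body, 0, y)
                    if item not in items:
                        items.add(item)
                        todo.append(item)
    return frozenset(items)

def goto(state, symbol):
    """Move the dot past symbol in every item that allows it, then close."""
    return closure({(l, r, d + 1, x) for (l, r, d, x) in state
                    if d < len(r) and r[d] == symbol})

def build_dfa():
    start = closure({("S", body, 0, END) for body in GRAMMAR["S"]})
    states, edges, todo = {start}, {}, [start]
    while todo:
        state = todo.pop()
        for symbol in sorted({r[d] for (_l, r, d, _x) in state if d < len(r)}):
            target = goto(state, symbol)
            edges[(state, symbol)] = target
            if target not in states:
                states.add(target)
                todo.append(target)
    return start, states, edges

start, states, edges = build_dfa()
print(len(states), "states,", len(edges), "transitions")
```

The state sets carry no numbers here; matching them to the numbering 1 to 8 on the slide is done by comparing item sets.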
Example: Resolve conflicts by lookahead
Grammar: S → A (1) | Bc (2), A → aA (3) | a (4), B → a (5) | ab (6)
State 2: [A → a•A, ε], [A → •aA, ε], [A → •a, ε], [A → a•, ε], [B → a•, c], [B → a•b, c]
  next   action
  a      shift
  b      shift
  c      reduce B
  ε      reduce A
State 3: [A → a•A, ε], [A → •aA, ε], [A → •a, ε], [A → a•, ε]
  next   action
  a      shift
  b      error
  c      error
  ε      reduce A
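The two action tables follow mechanically from the items in each state: shift if some item has the dot in front of the next symbol, reduce if some complete item carries the next symbol as its lookahead. A minimal sketch, assuming the tuple encoding (variable, right-hand side, dot position, lookahead) with the empty string standing for ε; the name action is illustrative.

```python
END = ""   # stands for the lookahead ε (end of input)

STATE_2 = [("A", "aA", 1, END), ("A", "aA", 0, END), ("A", "a", 0, END),
           ("A", "a", 1, END), ("B", "a", 1, "c"), ("B", "ab", 1, "c")]
STATE_3 = [("A", "aA", 1, END), ("A", "aA", 0, END), ("A", "a", 0, END),
           ("A", "a", 1, END)]

def action(state, nxt):
    """Action of an LR(1) state on the next input symbol nxt (END for ε)."""
    choices = set()
    for lhs, rhs, dot, look in state:
        if dot == len(rhs):
            if look == nxt:                # complete item with matching lookahead
                choices.add("reduce " + lhs)
        elif rhs[dot] == nxt:              # dot sits in front of the next terminal
            choices.add("shift")
    if not choices:
        return "error"
    if len(choices) > 1:
        return "conflict"                  # not resolvable with one lookahead symbol
    return choices.pop()

for name, state in [("state 2", STATE_2), ("state 3", STATE_3)]:
    print(name, [(s or "ε", action(state, s)) for s in ["a", "b", "c", END]])
```

For this grammar no entry comes out as a conflict, which is what makes it LR(1).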
Example: Reconstruct the parse tree
Input: a b c
  stack    state   action
  (empty)  1       S
  1        2       S
  12       8       R
  1        6       S
  16       7       R
States used: 1 (start), 2 (after a), 8: [B → ab•, c], 6: [S → B•c, ε], 7: [S → Bc•, ε]
[Figure: the DFA with states 1 to 8, and the reconstructed parse tree S → Bc, B → ab]
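The trace above can be replayed with a table-driven sketch that consults one symbol of lookahead before each step. The tables are transcribed by hand from the DFA; the assignment of numbers 4 and 5 to [A → aA•, ε] and [S → A•, ε] is an assumption (the input abc never visits either state), and the names lr1_parse, SHIFT, GOTO and REDUCE are illustrative.

```python
END = ""   # stands for the lookahead ε (end of input)

SHIFT = {(1, "a"): 2, (2, "a"): 3, (2, "b"): 8, (3, "a"): 3, (6, "c"): 7}
# Assumption: state 4 is [A → aA•, ε] and state 5 is [S → A•, ε].
GOTO = {(1, "A"): 5, (1, "B"): 6, (2, "A"): 4, (3, "A"): 4}
# (state, lookahead) -> (variable, length of the right-hand side)
REDUCE = {(2, "c"): ("B", 1), (2, END): ("A", 1), (3, END): ("A", 1),
          (4, END): ("A", 2), (5, END): ("S", 1), (7, END): ("S", 2),
          (8, "c"): ("B", 2)}

def lr1_parse(word):
    """Return the (stack, state, action) steps for word, or raise ValueError."""
    path, pos, trace = [1], 0, []          # path = stack followed by current state
    while True:
        state = path[-1]
        stack = "".join(map(str, path[:-1]))
        nxt = word[pos] if pos < len(word) else END
        if (state, nxt) in SHIFT:
            trace.append((stack, state, "S"))
            path.append(SHIFT[(state, nxt)])
            pos += 1
        elif (state, nxt) in REDUCE:
            var, n = REDUCE[(state, nxt)]
            trace.append((stack, state, "R"))
            del path[-n:]                  # pop one state per right-hand-side symbol
            if var == "S":
                return trace               # reduced to the start variable: accept
            if (path[-1], var) not in GOTO:
                raise ValueError("no transition on " + var)
            path.append(GOTO[(path[-1], var)])
        else:
            raise ValueError("error at position %d" % pos)

for step in lr1_parse("abc"):
    print(step)
```

On abc this prints the five rows of the table above; the two reductions, B → ab and then S → Bc, give the parse tree bottom-up.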