Explore LR(0) grammars, formal languages, and LR(0) parsing algorithm. Understand LR(0) items, NFA transitions, and LR(1) parsing concepts. Learn about LR(0) vs. LR(1) items, FIRST sets, and constructing NFAs. Dive into intricate details of LR(k) grammars.
Fall 2009 The Chinese University of Hong Kong CSC 3130: Automata theory and formal languages LR(k) grammars Andrej Bogdanov http://www.cse.cuhk.edu.hk/~andrejb/csc3130
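
LR(0) example from last time
A → aAb | ab
State 1: A → •aAb, A → •ab
State 2: A → a•Ab, A → a•b, A → •aAb, A → •ab
State 3: A → ab•
State 4: A → aA•b
State 5: A → aAb•
Transitions: 1 --a--> 2, 2 --a--> 2, 2 --b--> 3, 2 --A--> 4, 4 --b--> 5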

LR(0) parsing example revisited
A → aAb | ab, input: aabb
Stack      Input   Action
1          aabb    S
1a2        abb     S
1a2a2      bb      S
1a2a2b3    b       R (A → ab)
1a2A4      b       S
1a2A4b5            R (A → aAb)
1A
[Figure: the LR(0) automaton with states 1 to 5, and the parse tree of aabb growing as the focus • moves through the input]
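
The trace on this slide can be replayed with a small table-driven parser. The sketch below is only an illustration: the `GOTO` and `REDUCE` tables are typed in by hand from the five-state automaton for A → aAb | ab, and the function name `lr0_parse` and the "accept" label on the last row are assumptions, not part of the slides.

```python
# LR(0) DFA for A -> aAb | ab, with the state numbers used on the slide.
GOTO = {(1, "a"): 2, (2, "a"): 2, (2, "b"): 3, (2, "A"): 4, (4, "b"): 5}
# Reduce states: state -> (left-hand side, length of the right-hand side).
REDUCE = {3: ("A", 2), 5: ("A", 3)}


def lr0_parse(text):
    """Return the trace as (stack, remaining input, action) triples.

    Raises ValueError when the input is not generated by the grammar.
    """
    stack = [1]  # alternates state, symbol, state, ...
    pos = 0
    trace = []
    while True:
        state = stack[-1]
        shown = "".join(str(entry) for entry in stack)
        if state in REDUCE:
            lhs, length = REDUCE[state]
            trace.append((shown, text[pos:], "R"))
            del stack[-2 * length:]  # pop one (symbol, state) pair per symbol
            if stack == [1] and pos == len(text):
                trace.append(("1" + lhs, "", "accept"))
                return trace
            target = GOTO.get((stack[-1], lhs))
            if target is None:
                raise ValueError(f"cannot continue after reducing to {lhs}")
            stack += [lhs, target]
        else:
            symbol = text[pos] if pos < len(text) else None
            if (state, symbol) not in GOTO:
                raise ValueError(f"unexpected input at position {pos}")
            trace.append((shown, text[pos:], "S"))
            stack += [symbol, GOTO[(state, symbol)]]
            pos += 1


for row in lr0_parse("aabb"):
    print(row)
```

The stack column printed for aabb runs through 1, 1a2, 1a2a2, 1a2a2b3, 1a2A4, 1a2A4b5 and 1A, the same sequence as in the slide.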

Meaning of LR(0) items
[Figure: parse tree rooted at A; the focus • sits between α and Xβ, and Xβ is the undiscovered part]
eNFA transitions from A → α•Xβ:
• to X → •γ: shift focus to subtree rooted at X (if X is nonterminal)
• to A → αX•β: move past subtree rooted at X

Outline of LR(0) parsing algorithm
• Algorithm can perform two actions:
  • no complete item is valid: shift (S)
  • there is one valid item, and it is complete: reduce (R)
• What if:
  • some valid items complete, some not: S/R conflict
  • more than one valid complete item: R/R conflict
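
The four cases above amount to a small decision procedure on the set of valid items. A minimal sketch, assuming items are stored as `(lhs, rhs, dot)` triples (a representation chosen for this example, not taken from the slides); the function name `lr0_decision` is likewise hypothetical.

```python
def lr0_decision(items):
    """Classify a set of valid LR(0) items.

    Each item is (lhs, rhs, dot): rhs is a string of grammar symbols and
    dot is the position of the focus, so dot == len(rhs) means "complete".
    Returns ["shift"], ["reduce"], or the list of conflicts that occur.
    """
    items = list(items)
    if not items:
        raise ValueError("no valid items")
    complete = [it for it in items if it[2] == len(it[1])]
    incomplete = [it for it in items if it[2] < len(it[1])]
    conflicts = []
    if complete and incomplete:
        conflicts.append("S/R conflict")
    if len(complete) > 1:
        conflicts.append("R/R conflict")
    if conflicts:
        return conflicts
    return ["reduce"] if complete else ["shift"]


# Items A -> a•Ab, A -> a•b, A -> •aAb, A -> •ab: nothing is complete.
print(lr0_decision([("A", "aAb", 1), ("A", "ab", 1), ("A", "aAb", 0), ("A", "ab", 0)]))
# The single complete item A -> ab•.
print(lr0_decision([("A", "ab", 2)]))
# Illustrative conflicting set: A -> a• together with A -> a•A.
print(lr0_decision([("A", "a", 1), ("A", "aA", 1)]))
```

Returning a list lets one item set report both kinds of conflict at once.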

Definition of LR(0) grammar
• A grammar is LR(0) if S/R, R/R conflicts never occur
• LR means parsing happens left to right and produces a rightmost derivation (in reverse)
• LR(0) grammars are unambiguous and have a fast parsing algorithm
• Unfortunately, they are not "expressive" enough to describe programming languages

Hierarchy of context-free grammars
Nested classes, from the outermost to the innermost:
• context-free grammars: parse using CYK algorithm (slow)
• LR(∞) grammars
• … java perl python …
• LR(1) grammars
• LR(0) grammars: parse using LR(0) algorithm

A grammar that is not LR(0)
S → A (1) | Bc (2)
A → aA (3) | a (4)
B → a (5) | ab (6)
input: a
possibilities: shift (3), reduce (4), reduce (5), shift (6)
valid LR(0) items: A → a•A, A → a•, B → a•, B → a•b, A → •aA, A → •a
S/R, R/R conflicts!
[Figure: the candidate parse trees, one for each possibility]

Lookahead
S → A (1) | Bc (2)
A → aA (3) | a (4)
B → a (5) | ab (6)
Peek inside the rest of the input!
• input: a
  valid LR(0) items: A → a•A, A → a•, B → a•, B → a•b, A → •aA, A → •a
• input: a a
  valid LR(0) items: A → a•A, A → a•, B → a•, B → a•b, A → •aA, A → •a
  action: shift
  [Figure: the parse tree must look like this: S over A over a A …]
• input: a a a
  valid LR(0) items: A → a•A, A → a•, A → •aA, A → •a
  action: shift
  [Figure: the parse tree must look like this: S over a chain of three A's, the last one still open]
• input: a a a
  valid LR(0) items: A → a•A, A → a•, A → •aA, A → •a
  action: reduce
  [Figure: the parse tree must look like this: S over a chain of three A's, one above each a]

LR(0) items vs. LR(1) items
A → aAb | ab
LR(0) item: A → a•Ab
LR(1) item: [A → a•Ab, b]
[Figure: the parse tree of aabb drawn twice, labelled LR(0) and LR(1), with the focus • after the first a]

LR(1) items
• LR(1) items are of the form [A → α•β, x] or [A → α•β, ε] to represent this state in the parsing
[Figure: subtree rooted at A over α • β, followed by x in the first case and by nothing in the second]

Outline of LR(1) parsing algorithm
• Step 1: Build NFA that describes valid item updates
• Step 2: Convert NFA to DFA
  • As in LR(0), DFA will have shift and reduce states
• Step 3: Run DFA on input, using stack to remember sequence of states
  • Use lookahead to eliminate wrong reduce items

Recall eNFA transitions for LR(0)
• States of eNFA will be items (plus a start state q0)
• For every item S → •α we have a transition q0 --ε--> S → •α
• For every item A → α•Xβ we have a transition A → α•Xβ --X--> A → αX•β
• For every item A → α•Cβ and production C → δ we have a transition A → α•Cβ --ε--> C → •δ
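
These transition rules can be turned into code directly. The sketch below, for the grammar A → aAb | ab, follows the ε-moves (`epsilon_closure`) and the symbol moves (`step`) to obtain the item sets reachable from the start; the function names and the `(lhs, rhs, dot)` item encoding are assumptions made for this example.

```python
GRAMMAR = {"A": ["aAb", "ab"]}  # A -> aAb | ab
START = "A"


def epsilon_closure(items):
    """Follow the ε-moves: from A -> α•Cβ go to C -> •δ for each production of C."""
    result = set(items)
    todo = list(items)
    while todo:
        lhs, rhs, dot = todo.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for delta in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], delta, 0)
                if item not in result:
                    result.add(item)
                    todo.append(item)
    return frozenset(result)


def step(items, symbol):
    """Follow the moves labelled by symbol (A -> α•Xβ to A -> αX•β), then the ε-moves."""
    moved = {(l, r, d + 1) for (l, r, d) in items if d < len(r) and r[d] == symbol}
    return epsilon_closure(moved)


def show(item):
    lhs, rhs, dot = item
    return f"{lhs} -> {rhs[:dot]}•{rhs[dot:]}"


# The ε-moves out of q0 lead to every item S -> •α of the start symbol.
start_state = epsilon_closure({(START, rhs, 0) for rhs in GRAMMAR[START]})
after_a = step(start_state, "a")
print(sorted(show(item) for item in after_a))
```

Reading a from the start yields the four items A → a•Ab, A → a•b, A → •aAb, A → •ab, and reading a again leaves that set unchanged.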

eNFA transitions for LR(1)
• For every item [S → •α, ε] we have a transition q0 --ε--> [S → •α, ε]
• For every item [A → α•Xβ, x] we have a transition [A → α•Xβ, x] --X--> [A → αX•β, x]
• For every item [A → α•Cβ, x] and production C → δ, for every y in FIRST(βx), we have a transition [A → α•Cβ, x] --ε--> [C → •δ, y]
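
As a rough illustration of these rules, the sketch below computes the transitions that leave a single LR(1) item of the grammar S → A | Bc, A → aA | a, B → a | ab. It makes two assumptions that are not in the slides: the lookahead ε is encoded as the empty string `END`, and the grammar has no ε-productions, so FIRST(βx) is decided by the first symbol of β (or is {x} when β is empty). The function names are hypothetical.

```python
GRAMMAR = {"S": ["A", "Bc"], "A": ["aA", "a"], "B": ["a", "ab"]}
END = ""  # stands for the lookahead written ε on the slides


def first_of(beta, x):
    """FIRST(βx), assuming the grammar has no ε-productions."""
    if not beta:
        return {x}
    found, todo, seen = set(), [beta[0]], set()
    while todo:
        sym = todo.pop()
        if sym in seen:
            continue
        seen.add(sym)
        if sym in GRAMMAR:
            todo.extend(rhs[0] for rhs in GRAMMAR[sym])
        else:
            found.add(sym)
    return found


def epsilon_moves(item):
    """Targets [C -> •δ, y] of the ε-transitions leaving [A -> α•Cβ, x]."""
    lhs, rhs, dot, x = item
    if dot == len(rhs) or rhs[dot] not in GRAMMAR:
        return set()
    c, beta = rhs[dot], rhs[dot + 1:]
    return {(c, delta, 0, y) for delta in GRAMMAR[c] for y in first_of(beta, x)}


def symbol_move(item):
    """The transition [A -> α•Xβ, x] --X--> [A -> αX•β, x], or None if complete."""
    lhs, rhs, dot, x = item
    if dot == len(rhs):
        return None
    return rhs[dot], (lhs, rhs, dot + 1, x)


print(epsilon_moves(("S", "Bc", 0, END)))  # the B items, with lookahead c
print(epsilon_moves(("S", "A", 0, END)))   # the A items, with lookahead ε
```

For [S → •Bc, ε] the lookahead of the new items is c because FIRST(cε) = {c}; for [S → •A, ε] nothing follows A, so the lookahead ε is carried over.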

FIRST sets
• FIRST(α) is the set of terminals that occur on the left in some derivation starting from α
• Example
S → A (1) | cB (2)
A → aA (3) | a (4)
B → a (5) | ab (6)
FIRST(a) = {a}    FIRST(A) = {a}    FIRST(S) = {a, c}
FIRST(bAc) = {b}    FIRST(BA) = {a}    FIRST(ε) = ∅
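
The example values can be checked mechanically. A minimal sketch, assuming (as holds for this grammar) that there are no ε-productions, so only the first symbol of the string matters; the function name `first` and the string encoding of the grammar are choices made for this example.

```python
GRAMMAR = {"S": ["A", "cB"], "A": ["aA", "a"], "B": ["a", "ab"]}


def first(string):
    """Terminals that can start a string derived from `string`.

    Assumes no ε-productions. The empty string gives the empty set,
    following the convention FIRST(ε) = ∅ used on the slide.
    """
    if not string:
        return set()
    found, todo, seen = set(), [string[0]], set()
    while todo:
        sym = todo.pop()
        if sym in seen:
            continue
        seen.add(sym)
        if sym in GRAMMAR:
            # A nonterminal starts with whatever its right-hand sides start with.
            todo.extend(rhs[0] for rhs in GRAMMAR[sym])
        else:
            found.add(sym)
    return found


for s in ["a", "A", "S", "bAc", "BA", ""]:
    print(repr(s), first(s))
```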
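
Explaining the transitions
[A → α•Xβ, x] --X--> [A → αX•β, x]
[A → α•Cβ, x] --ε--> [C → •δ, y], y ∈ FIRST(βx)
[Figure: parse trees showing the focus • moving past X, and the focus descending into the subtree C → δ, which is followed by y]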

Example: Constructing the NFA
S → A (1) | Bc (2)
A → aA (3) | a (4)
B → a (5) | ab (6)
q0 --ε--> [S → •A, ε]
q0 --ε--> [S → •Bc, ε]
[S → •A, ε] --A--> [S → A•, ε]
[S → •A, ε] --ε--> [A → •aA, ε]
[S → •A, ε] --ε--> [A → •a, ε]
[A → •aA, ε] --a--> [A → a•A, ε]
[A → a•A, ε] --A--> [A → aA•, ε]
[A → a•A, ε] --ε--> [A → •aA, ε]
[A → a•A, ε] --ε--> [A → •a, ε]
[A → •a, ε] --a--> [A → a•, ε]
[S → •Bc, ε] --B--> [S → B•c, ε]
[S → B•c, ε] --c--> [S → Bc•, ε]
[S → •Bc, ε] --ε--> [B → •a, c]
[S → •Bc, ε] --ε--> [B → •ab, c]
[B → •a, c] --a--> [B → a•, c]
[B → •ab, c] --a--> [B → a•b, c]
[B → a•b, c] --b--> [B → ab•, c]
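
Example: Running the NFA
S → A (1) | Bc (2)
A → aA (3) | a (4)
B → a (5) | ab (6)
Look ahead!
• stack: (empty), input: abc, action: S
  valid items: [S → •A, ε] [S → •Bc, ε] [A → •aA, ε] [A → •a, ε] [B → •a, c] [B → •ab, c]
• stack: a, input: bc, action: S
  valid items: [A → a•A, ε] [A → a•, ε] [B → a•, c] [B → a•b, c] [A → •aA, ε] [A → •a, ε]
• stack: ab, input: c, action: R
  valid items: [B → ab•, c]
• stack: B, input: c, action: S
  valid items: [S → B•c, ε]
• stack: Bc, input: (empty), action: R
  valid items: [S → Bc•, ε]
• stack: S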

Convert NFA to DFA
• Each DFA state is a subset of LR(1) items, e.g.
  [A → a•A, ε] [A → a•, ε] [B → a•, c] [B → a•b, c] [A → •aA, ε] [A → •a, ε]
• States can contain S/R, R/R conflicts
• But when the grammar is LR(1), the lookahead resolves such conflicts
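
How the lookahead settles a conflicting state can be sketched as a lookup on the next input symbol. The encoding below is an assumption made for this example: items are `(lhs, rhs, dot, x)` tuples, the lookahead ε is the empty string, and `lr1_action` is a hypothetical helper name.

```python
END = ""  # stands for the lookahead written ε on the slides


def lr1_action(items, lookahead):
    """Pick the action for one DFA state given the next input symbol.

    A complete item [A -> α•, x] allows a reduce only when x equals the
    lookahead; an item [A -> α•aβ, x] allows a shift only when a equals it.
    """
    reduces = [(l, r) for (l, r, d, x) in items if d == len(r) and x == lookahead]
    shifts = [(l, r) for (l, r, d, x) in items if d < len(r) and r[d] == lookahead]
    if reduces and shifts:
        return ("S/R conflict", None)
    if len(reduces) > 1:
        return ("R/R conflict", None)
    if reduces:
        return ("reduce", reduces[0])
    if shifts:
        return ("shift", None)
    return ("error", None)


# The example state from the slide.
state = [
    ("A", "aA", 1, END), ("A", "a", 1, END),
    ("B", "a", 1, "c"), ("B", "ab", 1, "c"),
    ("A", "aA", 0, END), ("A", "a", 0, END),
]
for symbol in ["a", "b", "c", END]:
    print(repr(symbol), lr1_action(state, symbol))
```

With one symbol of lookahead, each of the four possible next symbols selects a single action in this state, even though the same items without lookaheads have both S/R and R/R conflicts.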
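
Example: Convert NFA to DFA
S → A (1) | Bc (2)
A → aA (3) | a (4)
B → a (5) | ab (6)
LEGEND: shift variable, shift terminal, reduce
The DFA has 8 states (numbered 1 to 8 in the figure), listed here with their outgoing transitions:
• start state: [S → •A, ε] [S → •Bc, ε] [A → •aA, ε] [A → •a, ε] [B → •a, c] [B → •ab, c]
  on A go to [S → A•, ε]; on B go to [S → B•c, ε]; on a go to the next state in this list
• [A → a•A, ε] [A → a•, ε] [B → a•, c] [B → a•b, c] [A → •aA, ε] [A → •a, ε]
  on A go to [A → aA•, ε]; on b go to [B → ab•, c]; on a go to the next state in this list
• [A → a•A, ε] [A → a•, ε] [A → •aA, ε] [A → •a, ε]
  on A go to [A → aA•, ε]; on a stay in this state
• [S → B•c, ε]
  on c go to [S → Bc•, ε]
• [S → A•, ε], [A → aA•, ε], [B → ab•, c], [S → Bc•, ε]: four separate states with no outgoing transitions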

Example: Reconstruct the parse tree
S → A (1) | Bc (2)
A → aA (3) | a (4)
B → a (5) | ab (6)
stack   input   action
1       abc     S
12      bc      S
128     c       R
16      c       S
167             R
1
[Figure: the part of the DFA used by this run, with states 1, 2, 6, 7, 8, and the parse tree S over B c, B over a b]
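
The whole pipeline for this grammar fits in a short program. The sketch below builds DFA states on demand from LR(1) items and records the productions used at each reduce, which is enough to rebuild the parse tree bottom-up. It rests on assumptions that are not in the slides: the lookahead ε is encoded as the empty string, the grammar has no ε-productions, states are kept as item sets rather than numbers, and a reduce is preferred whenever a complete item matches the lookahead (harmless for this grammar). All function names are hypothetical.

```python
GRAMMAR = {"S": ["A", "Bc"], "A": ["aA", "a"], "B": ["a", "ab"]}
END = ""  # stands for the lookahead written ε on the slides


def first(symbol):
    """Terminals that can start a string derived from symbol (no ε-productions)."""
    found, todo, seen = set(), [symbol], set()
    while todo:
        sym = todo.pop()
        if sym in seen:
            continue
        seen.add(sym)
        if sym in GRAMMAR:
            todo.extend(rhs[0] for rhs in GRAMMAR[sym])
        else:
            found.add(sym)
    return found


def closure(items):
    """Add [C -> •δ, y] for every [A -> α•Cβ, x] and every y in FIRST(βx)."""
    result, todo = set(items), list(items)
    while todo:
        lhs, rhs, dot, x = todo.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            beta = rhs[dot + 1:]
            lookaheads = first(beta[0]) if beta else {x}
            for delta in GRAMMAR[rhs[dot]]:
                for y in lookaheads:
                    item = (rhs[dot], delta, 0, y)
                    if item not in result:
                        result.add(item)
                        todo.append(item)
    return frozenset(result)


def goto(state, symbol):
    """DFA transition: move the focus past symbol, then close."""
    moved = {(l, r, d + 1, x) for (l, r, d, x) in state if d < len(r) and r[d] == symbol}
    return closure(moved)


def parse(text):
    """Return the productions used, in the order the reductions happen."""
    states = [closure({("S", rhs, 0, END) for rhs in GRAMMAR["S"]})]
    pos, used = 0, []
    while True:
        look = text[pos] if pos < len(text) else END
        reduces = [(l, r) for (l, r, d, x) in states[-1] if d == len(r) and x == look]
        if len(reduces) > 1:
            raise ValueError("R/R conflict: one lookahead symbol is not enough")
        if reduces:
            lhs, rhs = reduces[0]
            used.append(f"{lhs} -> {rhs}")
            del states[-len(rhs):]  # pop one state per symbol of the right-hand side
            if lhs == "S" and len(states) == 1:
                return used
            states.append(goto(states[-1], lhs))
        else:
            target = goto(states[-1], look) if look != END else frozenset()
            if not target:
                raise ValueError(f"unexpected input at position {pos}")
            states.append(target)
            pos += 1


print(parse("abc"))
print(parse("aaa"))
```

For abc the reductions are B → ab followed by S → Bc, matching the two R steps in the trace; reading the list backwards gives the rightmost derivation S ⇒ Bc ⇒ abc.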

LR(k) grammars
• A context-free grammar is LR(1) if all S/R, R/R conflicts can be resolved with one lookahead symbol
• More generally, a grammar is LR(k) if all conflicts can be resolved with k lookahead symbols
• Items have the form [A → α•β, c1...ck]
• LR(1) grammars describe the syntax of most programming languages