15-453 FORMAL LANGUAGES, AUTOMATA, AND COMPUTABILITY
PUSH-DOWN AUTOMATA AND CONTEXT-FREE GRAMMARS
NONE OF THESE ARE REGULAR
Σ = {0, 1}, L = { 0^n 1^n | n ≥ 0 }
Σ = {a, b, c, …, z}, L = { w | w = w^R } = { palindromes }
madamimadam ∈ L
zeus sees suez ∈ L (ignoring the spaces)
Σ = { (, ) }, L = { balanced strings of parens }
(), ()(), (()()) are in L; (, ()), ())(() are not in L
PUSHDOWN AUTOMATA (PDA)
[Figure: a FINITE STATE CONTROL that reads the INPUT and has access to a STACK (last in, first out)]
An arrow from qi to qj labelled a, b → c (input, pop → push) means:
In state qi, reading symbol a from the input and seeing b on top of the stack, replace b (pop) by c (push) and go to state qj
If a = ε, make the transition without reading input
If b = ε, make the transition without popping from the stack
If c = ε, make the transition without writing on the stack
Non-deterministic
PDA that recognizes L = { 0^n 1^n | n ≥ 0 }
[Figure: state diagram with transitions labelled input, pop → push: ε,ε → $; 0,ε → 0; 1,0 → ε (twice); ε,$ → ε. Animated runs on the inputs 0011 and 001 show the remaining input and the stack contents ($, then 0s above $) after each step]
Non-deterministic
Definition: A (non-deterministic) PDA is a tuple P = (Q, Σ, Γ, δ, q0, F), where:
Q is a finite set of states
Σ is the input alphabet
Γ is the stack alphabet
δ : Q × Σ_ε × Γ_ε → 2^(Q × Γ_ε) is the transition function
q0 ∈ Q is the start state
F ⊆ Q is the set of accept states
2^(Q × Γ_ε) is the set of subsets of Q × Γ_ε, where Γ_ε = Γ ∪ {ε}
Let w ∈ Σ* and suppose w can be written as w_1 ... w_n where each w_i ∈ Σ_ε (recall Σ_ε = Σ ∪ {ε})
Then P accepts w if there are r_0, r_1, ..., r_n ∈ Q and s_0, s_1, ..., s_n ∈ Γ* (sequence of stacks) such that
1. r_0 = q0 and s_0 = ε (P starts in q0 with empty stack)
2. For i = 0, ..., n-1: (r_{i+1}, b) ∈ δ(r_i, w_{i+1}, a), where s_i = at and s_{i+1} = bt for some a, b ∈ Γ_ε and t ∈ Γ* (P moves correctly according to state, stack and symbol read)
3. r_n ∈ F (P is in an accept state at the end of its input)
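[Figure: state diagram with q0 → q1 on ε,ε → $; a loop on q1 with 0,ε → 0; q1 → q2 on 1,0 → ε; a loop on q2 with 1,0 → ε; q2 → q3 on ε,$ → ε]
Q = {q0, q1, q2, q3}
Σ = {0,1}
Γ = {$,0,1}
δ : Q × Σ_ε × Γ_ε → 2^(Q × Γ_ε)
δ(q1,1,0) = { (q2,ε) }
δ(q2,1,1) = ∅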
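The acceptance condition can be tried out on this machine with a small simulator. The following is a minimal Python sketch, not part of the slides: it searches breadth-first over configurations (state, input position, stack). The set of accept states {q0, q3} is an assumption, since the text does not list F, and the names `DELTA`, `accepts` and `max_configs` are illustrative.

```python
from collections import deque

EPS = ""  # the empty string stands for ε

# Transition function δ of the PDA for { 0^n 1^n | n >= 0 }:
# (state, input symbol or ε, popped symbol or ε) -> set of (next state, pushed symbol or ε)
DELTA = {
    ("q0", EPS, EPS): {("q1", "$")},
    ("q1", "0", EPS): {("q1", "0")},
    ("q1", "1", "0"): {("q2", EPS)},
    ("q2", "1", "0"): {("q2", EPS)},
    ("q2", EPS, "$"): {("q3", EPS)},
}
START = "q0"
ACCEPT = {"q0", "q3"}  # assumption: the accept states are not listed in the text


def accepts(w, delta=DELTA, start=START, accept=ACCEPT, max_configs=100_000):
    """Breadth-first search over configurations (state, input position, stack)."""
    first = (start, 0, ())          # the stack is a tuple, top of stack at index 0
    seen = {first}
    queue = deque([first])
    while queue and len(seen) <= max_configs:   # give up if the search grows too large
        state, pos, stack = queue.popleft()
        if pos == len(w) and state in accept:
            return True
        for (src, a, b), targets in delta.items():
            if src != state:
                continue
            if a != EPS and (pos >= len(w) or w[pos] != a):
                continue            # input symbol does not match
            if b != EPS and (not stack or stack[0] != b):
                continue            # top of stack does not match
            rest = stack[1:] if b != EPS else stack
            new_pos = pos + 1 if a != EPS else pos
            for nxt, c in targets:
                new_stack = ((c,) + rest) if c != EPS else rest
                config = (nxt, new_pos, new_stack)
                if config not in seen:
                    seen.add(config)
                    queue.append(config)
    return False


print(accepts("0011"), accepts("001"))
```

Because the machine is non-deterministic, the search keeps every reachable configuration instead of following a single run; the bound `max_configs` is only a safeguard for machines that can push forever on ε-moves.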
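EVEN-LENGTH PALINDROMES
Σ = {a, b, c, …, z}
[Figure: state diagram with q0 → q1 on ε,ε → $; a loop on q1 with σ,ε → σ; q1 → q2 on ε,ε → ε; a loop on q2 with σ,σ → ε; q2 → q3 on ε,$ → ε, where σ ranges over Σ]
Note: Also accepts the empty string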
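Build a PDA to recognize L = { a^i b^j c^k | i, j, k ≥ 0 and (i = j or i = k) }
[Figure: state diagram on states q0 to q6. From the start state, ε,ε → $ pushes the bottom marker, and a loop a,ε → a pushes one a for every a read. The machine then branches non-deterministically on ε,ε → ε:]
choose i=j: loop b,a → ε, then ε,$ → ε, then loop c,ε → ε
choose i=k: loop b,ε → ε, then ε,ε → ε, then loop c,a → ε, then ε,$ → ε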
CONTEXT-FREE GRAMMARS “Colorless green ideas sleep furiously.”
CONTEXT-FREE GRAMMARS
production rules:
A → 0A1
A → B
B → #
start variable: A
variables: A, B
terminals: 0, 1, #
Derivation: A ⇒ 0A1 ⇒ 00A11 ⇒ 00B11 ⇒ 00#11 (⇒ is read "yields")
A ⇒* 00#11 (⇒* is read "derives")
We say: 00#11 is generated by the grammar
Non-deterministic
CONTEXT-FREE GRAMMARS
The rules
A → 0A1
A → B
B → #
can be written in shorthand as
A → 0A1 | B
B → #
CONTEXT-FREE GRAMMARS
A context-free grammar (CFG) is a tuple G = (V, Σ, R, S), where:
V is a finite set of variables
Σ is a finite set of terminals (disjoint from V)
R is a set of production rules of the form A → W, where A ∈ V and W ∈ (V ∪ Σ)*
S ∈ V is the start variable
L(G) = { w ∈ Σ* | S ⇒* w } (strings generated by G)
CONTEXT-FREE LANGUAGES
G = ({S}, {0,1}, R, S)
R = { S → 0S1, S → ε }
L(G) = { 0^n 1^n | n ≥ 0 } (strings generated by G)
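To see L(G) emerge from the rules, one can enumerate derivations mechanically. The following is a rough Python sketch, not from the slides: it always expands the leftmost variable and collects every terminal string up to a length bound. The function name `generate` and the dictionary encoding of R are assumptions made for illustration. The search is only guaranteed to stop for grammars whose sentential forms cannot grow without adding terminals; a rule such as A → AA would need an extra bound.

```python
from collections import deque


def generate(rules, start, max_len):
    """Return all terminal strings of length <= max_len derivable from start.

    rules maps each variable to a list of right-hand sides (tuples of symbols);
    any symbol that is not a key of rules is treated as a terminal.
    """
    results = set()
    seen = {(start,)}
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # position of the leftmost variable, if any
        idx = next((i for i, s in enumerate(form) if s in rules), None)
        if idx is None:
            results.add("".join(form))      # no variables left: a generated string
            continue
        for rhs in rules[form[idx]]:
            new = form[:idx] + tuple(rhs) + form[idx + 1:]
            terminals = sum(1 for s in new if s not in rules)
            # prune sentential forms that already contain too many terminals
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(results, key=lambda s: (len(s), s))


G = {"S": [("0", "S", "1"), ()]}            # S -> 0S1 | ε
print(generate(G, "S", 6))                  # ['', '01', '0011', '000111']
```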
WRITE A CFG FOR EVEN-LENGTH PALINDROMES
S → σSσ for all σ ∈ Σ
S → ε
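The two rules translate directly into a recursive membership check. This is a small Python sketch, not from the slides; the function name `derivable` is made up for illustration.

```python
def derivable(w):
    """True iff w can be derived from S using S -> σSσ | ε."""
    if w == "":
        return True                      # rule S -> ε
    if len(w) >= 2 and w[0] == w[-1]:
        return derivable(w[1:-1])        # rule S -> σSσ with σ = w[0]
    return False


print(derivable("abba"), derivable("aba"))
```

An odd-length palindrome such as aba fails because the recursion ends on a single symbol, which neither rule can produce.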
WRITE A CFG FOR THE EMPTY SET
G = ({S}, Σ, ∅, S)
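PARSE TREES
[Figure: parse tree for 00#11 with root A; each of the two upper A nodes has children 0, A, 1, the innermost A has the single child B, and B has the single child #]
A ⇒ 0A1 ⇒ 00A11 ⇒ 00B11 ⇒ 00#11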
<EXPR> → <EXPR> + <EXPR>
<EXPR> → <EXPR> x <EXPR>
<EXPR> → ( <EXPR> )
<EXPR> → a
Build a parse tree for a + a x a
[Figure: two different parse trees for a + a x a, one grouping the string as (a + a) x a and the other as a + (a x a)]
Definition: a string is derived ambiguously in a context-free grammar if it has more than one parse tree
Definition: a grammar is ambiguous if it generates some string ambiguously
Can you give an unambiguous grammar with standard arithmetic precedence?
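Ambiguity can be checked on small inputs by counting parse trees. The Python sketch below is not from the slides: it counts the parse trees of a + a x a under the <EXPR> grammar (with <EXPR> abbreviated to E) and under one standard layered grammar with variables E, T, F, which is offered here as a possible answer to the question and is an assumption of this example. The counter assumes a grammar with no ε-rules and no cycles of unit rules.

```python
from functools import lru_cache


def count_parse_trees(rules, start, s):
    """Count parse trees of s from start; assumes no ε-rules and no unit-rule cycles."""

    @lru_cache(maxsize=None)
    def trees(sym, i, j):
        # number of parse trees with root sym that yield the non-empty piece s[i:j]
        if sym not in rules:                        # terminal symbol
            return 1 if j - i == 1 and s[i] == sym else 0
        return sum(split(rhs, i, j) for rhs in rules[sym])

    @lru_cache(maxsize=None)
    def split(rhs, i, j):
        # number of ways the symbols of rhs yield s[i:j], each symbol a non-empty piece
        if len(rhs) == 1:
            return trees(rhs[0], i, j)
        total = 0
        for k in range(i + 1, j - len(rhs) + 2):    # leave >= 1 character per later symbol
            left = trees(rhs[0], i, k)
            if left:
                total += left * split(rhs[1:], k, j)
        return total

    return trees(start, 0, len(s))


# The grammar from the slide, with E standing for <EXPR>
AMBIGUOUS = {"E": [("E", "+", "E"), ("E", "x", "E"), ("(", "E", ")"), ("a",)]}

# Assumed answer: one layer per precedence level (x binds tighter than +)
LAYERED = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "x", "F"), ("F",)],
    "F": [("(", "E", ")"), ("a",)],
}

print(count_parse_trees(AMBIGUOUS, "E", "a+axa"))   # 2
print(count_parse_trees(LAYERED, "E", "a+axa"))     # 1
```

In the layered grammar a sum can only appear inside a product when it is parenthesised, which is what forces the single tree a + (a x a).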
NOT REGULAR
Σ = {0, 1}, L = { 0^n 1^n | n ≥ 0 }
But L is CONTEXT FREE
A → 0A1
A → ε
WHAT ABOUT?
Σ = {0, 1}, L1 = { 0^n 1^n 0^m | m, n ≥ 0 }
Σ = {0, 1}, L2 = { 0^n 1^m 0^n | m, n ≥ 0 }
Σ = {0, 1}, L3 = { 0^m 1^n 0^n | m = n ≥ 0 }
THE PUMPING LEMMA FOR CFGs
Let L be a context-free language
Then there is a P such that if w ∈ L and |w| ≥ P then we can write w = uvxyz, where:
1. |vy| > 0
2. |vxy| ≤ P
3. For every i ≥ 0, u v^i x y^i z ∈ L
WHAT ABOUT?
Σ = {0, 1}, L3 = { 0^m 1^n 0^n | m = n ≥ 0 }
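For L3 the usual choice is w = 0^P 1^P 0^P. The Python sketch below is not from the slides: it brute-forces every split w = uvxyz allowed by conditions 1 and 2 and checks condition 3 for a few values of i. The value p = 4 is only an illustrative stand-in for P (the same argument works for every P), and the function names are made up for this example.

```python
def in_L3(s):
    """Membership in L3 = { 0^n 1^n 0^n | n >= 0 }."""
    n, r = divmod(len(s), 3)
    return r == 0 and s == "0" * n + "1" * n + "0" * n


def can_be_pumped(w, p, member, max_i=3):
    """Search for a split w = u v x y z with |vy| > 0 and |vxy| <= p that survives pumping.

    Only exponents i = 0..max_i are tried, so True means "no violation found for small i",
    while False is conclusive: every allowed split fails for some i.
    """
    n = len(w)
    for a in range(n + 1):                       # u = w[:a]
        for d in range(a, min(a + p, n) + 1):    # vxy = w[a:d], so |vxy| <= p
            for b in range(a, d + 1):            # v = w[a:b]
                for c in range(b, d + 1):        # x = w[b:c], y = w[c:d]
                    u, v, x, y, z = w[:a], w[a:b], w[b:c], w[c:d], w[d:]
                    if not (v or y):
                        continue                 # condition 1: |vy| > 0
                    if all(member(u + v * i + x + y * i + z) for i in range(max_i + 1)):
                        return True
    return False


p = 4                                            # illustrative value of P
w = "0" * p + "1" * p + "0" * p
print(can_be_pumped(w, p, in_L3))                # False
```

The result is False because vxy is too short to touch both blocks of 0s, so pumping always changes at most two of the three block lengths.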
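Idea of Proof: If w is long enough, then any parse tree for w must have a path that contains a variable more than once
[Figure: parse tree T for w = uvxyz with a variable R repeated along one path; the upper R generates vxy and the lower R generates x. Copying the upper R subtree into the place of the lower one yields uvvxyyz, and copying the lower one into the place of the upper one yields uxz]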
Formal Proof:
Let b be the maximum number of symbols on the right-hand side of any rule (assume b ≥ 2)
If the height of a parse tree is h, the length of the string generated by that tree is at most b^h
Let |V| be the number of variables in G
Define P = b^(|V|+1)
Let w be a string of length at least P
Let T be a parse tree for w with a minimum number of nodes
T must have height at least |V|+1, since a tree of height at most |V| generates a string of length at most b^|V| < P
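[Figure: parse tree T for w = uvxyz with the variable R repeated on its longest path]
The longest path in T must have ≥ |V|+1 variables
Select R to be a variable that repeats among the lowest |V|+1 variables (in the path)
1. |vy| > 0, since otherwise replacing the upper R subtree by the lower one would give a smaller parse tree for w
2. |vxy| ≤ P, since the subtree at the upper R has height at most |V|+1 and so generates at most b^(|V|+1) = P symbols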
A language L is generated by a CFG ⇔ L is recognized by a PDA
Suppose L is generated by a CFG G = (V, Σ, R, S)
Construct P = (Q, Σ, Γ, δ, q, F) that recognizes L
[Figure: PDA built from G, with the following transitions]
ε,ε → S$ (push $, then S)
For each rule "A → w" ∈ R: ε,A → w^R (the symbols of w are pushed in reverse order, so that the first symbol of w ends up on top)
For each terminal a ∈ Σ: a,a → ε
ε,$ → ε
S → aTb
T → Ta | ε
S ⇒* ab (derives)
[Figure: PDA for this grammar, with the following transitions]
Start: ε,ε → $ then ε,ε → S
Rule S → aTb: ε,S → b then ε,ε → T then ε,ε → a
Rule T → Ta: ε,T → a then ε,ε → T
Rule T → ε: ε,T → ε
Terminals: a,a → ε and b,b → ε
End: ε,$ → ε
Suppose L is generated by a CFG G = (V, Σ, R, S)
Describe P = (Q, Σ, Γ, δ, q, F) that recognizes L:
(1) Push $ and then S on the stack
(2) Repeat the following steps forever:
(a) Pop the stack, call the result X
(b) If X is a variable A, guess a rule that matches A and push its right-hand side onto the stack (leftmost symbol on top)
(c) If X is a terminal, read the next symbol from the input and compare it to the terminal. If they're different, reject
(d) If X is $: accept iff there is no more input
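Steps (1) and (2) can be simulated directly. The Python sketch below is not from the slides: it replaces the non-deterministic "guess" in step (b) by a breadth-first search over all choices. The pruning rule (drop a branch once the stack holds more terminals than there is input left) and the bound `max_configs` are additions of this sketch so that left-recursive rules such as T → Ta do not loop forever; the symbol $ is assumed not to occur in the grammar.

```python
from collections import deque


def pda_from_cfg_accepts(rules, start, w, max_configs=100_000):
    """Simulate the stack procedure; rules maps variables to lists of right-hand sides."""
    first = (0, (start, "$"))              # (input position, stack); top of stack at index 0
    seen = {first}
    queue = deque([first])
    while queue and len(seen) <= max_configs:
        pos, stack = queue.popleft()
        top, rest = stack[0], stack[1:]    # (a) pop the stack
        if top == "$":                     # (d) accept iff no more input
            if pos == len(w):
                return True
            continue
        if top in rules:                   # (b) variable: try every matching rule
            successors = [(pos, tuple(rhs) + rest) for rhs in rules[top]]
        elif pos < len(w) and w[pos] == top:   # (c) terminal: must match the input
            successors = [(pos + 1, rest)]
        else:
            successors = []                # mismatch: this branch rejects
        for nxt_pos, nxt_stack in successors:
            terminals = sum(1 for s in nxt_stack if s not in rules and s != "$")
            # prune: the stack already demands more input than is left
            if terminals <= len(w) - nxt_pos and (nxt_pos, nxt_stack) not in seen:
                seen.add((nxt_pos, nxt_stack))
                queue.append((nxt_pos, nxt_stack))
    return False


G = {"S": [("a", "T", "b")], "T": [("T", "a"), ()]}   # S -> aTb, T -> Ta | ε
print(pda_from_cfg_accepts(G, "S", "ab"), pda_from_cfg_accepts(G, "S", "abb"))
```

Placing `tuple(rhs)` in front of the rest of the stack has the same effect as pushing the right-hand side symbol by symbol in reverse order.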
A language L is generated by a CFG ⇔ L is recognized by a PDA
Given PDA P = (Q, Σ, Γ, δ, q, F)
Construct a CFG G = (V, Σ, R, S) such that L(G) = L(P)
First, simplify P to have the following form:
(1) It has a unique accept state, qacc
(2) It empties the stack before accepting
(3) Each transition either pushes a symbol or pops a symbol, but not both at the same time
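SIMPLIFY
[Figure: the PDA for { 0^n 1^n | n ≥ 0 } before simplification, on states q0, q1, q2, q3 with transitions ε,ε → $; 0,ε → 0; 1,0 → ε (twice); ε,$ → ε]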
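SIMPLIFY
[Figure, step 1: a new start state Q pushes a marker E (ε,ε → E) before entering q0. The states q0 and q3 get ε,ε → ε moves to a new state q4, which empties the stack with a loop ε,σ → ε and then pops E (ε,E → ε) to reach the unique accept state q5]
[Figure, step 2: each ε,ε → ε move is split into a push and a pop of a dummy symbol D (ε,ε → D, then ε,D → ε) through the new states q'0 and q'3, so that every transition either pushes or pops]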
Idea For Our Grammar G:
For every pair of states p and q in PDA P, G will have a variable A_pq whose production rules will generate all strings x that can take P from p with an empty stack to q with an empty stack
V = { A_pq | p, q ∈ Q }
S = A_{q0 qacc}
[Figure: the simplified PDA, with states Q, q0, q'0, q1, q2, q3, q'3, q4, q5]
WANT: A_pq generates all strings that take p with an empty stack to q with an empty stack
A_{q0 q1} generates?
A_{q1 q2} generates? { 0^n 1^n | n > 0 }
A_{q1 q3} generates?
A_{Q q5} generates?
WANT: A_pq generates all strings that take p with an empty stack to q with an empty stack
Let x be such a string
• P's first move on x must be a push
• P's last move on x must be a pop
Two possibilities:
1. The symbol popped at the end is exactly the one pushed at the beginning
2. The symbol popped at the end is not the one pushed at the beginning
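1. The symbol t popped at the end is exactly the one pushed at the beginning
x = ayb takes p with empty stack to q with empty stack
[Figure: stack height plotted along the input string x; reading a moves p to r and pushes t, the stack stays at or above that level while y is read from r to s, and reading b moves s to q and pops t]
(r, t) ∈ δ(p, a, ε)
(q, ε) ∈ δ(s, b, t)
A_pq → a A_rs b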
2. The symbol popped at the end is not the one pushed at the beginning
[Figure: stack height plotted along the input string; the stack becomes empty at an intermediate state r between p and q]
A_pq → A_pr A_rq
Formally:
V = { A_pq | p, q ∈ Q }
S = A_{q0 qacc}
For every p, q, r, s ∈ Q, t ∈ Γ and a, b ∈ Σ_ε: if (r, t) ∈ δ(p, a, ε) and (q, ε) ∈ δ(s, b, t), then add the rule A_pq → a A_rs b
For every p, q, r ∈ Q, add the rule A_pq → A_pr A_rq
For every p ∈ Q, add the rule A_pp → ε
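The three rule schemes can be generated mechanically from a simplified PDA. The Python sketch below is not from the slides: the encoding of δ as separate sets of push and pop moves, the variable naming A_p_q, and the example machine (a hypothetical simplified PDA for { 0^n 1^n | n > 0 } with unique accept state q3) are all assumptions made for illustration.

```python
from itertools import product

EPS = ""  # the empty string stands for ε


def pda_to_cfg(states, start, accept, pushes, pops):
    """Build the rules of G from a simplified PDA.

    pushes: set of (p, a, r, t) meaning (r, t) ∈ δ(p, a, ε)   (read a, push t)
    pops:   set of (s, b, t, q) meaning (q, ε) ∈ δ(s, b, t)   (read b, pop t)
    Returns (start variable, set of rules); a rule is (head, body), body a tuple of symbols.
    """
    def A(p, q):
        return f"A_{p}_{q}"

    rules = set()
    # A_pq -> a A_rs b whenever t is pushed going p -> r and the same t is popped going s -> q
    for (p, a, r, t), (s, b, t2, q) in product(pushes, pops):
        if t == t2:
            body = tuple(sym for sym in (a, A(r, s), b) if sym != EPS)
            rules.add((A(p, q), body))
    # A_pq -> A_pr A_rq
    for p, q, r in product(states, repeat=3):
        rules.add((A(p, q), (A(p, r), A(r, q))))
    # A_pp -> ε
    for p in states:
        rules.add((A(p, p), ()))
    return A(start, accept), rules


# Hypothetical simplified PDA for { 0^n 1^n | n > 0 }: every move pushes or pops
states = ["q0", "q1", "q2", "q3"]
pushes = {("q0", EPS, "q1", "$"), ("q1", "0", "q1", "0")}
pops = {("q1", "1", "0", "q2"), ("q2", "1", "0", "q2"), ("q2", EPS, "$", "q3")}
S, R = pda_to_cfg(states, "q0", "q3", pushes, pops)
print(S, len(R))
```

With these rules, A_q0_q3 ⇒ A_q1_q2 ⇒ 0 A_q1_q2 1 ⇒ 0 0 A_q1_q1 1 1 ⇒ 0011, which mirrors the run of the machine on 0011.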