260 likes | 366 Views
CSci 4011. INHERENT LIMITATIONS OF COMPUTER PROGRAMS. string. pop. push. ε , ε → $. 0, ε → 0. 1,0 → ε. ε ,$ → ε. 1,0 → ε. CONTEXT-FREE GRAMMARS. A → 0 A 1. A → B. B → #. A. 0A1. 00A11. 00B11. 00#11. uVw yields uvw if (V → v) 2 R.
E N D
CSci 4011 INHERENT LIMITATIONS OF COMPUTER PROGRAMS
string pop push ε,ε → $ 0,ε→ 0 1,0→ε ε,$ → ε 1,0→ε
CONTEXT-FREE GRAMMARS A → 0A1 A → B B → # A 0A1 00A11 00B11 00#11 uVw yields uvw if (V → v) 2 R. A derives 00#11 in 4 steps.
PARSE TREES A A A B 0 0 # 1 1 A 0A1 00A11 00B11 00#11
Definition. T is a parse tree for the derivation S ⇒ w under grammar G = (V,Σ,R,S) if 1. The root of T has label S 2. The leaves of T have labels wi∈ Σ 3. The non-leaf nodes have labels v ∈ V 4. For each node with label v ∈ V and children with labels ri∈ (V∪Σ), (v→r) ∈ R.
<EXPR> <EXPR> <EXPR> <EXPR> <EXPR> <EXPR> <EXPR> <EXPR> <EXPR> <EXPR> a + a x a a + a x a <EXPR> → <EXPR> + <EXPR> <EXPR> → <EXPR> x <EXPR> <EXPR> → ( <EXPR> ) <EXPR> → a Build a parse tree for a + a x a
Definition: a string is derived ambiguously in a context-free grammar if it has two or more different parse trees Definition: a grammar is ambiguous if it generates some string ambiguously
〈stmt〉 →〈assign〉 | 〈if-then〉 | 〈if-then-else〉 〈if-then〉 → if 〈condition〉 then 〈stmt〉 〈if-then-else〉 → if〈condition〉then〈stmt〉else 〈stmt〉 〈assign〉 → a = 1| a = 2 〈condition〉 → a==1| a==2 if a==1 then if a==2 then a = 1 else a = 2
R → AT | AB | ε S → aSb T → SB S → ε S → AT | AB A → a B → b THE CHOMSKY NORMAL FORM A context-free grammar is in Chomsky normal form if every rule is of the form: A → BC B and C are not start variable A → a a is a terminal S → ε S is the start variable
Theorem 2.9: Any context-free language can be generated by a context-free grammar in Chomsky normal form “We can transform any CFG into Chomsky normal form”
S0 → S 2. Remove all A → ε rules (where A is not S0) • Add a new start variable S0 and add the rule S0 → S S → aSb R → aRb For each occurrence of A on right hand side of a rule, add a new rule with the occurrence deleted S → R R → ε R → ab If we have the rule B → A, add B → ε, unless we have previously removed B → ε S → ε S → ab S → aRb S0 → aSb S0→ ε 3. Remove unit rules A → B S0 → ab Whenever B → w appears, add the rule A → w unless this was a unit rule previously removed S0 → aRb
4. Convert all remaining rules into the proper form S → aSb R → aRb S0 → aRb S → aRb R → ab S0 → A1A2 S → A1A2 S → ab A1 → a S → ab S → aRb A2 → RA3 S → A1A3 S0 → aSb A3 → b R → ab S0→ ε S0 → ab R → A1A3 S0 → ab S0 → A1A3 R → aRb S0 → aRb S0 → aSb R → A1A2 S0 → A1A4 S → aSb A4 → SA3 S → A1A4
Convert the following into Chomsky normal form: A → BAB | B | ε B → 00 | ε S SaSb | ε • Add a new start variable S0 and rule S0 → S 2. Remove all A → ε rules 3. Remove unit rules A → B 4. Convert all remaining rules into the proper form
THE CHOMSKY NORMAL FORM A context-free grammar is in Chomsky normal form if every rule is of the form: A → BC B and C are not start variable A → a a is a terminal S → ε S is the start variable Any variable A that is not the start variable can only generate strings of length > 0
Theorem: If G is in CNF, w L(G) and |w| > 0, then any derivation of w in G has length ≤ 2|w| - 1 Proof (by induction on |w|): Base Case: If |w| = 1, then any derivation of w must have length 1 Inductive Step: Assume true for any string of length at most k ≥ 1, and let |w| = k+1 Since |w| > 1, derivation starts with S → AB So w = xy where A x, |x| > 0 and B y, |y| > 0 By the inductive hypothesis, the length of any derivation of w must be at most 1 + (2|x| - 1) + (2|y| - 1) = 2(|x| + |y|) - 1
CONTEXT FREE OR NOT? L₁ = { xy | x,y ∈ {0,1}* and x=y} NOT YES L₂ = {xy | x,y ∈ {0,1}*, |x|=|y| and x ≠ y}
THE CONTEXT-FREE PUMPING LEMMA Let L be a context-free language Then there exists P such that For every w L with |w| ≥ P there exist uvxyz=w, where: 1. |vy| > 0 2. |vxy| ≤ P 3. For every i ≥ 0, uvixyiz L
EXAMPLES there exist uvxyz=w, where: • |vy| > 0 • |vxy| ≤ P • uvixyiz L for any i ≥ 0 Example: L = { w ∈ {0,1}* | w = wR}. w = 0; u,v,x,y,z = ? w = 010; u,v,x,y,z=? Example: L = { w ∈ {a,b}* | #a > #b in w}. w = a; u,v,x,y,z = ? w = aab; u,v,x,y,z=?
T T R R R R R R y v u z u v x y z v x y Idea: If w is long enough, then any parse tree for w must have a path that contains a variable more than once
Formal Proof: Let b be the maximum number of symbols on the right-hand side of a rule If the height of a parse tree is h, the length of the string generated is at most: bh Let |V| be the number of variables in G Define P = b|V|+2 Let w be a string of length at least P Let T be the parse tree for w with the smallest number of nodes. T must have height at least |V|+2
T T R R R R R R y v u z u v x y z v x y The longest path in T must have ≥ |V|+1 variables Select R to be the variable that repeats among the lowest |V|+1 variables 1. |vy| > 0 2. |vxy| ≤ P Let T be the parse tree for w with the smallest number of nodes. T must have height at least |V|+2
NEGATING THE PUMPING LEMMA L is not context-free Let L be a context-free language. Then there exists P such that For every w L with |w| ≥ P For every P, If There is a For every there exist uvxyz=w, where: 1. |vy| > 0 2. |vxy| ≤ P There is an 3. For everyi ≥ 0, uvixyiz L
USING THE PUMPING LEMMA Prove L = {ww | w ∈{0,1}*} is not context-free Assume L is context-free. Then there is a pumping length P. No matter what P is, the string s = 0P1P0P1P has |s| ≥ P and s∈ L. So there should be uvxyz=s with: 1) |vy| > 0, 2) |vxy| ≤ P, 3) ∀i, uvixyiz∈ L. P P P P s = 00…0011…1100…0011…11
USING THE PUMPING LEMMA Prove L = {ww | w ∈{0,1}*} is not context-free Assume L is context-free. Then there is a pumping length P. No matter what P is, the string s = 0P1P0P1P has |s|≥P and s∈ L. vxy cannot be only in the first half, since pumping down would move a 0 to the end of the first half. Then pumping down must remove at least one 1 or one zero, e.g. uxz = 0P1i0j1P where i p or j p. So uxz L, no matter how They are chosen; so L cannot be context-free! vxy cannot be only in the last half, since pumping up would move a 0 to the end of the first half. So there should be uvxyz=s with: 1) |vy| > 0, 2) |vxy| ≤P, 3) ∀i, uvixyiz∈L. P P P P s = 00…0011…1100…0011…11 vxy
USING THE PUMPING LEMMA Prove L = {w#wR | w ∈ {0,1}* and #1s = #0s} is not context-free Assume L is context-free. Then there is a pumping length P. No matter what P is, the string s = 0P1P#1P0P has |s| ≥ P and s∈L. So there should be uvxyz=s with: 1) |vy| > 0, 2) |vxy| ≤ P, 3)∀i, uvixyiz∈ L. P P P P s = 00…0011…11#11…1100…00
USING THE PUMPING LEMMA Prove L = {w#wR | w ∈ {0,1}* and #1s = #0s} is not context-free Assume L is context-free. Then there is a pumping length P. vxy cannot be only in either half, since pumping would make one side of # longer than the other. since # is in x and |vxy|≤ P, neither v nor y can have 0s, and at least one must have a 1. Then uv2xy2z has more 1s than 0s and is not in L. No matter what P is, the string s = 0P1P#1P0P has |s| ¸ P and s 2 L. # cannot be in v or y since pumping would add too many #s. So it must be in x. So there should be uvxyz=s with: 1) |vy| > 0, 2) |vxy| ≤ P, 3) ∀i, uvixyiz∈ L. P P P P s = 00…0011…11#11…1100…00 vxy