610 likes | 1.67k Views
2.3 Properties of Context-free Languages. Chomsky Normal Form Pumping Lemma for CFL ’ s Closure Properties Decision Properties. Chomsky Normal Form. Useful to deal only with CFGs in a simple normal form Most common: Chomsky Normal Form (CNF) Definition: every production has form
E N D
2.3 Properties of Context-free Languages • Chomsky Normal Form • Pumping Lemma for CFL’s • Closure Properties • Decision Properties
Chomsky Normal Form • Useful to deal only with CFGs in a simple normal form • Most common: Chomsky Normal Form (CNF) • Definition: every production has form A → BC or A → a or S → ε where A, B, C are any non-terminals (and B, C are not S) and a is any terminal.
Chomsky Normal Form Theorem: Every CFL is generated by a CFG in Chomsky Normal Form. Proof: Transform any CFG into an equivalent CFG in CNF. Four steps: • add a new start symbol • eliminate “ε-productions”A → ε • eliminate “unit productions”A → B • convert remaining rules into proper form
Chomsky Normal Form • add a new start symbol • add productionS0 → S • remove “ε-productions”A → ε • for each production with A on rhs, add productions with each occurrence of A removed. E.g. for the rule R → uAvAw, add R → uvAw | uAvw | uvw • eliminate “unit productions”A → B • for each production with B on lhs: B → u, add the rule A → u
Chomsky Normal Form • convert remaining rules into proper form • replace production of form: A → u1U2u3…uk with: A → U1A1 U1 → u1 A1 → U2A2 A2 → U3A3 U3 → u3 : Ak-2 → Uk-1Uk Uk-1 → uk-1 Uk → uk U2 is already a non-terminal
Example 2.7 S → ASA | aB A → B | S B → b |
Cleaning Up Grammars • Eliminate useless symbols. In order for a symbol X to be useful, it must: • 1. Derive some terminal string (possibly X is a terminal). • 2. Be reachable from the start symbol; i.e., S * X. • Note that X wouldn't really be useful if or included a symbol that didn't satisfy (1), so it is important that (1) be tested first, and symbols that don't derive terminal strings be eliminated before testing (2).
Eliminate Useless Symbols • Finding symbols that don’t derive any terminal string • Recursive construction: • Basis: A terminal surely derives a terminal string. • Induction: If A is the head of a production whose body is X1X2… Xk, and each Xi is known to derive a terminal string, then surely A derives a terminal string. • Keep going until no more symbols that derive terminal strings are discovered.
Example S AB | C A 0B | C B 1 | A0 C AC | C1
Eliminate Useless Symbols • Finding symbols that cannot be derived from the start symbol • Another recursive algorithm: • Basis: S is “in”. • Induction: If variable A is in, then so is every symbol in the production bodies for A. • Keep going until no more symbols derivable from S can be found.
2.3 Properties of Context-free Languages • Chomsky Normal Form • Pumping Lemma for CFL’s • Closure Properties • Decision Properties
Pumping Lemma for CFL’s • Similar to regular-language PL, but you have to pump two strings in the middle of the string, in tandem (i.e., the same number of copies of each). Formally: • CFL L • integer p • w in L, with |w| p • uvxyz = w such that |vxy| p and |vy| > 0 • i 0, uvixyiz is in L.
CFL Pumping Lemma Proof: consider a parse tree for a very long string w L: S . . . A B C . . . . . . A D S C S A A B some non-terminal must repeat on long path a A C b A D D C B A a b b S S b b B A b b b a a b a
S S A A A u v y z A u v y z A v x y CFL Pumping Lemma • Schematic proof: S A A u v x y z
S A S A u z x u z CFL Pumping Lemma • Schematic proof: S A A u v x y z
CFL Pumping Lemma • how large should pumping length p be? • need to ensure other conditions: |vy| > 0 |vxy| ≤ p • b = max # symbols on rhs of any production (assume b ≥ 2) • if parse tree has height < h, then string generated has length < bh (so length ≥ bh implies height ≥ h)
CFL Pumping Lemma • let m be the # of nonterminals in the grammar • to ensure path of length at least m+2, require |w| ≥ p = bm+2 • since |w| ≥ bm+2, any parse tree for w has height ≥ m+2 • let T be the smallest parse tree for w • longest root-leaf path must consist of at least m+1 non-terminals and 1 terminal.
CFL Pumping Lemma S • there must be a repeated non-terminal A on long path • select a repetition among the lowest m+1 non-terminals on path. • pictures show that for every i 0, uvixyiz L A A u v x y z • is |vy| > 0 ? • smallest parse tree T ensures • is |vxy| ≤ p? • red path has length ≤ m+2, so ≤ bm+2 = p leaves
Example 2.20 • B = {anbncn | n 0} • Assume B is a CFL, let p be the pumping length • Choose w = apbpcp in B (|w| > p) • Applying PL, w = uvxyz, where |vy|>0 and |vxy|p, such that uvixyiz in B for all i0 • Two possible cases: • vxy = a*b*, uv2xy2z will result in more a’s and/or more b’s than c’s, not in B • vxy = b*c*, uv2xy2z will result in more b’s and/or more c’s than a’s, not in B • Contradiction, B is not a CFL
Example 2.21 • C = {aibjck | k j i 0} • Assume C is a CFL, let p be the pumping length • Choose w = apbpcp in C (|w| > p) • Applying PL, w = uvxyz, where |vy|>0 and |vxy|p, such that uvixyiz in C for all i0 • Two possible cases: • vxy = a*b*, uv2xy2z will result in more a’s and/or more b’s than c’s, not in C • vxy = b*c*, uv0xy0z= uxz will result in fewer b’s and/or fewer c’s than a’s, not in C • Contradiction, C is not a CFL
Example 2.22 • D = { ww | w {0, 1}* } • Assume D is a CFL, let p be the pumping length • Choose w = 0p1p0p1p in D (|w| > p) • Why not choose w = 0p10p1? • Applying PL, w = uvxyz, where |vy|>0 and |vxy|p, such that uvixyiz in D for all i0
Example 2.22 (Cont’d) • Three possible cases: • vxy in first half • then uv2xy2z = 0??...?1??...?, not in D • vxy in second half • then uv2xy2z = ??...?0??...?1, not in D • vxy straddles midpoint • then uv0xy0z = uxz = 0p1i0j1p with i ≠ p or j ≠ p, not in D • Contradiction, D is not a CFL
Exercise • Prove that L = {0k^2 | k is any integer} is not a CFL
2.3 Properties of Context-free Languages • Chomsky Normal Form • Pumping Lemma for CFL’s • Closure Properties • Decision Properties
Closure Under Substitution • If a substitutions assigns a CFL to every symbol in the alphabet of a CFL L, then s(L) is a CFL. • Take a grammar for L and a grammar for each language La = s(a). • Make sure all the variables of all these grammars are different (rename variables whenever necessary). • Replace each terminal a in the productions for L by Sa, the start symbol of the grammar for La. • Intuition: this replacement allows any string in La to take the place of any occurrence of a in any string of L.
Example • L = {0n1n | n 1}, generated by CFG S 0S1 | 01. • s(0) = {anbm | m n}, generated by CFG S aSb | A; A aA | ab. • s(1) = {ab, abc}, generated by CFG S abA; A c | . 1. Rename second and third S's to S0 and S1, respectively. Rename second A to B. Resulting grammars are: • S0 aS0b | A; A aA | ab • S1 abB; B c | 2. In the first grammar, replace 0 by S0 and 1 by S1. The combined grammar: • S S0SS1 | S0S1 • S0 aS0b | A; A aA | ab • S1 abB; B c |
Consequences of Closure Under Substitution • Closure of CFL’s under union, concatenation, star, homomorphism. • union: let L1 and L2 be CFL’s, L = {1, 2}, s(1)=L1, s(2)=L2. Then L1L2 = s(L). • concatenation: let L = {12}. Then L1L2 = s(L). • star: let L = {1}*. Then L1* = s(L). • homomorphism: let L be a CFL over , h be a homomorphism over . Define s by s(a) = {h(a)}. Then h(L) = s(L).
Closure Under Reversal • If LCFL, then LRCFL. • PROOF: Suppose L is generated by CFG G = (V, T, P, S), construct GR = (V, T, PR, S), where PR = {A→R | A→ in P} Prove that LR = L(GR) by induction on the lengths of derivations
Closure Under Inverse Homom. • PDA-based construction. • Keep a “buffer” in which we place h(a) for some input symbol a. • Read inputs from the front of the buffer ( OK). • When the buffer is empty, it may be reloaded with h(b) for the next input symbol b, or we may continue making -moves.
Non-closure Under Intersection • A counter example: L={0n1n2n | n1} is not a CFL. But • L1={0n1n2i | n1, i1} is a CFL, can be generated by grammar: S→AB A→0A1 | 01 B→2B | 2 • Likewise L2={0i1n2n | n1, i1} is a CFL. • L = L1L2
Non-closure Under Complement • Since CFL's are closed under union, if they were also closed under complement, they would be closed under intersection by DeMorgan's law.
2.3 Properties of Context-free Languages • Chomsky Normal Form • Pumping Lemma for CFL’s • Closure Properties • Decision Properties
Decision Properties of CFL’s • Testing emptiness of a CFL • Testing finiteness of a CFL • Testing membership of a string in a CFL • Preview of undecidable CFL problems
Testing Emptiness • Use a CFG representation of the given CFL, check if the start symbol is useless.
Testing Finiteness • Let L be a CFL. Then there is some pumping-lemma constant n for L. • Test all strings of length between n and 2n-1 for membership (as in next section). • If there is any such string, it can be pumped, and the language is infinite. • If there is no such string, then n-1 is an upper limit on the length of strings, so the language is finite. • Trick: If there were a string z = uvxyz of length 2n or longer, you can find a shorter string uxz in L, but it's at most n shorter. Thus, if there are any strings of length 2n or more, you can repeatedly cut out vy to get, eventually, a string whose length is in the range n to 2n-1.
Testing Membership • Can simulate NPDA, but this takes exponential time in the worst case. • There is an O(n3) algorithm (n = length of w) that uses a “dynamic programming” technique. • Called Cocke-Younger-Kasami (CYK) algorithm.
Testing Membership • Convert CFG into Chomsky Normal form. • Parse tree for string x generated by nonterminal A: A • If A k x (k > 1) then there must be a way to split x: • x = yz • A → BC is a production and • B i y and C j z for i, j < k B C x
CYK Algorithm • Input is w = a1a2…an • We construct a triangular table, where Xij = {A | A * aiai+1…aj} • See if SX1n, i.e. S * w
CYK Algorithm • Fill the table row-by-row, upwards. • Compute Xij in the induction • Basis: Xii = {A | A → ai is in G} • Induction: A Xij if there exists A → BC in G, and B Xik, C X(k+1)j for some k (i<=k<j). • For w = a1a2…an, there are O(n2) entries Xij to compute. For each Xij we need to compare at most n pairs (Xik, X(k+1)j). Total work is O(n3).
Example S → AS | SB | AB A → a B → b and the input string is aabb.
Preview of Undecidable CFL Problems • Is a given CFG ambiguous? • Is a given CFL inherently ambiguous? • Is the intersection of two CFL’s empty? • Are two CFL’s the same? • Is a given CFL universal (equal to *)?
Assignment 5 • 1、Here is a grammar: it generates the language with blocks of 0's followed by blocks of at least as many 1's. S -> AAS | A | A -> 0A1 | 0B1 B -> B1 | Convert the grammar to Chomsky normal form. Show the grammars that result from the intermediate steps -- elimination of -productions, elimination of unit productions.
Assignment 5 • 2、Show that {0i1j2k | i<j<k} is not a context-free language. • 3、Exercise 2.18 (c).
Summary of Chap. 2 • Context-Free Grammars (CFG’s) describe Context-Free Languages (CFL’s) • terminals, non-terminals • productions • derivations, yields • parse trees • Ambiguity • Chomsky Normal Form (CNF) • Pushdown Automata (PDA)
Summary of Chap. 2 • PDA’s and CFG’s are equivalent • Deterministic PDA’s are weaker • CFL Pumping Lemma is used to show certain languages are not CFL’s • Some of the closure properties of regular languages carry over to CFL’s. But CFL’s are not closed under intersection nor complement. • We can test for emptiness, finiteness and membership. But for instance, equivalence of CFL’s is undecidable.
So far… • We proved (using constructions of FA’s and PDA’s and the two pumping lemmas): {w | w{a,b}* has an even # of b’s} {anbn | n ≥ 0 } regularlanguages all languages context free languages {anbncn | n ≥ 0 }