1 / 47

2.3 Properties of Context-free Languages

2.3 Properties of Context-free Languages. Chomsky Normal Form Pumping Lemma for CFL ’ s Closure Properties Decision Properties. Chomsky Normal Form. Useful to deal only with CFGs in a simple normal form Most common: Chomsky Normal Form (CNF) Definition: every production has form

max
Download Presentation

2.3 Properties of Context-free Languages

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. 2.3 Properties of Context-free Languages • Chomsky Normal Form • Pumping Lemma for CFL’s • Closure Properties • Decision Properties

  2. Chomsky Normal Form • Useful to deal only with CFGs in a simple normal form • Most common: Chomsky Normal Form (CNF) • Definition: every production has form A → BC or A → a or S → ε where A, B, C are any non-terminals (and B, C are not S) and a is any terminal.

  3. Chomsky Normal Form Theorem: Every CFL is generated by a CFG in Chomsky Normal Form. Proof: Transform any CFG into an equivalent CFG in CNF. Four steps: • add a new start symbol • eliminate “ε-productions”A → ε • eliminate “unit productions”A → B • convert remaining rules into proper form

  4. Chomsky Normal Form • add a new start symbol • add productionS0 → S • remove “ε-productions”A → ε • for each production with A on rhs, add productions with each occurrence of A removed. E.g. for the rule R → uAvAw, add R → uvAw | uAvw | uvw • eliminate “unit productions”A → B • for each production with B on lhs: B → u, add the rule A → u

  5. Chomsky Normal Form • convert remaining rules into proper form • replace production of form: A → u1U2u3…uk with: A → U1A1 U1 → u1 A1 → U2A2 A2 → U3A3 U3 → u3 : Ak-2 → Uk-1Uk Uk-1 → uk-1 Uk → uk U2 is already a non-terminal

  6. Example 2.7 S → ASA | aB A → B | S B → b | 

  7. Cleaning Up Grammars • Eliminate useless symbols. In order for a symbol X to be useful, it must: • 1. Derive some terminal string (possibly X is a terminal). • 2. Be reachable from the start symbol; i.e., S * X. • Note that X wouldn't really be useful if  or  included a symbol that didn't satisfy (1), so it is important that (1) be tested first, and symbols that don't derive terminal strings be eliminated before testing (2).

  8. Eliminate Useless Symbols • Finding symbols that don’t derive any terminal string • Recursive construction: • Basis: A terminal surely derives a terminal string. • Induction: If A is the head of a production whose body is X1X2… Xk, and each Xi is known to derive a terminal string, then surely A derives a terminal string. • Keep going until no more symbols that derive terminal strings are discovered.

  9. Example S  AB | C A  0B | C B  1 | A0 C  AC | C1

  10. Eliminate Useless Symbols • Finding symbols that cannot be derived from the start symbol • Another recursive algorithm: • Basis: S is “in”. • Induction: If variable A is in, then so is every symbol in the production bodies for A. • Keep going until no more symbols derivable from S can be found.

  11. 2.3 Properties of Context-free Languages • Chomsky Normal Form • Pumping Lemma for CFL’s • Closure Properties • Decision Properties

  12. Pumping Lemma for CFL’s • Similar to regular-language PL, but you have to pump two strings in the middle of the string, in tandem (i.e., the same number of copies of each). Formally: •  CFL L •  integer p •  w in L, with |w|  p •  uvxyz = w such that |vxy|  p and |vy| > 0 •  i  0, uvixyiz is in L.

  13. CFL Pumping Lemma Proof: consider a parse tree for a very long string w  L: S . . . A B C . . . . . . A D S C S A A B some non-terminal must repeat on long path a A C b A D D C B A a b b S S b b B A b b b a a b a

  14. S S A A A u v y z A u v y z A v x y CFL Pumping Lemma • Schematic proof: S A A u v x y z

  15. S A S A u z x u z CFL Pumping Lemma • Schematic proof: S A A u v x y z

  16. CFL Pumping Lemma • how large should pumping length p be? • need to ensure other conditions: |vy| > 0 |vxy| ≤ p • b = max # symbols on rhs of any production (assume b ≥ 2) • if parse tree has height < h, then string generated has length < bh (so length ≥ bh implies height ≥ h)

  17. CFL Pumping Lemma • let m be the # of nonterminals in the grammar • to ensure path of length at least m+2, require |w| ≥ p = bm+2 • since |w| ≥ bm+2, any parse tree for w has height ≥ m+2 • let T be the smallest parse tree for w • longest root-leaf path must consist of at least m+1 non-terminals and 1 terminal.

  18. CFL Pumping Lemma S • there must be a repeated non-terminal A on long path • select a repetition among the lowest m+1 non-terminals on path. • pictures show that for every i  0, uvixyiz  L A A u v x y z • is |vy| > 0 ? • smallest parse tree T ensures • is |vxy| ≤ p? • red path has length ≤ m+2, so ≤ bm+2 = p leaves

  19. Example 2.20 • B = {anbncn | n  0} • Assume B is a CFL, let p be the pumping length • Choose w = apbpcp in B (|w| > p) • Applying PL, w = uvxyz, where |vy|>0 and |vxy|p, such that uvixyiz in B for all i0 • Two possible cases: • vxy = a*b*, uv2xy2z will result in more a’s and/or more b’s than c’s, not in B • vxy = b*c*, uv2xy2z will result in more b’s and/or more c’s than a’s, not in B • Contradiction, B is not a CFL

  20. Example 2.21 • C = {aibjck | k  j  i 0} • Assume C is a CFL, let p be the pumping length • Choose w = apbpcp in C (|w| > p) • Applying PL, w = uvxyz, where |vy|>0 and |vxy|p, such that uvixyiz in C for all i0 • Two possible cases: • vxy = a*b*, uv2xy2z will result in more a’s and/or more b’s than c’s, not in C • vxy = b*c*, uv0xy0z= uxz will result in fewer b’s and/or fewer c’s than a’s, not in C • Contradiction, C is not a CFL

  21. Example 2.22 • D = { ww | w  {0, 1}* } • Assume D is a CFL, let p be the pumping length • Choose w = 0p1p0p1p in D (|w| > p) • Why not choose w = 0p10p1? • Applying PL, w = uvxyz, where |vy|>0 and |vxy|p, such that uvixyiz in D for all i0

  22. Example 2.22 (Cont’d) • Three possible cases: • vxy in first half • then uv2xy2z = 0??...?1??...?, not in D • vxy in second half • then uv2xy2z = ??...?0??...?1, not in D • vxy straddles midpoint • then uv0xy0z = uxz = 0p1i0j1p with i ≠ p or j ≠ p, not in D • Contradiction, D is not a CFL

  23. Exercise • Prove that L = {0k^2 | k is any integer} is not a CFL

  24. 2.3 Properties of Context-free Languages • Chomsky Normal Form • Pumping Lemma for CFL’s • Closure Properties • Decision Properties

  25. Closure Under Substitution • If a substitutions assigns a CFL to every symbol in the alphabet of a CFL L, then s(L) is a CFL. • Take a grammar for L and a grammar for each language La = s(a). • Make sure all the variables of all these grammars are different (rename variables whenever necessary). • Replace each terminal a in the productions for L by Sa, the start symbol of the grammar for La. • Intuition: this replacement allows any string in La to take the place of any occurrence of a in any string of L.

  26. Example • L = {0n1n | n  1}, generated by CFG S  0S1 | 01. • s(0) = {anbm | m  n}, generated by CFG S  aSb | A; A  aA | ab. • s(1) = {ab, abc}, generated by CFG S  abA; A  c | . 1. Rename second and third S's to S0 and S1, respectively. Rename second A to B. Resulting grammars are: • S0 aS0b | A; A  aA | ab • S1 abB; B  c | 2. In the first grammar, replace 0 by S0 and 1 by S1. The combined grammar: • S  S0SS1 | S0S1 • S0 aS0b | A; A  aA | ab • S1 abB; B  c | 

  27. Consequences of Closure Under Substitution • Closure of CFL’s under union, concatenation, star, homomorphism. • union: let L1 and L2 be CFL’s, L = {1, 2}, s(1)=L1, s(2)=L2. Then L1L2 = s(L). • concatenation: let L = {12}. Then L1L2 = s(L). • star: let L = {1}*. Then L1* = s(L). • homomorphism: let L be a CFL over , h be a homomorphism over . Define s by s(a) = {h(a)}. Then h(L) = s(L).

  28. Closure Under Reversal • If LCFL, then LRCFL. • PROOF: Suppose L is generated by CFG G = (V, T, P, S), construct GR = (V, T, PR, S), where PR = {A→R | A→ in P} Prove that LR = L(GR) by induction on the lengths of derivations

  29. Closure Under Inverse Homom. • PDA-based construction. • Keep a “buffer” in which we place h(a) for some input symbol a. • Read inputs from the front of the buffer ( OK). • When the buffer is empty, it may be reloaded with h(b) for the next input symbol b, or we may continue making -moves.

  30. Closure Under Inverse Homom.

  31. Non-closure Under Intersection • A counter example: L={0n1n2n | n1} is not a CFL. But • L1={0n1n2i | n1, i1} is a CFL, can be generated by grammar: S→AB A→0A1 | 01 B→2B | 2 • Likewise L2={0i1n2n | n1, i1} is a CFL. • L = L1L2

  32. Non-closure Under Complement • Since CFL's are closed under union, if they were also closed under complement, they would be closed under intersection by DeMorgan's law.

  33. 2.3 Properties of Context-free Languages • Chomsky Normal Form • Pumping Lemma for CFL’s • Closure Properties • Decision Properties

  34. Decision Properties of CFL’s • Testing emptiness of a CFL • Testing finiteness of a CFL • Testing membership of a string in a CFL • Preview of undecidable CFL problems

  35. Testing Emptiness • Use a CFG representation of the given CFL, check if the start symbol is useless.

  36. Testing Finiteness • Let L be a CFL. Then there is some pumping-lemma constant n for L. • Test all strings of length between n and 2n-1 for membership (as in next section). • If there is any such string, it can be pumped, and the language is infinite. • If there is no such string, then n-1 is an upper limit on the length of strings, so the language is finite. • Trick: If there were a string z = uvxyz of length 2n or longer, you can find a shorter string uxz in L, but it's at most n shorter. Thus, if there are any strings of length 2n or more, you can repeatedly cut out vy to get, eventually, a string whose length is in the range n to 2n-1.

  37. Testing Membership • Can simulate NPDA, but this takes exponential time in the worst case. • There is an O(n3) algorithm (n = length of w) that uses a “dynamic programming” technique. • Called Cocke-Younger-Kasami (CYK) algorithm.

  38. Testing Membership • Convert CFG into Chomsky Normal form. • Parse tree for string x generated by nonterminal A: A • If A k x (k > 1) then there must be a way to split x: • x = yz • A → BC is a production and • B i y and C j z for i, j < k B C x

  39. CYK Algorithm • Input is w = a1a2…an • We construct a triangular table, where Xij = {A | A * aiai+1…aj} • See if SX1n, i.e. S * w

  40. CYK Algorithm • Fill the table row-by-row, upwards. • Compute Xij in the induction • Basis: Xii = {A | A → ai is in G} • Induction: A  Xij if there exists A → BC in G, and B  Xik, C  X(k+1)j for some k (i<=k<j). • For w = a1a2…an, there are O(n2) entries Xij to compute. For each Xij we need to compare at most n pairs (Xik, X(k+1)j). Total work is O(n3).

  41. Example S → AS | SB | AB A → a B → b and the input string is aabb.

  42. Preview of Undecidable CFL Problems • Is a given CFG ambiguous? • Is a given CFL inherently ambiguous? • Is the intersection of two CFL’s empty? • Are two CFL’s the same? • Is a given CFL universal (equal to *)?

  43. Assignment 5 • 1、Here is a grammar: it generates the language with blocks of 0's followed by blocks of at least as many 1's. S -> AAS | A |  A -> 0A1 | 0B1 B -> B1 |  Convert the grammar to Chomsky normal form. Show the grammars that result from the intermediate steps -- elimination of -productions, elimination of unit productions.

  44. Assignment 5 • 2、Show that {0i1j2k | i<j<k} is not a context-free language. • 3、Exercise 2.18 (c).

  45. Summary of Chap. 2 • Context-Free Grammars (CFG’s) describe Context-Free Languages (CFL’s) • terminals, non-terminals • productions • derivations, yields • parse trees • Ambiguity • Chomsky Normal Form (CNF) • Pushdown Automata (PDA)

  46. Summary of Chap. 2 • PDA’s and CFG’s are equivalent • Deterministic PDA’s are weaker • CFL Pumping Lemma is used to show certain languages are not CFL’s • Some of the closure properties of regular languages carry over to CFL’s. But CFL’s are not closed under intersection nor complement. • We can test for emptiness, finiteness and membership. But for instance, equivalence of CFL’s is undecidable.

  47. So far… • We proved (using constructions of FA’s and PDA’s and the two pumping lemmas): {w | w{a,b}* has an even # of b’s} {anbn | n ≥ 0 } regularlanguages all languages context free languages {anbncn | n ≥ 0 }

More Related