90 likes | 102 Views
Section 12.4 Context-Free Language Topics Algorithm . Remove L -productions from grammars for langauges without L . Find nonterminals that derive L . For each production A w construct all productions A w’ where w’ is obtained from
E N D
Section 12.4 Context-Free Language Topics Algorithm. Remove L-productions from grammars for langauges without L. Find nonterminals that derive L. For each production A w construct all productions A w’ where w’ is obtained from w by removing one or more occurrences of the nonterminals from Step 1. Combine the original productions with those of step 2 and eliminate any L-productions. Example. Remove L-productions from the grammar S ABc A aA | L B bB | L. Solution. Step 1: The nonterminals A and B derive L. Step 2: From the production S ABc we construct S Bc | Ac | c. From the production A aA we construct A a. From the production B bB we construct B b. Step 3: S ABc | Bc | Ac | c A aA | a B bB | b. Quiz. Remove L -productions from S ABc | Ab | c A ABa | L B Bbc | L. Solution. S ABc | Ab | c| Bc | Ac | b A ABa | Ba | Aa | a B Bbc | bc.
Chomsky Normal Form. Productions have one of the following forms A a (a a terminal) A BC S L (if L is in the language). Advantages: Parse trees are binary, which are easy to represent. Any string of length n > 0 can be derived in 2n – 1 steps. Algorithm. Transform to Chomsky normal form (with the additional property that no start symbol occurs on the right side of a production) 1. If the start symbol S occurs on some right side, create a new start symbol S´ and a new production S´ S. 2. Remove AL (if A ≠ S) by previous algorithm. (If S L is removed, add it back.) 3. Remove unitproductions (i.e., A B): If A B or A + B, then construct productions Aw where Bw is not a unit production. Now remove all unit productions. 4. For each production whose right side has two or more symbols, replace all occurrrences of each terminal a with a new nonterminal A and also add the new production Aa. 5. Replace each production B C1…Cnwith n > 2 with B C1D where D C2 …Cn. Repeat this step until all right sides have length two.
Example. Construct a Chomsky normal form for the grammar S aSb | D D Dc | L. Solution. Step 1: Add the production S´ S. Step 2: S´ S | L S aSb | ab | D D Dc | c. Step 3: S´ aSb | ab | Dc | c | L S aSb | ab | Dc | c D Dc | c. Step 4: S´ ASB | AB | DC | c | L S ASB | AB | DC | c D DC | c A a B b C c. Step 5: Replace S´ ASB and S ASB by S´ AE, S´ AE, and E SB.
Quiz. Construct a Chomsky normal form for the grammar S aTbb | U |L T cT | c U Ud | d. Solution. Step 1: No change.Step 2: No change. Step 3: Remove unit production S U to obtain S aTbb | Ud | d |L T cT | c U Ud | d. Step 4: Transform right sides of length at least two into strings of nonterminals. S ATBB | UD | d |L T CT | c U UD | d. A a B b C c D d. Step 5: Replace S ATBB with the productions S AE, E TF, F BB.
Greibach Normal Form. Productions have one of the following forms A b (b a terminal) A bD1…Dk S L (if L is in the language). Advantage: Any string of length n > 0 can be derived in n steps. Algorithm (idea). Transform context-free grammar to Greibach normal form. Perform steps 1, 2, and 3 of the Chomsky algorithm. Remove all left-recursion, including indirect, without addingL 3. Make substitutions to transform the grammar into the proper form. Example. Put the following grammar into Greibach normal form. S AB | Ac | d A aA | a B Ab | c. Solution: Steps 1 (Chomsky steps 1, 2, and 3) and 2 are non needed. Step 3: Replace A in S AB | Ac | d with aA | a to obtain S aAB | aB | aAc | ac | d. Replace A in B Ab | c with theright side of A aA | a to obtainB aAb | ab | c. Now add the new productions C c and D b to obtain the proper form: S aAB | aB | aAC | aC | d A aA | a B aAD | aD | c C c D b.
Example. Put the following grammar into Greibach normal form. S AB | Ac | d A aA | a B Ab | c. Solution: Steps 1 (Chomsky steps 1, 2, and 3) and 2 are not needed. Step 3: Replace A in S AB | Ac | d with aA | a to obtain S aAB | aB | aAc | ac | d. Replace A in B Ab | c with theright side of A aA | a to obtain B aAb | ab | c. Now add the new productions C c and D b and make appropriate replacements to obtain the proper form: S aAB | aB | aAC | aC | d A aA | a B aAD | aD | c C c D b.
Properties of Context-Free Languages When we know some properties of context-free languages they can help us argue, BWOC, that certain languages are not context-free. The Pumping Lemma If L is an infinite context-free language, then any grammar for L must be recursive, so there must be derivations of the the following form where u, v, w, x, and y are terminal strings. S + uNy N + vNx (where v and x are not both L) N + w. These derivations lead to derivations like S + uNy + uvNxy + uv2Nx2y + uvkNxky + uvkwxky L for all kN. This is the basis for the Pumping Lemma: There is an integer m > 0 such that if zL and | z | ≥ m, then z has the form z = uvwxy where 1 ≤ | vx | ≤ | vwx | ≤ m and uvkwxky L for all kN. Note: The number m depends on the grammar as we’ll see in the following example.
Example. Suppose we have the following grammar for {L, bbc} {abcnd | n N}. S aNd | bbc | L N Nc | b. Here are a few derivations: S aNd abd S aNd aNcd abcd S aNd aNcd aNccd abccd S + abckd for any k in N. For this grammar m = 4 can be used in the pumping lemma because any derivation of a string z with | z | ≥ 4 must use the nonterminal N. For example, if | z | = 8 and z = abcccccd, then the pumping lemma factors z = abcccccd = uvwxy where 1 ≤ | vx | ≤ | vwx | ≤ 4 and uvkwxky L for all kN. In this case let u = a, v = L, w = b, x = c, and y = ccccd. Example. The language L = {anbncn+k| k, n N} is not context-free. Proof: Assume, BWOC, that L is context-free. L is infinite, so pumping lemma applies. Choose z = ambmcmwhere m is the positive integer from the lemma. Then z = ambmcm = uvwxy where 1 ≤ | vx | ≤ | vwx | ≤ m and uvkwxky L for all kN. Observe neither v nor x can contain distinct letters. For example, if v = …a…b…, then v2 = …a…b……a…b…, which can’t appear as a substring of any string in L. So v and x must be strings of repeated occurrences of a single letter. Now since | vwx | ≤ m, there are two possible places in ambmcm where v and x must occur: (1) v and x occur in ambm. (2) v and x occur in bmcm. But we obtain the following contradictions because v and x are not both L. (1) Let k = 2 to obtain uv2wx2y = am+ibm+jcm, where i > 0 or j > 0. So uv2wx2y L (2) Let k = 0 to obtain uwy = ambm-icm–j, where i > 0 or j > 0. So we have uwy L. These contradictions imply that L is not context-free. QED.
Example/Quiz. Prove that the language L = {ss | s {a, b}*} is not context-free. Proof: Assume, BWOC, that L is context-free. L is infinite, so pumping lemma applies. Choose z = ambmambmwhere m is the positive integer from the lemma. Then z = ambmambm = uvwxy where 1 ≤ | vx | ≤ | vwx | ≤ m and uvkwxky L for all kN. Now since | vwx | ≤ m, there are three possible places in ambmambm where v and x must occur: (1) v and x occur in ambm (on the left of z). (2) v and x occur in bmam (in the center of z). (3) v and x occur in ambm (on the right of z). Notice that v and x can consist only of repetitions a single letter. For example, in case (1) suppose v = aibj for some i > 0 and j > 0 and x = bn for some n ≥ 0. Then, letting k = 0, we would obtain uwy = am–ibm–j–nambm, which cannot be in L. The argument is similar for the other cases. So v and x must consist only of repetitions of a single letter. We need to find a contradiction in each of the three cases. We’ll do it by using k = 0. This tells us that uwy L. But we obtain the following contradictions because v and x are not both L. (1) uwy = am–ibm–jambmwhere either i > 0 or j > 0 Souwy L, (2) uwy = ambm–iam–jbmwhere either i > 0 or j > 0.Souwy L. (3) uwy = ambmam–ibm–jwhere either i > 0 or j > 0.Souwy L. Therefore L is not context-free. QED. Remark: Be careful that the choice of z is not in a context-free sublanguage of L. For example, if we chose z = (ab)m(ab)m in the preceding example, we would not get any contradictions.