200 likes | 744 Views
Closure properties for CFLs. CFLs are closed under union, concatenation, and Kleene closure. We’ve seen the proof when observing that all regular languages are CFLs Recall that it involves creating new rules of the form S → S 1 | S 2 , S → S 1 S 2 , or S → l | S 1 S.
E N D
Closure properties for CFLs. • CFLs are closed under union, concatenation, and Kleene closure. • We’ve seen the proof when observing that all regular languages are CFLs • Recall that it involves creating new rules of the form • S → S1 | S2, S → S1S2, or S → l | S1S
Using CFL closure properties • For example, the languages below are CFLs • {0m1n2n | m,n>0} and {0m1m2n | m,n>0} • The first is {0m | m>0} ∙ {1n2n | n>0} • The second is {0m1m | m>0} ∙ {2n | n>0}. • But the intersection of these two CFLs is {0n1n2n | n>0}, which isn’t a CFL. • So the class of CFLs is not closed under intersection
Nonclosure under complementation • Suppose that the class of CFLs was closed under complementation. • Then it would be closed under intersection, just as for regular languages. • So the class of CFLs isn't closed under intersection • But the intersection of a CFL and a regular language is always a CFL. • This is shown as Theorem 8.5 of Linz
Intersecting CFLs with regular languages • The intuition is that a DFA and a PDA may be run in parallel • just as we did for two DFAs, by using the Cartesian product of the sets of states • This won't work for two PDAs • since we can't sensibly define the Cartesian product of two stacks. • But if there is only one PDA, then we can simply use its stack with no problem.
Decision algorithms for CFGs • To check whether L(G) is empty for a CFG G • we merely check whether S generates a string of terminals • To check whether L(G) is infinite for a CFG G • eliminate useless symbols, l-productions, and unit productions from G to get a new CFG G' • then determine whether any variable is nontrivially reachable from itself in G'. • This is equivalent to determining whether there is a cycle in a dependency graph
A sample CFG generating an infinite language • Consider the CFG G with rules below, where we ignore rules for Det, N, V, P • S → NP VP • NP → Det N | Det N PP • VP → V NP • PP → P NP • Since there’s a cycle involving NP and PP, L(G) is infinite • That is, NP =*> Det N P NP • where Det, N, and P generate nontrivial strings
Nonexistence of decision algorithms • For some questions about CFGs, there is no possible solution algorithm! • for example, there’s no algorithm for determining whether two CFGs generate the same language • Showing how to justify such a claim is a major goal of the rest of the course
The pumping lemma for CFLs • CFLs have an analog of the pumping lemma. • Its statement is slightly more complicated than for regular languages. • The basic idea is again simple • every sufficiently deep parse tree must have a long path, and thus a repeated symbol.
Deriving the pumping lemma for CFLs • Suppose a CFG G has m variables. • We may assume that G is in CNF (we won’t care whether l ε L(G). • If a parse tree has height greater than m, then it must have a path with more than m nonleaves. • Among the lowest m+1 nonleaves must be two that correspond to the same variable A.
Repeated variables on a path • If the higher node is Ahi and the lower one Alo • then the yield of Aj is a substring of the yield of Ai. • Let w be the yield of Alo and vwx that of Ahi. • Let uvwxy be the yield of the entire tree. • Since G is context free, we may replace the tree rooted at Alo by a copy of the tree rooted at Ahi.
The ability to pump • The yield of the new tree will be uv2wx2y. • We may repeat the process to get trees that yield uviwxiy for any i>0. • We may also replace the tree rooted at Ahi by the tree rooted at Alo to get a tree yielding uwy. • So for any nonzero i, uviwxiy is in L(G). • That is, z = uvwxy may be pumped • but slightly differently than for DFAs.
Making the pumped part short • Here, it is vwx whose length we will bound. • Since G is in CNF, a parse tree of height at most m has yield of length at most 2m-1 • For n = 2m, any string z of length at least n has a path of length greater than m • z can be pumped, • |vwx| <= n, • and vwx is a proper superstring of w. • We get a new pumping lemma (Linz, Th. 8.1)
Using the pumping for CFLs. • Suppose that the language L = {0j1j2j | j>=0} is a CFL. • If we choose z = 0n1n2n in the pumping lemma for CFLs, then vwx must contain at most two distinct symbols. • Therefore uwy contains more of the third symbol than of one of the other two symbols, and is thus not in L. • So L can't be a CFL.