Nonregular Languages Dr. Shakir Al Faraji
The Pumping Lemma Many languages can be defined by FAs and regular expressions, such as languages with required substrings, languages that forbid some substrings, languages that begin or end with certain substrings, languages with certain even/odd properties, and so on.
The Pumping Lemma-Cont Some languages have a different form, such as the language PALINDROME or the language PRIME of all words $a^p$, where p is a prime number. These languages are not regular. We can describe them in English, but they cannot be defined by an FA or a regular expression.
DEFINITION A language that cannot be defined by a regular expression is called a nonregular language. By Kleene's theorem, a nonregular language also cannot be accepted by any FA or TG. Every language is either regular or nonregular.
DEFINITION-Cont. Consider the following language: L = {Λ, ab, aabb, aaabbb, aaaabbbb, ...}, where Λ is the null string. We could also define this language by the formula $L = \{a^n b^n \text{ for } n = 0, 1, 2, 3, 4, \ldots\}$, or for short $L = \{a^n b^n\}$.
DEFINITION-Cont. We need to show that this language is nonregular. Let us note, though, that it is a subset of many regular languages, such as a*b*, which, however, also includes such strings as aab and bb that $\{a^n b^n\}$ does not.
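The relationship between $\{a^n b^n\}$ and a*b* can be checked on a few strings. The following is only an illustrative sketch: the helper names `in_a_star_b_star` and `in_anbn` are made up for this example, and Python's `re` module stands in for the regular expression a*b*.

```python
import re

# The regular expression a*b* from the slide, compiled once.
A_STAR_B_STAR = re.compile(r"a*b*")


def in_a_star_b_star(word: str) -> bool:
    """True if the whole word matches a*b*."""
    return A_STAR_B_STAR.fullmatch(word) is not None


def in_anbn(word: str) -> bool:
    """True if word is a^n b^n for some n >= 0 (n = 0 is the null string)."""
    n = len(word) // 2
    return word == "a" * n + "b" * n


for word in ["", "ab", "aabb", "aab", "bb"]:
    print(repr(word), in_a_star_b_star(word), in_anbn(word))
```

Every word of $\{a^n b^n\}$ passes both checks, while aab and bb pass only the a*b* check. Note that `in_anbn` has to count letters, which is exactly what a regular expression cannot express.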
DEFINITION-Cont. Let us be very careful to note that $\{a^n b^n\}$ is not a regular expression. It involves the symbols { } and n, which are not in the alphabet of regular expressions. This is a language-defining expression that is not regular.
DEFINITION-Cont. Suppose, on the contrary, that this language were regular. Then there would be some FA that accepts it. Let us picture one of these FAs. This FA might have many states. Let us say that it has 96 states. We know it accepts the word $a^{96} b^{96}$.
DEFINITION-Cont. The first 96 letters of this input string are all a's, and they trace a path through this machine. The path cannot visit a new state with each input letter read: counting the start state, that would require 97 different states, and there are only 96. Therefore, at some point the path returns to a state that it has already visited.
DEFINITION-Cont. The first time it was in that state it left by the a-edge. The second time it is in the state it leaves by the a-edge again. Even if it only returns once, we say that the path contains a circuit in it. (A circuit is a loop that can be made of several edges.)
DEFINITION-Cont. First, the path wanders up to the circuit and then it starts to loop around the circuit, maybe many times. It cannot leave the circuit until a b is read from the input. Then the path can take a different turn.
DEFINITION-Cont. After the first b is read, the path leaves the circuit, follows b-edges, and eventually winds up at a final state, where the word $a^{96} b^{96}$ is accepted.
DEFINITION-Cont. Suppose that the circuit that the a-edge path loops around has 7 states in it. The path enters the circuit, loops around it, and then goes off on the b-line to a final state. What would happen to the input string $a^{96+7} b^{96}$?
DEFINITION-Cont. Just as in the case of the input string $a^{96} b^{96}$, this string would produce a path through the machine that would walk up to the same circuit (reading only a's) and begin to loop around it in exactly the same way. However, the path for $a^{96+7} b^{96}$ loops around this circuit one more time than the path for $a^{96} b^{96}$.
DEFINITION-Cont. Both paths, at exactly the same state in the circuit, begin to branch off on the b-edge. Once on the b-edge, they both go the same 96 b-steps and arrive at the same final state. But this would mean that the input string $a^{103} b^{96}$ is accepted by this machine. However, that string is not in the language $L = \{a^n b^n\}$.
DEFINITION-Cont. This is a contradiction. We assumed that we were talking about an FA that accepts exactly the words in L, and then we were able to prove that the same machine accepts some word that is not in L. This means that a machine that accepts exactly the words in L does not exist. L is nonregular.
DEFINITION-Cont. [Figure: transition diagram of an FA with states numbered 1 to 10 (start state 1, final state 9), with a-edges and b-edges.]
DEFINITION-Cont. By the same argument, the machine must accept every word of the form $a^{96}(a^7)^m b^{96}$, where m is any integer 0, 1, 2, 3, ... If m = 0, the path through this machine is the path for the word $a^{96} b^{96}$; if m = 1, the path looks the same, but it loops around the circuit one more time.
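The circuit argument can be tried out on a small machine. The sketch below uses a made-up 5-state FA (the table `DELTA`) as a stand-in for the 96-state machine of the slides; the function names are likewise assumptions for illustration. It finds the circuit traced by the a's and checks that adding one extra trip around the circuit does not change where the path ends.

```python
def find_a_circuit(delta, start):
    """Follow a-edges from `start` until a state repeats.

    Returns (tail, cycle): how many a's are read before the circuit is
    entered, and how many a-edges the circuit contains.
    """
    first_seen = {}  # state -> number of a's read when first visited
    state, steps = start, 0
    while state not in first_seen:
        first_seen[state] = steps
        state = delta[(state, "a")]
        steps += 1
    tail = first_seen[state]
    return tail, steps - tail


def run(delta, start, word):
    """Return the state reached after reading `word` from `start`."""
    state = start
    for letter in word:
        state = delta[(state, letter)]
    return state


# Hypothetical toy FA with states 0..4 (not the machine from the slides).
DELTA = {}
for source, target in {0: 1, 1: 2, 2: 3, 3: 1, 4: 4}.items():
    DELTA[(source, "a")] = target      # a-edges: 0 -> 1 -> 2 -> 3 -> 1 ...
for source in range(5):
    DELTA[(source, "b")] = (source + 1) % 5  # b-edges: next state, wrapping

tail, cycle = find_a_circuit(DELTA, 0)
same_end = run(DELTA, 0, "a" * 5 + "b" * 5) == run(DELTA, 0, "a" * (5 + cycle) + "b" * 5)
print(tail, cycle, same_end)
```

Whatever final states this toy machine is given, it must treat $a^5 b^5$ and $a^{5+3} b^5$ alike, so it cannot accept exactly $\{a^n b^n\}$. The same reasoning applies to any finite table of transitions.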
DEFINITION-Cont. The principle we have been using to discuss the language L above can be generalized so that it applies to consideration of other languages. It is a tool that enables us to prove that certain other languages are also nonregular. We shall now present the generalization of this idea, called the pumping lemma for regular languages.
THEOREM Let L be any regular language that has infinitely many words. Then there exist some three strings x, y, and z (where y is not the null string) such that all the strings of the form $x y^n z$ for n = 1, 2, 3, ... are words in L.
THEOREM-PROOF If L is a regular language, then there is an FA that accepts exactly the words in L. Let us focus on one such machine. Like all FAs, this machine has only finitely many states. But L has infinitely many words in it. This means that there are arbitrarily long words in L.
THEOREM-PROOF Let w be some word in L that has more letters in it than there are states in the machine we are considering. When this word generates a path through the machine, the path cannot visit a new state for each letter because there are more letters than states. Therefore, it must at some point revisit a state that it has been to before.
THEOREM-PROOF Let us break the word w up into three parts. Part 1: call part x all the letters of w, starting at the beginning, that lead up to the first state that is revisited. Notice that x may be the null string if the path for w revisits the start state as its first revisit.
THEOREM-PROOF Part 2: Starting at the letter after the substring x, let y denote the substring of w that travels around the circuit coming back to the same state the circuit began with. Because there must be a circuit, y cannot be the null string. y contains the letters of w for exactly one loop around this circuit.
THEOREM-PROOF Part 3: let z be the rest of w starting with the letter after the substring y and going to the end of the string w. This z could be null. The path for z could also possibly loop around the y-circuit or any other. What z does is arbitrary.
THEOREM-PROOF Clearly, from the definition of these three substrings, w = xyz, and w is accepted by this machine. What is the path through this machine for the input string xyyz?
THEOREM-PROOF It follows the path for w through the first part x, which leads up to the state where w began to loop around a circuit. Then, like w, it inputs the string y, which causes the machine to loop back to this same state again.
THEOREM-PROOF Then, again like w, it inputs a string y, which causes the machine to loop back to this same state yet another time. Then, just like w, it proceeds along the path dictated by the input string z, and so ends in the same final state that w did. This means that xyyz is accepted by this machine, and therefore it must be in the language L.
THEOREM-PROOF If we trace the paths for xyyz, xyyyz, and xyyyyyyz, they all have the same shape: proceed up to the circuit, loop around the circuit some number of times, then proceed to the final state. All these words must be accepted by the machine and are therefore all in the language L. In fact, L must contain all strings of the form $x y^n z$ for n = 1, 2, 3, ..., as the theorem claims.
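The construction of x, y, and z in the proof is mechanical, so it can be sketched in code. This is a minimal illustration under stated assumptions: the FA is given as a transition dictionary, the names `split_for_pumping` and `accepts` are invented here, and the example machine (`ENDS_IN_AB`, accepting words over {a, b} that end in ab) is a hypothetical choice, not one from the slides.

```python
def split_for_pumping(delta, start, w):
    """Split w into (x, y, z) as in the proof.

    x leads up to the first state that gets revisited, y is one trip
    around the circuit back to that state, and z is the rest of w.
    Returns None if the path for w never revisits a state.
    """
    first_seen = {start: 0}  # state -> number of letters read on first visit
    state = start
    for i, letter in enumerate(w, start=1):
        state = delta[(state, letter)]
        if state in first_seen:
            j = first_seen[state]
            return w[:j], w[j:i], w[i:]
        first_seen[state] = i
    return None


def accepts(delta, start, finals, word):
    """True if the FA ends in a final state after reading `word`."""
    state = start
    for letter in word:
        state = delta[(state, letter)]
    return state in finals


# Hypothetical 3-state FA accepting the words that end in "ab".
ENDS_IN_AB = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 0,
}

x, y, z = split_for_pumping(ENDS_IN_AB, 0, "aabab")
pumped_ok = all(accepts(ENDS_IN_AB, 0, {2}, x + y * n + z) for n in range(1, 6))
print((x, y, z), pumped_ok)
```

The word aabab has five letters and the machine has three states, so a revisit is guaranteed. Here the split is x = a, y = a, z = bab, and every pumped word tried is accepted.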
THEOREM-PROOF This picture can be helpful in understanding the argument. [Figure: a path through the machine with its parts labeled x, y, and z.]
EXAMPLE-1 Suppose that we had not already discussed the language $L = \{a^n b^n \text{ for } n = 0, 1, 2, 3, \ldots\}$. Let us see how we could apply the pumping lemma to this language.
EXAMPLE-1-Cont. The pumping lemma says that there must be strings x, y, and z such that all words of the form $x y^n z$ are in L. Is this possible? A typical word of L looks like aaa . . . aaaabbbb . . . bbb
EXAMPLE-1-Cont. aaa . . . aaaabbbb . . . bbb How do we break this into three pieces that fit the roles of x, y, and z? If the middle section y is made entirely of a's, then when we pump it to xyyz, the word will have more a's than b's, so it is not in L.
EXAMPLE-1-Cont. Similarly, if the middle part y is composed of only b's, then the word xyyz will have more b's than a's. The only remaining possibility is that the y-part has some positive number of a's and some positive number of b's.
EXAMPLE-1-Cont. This would mean that y contains the substring ab. Then xyyz would have two copies of the substring ab. But every nonnull word in L contains the substring ab exactly once. Therefore, xyyz cannot be a word in L. This proves that the pumping lemma cannot apply to L, and therefore L is not regular.
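The case analysis above can be spot-checked by brute force on short words. The sketch below is an illustration, not a proof: it tries every way of cutting a word of the form $a^k b^k$ into x, y, z with y nonnull and reports the cuts for which xyyz is still in L. The names `in_L` and `pumpable_splits` are assumptions made for this example.

```python
def in_L(word):
    """True if word is a^n b^n for some n >= 0."""
    n = len(word) // 2
    return word == "a" * n + "b" * n


def pumpable_splits(word, member):
    """All (x, y, z) with word == x + y + z, y nonnull, and xyyz in the language."""
    splits = []
    for i in range(len(word)):
        for j in range(i + 1, len(word) + 1):
            x, y, z = word[:i], word[i:j], word[j:]
            if member(x + y + y + z):
                splits.append((x, y, z))
    return splits


for k in range(1, 7):
    word = "a" * k + "b" * k
    print(word, pumpable_splits(word, in_L))
```

For every k tried, the list of surviving cuts is empty, matching the three cases in the argument: y all a's, y all b's, or y containing ab.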
EXAMPLE-2 Consider the language $\{a^n b a^n\}$ = {b, aba, aabaa, ...}. If this language were regular, then there would exist three strings x, y, and z such that xyz and xyyz were both words in this language. We can show that this is impossible.
EXAMPLE-2-Cont. Observation 1: if the y string contained the b, then xyyz would contain two b's, which no word in this language can have. Observation 2: if the y string is all a's, then the b in the middle of the word xyz is in the x-side or the z-side. In either case, xyyz has increased the number of a's either in front of the b or after the b, but not both.
EXAMPLE-2-Cont. Conclusion 1: Therefore, xyyz does not have its b in the middle and is not of the form $a^n b a^n$. Conclusion 2: This language cannot be pumped and is therefore not regular.
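The two observations can be checked mechanically on short words. The following sketch (function names are invented for illustration) takes every cut of $a^k b a^k$ into x, y, z with y nonnull and reports which observation rules out xyyz.

```python
def why_pumping_fails(x, y, z):
    """Return which observation rules out x + y + y + z, or None if neither does."""
    pumped = x + y + y + z
    if "b" in y:
        # Observation 1: y contains the b, so the pumped word has two b's.
        return "observation 1" if pumped.count("b") == 2 else None
    # Observation 2: y is all a's, so only one side of the b gets longer.
    before, _, after = pumped.partition("b")
    return "observation 2" if len(before) != len(after) else None


def all_splits(word):
    """Yield every (x, y, z) with word == x + y + z and y nonnull."""
    for i in range(len(word)):
        for j in range(i + 1, len(word) + 1):
            yield word[:i], word[i:j], word[j:]


word = "aabaa"
for x, y, z in all_splits(word):
    print(repr(x), repr(y), repr(z), why_pumping_fails(x, y, z))
```

Every cut of aabaa is ruled out by one of the two observations, which is the content of Conclusion 1. This is a finite check on one word and only illustrates the general argument.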
EXAMPLE-3. Consider the language PRIME = {$a^p$ where p is a prime}. Is PRIME a regular language? If it is, then there is some FA that accepts exactly these words. Let us keep one such automaton in mind. Let us suppose this FA has 345 states. Let us choose a prime number bigger than 345, for example, 347.
EXAMPLE-3-Cont. Then $a^{347}$ can be broken into parts x, y, and z such that $x y^n z$ is in PRIME for any value of n. The parts x, y, and z are all just strings of a's. Let us take the value n = 348. By the pumping lemma, the word $x y^{348} z$ must be in PRIME. Now $x y^{348} z = x y z y^{347}$.
EXAMPLE-3-Cont. Now $x y^{348} z = x y z y^{347}$. We can write this because the factors x, y, and z are all solid clumps of a's, and it does not matter in what order we concatenate them. All that matters is how many a's we end up with. Let us write $x y z y^{347} = a^{347} y^{347}$.
EXAMPLE-3-Cont. $x y z y^{347} = a^{347} y^{347}$. This is because x, y, and z came originally from breaking up $a^{347}$ into three parts. We also know that y is some (nonempty) string of a's. Let us say that $y = a^m$ for some integer m that we do not know.
EXAMPLE-3-Cont. $a^{347} y^{347} = a^{347}(a^m)^{347} = a^{347+347m} = a^{347(m+1)}$. What we have arrived at is that there is an element in PRIME that is of the form a to the power 347(m+1). Now because $m \neq 0$, we know that 347(m+1) is not a prime number.
EXAMPLE-3-Cont. But this is a contradiction, because all the strings in PRIME are of the form $a^p$, where the exponent is a prime number. This contradiction arose from the assumption that PRIME was a regular language. Therefore, PRIME is not regular.
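The arithmetic in this example can be verified for every possible length of y. In the sketch below, m is the number of a's in y (anywhere from 1 to 347), and `pumped_length` computes the length of $x y^{348} z$; the helper names and the trial-division `is_prime` are assumptions chosen for illustration.

```python
def is_prime(n):
    """Trial-division primality test, adequate for the small numbers used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


P = 347  # the prime chosen in the example


def pumped_length(m, n=P + 1):
    """Length of x y^n z when xyz has P letters and y has m letters."""
    # Pumping y from 1 copy to n copies adds (n - 1) * m letters.
    return P + (n - 1) * m


# For every possible length m of y, the pumped length is 347 * (m + 1).
composite_for_all_m = all(
    pumped_length(m) == P * (m + 1) and not is_prime(pumped_length(m))
    for m in range(1, P + 1)
)
print(is_prime(P), composite_for_all_m)
```

Whatever the length of y, the pumped word has 347(m+1) letters, a multiple of 347 that is larger than 347, so it is never prime.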