340 likes | 355 Views
Explore the properties of regular languages through DFA construction and applications of the Pumping Lemma. Learn about string acceptance, loop detection, and analysis of accepted strings. Discover insights into language regularity and implications for DFAs.
E N D
Properties of Regular Languages Draw DFA for L={e} for a regular language, L={e, 01} for the r.l., L={e, 01, 0011} for the r.l., L={e, 01, 0011, 000111} …
q0 q6 • Example: {0n1n | 0=<n} is not regular, but {0n1n | 0nk, for some fixed k} is regular, for any fixed k. • For k=3: L = {ε, 01, 0011, 000111} 0 0 0 q1 q2 q3 1 1 1 1 0/1 1 1 q7 q5 q4 0/1 0 0 0
Properties of Regular Languages Drawn DFA for L={e} for a regular language, L={e, 01} for the r.l., L={e, 01, 0011} for the r.l., L={e, 01, 0011, 000111} … now, try for the r.l., L={0k1k | 0≤k ≤ infinity} {0n1n | 0nk, for some fixed k} is regular, for any fixed k but, {0n1n | 0 n} is not regular, to be proved using Pumpimg Lemma
Pumping Lemma for Regular Languages • Pumping Lemma relates the size of string accepted with the number of states in a DFA • For accepting a string of length m how many states do you need on a path? string a1a2a3a4 below -->q1 – a1–> q2 -- a2 q3 – a3 q4 – a4 --> qF* • What is the largest possible string accepted by a DFA with n states, presuming there is NO loop?
Pumping Lemma for Regular Languages • If there is a loop in the path for accepting a string, what type of strings are accepted via the loop(s)? • Think of a string in the language: 001 10 111, with middle 10 on a loop q0 001 q5 10 q5 111 Now, what type of strings should also be accepted? • What is the largest string accepted by a DFA with n states, presuming there is a LOOP ?
Pumping Lemma for Regular Languages • Pumping lemma quantifies two observations toward a path of accepting a string by a DFA: • 1. If the path is longer than the number of states available in the DFA, then there is a repetition of some state in the path, or a ‘loop’ • 2. If there is such a loop in the path, then the substring on the loop may appear 0 or more number of times for the corresponding string to be accepted • string: 001 10 111, with middle 10 on a loop q0 001 q5 10 q5 111 • Other corresponding accepted strings are: 001 (10)* 111
Pumping Lemma for Regular Languages • Lemma: (the pumping lemma) Let M be a DFA with |Q| = n states. If there exists a string x in L(M), such that |x| n, then there exists a way to write it as x = uvw, where u, v, and w are such that: • 1 |uv| n • |v| 1 • AND, all the strings uviw are also in L(M), for all i 0
Pumping Lemma for Regular Languages • Proof: Let x = a1a2 … am where mn, x is in L(M), and δ(q0, a1a2 … ap) = qjp a1 a2 a3 … am --> q0 qj1 qj2 qj3… qjm m n Consider the first n symbols, and first n+1 states on the above path: a1 a2 a3 … an q 0 qj1 qj2 qj3… qjn Since |Q| = n, it follows from the pigeon-hole principle that js = jt for some 0 s<t n, i.e., some state appears on this path twice (perhaps many states appear more than once, but at least one does).
a1 a2 as as+1 at at+1 an q0 qj1 qjs qjt qjn as+1…at a1…as at+1…an qjn qjs = qjt q0 n … … … same state so, loop 1
Let: • u = a1…as • v = as+1…at • Since 0 s<t n and uv = a1…at it follows that: • 1 |v| and therefore 1 |uv| • |uv| n and therefore 1 |uv| n • In addition, let: • w = at+1…am • It follows that uviw = a1…as(as+1…at)iat+1…am is in L(M), for all i 0. In other words, when processing the accepted string x, the loop was traversed once, but it could be allowed to traverse as many times as desired, and the corresponding strings would be accepted. “as many times” >= 0
1 0/1 0/1 0/1 0 0 q0 q1 q2 q3 1 q4 • Example: n = 5 x = 0001000 is in L(M) 0 0 0 1 0 0 0 q0q1 q2 q3q1 q2 q3 q4 first n+1 states u = 0 v = 001 w = 000 uviw is in L(M), i.e., 0(001)i000 is in L(M), for all i 0
1 0/1 0/1 0/1 0 0 q0 q1 q2 q3 1 q4 • Note that this does not mean that every string accepted by the DFA has this form: • 001 is in L(M) but is not of the form 0(001)i000 • Similarly, this does not mean that every long string accepted by the DFA has this form: • 0011111 is in L(M), is very long, but is not of the form 0(001)i000 • Note, however, in this latter case 0011111 could be similarly decomposed.
0/1 0/1 0/1 q0 q2 q1 • Note: It may be the case that no x in L(M) has |x| n. What is the language?
q3 a b a q0 q1 b b a • Example: n = 4 x = bbbab is in L(M) b b b a b |x| = 5 q0 q0q0q0 q1 q3 u = ε u = b u = bb v = b or v = b v = b w = bbab w = bab w = ab (b)ibbab is in L(M), for all i 0 b(b)ibab is in L(M), for all i 0 q2 a/b
Non-Regular Language: Example • Theorem: The language: L = {0k1k | k 0} (1) is not regular. • Proof: (by contradiction) Suppose that L is regular. Then there exists a DFA M such that: L = L(M) (2) We will show that M accepts some strings not in L, contradicting (2). Suppose that M has n states, and choose a string x=0m1m, where the constant m>>n. By (1), x is in L. By (2), x is also in L(M), note that the machine accepts a language not just a string
Since |x| = m >> n, it follows from the pumping lemma that: • x = uvw • 1 |uv| n • 1 |v|, and • uviw is in L(M), for all i 0 Since 1 |uv| n and n<<m, it follows that 1 |uv| <m. Also, since x = 0m1m it follows that uv is a substring of 0m. In other words v=0j, for some j 1. Since uviw is in L(M), for all i 0, it follows that 0m+cj1m is in L(M), for all c 1 (no. of loops), and j 1 (length of the loop) But by (1) and (2), 0m+cj1m is not in L(M), for any c 1, i.e., m+cj > m, a contradiction. • Note that L basically corresponds to balanced parenthesis.
Non-Regularity Example • Theorem: The language: L = {0k1k2k | k 0} (1) is not regular. • Proof: (by contradiction) Suppose that L is regular. Then there exists a DFA M such that: L = L(M) (2) We will show that M accepts some strings not in L, contradicting (2). Suppose that M has n states, and consider a string x=0m1m2m, where the constant m>>n. By (1), x is in L. By (2), x is also in L(M), note that the machine accepts a language not just a string
Since |x| = m >> n, it follows from the pumping lemma that: • x = uvw • 1 |uv| n • 1 |v|, and • uviw is in L(M), for all i 0 Since 1 |uv| n and n<<m, it follows that 1 |uv| m. Also, since x = 0m1m2m it follows that uv is a substring of 0m. In other words v=0j, for some j 1. Since uviw is in L(M), for all i 0, it follows that 0m+cj1m2m is in L(M), for all c 1 and j 1. But by (1) and (2), 0m+cj1m2m is not in L(M), for any integer c 1, a contradiction. • Note that the above proof is almost identical to the previous proof.
NonRegularity Example • Theorem: The language: L = {0m1n2m+n | m,n 0} (1) is not regular. • Proof: (by contradiction) Suppose that L is regular. Then there exists a DFA M such that: L = L(M) (2) We will show that M accepts some strings not in L, contradicting (2). Suppose that M has n states, and consider a string x=0m1n2m+n, where m>>n. By (1), x is in L. By (2), x is also in L(M).
Since |x| = m >> n, it follows from the pumping lemma that: • x = uvw • 1 |uv| n • 1 |v|, and • uviw is in L(M), for all i 0 Since 1 |uv| n and n<<m, it follows that 1 |uv| m. Also, since x = 0m1n2m+n it follows that uv is a substring of 0m. In other words v=0j, for some j 1. Since uviw is in L(M), for all i 0, it follows that 0m+cj1m2m+n is in L(M), for all c 1. In other words v can be “pumped” as many times as we like, and we still get a string in L(M). But by (1) and (2), 0m+cj1n2m+n is not in L(M), for any c 1, because the acceptable expression should be 0m+cj1n2m+cj+n, a contradiction.
What about {0m1n | m, n 0}? • {0m1n | m, n 0, and m<n}? • {0m1n | m, n 0, and m = n^2}? • {0m1n | m, n 0, and m>n}? • Are these regular languages, or not?
Theorem: Let M = (Q, Σ, δ, q0, F) be a DFA. Then L(M) is finite iff |x| < |Q| for all x in L(M). • Proof: (if) Suppose that |x| < |Q| for all x in L(M). Since the number of states |Q| and the number of input symbols |Σ| are both fixed, it follows that there are at most a finite number of strings of length less than |Q|. It follows that L(M) is finite. (Exercise: provide an upper bound on the number of such strings of max length p). (only if) By contradiction. Suppose that L(M) is finite, but that |x| |Q| for some x in L(M). From the pumping lemma it follows that x=uvw, |v| 1 and uviw is in L(M) for all i 0. But then L(M) would be infinite, a contradiction. Pumping lemma creates an infinite set of corresponding strings to be accepted.
Theorem: Let M = (Q, Σ, δ, q0, F) be a DFA. Then L(M) is infinite iff there exists an x in L(M) such that |x| |Q|. • Proof: (if) Suppose there exists an x in L(M) such that |x| |Q|. From the pumping lemma it follows that x=uvw, |v| 1 and uviw is in L(M) for all i 0. Therefore L(M) is infinite. (only if) By contradiction. Suppose that L(M) is infinite, but that there is NO x in L(M) such that |x| |Q|. It follows that each x in L(M) has length less than |Q|. Since the number of states |Q| and the number of input symbols |Σ| are both fixed, it follows that there are at most a finite number of strings of length less than |Q|. It follows that L(M) is finite. A contradiction. • Note that the ‘only-if’ also follows directly from the previous theorem.
Theorem: Let M = (Q, Σ, δ, q0, F) be a DFA. Then L(M) is non-empty iff there exists an x in L(M) such that |x| < |Q|. • Proof: (if) Suppose there exists a string x in L(M) such that |x| < |Q|. Then clearly L(M) is non-empty. (only if) By contradiction. Suppose that L(M) is non-empty, but that there exists NO string x in L(M) such that |x| < |Q|. It follows that for all strings y in L(M), |y| is |Q|. Let z be a string of shortest length in L(M). Then |z| n by the assumption. So, by the pumping lemma z=uvw, |v| 1 and uviw is also in L(M) for all i 0. But then uv0w = uw is in L(M) and: |uw| = |z| - |v| |z| - 1 < |z| Since uw is in L(M), it follows that z is NOT a string of shortest length in L(M), a contradiction. Every string >= n may be shrunk (1 or more times) within <n size, and still be accepted. n z
Corollary: Let M = (Q, Σ, δ, q0, F) be a DFA. Then there is an algorithm (=procedure with finite steps) to determine if L(M) is empty. • Proof: From the theorem it follows that if L(M) is non-empty then there exists a string x where |x| < n and n = |Q| such that M accepts x. We can try running M on each string of length < n to see if any are accepted. If one is accepted then L(M) is non-empty, otherwise L(M) is empty. Since the number of states |Q| and the number of input symbols |Σ| are both fixed, it follows that there are at most a finite number of strings (of length less than |Q|) that need to be tested. There are only finite number of strings to check = finite procedure
2n n z n • Theorem: Let M = (Q, Σ, δ, q0, F) be a DFA and let n = |Q|. Then L(M) is infinite iff there exists an x in L(M) such that n |x| < 2n. • Proof: (if part left as an exercise; similar to the previous theorem). • (only if: by Contradiction) Suppose there is no string of length between n and 2n in the language accepted by DFA M with n states. Suppose a string z is the shortest string accepted by the machine M, such that |z| 2n. Since |z| > n, by pumping lemma z = uvw, and 1 |v| n, because |uv| n. Then z’=uw must be accepted by the machine M. So, |z’| = |z| - |v| cannot be < n. As per assumption, |z’| 2n is not possible as z is the shortest such string and |z|>|z’|. And we presumed that there is no string of size between n and 2n, so, n |z’| 2n cannot be true either. Contradiction. • Corollary: Let M = (Q, Σ, δ, q0, F) be a DFA. Then there is an algorithm to determine if L(M) is infinite. • Proof: (left as an exercise).
Closure Properties of Regular Languages • Consider various operations on languages: = {x | x is in Σ* and x is not in L} = {x | x is in L1 or L2} = {x | x is in L1 and L2} = {x | x is in L1 but not in L2} L1L2= {xy | x is in L1 and y is in L2} L* = Li = L0 U L1 U L2 U… L+ = Li = L1 U L2 U…
Theorem: Let M = (Q, Σ, δ, q0, F) be a DFA. Then M’ = (Q, Σ, δ, q0, Q-F) is a DFA where L(M’) = Σ* - L(M). • Proof: (left as an exercise). • Corollary: The regular languages are closed under complementation, i.e., if L is a regular language then so is . • Does the above construction work for NFAs?
This is a good time to observe DFA-ε • Start state is also a final state, then, machine accepts ε • In our previous definition of DFA such a concept would have been non-determinism! • From now on our default definition of DFA includes DFA- ε
Theorem: The regular languages are closed with respect to union, concatenation and Kleene closure. • Proof: Let L1 and L2 be regular languages. By definition there exist regular expressions r1 and r2 such that L(r1) = L1 and L(r2) = L2. But then r1 + r2 is a regular expression representing . Similarly for concatenation and Kleene closure. • Corollary: L+ too: it is LL*
Lemma: Let L1 and L2 be subsets of Σ*. Then . • Theorem: Let L1 and L2 be regular languages. Then is a regular language. • Proof: Let L1 and L2 be regular languages. Then: is regular is regular is regular [uses: union regular languages is regular] is regular But by the lemma in set theory:
A direct construction of a DFA M such that : • Let: M1 = (Q1, Σ, δ1,p0,F1), where Q1 = {p0, p1,…} M2 = (Q2, Σ, δ2,q0,F2), where Q2 = {q0, q1,…} where L(M1) = L1 and L(M2) = L2. • Construct M where: Q = Q1 x Q2 Note that M has a state for each = {[p0, q0], [p0, q1], [p0, q2],…} pair of states in M1 and M2 Σ = as with M1 and M2 F = F1 x F2 start state = [p0, q0] δ([pi, qj], a) = [δ1(pi, a), δ2(qj, a)] for all [pi, qj] in Q and a in Σ
1 1 0 p0 0 p1 q2 1 0 q0 q1 1 0 • Example: The construct gives: Q = Q1 x Q2 = {[p0, q0], [p0, q1], [p0, q2], [p1, q0], [p1, q1], [p1, q2]} Σ = {0, 1} F = {[p1, q2]} start state = [p0, q0] δ – Some examples: δ([p0, q2], 0) = [δ1(p0, 0), δ2(q2, 0)] = [p1, q2] δ([p1, q2], 1) = [δ1(p1, 1), δ2(q2, 1)] = [p1, q2] 0/1
1 1 1 p0, q0 p0, q1 p0, q2 0 0 • Final Result: • Question: What if Σ1 does not equal Σ2? • Exercise: Prove that is a regular language. • L1 – L2 = L1 intersect (L2-bar) 0 0 0 0 p1, q0 p1, q1 p1, q2 1 1 1