300 likes | 523 Views
Operations on Languages. Let L, L 1 , L 2 be subsets of Σ * Concatenation: L 1 L 2 = {xy | x is in L 1 and y is in L 2 } Concatenating a language with itself: L 0 = {ε} L i = LL i-1 , for all i >= 1 Kleene Closure: L * = L i = L 0 U L 1 U L 2 U…
E N D
Operations on Languages • Let L, L1, L2 be subsets ofΣ* • Concatenation: L1L2 = {xy | x is in L1 and y is in L2} • Concatenating a language with itself: L0 = {ε} Li = LLi-1, for all i >= 1 • Kleene Closure: L* = Li = L0 U L1 U L2 U… • Positive Closure: L+ = Li = L1 U L2 U… • Question: Does L+ contain ε?
Kleene closure Say, L1 ={a, abc, ba}, on Σ ={a,b,c} Then, L2 = {aa, aabc, aba, abca, abcabc, abcba, baa, baabc, baba} L3= {a, abc, ba}. L2 L* = {ε, L1, L2, L3, . . .}
Regular Expressions • Highlights: • A regular expression is used to specify a language, and it does so precisely. • Regular expressions are very intuitive. • Regular expressions are very useful in a variety of contexts. • Given a regular expression, an NFA-ε can be constructed from it automatically. • Thus, so can an NFA be constructed, and a DFA, and a corresponding program, all automatically!
Definition of a Regular Expression • Let Σ be an alphabet. The regular expressions over Σ are: • Ø Represents the empty set { } • ε Represents the set {ε} • a Represents the set {a}, for any symbol a in Σ Let r and s be regular expressions that represent the sets R and S, respectively. • r+s Represents the set R U S (precedence 3) • rs Represents the set RS (precedence 2) • r* Represents the set R* (highest precedence) • (r) Represents the set R (not an op, provides precedence) • If r is a regular expression, then L(r) is used to denote the corresponding language.
Examples: Let Σ = {0, 1} (0 + 1)* All strings of 0’s and 1’s 01* 0 followed by any number 1’s 0(0 + 1)* All strings of 0’s and 1’s, beginning with a 0 (0 + 1)*1 All strings of 0’s and 1’s, ending with a 1 (0 + 1)*0(0 + 1)* All strings of 0’s and 1’s containing at least one 0 (0 + 1)*0(0 + 1)*0(0 + 1)* All strings of 0’s and 1’s containing at least two 0’s (0 + 1)*01*01* All strings of 0’s and 1’s containing at least two 0’s (1 + 01*0)* All strings of 0’s and 1’s containing an even number of 0’s 1*(01*01*)* All strings of 0’s and 1’s containing an even number of 0’s (1*01*0)*1* All strings of 0’s and 1’s containing an even number of 0’s (0+1)* = (0*1*)* Any string, or (sigma)*, sigma={0, 1} in all cases here • Question: Is there a unique minimum regular expression for a given language?
Identities: • Øu = uØ = Ø Multiply by 0 • εu = uε = u Multiply by 1 • Ø* = ε L* = Li = L0 U L1 U L2 U… • ε* = ε = {ε} • u+v = v+u • u + Ø = u • u + u = u • u* = (u*)* • u(v+w) = uv+uw • (u+v)w = uw+vw • (uv)*u = u(vu)* you have to have a single u, start or end • (u+v)* = (u*+v)* = u*(u+v)* = (u+vu*)* = (u*v*)* = u*(vu*)* = (u*v)*u*
Equivalence of Regular Expressionsand NFA-εs • Note: Throughout the following, keep in mind that a string is accepted by an NFA-ε if there exists a path from the start state to a final state. • Lemma 1: Let r be a regular expression. Then there exists an NFA-ε M such that L(M) = L(r). Furthermore, M has exactly one final state with no transitions out of it. • Proof: (by induction on the number of operators, denoted by OP(r), in r).
q0 q0 qf qf qf a Basis: OP(r) = 0 Then r is either Ø, ε, or a, for some symbol a in Σ For Ø: For ε: For a:
qf Inductive Hypothesis: Suppose there exists a k 0 such that for any regular expression r where 0 OP(r) k, there exists an NFA-ε such that L(M) = L(r). Furthermore, suppose that M has exactly one final state. Inductive Step: Let r be a regular expression with k + 1 operators (OP(r) = k + 1), where k + 1 >= 1. Case 1) r = r1 + r2 Since OP(r) = k +1, it follows that 0<= OP(r1), OP(r2) <= k. By the inductive hypothesis there exist NFA-ε machines M1 and M2 such that L(M1) = L(r1) and L(M2) = L(r2). Furthermore, both M1 and M2 have exactly one final state. Construct M as: M1 f1 q1 ε ε q0 ε ε M2 f2 q2
f2 qf M1 ε q1 f1 M2 q2 Case 2) r = r1r2 Since OP(r) = k+1, it follows that 0<= OP(r1), OP(r2) <= k. By the inductive hypothesis there exist NFA-ε machines M1 and M2 such that L(M1) = L(r1) and L(M2) = L(r2). Furthermore, both M1 and M2 have exactly one final state. Construct M as: Case 3) r = r1* Since OP(r) = k+1, it follows that 0<= OP(r1) <= k. By the inductive hypothesis there exists an NFA-ε machine M1 such that L(M1) = L(r1). Furthermore, M1 has exactly one final state. Construct M as: ε M1 ε ε q0 q1 f1 ε
q1 • Example: r = 0(0+1)* r = r1r2 r1 = 0 r2 = (0+1)* r2 = r3* r3 = 0+1 r3 = r4 + r5 r4 = 0 r5 = 1 1 q0
q1 q3 • Example: r = 0(0+1)* r = r1r2 r1 = 0 r2 = (0+1)* r2 = r3* r3 = 0+1 r3 = r4 + r5 r4 = 0 r5 = 1 1 q0 0 q2
1 0 q0 q2 q1 q3 q5 • Example: r = 0(0+1)* r = r1r2 r1 = 0 r2 = (0+1)* r2 = r3* r3 = 0+1 r3 = r4 + r5 r4 = 0 r5 = 1 ε ε q4 ε ε
ε ε q4 q5 0 1 q0 q2 q3 q1 ε ε qf ε ε ε ε q6 • Example: r = 0(0+1)* r = r1r2 r1 = 0 r2 = (0+1)* r2 = r3* r3 = 0+1 r3 = r4 + r5 r4 = 0 r5 = 1
ε ε q4 q5 1 0 q0 q2 q3 q1 ε ε qf ε ε ε ε q6 0 q8 q9 • Example: r = 0(0+1)* r = r1r2 r1 = 0 r2 = (0+1)* r2 = r3* r3 = 0+1 r3 = r4 + r5 r4 = 0 r5 = 1
ε ε q4 0 q5 q8 q9 0 1 q0 q2 q3 q1 ε ε ε qf ε ε ε ε q6 • Example: r = 0(0+1)* r = r1r2 r1 = 0 r2 = (0+1)* r2 = r3* r3 = 0+1 r3 = r4 + r5 r4 = 0 r5 = 1
Definitions Required to Convert a DFAto a Regular Expression • Let M = (Q, Σ, δ, q1, F) be a DFA with state set Q = {q1, q2, …, qn}, and define: Ri,j = { x | x is in Σ* and δ(qi,x) = qj} Ri,j is the set of all strings that define a path in M from qi to qj. • Note that states have been numbered starting at 1, not 0!
q2 1 q4 0 0 1 q1 0 q5 1 q3 0 1 1 0 • Example: R2,3 = {0, 001, 00101, 011, …} R1,4 = {01, 00101, …} R3,3 = {11, 100, …}
Another definition: Rki,j = { x | x is in Σ* and δ(qi,x) = qj, and for no u where 1 |u| < |x| and x = uv there is no case such that δ(qi,u) = qp where p>k} • In words: Rki,j is the set of all the strings that define a path in M from qi to qj but that passes through no state numbered greater than k. • Note that it may be true that i>k or j>k, only the intermediate states may not be >k.
q2 1 q4 0 0 1 q1 0 q5 1 q3 0 1 1 • Example: R42,3 = {0, 1000, 011, …} R12,3 = {0} 111 is not in R42,3 111 is not in R12,3 101 is not in R12,3 R52,3 = R2,3
Obeservations: 1) Rni,j = Ri,j 2) Rk-1i,j is a subset of Rki,j 3) L(M) = Rn1,q = R1,q 4) R0i,j = Easily computed from the DFA! 5) Rki,j = Rk-1i,k (Rk-1k,k)* Rk-1k,j U Rk-1i,j Now, you see the purpose of introducing k: So that we can write it as a RE
qi qj • Notes on 5: 5) Rki,j = Rk-1i,k (Rk-1k,k)* Rk-1k,jU Rk-1i,j • Consider paths represented by the strings in Rki,j : : • IF x is a string in Rki,jthen no state numbered > k is passed through when processing x and either: • qk is not passed through, i.e., x is in Rk-1i,j • qk is passed through one or more times, i.e., x is in Rk-1i,k (Rk-1k,k)* Rk-1k,j
Lemma 2: Let M = (Q, Σ, δ, q1, F) be a DFA. Then there exists a regular expression r such that L(M) = L(r). • Proof: First we will show (by induction on k) that for all i,j, and k, where 1 i,j n and 0 k n, that there exists a regular expression r such that L(r) = Rki,j . Basis: k=0 R0i,j contains single symbols, one for each transition from qi to qj, and possibly ε if i=j. case 1) No transitions from qi to qj and i != j r0i,j = Ø case 2) At least one (m 1) transition from qi to qj and i != j r0i,j= a1 + a2 + a3 + … + am where δ(qi, ap) = qj, for all 1 p m
case 3) No transitions from qi to qj and i = j r0i,j = ε case 4) At least one (m 1) transition from qi to qj and i = j r0i,j= a1 + a2 + a3 + … + am + ε where δ(qi, ap) = qj for all 1 p m Inductive Hypothesis: Suppose that Rk-1i,j can be represented by the regular expression rk-1i,j for all 1 i,j n, and some k1. Inductive Step: Consider Rki,j = Rk-1i,k (Rk-1k,k)* Rk-1k,j U Rk-1i,j . By the inductive hypothesis there exist regular expressions rk-1i,k , rk-1k,k , rk-1k,j , and rk-1i,j generating Rk-1i,k , Rk-1k,k , Rk-1k,j , and Rk-1i,j , respectively. Thus, if we let rki,j = rk-1i,k (rk-1k,k)* rk-1k,j + rk-1i,j then rki,j is a regular expression generating Rki,j ,i.e., L(rki,j) = Rki,j .
Finally, if F = {qj1, qj2, …, qjr}, then rn1,j1 + rn1,j2 + … + rn1,jr is a regular expression generating L(M). • Note: not only does this prove that the regular expressions generate the regular languages, but it also provides an algorithm for computing it!
1 0 1 q1 0 0/1 q2 q3 • Example: First table column is computed from the DFA. k = 0 k = 1 k = 2 rk1,1 ε rk1,2 0 rk1,3 1 rk2,1 0 rk2,2ε rk2,3 1 rk3,1Ø rk3,2 0 + 1 rk3,3ε
All remaining columns are computed from the previous column using the formula. r12,3 = r02,1 (r01,1 )* r01,3 + r02,3 = 0 (ε)* 1 + 1 = 01 + 1 k = 0 k = 1 k = 2 rk1,1 εε rk1,2 0 0 rk1,3 1 1 rk2,1 0 0 rk2,2εε + 00 rk2,3 1 1 + 01 rk3,1Ø Ø rk3,2 0 + 1 0 + 1 rk3,3εε
r21,3 = r11,2 (r12,2 )* r12,3 + r11,3 = 0 (ε + 00)* (1 + 01) + 1 = 0*1 k = 0 k = 1 k = 2 rk1,1 ε ε (00)* rk1,2 0 0 0(00)* rk1,3 1 1 0*1 rk2,1 0 0 0(00)* rk2,2ε ε + 00 (00)* rk2,3 1 1 + 01 0*1 rk3,1Ø Ø (0 + 1)(00)*0 rk3,2 0 + 1 0 + 1 (0 + 1)(00)* rk3,3ε ε ε + (0 + 1)0*1
To complete the regular expression, we compute: r31,2 + r31,3 k = 0 k = 1 k = 2 rk1,1 ε ε (00)* rk1,2 0 0 0(00)* rk1,3 1 1 0*1 rk2,1 0 0 0(00)* rk2,2ε ε + 00 (00)* rk2,3 1 1 + 01 0*1 rk3,1Ø Ø (0 + 1)(00)*0 rk3,2 0 + 1 0 + 1 (0 + 1)(00)* rk3,3ε ε ε + (0 + 1)0*1
Theorem: Let L be a language. Then there exists an a regular expression r such that L = L(r) if and only if there exits a DFA M such that L = L(M). • Proof: (if) Suppose there exists a DFA M such that L = L(M). Then by Lemma 2 there exists a regular expression r such that L = L(r). (only if) Suppose there exists a regular expression r such that L = L(r). Then by Lemma 1 there exists a DFA M such that L = L(M). • Corollary: The regular expressions define the regular languages. • Note: The conversion from a regular expression to a DFA and a program accepting L(r) is now complete, and fully automated!