Chapter 3 Regular Expressions and Languages

Chapter 3 Regular Expressions and Languages Giza Pyramids, Egypt

Outline • 3.1 Regular Expressions • 3.2 Finite Automata & Regular Expressions • 3.3 Applications of RE’s • 3.4 Algebraic Laws for RE’s

3.1 Regular Expressions • Use of Regular expressions • The regular expression is a kind of generator for languages. • It offers a “declarative” way of expressing strings of symbols. • It defines all and only regular languages (a theorem).

3.1 Regular Expressions • Applications of Regular expressions • Used as commands for finding strings in Web browsers or text-formatting systems (such as UNIX grep commands)* • Used as lexical analyzer generator (such Lex or Flex) • A lexical analyzer breaks source programs into “tokens” (keywords, identifiers, signs, …) • The grep command searches files or standard input globally for lines matching a given regular expression, and prints them to the program's standard output.

3.1 Regular Expressions • Operators of Regular Expressions • Review of three operations on languages L and M: • Union --- L∪M = {x | xL or xM} • Concatenation --- LM = {xy | xL, yM} • Example --- L0 = {e}, L1 = L, L2 = LL, … • Closure (or star, or Kleene closure) --- L* = L0∪L1∪L2∪...

3.1 Regular Expressions • Example 3.1 --- (language) • f* = {e} because f0 = {e}. • If L = {0, 1}, then L0 = {e}, L1 = L, L2 = {00. 01, 10, 11}, … • If L is the set of all strings of 0’s, then it can be proved that L* is L itself (see the textbook for the proof).

3.1 Regular Expressions • 3.1.2 Building Regular Expressions • Recursive definition of a regular expression (RE)E and the language which it defines, L(E): • Basis: • Constants e and f are RE’s, defining languages {e} and f, respectively L(e) = {e}, L(f) = f. • If a is a symbol, then a is an RE, defining the language {a} L(a) = {a}. (note: ais of bold face) • A variable like L (capitalized and italic) represents any language.

3.1 Regular Expressions • 3.1.2 Building Regular Expressions • Recursive definition of an RE (cont’d): • Induction: given two RE’s E and F, then • E + F is an RE such that L(E + F) = L(E)∪L(F) (union) • EF is an RE such that L(EF) = L(E)L(F) (concatenation) • E* is an RE such that L(E*) = (L(E))*(closure) • (E) is an RE such that L((E)) = L(E) (parenthesization).

3.1 Regular Expressions • Examples (supplemental)(1/4) • RE F = 1 “expresses” the language L(1) = {1}. • RE E = 1* • Language expressed by E --- L = L(E) = L(1*) = (L(1))* = ({1})* (closure of language) = {e, 1, 11, 111, 1111, …} = {1n | n 0}

3.1 Regular Expressions • Examples (supplemental)(2/4) • RE G = 01* • Language expressed by G --- L = L(G) = L(01*) = L(0)L(1*) (concatenation) = {0}{e, 1, 11, 111, 1111, …} = {0, 01, 011, 0111, …} = {01n | n 0}

3.1 Regular Expressions • Examples (supplemental)(3/4) • RE H = 1 + 01* • Language expressed by H --- L = L(H) = L(1 + 01*) = L(1) U L(01*) = {1} U {0, 01, 011, 0111, …} = {1, 0, 01, 011, 0111, …} = {1}U{01n | n 0}

3.1 Regular Expressions • Examples (supplemental)(4/4) • RE K = e + a* • Language expressed by K --- L = L(K) = L(e + a*) = L(e) UL(a*) = {e} U {e, a, aa, aaa, …} = {e, a, aa, aaa, …} = L(a*) That is, we have the following RE equalities: e + a* = a* = a* +e

3.1 Regular Expressions • Example 3.2 • An RE defining a language of strings of alternating 0’s and 1’s (including none) is one of the two below: • (01)* + (10)* + 0(10)* + 1(01)* (0…1 1…0 0…0 1…1) • (e + 1)(01)*(e + 0) (Why? See the textbook.)

3.1 Regular Expressions • 3.1.3 Precedence of RE operators • Precedence • Highest --- * (closure) • Next--- . (concatenation) (left to right) • Last--- + (union) (left to right) • Use parentheses anywhere to resolve ambiguity

3.1 Regular Expressions • 3.1.3 Precedence of RE operators • Example 3.3: Three ways to interpret 01* + 1: • (0(1*)) + 1 by precedence above (= 01* + 1) • (01)* + 1 (another meaning) • 0(1* + 1) (a third meaning)

3.2 FA’s & RE’s • Theorems to be proved: • Every language defined by a DFA is also defined by an RE. • Every language defined by an RE is also defined by an e-NFA.

3.2 FA’s & RE’s • Relations of theorems (yellow lines are to be proved): NFA e-NFA RE DFA

3.2 FA’s & RE’s • 3.2.1 From DFA’s to RE’s • Theorem 3.4: If L = L(A) for some DFA A, then there is an RE R such that L = L(R). Proof. • Prove by constructing progressively string sets defined by a certain RE formRij(k)until the entire set of acceptable strings (i.e., language L(A)) is obtained. • Assume the states are {1, 2, ..., n} (1 is the start state).

3.2 FA’s & RE’s • Meaning of Rij(k) --- • RE Rij(k) is used to denote the set of strings w such that • Each w is the label of a path from state i to state j in DFA A; • the path has nointermediate node whose number is larger than k.

3.2 FA’s & RE’s • Meaning of Rij(k) --- • T construct Rij(k), we use induction, starting at k = 0 and stop at k = n (the largest state number). • Then, when k = n, i =1, and j specifies an accepting state, then Rij(k) defines a set of strings accepted by DFA A, with each string forming a path starting from the start state to the accepting state.

3.2 FA’s & RE’s • Meaning of Rij(k) --- Basis: • when k = 0, all state numbers  1, and so there is no intermediate state in path i to j, leading to 2 cases: (1) an arc (a transition) from i to j; (2) a path from i to i itself.

3.2 FA’s & RE’s • Meaning of Rij(k) --- Basis (cont’d): • If ij, only (1) is possible, leading to 3 cases: • no symbol for such a transition Rij(0) = f • one symbol a for the transition Rij(0) = a • multiple symbls a1, a2, ..., am for the transition, Rij(0) = a1 + a2 + ... + am

qi qj a qi qj a1+…+am qi qj 3.2 FA’s & RE’s • Meaning of Rij(k)(supplemental) --- Basis (cont’d) ij: • Rij(0) = f • Rij(0) = a • Rij(0) = a1 + a2 + ... + am

3.2 FA’s & RE’s • Meaning of Rij(k) --- Basis (cont’d): • If i=j, only (2) is possible, which means there exists at least a path e from i to i itself, in addition to the 3 cases: • no symbol for such a transition Rij(0)= e • one symbol a for the transition Rij(0)= e + a • multiple symbls a1, a2, ..., am for the transition, Rij(0) = e + a1 + a2 + ... + am

qi qi e+a1+…+am 3.2 FA’s & RE’s • Meaning of Rij(k)(supplemental) --- Basis (cont’d) i=j: • Rij(0) = e • Rij(0) = e + a • Rij(0) = e + a1 + a2 + ... + am e e + a qi

3.2 FA’s & RE’s Induction (to compute Rij(k)): • Suppose there is a path from i to j that goes through no state numbered higher than k. Then, two cases should be considered: • (1) the path does not go through k  Rij(k-1) • (2) the path goes through k at least once, then the path may be broken into 3 pieces: • through i to k without passing k Rik(k-1) • from k to k itself  (Rkk(k-1))* (recusive); • from k to j without passing k  Rkj(k-1).

(Rkk(k-1))* … circulating zero or more times … … i j k Rkj(k-1) Rik(k-1) 3.2 FA’s & RE’s • Illustration of paths represented by Rij(k):

3.2 FA’s & RE’s Induction (cont’d): • The three pieces are concatenated to be Rik(k-1)(Rkk(k-1))*Rkj(k-1). • Combining (1) & (2), we get the RE defining “all the paths from i to j that go through no state higher than k” as Rij(k) = Rij(k-1) + Rik(k-1)(Rkk(k-1))*Rkj(k-1).

3.2 FA’s & RE’s Induction (cont’d): • ConstructingRik(k)in the order of k until k = n for i = 1 • For each accepting state jk, we can get the union below as the result R1j1(n) +R1j2(n)+ ... +R1jm(n) where {j1, j2, ..., jm} are the set of final states, F. (End of proof of thereom)

0, 1 1 0 start 1 2 3.2 FA’s & RE’s • Example 3.5 • Convert the following DFA into an RE. • Rij(0) may be constructed to be (details in the next page):

0, 1 1 0 start 1 2 3.2 FA’s & RE’s • Example 3.5 (cont’d) • R11(0) = e + 1 because d(1, 1) = 1 & going back to itself • R12(0) = 0 because d(1, 0) = 2 (going out to state 2) • R21(0) = f because there is no path from state 2 to 1 • R22(0) = (e + 0 + 1) because d(2, 0) = 2 & d(2, 1) = 2 & going back to itself

0, 1 1 0 start 1 2 3.2 FA’s & RE’s • Example 3.5 (cont’d) • We can then compute all Rij(k) for k=1 & k=2. • However, we may alternatively compute onlynecessary terms of Rij(k) backward from the final states, to save time.

3.2 FA’s & RE’s • Example 3.5 (cont’d) • There is only one final state 2, so only have to compute R12(2) = R12(1) + R12(1)(R22(1))*R22(1). • Only have to compute R12(1) and R22(1), without computing R21(1) and R11(1). • To compute each of these terms, we need some RE equalities to simplify intermediate results.

3.2 FA’s & RE’s • Some equalities (R is an RE): • fR=Rf=f (f=annihilator for concatenation) • f + R = R + f = R (f=identity for union) • eR = Re = R (e = identity for concatenation) • (e + a)* = a* =(a + e)* • (e + a)a* = (ea* + aa*) = a* + a+ = a* a*(e + a) = (a*e + a*a) = a* + a+ = a* (all provable by easy deduction)

3.2 FA’s & RE’s • To compute R12(2) = R12(1) + R12(1)(R22(1))*R22(1) • R12(1) = R12(0) + R11(0)(R11(0))*R12(0) = 0 + (e + 1)(e + 1)*0 (by substitutions) = 0 + (e + 1)1*0 (by 4. in last page) = 0 + 1*0 (by 5.) = (e + 1*)0 (by distributive law) = 1*0 (by 4.)

3.2 FA’s & RE’s • To compute R12(2) = R12(1) + R12(1)(R22(1))*R22(1) • R22(1) = R22(0) + R21(0)(R11(0))*R12(0) = (e+ 0 + 1) + f(e + 1)*0 (by substitutions) = (e+ 0 + 1) + f (by 1.) = e+ 0 + 1 (by 2.)

3.2 FA’s & RE’s • To compute R12(2) = R12(1) + R12(1)(R22(1))*R22(1) • Finally, R12(2) = 1*0 +1*0(e + 0 + 1)*(e + 0 + 1) (by subst.) = 1*0 +1*0(0 + 1)*(e + 0 + 1) (by 4.) = 1*0 +1*0(0 + 1)* (by 6.) =1*0(e + (0 + 1)*) (by distributive law) = 1*0(0 + 1)*(by 4.)

0, 1 1 0 start 1 2 3.2 FA’s & RE’s • Check the correctness of the final result R12(2)= 1*0(0 + 1)* correct (by looking at the diagram directly)! • The above method also works for NFA ande-NFA.

R11 R11+ Q1S*P1 q2 q1 q2 q1 S . . . Q1 P1 . . . . . . . . . s Fig. 3.8 (partial) Fig. 3.7 (partial) 3.2 FA’s & RE’s • 3.2.2 Converting DFA’s to RE’s by Eliminating States --- another way • Step 1 – regard symbols on arcs as RE’s • Step 2 – conduct the following conversion • Step 3 – collect RE’s for all the final states (for a complete diagram of this, see textbook)

U R S Fig. 3.9 q q0 start T 3.2 FA’s & RE’s • Details of Step 3: (1) For each final state q, eliminate all states as above except the start state q0. (2) If qq0, then a 2-state automaton is left as follows: Corresponding RE is (R+SU*T)*SU* (provable by the first method)

R q0 Fig. 3.10 start 3.2 FA’s & RE’s (3) If q = q0, then perform one more state elimination to eliminate q, leaving only the start state q0 as follows (see an example in the next page): The corresponding RE is R*. (4)Collect the result for each final state derived as above to get the final result.

V X Y q q0 Z R11+ Q1S*P1 =X+YV*Z R11=X q0 q0 q0 q0 S= V . . . . . . . . . Q1=Y P1=Z . . . start s Fig. 3.7 (partial) Fig. 3.8 (partial) 3.2 FA’s & RE’s • An example of Case (3) in the last page (supplemental) • Regard q0 as two separate states, q as s, and apply Figs. 3.7 & 3.8 to eliminate q1 as follows:

R=X + YV*Z start q0 3.2 FA’s & RE’s • An example of Case (3) in the last page (supplemental) (cont’d) • Use the result R11 + Q1S*P1 = X + YV*Z as R in Fig. 3.10 like the following: • And the final result is R* = (X + YV*Z)*. • This will be used in your homework.

0, 1 1 0 start 1 U R 2 S q2 q1 start T 3.2 FA’s & RE’s • Example 3.5 revisited • Use the derivation for 2-state automaton described previously directly to be (R+SU*T)*SU* = (1 + 0(0 + 1)*f)*0(0 + 1)* = 1* 0(0 + 1)*correct!

0, 1 1 0, 1 0, 1 start A 0 + 1 B C D C B D 1 0 + 1 0 + 1 start A 3.2 FA’s & RE’s • Example 3.6 • Step 1: regard all symbols on the arcs as RE’s, we get

0 + 1 0 + 1 C B D R11 R11+ Q1S *P1 q1 q2 q2 q1 S . . . . . . . . . Q1 P1 . . . s 3.2 FA’s & RE’s • Example 3.6 • Step 2: to remove B, use the following conversion we get s = B, q1 = A, q2 = C, S = f, Q1 = 1, P1 = 0 + 1, R11 = f, so R11 + Q1S*P1 = f + 1f*(0 + 1) = 1e(0 + 1) = 1(0 + 1) 0 + 1 1 start A

0 + 1 0 + 1 1(0 + 1) start A D C R11 R11+ Q1S *P1 q1 q2 q2 q1 S . . . . . . . . . Q1 P1 . . . s 3.2 FA’s & RE’s • Example 3.6 (cont’d) • For final state D, we have to remove C further, resulting in s = C, q1 = A, q2 = D, S = f, Q1 =1(0 + 1), P1 =0 + 1, R11= f, so R11 + Q1S*P1 = f + 1(0 + 1)f*(0 + 1) = 1(0 + 1)(0 + 1)

0 + 1 1(0 + 1)(0 + 1) start A U R D S q2 q1 start T 3.2 FA’s & RE’s • Example 3.6 (cont’d) • By the following conversion, we get • R = (0 + 1), q1 =A, q2 =D, S = 1(0 + 1)(0 + 1),T = f, U= f so (R+SU*T)*SU*= (0+1+1(0 + 1)f*f)*(1(0 + 1)(0 + 1)) f* = (0 + 1)*1(0 + 1)(0 + 1)

0 + 1 0 + 1 1(0 + 1) start A C D R11 R11+ Q1S *P1 q1 q2 q2 q1 S . . . . . . . . . Q1 P1 . . . s 3.2 FA’s & RE’s • Example 3.6 (cont’d) • For the other final state C, starting from the following diagram We have to eliminate D by the following diagram

0 + 1 U R C 1(0 + 1) start A S q2 q1 start T 3.2 FA’s & RE’s • Example 3.6 (cont’d) • Since D has no successor (and C before it is a final state), deleting D has no effect to the other parts, resulting in the following diagram. And by the following conversion, we get (R+SU*T)*SU*= (0 + 1 + 1(0 + 1)f*f)*(1(0 + 1)) f* = (0 + 1)*1(0 + 1)

Chapter 3 Regular Expressions and Languages

Chapter 3 Regular Expressions and Languages

Presentation Transcript

Chapter 4: Regular Expressions

Languages, grammars, and regular expressions

CHAPTER 1 Regular Languages

Languages, Grammars, and Regular Expressions

Regular Expressions and Non-regular Languages

Regular Languages Regular Expressions Finite-State Automata

Lesson 3 – Regular Expressions

Regular expressions Regular languages

Regular Languages and Regular Expressions

Regular Languages and Expressions

Chapter 3 Properties of Finite Automata and Regular Expressions

Regular Expressions

Chapter Seven: Regular Expressions

Regular Expressions

Chapter 2. Regular Expressions and Automata 2.1 Regular Expressions

Chapter 4: Regular Expressions

Regular Expressions and Regular Languages

Chapter 3 Regular Languages and Regular Grammars

Regular Expressions and Non-regular Languages

Chapter 3 Regular languages and expressions

3. Regular Expressions and Languages

Regular expressions