Regular Expressions

CSC 4170 Theory of Computation Regular Expressions Section 1.3 (also 1.1, 1.2)

1.3.a Union:L1  L2 = {x | xL1 or xL2} {Good,Bad}  {Boy,Girl} = {0,00,000,…} {1,11,111,…} = L  = Concatenation:L1 L2 = {xy | xL1 and yL2} {Good,Bad}{Boy,Girl} = {0,00,000,…}{1,11,111,…} = L  = Star: L* = {x1…xk | k0 and each xiL} {Boy,Girl}* = {0,00,000,…}* =  * = Regular operations

1.3.b We say that R is a regular expression (RE) iff R is one of the following: 1. a, whereais a symbol of the alphabet 2.  3.  4. (R1)(R2), where R1and R2 are RE 5. (R1)(R2), where R1 andR2 are RE 6. (R1)*, where R1 is a RE What language is represented by the expression: {a} {}  The union of the languages represented by R1 and R2 The concatenation of the languages represented by R1 and R2 The star of the language represented by R1 Regular expressions • The symbol  is often omitted in RE • Some parentheses can be omitted. • The precedence order for the operators is: • * (highest), (medium),  (lowest) Conventions:

1.3.c Regular languages A language is said to beregulariff it can be represented by a regular expression. Language Expression {11} {Boy, Girl, Good, Bad} {,0,00,000,0000,…} {0,00,000,0000,…} {,01,0101,010101,01010101,…} {x | x = 0k where k is a multiple of 2 or 3} {x | x is divisible by 8} {x | x MOD 4 = 3}

1.3.d Exercising reading regular expressions Expression Language (Good  Bad)(Boy  Girl) (Tom  Bob)_is_(good  bad) {Name_is_adjective| Name is an uppercase letter followed by zero or more lowercase letters, and adjective is a lowercase letter followed by zero or more lowercase letters} 0*10* (0 1)*101(0 1)* ((0 1)(0 1))*

1.3.e Regular languages and DFA-recognizable languages are the same • Theorem 1.54*A language is regular if and only if some NFA (DFA) • recognizes it. • In other words, • a) [The “only if” part] • For every regular expression there is an NFA that recognizes exactly • the language represented by that expression. • b) [The “if” part] • For every NFA there is a regular expression that represents exactly • the language recognized by that NFA.

1.3.f Constructing an NFA from a regular expression: Base cases Case of a, where a is a symbol of the alphabet. Case of  Case of 

1.3.g Constructing an NFA from a regular expression: Case of union Case of (R1)(R2), where R1and R2 are RE First, construct NFAs N1 and N2 from R1 and R2: Then, combine them in the following way: N1 N1 s1 s1 N2 N2 s2 s2

1.3.h Constructing an NFA from a regular expression: Case of concatenation Case of (R1) (R2), where R1and R2 are RE First, construct NFAs N1 and N2 from R1 and R2: N1 N2 s1 s2 Then, combine them in the following way: N1 N2 s1 s2

1.3.i Constructing an NFA from a regular expression: Case of star Case of (R1)*, where R1is a RE First, construct an NFA N1 from R1: N1 s1 Then, extend it in the following way: N1 s1

1.3.j Constructing an NFA from a regular expression: An example #(0 1)* (0 1)*  0 1 0 #  0   #  1 1  # 1 0 1

1.3.k great GNFA mother  father grand   grand  (great)* g r e a t g r e a t g r e a t g r a n d f a t h e r

1.3.l great About -transitions mother  father grand   grand (great)* Adding or removing -transitions does not change the recognized language

1.3.m great The same GNFA simplified mother  father grand  

1.3.n Ripping a state out mother  father (great)* grand 

1.3.o Eliminating parallel transitions mother  father   (great)*grand

1.3.p Again ripping out (  (great)*grand)(mother  father)

1.3.q1 How, exactly, to do ripping out L R T S Assume, we are ripping out the state r from a GNFA that has no parallel transitions. Let L be the label of the loop from r to r (if there is no loop, then L=).

1.3.q2 How, exactly, to do ripping out RL*T L R T S SL*T Assume, we are ripping out the state r from a GNFA that has no parallel transitions. Let L be the label of the loop from r to r (if there is no loop, then L=). 1. For every pair s1,s2 of states such that there is an E1-labeled transition from s1 to r and an E2-labeled transition from r to s2, add an R1L*R2-labeled transition from s1 to s2;

1.3.q3 How, exactly, to do ripping out RL*T SL*T Assume, we are ripping out the state r from a GNFA that has no parallel transitions. Let L be the label of the loop from r to r (if there is no loop, then L=). 1. For every pair s1,s2 of states such that there is an E1-labeled transition from s1 to r and an E2-labeled transition from r to s2, add an R1L*R2-labeled transition from s1 to s2; 2. Delete r together with all its incoming and outgoing transitions.

1.3.r How, exactly, to eliminate parallel transitions Whenever you see parallel transitions labeled with R1 and R2, R1 R2 Replace them by a transition labeled with R1R2. R1R2 Repeat until there are no parallel transitions remaining.

1.3.s From NFA to RE b a b b

1.3.t From NFA to RE: Step 1 Step 1: If there are incoming arrows to the start state, or the start state is an accept state, then add a new start state and connect it with an -arrow to the old start state. b a a  b b

1.3.u From NFA to RE: Step 2 Step 2: If there are more than one, or no, accept states, or there is an accept state that has outgoing arrows, then add a new accept state, make all the old accept states non-accept states and connect each of them with an -arrow to the new accept state. b a  a  b b 

1.3.v From NFA to RE: Step 3 Step 3: Eliminate all parallel transitions. b a  a  b b 

1.3.w1 From NFA to RE: Step 4 Step 4: While there are internal states (states that are neither the start nor the accept state), do the following: Step 4.1: Select an internal state and rip it out; Step 4.2: Eliminate all parallel transitions. aa b a  b ab  b

1.3.w2 From NFA to RE: Step 4 Step 4: While there are internal states (states that are neither the start nor the accept state), do the following: Step 4.1: Select an internal state and rip it out; Step 4.2: Eliminate all parallel transitions. baa a  b ab  b

1.3.w3 From NFA to RE: Step 4 Step 4: While there are internal states (states that are neither the start nor the accept state), do the following: Step 4.1: Select an internal state and rip it out; Step 4.2: Eliminate all parallel transitions. a(baa)* a(baa)*ab  b b(baa)* b(baa)*ab

1.3.w4 From NFA to RE: Step 4 Step 4: While there are internal states (states that are neither the start nor the accept state), do the following: Step 4.1: Select an internal state and rip it out; Step 4.2: Eliminate all parallel transitions. a(baa)* ba(baa)*ab  b(baa)* b(baa)*ab

1.3.w5 From NFA to RE: Step 4 Step 4: While there are internal states (states that are neither the start nor the accept state), do the following: Step 4.1: Select an internal state and rip it out; Step 4.2: Eliminate all parallel transitions. (b a(baa)*ab)(b(baa)*ab)* ( b(baa)*) a(baa)*

1.3.w6 From NFA to RE: Step 4 Step 4: While there are internal states (states that are neither the start nor the accept state), do the following: Step 4.1: Select an internal state and rip it out; Step 4.2: Eliminate all parallel transitions. ((b a(baa)*ab)(b(baa)*ab)* ( b(baa)*)(a(baa)*)

1.3.x From NFA to RE: Step 5 Step 5: Return the label of the only remaining arrow (if there is no arrow, return ). ((b a(baa)*ab)(b(baa)*ab)* ( b(baa)*)(a(baa)*) ((b a(baa)*ab)(b(baa)*ab)* ( b(baa)*)(a(baa)*) Claim: The resulting RE represents exactly the language recognized by the original NFA. This completes the proof of Theorem 1.54.

Regular Expressions