Example. Simplify a regular expression:

Distinct regular expressions may represent the same language: • a+b and b+a represent the same language {a, b}. • Two expressions R and S that represent the same language, • L(R) = L(S), are considered equal. For example, a + a*= a* • because L(a + a*) = L(a*) = {, a, aa, aaa, …} Example. Simplify a regular expression: •  +ab +abab(ab)* = (ab)* L ={, ab, abab, ababab, …} • aa (b*+a)+a(ab*+aa) = aa (b*+a) • a(a+b)*+aa(a+b)*+aaa(a+b)* = a(a+b)* To prove it we can show that aa(a+b)*  a (a+b)(a+b)* a(a+b)* and aaa(a+b)*  a(a+b)*. Then use A B  AB =B

Properties of Regular Expressions 1) + properties R+T=T+R R+=+R=R R+R=R (R+S)+T=R+(S+T)  L(R)  L(T) = L(R) L(T)  L(R)  =L(R)  L(R)L(R) = L(R)  (L(R)L(S))L(T) = L(R)(L(S)L(T)) 2) ‘’ properties of regular expressions R=R= R=R=R (RS)T=R(ST) 3) distributive properties of regular expressions R(S+T)=RS+RT (S+T)R=SR +ST

4) closure properties *=*= R*=R*R*=(R*)*=R+R* R*=+ R*=(+ R*)*=(+R) R*=+R R* R*=(R+…+ Rk)* for any k1 R*=+ R+ R2+…+ Rk1+Rk R* for any k1 R*R=R R* (R+S)*=(R*+S*)*=(R*S*)*=(R*S)*R*=R*(SR*)* R(SR)*=(RS)*R (R*S)*=+(R+S)*S (RS*)*=+R(R+S)*

Each of this properties can be proved. • For example, let's prove that R* = R*R*. Proof. We need to prove two inclusion properties, i) R* R* R* and ii) R*  R* R*. i) To prove R*R* R* it is sufficient to note that for any expression S (understand: for any set of strings, described by expression S) S  S R* because  R*. ii) To prove R* R* R* let's take arbitrary string w R*R* ……..(1) to prove that wR*. (1)  w=uv, where uR* and vR*……………….. (2) (2)  uRn and vRm for some integer n and m. Then w=uvRn+m  R*. So, we proved both subset relations, i.e. two regular expressions are equal, R*=R*R*.

Let's prove one more property of regular expressions, (R+S)*=(R*S)*R*. Proof. We understand the equality of two expressions as the equality of two sets of strings denoted by these expressions. So, we are going to prove two subset relations: i) (R+S)* (R*S)*R* and ii) (R*S)*R* (R+S)*. i). Take arbitrary string w(R+S)*  w(R+S)n for some integer n0.  w=u1u2…un, where uiR+S for i=1, 2,…n. uiR+S uiR or uiS. Denote any substring uiR as ui=r and a substring uiS as ui=s. Then string w is a sequence of substrings r and s, like w=rrrssrsssrsrr = rrrssrsssrsrr= v1v2v3v4v5v6 t, , where vjR*S, w=v1v2…vkt, where tR So, any string w(R*S)*R*.

R* S S R In the same way we may prove that ii) (R*S)*R* (R+S)*. (left as an exercise). Example. Using properties of regular expressions prove the equality ba*(baa*)* = b( a + ba )*. We can prove the equality by using the property (R+S)* = R*(SR*)* Take R = a, S =ba, then (a+ba)*= (R+S)*= R*(SR*)* = a*(baa*)*

We can also establish simple rules, that can be used in proofs by ‘double inclusion’. Let A, B and C be sets of strings, then 1) A  B  AC BC 2) A  B  A* B* 3)   B  A  AB and A  BA Then we can prove a(a+b)*+aa(a+b)*+aaa(a+b)*= a(a+b)* by proving aa(a+b)* a(a+b)* and aaa(a+b)* a(a+b)* a  a+b  (a+b)* a(a+b)* (a+b)(a+b)*  (a+b)*(a+b)* = (a+b)* aa(a+b)* a(a+b)* aaa(a+b)* aa(a+b)*  a(a+b)*

Example. Prove that ( + a+b*a)*b* = (a + b)*. We can show two subset relations: i) ( + a+b*a)*b*  (a + b)* and ii) (a + b)*  ( + a+b*a)*b* i) ( + a+b*a)*b*  (a + b)* b a+b b*(a+b)*   (a+b)* (+ a+ b*a) (a+b)* a a+b(a+b)* b*a (a+b)* (a+b)* = (a+b)* (+ a+ b*a)*(a+b)**= (a+b)* Finally, (+ a+ b*a)* b* (a+b)* (a+b)*= (a + b)*

by the rule (R+S)*=(R*S)*R* (a + b)*= (b*a)* b* ii) (a + b)*  ( + a+b*a)*b* by b*a  ( +a+ b*a)  ( + a+b*a)*b*

DFA w* Accept (wL) recognizer for L * Reject (wL) Deterministic Finite Automata (DFA) DFA is a recognizer for regular languages. They model the behavior of real computing devices which are designed to distinguish a correct input over a given alphabet. This abstract machine (DFA) is a device that reads an input string, one symbol at a time and decides whether the string belongs to the language or not (accept or reject).

DFA includes: • alphabet • finite nonempty set of “states” • transition function defined for each state and on each symbol • start states • accepting states The DFA can be depicted as a directed graph, where vertices represent states and each edge is labeled by the input symbol and dictates how the machine changes its state on reading this symbol.

b q1 q0 w a a q3 a, b b q2 “sink state” a, b Example. Construct a DFA to recognize the regular language over alphabet {a, b} described by regular expression L(ab*a). So, we need to find a DFA that is able to distinguish between strings, that belong to L(ab*a) and strings that do not. Transition function (q0, a)=q1, (q0, b)=q2 (q1, a)=q3, (q1, b)=q1 (q2, a)=q2, (q2, b)=q2 (q3, a)=q2, (q3, b)=q2

b q1 q0 w a a q3 a, b b q2 “sink state” a, b DFA  L (ab*a) DFA consists of: • Alphabet ={a, b} • Set of states: Q={q0, q1, q2, q3} including q0 - start state q3 - accepting state • Transition function (qi, ak) • that assigns the nest state on • reading any ak  for each qi Q

b q1 q0 w a a q3 a, b b b b a a a q2 q0 q1 q1 q1 q3 q2 abbaa abbaa abbaa abbaa abbaa “sink state” a, b abbaa (q0, a)=q1 (q1, b)=q1 (q3, a)=q2 (q1, b)=q1 (q1, a)=q3 Assume w = abbaa enters the DFA. By reading an input DFA goes through sequence of configurations:

b b a a a q2 q0 q1 q1 q1 q3 abbaa abbaa abbaa abbaa abbaa abbaa The configuration is a pair of a state and remaining input, (qi, w): (q0, abbaa)  (q1, bbaa)  (q1, baa)  (q1, aa)  (q3, a)  (q2, ) A string is accepted by a DFA if and only if on the reading this string the DFA comes to the configuration (qa, ), where qa is an accepting state.

a b q0 q1 q2 q1 q3 q1 q2 q2 q2 q3 q2 q2 b q1 q0 w a a q3 a, b b q2 “sink state” a, b The string is accepted (recognized to be in the language) if DFA comes to accepting state after reading the input string Instead of using transition function (qi, ak) we can give the equivalent transition table.

Inductive proofs on strings. Usually induction is done on the length of a string |w| =n, or the number of repetition of some pattern. Prove that the regular expression R =(ab+b)*(+a) describes the language L {a, b}* , consisting of all strings that do not contain aa. Proof. To prove the equality of two sets of strings, L and R, we can prove two subset relations, RL and LR i) R L , we need to prove that for any string w [wR  wL] Assume wR =(ab+b)*(+a)  w (ab+b)n (+a), for some n0 Prove by induction on n0 , that for any w (ab+b)n (+a)  w  L.

Prove by induction on n0 , that for any w(ab+b)n(+a)  w  L. Basis.n=0, w(+a), we have either w =  or w = a. In both cases w  L, because it does not contain aa. IH. Assume that for n=k, k 0, any string from the set s(ab+b)k(+a) belongs to L. IS. We need to prove that any string w(ab+b)k+1(+a) belongs to L. w(ab+b)k+1(+a) w(ab+b)s , where s(ab+b)k(+a), either w=abs or w=bs , in both cases w does not contain aa since s does not contain aa by IH.

ii) Take any w  L and prove that wR =(ab+b)*(+a). Let’s prove it by induction on the length |w|=n 0 Basis. n=0, w= ,  R =(ab+b)*(+a). IH. Assume that for n=k, k 0, we have that any string v L with length |v| k belongs to R. IS. We need to prove that any string from L with length k+1 belongs to R. Take w L, |w|=k+1. We can consider two cases: 1) w=as or 2) w=bs. In the first case w L  s=bu, where u L, and by IH u R, since |u|= k1<k , i. e. u (ab+b)*(+a). Then w = abu ab(ab+b)*(+a)  (ab+b)*(+a).

In the second case, w=bs, where s L and |s|=k, so sR =(ab+b)*(+a) by IH. Then w b(ab+b)*(+a)  (ab+b)*(+a).

Example. Simplify a regular expression:

Example. Simplify a regular expression:

Presentation Transcript

Simplify each expression.

Matlab Regular Expression

Regular Expression 1. What is regular expression?

Regular Expression

Regular Expression

^Regular Expression$

Regular Expression - Intro

Regular Expression

Simplify the expression.

Regular Expression

Regular Expression

Regular Expression

Regular Expression

Regular Expression

Regular Expression (1)

Regular Expression Support