200 likes | 292 Views
Distinct regular expressions may represent the same language: a + b and b + a represent the same language { a , b }. Two expressions R and S that represent the same language, L ( R ) = L ( S ), are considered equal . For example, a + a * = a *
E N D
Distinct regular expressions may represent the same language: • a+b and b+a represent the same language {a, b}. • Two expressions R and S that represent the same language, • L(R) = L(S), are considered equal. For example, a + a*= a* • because L(a + a*) = L(a*) = {, a, aa, aaa, …} Example. Simplify a regular expression: • +ab +abab(ab)* = (ab)* L ={, ab, abab, ababab, …} • aa (b*+a)+a(ab*+aa) = aa (b*+a) • a(a+b)*+aa(a+b)*+aaa(a+b)* = a(a+b)* To prove it we can show that aa(a+b)* a (a+b)(a+b)* a(a+b)* and aaa(a+b)* a(a+b)*. Then use A B AB =B
Properties of Regular Expressions 1) + properties R+T=T+R R+=+R=R R+R=R (R+S)+T=R+(S+T) L(R) L(T) = L(R) L(T) L(R) =L(R) L(R)L(R) = L(R) (L(R)L(S))L(T) = L(R)(L(S)L(T)) 2) ‘’ properties of regular expressions R=R= R=R=R (RS)T=R(ST) 3) distributive properties of regular expressions R(S+T)=RS+RT (S+T)R=SR +ST
4) closure properties *=*= R*=R*R*=(R*)*=R+R* R*=+ R*=(+ R*)*=(+R) R*=+R R* R*=(R+…+ Rk)* for any k1 R*=+ R+ R2+…+ Rk1+Rk R* for any k1 R*R=R R* (R+S)*=(R*+S*)*=(R*S*)*=(R*S)*R*=R*(SR*)* R(SR)*=(RS)*R (R*S)*=+(R+S)*S (RS*)*=+R(R+S)*
Each of this properties can be proved. • For example, let's prove that R* = R*R*. Proof. We need to prove two inclusion properties, i) R* R* R* and ii) R* R* R*. i) To prove R*R* R* it is sufficient to note that for any expression S (understand: for any set of strings, described by expression S) S S R* because R*. ii) To prove R* R* R* let's take arbitrary string w R*R* ……..(1) to prove that wR*. (1) w=uv, where uR* and vR*……………….. (2) (2) uRn and vRm for some integer n and m. Then w=uvRn+m R*. So, we proved both subset relations, i.e. two regular expressions are equal, R*=R*R*.
Let's prove one more property of regular expressions, (R+S)*=(R*S)*R*. Proof. We understand the equality of two expressions as the equality of two sets of strings denoted by these expressions. So, we are going to prove two subset relations: i) (R+S)* (R*S)*R* and ii) (R*S)*R* (R+S)*. i). Take arbitrary string w(R+S)* w(R+S)n for some integer n0. w=u1u2…un, where uiR+S for i=1, 2,…n. uiR+S uiR or uiS. Denote any substring uiR as ui=r and a substring uiS as ui=s. Then string w is a sequence of substrings r and s, like w=rrrssrsssrsrr = rrrssrsssrsrr= v1v2v3v4v5v6 t, , where vjR*S, w=v1v2…vkt, where tR So, any string w(R*S)*R*.
R* S S R In the same way we may prove that ii) (R*S)*R* (R+S)*. (left as an exercise). Example. Using properties of regular expressions prove the equality ba*(baa*)* = b( a + ba )*. We can prove the equality by using the property (R+S)* = R*(SR*)* Take R = a, S =ba, then (a+ba)*= (R+S)*= R*(SR*)* = a*(baa*)*
We can also establish simple rules, that can be used in proofs by ‘double inclusion’. Let A, B and C be sets of strings, then 1) A B AC BC 2) A B A* B* 3) B A AB and A BA Then we can prove a(a+b)*+aa(a+b)*+aaa(a+b)*= a(a+b)* by proving aa(a+b)* a(a+b)* and aaa(a+b)* a(a+b)* a a+b (a+b)* a(a+b)* (a+b)(a+b)* (a+b)*(a+b)* = (a+b)* aa(a+b)* a(a+b)* aaa(a+b)* aa(a+b)* a(a+b)*
Example. Prove that ( + a+b*a)*b* = (a + b)*. We can show two subset relations: i) ( + a+b*a)*b* (a + b)* and ii) (a + b)* ( + a+b*a)*b* i) ( + a+b*a)*b* (a + b)* b a+b b*(a+b)* (a+b)* (+ a+ b*a) (a+b)* a a+b(a+b)* b*a (a+b)* (a+b)* = (a+b)* (+ a+ b*a)*(a+b)**= (a+b)* Finally, (+ a+ b*a)* b* (a+b)* (a+b)*= (a + b)*
by the rule (R+S)*=(R*S)*R* (a + b)*= (b*a)* b* ii) (a + b)* ( + a+b*a)*b* by b*a ( +a+ b*a) ( + a+b*a)*b*
DFA w* Accept (wL) recognizer for L * Reject (wL) Deterministic Finite Automata (DFA) DFA is a recognizer for regular languages. They model the behavior of real computing devices which are designed to distinguish a correct input over a given alphabet. This abstract machine (DFA) is a device that reads an input string, one symbol at a time and decides whether the string belongs to the language or not (accept or reject).
DFA includes: • alphabet • finite nonempty set of “states” • transition function defined for each state and on each symbol • start states • accepting states The DFA can be depicted as a directed graph, where vertices represent states and each edge is labeled by the input symbol and dictates how the machine changes its state on reading this symbol.
b q1 q0 w a a q3 a, b b q2 “sink state” a, b Example. Construct a DFA to recognize the regular language over alphabet {a, b} described by regular expression L(ab*a). So, we need to find a DFA that is able to distinguish between strings, that belong to L(ab*a) and strings that do not. Transition function (q0, a)=q1, (q0, b)=q2 (q1, a)=q3, (q1, b)=q1 (q2, a)=q2, (q2, b)=q2 (q3, a)=q2, (q3, b)=q2
b q1 q0 w a a q3 a, b b q2 “sink state” a, b DFA L (ab*a) DFA consists of: • Alphabet ={a, b} • Set of states: Q={q0, q1, q2, q3} including q0 - start state q3 - accepting state • Transition function (qi, ak) • that assigns the nest state on • reading any ak for each qi Q
b q1 q0 w a a q3 a, b b b b a a a q2 q0 q1 q1 q1 q3 q2 abbaa abbaa abbaa abbaa abbaa “sink state” a, b abbaa (q0, a)=q1 (q1, b)=q1 (q3, a)=q2 (q1, b)=q1 (q1, a)=q3 Assume w = abbaa enters the DFA. By reading an input DFA goes through sequence of configurations:
b b a a a q2 q0 q1 q1 q1 q3 abbaa abbaa abbaa abbaa abbaa abbaa The configuration is a pair of a state and remaining input, (qi, w): (q0, abbaa) (q1, bbaa) (q1, baa) (q1, aa) (q3, a) (q2, ) A string is accepted by a DFA if and only if on the reading this string the DFA comes to the configuration (qa, ), where qa is an accepting state.
a b q0 q1 q2 q1 q3 q1 q2 q2 q2 q3 q2 q2 b q1 q0 w a a q3 a, b b q2 “sink state” a, b The string is accepted (recognized to be in the language) if DFA comes to accepting state after reading the input string Instead of using transition function (qi, ak) we can give the equivalent transition table.
Inductive proofs on strings. Usually induction is done on the length of a string |w| =n, or the number of repetition of some pattern. Prove that the regular expression R =(ab+b)*(+a) describes the language L {a, b}* , consisting of all strings that do not contain aa. Proof. To prove the equality of two sets of strings, L and R, we can prove two subset relations, RL and LR i) R L , we need to prove that for any string w [wR wL] Assume wR =(ab+b)*(+a) w (ab+b)n (+a), for some n0 Prove by induction on n0 , that for any w (ab+b)n (+a) w L.
Prove by induction on n0 , that for any w(ab+b)n(+a) w L. Basis.n=0, w(+a), we have either w = or w = a. In both cases w L, because it does not contain aa. IH. Assume that for n=k, k 0, any string from the set s(ab+b)k(+a) belongs to L. IS. We need to prove that any string w(ab+b)k+1(+a) belongs to L. w(ab+b)k+1(+a) w(ab+b)s , where s(ab+b)k(+a), either w=abs or w=bs , in both cases w does not contain aa since s does not contain aa by IH.
ii) Take any w L and prove that wR =(ab+b)*(+a). Let’s prove it by induction on the length |w|=n 0 Basis. n=0, w= , R =(ab+b)*(+a). IH. Assume that for n=k, k 0, we have that any string v L with length |v| k belongs to R. IS. We need to prove that any string from L with length k+1 belongs to R. Take w L, |w|=k+1. We can consider two cases: 1) w=as or 2) w=bs. In the first case w L s=bu, where u L, and by IH u R, since |u|= k1<k , i. e. u (ab+b)*(+a). Then w = abu ab(ab+b)*(+a) (ab+b)*(+a).
In the second case, w=bs, where s L and |s|=k, so sR =(ab+b)*(+a) by IH. Then w b(ab+b)*(+a) (ab+b)*(+a).