Regular Grammars

• Formal definition of a regular expression.
• Languages associated with regular expressions.
• Introduction to regular grammars.
• Regular languages and homomorphism.
• The Chomsky Hierarchy.

Regular Expression
• The regular expressions over a set I are defined recursively by:
  • the symbol ∅ is a regular expression;
  • the symbol λ is a regular expression;
  • the symbol x is a regular expression whenever x ∈ I;
  • the symbols (AB), (A ∪ B), and A* are regular expressions whenever A and B are regular expressions.
• Each regular expression represents a set of strings:
  • ∅ represents the empty set, that is, the set with no strings;
  • λ represents the set {λ}, which contains only the empty string;
  • x represents the set {x} containing the string with one symbol x;
  • (AB) represents the concatenation of the sets represented by A and by B;
  • (A ∪ B) represents the union of the sets represented by A and by B;
  • A* represents the Kleene closure of the set represented by A.

Example: What are the strings in the regular sets specified by the regular expressions 10*, (10)*, 0 ∪ 01, 0(0 ∪ 1)*, and (0*1)*?
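One way to get a feel for these sets is to list their short members by brute force. The sketch below is an illustration, not part of the original text: it assumes Python's `re` module as a stand-in for the regular-expression notation (∪ written as `|`, λ as the empty string `''`), and the helper name `strings_up_to` is hypothetical. `re.fullmatch` is used so that the whole string, not just a prefix, must match.

```python
import itertools
import re

# The five expressions from the example, with ∪ rewritten as Python's "|".
EXPRESSIONS = ["10*", "(10)*", "0|01", "0(0|1)*", "(0*1)*"]


def strings_up_to(pattern: str, max_len: int, alphabet: str = "01") -> list[str]:
    """Return every string over `alphabet` of length <= max_len that the
    whole pattern matches, shortest first ('' stands for λ)."""
    matches = []
    for n in range(max_len + 1):
        for letters in itertools.product(alphabet, repeat=n):
            s = "".join(letters)
            if re.fullmatch(pattern, s):
                matches.append(s)
    return matches


if __name__ == "__main__":
    for expr in EXPRESSIONS:
        print(expr, strings_up_to(expr, 4))
```

The lists are only the members up to the chosen length, since most of these sets are infinite; for instance 10* yields 1, 10, 100, 1000 and (10)* yields λ, 10, 1010.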

Example: Find a regular expression that specifies each of these sets:
(a) the set of bit strings with even length;
(b) the set of bit strings ending with a 0 and not containing 11.
Solution:
(a) The set of strings of two bits is specified by the regular expression (00 ∪ 01 ∪ 10 ∪ 11). Consequently, the set of strings with even length is specified by (00 ∪ 01 ∪ 10 ∪ 11)*.
(b) Because such a string ends with 0 and contains no 11, every 1 in it is immediately followed by a 0. So the string must be the concatenation of one or more blocks, where each block is either 0 or 10. It follows that the regular expression (0 ∪ 10)*(0 ∪ 10) specifies the set of bit strings that do not contain 11 and end with a 0.
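The two answers can be sanity-checked by comparing each expression against a direct test of the property on every bit string up to a fixed length. This is a minimal sketch, assuming Python's `re` syntax for the expressions (∪ written as `|`); the names `agrees` and `all_bit_strings` are hypothetical, and a finite check like this supports the answer but does not prove it.

```python
import itertools
import re

EVEN_LENGTH = r"(00|01|10|11)*"   # answer (a)
ENDS_0_NO_11 = r"(0|10)*(0|10)"   # answer (b)


def all_bit_strings(max_len: int):
    """Yield every bit string of length 0..max_len."""
    for n in range(max_len + 1):
        for bits in itertools.product("01", repeat=n):
            yield "".join(bits)


def agrees(pattern: str, prop, max_len: int = 10) -> bool:
    """True if `pattern` matches exactly the strings satisfying `prop`
    among all bit strings up to max_len (a finite check, not a proof)."""
    return all(
        (re.fullmatch(pattern, s) is not None) == prop(s)
        for s in all_bit_strings(max_len)
    )


print(agrees(EVEN_LENGTH, lambda s: len(s) % 2 == 0))                      # True
print(agrees(ENDS_0_NO_11, lambda s: s.endswith("0") and "11" not in s))  # True
```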

[Figure: finite-state automata recognizing the sets represented by the symbol ∅, the symbol λ, and the symbol a whenever a ∈ I.]

Construct a nondeterministic finite-state automaton that recognizes the regular set 1∗ ∪ 01.
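A finite-state automaton can be simulated by tracking the set of states it might be in after each symbol. The sketch below assumes one possible hand-built machine for 1* ∪ 01 (states s0 to s3 are illustrative names, and this is not necessarily the automaton the construction in the text produces): the start state s0 is accepting because λ ∈ 1*, the 1s loop through s1, and the 01 branch passes through s2 to s3. This particular machine happens to have at most one move per state and symbol, but the set-based simulation works for any nondeterministic transition table.

```python
# One possible NFA for 1* ∪ 01 (states s0..s3 are illustrative, not from the text).
TRANSITIONS: dict[tuple[str, str], set[str]] = {
    ("s0", "1"): {"s1"},  # 1* branch: first 1
    ("s1", "1"): {"s1"},  # 1* branch: further 1s
    ("s0", "0"): {"s2"},  # 01 branch: read the 0
    ("s2", "1"): {"s3"},  # 01 branch: read the 1
}
START = "s0"
ACCEPTING = {"s0", "s1", "s3"}  # s0 is accepting because λ is in 1*


def accepts(word: str) -> bool:
    """Simulate the NFA by tracking the set of states it could be in."""
    current = {START}
    for symbol in word:
        current = set().union(*(TRANSITIONS.get((q, symbol), set()) for q in current))
        if not current:  # no possible state left: reject early
            return False
    return bool(current & ACCEPTING)


for w in ["", "1", "111", "01", "0", "10", "011"]:
    print(repr(w), accepts(w))
```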

Languages associated with regular expressions
• Definition: The language L(r) denoted by any regular expression r is defined by the following rules.
  • ∅ is a regular expression denoting the empty set;
  • λ is a regular expression denoting {λ};
  • for every a ∈ Σ, a is a regular expression denoting {a};
  • if r1 and r2 are regular expressions, then
    • L(r1 + r2) = L(r1) ∪ L(r2),
    • L(r1.r2) = L(r1)L(r2),
    • L((r1)) = L(r1),
    • L(r1*) = (L(r1))*.
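The rules above translate directly into a recursive function. Since L(r) is usually infinite, this sketch computes only the strings of length at most `max_len`. The tuple encoding of expressions and the name `lang` are assumptions made for illustration; parentheses need no separate case because the nesting of tuples already plays the role of L((r1)) = L(r1).

```python
# Hypothetical tuple encoding of regular expressions (not from the text):
#   ("empty",) is ∅, ("lambda",) is λ, ("sym", a) is a,
#   ("union", r1, r2) is r1 + r2, ("concat", r1, r2) is r1.r2, ("star", r1) is r1*
Regex = tuple


def lang(r: Regex, max_len: int) -> set[str]:
    """L(r) restricted to strings of length <= max_len ('' stands for λ)."""
    kind = r[0]
    if kind == "empty":
        return set()
    if kind == "lambda":
        return {""}
    if kind == "sym":
        return {r[1]} if len(r[1]) <= max_len else set()
    if kind == "union":      # L(r1 + r2) = L(r1) ∪ L(r2)
        return lang(r[1], max_len) | lang(r[2], max_len)
    if kind == "concat":     # L(r1.r2) = L(r1)L(r2)
        left, right = lang(r[1], max_len), lang(r[2], max_len)
        return {u + v for u in left for v in right if len(u + v) <= max_len}
    if kind == "star":       # L(r1*) = (L(r1))*, built up one factor at a time
        base = lang(r[1], max_len)
        result = {""}        # λ is always in the Kleene closure
        frontier = {""}
        while frontier:
            frontier = {u + v for u in frontier for v in base
                        if v and len(u + v) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown regex node: {kind!r}")


a, b = ("sym", "a"), ("sym", "b")
print(sorted(lang(("star", ("union", a, b)), 2)))  # (a + b)* up to length 2
```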

Example: Exhibit the language L(a*.(a + b)) in set notation.
Solution:
L(a*.(a + b)) = L(a*) L(a + b)          (by L(r1.r2) = L(r1)L(r2))
              = (L(a))* L(a + b)        (by L(r1*) = (L(r1))*)
              = (L(a))* (L(a) ∪ L(b))   (by L(r1 + r2) = L(r1) ∪ L(r2))
But (L(a))* = {λ, a, aa, aaa, ...}, L(a) = {a}, and L(b) = {b}, so L(a) ∪ L(b) = {a, b}.
Therefore L(a*.(a + b)) = {λ, a, aa, aaa, ...}{a, b} = {a, b, aa, ab, aaa, aab, ...}.
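The same derivation can be replayed step by step with finite sets, truncating the infinite set (L(a))* to strings of length at most 3. This is a minimal illustration; the names `MAX_LEN`, `concat`, and the `L_` variables are hypothetical, and the empty string `''` stands for λ.

```python
MAX_LEN = 3  # truncate infinite sets to strings of length <= 3


def concat(A: set[str], B: set[str]) -> set[str]:
    """Concatenation AB = {uv : u in A, v in B}, truncated to MAX_LEN."""
    return {u + v for u in A for v in B if len(u + v) <= MAX_LEN}


L_a, L_b = {"a"}, {"b"}

# (L(a))* truncated: λ, a, aa, aaa
L_a_star = {"a" * n for n in range(MAX_LEN + 1)}

# L(a + b) = L(a) ∪ L(b)
L_a_plus_b = L_a | L_b

# L(a*.(a + b)) = L(a*) L(a + b)
result = concat(L_a_star, L_a_plus_b)
print(sorted(result, key=lambda w: (len(w), w)))
# ['a', 'b', 'aa', 'ab', 'aaa', 'aab']
```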

Example: For Σ = {a, b}, the expression r = (a + b)*(a + bb) is a regular expression. Write its language.
Solution: (r is a regular expression because it is built from the symbols a and b using +, concatenation, and *.)
L(r) = L((a + b)*(a + bb))
     = L((a + b)*) L(a + bb)
     = (L(a + b))* (L(a) ∪ L(bb))
     = (L(a) ∪ L(b))* (L(a) ∪ L(bb))
     = {a, b}* {a, bb}
Here {a, b}* = {λ, a, b, aa, ab, ba, bb, ...} is the set of all strings on {a, b}, and L(a) ∪ L(bb) = {a, bb}.
So L((a + b)*(a + bb)) = {λ, a, b, aa, ab, ba, bb, ...}{a, bb} = {a, bb, aa, abb, ba, bbb, ...}.
In other words, L(r) is the set of all strings on {a, b} terminated by either a or bb.
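The closing claim, that L(r) is exactly the set of strings on {a, b} ending in a or bb, can be checked on all short strings. The sketch assumes Python's `re` syntax, with + written as `|`; `check` and `ends_with_a_or_bb` are hypothetical names. The last line also compares r with the smaller language ({a}* ∪ {b}*){a, bb}, which misses mixed strings such as aba.

```python
import itertools
import re

R = r"(a|b)*(a|bb)"        # (a + b)*(a + bb)
WRONG = r"(a*|b*)(a|bb)"   # ({a}* ∪ {b}*){a, bb}, a strictly smaller language


def ends_with_a_or_bb(w: str) -> bool:
    return w.endswith("a") or w.endswith("bb")


def check(max_len: int = 8) -> bool:
    """Compare R with the 'ends in a or bb' property on all short strings."""
    for n in range(max_len + 1):
        for letters in itertools.product("ab", repeat=n):
            w = "".join(letters)
            if (re.fullmatch(R, w) is not None) != ends_with_a_or_bb(w):
                return False
    return True


print(check())  # True
print(re.fullmatch(R, "aba") is not None, re.fullmatch(WRONG, "aba") is not None)  # True False
```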

Example: Write the language for the regular expression r = (aa)*(bb)*b.
Solution:
L(r) = L((aa)*(bb)*b)
     = L((aa)*) L((bb)*) L(b)
     = (L(aa))* (L(bb))* L(b)
     = {aa}* {bb}* {b}
     = {λ, aa, aaaa, aaaaaa, ...}{λ, bb, bbbb, bbbbbb, ...}{b}
     = {a^{2n} : n ≥ 0}{b^{2m} : m ≥ 0}{b}
     = {a^{2n} b^{2m+1} : n ≥ 0, m ≥ 0}
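To check the final set-builder form, the sketch below generates {a^{2n} b^{2m+1}} directly and compares it with every string on {a, b} that the expression matches, up to length 9. It assumes Python's `re` for the expression; `closed_form` and `regex_matches` are hypothetical names.

```python
import itertools
import re

R = r"(aa)*(bb)*b"
MAX_LEN = 9


def closed_form(max_len: int) -> set[str]:
    """{a^(2n) b^(2m+1) : n, m >= 0}, truncated to length <= max_len."""
    return {
        "a" * (2 * n) + "b" * (2 * m + 1)
        for n in range(max_len // 2 + 1)
        for m in range(max_len // 2 + 1)
        if 2 * n + 2 * m + 1 <= max_len
    }


def regex_matches(max_len: int) -> set[str]:
    """All strings on {a, b} of length <= max_len that R fully matches."""
    return {
        "".join(p)
        for k in range(max_len + 1)
        for p in itertools.product("ab", repeat=k)
        if re.fullmatch(R, "".join(p))
    }


print(closed_form(MAX_LEN) == regex_matches(MAX_LEN))  # True
```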

Regular Grammars
Regular grammars are of two types:
1) Right-linear grammar: A grammar G = (V, T, S, P) is said to be right-linear if all productions are of the form
   A → xB,  A → x,
   where A, B ∈ V and x ∈ T*.
2) Left-linear grammar: A grammar G = (V, T, S, P) is said to be left-linear if all productions are of the form
   A → Bx,  A → x,
   where A, B ∈ V and x ∈ T*.
A grammar is regular if it is either right-linear or left-linear. Here:
• V: finite set of non-terminals (variables, written in upper case);
• T: finite set of terminals (written in lower case);
• S: start symbol;
• P: finite set of production (rewriting) rules of the form A → xB or A → x (right-linear), or A → Bx or A → x (left-linear), where A and B stand for non-terminals and x stands for a (possibly empty) string of terminals.
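The two definitions differ only in where the single variable may appear on the right-hand side, which makes them easy to test mechanically. In this sketch a production is a pair (A, rhs) with rhs a tuple of symbols; any symbol listed in `variables` is a non-terminal, every other symbol is a terminal, and λ is the empty tuple. The representation and the function names are assumptions for illustration; the sample grammars are the ones from the examples that follow.

```python
# Hypothetical representation: a production is (head, rhs), where rhs is a tuple
# of symbols; symbols in `variables` are non-terminals, all others terminals,
# and λ is the empty tuple ().
Production = tuple[str, tuple[str, ...]]


def is_right_linear(productions: list[Production], variables: set[str]) -> bool:
    """True if every rule is A -> xB or A -> x with x a string of terminals."""
    for _, rhs in productions:
        # allow one variable, but only as the last symbol
        body = rhs[:-1] if rhs and rhs[-1] in variables else rhs
        if any(sym in variables for sym in body):
            return False
    return True


def is_left_linear(productions: list[Production], variables: set[str]) -> bool:
    """True if every rule is A -> Bx or A -> x with x a string of terminals."""
    for _, rhs in productions:
        # allow one variable, but only as the first symbol
        body = rhs[1:] if rhs and rhs[0] in variables else rhs
        if any(sym in variables for sym in body):
            return False
    return True


def is_regular(productions: list[Production], variables: set[str]) -> bool:
    return is_right_linear(productions, variables) or is_left_linear(productions, variables)


# G1: S -> abS | a
G1 = [("S", ("a", "b", "S")), ("S", ("a",))]
# G2: S -> S1ab, S1 -> S1ab | S2, S2 -> a
G2 = [("S", ("S1", "a", "b")), ("S1", ("S1", "a", "b")), ("S1", ("S2",)), ("S2", ("a",))]
# G: S -> A, A -> aB | λ, B -> Ab
G = [("S", ("A",)), ("A", ("a", "B")), ("A", ()), ("B", ("A", "b"))]

print(is_right_linear(G1, {"S"}), is_left_linear(G2, {"S", "S1", "S2"}))  # True True
print(is_regular(G, {"S", "A", "B"}))  # False
```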

Example:
1) The grammar G1 = ({S}, {a, b}, S, P1), with P1 given as S → abS | a, is right-linear.
2) The grammar G2 = ({S, S1, S2}, {a, b}, S, P2), with productions S → S1ab, S1 → S1ab | S2, S2 → a, is left-linear.
Both G1 and G2 are regular grammars.
Example: Write the regular expression for the language generated by each of these grammars.
1) S ⇒ abS ⇒ ababS ⇒ ababa, so r = (ab)*a.
2) S ⇒ S1ab ⇒ S1abab ⇒ S2abab ⇒ aabab. Because S → S1ab contributes at least one ab after the leading a, r = aab(ab)*.
Example: The grammar G = ({S, A, B}, {a, b}, S, P) has productions S → A, A → aB | λ, B → Ab. Is it a regular grammar?
Solution: No. The production A → aB is right-linear but not left-linear, while B → Ab is left-linear but not right-linear, so G is neither right-linear nor left-linear and therefore is not a regular grammar.
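Listing the short strings each grammar generates is a quick way to check a proposed regular expression. The sketch below performs a breadth-first search over leftmost derivations, discarding sentential forms that already contain more terminals than the length bound (terminals are never removed, so those forms cannot lead to a short string). Because every sentential form of a right- or left-linear grammar contains at most one variable, the search terminates for these grammars. The function name `generate` and the tuple representation of productions are assumptions for illustration. The output for G2 contains aab but not a, consistent with aab(ab)*.

```python
from collections import deque


def generate(productions, variables, start, max_len):
    """Terminal strings of length <= max_len derivable from `start`,
    found by breadth-first search over leftmost derivations."""
    rules: dict[str, list[tuple[str, ...]]] = {}
    for head, rhs in productions:
        rules.setdefault(head, []).append(rhs)

    results: set[str] = set()
    seen: set[tuple[str, ...]] = set()
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        if form in seen:
            continue
        seen.add(form)
        # terminals are never erased, so too many terminals means no short result
        if sum(1 for sym in form if sym not in variables) > max_len:
            continue
        idx = next((i for i, sym in enumerate(form) if sym in variables), None)
        if idx is None:  # no variables left: a generated string
            results.add("".join(form))
            continue
        for rhs in rules.get(form[idx], []):
            queue.append(form[:idx] + rhs + form[idx + 1:])
    return results


# G1: S -> abS | a      G2: S -> S1ab, S1 -> S1ab | S2, S2 -> a
G1 = [("S", ("a", "b", "S")), ("S", ("a",))]
G2 = [("S", ("S1", "a", "b")), ("S1", ("S1", "a", "b")), ("S1", ("S2",)), ("S2", ("a",))]

print(sorted(generate(G1, {"S"}, "S", 7), key=len))              # ['a', 'aba', 'ababa', 'abababa']
print(sorted(generate(G2, {"S", "S1", "S2"}, "S", 7), key=len))  # ['aab', 'aabab', 'aababab']
```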

Homomorphism: Suppose Σ and T are alphabets. Then a function h : Σ → T* is called a homomorphism. In words, a homomorphism is a substitution in which each single letter is replaced with a string. The domain of h is extended to strings in the obvious way: if w = a1a2a3…an, then h(w) = h(a1)h(a2)h(a3)…h(an).
Remark: If L is a language on Σ, then its homomorphic image is defined as h(L) = {h(w) : w ∈ L}.
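The extension of h from letters to strings is a single concatenation over the letters of w, and h(L) applies it to every word of L. A minimal sketch for finite languages, assuming a Python dict maps each letter of Σ to its image string; `extend` and `image` are hypothetical names. The extension gives h(λ) = λ, since concatenating no strings yields the empty string.

```python
def extend(h: dict[str, str]):
    """Extend h: Σ -> T* from letters to strings: h(a1...an) = h(a1)...h(an)."""
    def h_on_strings(w: str) -> str:
        return "".join(h[letter] for letter in w)
    return h_on_strings


def image(h: dict[str, str], language: set[str]) -> set[str]:
    """Homomorphic image h(L) = {h(w) : w in L} of a finite language L."""
    h_w = extend(h)
    return {h_w(w) for w in language}


# The first example below: h(a) = ab, h(b) = bbc, L = {aa, aba}
h = {"a": "ab", "b": "bbc"}
print(sorted(image(h, {"aa", "aba"})))  # ['abab', 'abbbcab']
```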

Example: Let Σ = {a, b} and T = {a, b, c}, and define h by h(a) = ab, h(b) = bbc. Find the homomorphic image h(L) of L = {aa, aba}.
• Solution:
• h(aa) = h(a)h(a) = abab,
• h(aba) = h(a)h(b)h(a) = abbbcab,
• so the homomorphic image of L = {aa, aba} is the language h(L) = {abab, abbbcab}.
Example: Let Σ = {a, b} and T = {b, c, d}, and define h by h(a) = dbcc, h(b) = bdc. If L is the regular language denoted by r = (a + b*)(aa)*, find the regular language h(L).
• Solution: Since r = (a + b*)(aa)*, replacing each a by dbcc and each b by bdc gives
• r' = (dbcc + (bdc)*)(dbccdbcc)*, which denotes the regular language h(L).
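For the second example, one direction of the claim can be checked by brute force: every word of L(r) up to length 6 is mapped by h and tested against r'. The sketch assumes Python's `re` syntax (+ written as `|`) and hypothetical names `h` and `words_in_L`; it checks only that h(w) lies in L(r') for short w, not the converse.

```python
import itertools
import re

H = {"a": "dbcc", "b": "bdc"}          # h(a) = dbcc, h(b) = bdc
R = r"(a|b*)(aa)*"                     # r  = (a + b*)(aa)*
R_IMAGE = r"(dbcc|(bdc)*)(dbccdbcc)*"  # r' = (dbcc + (bdc)*)(dbccdbcc)*


def h(w: str) -> str:
    """Apply the homomorphism letter by letter."""
    return "".join(H[letter] for letter in w)


def words_in_L(max_len: int):
    """Yield every w in L(r) with |w| <= max_len, found by brute force."""
    for n in range(max_len + 1):
        for letters in itertools.product("ab", repeat=n):
            w = "".join(letters)
            if re.fullmatch(R, w):
                yield w


images = {w: h(w) for w in words_in_L(6)}
print(all(re.fullmatch(R_IMAGE, hw) for hw in images.values()))  # True
```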

The Chomsky Hierarchy
Noam Chomsky, a founder of formal language theory, provided an initial classification of languages into four types, type 0, 1, 2, and 3:
• Type 0: the languages generated by unrestricted grammars, that is, the recursively enumerable languages, denoted L_RE.
• Type 1: the context-sensitive languages, denoted L_CS.
• Type 2: the context-free languages, denoted L_CF.
• Type 3: the regular languages, denoted L_REG.

The relationship between these types is shown in the diagram: L_REG ⊆ L_CF ⊆ L_CS ⊆ L_RE.

Home Work
• Q1: Find all strings in L((a + b)*b(a + ab)*) of length less than four.
• Q2: If r = ((0 + 1)(0 + 1)*)*00(0 + 1)*, give the language L(r).
• Q3: Give regular expressions for the following languages on {a, b, c}:
  a) all strings containing exactly one a;
  b) all strings containing no more than three a's;
  c) all strings that contain at least one occurrence of each symbol in a given set.
• Q4: Find regular grammars that generate the languages L(aa*(ab + a)*) and L((aab)*ab).
• Q5: What are the strings generated by the regular expressions 10*, (10)*, (0 + 01), 0(0 + 1)*, and (0*1)*?
• Q6: Solve questions 3, 4, 5, and 6 on page DMA-826.
