This lecture covers the fundamental concepts of strings and languages, including string length, equality, substrings, concatenation, and operations on languages. It also introduces regular languages and regular expressions.
String • A string is a finite sequence of symbols. For example, string (symbols s, t, r, i, n, g), CS4384 (symbols C, S, 4, 3, 8), and 101001 (symbols 1, 0). • Symbols are taken from an alphabet. • An alphabet is a finite set of symbols.
Examples of Alphabet • {a, b, c, ..., x, y, z} (Roman alphabet) • {0, 1, ..., 9} • {0, 1} (binary alphabet)
Length of a String • The length of a string x, denoted |x|, is the number of symbols in x, counted with repetition. • For example, |string| = 6, |CS5400| = 6, and |101001| = 6. • The empty string, denoted ε, is the string with no symbols, so |ε| = 0.
Equal • Two strings $x_1x_2\cdots x_n$ and $y_1y_2\cdots y_m$ are equal if and only if (1) n = m and (2) $x_i = y_i$ for all $i = 1, \ldots, n$. • For example, 01 ≠ 010 (different lengths) and 1010 ≠ 1110 (they differ in the second symbol).
Substring • s is a substring of x if there exist strings y and z such that x = ysz. • In particular, when x = sz (y = ε), s is called a prefix of x; when x = ys (z = ε), s is called a suffix of x. • For example, CS is a prefix of CS5400 and 5400 is a suffix of CS5400.
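The definitions of prefix, suffix and substring can be read directly as slicing operations. The following is a minimal Python sketch, assuming strings are Python `str` values and ε is the empty string `''`; the function names `prefixes`, `suffixes` and `substrings` are hypothetical, chosen for illustration.

```python
def prefixes(x: str) -> set[str]:
    """All prefixes s of x, i.e. x = sz for some z; '' stands for ε."""
    return {x[:i] for i in range(len(x) + 1)}


def suffixes(x: str) -> set[str]:
    """All suffixes s of x, i.e. x = ys for some y."""
    return {x[i:] for i in range(len(x) + 1)}


def substrings(x: str) -> set[str]:
    """All substrings s of x, i.e. x = ysz with y = x[:i], s = x[i:j], z = x[j:]."""
    n = len(x)
    return {x[i:j] for i in range(n + 1) for j in range(i, n + 1)}


print("CS" in prefixes("CS5400"), "5400" in suffixes("CS5400"))  # True True
```

Note that ε and x itself are always counted as prefixes, suffixes and substrings of x, exactly as the definition allows.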
Concatenation • The concatenation of two strings x and y is the string xy, i.e., x followed by y. • For example, CS5400 is the concatenation of CS and 5400. • In particular, we write $xx = x^2$, $xxx = x^3$, $xxxx = x^4$, ..., and define $x^0 = \varepsilon$. • For example, $101010 = (10)^3$ and $(10)^0 = \varepsilon$.
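The power notation $x^k$ is just repeated concatenation starting from $x^0 = \varepsilon$. A small Python sketch, assuming ε is represented by `''` (the helper name `power` is hypothetical):

```python
def power(x: str, k: int) -> str:
    """Return x^k, the concatenation of k copies of x; x^0 is ε ('')."""
    if k < 0:
        raise ValueError("k must be non-negative")
    result = ""          # x^0 = ε
    for _ in range(k):
        result += x      # x^(i+1) = x^i x
    return result


print(power("10", 3))        # 101010
print(power("10", 0) == "")  # True
```

In everyday Python the built-in `x * k` computes the same string; the loop is written out only to mirror the recursive definition.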
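Solve the equation 011x = x011 • If x = ε, the equation holds. • If |x| = 1, there is no solution: comparing the first symbols forces x = 0, but 0110 ≠ 0011. • If |x| = 2, there is no solution: comparing the first two symbols forces x = 01, but 01101 ≠ 01011. • If |x| ≥ 3, comparing the first three symbols of both sides gives x = 011y. Then 011x = 011011y and x011 = 011y011, so 011y = y011; that is, y is a shorter solution. • By induction on |x|, the solutions are exactly $x = (011)^k$ for $k \ge 0$ (k = 0 gives x = ε), and each of these indeed satisfies the equation.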
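The case analysis can be sanity-checked by brute force. This Python sketch, assuming the binary alphabet {0, 1} for the enumeration, tests every x up to a length bound; it confirms the pattern for short strings but is not a substitute for the induction proof. The name `solutions` is hypothetical.

```python
from itertools import product


def solutions(max_len: int) -> list[str]:
    """All binary strings x with |x| <= max_len satisfying 011x = x011."""
    found = []
    for n in range(max_len + 1):
        for symbols in product("01", repeat=n):
            x = "".join(symbols)
            if "011" + x == x + "011":
                found.append(x)
    return found


print(solutions(9))  # ['', '011', '011011', '011011011']
```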
Language • A language is a set of strings. For example, {0, 1}, {all English words}, and $\{0^0, 0^1, 0^2, \ldots\}$ are all languages. • The following are operations on sets and hence also on languages. Union: A ∪ B; Intersection: A ∩ B; Difference: A \ B (written A − B when B ⊆ A); Complement: $\overline{A} = \Sigma^* - A$, where Σ* is the set of all strings over the alphabet Σ.
Concatenation of Languages • Concatenation: AB = {ab | a ∈ A, b ∈ B}. • For example, {0, 1}{1, 2} = {01, 02, 11, 12}. • In particular, we write $A^1 = A$, $A^2 = AA$, ..., and define $A^0 = \{\varepsilon\}$.
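For finite languages, concatenation and powers can be computed directly from the set-builder definition. A minimal Python sketch, assuming finite languages are Python `set`s of strings and ε is `''`; `concat` and `lang_power` are hypothetical names:

```python
def concat(A: set[str], B: set[str]) -> set[str]:
    """AB = {ab | a in A, b in B} for finite languages A and B."""
    return {a + b for a in A for b in B}


def lang_power(A: set[str], k: int) -> set[str]:
    """A^k, with A^0 = {ε} (ε is the empty Python string)."""
    result = {""}                    # A^0 = {ε}
    for _ in range(k):
        result = concat(result, A)   # A^(i+1) = A^i A
    return result


print(sorted(concat({"0", "1"}, {"1", "2"})))  # ['01', '02', '11', '12']
print(sorted(lang_power({"0", "1"}, 2)))       # ['00', '01', '10', '11']
```

Note the difference between {ε} and the empty language: concatenating with {ε} leaves a language unchanged, while concatenating with the empty set gives the empty set.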
If AB = B for every language B, then A = {ε}. • Proof: choose B = {ε}. Then AB = A{ε} = A, and AB = B = {ε}, so A = {ε}.
Examples • For Σ = {0, 1}, $\Sigma^2$ = {00, 01, 10, 11}. • In general, $\Sigma^k$ is the set of all strings of length k over Σ. Therefore, • $\Sigma^* = \Sigma^0 \cup \Sigma^1 \cup \Sigma^2 \cup \cdots$.
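$\Sigma^k$ is exactly the k-fold Cartesian product of the alphabet, so `itertools.product` enumerates it. Σ* itself is infinite, so this Python sketch (assuming each symbol is a single character; the function names are hypothetical) only builds the finite part $\Sigma^0 \cup \cdots \cup \Sigma^n$:

```python
from itertools import product


def sigma_k(sigma: str, k: int) -> set[str]:
    """Σ^k: all strings of length k over the alphabet sigma (one character per symbol)."""
    return {"".join(p) for p in product(sigma, repeat=k)}


def sigma_star_up_to(sigma: str, n: int) -> set[str]:
    """The finite part Σ^0 ∪ Σ^1 ∪ ... ∪ Σ^n of Σ*."""
    result: set[str] = set()
    for k in range(n + 1):
        result |= sigma_k(sigma, k)
    return result


print(sorted(sigma_k("01", 2)))        # ['00', '01', '10', '11']
print(len(sigma_star_up_to("01", 3)))  # 15, i.e. 1 + 2 + 4 + 8
```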
Kleene Closure • Kleene closure: $A^* = A^0 \cup A^1 \cup A^2 \cup \cdots$ • Notation: $A^+ = A^1 \cup A^2 \cup A^3 \cup \cdots$
A={grand, ε}, B={father, mother}. What is A*B? • A*B={father, mother, grandfather, grandmother, …}
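Since A* is usually infinite, a program can only list its strings up to some length bound. The Python sketch below (the name `star_up_to` and the bound n = 16 are assumptions for illustration) grows A* from $A^0 = \{\varepsilon\}$ by appending words of A, then forms A*B for the example above:

```python
def star_up_to(A: set[str], n: int) -> set[str]:
    """The strings of A* = A^0 ∪ A^1 ∪ A^2 ∪ ... of length at most n."""
    result = {""}            # A^0 = {ε}
    frontier = {""}
    while frontier:
        # Append one more word of A to each newly found string, staying within length n;
        # strings already found are dropped, so the loop stops once nothing new appears.
        frontier = {x + a for x in frontier for a in A if len(x) + len(a) <= n} - result
        result |= frontier
    return result


A = {"grand", ""}
B = {"father", "mother"}
n = 16
a_star_b = {x + b for x in star_up_to(A, n) for b in B if len(x + b) <= n}
print(sorted(a_star_b, key=lambda s: (len(s), s)))
# ['father', 'mother', 'grandfather', 'grandmother', 'grandgrandfather', 'grandgrandmother']
```

Because ε ∈ A, appending it never produces a new string, so it does not affect termination.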
What is ? • What is ? • What is ? • Here ∅ denotes the empty language.
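$A^* = A^+$ if and only if ε ∈ A. • If ε ∈ A, then ε ∈ $A^+$ (since $A = A^1 \subseteq A^+$). Hence $A^* = \{\varepsilon\} \cup A^+ = A^+$. • If ε ∉ A, then every string in $A^k$ with k ≥ 1 is a concatenation of nonempty strings, so ε ∉ $A^+$. Since ε ∈ A*, $A^* \neq A^+$.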
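The statement can be observed on bounded-length fragments of $A^*$ and $A^+$. A minimal Python sketch, assuming a length bound of 6 and hypothetical helper names; both closures share one routine and differ only in their starting set ($A^0$ versus $A^1$):

```python
def closure_from(start: set[str], A: set[str], n: int) -> set[str]:
    """Smallest set containing `start` and closed under appending a word of A (lengths <= n)."""
    result = set(start)
    frontier = set(start)
    while frontier:
        frontier = {x + a for x in frontier for a in A if len(x) + len(a) <= n} - result
        result |= frontier
    return result


def star_up_to(A: set[str], n: int) -> set[str]:
    return closure_from({""}, A, n)                            # start from A^0 = {ε}


def plus_up_to(A: set[str], n: int) -> set[str]:
    return closure_from({a for a in A if len(a) <= n}, A, n)   # start from A^1 = A


for A in [{"0", "10"}, {"", "0", "10"}]:
    print(sorted(A), star_up_to(A, 6) == plus_up_to(A, 6))
# ['0', '10'] False
# ['', '0', '10'] True
```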
{0, 10}* is the language of binary strings that do not contain the substring 11 and do not end with 1. • What is the language of binary strings that do not contain the substring 11 and end with 0? • $\{0, 10\}^+$
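Both characterizations can be checked exhaustively for short strings. This Python sketch uses the standard `re` module, where union is written `|` and `+` means "one or more"; so {0, 10}* becomes `(0|10)*` and $\{0, 10\}^+$ becomes `(0|10)+`. It is a finite sanity check under an assumed length bound, not a proof; the function names are hypothetical.

```python
import re
from itertools import product


def binary_strings(max_len: int):
    """Yield every binary string of length 0, 1, ..., max_len."""
    for n in range(max_len + 1):
        for p in product("01", repeat=n):
            yield "".join(p)


def claims_hold(max_len: int) -> bool:
    """Compare {0,10}* and {0,10}^+ with their descriptions on all strings up to max_len."""
    for s in binary_strings(max_len):
        in_star = re.fullmatch(r"(0|10)*", s) is not None
        in_plus = re.fullmatch(r"(0|10)+", s) is not None
        if in_star != ("11" not in s and not s.endswith("1")):
            return False
        if in_plus != ("11" not in s and s.endswith("0")):
            return False
    return True


print(claims_hold(12))  # True
```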
Puzzle • How many strings of length at most 40 are in the following language ?
Regular Languages • The regular languages over an alphabet Σ are defined recursively as follows: (1) The empty language ∅ is regular. (2) For every symbol a ∈ Σ, {a} is regular. (3) If A and B are regular languages, then A ∪ B, AB, and A* are regular. (4) Nothing else is a regular language.
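{ε} is regular. • Because the empty language ∅ is regular, $\emptyset^* = \{\varepsilon\}$ is regular by rule (3). (Here $\emptyset^0 = \{\varepsilon\}$ and $\emptyset^k = \emptyset$ for every k ≥ 1.)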
For Σ = {0, 1}, {011} is regular. • Since {0} and {1} are regular, {011} = {0}{1}{1} is regular. • Remark: Every language containing exactly one string is regular.
{011, 100} is regular. • Because {011} and {100} are regular, {011, 100} = {011} ∪ {100} is regular. • Remark: Every finite language is regular. • Remark: Every infinite regular language must be obtained using Kleene closure.
Operator Precedence • ({0}* ∪ {0}{1}{1}*){0}{0}{1}* • (1) Kleene closure has higher precedence than union and concatenation; for example, {0}{1}{1}* means {0}{1}({1}*), not ({0}{1}{1})*. • (2) Concatenation has higher precedence than union.
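Python's `re` module happens to use the same precedence (star binds tighter than concatenation, which binds tighter than union `|`), so it can illustrate the two rules. A sketch under that assumption; `in_lang` is a hypothetical helper, and the slide's expression is transcribed by dropping braces and writing ∪ as `|`:

```python
import re


def in_lang(pattern: str, s: str) -> bool:
    """True if the whole string s belongs to the language of the Python regex pattern."""
    return re.fullmatch(pattern, s) is not None


# Star binds tighter than concatenation: 011* means 0 1 (1*), not (011)*
assert in_lang(r"011*", "0111") and not in_lang(r"011*", "011011")
assert in_lang(r"(011)*", "011011")

# Concatenation binds tighter than union: 0|11 means {0} ∪ {11}, not ({0} ∪ {1}){1}
assert in_lang(r"0|11", "0") and not in_lang(r"0|11", "01")
assert in_lang(r"(0|1)1", "01")

# The slide's ({0}* ∪ {0}{1}{1}*){0}{0}{1}*
pattern = r"(0*|011*)001*"
print([s for s in ["001", "0000", "011001", "01001", "0101"] if in_lang(pattern, s)])
# ['001', '0000', '011001', '01001']
```

Note that 01001 is accepted because {0}{1}{1}* already contains 01 (the star allows zero further 1s).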
The language of all binary strings starting with 01 is regular. Proof. Every string in this language has the form $01x_1\cdots x_n$ where $x_1\cdots x_n \in \{0,1\}^*$. Therefore, the language can be written as {01}{0,1}* = ({0}{1})({0} ∪ {1})*, which is regular.
The language of all binary strings ending with 01 is regular. Proof. Every string in this language has the form $x_1\cdots x_n01$ where $x_1\cdots x_n \in \{0,1\}^*$. Therefore, the language can be written as {0,1}*{01} = ({0} ∪ {1})*({0}{1}), which is regular.
The language of all binary strings having substring 01 is regular. Proof. Every string in this language has the form $x_1\cdots x_n01y_1\cdots y_m$ where $x_1\cdots x_n, y_1\cdots y_m \in \{0,1\}^*$. Therefore, the language can be written as {0,1}*{01}{0,1}* = ({0} ∪ {1})*({0}{1})({0} ∪ {1})*, which is regular.
Question: Do you feel that the expressions for the regular sets in the above examples contain too many parentheses? • Here is a simpler notation: the regular expression.
Regular Expression • (1) ∅ is a regular expression of the empty language. • (2) ε is a regular expression of {ε}. • (3) For any symbol a, a is a regular expression of {a}. • (4) If $r_A$ and $r_B$ are regular expressions of languages A and B, then $r_A + r_B$ is a regular expression of A ∪ B, $r_Ar_B$ is a regular expression of AB, and $r_A^*$ is a regular expression of A*.
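The recursive definition maps naturally onto a recursive data type, with one case per rule. The Python sketch below is an illustration under stated assumptions: the class names `Empty`, `Eps`, `Sym`, `Alt`, `Cat`, `Star` and the function `lang` are hypothetical, and because the denoted language may be infinite, `lang(r, n)` returns only its strings of length at most n.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Empty:   # ∅, denoting the empty language
    pass


@dataclass(frozen=True)
class Eps:     # ε, denoting {ε}
    pass


@dataclass(frozen=True)
class Sym:     # a symbol a, denoting {a}
    a: str


@dataclass(frozen=True)
class Alt:     # r + s, denoting the union
    left: "Regex"
    right: "Regex"


@dataclass(frozen=True)
class Cat:     # rs, denoting the concatenation
    left: "Regex"
    right: "Regex"


@dataclass(frozen=True)
class Star:    # r*, denoting the Kleene closure
    inner: "Regex"


Regex = Union[Empty, Eps, Sym, Alt, Cat, Star]


def lang(r: Regex, n: int) -> set[str]:
    """Strings of length <= n in the language denoted by r, one case per rule."""
    if isinstance(r, Empty):
        return set()
    if isinstance(r, Eps):
        return {""}
    if isinstance(r, Sym):
        return {r.a} if len(r.a) <= n else set()
    if isinstance(r, Alt):
        return lang(r.left, n) | lang(r.right, n)
    if isinstance(r, Cat):
        return {x + y for x in lang(r.left, n) for y in lang(r.right, n) if len(x + y) <= n}
    if isinstance(r, Star):
        base = lang(r.inner, n) - {""}
        result, frontier = {""}, {""}        # start from the 0th power, {ε}
        while frontier:
            frontier = {x + y for x in frontier for y in base if len(x + y) <= n} - result
            result |= frontier
        return result
    raise TypeError(f"not a regular expression: {r!r}")


zero, one = Sym("0"), Sym("1")
any_bits = Star(Alt(zero, one))                      # (0+1)*
has_01 = Cat(Cat(any_bits, Cat(zero, one)), any_bits)  # (0+1)*01(0+1)*
print(sorted(lang(has_01, 3)))  # ['001', '01', '010', '011', '101']
```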
Examples • 011 is a regular expression of {0}{1}{1}. • 0+1 is a regular expression of {0,1}. • (0+1)* is a regular expression of {0,1}*. • Remark: $(0+1)^+$ is also considered to be a regular expression of $\{0,1\}^+$.
The language of all binary strings starting with 01 has a regular expression 01(0+1)*. • The language of all binary strings ending with 01 has a regular expression (0+1)*01. • The language of all binary strings having substring 01 has a regular expression (0+1)*01(0+1)*.
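These three expressions can be cross-checked against their plain-English descriptions using Python's `re` module. One notational caution: in Python's syntax union is `|`, while `+` means "one or more", so the slides' `0+1` must be written `(0|1)`. A bounded brute-force sketch, with hypothetical constant and function names:

```python
import re
from itertools import product

# The slides' + (union) is written | in Python's re syntax
STARTS_01 = r"01(0|1)*"
ENDS_01 = r"(0|1)*01"
HAS_01 = r"(0|1)*01(0|1)*"


def agrees(max_len: int) -> bool:
    """Check each expression against a direct description, for all binary strings up to max_len."""
    for n in range(max_len + 1):
        for p in product("01", repeat=n):
            s = "".join(p)
            if (re.fullmatch(STARTS_01, s) is not None) != s.startswith("01"):
                return False
            if (re.fullmatch(ENDS_01, s) is not None) != s.endswith("01"):
                return False
            if (re.fullmatch(HAS_01, s) is not None) != ("01" in s):
                return False
    return True


print(agrees(10))  # True
```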
Induction Proof • Because regular languages are defined recursively, we can prove that every regular language has a property by proving the following: (1) ∅ has the property. (2) For every symbol a ∈ Σ, {a} has the property. (3) If A and B have the property, then A ∪ B, AB, and A* all have the property. • This is an induction proof: (1) and (2) serve as the basis step and (3) is the induction step.
• For a string $x = x_1x_2\cdots x_n$, its reversal is $x^R = x_n\cdots x_2x_1$. • For a language A, $A^R = \{x^R \mid x \in A\}$. • Show that if A is regular, so is $A^R$. Proof (by induction on the recursive definition). (1) $\emptyset^R = \emptyset$ is regular. (2) For any symbol a, $\{a\}^R = \{a\}$ is regular. (3) Suppose that for regular languages A and B, $A^R$ and $B^R$ are regular. Then $(A \cup B)^R = A^R \cup B^R$ is regular, $(AB)^R = B^R A^R$ is regular, and $(A^*)^R = (A^R)^*$ is regular.
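Because the proof proceeds case by case over the recursive definition, it is also an algorithm: given a regular expression for A, it produces one for $A^R$. A Python sketch using a hypothetical tuple encoding of regular expressions (not from the slides): `("empty",)`, `("eps",)`, `("sym", a)`, `("alt", r, s)`, `("cat", r, s)`, `("star", r)`.

```python
def reverse(r):
    """Return a regular expression for the reversal of L(r), following the proof's cases."""
    tag = r[0]
    if tag in ("empty", "eps", "sym"):
        return r                                      # ∅^R = ∅, {ε}^R = {ε}, {a}^R = {a}
    if tag == "alt":
        return ("alt", reverse(r[1]), reverse(r[2]))  # (A ∪ B)^R = A^R ∪ B^R
    if tag == "cat":
        return ("cat", reverse(r[2]), reverse(r[1]))  # (AB)^R = B^R A^R
    if tag == "star":
        return ("star", reverse(r[1]))                # (A*)^R = (A^R)*
    raise ValueError(f"unknown node {r!r}")


def show(r) -> str:
    """Render in the slides' notation (+ for union), adding parentheses where needed."""
    tag = r[0]
    if tag == "empty":
        return "∅"
    if tag == "eps":
        return "ε"
    if tag == "sym":
        return r[1]
    if tag == "alt":
        return f"({show(r[1])}+{show(r[2])})"
    if tag == "cat":
        return show(r[1]) + show(r[2])
    inner = show(r[1])                                # remaining case: tag == "star"
    return inner + "*" if r[1][0] in ("empty", "eps", "sym", "alt") else f"({inner})*"


zero, one = ("sym", "0"), ("sym", "1")
starts_01 = ("cat", ("cat", zero, one), ("star", ("alt", zero, one)))  # 01(0+1)*
print(show(starts_01), "->", show(reverse(starts_01)))  # 01(0+1)* -> (0+1)*10
```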
Find a regular expression for $\{xwx^R \mid x \in (0+1)^*, w \in (0+1)^*\}$. • $\{xwx^R \mid x \in (0+1)^*, w \in (0+1)^*\} = (0+1)^*$, because taking x = ε already yields every binary string w.
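Find a regular expression for $\{xwx^R \mid x \in (0+1)^+, w \in (0+1)^*\}$. • $\{xwx^R \mid x \in (0+1)^+, w \in (0+1)^*\} = 0(0+1)^*0 + 1(0+1)^*1$. • Reason: if x = ay for a symbol a, then $xwx^R = a(ywy^R)a$ starts and ends with a; conversely, any string aua with a ∈ {0, 1} is obtained by taking x = a and w = u.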
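Both answers can be confirmed for short strings by generating every $xwx^R$ directly and comparing with the proposed expressions. A Python sketch assuming a length bound of 8 and hypothetical function names; in Python's `re` syntax the slides' union `+` is written `|`:

```python
import re
from itertools import product


def binary_strings(max_len: int):
    """Yield every binary string of length 0, 1, ..., max_len."""
    for n in range(max_len + 1):
        for p in product("01", repeat=n):
            yield "".join(p)


def xwxr_language(max_len: int, x_nonempty: bool) -> set[str]:
    """All strings x w x^R of length <= max_len over {0,1}; x must be nonempty if x_nonempty."""
    result = set()
    for x in binary_strings(max_len // 2):
        if x_nonempty and x == "":
            continue
        for w in binary_strings(max_len - 2 * len(x)):
            result.add(x + w + x[::-1])     # x[::-1] is the reversal x^R
    return result


n = 8
all_strings = set(binary_strings(n))
same_ends = {s for s in all_strings if re.fullmatch(r"0(0|1)*0|1(0|1)*1", s)}
print(xwxr_language(n, x_nonempty=False) == all_strings)  # True: (0+1)*
print(xwxr_language(n, x_nonempty=True) == same_ends)     # True: 0(0+1)*0 + 1(0+1)*1
```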
Puzzle • How many regular expressions can a language have?