Strings

• A string over a set A is a finite sequence of elements from A.
• The set of elements from which the strings are built is called an alphabet.
Definition. An alphabet is a nonempty, finite set of indivisible symbols. We denote it by Σ.
• Any program is a string of keywords, variable names, and permissible symbols.
• A programming language must satisfy general rules (a grammar) to be understood by a computer (compiler). These rules are studied by the formal theory of programming languages.

Definition. A string w over an alphabet Σ is a finite sequence of symbols, w = a₁a₂…aₙ, where each aᵢ ∈ Σ, 1 ≤ i ≤ n.
• The number of symbols is called the length of the string, |w| = n.
• There is one special string that has zero length (contains no symbols). It is called the empty string and is denoted by ε: |w| = 0 ⇔ w = ε.
• ε is not an element of any alphabet: ε ∉ Σ.
Example. Let Σ = {a, b, c}. Find all possible strings of length less than or equal to 3 built from Σ.
Length 0: ε
Length 1: a b c
Length 2: aa ab ac ba bb bc ca cb cc
Length 3: aaa aab aac aba abb abc aca acb acc …
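The enumeration in the example above can be reproduced mechanically. The following is a minimal Python sketch, assuming single-character symbols and representing the empty string as ""; the helper name strings_up_to is hypothetical and not part of the slides.

```python
from itertools import product

def strings_up_to(alphabet, max_len):
    """Return all strings over `alphabet` of length 0..max_len, shortest first."""
    result = []
    for n in range(max_len + 1):
        # product(..., repeat=0) yields one empty tuple, which joins to the empty string
        for symbols in product(sorted(alphabet), repeat=n):
            result.append("".join(symbols))
    return result

words = strings_up_to({"a", "b", "c"}, 3)
print(len(words))   # 1 + 3 + 9 + 27 = 40
print(words[:4])    # ['', 'a', 'b', 'c']
```

There are 3ⁿ strings of each length n, so the list grows exponentially with the length bound.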

• Two strings u and v can be concatenated to form a single string uv, which consists of the symbols of u followed by the symbols of v.
• The length of the result is |uv| = |u| + |v|.
• The empty string is the identity for concatenation: uε = εu = u.
• Concatenation is associative: (uv)w = u(vw), but not commutative: in general uv ≠ vu (the order is important!).

? Q : L1L2 = L2L1 Definition. Any set of strings over some alphabet is called a language. Examples: Set of all executable computer programs is a language. Alphabet itself is a language as well (the language of all one-symbol words). Since languages are sets, we can apply all set operations to languages: union, intersection and set difference. There is one operation specific for languages: concatenation of two languages : L1L2 ={uv | u  L1 and v  L2 } A: L1L2L2L1

Example. Take the alphabet Σ = {a, b, c}. Consider two languages over Σ: L1 = {a, ab} and L2 = {b, bc, c}. Find L1L2 and L2L1.
We need to take every string from L1 and concatenate it with every string from L2. In this way we get |L1|·|L2| strings: ab, abc, ac, abb, abbc, abc. Note that not all of these strings are distinct: abc occurs twice.
L1L2 = {ab, abc, ac, abb, abbc}.
In the same way: L2L1 = {ba, bab, bca, bcab, ca, cab}.
The cardinality |L1L2| is the number of distinct strings resulting from the concatenation. In general, |L1L2| ≤ |L1|·|L2|, and |L1L2| need not equal |L2L1|.
In the example, |L1L2| = 5 < |L1|·|L2| = 6.
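The example can be checked with Python sets, which discard duplicate strings automatically. This is a small sketch; the function name concat is an illustrative choice.

```python
def concat(L1, L2):
    """Concatenation of two finite languages given as sets of strings."""
    return {u + v for u in L1 for v in L2}

L1 = {"a", "ab"}
L2 = {"b", "bc", "c"}

L1L2 = concat(L1, L2)
L2L1 = concat(L2, L1)

print(sorted(L1L2))  # ['ab', 'abb', 'abbc', 'abc', 'ac']
print(sorted(L2L1))  # ['ba', 'bab', 'bca', 'bcab', 'ca', 'cab']
print(len(L1L2), len(L2L1), len(L1) * len(L2))  # 5 6 6
```

The string abc arises twice (as a·bc and as ab·c), which is why L1L2 has only five elements.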

In particular, we can consider the concatenation of an alphabet Σ with itself: ΣΣ is the language of all two-symbol words. Notation: ΣΣ = Σ².
Example: Σ = {a, b}, ΣΣ = Σ² = {aa, ab, ba, bb}.
Similarly, Σ³ = ΣΣ² is the language that consists of all 3-symbol words: Σ³ = {aaa, aba, baa, bba, aab, abb, bab, bbb}.
So we can define recursively, for any n > 1: Σⁿ = ΣΣⁿ⁻¹.
To make this recursive definition agree with the basis case n = 1, Σ = ΣΣ⁰, the zero power is defined as Σ⁰ = {ε} (no matter what Σ is). Then Σ{ε} = {xε | x ∈ Σ} = {x | x ∈ Σ} = Σ.
What is Σ ∪ Σ²? What is Σ ∪ Σ² ∪ Σ³ ∪ … ∪ Σⁿ?
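The recursive definition of powers translates directly into a loop. Below is a minimal sketch for finite languages represented as Python sets; the name power is hypothetical.

```python
def power(L, n):
    """n-th power of a finite language L: the zero power is {""}, then L^n = L L^(n-1)."""
    result = {""}                      # zero power: only the empty string
    for _ in range(n):
        result = {x + w for x in L for w in result}
    return result

sigma = {"a", "b"}
print(sorted(power(sigma, 2)))   # ['aa', 'ab', 'ba', 'bb']
print(len(power(sigma, 3)))      # 8
print(power(sigma, 0))           # {''}
```

Starting from {""} is what makes the first power come out equal to the alphabet itself.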

'Kleene star' notation: Σ* = Σ⁰ ∪ Σ¹ ∪ Σ² ∪ …
So Σ* is the (infinite) set of all possible words over the alphabet Σ, including the empty string ε.
Example. Σ = {0, 1}. Σ* is the infinite set of all possible bit strings (or all binary numbers, including numbers with leading 0's and the empty string).
Any language L over an alphabet Σ is a subset of Σ*: L ⊆ Σ*.
Note that {ε} ≠ ∅, because ∅ = {} ≠ {ε}: |{ε}| = 1, while |∅| = |{}| = 0.
A language L may or may not contain ε.

Example. Consider two languages over the alphabet Σ = {a}: L1 = {aa}, L2 = {ε, aa, aaaa}.
What is L1*? By the definition of Kleene star,
L1* = (L1)⁰ ∪ (L1)¹ ∪ (L1)² ∪ … = {ε} ∪ {aa} ∪ {aaaa} ∪ {aaaaaa} ∪ … = {ε, aa, aaaa, aaaaaa, …},
the infinite set of strings of even length built from the symbol a.
What is L2*?
L2* = (L2)⁰ ∪ (L2)¹ ∪ (L2)² ∪ … = {ε} ∪ {ε, aa, aaaa} ∪ {ε, aa, aaaa, aaaaaa, aaaaaaaa} ∪ … = {ε, aa, aaaa, aaaaaa, …} = L1*
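Since a Kleene star is usually infinite, a program can only list it up to some length bound. The sketch below computes the strings of L* no longer than max_len and compares the two stars from the example; the function name star_up_to and the bound 8 are illustrative assumptions.

```python
def star_up_to(L, max_len):
    """Strings of the Kleene star of L whose length does not exceed max_len."""
    found = {""}          # the zero power contributes the empty string
    frontier = {""}
    while frontier:
        # extend the newest strings by one more factor from L, within the bound
        frontier = {w + x for w in frontier for x in L
                    if len(w + x) <= max_len} - found
        found |= frontier
    return found

L1 = {"aa"}
L2 = {"", "aa", "aaaa"}

print(sorted(star_up_to(L1, 8), key=len))  # ['', 'aa', 'aaaa', 'aaaaaa', 'aaaaaaaa']
print(star_up_to(L1, 8) == star_up_to(L2, 8))  # True
```

Agreement up to a bound is evidence, not a proof, that the two stars are equal; the argument in the example covers all lengths.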

Definition. A string u is called a substring of v if there exist two strings x, y ∈ Σ* such that v = xuy.
Definition. A string u is called a prefix of v if there exists a string x ∈ Σ* such that v = ux. Similarly, a string u is called a suffix of v if there exists a string y ∈ Σ* such that v = yu.
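These three definitions correspond to slicing a string at one or two cut points. A short sketch follows, with hypothetical helper names, that lists all prefixes, suffixes and substrings of a given string.

```python
def prefixes(v):
    """All u with v = ux for some x (x may be empty)."""
    return [v[:i] for i in range(len(v) + 1)]

def suffixes(v):
    """All u with v = yu for some y (y may be empty)."""
    return [v[i:] for i in range(len(v) + 1)]

def substrings(v):
    """All u with v = xuy for some x and y."""
    return {v[i:j] for i in range(len(v) + 1) for j in range(i, len(v) + 1)}

print(prefixes("abc"))            # ['', 'a', 'ab', 'abc']
print(suffixes("abc"))            # ['abc', 'bc', 'c', '']
print(sorted(substrings("abc")))  # ['', 'a', 'ab', 'abc', 'b', 'bc', 'c']
```

Because x and y may be empty, every prefix and every suffix is also a substring, and the empty string is a prefix, suffix and substring of every string.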

Theorem 1. Let A, B and C be sets of strings. Then (A ∪ B)C = AC ∪ BC.
Proof. We need to prove the equality of two sets of strings. We can do it by double inclusion, i.e. by showing that
i) (A ∪ B)C ⊆ AC ∪ BC and
ii) AC ∪ BC ⊆ (A ∪ B)C.

i) To prove (A ∪ B)C ⊆ AC ∪ BC, it suffices to show that for any string w, w ∈ (A ∪ B)C ⇒ w ∈ AC ∪ BC.
Take any w ∈ (A ∪ B)C
⇒ ∃ x, y such that w = xy and x ∈ (A ∪ B) and y ∈ C (dfn of concat)
⇒ … (x ∈ A or x ∈ B) and y ∈ C (dfn of ∪)
⇒ … (x ∈ A and y ∈ C) or (x ∈ B and y ∈ C) (distributive property)
⇒ w ∈ AC or w ∈ BC (dfn of concat)
⇒ w ∈ AC ∪ BC (dfn of ∪)

ii) To prove that AC ∪ BC ⊆ (A ∪ B)C, we need to show that for any string w, w ∈ AC ∪ BC ⇒ w ∈ (A ∪ B)C.
Take any w ∈ AC ∪ BC
⇒ w ∈ AC or w ∈ BC (dfn of ∪)
⇒ ∃ x, y such that w = xy and ((x ∈ A and y ∈ C) or (x ∈ B and y ∈ C)) (dfn of concat)
So we have two cases. In the first case, (x ∈ A and y ∈ C) implies (x ∈ A ∪ B and y ∈ C), because A ⊆ (A ∪ B). In the second case, (x ∈ B and y ∈ C) implies (x ∈ A ∪ B and y ∈ C), because B ⊆ (A ∪ B). So in either case we have
⇒ w ∈ (A ∪ B)C (dfn of concat)
We have proved AC ∪ BC ⊆ (A ∪ B)C and (A ∪ B)C ⊆ AC ∪ BC, which means (A ∪ B)C = AC ∪ BC.

Theorem 2. Let A, B and C be sets of strings. Then (A ∩ B)C ⊆ AC ∩ BC.
Proof. To prove the subset relation we need to show that for any string w, w ∈ (A ∩ B)C ⇒ w ∈ AC ∩ BC. If w = xy with x ∈ A ∩ B and y ∈ C, then x ∈ A gives w ∈ AC and x ∈ B gives w ∈ BC, so w ∈ AC ∩ BC.
Why not prove AC ∩ BC ⊆ (A ∩ B)C as well? Let's try. Take an arbitrary w ∈ AC ∩ BC
⇒ w ∈ AC and w ∈ BC
⇒ (∃ x, y: w = xy, x ∈ A and y ∈ C) and (∃ u, v: w = uv, u ∈ B and v ∈ C).
Can we conclude from xy = uv that x = u? No, because the same string abc may come from a·bc and from ab·c.
Example. A = {a}, B = {ab}, C = {c, bc}. Then A ∩ B = ∅ and (A ∩ B)C = ∅. AC = {ac, abc}, BC = {abc, abbc}, so abc ∈ AC ∩ BC, but abc ∉ (A ∩ B)C = ∅.
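Both theorems can be spot-checked on the sets from the example. The sketch below uses Python's set union (|) and intersection (&); checking one instance does not replace the proofs, it only illustrates them, and the helper name concat is an assumption.

```python
def concat(X, Y):
    """Concatenation of two finite languages given as sets of strings."""
    return {x + y for x in X for y in Y}

A, B, C = {"a"}, {"ab"}, {"c", "bc"}

# Theorem 1: concatenation distributes over union (equality).
print(concat(A | B, C) == concat(A, C) | concat(B, C))   # True

# Theorem 2: for intersection only the inclusion holds.
left = concat(A & B, C)                 # A and B share no string, so this is empty
right = concat(A, C) & concat(B, C)     # contains 'abc' = a.bc = ab.c
print(left, right)                      # set() {'abc'}
print(left <= right, left == right)     # True False
```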

Using set operations to specify languages.
• The specification of a language requires an unambiguous description of the strings that belong to the language.
• Set notation can be used for strict definitions of languages.
Consider a few examples of set notation for languages:
1) The language over {a, b} that consists of the strings containing the substring bb: L1 = {a, b}*{bb}{a, b}*. The set {a, b}* permits any number of a's and b's to precede and follow the occurrence of bb.
2) The language L2 consists of all strings that begin with aa and end with bb: L2 = {aa}{a, b}*{bb}.

3) The language L3 consists of all strings that begin with aa or end with bb: L3 = {aa}{a, b}* ∪ {a, b}*{bb}.
4) The set of even-length strings: L4 = {aa, ab, bb, ba}*.
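Each of the four set descriptions corresponds to a simple membership test. The following sketch assumes the input is already a string over {a, b}; the predicate names in_L1 to in_L4 are hypothetical.

```python
def in_L1(w):
    """{a,b}*{bb}{a,b}*: contains the substring bb."""
    return "bb" in w

def in_L2(w):
    """{aa}{a,b}*{bb}: begins with aa and ends with bb (this forces length >= 4)."""
    return w.startswith("aa") and w.endswith("bb")

def in_L3(w):
    """{aa}{a,b}* united with {a,b}*{bb}: begins with aa or ends with bb."""
    return w.startswith("aa") or w.endswith("bb")

def in_L4(w):
    """{aa, ab, bb, ba}*: even length, including the empty string."""
    return len(w) % 2 == 0

for w in ["", "abba", "aabb", "aab", "bab"]:
    print(repr(w), in_L1(w), in_L2(w), in_L3(w), in_L4(w))
```

Every string in L2 is also in L3, since "begins with aa and ends with bb" implies "begins with aa or ends with bb".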

Regular Languages
Regular languages form the simplest class of languages: each one is built from a few basic languages using only union, concatenation and Kleene star.
Definition. Let Σ be an alphabet. A regular language over Σ is defined recursively as follows:
i) Basis: ∅, {ε}, and {a} for any a ∈ Σ are regular.
ii) Recursive step: If X and Y are regular, then X ∪ Y, XY and X* are regular languages.
iii) Closure: X is a regular language over Σ only if it can be obtained from the basis elements by a finite number of applications of the recursive step.

Example. Show that L = {ab}{a, b}*{ba} is a regular language. Consider all the steps:
• {a}, {b} are regular by the basis.
• {ab} = {a}{b} is regular as a concatenation of regular languages.
• {ba} = {b}{a} is regular as a concatenation of regular languages.
• {a} ∪ {b} = {a, b} is regular as the union of regular languages.
• {a, b}* is regular as the Kleene closure of a regular language.
• {ab}{a, b}*{ba} is regular as a concatenation of regular languages.
All finite languages are regular. An infinite language may or may not be regular.

Regular languages are often described by algebraic expressions called regular expressions. Regular expressions are used to abbreviate the specification of regular languages.
Definition. Let Σ be an alphabet. A regular expression over Σ is defined recursively as follows:
i) Basis: ∅, ε, and a are regular expressions for all a ∈ Σ.
ii) Recursive step: Let u and v be regular expressions over Σ. Then (u+v), uv, u* are regular expressions.
iii) Closure: u is a regular expression over Σ only if it can be obtained from the basis elements by a finite number of applications of the recursive step.

Examples of regular expressions over the alphabet Σ = {a, b}: ∅, ε, a, b, ε+a, b*, a+ba, (a+b)a, ab*, a*+b*, etc.
With each regular expression E we associate a regular language L(E) by the following rules:
L(∅) = ∅, L(ε) = {ε}, L(a) = {a},
L(R+S) = L(R) ∪ L(S), L(RS) = L(R)L(S), L(R*) = L(R)*

Example. Let's find the language of the regular expression a+bc* over Σ = {a, b, c}.
L(a+bc*) = L(a) ∪ L(bc*) = {a} ∪ L(b)L(c*) = {a} ∪ {b}{c}*
So the language described by the expression a+bc* consists of the string a and the strings that start with one b followed by any number of c's.
L = {a, b, bc, bcc, bccc, …}
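The rules that map a regular expression to its language can be turned into a small recursive evaluator. The sketch below is only an illustration: expressions are encoded as nested tuples with made-up tags ("empty", "eps", "sym", "union", "cat", "star"), and because starred languages are infinite the result is cut off at a length bound max_len.

```python
def lang(expr, max_len):
    """Strings of L(expr) with length <= max_len, following the L(...) rules."""
    kind = expr[0]
    if kind == "empty":                      # the empty language
        return set()
    if kind == "eps":                        # the language holding only ""
        return {""}
    if kind == "sym":                        # a single symbol
        return {expr[1]} if max_len >= 1 else set()
    if kind == "union":                      # L(R+S) = union of L(R) and L(S)
        return lang(expr[1], max_len) | lang(expr[2], max_len)
    if kind == "cat":                        # L(RS) = L(R)L(S)
        left, right = lang(expr[1], max_len), lang(expr[2], max_len)
        return {u + v for u in left for v in right if len(u + v) <= max_len}
    if kind == "star":                       # L(R*) = L(R)*
        base = lang(expr[1], max_len)
        found, frontier = {""}, {""}
        while frontier:
            frontier = {w + x for w in frontier for x in base
                        if len(w + x) <= max_len} - found
            found |= frontier
        return found
    raise ValueError(f"unknown expression tag: {kind}")

# a + bc*  encoded as a tuple tree
expr = ("union", ("sym", "a"), ("cat", ("sym", "b"), ("star", ("sym", "c"))))
print(sorted(lang(expr, 3)))   # ['a', 'b', 'bc', 'bcc']
```

Truncating the sub-languages at max_len loses nothing, because every factor of a string of length at most max_len is itself no longer than max_len.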

Describe the language for each of the following regular expressions.
1) a+b: L1 = {a, b}
2) a+bc: L2 = {a, bc}
3) ab*+c: L3 = {c, a, ab, abb, …}
4) ab*+bc*: L4 = {a, b, ab, abb, …, bc, bcc, …}
5) a*bc*+ac: L5 = {ac, b, ab, bc, abc, aabc, …}
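The five expressions can also be tried out with Python's standard re module. Note the difference in notation: the union written "+" on the slides is written "|" in Python patterns (where "+" means "one or more"). The dictionary and the helper matches below are illustrative names, not part of the slides.

```python
import re

# Slide notation on the left, the equivalent Python pattern on the right.
EXPRESSIONS = {
    "a+b": r"a|b",
    "a+bc": r"a|bc",
    "ab*+c": r"ab*|c",
    "ab*+bc*": r"ab*|bc*",
    "a*bc*+ac": r"a*bc*|ac",
}

def matches(slide_expr, w):
    """True if the whole string w belongs to the language of slide_expr."""
    return re.fullmatch(EXPRESSIONS[slide_expr], w) is not None

print(matches("ab*+c", "abb"))       # True
print(matches("a+bc", "ab"))         # False
print(matches("a*bc*+ac", "aabc"))   # True
```

re.fullmatch is used rather than re.match or re.search so that the entire string, not just a part of it, has to fit the expression.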

Distinct regular expressions may represent the same language: a+b and b+a both represent the language {a, b}. Two expressions R and S are considered equal if they represent the same language, i.e. L(R) = L(S).
Properties of Regular Expressions
1) + properties
R+T = T+R
R+∅ = ∅+R = R
R+R = R
(R+S)+T = R+(S+T)

2) Concatenation properties
R∅ = ∅R = ∅
Rε = εR = R
(RS)T = R(ST)
3) Distributive properties
R(S+T) = RS+RT
(S+T)R = SR+TR
4) Closure properties
∅* = ε* = ε
R* = R*R* = (R*)* = R+R*
R* = ε+R* = (ε+R*)* = (ε+R)R* = ε+RR*
R* = (R+…+Rᵏ)* for any k ≥ 1
R* = ε+R+R²+…+Rᵏ⁻¹+RᵏR* for any k ≥ 1
R*R = RR*
7) (R+S)* = (R*+S*)* = (R*S*)* = (R*S)*R* = R*(SR*)*
R(SR)* = (RS)*R
(R*S)* = ε+(R+S)*S
(RS*)* = ε+R(R+S)*
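Identities of this kind can be sanity-checked by brute force: substitute concrete expressions for R and S and compare the two sides on every string up to some length. The sketch below does this with Python's re module, taking R = a and S = b; the helper same_language, the alphabet "ab" and the bound 6 are all assumptions made for the illustration. Agreement on short strings supports an identity but does not prove it.

```python
import re
from itertools import product

def same_language(p, q, alphabet="ab", max_len=6):
    """True if patterns p and q accept the same strings of length <= max_len."""
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            w = "".join(symbols)
            if bool(re.fullmatch(p, w)) != bool(re.fullmatch(q, w)):
                return False
    return True

print(same_language(r"a(ba)*", r"(ab)*a"))    # R(SR)* = (RS)*R   -> True
print(same_language(r"(a|b)*", r"(a*b*)*"))   # (R+S)* = (R*S*)*  -> True
print(same_language(r"a*a", r"aa*"))          # R*R = RR*         -> True
print(same_language(r"(ab)*", r"a*b*"))       # not an identity   -> False
```

The last line shows that the check can fail: (ab)* accepts abab, which a*b* rejects.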
