Preliminary definitions • An alphabet is a finite set of symbols. • An alphabet is not a symbol. • A (character) string over an alphabet Σ is a finite sequence of symbols from Σ. • A language over the alphabet Σ is a set of strings over Σ. • So programming languages are sets of programs. • Also, alphabets and strings must be finite, but languages needn't be.
Thesis • Many interesting and important questions about computing reduce to questions about membership in languages, e.g. • Is x2!y a legal identifier in language PQR? • Is null a legal statement in language QRS? • Is 10101010101 a prime number? • Is Buffalo buffalo a legal English sentence? • Will a given C program halt on all inputs?
Strings (more precisely) • A (character) string over an alphabet Σ either • is empty, or • has the form x.a, where x is a string over Σ and a is an element of Σ • The intuition is that x.a results from adding the symbol a to the end of the string x • The empty string is written as λ
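The inductive definition above can be turned into a membership check. The following is a minimal sketch, assuming strings are modeled as Python `str` values, the empty string as `""`, and the alphabet as a set of one-character strings; the name `is_string_over` is hypothetical and not from the slides.

```python
from typing import Set


def is_string_over(sigma: Set[str], w: str) -> bool:
    """Check that w is a string over the alphabet sigma, by recursion on w."""
    if w == "":
        return True                    # the empty string is a string over any alphabet
    x, a = w[:-1], w[-1]               # decompose w as x.a
    return a in sigma and is_string_over(sigma, x)
```

For example, `is_string_over({"a", "b"}, "abba")` holds, while `is_string_over({"a", "b"}, "abc")` does not, because c is not a symbol of the alphabet.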
Notational conventions • Capital letters are used for languages • Capital Greek letters are used for alphabets • Lower-case letters near the beginning of the English alphabet are used for symbols from alphabets • Lower-case letters near the end of the English alphabet are used for strings
String concatenation • If x and y are strings, then the concatenation xy (or x∙y) of x and y is • x if y is empty • (xz).a if y = z.a • We’ll write the string whose only symbol is a as a. With this notation, xa = x.a • x is a prefix of y iff there exists z with y = xz. • In this case, z is a suffix of y.
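The recursive definition of concatenation, and the prefix test built on it, can be sketched as follows. This is an illustration under assumptions not in the slides: strings are Python `str` values, the empty string is `""`, and the names `snoc`, `concat`, and `is_prefix` are hypothetical.

```python
def snoc(x: str, a: str) -> str:
    """Return x.a: the string x with the single symbol a added at the end."""
    assert len(a) == 1, "a must be a single symbol"
    return x + a


def concat(x: str, y: str) -> str:
    """Concatenate x and y by recursion on y, following the two-case definition."""
    if y == "":
        return x                       # xy = x if y is empty
    z, a = y[:-1], y[-1]               # y = z.a
    return snoc(concat(x, z), a)       # xy = (xz).a


def is_prefix(x: str, y: str) -> bool:
    """x is a prefix of y iff y = xz for some z; any such z is a tail of y."""
    return any(concat(x, y[i:]) == y for i in range(len(y) + 1))
```

Here `concat("ab", "cd")` unfolds to `snoc(snoc("ab", "c"), "d")`, mirroring the definition one symbol at a time.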
String length • If x is a string, then |x| (the length of x) is • 0 if x is empty • 1 + |y| if x = y.a • So the length of the character string abc is • 1 + length(ab) • = 1 + (1 + length(a)) • = 1 + (1 + (1 + length(λ))) • = 1 + (1 + (1 + 0)) • = 3
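The length computation above can be reproduced with a direct transcription of the definition. This is a minimal sketch, assuming strings are Python `str` values with `""` as the empty string; the function name `length` follows the worked example, but the code itself is not from the slides.

```python
def length(x: str) -> int:
    """Length of x, by recursion on the form of x."""
    if x == "":
        return 0                       # the empty string has length 0
    y = x[:-1]                         # x = y.a, so drop the last symbol
    return 1 + length(y)
```

Calling `length("abc")` performs the same three unfoldings as the derivation on the slide.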
Theorems about strings • If x is the empty string, then xy = y • If x and y are strings, then |xy| = |x| + |y| • Warning: We'll soon define a notion of language concatenation for which the analog of the second theorem does not hold
First symbols • The first symbol of a nonempty string xa is • a if x = λ • the first symbol of x otherwise • Theorem 3: The first symbol of xy is • the first symbol of x if x is nonempty • the first symbol of y if x is empty but y is not • undefined if both x and y are empty
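The definition of the first symbol, and the three cases of Theorem 3, can be checked on small examples. This is a sketch under assumptions not in the slides: strings are Python `str` values, "undefined" is modeled as `None`, and the name `first_symbol` is hypothetical.

```python
from typing import Optional


def first_symbol(w: str) -> Optional[str]:
    """First symbol of w, or None where the slides leave it undefined."""
    if w == "":
        return None                    # undefined for the empty string
    x, a = w[:-1], w[-1]               # w = xa
    if x == "":
        return a                       # the only symbol is also the first
    return first_symbol(x)             # otherwise, look in x
```

Checking a few pairs, such as `first_symbol("ab" + "cd") == first_symbol("ab")` and `first_symbol("" + "cd") == first_symbol("cd")`, illustrates the theorem but does not prove it.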
Corollary to Theorem 3 • If x is a string over an alphabet Σ and b is a symbol of Σ, then the first symbol of bx is b. • Also |bx| = 1 + |x|
Language concatenation • The language LM is equal to • {xy | x ∈ L and y ∈ M} • An alternate notation for LM is L∙M
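For finite languages, the set-builder definition maps directly onto a set comprehension. The following is a minimal sketch, assuming finite languages are modeled as Python sets of `str`; the name `lang_concat` is hypothetical. Infinite languages cannot be represented this way.

```python
from typing import Set


def lang_concat(L: Set[str], M: Set[str]) -> Set[str]:
    """Return LM: every string xy with x taken from L and y taken from M."""
    return {x + y for x in L for y in M}
```

Two edge cases are worth noting: concatenating with the empty language gives the empty language, while concatenating with the language containing only the empty string leaves the other language unchanged.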
Repetition • The string x^n is obtained from x by concatenating n copies of x. Formally, it equals • λ if n = 0 • x^(n-1)x if n > 0 • The language L^n contains all concatenations of n members of L. Formally, it equals • {λ} if n = 0 • L^(n-1)∙L if n > 0
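Both repetition definitions are recursive on n and can be transcribed almost literally. This is a sketch assuming strings are Python `str` values and finite languages are sets of `str`; the names `str_power` and `lang_power` are hypothetical.

```python
from typing import Set


def str_power(x: str, n: int) -> str:
    """n copies of x concatenated together, by recursion on n."""
    if n == 0:
        return ""                      # zero copies give the empty string
    return str_power(x, n - 1) + x     # n-1 copies followed by one more x


def lang_power(L: Set[str], n: int) -> Set[str]:
    """All concatenations of n members of L, by recursion on n."""
    if n == 0:
        return {""}                    # the language containing only the empty string
    return {u + v for u in lang_power(L, n - 1) for v in L}
```

Note that `lang_power(set(), 0)` is `{""}`, not the empty set: the base case does not depend on L.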
Closure • If L is a language, then • L* is the union of L^n over all n ≥ 0 • L+ is the union of L^n over all n ≥ 1 • If Σ is an alphabet, then • Σ* is the set of all strings over Σ • Σ+ is the set of all nonempty strings over Σ • The second definition follows from the first if Σ is considered as a set of strings of length 1
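The closure of a language is usually infinite, so it cannot be built as a finite set, but a bounded approximation shows how the union of powers accumulates. This is a sketch under stated assumptions: finite languages are Python sets of `str`, and the name `closure_up_to` and its `max_n` and `plus` parameters are hypothetical, introduced only for illustration.

```python
from typing import Set


def closure_up_to(L: Set[str], max_n: int, plus: bool = False) -> Set[str]:
    """Union of the powers of L up to max_n, starting at power 1 if plus is set."""
    result: Set[str] = set()
    power = {""}                       # power 0 of L
    for n in range(max_n + 1):
        if n >= 1 or not plus:
            result |= power            # add the n-th power to the union
        power = {u + v for u in power for v in L}   # advance to power n+1
    return result
```

With the alphabet `{"0", "1"}` treated as a set of length-1 strings, `closure_up_to({"0", "1"}, 2)` contains the 7 binary strings of length at most 2.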
Closure, part 2 • Sometimes the * operator is called the Kleene closure operator. • Note that λ is a member of L* for all L. • (L*)* = L* • So L* is closed under the closure operator
Properties of concatenation • Theorem: (xy)w = x(yw) • Theorem: x^m x^n = x^(m+n) • Corollary: x^n = x x^(n-1) • so the definition of x^n could have been stated this way • also L^m L^n = L^(m+n), so L^n = L L^(n-1). • Warning: |LM| needn't equal |L||M| • L might be {a, ab} and M might be {bc, c}
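The example in the warning can be worked through by enumerating the concatenation. This is a small sketch, assuming finite languages are modeled as Python sets of `str`; the variable names mirror the slide.

```python
# The two languages from the warning.
L = {"a", "ab"}
M = {"bc", "c"}

# Language concatenation: every xy with x in L and y in M.
LM = {x + y for x in L for y in M}

# Two different pairs produce the same string, so the set has fewer
# members than the number of pairs.
collision = ("a" + "bc") == ("ab" + "c")
```

There are 4 pairs (x, y), but a∙bc and ab∙c both yield abc, so LM = {abc, ac, abbc} has only 3 members.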
String reversal • The reverse w^R of the string w is • λ if w = λ • az^R if w = za • So for example • (abc)^R = c(ab)^R • = c(b(a^R)) • = c(b(a(λ^R))) = cba
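The recursive definition of reversal can be transcribed in the same style as the worked example. This is a minimal sketch, assuming strings are Python `str` values with `""` as the empty string; the name `reverse` is hypothetical.

```python
def reverse(w: str) -> str:
    """Reverse of w, by recursion on the form of w."""
    if w == "":
        return ""                      # the empty string is its own reverse
    z, a = w[:-1], w[-1]               # w = za
    return a + reverse(z)              # move the last symbol to the front
```

Calling `reverse("abc")` peels off c, then b, then a, in the same order as the derivation on the slide.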