Lecture Two: Formal Languages Amjad Ali
Formal Language • It is an abstraction of the general characteristics of programming languages • It consists of a set of symbols and some rules of formation of sentences • Sentences are formed by grouping the symbols
Formal Language • A formal language is the set of all strings permitted by the rules of formation
What is a language? • A system for the expression of certain ideas, facts, or concepts, including a set of symbols and rules for their manipulation
Mathematical definition of a language • This shall require us to understand the following concepts first • Alphabets • Strings • Concatenation of strings etc.
Alphabet • An ALPHABET is a nonempty set of symbols • It is denoted by Σ • Example: Σ = {a,b} where a and b are symbols
Alphabets • An alphabet is any finite set of symbols • {0,1}: binary alphabet • {0,1,2,3,4,5,6,7,8,9}: decimal alphabet • ASCII, Unicode: machine-text alphabets • Or just {a,b}: enough for many examples • {}: a legal but not usually interesting alphabet • We will usually use Σ as the name of the alphabet we're considering, as in Σ = {a,b}
Strings • Strings are constructed from the individual symbols • Strings are finite sequences of symbols from the alphabet • Example: aabba, ababaaa, abbbaaa, etc. are strings formed from the symbols of the alphabet {a,b}
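The idea of a string "over" an alphabet can be sketched in Python. This is only an illustration, assuming each symbol is a single character and a string is an ordinary Python str; the helper name is_string_over is hypothetical and not part of the lecture.

```python
def is_string_over(w: str, alphabet: set[str]) -> bool:
    """Return True if every symbol of w belongs to the alphabet."""
    return all(symbol in alphabet for symbol in w)


SIGMA = {"a", "b"}  # the example alphabet {a, b} from the slides

print(is_string_over("aabba", SIGMA))  # True
print(is_string_over("abca", SIGMA))   # False: c is not a symbol of the alphabet
print(is_string_over("", SIGMA))       # True: the empty string uses no symbols
```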
Symbols And Variables • Sometimes we will use variables that stand for strings: x = abbb • In programming languages, syntax helps distinguish symbols from variables • String x = "abbb"; • In formal language, we rely on context and naming conventions to tell them apart • We'll use the first letters, like a, b, and c, as symbols • The last few, like x, y, and z, will be string variables
Assumptions • Lower case letters a, b, c, … are used for elements of the alphabet • Lower case letters u, v, w, … are used for string names, e.g. w = aabbaba • This indicates that w is a string having the specific value aabbaba
Empty String • The empty string is written as ε • Like "" in some programming languages • |ε| = 0 • Don't confuse empty set and empty string: • {} ≠ ε • {} ≠ {ε}
Concatenation • The concatenation of two strings x and y is the string containing all the symbols of x in order, followed by all the symbols of y in order • We show concatenation just by writing the strings next to each other • If x = abc and y = def, then xy = abcdef • For any x, εx = xε = x
Concatenation of the strings • Two strings are concatenated by appending the symbols of one string to the end of the other string • Example: u = aaabbb, v = abbabba • Concatenated string: uv = aaabbbabbabba
Length of the string • The length of the string is the number of symbols in the string • |w| = 5 if w = aabaa • The empty string has no symbols and is denoted by λ (also written ε in these slides) • |λ| = 0
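Concatenation and length map directly onto Python's + operator and len function. The sketch below reuses the example strings u and v from the slides; treating Python's "" as the empty string is an assumption of the illustration.

```python
u = "aaabbb"
v = "abbabba"
uv = u + v   # concatenation: the symbols of u followed by the symbols of v
empty = ""   # stands in for the empty string

print(uv)                              # aaabbbabbabba
print(len(uv))                         # 13
print(len(uv) == len(u) + len(v))      # True: lengths add under concatenation
print(empty + u == u and u + empty == u)  # True: the empty string is the identity
print(len("aabaa"), len(empty))        # 5 0
```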
Kleene Star • The Kleene closure of an alphabet Σ, written as Σ*, is the language of all strings over Σ • {a}* is the set of all strings of zero or more as: {ε, a, aa, aaa, …} • {a,b}* is the set of all strings of zero or more symbols, each of which is either a or b = {ε, a, b, aa, bb, ab, ba, aaa, …} • x ∈ Σ* means x is a string over Σ • Unless Σ = {}, Σ* is infinite
Kleene Star • Iterating a language L • L⁰ = {ε} • L¹ = L • L² = L·L • Lⁿ⁺¹ = Lⁿ·L • Kleene star: L* = L⁰ ∪ L¹ ∪ L² ∪ … (the union of Lⁿ over all n ≥ 0) • Example: {a,b}* = {ε, a, b, aa, bb, ab, ba, aab, …} • all finite sequences over {a,b}.
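Because the Kleene star is usually an infinite set, a program can only enumerate it up to some bound. The following is a rough sketch under that assumption: it builds the powers of a language by repeated concatenation and then takes the union of the first few powers. The names concat, power, and star_up_to are hypothetical helpers chosen for this illustration.

```python
def concat(l1: set[str], l2: set[str]) -> set[str]:
    """Concatenate every string of l1 with every string of l2."""
    return {x + y for x in l1 for y in l2}


def power(lang: set[str], n: int) -> set[str]:
    """n-th power of a language: start from {""} and concatenate lang n times."""
    result = {""}  # the zeroth power contains only the empty string
    for _ in range(n):
        result = concat(result, lang)
    return result


def star_up_to(lang: set[str], max_power: int) -> set[str]:
    """Finite approximation of the Kleene star: union of powers 0..max_power."""
    out: set[str] = set()
    for n in range(max_power + 1):
        out |= power(lang, n)
    return out


sigma = {"a", "b"}
print(sorted(power(sigma, 2)))
# ['aa', 'ab', 'ba', 'bb']
print(sorted(star_up_to(sigma, 2), key=lambda s: (len(s), s)))
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```

Raising max_power adds longer strings but never removes any, which matches the view of the star as a union that keeps growing.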
Σ+ and Σ* • Σ is an alphabet • Σ* is the set of all strings obtained by concatenating zero or more symbols from Σ • Σ* always contains λ • Σ+ = Σ* - {λ}
Finiteness • Σ is always finite • Σ* and Σ+ are always infinite (for a nonempty Σ)
Numbers • We use N to denote the set of natural numbers: N = {0, 1, …}
Exponents • Exponent n concatenates n copies of a string • If x = ab, then • x⁰ = ε • x¹ = x = ab • x² = xx = abab, etc. • We use parentheses for grouping exponentiations (assuming that Σ does not contain the parentheses) • (ab)⁷ = ababababababab
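String exponentiation corresponds to Python's string repetition operator. A small sketch, assuming strings are Python str values; the function name string_power is hypothetical.

```python
def string_power(x: str, n: int) -> str:
    """Return n copies of x concatenated together (n is a natural number)."""
    if n < 0:
        raise ValueError("exponent must be a natural number")
    return x * n  # str * int repeats the string n times


x = "ab"
print(repr(string_power(x, 0)))  # '' (the empty string)
print(string_power(x, 1))        # ab
print(string_power(x, 2))        # abab
print(string_power(x, 7))        # ababababababab
```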
Languages • A language is a set of strings over some fixed alphabet • Not restricted to finite sets: in fact, finite sets are not usually interesting languages • All our alphabets are finite, and all our strings are finite, but most of the languages we're interested in are infinite
Language • A language L is defined very generally as a subset of Σ* • A string in a language L will be called a sentence of L
Set Formers • A set written with extra constraints or conditions limiting the elements of the set • Not the rigorous definitions we're looking for, but a useful notation anyway: • {x ∈ {a, b}* | |x| ≤ 2} = {ε, a, b, aa, bb, ab, ba} • {xy | x ∈ {a, aa} and y ∈ {b, bb}} = {ab, abb, aab, aabb} • {x ∈ {a, b}* | x contains one a and two bs} = {abb, bab, bba} • {aⁿbⁿ | n ≥ 1} = {ab, aabb, aaabbb, aaaabbbb, ...}
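Set formers read much like Python set comprehensions. The sketch below reproduces the four examples above, with one assumption stated plainly: an infinite set such as {a, b}* cannot be enumerated in full, so the hypothetical helper strings_up_to lists only the strings up to a chosen length, and the last example is cut off at n = 4.

```python
from itertools import product


def strings_up_to(alphabet: set[str], max_len: int) -> list[str]:
    """All strings over the alphabet with length 0..max_len."""
    return [
        "".join(symbols)
        for n in range(max_len + 1)
        for symbols in product(sorted(alphabet), repeat=n)
    ]


sigma = {"a", "b"}

# Strings over {a, b} of length at most 2
short = {x for x in strings_up_to(sigma, 2) if len(x) <= 2}

# xy with x in {a, aa} and y in {b, bb}
pairs = {x + y for x in {"a", "aa"} for y in {"b", "bb"}}

# Strings containing exactly one a and two bs (so length 3 is enough)
one_a_two_bs = {
    x for x in strings_up_to(sigma, 3)
    if x.count("a") == 1 and x.count("b") == 2
}

# First four strings of the form a^n b^n with n >= 1
anbn = ["a" * n + "b" * n for n in range(1, 5)]

print(sorted(short, key=lambda s: (len(s), s)))
print(sorted(pairs))
print(sorted(one_a_two_bs))
print(anbn)
```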
Free Variables in Set Formers • Unless otherwise constrained, exponents in a set former are assumed to range over all N • Examples • {(ab)ⁿ} = {ε, ab, abab, ababab, abababab, ...} • {aⁿbⁿ} = {ε, ab, aabb, aaabbb, aaaabbbb, ...}
The Quest • Set formers are relatively informal • They can be vague, ambiguous, or self-contradictory • A big part of our quest in the study of formal language is to develop better tools for defining languages
Problem • Σ = {a,b} • Σ* = {λ, a, b, aa, ab, ba, bb, aaa, aab, aba, abb, baa, bab, bba, bbb, aaaa, …} • L = {a, aa, aab} is a language on Σ because L is a subset of Σ*; it is finite • L = {aⁿbⁿ : n > 0} is also a subset of Σ*, but it is infinite
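A finite language can be stored as a set, while an infinite one has to be described by a membership test. The following sketch illustrates both for the two languages in this problem; the function name in_anbn is hypothetical, and the test shown is just one simple way to decide membership.

```python
FINITE_L = {"a", "aa", "aab"}  # the finite language from the slide


def in_anbn(w: str) -> bool:
    """Return True if w consists of n a's followed by n b's, for some n > 0."""
    n = len(w) // 2
    return n > 0 and w == "a" * n + "b" * n


print("aab" in FINITE_L)   # True
print("ab" in FINITE_L)    # False
print(in_anbn("aaabbb"))   # True
print(in_anbn("aab"))      # False: two a's but only one b
print(in_anbn(""))         # False: n must be greater than 0
```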
Concatenation of two Languages • L₁L₂ = {xy : x ∈ L₁ and y ∈ L₂}
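For finite languages this definition can be computed directly with a set comprehension. A minimal sketch, assuming languages are represented as Python sets of str; the name concat_languages is hypothetical.

```python
def concat_languages(l1: set[str], l2: set[str]) -> set[str]:
    """Every string xy with x taken from l1 and y taken from l2."""
    return {x + y for x in l1 for y in l2}


print(sorted(concat_languages({"a", "aa"}, {"b", "bb"})))
# ['aab', 'aabb', 'ab', 'abb']

# Different pairs can produce the same string, so the result may have
# fewer elements than the number of pairs (a + aa and aa + a are both aaa).
print(sorted(concat_languages({"a", "aa"}, {"a", "aa"})))
# ['aa', 'aaa', 'aaaa']

# Concatenating with the language containing only the empty string changes nothing.
print(concat_languages({"a", "aa"}, {""}) == {"a", "aa"})  # True
```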