Mathematical Foundations of Computer Science Chapter 3: Regular Languages and Regular Grammars
Languages • A language (over an alphabet Σ) is any subset of the set of all possible strings over Σ. The set of all possible strings is written as Σ*. • Example: • Σ = {a, b, c} • Σ* = {λ, a, b, c, aa, ab, ac, ba, bb, bc, ca, cb, cc, aaa, …} • One language might be the set of strings of length less than or equal to 2: L = {λ, a, b, c, aa, ab, ac, ba, bb, bc, ca, cb, cc}
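The example language above can be enumerated mechanically. The following is a minimal Python sketch, not part of the course material; the helper name `strings_up_to` is made up for illustration, and the empty Python string `""` stands in for λ.

```python
from itertools import product

SIGMA = ["a", "b", "c"]  # the alphabet from the example

def strings_up_to(alphabet, max_len):
    """Return all strings over `alphabet` of length 0..max_len, shortest first."""
    result = []
    for n in range(max_len + 1):
        # product(..., repeat=0) yields one empty tuple, i.e. the empty string
        for letters in product(alphabet, repeat=n):
            result.append("".join(letters))
    return result

# The language of all strings of length <= 2; "" plays the role of λ.
L = strings_up_to(SIGMA, 2)
print(len(L))  # 1 + 3 + 9 = 13 strings
print(L)
```

Σ* itself is infinite, so any program can only list a finite slice of it, such as the strings up to some chosen length.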
Regular Languages • A regular language (over an alphabet Σ) is any language for which there exists a finite automaton that recognizes it.
Mathematical Models of Computation • This course studies a variety of mathematical models corresponding to notions of computation. • The finite automaton was our first example. • The finite automaton is an example of an automaton model. • There are other models as well.
Mathematical Models of Computation • Another important model is that of a grammar. • We will shortly look at regular grammars. • But first, a digression:
Regular Expressions • A regular expression is a mathematical model for describing a particular type of language. • Regular expressions are kind of like arithmetic expressions. • The regular expression is defined recursively.
Regular Expressions • Given an alphabet Σ • ∅ (the empty set), λ (the empty string) and a ∈ Σ are all regular expressions. • If r1 and r2 are regular expressions, then so are r1 + r2, r1 · r2, r1* and (r1). • Note: we usually write r1 · r2 as r1r2. • These are the only things that are regular expressions.
Regular Expressions • Meaning: • ∅ represents the empty language • λ represents the language {λ} • a represents the language {a} • r1 + r2 represents the language L(r1) ∪ L(r2) • r1 · r2 represents L(r1) L(r2) • r1* represents (L(r1))*
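The meaning rules above translate almost directly into set operations. Here is a rough Python sketch under one big assumption: since these languages can be infinite, every operation is cut off at a maximum string length. The function names (`union`, `concat`, `star`, `sym`) are made up for illustration.

```python
def union(l1, l2):
    """L(r1 + r2) = L(r1) ∪ L(r2)."""
    return l1 | l2

def concat(l1, l2, max_len):
    """L(r1 r2), keeping only strings of length <= max_len."""
    return {u + v for u in l1 for v in l2 if len(u + v) <= max_len}

def star(l, max_len):
    """(L(r1))*, keeping only strings of length <= max_len."""
    result = {""}      # λ is always in L*
    frontier = {""}
    while frontier:
        # append one more string from l; keep only strings not seen before
        frontier = concat(frontier, l, max_len) - result
        result |= frontier
    return result

EMPTY = set()     # ∅, the empty language
LAMBDA = {""}     # {λ}

def sym(a):
    """The language {a} for a single symbol a."""
    return {a}

MAX_LEN = 3
# a*(a + b), restricted to strings of length at most 3
lang = concat(star(sym("a"), MAX_LEN), union(sym("a"), sym("b")), MAX_LEN)
print(sorted(lang, key=lambda w: (len(w), w)))
```

With `MAX_LEN = 3` this lists a, b, aa, ab, aaa, aab, which is the start of the set given for a*(a + b) in Example 1.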
Regular Expressions • Example 1: • What does a*(a + b) represent? • It represents zero or more a's followed by either an a or a b. • {a, b, aa, ab, aaa, aab, aaaa, aaab …}
Regular Expressions • Example 2: • What does (a+b)*(a+bb) represent? • It represents zero or more symbols, each of which can be an a or a b, followed by either a or bb. • {a, bb, aa, abb, ba, bbb, aaa, aabb, aba, abbb, baa, babb, bba, bbbb, …}
Regular Expressions • Example 3: • What does (aa)*(bb)*b represent? • All strings over {a, b} that consist of an even number of a's (possibly zero) followed by an odd number of b's. • It's important to understand the underlying meaning of a regular expression.
Regular Expressions • Example 4: • Find a regular expression for strings of 0's and 1's which have at least one pair of consecutive 0's. • Each such string must have a 00 somewhere in it. • It could have any string in front of it and any string after it, as long as it's there!!! • Any string is represented by (0 + 1)* • Answer: (0 + 1)*00(0 + 1)*
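One way to gain confidence in an answer like this is to test it by brute force on all short strings. The sketch below uses Python's `re` module, which writes union as `|` rather than `+`; the helper `binary_strings` and the length bound of 10 are arbitrary choices for illustration. Passing such a test is evidence, not a proof.

```python
import re
from itertools import product

# (0 + 1)*00(0 + 1)* in Python's syntax: "+" (union) becomes "|"
PATTERN = re.compile(r"(0|1)*00(0|1)*")

def binary_strings(max_len):
    """Yield every string of 0's and 1's of length 0..max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)

# Compare the regular expression against the plain-English description.
mismatches = [w for w in binary_strings(10)
              if (PATTERN.fullmatch(w) is not None) != ("00" in w)]
print(mismatches)  # an empty list means the two descriptions agree
```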
Regular Expressions • Example: • Find a regular expression for strings of 0's and 1's which have no pairs of consecutive 0's. • It's a repetition of strings that are either 1's or, if a substring begins with 0, it must be followed by at least one 1. • (1 + 011*)* • or equivalently, (1 + 01)* • But such strings can't end in a 0.
Regular Expressions • Example: • Find a regular expression for strings of 0's and 1's which have no pairs of consecutive 0's. • (1 + 011*)* • (1 + 01)* • But such strings can't end in a 0. • So we add (0 + λ) to the end to allow for this. • (1 + 01)* (0 + λ) • This is only one of many possible answers.
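The gap in (1 + 01)* and the repair can both be seen by brute force. This is only a sketch using Python's `re` module (union is written `|`, and `0?` is used here as shorthand for (0 + λ)); the variable names and the length bound are made up for illustration.

```python
import re
from itertools import product

INCOMPLETE = re.compile(r"(1|01)*")    # (1 + 01)*
FINAL = re.compile(r"(1|01)*0?")       # (1 + 01)*(0 + λ)

def no_double_zero(w):
    """The property we want: no pair of consecutive 0's."""
    return "00" not in w

# All binary strings of length 0..8, shortest first.
words = ["".join(bits) for n in range(9) for bits in product("01", repeat=n)]

# Strings that should be accepted but are missed by (1 + 01)*.
missed_by_incomplete = [w for w in words
                        if no_double_zero(w) and not INCOMPLETE.fullmatch(w)]
# Strings on which the final answer disagrees with the property.
wrong_for_final = [w for w in words
                   if bool(FINAL.fullmatch(w)) != no_double_zero(w)]

print(missed_by_incomplete[:4])
print(wrong_for_final)
```

Every string missed by the first expression ends in 0, which is exactly the case the (0 + λ) suffix is there to handle.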
Regular Expressions • Why are they called regular expressions? • Because, as it turns out, the set of languages they describe is that of the regular languages. • That means that regular expressions are just another model for the same thing as finite automata.
Regular Expressions • Homework: • Chapter 3, Section 1 • Problems 1-11, 17, 18
Regular Expressions and Regular Languages • As we have said, regular expressions and finite automata are really different ways of expressing the same thing. • Let's see why. • Given a regular expression, how can we build an equivalent finite automaton? • (We won't bother going the other way, although it can be done.)
Regular Expressions and Regular Languages • Clearly there are simple finite automata corresponding to the simple regular expressions: • ∅ • λ • a [Figure: one small automaton for each of ∅, λ, and a] • Note that each of these has an initial state and one accepting state.
Regular Expressions and Regular Languages • On the previous slide, we saw that the simplest regular expressions can be represented by a finite automaton with an initial state (duh!) and one isolated accepting state:
Regular Expressions and Regular Languages • We can build more complex automata for more complex regular expressions using this model:
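Regular Expressions and Regular Languages • Here's how we build an nfa for r1 + r2: [Figure: the automata for r1 and r2 placed in parallel and joined to a new initial state and a new accepting state by four λ-transitions]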
Regular Expressions and Regular Languages • Here's how we build an nfa for r1r2: [Figure: the automata for r1 and r2 placed in series and joined by three λ-transitions]
Regular Expressions and Regular Languages • Here's how we build an nfa for (r1)*: [Figure: the automaton for r1 wrapped in λ-transitions that allow it to be skipped or repeated] • Note: the last state added is not in the book. For safety, I add it so that only one arc goes into the final state.
Building an nfa from a regular expression • Example: • Consider the regular expression (a + bb)(a+b)*(bb) [Figure: the nfa assembled from the constructions above, with arcs labeled a, b, and λ] • Sometimes we just get tired and take an obvious shortcut.
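The three constructions can be sketched in code. The following Python is a minimal illustration in the spirit of the slides, not an arc-for-arc copy of the figures: regular expressions are assumed to be given as nested tuples such as `("union", r1, r2)`, a λ-transition is stored with the label `None`, and all names (`NFA`, `build`, `accepts`) are made up for this example.

```python
from itertools import count

class NFA:
    """An nfa with exactly one initial and one accepting state."""
    def __init__(self, start, accept, edges):
        self.start, self.accept, self.edges = start, accept, edges

_ids = count()  # supplies fresh state numbers

def build(r):
    """Build an nfa from a regular expression given as a nested tuple."""
    kind = r[0]
    start, accept = next(_ids), next(_ids)
    if kind == "sym":                       # a single symbol
        return NFA(start, accept, [(start, r[1], accept)])
    if kind == "union":                     # r1 + r2: run r1 and r2 in parallel
        m1, m2 = build(r[1]), build(r[2])
        extra = [(start, None, m1.start), (start, None, m2.start),
                 (m1.accept, None, accept), (m2.accept, None, accept)]
        return NFA(start, accept, m1.edges + m2.edges + extra)
    if kind == "concat":                    # r1r2: run r1, then r2
        m1, m2 = build(r[1]), build(r[2])
        extra = [(start, None, m1.start), (m1.accept, None, m2.start),
                 (m2.accept, None, accept)]
        return NFA(start, accept, m1.edges + m2.edges + extra)
    if kind == "star":                      # (r1)*: skip r1, or repeat it
        m1 = build(r[1])
        extra = [(start, None, m1.start), (m1.accept, None, accept),
                 (start, None, accept), (accept, None, start)]
        return NFA(start, accept, m1.edges + extra)
    raise ValueError(kind)

def closure(nfa, states):
    """All states reachable from `states` using only λ-transitions."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for p, label, t in nfa.edges:
            if p == q and label is None and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def accepts(nfa, w):
    """Simulate the nfa on w by tracking the set of possible states."""
    current = closure(nfa, {nfa.start})
    for ch in w:
        moved = {t for p, label, t in nfa.edges if p in current and label == ch}
        current = closure(nfa, moved)
    return nfa.accept in current

# (a + bb)(a + b)*(bb)
a, b = ("sym", "a"), ("sym", "b")
bb = ("concat", b, b)
regex = ("concat", ("concat", ("union", a, bb), ("star", ("union", a, b))), bb)
nfa = build(regex)
print(accepts(nfa, "abb"), accepts(nfa, "bbabbb"), accepts(nfa, "ab"))
```

Because every piece has one initial and one accepting state, the pieces can be glued together without looking inside them, which is the point of the convention on the earlier slides.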
Building a regular expression from a finite automaton • The book goes on to show that it works the other way around as well: we can find a corresponding regular expression for any finite automaton. • It's fairly easy in some cases and you can "just do it." • However, it's generally complicated and not worth the bother of studying. • You are not responsible for this material.
Building a regular expression from a finite automaton [Figure: a two-state automaton with arcs labeled a, c, and a, b] • The above automaton clearly corresponds to a*(a+b)c*
Regular Expressions and nfa's • Homework: • Chapter 3, Section 2 • Problems 1-5
Regular Grammars • Review: A grammar is a quadruple G = (V, T, S, P) where • V is a finite set of variables • T is a finite set of symbols, called terminals • S is in V and is called the start symbol • P is a finite set of productions, which are rules of the form α → β • where α and β are strings consisting of terminals and variables.
Regular Grammars • A grammar is said to be right-linear if every production in P is of the form • A→xB or • A→x • where A and B are variables (perhaps the same, perhaps the start symbol S) in V • and x is any string of terminal symbols (including the empty string λ)
Regular Grammars • An alternate (and better) definition of a right-linear grammar says that every production in P is of the form • A→aB or • A→a or • S→λ (to allow λ to be in the language) • where A and B are variables (perhaps the same, but B can't be S) in V • and a is any terminal symbol
Regular Grammars • The reason I prefer the second definition (although I accept the first one that happens to be used in the book) is • It's easier to work with in proving things. • It's the much more common definition.
Regular Grammars • A grammar is said to be left-linear if every production in P is of the form • A→Bx or • A→x • where A and B are variables (perhaps the same, perhaps the start symbol S) in V • and x is any string of terminal symbols (including the empty string λ)
Regular Grammars • The alternate definition of a left-linear grammar says that every production in P is of the form • A→Ba or • A→a or • S→λ • where A and B are variables (perhaps the same, but B can't be S) in V • and a is any terminal symbol
Regular Grammars • Any left-linear or right-linear grammar is called a regular grammar.
Regular Grammars • For brevity, we often write a set of productions such as • A → x1 • A → x2 • A → x3 • as • A → x1 | x2 | x3
Regular Grammars • A derivation in grammar G is any sequence of strings over V ∪ T, • connected with ⇒ • starting with S and ending with a string containing no variables • where each subsequent string is obtained by applying a production in P. • S ⇒ x1 ⇒ x2 ⇒ x3 ⇒ . . . ⇒ xn abbreviated as: • S ⇒* xn
Regular Grammars • S ⇒ x1 ⇒ x2 ⇒ x3 ⇒ . . . ⇒ xn • abbreviated as: • S ⇒* xn • We say that xn is a sentence of the language generated by G, L(G). • We say that the other x's are sentential forms.
Regular Grammars • L(G) = {w | w ∈ T* and S ⇒* w} • We call L(G) the language generated by G • L(G) is the set of all sentences over grammar G
Example 1 • S → abS | a is an example of a right-linear grammar. • Can you figure out what language it generates? • L = {w ∈ {a,b}* | w is empty, or contains alternating a's and b's, begins with an a, and ends with a b}{a} • L((ab)*a)
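The language of a small grammar can be explored by carrying out derivations mechanically. The following Python sketch makes several assumptions that are not from the slides: productions are stored in a dict, uppercase letters are variables, lowercase letters are terminals, the empty string stands for λ, and each right-hand side contains at most one variable (true of every regular grammar), which keeps the search finite. The name `generate` is made up.

```python
from collections import deque

def generate(productions, start, max_len):
    """Return all sentences with at most max_len terminals derivable from `start`."""
    sentences, seen = set(), {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()                     # a sentential form
        # find the leftmost variable, if there is one
        i = next((k for k, ch in enumerate(form) if ch.isupper()), None)
        if i is None:
            sentences.add(form)                    # no variables left: a sentence
            continue
        for rhs in productions[form[i]]:
            new = form[:i] + rhs + form[i + 1:]    # apply one production
            terminals = sum(1 for ch in new if ch.islower())
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sentences

# Example 1: S -> abS | a
example1 = {"S": ["abS", "a"]}
print(sorted(generate(example1, "S", 5), key=len))
```

For Example 1 this produces a, aba, and ababa, the first few strings of L((ab)*a).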
Example 2 • S → Aab, A → Aab | Ba, B → a is an example of a left-linear grammar. • Can you figure out what language it generates? • L = {w ∈ {a,b}* | w is aa followed by at least one set of alternating ab's} • L(aaab(ab)*)
Example 3 • Consider the grammar S → A, A → aB | λ, B → Ab • This grammar is NOT regular. • No "mixing and matching" left- and right-linear productions.
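The definitions of right-linear and left-linear grammars can be checked production by production. Below is a small Python sketch using the first (book) definition, A→xB or A→x; the dict representation and the convention that uppercase letters are variables and lowercase letters are terminals are assumptions for illustration, as is the grammar named `left_example`.

```python
def terminals_only(s):
    """True if s contains no variables (also true for the empty string, λ)."""
    return all(ch.islower() for ch in s)

def right_linear_rhs(rhs):
    """x or xB: terminals, optionally followed by one variable."""
    return terminals_only(rhs) or (rhs[-1].isupper() and terminals_only(rhs[:-1]))

def left_linear_rhs(rhs):
    """x or Bx: optionally one variable, followed by terminals."""
    return terminals_only(rhs) or (rhs[0].isupper() and terminals_only(rhs[1:]))

def classify(productions):
    """Classify a grammar given as {variable: [right-hand sides]}."""
    all_rhs = [rhs for rhss in productions.values() for rhs in rhss]
    if all(right_linear_rhs(r) for r in all_rhs):
        return "right-linear"
    if all(left_linear_rhs(r) for r in all_rhs):
        return "left-linear"
    return "not regular"

example1 = {"S": ["abS", "a"]}                      # Example 1
example3 = {"S": ["A"], "A": ["aB", ""], "B": ["Ab"]}  # Example 3
left_example = {"S": ["Sab", "a"]}                  # hypothetical left-linear grammar

print(classify(example1), classify(left_example), classify(example3), sep=" | ")
```

In Example 3 each production on its own is either left-linear or right-linear; it is the mixture that makes the grammar fail both tests. Note that this classifies the grammar, not the language: a language generated by a non-regular grammar may still be regular.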
Regular Grammars and nfa's • It's not hard to show that regular grammars generate and nfa's accept the same class of languages: the regular languages! • It's a long proof, where we must show that • any finite automaton has a corresponding left- or right-linear grammar, • and any regular grammar has a corresponding nfa. • We won't bother with the details.
Regular Grammars and nfa's • We get a feel for this by example. • Let S → aA, A → abS | b [Figure: the corresponding nfa, with states S and A and arcs labeled a and b]
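The idea behind the example can be sketched in code: each variable becomes a state, and each production becomes a path of arcs. The Python below is only an illustration under stated assumptions (uppercase letters are variables, lowercase letters are terminals, productions are stored in a dict); the names `grammar_to_nfa`, `accepts`, and the state label `"#final"` are made up.

```python
FINAL = "#final"  # hypothetical name for the single accepting state

def grammar_to_nfa(productions, start):
    """Turn a right-linear grammar into a list of nfa arcs (state, symbol, state)."""
    edges = []
    for var, rhss in productions.items():
        for n, rhs in enumerate(rhss):
            # A -> xB leads to state B; A -> x leads to the accepting state
            target = rhs[-1] if rhs and rhs[-1].isupper() else FINAL
            word = rhs[:-1] if target != FINAL else rhs
            state = var
            for k, ch in enumerate(word):
                last = (k == len(word) - 1)
                nxt = target if last else (var, n, k)  # fresh intermediate state
                edges.append((state, ch, nxt))
                state = nxt
            if not word:                   # A -> B or A -> λ becomes a λ-move
                edges.append((var, None, target))
    return edges, start, FINAL

def accepts(edges, start, final, w):
    """Simulate the nfa on w by tracking the set of possible states."""
    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for p, label, t in edges:
                if p == q and label is None and t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen
    current = closure({start})
    for ch in w:
        current = closure({t for p, label, t in edges
                           if p in current and label == ch})
    return final in current

# S -> aA, A -> abS | b
grammar = {"S": ["aA"], "A": ["abS", "b"]}
edges, start, final = grammar_to_nfa(grammar, "S")
print([w for w in ["ab", "aabab", "aab", "b"] if accepts(edges, start, final, w)])
```

The production A → abS needs one intermediate state because it reads two symbols before reaching S; productions that read a single symbol map to a single arc.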
Regular Grammars and Regular Expressions • Example: L(aab*a) • We can easily construct a regular grammar for this expression: • S → aA • A → aB • B → bB • B → a
Regular Languages • Three equivalent ways to describe them: • regular expressions • finite automata • regular grammars
Regular Languages • Homework: • Chapter 3, Section 3 • Problems