Regular Languages and Regular Grammars Chapter 3
Regular Languages A regular expression describes a regular language. A finite state machine accepts a regular language.
Operators on Regular Expressions In order of precedence: () Parentheses, * Star Closure, . Concatenation, + Union. Example: Over Σ = {a, b, c}, (a + (b . c))* produces: {λ, a, bc, aa, abc, bcbc, … }. Note: The concatenation symbol is often omitted.
Regular Expressions Let Σ be a given alphabet. Then: 1. ∅, λ, and a ∈ Σ are all primitive regular expressions. 2. If r1 and r2 are regular expressions, so are r1 + r2, r1.r2, r1*, and (r1). 3. A string is a regular expression if and only if it can be derived from the primitive regular expressions by a finite number of applications of the rules in (2).
Languages Associated with Regular Expressions If r is a regular expression, L(r) is the language associated with r. Rules for finding the language associated with r: L(∅) = ∅, L(λ) = {λ}, L(a) = {a}, L(r1 + r2) = L(r1) ∪ L(r2), L(r1.r2) = L(r1).L(r2), L((r1)) = L(r1), L(r1*) = (L(r1))*
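The rules above can be turned into a small evaluator. The following is a minimal sketch, not part of the original slides: it assumes a hypothetical encoding of regular expressions as nested Python tuples, and it only lists the strings of L(r) up to a chosen length, since L(r) is usually infinite.

```python
# Hypothetical encoding of regular expressions as nested tuples:
#   ("empty",)          the expression ∅
#   ("lambda",)         the expression λ
#   ("sym", "a")        a single symbol a
#   ("union", r1, r2)   r1 + r2
#   ("cat", r1, r2)     r1 . r2
#   ("star", r1)        r1*

def language(r, max_len):
    """Return the strings of L(r) whose length is at most max_len."""
    kind = r[0]
    if kind == "empty":                      # L(∅) = ∅
        return set()
    if kind == "lambda":                     # L(λ) = {λ}
        return {""}
    if kind == "sym":                        # L(a) = {a}
        return {r[1]} if max_len >= 1 else set()
    if kind == "union":                      # L(r1 + r2) = L(r1) ∪ L(r2)
        return language(r[1], max_len) | language(r[2], max_len)
    if kind == "cat":                        # L(r1.r2) = L(r1).L(r2)
        left = language(r[1], max_len)
        right = language(r[2], max_len)
        return {u + v for u in left for v in right if len(u + v) <= max_len}
    if kind == "star":                       # L(r1*) = (L(r1))*
        base = language(r[1], max_len)
        result = {""}
        frontier = {""}
        while frontier:
            # Append one more factor from L(r1); keep only new, short strings.
            frontier = {u + v for u in frontier for v in base
                        if len(u + v) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node: {kind}")


# (a + (b . c))* over Σ = {a, b, c}
example = ("star", ("union", ("sym", "a"), ("cat", ("sym", "b"), ("sym", "c"))))
print(sorted(language(example, 3), key=lambda w: (len(w), w)))
```

Parentheses need no case of their own here because the tuple nesting already records the grouping, which mirrors the rule L((r1)) = L(r1).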
Analyzing a Regular Expression L((a + b)*b) = L((a + b)*) L(b) = (L(a + b))* L(b) = (L(a) ∪ L(b))* L(b) = ({a} ∪ {b})* {b} = {a, b}* {b}. The strings of a’s and b’s that end with b.
Analyzing a Regular Expression L(a*b*) = L(a*)L(b*) = {a}*{b}* A string of zero or more a’s followed by a string of zero or more b’s.
Given a Language, find a rex L = {w ∈ {a, b}* : |w| is even} ((a + b)(a + b))* or (aa + ab + ba + bb)*
Examples L = {w ∈ {a, b}* : w contains an odd number of a’s} b*(ab*ab*)*ab* or b*ab*(ab*ab*)* Both expressions require that there be a single a somewhere. There can also be other a’s, but they must occur in pairs.
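One way to gain confidence in answers like these is to compare the expression with the set definition on every short string. The sketch below is an illustration rather than a proof: it uses Python's `re` module (where union is written `|` instead of `+`), and the helper names `strings_up_to` and `agrees` are made up for this example.

```python
import re
from itertools import product

def strings_up_to(alphabet, max_len):
    """Yield every string over the alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

def agrees(pattern, predicate, alphabet="ab", max_len=8):
    """True if the pattern matches exactly the strings satisfying predicate,
    for all strings up to max_len (a bounded check, not a proof)."""
    compiled = re.compile(pattern)
    return all((compiled.fullmatch(w) is not None) == predicate(w)
               for w in strings_up_to(alphabet, max_len))

def even_length(w):
    return len(w) % 2 == 0

def odd_number_of_as(w):
    return w.count("a") % 2 == 1

print(agrees(r"((a|b)(a|b))*", even_length))
print(agrees(r"b*(ab*ab*)*ab*", odd_number_of_as))
print(agrees(r"b*ab*(ab*ab*)*", odd_number_of_as))
```

A `False` result would pinpoint a wrong expression quickly; a `True` result only says that no counterexample exists up to the chosen length.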
More Regular Expression Examples Try these: L = {w ∈ {a, b}* : there is no more than one b in w} L(r) = {a^(2n) b^(2m+1) : n ≥ 0, m ≥ 0}
More Regular Expression Examples Try these: L = {w ∈ {a, b}* : there is no more than one b in w} a*(λ + b)a* or a* + a*ba* L(r) = {a^(2n) b^(2m+1) : n ≥ 0, m ≥ 0} (aa)*(bb)*b
The Details Matter a* + b* ≠ (a + b)* (ab)* ≠ a*b*
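To see why such look-alike expressions differ, it helps to find a concrete string that one accepts and the other rejects. The following sketch searches short strings for the first disagreement; it uses Python `re` syntax (`|` for union, an empty alternative for λ), and `first_difference` is a hypothetical helper name.

```python
import re
from itertools import product

def first_difference(pattern1, pattern2, alphabet="ab", max_len=6):
    """Return the first string (shortest first) matched by exactly one of the
    two patterns, or None if they agree on all strings up to max_len."""
    p1, p2 = re.compile(pattern1), re.compile(pattern2)
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if (p1.fullmatch(w) is None) != (p2.fullmatch(w) is None):
                return w
    return None

print(first_difference(r"a*|b*", r"(a|b)*"))        # a string mixing a's and b's
print(first_difference(r"(ab)*", r"a*b*"))          # a string that is not whole ab blocks
print(first_difference(r"a*(|b)a*", r"a*|a*ba*"))   # None: no difference found
```

A `None` result means only that no difference exists up to `max_len`; it does not prove the two expressions equivalent.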
Rex to NFA Finite state machines and regular expressions define the same class of languages. Theorem: Any language that can be defined with a regular expression can be accepted by some NFA and so is regular. Proof by Construction: Must show that an NFA can be constructed using rules for: ∅, λ, any symbol in Σ, union, concatenation, and star closure.
For Every Regular Expression There is a Corresponding FSM We’ll show this by construction. An FSM for: ∅: A single element of Σ: λ:
Union M1 (recognizes string s) and M2 (recognizes string t), joined by λ-transitions: FSA that recognizes s + t
Concatenation M1 (recognizes string s) and M2 (recognizes string t), linked by λ-transitions: FSA that recognizes st
Star Closure M1 (recognizes string s), with added λ-transitions: FSA that recognizes s*
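The constructions for ∅, λ, a single symbol, union, concatenation, and star closure can be combined into a short program. The sketch below follows one common variant (Thompson-style, with a fresh start and accepting state for every operator); the exact placement of λ-transitions in the slide figures may differ. The tuple encoding of expressions and the names `build`, `closure`, and `accepts` are assumptions made for this example.

```python
from itertools import count

_fresh = count()  # supplies unused state numbers

def build(r):
    """Return (start, accept, edges) for r; an edge is (src, label, dst) and
    label None stands for a λ-transition."""
    kind = r[0]
    s, f = next(_fresh), next(_fresh)
    if kind == "empty":                       # ∅: no path from s to f
        return s, f, []
    if kind == "lambda":                      # λ
        return s, f, [(s, None, f)]
    if kind == "sym":                         # a single element of Σ
        return s, f, [(s, r[1], f)]
    s1, f1, e1 = build(r[1])
    if kind == "star":                        # skip M1, or run it repeatedly
        return s, f, e1 + [(s, None, s1), (f1, None, f),
                           (s, None, f), (f1, None, s1)]
    s2, f2, e2 = build(r[2])
    if kind == "union":                       # branch into M1 or M2
        return s, f, e1 + e2 + [(s, None, s1), (s, None, s2),
                                (f1, None, f), (f2, None, f)]
    if kind == "cat":                         # run M1, then M2
        return s, f, e1 + e2 + [(s, None, s1), (f1, None, s2), (f2, None, f)]
    raise ValueError(f"unknown node: {kind}")

def closure(states, edges):
    """All states reachable from `states` using only λ-transitions."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for src, label, dst in edges:
            if src == q and label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(r, w):
    """Simulate the NFA built for r on the input string w."""
    start, accept, edges = build(r)
    current = closure({start}, edges)
    for ch in w:
        moved = {dst for src, label, dst in edges
                 if src in current and label == ch}
        current = closure(moved, edges)
    return accept in current

# (b + ab)*
r = ("star", ("union", ("sym", "b"), ("cat", ("sym", "a"), ("sym", "b"))))
print(accepts(r, "bab"), accepts(r, "aab"))
```

The machine produced this way has many more states and λ-transitions than necessary, which is why a simplification step usually follows.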
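An Example (b + ab)* An FSM for a. An FSM for b. An FSM for ab: the FSMs for a and b joined by a λ-transition.
An Example (b + ab)* An FSM for (b + ab), built with λ-transitions.
An Example An FSM for (b + ab)*, built with λ-transitions.
An Example A Simplified FSM for (b + ab)*, with transitions labeled λ, b, a, b, λ.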
For Every FSM There is a Corresponding Regular Expression Theorem: Every regular language (i.e., every language that can be accepted by some DFSM) can be defined with a regular expression. Proof by Construction: Use generalized transition graphs (GTGs) to convert FSM to REX. A GTG is a transition graph whose edges are labeled with regular expressions.
A Simple Example Let M be: Suppose we rip out state 2:
The Algorithm fsmtoregexheuristic
fsmtoregexheuristic(M: FSM) =
1. Remove unreachable states from M.
2. If M has no accepting states then return ∅.
3. If the start state of M is part of a loop, create a new start state s and connect s to M’s start state via a λ-transition.
4. If there is more than one accepting state of M or there are any transitions out of any of them, create a new accepting state and connect each of M’s accepting states to it via a λ-transition. The old accepting states no longer accept.
5. If M has only one state then return λ.
6. Until only the start state and the accepting state remain do:
6.1 Select rip (not s or an accepting state).
6.2 Remove rip from M.
6.3 Modify the transitions among the remaining states so M accepts the same strings.
7. Return the regular expression that labels the one remaining transition from the start state to the accepting state.
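The steps above can be sketched in Python. This is a simplified variant under stated assumptions, not the slides' exact procedure: it always adds a new start state and a new accepting state (instead of testing the conditions in steps 3 and 4), it skips the explicit removal of unreachable states, and it emits Python `re` syntax, with `None` standing for ∅ and the empty string for λ. The function and parameter names are hypothetical.

```python
import re

def union(r1, r2):
    """r1 + r2, where None stands for ∅."""
    if r1 is None:
        return r2
    if r2 is None:
        return r1
    return f"(?:{r1}|{r2})"

def star(r):
    """r*, using ∅* = λ and λ* = λ."""
    return "" if r is None or r == "" else f"(?:{r})*"

def fsm_to_regex(states, start, accepting, transitions):
    """transitions is an iterable of (src, symbol, dst) triples.
    Returns a Python regex string, or None if no string is accepted."""
    edges = {}
    for src, symbol, dst in transitions:
        edges[(src, dst)] = union(edges.get((src, dst)), re.escape(symbol))
    # New start and accepting states, connected by λ-transitions.
    new_start, new_accept = object(), object()
    edges[(new_start, start)] = ""
    for q in accepting:
        edges[(q, new_accept)] = ""
    remaining = list(states)
    while remaining:
        rip = remaining.pop()                       # 6.1 select rip
        loop = star(edges.get((rip, rip)))
        sources = [p for (p, q) in edges if q == rip and p != rip]
        targets = [q for (p, q) in edges if p == rip and q != rip]
        for p in sources:                           # 6.3 reroute p -> rip -> q
            for q in targets:
                detour = edges[(p, rip)] + loop + edges[(rip, q)]
                edges[(p, q)] = union(edges.get((p, q)), detour)
        edges = {(p, q): r for (p, q), r in edges.items()
                 if rip not in (p, q)}              # 6.2 remove rip
    return edges.get((new_start, new_accept))       # 7. the one remaining label

# A two-state DFSM that accepts strings with an odd number of a's.
odd_as = [(0, "a", 1), (1, "a", 0), (0, "b", 0), (1, "b", 1)]
print(fsm_to_regex([0, 1], 0, {1}, odd_as))
```

The order in which states are ripped out affects the size of the resulting expression but not the language it describes. The output is typically much longer than a hand-written expression, which is where simplification rules become useful.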
Example 1 1. Create a new initial state and a new, unique accepting state, neither of which is part of a loop.
Example 1, Continued 2. Remove states and arcs and replace with arcs labeled with larger and larger regular expressions.
Example 1, Continued Remove state 3:
Example 1, Continued Remove state 2:
Example 1, Continued Remove state 1:
Example 2 a*(a + b)c*
Example 3 a* + a*(a + b)c*
Simplifying Regular Expressions
Regex’s describe sets:
● Union is commutative: α + β = β + α.
● Union is associative: (α + β) + γ = α + (β + γ).
● ∅ is the identity for union: α + ∅ = ∅ + α = α.
● Union is idempotent: α + α = α.
Concatenation:
● Concatenation is associative: (αβ)γ = α(βγ).
● λ is the identity for concatenation: αλ = λα = α.
● ∅ is a zero for concatenation: α∅ = ∅α = ∅.
Concatenation distributes over union:
● (α + β)γ = (αγ) + (βγ).
● γ(α + β) = (γα) + (γβ).
Kleene star:
● ∅* = λ.
● λ* = λ.
● (α*)* = α*.
● α*α* = α*.
● (α + β)* = (α*β*)*.
Applications of regular expressions: Pattern Matching
• Many applications allow pattern matches
  • Unix
  • Perl
  • Excel
  • Access
  • …
• Pattern matching programs use automata
  • pattern → rex → NFA → DFA → transition table → driver
A Biology Example – BLAST Given a protein or DNA sequence, find others that are likely to be evolutionarily close to it. ESGHDTTTYYNKNRYPAGWNNHHDQMFFWV Build a DFSM that can examine thousands of other sequences and find those that match any of the selected patterns.
Using Regular Expressions in the Real World Matching numbers: -?([0-9]+(\.[0-9]*)?|\.[0-9]+) Matching IP addresses: s!<emphasis>([0-9]{1,3}(\.[0-9]{1,3}){3})</emphasis>!<inet>$1</inet>! Finding doubled words: \<([A-Za-z]+)\s+\1\> From Friedl, J., Mastering Regular Expressions, O’Reilly, 1997.
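Two of these patterns can be tried directly in Python. The sketch below is an adaptation, not Friedl's original code: Python's `re` module has no `\<` and `\>` word-boundary operators, so `\b` is used instead, and the helper names are made up for this example.

```python
import re

# Optional sign, then digits with an optional fraction, or a bare fraction.
NUMBER = re.compile(r"-?([0-9]+(\.[0-9]*)?|\.[0-9]+)")

# A word, whitespace, then the same word again (backreference \1).
DOUBLED = re.compile(r"\b([A-Za-z]+)\s+\1\b")

def is_number(text):
    """True if the whole text is a number in the format above."""
    return NUMBER.fullmatch(text) is not None

def doubled_words(text):
    """Return each word that appears twice in a row."""
    return [m.group(1) for m in DOUBLED.finditer(text)]

print(is_number("-3.14"), is_number("."))
print(doubled_words("this is is a test of the the matcher"))
```

The doubled-word pattern relies on a backreference, which goes beyond regular expressions in the formal sense used in the rest of these slides.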
More Regular Expressions Identifying spam: \badv\(?ert\)?\b Trawl for email addresses: \b[A-Za-z0-9_%-]+@[A-Za-z0-9_%-]+(\.[A-Za-z]+){1,4}\b
Using Substitution Building a chatbot: On input: <phrase1> is <phrase2> the chatbot will reply: Why is <phrase1> <phrase2>?
Chatbot Example <user> The food there is awful <chatbot> Why is the food there awful? Assume that the input text is stored in the variable $text: $text =~ s/^([A-Za-z ]+)\sis\s([A-Za-z ]+)\.?$/Why is \l$1 $2?/;
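The same substitution can be sketched in Python. This is a rendering for illustration, not the slide's Perl: it assumes that each phrase may contain letters and spaces, and it lowercases the first letter of the first phrase so that the reply reads naturally. The names `PATTERN` and `reply` are hypothetical.

```python
import re

# <phrase1> is <phrase2>, with an optional final period.
PATTERN = re.compile(r"^([A-Za-z ]+?)\sis\s([A-Za-z ]+)\.?$")

def reply(text):
    """Turn '<phrase1> is <phrase2>' into 'Why is <phrase1> <phrase2>?'.
    Returns None when the input does not have that shape."""
    match = PATTERN.match(text)
    if match is None:
        return None
    phrase1, phrase2 = match.groups()
    phrase1 = phrase1[0].lower() + phrase1[1:]
    return f"Why is {phrase1} {phrase2}?"

print(reply("The food there is awful"))
```

The first group is non-greedy, so the split happens at the first standalone "is" in the sentence.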
Regular Grammars A regular grammar G is a quadruple (V, T, S, P) that is either consistently right-linear or consistently left-linear. ● V - Variables ● T - Terminals ● S - Start variable, S ∈ V ● P - Productions
Right-Linear Grammar All production rules are of the form: A → xB or A → x, where A, B ∈ V (A and B are variables) and x ∈ T* (x is a string over the terminal alphabet). Example: G = ({S}, {a, b}, S, P) P: S → abS | a Corresponding Regular Expression: (ab)*a
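The correspondence between the example grammar and (ab)*a can be checked by generating strings from the productions. The sketch below assumes a hypothetical encoding in which each production A → xB is stored as the pair (x, B) and each production A → x as (x, None); it enumerates every derivable string up to a length bound.

```python
import re
from collections import deque

def generate(productions, start, max_len):
    """Return all terminal strings of length <= max_len derivable from start.
    productions maps a variable to a list of (terminals, next_variable) pairs,
    where next_variable is None for a production of the form A -> x."""
    found = set()
    seen = {("", start)}
    queue = deque(seen)
    while queue:
        prefix, var = queue.popleft()
        for terminals, nxt in productions[var]:
            word = prefix + terminals
            if len(word) > max_len:
                continue
            if nxt is None:
                found.add(word)               # derivation finished
            elif (word, nxt) not in seen:
                seen.add((word, nxt))         # sentential form: word + variable
                queue.append((word, nxt))
    return found

# S -> abS | a
grammar = {"S": [("ab", "S"), ("a", None)]}
words = generate(grammar, "S", 7)
print(sorted(words, key=len))
print(all(re.fullmatch(r"(ab)*a", w) for w in words))
```

Because a right-linear sentential form is always a terminal prefix followed by at most one variable, a (prefix, variable) pair is enough to represent each step of a derivation.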
Left-Linear Grammar All production rules are of the form: A → Bx or A → x, where A, B ∈ V (A and B are variables) and x ∈ T* (x is a string over the terminal alphabet). Example: G = ({S, S1, S2}, {a, b}, S, P) P: S → S1ab, S1 → S1ab | S2, S2 → a Corresponding Regular Expression: aab(ab)*
Focus on Right-Linear Grammars A language generated by a right-linear grammar is always regular. Proof by construction of FA on page 91 of text. Example: Construct an FA that accepts the language generated by the grammar: V0 → aV1, V1 → abV0 | b
Focus on Right-Linear Grammars V0 → aV1, V1 → b, V1 → abV0 Complete FA:
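The construction can be sketched as follows: each variable becomes a state, a production A → xB becomes a path from A to B that spells x, and a production A → x becomes a path from A to a single accepting state. This is a minimal sketch that assumes every production contains at least one terminal (no A → B or A → λ rules); the state names `Vf` and `q1`, `q2`, … are made up for this example.

```python
def grammar_to_nfa(productions, start):
    """productions maps a variable to a list of (terminals, next_variable)
    pairs, with next_variable None for A -> x. Returns (start, final, delta),
    where delta maps (state, symbol) to a set of states."""
    final = "Vf"                  # hypothetical name for the accepting state
    delta = {}
    fresh = 0
    for var, rules in productions.items():
        for terminals, nxt in rules:
            target = final if nxt is None else nxt
            state = var
            for i, symbol in enumerate(terminals):
                if i == len(terminals) - 1:
                    next_state = target        # last symbol reaches the target
                else:
                    fresh += 1                 # intermediate state inside x
                    next_state = f"q{fresh}"
                delta.setdefault((state, symbol), set()).add(next_state)
                state = next_state
    return start, final, delta

def accepts(nfa, word):
    """Simulate the NFA by tracking the set of reachable states."""
    start, final, delta = nfa
    current = {start}
    for symbol in word:
        current = set().union(*(delta.get((q, symbol), set()) for q in current))
    return final in current

# V0 -> aV1, V1 -> abV0 | b
nfa = grammar_to_nfa({"V0": [("a", "V1")],
                      "V1": [("ab", "V0"), ("b", None)]}, "V0")
print(accepts(nfa, "ab"), accepts(nfa, "aabab"), accepts(nfa, "aab"))
```

For this grammar the accepted strings are those of the form (aab)*ab: each pass through V0 → aV1 → abV0 contributes aab, and the derivation ends with V1 → b.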