Learn about regular expressions (REs) as algebraic descriptions of regular languages, and about the equivalent deterministic finite automata (DFA), nondeterministic finite automata (NFA), and ε-NFA. Explore the recursive definition of REs, their basic operations, how to build REs, and more.
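Regular Expressions: 4th way to define regular languages
• For a DFA, $L = \{w \mid \hat{\delta}(q_0, w) \in F\}$
• For an NFA, $L = \{w \mid \hat{\delta}(q_0, w) \cap F \neq \emptyset\}$
• For an ε-NFA, $L = \{w \mid \hat{\delta}(q_0, w) \cap F \neq \emptyset\}$, where $\hat{\delta}$ takes the ε-closure (CL) after every step
• We will show that for every language defined by a regular expression there is an equivalent DFA, NFA, and ε-NFA.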
RE’s: Introduction
• Regular expressions are algebraic descriptions of sets of strings that are regular languages (the language of an RE is denoted by L(RE)).
• RE’s and their languages are defined recursively.
Introduction 2
• 3 basic operations between languages derived from RE’s:
• Union, denoted by L(RE1)+L(RE2)
• Concatenation, denoted by L(RE1).L(RE2) or L(RE1)L(RE2)
• Closure, denoted by L(RE)*
Definition of +, . and * operations
• L + M is the set of all strings that are in L, in M, or in both.
Example: {001,10,111} + {ε,001} = {ε,10,001,111}
• L.M, or simply LM, is the set of all strings that can be formed by concatenating any string in L with any string in M.
Example: {001,10,111}.{ε,001} = {001,10,111,001001,10001,111001}
Note! Left-to-right order is preserved.
• L* is the set of strings obtained by taking any number of strings from L (possibly none, repetitions allowed) and forming all possible concatenations.
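A minimal sketch of the first two definitions, assuming finite languages are modeled as Python sets of strings with ε as the empty string `""`. The function names `union` and `concat` are illustrative, not from the text; the block reproduces the two examples above.

```python
def union(L, M):
    """L + M: strings in L, in M, or in both."""
    return L | M


def concat(L, M):
    """L.M: any string of L followed by any string of M (order preserved)."""
    return {x + y for x in L for y in M}


EPS = ""  # the empty string ε
L = {"001", "10", "111"}
M = {EPS, "001"}

by_length = lambda s: (len(s), s)
print(sorted(union(L, M), key=by_length))   # ['', '10', '001', '111']
print(sorted(concat(L, M), key=by_length))  # ['10', '001', '111', '10001', '001001', '111001']
```

Swapping the operands, `concat(M, L)`, yields strings such as `00110`, which is why the slide stresses that left-right order matters.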
L* relation to powers of L
• $L^* = \bigcup_{k \ge 0} L^k$, the union of all powers of L (including zero)
• $L^0 = \{\varepsilon\}$, so L* contains ε for any L
• $L^1 = L$
• $L^k$ (k > 1) is the concatenation of k copies of L
• If L = {0,11}, $L^2$ = {0,11}{0,11} = {00,011,110,1111}
• L(∅) is the empty language (no strings)
• L(∅)* = {ε}, a rare example of a finite closure (the only other one is {ε}* = {ε})
• $L^+ = \bigcup_{k \ge 1} L^k = LL^*$; it omits $L^0$, so it contains ε only if L itself contains ε
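Because L* is usually infinite, a program can only list it up to some length bound. The sketch below, with hypothetical helper names, computes $L^k$ directly and collects the strings of L* and L+ of length at most n by extending strings one L-factor at a time.

```python
def concat(L, M):
    return {x + y for x in L for y in M}


def power(L, k):
    """L^k: concatenation of k copies of L, with L^0 = {ε}."""
    result = {""}
    for _ in range(k):
        result = concat(result, L)
    return result


def closure_upto(L, n):
    """Strings of L* with length <= n; each new string is extended exactly once."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = {w for w in concat(frontier, L) if len(w) <= n} - result
        result |= frontier
    return result


def plus_upto(L, n):
    """Strings of L+ = L L* with length <= n."""
    return {w for w in concat(L, closure_upto(L, n)) if len(w) <= n}


print(sorted(power({"0", "11"}, 2)))  # ['00', '011', '110', '1111']
print(closure_upto(set(), 5))         # {''}, i.e. L(∅)* = {ε}
```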
Building regular expressions • Like all algebras, RE’s are made up of constants and variables connected by operators. • Parentheses are used to group terms
Elementary components of RE’s
• Basis 1: any symbol a is a RE; L(a) = {a}, the language containing one string of length 1.
• Basis 2: ε is a RE; L(ε) = {ε} consists of the empty string only.
• Basis 3: ∅ is a RE; L(∅) = ∅ has no strings.
Recursive Definitions of RE’s • Induction 1: If E1 and E2 are RE’s, then E1+E2 is a RE, and L(E1+E2) = L(E1)+L(E2) • Induction 2: If E1 and E2 are RE’s then E1.E2 is a RE, and L(E1.E2) =L(E1).L(E2)
Recursive Definition of RE’s 2
• Induction 3: If E is a RE, then E* is a RE, and L(E*) = (L(E))*, or simply L(E)*.
• Closure, or “Kleene closure”, is named for Kleene, the originator of the * operation.
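The three basis cases and three induction rules translate directly into a structurally recursive function. This is a sketch under an assumed, hypothetical tuple encoding of REs; it computes L(E) restricted to strings of length at most n, since L(E) may be infinite.

```python
# Hypothetical tuple encoding of REs (not from the slides):
#   ("empty",) = ∅, ("eps",) = ε, ("sym", a) = a,
#   ("+", E1, E2) = E1+E2, (".", E1, E2) = E1.E2, ("*", E) = E*


def lang(e, n):
    """L(e) restricted to strings of length <= n, by structural recursion."""
    op = e[0]
    if op == "empty":  # Basis 3: L(∅) = ∅
        return set()
    if op == "eps":    # Basis 2: L(ε) = {ε}
        return {""}
    if op == "sym":    # Basis 1: L(a) = {a}, a string of length 1
        return {e[1]} if n >= 1 else set()
    if op == "+":      # Induction 1: L(E1+E2) = L(E1) + L(E2)
        return lang(e[1], n) | lang(e[2], n)
    if op == ".":      # Induction 2: L(E1.E2) = L(E1).L(E2)
        return {x + y for x in lang(e[1], n) for y in lang(e[2], n) if len(x + y) <= n}
    if op == "*":      # Induction 3: L(E*) = L(E)*, grown until no new short strings
        base, result, frontier = lang(e[1], n), {""}, {""}
        while frontier:
            frontier = {x + y for x in frontier for y in base if len(x + y) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown operator {op!r}")


E1 = (".", ("sym", "0"), ("+", ("sym", "1"), ("sym", "0")))  # 0(1+0)
E2 = ("*", (".", ("sym", "0"), ("sym", "1")))                # (01)*
print(sorted(lang(E1, 4)))  # ['00', '01']
print(sorted(lang(E2, 4)))  # ['', '01', '0101']
```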
Precedence of Operators • precedence of operations • * highest • . (or juxtaposition) next • + lowest Parentheses are used as needed to influence the precedence of operators.
Associative & Distributive Laws • Distribution of concatenation over union • a(b+c) = ab + ac • Concatenation is associative. • 0(12) = (01)2 • Union is associative. • (a+b)+c = a+(b+c)
Examples: RE -> L(RE) • L(01) = ? • L(01+0) = ? • L(0(1+0)) = ? • L(0*) = ? • L(01*) = ? • L((01)*) = ? • L((01)+) = ?
Examples: RE -> L(RE)
• L(01) = {01}
• L(01+0) = {01, 0}
• L(0(1+0)) = {0}{1,0} = {00, 01}
• L(0*) = {ε, 0, 00, 000, …}
• L(01*) = 0{ε,1,11,…} = {0,01,011,…}
• L((01)*) = {ε,01,0101,…}
• L((01)+) = {01,0101,…}
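These answers can be spot-checked with Python's standard `re` module by enumerating every string over {0,1} up to a length bound and keeping those that match the whole pattern. One translation is needed: the slides' union `+` is written `|` in Python, while a `+` after a group, as in `(01)+`, already means "one or more". The helper `lang_upto` is a hypothetical name for this sketch.

```python
import re
from itertools import product


def lang_upto(pattern, n, alphabet="01"):
    """All strings over `alphabet` of length <= n that fully match `pattern`."""
    rx = re.compile(pattern)
    words = ("".join(t) for k in range(n + 1) for t in product(alphabet, repeat=k))
    return {w for w in words if rx.fullmatch(w)}


# (slide notation, Python notation)
EXAMPLES = [
    ("01", "01"),
    ("01+0", "01|0"),
    ("0(1+0)", "0(1|0)"),
    ("0*", "0*"),
    ("01*", "01*"),
    ("(01)*", "(01)*"),
    ("(01)+", "(01)+"),
]

for slide_re, py_re in EXAMPLES:
    shown = sorted(lang_upto(py_re, 4), key=lambda s: (len(s), s))
    print(f"L({slide_re}) up to length 4: {shown}")
```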
Given L, what RE will generate the strings in L?
Example: L = strings alternating 0’s and 1’s
• L = {ε,0,1} + alternating strings of length > 1
• L = {ε,0,1,01,10} + alternating strings of length > 2
• L = {ε,0,1,01,10,010,101} + alternating strings of length > 3
• Alternating strings with an even number of characters either begin with 0 and end with 1, or begin with 1 and end with 0.
• Alternating strings with an odd number of characters either begin and end with 0, or begin and end with 1.
• Work with a union of RE’s to achieve the desired result.
Example: L = {ε,0,1} + alternating strings of length > 1
• L((01)*) = {ε,01,0101,010101,…} begins with 0 and ends with 1 (all even length)
• L((10)*) = {ε,10,1010,101010,…} begins with 1 and ends with 0 (all even)
• L(0(10)*) = 0{ε,10,1010,…} = {0,010,01010,…} begins and ends with 0 (all odd)
• L(1(01)*) = 1{ε,01,0101,…} = {1,101,10101,…} begins and ends with 1 (all odd)
• L is the union of the 4 cases = {ε,0,1,01,10,010,101,…}
RE = (01)*+(10)*+0(10)*+1(01)*
To exclude {ε,0,1}, use the same RE with * replaced by +.
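A quick bounded check of this construction: the sketch compares the strings matched by the RE (union written `|` for Python's `re`) with a direct test for "no two adjacent characters are equal", for every string up to length 10. The helper names are hypothetical, and agreement up to a bound is evidence, not a proof.

```python
import re
from itertools import product


def alternating(w):
    """True if no two adjacent characters of w are equal (ε, 0 and 1 qualify)."""
    return all(a != b for a, b in zip(w, w[1:]))


def strings_upto(n, alphabet="01"):
    for k in range(n + 1):
        for t in product(alphabet, repeat=k):
            yield "".join(t)


RE1 = re.compile(r"(01)*|(10)*|0(10)*|1(01)*")       # (01)*+(10)*+0(10)*+1(01)*
RE1_PLUS = re.compile(r"(01)+|(10)+|0(10)+|1(01)+")  # same RE with * replaced by +

N = 10
matched = {w for w in strings_upto(N) if RE1.fullmatch(w)}
expected = {w for w in strings_upto(N) if alternating(w)}
plus_matched = {w for w in strings_upto(N) if RE1_PLUS.fullmatch(w)}

print(matched == expected)                          # True
print(plus_matched == expected - {"", "0", "1"})    # True
```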
Equivalence of REs
RE1 = (01)*+(10)*+0(10)*+1(01)*
L = {ε,0,1} + alternating strings of length > 1
RE2 = (ε+1)(01)*(ε+0)
Is L(RE2) = L(RE1)?
L = {ε,0,1} + alternating strings of length > 1
• Consider RE = (ε+1)(01)*(ε+0)
• Does this RE define the strings in L?
• Distributing concatenation over union gives the 4 cases:
• (01)*(ε+0) = (01)* + (01)*0
• (ε+1)((01)* + (01)*0) = (01)* + (01)*0 + 1(01)* + 1(01)*0
• Same language as (01)*+(10)*+0(10)*+1(01)*?
• Only 2 terms are literally the same: (01)* and 1(01)*
• (01)* = {ε,01,…} even, begins 0, ends 1
• (01)*0 = {0,010,…} odd, begins 0, ends 0; this is L(0(10)*)
• 1(01)* = {1,101,…} odd, begins 1, ends 1
• 1(01)*0 = {10,1010,…} even, begins 1, ends 0; this is L((10)*) without ε, and ε is already in (01)*
• So L(RE2) = L(RE1): different REs can define the same language
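The term-by-term argument can be checked mechanically up to a length bound with Python's `re` module. In this sketch (ε+1) and (ε+0) are written `1?` and `0?`, and union is written `|`; `lang_upto` is a hypothetical helper, and a bounded check supports, but does not prove, the equality.

```python
import re
from itertools import product

RE1 = re.compile(r"(01)*|(10)*|0(10)*|1(01)*")
RE2 = re.compile(r"1?(01)*0?")                         # (ε+1)(01)*(ε+0)
FOUR_TERMS = re.compile(r"(01)*|(01)*0|1(01)*|1(01)*0")  # after distributing


def lang_upto(rx, n, alphabet="01"):
    """Strings over `alphabet` of length <= n that fully match compiled regex rx."""
    words = ("".join(t) for k in range(n + 1) for t in product(alphabet, repeat=k))
    return {w for w in words if rx.fullmatch(w)}


N = 12
print(lang_upto(RE1, N) == lang_upto(RE2, N) == lang_upto(FOUR_TERMS, N))  # True
# The two terms that differ in form agree with their counterparts:
print(lang_upto(re.compile(r"(01)*0"), N) == lang_upto(re.compile(r"0(10)*"), N))  # True
print(lang_upto(re.compile(r"1(01)*0"), N) == lang_upto(re.compile(r"(10)+"), N))  # True
```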
Review: Precedence of operations
• * (highest) operates on the smallest sequence of symbols to its left that is a legal RE
• Example: in 01*, the closure applies to 1 only
• After grouping all *’s with their operands, group concatenations with their operands (0 with 1* in RE = 01*)
• Finally, group unions (+) with their operands: 01*+1 = 0{ε,1,11,…}+1 = {0,01,011,…}+1 = {1,0,01,011,…}
Precedence matters:
• E = 01*+1 = 0{ε,1,11,…}+1 = {0,01,011,…}+1 = {1,0,01,011,…}
• E = 0(1*+1) = ?
Precedence matters:
• E = 01*+1 = 0{ε,1,11,…}+1 = {0,01,011,…}+1 = {1,0,01,011,…}
• Override precedence with parentheses
• E = 0(1*+1) = 01* = 0{ε,1,11,…} = {0,01,011,…}
• Note: 1* and (1*+1) are the same, since L(1) is already contained in L(1*)
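Python's `re` syntax uses the same precedence (closure, then concatenation, then union written `|`), so the two readings of E can be compared directly. A small sketch with a hypothetical helper `lang_upto`:

```python
import re
from itertools import product


def lang_upto(pattern, n, alphabet="01"):
    """Strings over `alphabet` of length <= n that fully match `pattern`."""
    rx = re.compile(pattern)
    words = ("".join(t) for k in range(n + 1) for t in product(alphabet, repeat=k))
    return {w for w in words if rx.fullmatch(w)}


print(sorted(lang_upto(r"01*|1", 3)))    # E = 01*+1    -> ['0', '01', '011', '1']
print(sorted(lang_upto(r"0(1*|1)", 3)))  # E = 0(1*+1)  -> ['0', '01', '011']
```

Only the first reading contains the string `1`; the parenthesized form collapses to 01*.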
Equivalence of RE’s and FA’s • Will show that for every RE, there is an FA that defines the same language. • Sufficient to show for ε-NFA’s. • Will show that for every FA, there is a RE that defines the same language. • Sufficient to show for DFA’s.
DFA-to-RE
• Rename the states of the DFA to be 1, 2, …, n.
• Construct RE’s from the labels of restricted sets of paths called k-paths.
• A k-path is a path between specified states that goes through no intermediate state numbered higher than k.
• The RE for k-paths between two end points contains, as a term, the RE for (k-1)-paths between the same end points.
• Endpoints of k-paths are not restricted; they can be any pair of states or the same state (i.e. a loop).
• Endpoints can also appear as intermediates, as long as they are numbered no higher than k.
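[Figure: three-state DFA with states 1, 2, 3; arcs 1→2 on 0, 1→3 on 1, 2→1 on 1, 2→3 on 0, 3→1 on 0, 3→2 on 1]
Example: k-Paths
• 0-paths from 2 to 3
• no intermediates
• $R_{23}^{(0)}$ = RE from labels (only one in this case) = 0
• 1-paths from 2 to 3
• direct and around the outside
• $R_{23}^{(1)}$ = RE from labels = 0+11
• Which state is the intermediate?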
[Figure: the same three-state DFA]
Example: k-Paths
• 2-paths from 2 to 3:
• $R_{23}^{(2)}$ = RE from labels = (10)*0+1(01)*1
• (10)* and (01)* allow for zero or more loops through 1 and 2 before going to 3.
• Does this RE contain the REs for k=0 and k=1 paths?
• 3-paths from 2 to 3:
• All paths, no restrictions, since k = n
[Figure: the same three-state DFA]
More: k-Paths
• $R_{12}^{(0)}$ = ?
• $R_{12}^{(1)}$ = ?
• $R_{12}^{(2)}$ = ?
• $R_{13}^{(0)}$ = ?
• $R_{13}^{(1)}$ = ?
• $R_{13}^{(2)}$ = ?
[Figure: the same three-state DFA]
More: k-Paths
• $R_{11}^{(2)}$ = ?
• $R_{22}^{(2)}$ = ?
• $R_{33}^{(2)}$ = ?
[Figure: the same three-state DFA]
Formal development: DFA to RE
• Let $R_{ij}^{(k)}$ be the RE for the set of labels of k-paths from state i to state j.
• Basis: k = 0. $R_{ij}^{(0)}$ = sum of labels on arcs from i to j; ∅ if there is no such arc; add ε if i = j.
• Examples:
• $R_{11}^{(0)}$ = ∅ + ε = ε
• $R_{12}^{(0)}$ = 0
• $R_{13}^{(0)}$ = 1
• $R_{21}^{(0)}$ = 1
• 5 more
Induction: relate k-paths to (k-1)-paths
• A k-path from i to j either:
• never goes through state k, or
• goes through k one or more times: it goes from i to k the first time, then from k to k zero or more times, then from k to j.
$R_{ij}^{(k)} = R_{ij}^{(k-1)} + R_{ik}^{(k-1)}\left(R_{kk}^{(k-1)}\right)^{*} R_{kj}^{(k-1)}$
[Figure: a k-path from i to j either does not go through k, or goes from i to k, from k to k zero or more times, then from k to j; every segment passes only through states numbered < k]
Final step in DFA to RE
• $R_{ij}^{(n)}$ is the RE that defines the same language as the DFA, where:
• n is the number of states in the DFA,
• i is the start state, and
• j is the (single) final state.
• For multiple final states $j_1, j_2, \ldots$: $RE_{eq} = R_{ij_1}^{(n)} + R_{ij_2}^{(n)} + \cdots$
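The basis, the induction, and the final sum can be sketched in Python as a memoized recursive function. Two assumptions to flag: the transition table `DELTA` is inferred from the R values on the k-path slides (the figure itself did not survive), and the tuple encoding of REs plus the helper names (`plus`, `dot`, `star`, `show`, `lang`) are hypothetical. The constructors apply only the ∅ and ε identities from the simplification slide, so results are correct but unsimplified. The block checks itself by comparing the RE's language with the strings that actually drive the DFA between the two states.

```python
from functools import lru_cache
from itertools import product

# Assumed example DFA (inferred from the slides): DELTA[state][symbol] -> state
DELTA = {1: {"0": 2, "1": 3}, 2: {"0": 3, "1": 1}, 3: {"0": 1, "1": 2}}
STATES = sorted(DELTA)
# REs as tuples: ("empty",), ("eps",), ("sym", a), ("+", x, y), (".", x, y), ("*", x)
EMPTY, EPS = ("empty",), ("eps",)


def plus(x, y):  # ∅ is the identity for +
    return y if x == EMPTY else x if y == EMPTY else ("+", x, y)


def dot(x, y):  # ∅ annihilates concatenation; ε is its identity
    if EMPTY in (x, y):
        return EMPTY
    return y if x == EPS else x if y == EPS else (".", x, y)


def star(x):  # ∅* = ε* = ε
    return EPS if x in (EMPTY, EPS) else ("*", x)


@lru_cache(maxsize=None)
def R(i, j, k):
    """R_ij^(k): RE for the labels of k-paths from i to j."""
    if k == 0:  # Basis: labels on arcs i -> j, plus ε when i == j
        e = EMPTY
        for a, nxt in sorted(DELTA[i].items()):
            if nxt == j:
                e = plus(e, ("sym", a))
        return plus(e, EPS) if i == j else e
    # Induction: avoid k, or go i -> k, loop k -> k, then k -> j
    return plus(R(i, j, k - 1),
                dot(dot(R(i, k, k - 1), star(R(k, k, k - 1))), R(k, j, k - 1)))


def dfa_to_re(start, accepting):
    """Final step: sum of R_{start,j}^(n) over the accepting states j."""
    e = EMPTY
    for j in sorted(accepting):
        e = plus(e, R(start, j, len(STATES)))
    return e


def show(e):
    """Render in the slides' notation, parenthesizing every + and *."""
    op = e[0]
    if op in ("empty", "eps"):
        return "∅" if op == "empty" else "ε"
    if op == "sym":
        return e[1]
    if op == "+":
        return f"({show(e[1])}+{show(e[2])})"
    if op == ".":
        return show(e[1]) + show(e[2])
    return f"({show(e[1])})*"


def lang(e, n):
    """Strings of L(e) with length <= n."""
    op = e[0]
    if op == "empty":
        return set()
    if op == "eps":
        return {""}
    if op == "sym":
        return {e[1]} if n >= 1 else set()
    if op == "+":
        return lang(e[1], n) | lang(e[2], n)
    if op == ".":
        return {x + y for x in lang(e[1], n) for y in lang(e[2], n) if len(x + y) <= n}
    base, result, frontier = lang(e[1], n), {""}, {""}
    while frontier:
        frontier = {x + y for x in frontier for y in base if len(x + y) <= n} - result
        result |= frontier
    return result


def run(q, w):
    for a in w:
        q = DELTA[q][a]
    return q


def dfa_paths(i, j, n):
    """Strings of length <= n that drive the DFA from state i to state j."""
    words = ("".join(t) for m in range(n + 1) for t in product("01", repeat=m))
    return {w for w in words if run(i, w) == j}


print(show(R(2, 3, 1)))                                   # (0+11)
print(lang(dfa_to_re(2, {3}), 6) == dfa_paths(2, 3, 6))   # True
```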
Useful relationships in simplification of $R_{ij}^{(k)}$
• $L^+ = LL^* = L^*L$
• $\emptyset^* = \varepsilon$
• Derive $(R+\varepsilon)^* = R^*$ and $(R+\varepsilon)^+ = R^*$
• ∅ is the identity for +: R + ∅ = R
• ε is the identity for concatenation: εR = Rε = R
• ∅ is the annihilator for concatenation: ∅R = R∅ = ∅
Note! ε is not the identity element of union: R + ε differs from R whenever ε is not already in L(R).
Prove by exhaustive enumeration: (R+ε)* = R* and (R+ε)+ = R*
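Before proving the identities, it can help to test them on a sample language. The sketch below checks each relationship for one hypothetical language R = {01, 1} on strings up to length 8, using finite sets of strings with ε as `""`. A passing check on one sample is a sanity test, not the requested proof.

```python
def concat(L, M, n):
    """L.M restricted to strings of length <= n."""
    return {x + y for x in L for y in M if len(x + y) <= n}


def star_upto(L, n):
    """Strings of L* with length <= n."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = concat(frontier, L, n) - result
        result |= frontier
    return result


def plus_upto(L, n):
    """Strings of L+ = L L* with length <= n."""
    return concat(L, star_upto(L, n), n)


N = 8
R = {"01", "1"}     # an arbitrary sample language (hypothetical)
R_eps = R | {""}    # R + ε

assert R_eps != R                                            # ε is not the identity for +
assert R | set() == R                                        # ∅ is the identity for +
assert concat({""}, R, N) == R == concat(R, {""}, N)         # ε is the identity for .
assert concat(set(), R, N) == set() == concat(R, set(), N)   # ∅ annihilates .
assert star_upto(set(), N) == {""}                           # ∅* = ε
assert plus_upto(R, N) == concat(star_upto(R, N), R, N)      # LL* = L*L
assert star_upto(R_eps, N) == star_upto(R, N)                # (R+ε)* = R*
assert plus_upto(R_eps, N) == star_upto(R, N)                # (R+ε)+ = R*
print("all sample identities hold up to length", N)
```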
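[Figure: the same three-state DFA]
Example: starting from 2-paths, do n = 3 with start = 2, accept = 3
$R_{ij}^{(k)} = R_{ij}^{(k-1)} + R_{ik}^{(k-1)}\left(R_{kk}^{(k-1)}\right)^{*} R_{kj}^{(k-1)}$
Simplify $R_{23}^{(3)} = R_{23}^{(2)} + R_{23}^{(2)}\left(R_{33}^{(2)}\right)^{*} R_{33}^{(2)}$
Then substitute $R_{23}^{(2)}$ and $R_{33}^{(2)}$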
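[Figure: the same three-state DFA]
Example: starting from 2-paths, do n = 3 with start = 2, accept = 3
Simplified: $R_{23}^{(3)} = R_{23}^{(2)}\left(R_{33}^{(2)}\right)^{*}$, since $\varepsilon + S^{*}S = S^{*}$
Now substitute $R_{23}^{(2)}$ and $R_{33}^{(2)}$
• $R_{23}^{(2)}$ = (10)*0+1(01)*1 (see the 2-paths example)
• $R_{33}^{(2)}$ = ε + 0(01)*(1+00) + 1(10)*(0+11); the ε term (present because i = j) can be dropped inside the closure, since (R+ε)* = R*
• $R_{23}^{(3)}$ = [(10)*0+1(01)*1][(0(01)*(1+00) + 1(10)*(0+11))]*
• Probably can be simplified. Please don't unless I ask.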
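The unsimplified result can be checked against the automaton itself. This sketch assumes the transition table `DELTA` inferred from the k-path slides (the figure did not survive), writes the slides' union `+` as `|` for Python's `re` module, and confirms that the RE matches exactly the strings that lead from state 2 to state 3, for every string up to length 10.

```python
import re
from itertools import product

# Assumed example DFA (inferred from the slides): DELTA[state][symbol] -> state
DELTA = {1: {"0": 2, "1": 3}, 2: {"0": 3, "1": 1}, 3: {"0": 1, "1": 2}}

R23_2 = r"(10)*0|1(01)*1"             # R_23^(2)
R33_2 = r"0(01)*(1|00)|1(10)*(0|11)"  # nonempty part of R_33^(2)
R23_3 = re.compile(f"(?:{R23_2})(?:{R33_2})*")


def run(q, w):
    """State reached from q after reading w."""
    for a in w:
        q = DELTA[q][a]
    return q


N = 10
words = ["".join(t) for k in range(N + 1) for t in product("01", repeat=k)]
agree = all(bool(R23_3.fullmatch(w)) == (run(2, w) == 3) for w in words)
print(agree)  # True
```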
CptS 317 Fall 2019 Assignment 6, Due 10-7-19
Exercise 3.2.1 (a) and (b), Text p107
Graph the DFA with states named 1, 2, and 3.
(a) Evaluate all $R_{ij}^{(0)}$.
(b) Simplify the $R_{ij}^{(1)}$ formulas as much as possible. Substitute the $R_{ij}^{(0)}$ results into the simplified $R_{ij}^{(1)}$ formulas. Simplify further if possible.
Example 3.5 text pp 95-97 Discussed in class
Alternate method for FA to RE
The method of k-paths always works but may be time consuming, since about n³ RE’s must be constructed for an n-state DFA. An alternate method, “eliminating states”, is usually faster.
DFA to RE by Eliminating States
Basic principle: after state s is eliminated, the RE’s on the residual arcs must define a transition function that supports the same language as before. This requirement can be satisfied by considering the states $q_i$ that are predecessors of s and the states $p_j$ that are successors of s.
DFA to RE by Eliminating States
• Let $Q_i$ be the RE for labels on arcs from predecessor $q_i$ to the eliminated state s
• Let $P_j$ be the RE for labels on arcs from the eliminated state s to successor $p_j$
• Let S be the RE for labels on a loop on s, if present
• Let $R_{ij}$ be the RE for labels on a direct arc from $q_i$ to $p_j$, if present
• Then the RE for paths from $q_i$ to $p_j$ without s is $R_{ij} + Q_i S^{*} P_j$
• Some parts may be ∅ (with no loop on s, S = ∅ and S* = ε)
DFA to RE by Eliminating States
Continue eliminating states until only the start state and one of the accepting states $q_k$ remain. Do this once for each accepting state $q_k$; each time, the state-elimination process results in a generic one-state automaton (if $q_0 = q_k$) or a generic two-state automaton.
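Generic one-state automaton: state 1 is both start and accepting, with a loop labeled R: $RE_k = R^{*}$
Generic two-state automaton: start state 1 with a loop labeled R, an arc labeled S from 1 to 2, accepting state 2 with a loop labeled U, and an arc labeled T from 2 to 1: $RE_k = (R + SU^{*}T)^{*}SU^{*}$
• Actual values of R, S, T, and U are problem specific, and some may be ∅
• Let $L(RE_k)$ be the language of strings accepted by $q_k$
• The RE equivalent to the DFA is the sum over k of $RE_k$ (the union of all $L(RE_k)$)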
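A sketch of state elimination on the same example, assuming the transition table `DELTA` inferred from the k-path slides, start state 2, and accepting state 3. Arc labels are kept as Python regex strings, with `None` standing for ∅ (no arc) and `""` for ε; `union`, `concat`, `star`, and `eliminate` are hypothetical helper names. Eliminating state 1 leaves the generic two-state form, and the formula $(R + SU^{*}T)^{*}SU^{*}$ is then compared against the DFA on all strings up to length 10. The one-state case is not exercised here.

```python
import re
from itertools import product

# Assumed example DFA (inferred from the slides): DELTA[state][symbol] -> state
DELTA = {1: {"0": 2, "1": 3}, 2: {"0": 3, "1": 1}, 3: {"0": 1, "1": 2}}


def union(a, b):
    """a + b; None stands for ∅ (no arc), the identity for +."""
    if a is None:
        return b
    if b is None:
        return a
    return f"(?:{a}|{b})"


def concat(*parts):
    """Concatenation; ∅ annihilates it and "" (ε) is dropped."""
    if any(p is None for p in parts):
        return None
    return "".join(f"(?:{p})" for p in parts if p != "")


def star(a):
    """Closure; ∅* = ε* = ε, written as ""."""
    return "" if a in (None, "") else f"(?:{a})*"


def arcs_from_dfa(delta):
    """arcs[(q, p)] = RE (Python syntax) for the labels on arcs q -> p."""
    arcs = {}
    for q, row in delta.items():
        for a, p in row.items():
            arcs[(q, p)] = union(arcs.get((q, p)), a)
    return arcs


def eliminate(arcs, s):
    """Remove state s: each predecessor q and successor p get R_qp + Q S* P."""
    preds = [q for (q, t) in arcs if t == s and q != s]
    succs = [p for (t, p) in arcs if t == s and p != s]
    loop = arcs.get((s, s))
    new = {pair: label for pair, label in arcs.items() if s not in pair}
    for q in preds:
        for p in succs:
            bypass = concat(arcs[(q, s)], star(loop), arcs[(s, p)])
            new[(q, p)] = union(new.get((q, p)), bypass)
    return new


# Start 2, accepting 3: eliminate state 1 to reach the generic two-state form.
arcs = eliminate(arcs_from_dfa(DELTA), 1)
R, S, U, T = (arcs.get(pair) for pair in [(2, 2), (2, 3), (3, 3), (3, 2)])
pattern = concat(star(union(R, concat(S, star(U), T))), S, star(U))  # (R+SU*T)*SU*
RE = re.compile(pattern)


def run(q, w):
    for a in w:
        q = DELTA[q][a]
    return q


words = ["".join(t) for k in range(11) for t in product("01", repeat=k)]
print(all(bool(RE.fullmatch(w)) == (run(2, w) == 3) for w in words))  # True
```

After eliminating state 1, the residual labels correspond to R = 10, S = 0+11, U = 01, and T = 1+00 in the slides' notation.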
CptS 317 Fall 2019 Assignment 7, Due 10-11-19
Exercise 3.2.1 (e) Text p107 (same DFA as in Assignment 6)
[Figure: DFA with states q1, q2, q3 and arcs labeled 0 and 1]
There are 4 sets of predecessor-successors associated with removal of q2. In each case, name q and p, list Q, P, S, and $R_{qp}$, and evaluate $R_{qp} + QS^{*}P$.
Graph the generic 2-state form that results from removal of q2. List R, S, T, and U. Substitute to get the equivalent RE. Do not simplify.
RE to equivalent FA
To complete the proof of equivalence, we show by construction that for every RE there is an FA that accepts the same language that the RE defines. It is sufficient to construct an ε-NFA with the following restrictions:
• One accepting state
• No arcs into the start state
• No arcs out of the accepting state
Converting a RE to an ε-NFA
• Formal statement: if L(RE) is the language defined by a RE, then there exists an ε-NFA, denoted by ε-NFAeq, such that L(ε-NFAeq) = L(RE)
• Proof is by constructive induction on the number of operators (+, concatenation, *) in the RE.
• Basis: for L(RE) = {a} and L(RE) = {ε}, ε-NFAeq consists of a single arc between the start and accepting states, labeled a and ε respectively
• Same for L(RE) = ∅, except there is no arc
[Figure: ε-NFA for E1 and ε-NFA for E2, each drawn with its start state on the left and its accepting state on the right]
(IH): assume the theorem is true for subexpressions E1 and E2 in the RE
• Show how these ε-NFA’s are used to build ε-NFA’s for E1+E2, E1E2, and E1*
• ε-NFAeq is built by linking these intermediate ε-NFA’s as determined by the operations in the RE
• Links are spontaneous (ε) transitions attached to the start (left) and accepting (right) states of the ε-NFA’s for subexpressions E1 and E2
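RE to ε-NFA: Induction 1 – Union
[Figure: ε-NFAeq for E1+E2: a new start state with ε-arcs to the start states of ε-NFAeq E1 and ε-NFAeq E2, and ε-arcs from their accepting states to a new accepting state]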
RE to ε-NFA: Induction 2 – Concatenation
[Figure: ε-NFAeq for E1E2: an ε-arc from the accepting state of ε-NFAeq E1 to the start state of ε-NFAeq E2]
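RE to ε-NFA: Induction 3 – Closure
[Figure: ε-NFAeq for E1*: a new start state and a new accepting state; ε-arcs from the new start state to the start of ε-NFAeq E1 and directly to the new accepting state, and from the accepting state of ε-NFAeq E1 back to its start and on to the new accepting state]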
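The basis and the three induction rules can be sketched as a recursive builder that returns the start and accepting states of each fragment, followed by a simulator that takes ε-closures after every step. The tuple encoding of REs and the class and method names are hypothetical; ε-arcs carry the label `None`. As a check, the block builds an ε-NFA for RE1 = (01)*+(10)*+0(10)*+1(01)* from the alternating-strings example.

```python
from itertools import product

EPS = None  # label used for ε-transitions


class ThompsonENFA:
    """ε-NFA built from an RE by the union / concatenation / closure rules."""

    def __init__(self):
        self.n_states = 0
        self.arcs = []  # (from_state, label, to_state); label EPS means ε

    def new_state(self):
        self.n_states += 1
        return self.n_states - 1

    def build(self, e):
        """Return (start, accept) of an ε-NFA accepting L(e)."""
        op = e[0]
        if op == ".":  # Induction 2: ε-arc from E1's accepting state to E2's start
            s1, f1 = self.build(e[1])
            s2, f2 = self.build(e[2])
            self.arcs.append((f1, EPS, s2))
            return s1, f2
        s, f = self.new_state(), self.new_state()
        if op == "sym":    # Basis: one arc labeled a
            self.arcs.append((s, e[1], f))
        elif op == "eps":  # Basis: one arc labeled ε
            self.arcs.append((s, EPS, f))
        elif op == "+":    # Induction 1: new start and accept, four ε-arcs
            for sub in (e[1], e[2]):
                s1, f1 = self.build(sub)
                self.arcs += [(s, EPS, s1), (f1, EPS, f)]
        elif op == "*":    # Induction 3: enter, leave, skip, or loop back
            s1, f1 = self.build(e[1])
            self.arcs += [(s, EPS, s1), (f1, EPS, f), (s, EPS, f), (f1, EPS, s1)]
        elif op != "empty":  # Basis for ∅: two states and no arc
            raise ValueError(f"unknown operator {op!r}")
        return s, f

    def eclose(self, states):
        """All states reachable from `states` using ε-arcs only."""
        seen, stack = set(states), list(states)
        while stack:
            q = stack.pop()
            for a, label, b in self.arcs:
                if a == q and label is EPS and b not in seen:
                    seen.add(b)
                    stack.append(b)
        return seen

    def accepts(self, start, accept, w):
        current = self.eclose({start})
        for c in w:
            current = self.eclose({b for a, label, b in self.arcs if a in current and label == c})
        return accept in current


def sym(a):
    return ("sym", a)


def cat(x, y):
    return (".", x, y)


def alt(*es):  # left-nested union E1+E2+...
    e = es[0]
    for x in es[1:]:
        e = ("+", e, x)
    return e


zero_one, one_zero = cat(sym("0"), sym("1")), cat(sym("1"), sym("0"))
RE1 = alt(("*", zero_one), ("*", one_zero),
          cat(sym("0"), ("*", one_zero)), cat(sym("1"), ("*", zero_one)))

nfa = ThompsonENFA()
start, accept = nfa.build(RE1)
print(nfa.accepts(start, accept, "0101"), nfa.accepts(start, accept, "0110"))  # True False
```

The builder keeps the restrictions from the slides: the top-level start state has no incoming arcs and the accepting state has no outgoing arcs.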
CptS 317 Fall 2019 Assignment 8, Due 10-14-19
Exercise 3.2.4 (a & b) Text p108
Graph the components, link them by ε-transitions, and indicate the start and final states.