390 likes | 514 Views
Theory of Regular Expressions Finite Automata. CMSC 330: Organization of Programming Languages. Regular Expression Review. Terms Alphabet String Language Regular expression ( “ regex ” ) Operations Concatentation Union Kleene closure Ruby vs. theoretical regexes.
E N D
Theory of Regular Expressions Finite Automata CMSC 330: Organization of Programming Languages
CMSC 330 Regular Expression Review Terms Alphabet String Language Regular expression (“regex”) Operations Concatentation Union Kleene closure Ruby vs. theoretical regexes
CMSC 330 Previous Course Review {s | cond(s)} means the set of all strings s such that the given condition applies s A means s is an element of the set A De Morgan’s Laws: (A ∩ B)C = AC ∪ BC (A ∪ B)C = AC ∩ BC Quantifiers and Qualifiers Existential quantifier (“there exists”): ∃ Universal quantifier (“forall”): ∀ Qualifier (“such that”): s.t.
CMSC 330 Implementing Regular Expressions We can implement regular expressions by turning them into a finite automaton A “machine” for recognizing a regular language
CMSC 330 Example Machine starts in start or initial state Repeat until the end of the string is reached: Scan the next symbol s of the string Take transition edge labeled with s The string is accepted if the automaton is in a final or accepting state when the end of the string is reached Transition on 1 Start state Final state States
CMSC 330 Example 0 0 1 0 1 1 accepted
CMSC 330 Example 0 0 1 0 1 0 not accepted
CMSC 330 What Language is This? All strings over Σ = {0, 1} that end in 1 What is a regular expression for this language? (0|1)*1
CMSC 330 Formal Definition A deterministic finite automaton (DFA) is a 5-tuple (Σ, Q, q0, F, δ) where Σ is an alphabet Q is a nonempty set of states q0 ∊ Q is the start state F ⊆ Q is the set of final states δ : Q x Σ→ Q specifies the DFA's transitions
CMSC 330 More on DFAs A finite state automaton can have more than one final state: A string is accepted as long as there is at least one path to a final state
CMSC 330 Our Example, Stated Formally Σ = {0, 1} Q = {S0, S1} q0 = S0 F = {S1} δ = { (S0, 0, S0), (S0, 1, S1), (S1, 0, S0), (S1, 1, S1) }
CMSC 330 Another Example string state at end accepts? aabcc S2 Y acc S2 Y bbc S2 Y aabbb S1 Y aa S0 Y ε S0 Y acba S3 N (a,b,c notation shorthand for three self loops)
CMSC 330 Another Example (cont’d) What language does this DFA accept? a*b*c* S3 is a dead state – a nonfinal state with no transition to another state
CMSC 330 Shorthand Notation If a transition is omitted, assume it goes to a dead state that is not shown is short for Language? Strings over {0,1,2,3} with alternating even and odd digits, beginning with odd digit
CMSC 330 What Lang. Does This DFA Accept? a*b*c* again, so DFAs are not unique
CMSC 330 Equivalent DFAs = These DFAs are equivalentin the sense that they accept the same language.
CMSC 330 DFA State Semantics S0 = “Haven’t seen anything yet” OR “seen zero or more b’s” OR “Last symbol seen was a b” S1 = “Last symbol seen was an a” S2 = “Last two symbols seen were ab” S3 = “Last three symbols seen were abb” Language? (a|b)*abb
CMSC 330 Review Basic parts of a regular expression? What does a DFA do? Basic parts of a DFA? concatenation, |, *, , , {a} implements regular expressions; accepts strings alphabet, set of states, start state, final states, transition function (Σ, Q, q0, F, δ)
CMSC 330 Notes about the DFA definition Can not have more than one transition leaving a state on the same symbol the transition function must be a valid function Can not have transitions with no or empty labels the transitions must be labeled by alphabet symbols
CMSC 330 Nondeterministic Finite Automata (NFA) An NFA is a 5-tuple (Σ, Q, q0, F, δ) where Σ is an alphabet Q is a nonempty set of states q0∊ Q is the start state F ⊆ Q is the set of final states δ : Q x (Σ∪ {ε}) →Q specifies the NFA's transitions Transitions on ε are allowed – can optionally take these transitions without consuming any input Can have more than one transition for a given state and symbol An NFA accepts s if there is at least one path from its start to final state on s
CMSC 330 NFA for (a|b)*abb ba Has paths to either S0 or S1 Neither is final, so rejected babaabb Has paths to different states One leads to S3, so accepted
CMSC 330 Why are NFAs useful? • Sometimes an NFA is much easier to understand than its equivalent DFA
CMSC 330 Language? (ab|aba)* Another example DFA
CMSC 330 NFA for (ab|aba)* aba Has paths to states S0, S1 ababa Has paths to S0, S1 Need to use ε-transition (ab(ε|a))*
CMSC 330 Relating R.E.'s to DFAs and NFAs Regular expressions, NFAs, and DFAs accept the same languages! can transform DFA NFA can transform can transform (we’ll discuss this next) RE
CMSC 330 Reducing Regular Expressions to NFAs Goal: Given regular expression e, construct NFA: <e> = (Σ, Q, q0, F, δ) Remember, REs are defined inductively; i.e. recursively Base case: a <a> = ({a}, {S0, S1}, S0, {S1}, {(S0, a, S1)} )
CMSC 330 Reduction (cont’d) Base case: ε <ε> = (ε, {S0}, S0, {S0}, ∅) Base case: ∅ <∅> = (∅, {S0, S1}, S0, {S1}, ∅)
CMSC 330 Reduction (cont’d) Concatenation: AB <A> <B>
CMSC 330 Reduction (cont’d) Concatenation: AB <A> = (ΣA, QA, qA, {fA}, δA) <B> = (ΣB, QB, qB, {fB}, δB) <AB> = (ΣA∪ΣB, QA∪QB, qA, {fB}, δA∪δB∪ {(fA, ε, qB)} ) <A> <B>
CMSC 330 Practice Draw the NFA for these regular expressions using the reduction method: ab hello Write the formal (5-tuple) NFA for the same regular expressions
CMSC 330 Reduction (cont’d) Union: (A|B)
CMSC 330 Union: (A|B) <A> = (ΣA, QA, qA, {fA}, δA) <B> = (ΣB, QB, qB, {fB}, δB) <(A|B)> = (ΣA∪ΣB, QA∪ QB∪ {S0,S1}, S0, {S1}, δA∪δB∪ {(S0,ε,qA), (S0,ε,qB), (fA,ε,S1), (fB,ε,S1)}) Reduction (cont’d)
CMSC 330 Practice Draw the NFA for these regular expressions using exactly the reduction method: ab|bc hello|hi Write the formal NFA for the same regular expressions
CMSC 330 Reduction (cont’d) Closure: A*
CMSC 330 Reduction (cont’d) Closure: A* <A> = (ΣA, QA, qA, {fA}, δA) <A*> = (ΣA, QA∪ {S0,S1}, S0, {S1}, δA∪ {(fA,ε,S1), (S0,ε,qA), (S0,ε,S1), (S1,ε,S0)})
CMSC 330 Practice Draw the NFA for these regular expressions using exactly the reduction method: (ab|bc*)* hello|(hi)* Write the formal NFA for the same regular expressions
CMSC 330 Reduction Complexity Given a regular expression A of size n... Size = # of symbols + # of operations How many states+transitionsdoes <A> have? 2+1 for each symbol 0+1 for each concatenation 2+4 added for each union 2+4 added for each closure O(n) That’s pretty good!
CMSC 330 -closure We say: if it is possible to transition from state p to state q taking only -transitions if p, p1, p2, … pn, q Q (p q) such that For any state p, the -closure of p is defined as the set of states q such that {q | }
CMSC 330 Example What’s the -closure of S2 in this NFA? {S2, S0}