360 likes | 520 Views
Regular Languages and Expressions. Surinder Kumar Jain, University of Sydney. Regular Languages & Expressions. Automaton DFA NFA Ε - NFA CFG as a DFA Equivalence Minimal DFA Expressions Definition Conversion from/to Automaton Regular Langauges
E N D
RegularLanguages and Expressions Surinder Kumar Jain, University of Sydney
Regular Languages & Expressions • Automaton • DFA • NFA • Ε-NFA • CFG as a DFA • Equivalence • Minimal DFA • Expressions • Definition • Conversion from/to Automaton • Regular Langauges • Pumping Lemma – proving regularness • Closures • Equivalence
Deterministic Finite Automaton • A system with many states • Can transition from one state to another • Usually caused by external input • Set of states is finite • System is in one state at any given time
DFA • Mathematical Definition of a DFA • A = (Q, Σ,δ, q0,F) • Q : States, DFA is in one of these finite states at any time. • Σ : Input symbols, DFA changes its state from one state to another state on consuming an input symbol. • δ : Transition function. • Given a state and an input symbols, gives the next DFA state • Function over QxΣ -> Q. • q0 : Initial DFA state • F : Accepting states. Once DFA reaches one of these states, it may not accept any more input symbols.
DFA Example Q = { waiting, pending, rejected, approved, paid } Σ = {receive, reject, accept, pay } δ : (waiting -> receive -> pending), (pending -> reject -> rejected), (pending -> accept -> accepted), (accepted -> pay -> paid) q0 : {waiting} F : { rejected, paid }
Transition Diagrams start receive accept Accepted pay Waiting Pending Paid Paid reject Paid Rejected Q = { waiting, pending, rejected, approved, paid } Σ = {receive, reject, accept, pay } δ : (waiting -> receive -> pending), (pending -> reject -> rejected), (pending -> accept -> accepted), (accepted -> pay -> paid) q0 : {waiting} F : { rejected, paid }
Language • Set of alphabets • Concatenation (joining) • Strings • A subset of strings is a language • A DFA defines a language • Alphabet set is the set of input symbols • Concatenation - one symbol follows another • Acceptance – sequence of symbols takes DFA from start state to one of the accepting states
Non-deterministic Finite Automaton (DFA) • Five-tuple like a DFA, (Q, Σ,δ, q0,F) • Transition function returns a set not one state • Several outgoing arcs with same symbol • In several states at the same time • Language of NFA
Equivalence of DFA & NFA • Any NFA language can be described by some DFA • Adding non-determinism does not give any thing more • Why use NFAs then : • Easier to make for some languages • May have fewer states and less complex • Algorithm to convert NFA to DFA • For n state NFA,DFA may have up to 2n states • Can throw away inaccessible states • Observation : DFA has practically the same number of states as NFA though it often has more transitions
NFA to DFA conversion • For an NFA, N = {Q, Σ, δ, q0, F}, • Construct the DFA, D = {Qd, Σ, δd, {q0}, Fd} • Qd = Powerset of Q • δd(S, a) = Up in S δ(p,a) for every S in Qd. • Fd = S : S is subset of Q and S has an accepting state of NFA • DFA operates on one state at a time, NFA operates on sets of states. • Given a state, NFA gives a set of new states • Make all possible sets of DFA states as NFA states • Transit from one set of states to a new set of all possible state set • Any set with an accepting state is the accepting state in NFA
NFA to DFA conversion complexity • O(2n) (number of subsets of a set) • Efficient algorithm • Do not construct the entire power set • Start with start state • Only construct subsets that can reach an accepting state from the start state • The number of states in DFA is much less than 2n. • DFA has practically the same number of states as NFA though it often has more transitions
εpsilon - NFA • Includes ε (the empty string, not in alphabet set) as a transition • ε is identity in concatenation • a.ε = ε.a = a for all a • Spontaneous transition without an input
Equivalence to NFA • An ε-NFA language can be described by some NFA • Every NFA can be described by some DFA • Adding ε transition does not give any thing more • Why use ε-NFAs then : • Easier to make for some languages • Useful in proving equivalence of languages
Conversion to NFA • Conversion aims to remove ε transitions • Define a new set of states • ε are contained inside the set • No ε arc leaves or enters the new set of states • Epsilon closure (eclose) • For a state, set of all states reachable spontaneously • Follow the ε arcs recursively and include reachable states in the epsilon closure
epsilon-NFA to DFA conversion • For an ε-NFA, N = {Q, Σ, δ, q0, F}, • Construct the DFA, D = {Qd, Σ, δd, {eclose(q0)}, Fd} • Qd = { eclose(q) | q = eclose(q) and q in Q } • δd(S, a) = Up in S δ(p,eclose(a)) for every S in Qd. • Fd = S : S is subset of Q and S has an accepting state of NFA • DFA operates on one state at a time, ε-NFA operates on sets of states with no ε transition leaving the set • Make all eclose sets as DFA states • Transit from one set of states to a new set of all eclose state set • Any set with an accepting state is the accepting state in NFA
Programs as Automatan • An imperative program can be represented as a Control Flow Graph (CFG) with • statements at nodes and • predicates at edges • It can be converted into a CFG with • both statements and predicates at edges • by pushing node statements up incoming edges • Such a CFG is a DFA • Program points are States • Statements are input symbols that change program state from program point to point
Regular Expression • Algebraic expression to denote languages • Composed of symbols “ε”, “Ø”, “+”, “*”, “.”, “(“, “)” and alphabets • The language is generated using rules : • L(ε) = empty set • L(Ø) = empty set • L(a) = a for all alphabets a • L(p+q) = L(p) U L(q) • L(p.q) = { p’.q’ | p’ in L(p) & q’ in L(q) } • L(p*) = { qn | q in L(p) and n >= 0 }, q0= ε, qk=q.qk-1
Regular Expression Example a+b.c The language generated is : { a, b.c } a.b.c*.d the language generated is : { a.b.d, a.b.c.d, a.b.c.c.d, a.b.c.c.c.d, … } A finite way to express an infinite language
Equality of Languages DEFINITION • Two regular expression (or automaton) • are EQUAL • if they both generate same languages Thus (a.b)* + (b.a)* + a.(b.a)* + b.(b.a)* = (ε + b).(a.b)*.(ε+a)
Algebraic laws of regular expressions • p + q = q + p • (p + q) + r = p + (q + r) • (p.q).r = p.(q.r) • Ø + p = p + Ø = p • ε.p = p.ε = p • Ø.p = p.Ø = Ø • p.(q=r) = p.q + p.r • (p + q).r = p.r + q.r • p + p = p • (p*)* = p* • Ø* = ε • ε* = ε • p.p* = p*.p • (p + q)* = (p*.q*)*
Finite Automaton and Regular Expressions • Every language • defined by a finite automaton is also defined by some regular expression • defined by a regular expression is also defined by some DFA
DFA to Regular expression • Hopcroft’s formula • Rij(k) = Rij(k-1)+Rik(k-1).(Rkk(k-1))*.Rkj(k-1) • Rij(n) is the regular expression of all paths from i to j. (n is the number of states) • States are sorted in some order and numbered 1 to n • Rij(k) is regular expression of all paths from i to j passing thru nodes whose sort order is less than k • Computed for all i,j for k=0, then k=1,…,k=n • Rs,f1(n)+…+Rs,fk(n) is the regular expression of the DFA • s is the start state, f1,…,fk are accepting states, n is the number of states.
DFA to RE - complexity • Hopcroft formula is O(n34n), • n3 to compute the table and • 4n as size of regular expression grows by 4 every time. • In practice it is close to O(n3) • By simplifying the regular expression at every step and • using judicious algorithm avoiding recomputation of Rkk(k) • Most DFAs have almost n and not 2n accessible states • A faster state elimination method close to O(n2) is also available
RE to Automatan conversion • Regular expression is converted to ε-NFA • ε-NFA can the be converted to NFA and to DFA • RE to ε-NFA conversion rules : • ε -> One edge (two state) DFA with ε transition • Ø -> Two state DFA with no edges • a -> Two state with “a” transition • + -> A new start/accept statejoining two arguments of + in parallel • . -> Accept of first is start of second • * -> An ε edge joining star/accept of argument and a new start/accept state • Convert resulting ε-NFA to a DFA
Direct conversion • Augment regular expression r to (r).# • Position number for each occurrence of alphabet • Compute for each node of syntax tree • nullable (ε in the language) • firstpos (set of possible first alphabets) • lastpos (set of possible last alphabets) • Compute for each position • followpos (set of possible next alphabet after this position) • Construct the DFA
Applications • Unix text search, search matching patterns (grep) • Lexical/Parser analysis • Parse text against a regular expression • find set of first tokens at this expression root • find set of last tkens at this expression root • can the expression at this root be null set • find set of next tokens after an alphabet position in a regular expression • Efficient search of patterns in very large repository (web text search)
Regular Language DEFINITION • A language (a set of strings) • is defined to be a regular language if • it can be defined by a finite automaton • by a DFA or • by an NFA or • by an ε-NFA or • by a regular expression • Four different ways to describe a regular language
Pumping Lemma • If L is a regular language then there exists • integer n such that • for every string w in L • we can break w into x, y, z such that w=x.y.z • y ε • |x.y| =< n • x.yk.z is in L (for all k >= 0) • Proof based on • For a DFA of length n • any string of length > n • must revisit a state • Used to prove that a language is not regular
Closure property • Language is a set of string over finite alphabets • Language operators : • Union of two languages L(A B) = L(A) L(B) - re • Intersection • Concatenation L(A.B) = { a.b | a in A, b in B} • Kleene Closure L(A*) = { an | a in A, n >= 0 } • a0 = ε for all a and an = an-1 • Compliment L(A’) = { a | a not in A } (with respect to some overall alphabet set) - dfa • Difference L(A-B) = L(A) – L(B) - dfa switch q0 F • Reversal L (A) = { ak.ak-1…a1 | a1…ak-1.ak in A } • Homomorphism – replace an alphabet with another regular expression • Inverse homomorphism
Decision properties • Is the language described empty? • Is a particualr string in the described language? • Do two different of languages actually describe the same language?
Conversions • Decision properties may require conversion between various forms. • Can the conversion be done in reasonable time?
Equivalence of automata • Equivalence of two states • States p and q in an automaton are Defined to be equivalent if • For all input strings applied at state p or q • p ends up in an accepting state • if and only if • q also ends up in an accepting state • The accepting state reached by p does not have to be same accepting state as that reached by q
Minimization of DFA • If two states p and q are equivalent • we can combine them together into a single state • it wont affect the language accepted by the DFA • This process of combining states together is called Minimization • Table-filling algorithm can find if two states are equivalent or not. Complexity O(n2) • Non-equivalent pairs are distinguishable
MinimuMDFA • Minimum DFA is unique • Eliminate all states not reachable from start • Determine which states are equivalent • Partition states into blocks of equivalent states • Equivalence is transitive • Thus no state is in two blocks • Equivalence of two Regular Languages • Convert them into their minimum DFAs • and check for isomorphism • Union method • Make a minimum DFA of the union of the two • Start state of the two original DFAs must be equivalent if and only if DFAs are equivalent