590 likes | 793 Views
Uses of DFAs. DFAs are a way of handling certain specialized issues involving character strings For example, one might want to determine whether a given input string is a legal identifier in some language is a legal literal of a particular type in some particular programming language
E N D
Uses of DFAs • DFAs are a way of handling certain specialized issues involving character strings • For example, one might want to determine whether a given input string • is a legal identifier in some language • is a legal literal of a particular type in some particular programming language • is a possible word in a given natural language • contains a particular substring
Recognizing legal identifiers/literals • Typical sets of legal identifers (or literals) include those • strings beginning with a letter followed by some digits • strings consisting only of letters • strings that containing at most one digit • strings that begin with the underscore character • Note that all of these sets are languages
What’s a DFA (roughly)? • A DFA is an abstraction of a machine • It corresponds more to a program than to a (programmable) computer • It answers questions about membership in certain languages • That is, it answers yes/no questions of the form: is a string in a particular set of strings? • Note that the examples given above all have this form
An abstract machine for a simple switch • An abstraction of light switch might have two states • One would correspond to the “on” position • There would be transitions possible between the states • It would need to be clear which state was the initial state • This information could be summarized as in the diagram below
Labeled transitions • This machine is not yet a DFA, since it has nothing to do with strings or languages. • By labeling the transitions with symbols from an alphabet, we’d get a DFA (shown below) • If both transitions were labeled with a 0, the resulting DFA would correspond to the set of strings of odd length over the alphabet {0}. • That is, sequences of transitions that took us from the initial state to the “on” state would correspond to sequences of 0’s of odd length
A DFA for a simple switch • This DFA was constructed with JFLAP
Specifying a DFA • Just as in our example, specifying a DFA requires specifying • An input alphabet ({0} in our example) • A set of states • An initial state (the “off” state in our example) • A set of final states (consisting only of the “on” state in our example) • A set of transitions from state to state, each labeled with a symbol from the input alphabet
DFA Pragmatics • Realistic examples can have large alphabets and thus ugly diagrams • so we’ll use simple alphabets in most examples • States from which no final states are reachable can be omitted from diagrams • We call these dead states (or trap states)
Generalizing the alphabet • Suppose we generalized our example to a question about a larger alphabet – say {0,1} • Let’s (arbitrarily) say that we want to recognize those strings with an odd number of both 0’s and 1’s. • As you might guess, we need more than two states to do this. • A DFA for this language is given below • Note the conventions for initial and final states, and for state names
What’s a DFA – precisely? • A DFA (deterministic finite acceptor – or automaton) consists of • A finite (input) alphabet S • A finite set Q of states • An initial (or start) state q0 (from Q) • A set F of final states (a subset of Q) • A transition functiond : Q x S -> Q • Note that both the alphabet and set of states are finite -- a DFA has a finite description
Tabular representation of DFAs • Tabular representation of the DFA above: • d | 0 1 • q0 | q1 q2 • q1 | q0 q3 • q2 | q3 q0 • F q3 | q2 q1 • Note that the table implicitly gives the states, the start state, and the input alphabet • The final states have to be explicitly labeled
States and strings • Note that states in the DFA above correspond to strings as follows: • q0: even number of 0's; even number of 1's • q1: odd number of 0's; even number of 1's • q2: even number of 0's; odd number of 1's • q3: odd number of 0's; odd number of 1's • DFA states often have a (fairly) simple representation in terms of the strings that are taken there
The extended transition function • d may be extended to handle strings • This gives a function d*: Q x S* -> Q • d* is defined for all states q, strings w, and symbols a by • d*(q,l) = q • d*(q,wa) = d(d*(q,w), a) • Note that d* extends d in the sense that d*(q,a) = d(q,a) • to see this, let w = l above
Notational conventions • Lower-case letters near q in the English alphabet are used for states, perhaps with subscripts • We use S, Q, q0,F, d, and d* without comment if only one DFA is under discussion
Acceptance of a string • A string x is accepted by a DFA M iff d*(q0,x) e F, • where d* is the extended transition function • Fact: For this function d*, d*(q,xy) = d* (d*(q,x), y) for any q, x, and y
Acceptance of a language • A language L is accepted (or recognized) by M iff every string in L is accepted by M, and no string outside of L is accepted by M • A DFA M accepts exactly one language. • This language is denoted L(M) • A language is regular iff it is accepted by some DFA. • Fact: All finite languages are regular • Two DFAs are equivalent iff they accept the same language.
More examples of DFAs • strings alternating 0's and 1's State q3 is a dead state
Still another example of a DFA • For binary representations of integers (without leading 0’s) • all strings over {0,1} beginning with 1 • dead state omitted
Yet another example of a DFA • Strings over {0,1,.} that contain at most one period • This DFA is not minimal
A DFA for binary multiples of 3 • Here we identify l with 0 – a good exercise is to redo this example without this assumption
Regular expressions • Regular expressions are used to represent languages. • It turns out that every regular expression represents a regular language, • Also, every regular language may be represented by a regular expression. • The language represented by the regular expression r is denoted L(r).
Regular expressions defined • An expression is a regular expression iff it has one of the following forms: • f, l, or a, where a is a member of some S • r+s, rs, r*, or (r), where r and s are regular expressions. • The expressions f, l and a respectively represent f, {l}, and {a} • The expressions r+s, rs, r*, and (r) respectively representL(r) U L(s), L(r)L(s), L(r)*, and L(r)
About regular expressions • Don’t confuse the empty language f with the nonempty language {l} • The precedence of the * operator is higher than for juxtaposition; for +, it’s lower. • These pairs of regular expressions are equivalent (represent the same languages) • rl and r; and r and lr • (rs)t and r(st) • (r+s)t and rt + st; and r(s+t) and rs + rt • r* and (r*)*; and (r+s)* and (r*s*)*
Examples of regular expressions • (0+1+2)* • all strings over {0,1,2} • a(a+b+c)* • all strings over {a,b,c} starting with a • (0+1)*010(0+1)* • all strings over {0,1} having 010 as a substring • 000+001+010+011+100+101+110+111 • all bit strings of length 3 • (0+1)(0+1)(0+1)(0+1) • all bit strings of length 4
Regular expressions and regular languages • For every regular expression r, L(r) is a regular language. • If L is a regular language, then L = L(r) for some regular expression r. • We will soon see constructive proofs of these claims. • The proof of the first claim requires a new concept -- nondeterminism
Problems with constructing DFAs • For some languages, the FAs that are most natural are incorrect. For example, • for (a+b)*ab(a+b)*, the natural FA below isn't deterministic (d is not a function) • for 0*1*, the natural FA recognizes (0+1)* • for 10+(12)*, the natural FA isn't deterministic, and accepts too many strings • for (10)*12, the natural FA isn't deterministic
Nondeterminism • One way of dealing with this issue is to allow finite automata a choice of moves for a given state and symbol. • This approach is called nondeterminism. • In some cases, it is also useful to allow l-moves that don't consume an input symbol. • It turns out that adding nondeterminism (even with l-moves) does not allow us to accept any new languages.
Modeling nondeterminism • Ways of thinking of nondeterminism: • omniscience • suppose that you could always guess correctly • branching processes • allocate a new processor for each choice • simultaneous processes • allow being in several states at once • backtracking • try each choice one after another
Sets of states • Our treatment of nondeterminism is based on the third approach above. • We will allow the computation to be in several states at once (or no state at all). • This requires a new transition function: • d: Q x (S U {l})-> 2Q • Note that l is a legal 2nd argument, and that the value is a (possibly empty) set of states.
NFAs • If our definition of a DFA is modified to use the new transition function, we get an NFA • Note that the “N” stands for “nondeterministic” rather than “not”. • The FA shown above for L((a+b)*ab(a+b)*) is a legal NFA for that language, with table: d | a b l q0 | {q0,q1} {q0} {} q1 | {} {q2} {} F q2 | {q2} {q2} {} U U
Acceptance by NFAs • An NFA accepts a string x iff d*(q0, x) contains a state of F, where d* is defined informally in Linz (p. 51). More precisely, d*(q, l) is the set of states accessible from q by zero or more l moves d*(q, a) is the set of states accessible from q by zero or more l moves, then one a move and then zero or more l moves d*(q, wa) is the union over all p in d*(q, w) of d*(p, a) d*(S, x) is the union over all q in S of d*(q, x)
An NFA for L((10)*12) • A table for an NFA for (10)*12: • δ | 0 1 2 l • q0 | {} {q1,q2} {} {} • q1 | {q0} {} {} {} • q2 | {} {} {q3} {} • F q3 | {} {} {} {}
The use of l moves in NFAs • A bad NFA for 0*1*: • A good NFA for 0*1*:
Constructing DFAs • We’ll soon have algorithms to find equivalent DFAs for NFAs or regular expressions. • In other cases, try starting with a start state. • For every new state, determine destination states from this state on every symbol • Tricky part: must these destination states be new states, or can old states be reused? • Useful observation: if x and y go to the same state, so do xz and yz for all z
Equivalence of NFAs and DFAs • Claim: Any language accepted by an NFA is regular • To show the claim, we need to find, for any NFA MN, a DFA MD with L(MD) = L(MN) • Idea: let states of the DFA correspond to sets of states of the NFA • we’ll use brackets to denote states of MD • Although there will be 2m states (if MN has m states), many will be dead or inaccessible
Constructing equivalent DFAs • [q0] will be the start state of MD • [S] is a final state iff S contains a final state of MN, or if [S] = [q0] and MN accepts l • The transition function dD is defined by letting dD([S],a) correspond to the union over all q in S of dN*(q,a), where dN is MN’s d function • We use dN* rather than dN to allow l moves • By induction, dD*([S],x)= [dN*(S,x)] for all x with |x| > 0, so that dD*([q0],x) = [dN*(q0,x)]
Example 1 of an equivalent DFA • The NFA below accepts L(0*1*) • δ | 0 1 l • q0 | {q0} {} {q1} • F q1 | {} {q1} {} • The equivalent DFA is given below • F [q0] | [q0,q1] [q1] • F [q0,q1] | [q0,q1] [q1] • F [q1] | [] [q1] • [] | [] [] • Note: the 4th state is a trap (dead) state
Example 2 of an equivalent DFA • The DFA equivalent to the NFA given above for L((10)*12) is: • δ | 0 1 2 • [q0] | [] [q1, q2] [] • [q1, q2] | [q0] [] [q3] • F [q3] | [] [] [] • Here the dead state and inaccessible states have been omitted
Another example with l moves • The NFA below accepts L(10+(12)*)
The equivalent DFA • The equivalent DFA is: • δ | 0 1 2 • F [q0] | [] [q2,q5] [] • [q2,q5] | [q3] [] [q4] • F [q3] | [] [] [] • F [q4] | [] [q5] [] • [q5] | [] [] [q4]
Breadth-first search (BFS) • Several steps in the algorithms above require finding all states accessible from some state q. • These steps can be implemented by a breadth-first search (BFS) algorithm • this algorithm is a specialization of the algorithm of Linz, p. 9 • it can be used in step 2 of Linz’s nfa-to-dfa algorithm of p. 59
The BFS algorithm • The algorithm maintains a queue of generated states whose successors have not been generated. • Until the queue becomes empty, the next state is dequeued and marked, and its successors found. • Those successors that are not marked are enqueued. • The queue will eventually empty, since there are only finitely many states
Using the BFS algorithm • BFS can find all l-paths leaving a state q • a “successor” is the destination of a l-transition • the queue is initialized with q only • Or all accessible states of a DFA • a “successor” is the destination of any transition • the queue is initialized with q0 only • Or all nondead states of a DFA • q’s “successor” is the source of any transition to q • the queue is initialized with all states of F
Two useful notions • A useful notion in both DFA minimization and in finding regular expressions for DFAs is identifying states with the strings taken there • That is, we identify q with {x | d*(q0,x) = q} • Another useful notion for minimization is distinguishability • For a language L, two strings x and y in S* are distinguishable iff for some z in S*, exactly one of {xz, yz} is in L
Minimal DFAs and distinguishability • A DFA separates x and y iff d*(q0,x) /= d*(q0,y) • DFAs must separate distinguishable strings • DFAs may separate indistinguishable strings • One might expect minimal DFAs to separate only distinguishable strings • Key to seeing that this is both correct and feasible is the notion of equivalence relations
Equivalence relations and minimization • Note first that indistinguishability is an equivalence relation on S*. • We’ll use the infix operator ~ for this relation • Also, being taken to the same state by a DFA M is an equivalence relation on S*. • The equivalence classes are the states of M • We’ll use the infix operator ~M for this relation • We’ll use the convention that [x] is the (equivalence) class containing x
The minimal DFA for L • Note that for a given L, ~M refines ~ • That is, if x ~M y, then x ~ y • So every class of ~M is in a single class of ~ • And ~ has no more classes than any ~M • If the classes of ~ can represent the states of some DFA for L, this DFA would be minimal. • And they can – we define a DFA M^ to have this set of states, and alphabet S, and …