110 likes | 279 Views
Two issues in lexical analysis Specifying tokens (regular expression) Identifying tokens specified by regular expression. How to recognize tokens specified by regular expressions?
E N D
Two issues in lexical analysis • Specifying tokens (regular expression) • Identifying tokens specified by regular expression.
How to recognize tokens specified by regular expressions? • A recognizer for a language is a program that takes a string x as input and answers “yes” if x is a sentence of the language and “no” otherwise. • In the context of lexical analysis, given a string and a regular expression, a recognizer of the language specified by the regular expression answer “yes” if the string is in the language. • A regular expression can be compiled into a recognizer (automatically) by constructing a finite automata which can be deterministic or non-deterministic.
Non-deterministic finite automata (NFA) • A non-deterministic finite automata (NFA) is a mathematical model that consists of: (a 5-tuple • a set of states Q • a set of input symbols • a transition function that maps state-symbol pairs to sets of states. • A state q0 that is distinguished as the start (initial) state • A set of states F distinguished as accepting (final) states. • An NFA accepts an input string x if and only if there is some path in the transition graph from the start state to some accepting state. • Show an NFA example (page 116, Figure 3.21).
An NFA is non-deterministic in that (1) same character can label two or more transitions out of one state (2) empty string can label transitions. • For example, here is an NFA that recognizes the language ???. • An NFA can easily implemented using a transition table. State a b 0 {0, 1} {0} 1 - {2} 2 - {3} a 2 3 1 0 a b b b
The algorithm that recognizes the language accepted by NFA. • Input: an NFA (transition table) and a string x (terminated by eof). • output “yes” if accepted, “no” otherwise. S = e-closure({s0}); a = nextchar; while a != eof do begin S = e-closure(move(S, a)); a := next char; end if (intersect (S, F) != empty) then return “yes” else return “no” Note: e-closure({S}) are the state that can be reached from states in S through transitions labeled by the empty string.
Example: recognizing ababb from previous NFA • Example2: Use the example in Fig. 3.27 for recognizing ababb Space complexity O(|S|), time complexity O(|S|^2|x|)??
Construct an NFA from a regular expression: • Input: A regular expression r over an alphabet • Output: An NFA N accepting L( r ) • Algorithm (3.3, pages 122): • For , construct the NFA • For a in , construct the NFA • Let N(s) and N(t) be NFA’s for regular s and t: • for s|t, construct the NFA N(s|t): • For st, construct the NFA N(st): • For s*, construct the NFA N(s*): a N(s) N(t) N(s) N(t) N(s)
Example: r = (a|b)*abb. • Example: using algorithm 3.3 to construct N( r ) for r = (ab | a)*b* | b.
Using NFA, we can recognize a token in O(|S|^2|X|) time, we can improve the time complexity by using deterministic finite automaton instead of NFA. • An NFA is deterministic (a DFA) if • no transitions on empty-string • for each state S and an input symbol a, there is at most one edge labeled a leaving S. • What is the time complexity to recognize a token when a DFA is used?
Algorithm to convert an NFA to a DFA that accepts the same language (algorithm 3.2, page 118) initially e-closure(s0) is the only state in Dstates and it is unmarked while there is an unmarked state T in Dstates do begin mark T; for each input symbol a do begin U := e-closure(move(T, a)); if (U is not in Dstates) then add U as an unmarked state to Dstates; Dtran[T, a] := U; end end; Initial state = e-closure(s0), Final state = ?
Example: page 120, fig 3.27. • Question: • for a NFA with |S| states, at most how many states can its corresponding DFA have?