The scanning process
• Goal: automate the process
• Idea:
  • Start with an RE
  • Build a DFA
• How?
  • Build a non-deterministic finite automaton, or NFA (Thompson's construction)
  • Convert it to a deterministic one (subset construction)
  • Minimize the DFA (Hopcroft's algorithm)
  • Implement it
• Existing scanner generator: flex
The scanning process: step 1
• Let's build a mini-scanner that recognizes exactly those strings of a's and b's that end in ab
• Step 1: Come up with a regular expression: (a|b)*ab
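Before automating anything, it helps to sanity-check that the RE describes the intended language. A minimal sketch using Python's standard `re` module, which is a backtracking matcher and not the NFA/DFA pipeline these slides build; the helper name `ends_in_ab` is hypothetical:

```python
import re

# The RE from step 1; Python's re syntax accepts it unchanged.
PATTERN = re.compile(r"(a|b)*ab")

def ends_in_ab(s: str) -> bool:
    """True iff s consists only of a's and b's and ends in 'ab'."""
    # fullmatch requires the whole string to match, not just a prefix
    return PATTERN.fullmatch(s) is not None

for s in ["ab", "aab", "bbab", "ba", "abb", "", "abc"]:
    print(f"{s!r}: {ends_in_ab(s)}")
```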
The scanning process: step 2
• Step 2: Use Thompson's construction to create an NFA for that expression
  • We want to be able to automate the process
  • Thompson's construction gives a systematic way to create an NFA from an RE.
  • It builds the NFA bottom-up, combining the NFAs of subexpressions into the NFA of the whole expression.
  • At any time during construction
    • there is only one final state
    • no transitions leave the final state
    • components are linked together using ε-transitions.
The scanning process: step 2
• Step 2: Use Thompson's construction to create an NFA for that expression
[Figure: NFAs for a, b, a|b, and (a|b)*]
The scanning process: step 2
• Step 2: Use Thompson's construction to create an NFA for that expression
[Figure: the complete NFA for (a|b)*ab, built from the NFA for (a|b)* and the NFAs for a and b]
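The bottom-up construction can be sketched with one function per RE operator. This is a minimal Python sketch, assuming an edge-list representation and a global counter for fresh states; the names `NFA`, `symbol`, `concat`, `union`, and `star` are illustrative, not from any library. Concatenation links the two pieces with an ε-transition, so the NFA for (a|b)*ab has 12 states and 14 transitions, although the states are numbered in construction order rather than as in the figures.

```python
from dataclasses import dataclass
from itertools import count

EPS = None             # label used for ε-transitions
_new_state = count(1)  # global counter: each next() yields a fresh state id

@dataclass
class NFA:
    start: int
    final: int         # Thompson NFAs always have exactly one final state
    edges: list        # (source, label, target); label is a char or EPS

def symbol(c):
    """NFA accepting exactly the one-character string c."""
    s, f = next(_new_state), next(_new_state)
    return NFA(s, f, [(s, c, f)])

def concat(m, n):
    """NFA for m followed by n: link m's final state to n's start state."""
    return NFA(m.start, n.final, m.edges + n.edges + [(m.final, EPS, n.start)])

def union(m, n):
    """NFA for m|n: a new start state branches to both, both rejoin a new final state."""
    s, f = next(_new_state), next(_new_state)
    return NFA(s, f, m.edges + n.edges + [
        (s, EPS, m.start), (s, EPS, n.start),
        (m.final, EPS, f), (n.final, EPS, f)])

def star(m):
    """NFA for m*: skip m entirely, or run it one or more times."""
    s, f = next(_new_state), next(_new_state)
    return NFA(s, f, m.edges + [
        (s, EPS, m.start), (s, EPS, f),               # enter m, or skip it
        (m.final, EPS, m.start), (m.final, EPS, f)])  # repeat m, or leave

# (a|b)*ab, built bottom-up from its subexpressions
nfa = concat(concat(star(union(symbol("a"), symbol("b"))), symbol("a")), symbol("b"))
print(nfa.start, nfa.final, len(nfa.edges))  # 7 12 14
```

Each operator only adds edges out of the old final states, never out of the new one, which is why the invariants listed on the previous slide hold at every step.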
The scanning process: step 3
• Step 3: Use subset construction to convert the NFA to a DFA
• Observation:
  • Two states qi and qk linked by an ε-transition in the NFA should be the same state in the DFA, because the machine goes from qi to qk without consuming input.
  • The function ε-closure(q) takes a state q and returns all the states that can be reached from q using ε-transitions only, including q itself.
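ε-closure amounts to a graph search that follows only ε-transitions. A minimal sketch as an iterative depth-first search; it takes a set of states rather than a single state, which is what subset construction needs. The `EPS_EDGES` table is an assumption: it is the ε-transition structure of the 12-state NFA for (a|b)*ab used in the step-3 walkthrough, reconstructed so that it reproduces the closures given there.

```python
# ε-transitions of the NFA for (a|b)*ab, numbered as in the step-3 figure
# (reconstructed; see lead-in). State -> set of ε-successors.
EPS_EDGES = {
    1: {2, 8}, 2: {3, 4}, 5: {7}, 6: {7}, 7: {2, 8}, 8: {9}, 10: {11},
}

def eps_closure(states, eps_edges=EPS_EDGES):
    """All states reachable from `states` using ε-transitions only,
    including the states themselves."""
    closure = set(states)
    stack = list(states)
    while stack:
        q = stack.pop()
        for r in eps_edges.get(q, ()):
            if r not in closure:  # visit each state at most once
                closure.add(r)
                stack.append(r)
    return frozenset(closure)     # hashable, so it can name a DFA state

print(sorted(eps_closure({1})))  # [1, 2, 3, 4, 8, 9]
```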
The scanning process: step 3
• Step 3: Use subset construction to convert the NFA to a DFA
• Observation:
  • If, on some input a, the NFA can go to any one of k states, then those k states should be represented by a single state in the DFA.
  • The function δ(q, x) takes as input a state q and a character x and returns all states that the NFA can go to from q when reading a single x.
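δ itself involves no ε-moves; subset construction applies it to a whole set of NFA states and then takes the ε-closure of the result. A minimal sketch, where `CHAR_EDGES` (the four character transitions of the reconstructed 12-state NFA for (a|b)*ab) and the helper names `delta` and `delta_set` are assumptions for illustration:

```python
# Character transitions of the NFA for (a|b)*ab, numbered as in the
# step-3 figure (reconstructed; see lead-in). (state, char) -> targets.
CHAR_EDGES = {(3, "a"): {5}, (4, "b"): {6}, (9, "a"): {10}, (11, "b"): {12}}

def delta(q, x):
    """States reachable from q by reading exactly one x (no ε-moves)."""
    return set(CHAR_EDGES.get((q, x), ()))

def delta_set(states, x):
    """δ lifted to a set of states: the union of δ(q, x) over the set."""
    result = set()
    for q in states:
        result |= delta(q, x)
    return result

print(sorted(delta_set({1, 2, 3, 4, 8, 9}, "a")))  # [5, 10]
print(sorted(delta_set({1, 2, 3, 4, 8, 9}, "b")))  # [6]
```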
The scanning process: step 3
• Step 3: Use subset construction to convert the NFA to a DFA
  • The start state Q0 of the DFA is the ε-closure of the start state q0 of the NFA.
  • Compute ε-closure(δ(Q0, x)) for each valid input character x, where δ applied to a set of states is the union of δ over its members. This will generate new states.
  • Systematically compute ε-closure(δ(Qi, x)) for every new state Qi until no new states can be created.
  • The final states of the DFA are those that contain at least one final state of the NFA.
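Putting ε-closure and δ together gives the worklist loop described above. This Python sketch hard-codes the 12-state NFA for (a|b)*ab in `NFA_EDGES`, which is an assumption reconstructed so that it reproduces the walkthrough's closures. It processes DFA states breadth-first with the alphabet ordered a, b, which happens to discover them in the same order as the slides' Q0 to Q3. Sets with no outgoing transition on a character are skipped, so no dead state is created.

```python
from collections import deque

EPS = "ε"
# NFA for (a|b)*ab: state -> list of (label, target). Numbering follows the
# step-3 figure (reconstructed); 1 is the start state, 12 the final state.
NFA_EDGES = {
    1: [(EPS, 2), (EPS, 8)], 2: [(EPS, 3), (EPS, 4)],
    3: [("a", 5)], 4: [("b", 6)], 5: [(EPS, 7)], 6: [(EPS, 7)],
    7: [(EPS, 2), (EPS, 8)], 8: [(EPS, 9)], 9: [("a", 10)],
    10: [(EPS, 11)], 11: [("b", 12)],
}
NFA_START, NFA_FINALS, ALPHABET = 1, {12}, ["a", "b"]

def eps_closure(states):
    """States reachable from `states` via ε-transitions only (DFS)."""
    closure, stack = set(states), list(states)
    while stack:
        for label, r in NFA_EDGES.get(stack.pop(), []):
            if label == EPS and r not in closure:
                closure.add(r)
                stack.append(r)
    return frozenset(closure)

def move(states, x):
    """δ applied to a set of states: every target reachable on one x."""
    return {r for q in states for label, r in NFA_EDGES.get(q, []) if label == x}

def subset_construction():
    q0 = eps_closure({NFA_START})
    dfa_states = [q0]   # index i is DFA state Q_i
    index = {q0: 0}
    trans = {}          # (i, x) -> j
    work = deque([q0])
    while work:
        Q = work.popleft()
        for x in ALPHABET:
            target = eps_closure(move(Q, x))
            if not target:
                continue  # no transition on x; dead state omitted
            if target not in index:  # a new DFA state
                index[target] = len(dfa_states)
                dfa_states.append(target)
                work.append(target)
            trans[(index[Q], x)] = index[target]
    # a DFA state is final if it contains at least one NFA final state
    finals = {i for i, Q in enumerate(dfa_states) if Q & NFA_FINALS}
    return dfa_states, trans, finals

states, trans, finals = subset_construction()
for i, Q in enumerate(states):
    print(f"Q{i} = {sorted(Q)}")
print(trans)
print("final:", finals)
```

Running it reproduces the four DFA states and the transitions worked out by hand on the following slides, with Q3 as the only final state.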
The scanning process: step 3
• Step 3: Use subset construction to convert the NFA to a DFA
[Figure: the NFA for (a|b)*ab, with states numbered 1 to 12; 1 is the start state and 12 the final state]
ε-closure(1) = {1, 2, 3, 4, 8, 9}
The scanning process: step 3
Q0 = ε-closure(1) = {1, 2, 3, 4, 8, 9}
ε-closure(δ(Q0, a)) = {2, 3, 4, 5, 7, 8, 9, 10, 11} = Q1
ε-closure(δ(Q0, b)) = {2, 3, 4, 6, 7, 8, 9} = Q2
ε-closure(δ(Q1, a)) = Q1
ε-closure(δ(Q1, b)) = {2, 3, 4, 6, 7, 8, 9, 12} = Q3
ε-closure(δ(Q2, a)) = Q1
ε-closure(δ(Q2, b)) = Q2
ε-closure(δ(Q3, a)) = Q1
ε-closure(δ(Q3, b)) = Q2
The scanning process: step 3
[Figure: the NFA for (a|b)*ab next to the resulting DFA with states Q0, Q1, Q2, and Q3; Q3 is the only final state]
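The scanning process: step 4
[Figure: the DFA from step 3]
• Step 4: Use Hopcroft's algorithm to minimize the DFA
States Q0 and Q2 behave the same way: both are non-final, both go to Q1 on a, and both go to Q2 on b. They are therefore indistinguishable and can be merged. Note that even though Q3 has the same transitions, it cannot be merged with Q0 or Q2 because Q3 is a final state while Q0 and Q2 are not. Q1 also stays separate, because on b it goes to the final state Q3.
δ(Q0, a) = Q1   δ(Q0, b) = Q2
δ(Q2, a) = Q1   δ(Q2, b) = Q2
[Figure: the minimized DFA with three states: the merged {Q0, Q2}, Q1, and Q3]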
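Hopcroft's algorithm starts from the partition {final states, non-final states} and repeatedly splits any block whose members disagree about which block a transition leads into; states still grouped together at the end can be merged. A minimal sketch of the common worklist formulation, assuming a complete DFA (every state has a transition on every character, which holds for the DFA here). State i stands for Qi, and the function name `hopcroft` is illustrative.

```python
def hopcroft(states, alphabet, trans, finals):
    """Partition `states` into blocks of indistinguishable states.
    Assumes a complete DFA: trans[(q, x)] exists for every state q and char x."""
    finals = frozenset(finals)
    initial = [b for b in (finals, frozenset(states) - finals) if b]
    partition = set(initial)
    worklist = list(initial)  # "splitter" blocks still to be processed
    while worklist:
        A = worklist.pop()
        for x in alphabet:
            # X: every state whose x-transition leads into block A
            X = frozenset(q for q in states if trans[(q, x)] in A)
            for Y in list(partition):
                inside, outside = Y & X, Y - X
                if inside and outside:  # A and x distinguish states of Y
                    partition.remove(Y)
                    partition |= {inside, outside}
                    if Y in worklist:
                        worklist.remove(Y)
                        worklist += [inside, outside]
                    else:               # the smaller half suffices
                        worklist.append(min(inside, outside, key=len))
    return partition

# The DFA from step 3: state i stands for Q_i.
DFA_STATES = [0, 1, 2, 3]
ALPHABET = ["a", "b"]
DFA_TRANS = {(0, "a"): 1, (0, "b"): 2, (1, "a"): 1, (1, "b"): 3,
             (2, "a"): 1, (2, "b"): 2, (3, "a"): 1, (3, "b"): 2}
DFA_FINALS = {3}

blocks = hopcroft(DFA_STATES, ALPHABET, DFA_TRANS, DFA_FINALS)
# Rename every state to the smallest member of its block: the minimized DFA.
rep = {q: min(b) for b in blocks for q in b}
min_trans = {(rep[q], x): rep[t] for (q, x), t in DFA_TRANS.items()}
print(sorted(sorted(b) for b in blocks))  # [[0, 2], [1], [3]]
print(min_trans)
```

The result merges Q0 and Q2 and keeps Q1 and Q3 apart, matching the slide. This sketch computes X by scanning every state, so it does not reach the O(n log n) running time of a carefully implemented version; it only illustrates the splitting logic.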
In practice
• flex is a scanner generator that takes a specification written as regular expressions and follows the process described above to generate a DFA-based scanner in C.
• The user additionally specifies
  • actions to be performed whenever a valid string has been recognized, e.g. inserting an identifier into the symbol table
  • error messages to be generated when the input string is invalid.
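flex combines all the rules into one DFA; when several rules match, it picks the longest match, and among equally long matches the rule listed first. The Python sketch below mimics that behavior with the standard `re` module instead of a generated DFA, trying every rule at the current position. The token names, the rules, the `symbol_table` dictionary, and raising `SyntaxError` on invalid input are all hypothetical choices for illustration, not flex's interface.

```python
import re

symbol_table = {}  # hypothetical symbol table: identifier -> index

def on_ident(text):
    """Action for identifiers: insert into the symbol table, return a token."""
    symbol_table.setdefault(text, len(symbol_table))
    return ("ID", symbol_table[text])

RULES = [  # (pattern, action); order breaks ties, as in flex
    (re.compile(r"if"), lambda t: ("IF", t)),
    (re.compile(r"[A-Za-z_][A-Za-z0-9_]*"), on_ident),
    (re.compile(r"[0-9]+"), lambda t: ("NUM", int(t))),
    (re.compile(r"[=+;]"), lambda t: ("OP", t)),
    (re.compile(r"[ \t\n]+"), lambda t: None),  # skip whitespace
]

def scan(text):
    pos, tokens = 0, []
    while pos < len(text):
        best_len, best_action = 0, None
        for pattern, action in RULES:
            m = pattern.match(text, pos)  # anchored at pos
            # strictly longer only, so the earlier rule wins a tie
            if m and m.end() - pos > best_len:
                best_len, best_action = m.end() - pos, action
        if best_action is None:
            raise SyntaxError(f"invalid character {text[pos]!r} at offset {pos}")
        tok = best_action(text[pos:pos + best_len])
        if tok is not None:
            tokens.append(tok)
        pos += best_len
    return tokens

print(scan("if x1 = x1 + 42;"))
```

With these rules, "if" becomes an `IF` token because the keyword rule is listed first, while "ifx" becomes an identifier because the longer match wins.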
In practice
• Errors that are typically detected during scanning include
  • Unterminated strings
  • Unterminated comments
  • Invalid characters
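Each of these errors shows up as the scanner reaching the end of a line or of the input before a token is complete, or hitting a character no rule covers. A minimal sketch for a hypothetical toy language (identifiers, numbers, the operators = + ;, double-quoted strings that may not span lines, and /* */ comments); the function name and the choice to report each error with its line number and keep scanning are assumptions:

```python
def find_lexical_errors(src):
    """Return (line, message) pairs for unterminated strings, unterminated
    comments, and invalid characters in the hypothetical toy language."""
    errors, i, line = [], 0, 1
    operators = set("=+;")
    while i < len(src):
        c = src[i]
        if c == "\n":
            line += 1
            i += 1
        elif src.startswith("/*", i):
            end = src.find("*/", i + 2)
            if end == -1:  # input ended inside the comment
                errors.append((line, "unterminated comment"))
                break
            line += src.count("\n", i, end)  # comments may span lines
            i = end + 2
        elif c == '"':
            j = i + 1
            while j < len(src) and src[j] not in '"\n':
                j += 1
            closed = j < len(src) and src[j] == '"'
            if not closed:  # hit a newline or end of input first
                errors.append((line, "unterminated string"))
            i = j + 1 if closed else j  # a newline is handled next iteration
        elif c.isalnum() or c == "_" or c in " \t" or c in operators:
            i += 1
        else:
            errors.append((line, f"invalid character {c!r}"))
            i += 1  # skip it and keep scanning
    return errors

print(find_lexical_errors('x = "hi;\ny = 1 @ 2;\n/* open comment'))
```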