450 likes | 587 Views
Three New Algorithms for Regular Language Enumeration. Margareta Ackerman University of Waterloo Waterloo, ON. Erkki Makinen University of Tempere Tempere , Finland. 0. B. 0. A. D. 1. What kind of words does this NFA accepts?. 1. 1. C. 1. E. 0. 0. B. 0. A. D. 1.
E N D
Three New Algorithms for Regular Language Enumeration Margareta Ackerman University of Waterloo Waterloo, ON ErkkiMakinen University of TempereTempere, Finland
0 B 0 A D 1 What kind of words does this NFA accepts? 1 1 C 1 E 0
0 B 0 A D 1 ε0 00 11 000 110 0000 1100 1110 00000 .... Cross-section problem: enumerate all words of length n accepted by the NFA in lexicographic order. 1 1 C 1 E 0
0 B 0 A D 1 ε0 00 11 000 110 0000 1100 1110 00000 .... Enumeration problem: enumerate the first m words accepted by the NFA in length-lexicographic order. 1 1 C 1 E 0
0 B 0 A D 1 ε0 00 11 000 110 0000 1100 1110 00000 .... Min-word problem: find the first word of length n accepted by the NFA. 1 1 C 1 E 0
Applications • Correctness testing, provides evidence that an NFA generates the expected language. • An enumeration algorithm can be used to verify whether two NFAs accept the same language (Conway, 1971). • A cross-section algorithm can be used to determine whether every word accepted by a given NFA is a power - a string of the from wnfor n>1, |w|>0. (Anderson, Rampersad, Santean, and Shallit, 2007) • A cross-section algorithm can be used to solve the “k-subset of an n-set” problem: Enumerate all k-subset of a set in alphabetical order. (Ackerman & Shallit, 2007)
Objectives Find algorithms for the three problems that are • Asymptotically efficient in • Size of the NFA (s states and d transitions) • Output size (t) • The length of the words in the cross-section (n) • Efficient in practice
Previous Work • A cross-section algorithm, where finding each consecutive word is super-exponential in the size of the cross-section (Domosi, 1998). • A cross-section algorithm that is exponential in n (length of words in the cross-section) is found in the Grail computation package. • “Breast-First-Search” approach • Trace all paths of length n in the NFA, storing the paths that end at a final state. • O(dσn+1), where d is the number of transitions in the NFA and σ is the alphabet size.
Previous Polynomial Algorithms: Makinen, 1997 • Dynamic programming solution • Min-word O(dn2) • Cross-section O(dn2+dt) • Enumeration O(d(e+t)) e:the number of empty cross-section encountered d: the number of transitions in the NFA n: the length of words in the cross-section t:the number of characters in the output Quadratic in n
Previous Polynomial Algorithms: Ackerman and Shallit, 2007 • Linear in the length of words in the cross-section • Min-word: O(s2.376n) • Cross-section: O(s2.376n+dt) • Enumeration: O(s2.376c+dt) c:the number of cross-section encountered d: the number of transitions in the NFA n: the length of words in the cross-section t:the number of characters in the output Linear in n
Previous Polynomial Algorithms: Ackerman and Shallit, 2007 • The algorithm uses “smart breadth first search,” following only those paths that lead to a final state. • Main idea: compute a look-ahead matrix, used to determine whether there is a path of length istarting at state s and ending at a final state. • In practice, Makinen’s algorithm (slightly modified) is usually more efficient, except on some boundary cases.
Contributions Present 3 algorithms for each of the enumeration problems, including: • O(dn) algorithm for min-word • O(dn+dt) algorithm for cross-section • Algorithms with improved practical performance for each of the enumeration problems
Contributions: Detailed • We present three sets of algorithms • AMSorted: - An efficient min-word algorithm, based on Makinen’s original algorithm. - A cross-section and enumeration algorithms based on this min-word algorithm. • AMBoolean: - A more efficient min-word algorithm, based on minWordAMSorted. - A cross-section and enumeration algorithms based on this min-word algorithm. 3. Intersection-based: - An elegant min-word algorithm. - A cross-section algorithm based on this min-word algorithm.
Key ideas behind our first two algorithms • Makinen’s algorithm uses simple dynamic programming, which is efficient in practice on most NFAs. • The algorithm by Ackerman & Shallit uses “smart breadth first search,” following only those paths that lead to a final state. • We build on these ideas to yield algorithms that are more efficient both asymptotically and in practice.
Makinen’s original min-word algorithm S[i] stores a representation of the minimal word w of length i that appears on a path from S to a final state. 2 B 0 A 1 3 C 1
Makinen’s original min-word algorithm The minimal word of length n can be found by tracing back from the last column of the start state. 2 B 0 A 1 3 C 1
Makinen’s original min-word algorithm • Initialize the first column • For columns i = 2...n • For each state S Find S[i] by comparing all words of length i appearing on paths from S to a final state. 2 B 0 A 1 3 C 1
Makinen’s original min-word algorithm • Initialize the first column • For columns i = 2...n • For each state S Find S[i] by comparing all words of length i appearing on paths from S to a final state. i operations 2 B 0 A 1 3 C 1
Makinen’s original min-word algorithm • Initialize the first column • For columns i = 2...n • For each state S Find S[i] by comparing all words of length i appearing on paths from S to a final state. i operations • Theorem: Makinen’s original min-word algorithm is O(dn2).
New min-word algorithm: MinWordAMSorted Idea: Sort every columns by the words that the entries represent. 2 B 0 321 A 1 031 3 120 C 1
New min-word algorithm: MinWordAMSorted • We define an order on {S[i] : S a state in N}. • If A[1]=a and B[1]=b, where a<b, then A[1]<B[1]. • For i > 1, A[i] = (a, A’) and B[i] = (b, B’) • If a<b, then A[i] < B[i]. • If a = b, and A’[i-1] < B’[i-1], then A[i] < B[i]. • If A[i] is defined, and B[i] is undefined, then A[i] > B[i].
New min-word algorithm: MinWordAMSorted • Initialize the first column • For columns i = 2...n • For each state S • Find S[i] using only column i-1 and the edges leaving S. • Sort column i 2 B 0 A 1 3 C 1
New min-word algorithm: MinWordAMSorted d operations • Initialize the first column • For columns i = 2...n • For each state S • Find S[i] using only column i-1 and the edges leaving S. • Sort column i s log s operations Theorem: The algorithm minWordAMSorted is O((s log s +d) n).
New cross-section algorithm: crossSectionAMSorted • A state S is i-complete if there exists a path of length i from state S to a final state. • To enumerate all words of length n: 1. Call minWordAMSorted (create a table) 2. Perform a “smart BFS”: - Begin at the start state. - Follow only those paths of length n that end at a final state, by using the table to identify i-complete states. O((s log s +d) n). O(dt) Theorem: The algorithm crossSectionAMSorted is O(n (s log s + d) + dn).
New enumeration algorithm: enumAMSorted Run the cross-section algorithm until the required number of words are listed, while reusing the table. Theorem: The algorithm enumAMSorted is O(c (s log s + d)+ dt). • c:the number of cross-section encountered • d: the number of transitions in the NFA • t:the number of characters in the output
What have we got so far? • c:the number of cross-section encountered • e:the number of empty cross-section encountered • d: the number of transitions in the NFA • n: the length of words in the cross-section • t:the number of characters in the output
New min-word algorithm: minWordAMBoolean Idea: instead of using a table to find the minimal word, construct a table whose only purpose is to determine i-complete states. Can be done using a similar algorithm to minWordAMSorted, but more efficiently, since there is no need to sort.
New min-word algorithm: minWordAMBoolean B 0 A 1 3 C
New min-word algorithm: minWordAMBoolean • Fill in the first column • For i=2 ... n • For every state S • Determine whether S is i-complete using only the transitions leaving S and column i-1 • Starting at the start state, follow minimal transitions to paths that can complete a word of length n (using the table). B 0 A 1 3 C
New min-word algorithm: minWordAMBoolean d operations • Fill in the first column • For i=2 ... n • For every state S • Determine whether S is i-complete using only the transitions leaving S and column i-1 • Starting at the start state, follow minimal transitions to paths that can complete a word of length n (using the table). B 0 A 1 3 C
New min-word algorithm: minWordAMBoolean d operations • Fill in the first column • For i=2 ... n • For every state S • Determine whether S is i-complete using only the transitions leaving S and column i-1 • Starting at the start state, follow minimal transitions to paths that can complete a word of length n (using the table). Theorem: The algorithm minWordAMBooleanis O(dn).
New cross-section algorithm: crossSectionAMBoolean O(dn). O(dt) Theorem: The algorithm crossSectionAMBoolean is O(dn+dt). • Extend to a cross-section algorithm using the same approach as the Sorted algorithm. • To enumerate all words of length n: • Call minWordAMBoolean (create a table) • Perform a “smart BFS”: - Begin at the start state. - Follow only those paths of length n that end at a final state, by using the table to identify i-complete states.
New enumeration algorithm: enumAMBoolean Theorem: The algorithm enumAMBoolean is O(de+ dn). • e:the number of empty cross-section encountered • d: the number of transitions in the NFA • n: the length of words in the cross-section • t:the number of characters in the output Run the cross-section algorithm until the required number of words are listed, while reusing the table.
What have we got so far? • c:the number of cross-section encountered • e:the number of empty cross-section encountered • d: the number of transitions in the NFA • n: the length of words in the cross-section • t:the number of characters in the output
Intersection-Based Algorithms • We present surprisingly elegant min-word and cross-section algorithms that have the asymptotic efficiency of the Boolean-based algorithms. • However, these algorithms are not as efficient in practice as the Boolean-based and Sorted-based algorithms.
New min-word algorithm: minWordIntersection Let N be the input NFA, and A be the NFA that accepts the language of all words of length n. • Let C = N x A • Remove all states of C that cannot be reached from the final states of C using reversed transitions. • Starting at the start state, follow the minimal n consecutive transitions to a final state.
New min-word algorithm: minWordIntersection Let N be the input NFA, and A be the NFA that accepts the language of all words of length n. • Let C = N x A • Remove all states of C that cannot be reached from the final states of C using reversed transitions. • Starting at the start state, follow the minimal n consecutive transitions to a final state. Let n = 2 Automaton A Automaton N 1 0 1 B 1 A 0 1 0 1 0 C 0
New min-word algorithm: minWordIntersection Let N be the input NFA, and A be the NFA that accepts the language of all words of length n. • Let C = N x A • Remove all states of C that cannot be reached from the final states of C using reversed transitions. • Starting at the start state, follow the minimal n consecutive transitions to a final state. Automaton C Automaton N 1 0 1 B 1 A 0 1 0 C 0
New min-word algorithm: minWordIntersection Let N be the input NFA, and A be the NFA that accepts the language of all words of length n. • Let C = N x A • Remove all states of C that cannot be reached from the final states of C using reversed transitions. • Starting at the start state, follow the minimal n consecutive transitions to a final state. 1 1
New min-word algorithm: minWordIntersection Let N be the input NFA, and A be the NFA that accepts the language of all words of length n. • Let C = N x A • Remove all states of C that cannot be reached from the final states of C using reverse transitions. • Starting at the start state, follow the minimal n consecutive transitions to a final state. 1 1 Thus the minimal word of length 2 accepted by N is “11”
Asymptotic running time of minWordIntersection • Let C = N x A • Remove all states of C that cannot be reached from the final states of C using reverse transitions. • Starting at the start state, Follow the minimal n consecutive transitions to final. Concatenatencopies of N. Each step is proportional to size of C, which is O(nd). Theorem: The algorithm minWordIntersection is O(dn).
New cross-section algorithm: crossSectionIntersection • To enumerate all words of length n, perform BFS on C = N x A, and remove all states not reachable from final state removed (using reverse transitions). • Since all paths of length n starting at the start state lead to a final state, there is no need to check for i-completness. Theorem: The algorithm crossSectionIntersection is O(dn+dt).
Practical Performance • We compared Makinen’s, Ackerman-Shallit, AMSorted, and AMBoolean, and Intersection-based algorithms. • Tested the algorithms on a variety of NFAs: dense, sparse, few and many final states, different alphabet size, worst case for Makinen’s algorithm, ect… • Here are the best performing algorithms: • Min-word: AMSorted • Cross-section: AMBoolean • Enumeration: AMBoolean
Summary • c:the number of cross-section encountered • e:the number of empty cross-section encountered • d: the number of transitions in the NFA • n: the length of words in the cross-section • t:the number of characters in the output : most efficient in practice
Open problems • Extending the intersection-based cross-section algorithm to an enumeration algorithm. • Lower bounds. • Can better results be obtained using a different order? • Restricting attention to a smaller family of NFAs.