Notes on Sequence Binary Decision Diagrams: Relationship to Acyclic Automata and Complexities of Binary Set Operations
Shuhei Denzumi (1), Ryo Yoshinaka (2, 1), Shin-ichi Minato (1, 2), and Hiroki Arimura (1)
(1) Hokkaido University
(2) JST ERATO Minato Discrete Structure Manipulation System Project
Background
• Research on string processing has become active.
  • Massive online data: the Internet and sensor networks.
  • String matching and string mining problems.
• Data mining
  • Input data should be represented in a compact form.
  • Computation on the compressed structure is needed.
[Figure labels: Input, Compress, Data Structure, Operation, Result]
Notes on Sequence Binary Decision Diagrams: Relationship to Acyclic Automata and Complexities of Binary Set Operations, by Shuhei Denzumi, Ryo Yoshinaka, Shin-ichi Minato, and Hiroki Arimura, 2011-08-30 (TUE), Prague Stringology Conference 2011
Manipulatable & Compact
• Manipulatable compact data structures
  • Represent data in compressed form
  • Provide operations that manipulate the data while it stays compacted
  • Have received much attention in recent years
• Binary Decision Diagram (BDD)
  • LSI area
• Deterministic Finite Automata (DFA)
  • Natural Language Processing area
[Figure labels: Input D1, Input D2, Compaction, Data Structure, Operation, D3]
What is Sequence BDD?
• Sequence Binary Decision Diagram (SeqBDD, SDD)
  • Loekito, Bailey, and Pei (2009)
  • Graph structure
  • Represents finite sets of strings of finite length
• The basic properties of SDDs are unknown
  • Minimization
  • Size complexity
  • Operation time
• Applications
  • Data mining
  • Graph mining
  • Human genome sequencing
[Figure labels: Text, Text, Text, …, Sequence Binary Decision Diagram]
Family of BDDs
• Compact representations for discrete structures
• With rich algebraic operations
• BDD [Bryant 1986]: Boolean functions, e.g. xy ∨ yz ∨ zx, ¬xyz ∨ x¬yz ∨ xy¬z
• ZDD [Minato 1993]: sets of combinations, e.g. {{a}, {b}, {a, b}}, {{a}, {b}, {c}, {a, b, c}}
• SDD [Loekito et al. 2009]: sets of strings, e.g. {abc, acb, bac, bca}, {a, b, ab, bab, abbab}
Results
• Relationship to Acyclic Deterministic Finite Automata (ADFA)
  • Translation from an SDD to an ADFA and vice versa
  • An SDD is never larger than an ADFA
  • An SDD can be |Σ| times smaller than an ADFA
• Computational complexity of binary set operations
  • Generalization of eight set operations
  • Tight analysis of the time complexity of the binary set operation algorithm
• Experimental results
  • SDDs can be smaller than ADFAs
  • Binary operation time
Preliminary
Definition
• Σ: alphabet (totally ordered by ≺)
• Internal nodes: labeled with letters a, b, …, z
• 1/0-terminal nodes: 1 / 0
• 1/0-edges
• SDD: directed acyclic graph
• Internal node S: τ(S) = 〈S.lab, S.1, S.0〉
  • S.lab: label
  • S.1: 1-child
  • S.0: 0-child
• Ordering rule
  • N.lab ≺ (N.0).lab whenever N.0 is an internal node
[Figure: a node S with label S.lab, 1-child S.1, and 0-child S.0; labels increase along 0-edges, a ≺ b ≺ c.]
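The definition above can be sketched as a small Python data type. This is only an illustration: the names `Node` and `obeys_ordering` are hypothetical, terminals are modeled as `True`/`False`, and ≺ is assumed to be Python's ordinary string comparison.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    """Internal SDD node, tau(S) = <S.lab, S.1, S.0>."""
    lab: str       # S.lab: a letter of the alphabet
    one: object    # S.1: 1-child (a Node or a terminal)
    zero: object   # S.0: 0-child (a Node or a terminal)

# Assumption of this sketch: True is the 1-terminal, False the 0-terminal.
ONE, ZERO = True, False

def obeys_ordering(n) -> bool:
    """Check N.lab < (N.0).lab for every internal node with an internal 0-child."""
    if isinstance(n, bool):          # terminals impose no constraint
        return True
    if isinstance(n.zero, Node) and not (n.lab < n.zero.lab):
        return False
    return obeys_ordering(n.one) and obeys_ordering(n.zero)

good = Node("a", ONE, Node("b", ONE, ZERO))   # a, then b along the 0-edge
bad = Node("b", ONE, Node("a", ONE, ZERO))    # b, then a along the 0-edge
print(obeys_ordering(good), obeys_ordering(bad))
```

Note that the rule constrains only 0-edges; a 1-child may carry any label, which is what lets a string repeat a letter.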
Semantics
• L(N): the set of strings that N represents
  • L(1-terminal) = {ε}
  • L(0-terminal) = {}
  • L(N) = N.lab・L(N.1) ∪ L(N.0)
• A path from the root to the 1-terminal node represents a string.
[Figure: SDD for {aa, ab, bb}, with sub-SDDs for {a, b}, {bb}, {b}, {ε}, and {}.]
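The recursive definition of L(N) translates almost literally into code. The sketch below rebuilds the {aa, ab, bb} example from the figure by hand; `Node`, `language`, and the `True`/`False` encoding of the terminals are assumptions of this illustration.

```python
from typing import NamedTuple

class Node(NamedTuple):
    lab: str      # N.lab
    one: object   # N.1
    zero: object  # N.0

ONE, ZERO = True, False   # 1-terminal and 0-terminal (encoding assumed here)

def language(n):
    """L(1) = {eps}, L(0) = {}, L(N) = N.lab . L(N.1) union L(N.0)."""
    if n is ONE:
        return {""}
    if n is ZERO:
        return set()
    return {n.lab + w for w in language(n.one)} | language(n.zero)

# The four internal nodes of the example SDD.
b = Node("b", ONE, ZERO)     # {b}
bb = Node("b", b, ZERO)      # {bb}
ab = Node("a", ONE, b)       # {a, b}
root = Node("a", ab, bb)     # {aa, ab, bb}

print(sorted(language(root)))
```

The node for {b} is referenced twice, once as the 0-child of the {a, b} node and once as the 1-child of the {bb} node, which is the kind of sharing the diagram relies on.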
Comparison to ADFA
• 1-terminal node: accept state
• 0-terminal node: reject state
[Figure: an ADFA and an SDD for {aa, ab, bb}, with sub-structures for {a, b}, {bb}, and {b}.]
Reduction process
• Suppression
  • N.1 ≠ 0-terminal node, because a・{} ∪ L(N.0) = L(N.0)
  • In an ADFA: removing edges that point to the dead state
• Merging
  • τ(N) = τ(N’) ⇒ N = N’
  • In an ADFA: sharing all equivalent nodes
• Theorem
  • Under these rules, the SDD is unique and minimal
  • Just as ADFAs have a unique canonical form
[Figure: a node N whose 1-child is the 0-terminal is replaced by N.0; nodes N and N’ with the same label x and the same children are merged.]
Characteristics
• Almost isomorphic to Acyclic Deterministic Finite Automata
• BDD/ZDD techniques are applicable
  • Binary form
  • Simple recursive algorithms
  • Easy to implement
  • Rich collection of operations
• Use of hash tables
  • To share equivalent nodes
  • To share intermediate computations
[Figure labels: BDD/ZDD, ADFA, SDD]
Relationship to Acyclic Automata
Size
• An SDD node corresponds to an ADFA edge
• The description size is proportional to
  • |N|: the number of internal nodes in SDD N
  • |A|: the number of edges in ADFA A
[Figure: ADFA edges labeled a, b, c and the corresponding SDD nodes a, b, c.]
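One way to read the node-to-edge correspondence is sketched below: each ADFA state is an SDD node, and its outgoing edges are read off the chain of 0-children starting at that node. This is an illustrative translation under that assumption, not necessarily the construction used in the paper; `Node`, `sdd_nodes`, and `to_adfa` are hypothetical names.

```python
from typing import NamedTuple

class Node(NamedTuple):
    lab: str
    one: object
    zero: object

ONE, ZERO = True, False   # terminals (encoding assumed for this sketch)

def sdd_nodes(root):
    """Distinct internal nodes reachable from root; its size is |N|."""
    seen, stack = set(), [root]
    while stack:
        n = stack.pop()
        if isinstance(n, Node) and n not in seen:
            seen.add(n)
            stack += [n.one, n.zero]
    return seen

def to_adfa(root):
    """Return (edges, accepting), where edges maps state -> {letter: state}."""
    edges, accepting, stack = {}, set(), [root]
    while stack:
        state = stack.pop()
        if state in edges:
            continue
        out, n = {}, state
        while isinstance(n, Node):     # walk the 0-chain of this state
            out[n.lab] = n.one         # one SDD node yields one ADFA edge
            stack.append(n.one)
            n = n.zero
        if n is ONE:                   # chain ends in the 1-terminal
            accepting.add(state)
        edges[state] = out
    return edges, accepting

b = Node("b", ONE, ZERO)
root = Node("a", Node("a", ONE, b), Node("b", b, ZERO))   # {aa, ab, bb}
edges, accepting = to_adfa(root)
n_edges = sum(len(out) for out in edges.values())
print(len(sdd_nodes(root)), n_edges)   # |N| and |A| for this example
```

In this example the node for {b} lies on the 0-chain of one state and is itself another state, so it contributes two ADFA edges while being stored once in the SDD.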
Theorem: Size comparison
• For an equivalent SDD N and ADFA A
  • From an ADFA A to an SDD N: |N| ≤ |A| (an SDD is never larger than an ADFA)
  • From an SDD N to an ADFA A
• An SDD can be |Σ| times smaller than an ADFA
0-child sharing
[Figure: example of 0-child sharing, with edges and nodes labeled a, b, c, d, e.]
Example
• {a^n b^i c^j | n = 0, …, 4; i, j = 0, 1}
• SDD S: |S| = 6
• ADFA A: |A| = 14
[Figure: the ADFA A and the SDD S for this set.]
Experiment
• Input: Canterbury corpus
• BibleAll: bible.txt; BibleBi: all bigrams from bible.txt; Ecoli: E.coli.txt
• "Fac" means that all factors of the input data are stored
Binary Set Operation Algorithm
Set operation
• A binary set operation ♢ ∈ {∪, ∩, \, …}
• Input: two SDDs P, Q
• Output: an SDD R such that L(R) = L(P) ♢ L(Q)
[Figure: a binary set operation takes P and Q and produces P ♢ Q.]
Apply algorithm
• Originally for BDDs [Bryant 1986], applied to SDDs
• Based on the definition L(N) = N.lab・L(N.1) ∪ L(N.0)
• In the operation (when P.lab = Q.lab):
  L(P) ♢ L(Q) = P.lab・(L(P.1) ♢ L(Q.1)) ∪ (L(P.0) ♢ L(Q.0))
[Figure: P = 〈a, P1, P0〉 and Q = 〈a, Q1, Q0〉 give P ♢ Q = 〈a, P1 ♢ Q1, P0 ♢ Q0〉.]
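A minimal sketch of this recursion in Python follows. Several points are assumptions of the sketch rather than statements from the slides: nodes are integer ids (0 and 1 are the terminals), the operation is passed as a Boolean function `op` with `op(False, False) == False`, and when the labels differ the smaller label is expanded while the other operand contributes the empty set on the 1-side. The names `getnode`, `split`, `apply_op`, `from_words`, and `language` are hypothetical.

```python
import operator

ZERO, ONE = 0, 1        # terminal ids
nodes = [None, None]    # id -> (lab, one, zero); slots 0 and 1 are the terminals
unique = {}             # Uniquetable: (lab, one, zero) -> id
opcache = {}            # Opcache: (op, p, q) -> id of the result

def getnode(lab, one, zero):
    if one == ZERO:                       # suppression rule
        return zero
    key = (lab, one, zero)
    if key not in unique:                 # merging rule: reuse if registered
        nodes.append(key)
        unique[key] = len(nodes) - 1
    return unique[key]

def split(n, x):
    """(1-part, 0-part) of n with respect to letter x."""
    if n > ONE and nodes[n][0] == x:
        return nodes[n][1], nodes[n][2]
    return ZERO, n                        # no string of n starts with x

def apply_op(op, p, q):
    if p <= ONE and q <= ONE:             # both terminal: decide membership of eps
        return ONE if op(p == ONE, q == ONE) else ZERO
    key = (op, p, q)
    if key in opcache:
        return opcache[key]
    x = min(nodes[n][0] for n in (p, q) if n > ONE)   # smallest top label
    p1, p0 = split(p, x)
    q1, q0 = split(q, x)
    r = getnode(x, apply_op(op, p1, q1), apply_op(op, p0, q0))
    opcache[key] = r
    return r

def from_words(words):
    """Build an SDD as the union of single-string SDDs."""
    r = ZERO
    for w in words:
        n = ONE
        for c in reversed(w):
            n = getnode(c, n, ZERO)
        r = apply_op(operator.or_, r, n)
    return r

def language(n):
    if n <= ONE:
        return {""} if n == ONE else set()
    lab, one, zero = nodes[n]
    return {lab + w for w in language(one)} | language(zero)

p = from_words(["aa", "ab", "bb"])
q = from_words(["ab", "b", "bb"])
print(sorted(language(apply_op(operator.and_, p, q))))
```

Passing the operation as a Boolean function keeps a single recursion for union (`operator.or_`), intersection (`operator.and_`), difference, and similar operations; only the terminal case consults `op`.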
Hash table technique
• Key-value hash tables
• Uniquetable
  • Key: 〈letter x, SDD node N1, SDD node N0〉
  • Value: SDD node N with τ(N) = 〈x, N1, N0〉
• Opcache
  • Key: 〈operation id ♢, SDD node P, SDD node Q〉
  • Value: SDD node R such that R = P ♢ Q
[Figure: the Uniquetable maps the key (triple) 〈x, N1, N0〉 to the value (node) N; the Opcache maps the key (triple) 〈♢, P, Q〉 to the value (node) R.]
Node creation process
• Any SDD node needed during the computation is created via this process
• Once an internal node is registered in the Uniquetable, equivalent nodes will not be created anymore
• Check the Uniquetable for the key 〈x, N1, N0〉
  • Exists: return it.
  • Does not exist: create a new node and return it.
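The lookup-or-create step can be sketched as a small manager class. The class name `SDDManager`, the integer node ids, and the placement of the suppression check inside `getnode` are assumptions of this illustration.

```python
class SDDManager:
    ZERO, ONE = 0, 1   # terminal ids (encoding assumed for this sketch)

    def __init__(self):
        self.nodes = [None, None]   # id -> (x, N1, N0); slots 0 and 1 are terminals
        self.unique = {}            # Uniquetable: (x, N1, N0) -> id

    def getnode(self, x, n1, n0):
        """Return the node with tau(N) = <x, N1, N0>, creating it only if needed."""
        if n1 == self.ZERO:         # suppression: x.{} union L(N0) = L(N0)
            return n0
        key = (x, n1, n0)
        node = self.unique.get(key)
        if node is None:            # not registered yet: create and register
            node = len(self.nodes)
            self.nodes.append(key)
            self.unique[key] = node
        return node                 # registered: the existing node is shared

m = SDDManager()
b1 = m.getnode("b", SDDManager.ONE, SDDManager.ZERO)
b2 = m.getnode("b", SDDManager.ONE, SDDManager.ZERO)
print(b1 == b2, len(m.nodes))       # same id both times, one internal node created
```

Because every node is created through `getnode`, two nodes with the same triple can never coexist, so node equality reduces to comparing ids.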
Time complexity
• When P ♢ Q is executed
  • Every operation uses the Opcache
  • At most |P| × |Q| different instances of recursive calls are invoked
  • (Assume that the access time to hash tables is constant)
• Naïve method
  • Prepares a table of size |P| × |Q|
• This method
  • No useless or redundant nodes
• Theorem
  • Worst case: O(|P| |Q|) time
  • There exists an example that needs Ω(|P| |Q|) time
  • Both the lower and the upper bound are obtained
• Check the Opcache for the key 〈♢, P, Q〉
  • Exists: P ♢ Q has already been computed; return it.
  • Does not exist: continue the computation on the 0-side and the 1-side.
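The counting argument behind the upper bound can be written out as follows. This is a sketch under the stated constant-time hash assumption; the "+2" terms account for the two terminal nodes as possible arguments and are an addition of this sketch (the slide states the count simply as |P| × |Q|), and c is an unnamed constant for the work done per call.

```latex
\begin{align*}
\#\{\text{distinct Opcache keys } \langle \diamond, P', Q' \rangle\}
  &\le (|P| + 2)\,(|Q| + 2), \\
T(P \diamond Q)
  &\le c \cdot (|P| + 2)\,(|Q| + 2) \;=\; O(|P|\,|Q|).
\end{align*}
```

Here P' and Q' range over the internal nodes of P and Q together with the two terminals. Each distinct key is expanded once; every later call with the same key is answered by an Opcache lookup.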
Experiment
• Operation time
  • Prepare two SDDs for all factors of random texts of length n
  • Measure the time to compute the operation
Conclusion
• Relationship to Acyclic Automata
  • An SDD can be |Σ| times smaller than an ADFA
  • For real data, SDDs are 10 to 20% more compact than ADFAs
• Computational complexity of binary set operations
  • The worst-case time complexity is quadratic
  • A tight time bound is analyzed
  • In our experiment, the operation time is almost linear
• Future work
  • Efficient implementation of various operations
  • Propose a substring index on SDDs
  • Factor SDD construction algorithm