
Notes on Sequence Binary Decision Diagrams: Relationship to Acyclic Automata and Complexities of Binary Set Operations

Shuhei Denzumi, Ryo Yoshinaka, Shin-ichi Minato, and Hiroki Arimura


Presentation Transcript


  1. Notes on Sequence Binary Decision Diagrams: Relationship to Acyclic Automata and Complexities of Binary Set Operations • Shuhei Denzumi (1), Ryo Yoshinaka (2, 1), Shin-ichi Minato (1, 2), and Hiroki Arimura (1) • (1) Hokkaido University • (2) JST ERATO Minato Discrete Structure Manipulation System Project

  2. Background • Research on string processing has become active. • Massive online data: the Internet and sensing networks. • String matching and string mining problems. • Data mining • Input data should be represented in a compact form • Computation on the compressed structure is needed • [Figure: inputs are compressed into a data structure, and an operation on that structure produces the result] • Slide footer: Notes on Sequence Binary Decision Diagrams: Relationship to Acyclic Automata and Complexities of Binary Set Operations, by Shuhei Denzumi, Ryo Yoshinaka, Shin-ichi Minato, and Hiroki Arimura, 2011-08-30 (TUE), Prague Stringology Conference 2011

  3. Manipulatable & Compact • Manipulatable compact data structures • Represent data in compressed form • Provide operations that manipulate the data in its compacted form • Have received much attention in recent years • Binary Decision Diagram (BDD) • LSI area • Deterministic Finite Automata (DFA) • Natural language processing area • [Figure: inputs are compacted into data structures D1 and D2, and an operation on them produces D3]

  4. What is Sequence BDD? • Sequence Binary Decision Diagram (SeqBDD, SDD) • Loekito, Bailey, and Pei (2009) • Graph structure • Represents finite sets of strings of finite length • SDD's basic properties are unknown • Minimization • Size complexity • Operation time • Applications • Data mining • Graph mining • Human genome sequencing • [Figure: a collection of texts stored in a Sequence Binary Decision Diagram]

  5. Family of BDDs • Compact representations for discrete structures • With rich algebraic operations • BDD [Bryant 1986]: Boolean functions, e.g. xy ∨ yz ∨ zx and ¬xyz ∨ x¬yz ∨ xy¬z • ZDD [Minato 1993]: sets of combinations, e.g. {{a}, {b}, {a, b}} and {{a}, {b}, {c}, {a, b, c}} • SDD [Loekito et al. 2009]: sets of strings, e.g. {abc, acb, bac, bca} and {a, b, ab, bab, abbab}

  6. Results • Relationship to Acyclic Deterministic Finite Automata (ADFA) • Translation from an SDD to an ADFA and vice versa • An SDD is never larger than an ADFA • An SDD can be |Σ| times smaller than an ADFA • Computational complexity of binary set operations • Generalize eight set operations • Tight analysis of the time complexity of the binary set operation algorithm • Experimental results • SDDs can be smaller than ADFAs • Binary operation time

  7. Preliminary

  8. Definition • Σ: alphabet (totally ordered by ≺) • Internal nodes: labeled with letters a, b, …, z of Σ • Terminal nodes: the 1-terminal and the 0-terminal • Edges: 1-edges and 0-edges • SDD: directed acyclic graph • Each internal node S has a triple τ(S) = 〈S.lab, S.1, S.0〉 • S.lab: label • S.1: 1-child • S.0: 0-child • Ordering rule • N.lab ≺ (N.0).lab whenever N.0 is an internal node • [Figure: an internal node S with label S.lab, 1-child S.1, and 0-child S.0; labels increase (a ≺ b ≺ c) along 0-edges]
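
The definition on slide 8 can be mirrored by a small data structure. The following is a minimal sketch, not the authors' implementation: the names `Node`, `ZERO`, `ONE`, and `respects_ordering` are hypothetical, terminals are modelled as sentinel strings, and Python's built-in string comparison stands in for the order ≺.

```python
from typing import NamedTuple, Union

# Terminal nodes are modelled as two sentinel strings (an assumption of this sketch).
ZERO = "0-terminal"
ONE = "1-terminal"


class Node(NamedTuple):
    """Internal SDD node; mirrors the triple <S.lab, S.1, S.0>."""
    lab: str                    # S.lab: a letter of the alphabet
    one: Union["Node", str]     # S.1: the 1-child
    zero: Union["Node", str]    # S.0: the 0-child


def is_internal(n) -> bool:
    return isinstance(n, Node)


def respects_ordering(n) -> bool:
    """Check N.lab < (N.0).lab for every internal node reachable from n."""
    if not is_internal(n):
        return True
    if is_internal(n.zero) and not n.lab < n.zero.lab:
        return False
    return respects_ordering(n.one) and respects_ordering(n.zero)


# Nodes of an SDD for the set {aa, ab, bb}.
b = Node("b", ONE, ZERO)     # represents {b}
ab = Node("a", ONE, b)       # represents {a, b}
bb = Node("b", b, ZERO)      # represents {bb}
root = Node("a", ab, bb)     # represents {aa, ab, bb}
```

Because the ordering rule only constrains 0-edges, a node and its 1-child may carry the same label, as `root` and `ab` do here.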

  9. Semantics • L(N): the set of strings that node N represents • L(1-terminal) = {ε} • L(0-terminal) = {} • L(N) = N.lab・L(N.1) ∪ L(N.0) • A path from the root to the 1-terminal node represents a string (the labels of the nodes left via 1-edges, in order). • [Figure: an SDD for {aa, ab, bb}, whose nodes represent {aa, ab, bb}, {a, b}, {bb}, {b}, {ε} (the 1-terminal), and {} (the 0-terminal)]
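
The recursive semantics on slide 9 translates almost line by line into code. This is a minimal sketch under assumed conventions: internal nodes are plain tuples `(lab, one_child, zero_child)`, the terminals are the hypothetical sentinels `ZERO` and `ONE`, and the language is materialised as a Python set, which is only practical for small examples.

```python
ZERO, ONE = "0-terminal", "1-terminal"   # terminal sentinels (assumed encoding)


def language(n):
    """Return L(n) as a set of strings, following the three defining equations."""
    if n == ONE:
        return {""}              # L(1-terminal) = {ε}
    if n == ZERO:
        return set()             # L(0-terminal) = {}
    lab, one, zero = n
    # L(N) = N.lab・L(N.1) ∪ L(N.0)
    return {lab + w for w in language(one)} | language(zero)


# The SDD for {aa, ab, bb} from the slide's figure.
b = ("b", ONE, ZERO)
root = ("a", ("a", ONE, b), ("b", b, ZERO))
```

Calling `language(root)` on this example returns the set containing `aa`, `ab`, and `bb`.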

  10. Comparison to ADFA • Legend: accept state, reject state • [Figure: an SDD and an ADFA for {aa, ab, bb} shown side by side]

  11. Reduction process • Suppression • Every internal node must satisfy N.1 ≠ 0-terminal node; a node whose 1-child is the 0-terminal is replaced by its 0-child, since a・{} ∪ L(N.0) = L(N.0) • In an ADFA, this corresponds to removing edges that point to the dead state • Merging • τ(N) = τ(N’) ⇒ N = N’ • In an ADFA, this corresponds to sharing all equivalent states • Theorem • Under these rules, the SDD is unique and minimal • Likewise, ADFAs have a unique canonical form • [Figure: a node N labeled a with 1-child the 0-terminal is replaced by N.0; two nodes labeled x with the same children N.1 and N.0 are merged into one]

  12. Characteristics • Almost isomorphic to Acyclic Deterministic Finite Automata • BDD/ZDD techniques are applicable • Binary form • Simple recursive algorithms • Easy to implement • Rich collection of operations • Use of hash tables • To share equivalent nodes • To share intermediate computations • [Figure: SDD shown in relation to BDD/ZDD and ADFA]

  13. Relationship to Acyclic Automata

  14. Size • An SDD node corresponds to an ADFA edge • The description size is proportional to • |N|: the number of internal nodes in SDD N • |A|: the number of edges in ADFA A • [Figure: SDD nodes labeled a, b, c next to ADFA edges labeled a, b, c]

  15. Theorem: Size comparison • For an equivalent SDD and ADFA • From an ADFA A to an SDD N: |N| ≤ |A| (an SDD is never larger than an ADFA) • From an SDD N to an ADFA A • An SDD can be |Σ| times smaller than an ADFA
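
The correspondence between SDD nodes and ADFA edges can be made concrete with a translation sketch. The code below is one standard way to read an SDD as an automaton, stated here as an assumption rather than as the authors' construction: every node reached by a 1-edge (plus the root) becomes a state, and walking its chain of 0-children yields one outgoing edge per node on the chain. Nodes are tuples `(lab, one_child, zero_child)`, and `sdd_to_adfa` and `count_nodes` are hypothetical names.

```python
ZERO, ONE = "0-terminal", "1-terminal"   # terminal sentinels (assumed encoding)


def sdd_to_adfa(root):
    """Return (transitions, accepting) of an ADFA equivalent to the SDD."""
    transitions = {}       # state -> {letter: next state}
    accepting = set()
    stack = [root]
    while stack:
        state = stack.pop()
        if state in transitions:
            continue
        out = {}
        n = state
        while n not in (ZERO, ONE):      # walk the chain of 0-children
            lab, one, zero = n
            out[lab] = one               # one ADFA edge per node on the chain
            stack.append(one)
            n = zero
        if n == ONE:                     # chain ends in the 1-terminal: ε is accepted
            accepting.add(state)
        transitions[state] = out
    return transitions, accepting


def count_nodes(root):
    """Number of distinct internal nodes, i.e. |N|."""
    seen, stack = set(), [root]
    while stack:
        n = stack.pop()
        if n in (ZERO, ONE) or n in seen:
            continue
        seen.add(n)
        stack.extend(n[1:])
    return len(seen)


# SDD for {aa, ab, bb}: the node b lies on the 0-chain of ab and is also a state itself.
b = ("b", ONE, ZERO)
ab = ("a", ONE, b)
bb = ("b", b, ZERO)
root = ("a", ab, bb)
transitions, accepting = sdd_to_adfa(root)
num_edges = sum(len(out) for out in transitions.values())
```

In this small example the single SDD node `b` contributes an edge to two different states, which is why the edge count of the automaton exceeds the node count of the SDD.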

  16. 0-child sharing • [Figure: an SDD over the letters a to e in which several nodes share 0-children]

  17. Example • {a^n b^i c^j : n = 0, …, 4; i, j = 0, 1} • [Figure: ADFA A and SDD S for this set] • |S| = 6 • |A| = 14

  18. Experiment • Input: Canterbury corpus • BibleAll: bible.txt, BibleBi: all bigrams from bible.txt, Ecoli: E.coli.txt • Fac means that all factors of the input data are stored

  19. Binary Set Operation Algorithm

  20. Set operation • A binary set operation ♢ ∈ {∪, ∩, \, …} • Input: two SDDs P, Q • Output: SDD R such that L(R) = L(P) ♢ L(Q) • [Figure: the binary set operation takes P and Q and outputs P ♢ Q]

  21. Apply algorithm • Originally for BDDs [Bryant 1986], applied to SDDs • Based on the definition L(N) = N.lab ・ L(N.1) ∪ L(N.0) • During the operation (when P.lab = Q.lab): L(P) ♢ L(Q) = P.lab ・ (L(P.1) ♢ L(Q.1)) ∪ (L(P.0) ♢ L(Q.0)) • [Figure: nodes P = 〈a, P1, P0〉 and Q = 〈a, Q1, Q0〉 are combined into a node labeled a with 1-child P1 ♢ Q1 and 0-child P0 ♢ Q0]
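
The slide states the recursion only for P.lab = Q.lab. The remaining cases follow from the same definition together with the ordering rule; the derivation below is a sketch under the assumption that ♢ is decided string by string and satisfies ∅ ♢ ∅ = ∅, which holds for ∪, ∩, and \. If P.lab ≺ Q.lab, no string of L(Q) starts with P.lab, so Q contributes the empty set on the 1-side and Q itself on the 0-side; the symmetric case is analogous.

```latex
\begin{align*}
\text{if } P.\mathit{lab} \prec Q.\mathit{lab}:\quad
  L(P) \diamond L(Q) &= P.\mathit{lab}\cdot\bigl(L(P.1) \diamond \emptyset\bigr)
                        \;\cup\; \bigl(L(P.0) \diamond L(Q)\bigr),\\
\text{if } Q.\mathit{lab} \prec P.\mathit{lab}:\quad
  L(P) \diamond L(Q) &= Q.\mathit{lab}\cdot\bigl(\emptyset \diamond L(Q.1)\bigr)
                        \;\cup\; \bigl(L(P) \diamond L(Q.0)\bigr).
\end{align*}
```

The same two equations cover the situation where exactly one operand is a terminal node, since a terminal represents either {} or {ε} and therefore contains no string starting with the other operand's label.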

  22. Hash table technique • Key-value hash tables • Uniquetable • Key: 〈letter x, SDD node N1, SDD node N0〉 • Value: SDD node N with τ(N) = 〈x, N1, N0〉 • Opcache • Key: 〈operation id ♢, SDD node P, SDD node Q〉 • Value: SDD node R such that R = P ♢ Q • [Figure: the Uniquetable maps the key triple 〈x, N1, N0〉 to the node N; the Opcache maps the key triple 〈♢, P, Q〉 to the node R]

  23. Node creation process • Any SDD node needed during computation is created via this process • Once an internal node is registered in the Uniquetable, equivalent nodes will not be created anymore. • Procedure: check the Uniquetable for key 〈x, N1, N0〉. If it exists, return the registered node. If it does not exist, create a new node, register it, and return it.
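
The node creation process can be sketched with an ordinary dictionary as the Uniquetable. The representation is an assumption of this sketch: nodes are integer ids, with 0 and 1 reserved for the terminals, and `getnode`, `nodes`, and `uniquetable` are hypothetical names. The suppression check is taken from the reduction rules on slide 11 and is not part of the flowchart on this slide.

```python
# Node ids: 0 and 1 are the terminals; internal nodes get ids 2, 3, ...
ZERO, ONE = 0, 1
nodes = [None, None]     # nodes[i] = (lab, one_id, zero_id) for internal node i
uniquetable = {}         # key (lab, one_id, zero_id) -> node id


def getnode(x, n1, n0):
    """Return the id of the node with triple <x, n1, n0>, creating it if needed."""
    if n1 == ZERO:                   # suppression rule: x・{} ∪ L(n0) = L(n0)
        return n0
    key = (x, n1, n0)
    if key not in uniquetable:       # "Not exist": create and register a new node
        nodes.append(key)
        uniquetable[key] = len(nodes) - 1
    return uniquetable[key]          # "Exist": return the registered node


b = getnode("b", ONE, ZERO)          # creates a node for {b}
same = getnode("b", ONE, ZERO)       # a second request returns the same id
skipped = getnode("a", ZERO, b)      # suppressed: returns the 0-child itself
```

Since children are identified by their ids, each lookup hashes a fixed-size triple, which matches the constant-time access assumed in the complexity analysis.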

  24. Time complexity • When P ♢ Q is executed • Every operation uses the Opcache • At most |P| × |Q| different instances of recursive calls are invoked • (Assume that the access time to hash tables is constant) • Naïve method • Prepare a table of size |P| × |Q| • This method • No useless or redundant nodes • Theorem • Worst case O(|P| |Q|) time • There exist examples that need Ω(|P| |Q|) time • The upper and lower bounds match • Procedure: check the Opcache for key 〈♢, P, Q〉. If it exists, P ♢ Q has already been computed, so return it. If it does not exist, continue the computation on the 0-side and the 1-side.
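
Putting the recursion, the Uniquetable, and the Opcache together gives the following sketch of the apply algorithm. It is an illustration under stated assumptions, not the authors' code: nodes are integer ids (0 and 1 are the terminals), the operation ♢ is passed as a function on two membership booleans with `op(False, False) == False`, that function object doubles as the operation id in the Opcache key, and `from_strings` and `language` are hypothetical helpers added only to exercise the code.

```python
import operator

ZERO, ONE = 0, 1
nodes = [None, None]     # nodes[i] = (lab, one_id, zero_id) for internal node i
uniquetable = {}         # (lab, one_id, zero_id) -> node id
opcache = {}             # (op, p, q) -> id of the node for P ♢ Q


def getnode(x, n1, n0):
    if n1 == ZERO:                               # suppression rule
        return n0
    key = (x, n1, n0)
    if key not in uniquetable:
        nodes.append(key)
        uniquetable[key] = len(nodes) - 1
    return uniquetable[key]


def apply(op, p, q):
    """Return the node R with L(R) = L(P) ♢ L(Q)."""
    if p <= ONE and q <= ONE:                    # both terminals: decide membership of ε
        return ONE if op(p == ONE, q == ONE) else ZERO
    key = (op, p, q)
    if key in opcache:                           # P ♢ Q is already done
        return opcache[key]
    plab = nodes[p][0] if p > ONE else None
    qlab = nodes[q][0] if q > ONE else None
    if qlab is None or (plab is not None and plab < qlab):
        x, p1, p0, q1, q0 = plab, nodes[p][1], nodes[p][2], ZERO, q
    elif plab is None or qlab < plab:
        x, p1, p0, q1, q0 = qlab, ZERO, p, nodes[q][1], nodes[q][2]
    else:                                        # P.lab = Q.lab, the case on slide 21
        x, p1, p0, q1, q0 = plab, nodes[p][1], nodes[p][2], nodes[q][1], nodes[q][2]
    r = getnode(x, apply(op, p1, q1), apply(op, p0, q0))
    opcache[key] = r
    return r


def from_strings(words):
    """Build an SDD by repeated union (simple, not meant to be efficient)."""
    root = ZERO
    for w in words:
        n = ONE
        for ch in reversed(w):
            n = getnode(ch, n, ZERO)
        root = apply(operator.or_, root, n)
    return root


def language(n):
    if n <= ONE:
        return {""} if n == ONE else set()
    lab, one, zero = nodes[n]
    return {lab + w for w in language(one)} | language(zero)


def difference(a, b):
    return a and not b


P = from_strings(["aa", "ab", "bb"])
Q = from_strings(["ab", "bb", "ba"])
union = apply(operator.or_, P, Q)
intersection = apply(operator.and_, P, Q)
minus = apply(difference, P, Q)
```

Each distinct pair of nodes reaches the recursive branch at most once per operation because later requests hit the Opcache, which is the source of the |P| × |Q| bound on the number of recursive calls. Because every result is built through `getnode`, equal sets end up with equal node ids.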

  25. Experiment • Operation time • Prepare two SDDs for all factors of random texts of length n • Measure the time to compute the operation

  26. Conclusion • Relationship to Acyclic Automata • An SDD can be |Σ| times smaller than an ADFA • For real data, SDDs are 10 to 20% more compact than ADFAs • Computational complexity of binary set operations • Worst case time complexity is quadratic • A tight time bound is analyzed • In our experiments, operation time is almost linear • Future work • Efficient implementation of various operations • Propose a substring index on SDDs • Factor SDD construction algorithm

  27. Thank you!
