Succinct Representations of Dynamic Strings

Succinct Representations of Dynamic Strings. Meng He and J. Ian Munro University of Waterloo. Background: Succinct Data Structures. What are succinct data structures ( Jacobson 1989 ) Representing data structures using ideally information-theoretic minimum space


Presentation Transcript


  1. Succinct Representations of Dynamic Strings • Meng He and J. Ian Munro • University of Waterloo

  2. Background: Succinct Data Structures • What are succinct data structures (Jacobson 1989) • Representing data structures using ideally information-theoretic minimum space • Supporting efficient navigational operations • Why succinct data structures • Large data sets in modern applications: textual, genomic, spatial or geometric

  3. Strings: Definitions • Notation • Alphabet: [σ] = {1, 2, …, σ} • String: S[1..n] • Operations: • access(i): S[i] • rank(α, i): number of occurrences of α in S[1..i] • select(α, i): position of the i-th occurrence of α in S

  4. Strings: An Example • S = a a b a c c c d a d d a b b b c • string_access(8) = d • string_rank(a, 8) = 3 • string_select(b, 3) = 14
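
The three queries can be checked against the example with a deliberately naive sketch. It takes linear time per query and uses no succinct machinery; the function names are illustrative and the string is the slide's example written without spaces.

```python
# Naive reference implementation of access, rank and select (1-based positions).
S = "aabacccdaddabbbc"  # the example string from the slide

def access(s, i):
    """Return S[i]."""
    return s[i - 1]

def rank(s, alpha, i):
    """Number of occurrences of alpha in S[1..i]."""
    return s[:i].count(alpha)

def select(s, alpha, i):
    """Position of the i-th occurrence of alpha, or None if there are fewer than i."""
    seen = 0
    for pos, ch in enumerate(s, start=1):
        if ch == alpha:
            seen += 1
            if seen == i:
                return pos
    return None

print(access(S, 8), rank(S, "a", 8), select(S, "b", 3))  # prints: d 3 14
```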

  5. Succinct Representations of Strings • Information-theoretic minimum: $n \lg\sigma$ bits • Succinct representation (Grossi et al. 2003) • Space: $nH_0 + o(n) \cdot \lg\sigma$ bits • Time: $O(\lg\sigma)$ • There are many more results. • The case in which σ = 2 (bit vector) is even more fundamental! • Jacobson 1989
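
To make the two leading space terms concrete, the sketch below computes n lg σ and n H_0 for the example string from the previous slide. The helper name `zero_order_entropy` is an assumption of this sketch, and the o(n)·lg σ overhead of a real structure is ignored.

```python
import math
from collections import Counter

def zero_order_entropy(s):
    """Empirical zero-order entropy H_0 of s, in bits per character."""
    n = len(s)
    return sum((c / n) * math.log2(n / c) for c in Counter(s).values())

S = "aabacccdaddabbbc"                    # example string from the talk
n, sigma = len(S), len(set(S))
plain_bits = n * math.log2(sigma)         # n lg sigma
entropy_bits = n * zero_order_entropy(S)  # n H_0, never more than n lg sigma
print(plain_bits, round(entropy_bits, 2))
```

The gap between the two numbers grows as the character frequencies become more skewed; for a string over a single repeated character, H_0 is 0.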

  6. Applications of Strings and Bit Vectors • Ordinal trees on n nodes • Standard approach: $3n \lg n$ bits • Succinct data structures: $2n + o(n)$ bits (Jacobson 1989, Munro & Raman 1997, Benoit et al. 1999…) • Full-text indexes for a text string from $[\sigma]^n$ • Suffix trees can use as much as $4n \lg n$ to $6n \lg n$ bits! • Succinct data structures: $n \lg\sigma + o(n \lg\sigma)$ bits (Grossi et al. 2003, González and Navarro 2009…) • Labeled trees, planar graphs, binary relations, permutations, functions, …
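
One encoding behind the 2n-bit figure for ordinal trees is balanced parentheses, where each node contributes one "(" and one ")". The sketch below is illustrative only: it represents a tree as nested Python lists (an assumption of this example), and it answers a subtree-size query by a linear scan instead of the o(n)-bit auxiliary structures that give constant time.

```python
def to_parentheses(tree):
    """Encode an ordinal tree (a node is the list of its children) as balanced parentheses."""
    return "(" + "".join(to_parentheses(child) for child in tree) + ")"

def subtree_size(bp, open_pos):
    """Number of nodes in the subtree whose '(' is at 0-based index open_pos."""
    depth = 0
    for k in range(open_pos, len(bp)):
        depth += 1 if bp[k] == "(" else -1
        if depth == 0:
            # The subtree spans bp[open_pos..k]; each node uses two symbols.
            return (k - open_pos + 1) // 2
    raise ValueError("unbalanced sequence")

tree = [[], [[], []], []]  # root with three children; the middle child has two leaves
bp = to_parentheses(tree)
print(bp, len(bp))         # 6 nodes, so 12 symbols (2 bits per node)
```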

  7. Our Problem: Dynamic Strings • Motivation: In many applications, data are also updated frequently • For strings, we also consider the following update operations: • insert(α, i), which inserts character α between S[i-1] and S[i] • delete(i), which deletes S[i] from S
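
The semantics of the two updates can be pinned down with a list-backed sketch. This is a reference for behaviour only, with linear-time operations; the class name `NaiveDynamicString` is hypothetical.

```python
class NaiveDynamicString:
    """List-backed dynamic string with 1-based positions."""

    def __init__(self, s=""):
        self.chars = list(s)

    def insert(self, alpha, i):
        """Insert alpha between S[i-1] and S[i]; alpha becomes the new S[i]."""
        self.chars.insert(i - 1, alpha)

    def delete(self, i):
        """Delete S[i]."""
        del self.chars[i - 1]

    def rank(self, alpha, i):
        """Number of occurrences of alpha in S[1..i]."""
        return self.chars[:i].count(alpha)

    def __str__(self):
        return "".join(self.chars)

s = NaiveDynamicString("abc")
s.insert("x", 2)   # string is now "axbc"
s.delete(1)        # string is now "xbc"
print(s)
```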

  8. Comparisons • [Table comparing time bounds with previous results; the row and column labels were lost in extraction] • Bounds appearing in the table: $O(\lg n (\frac{\lg\sigma}{\lg\lg n} + 1))$ (four entries) and $O(\frac{\lg n}{\lg\lg n} (\frac{\lg\sigma}{\lg\lg n} + 1))$ (two entries); one entry is marked amortized • For the special cases in which σ = polylog(n) or 2 (bit vector!), our results also improve previous results

  9. Searchable Partial Sums • Data • A sequence Q of n nonnegative integers • Operations • sum(i): Q[1] + Q[2] + … + Q[i] • search(x): the smallest i such that sum(i) ≥ x • update(i, δ): Q[i] ← Q[i] + δ • Raman et al. 2001 • Assumptions: $|Q| = O(\lg^{\varepsilon} n)$, $|\delta| \le \lg n$ • Space: $O(\lg^{1+\varepsilon} n)$ bits, with a universal table of size $O(n^{\varepsilon'})$ bits • Operations: O(1) time
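
The interface of a searchable partial sums structure can be illustrated with a Fenwick (binary indexed) tree. This is a swapped-in textbook technique with O(lg n) time per operation, not the constant-time table-lookup structure of Raman et al.; the class name `PartialSums` is hypothetical.

```python
class PartialSums:
    """Fenwick tree over nonnegative integers, 1-based indices."""

    def __init__(self, values):
        self.n = len(values)
        self.tree = [0] * (self.n + 1)
        for i, v in enumerate(values, start=1):
            self.update(i, v)

    def update(self, i, delta):
        """Q[i] <- Q[i] + delta (entries are assumed to stay nonnegative)."""
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def sum(self, i):
        """Q[1] + ... + Q[i]."""
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i
        return total

    def search(self, x):
        """Smallest i with sum(i) >= x, or None if x exceeds the total."""
        if x > self.sum(self.n):
            return None
        pos = 0
        step = 1 << self.n.bit_length()
        while step:
            nxt = pos + step
            # Descend: keep the largest prefix whose sum is still below x.
            if nxt <= self.n and self.tree[nxt] < x:
                pos = nxt
                x -= self.tree[nxt]
            step >>= 1
        return pos + 1

ps = PartialSums([8, 2, 9, 5, 11])
print(ps.sum(3), ps.search(20))  # prints: 19 4
```

The search relies on the entries being nonnegative, which makes the prefix sums nondecreasing.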

  10. Collections of Searchable Partial Sums • Data • d sequences of k-bit nonnegative integers of length n each • Operations • sum, search, update: supported on each sequence • insert, delete: operated simultaneously on the same positions of all the sequences, but only 0's can be inserted or deleted • González and Navarro 2009 (CSPSI) • Example (d = 3; the figure shows a column of 0's at position 6, between the two halves of each sequence) • Sequence 1: 8 2 9 5 11 | 0 | 5 12 0 3 1 • Sequence 2: 9 0 7 3 6 | 0 | 19 0 4 2 8 • Sequence 3: 1 5 3 12 4 | 0 | 3 5 4 1 0 • sum(2, 5) = 25 • insert(6) creates the column of 0's; delete(6) removes it
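
A naive sketch of the CSPSI interface follows. It runs in linear time and makes no attempt at the stated space bounds; the class name `NaiveCSPSI` is hypothetical. The three 5-element sequences are one reading of the first fifteen numbers in the slide's flattened figure, chosen because that reading reproduces sum(2, 5) = 25.

```python
from bisect import bisect_left
from itertools import accumulate

class NaiveCSPSI:
    """d integer sequences of equal length; sequences and positions are 1-based."""

    def __init__(self, sequences):
        self.seqs = [list(q) for q in sequences]

    def sum(self, j, i):
        """Sum of the first i entries of sequence j."""
        return sum(self.seqs[j - 1][:i])

    def search(self, j, x):
        """Smallest i with sum(j, i) >= x, or None if x exceeds the total."""
        prefix = list(accumulate(self.seqs[j - 1]))
        pos = bisect_left(prefix, x)
        return pos + 1 if pos < len(prefix) else None

    def update(self, j, i, delta):
        self.seqs[j - 1][i - 1] += delta

    def insert(self, i):
        """Insert a 0 at position i of every sequence."""
        for q in self.seqs:
            q.insert(i - 1, 0)

    def delete(self, i):
        """Delete position i of every sequence; all entries there must be 0."""
        if any(q[i - 1] != 0 for q in self.seqs):
            raise ValueError("only 0's can be deleted")
        for q in self.seqs:
            del q[i - 1]

c = NaiveCSPSI([[8, 2, 9, 5, 11], [9, 0, 7, 3, 6], [1, 5, 3, 12, 4]])
print(c.sum(2, 5))  # prints: 25
```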

  11. Our results on CSPSI • Assumptions • $d = O(\lg^{\eta} n)$ • $|\delta| \le \lg n$ • Space • $O(kdn + w)$ bits, where w is the word size • Buffer: $O(n \lg n)$ bits • Time • All operations: $O(\frac{\lg n}{\lg\lg n})$

  12. Data Structures for Dynamic Strings Over a Small Alphabet of Size $O(\lg^{1/2} n)$ • Main data structure: a B-tree constructed over S • Leaf • Each leaf stores a superblock of at most 2L bits, which encodes a substring of S ($L = \frac{\lg^2 n}{\lg\lg n}$) • The numbers of occurrences of each character in all the superblocks form an integer sequence • Maintain the above sequences for all the characters in the alphabet in a CSPSI structure E • Internal node v ($\lg^{1/2} n \le \mathrm{degree}(v) \le 2\lg^{1/2} n$) • U(v): U(v)[i] = number of leaves of the subtree rooted at the i-th child of v • I(v): I(v)[i] = number of characters stored in the subtree rooted at the i-th child of v

  13. Supporting Queries • rank(α, i) • Perform a top-down traversal with the help of I(v)'s • Locate the superblock, j, containing S[i] with the help of U(v)'s • Perform sum(α, j) operation on E to count the number of occurrences of α in superblocks 1, 2, …, j-1 • Read superblock j in blocks of size (lg n) / 2 bits • The support for access and select is similar
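
The shape of the rank query can be sketched with a flattened stand-in for the structure: a plain list of blocks replaces the B-tree, and a per-block counter replaces the CSPSI structure E. The class name `BlockedString`, the block length, and the linear walk over blocks are all simplifications assumed for this example.

```python
from collections import Counter

class BlockedString:
    """String cut into fixed-size blocks, with per-block character counts."""

    def __init__(self, s, block_len=4):
        self.blocks = [s[k:k + block_len] for k in range(0, len(s), block_len)]
        # counts[j][c] = occurrences of c in block j (the role played by E)
        self.counts = [Counter(b) for b in self.blocks]

    def rank(self, alpha, i):
        """Number of occurrences of alpha in S[1..i] (1-based i)."""
        remaining = i   # characters still to account for
        total = 0       # occurrences of alpha in the blocks already skipped
        for block, cnt in zip(self.blocks, self.counts):
            if remaining <= len(block):
                # S[i] lies in this block: scan only its prefix.
                return total + block[:remaining].count(alpha)
            total += cnt[alpha]
            remaining -= len(block)
        return total    # i is past the end: count over the whole string

bs = BlockedString("aabacccdaddabbbc")
print(bs.rank("a", 8))  # prints: 3
```

In the structure described on the slides, the walk over blocks is replaced by a root-to-leaf traversal guided by I(v) and U(v), and the running total by a single sum query on E.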

  14. Insert, delete and deamortization • Supporting insert and delete requires traversing and updating the B-tree and updating E • It is however much more complicated • Merging and splitting B-tree nodes • Deamortization

  15. Succinct Global Rebuilding • A key technique for deamortizing operations on B-trees is global rebuilding (Overmars and van Leeuwen 1981) • Global rebuilding • Rebuild the B-tree after the number of update operations performed exceeds half the initial length of the string • A new copy and an old copy of the B-tree: more space • A buffer of $O(n \lg n)$ bits is required • Succinct global rebuilding • Only one copy of the data: no duplication • During rebuilding, queries and updates are performed on either the new part or the old part • No buffer required

  16. Putting Everything Together • Dynamic strings over an alphabet of size $O(\lg^{1/2} n)$ • Space: $nH_0 + o(n) \cdot \lg\sigma$ bits • Time: $O(\frac{\lg n}{\lg\lg n})$ • This can be extended to general alphabets using wavelet trees • Space: $nH_0 + o(n) \cdot \lg\sigma$ bits • Time: $O(\frac{\lg n}{\lg\lg n} (\frac{\lg\sigma}{\lg\lg n} + 1))$ • When σ = polylog(n) or 2 (bit vectors) • Space: $nH_0 + o(n) \cdot \lg\sigma$ bits • Time: $O(\frac{\lg n}{\lg\lg n})$
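
The reduction from a general alphabet to small-alphabet sequences can be illustrated with a wavelet tree. The sketch below is a static, binary wavelet tree that supports rank only, with plain prefix-count arrays standing in for the bit vectors. It is a simplification assumed for illustration and not the dynamic structure from the talk; the class name `WaveletTree` is hypothetical.

```python
class WaveletTree:
    """Static binary wavelet tree over the integer alphabet [lo, hi]; rank only."""

    def __init__(self, seq, lo=None, hi=None):
        if lo is None:
            lo, hi = min(seq), max(seq)  # the top-level sequence must be nonempty
        self.lo, self.hi = lo, hi
        self.left = self.right = None
        if lo == hi or not seq:
            return  # leaf, or empty subtree
        mid = (lo + hi) // 2
        # zeros[k] = number of symbols <= mid among the first k symbols here
        self.zeros = [0]
        for c in seq:
            self.zeros.append(self.zeros[-1] + (c <= mid))
        self.left = WaveletTree([c for c in seq if c <= mid], lo, mid)
        self.right = WaveletTree([c for c in seq if c > mid], mid + 1, hi)

    def rank(self, alpha, i):
        """Number of occurrences of alpha among the first i symbols."""
        if not (self.lo <= alpha <= self.hi):
            return 0
        node = self
        while node.lo != node.hi:
            if node.left is None:  # empty subtree
                return 0
            mid = (node.lo + node.hi) // 2
            z = node.zeros[i]
            # Map position i into the child that holds alpha.
            if alpha <= mid:
                node, i = node.left, z
            else:
                node, i = node.right, i - z
        return i

# Map the example string onto the alphabet [sigma] = {1, ..., 4}.
seq = [ord(ch) - ord("a") + 1 for ch in "aabacccdaddabbbc"]
wt = WaveletTree(seq)
print(wt.rank(1, 8))  # prints: 3
```

Each level of the tree answers one rank query on a two-symbol sequence, so the cost of a query is the tree height times the cost of the per-level structure.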

  17. Applications • Dynamic text collections • Data: a collection of text strings • Operations • Pattern search • Display a substring • Insert/delete a text string • Compressed construction of full-text indexes • Working space: $nH_k + o(n) \cdot \lg\sigma$ bits • Time: $O(\frac{n \lg n}{\lg\lg n} (\frac{\lg\sigma}{\lg\lg n} + 1))$

  18. Conclusions • We designed a succinct representation of dynamic strings that provides more efficient operations than previous results • This structure can be directly applied to improve previous results on text indexing • We expect our results to play an important role in the design of dynamic succinct data structures • We expect succinct global rebuilding to be useful for the deamortization of algorithms on dynamic succinct data structures

  19. Thank you!
