An Improved Succinct Dynamic k-Ary Tree Representation (work in progress) • Diego Arroyuelo • Department of Computer Science, Universidad de Chile
Roadmap • Succinct data structures • Static tree representations • Dynamic tree representations • Our basic dynamic tree representation • Representing blocks • Representing the frontier of blocks • Representing inter-block pointers • Solving operations • Basic operations • Specialized operations • Discussion
Succinct data structures • In a k-ary tree each node has at most k children, each child labeled with a symbol in the set {1,…, k} (as in tries) • A succinct data structure requires space close to the information-theoretic lower bound • There are C(n, k) = binom(kn+1, n) / (kn+1) different k-ary trees with n nodes • Therefore, the information-theoretic lower bound is log C(n, k), which is about n(log k + log e) bits if k is not a constant with respect to n
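The counting argument on this slide can be checked numerically. The sketch below assumes the standard count of k-ary (cardinal) trees with n nodes, binom(kn+1, n) / (kn+1), and compares the exact lower bound with the approximation n(log k + log e). The function names are illustrative, not taken from the talk.

```python
from math import comb, e, log2

def count_kary_trees(n, k):
    """Number of k-ary (cardinal) trees with n nodes: binom(kn+1, n) / (kn+1)."""
    return comb(k * n + 1, n) // (k * n + 1)  # the division is always exact

def lower_bound_bits(n, k):
    """Information-theoretic lower bound: log2 of the number of distinct trees."""
    return log2(count_kary_trees(n, k))

def approx_bits(n, k):
    """Approximation n (log k + log e), meaningful when k grows with n."""
    return n * (log2(k) + log2(e))

if __name__ == "__main__":
    for k in (2, 16, 64):
        print(k, round(lower_bound_bits(1000, k)), round(approx_bits(1000, k)))
```

For k = 2 the count reduces to the Catalan numbers, which is a quick sanity check on the formula.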
Succinct data structures • We are interested in succinct representations that can be navigated • We are interested in the operations • parent(x): parent of node x • child(x, i): i-th child of node x • child(x, a): child of node x by label a • depth(x) • degree(x) • subtree-size(x) • preorder(x) • is-ancestor(x, y): is node x an ancestor of node y? • insertions (assumed to occur at the leaves) • deletions (only of unary nodes and leaves) • The traditional pointer-based representation of trees requires n log n bits for (almost) each supported operation
Succinct tree representations • Succinct representations for static trees: • LOUDS [Jacobson, FOCS’89] • Balanced Parentheses [MR, STOC’97] • DFUDS [Benoit et al., Algorithmica 2005] • xbw [Ferragina et al., FOCS’05] • Ultra succinct trees [Jansson et al., SODA’07] • These must be rebuilt from scratch upon insertion or deletion of nodes
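To make the balanced-parentheses idea concrete, here is a minimal sketch that encodes a small tree with one "(" and one ")" per node (2n symbols) and recovers a subtree size from the encoding alone. The tree, the function names, and the linear scan are illustrative assumptions; real succinct structures answer the matching-parenthesis query in constant time using o(n) extra bits.

```python
def bp_encode(children, v=0):
    """Balanced-parentheses encoding: '(' on entering a node, ')' on leaving it."""
    return "(" + "".join(bp_encode(children, c) for c in children.get(v, [])) + ")"

def find_close(bp, i):
    """Position of the ')' matching the '(' at position i (naive linear scan)."""
    depth = 0
    for j in range(i, len(bp)):
        depth += 1 if bp[j] == "(" else -1
        if depth == 0:
            return j
    raise ValueError("unbalanced sequence")

def subtree_size(bp, i):
    """Nodes in the subtree whose '(' is at position i: half the enclosed span."""
    return (find_close(bp, i) - i + 1) // 2

if __name__ == "__main__":
    tree = {0: [1, 4], 1: [2, 3]}   # hypothetical 5-node tree, root = 0
    bp = bp_encode(tree)
    print(bp, subtree_size(bp, 1))
```

Inserting a node shifts every later position in such a sequence, which is why a static encoding has to be rebuilt after an update.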
Succinct tree representations • The case of succinct dynamic trees has been studied only for binary trees • Munro, Raman, and Storm [SODA’01] • 2n + o(n) bits • parent, child in constant time • Updates and subtree-size in O(polylog(n)) time • Raman and Rao [ICALP’03] • 2n + o(n) bits • parent, child, preorder, and subtree-size in O(1) time • Updates in O((log log n)^(1+ε)) amortized time (O(log n log log n) worst case) • k-ary trees: basic navigation in O(k) time (assume k is not a constant)
Dynamic balanced parentheses • Chan et al. [TALG 2007] define a dynamic representation for balanced parentheses • This can be used to represent a dynamic k-ary tree using O(n) bits of space • The time for all operations depends on the number of nodes in the tree rather than on k (O(log n) time) • This data structure cannot take advantage of the case where k is asymptotically smaller than n (e.g., k = O(polylog(n))) • We aim to achieve o(log n) time whenever log k = o(log n)
Motivations • This work is motivated by previous work on LZ-indices • Space-efficient construction of LZ-index [AN, ISAAC’05] • Very preliminary representation: εn log n bits for pointers; child operation and insertions in O(k) worst-case time • LZ-index on disk [AN, CPM’07] • Basic operations in O(1) CPU time, yet εn log n bits are needed for pointers, and neither insertions nor deletions are supported
Our basic tree representation • We incrementally divide the tree into disjoint blocks [MRS, RR, AN] • Every block represents a subtree of N nodes such that Nmin ≤ N ≤ Nmax • We arrange these blocks in a tree by adding inter-block pointers (the entire tree is a tree of subtrees)
Our basic tree representation • [Figure: the tree divided into blocks; labels mark the frontier of the block and the duplicated nodes]
Our basic tree representation • We define Nmin (minimum block size) as follows • Inter-block pointers should require o(n) bits • Therefore we define Nmin = Θ(log² n) (in general, Nmin = Θ(log n · f(n)), for f(n) = ω(1)) • In this way we have (in the worst case) one pointer per Θ(log² n) nodes, each pointer taking O(log n) bits • And hence O(n / log n) = o(n) bits for pointers
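The space argument above is simple arithmetic, sketched here under loud assumptions: every hidden constant is taken as 1, each pointer costs ceil(log2 n) bits, and every block has exactly log² n nodes (the worst case for the number of pointers). The function name is hypothetical.

```python
from math import ceil, log2

def pointer_bits(n):
    """Worst-case bits spent on inter-block pointers, with all constants set to 1."""
    log_n = ceil(log2(n))           # assumed pointer width in bits
    blocks = ceil(n / log_n ** 2)   # at most one block per log^2 n nodes
    return blocks * log_n

if __name__ == "__main__":
    for exp in (10, 20, 30):
        n = 2 ** exp
        # The bits-per-node ratio shrinks as n grows, i.e. the total is o(n).
        print(n, pointer_bits(n), pointer_bits(n) / n)
```

Replacing log² n by log n · f(n) for any f(n) = ω(1) gives a ratio of about 1 / f(n), which still tends to zero.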
Our basic tree representation • We define Nmax (maximum block size) as follows • In case of block overflow we should be able to create a new block of size at least Nmin from the full block • In the worst case, the root of the block has its k children, all of them having a subtree of the same size • By choosing Nmax = Θ(k log² n) we solve this problem
Our basic tree representation • The blocks cannot be as small as we would like • We support dynamic operations on the tree by: • Dividing the tree into blocks (we only need to rebuild a block upon updates) • Making these smaller trees dynamic (unlike other approaches) • We represent the blocks using a dynamic DFUDS representation on top of Chan et al.’s structure [TALG 2007] • We solve the basic navigation inside blocks in O(log N) = O(log k + log log n) time • Insertions can also be handled in the same time • Overall we require 2n + o(n) bits
Representing the blocks • We represent the symbols Sp labeling the arcs of the trie with a data structure for rank and select [GN, submitted] • We compute childp(x, a) by • rank and select on Sp • childp(x, i) on p • childp(x, a) can be computed in O(log N log k / log log N) = O((log k + log log n) log k / log(log k + log log n)) time • The space requirement is n log k + o(n log k) bits
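The reduction from child-by-label to child-by-position can be sketched as follows. The layout is an assumption made for illustration: the labels of each node's children are stored contiguously in a sequence S, and a node is identified by the half-open range [start, end) of its labels. The rank and select routines here are naive linear scans standing in for the dynamic rank/select structure the slide cites.

```python
def rank(S, a, i):
    """Number of occurrences of symbol a among the first i symbols of S."""
    return sum(1 for c in S[:i] if c == a)

def select(S, a, j):
    """0-based position of the j-th occurrence (j >= 1) of a in S, or None."""
    seen = 0
    for pos, c in enumerate(S):
        if c == a:
            seen += 1
            if seen == j:
                return pos
    return None

def child_index_by_label(S, start, end, a):
    """Return i such that the child labeled a is the i-th child, or None."""
    pos = select(S, a, rank(S, a, start) + 1)  # first a at or after start
    if pos is None or pos >= end:
        return None                            # no child with that label
    return pos - start + 1                     # then call child(x, i)

if __name__ == "__main__":
    S = list("abcabd")                 # node u owns S[0:3], node v owns S[3:6]
    print(child_index_by_label(S, 3, 6, "b"))
```

One rank, one select, and one positional child query suffice, so the cost of child(x, a) is dominated by the rank/select structure.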
Representing the frontier of a block • We need to indicate which nodes in a block have a pointer to a child block • This can be done by using a bit vector • However, this would require 3n + o(n) bits overall for the tree structure • We define array Fp storing the preorders of the nodes having a child pointer • Since there are O(n / log² n) pointers, this requires o(n) bits
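Representing the frontier of a block • Array Fp is represented in differential form with a data structure for Searchable Partial Sums (O(log N) time) • Example: Tp: (((())(()))((()))) • Fp before the update, in differential form: 3 5 8 4, i.e., preorders (3) (8) (16) (20) • After an insertion we must change all the preorders in Fp from the affected position onward; in differential form a single value changes: 3 6 8 4, i.e., preorders (3) (9) (17) (21)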
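The differential trick can be tried with a Fenwick (binary indexed) tree, used here as a stand-in for the Searchable Partial Sums structure named on the slide. This is an assumption for illustration: a Fenwick tree supports prefix sums, searches, and value updates in logarithmic time, but not the insertion or deletion of array entries that a full dynamic structure would also need. The class and method names are hypothetical.

```python
class PartialSums:
    """Fenwick tree over positive differences; prefix(i) is the i-th preorder."""

    def __init__(self, values):
        self.n = len(values)
        self.tree = [0] * (self.n + 1)
        for i, v in enumerate(values):
            self.add(i, v)

    def add(self, i, delta):
        """Add delta to the i-th difference (0-based)."""
        i += 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def prefix(self, i):
        """Sum of differences 0..i, i.e. the preorder stored at position i."""
        i += 1
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i
        return total

    def search(self, target):
        """Smallest position whose preorder is >= target, or None."""
        pos, step = 0, 1 << self.n.bit_length()
        while step:
            nxt = pos + step
            if nxt <= self.n and self.tree[nxt] < target:
                pos, target = nxt, target - self.tree[nxt]
            step >>= 1
        return pos if pos < self.n else None

if __name__ == "__main__":
    fp = PartialSums([3, 5, 8, 4])
    fp.add(1, 1)    # one change shifts every preorder from position 1 onward
    print([fp.prefix(i) for i in range(4)])
```

A node with preorder x is on the frontier exactly when search(x) returns a position i with prefix(i) == x.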
Representing inter-block pointers • Pointers to child blocks • We store the pointers to child blocks in array PTRp • Sorted in increasing order of the preorders of the nodes in the frontier • Pointers to the parent block • In each block p we need a pointer to the representation of the root of p in the parent block • However, the position of a node changes upon updates • A parent pointer is therefore composed of • A pointer to the parent block q • If p is the j-th child of q, then we store value j in p
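Representing inter-block pointers • [Figure: block p with Tp: (((())(()))((()))) and its arrays Fp and PTRp; the child blocks 1, 2, 3, 4 store the parent pointers (p,1), (p,2), (p,4), (p,3), listed in the order they appear in the figure]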
Solving the basic operations • child(x, i): • Look for the preorder of x in Fp • If we find it, follow the child pointer to block q and apply childq on the root of q • Otherwise, use the childp operation • This takes O(log N) = O(log k + log log n) time • child(x, a) is solved in the same way, but using childp(x, a) instead • parent(x): if x is the root of its block, follow the parent pointer to the parent block p and move to the representation of x there; then apply parentp(x)
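The navigation rules above can be sketched with a toy block that stores its subtree explicitly. Everything here is an assumption for illustration: nodes are local preorder numbers with the block root at 0, `frontier` and `ptr` play the roles of Fp and PTRp, and `up` holds the (parent block, j) pair. A real block would use the succinct DFUDS and partial-sums structures instead of dictionaries and lists.

```python
from bisect import bisect_left

class Block:
    """Toy block: explicit tree over local preorder numbers (root = 0)."""
    def __init__(self, children):
        self.children = children   # local node -> list of local children
        self.parent_of = {c: v for v, cs in children.items() for c in cs}
        self.frontier = []         # F_p: sorted local nodes owning a child block
        self.ptr = []              # PTR_p: child blocks, same order as frontier
        self.up = None             # (parent block, rank j in its frontier)

def attach(p, x, q):
    """Hang block q from frontier node x of p; x is duplicated as the root of q."""
    j = bisect_left(p.frontier, x)
    p.frontier.insert(j, x)
    p.ptr.insert(j, q)
    for r in range(j, len(p.ptr)):   # ranks of the later child blocks shift
        p.ptr[r].up = (p, r)

def child(block, x, i):
    """i-th child (1-based) of node x, as a (block, local node) pair, or None."""
    j = bisect_left(block.frontier, x)
    if j < len(block.frontier) and block.frontier[j] == x:
        block, x = block.ptr[j], 0   # follow the child pointer to the root of q
    kids = block.children.get(x, [])
    return (block, kids[i - 1]) if 1 <= i <= len(kids) else None

def parent(block, x):
    """Parent of node x, or None at the root of the whole tree."""
    if x == 0:
        if block.up is None:
            return None
        block, j = block.up
        x = block.frontier[j]        # representation of the root in the parent block
    return (block, block.parent_of[x])
```

Storing the rank j rather than a position is what keeps parent pointers valid when updates shift the preorders inside the parent block.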
Solving the basic operations • Insert: • We use the corresponding insertion operation on the block • When a block p becomes full • Choose a node z in block p • Reinsert the nodes in the subtree of z into a new block q (along with the corresponding part of the frontier of p) • Delete the subtree of z from p • The total cost is O(log k + log log n) amortized (provided we can afford time proportional to the size of the subtree of z) • We keep a list of candidate subtrees in each block (o(n) bits overall)
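The slides do not say how node z is chosen, only that each block keeps a list of candidate subtrees. As one hypothetical heuristic, the sketch below follows the heaviest child from the block root for as long as that child's subtree still has at least Nmin nodes, and returns the deepest such node. All names are illustrative, and the explicit dictionary stands in for the succinct block.

```python
def subtree_sizes(children, root=0):
    """Map each node of the block to the size of its subtree."""
    sizes = {}
    def visit(v):
        sizes[v] = 1 + sum(visit(c) for c in children.get(v, []))
        return sizes[v]
    visit(root)
    return sizes

def choose_split_node(children, n_min, root=0):
    """Deepest node on the heavy path whose subtree has >= n_min nodes.

    Never returns the root; returns None if no proper subtree is large enough.
    """
    sizes = subtree_sizes(children, root)
    z, v = None, root
    while children.get(v):
        heavy = max(children[v], key=sizes.get)   # child with the largest subtree
        if sizes[heavy] < n_min:
            break
        z = v = heavy
    return z

if __name__ == "__main__":
    block = {0: [1, 2], 1: [3, 4], 3: [5]}   # hypothetical full block, root = 0
    print(choose_split_node(block, n_min=2))
```

Every child of the returned node has fewer than Nmin nodes below it, so the new block has at most k(Nmin - 1) + 1 nodes, which is consistent with a maximum block size proportional to k times Nmin.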
Solving specialized operations • We can solve other operations by using this representation • degree(x) • depth(x) • subtree-size(x) • [Figure labels: x, Sizep]
Solving specialized operations • We can solve other operations by using this representation • preorder(x) • is-ancestor(x, y) • lca(x, y)
Conclusions • We have defined a representation for dynamic k-ary trees requiring space close to the information-theoretic lower bound • We can profit from smaller alphabets • o(log n) time for operations whenever log k = o(log n) • In particular, O(log log n) time for k = O(polylog(n)) • Versus the O(log n) time of Chan et al. for any alphabet size • We need o(n log k) extra bits of space
Discussion • What happens if we have external pointers to the tree nodes? • Can we compress the dynamic DFUDS representation of blocks? (just as in [JSS, SODA’07]) • Suffix links in little space? (assuming a suffix-closed trie)