An Improved Succinct Dynamic k-Ary Tree Representation (work in progress) Diego Arroyuelo Department of Computer Science, Universidad de Chile
Roadmap • Succinct data structures • Static tree representations • Dynamic tree representations • Our basic dynamic tree representation • Representing blocks • Representing the frontier of blocks • Representing inter-block pointers • Solving operations • Basic operations • Specialized operations • Discussion
Succinct data structures • In a k-ary tree each node has at most k children, each child labeled with a distinct symbol in the set {1,…, k} (tries) • A succinct data structure requires space close to the information-theoretic lower bound • There are $\frac{1}{kn+1}\binom{kn+1}{n}$ different k-ary trees with n nodes • Therefore, the information-theoretic lower bound is about $(\log k + \log e)\,n$ bits if k is not a constant with respect to n
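As a sanity check of the counting argument, the following sketch evaluates the formula for the number of k-ary trees, cross-checks it against the recurrence $T(x) = 1 + x\,T(x)^k$ (a tree is either empty or a root with k possibly empty subtrees), and computes $\log_2$ of the count. The function names are illustrative only and are not part of the proposal.

```python
from math import comb, e, log2


def count_kary_trees(n: int, k: int) -> int:
    """Number of k-ary (cardinal) trees with n nodes: C(kn+1, n) / (kn+1)."""
    return comb(k * n + 1, n) // (k * n + 1)


def count_by_recurrence(n: int, k: int) -> int:
    """Same count via the recurrence T(x) = 1 + x * T(x)**k (cross-check only)."""
    t = [1]  # t[m] = number of trees with m nodes; t[0] is the empty tree
    for m in range(1, n + 1):
        # t[m] is the coefficient of x^(m-1) in T(x)^k, which needs only t[0..m-1]
        power = [1] + [0] * (m - 1)  # T(x)^0, truncated to degree m-1
        for _ in range(k):
            power = [sum(power[a] * t[d - a] for a in range(d + 1)) for d in range(m)]
        t.append(power[m - 1])
    return t[n]


def lower_bound_bits(n: int, k: int) -> float:
    """Information-theoretic lower bound: log2 of the number of trees."""
    return log2(count_kary_trees(n, k))


if __name__ == "__main__":
    n, k = 1000, 256
    per_node = lower_bound_bits(n, k) / n
    print(f"{per_node:.3f} bits/node vs log2(k) + log2(e) = {log2(k) + log2(e):.3f}")
```

For large k the per-node value approaches, but stays below, $\log_2 k + \log_2 e$, which is where the "about" in the bound comes from.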
Succinct data structures • We are interested in succinct representations that support navigation • We are interested in the following operations • parent(x): parent of node x • child(x, i): ith child of node x • child(x, a): child of node x by label a • depth(x) • degree(x) • subtree-size(x) • preorder(x) • is-ancestor(x, y): is node x an ancestor of node y? • insertions (we assume new nodes are inserted as leaves) • deletions (only of leaves and unary nodes) The traditional pointer-based representation of trees requires $n\log n$ bits to support (almost) each of these operations
Succinct tree representations • Succinct representations for static trees: • LOUDS [Jacobson, FOCS’89] • Balanced Parentheses [MR, STOC’97] • DFUDS [Benoit et al., Algorithmica 2005] • xbw [Ferragina et al., FOCS’05] • Ultra succinct trees [Jansson et al., SODA’07] • These must be rebuilt from scratch upon insertion or deletion of nodes
Succinct tree representations • The case of succinct dynamic trees has been studied only for binary trees • Munro, Raman, and Storm [SODA’01] • $2n + o(n)$ bits • parent, child in constant time • Updates and subtree-size in $O(\mathrm{polylog}(n))$ time • Raman and Rao [ICALP’03] • $2n + o(n)$ bits • parent, child, preorder, and subtree-size in $O(1)$ time • Updates in $O((\log\log n)^{1+\epsilon})$ amortized time ($O(\log n \log\log n)$ worst case) • For k-ary trees, basic navigation takes $O(k)$ time (assuming k is not a constant)
Dynamic balanced parentheses • Chan et al. [TALG 2007] define a dynamic representation for balanced parentheses • This can be used to represent a dynamic k-ary tree using $O(n)$ bits of space • The time for all operations depends on the number of nodes in the tree rather than on k: $O(\log n)$ time • This data structure cannot take advantage of the case where k is asymptotically smaller than n (e.g., $k = O(\mathrm{polylog}(n))$). We aim to achieve $o(\log n)$ time whenever $\log k = o(\log n)$
Motivations • This work is motivated by previous work on LZ-indices • Space-efficient construction of LZ-index [AN, ISAAC’05] • Very preliminary representation: $\epsilon n\log n$ bits for pointers, child operation and insertions in $O(k)$ worst-case time • LZ-index on disk [AN, CPM’07] • Basic operations in $O(1)$ CPU time, yet $\epsilon n\log n$ bits are needed for pointers, and it supports neither insertions nor deletions
Our basic tree representation • We incrementally divide the tree into disjoint blocks [MRS, RR, AN] • Every block represents a subtree of N nodes such that $N_{\min} \le N \le N_{\max}$ • We arrange these blocks in a tree by adding inter-block pointers (the entire tree becomes a tree of subtrees)
Our basic tree representation [Figure: the tree divided into blocks; labels: "frontier of the block", "duplicated nodes"]
Our basic tree representation • We define $N_{\min}$ (minimum block size) as follows • Inter-block pointers should require o(n) bits • Therefore we define $N_{\min} = \Theta(\log^2 n)$ (in general, $N_{\min} = \Theta(\log n \cdot f(n))$ for any $f(n) = \omega(1)$) • In this way we have (in the worst case) one pointer per $\Theta(\log^2 n)$ nodes • Hence pointers take o(n) bits
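A short derivation of the pointer space, assuming every block holds at least $N_{\min}$ nodes (so there are at most $n/N_{\min}$ blocks), the number of inter-block pointers is at most twice the number of blocks (one parent pointer and one child-pointer entry per non-root block), and each pointer takes $O(\log n)$ bits:

```latex
\[
\underbrace{O\!\left(\frac{n}{N_{\min}}\right)}_{\text{pointers}} \cdot \underbrace{O(\log n)}_{\text{bits each}}
= O\!\left(\frac{n \log n}{\log n \cdot f(n)}\right)
= O\!\left(\frac{n}{f(n)}\right) = o(n),
\qquad \text{e.g. } f(n) = \log n \;\Rightarrow\; O\!\left(\frac{n}{\log n}\right).
\]
```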
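Our basic tree representation • We define $N_{\max}$ (maximum block size) as follows • In case of block overflow, we should be able to create a new block of size at least $N_{\min}$ from the full block • In the worst case, the root of the block has all k children, each having a subtree of the same size, i.e., about $N/k$ nodes • By choosing $N_{\max} = \Theta(k\log^2 n)$ with a suitable constant, so that $N_{\max}/k \ge N_{\min}$, even in this case some child subtree has at least $N_{\min}$ nodes, which solves the problem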
Our basic tree representation • The blocks cannot be as small as we would like • We support dynamic operations on the tree by: • Dividing the tree into blocks (we only need to rebuild a block upon updates) • Making these smaller trees dynamic (unlike other approaches) • We represent the blocks using a dynamic DFUDS representation built on top of Chan et al.’s [TALG, 2007] • Since $N \le N_{\max} = \Theta(k\log^2 n)$, we solve the basic navigation inside blocks in $O(\log N) = O(\log k + \log\log n)$ time • Insertions can also be handled in the same time • Overall, we require $2n + o(n)$ bits
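Since blocks are stored in DFUDS, a small static sketch of the encoding may help. It follows the DFUDS definition of Benoit et al. (nodes are visited in preorder; each node writes one '(' per child followed by one ')', and an extra leading '(' balances the sequence), and a node is identified by the position where its description starts. It is a toy under stated assumptions: the dynamic rank/select/findclose structures of Chan et al. are replaced by naive linear scans, and the helper names are hypothetical.

```python
def dfuds(children, root=0):
    """Encode a tree given as {node: [child, ...]} in DFUDS.

    Returns the parenthesis string and a map node -> starting position.
    """
    seq, start = ["("], {}  # extra leading '(' makes the sequence balanced
    stack = [root]
    while stack:
        v = stack.pop()
        start[v] = len(seq)
        kids = children.get(v, [])
        seq.extend("(" * len(kids))  # one '(' per child
        seq.append(")")  # ... then one ')'
        stack.extend(reversed(kids))  # leftmost child is visited next (preorder)
    return "".join(seq), start


def find_close(s, i):
    """Position of the ')' matching the '(' at position i (naive scan)."""
    depth = 0
    for j in range(i, len(s)):
        depth += 1 if s[j] == "(" else -1
        if depth == 0:
            return j
    raise ValueError("unbalanced sequence")


def next_close(s, v):
    """First ')' at or after position v: the end of v's description."""
    return s.index(")", v)


def degree(s, v):
    return next_close(s, v) - v


def child(s, v, i):
    """i-th child (1-based) of the node starting at position v."""
    if not 1 <= i <= degree(s, v):
        raise IndexError("node has no such child")
    # The i-th '(' counted backwards from v's ')' matches the ')' just before child i.
    return find_close(s, next_close(s, v) - i) + 1
```

For example, the tree `{0: [1, 2, 3], 1: [4, 5], 3: [6]}` encodes as `(((()(())))())`, with the root starting at position 1 and its second child (node 2) starting at position 10.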
Representing the blocks • We represent the sequence Sp of symbols labeling the arcs of the trie with a data structure for rank and select [GN, submitted] • We compute childp(x, a) by • rank and select on Sp • childp(x, i) on the tree of p • childp(x, a) can be computed in $O\!\left(\frac{\log N \log k}{\log\log N}\right) = O\!\left(\frac{\log^2 k + \log k \log\log n}{\log(\log k + \log\log n)}\right)$ time • The space requirement is $n\log k + o(n\log k)$ bits
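The rank/select step for childp(x, a) can be illustrated on a toy DFUDS block. Assumptions, all hypothetical: labels are stored in a string aligned position by position with the parentheses, with '$' wherever no arc label applies (a real Sp would store exactly one symbol per arc), and rank/select are naive scans rather than the [GN] structure. The trie is a root with children labeled a, b, c (starting at positions 5, 10, 11); the a-child has children labeled a and c (positions 8, 9), and the c-child has one child labeled b (position 13).

```python
# Toy DFUDS sequence and aligned labels (hypothetical example).
# '$' marks positions that carry no arc label (the leading '(' and every ')').
S_PARENS = "(((()(())))())"
S_LABELS = "$cba$ca$$$$b$$"


def find_close(s, i):
    """Position of the ')' matching the '(' at position i (naive scan)."""
    depth = 0
    for j in range(i, len(s)):
        depth += 1 if s[j] == "(" else -1
        if depth == 0:
            return j
    raise ValueError("unbalanced sequence")


def next_close(s, v):
    return s.index(")", v)


def rank(S, a, i):
    """Number of occurrences of symbol a in S[0:i] (naive)."""
    return S.count(a, 0, i)


def select(S, a, j):
    """Position of the j-th (1-based) occurrence of a in S, or None (naive)."""
    pos = -1
    for _ in range(j):
        pos = S.find(a, pos + 1)
        if pos == -1:
            return None
    return pos


def child_by_label(s, S, v, a):
    """Child of the node starting at position v reached by label a, or None."""
    # The labels of v's children occupy positions v .. next_close(v) - 1.
    p = select(S, a, rank(S, a, v) + 1)  # first occurrence of a at or after v
    if p is None or p >= next_close(s, v):
        return None  # no child of v carries label a
    # The child starts right after the ')' matching the '(' at position p.
    return find_close(s, p) + 1
```

Only one rank and one select are needed before falling back on the ordinary DFUDS navigation, which is the structure of the cost bound on the slide.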
Representing the frontier of a block • We need to indicate which nodes in a block have a pointer to a child block • This could be done with a bit vector marking those nodes • However, this would add one bit per node, i.e., $3n + o(n)$ bits overall for the tree structure • Instead, we define array Fp storing the preorders of the nodes having a child pointer • Since there are $O(n/\log^2 n)$ pointers, this requires o(n) bits
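Representing the frontier of a block • Array Fp is represented in differential form with a data structure for searchable partial sums, which supports its operations in $O(\log N)$ time • Example: Tp = (((())(()))((()))); Fp stores the differences 3 5 8 4, i.e., the preorders (3) (8) (16) (20) • After inserting a node whose preorder lies between those of the first and second frontier nodes, all the preorders in Fp from the second one on must change; in differential form only that one entry changes: 3 6 8 4, i.e., the preorders (3) (9) (17) (21)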
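The differential trick can be reproduced with any searchable partial-sums structure. This sketch uses a Fenwick (binary indexed) tree as a stand-in; that choice is an assumption, not necessarily the structure used in the work, and unlike what a real block needs it cannot insert or delete entries of Fp. The helper names `frontier_index` and `node_inserted` are hypothetical.

```python
class SearchablePartialSums:
    """Fenwick tree over non-negative values: prefix sums, search and updates in O(log n)."""

    def __init__(self, values):
        self.n = len(values)
        self.tree = [0] * (self.n + 1)
        for i, v in enumerate(values, start=1):
            self.add(i, v)

    def add(self, i, delta):
        """Add delta to the i-th value (1-based)."""
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def prefix(self, i):
        """Sum of the first i values."""
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i
        return total

    def search(self, x):
        """Smallest i with prefix(i) >= x, or n + 1 if there is none."""
        pos, rem = 0, x
        step = 1 << self.n.bit_length()
        while step:
            nxt = pos + step
            if nxt <= self.n and self.tree[nxt] < rem:
                pos, rem = nxt, rem - self.tree[nxt]
            step >>= 1
        return pos + 1


def frontier_index(F, x):
    """Index i (1-based) such that the i-th frontier node has preorder x, else None."""
    i = F.search(x)
    return i if i <= F.n and F.prefix(i) == x else None


def node_inserted(F, y):
    """A new node now has preorder y: frontier nodes at preorder >= y shift by one."""
    i = F.search(y)
    if i <= F.n:
        F.add(i, 1)  # one difference changes; every later prefix sum grows by one
```

Starting from the differences 3 5 8 4 and inserting a node at preorder 5 turns the stored preorders (3) (8) (16) (20) into (3) (9) (17) (21), touching a single difference.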
Representing inter-block pointers • Pointers to child blocks • We store the pointers to child blocks in array PTRp • sorted in increasing order of the preorders of the corresponding frontier nodes • Pointers to parent block • In each block p we need a pointer to the representation of the root of p in the parent block • However, the position of a node changes upon updates, so we cannot store it directly • A parent pointer is composed of • A pointer to the parent block q • If p is the j-th child block of q, the value j, stored in p
Representing inter-block pointers [Figure: block p with Tp = (((())(()))((()))), its arrays Fp and PTRp, and child blocks 1 to 4, whose parent pointers are labeled (p,1), (p,2), (p,4), (p,3)]
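Why a parent pointer stores (q, j) instead of a node position can be seen in a few lines. In this hypothetical layout, Fq is a plain sorted list (not a partial-sums structure) and the example block reuses the frontier preorders (3) (8) (16) (20); after an insertion shifts the preorders in q, the stored pair (q, j) is unchanged and still resolves to the right node through the j-th entry of Fq.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Block:
    # Hypothetical layout. F: sorted local preorders of the frontier nodes.
    # PTR: ids of the child blocks, in the same order as F.
    # parent: (q, j) if this block is the j-th child block of block q (j is 1-based).
    F: List[int] = field(default_factory=list)
    PTR: List[int] = field(default_factory=list)
    parent: Optional[Tuple[int, int]] = None


def parent_node(blocks: Dict[int, Block], p: int) -> Optional[Tuple[int, int]]:
    """(block, local preorder) of the node in the parent block duplicated as the root of p."""
    if blocks[p].parent is None:
        return None  # p is the root block
    q, j = blocks[p].parent
    assert blocks[q].PTR[j - 1] == p, "PTRq and the parent pointer disagree"
    return q, blocks[q].F[j - 1]


def node_inserted(block: Block, y: int) -> None:
    """A node took local preorder y in this block: later frontier nodes shift by one."""
    block.F = [f + 1 if f >= y else f for f in block.F]


def example_blocks() -> Dict[int, Block]:
    """Block 0 with frontier preorders 3, 8, 16, 20 and four child blocks."""
    blocks = {0: Block(F=[3, 8, 16, 20], PTR=[1, 2, 3, 4])}
    for j, child_id in enumerate(blocks[0].PTR, start=1):
        blocks[child_id] = Block(parent=(0, j))
    return blocks
```

Only the parent block's frontier is updated on an insertion; none of the child blocks need to be touched.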
Solving the basic operations • child(x, i): • Look for the preorder of x in Fp • If we find it, follow the child pointer to block q and apply childq on the root of q • Otherwise, use childp(x, i) • This takes $O(\log N) = O(\log k + \log\log n)$ time • child(x, a) is solved in the same way, but using childp(x, a) instead • parent(x): if x is the root of its block p, follow the parent pointer (q, j) to block q, where the copy of x is the node whose preorder is the j-th entry of Fq, and apply parentq to it; otherwise apply parentp(x)
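The control flow of child(x, i) and parent(x) across blocks can be sketched as follows. Assumptions, all hypothetical: each block's local tree is an explicit list of child lists indexed by local preorder (instead of DFUDS), Fp is a sorted list searched by binary search, and a frontier node is a leaf in its own block whose copy is the root (local preorder 0) of the child block.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Node = Tuple[int, int]  # (block id, local preorder); local preorder 0 is the block root


@dataclass
class Block:
    children: List[List[int]]  # local child lists, indexed by local preorder
    F: List[int] = field(default_factory=list)  # sorted local preorders of frontier nodes
    PTR: List[int] = field(default_factory=list)  # child block hanging from each frontier node
    parent: Optional[Tuple[int, int]] = None  # (q, j): this block is the j-th child block of q


def child(blocks: Dict[int, Block], p: int, x: int, i: int) -> Optional[Node]:
    """i-th child (1-based) of node x of block p."""
    blk = blocks[p]
    idx = bisect_left(blk.F, x)  # look for the preorder of x in Fp
    if idx < len(blk.F) and blk.F[idx] == x:
        p, x = blk.PTR[idx], 0  # x is duplicated as the root of the child block
        blk = blocks[p]
    kids = blk.children[x]
    return (p, kids[i - 1]) if 1 <= i <= len(kids) else None


def local_parent(blk: Block, x: int) -> int:
    """Naive parent inside one block (a DFUDS block would answer this directly)."""
    return next(u for u, kids in enumerate(blk.children) if x in kids)


def parent(blocks: Dict[int, Block], p: int, x: int) -> Optional[Node]:
    """Parent of node x of block p."""
    if x == 0:  # x is the root of its block
        if blocks[p].parent is None:
            return None  # root of the whole tree
        q, j = blocks[p].parent
        p, x = q, blocks[q].F[j - 1]  # the copy of x in the parent block
    return p, local_parent(blocks[p], x)


def example_blocks() -> Dict[int, Block]:
    # Block 0: 0 -> [1, 2], 2 -> [3]; node 3 is on the frontier and is block 1's root.
    # Block 1: root 0 -> [1, 2].
    return {
        0: Block(children=[[1, 2], [], [3], []], F=[3], PTR=[1]),
        1: Block(children=[[1, 2], [], []], parent=(0, 1)),
    }
```

In the example, asking for the first child of node 3 in block 0 transparently continues at the root of block 1, and the parent of block 1's root is resolved through (0, 1) to node 2 of block 0.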
Solving the basic operations • Insert: • We use the corresponding insertion operation on the block • When a block p becomes full • Choose a node z in block p • Reinsert the nodes in the subtree of z into a new block q (along with the corresponding part of the frontier of p) • Delete the subtree of z from p • Total cost is $O(\log k + \log\log n)$ amortized (if we are able to spend time proportional to the size of the subtree of z) • A list of candidate subtrees is kept in each block (o(n) bits overall)
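The slide does not spell out how z is chosen beyond keeping a list of candidate subtrees. One simple possibility, shown here as a hypothetical sketch rather than the author's procedure, is to walk down from the block root toward the largest child subtree while that subtree still has at least Nmin nodes. The returned subtree then has at least Nmin and at most $1 + k(N_{\min} - 1)$ nodes, and if the block holds more than $1 + k(N_{\min} - 1)$ nodes the walk never stops at the root, which matches the choice $N_{\max} = \Theta(k\log^2 n)$.

```python
def subtree_sizes(children, root=0):
    """Subtree size of every node of an explicit tree {node: [child, ...]}."""
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children.get(v, []))
    size = {}
    for v in reversed(order):  # every child is processed before its parent
        size[v] = 1 + sum(size[c] for c in children.get(v, []))
    return size


def choose_split_node(children, n_min, root=0):
    """Return (z, size of z's subtree) for a hypothetical greedy split rule."""
    size = subtree_sizes(children, root)
    z = root
    while True:
        kids = children.get(z, [])
        big = max(kids, key=size.__getitem__, default=None)  # largest child subtree
        if big is None or size[big] < n_min:
            return z, size[z]  # all children are too small to become a block
        z = big
```

With a root of degree 3 whose children each head a 10-node path and `n_min = 8`, the walk descends along the first path and stops at the node whose subtree has exactly 8 nodes.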
Solving specialized operations [Figure: a node x in a block and the array Sizep] • We can solve other operations by using this representation • degree(x) • depth(x) • subtree-size(x)
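The slide only names an array Sizep next to node x. One plausible reading, which is an assumption here: Sizep[i] holds the total number of nodes below the i-th frontier node of p, counting that node once. Because a subtree occupies a contiguous preorder range, subtree-size(x) is then the local subtree size of x plus the Sizep values of the frontier nodes in that range, minus one per frontier node since each is duplicated as a child-block root. How Sizep is kept up to date is not covered here.

```python
from bisect import bisect_left, bisect_right
from typing import List


def subtree_size(x: int, local_size: int, F: List[int], size_p: List[int]) -> int:
    """Global subtree size of node x (local preorder) of a block.

    local_size: subtree size of x counting only the nodes stored in this block.
    F: sorted local preorders of the frontier nodes; size_p[i]: total size of the
    subtree below F[i], counting F[i] itself once (assumed meaning of Sizep).
    """
    lo = bisect_left(F, x)  # frontier nodes inside [x, x + local_size - 1]
    hi = bisect_right(F, x + local_size - 1)
    # Each frontier node is already counted in local_size, hence the "- 1".
    return local_size + sum(s - 1 for s in size_p[lo:hi])
```

For a hypothetical 8-node block whose frontier nodes have local preorders 3 and 6, with subtrees of 5 and 2 nodes, the root's subtree size is 8 + 4 + 1 = 13.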
Solving specialized operations • We can solve other operations by using this representation • preorder(x) • is-ancestor(x, y) • lca(x, y)
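Given preorder(x) and subtree-size(x), is-ancestor(x, y) reduces to the standard preorder-interval test: x is an ancestor of y exactly when preorder(y) falls in $[\mathrm{preorder}(x), \mathrm{preorder}(x) + \mathrm{subtree\text{-}size}(x))$. The sketch computes both values on an explicit pointer-based tree for illustration only; in the proposal they would come from the block representation.

```python
def preorder_and_sizes(children, root=0):
    """Preorder numbers and subtree sizes of an explicit tree {node: [child, ...]}."""
    pre, order, stack = {}, [], [root]
    while stack:
        v = stack.pop()
        pre[v] = len(order)
        order.append(v)
        stack.extend(reversed(children.get(v, [])))  # visit the leftmost child first
    size = {}
    for v in reversed(order):  # children are finished before their parent
        size[v] = 1 + sum(size[c] for c in children.get(v, []))
    return pre, size


def is_ancestor(x, y, pre, size):
    """True if x is an ancestor of y (a node counts as its own ancestor)."""
    return pre[x] <= pre[y] < pre[x] + size[x]
```

The test costs two comparisons once the preorders and subtree size are known, so is-ancestor inherits the cost of preorder(x) and subtree-size(x).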
Conclusions • We have defined a representation for dynamic k-ary trees requiring space close to the information-theoretic lower bound • We can profit from smaller alphabets • $o(\log n)$ time for operations whenever $\log k = o(\log n)$ • In particular, $O(\log\log n)$ time for $k = O(\mathrm{polylog}(n))$ • compared with the $O(\log n)$ time of Chan et al. for any alphabet size • We need $o(n\log k)$ extra bits of space
Discussion • What happens if we have external pointers to the tree nodes? • Can we compress the dynamic DFUDS representation of blocks? (just as in [JSS, SODA’07]) • Suffix links in little space? (assuming a suffix-closed trie)