Space Efficient Data Structures for Dynamic Orthogonal Range Counting Meng He and J. Ian Munro University of Waterloo
Dynamic Orthogonal Range Counting • A fundamental geometric query problem • Definitions • Data sets: a set P of n points in the plane • Query: given an axis-aligned query rectangle R, compute the number of points in P∩R • Update: insertion or deletion of a point • Applications • Geometric data processing (GIS, CAD) • Databases
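A minimal Python sketch of the problem definition, useful as a correctness reference. `NaiveRangeCounter` and its method names are hypothetical. The query simply scans all of P, so it costs O(n) per query and is nowhere near the bounds discussed in this talk.

```python
class NaiveRangeCounter:
    """Baseline for dynamic orthogonal range counting over a set P of points.

    insert/delete are O(1) expected (hash set); count scans all of P,
    so each query costs O(n). Useful only as a correctness reference.
    """

    def __init__(self):
        self.points = set()  # P, stored as (x, y) tuples

    def insert(self, x, y):
        self.points.add((x, y))

    def delete(self, x, y):
        self.points.remove((x, y))  # raises KeyError if (x, y) is not in P

    def count(self, x1, x2, y1, y2):
        """Size of P intersected with the closed rectangle [x1, x2] x [y1, y2]."""
        return sum(1 for (x, y) in self.points
                   if x1 <= x <= x2 and y1 <= y <= y2)


rc = NaiveRangeCounter()
for x, y in [(1, 5), (2, 3), (4, 4), (6, 1), (7, 8)]:
    rc.insert(x, y)
print(rc.count(2, 6, 1, 4))  # 3: (2, 3), (4, 4), (6, 1)
rc.delete(4, 4)
print(rc.count(2, 6, 1, 4))  # 2
```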
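Classic Solutions and Our Result • [Table: comparison of classic solutions with our result; the entries are not recoverable from this copy] * For integer coordinates. • Matches the lower bound under the group model (Pătraşcu 2007)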
Background: Succinct Data Structures • What are succinct data structures? (Jacobson 1989) • Representing data structures in space close to the information-theoretic minimum • Supporting efficient navigational operations • Why succinct data structures? • Large data sets in modern applications: textual, genomic, spatial or geometric • A novel and unusual way of using succinct data structures (this paper) • Matching the storage cost of standard data structures • Improving time efficiency
Dynamic Range Sum • Data • A 2D array A[1..r, 1..c] of numbers • Operations • range_sum(i1, j1, i2, j2): the sum of the numbers in A[i1..i2, j1..j2] • modify(i, j, δ): A[i, j] ← A[i, j] + δ • insert(j): insert a 0 between A[i, j-1] and A[i, j] for i = 1, 2, …, r • delete(j): delete A[i, j] for i = 1, 2, …, r; this requires A[i, j] = 0 for all i • Restrictions on r, c and δ, and on the set of operations supported, may apply
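The operations above can be pinned down with a direct Python sketch over a list of rows, using the 1-based indices of the definitions. `NaiveRangeSum` is a hypothetical name. Each operation runs in time proportional to the cells it touches, so this is a semantic reference, not an efficient structure.

```python
class NaiveRangeSum:
    """Direct implementation of the dynamic range sum operations on a
    2D array A[1..r, 1..c] with 1-based indices. Reference baseline only."""

    def __init__(self, rows):
        self.A = [list(row) for row in rows]  # private copy of the r rows

    def range_sum(self, i1, j1, i2, j2):
        # Sum of A[i1..i2, j1..j2], inclusive bounds.
        return sum(sum(row[j1 - 1:j2]) for row in self.A[i1 - 1:i2])

    def modify(self, i, j, delta):
        self.A[i - 1][j - 1] += delta

    def insert(self, j):
        # The new all-zero column becomes column j; old columns j.. shift right.
        for row in self.A:
            row.insert(j - 1, 0)

    def delete(self, j):
        if any(row[j - 1] != 0 for row in self.A):
            raise ValueError("column j must be all zeros before deletion")
        for row in self.A:
            del row[j - 1]


ds = NaiveRangeSum([[1, 2, 3], [4, 5, 6]])
print(ds.range_sum(1, 2, 2, 3))  # 16 = 2 + 3 + 5 + 6
ds.insert(2)                     # rows become [1, 0, 2, 3] and [4, 0, 5, 6]
ds.modify(1, 2, 7)
print(ds.range_sum(1, 2, 2, 4))  # 23 = (7 + 2 + 3) + (0 + 5 + 6)
```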
Dynamic Range Sum: An Example • [Figure: a 2D array of integers updated step by step; the cell values are not recoverable from this copy] • Operations shown: range_sum(2, 3, 3, 6) = 25 • insert(6) • modify(2, 6, 5) • modify(2, 6, -5) • delete(6) • range_sum(2, 3, 3, 7) = 30
Dynamic Range Sum in a Small 2D Array • Assumptions and restrictions • Word size w: Ω(lg n) • Each number: nonnegative, O(lg n) bits • rc = O(lg^λ n), 0 < λ < 1 • modify(i, j, δ): |δ| ≤ lg n • insert and delete: not supported • Our solution • Space: O(lg^(1+λ) n) bits, with an o(n)-bit universal table • Time: modify and range_sum in O(1) time • Generalizes the 1D array version (Raman et al. 2001) • The deamortization is of independent interest
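One simple way to get cheap queries on a small array under point updates is to combine precomputed 2D prefix sums with a short buffer of pending updates, rebuilding the prefix sums when the buffer fills. The Python sketch below assumes exactly that design (`BufferedRangeSum` and `capacity` are hypothetical). It is amortized, uses plain lists rather than packed words, and omits the universal table and the deamortization mentioned on the slide, so it should not be read as the authors' structure.

```python
class BufferedRangeSum:
    """Hypothetical sketch: 2D prefix sums plus a bounded buffer of pending
    modify() calls. Queries combine the (possibly stale) prefix sums with the
    buffered deltas; a full buffer triggers a rebuild, so the rebuild cost is
    amortized over `capacity` updates. Assumes a non-empty array."""

    def __init__(self, rows, capacity=4):
        self.A = [list(row) for row in rows]
        self.r, self.c = len(self.A), len(self.A[0])
        self.capacity = capacity
        self.buffer = []  # pending (i, j, delta) with 1-based indices
        self._rebuild()

    def _rebuild(self):
        # Fold buffered deltas into A, then recompute P, where
        # P[i][j] = sum of A[1..i, 1..j] (row 0 and column 0 are zero padding).
        for i, j, delta in self.buffer:
            self.A[i - 1][j - 1] += delta
        self.buffer.clear()
        self.P = [[0] * (self.c + 1) for _ in range(self.r + 1)]
        for i in range(1, self.r + 1):
            for j in range(1, self.c + 1):
                self.P[i][j] = (self.A[i - 1][j - 1] + self.P[i - 1][j]
                                + self.P[i][j - 1] - self.P[i - 1][j - 1])

    def modify(self, i, j, delta):
        self.buffer.append((i, j, delta))
        if len(self.buffer) >= self.capacity:
            self._rebuild()

    def range_sum(self, i1, j1, i2, j2):
        P = self.P
        # Inclusion-exclusion on the prefix sums.
        total = P[i2][j2] - P[i1 - 1][j2] - P[i2][j1 - 1] + P[i1 - 1][j1 - 1]
        # Add deltas that have not yet been folded into P.
        total += sum(d for i, j, d in self.buffer
                     if i1 <= i <= i2 and j1 <= j <= j2)
        return total


ds = BufferedRangeSum([[1, 2, 3], [4, 5, 6]], capacity=2)
ds.modify(2, 2, 3)               # buffered; prefix sums not rebuilt yet
print(ds.range_sum(2, 2, 2, 3))  # 14 = 11 from prefix sums + 3 from the buffer
```

Here a query costs O(capacity) for the buffer scan and a rebuild costs O(rc); the slide's O(1) bounds rely on word-level packing and a universal table, which this sketch leaves out.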
Range Sum in a Narrow 2D Array • Assumptions and restrictions • b = O(w): number of bits required to encode each number • “Narrow”: r = O(lg^γ c), 0 < γ < 1 • |δ| ≤ lg c • Our results • Space: O(rcb + w) bits, with an O(c lg c)-bit buffer • Operations: O(lg c / lglg c) time • A generalization of the B-tree-based solution to the CSPSI (collection of searchable partial sums with indels) problem (He and Munro 2010), using our small 2D array structure at each B-tree node
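To illustrate the idea of keeping per-row summaries at tree nodes while columns are inserted and deleted, here is a hypothetical Python stand-in (`BlockedNarrowArray`, `max_block`) with a single level of column blocks, each storing the per-row sums of its columns. A B-tree would keep such summaries at every internal node so that each operation touches only one root-to-leaf path; this sketch scans its block list linearly, so it does not achieve the O(lg c / lglg c) bound.

```python
class BlockedNarrowArray:
    """Hypothetical two-level stand-in for a B-tree over the columns of an
    r x c array. Columns are grouped into blocks of at most `max_block`
    columns; each block also keeps, for every row, the sum of its columns
    (an r x 1 summary standing in for a node's small 2D array)."""

    def __init__(self, r, max_block=4):
        self.r = r
        self.max_block = max_block
        self.blocks = []  # each block: {"cols": length-r column lists, "tot": per-row sums}

    def _totals(self, cols):
        return [sum(col[i] for col in cols) for i in range(self.r)]

    def num_cols(self):
        return sum(len(blk["cols"]) for blk in self.blocks)

    def _locate(self, j):
        # Map the 1-based column index j to (block index, 0-based offset).
        if j < 1:
            raise IndexError("column out of range")
        for b, blk in enumerate(self.blocks):
            if j <= len(blk["cols"]):
                return b, j - 1
            j -= len(blk["cols"])
        raise IndexError("column out of range")

    def insert(self, j):
        """Insert an all-zero column so that it becomes column j (1 <= j <= c + 1)."""
        c = self.num_cols()
        if j == c + 1:
            if not self.blocks:
                self.blocks.append({"cols": [], "tot": [0] * self.r})
            b, off = len(self.blocks) - 1, len(self.blocks[-1]["cols"])
        else:
            b, off = self._locate(j)
        blk = self.blocks[b]
        blk["cols"].insert(off, [0] * self.r)  # zero column: per-row totals unchanged
        if len(blk["cols"]) > self.max_block:   # split an overfull block in two
            half = len(blk["cols"]) // 2
            right = blk["cols"][half:]
            del blk["cols"][half:]
            blk["tot"] = self._totals(blk["cols"])
            self.blocks.insert(b + 1, {"cols": right, "tot": self._totals(right)})

    def delete(self, j):
        """Delete column j, which must be all zeros."""
        b, off = self._locate(j)
        blk = self.blocks[b]
        if any(blk["cols"][off]):
            raise ValueError("only all-zero columns can be deleted")
        del blk["cols"][off]
        if not blk["cols"]:
            del self.blocks[b]  # drop empty blocks

    def modify(self, i, j, delta):
        b, off = self._locate(j)
        self.blocks[b]["cols"][off][i - 1] += delta
        self.blocks[b]["tot"][i - 1] += delta  # keep the block summary in sync

    def range_sum(self, i1, j1, i2, j2):
        total, start = 0, 1  # start = global index of the block's first column
        for blk in self.blocks:
            end = start + len(blk["cols"]) - 1
            if j1 <= start and end <= j2:       # block fully inside: use its summary
                total += sum(blk["tot"][i1 - 1:i2])
            elif start <= j2 and j1 <= end:     # partial overlap: scan its columns
                lo, hi = max(j1, start) - start, min(j2, end) - start + 1
                total += sum(sum(col[i1 - 1:i2]) for col in blk["cols"][lo:hi])
            start = end + 1
        return total


na = BlockedNarrowArray(r=2, max_block=2)
for j in range(1, 6):
    na.insert(j)                  # five zero columns; overfull blocks split
na.modify(1, 1, 3)
na.modify(2, 4, 5)
print(na.range_sum(1, 1, 2, 5))   # 8
```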
Range Counting in Dynamic Integer Sequences • Notation • Integer range: [1..σ] • Sequence: S[1..n] • Operations • access(x): S[x] • rank(α, x): number of occurrences of α in S[1..x] • select(α, r): position of the rth occurrence of α in S • range_count(p1, p2, v1, v2): number of entries in S[p1..p2] whose values are in the range [v1..v2] • insert(α, i): insert α between S[i-1] and S[i] • delete(i): delete S[i] from S
Range Counting in Integer Sequences: An Example S = 5,5,2,5,3,1,3,4,7,6,4,1,2,2,5,8 rank(5, 8) = 3 select(2, 3) = 14 range_count(6, 12, 2, 6) = 4
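The operation semantics can be checked against the example with a naive Python sketch (`NaiveSequence` is a hypothetical name; every operation is a linear scan or a list edit):

```python
class NaiveSequence:
    """Reference semantics for the dynamic sequence operations (1-based)."""

    def __init__(self, values):
        self.S = list(values)

    def access(self, x):
        return self.S[x - 1]

    def rank(self, a, x):
        # Occurrences of a in S[1..x].
        return self.S[:x].count(a)

    def select(self, a, r):
        # Position of the r-th occurrence of a.
        seen = 0
        for pos, v in enumerate(self.S, start=1):
            if v == a:
                seen += 1
                if seen == r:
                    return pos
        raise ValueError("fewer than r occurrences")

    def range_count(self, p1, p2, v1, v2):
        # Entries of S[p1..p2] whose values lie in [v1..v2].
        return sum(1 for v in self.S[p1 - 1:p2] if v1 <= v <= v2)

    def insert(self, a, i):
        self.S.insert(i - 1, a)  # a becomes S[i]

    def delete(self, i):
        del self.S[i - 1]


S = NaiveSequence([5, 5, 2, 5, 3, 1, 3, 4, 7, 6, 4, 1, 2, 2, 5, 8])
print(S.rank(5, 8), S.select(2, 3), S.range_count(6, 12, 2, 6))  # 3 14 4
```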
Range Counting in Sequences of Small Integers • Restrictions • σ = O(lg^ρ n) for any constant 0 < ρ < 1 • Our result • Space: nH0 + o(n lg σ) + O(w) bits • Time: O(lg n / lglg n) • This is achieved by combining: • Our solution to range sum on narrow 2D arrays • A succinct dynamic string representation (He and Munro 2010)
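A hypothetical Python sketch of why narrow 2D range sums help here (`BlockedSmallAlphabetSeq`, `block`, and the table layout are assumptions, and this is not the paper's succinct encoding): cut S into blocks and keep a σ-row table of per-block symbol counts. A range_count query is then a range sum over that table for the fully covered blocks plus a scan of at most two partial blocks, and an insertion or deletion changes one table entry by ±1.

```python
class BlockedSmallAlphabetSeq:
    """Hypothetical sketch for a sequence over [1..sigma] with small sigma.

    count[a - 1][k] is the number of occurrences of symbol a in block k.
    That table is a narrow 2D array (sigma rows, one column per block).
    Assumes a non-empty initial sequence; blocks are never split or merged.
    """

    def __init__(self, values, sigma, block=4):
        self.blocks = [list(values[k:k + block]) for k in range(0, len(values), block)]
        self.count = [[blk.count(a) for blk in self.blocks] for a in range(1, sigma + 1)]

    def _locate(self, x):
        # Map the 1-based position x to (block index, 0-based offset).
        if x < 1:
            raise IndexError("position out of range")
        for k, blk in enumerate(self.blocks):
            if x <= len(blk):
                return k, x - 1
            x -= len(blk)
        raise IndexError("position out of range")

    def range_count(self, p1, p2, v1, v2):
        total, start = 0, 1  # start = position of the block's first entry
        for k, blk in enumerate(self.blocks):
            end = start + len(blk) - 1
            if p1 <= start and end <= p2:      # whole block: range sum on column k
                total += sum(self.count[a - 1][k] for a in range(v1, v2 + 1))
            elif start <= p2 and p1 <= end:    # partial block: scan its entries
                lo, hi = max(p1, start) - start, min(p2, end) - start + 1
                total += sum(1 for v in blk[lo:hi] if v1 <= v <= v2)
            start = end + 1
        return total

    def insert(self, a, i):
        """Insert symbol a so that it becomes S[i] (1 <= i <= n + 1)."""
        n = sum(len(blk) for blk in self.blocks)
        if i == n + 1:
            k, off = len(self.blocks) - 1, len(self.blocks[-1])
        else:
            k, off = self._locate(i)
        self.blocks[k].insert(off, a)
        self.count[a - 1][k] += 1  # a modify with delta = +1 on the count table

    def delete(self, i):
        k, off = self._locate(i)
        a = self.blocks[k].pop(off)
        self.count[a - 1][k] -= 1  # a modify with delta = -1 on the count table


seq = BlockedSmallAlphabetSeq([5, 5, 2, 5, 3, 1, 3, 4, 7, 6, 4, 1, 2, 2, 5, 8], sigma=8)
print(seq.range_count(6, 12, 2, 6))  # 4
```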
Dynamic Range Counting: An Augmented Red-Black Tree • Tx: a red-black tree storing all the x-coordinates • Each node also stores the number of its descendants • Purpose: converting between real x-coordinates and rank space in O(lg n) time
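The rank-space conversion itself amounts to two rank computations. The Python sketch below swaps the augmented red-black tree for a sorted list plus the standard `bisect` module (`XRankSpace` is a hypothetical name), which keeps the conversion at O(lg n) but makes updates O(n).

```python
from bisect import bisect_left, bisect_right, insort


class XRankSpace:
    """Stand-in for Tx: keeps the x-coordinates in a sorted Python list.

    bisect gives O(lg n) rank computations, but insort/del shift the list,
    so updates cost O(n); a red-black tree with subtree sizes makes both
    O(lg n).
    """

    def __init__(self, xs=()):
        self.xs = sorted(xs)

    def insert(self, x):
        insort(self.xs, x)

    def delete(self, x):
        k = bisect_left(self.xs, x)
        if k == len(self.xs) or self.xs[k] != x:
            raise KeyError(x)
        del self.xs[k]

    def to_rank_range(self, x1, x2):
        """1-based ranks [lo, hi] of the stored x-coordinates lying in [x1, x2].
        The range is empty when lo > hi."""
        return bisect_left(self.xs, x1) + 1, bisect_right(self.xs, x2)


T = XRankSpace([7.1, 2.5, 9.3, 4.0])
print(T.to_rank_range(3, 9.3))  # (2, 4): ranks of 4.0, 7.1, 9.3
print(T.to_rank_range(5, 6))    # (3, 2): empty
```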
Dynamic Range Counting: A Range Tree • Ty: a weight-balanced B-tree (Arge and Vitter 2003) constructed over all the y-coordinates • Branching factor d = Θ(lg^ε n) for a constant 0 < ε < 1 • Leaf parameter: 1 • The levels are numbered 0, 1, … from top to bottom • Essentially a range tree • Each node represents a range of y-coordinates • Why a weight-balanced B-tree: the cost of rebuilding can be amortized
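Assuming Ty has Θ(log_d n) levels, as a B-tree with branching factor d and leaf parameter 1 does, the choice d = Θ(lg^ε n) yields the level count used in the query analysis. Combined with O(n lg d + w) bits per level for the Lm sequences, it also yields O(n) words overall, using w = Ω(lg n):

```latex
\lg d = \varepsilon \lg\lg n + O(1)
\quad\Longrightarrow\quad
h = \Theta(\log_d n) = \Theta\!\left(\frac{\lg n}{\lg d}\right)
  = \Theta\!\left(\frac{\lg n}{\lg\lg n}\right)

h \cdot O(n \lg d + w)
  = O\!\left(\frac{\lg n}{\lg d}\,(n \lg d + w)\right)
  = O(n \lg n + w \lg n)\ \text{bits}
  = O(n)\ \text{words}
```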
Dynamic Range Counting: A Wavelet Tree • Ideas from generalized wavelet trees (Ferragina et al. 2006) • For each node v of Ty, construct a sequence Sv: • Each entry of Sv corresponds to a point whose y-coordinate is in the range represented by node v • Sv[i] corresponds to the point with the ith smallest x-coordinate among all these points • Sv[i] indicates which child of v contains the y-coordinate of that point • For each level m, construct a sequence Lm[1..n] of integers from [1..4d] by concatenating all the Sv's constructed at level m • Each Lm is stored as a dynamic sequence of small integers • Space: O(n lg d + w) bits per level, O(n) words overall
Range Counting Queries • Query range: [x1..x2] × [y1..y2] • Use Tx to convert the query x-range to a range in rank space • Perform a top-down traversal of Ty to locate the (up to two) leaves whose ranges contain y1 and y2 • Perform range_count on Sv for each node v visited in this traversal • Sum up the results of these queries to obtain the answer • Time: O(lg n / lglg n) per level and O(lg n / lglg n) levels, i.e., O((lg n / lglg n)^2) in total
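A static Python sketch of the construction and query, under simplifying assumptions that do not come from the paper: points are already in rank space with integer y-ranks in [0, d^h); Ty is replaced by a complete d-ary tree, so every level holds all n points; the offset of each Sv inside Lm is found by binary search over the sorted y-ranks; and rank and range_count are plain linear scans. Mapping a position range from Sv into a child's sequence via rank on Sv is the standard wavelet-tree technique and is assumed here. `build_levels` and `range_count_2d` are hypothetical names.

```python
from bisect import bisect_left, bisect_right


def build_levels(ys, d, h):
    """ys[i] = y-rank (0 <= y < d**h) of the point with the (i+1)-th smallest x.

    Returns L, where L[m] concatenates, over the nodes at level m from left to
    right, the sequences S_v: for each point of v in x-order, the digit naming
    the child of v whose y-range contains that point.
    """
    levels, order = [], list(range(len(ys)))  # points grouped by node, x-sorted inside
    for m in range(h):
        shift = d ** (h - m - 1)               # width of a child's y-range at level m
        levels.append([(ys[p] // shift) % d for p in order])
        order.sort(key=lambda p: ys[p] // shift)  # stable: regroup by child, keep x-order
    return levels


def range_count_2d(levels, ys_sorted, d, h, x1, x2, y1, y2):
    """Number of points with x-rank in [x1, x2] (0-based) and y-rank in [y1, y2]."""
    def visit(m, node, lo, hi):
        # node covers y-ranks [a, b]; [lo, hi) are positions inside its S_v
        w = d ** (h - m)
        a, b = node * w, node * w + w - 1
        if lo >= hi or b < y1 or y2 < a:
            return 0
        if y1 <= a and b <= y2:
            return hi - lo
        start = bisect_left(ys_sorted, a)      # offset of S_v inside L_m
        S = levels[m][start:bisect_right(ys_sorted, b)]
        cw = w // d
        full = [c for c in range(d) if y1 <= a + c * cw and a + c * cw + cw - 1 <= y2]
        total = 0
        if full:  # fully covered children are contiguous: one range_count on S_v
            total += sum(1 for v in S[lo:hi] if full[0] <= v <= full[-1])
        for c in range(d):
            if c not in full:  # other children: map positions with rank, then recurse
                total += visit(m + 1, node * d + c, S[:lo].count(c), S[:hi].count(c))
        return total

    return visit(0, 0, x1, x2 + 1)


ys = [4, 0, 8, 2, 5, 7, 1, 3]  # y-ranks of 8 points listed in x-order
d, h = 3, 2                     # 3**2 = 9 covers y-ranks 0..8
L = build_levels(ys, d, h)
print(range_count_2d(L, sorted(ys), d, h, 0, 7, 2, 4))  # 3 (y = 4, 2, 3)
```

At each level only nodes straddling y1 or y2 recurse, which mirrors the "up to two" root-to-leaf paths on the slide; the dynamic structure replaces the scans with the succinct sequence operations.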
Insertions and Deletions • More complicated: splits and merges; changes to child ranks • Storing Ty as a weight-balanced B-tree allows us to amortize the cost of updating subsequences of the Lm's • Additional techniques supporting batch updates of integer sequences are also developed
Our Results • Dynamic Orthogonal Range Counting • Space: O(n) words • Time: O((lg n / lglg n)^2) • Points on a U×U grid • Space: O(n) words • Time (worst-case): O(lg n lg U / (lglg n)^2) • Succinct representations of dynamic integer sequences • Space: nH0 + o(n lg σ) + O(w) bits • Time (including range_count): O((lg n / lglg n) (lg σ / lglg n + 1))
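As a sanity check on the sequence bound, substituting two alphabet sizes reproduces bounds stated earlier: small alphabets give the O(lg n / lglg n) of the small-integer result, and σ = n gives the same O((lg n / lglg n)^2) as the range counting result.

```latex
% Small alphabets: \sigma = O(\lg^{\rho} n) gives \lg\sigma \le \rho\lg\lg n + O(1)
O\!\left(\frac{\lg n}{\lg\lg n}\left(\frac{\lg\sigma}{\lg\lg n} + 1\right)\right)
  = O\!\left(\frac{\lg n}{\lg\lg n}\bigl(\rho + o(1) + 1\bigr)\right)
  = O\!\left(\frac{\lg n}{\lg\lg n}\right)

% Alphabet as large as the sequence: \sigma = n
O\!\left(\frac{\lg n}{\lg\lg n}\left(\frac{\lg n}{\lg\lg n} + 1\right)\right)
  = O\!\left(\left(\frac{\lg n}{\lg\lg n}\right)^{2}\right)
```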
Conclusions • Results • The best known result for dynamic orthogonal range counting: O(n) words and O((lg n / lglg n)^2) time • The same problem for points on a grid • The first succinct representations of dynamic integer sequences supporting range counting • Two preliminary results on dynamic range sum • Techniques • The first to combine wavelet trees with range trees • Deamortization on 2D arrays • Future work • Lower bound • Using techniques from succinct data structures to improve standard data structures