Succinct Orthogonal Range Search Structures on a Grid with Applications to Text Indexing. Prosenjit Bose, Carleton University; Meng He, University of Waterloo; Anil Maheshwari and Pat Morin, Carleton University.
2D Orthogonal Range Search • A fundamental geometric query problem • Data sets: A set, N, of n points in the plane • Query: Given an orthogonal query rectangle R, return information about the points in N∩R • Orthogonal range counting queries • Orthogonal range reporting queries • k: size of the output
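To make the two query types concrete, here is a minimal brute-force sketch in Python. It is only a baseline for checking answers, not the structure from the talk; the function names and the inclusive rectangle convention [x1..x2]×[y1..y2] are assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def range_report(points: List[Point], x1: int, x2: int, y1: int, y2: int) -> List[Point]:
    """Return every point inside the closed rectangle [x1..x2] x [y1..y2]."""
    return [(x, y) for (x, y) in points if x1 <= x <= x2 and y1 <= y <= y2]


def range_count(points: List[Point], x1: int, x2: int, y1: int, y2: int) -> int:
    """Return only the number of points inside the rectangle (k in the slide)."""
    return len(range_report(points, x1, x2, y1, y2))
```

Both functions scan all n points, so they take O(n) time per query; the structures discussed in the remaining slides aim for time that depends on lg n and k instead.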
Example [figure] • Range counting query: 5 • Range reporting query
Range Search on an n×n Grid • A special case: point coordinates are drawn from [1..n]×[1..n] (rank space) • The general problem can be reduced to this special case using a standard approach (Alstrup et al. 2000) • Orthogonal range search structures in the rank space and succinct data structures
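The reduction to rank space can be sketched as follows. This is a simplified illustration, not the construction of Alstrup et al.: each coordinate is replaced by its rank in sorted order (ties broken by the other coordinate, then by input order), and a query rectangle is translated with two binary searches per axis. The names `to_rank_space` and `query_to_rank_space` are hypothetical.

```python
from bisect import bisect_left, bisect_right


def to_rank_space(points):
    """Map n arbitrary points to distinct ranks in [1..n] x [1..n].

    Returns the ranked points (in input order) plus the sorted x and y
    coordinate lists needed to translate queries later.
    """
    n = len(points)
    by_x = sorted(range(n), key=lambda i: (points[i][0], points[i][1], i))
    by_y = sorted(range(n), key=lambda i: (points[i][1], points[i][0], i))
    x_rank, y_rank = [0] * n, [0] * n
    for r, i in enumerate(by_x, 1):
        x_rank[i] = r
    for r, i in enumerate(by_y, 1):
        y_rank[i] = r
    xs = sorted(p[0] for p in points)
    ys = sorted(p[1] for p in points)
    return list(zip(x_rank, y_rank)), xs, ys


def query_to_rank_space(xs, ys, x1, x2, y1, y2):
    """Translate a closed rectangle on original coordinates to rank space.

    The result (rx1, rx2, ry1, ry2) is again a closed rectangle; it is
    empty on an axis when the first value exceeds the second.
    """
    return (bisect_left(xs, x1) + 1, bisect_right(xs, x2),
            bisect_left(ys, y1) + 1, bisect_right(ys, y2))
```

Because ranks follow sorted order, a point lies in the original rectangle exactly when its ranked image lies in the translated one, so any structure for the n×n grid answers the general query after O(lg n) extra search time in this sketch.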
Background: Succinct Data Structures • What are succinct data structures? (Jacobson 1989) • Represent data structures in space close to the information-theoretic minimum • Support efficient navigational operations • Why succinct data structures? • Large data sets in modern applications: textual, genomic, spatial or geometric
Succinct Orthogonal Range Search Structures in Rank Space • Wavelet trees (Grossi et al. 2003) • Space: n lg n + o(n lg n) bits • Query time for orthogonal range search (Makinen and Navarro 2006) • Restriction: no two points share an x-coordinate or a y-coordinate • Counting: O(lg n) • Reporting: O(k lg n) • Applications • Space-efficient text indexes: Makinen and Navarro 2006, Chien et al. 2008
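The O(lg n) counting bound comes from walking one root-to-leaf band of the wavelet tree. The sketch below shows the idea with a pointer-based tree that stores plain prefix-count lists where a succinct implementation would use rank-capable bit vectors, so it is not space-efficient. The class name and the 0-based half-open position range are assumptions for illustration.

```python
class WaveletTree:
    """Pointer-based wavelet tree over a non-empty integer sequence.

    seq[p] is read as the y-coordinate of the point whose x-rank is p + 1.
    """

    def __init__(self, seq, lo=None, hi=None):
        if lo is None:
            lo, hi = min(seq), max(seq)
        self.lo, self.hi = lo, hi
        self.left = self.right = None
        if lo == hi or not seq:
            return  # leaf, or an empty subtree that is never descended into
        mid = (lo + hi) // 2
        # zeros[i] = how many of the first i symbols go to the left child
        self.zeros = [0]
        for v in seq:
            self.zeros.append(self.zeros[-1] + (v <= mid))
        self.left = WaveletTree([v for v in seq if v <= mid], lo, mid)
        self.right = WaveletTree([v for v in seq if v > mid], mid + 1, hi)

    def count(self, i, j, a, b):
        """Count positions p with i <= p < j and a <= seq[p] <= b."""
        if i >= j or b < self.lo or a > self.hi:
            return 0
        if a <= self.lo and self.hi <= b:
            return j - i  # every symbol in this node is inside [a..b]
        li, lj = self.zeros[i], self.zeros[j]
        return (self.left.count(li, lj, a, b)
                + self.right.count(i - li, j - lj, a, b))
```

For the points (1,3), (2,1), (3,4), (4,2), (5,5), `WaveletTree([3, 1, 4, 2, 5]).count(1, 4, 1, 4)` counts the points with x in [2..4] and y in [1..4].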
Supporting Counting: An Overview • Reduce orthogonal range counting to dominance counting • Design a succinct data structure supporting dominance counting on a narrow grid, i.e. an n×t grid where t = O(lg^ε n) (0 < ε < 1). We also assume that each point has a distinct x-coordinate • Recursively divide the n×n grid into narrow grids and use the above structure at each level • Remove the restriction that each point has a distinct x-coordinate
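The first step of the overview, reducing range counting to dominance counting, is plain inclusion-exclusion over four corner queries. A minimal sketch, assuming integer grid coordinates and the convention that a dominance query (x, y) counts the points with x' ≤ x and y' ≤ y; the brute-force `dominance_count` is a stand-in for whatever dominance structure is actually used.

```python
def dominance_count(points, x, y):
    """Brute-force stand-in: number of points (px, py) with px <= x and py <= y."""
    return sum(1 for (px, py) in points if px <= x and py <= y)


def range_count_via_dominance(points, x1, x2, y1, y2, dom=dominance_count):
    """Count points in the closed rectangle [x1..x2] x [y1..y2] with four dominance queries."""
    return (dom(points, x2, y2)
            - dom(points, x1 - 1, y2)      # remove the strip left of the rectangle
            - dom(points, x2, y1 - 1)      # remove the strip below the rectangle
            + dom(points, x1 - 1, y1 - 1)) # add back the corner removed twice
```

Since each range count costs a constant number of dominance counts, any time bound for dominance counting carries over to range counting.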
Range Counting on a Narrow Grid • S = 2 3 4 4 1 3 1 1 3 2 4 2 3… • Divide the grid into blocks of size lg² n × t • A 2D array A: A[i, j] stores the result of dominance counting when (i lg² n + 1, j) is given as the query point • Divide each block into subblocks of size lg^λ n × t (0 < λ < ε) • A 2D array B: B[i, j] stores, when (i lg^λ n + 1, j) is given as the query point, the result of dominance counting inside the block containing this point • A table C that stores, for each possible set of lg^λ n points on a lg^λ n × t grid and each query point in the grid, the result of dominance counting • Space: n lg t + o(n) bits • Time: O(1)
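The block decomposition can be illustrated with a deliberately simplified, one-level sketch. It keeps only a sampled array in the role of A (with slightly different indexing than the slide: row i covers the first i·block columns) and replaces the subblock array B and lookup table C with a direct scan of the partial block, so a query costs O(block) time instead of O(1) and the space is not succinct. The class name, the `block` parameter, and the reading of S as the y-coordinate of each column in x order are assumptions.

```python
class NarrowGridCounter:
    """Dominance counting on an n x t grid with one point per column.

    S[x - 1] is the y-coordinate (in 1..t) of the point in column x.
    """

    def __init__(self, S, t, block):
        self.S, self.t, self.block = list(S), t, block
        # A[i][j] = number of points in the first i * block columns with y <= j
        self.A = [[0] * (t + 1)]
        per_value = [0] * (t + 1)  # per_value[j] = points seen so far with y == j
        for x, y in enumerate(self.S, 1):
            per_value[y] += 1
            if x % block == 0:
                row = [0] * (t + 1)
                for j in range(1, t + 1):
                    row[j] = row[j - 1] + per_value[j]
                self.A.append(row)

    def dominance(self, x, y):
        """Number of points (x', y') with x' <= x and y' <= y."""
        x = max(0, min(x, len(self.S)))
        y = max(0, min(y, self.t))
        i = x // self.block
        count = self.A[i][y]  # whole blocks, answered by the sampled array
        # Partial block: scanned here; B and C would answer this in O(1)
        count += sum(1 for v in self.S[i * self.block:x] if v <= y)
        return count
```

With the prefix of S shown on the slide, `NarrowGridCounter([2, 3, 4, 4, 1, 3, 1, 1, 3, 2, 4, 2, 3], t=4, block=4).dominance(6, 3)` combines one sampled row with a scan of two columns.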
Range Counting on an n×n Grid • Transform the original grid into a narrow grid by grouping y-coordinates into ranges of size n/t • Construct orthogonal range search structures for this narrow grid and recurse • Number of levels: log_t n • Time: O(log_t n) • Space: n lg n + o(n lg n) bits
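The totals on this slide follow from the per-level costs of the narrow-grid structure. A short worked derivation, under the simplifying assumptions that t = lg^ε n exactly and that each level stores narrow-grid structures over all n points at n lg t + o(n) bits and O(1) query time per level:

```latex
\begin{align*}
\text{levels} &= \log_t n = \frac{\lg n}{\lg t}
               = \frac{\lg n}{\varepsilon \lg\lg n}
               = O\!\left(\frac{\lg n}{\lg\lg n}\right), \\
\text{time}   &= O(1) \cdot \log_t n
               = O\!\left(\frac{\lg n}{\lg\lg n}\right), \\
\text{space}  &= \log_t n \cdot \bigl(n \lg t + o(n)\bigr)
               = n \lg n + o(n \lg n) \text{ bits}.
\end{align*}
```

The leading space term is exact because log_t n · lg t = lg n, and the lower-order terms sum to o(n lg n / lglg n), which is within o(n lg n).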
More Results • The restriction that each point has a distinct x-coordinate can be removed using 2n + o(n) extra bits • The support for range reporting is based on similar ideas but is more complicated • Our main result • Space: n lg n + o(n lg n) bits • Query time for orthogonal range search • Counting: O(lg n / lglg n) • Reporting: O(k lg n / lglg n)
Applications: Substring Search • Notation • T: text, n: text size, σ: alphabet size • P: pattern, m: pattern length • occ: number of occurrences • Query: report the occurrences of P in T • Chien et al. 2008: O(n lg σ) bits, O(m + lg n × (log_σ n + occ lg n)) time • Our results: O(n lg σ) bits, O(m + lg n × (log_σ n + occ lg n) / lglg n) time
Applications: Position-Restricted Substring Search • Query: Given a pattern P and a range [i..j], find the occurrences of P in T[i..j] • Makinen and Navarro 2006 • Space: 3n lg n + o(n lg n) bits • Time: O(m + occ lg n) • Our results • Space: 3n lg n + o(n lg n) bits • Time: O(m + occ lg n / lglg n)
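One way to see why range search helps here: the suffixes that start with P form a contiguous range of rows in the suffix array, so the query becomes a 2D range search over the points (row, text position). The sketch below illustrates that view with a naive suffix array and a linear scan in place of the range search structure; it is not the index from the cited papers, and all function names and the 0-based inclusive positions are assumptions.

```python
from bisect import bisect_left, bisect_right


def suffix_array(T):
    """Naive suffix array: start positions of suffixes in lexicographic order."""
    return sorted(range(len(T)), key=lambda i: T[i:])


def sa_range(T, sa, P):
    """Half-open range [lo, hi) of suffix-array rows whose suffixes start with P."""
    m = len(P)
    # Truncating sorted suffixes to m characters keeps them in sorted order
    prefixes = [T[i:i + m] for i in sa]
    return bisect_left(prefixes, P), bisect_right(prefixes, P)


def restricted_occurrences(T, sa, P, i, j):
    """Start positions of occurrences of P that lie entirely inside T[i..j]."""
    lo, hi = sa_range(T, sa, P)
    last_start = j - len(P) + 1
    # 2D range query: rows in [lo, hi), text positions in [i..last_start].
    # A range search structure would replace this scan.
    return sorted(p for p in sa[lo:hi] if i <= p <= last_start)
```

For T = "abracadabra" and P = "abra", restricting to T[1..10] keeps only the occurrence at position 7, and restricting to T[0..9] keeps only the one at position 0.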
Applications: Representing Small Integers • Data: A sequence S of n numbers in [1..s], where s = polylog(n) • Ferragina et al. 2007 • Space: nH_0(S) + o(n) bits • Operations: rank/select in O(1) time • Our result • New operation: Given a range of positions [p1..p2] and a range of values [v1..v2], retrieve the entries in S[p1..p2] whose values are in [v1..v2] • Time: O(1) for counting, O(1) per entry for reporting
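The semantics of rank and of the new operation can be pinned down with a small reference sketch. It stores one prefix-count list per value, so it uses far more than nH_0(S) + o(n) bits, and its counting takes time proportional to v2 - v1 + 1 instead of O(1); it only specifies what the operations return. The class name and the 1-based inclusive positions are assumptions.

```python
class SmallIntSequence:
    """Reference implementation for a sequence S of values in [1..s]."""

    def __init__(self, S, s):
        self.S, self.s = list(S), s
        # prefix[v][p] = number of entries equal to v among S[1..p]
        self.prefix = [[0] * (len(self.S) + 1) for _ in range(s + 1)]
        for p, val in enumerate(self.S, 1):
            for v in range(1, s + 1):
                self.prefix[v][p] = self.prefix[v][p - 1] + (val == v)

    def rank(self, v, p):
        """Number of occurrences of v in S[1..p]."""
        return self.prefix[v][p]

    def count(self, p1, p2, v1, v2):
        """Number of entries in S[p1..p2] whose values are in [v1..v2]."""
        return sum(self.prefix[v][p2] - self.prefix[v][p1 - 1]
                   for v in range(v1, v2 + 1))

    def report(self, p1, p2, v1, v2):
        """(position, value) pairs in S[p1..p2] whose values are in [v1..v2]."""
        return [(p, self.S[p - 1]) for p in range(p1, p2 + 1)
                if v1 <= self.S[p - 1] <= v2]
```

Viewed geometrically, each entry is the point (position, value) on an n×s grid, so `count` and `report` are range counting and range reporting on a narrow grid.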
Applications: A Restricted Version of Range Search • Restriction: the query rectangle is defined by two points in the given point set • Notation • c: the number of bits required to encode the coordinates of a point • Space: cn + n lg n + o(n lg n) bits • Time • Counting: O(lg n / lglg n) • Reporting: O(k lg n / lglg n)
Conclusions • We designed a succinct data structure for orthogonal range search on an n×n grid that supports counting in O(lg n / lglg n) time and reporting in O(k lg n / lglg n) time, improving on the O(lg n) and O(k lg n) bounds of wavelet-tree-based structures • This structure can be used to improve and extend previous results on succinct data structures, such as succinct text indexes and sequence representations.