
Succinct Orthogonal Range Search Structures on a Grid with Applications to Text Indexing

Prosenjit Bose, Carleton University; Meng He, University of Waterloo; Anil Maheshwari and Pat Morin, Carleton University.


Presentation Transcript


  1. Succinct Orthogonal Range Search Structures on a Grid with Applications to Text Indexing • Prosenjit Bose, Carleton University • Meng He, University of Waterloo • Anil Maheshwari and Pat Morin, Carleton University

  2. 2D Orthogonal Range Search • A fundamental geometric query problem • Data sets: A set, N, of n points in the plane • Query: Given an orthogonal query rectangle R, return information about the points in N∩R • Orthogonal range counting queries • Orthogonal range reporting queries • k: size of the output
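
To make the two query types concrete, here is a minimal brute-force baseline in Python. It is only an illustration of what the queries return, not one of the structures discussed in the talk; the function names and the sample points are assumptions made for this sketch.

```python
from typing import List, Tuple

Point = Tuple[int, int]

def range_report(points: List[Point], x1: int, x2: int, y1: int, y2: int) -> List[Point]:
    """Return every point inside the closed rectangle [x1, x2] x [y1, y2]."""
    return [(x, y) for (x, y) in points if x1 <= x <= x2 and y1 <= y <= y2]

def range_count(points: List[Point], x1: int, x2: int, y1: int, y2: int) -> int:
    """Return only the number of points inside the rectangle."""
    return len(range_report(points, x1, x2, y1, y2))

# Hypothetical sample data: four points in the plane.
sample = [(1, 3), (2, 1), (3, 4), (4, 2)]
inside = range_report(sample, 2, 4, 1, 3)   # reporting query; k = len(inside)
how_many = range_count(sample, 2, 4, 1, 3)  # counting query
```

Both functions scan all n points, so each query costs O(n) time; the structures on the following slides aim to answer the same queries much faster.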

  3. Example (figure not reproduced) • Range counting query: 5 • Range reporting query

  4. Classic Solutions

  5. Range Search on an n×n Grid • A special case: point coordinates are drawn from [1..n]×[1..n] (rank space) • The general problem can be reduced to this special case using a standard approach • Alstrup et al. 2000 • Orthogonal range search structures in the rank space and succinct data structures
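
The reduction to rank space can be sketched as follows. This is a minimal illustration that assumes all x-coordinates are distinct and all y-coordinates are distinct; the function names are hypothetical, and the binary searches stand in for whatever predecessor structure a real implementation would use.

```python
from bisect import bisect_left, bisect_right

def to_rank_space(points):
    """Replace each coordinate by its 1-based rank among all x (resp. y) values.

    Assumes no two points share an x-coordinate or a y-coordinate.
    """
    xs = sorted(x for x, _ in points)
    ys = sorted(y for _, y in points)
    ranked = [(bisect_left(xs, x) + 1, bisect_left(ys, y) + 1) for x, y in points]
    return ranked, xs, ys

def query_to_rank_space(xs, ys, x1, x2, y1, y2):
    """Translate a query rectangle on original coordinates into rank space."""
    return (bisect_left(xs, x1) + 1, bisect_right(xs, x2),
            bisect_left(ys, y1) + 1, bisect_right(ys, y2))

# Hypothetical input with arbitrary (even non-integer) coordinates.
pts = [(10.5, 7), (3, 100), (42, -1)]
ranked, xs, ys = to_rank_space(pts)
rx1, rx2, ry1, ry2 = query_to_rank_space(xs, ys, 5, 50, 0, 10)
hits = [p for p in ranked if rx1 <= p[0] <= rx2 and ry1 <= p[1] <= ry2]
```

After the translation, every query touches only coordinates in [1..n], which is what lets the later structures store the points in about n lg n bits.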

  6. Background: Succinct Data Structures • What are succinct data structures? (Jacobson 1989) • Representing data structures using, ideally, the information-theoretic minimum space • Supporting efficient navigational operations • Why succinct data structures? • Large data sets in modern applications: textual, genomic, spatial or geometric

  7. Succinct Orthogonal Range Search Structures in Rank Space • Wavelet Trees (Grossi et al. 2003) • Space: n lg n + o(n lg n) bits • Query time for orthogonal range search (Makinen and Navarro 2006): • Restriction: no points have the same x or y coordinates • Counting: O(lg n) • Reporting: O(k lg n) • Applications • Space-efficient text indexes: Makinen and Navarro 2006, Chien et al. 2008
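
The way a wavelet tree answers a counting query can be sketched in Python. This is a pointer-based toy, not a succinct encoding: it stores plain prefix-count lists where a real implementation would use rank-supporting bit vectors. It assumes the points are given as a non-empty sequence `seq` in which `seq[p]` is the y-coordinate of the point whose x-rank is p (0-based); the class and method names are made up for illustration.

```python
class WaveletTree:
    """Toy wavelet tree over a non-empty integer sequence (not space-efficient)."""

    def __init__(self, seq, lo=None, hi=None):
        if lo is None:
            lo, hi = min(seq), max(seq)
        self.lo, self.hi = lo, hi
        self.left = self.right = None
        self.prefix = None
        if lo == hi or not seq:
            return  # leaf, or an empty subtree
        mid = (lo + hi) // 2
        # prefix[p] = number of the first p values that go to the left child
        self.prefix = [0]
        for v in seq:
            self.prefix.append(self.prefix[-1] + (v <= mid))
        self.left = WaveletTree([v for v in seq if v <= mid], lo, mid)
        self.right = WaveletTree([v for v in seq if v > mid], mid + 1, hi)

    def count(self, i, j, a, b):
        """Number of positions p with i <= p < j and a <= seq[p] <= b."""
        if i >= j or b < self.lo or a > self.hi:
            return 0
        if a <= self.lo and self.hi <= b:
            return j - i
        li, lj = self.prefix[i], self.prefix[j]
        return (self.left.count(li, lj, a, b)
                + self.right.count(i - li, j - lj, a, b))

# Points (1,3), (2,1), (3,4), (4,2) in rank space, stored as y-values by x-rank.
ys_by_x = [3, 1, 4, 2]
wt = WaveletTree(ys_by_x)
answer = wt.count(1, 4, 1, 3)  # x-ranks 2..4 (0-based slice 1..3), y in [1, 3]
```

Each query descends the tree once from the root, and the tree has about lg n levels in rank space, which is where the O(lg n) counting bound on the slide comes from.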

  8. Supporting Counting: an Overview • Reduce orthogonal range counting to dominance counting • Design a succinct data structure supporting dominance counting on a narrow grid, i.e. an n×t grid where t = O(lg^ε n) (0 < ε < 1). We also assume that each point has a distinct x-coordinate • Recursively divide the n×n grid into narrow grids and use the above structure at each level • Remove the restriction that each point has a distinct x-coordinate
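
The first step of the overview, reducing range counting to dominance counting, is an inclusion-exclusion over four dominance queries. The sketch below assumes integer coordinates and takes "dominance counting at (x, y)" to mean the number of points (px, py) with px <= x and py <= y; the brute-force `dominance_brute` is a placeholder for whatever dominance structure is actually used.

```python
def dominance_brute(points, x, y):
    """Number of points whose coordinates are both <= the query point's."""
    return sum(1 for px, py in points if px <= x and py <= y)

def range_count_via_dominance(points, x1, x2, y1, y2, dom=dominance_brute):
    """Count points in [x1, x2] x [y1, y2] using four dominance queries."""
    return (dom(points, x2, y2)
            - dom(points, x1 - 1, y2)
            - dom(points, x2, y1 - 1)
            + dom(points, x1 - 1, y1 - 1))

grid_points = [(1, 3), (2, 1), (3, 4), (4, 2)]
total = range_count_via_dominance(grid_points, 2, 4, 1, 3)
```

Because the reduction uses a constant number of dominance queries, any bound for dominance counting carries over to range counting with the same asymptotic cost.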

  9. Range Counting on a Narrow Grid • S = 2 3 4 4 1 3 1 1 3 2 4 2 3… • Divide the grid into blocks of size lg^2 n × t • A 2D array A: A[i,j] stores the result of dominance counting when (i lg^2 n + 1, j) is given as the query point • Divide each block into subblocks of size lg^λ n × t (0 < λ < ε) • A 2D array B: B[i,j] stores, when (i lg^λ n + 1, j) is given as the query point, the result of dominance counting inside the block containing this point • A table C that stores, for each possible set of lg^λ n points on a lg^λ n × t grid and each query point in the grid, the result of dominance counting • Space: n lg t + o(n) bits • Time: O(1)
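
The sampling idea behind array A can be illustrated with a deliberately simplified, one-level sketch. It assumes that S lists the y-coordinates of the points in increasing x order (one point per x), samples dominance counts at every block boundary, and then scans the remainder of the block. The real structure replaces that scan with the second-level array B and the lookup table C to reach O(1) time; the block size, boundary convention, and function names here are assumptions for illustration only.

```python
def build_samples(S, t, b):
    """A[i][j] = number of positions p in 1..i*b with S[p] <= j (1-based p)."""
    A = [[0] * (t + 1)]
    cum = [0] * (t + 1)  # cum[j] = number of values seen so far that are <= j
    for p, v in enumerate(S, start=1):
        for j in range(v, t + 1):
            cum[j] += 1
        if p % b == 0:
            A.append(cum.copy())  # snapshot at the block boundary
    return A

def narrow_dominance(S, A, b, x, y):
    """Number of positions p <= x with S[p] <= y, for 0 <= x <= len(S), 0 <= y <= t."""
    i = x // b                      # last block boundary at or before x
    tail = S[i * b : x]             # positions i*b+1 .. x, scanned directly
    return A[i][y] + sum(1 for v in tail if v <= y)

S = [2, 3, 4, 4, 1, 3, 1, 1, 3, 2, 4, 2, 3]  # sequence shown on the slide
A = build_samples(S, t=4, b=4)
result = narrow_dominance(S, A, 4, 6, 3)
```

In this sketch the scan costs up to b steps per query; splitting each block again and tabulating all small subproblems is what removes that cost in the structure described on the slide.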

  10. Range Counting on an n×n Grid • Transform the original grid into a narrow grid by grouping y-coordinates into ranges of size n/t • Construct orthogonal range search structures for this narrow grid and recurse • Number of levels: log_t n • Time: O(log_t n) • Space: n lg n + o(n lg n) bits
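
The bounds on this slide follow from the narrow-grid bounds by a short calculation, sketched here under the assumption that t = lg^ε n exactly and that each level costs n lg t + o(n) bits and O(1) time, as stated for the narrow grid:

```latex
\begin{align*}
\text{levels} &= \log_t n = \frac{\lg n}{\lg t}
               = \frac{\lg n}{\varepsilon \lg\lg n}
               = O\!\left(\frac{\lg n}{\lg\lg n}\right), \\
\text{time}   &= O(1) \cdot \log_t n = O\!\left(\frac{\lg n}{\lg\lg n}\right), \\
\text{space}  &= \log_t n \cdot \bigl(n \lg t + o(n)\bigr)
               = n \lg n + o(n \lg n) \text{ bits}.
\end{align*}
```

The last line uses log_t n · lg t = lg n for the leading term and log_t n · o(n) = o(n lg n) for the lower-order term.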

  11. More Results • The restriction that each point has a distinct x-coordinate can be removed using 2n + o(n) extra bits • The support for range reporting is based on similar ideas but is more complicated • Our main result • Space: n lg n + o(n lg n) bits • Query time for orthogonal range search • Counting: O(lg n / lglg n) • Reporting: O(k lg n / lglg n)

  12. Applications: Substring Search • Notation: • T: text, n: text size, σ: alphabet size • P: pattern, m: pattern length • occ: number of occurrences • Query: report the occurrences of P in T • Chien et al. 2008: O(n lg σ) bits, O(m + lg n × (log_σ n + occ lg n)) time • Our results: O(n lg σ) bits, O(m + lg n × (log_σ n + occ lg n) / lglg n) time

  13. Applications: Position-Restricted Substring Search • Query: Given a pattern P and a range [i, j], how many times does P occur in T[i, j]? • Makinen and Navarro 2006 • Space: 3n lg n + o(n lg n) bits • Time: O(m + occ lg n) • Our results: • Space: 3n lg n + o(n lg n) bits • Time: O(m + occ lg n / lglg n)
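
One common way to see why this query is a 2D range search is through a suffix array: the suffixes that start with P occupy a contiguous interval of ranks, and the position restriction is an interval of text positions, so the answer is the set of (rank, position) points inside a rectangle. The sketch below assumes that view, uses 0-based positions, and builds everything naively; it is meant to show the shape of the reduction, not the index from the talk, and all names are illustrative.

```python
from bisect import bisect_left, bisect_right

def suffix_array(T):
    """Naive suffix array: starting positions sorted by the suffix they begin."""
    return sorted(range(len(T)), key=lambda k: T[k:])

def restricted_occurrences(T, sa, P, i, j):
    """0-based starts of occurrences of P lying entirely inside T[i..j] (inclusive)."""
    m = len(P)
    # Truncating sorted suffixes to length m keeps them in sorted order,
    # so suffixes prefixed by P form one rank interval [lo, hi).
    prefixes = [T[k:k + m] for k in sa]
    lo = bisect_left(prefixes, P)
    hi = bisect_right(prefixes, P)
    # Rectangle: rank in [lo, hi) and start position in [i, j - m + 1].
    return sorted(k for k in sa[lo:hi] if i <= k <= j - m + 1)

text = "abracadabra"
sa = suffix_array(text)
found = restricted_occurrences(text, sa, "abra", 1, 10)
```

The final filter is written as a linear scan here; replacing it with an orthogonal range reporting structure over the points (rank, sa[rank]) is what ties the query time to the range search bounds.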

  14. Applications: Representing Small Integers • Data: A sequence S of n numbers in [1..s], where s = polylog(n) • Ferragina et al. 2007 • Space: nH0(S) + o(n) bits • Operations: rank/select in O(1) time • Our result: • New operation: Given a range of positions [p1..p2] and a range of values [v1..v2], retrieve the entries in S[p1..p2] whose values are in [v1..v2] • Time: O(1) for counting, O(1) per entry for reporting
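
The semantics of the new operation can be shown with a small, non-succinct sketch. It answers the counting version in constant time from a full table of cumulative counts, which costs far more space than the nH0(S) + o(n) bits on the slide; the table layout and function names are assumptions chosen only to make the query concrete. Positions and values are 1-based, as on the slide.

```python
def build_table(S, s):
    """cnt[p][v] = number of entries among S[1..p] whose value is <= v."""
    cnt = [[0] * (s + 1)]
    for x in S:
        row = cnt[-1].copy()
        for v in range(x, s + 1):
            row[v] += 1
        cnt.append(row)
    return cnt

def count_in_range(cnt, p1, p2, v1, v2):
    """Number of positions p in [p1..p2] with S[p] in [v1..v2]."""
    return (cnt[p2][v2] - cnt[p1 - 1][v2]
            - cnt[p2][v1 - 1] + cnt[p1 - 1][v1 - 1])

def report_in_range(S, p1, p2, v1, v2):
    """(position, value) pairs for the same query, found by a direct scan."""
    return [(p, S[p - 1]) for p in range(p1, p2 + 1) if v1 <= S[p - 1] <= v2]

seq = [2, 3, 4, 4, 1, 3, 1, 1, 3, 2, 4, 2, 3]
table = build_table(seq, 4)
c = count_in_range(table, 3, 8, 1, 3)
entries = report_in_range(seq, 3, 8, 1, 3)
```

The counting function is the same four-term inclusion-exclusion used for dominance counting, applied to positions and values instead of x and y.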

  15. Applications: A Restricted Version of Range Search • Restriction: the query rectangle is defined by two points in the given point set • Notation: • c: the number of bits required to encode the coordinates of a point • Space: cn + n lg n + o(n lg n) bits • Time: • Counting: O(lg n / lglg n) • Reporting: O(k lg n / lglg n)

  16. Conclusions • We designed a succinct data structure for orthogonal range search on an n×n grid that provides more efficient support for both counting and reporting queries • This structure can be used to improve and extend previous results on succinct data structures, such as succinct text indexes and sequence representations

  17. Thank you!
