I/O-Efficient Spatial Data Structures: Observations on the d-Dimensional Grid File
Stuart A. MacGillivray and Bradford G. Nickerson
Faculty of Computer Science, University of New Brunswick, Fredericton, New Brunswick, Canada

Structure Definition and Motivations

• The Grid File is a linear-space structure for storing multidimensional point data on disk, allowing I/O-efficient search.
• Points are stored on disk in cell-blocks of fixed size.
• Subdirectories, stored on disk in fixed-size blocks, contain pointers to these cell-blocks and linear scales describing their extent.
• Main memory M contains pointers to the subdirectories and coarser linear scales.
• Retrieval of a given point is thus possible in two disk accesses: one to retrieve the appropriate subdirectory, and one to retrieve the block of points.
• Cells and subdirectories are spatially determined. Partitioning takes place dynamically as points are added.
• Primary motivation: storage of large amounts of data (e.g. millions to billions of data points) and retrieval of data with minimal I/O operations.
• Assumptions for tests: disk page size of 4 kB, two-dimensional 32-bit indexing, 24 bytes per point, B = 900 blocks per subdirectory, C = 170 points per block.

[Figure: grid file partition of the test data, with x and y linear scales and two example range queries, R1 and R2. Test data and map courtesy of USGS Navigation Boomer Survey, http://quashnet.er.usgs.gov/data/1999/99023/navigation/boomer/index.htm]

Range Queries and Limitations

• Range searches require few disk accesses; the worst-case scenario, i.e. R2, would require $2^d$ times as many disk accesses as retrieval of a single point.
• Range search is possible in $O(2^d + K/B)$ I/Os.
• The main directory is limited by the size of main memory.
• Assuming main memory M of 4 GB and the test assumptions above, a grid file could index a maximum of $1.53 \times 10^{14}$ points (about 150 trillion), or roughly 3.7 petabytes of data at 24 bytes per point.
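
The two-access point lookup and the $2^d$ worst case for range queries can be illustrated with a small sketch. It assumes each linear scale is a sorted list of split positions per axis and models the disk as a dictionary that counts page reads. The names `locate`, `cells_in_range`, `ToyDisk`, and `contains_point`, along with the page identifiers and coordinates, are hypothetical and not taken from the poster.

```python
from bisect import bisect_right
from itertools import product


def locate(scales, point):
    """Map a point to its cell index using one sorted split list per axis."""
    return tuple(bisect_right(s, x) for s, x in zip(scales, point))


def cells_in_range(scales, lo, hi):
    """List every cell overlapped by the axis-aligned box from lo to hi."""
    first, last = locate(scales, lo), locate(scales, hi)
    spans = [range(a, b + 1) for a, b in zip(first, last)]
    return list(product(*spans))


class ToyDisk:
    """Stand-in for a disk: a dict of pages plus a counter of page reads."""

    def __init__(self, pages):
        self.pages = pages
        self.reads = 0

    def read(self, page_id):
        self.reads += 1
        return self.pages[page_id]


def contains_point(coarse_scales, directory, disk, point):
    """Point lookup: memory-resident directory, then exactly two page reads."""
    sub = disk.read(directory[locate(coarse_scales, point)])       # access 1
    block = disk.read(sub["cells"][locate(sub["scales"], point)])  # access 2
    return point in block


# Hypothetical 2-D layout: one split per axis gives 2 x 2 subdirectory regions.
coarse_scales = [[50.0], [50.0]]
directory = {(0, 1): "sub-01"}  # only the region used below is populated
disk = ToyDisk({
    "sub-01": {"scales": [[25.0], [75.0]], "cells": {(0, 0): "blk-a"}},
    "blk-a": [(20.0, 70.0)],
})

found = contains_point(coarse_scales, directory, disk, (20.0, 70.0))
inside = cells_in_range(coarse_scales, (10.0, 10.0), (20.0, 20.0))
straddling = cells_in_range(coarse_scales, (45.0, 45.0), (55.0, 55.0))
print(found, disk.reads, len(inside), len(straddling))
```

In this toy layout the lookup reads two pages, the small box falls inside a single subdirectory region, and the box that straddles both splits touches $2^2 = 4$ regions, which is the kind of worst case the poster attributes to R2.
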
• Other limitations stem from the structure itself: as currently written, divisions between blocks and subdirectories are determined as points are added to the file.
• Poorly distributed points may result in an inefficient structure.

Theoretical Extensions

• The structure is extensible with additional layers of subdirectories, i.e. k layers.
• Every additional layer increases the number of disk accesses needed to retrieve a point by 1, and increases the capacity by a factor of B.
• k layers store up to $N = B^k C M$ points, e.g. k = 4 gives $N = 1.2 \times 10^{20}$.
• Sufficient extension in this regard gives logarithmic time for point and range queries.
• Single point retrieval: 1 + k disk accesses, with $k \geq \log_B\left(\frac{N}{CM}\right)$.

References:

• Jürg Nievergelt, Hans Hinterberger, and Kenneth C. Sevcik. The grid file: An adaptable, symmetric multikey file structure. ACM Trans. Database Syst., 9(1):38-71, 1984.
• Klaus Hinrichs. Implementation of the grid file: Design concepts and experience. BIT, 25(4):569-592, 1985.
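
The capacity formula $N = B^k C M$ from the Theoretical Extensions section can be checked numerically with a short sketch. It assumes that M counts directory pointers held in main memory, taken here as $4 \times 10^9$ bytes divided by 4-byte (32-bit) pointers. The function names `capacity` and `layers_needed` are hypothetical.

```python
PAGE_BYTES = 4096      # 4 kB disk page
POINT_BYTES = 24       # bytes per stored point
POINTER_BYTES = 4      # 32-bit indexing
B = 900                # blocks per subdirectory
C = PAGE_BYTES // POINT_BYTES  # points per block: 170

# Assumption: M counts directory pointers, i.e. 4 * 10^9 bytes / 4 bytes each.
M = 4 * 10**9 // POINTER_BYTES


def capacity(k, b=B, c=C, m=M):
    """Maximum number of points with k subdirectory layers: B^k * C * M."""
    return b**k * c * m


def layers_needed(n, b=B, c=C, m=M):
    """Smallest k >= 1 with B^k * C * M >= n, i.e. k >= log_B(n / (C * M))."""
    k = 1
    while capacity(k, b, c, m) < n:
        k += 1
    return k


# Layers, disk accesses per point lookup (1 + k), and capacity in points.
for k in range(1, 5):
    print(k, 1 + k, f"{capacity(k):.3g}")

print(layers_needed(10**18))
```

Under these assumptions, k = 1 reproduces the $1.53 \times 10^{14}$ figure, and the formula reaches about $1.2 \times 10^{20}$ at k = 3, so the poster's k = 4 example may be counting layers differently. The integer loop is used instead of a floating-point logarithm to avoid rounding at exact powers of B.
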