The Quad tree • The index is represented as a quaternary tree • Each internal node has four children, one per quadrant: NW, NE, SW, SE • Each leaf is associated with a disk page, which stores the index entries
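The structure above can be sketched in a few lines of Python. This is only an illustrative in-memory version: the class name `QuadNode`, the `capacity` parameter (standing in for the number of entries that fit on a leaf's disk page), and the split-on-overflow policy are assumptions, not details given on the slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class QuadNode:
    cx: float            # centre of the square region covered by this node
    cy: float
    half: float          # half of the region's side length
    capacity: int = 4    # assumed page capacity (entries per leaf)
    points: List[Point] = field(default_factory=list)
    children: Optional[List["QuadNode"]] = None  # order: NW, NE, SW, SE

    def _child_index(self, p: Point) -> int:
        """Return 0..3 for the quadrant (NW, NE, SW, SE) containing p."""
        x, y = p
        east = x >= self.cx
        north = y >= self.cy
        if north:
            return 1 if east else 0
        return 3 if east else 2

    def insert(self, p: Point) -> None:
        if self.children is not None:
            self.children[self._child_index(p)].insert(p)
            return
        self.points.append(p)
        # Split an overflowing leaf; the size guard stops endless
        # splitting when many identical points are inserted.
        if len(self.points) > self.capacity and self.half > 1e-9:
            self._split()

    def _split(self) -> None:
        h = self.half / 2
        self.children = [
            QuadNode(self.cx - h, self.cy + h, h, self.capacity),  # NW
            QuadNode(self.cx + h, self.cy + h, h, self.capacity),  # NE
            QuadNode(self.cx - h, self.cy - h, h, self.capacity),  # SW
            QuadNode(self.cx + h, self.cy - h, h, self.capacity),  # SE
        ]
        old, self.points = self.points, []
        for q in old:
            self.children[self._child_index(q)].insert(q)

    def contains(self, p: Point) -> bool:
        """Descend to the single leaf whose region covers p."""
        node = self
        while node.children is not None:
            node = node.children[node._child_index(p)]
        return p in node.points
```

A point lookup follows exactly one root-to-leaf path, so in a disk-based setting only the leaf's page would have to be read.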
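The kd-tree • A binary tree • At each node, one dimension is selected and the space is partitioned into two along it • Disk-based variants: KDB-tree, skd • [Figure: kd-tree partitioning of the plane, with splits labelled X=x1 and Y=y1 and boundary values x1, y1, y2]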
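A minimal in-memory sketch of the idea follows. The slide only says that "a dimension is selected" at each node; cycling through the dimensions by depth and splitting at the median are common choices assumed here, and the function names `build_kdtree` and `range_search` are illustrative.

```python
def build_kdtree(points, depth=0):
    """Build a kd-tree as nested dicts; returns None for an empty set."""
    if not points:
        return None
    k = len(points[0])
    axis = depth % k                      # assumed rule: cycle the dimensions
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2                   # assumed rule: split at the median
    return {
        "point": pts[mid],
        "axis": axis,
        "left": build_kdtree(pts[:mid], depth + 1),       # coord <= split
        "right": build_kdtree(pts[mid + 1:], depth + 1),  # coord >= split
    }


def range_search(node, lo, hi, out=None):
    """Collect all points p with lo[i] <= p[i] <= hi[i] in every dimension."""
    if out is None:
        out = []
    if node is None:
        return out
    p, axis = node["point"], node["axis"]
    if all(l <= c <= h for l, c, h in zip(lo, p, hi)):
        out.append(p)
    # Only descend into a side that can overlap the query range.
    if lo[axis] <= p[axis]:
        range_search(node["left"], lo, hi, out)
    if p[axis] <= hi[axis]:
        range_search(node["right"], lo, hi, out)
    return out
```

For example, `range_search(build_kdtree(pts), (3, 1), (8, 5))` visits only the subtrees whose half-space overlaps the rectangle with corners (3, 1) and (8, 5).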
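Mapping -- Indexing using PAMs • Map the MBR (minimum bounding rectangle) in 2-d into a point in 4-d: [(x1, x2), (y1, y2)] → (x1, x2, y1, y2) • Transform the query into the new space • Use a 4-d PAM (point access method) to answer queries • [Figure: query Q and objects a, b, shown before and after the mapping]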
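The mapping and the query transformation can be illustrated as follows. The sketch assumes an intersection query: a rectangle [(x1, x2), (y1, y2)] intersects a query [(qx1, qx2), (qy1, qy2)] exactly when x1 <= qx2, x2 >= qx1, y1 <= qy2 and y2 >= qy1, which is a (partly unbounded) box in the 4-d space. The function names are hypothetical, and a plain linear scan stands in for the 4-d PAM.

```python
import math


def mbr_to_point(mbr):
    """Map [(x1, x2), (y1, y2)] to the 4-d point (x1, x2, y1, y2)."""
    (x1, x2), (y1, y2) = mbr
    return (x1, x2, y1, y2)


def intersection_query_to_box(query):
    """Transform a 2-d intersection query into a 4-d range (lo, hi)."""
    (qx1, qx2), (qy1, qy2) = query
    lo = (-math.inf, qx1, -math.inf, qy1)   # x1 free below, x2 >= qx1, ...
    hi = (qx2, math.inf, qy2, math.inf)     # x1 <= qx2, x2 free above, ...
    return lo, hi


def in_box(p, lo, hi):
    """4-d range test; a real system would use a 4-d PAM instead."""
    return all(l <= c <= h for l, c, h in zip(lo, p, hi))


def intersecting(mbrs, query):
    lo, hi = intersection_query_to_box(query)
    return [m for m in mbrs if in_box(mbr_to_point(m), lo, hi)]
```

Other query types (containment, enclosure) would map to different 4-d boxes; only intersection is shown here.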
Space Filling Curves • Assumption: attribute values can be represented with some fixed number of bits • Space domain on each dimension: 2^k values • Linearize the domain • Each point can be represented by a single one-dimensional value
Z-ordering • [Figure: 4×4 grid with both axes labelled 00, 01, 10, 11]
Z-ordering • The z-value is obtained by interleaving the bits of the coordinates • E.g. X=01, Y=11 → z-value = 0111 = 7 • Clustering effect: points close in X-Y space tend to have close z-values • z-values can be indexed using B+-trees • Range queries: problematic?
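The bit interleaving can be written out directly. The sketch below takes the X bit first in each pair, which is the order that reproduces the slide's example (X=01, Y=11 gives 0111); the function names and the `bits` parameter are illustrative. The second helper hints at one reading of the slide's closing question: the cells of a rectangle do not, in general, map to one contiguous run of z-values.

```python
def z_value(x, y, bits):
    """Interleave the low `bits` bits of x and y, X bit first."""
    z = 0
    for i in range(bits - 1, -1, -1):
        z = (z << 1) | ((x >> i) & 1)
        z = (z << 1) | ((y >> i) & 1)
    return z


def z_values_in_rect(x_lo, x_hi, y_lo, y_hi, bits):
    """Sorted z-values of every cell in an axis-aligned rectangle."""
    return sorted(
        z_value(x, y, bits)
        for x in range(x_lo, x_hi + 1)
        for y in range(y_lo, y_hi + 1)
    )


print(z_value(0b01, 0b11, 2))            # 7, as in the example above
print(z_values_in_rect(1, 2, 0, 1, 2))   # [2, 3, 8, 9]
```

The rectangle in the second call covers four cells but spans z-values 2 to 9, so a single B+-tree range scan over [2, 9] would also touch cells outside the query.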
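Z-ordering & Locational Keys • [Figure: 4×4 grid with both axes labelled 00, 01, 10, 11; cells of varying size carry the locational keys 100, 110, 120, 130, 140, 141, 142, 143, 144; keys 120 and 140 are shown alongside the z-value prefixes 011* and 110*]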
Hilbert Curve • [Figure: 8×8 grid with both axes labelled with 3-bit values 000 to 111]
Grid Files • Based on extendible hashing • Design principle: any point query can be answered in at most 2 disk accesses • Two structures: a k-dimensional array (the directory) and k one-dimensional arrays (the linear scales)
Scales, Directory, Bucket • Data structures: • Linear scales • Directory: an array whose elements are in one-to-one correspondence with the grid cells; each entry points to a data bucket • Data buckets
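How the three structures cooperate on a point query can be sketched for the 2-d case. Everything here is an in-memory stand-in: the class name `GridFile2D`, the representation of the scales as sorted lists of split boundaries, and the dictionary of buckets are assumptions for illustration. The comments mark where the two disk accesses would occur if the scales are kept in main memory.

```python
from bisect import bisect_right


class GridFile2D:
    def __init__(self, x_scale, y_scale, directory):
        # Linear scales: sorted split boundaries along each dimension.
        self.x_scale = list(x_scale)
        self.y_scale = list(y_scale)
        # Directory: directory[i][j] is the bucket id of grid cell (i, j).
        # Several cells may carry the same id, i.e. share one bucket.
        self.directory = directory
        self.buckets = {b: [] for row in directory for b in row}

    def cell(self, x, y):
        """Use the scales to turn coordinates into directory indices."""
        return bisect_right(self.x_scale, x), bisect_right(self.y_scale, y)

    def insert(self, x, y):
        i, j = self.cell(x, y)
        self.buckets[self.directory[i][j]].append((x, y))

    def point_query(self, x, y):
        i, j = self.cell(x, y)               # scales: assumed in memory
        bucket_id = self.directory[i][j]     # access 1: directory entry
        return (x, y) in self.buckets[bucket_id]  # access 2: data bucket


# One split on x (at 10) and two on y (at 5 and 8) give a 2 x 3 grid;
# cells (0,0) and (0,1) share bucket 0, cells (1,1) and (1,2) share bucket 3.
gf = GridFile2D([10], [5, 8], [[0, 0, 1], [2, 3, 3]])
gf.insert(3, 2)
gf.insert(3, 6)
gf.insert(12, 9)
```

Overflow handling (splitting a bucket and refining a scale) is left out of this sketch.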
Grid Files • One page is associated with each cell • When a cell overflows, it is split into two cells and its points are redistributed between them • Two adjacent cells (buddies) can reference the same page
Grid Files... • Repetitive splitting by halving • Merging based on buddy system • Regions are represented as (cx, cy, dx, dy) • Point queries: cx-dx <= qx <= cx+dx and cy-dy <= qy <= cy+dy
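The region test and the halving step can be written down directly from the inequalities above, which imply that (cx, cy) is the centre of a region and dx, dy are its half-extents. The helper names `region_contains` and `halve` are illustrative only.

```python
def region_contains(region, q):
    """Point-query test: is q = (qx, qy) inside region (cx, cy, dx, dy)?"""
    cx, cy, dx, dy = region
    qx, qy = q
    return cx - dx <= qx <= cx + dx and cy - dy <= qy <= cy + dy


def halve(region, axis):
    """Split a region into two buddies of equal size along 'x' or 'y'."""
    cx, cy, dx, dy = region
    if axis == "x":
        return (cx - dx / 2, cy, dx / 2, dy), (cx + dx / 2, cy, dx / 2, dy)
    return (cx, cy - dy / 2, dx, dy / 2), (cx, cy + dy / 2, dx, dy / 2)


left, right = halve((4, 4, 4, 4), "x")
print(left, right)   # (2.0, 4, 2.0, 4) (6.0, 4, 2.0, 4)
```

Because the inequalities are closed on both sides, a point lying exactly on the shared boundary satisfies the test for both buddies; a real implementation would need a tie-breaking rule.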
Grid Files... • [Figure: grid regions A to F, showing a region's centre (cx, cy), its extents dx and dy, and a query point (qx, qy)]