830 likes | 852 Views
Explore the concepts of R-trees, Z-ordering, and other spatial indexing methods to efficiently organize geometric objects for spatial queries. Learn about insertion, splitting, search, and deletion algorithms in multidimensional spaces.
E N D
Spatial Indexing Many slides taken from George Kollios, Boston University
Spatial Indexing • Point Access Methods can index only points. What about regions? • Use the transformation technique • Z-ordering and quadtrees • New methods: Spatial Access Methods SAMs • R-tree and variations
Problem • Given a collection of geometric objects (points, lines, polygons, ...) • organize them on disk, to answer spatial queries (range, nn, etc)
R-tree • z-ordering: cuts regions to pieces -> dup. elim. • how could we avoid that? • Idea: try to extend/merge B-trees and k-d trees • R-tree: a generalization of the B+-tree for multidimensional spaces
Kd-Btrees • [Robinson, 81]: if f is the fanout, split point-set in f parts; and so on, recursively
Kd-Btrees • But: insertions/deletions are tricky (splits may propagate downwards and upwards) • no guarantee on space utilization
R-trees • [Guttman 84] Main idea: allow parents to overlap! • => guaranteed 50% utilization • => easier insertion/split algorithms. • (only deal with Minimum Bounding Rectangles - MBRs)
R-trees • A multi-way external memory tree • Index nodes and data (leaf) nodes • All leaf nodes appear on the same level • Every node contains between m and M entries • The root node has at least 2 entries (children)
Example • eg., w/ fanout 4: group nearby rectangles to parent MBRs; each group -> disk page I C A G H F B J E D
A H D F G B E I C J Example • F=4 P1 P3 I C A G H F B J E P4 P2 D
P1 A H D F P2 G B E I P3 C J P4 Example • F=4 P1 P3 I C A G H F B J E P4 P2 D
P1 A P2 B P3 C P4 R-trees - format of nodes • {(MBR; obj_ptr)} for leaf nodes x-low; x-high y-low; y-high ... obj ptr ...
P1 A P2 B P3 C P4 R-trees - format of nodes • {(MBR; node_ptr)} for non-leaf nodes x-low; x-high y-low; y-high ... node ptr ...
P1 A H D F P2 G B E I P3 C J P4 R-trees:Search P1 P3 I C A G H F B J E P4 P2 D
P1 A H D F P2 G B E I P3 C J P4 R-trees:Search P1 P3 I C A G H F B J E P4 P2 D
R-trees:Search • Main points: • every parent node completely covers its ‘children’ • a child MBR may be covered by more than one parent - it is stored under ONLY ONE of them. (ie., no need for dup. elim.) • a point query may follow multiple branches. • everything works for any(?) dimensionality
P1 A H D F P2 G B E I P3 C J P4 R-trees:Insertion Insert X P1 P3 I C A G H F B X J E P4 P2 D X
P1 A H D F P2 G B E I P3 C J P4 R-trees:Insertion Insert Y P1 P3 I C A G H F B J Y E P4 P2 D
P1 A H D F P2 G B E I P3 C J P4 R-trees:Insertion • Extend the parent MBR P1 P3 I C A G H F B J Y E P4 P2 D Y
R-trees:Insertion • How to find the next node to insert the new object? • Using ChooseLeaf: Find the entry that needs the least enlargement to include Y. Resolve ties using the area (smallest) • Other methods (later)
P1 A H D F P2 G B E I P3 C J P4 R-trees:Insertion • If node is full then Split : ex. Insert w P1 P3 K I C A G W H F B J K E P4 P2 D
R-trees:Split • Split node P1: partition the MBRs into two groups. • (A1: plane sweep, • until 50% of rectangles) • A2: ‘linear’ split • A3: quadratic split • A4: exponential split: • 2M-1 choices P1 K C A W B
seed2 R R-trees:Split • pick two rectangles as ‘seeds’; • assign each rectangle ‘R’ to the ‘closest’ ‘seed’ seed1
seed2 R R-trees:Split • pick two rectangles as ‘seeds’; • assign each rectangle ‘R’ to the ‘closest’ ‘seed’: • ‘closest’: the smallest increase in area seed1
R-trees:Split • How to pick Seeds: • Linear:Find the highest and lowest side in each dimension, normalize the separations, choose the pair with the greatest normalized separation • Quadratic: For each pair E1 and E2, calculate the rectangle J=MBR(E1, E2) and d= J-E1-E2. Choose the pair with the largest d
R-trees:Insertion • Use the ChooseLeaf to find the leaf node to insert an entry E • If leaf node is full, then Split, otherwise insert there • Propagate the split upwards, if necessary • Adjust parent nodes
R-Trees:Deletion • Find the leaf node that contains the entry E • Remove E from this node • If underflow: • Eliminate the node by removing the node entries and the parent entry • Reinsert the orphaned (other entries) into the tree using Insert • Other method (later)
R-trees: Variations • R+-tree: DO not allow overlapping, so split the objects (similar to z-values) • R*-tree: change the insertion, deletion algorithms (minimize not only area but also perimeter, forced re-insertion ) • Hilbert R-tree: use the Hilbert values to insert objects into the tree
R-trees - variations • what about static datasets (no ins/del/upd)? • Q: Best way to pack points?
R-trees - variations • what about static datasets (no ins/del/upd)? • Q: Best way to pack points? • A1: plane-sweep great for queries on ‘x’; terrible for ‘y’
R-trees - variations • what about static datasets (no ins/del/upd)? • Q: Best way to pack points? • A1: plane-sweep great for queries on ‘x’; bad for ‘y’
R-trees - variations • what about static datasets (no ins/del/upd)? • Q: Best way to pack points? • A1: plane-sweep great for queries on ‘x’; terrible for ‘y’ • Q: how to improve?
R-trees - variations • A: plane-sweep on HILBERT curve!
R-trees - variations • A: plane-sweep on HILBERT curve! • In fact, it can be made dynamic (how?), as well as to handle regions (how?)
R-trees - variations • Dynamic (‘Hilbert R-tree): • each point has an ‘h’-value (hilbert value) • insertions: like a B-tree on the h-value • but also store MBR, for searches
R-trees - variations • what about other bounding shapes? (and why?) • A1: arbitrary-orientation lines (cell-tree, [Guenther] • A2: P-trees (polygon trees) (MB polygon: 0, 90, 45, 135 degree lines)
R-trees - variations • A3: L-shapes; holes (hB-tree) • A4: TV-trees [Lin+, VLDB-Journal 1994] • A5: SR-trees [Katayama+, SIGMOD97] (used in Informedia)
R-trees - conclusions • Popular method; like multi-d B-trees • guaranteed utilization • good search times (for low-dim. at least) • Informix ships DataBlade with R-trees
Spatial Queries • Given a collection of geometric objects (points, lines, polygons, ...) • organize them on disk, to answer • point queries • range queries • k-nn queries • spatial joins (‘all pairs’ queries)
Spatial Queries • Given a collection of geometric objects (points, lines, polygons, ...) • organize them on disk, to answer • point queries • range queries • k-nn queries • spatial joins (‘all pairs’ queries)
Spatial Queries • Given a collection of geometric objects (points, lines, polygons, ...) • organize them on disk, to answer • point queries • range queries • k-nn queries • spatial joins (‘all pairs’ queries)
Spatial Queries • Given a collection of geometric objects (points, lines, polygons, ...) • organize them on disk, to answer • point queries • range queries • k-nn queries • spatial joins (‘all pairs’ queries)
Spatial Queries • Given a collection of geometric objects (points, lines, polygons, ...) • organize them on disk, to answer • point queries • range queries • k-nn queries • spatial joins (‘all pairs’ queries)
R-trees - Range search pseudocode: check the root for each branch, if its MBR intersects the query rectangle apply range-search (or print out, if this is a leaf)
P1 P3 I C A G H F B J q E P4 P2 D R-trees - NN search
P1 P3 I C A G H F B J q E P4 P2 D R-trees - NN search • Q: How? (find near neighbor; refine...)
R-trees - NN search • A1: depth-first search; then, range query P1 I P3 C A G H F B J E P4 q P2 D
R-trees - NN search • A1: depth-first search; then, range query P1 P3 I C A G H F B J E P4 q P2 D
R-trees - NN search • A1: depth-first search; then, range query P1 P3 I C A G H F B J E P4 q P2 D