1 / 37

Spatial Indexing I R-trees

Learn about the R-tree data structure for organizing geometric objects on disk to efficiently answer spatial queries such as range and nearest neighbor searches. Understand the grouping of objects close in space in nodes and the insertion/split algorithms.

nannier
Download Presentation

Spatial Indexing I R-trees

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Spatial Indexing IR-trees

  2. Problem • Given a collection of geometric objects (points, lines, polygons, ...) • organize them on disk, to answer efficiently spatial queries (range, nn, etc)

  3. R-tree • In multidimensional space, there is no unique ordering! Not possible to use B+-trees • [Guttman 84] R-tree! • Group objects close in space in the same node • => guaranteed page utilization • => easy insertion/split algorithms. • (only deal with Minimum Bounding Rectangles - MBRs)

  4. R-tree • A multi-way external memory tree • Index nodes and data (leaf) nodes • All leaf nodes appear on the same level • Every node contains between m and M entries • The root node has at least 2 entries (children)

  5. Example • eg., w/ fanout 4: group nearby rectangles to parent MBRs; each group -> disk page I C A G H F B J E D

  6. A H D F G B E I C J Example • F=4 P1 P3 I C A G H F B J E P4 P2 D

  7. A H D F G B E I C J Example • F=4| m=2, M=4 P5 P6 I P6 P1 P2 P3 P4 C A P1 G P3 H F B J E P4 P2 D P5

  8. P1 A P2 B P3 C P4 R-trees - format of nodes • {(MBR; obj_ptr)} for leaf nodes x-low; x-high y-low; y-high ... obj ptr ...

  9. P1 A P2 B P3 C P4 R-trees - format of nodes • {(MBR; node_ptr)} for non-leaf nodes x-low; x-high y-low; y-high ... node ptr ...

  10. A H D F G B E I C J R-trees:Search P5 P6 I P1 P2 P3 P4 C P1 A G H P3 F B J E P4 P2 D

  11. A D H F G B E I C J R-trees:Search P5 P6 I P1 P2 P3 P4 C P1 A G H P3 F B J E P4 P2 D

  12. R-trees:Search • Main points: • every parent node completely covers its ‘children’ • nodes in the same level may overlap! • a child MBR may be covered by more than one parent - it is stored under ONLY ONE of them. (ie., no need for dup. elim.) • a point query may follow multiple branches. • everything works for any(?) dimensionality

  13. P1 A H D F P2 G B E I P3 C J P4 R-trees:Insertion Insert X P1 P3 I C A G H F B X J E P4 P2 D X

  14. P1 A H D F P2 G B E I P3 C J P4 R-trees:Insertion Insert Y P1 P3 I C A G H F B J Y E P4 P2 D

  15. P1 A H D F P2 G B E I P3 C J P4 R-trees:Insertion • Extend the parent MBR P1 P3 I C A G H F B J Y E P4 P2 D Y

  16. R-trees:Insertion • How to find the next node to insert a new object Y? • Using ChooseLeaf: Find the entry that needs the least enlargement to include Y. Resolve ties using the area (smallest) • Other methods (later)

  17. P1 A H D F P2 G B E I P3 C J P4 R-trees:Insertion • If node is full then Split : ex. Insert w P1 P3 K I C A G W H F B J K E P4 P2 D

  18. Q1 P3 P1 D H A C F Q2 P5 G K B E I P2 W J P4 R-trees:Insertion • If node is full then Split : ex. Insert w P3 P5 I K C A P1 G W H F B J E P4 P2 D Q2 Q1

  19. R-trees:Split • Split node P1: partition the MBRs into two groups. • (A1: plane sweep, • until 50% of rectangles) • A2: ‘linear’ split • A3: quadratic split • A4: exponential split: • 2M-1 choices P1 K C A W B

  20. seed2 R R-trees:Split pick two rectangles as ‘seeds’for group 1 and group 2; seed1

  21. seed2 R R-trees:Split • pick two rectangles as ‘seeds’for group 1 and group 2; • assign each rectangle ‘R’ to the ‘closest’‘group’: • ‘closest’: the smallest increase in area seed1

  22. R-trees:Linear Split • How to pick Seeds: • Find the rects with the highest low and lowest high sides in each dimension • Normalize the separations by dividing by the width of all the rects in the corresponding dim • Choose the pair with the greatest normalized separation • PickNext: pick one of the remaining rects and add it to the closest group

  23. R-trees: Quadratic Split • How to pick Seeds: • For each pair E1 and E2, calculate the rectangle J=MBR(E1, E2) and d= J-E1-E2. Choose the pair with the largest d • PickNext: • For each remaining rect calculate the area increase to include it in group d1 and d2 • Choose rect with highest difference: |d1-d2| • Assign to closest group

  24. R-trees:Insertion • Use the ChooseLeaf to find the leaf node to insert an entry E • If leaf node is full, then Split, otherwise insert there • Propagate the split upwards, if necessary • Adjust parent nodes

  25. R-Trees:Deletion • Find the leaf node that contains the entry E • Remove E from this node • If underflow: • Eliminate the node by removing the node entries and the parent entry • Reinsert the orphaned (other entries) into the tree using Insert • Other method (later)

  26. R-trees: Variations • R+-tree: DO not allow overlapping, so split the objects (similar to z-values) • R*-tree: change the insertion, deletion algorithms (minimize not only area but also perimeter, forced re-insertion )

  27. For simplicity node capacity = 3 In practice, in the order of 100

  28. R-Tree, Leaf Nodes

  29. R-Tree – Intermediate Nodes

  30. R-tree, Range Query y axis 10 m g h l 8 k f e E 2 6 i j E d 1 4 b a 2 c x axis 10 0 8 2 4 6 Root E E 1 2 E E E E E E 1 E 3 4 5 6 7 2 e a c d g b f j m i l h k E E E E E 4 5 3 6 7

  31. Range Query y axis 10 m g h l 8 k f e E 2 6 i j E d 1 4 b a 2 c x axis 10 0 8 2 4 6 Root E E 1 2 E E E E E E 1 E 3 4 5 6 7 2 e a c d g b f j m i l h k E E E E E 4 5 3 6 7

  32. R-tree • The original R-tree tries to minimize the area of each enclosing rectangle in the index nodes. • Is there any other property that can be optimized? R*-tree  Yes!

  33. R*-tree • Optimization Criteria: • (O1) Area covered by an index MBR • (O2) Overlap between directory MBRs • (O3) Margin of a directory rectangle • (O4) Storage utilization • Sometimes it is impossible to optimize all the above criteria at the same time!

  34. R*-tree • ChooseLeaf: • If next node is not a leaf node, choose the node using the following criteria: • Least overlap enlargement • Least area enlargement • Smaller area • Else • Least area enlargement • Smaller area

  35. R*-tree • SplitNode • Choose the axis to split • Choose the two groups along the chosen axis • ChooseSplitAxis • Along each axis, sort rectangles and break them into two groups (M-2m+2 possible ways where one group contains at least m rectangles). Compute the sum S of all margin-values (perimeters) of each pair of groups. Choose the one that minimizes S • ChooseSplitIndex • Along the chosen axis, choose the grouping that gives the minimum overlap-value

  36. R*-tree • Forced Reinsert: • defer splits, by forced-reinsert, i.e.: instead of splitting, temporarily delete some entries, shrink overflowing MBR, and re-insert those entries • Which ones to re-insert? • The ones that are further way from the center • How many? A: 30%

More Related