The R+-Tree: A Dynamic Index for Multi-Dimensional Objects
Timos K. Sellis et al., VLDB 1987
Jae-hoon Kim
Introduction
DBMSs store one-dimensional data
• Integers
• Real numbers
• Strings
DBMSs do not handle multi-dimensional data adequately
• Boxes
• Polygons
• Points in multi-dimensional space
Methods for Multi-dimensional Data
The most common case of multi-dimensional data is points
The main idea is to divide the whole space into disjoint sub-regions
Each sub-region contains no more than C points
• C is the capacity of a disk page
Inserting a new point into a full sub-region → the region is partitioned (split)
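The capacity rule above can be sketched in a few lines. This is only an illustration, not a method from the slides: the function name `split_region`, the median cut, and the alternating axis (a k-d-tree-like choice) are all assumptions.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def split_region(points: List[Point], capacity: int, axis: int = 0) -> List[List[Point]]:
    """Split recursively until every sub-region holds at most `capacity` points.

    Assumption: the cut is placed at the median along `axis`, and the axis
    alternates between x (0) and y (1) at each level.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    if len(points) <= capacity:
        return [points]
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2          # both halves are non-empty here
    next_axis = 1 - axis
    return (split_region(ordered[:mid], capacity, next_axis)
            + split_region(ordered[mid:], capacity, next_axis))

# Hypothetical data: ten points, page capacity C = 3
points = [(float(i), float(i % 3)) for i in range(10)]
regions = split_region(points, capacity=3)
```

Every returned sub-region fits in one page, and no point is lost or duplicated by the splits.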
Classification of known methods
Position
• Fixed: the position of the splitting hyperplane is predetermined (grid file)
• Adaptable: the data points determine the position of the hyperplane (k-d tree)
Dimensionality
• 1-d cut: k-d tree
• k-d cut: quad-tree, oct-tree
Locality
• Grid: a split cuts not only the affected region but also all other regions the hyperplane crosses
• Brickwall: the splitting hyperplane is restricted to extend only inside the region being split
Methods for Rectangles
Transform rectangles into points in a higher-dimensional space
• 2-d rectangle → a point in 4-d space
• Then use k-d trees, or a grid file after a rotation of the axes
Use a space-filling curve
• Maps a k-d space to a 1-d space
• Transforms a k-dimensional object into line segments (z-transform)
Divide the original space into sub-regions
• Disjoint: the point methods mentioned before can be used; a rectangle that crosses a splitting hyperplane is cut in two pieces, and the pieces are tagged
• Overlapping: R-tree, the first proposed use of overlapping sub-regions
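The first two ideas can be illustrated briefly. This is a minimal sketch, assuming integer grid coordinates and bit interleaving (a Morton code) as the z-transform; the names `rect_to_point` and `z_value` are hypothetical.

```python
from typing import Tuple

def rect_to_point(xlow: float, xhigh: float, ylow: float, yhigh: float) -> Tuple[float, float, float, float]:
    """A 2-d rectangle seen as a single point in 4-d space."""
    return (xlow, xhigh, ylow, yhigh)

def z_value(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y to map a 2-d grid cell to a 1-d key.

    Assumption: x and y are non-negative integers smaller than 2**bits.
    Bit i of x goes to bit 2*i of the key, bit i of y to bit 2*i + 1.
    """
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

p = rect_to_point(1.0, 3.0, 2.0, 5.0)
key = z_value(3, 5)   # x = 011, y = 101 in binary
```

Cells that are close on the curve get nearby keys, which is what lets a one-dimensional index store them.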
R-Tree
Extension of the B-tree
Height-balanced tree
Nodes consist of MBRs (minimum bounding rectangles)
Guarantees that space utilization is at least 50%
R-Tree Split
A new entry that overflows a node forces a split
Requirements of a “good” split
• Minimize the total area of the resulting nodes
• Minimize the overlap between them
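The two criteria can be made concrete by scoring a candidate split. The sketch below is an illustration under assumptions: rectangles are tuples `(xlow, xhigh, ylow, yhigh)`, and `split_cost` is a hypothetical helper, not an actual R-tree split algorithm.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xlow, xhigh, ylow, yhigh)

def mbr(rects: List[Rect]) -> Rect:
    """Minimum bounding rectangle of a non-empty group."""
    return (min(r[0] for r in rects), max(r[1] for r in rects),
            min(r[2] for r in rects), max(r[3] for r in rects))

def area(r: Rect) -> float:
    return (r[1] - r[0]) * (r[3] - r[2])

def overlap_area(a: Rect, b: Rect) -> float:
    """Area shared by two rectangles (0 if they only touch or are apart)."""
    w = min(a[1], b[1]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[2], b[2])
    return w * h if w > 0 and h > 0 else 0

def split_cost(group_a: List[Rect], group_b: List[Rect]) -> Tuple[float, float]:
    """Return (total MBR area, MBR overlap) for a candidate split."""
    ra, rb = mbr(group_a), mbr(group_b)
    return area(ra) + area(rb), overlap_area(ra, rb)

# Hypothetical candidate split of three rectangles
cost = split_cost([(0, 2, 0, 2), (1, 3, 1, 3)], [(2, 5, 2, 4)])
```

A split routine would compare such pairs across candidate groupings and prefer the smaller values.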
R-Tree Insert & Split
[Figure: rectangles 1 to 8 grouped under nodes A and B, with the corresponding tree]
Bad Search in R-Tree
[Figure: rectangles 1 to 8 grouped under nodes A and B, with the corresponding tree]
R+-Tree
Variant of the R-tree
Avoids overlap between internal nodes by inserting an object into multiple leaves when necessary
Leaf node entry: (oid, RECT)
• RECT: (xlow, xhigh, ylow, yhigh)
Intermediate node entry: (p, RECT)
• p → pointer to a lower-level node
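The two entry layouts translate directly into record types. The following is a sketch, not code from the paper: the class names and the `is_leaf` flag are assumptions, while the field names follow the slide.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass(frozen=True)
class Rect:
    xlow: float
    xhigh: float
    ylow: float
    yhigh: float

@dataclass
class LeafEntry:
    oid: int      # object identifier
    rect: Rect    # bounding rectangle of the data object

@dataclass
class Node:
    is_leaf: bool
    entries: List[Union["LeafEntry", "InternalEntry"]] = field(default_factory=list)

@dataclass
class InternalEntry:
    p: Node       # pointer to a lower-level node
    rect: Rect    # region covered by the subtree under p

# Hypothetical two-level tree holding a single object
leaf = Node(is_leaf=True, entries=[LeafEntry(oid=4, rect=Rect(3, 5, 1, 2))])
root = Node(is_leaf=False, entries=[InternalEntry(p=leaf, rect=Rect(0, 8, 0, 4))])
```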
Properties of R+-Tree
• For each entry (p, RECT) of an intermediate node, the subtree rooted at the node pointed to by p contains a rectangle R if and only if R is covered by RECT → the only exception is when R is a rectangle at a leaf node; then R only has to overlap RECT
• For any two entries (p1, RECT1) and (p2, RECT2) of an intermediate node, the overlap between RECT1 and RECT2 is zero
• The root has at least two children unless it is a leaf
• All leaves are at the same level
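The first two properties are easy to state as predicates. This is a sketch under assumptions: rectangles are tuples `(xlow, xhigh, ylow, yhigh)`, rectangles that only share an edge count as zero overlap, and the function names are hypothetical.

```python
from itertools import combinations
from typing import Iterable, Tuple

Rect = Tuple[float, float, float, float]  # (xlow, xhigh, ylow, yhigh)

def covers(outer: Rect, inner: Rect) -> bool:
    """True if `inner` lies completely inside `outer`."""
    return (outer[0] <= inner[0] and inner[1] <= outer[1]
            and outer[2] <= inner[2] and inner[3] <= outer[3])

def interiors_overlap(a: Rect, b: Rect) -> bool:
    """True if the two rectangles share a region of positive area."""
    return (min(a[1], b[1]) > max(a[0], b[0])
            and min(a[3], b[3]) > max(a[2], b[2]))

def siblings_disjoint(rects: Iterable[Rect]) -> bool:
    """Check the zero-overlap property for the entries of one intermediate node."""
    return not any(interiors_overlap(a, b) for a, b in combinations(list(rects), 2))

ok = siblings_disjoint([(0, 2, 0, 2), (2, 4, 0, 2)])    # share only an edge
bad = siblings_disjoint([(0, 3, 0, 2), (2, 4, 0, 2)])   # overlap on x in (2, 3)
```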
R+-Tree
[Figure: rectangles 1 to 8 grouped under nodes A, B and C, with the corresponding tree; rectangle 4 is listed twice among the leaf entries]
Operations to maintain the R+-tree
Searching operation
• First decompose the search space into disjoint sub-regions
• For each sub-region, descend the tree until the actual data objects are found in the leaves
Insertion operation
• Search the tree and add the rectangle to the leaf nodes
• Difference from the R-tree → the rectangle may be added to more than one leaf node
Deletion operation
• Locate the rectangle to be deleted and remove it from every leaf node that holds it
Node splitting operation
• The two sub-nodes cover disjoint areas
• Contrary to the R-tree → the split also propagates downward
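Searching and insertion can be sketched on a toy tree. This is a simplified illustration, not the paper's algorithms: nodes are plain dicts, node overflow and splitting are not handled, the inserted rectangle is assumed to lie inside the space covered by the root, and all names are hypothetical.

```python
def intersects(a, b):
    """Closed-interval intersection test on (xlow, xhigh, ylow, yhigh) tuples."""
    return a[0] <= b[1] and b[0] <= a[1] and a[2] <= b[3] and b[2] <= a[3]

def clip(window, rect):
    """Part of the search window that falls inside rect."""
    return (max(window[0], rect[0]), min(window[1], rect[1]),
            max(window[2], rect[2]), min(window[3], rect[3]))

def search(node, window, found=None):
    """Collect the oids of all objects intersecting the window."""
    if found is None:
        found = set()               # a set, since an object may sit in several leaves
    for rect, payload in node["entries"]:
        if intersects(rect, window):
            if node["leaf"]:
                found.add(payload)  # payload is an oid
            else:                   # payload is a child node
                search(payload, clip(window, rect), found)
    return found

def insert(node, oid, rect):
    """Add rect to every leaf whose region it intersects (no overflow handling)."""
    if node["leaf"]:
        node["entries"].append((rect, oid))
        return
    for region, child in node["entries"]:
        if intersects(region, rect):
            insert(child, oid, rect)

# Hypothetical tree: two leaves covering disjoint halves of the space
leaf_a = {"leaf": True, "entries": []}
leaf_b = {"leaf": True, "entries": []}
root = {"leaf": False, "entries": [((0, 4, 0, 4), leaf_a), ((4, 8, 0, 4), leaf_b)]}
insert(root, 1, (0, 1, 0, 1))
insert(root, 2, (3, 5, 1, 2))   # crosses x = 4, so it lands in both leaves
insert(root, 3, (6, 7, 0, 1))
hits = search(root, (3.5, 4.5, 0, 3))
```

Object 2 is stored twice but reported once, which is why the result is accumulated in a set.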
Packing Algorithm
Reduce the coverage of “dead space”
Limit the height growth of the R+-tree
Packing algorithm
• Pack attempts to set up an R+-tree with good search performance
• Routines: Partition, Sweep, Pack
Criteria for selecting an x- or y-cut in Partition
• Nearest neighbor
• Minimal total x- and y-displacement
• Minimal total space coverage accrued by the two sub-regions
• Minimal number of rectangle splits
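The last criterion, minimal number of rectangle splits, can be illustrated by brute force. The sketch below is not the paper's Partition routine: it tries every rectangle edge as a candidate cut on both axes, and the names `count_splits` and `best_cut` are hypothetical.

```python
AXES = {"x": (0, 1), "y": (2, 3)}   # index of (low, high) in (xlow, xhigh, ylow, yhigh)

def count_splits(rects, axis, cut):
    """Number of rectangles that a cut at `cut` on `axis` would slice in two."""
    lo, hi = AXES[axis]
    return sum(1 for r in rects if r[lo] < cut < r[hi])

def best_cut(rects):
    """Return (splits, axis, cut) with the fewest splits, or None if no cut separates anything."""
    best = None
    for axis, (lo, hi) in AXES.items():
        candidates = sorted({r[lo] for r in rects} | {r[hi] for r in rects})
        for cut in candidates:
            has_left = any(r[lo] < cut for r in rects)
            has_right = any(r[hi] > cut for r in rects)
            if not (has_left and has_right):
                continue            # the cut would leave one side empty
            cand = (count_splits(rects, axis, cut), axis, cut)
            if best is None or cand < best:
                best = cand
    return best

# Hypothetical input: three rectangles
rects = [(0, 2, 0, 1), (1, 4, 2, 3), (5, 6, 0, 3)]
choice = best_cut(rects)
```

Ties are broken by axis name and then by cut position, which is an arbitrary choice made for determinism.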
Operations to build the R+-tree
Partition operation
• Decomposes the total space into locally optimal sub-regions (in terms of search performance)
• Uses the Sweep routine, which sweeps parallel to the x or y axis
Sweep operation
• Scans the rectangles and identifies points where space partitioning is possible
Pack operation
• Organizes an R+-tree from a set S of rectangles and the fill-factor ff of the tree
• Recursively packs the entries of each level of the tree from the bottom up
• At each level the non-leaf nodes are partitioned; if some rectangles are split by the chosen partition, the split is propagated downward recursively and, if necessary, the changes are propagated upward as well
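What a chosen cut does to a set of rectangles can be shown in isolation. This is a sketch of that single step under assumptions, not the Partition, Sweep or Pack routines themselves: the function name `apply_cut` is hypothetical, and pieces of a split rectangle keep the original oid as their tag.

```python
AXES = {"x": (0, 1), "y": (2, 3)}   # index of (low, high) in (xlow, xhigh, ylow, yhigh)

def apply_cut(entries, axis, cut):
    """Distribute (oid, rect) entries to the two sides of a cut.

    A rectangle that straddles the cut is clipped into two pieces,
    one per side, both tagged with the same oid.
    """
    lo, hi = AXES[axis]
    left, right = [], []
    for oid, rect in entries:
        if rect[hi] <= cut:
            left.append((oid, rect))
        elif rect[lo] >= cut:
            right.append((oid, rect))
        else:
            first, second = list(rect), list(rect)
            first[hi] = cut         # piece on the low side of the cut
            second[lo] = cut        # piece on the high side of the cut
            left.append((oid, tuple(first)))
            right.append((oid, tuple(second)))
    return left, right

# Hypothetical input: rectangle 2 straddles the cut at x = 2
entries = [(1, (0, 2, 0, 1)), (2, (1, 4, 2, 3)), (3, (5, 6, 0, 3))]
left, right = apply_cut(entries, "x", 2)
```

Applying the same step recursively to entries of lower levels is one way to picture the downward propagation of a split.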
Analysis
[Figure: disk accesses for two-size segments, point query]
[Figure: disk accesses for two-size segments, segment query]
Summary
Advantages of the R+-tree
• Improved search performance, especially for point queries
• More than 50% savings in disk accesses
Disadvantages of the R+-tree
• The tree is taller than the R-tree
• Uses more space (duplicated entries)