1. R-Tree Index Basics, Variations, and Cost
By: Michael Lindemuth & Mark Turner
2. Overview Spatial Data
Spatial Queries
R-Tree Index
Queries and Complexities
Variations
Implementations
Conclusions
3. Spatial Data Any Type of Geometry
Point
City
Line
Trail
Polygon
Border
A Collection of Geometries
Ski Resort Trails
Any Coordinate System
Meters
Pixels
WGS84 (GPS)
4. Spatial Queries Standard Insert and Delete Queries
Spatial Range Queries
Find all cities within 20 miles of Tampa
Nearest Neighbor Queries
Find the closest pizza place to my address
Spatial Join Queries
Find all neighborhoods that are within 20 miles of a university
Geometry Set Operations
Equal(), Disjoint(), Intersect(), Touch(), Cross(), Within(), Contains(), Overlap(), Distance(), Buffer(), ConvexHull(), Intersection(), Union(), Difference(), SymmDiff(),…
OGIS Standard for SQL (http://www.opengeospatial.org/standards/sfs)
5. R-Tree Overview Proposed by
Antonin Guttman
UC Berkeley
ACM SIGMOD 1984
All Spatial Data Enveloped
Minimum Bounding Rectangle (MBR)
Stored and Indexed According to MBR
Structure Resembles B+-tree
Height Balanced
Dynamic Index
Order of Queries Makes No Difference
6. R-Tree Index Structure For an index record <I, tuple-identifier>
I = (I0, I1, …, In-1)
n = Number of Dimensions in the Geometry
Each Ii is a closed interval [a, b] describing the extent of the rectangle along dimension i
a or b can be equal to infinity
Tuple-identifier points to a record
Non-leaf nodes are in the form: <I, child-pointer>
Same space complexity as a B+-tree, O(n)
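As a rough illustration of the record layout above, the following Python sketch models leaf and non-leaf entries. The names `LeafEntry`, `ChildEntry`, `Node`, and `mbr_of_points` are hypothetical, chosen for this example only, and do not come from any particular R-Tree library.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple, Union

# One closed interval [a, b] per dimension; together they form the rectangle I.
Rect = Tuple[Tuple[float, float], ...]

@dataclass
class LeafEntry:
    rect: Rect        # I: MBR of the stored geometry
    tuple_id: int     # tuple-identifier: points to the data record

@dataclass
class ChildEntry:
    rect: Rect        # I: MBR covering everything in the child node
    child: "Node"     # child-pointer

@dataclass
class Node:
    is_leaf: bool
    entries: List[Union[LeafEntry, ChildEntry]] = field(default_factory=list)

def mbr_of_points(points: Sequence[Sequence[float]]) -> Rect:
    """Smallest axis-aligned rectangle enclosing the given n-dimensional points."""
    return tuple((min(axis), max(axis)) for axis in zip(*points))

# Example: a trail with three vertices, stored under tuple-identifier 7.
trail = LeafEntry(mbr_of_points([(0, 5), (3, 1), (2, 2)]), tuple_id=7)
leaf = Node(is_leaf=True, entries=[trail])
```

Each entry costs a fixed amount of space per dimension, which is consistent with the O(n) space bound stated above.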
7. Six R-Tree Properties Given
M is the maximum number of entries in one node
Parameter m ≤ M/2 specifies the minimum number of entries in a node
Every Leaf Node Contains Between m and M index records unless it is root.
For each index record <I, tuple-identifier> in a leaf node, I is the smallest rectangle that spatially contains the n-dimensional data object.
Every non-leaf node has between m and M children unless it is the root.
For each entry <I, child-pointer> in a non-leaf node, I is the smallest rectangle that spatially contains the rectangles in the child nodes.
The root node has at least two children unless it is a leaf.
All leaves appear on the same level.
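The properties above can be turned into a small consistency check. This is a minimal sketch under stated assumptions: nodes are plain dicts with hypothetical keys `"leaf"` and `"entries"`, rectangles are 2-D tuples `(xmin, ymin, xmax, ymax)`, and m ≥ 1. The property about leaf rectangles tightly enclosing the data object is not checked, because the sketch does not store the geometries themselves.

```python
def union(rects):
    """MBR of a list of (xmin, ymin, xmax, ymax) rectangles."""
    x0, y0, x1, y1 = zip(*rects)
    return (min(x0), min(y0), max(x1), max(y1))

def check(node, m, M, is_root=True):
    """Return the height of the subtree under `node`; raise ValueError if a
    structural property is violated."""
    n = len(node["entries"])
    if is_root:
        low = 0 if node["leaf"] else 2   # root: at least two children unless a leaf
    else:
        low = m                          # other nodes: between m and M entries
    if not low <= n <= M:
        raise ValueError(f"node has {n} entries, expected {low}..{M}")
    if node["leaf"]:
        return 0
    depths = set()
    for rect, child in node["entries"]:
        depths.add(check(child, m, M, is_root=False))
        if rect != union([r for r, _ in child["entries"]]):
            raise ValueError("entry rectangle is not the MBR of its child")
    if len(depths) != 1:                 # all leaves on the same level
        raise ValueError("leaves are not all on the same level")
    return depths.pop() + 1

leaf1 = {"leaf": True, "entries": [((0, 0, 1, 1), "a"), ((2, 2, 3, 3), "b")]}
leaf2 = {"leaf": True, "entries": [((5, 5, 6, 6), "c"), ((7, 7, 8, 9), "d")]}
root = {"leaf": False, "entries": [((0, 0, 3, 3), leaf1), ((5, 5, 8, 9), leaf2)]}
height = check(root, m=2, M=4)
```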
8. R-Tree Structure An Example Structure of an R-Tree
Source:
http://en.wikipedia.org/wiki/Image:R-tree.jpg
9. Queries For all queries, checking whether a point is within a rectangle takes time linear in the number of dimensions.
Query Types To Be Reviewed
Insert
Delete
Nearest Neighbor
Multidimensional Range Queries
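The point-in-rectangle check mentioned on this slide can be sketched as one comparison pair per dimension. The function name `contains_point` and the interval-per-dimension layout are assumptions made for this example; infinite bounds are allowed, as noted in the index structure slide.

```python
import math

def contains_point(rect, point):
    """True if `point` lies inside `rect`.

    `rect` holds one (low, high) interval per dimension, so the loop performs
    one pair of comparisons per dimension."""
    return all(lo <= x <= hi for (lo, hi), x in zip(rect, point))

box = ((0.0, 10.0), (0.0, 5.0))
unbounded = ((0.0, math.inf), (-math.inf, 5.0))   # open-ended along both axes

inside = contains_point(box, (3.0, 5.0))
outside = contains_point(box, (3.0, 6.0))
far_away = contains_point(unbounded, (1e9, -1e9))
```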
10. Insert Query Very Similar to B+-Trees
Start at the Root Node
Select the child that needs the least enlargement in order to fit the new geometry.
Repeat until at a leaf node.
If leaf node has available space insert
Else split the node into two nodes
Update parent nodes
Update the entry that pointed to the node with a new MBR
Add a new entry for the second new node
If there is no space in the parent node, split and repeat
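The descent step of the insert algorithm can be sketched as follows. This is an illustrative sketch only: nodes are plain dicts with hypothetical keys `"leaf"` and `"entries"`, rectangles are 2-D tuples `(xmin, ymin, xmax, ymax)`, and ties in enlargement are broken by the smaller existing area. Splitting and parent updates are not shown.

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def enlargement(mbr, new_rect):
    """Extra area `mbr` would need in order to also cover `new_rect`."""
    return area(union(mbr, new_rect)) - area(mbr)

def choose_leaf(node, new_rect):
    """Start at the root and follow the child needing the least enlargement."""
    while not node["leaf"]:
        _, node = min(
            node["entries"],
            key=lambda entry: (enlargement(entry[0], new_rect), area(entry[0])),
        )
    return node

leaf1 = {"leaf": True, "entries": [((0, 0, 4, 4), "a")]}
leaf2 = {"leaf": True, "entries": [((10, 10, 12, 12), "b")]}
root = {"leaf": False, "entries": [((0, 0, 4, 4), leaf1), ((10, 10, 12, 12), leaf2)]}

# Covering (5, 5, 6, 6) costs leaf1 an extra 20 units of area and leaf2 an extra 45.
target = choose_leaf(root, (5, 5, 6, 6))
```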
11. Insert Query Complexity IMPORTANT: Make sure nodes are split so they cover the smallest possible area.
Minimize search time
Example from Textbook Slides
Given
N = Number of entries in each node
T = Tree height
Worst Case
2 * N * T
O(n)
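To illustrate the advice about splitting so that the two nodes cover the smallest possible area, here is a brute-force sketch that tries every allowed split. It is exponential in the node size and meant only to show the objective; Guttman's paper also describes cheaper quadratic and linear heuristics. The function names and the 2-D tuple layout `(xmin, ymin, xmax, ymax)` are assumptions for this example, and m ≥ 1 is assumed.

```python
from itertools import combinations

def area_of_union(rects):
    """Area of the MBR that covers all of `rects`."""
    x0, y0, x1, y1 = zip(*rects)
    return (max(x1) - min(x0)) * (max(y1) - min(y0))

def best_split(rects, m):
    """Try every split into two groups of at least m rectangles each and
    return the two index groups whose MBRs have the smallest total area."""
    n = len(rects)
    best_cost, best_groups = None, None
    for k in range(m, n - m + 1):
        for rest in combinations(range(1, n), k - 1):
            group_a = (0,) + rest   # index 0 always in group A avoids mirrored duplicates
            group_b = tuple(i for i in range(n) if i not in group_a)
            cost = (area_of_union([rects[i] for i in group_a])
                    + area_of_union([rects[i] for i in group_b]))
            if best_cost is None or cost < best_cost:
                best_cost, best_groups = cost, (group_a, group_b)
    return best_groups

overfull = [(0, 0, 1, 1), (1, 1, 2, 2), (10, 10, 11, 11), (11, 11, 12, 12)]
groups = best_split(overfull, m=2)
```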
12. Delete Query Also similar to B+-trees
Search for Node to Remove
If the node that held the removed entry now has too few entries, reallocate its remaining entries
Recursively check the parent nodes until reaching the root
Update all MBRs and remove all nodes that are underfull
Reinsert all entries removed from the removed nodes according to the INSERT algorithm.
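The upward pass of the delete algorithm can be sketched as below. This is a simplified sketch, not a full implementation: nodes are dicts with hypothetical keys `"leaf"` and `"entries"`, rectangles are 2-D tuples, and entries under a removed non-leaf node are flattened to leaf entries for reinsertion. Shrinking a root that is left with a single child is omitted.

```python
def union(rects):
    """MBR of a list of (xmin, ymin, xmax, ymax) rectangles."""
    x0, y0, x1, y1 = zip(*rects)
    return (min(x0), min(y0), max(x1), max(y1))

def leaf_entries(node):
    """All (rect, tuple_id) entries stored beneath `node`."""
    if node["leaf"]:
        return list(node["entries"])
    return [e for _, child in node["entries"] for e in leaf_entries(child)]

def condense(path, m):
    """Walk from the modified leaf up to the root.

    `path` lists the nodes from the root down to the leaf that lost an entry.
    Underfull nodes are unlinked from their parent, and the entries they held
    are returned so the caller can reinsert them with the INSERT algorithm."""
    orphans = []
    for depth in range(len(path) - 1, 0, -1):
        node, parent = path[depth], path[depth - 1]
        idx = next(i for i, (_, c) in enumerate(parent["entries"]) if c is node)
        if len(node["entries"]) < m:
            del parent["entries"][idx]              # remove the underfull node
            orphans.extend(leaf_entries(node))      # queue its entries for reinsertion
        else:
            tight = union([r for r, _ in node["entries"]])
            parent["entries"][idx] = (tight, node)  # shrink the parent's MBR
    return orphans

# State after deleting "a" from leaf1 and "e" (at 9,9,10,10) from leaf2:
leaf1 = {"leaf": True, "entries": [((2, 2, 3, 3), "b")]}
leaf2 = {"leaf": True, "entries": [((5, 5, 6, 6), "c"), ((7, 7, 8, 8), "d")]}
root = {"leaf": False, "entries": [((0, 0, 3, 3), leaf1), ((5, 5, 10, 10), leaf2)]}

no_orphans = condense([root, leaf2], m=2)   # leaf2 still has 2 entries: MBR tightened
orphans = condense([root, leaf1], m=2)      # leaf1 is underfull: dissolved
```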
13. Delete Query Complexity Given
N = number of entries in each node
T = tree height
Complexity
2 * N * T
14. Nearest Neighbor Query Two Options
Branch-and-bound search
Best first search
Branch-and-bound
Find two distances to each object
Minimum distance from the search point to any side of the other object’s MBR
Minmax distance
Least of the furthest distance in every dimension
Lowest upper bound on the distance from the point to an object
Best First Search
Calculates minmax distance for all objects
R-Tree sorted by minmax distance
Removes nodes from sorted tree
If node has no children it is the nearest neighbor.
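As an illustration of best-first search, the sketch below uses a priority queue keyed on the minimum distance from the query point to each MBR. This is the commonly described formulation and is an assumption here: it differs from the minmax ordering in the bullets above. Nodes are dicts with hypothetical keys `"leaf"` and `"entries"`, rectangles are 2-D tuples, and stored objects are treated as their MBRs.

```python
import heapq
import itertools
import math

def mindist(point, rect):
    """Smallest Euclidean distance from `point` to (xmin, ymin, xmax, ymax)."""
    px, py = point
    dx = max(rect[0] - px, 0.0, px - rect[2])
    dy = max(rect[1] - py, 0.0, py - rect[3])
    return math.hypot(dx, dy)

def nearest(root, point):
    """Return (tuple_id, distance) of the stored object closest to `point`."""
    tie = itertools.count()   # unique tie-breaker so the heap never compares nodes
    heap = [(0.0, next(tie), root, None)]
    while heap:
        dist, _, node, item = heapq.heappop(heap)
        if node is None:      # an object surfaced first: nothing else can be closer
            return item, dist
        for rect, payload in node["entries"]:
            d = mindist(point, rect)
            if node["leaf"]:
                heapq.heappush(heap, (d, next(tie), None, payload))
            else:
                heapq.heappush(heap, (d, next(tie), payload, None))
    return None, math.inf

leaf1 = {"leaf": True, "entries": [((0, 0, 0, 0), "a"), ((2, 2, 2, 2), "b")]}
leaf2 = {"leaf": True, "entries": [((10, 10, 10, 10), "c"), ((8, 9, 8, 9), "d")]}
root = {"leaf": False, "entries": [((0, 0, 2, 2), leaf1), ((8, 9, 10, 10), leaf2)]}

closest, distance = nearest(root, (7, 7))
```

In this run `leaf1` is never expanded, since its MBR is farther from the query point than the object that is returned. The queue itself is the memory cost noted on the next slide.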
15. Nearest Neighbor Complexity Branch-and-Bound
Takes longer because it searches all nodes that have not been pruned
Best First Search
Investigates only the closest nodes
Large priority queue data structure in memory
Can cause thrashing
Run-time complexity subject to geometries
How many overlap and how large
16. Multidimensional Range Queries If the current node is not a leaf, check every child entry whose MBR overlaps the query range.
For each entry that overlaps, search its child node recursively.
If the node is a leaf, check all entries; any that overlap the range are matches.
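The range search described on this slide can be sketched as a short recursive function. Nodes are dicts with hypothetical keys `"leaf"` and `"entries"` and rectangles are 2-D tuples `(xmin, ymin, xmax, ymax)`; both are assumptions made for this example.

```python
def overlaps(a, b):
    """True if rectangles a and b share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def range_search(node, query):
    """Collect the tuple-identifiers of all entries whose MBR overlaps `query`."""
    hits = []
    for rect, payload in node["entries"]:
        if not overlaps(rect, query):
            continue                      # prune: nothing below can match
        if node["leaf"]:
            hits.append(payload)          # payload is a tuple-identifier
        else:
            hits.extend(range_search(payload, query))  # payload is a child node
    return hits

leaf1 = {"leaf": True, "entries": [((0, 0, 0, 0), "a"), ((2, 2, 2, 2), "b")]}
leaf2 = {"leaf": True, "entries": [((10, 10, 10, 10), "c"), ((8, 9, 8, 9), "d")]}
root = {"leaf": False, "entries": [((0, 0, 2, 2), leaf1), ((8, 9, 10, 10), leaf2)]}

matches = range_search(root, (1, 1, 9, 9))
```

Because the MBRs of sibling entries may overlap, more than one subtree can be visited for the same query.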
17. Multidimensional Complexity Worst Case
Linear Search
Every MBR overlaps the search area
Best Case
No more than one overlap at each level
O(log_M n)
Again, dependent on geometries
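A quick worked example of the best case: if every node is full and only one entry overlaps the query at each level, the number of nodes visited equals the number of levels. The helper name `levels_needed` is hypothetical, and the sketch assumes M ≥ 2 and completely full nodes.

```python
def levels_needed(n, M):
    """Smallest number of levels whose leaves can hold n entries
    when every node holds M entries."""
    levels, capacity = 1, M
    while capacity < n:
        capacity *= M
        levels += 1
    return levels

# One million entries with 100 entries per node fit in three levels,
# so a best-case query reads three nodes.
best_case_reads = levels_needed(1_000_000, 100)
```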
18. Variations R+-Tree
Split Entries in the tree so that there is no overlap
No more multiple paths to reach a solution
Child pointers duplicated within the tree
R*-Tree
Avoid splitting a node as soon as it overflows on insert
Take entries from the overfull node and reinsert them into the tree
Changes MBRs
Saves time and possibly rebalances the tree
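The reinsertion idea for the R*-Tree can be sketched as choosing which entries to take out of an overfull node. The sketch picks the entries whose centers lie farthest from the center of the node's MBR. The function name, the 2-D tuple layout, and the default fraction of 30% are assumptions for this example.

```python
import math

def center(r):
    return ((r[0] + r[2]) / 2, (r[1] + r[3]) / 2)

def pick_for_reinsert(rects, fraction=0.3):
    """Split the entry indices of an overfull node into (reinsert, keep).

    Entries are ranked by the distance of their center from the center of the
    node's MBR; the farthest `fraction` of them are chosen for reinsertion."""
    x0, y0, x1, y1 = zip(*rects)
    node_center = center((min(x0), min(y0), max(x1), max(y1)))
    by_distance = sorted(
        range(len(rects)),
        key=lambda i: math.dist(center(rects[i]), node_center),
        reverse=True,
    )
    p = max(1, int(fraction * len(rects)))
    return by_distance[:p], sorted(by_distance[p:])

overfull = [(0, 0, 2, 2), (4, 4, 6, 6), (5, 5, 7, 7), (4, 5, 6, 7), (9, 9, 10, 10)]
reinsert, keep = pick_for_reinsert(overfull)
```

Removing the outlying entries shrinks the node's MBR, which is how reinsertion changes the MBRs as noted above.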
19. Implementations PostgreSQL (PostGIS extensions), MySQL, and Oracle All Use R-Trees for Spatial Indexing
Used for
CAD/CAM software
Circuit Design
Geographic Information Systems
Other alternatives
B+-Trees (Single and Multi-dimensional)
Transpose many dimensions to a single using some function.
Hilbert curves
Hard to find nearest neighbors
K-d tree
Nearest-neighbor search is more difficult
Not Balanced
Grid files
Larger than R-Tree
Not Balanced
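For the B+-Tree alternative above, the mapping from many dimensions to one can be illustrated with a 2-D Hilbert curve. The sketch below follows the widely published iterative conversion from grid coordinates to curve position. The function name `hilbert_index` is hypothetical, and `n` is assumed to be a power of two giving the grid side length.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along the Hilbert curve filling an n-by-n grid."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                       # rotate the quadrant so the curve stays connected
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

# The resulting integer could serve as the search key in an ordinary B+-Tree.
cells = sorted((hilbert_index(8, x, y), (x, y)) for x in range(8) for y in range(8))
```

Consecutive keys always belong to adjacent grid cells, but two adjacent cells can receive keys that are far apart, which is one reason nearest-neighbor search is hard with this approach.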
20. Conclusions R-Trees are Everywhere
MBRs are the defining concept
Rest is mostly B+-Tree
Good for Defining and Relating Spatial Data
Multiple Variations
Basic still used by commercial DBMS platforms
21. Works Cited N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger, "The R*-tree: an efficient and robust access method for points and rectangles," SIGMOD Rec., vol. 19, pp. 322-331, 1990.
A. Guttman, "R-trees: a dynamic index structure for spatial searching," in Proceedings of the 1984 ACM SIGMOD International Conference on Management of Data, Boston, Massachusetts: ACM, 1984.
C. Murray, Oracle Spatial Developer's Guide, 11g Release 1 (11.1). Redwood City, CA: Oracle USA, Inc., 2007.
T. K. Sellis, N. Roussopoulos, and C. Faloutsos, "The R+-Tree: A Dynamic Index for Multi-Dimensional Objects," in Proceedings of the 13th International Conference on Very Large Data Bases: Morgan Kaufmann Publishers Inc., 1987.
S. Shekhar and S. Chawla, Spatial Databases: A Tour. Upper Saddle River, New Jersey: Pearson Education, Inc., 2003.