CSIS 7101: Spatial Data (Part 2) Efficient Processing of Spatial Joins Using R-trees. Rollo Chan Chu Chung Man Mak Wai Yip Vivian Lee Eric Lo Sindy Shou Hugh Wang.
Efficient Processing of Spatial Joins Using R-trees • What is Spatial Data? • Consists of points, lines, rectangles, polygons, surfaces… • Two types of queries in database systems (DBS) • Single-scan and multiple-scan queries • How to retrieve spatial objects in a GIS efficiently? • Spatial Access Method (SAM), e.g. the R*-tree
What is a Spatial Access Method? • Designed to support single-scan queries • e.g. Window query • “Find all objects which intersect a given window” • Attempts to store objects which are close together in the data space on a common page • Reduces the number of disk accesses
How is a window query processed by a SAM? • 1) Filter step • Find all objects whose minimum bounding rectangles intersect the query rectangle • 2) Refinement step • Check whether each candidate object itself fulfills the query condition
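The two steps above can be sketched as follows. This is a minimal illustration, not the SAM's actual implementation: objects are modelled as hypothetical point sets, the filter step scans all objects linearly (a real SAM would answer it from the index), and the exact test `any_point_in_window` is an assumed stand-in for a real geometry test.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xl, yl, xu, yu)


def intersects(a: Rect, b: Rect) -> bool:
    """True if two axis-parallel rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def mbr(points: Sequence[Point]) -> Rect:
    """Minimum bounding rectangle of a point set."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def filter_step(objects: Dict[str, Sequence[Point]], window: Rect) -> List[str]:
    """Filter step: keep objects whose MBR intersects the window."""
    return [oid for oid, pts in objects.items() if intersects(mbr(pts), window)]


def any_point_in_window(points: Sequence[Point], window: Rect) -> bool:
    """Assumed exact test: some point of the object lies inside the window."""
    return any(window[0] <= x <= window[2] and window[1] <= y <= window[3]
               for x, y in points)


def window_query(objects: Dict[str, Sequence[Point]], window: Rect,
                 exact_test: Callable = any_point_in_window) -> List[str]:
    """Refinement step applied only to the candidates of the filter step."""
    return [oid for oid in filter_step(objects, window)
            if exact_test(objects[oid], window)]


# Hypothetical data: "b" passes the filter (its MBR overlaps the window)
# but is rejected by the refinement step.
objects = {"a": [(0, 0), (1, 1)], "b": [(0, 5), (5, 0)], "c": [(8, 8), (9, 9)]}
window = (0.5, 0.5, 3, 3)
candidates = filter_step(objects, window)
answers = window_query(objects, window)
```

The filter step is cheap but may return false hits such as "b"; the refinement step removes them at the cost of a test on the exact geometry.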
What is a Spatial Join? • Combines two sets of spatial objects according to some spatial property • It is an important type of multiple-scan query in spatial DBS
Example of Spatial Join • Two relations: forests, cities (assume an attribute in each relation represents the borders of the forests and cities) • An example query would be: • “Find all forests which are in a city”
Problems when performing Spatial Join • It is too expensive in terms of CPU time and I/O time • Traditional index structures are not efficient for spatial joins • How to make it more efficient? • R*-tree
Why use the R*-tree for Spatial Join? • To optimize CPU time and I/O time • Fewer comparisons than a simple nested loop • Other algorithms cannot be efficiently applied to spatial join
R*-tree Approach for Spatial Join • Suppose there are two R*-trees • R, S • Idea: • Use the property that a directory rectangle is the minimum bounding box of the data rectangles in the corresponding subtree. • If the rectangles of two directory entries ER and ES intersect, there may be a pair (rectR, rectS) of intersecting data rectangles in their subtrees • If they do not intersect, no such pair exists and the two subtrees need not be compared
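The idea can be sketched as a synchronized traversal of both trees. This is a simplified illustration under stated assumptions: both trees have the same height, everything is held in memory, and the `Node` class (with data entries modelled as childless nodes) is hypothetical rather than an R*-tree implementation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xl, yl, xu, yu)


@dataclass
class Node:
    rect: Rect                                             # bounding rectangle
    children: List["Node"] = field(default_factory=list)   # empty for data entries
    oid: str = ""                                          # set for data entries only


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def spatial_join(r: Node, s: Node, out: List[Tuple[str, str]]) -> None:
    """Descend both trees in step; prune pairs of disjoint rectangles."""
    for er in r.children:
        for es in s.children:
            if not intersects(er.rect, es.rect):
                continue  # no data rectangles below can intersect
            if not er.children and not es.children:
                out.append((er.oid, es.oid))  # pair of data rectangles
            else:
                spatial_join(er, es, out)     # assumes equal tree height


# Hypothetical two-level trees.
R = Node((0, 0, 9, 9), [
    Node((0, 0, 4, 4), [Node((0, 0, 1, 1), oid="r1"), Node((3, 3, 4, 4), oid="r2")]),
    Node((6, 6, 7, 7), [Node((6, 6, 7, 7), oid="r3")]),
])
S = Node((0.5, 0.5, 9, 7), [
    Node((0.5, 0.5, 2, 2), [Node((0.5, 0.5, 2, 2), oid="s1")]),
    Node((6.5, 6.5, 9, 7), [Node((6.5, 6.5, 9, 7), oid="s2")]),
])
result: List[Tuple[str, str]] = []
spatial_join(R, S, result)
```

In this example two of the four pairs of leaf nodes are pruned at the directory level, so their entries are never compared.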
Is there any way to be more efficient? • There are two areas we need to tune in order to be more efficient • CPU – Time Tuning • I/O – Time Tuning
CPU – Time Tuning • Two ways to improve CPU – time • Restricting the search space • Spatial sorting and plane sweep
Restricting the search space • Idea: • Scan each of the two nodes and mark all entries which are required for performing the join (i.e. which intersect the intersection rectangle of the two nodes). • Then, each marked entry of one node is tested against all marked entries of the other node.
Restricting the search space (cont’d) • Original: 7 of R * 7 of S = 49 joins • Now: 3 of R * 2 of S = 6 joins • Plus scanning: 7 of R + 7 of S = 14 times • [Figure: a node of R and a node of S, each with entries numbered 1 to 7]
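A minimal sketch of the marking step, assuming each node is given as a hypothetical dictionary from entry name to rectangle and that the node rectangle is the bounding box of its entries:

```python
from typing import Dict, List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xl, yl, xu, yu)


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def bounding_box(entries: Dict[str, Rect]) -> Rect:
    rects = list(entries.values())
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def intersection(a: Rect, b: Rect) -> Optional[Rect]:
    """Common rectangle of a and b, or None if they are disjoint."""
    if not intersects(a, b):
        return None
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))


def restricted_join(r_node: Dict[str, Rect], s_node: Dict[str, Rect]):
    """Mark entries overlapping the common rectangle, then join only those."""
    common = intersection(bounding_box(r_node), bounding_box(s_node))
    if common is None:
        return [], [], []
    marked_r = [n for n, rect in r_node.items() if intersects(rect, common)]
    marked_s = [n for n, rect in s_node.items() if intersects(rect, common)]
    pairs = [(a, b) for a in marked_r for b in marked_s
             if intersects(r_node[a], s_node[b])]
    return marked_r, marked_s, pairs


# Hypothetical nodes with three entries each.
r_node = {"ra": (0, 0, 2, 2), "rb": (7, 1, 9, 3), "rc": (9, 6, 10, 8)}
s_node = {"sa": (8, 2, 11, 4), "sb": (15, 5, 18, 8), "sc": (9.5, 7, 12, 9)}
marked_r, marked_s, pairs = restricted_join(r_node, s_node)
```

Here the pairwise tests drop from 3 * 3 = 9 to 2 * 2 = 4, at the price of one linear scan over each node, which mirrors the trade-off on the slide.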
Spatial sorting and plane sweep • Idea: • Sort the entries in a node of the R*-tree according to the spatial location of the corresponding rectangles. • Then move the Sweep-Line perpendicular to one of the axes from left to right to compute the intersections.
Example of Sorted Intersection Test • t = r1 : r1 <--> s1 • t = s1 : s1 <--> r2 • t = r2 : r2 <--> s2, r2 <--> s3 • t = s2 : - • t = r3 : r3 <--> s3 • [Figure: Sweep-Line over rectangles r1 to r3 and s1 to s3, with labels r1.xu, s1.xl and the condition s1.xl < r1.xu]
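The sweep can be sketched as below. This is an illustration under assumptions: entries are hypothetical (name, rectangle) pairs, both sequences are sorted by their lower x-coordinate, and the coordinates in the example are made up rather than taken from the figure.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xl, yl, xu, yu)
Entry = Tuple[str, Rect]


def sweep_join(r_entries: List[Entry], s_entries: List[Entry]) -> List[Tuple[str, str]]:
    """Report all intersecting (r, s) pairs with a left-to-right sweep."""
    rs = sorted(r_entries, key=lambda e: e[1][0])  # sort by xl
    ss = sorted(s_entries, key=lambda e: e[1][0])
    out: List[Tuple[str, str]] = []

    def inner(t: Entry, start: int, seq: List[Entry], t_is_r: bool) -> None:
        # Scan the other sequence while its rectangles start before t ends.
        t_name, (_, t_yl, t_xu, t_yu) = t
        k = start
        while k < len(seq) and seq[k][1][0] <= t_xu:
            name, (_, yl, _, yu) = seq[k]
            if t_yl <= yu and yl <= t_yu:  # x-overlap is known; test y only
                out.append((t_name, name) if t_is_r else (name, t_name))
            k += 1

    i = j = 0
    while i < len(rs) and j < len(ss):
        if rs[i][1][0] < ss[j][1][0]:   # sweep line stops at the next R rectangle
            inner(rs[i], j, ss, True)
            i += 1
        else:                           # sweep line stops at the next S rectangle
            inner(ss[j], i, rs, False)
            j += 1
    return out


# Hypothetical rectangles.
r_entries = [("r1", (0, 0, 3, 3)), ("r2", (2, 4, 6, 7)), ("r3", (7, 0, 9, 3))]
s_entries = [("s1", (1, 1, 4, 5)), ("s2", (5, 0, 6, 2)), ("s3", (5.5, 5, 8, 8))]
pairs = sweep_join(r_entries, s_entries)
```

Each rectangle is compared only with rectangles of the other sequence that begin before it ends along the x-axis, so most distant pairs are never tested.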
I/O Time Tuning • Goal: achieve good I/O performance with as small a buffer as possible • The R*-tree might occupy only a small portion of the LRU buffer • Compute a read schedule of the pages to minimize the number of disk accesses • Local optimization policy based on spatial locality • Idea of the read schedule: if a frequently used page always resides in the buffer, the number of disk accesses can be reduced considerably
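The effect of the read schedule on an LRU buffer can be illustrated with a small simulation. It is a sketch only: the page names and the two schedules are hypothetical, and every buffer miss is counted as one disk access.

```python
from collections import OrderedDict
from typing import Iterable


def count_disk_accesses(schedule: Iterable[str], buffer_size: int) -> int:
    """Count page faults of an LRU buffer for a given sequence of page reads."""
    buffer: "OrderedDict[str, bool]" = OrderedDict()
    accesses = 0
    for page in schedule:
        if page in buffer:
            buffer.move_to_end(page)        # hit: page becomes most recently used
        else:
            accesses += 1                   # miss: page must be read from disk
            buffer[page] = True
            if len(buffer) > buffer_size:
                buffer.popitem(last=False)  # evict the least recently used page
    return accesses


# The same four page pairs (r1,s1), (r1,s2), (r2,s1), (r2,s2) in two orders.
unordered = ["r1", "s1", "r2", "s2", "r1", "s2", "r2", "s1"]
local = ["r1", "s1", "r1", "s2", "r2", "s2", "r2", "s1"]
cost_unordered = count_disk_accesses(unordered, buffer_size=2)
cost_local = count_disk_accesses(local, buffer_size=2)
```

With a buffer of two pages, the schedule that keeps reusing the page it has just read needs 5 disk accesses, while the other order needs 7 for the same set of page pairs.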
Three such techniques • Local plane sweep • Local plane sweep with pinning • Local z-order
Local Plane-Sweep Order • Idea: • Based on spatial ordering, the plane-sweep algorithm creates a sequence of pairs of intersecting rectangles. • This sequence can be used to determine the read schedule of the spatial join.
Local Plane-Sweep Order (cont’d) • [Figure: rectangles r1 to r4, s1 and s2, numbered 1 to 6 in read order, with the resulting six-page read schedule]
Local Plane-Sweep Order w/ Pinning • Idea: • Determine a pair (Er, Es) of entries with respect to the local plane-sweep order. Compute the degree of the rectangles of both entries • Deg(E.rect) = # of intersections between E.rect and the rectangles which belong to entries of the other tree that are not yet processed • Pin the page in the buffer whose corresponding rectangle has maximal degree • Perform the spatial join of the pinned page with all other pages whose rectangles intersect it
Local Plane-Sweep Order w/ Pinning (cont’d) • Er.rect = r1, Es.rect = s2 • Deg(s2) = 2 • [Figure: rectangles r1 to r4, s1 and s2, with the degrees of r1 and s2 annotated]
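A minimal sketch of how pinning reorders the join, assuming the intersecting pairs are already available in local plane-sweep order. The pair list is hypothetical, degrees are counted from that list instead of from the geometry, and ties are broken in favour of Er, which is an arbitrary choice made for this sketch.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (entry of R, entry of S)


def pinned_order(pairs: List[Pair]) -> List[Pair]:
    """Reorder pairs so that the page with maximal degree is used up first."""
    pending = list(pairs)
    order: List[Pair] = []
    while pending:
        er, es = pending.pop(0)           # next pair in plane-sweep order
        order.append((er, es))
        # Degree: intersections with not yet processed entries of the other tree.
        deg_r = sum(1 for r, _ in pending if r == er)
        deg_s = sum(1 for _, s in pending if s == es)
        if max(deg_r, deg_s) == 0:
            continue                      # nothing left that is worth pinning
        if deg_r >= deg_s:
            pinned = [p for p in pending if p[0] == er]   # pin the page of Er
        else:
            pinned = [p for p in pending if p[1] == es]   # pin the page of Es
        order.extend(pinned)              # join the pinned page with its partners
        pending = [p for p in pending if p not in pinned]
    return order


# Hypothetical plane-sweep sequence of intersecting pairs.
sweep_pairs = [("r1", "s2"), ("r2", "s1"), ("r3", "s2"), ("r4", "s2"), ("r4", "s1")]
order = pinned_order(sweep_pairs)
```

After the first pair (r1, s2), s2 still has two unprocessed partners and r1 has none, so the page of s2 is pinned and finished before the sweep continues with s1.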
Local Z-Order • Idea: • Compute the intersections between each rectangle of one node and all rectangles of the other node • Sort the rectangles according to the spatial location of their centers • To define this order, decompose the underlying space into cells of equal size and provide an ordering on this set of cells
Local Z-Order (cont’d) • [Figure: rectangles r1 to r4, s1 and s2 over a grid of cells labelled I to IV] • Read schedule: <s1,r2,r1,s2,r4,r3>
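Sorting by z-order can be sketched with bit interleaving (a Morton code) of the cell that contains each rectangle's center. The grid resolution, the choice of x as the low bit and the sample rectangles are assumptions of this sketch; the cell numbering in the figure may follow a different convention.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xl, yl, xu, yu)


def z_value(cx: int, cy: int, bits: int = 16) -> int:
    """Interleave the bits of the cell coordinates (x in the even positions)."""
    z = 0
    for b in range(bits):
        z |= ((cx >> b) & 1) << (2 * b)
        z |= ((cy >> b) & 1) << (2 * b + 1)
    return z


def z_order(rects: List[Tuple[str, Rect]], space: Rect, cells: int = 2) -> List[str]:
    """Names of the rectangles sorted by the z-value of their center's cell."""
    x0, y0, x1, y1 = space

    def key(item: Tuple[str, Rect]) -> int:
        _, (xl, yl, xu, yu) = item
        # Map the center to a cell index in [0, cells - 1] on each axis.
        cx = min(int(((xl + xu) / 2 - x0) / (x1 - x0) * cells), cells - 1)
        cy = min(int(((yl + yu) / 2 - y0) / (y1 - y0) * cells), cells - 1)
        return z_value(cx, cy)

    return [name for name, _ in sorted(rects, key=key)]


# Hypothetical rectangles in an 8 x 8 space split into 2 x 2 cells.
rects = [("a", (5, 5, 7, 7)), ("b", (0, 0, 2, 2)),
         ("c", (5, 0, 7, 2)), ("d", (1, 6, 3, 8))]
schedule = z_order(rects, space=(0, 0, 8, 8))
```

With this convention the cells are visited lower left, lower right, upper left, upper right, so rectangles that are close in space tend to be close in the schedule.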
Number of Disk Accesses • [Chart: number of disk accesses versus size of LRU buffer; surviving data labels: 5384, 5290, 2392, 2373]
Number of Disk Accesses (cont’d) • [Chart: number of disk accesses versus size of LRU buffer]
Q & A That’s it for the Presentation Any Questions?
Reference • Brinkhoff, T., Kriegel, H.-P., Seeger, B. (1993). Efficient Processing of Spatial Joins Using R-trees. Institute of Computer Science, University of Munich. In Proceedings of the ACM SIGMOD Conference, Washington, DC, USA.