Multiple-Site Distributed Spatial Query Optimization using Spatial Semijoins • Wendy Osborn and Saad Zaamout
Outline • Introduction • Related Work • Algorithm • Performance Evaluation • Conclusion and Future Work
Spatial Data Canadian Cow Country..... *borrowed from www.mapquest.ca
Distributed Database Montreal Calgary Toronto *borrowed from docs.google.com
Research Problem • Efficient processing of a distributed spatial query • Cost considerations: • data transmission • CPU • I/O
Related Work • Spatial join • Kang et al. (2002) • Spatial semijoins • Tan, Ooi, Abel (1995, 2000) • Karam and Petry (2006) • Limitations • Existing strategies address only two-site distributed spatial queries
The Algorithm - Assumptions • Each site has one participating spatial relation • Each spatial relation has one spatial attribute • All MBRs in a relation are unique • relation cardinality = number of MBRs in relation • Each spatial relation is indexed by an R-tree
Spatial Semijoin Implementation • “Project” the spatial attribute R_SA from relation R • obtain (MBR, ID) pairs from the leaf nodes of the R-tree • Transmit R_SA to the site of relation S • Perform the semijoin between R_SA and S • Transmit the identifiers from R_SA whose MBRs qualify in the query back to relation R
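The semijoin steps above can be sketched in Python. This is a minimal illustration, not the authors' implementation: it assumes an MBR (minimum bounding rectangle) is an axis-aligned box, that "qualifies" means rectangle intersection, and that a plain list scan stands in for the R-tree. The names `MBR`, `intersects`, `project_spatial_attribute` and `spatial_semijoin` are hypothetical.

```python
from typing import NamedTuple


class MBR(NamedTuple):
    """Axis-aligned minimum bounding rectangle (assumed representation)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersects(a: MBR, b: MBR) -> bool:
    # Assumed query predicate: two rectangles qualify if they overlap or touch.
    return (a.xmin <= b.xmax and b.xmin <= a.xmax
            and a.ymin <= b.ymax and b.ymin <= a.ymax)


def project_spatial_attribute(relation):
    # "Project" step: keep only (MBR, ID) pairs. A real system would read
    # these from the R-tree leaf nodes; here the relation is a list of dicts.
    return [(t["mbr"], t["id"]) for t in relation]


def spatial_semijoin(r_sa, s):
    # Runs at the site holding S. Returns the tuples of S that qualify and
    # the identifiers from R_SA whose MBRs qualify.
    qualifying_ids = set()
    qualifying_tuples = []
    for t in s:
        matched = False
        for mbr, rid in r_sa:
            if intersects(mbr, t["mbr"]):
                qualifying_ids.add(rid)
                matched = True
        if matched:
            qualifying_tuples.append(t)
    return qualifying_tuples, sorted(qualifying_ids)


# Made-up sample data for illustration only.
R = [{"id": 1, "mbr": MBR(0, 0, 2, 2)}, {"id": 2, "mbr": MBR(10, 10, 12, 12)}]
S = [{"id": 7, "mbr": MBR(1, 1, 3, 3)}, {"id": 8, "mbr": MBR(20, 20, 21, 21)}]

s_tuples, r_ids = spatial_semijoin(project_spatial_attribute(R), S)
```

In this sample only identifier 1 would be shipped back to the site of R, since the MBR of tuple 2 overlaps nothing in S.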
Algorithm - Example [Figure: four sites R1, R2, R3, R4 and the query site QS; the cardinalities shown are 100, 200, 600 and 800]
Algorithm - Overview • Sort and group by spatial attribute cardinality • Transmit spatial attributes • Execute spatial semijoins • Transmit qualifying tuples to query site
Algorithm – Stage 1 • All sites (i.e. relations) are sorted in ascending order of spatial attribute cardinality • Divided into two groups • P – the first n/2 sites • Q – the remaining n/2 sites
Algorithm - Stage 2 • Transmit spatial attribute from sites in P to sites in Q in the following manner: • Spatial attribute with smallest cardinality in P sent to site with smallest cardinality in Q • Spatial attribute with next smallest cardinality in P sent to site with next smallest cardinality in Q • and so on…
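Stages 1 and 2 amount to a sort, a split, and a pairwise match. The Python sketch below illustrates this under the assumption that the number of sites is even (the slides do not say how an odd count is handled). The function name `plan_transmissions` and the site names A to F are hypothetical; the cardinalities are the six relation sizes used later in the evaluation.

```python
def plan_transmissions(cardinalities):
    """Return (P, Q, pairs) for a dict mapping site name -> cardinality."""
    # Stage 1: sort sites in ascending order of spatial attribute cardinality.
    ordered = sorted(cardinalities, key=cardinalities.get)
    half = len(ordered) // 2
    p, q = ordered[:half], ordered[half:]
    # Stage 2: the i-th smallest site in P sends its spatial attribute
    # to the i-th smallest site in Q.
    pairs = list(zip(p, q))
    return p, q, pairs


sites = {"A": 100, "B": 200, "C": 400, "D": 600, "E": 800, "F": 1000}
p, q, pairs = plan_transmissions(sites)
print(pairs)  # [('A', 'D'), ('B', 'E'), ('C', 'F')]
```

Sending from the smaller relations to the larger ones keeps the transmitted spatial attributes as small as possible.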
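Algorithm - Example [Figure: sites R4, R2, R1 and R3 divided into groups P and Q; each site in P sends its spatial attribute (SA) to a site in Q] • SA = MBR + ID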
Algorithm – Stage 3 • Spatial semijoin performed between spatial attribute and relation at each site in Q • Result: • set of tuples from relation that qualify in the semijoin • set of identifiers from spatial attribute whose MBRs qualify in the semijoin • Identifiers shipped back to originating site in P
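Algorithm - Example [Figure: sites R4, R2, R1 and R3 divided into groups P and Q; each site in Q ships the qualifying identifiers (ID) back to its originating site in P]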
Algorithm – Stage 4 • Each site transmits its qualifying tuples (QT) to the query site (QS) [Figure: R1, R2, R3 and R4 each send QT to QS]
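One way to picture Stage 4 in Python: a site in P uses the identifiers returned in Stage 3 to pick out its qualifying tuples, and the query site collects what every site sends. This is a sketch under assumptions: tuples are dicts with an `id` field, and the query site simply gathers results per site. The names `select_qualifying` and `collect_at_query_site` are hypothetical, and the slides do not describe how the query site combines the tuples.

```python
def select_qualifying(relation, qualifying_ids):
    # At a site in P: keep the tuples whose identifiers came back from Q.
    wanted = set(qualifying_ids)
    return [t for t in relation if t["id"] in wanted]


def collect_at_query_site(per_site_tuples):
    # At the query site: gather the qualifying tuples received from each site.
    return {site: list(tuples) for site, tuples in per_site_tuples.items()}


# Made-up sample data for illustration only.
r1 = [{"id": 1, "name": "north"}, {"id": 2, "name": "south"}]
r1_qualifying = select_qualifying(r1, [2])
received = collect_at_query_site({"R1": r1_qualifying})
```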
Performance Evaluation • comparison vs. naïve approach • six-site distributed spatial query • 100, 200, 400, 600, 800, 1000 tuples • each tuple has the following structure: • MBR, identifier, region name, population, line slope indicator
Cost Calculations • Data sizes: • Character – 1 byte • Integer – 2 bytes • Long integer and double float – 8 bytes • Cost of transmitting an identifier • cost(ID) = sizeof(int) • Cost of transmitting a spatial attribute value (MBR) • cost(MBR) = 4 * sizeof(double) + sizeof(int) • Cost of transmitting a tuple • cost(tuple) = cost(MBR) + 20 * sizeof(char) + sizeof(longint) + sizeof(int)
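The per-item costs can be worked out directly from the data sizes. The Python sketch below reads the tuple cost as a sum of field sizes (the MBR with its identifier, a 20-character region name, a long-integer population, and an integer line slope indicator). That reading is an assumption based on the tuple structure given earlier, and it is the one that reproduces the naïve total of 198,400 bytes reported in the results. The constant names are hypothetical.

```python
# Data sizes in bytes, as listed on the slide.
SIZEOF_CHAR = 1
SIZEOF_INT = 2
SIZEOF_LONGINT = 8
SIZEOF_DOUBLE = 8

# Identifier: one integer.
COST_ID = SIZEOF_INT
# Spatial attribute value: four doubles for the MBR plus the identifier.
COST_MBR = 4 * SIZEOF_DOUBLE + SIZEOF_INT
# Full tuple: spatial attribute, 20-character name, long integer, integer.
COST_TUPLE = COST_MBR + 20 * SIZEOF_CHAR + SIZEOF_LONGINT + SIZEOF_INT

print(COST_ID, COST_MBR, COST_TUPLE)  # 2 34 64
```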
Cost Calculations • Cost of performing a semijoin and transmitting tuples to query site: cost(X, Y, Z) = number_of_tuples(Y) * cost(MBR) + number_of_qualifiers(X) * cost(ID) + cost(tuple) + number_of_qualifiers(Z) * cost(tuple) • Calculated for all n/2 semijoins
Four- and Six-site Query Test • For the six-site query – 100, 200, 400, 600, 800, 1000 tuples • Optimized = 127,456 bytes • Naïve = 198,400 bytes • Improvement = 36%
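The six-site figures can be checked with a few lines of Python. The sketch assumes the naïve approach ships every tuple of every relation to the query site, which the slides do not state explicitly but which matches the reported naïve total at 64 bytes per tuple. The optimized total is taken from the slide rather than recomputed, because the semijoin selectivities are not given. Variable names are hypothetical.

```python
# Bytes per tuple: MBR (4 doubles + int) + 20 chars + long int + int.
COST_TUPLE = (4 * 8 + 2) + 20 * 1 + 8 + 2

cardinalities = [100, 200, 400, 600, 800, 1000]

# Assumed naive strategy: every tuple is sent to the query site.
naive_bytes = sum(cardinalities) * COST_TUPLE
# Reported on the slide, not derived here.
optimized_bytes = 127_456

improvement = 100 * (naive_bytes - optimized_bytes) / naive_bytes
print(naive_bytes, round(improvement))  # 198400 36
```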
Conclusions • For multiple-site queries, our algorithm outperforms the naïve approach in all cases tested • The greater the difference in relation sizes, the greater the reduction in data transmission
Future Work • CPU and I/O costs • Evaluate two-site queries vs. existing strategies • A real distributed database • Development of more multi-site distributed spatial query processing strategies