Optimization of Spatial Joins on Mobile Devices
N. Mamoulis¹, P. Kalnis², S. Bakiras³, X. Li²
¹ Department of Computer Science and Information Systems, University of Hong Kong
² Department of Computer Science, National University of Singapore
³ Department of Electrical and Electronic Engineering, University of Hong Kong
Motivation
[Figure: Restaurants and Hotels servers]
• Users are equipped with a mobile device (e.g., a PDA)
• Ad-hoc spatial queries combine data from remote servers, e.g., “Find hotels which are within 500 m of a seafood restaurant”
• Servers do not collaborate with each other
• The query is executed on the mobile device
Cost
• Telecommunication companies typically charge by the volume of transferred data (e.g., GPRS) instead of by connection time
• Goal: minimize the amount of transferred data
Mediators?
[Figure: Restaurants and Hotels servers, Mediator]
• Services may only allow end-user connections (e.g., subscribers only)
• Access through mediators may be more expensive
• Requests are ad hoc; existing mediators may not support them
Solution
• Integrate statistics retrieval with the query processing phase
• Ask aggregate queries to estimate the data distribution
• Partition the space recursively to achieve sub-linear transfer cost
• Choose the physical operator independently for each partition
Related Work
• Hash-based methods (e.g., PBSM): require all data to be transferred
• R-tree-based methods (e.g., [Tan et al., TKDE, 2000]): require access to the internal index
• Mediators:
  • HERMES: statistics from previous queries
  • DISCO, Garlic: statistics during initialization
  • Tukwila: optimizes parts of the execution tree
Operators
• WINDOW query: return all objects intersecting a window w
• COUNT query: return the number of objects intersecting w
• ε-RANGE query: return all objects within distance ε of a point p
We do not have access to the internal indices!
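To make the restricted interface concrete, here is a minimal Python sketch of a server that answers only these three query types. It assumes point objects, axis-aligned windows given as `(xmin, ymin, xmax, ymax)`, 16-byte objects and a 4-byte COUNT answer; the class `RemoteServer` and its method names are hypothetical. The linear scans stand in for the server's internal index, which the client cannot see.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]
Window = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class RemoteServer:
    """Hypothetical server exposing only WINDOW, COUNT and eps-RANGE queries."""
    points: List[Point]
    bytes_sent: int = 0   # data shipped to the mobile client so far
    obj_size: int = 16    # assumed: two 8-byte coordinates per object

    def _inside(self, p: Point, w: Window) -> bool:
        return w[0] <= p[0] <= w[2] and w[1] <= p[1] <= w[3]

    def window_query(self, w: Window) -> List[Point]:
        # All objects intersecting w (for points: lying inside w).
        res = [p for p in self.points if self._inside(p, w)]
        self.bytes_sent += len(res) * self.obj_size
        return res

    def count_query(self, w: Window) -> int:
        # Only the number of objects travels back (assumed 4-byte integer).
        self.bytes_sent += 4
        return sum(1 for p in self.points if self._inside(p, w))

    def range_query(self, q: Point, eps: float) -> List[Point]:
        # All objects within Euclidean distance eps of q.
        res = [p for p in self.points if math.dist(p, q) <= eps]
        self.bytes_sent += len(res) * self.obj_size
        return res
```

A COUNT answer costs a few bytes regardless of how many objects match, which is why aggregate queries are a cheap way to learn the data distribution before downloading anything.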
Hash-based spatial join
• Each partition must fit in memory
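The in-memory part of a hash-based ε-distance join can be sketched as a grid-partitioned join in Python. This is one PBSM-style variant chosen for illustration, not necessarily the exact operator used in the paper: each R point is hashed to exactly one grid cell, each S point is replicated into every cell its ε-box overlaps, and partitions are joined cell by cell. The function name and the `cell` parameter are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


def hash_distance_join(R: List[Point], S: List[Point], eps: float,
                       cell: float) -> List[Tuple[Point, Point]]:
    """All pairs (r, s) with dist(r, s) <= eps, via uniform-grid partitioning."""
    def key(x: float, y: float) -> Cell:
        return (math.floor(x / cell), math.floor(y / cell))

    buckets_r: Dict[Cell, List[Point]] = defaultdict(list)
    buckets_s: Dict[Cell, List[Point]] = defaultdict(list)
    for r in R:
        buckets_r[key(*r)].append(r)  # each R point lands in exactly one cell
    for s in S:
        # Replicate s into every cell overlapped by its eps-box, so any r
        # within eps of s finds s in r's own cell (no duplicate pairs).
        i0, j0 = key(s[0] - eps, s[1] - eps)
        i1, j1 = key(s[0] + eps, s[1] + eps)
        for i in range(i0, i1 + 1):
            for j in range(j0, j1 + 1):
                buckets_s[(i, j)].append(s)

    out = []
    for k, rs in buckets_r.items():
        # On a PDA, rs and buckets_s[k] together must fit in the buffer.
        for r in rs:
            for s in buckets_s.get(k, []):
                if math.dist(r, s) <= eps:
                    out.append((r, s))
    return out
```

Replicating S rather than R is an arbitrary choice here; either side works as long as the other side is assigned to a single cell.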
Recursive evaluation
• Retrieve statistics for each subpart
Nested loop spatial join
• Recursive HBSJ: 4 QRY + 2 RCV + 5 RCV
• NLSJ: 2 RCV + 2 SND + 2 RES
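The nested loop strategy can be sketched in Python as follows: the R objects already downloaded to the PDA are sent one by one to S as ε-range queries, and the answers are collected as join pairs. The callable `s_range_query` is a hypothetical stand-in for the remote ε-RANGE operator; the returned query count shows why this strategy pays off only when few R objects are involved.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]
RangeQuery = Callable[[Point, float], List[Point]]


def nested_loop_join(r_objects: List[Point], s_range_query: RangeQuery,
                     eps: float) -> Tuple[List[Tuple[Point, Point]], int]:
    """Probe S once per downloaded R object; return (pairs, queries sent)."""
    pairs: List[Tuple[Point, Point]] = []
    sent = 0
    for r in r_objects:
        sent += 1  # one eps-RANGE request travels to the S server
        for s in s_range_query(r, eps):
            pairs.append((r, s))
    return pairs, sent


# Local simulation of the remote S server (hypothetical).
S_DATA = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0)]


def simulated_s(q: Point, eps: float) -> List[Point]:
    return [s for s in S_DATA if math.dist(s, q) <= eps]
```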
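Cost Model
• TCP/IP: MTU = MSS + BH (maximum transmission unit = maximum segment size + header bytes per packet)
• c1: download |Rw| objects from R and |Sw| objects from S, and join them on the PDA
• c2: download |Rw| objects from R, send them as window queries to S, and retrieve the results
• c3: download |Sw| objects from S, send them as window queries to R, and retrieve the results
• c4: repartition w, retrieve detailed statistics, and apply the algorithm recursively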
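The byte counts behind c1 and c2 can be sketched in Python under stated assumptions: MSS = 1460 and BH = 40 (typical for TCP over IPv4 on a 1500-byte MTU link; GPRS links may differ), 16-byte objects, 24-byte query requests, and every request or reply occupying at least one packet. The helper names and constants are hypothetical and are not the paper's exact formulas. The S-outer variant is the same as `c2` with the roles of R and S swapped.

```python
import math

MSS = 1460  # assumed payload bytes per TCP segment
BH = 40     # assumed TCP + IPv4 header bytes per packet (no options)


def wire_bytes(payload: float) -> float:
    """Payload plus per-packet headers; even an empty message needs one packet."""
    packets = max(1, math.ceil(payload / MSS))
    return payload + packets * BH


def c1(n_r: int, n_s: int, obj: int = 16) -> float:
    # Download both windows in bulk and join them on the PDA.
    return wire_bytes(n_r * obj) + wire_bytes(n_s * obj)


def c2(n_r: int, results_per_query: float, obj: int = 16, query: int = 24) -> float:
    # Download R's window, then one request and one reply per R object.
    download = wire_bytes(n_r * obj)
    probes = n_r * (wire_bytes(query) + wire_bytes(results_per_query * obj))
    return download + probes
```

With these numbers, `c1(100, 200)` is 5000 bytes while `c2(100, 2)` is 15280, because every probe pays for its own request and reply headers. When R's window is tiny and S's is large, the balance flips: `c2(5, 1)` is 720 bytes against 33000 for `c1(5, 2000)`.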
MobiJoin algorithm
MobiJoin(w, |Rw|, |Sw|)
  if |Rw| = 0 or |Sw| = 0 then return
  compute c1, c2, c3, c4
  cmin = min(c1, c2, c3, c4)
  if cmin = c4 then
    impose a regular grid over w
    for each cell w' in w
      retrieve |Rw'| and |Sw'|
      MobiJoin(w', |Rw'|, |Sw'|)
  else
    follow the action specified by cmin
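A runnable Python sketch of this recursion, for an ε-distance join of points, is given below. Everything numeric is a hypothetical assumption: costs are counted in abstract "object" units instead of bytes, data inside a window is assumed uniform when estimating probe results, a buffer of 50 objects limits the download-both option, c4 optimistically assumes that repartitioning prunes nothing, and recursion stops at depth 8. To keep pairs that straddle cell borders, R is restricted to the half-open cell while S is taken from the cell grown by ε. The `Server` class and its predicate-based queries are a local stand-in for the remote operators.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Window = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

# Hypothetical parameters, chosen only for illustration.
QUERY_COST = 1.0  # cost of one request/reply, in "object" units
BUFFER = 50       # PDA buffer size, in objects
GRID = 2          # each repartitioning imposes a GRID x GRID grid
MAX_DEPTH = 8     # never repartition deeper than this


def in_cell(p: Point, w: Window) -> bool:
    # Half-open cell, so every R point belongs to exactly one cell.
    return w[0] <= p[0] < w[2] and w[1] <= p[1] < w[3]


def in_padded(p: Point, w: Window, eps: float) -> bool:
    # w grown by eps on every side: the S points that can join with R points in w.
    return (w[0] - eps <= p[0] <= w[2] + eps) and (w[1] - eps <= p[1] <= w[3] + eps)


class Server:
    """Stand-in for a remote server; `cost` accumulates the transfer it causes."""

    def __init__(self, points: List[Point]):
        self.points, self.cost = points, 0.0

    def count(self, pred: Callable[[Point], bool]) -> int:  # COUNT query
        self.cost += QUERY_COST
        return sum(1 for p in self.points if pred(p))

    def window(self, pred: Callable[[Point], bool]) -> List[Point]:  # WINDOW query
        res = [p for p in self.points if pred(p)]
        self.cost += QUERY_COST + len(res)
        return res

    def range(self, q: Point, eps: float) -> List[Point]:  # eps-RANGE query
        return self.window(lambda p: math.dist(p, q) <= eps)


def mobijoin(R: Server, S: Server, w: Window, nr: int, ns: int,
             eps: float, out: list, depth: int = 0) -> None:
    if nr == 0 or ns == 0:
        return
    area = (w[2] - w[0] + 2 * eps) * (w[3] - w[1] + 2 * eps)
    frac = min(1.0, math.pi * eps * eps / area)  # assumes uniform data inside w
    c1_raw = 2 * QUERY_COST + nr + ns                     # download both, join on PDA
    c1 = c1_raw if nr + ns <= BUFFER else math.inf        # must fit in the buffer
    c2 = QUERY_COST + nr + nr * (QUERY_COST + ns * frac)  # R outer, probe S
    c3 = QUERY_COST + ns + ns * (QUERY_COST + nr * frac)  # S outer, probe R
    # Repartitioning: COUNT queries for every cell, assuming nothing gets pruned.
    c4 = (2 * GRID * GRID * QUERY_COST + min(c1_raw, c2, c3)) if depth < MAX_DEPTH else math.inf
    cmin = min(c1, c2, c3, c4)
    if cmin == c4:
        xs = [w[0] + k * (w[2] - w[0]) / GRID for k in range(GRID)] + [w[2]]
        ys = [w[1] + k * (w[3] - w[1]) / GRID for k in range(GRID)] + [w[3]]
        for i in range(GRID):
            for j in range(GRID):
                c = (xs[i], ys[j], xs[i + 1], ys[j + 1])
                nr_c = R.count(lambda p: in_cell(p, c))
                ns_c = S.count(lambda p: in_padded(p, c, eps))
                mobijoin(R, S, c, nr_c, ns_c, eps, out, depth + 1)
    elif cmin == c1:
        rs = R.window(lambda p: in_cell(p, w))
        ss = S.window(lambda p: in_padded(p, w, eps))
        out.extend((r, s) for r in rs for s in ss if math.dist(r, s) <= eps)
    elif cmin == c2:
        for r in R.window(lambda p: in_cell(p, w)):
            out.extend((r, s) for s in S.range(r, eps))
    else:
        for s in S.window(lambda p: in_padded(p, w, eps)):
            out.extend((r, s) for r in R.range(s, eps) if in_cell(r, w))
```

Because each R point belongs to exactly one leaf cell, every result pair is produced once, whichever action is chosen for that cell; the result can be checked against a brute-force join of the same two point sets.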
Iceberg Spatial Semi-Join
SELECT H.id
FROM Hotels H, Restaurants R
WHERE dist(H.location, R.location) ≤ ε
GROUP BY H.id
HAVING COUNT(*) ≥ m
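The semantics of this query can be checked with a small in-memory Python evaluation, a sketch that assumes both relations are already on the client and says nothing about how MobiJoin fetches them. The function name and the dictionary of hotel ids are hypothetical. Counting stops for a hotel as soon as m matches are found; like the SQL `GROUP BY`, a hotel with no matching restaurant is never reported.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def iceberg_semijoin(hotels: Dict[str, Point], restaurants: List[Point],
                     eps: float, m: int) -> List[str]:
    """Ids of hotels with at least m restaurants within distance eps."""
    result = []
    for hid, h in hotels.items():
        count = 0
        for r in restaurants:
            if math.dist(h, r) <= eps:
                count += 1
                if count >= m:  # HAVING COUNT(*) >= m already satisfied
                    result.append(hid)
                    break
    return result
```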
Experimental setup
• Implementation
  • Server: Unix
  • Client: HP iPAQ PDA (WiFi network, 400 MHz RISC CPU, 64 MB RAM, Windows Pocket PC)
• Datasets
  • Synthetic: 1K – 10K points, varying skew
  • Real: roads and railways of Germany
• Algorithms
  • NLSJ: only nested loop spatial join
  • HBSJ: only hash-based spatial join
Varying the distance threshold ε (PDA buffer = 5%)
Varying the data skew
• With uniform data, MobiJoin reduces to HBSJ
Varying the PDA’s buffer size
[Figure panels: Packets; Bytes]
• With a large buffer, HBSJ fails to prune the empty areas
Iceberg queries
[Figure panels: Uniform data; Skewed data; Real dataset (35K) joined with a synthetic dataset (1K)]
Conclusions
• Distributed spatial joins on mobile devices
  • No mediator, non-collaborative servers, limited set of supported operators
• MobiJoin
  • Dynamically optimizes the entire process of statistics retrieval and query execution
  • Single ad-hoc query
• Future work
  • Support multi-way spatial joins
  • Improve the accuracy of the cost model