This research focuses on optimizing spatial joins for ad-hoc queries on mobile devices, aiming to minimize data transfer costs. The proposed MobiJoin algorithm dynamically optimizes query execution and statistics retrieval. Experimental evaluation includes varying data skew and buffer sizes.
Optimization of Spatial Joins on Mobile Devices
N. Mamoulis (1), P. Kalnis (2), S. Bakiras (3), X. Li (2)
(1) Department of Computer Science and Information Systems, University of Hong Kong
(2) Department of Computer Science, National University of Singapore
(3) Department of Electrical and Electronic Engineering, University of Hong Kong
Motivation (figure labels: Restaurants, Hotels) • Users are equipped with a mobile device (e.g., a PDA) • Ad-hoc spatial queries • Combine data from remote servers, e.g., “Find hotels which are within 500m of a seafood restaurant” • Servers do not collaborate with each other • The query is executed on the mobile device
Cost • Telecommunication companies typically charge by the volume of transferred data (e.g., GPRS) rather than by connection time. • Goal: Minimize the amount of transferred data.
Mediators? (figure labels: Restaurants, Hotels, Mediator) • Services may only allow end-user connections (e.g., subscribers only) • Access through mediators may be more expensive • Requests are ad-hoc; existing mediators may not support them
Solution • Integrate the statistics retrieval with the query processing phase • Issue aggregate queries to estimate the data distribution • Partition the space recursively to achieve sub-linear transfer cost • Choose the physical operator independently for each partition
Related Work • Hash-based methods (e.g., PBSM): require all data to be transferred • R-tree based methods (e.g., [Tan et al., TKDE, 2000]): require access to the internal index • Mediators: • HERMES: Statistics from previous queries • DISCO, Garlic: Statistics during initialization • Tukwila: Optimizes parts of the execution tree
Operators • WINDOW query: return all objects intersecting a window w • COUNT query: return the number of objects intersecting w • ε-RANGE query: return all objects within range ε from a point p We do not have access to the internal indices!
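As a minimal sketch of this interface, the three operators can be simulated over an in-memory list of points. The class name `PointServer` and its method names are hypothetical and do not come from the slides; a real server would answer these queries from its own index, which the client cannot see.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Window = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


class PointServer:
    """Hypothetical stand-in for a remote server holding point objects."""

    def __init__(self, points: List[Point]) -> None:
        self.points = list(points)

    def window(self, w: Window) -> List[Point]:
        """WINDOW query: all objects intersecting window w."""
        xmin, ymin, xmax, ymax = w
        return [(x, y) for x, y in self.points
                if xmin <= x <= xmax and ymin <= y <= ymax]

    def count(self, w: Window) -> int:
        """COUNT query: number of objects intersecting w."""
        return len(self.window(w))

    def eps_range(self, p: Point, eps: float) -> List[Point]:
        """eps-RANGE query: all objects within distance eps of point p."""
        return [q for q in self.points if math.dist(p, q) <= eps]


hotels = PointServer([(1.0, 1.0), (4.0, 4.0), (9.0, 9.0)])
print(hotels.count((0.0, 0.0, 5.0, 5.0)))   # two hotels fall in this window
print(hotels.eps_range((4.0, 5.0), 1.0))    # only (4.0, 4.0) is within 1.0
```

The point of the COUNT operator is that it returns a single number, so the client can learn about the data distribution without downloading any objects.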
Hash based spatial join Each partition must fit in memory
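To illustrate the idea of joining partition by partition, here is a generic grid-partitioned distance join in Python. It is a sketch of the general technique, not necessarily the HBSJ variant shown on the slide: the function name, the `cell` size parameter, and the choice to replicate each S point into every cell its ε-box overlaps are all assumptions made for this example.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def grid_distance_join(R: List[Point], S: List[Point],
                       eps: float, cell: float) -> List[Tuple[Point, Point]]:
    """Return all pairs (r, s) with dist(r, s) <= eps, joining one cell at a time."""
    grid_r: Dict[Tuple[int, int], List[Point]] = defaultdict(list)
    grid_s: Dict[Tuple[int, int], List[Point]] = defaultdict(list)

    # Each R point is hashed to exactly one grid cell.
    for r in R:
        grid_r[(math.floor(r[0] / cell), math.floor(r[1] / cell))].append(r)

    # Each S point is replicated to every cell overlapped by its eps-box,
    # so a matching pair always meets in the cell that owns the R point.
    for s in S:
        x0, x1 = math.floor((s[0] - eps) / cell), math.floor((s[0] + eps) / cell)
        y0, y1 = math.floor((s[1] - eps) / cell), math.floor((s[1] + eps) / cell)
        for i in range(x0, x1 + 1):
            for j in range(y0, y1 + 1):
                grid_s[(i, j)].append(s)

    # Join each partition independently; only one partition is needed at a time.
    result = []
    for key, rs in grid_r.items():
        for r in rs:
            for s in grid_s.get(key, []):
                if math.dist(r, s) <= eps:
                    result.append((r, s))
    return result


pairs = grid_distance_join([(0.9, 0.9), (5.0, 5.0)],
                           [(1.1, 1.1), (5.0, 6.5), (9.0, 9.0)],
                           eps=0.5, cell=1.0)
print(pairs)
```

Because every R point lives in exactly one cell, each qualifying pair is reported once even though S points are replicated.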
Recursive evaluation Retrieve statistics for each subpart
Nested loop spatial join • Recursive HBSJ: 4 QRY + 2 RCV + 5 RCV • NLSJ: 2 RCV + 2 SND + 2 RES
Cost Model • TCP/IP: MTU = MSS + BH • c1: download |Rw| objects from R and |Sw| objects from S and join them on the PDA • c2: download |Rw| objects from R, send them as window queries to S and retrieve the results • c3: download |Sw| objects from S, send them as window queries to R and retrieve the results (the symmetric case of c2) • c4: repartition w, retrieve detailed statistics and apply the algorithm recursively
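The slide gives the packet relation MTU = MSS + BH but not the byte-level formulas, so the following is only a sketch of how c1 and c2 might be estimated. It assumes one header of BH bytes per packet, a fixed object size, a fixed window-query size, and one separate response per window query; the constants and the function names are hypothetical.

```python
import math

MTU = 1500       # assumed maximum transmission unit, in bytes
BH = 40          # assumed TCP/IP header bytes per packet
MSS = MTU - BH   # payload bytes per packet, from MTU = MSS + BH


def transfer_bytes(payload: int) -> int:
    """Bytes on the wire for `payload` bytes, assuming one header per packet."""
    if payload <= 0:
        return 0
    packets = math.ceil(payload / MSS)
    return payload + packets * BH


def cost_c1(n_r: int, n_s: int, obj_size: int) -> int:
    """c1: download both inputs for window w and join them on the device."""
    return transfer_bytes(n_r * obj_size) + transfer_bytes(n_s * obj_size)


def cost_c2(n_r: int, results_per_query: int, obj_size: int, query_size: int) -> int:
    """c2: download R's objects, send each as a window query to S, get results."""
    download = transfer_bytes(n_r * obj_size)
    queries = n_r * transfer_bytes(query_size)                     # one request per object
    results = n_r * transfer_bytes(results_per_query * obj_size)   # one response per request
    return download + queries + results


# Small R, large S: probing S with R's objects moves fewer bytes than downloading S.
print(cost_c1(100, 1000, 20), cost_c2(100, 1, 20, 16))
```

Under these assumptions the per-packet header makes many small messages expensive, which is why the nested-loop strategy only pays off when the outer input is small.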
MobiJoin algorithm
MobiJoin(w, |Rw|, |Sw|)
  if |Rw| = 0 or |Sw| = 0 then return
  compute c1, c2, c3, c4
  cmin = min(c1, c2, c3, c4)
  if cmin = c4 then
    impose a regular grid over w
    for each cell w’ in w
      retrieve |Rw’| and |Sw’|
      MobiJoin(w’, |Rw’|, |Sw’|)
  else follow the action specified by cmin
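The recursion above can be sketched in Python as follows. This is an illustration under several assumptions, not the authors' implementation: the cost model is passed in as a function (`toy_costs` below is an invented placeholder), c3 is taken to be the mirror image of c2, COUNT queries are simulated on local lists, a `max_depth` guard is added to bound recursion, and the sketch only records which action is chosen per window instead of executing the join. It also ignores pairs that straddle cell borders, which a distance join would have to handle.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Window = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
ACTIONS = ("HBSJ", "NLSJ R->S", "NLSJ S->R")  # actions for c1, c2, c3


def count(points: List[Point], w: Window) -> int:
    """Stand-in for a remote COUNT query (half-open window, so cells do not overlap)."""
    return sum(w[0] <= x < w[2] and w[1] <= y < w[3] for x, y in points)


def grid_cells(w: Window, k: int) -> List[Window]:
    """Impose a regular k x k grid over w."""
    xmin, ymin, xmax, ymax = w
    dx, dy = (xmax - xmin) / k, (ymax - ymin) / k
    return [(xmin + i * dx, ymin + j * dy, xmin + (i + 1) * dx, ymin + (j + 1) * dy)
            for i in range(k) for j in range(k)]


def mobijoin(w: Window, n_r: int, n_s: int, R: List[Point], S: List[Point],
             costs: Callable[[Window, int, int], Sequence[float]],
             plan: List[Tuple[Window, str]],
             k: int = 2, depth: int = 0, max_depth: int = 8) -> None:
    if n_r == 0 or n_s == 0:
        return                                   # empty side: prune this window
    c = costs(w, n_r, n_s)                       # (c1, c2, c3, c4)
    best = min(range(4), key=lambda i: c[i])
    if best == 3 and depth >= max_depth:         # assumed guard, not on the slide
        best = min(range(3), key=lambda i: c[i])
    if best == 3:                                # c4: repartition and recurse
        for cell in grid_cells(w, k):
            mobijoin(cell, count(R, cell), count(S, cell), R, S,
                     costs, plan, k, depth + 1, max_depth)
    else:
        plan.append((w, ACTIONS[best]))          # follow the cheapest action


def toy_costs(w: Window, n_r: int, n_s: int) -> Sequence[float]:
    """Invented placeholder costs, counted in transferred items."""
    c1 = n_r + n_s
    return (c1, 3 * n_r, 3 * n_s, 8 + 0.5 * c1)


R = [(0.5 + 0.2 * i, 0.5 + 0.2 * i) for i in range(20)]    # clustered near the origin
S = [(15.5 - 0.7 * i, 15.5 - 0.7 * i) for i in range(20)]  # spread along the diagonal
root = (0.0, 0.0, 16.0, 16.0)
plan: List[Tuple[Window, str]] = []
mobijoin(root, count(R, root), count(S, root), R, S, toy_costs, plan)
for window, action in plan:
    print(window, action)
```

With this toy data the empty regions are pruned by COUNT queries alone, and different physical operators end up being chosen for different cells, which is the behaviour the slide describes.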
Iceberg Spatial Semi-Join
SELECT H.id
FROM Hotels H, Restaurants R
WHERE dist(H.location, R.location) ≤ ε
GROUP BY H.id
HAVING COUNT(*) ≥ m
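The semantics of this query can be checked with a brute-force Python equivalent. This is only a reference sketch on local data; the function name and the dictionary layout (id mapped to a location) are assumptions for the example, and it does not model the distributed evaluation discussed in the slides.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def iceberg_semijoin(hotels: Dict[str, Point], restaurants: Dict[str, Point],
                     eps: float, m: int) -> List[str]:
    """Ids of hotels that have at least m restaurants within distance eps."""
    counts: Counter = Counter()
    for hid, hloc in hotels.items():
        for rloc in restaurants.values():
            if math.dist(hloc, rloc) <= eps:   # WHERE dist(...) <= eps
                counts[hid] += 1               # COUNT(*) per GROUP BY H.id
    return sorted(h for h, c in counts.items() if c >= m)   # HAVING COUNT(*) >= m


hotels = {"h1": (0.0, 0.0), "h2": (10.0, 10.0)}
restaurants = {"r1": (0.0, 1.0), "r2": (1.0, 0.0), "r3": (10.0, 10.5)}
print(iceberg_semijoin(hotels, restaurants, eps=1.5, m=2))
```

Here h1 has two restaurants within range and qualifies, while h2 has only one and is filtered out by the threshold m.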
Experimental setup • Implementation • Server: Unix • Client: HP iPAQ PDA (WiFi network, 400MHz RISC CPU, 64MB RAM, Windows Pocket PC) • Datasets: • Synthetic: 1K – 10K points, varying skew • Real: Roads and railways of Germany • Algorithms: • NLSJ: Only nested loop spatial join • HBSJ: Only hash-based spatial join
Varying the distance threshold ε PDA buffer = 5%
Varying the data skew Uniform data => MobiJoin reduces to HBSJ
Varying the PDA’s buffer size (panels: Packets, Bytes) Large buffer => HBSJ fails to prune the empty areas
Iceberg queries Uniform data Skewed data Real dataset (35K) joins a synthetic dataset (1K)
Conclusions • Distributed spatial joins on mobile devices • No mediator – non-collaborative servers – limited set of supported operators • MobiJoin • Dynamically optimizes the entire process of statistics retrieval and query execution • Single ad-hoc query • Future work • Support multi-way spatial joins • Improve the accuracy of the cost model