
Ad-hoc Distributed Spatial Joins on Mobile Devices

Research on ad-hoc distributed spatial joins using mobile devices, optimizing data transfer and querying efficiency. Includes algorithms, cost models, and experimental setups.


Presentation Transcript


  1. Ad-hoc Distributed Spatial Joins on Mobile Devices • Panos Kalnis, Xiaochen Li (National University of Singapore) • Nikos Mamoulis (The University of Hong Kong) • Spiridon Bakiras (Hong Kong University of Science and Technology)

  2. Motivation • Users are equipped with a mobile device (e.g., a PDA) • Ad-hoc spatial queries • Combine data from remote servers: “Find hotels which are within 500m of a seafood restaurant” • Servers do not collaborate with each other • The query is executed on the mobile device [Figure labels: Restaurants, Hotels]

  3. Mediators? • Services may only allow end-user connections (e.g., subscribers only) • Access through mediators may be more expensive • Requests are ad-hoc; existing mediators may not support them [Figure labels: Restaurants, Hotels, Mediator]

  4. Cost • Telecommunication companies typically charge by the volume of transferred data (e.g., GPRS) rather than by connection time • Goal: minimize the amount of transferred data

  5. Solution • Ask aggregate queries to estimate the data distribution (i.e., statistics) • Partition the space recursively to achieve sub-linear transfer cost • Choose the physical operator independently for each partition

  6. Related Work • Hash-based methods (e.g., PBSM): require all data to be transferred • R-tree based methods (e.g., [Tan et al., TKDE, 2000]): require access to the internal index • Mediators: • HERMES: statistics from previous queries • DISCO, Garlic: statistics during initialization • Tukwila: optimize parts of the execution tree

  7. Operators • NO access to the internal indices! • WINDOW query: return all objects intersecting a window w • COUNT query: return the number of objects intersecting w • ε-RANGE query: return all objects within range ε from a point p [Figure labels: w, ε, p]
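The three operators can be pictured as a very small server interface. The sketch below is illustrative only: the class name `PointServer`, its method names, and the in-memory point list are assumptions rather than anything shown in the presentation, and objects are simplified to 2-D points.

```python
from math import hypot
from typing import List, Tuple

Point = Tuple[float, float]
Window = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


class PointServer:
    """Hypothetical stand-in for a remote server exposing only three queries."""

    def __init__(self, points: List[Point]) -> None:
        self._points = list(points)

    def window(self, w: Window) -> List[Point]:
        """WINDOW query: all objects that fall inside window w."""
        xmin, ymin, xmax, ymax = w
        return [(x, y) for x, y in self._points
                if xmin <= x <= xmax and ymin <= y <= ymax]

    def count(self, w: Window) -> int:
        """COUNT query: only the number of objects in w (cheap to transfer)."""
        return len(self.window(w))

    def eps_range(self, p: Point, eps: float) -> List[Point]:
        """ε-RANGE query: all objects within distance eps of point p."""
        return [q for q in self._points
                if hypot(q[0] - p[0], q[1] - p[1]) <= eps]
```

The point of the COUNT query is that its answer is a single number, so the client can learn about the data distribution without downloading the objects themselves.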

  8. Query Types • Intersection join: find hotels which are inside parks • ε-range join: find restaurants which are within 500m of a hotel • Iceberg semi-join: find hotels which are close to at least 3 restaurants [Figure label: ε]
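To make two of these query types concrete, here is a brute-force sketch of what each one returns once both datasets are available locally. It assumes point objects and reads "close to" in the iceberg semi-join as "within distance ε"; the function names and the threshold parameter `m` are hypothetical.

```python
from math import hypot
from typing import List, Tuple

Point = Tuple[float, float]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two points."""
    return hypot(a[0] - b[0], a[1] - b[1])


def eps_range_join(R: List[Point], S: List[Point], eps: float) -> List[Tuple[Point, Point]]:
    """All pairs (r, s) whose distance is at most eps."""
    return [(r, s) for r in R for s in S if dist(r, s) <= eps]


def iceberg_semijoin(R: List[Point], S: List[Point], eps: float, m: int) -> List[Point]:
    """Objects of R that have at least m objects of S within eps."""
    return [r for r in R if sum(1 for s in S if dist(r, s) <= eps) >= m]
```

For example, with hotels at `(0, 0)` and `(10, 10)` and three restaurants clustered around the first hotel, `iceberg_semijoin(hotels, restaurants, 1.5, 3)` keeps only the first hotel.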

  9. Hash Based Spatial Join Each partition must fit in memory

  10. Recursive evaluation Retrieve statistics for each subpart

  11. Inefficient HBSJ

  12. Nested Loop Spatial Join • Recursive HBSJ: 4 QRY + 2 RCV + 5 RCV • NLSJ: 2 RCV + 2 SND + 2 RES
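A nested loop spatial join on the device can be sketched as follows: the objects of one dataset are downloaded, and each one is sent to the other server as a probe query. This is a minimal sketch under stated assumptions: the probe is an ε-range query on points, and `make_probe` is a hypothetical local stand-in for the remote server.

```python
from math import hypot
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Probe = Callable[[Point, float], List[Point]]


def make_probe(points: List[Point]) -> Probe:
    """Hypothetical stand-in for the second server's ε-range query."""
    def probe(p: Point, eps: float) -> List[Point]:
        return [q for q in points if hypot(q[0] - p[0], q[1] - p[1]) <= eps]
    return probe


def nlsj(outer: List[Point], probe: Probe, eps: float) -> List[Tuple[Point, Point]]:
    """Join already-downloaded outer objects by probing the other server once per object."""
    results = []
    for p in outer:                # one probe sent per outer object
        for q in probe(p, eps):    # matches returned by the other server
            results.append((p, q))
    return results
```

The transfer cost of this approach grows with the number of outer objects, which is why it pays off mainly when one side of the partition is small.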

  13. Inefficient NLSJ

  14. Cost Model • TCP/IP: MTU = MSS + BH • c1: download |Rw| objects from R and |Sw| objects from S and join them on the PDA • c2, c3: download |Rw| objects from R, send them as window queries to S and retrieve the results • c4: repartition w, retrieve detailed statistics and apply the algorithm recursively
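The slide does not give the cost formulas, so the following is only a rough sketch of how a byte-based cost could be computed from the MTU = MSS + BH relation. Everything specific here is an assumption: the default sizes (MSS of 1460 bytes, 40 header bytes per packet), the object and query sizes passed in, the estimated number of matches, and the simplification that all probe results travel as one payload.

```python
from math import ceil

MSS = 1460  # assumed maximum payload bytes per packet
BH = 40     # assumed header bytes per packet, so MTU = MSS + BH = 1500


def transfer_bytes(payload: int, mss: int = MSS, bh: int = BH) -> int:
    """Bytes on the wire for a payload, counting one header per packet."""
    if payload <= 0:
        return 0
    return payload + ceil(payload / mss) * bh


def cost_hbsj(n_r: int, n_s: int, obj_size: int, query_size: int) -> int:
    """Download-both cost: two window queries, then both result sets."""
    return (2 * transfer_bytes(query_size)
            + transfer_bytes(n_r * obj_size)
            + transfer_bytes(n_s * obj_size))


def cost_nlsj(n_outer: int, est_matches: int, obj_size: int, query_size: int) -> int:
    """Nested-loop cost: download the outer side, then one probe per outer object."""
    download = transfer_bytes(query_size) + transfer_bytes(n_outer * obj_size)
    probes = n_outer * transfer_bytes(query_size)
    results = transfer_bytes(est_matches * obj_size)
    return download + probes + results
```

Because every probe pays at least one packet header, the nested-loop cost is dominated by the number of outer objects, while the download-both cost is dominated by the total number of objects in the window.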

  15. UpJoin (Uniform Partition Join) • Decide if the datasets are uniform • If HBSJ is cheaper and both datasets are uniform, then perform HBSJ • If NLSJ is cheaper and the larger dataset is uniform, then perform NLSJ • Else repartition
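The decision rule on this slide can be written as a small function. This is a sketch of the rule as stated, not the authors' implementation: the function name, the string labels, and the tie-breaking (HBSJ is preferred when the two costs are equal, R counts as the larger dataset when the counts are equal) are assumptions.

```python
def upjoin_decision(c_hbsj: float, c_nlsj: float,
                    r_count: int, s_count: int,
                    r_uniform: bool, s_uniform: bool) -> str:
    """Choose the physical operator for one partition, or ask for a repartition."""
    if c_hbsj <= c_nlsj:
        # HBSJ is cheaper: both datasets must look uniform in this window.
        if r_uniform and s_uniform:
            return "HBSJ"
    else:
        # NLSJ is cheaper: only the larger dataset must look uniform.
        larger_uniform = r_uniform if r_count >= s_count else s_uniform
        if larger_uniform:
            return "NLSJ"
    return "REPARTITION"
```

The costs passed in would come from the cost model, and the uniformity flags from the statistics gathered with COUNT queries.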

  16. Uniformity check • Note: UpJoin will not repartition if the cost of retrieving statistics is larger than the cost of joining [Figure: Dw and its subparts Dw’0, Dw’1, Dw’2, Dw’3; % variation from uniform distribution]
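One way to picture a uniformity check based on "% variation from uniform distribution" is to compare the COUNT result of each subpart with the count expected under a perfectly uniform spread. The exact measure used in the presentation is not given, so the formula below (largest relative deviation from the mean, compared against a threshold `alpha`) is an assumption.

```python
from typing import Sequence


def is_uniform(sub_counts: Sequence[int], alpha: float) -> bool:
    """True if no subpart count deviates from the uniform share by more than alpha."""
    total = sum(sub_counts)
    if total == 0:
        return True  # an empty window is trivially uniform
    expected = total / len(sub_counts)
    max_dev = max(abs(c - expected) / expected for c in sub_counts)
    return max_dev <= alpha
```

With four quadrant counts of `[40, 20, 20, 20]` the expected share is 25, the largest deviation is 60%, and the window would be treated as non-uniform for any threshold below 0.6.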

  17. Inefficient UpJoin

  18. SR-Join (Similarity Related Join) • Identify dense and sparse quadrants • If the distribution is similar, then apply HBSJ or NLSJ • Else repartition [Figure: % variation of density; area]

  19. Experimental setup • Implementation: • Server: Unix • Client: HP iPAQ PDA (WiFi network, 400MHz RISC CPU, 64MB RAM, Windows Pocket PC) • Datasets: • Synthetic: 1K to 10K points, varying skew • Real: roads and railways of Germany

  20. Setting the parameters • α (for UpJoin) • ρ (for SR-Join) [Plot labels: Uniform, Uniform]

  21. Real Dataset [Plot label: Uniform]

  22. Comparison with SemiJoin • SemiJoin: uses the intermediate levels of the R-tree index • We cannot use it in practice, because we cannot access the index [Plot label: Uniform]

  23. Conclusions • Distributed spatial joins on mobile devices • No mediator, non-collaborative servers, limited set of supported operators • Two algorithms: • UpJoin • SR-Join • Both estimate the datasets’ distribution • Future work: • Support multi-way spatial joins • Improve the accuracy of the cost model

  24. Questions?
