1 / 37

Spatial Range Querying for Gaussian-Based Imprecise Query Objects

Spatial Range Querying for Gaussian-Based Imprecise Query Objects. Yoshiharu Ishikawa , Yuichi Iijima Nagoya University Jeffrey Xu Yu The Chinese University of Hong Kong. Outline. Background and Problem Formulation Related Work Query Processing Strategies Experimental Results Conclusions.

nola
Download Presentation

Spatial Range Querying for Gaussian-Based Imprecise Query Objects

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Spatial Range Querying forGaussian-Based Imprecise Query Objects Yoshiharu Ishikawa, Yuichi IijimaNagoya University Jeffrey Xu Yu The Chinese University of Hong Kong

  2. Outline • Background and Problem Formulation • Related Work • Query Processing Strategies • Experimental Results • Conclusions

  3. ImpreciseLocation Information • Sensor Environments • Frequent updates may not be possible • GPS-based positioning consumes batteries • Robotics • Localization using sensing andmovement histories • Probabilistic approach hasvagueness • Privacy • Location Anonymity

  4. Location-based Range Queries • Location-based Range Queries • Example: Find hotels located within 2km from Yuyuan Garden • Traditional problem in spatial databases • Efficient query processing using spatial indices • Extensible to multi-dimensional cases (e.g., image retrieval) • What happen if the location of query object is uncertain? q

  5. Probabilistic Range Query (PRQ) (1) • Assumptions • Location of query object qis specified as a Gaussian distribution • Target data: static points • Gaussian Distribution • Σ: Covariance matrix

  6. Probabilistic Range Query (PRQ) (2) • Probabilistic Range Query (PRQ) • Find objects such that the probabilities thattheir distances from qare less than δ are greater than θ q

  7. Probabilistic Range Query (PRQ) (3) • Is distance between q and p within d? pdf of q (Gaussian distribution) p Numerical integraiton is required

  8. Naïve Approach for Query Processing d d d d d • Exchanging roles • Pr[p is within d from q]= Pr[q is within d fromp] • Naïve approach • For each object p,integrate pdf forsphere region R • R : sphere withcenter p andradius δ • If the result q,it is qualified • Quite costly! R q p

  9. Outline • Background and Problem Formulation • Related Work • Query Processing Strategies • Experimental Results • Conclusions

  10. Related Work • Query processing methods for uncertain (location) data • Cheng, Prabhakar, et al. (SIGMOD’03, VLDB’04, …) • Tao et al. (VLDB’05, TODS’07) • Parker, Subrahmanian, et al. (TKDE’07, ‘09) • Consider arbitrary PDFs or uniform PDFs • Target objects may be uncertain • Research related to Gaussian distribution • Gauss-tree [Böhm et al., ICDE’06] • Target objects are based on Gaussian distributions

  11. Outline • Background and Problem Formulation • Related Work • Query Processing Strategies • Experimental Results • Conclusions

  12. Outline of Query Processing • Generic query processing strategy consists of three phases • Index-Based Search: Retrieve all candidate objects using spatial index (R-tree) • Filtering: Using several conditions, some candidates are pruned • Probability Computation: Perform numerical integration (Monte Carlo method) to evaluate exact probability • Phase 3 dominates processing cost • Filtering (phase 2) is important for efficiency

  13. Query Processing Strategies • Three strategies • Rectilinear-Region-Based Approach (RR) • Oblique-Region-Based Approach (OR) • Bounding-Function-Based Approach (BF) • Combination of strategies is also possible

  14. Rectilinear-Region-Based (RR) (1) • Use the concept of -region • Similar concepts are used in query processing for uncertain spatial databases •  -region: Ellipsoidal region for which the result of the integration becomes 1 – 2q: • The ellipsoidal regionis the q -region

  15. Rectilinear-Region-Based (RR) (2) • Query processing • Given a query, -region is computed: it is suffice if we have rθ-tablefor “normal” Gaussian pdf • “Normal” Gaussian:S = I, q = 0 • Given , it returns appropriate rθ • Derive MBR for -region and perform Minkowski Sum • Retrieve candidates then perform numerical integration d c a q q d b  -region

  16. Rectilinear-Region-Based (RR) (3) • Geometry of bounding boxwhere (Σ)iiis the (i, i) entry of Σ xj xi wj q wj wi wi

  17. Oblique-Region-Based (OR) (1) • Use of oblique rectangle • Query processing based on axis transformation • Not effective for phase 1 (index-based search): Only used for filtering (phase 2) c d c d d a b d q b a q

  18. Oblique-Region-Based (OR) (2) • Step 1: Rotate candidate objects • Based on the result of eigenvalue decomposition of Σ-1 • Step 2: Check whether each object is inside of the rectangle • λi: Eigenvalue of Σ-1 for i-th dimension q

  19. Bounding-Function-Based (BF) (1) d d d • Basic idea • Covariance matrix S = I (“normal” Gaussian pdf) • Isosurface of pdf has a spherical shape • Approach • Let a be the radius for which the integration result is q • If dist(q, p) ≤athen psatisfies the condition • Construct a table that gives (d, q ) abeforehand Pr< q Pr= q a q Pr> q

  20. Bounding-Function-Based (BF) (2) • General case • isosurface has an ellipsoidal shape • Approach • Use of upper- and lower-bounding functions for pdf • They have spherericalisosurfaces • Derived from covariance matrix q

  21. Bounding Functions • Original Gaussian pdf • Upper- and lower-bounding functions T T T Isosurfacehas aspherical shape T Note: λT = min{λi} T holds T λ = max{λi}

  22. Bounding-Function-Based (BF) (3) d d T • aT (a ): Radius with which the integration result of upper- (lower-) bounding function isq Originalpdf Prob. (integrationresult) = q T a Upper-bounding function T a q Lower-bounding function 2d 2d

  23. Bounding-Function-Based (BF) (4) • Theoretical result • Let ST be a spherical region with radius and its center relative to the origin is βT, and assume that ST satisfies the following equation: • Using table that gives (d, q ) a, we can get βT: • Then we can get

  24. Bounding-Function-Based (BF) (5) • Step 1: Use of R-tree • {b, c, d} are retrieved ascandidates • Step 2: Filtering using aT • b is deleted • Step 2’: Filtering using a • We can determine d as an answer withoutnumerical integration • Step 3: Numerical integration • Performed on {c} b a T T a q T d c a

  25. Outline • Background and Problem Formulation • Related Work • Query Processing Strategies • Experimental Results • and Conclusions

  26. Experiments on 2D Data (1) • Map of Long Beach, CA • Normalized into [0, 1000]  [0, 1000] • 50,747 entries • Indexed by R-tree • Covariance matrix • γ : Scaling parameter • Default: γ = 10

  27. Example Query • Find objects within distance δ = 50 with probability threshold θ = 1%

  28. Experiments on 2D Data (2) • Numerical integration dominates the total cost • R-tree-based search is negligible • ALL is the most effective strategy δ = 25 θ = 0.01

  29. Experiments on 2D Data (3) • Filtering regions (δ = 25, θ = 0.01, γ = 10) BF (upper) OR y RR x BF (lower) Integration region for ALL I

  30. Experiments on 2D Data (4) • Filtering regions for different uncertainty setting (δ = 25, θ = 0.01) γ = 1 : Nearly exact γ = 10 : Medium uncertainty γ = 100 : Uncertain

  31. Experiments on 9D Data (1) • Motivating Scenario: Example-Based Image Retrieval • User specifies sample images • Image retrieval system estimates his interest as a Gaussian distribution

  32. Experiments on 9D Data (2) • Data set: Corel Image Features data set • From UCI KDD Archive • Color Moments data • 68,040 9D vectors • Euclidean-distance based similarity • Experimental Scenario: Pseudo-Feedback • Select a random query object, then retrieve k-NN query (k = 20) as sample images • Derive the covariance matrix from samples : Sample covariance matrix κ : Normalization parameter

  33. Experiments on 9D Data (3) • Parameters • δ = 0.7: For exact case, it retrieves 15.3 objects • θ = 40% • Number of candidates (ANS: answer objs) Too many candidates to retrieve only 3.9 objects!

  34. Experiments on 9D Data (4) • Reason: Curse of dimensionality • Plot shows existence probability for pnorm for different radii and dimensions Location of query object is too vague: In medium dimension, it is quite apart from its distribution center on average Example: For 9D case, the probability that query object is within distance two is only 9%

  35. Outline • Background and Problem Formulation • Related Work • Query Processing Strategies • Experimental Results • Conclusions

  36. Conclusions • Spatial range query processing methods for imprecise query objects • Location of query object is represented by Gaussian distribution • Three strategies and their combinations • Reduction of numerical integration is important • Problem is difficult for medium- and high-dimensional data • Our related work • Probabilistic Nearest Neighbor Queries (MDM’09)

  37. Spatial Range Querying forGaussian-Based Imprecise Query Objects Yoshiharu Ishikawa, Yuichi IijimaNagoya University Jeffrey Xu Yu The Chinese University of Hong Kong

More Related