
Probabilistic Data Management

This chapter explores the definition and query processing techniques of a probabilistic reverse nearest neighbor query. It covers various types of probabilistic queries and discusses the application of PRNN queries in uncertain databases. The chapter also presents different pruning techniques and the PRNN query processing procedure using a multidimensional index structure.

freyj


Presentation Transcript


  1. Probabilistic Data Management Chapter 5: Probabilistic Query Answering (3)

  2. Objectives • In this chapter, you will: • Learn the definition and query processing techniques of a probabilistic query type • Probabilistic Reverse Nearest Neighbor Query

  3. Recall: Probabilistic Query Types • Uncertain/probabilistic database • Probabilistic spatial queries: probabilistic range query, probabilistic k-nearest neighbor query, probabilistic group nearest neighbor (PGNN) query, probabilistic reverse k-nearest neighbor query, probabilistic spatial join / similarity join • Probabilistic preference queries: probabilistic top-k query (or ranked query), probabilistic skyline query, probabilistic reverse skyline query

  4. Probabilistic Reverse Nearest Neighbor Queries in Uncertain Databases Very Large Data Bases Journal (VLDBJ), 2009

  5. Outline • Introduction • Related Work • Problem Definition • PRNN Query Processing • Experimental Evaluation • Summary

  6. Reverse Nearest Neighbor Query (RNN) • Rescue tasks in oceans • In the case of emergency, a ship will ask its nearest ship for help • A rescue ship needs to monitor those ships that have itself as their nearest neighbors • In other words, the rescue ship needs to obtain its reverse nearest neighbors (RNNs)

  7. Introduction • Reverse Nearest Neighbor Query (RNN) • Given a database D and a query object q, an RNN query retrieves those data objects o ∈ D that have q as their nearest neighbor • [Figure: query point q and data points o1 to o5]
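
A minimal sketch of the RNN definition on slide 7, for certain (exact) data points. It is a brute-force check and not the TPL approach named on the next slides. The function name `rnn_brute_force` and the tie-breaking rule (a tie counts in favor of q) are assumptions made for illustration.

```python
import math

def rnn_brute_force(data, q):
    """Return indices of points in `data` that have q as their nearest neighbor.

    Assumption: when q and another data point are equally close, q still
    counts as the nearest neighbor.
    """
    result = []
    for i, o in enumerate(data):
        d_q = math.dist(o, q)  # distance from o to the query point
        # o is an RNN of q if no other data point is strictly closer to o than q
        if all(d_q <= math.dist(o, p) for j, p in enumerate(data) if j != i):
            result.append(i)
    return result

points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
print(rnn_brute_force(points, (0.4, 0.0)))
```

In this toy input the first two points are closer to q than to any other data point, while the third point is closer to its neighbor at (1, 0) than to q.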

  8. RNN Processing on Certain Data Points • TPL Approach [VLDB'04] • [Figure: query point q, data points o1 to o5, one RNN candidate, and its pruning region]

  9. RNN Processing on Certain Data Points (cont.) • TPL Approach [VLDB'04] • [Figure: query point q, data points o1 to o5, two RNN candidates, and the pruning region]

  10. Probabilistic Reverse Nearest Neighbor Query (PRNN) • Due to the limited accuracy of positioning devices (e.g., GPS) or the movement of the ships, the reported positions of ships are imprecise • Therefore, it is important to answer RNN queries over uncertain data effectively and efficiently

  11. Another Application of PRNN • Mixed-reality game • Each player tends to shoot his/her nearest neighbor • A query player needs to monitor those players (RNNs) who have himself/herself as their nearest neighbor • Due to the movement of players, their positions can be imprecise and uncertain, so the RNN query is conducted on uncertain objects

  12. RNN Queries in Uncertain Databases

  13. PRNN Definition • Probabilistic Reverse Nearest Neighbor (PRNN) Queries

  14. A Straightforward Method • For every uncertain object o in the database • Sequentially scan all the objects in the database • Calculate the PRNN probability, P_PRNN(q, o), that o is an RNN of q • If P_PRNN(q, o) is greater than or equal to the probabilistic threshold a, then o is an answer; otherwise, o is discarded • Analysis • Complexity: O(N^2), where N is the database size • The computation of the probability P_PRNN(q, o) is very costly
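
The straightforward method on slide 14 can be sketched as below. The slides do not say how P_PRNN(q, o) is computed, so this sketch swaps in a Monte Carlo estimate and makes several assumptions: each uncertain object is a hypersphere given as `(center, radius)`, instances are uniformly distributed inside it, and objects are independent. All function names are hypothetical.

```python
import math
import random

def sample_in_ball(center, radius, rng):
    """Draw one point uniformly from a d-dimensional ball (assumed distribution)."""
    d = len(center)
    direction = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in direction)) or 1.0
    scale = radius * rng.random() ** (1.0 / d) / norm
    return tuple(c + scale * x for c, x in zip(center, direction))

def estimate_prnn_probability(q, o, others, trials, rng):
    """Monte Carlo estimate of the probability that o has q as its nearest neighbor."""
    hits = 0
    for _ in range(trials):
        o_inst = sample_in_ball(o[0], o[1], rng)
        d_q = math.dist(o_inst, q)
        # q is the nearest neighbor if no sampled instance of another object is closer
        if all(math.dist(o_inst, sample_in_ball(p[0], p[1], rng)) >= d_q
               for p in others):
            hits += 1
    return hits / trials

def prnn_linear_scan(q, objects, a, trials=1000, seed=0):
    """For every object, scan all others and keep it if the estimate is >= a."""
    rng = random.Random(seed)
    answers = []
    for i, o in enumerate(objects):
        others = objects[:i] + objects[i + 1:]
        if estimate_prnn_probability(q, o, others, trials, rng) >= a:
            answers.append(i)
    return answers

objs = [((0.0, 0.0), 0.1), ((10.0, 0.0), 0.1), ((10.5, 0.0), 0.1)]
print(prnn_linear_scan((0.5, 0.0), objs, a=0.5, trials=200))
```

Each object triggers a scan over all other objects, which is where the O(N^2) cost comes from. The sampling loop inside each scan illustrates why the probability computation itself is expensive.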

  15. Pruning Techniques • Geometric Pruning (GP) • GP0 method • The object distribution in the uncertainty region can be either known or unknown • Prune those data objects that definitely cannot be RNNs of q • GPb method (b ∈ (0, 1]) • The object distribution in the uncertainty region is known and pre-computation is allowed • Prune those objects with a PRNN probability smaller than b

  16. Heuristics of GP0 Method • Data objects always reside within their uncertainty regions • [Figure: conservative pruning region (CPR)]

  17. Heuristics of GP0 Method (cont.) • No false dismissals are introduced with the hypersphere approximation • [Figure: candidate o]

  18. Conditions of GP0 Method • Pruning Conditions • dist(P, q) - dist(P, Co) > ro • mindist(P, D)  rp • In other words, if object p is fully contained in the pruning region CPR'(q, o), then p can be safely pruned

  19. Heuristics of GPb Method (b ∈ (0, 1]) • GPb prunes those objects with a PRNN probability smaller than b (< a) • [Figure: candidate o; p can be pruned by GPb]

  20. Refinement Phase • After applying geometric pruning methods, we can obtain a candidate set • For each candidate o, we retrieve those uncertain objects p' intersecting with PR and compute the probability that o is an RNN of q

  21. PRNN Query Processing • Maintain a multidimensional index structure over the uncertain database // indexing phase • For each PRNN query • Apply geometric pruning methods during the index traversal // pruning phase • Refine candidates and return the answer set // refinement phase

  22. PRNN Query Processing • Index uncertain data with an R-tree

  23. PRNN Query Procedure • Traverse the R-tree index while maintaining a minimum heap (keyed by the minimum distance from the query point to each node) • For each node/object Ni we encounter • Check whether or not Ni can be pruned by the GP methods • If the answer is no, then we either further check the children of node Ni, or add Ni to a PRNN candidate set Scand in case Ni is an object • After the index traversal, we refine the candidates in Scand by calculating their actual PRNN probabilities
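
The traversal on slide 23 can be sketched as a best-first search over a tree of bounding rectangles. This is only an illustration under assumptions: `Node` is a hypothetical stand-in for an R-tree entry, and `can_prune` is a caller-supplied placeholder for the GP methods, whose exact conditions are not reproduced here. The refinement phase is left out.

```python
import heapq
import itertools
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    """Hypothetical R-tree entry: a bounding rectangle plus children or an object id."""
    low: tuple
    high: tuple
    children: list = field(default_factory=list)
    obj_id: object = None  # set only for entries that represent data objects

def mindist_point_rect(q, low, high):
    """Minimum distance from point q to the rectangle [low, high]."""
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2
                         for x, l, h in zip(q, low, high)))

def collect_candidates(root, q, can_prune):
    """Best-first traversal; returns unpruned objects (the candidate set Scand)."""
    tie = itertools.count()  # tie-breaker so the heap never compares Node objects
    heap = [(mindist_point_rect(q, root.low, root.high), next(tie), root)]
    candidates = []
    while heap:
        _, _, node = heapq.heappop(heap)
        if can_prune(node, candidates):  # placeholder for the GP pruning tests
            continue
        if node.obj_id is not None:      # an object: keep it for refinement
            candidates.append(node)
        else:                            # an internal node: visit its children
            for child in node.children:
                key = mindist_point_rect(q, child.low, child.high)
                heapq.heappush(heap, (key, next(tie), child))
    return candidates
```

Because the heap is keyed by the minimum distance to the query point, nearby entries are visited first, so candidates found early are available to `can_prune` when farther nodes are popped.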

  24. PRNN Query Processing (cont'd)

  25. Experimental Evaluation • Experimental Settings • Real data sets: LB, MG, TCB, and CAR • Synthetic data sets: • Generate center location Co of uncertain object o in a data space [0, 1,000]^d • Produce radius ro ∈ [rmin, rmax] for uncertainty region UR(o) • Four types of data sets: lUrU, lUrG, lSrU, and lSrG • Competitors: • Linear scan (worse than ours by 5-9 orders of magnitude) • Naïve pruning (pruning condition: given a PRNN candidate o, a node/object e can be pruned if maxdist(o, e) < mindist(q, e))
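
The naïve pruning condition of the competitor on slide 25 can be sketched for the object-versus-object case. The sketch assumes both o and e are hyperspheres given as `(center, radius)` and that q is an exact point. The slide also allows e to be an index node, which is not covered here. The function names are hypothetical.

```python
import math

def maxdist(o, e):
    """Largest possible distance between instances of hyperspheres o and e."""
    (co, ro), (ce, re) = o, e
    return math.dist(co, ce) + ro + re

def mindist_to_point(q, e):
    """Smallest possible distance between point q and an instance of e."""
    ce, re = e
    return max(0.0, math.dist(q, ce) - re)

def naive_prune(q, o, e):
    """True if e is always closer to candidate o than to q, so e cannot be an RNN of q."""
    return maxdist(o, e) < mindist_to_point(q, e)

q = (0.0, 0.0)
o = ((10.0, 0.0), 0.5)
print(naive_prune(q, o, ((11.0, 0.0), 0.5)))  # e lies next to o, far from q
print(naive_prune(q, o, ((1.0, 0.0), 0.5)))   # e lies next to q
```

When the condition holds, every possible instance of e has some instance of o closer than q, so discarding e introduces no false dismissals.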

  26. Performance vs. b • Settings: data size N = 100K, dimensionality d = 3, radius range [rmin, rmax] = [0, 5], and probabilistic threshold a = 1

  27. Summary • We formulate the problem of probabilistic queries over uncertain databases • We propose effective pruning methods to reduce the search space of probabilistic queries • We integrate pruning methods into an efficient query procedure • We verify the efficiency of our proposed approaches through extensive experiments
