This chapter explores the definition and query processing techniques of a probabilistic reverse nearest neighbor query. It covers various types of probabilistic queries and discusses the application of PRNN queries in uncertain databases. The chapter also presents different pruning techniques and the PRNN query processing procedure using a multidimensional index structure.
Probabilistic Data Management Chapter 5: Probabilistic Query Answering (3)
Objectives • In this chapter, you will: • Learn the definition and query processing techniques of a probabilistic query type • Probabilistic Reverse Nearest Neighbor Query
Recall: Probabilistic Query Types • Uncertain/probabilistic database • Probabilistic Spatial Query: probabilistic range query, probabilistic k-nearest neighbor query, probabilistic group nearest neighbor (PGNN) query, probabilistic reverse k-nearest neighbor query, probabilistic spatial join/similarity join • Probabilistic Preference Query: probabilistic top-k query (or ranked query), probabilistic skyline query, probabilistic reverse skyline query
Probabilistic Reverse Nearest Neighbor Queries in Uncertain Databases Very Large Data Bases Journal (VLDBJ), 2009
Outline • Introduction • Related Work • Problem Definition • PRNN Query Processing • Experimental Evaluation • Summary
Reverse Nearest Neighbor Query (RNN) • Rescue tasks in oceans • In the case of emergency, a ship will ask its nearest ship for help • A rescue ship needs to monitor those ships that have itself as their nearest neighbors • In other words, the rescue ship needs to obtain its reverse nearest neighbors (RNNs)
Introduction • Reverse Nearest Neighbor Query (RNN) • Given a database D and a query object q, an RNN query retrieves those data objects o ∈ D that have q as their nearest neighbor [Figure: query object q and data objects o1 to o5]
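The definition above can be sketched as a brute-force check on certain (exact) points. This is only an illustration: the function name `rnn_brute_force`, the coordinates, and the choice to treat ties as still being an RNN are assumptions, not part of the slides.

```python
import math

def rnn_brute_force(query, objects):
    """Return the names of objects that have `query` as their nearest neighbor.

    `objects` maps a name to a coordinate tuple (hypothetical layout).
    """
    result = []
    for name, pos in objects.items():
        d_query = math.dist(pos, query)
        # o is an RNN of q if no other data object is strictly closer to o than q is.
        if all(math.dist(pos, other) >= d_query
               for other_name, other in objects.items() if other_name != name):
            result.append(name)
    return result

# Made-up positions for five objects and a query at the origin.
objects = {"o1": (1.0, 0.0), "o2": (5.0, 0.0), "o3": (5.5, 0.0),
           "o4": (0.0, 2.0), "o5": (9.0, 9.0)}
print(rnn_brute_force((0.0, 0.0), objects))
```

Each object is compared against every other object, so this check is quadratic in the database size, which is what index-based approaches such as TPL try to avoid.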
RNN Processing on Certain Data Points • TPL Approach [VLDB'04] [Figure: query object q, data objects o1 to o5, RNN candidates, and a pruning region]
Probabilistic Reverse Nearest Neighbor Query (PRNN) • Due to the limited accuracy of positioning devices (e.g., GPS) or the movement of the ships, the reported positions of ships are imprecise • Therefore, it is important to answer RNN queries over uncertain data effectively and efficiently
Another Application of PRNN • Mixed-reality game • Each player tends to shoot his/her nearest neighbor • A query player needs to monitor those players (RNNs) who have himself/herself as their nearest neighbor • Due to the movement of players, their positions can be imprecise and uncertain, so the RNN query is conducted on uncertain objects
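PRNN Definition • Probabilistic Reverse Nearest Neighbor (PRNN) Queries • Given an uncertain database D, a query object q, and a probabilistic threshold a, a PRNN query retrieves those uncertain objects o ∈ D whose probability P_PRNN(q, o) of being an RNN of q is greater than or equal to a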
A Straightforward Method • For every uncertain object o in the database • Sequentially scan all the objects in the database • Calculate the PRNN probability, P_PRNN(q, o), that o is an RNN of q • If P_PRNN(q, o) is greater than or equal to the probabilistic threshold a, then o is an answer; otherwise, o is discarded • Analysis • Complexity: O(N^2), where N is the database size • The computation of the probability P_PRNN(q, o) is very costly
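The straightforward method can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: uncertainty regions are taken to be hyperspheres with a uniform distribution, the query q is a certain point, and the PRNN probability is estimated by Monte Carlo sampling rather than computed exactly. All function names are hypothetical.

```python
import math
import random

def sample_in_ball(center, radius, rng):
    """Draw a point uniformly from a hypersphere (rejection sampling in its bounding box)."""
    while True:
        offset = [rng.uniform(-radius, radius) for _ in center]
        if math.hypot(*offset) <= radius:
            return tuple(c + d for c, d in zip(center, offset))

def prnn_probability(q, target, objects, trials, rng):
    """Estimate the probability that `target` has q as its nearest neighbor."""
    hits = 0
    for _ in range(trials):
        # One possible world: every uncertain object takes a concrete position.
        world = {name: sample_in_ball(c, r, rng) for name, (c, r) in objects.items()}
        o = world[target]
        d_q = math.dist(o, q)
        if all(math.dist(o, p) >= d_q for name, p in world.items() if name != target):
            hits += 1
    return hits / trials

def prnn_linear_scan(q, objects, a, trials=2000, seed=0):
    """Keep every object whose estimated PRNN probability is at least the threshold a."""
    rng = random.Random(seed)
    return [name for name in objects
            if prnn_probability(q, name, objects, trials, rng) >= a]

# Made-up data: name -> (center, radius).
objects = {"o1": ((1.0, 0.0), 0.5), "o2": ((10.0, 0.0), 0.5), "o3": ((11.0, 0.0), 0.5)}
print(prnn_linear_scan((0.0, 0.0), objects, a=0.5))
```

Every candidate requires a pass over all other objects in every sampled world, which illustrates both the quadratic complexity and the cost of the probability computation noted above.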
Pruning Techniques • Geometric Pruning (GP) • GP0 method • The object distribution in the uncertainty region can be either known or unknown • Prune those data objects that definitely cannot be an RNN of q • GPb method (b ∈ (0, 1]) • The object distribution in the uncertainty region is known and pre-computation is allowed • Prune those objects with a PRNN probability smaller than b
Heuristics of GP0 Method • Data objects always reside within their uncertainty regions [Figure: conservative pruning region (CPR)]
Heuristics of GP0 Method (cont.) • No false dismissals are introduced by the hypersphere approximation [Figure: candidate o]
Conditions of GP0 Method • Pruning Conditions • dist(P, q) - dist(P, C_o) > r_o • mindist(P, D) ≥ r_p • In other words, if object p is fully contained in the pruning region CPR'(q, o), then p can be safely pruned
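One way to see why the first condition is safe is the short derivation below. It reads P as a position that object p may take and o' as any position inside the uncertainty region UR(o) with center C_o and radius r_o; this reading of the slide's notation is an assumption.

```latex
\begin{align*}
\mathrm{dist}(P, o') &\le \mathrm{dist}(P, C_o) + \mathrm{dist}(C_o, o') && \text{(triangle inequality)} \\
                     &\le \mathrm{dist}(P, C_o) + r_o && (o' \in UR(o)) \\
                     &<   \mathrm{dist}(P, q) && \text{(first pruning condition)}
\end{align*}
```

Wherever o actually lies, it is closer to P than q is, so q cannot be the nearest neighbor of an object located at P.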
Heuristics of GPb Method (b ∈ (0, 1]) • GPb prunes those objects with a PRNN probability smaller than b (< a) [Figure: candidate o and an object p that can be pruned by GPb]
Refinement Phase • After applying geometric pruning methods, we can obtain a candidate set • For each candidate o, we retrieve those uncertain objects p' intersecting with PR and compute the probability that o is an RNN of q
PRNN Query Processing • Maintain a multidimensional index structure over the uncertain database // indexing phase • For each PRNN query • Apply geometric pruning methods during the index traversal // pruning phase • Refine candidates and return the answer set // refinement phase
PRNN Query Processing • Index uncertain data with an R-tree
PRNN Query Procedure • Traverse the R-tree index by maintaining a minimum heap (keyed by the minimum distance from the query point to the node) • For each node/object N_i we encounter • Check whether or not N_i can be pruned by the GP methods • If the answer is no, then we either further check the children of node N_i, or add it to a PRNN candidate set S_cand in case N_i is an object • After the index traversal, we refine the candidates in S_cand by calculating their actual PRNN probabilities
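The traversal above can be sketched with a small hand-built tree standing in for an R-tree. This is an illustration under assumptions: the `Node` class, the rectangle-based `mindist`/`maxdist` helpers, and the pluggable `can_prune` predicate are all hypothetical, and the naive condition maxdist(o, e) < mindist(q, e) stands in for the GP methods, which are not reproduced here. Refinement is omitted.

```python
import heapq
import itertools
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    lo: tuple                                      # lower corner of the bounding rectangle
    hi: tuple                                      # upper corner of the bounding rectangle
    children: list = field(default_factory=list)  # empty for an object entry
    name: str = ""

def mindist(q, node):
    """Minimum distance from point q to the node's bounding rectangle."""
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2
                         for x, l, h in zip(q, node.lo, node.hi)))

def maxdist(a, b):
    """Maximum distance between any two points of rectangles a and b."""
    return math.sqrt(sum(max(abs(ah - bl), abs(bh - al)) ** 2
                         for al, ah, bl, bh in zip(a.lo, a.hi, b.lo, b.hi)))

def naive_prune(q):
    """Stand-in predicate: prune e if some candidate o is always closer to e than q is."""
    def can_prune(node, candidates):
        return any(maxdist(o, node) < mindist(q, node) for o in candidates)
    return can_prune

def prnn_candidates(root, q, can_prune):
    """Best-first traversal that returns the candidate set (before refinement)."""
    counter = itertools.count()  # tie-breaker so the heap never compares Node objects
    heap = [(mindist(q, root), next(counter), root)]
    candidates = []
    while heap:
        _, _, node = heapq.heappop(heap)
        if can_prune(node, candidates):
            continue                     # the whole node/object is discarded
        if node.children:
            for child in node.children:  # intermediate node: expand its children
                heapq.heappush(heap, (mindist(q, child), next(counter), child))
        else:
            candidates.append(node)      # object: keep it for refinement
    return candidates

# Made-up tree: object a near the query, objects b and c far away in one node.
a = Node((4.0, 4.0), (5.0, 5.0), name="a")
b = Node((10.0, 10.0), (11.0, 11.0), name="b")
c = Node((11.0, 11.0), (12.0, 12.0), name="c")
root = Node((4.0, 4.0), (12.0, 12.0),
            [Node(a.lo, a.hi, [a]), Node((10.0, 10.0), (12.0, 12.0), [b, c])])
q = (0.0, 0.0)
print([n.name for n in prnn_candidates(root, q, naive_prune(q))])
```

Passing the pruning test as a function keeps the traversal independent of the specific pruning rule, so a GP0- or GPb-style test could be swapped in without changing the loop.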
Experimental Evaluation • Experimental Settings • Real data sets: LB, MG, TCB, and CAR • Synthetic data sets: • Generate the center location C_o of uncertain object o in a data space [0, 1,000]^d • Produce a radius r_o ∈ [r_min, r_max] for the uncertainty region UR(o) • Four types of data sets: lUrU, lUrG, lSrU, and lSrG • Competitors: • Linear scan (worse than ours by 5-9 orders of magnitude) • Naïve pruning (pruning condition: given a PRNN candidate o, a node/object e can be pruned if maxdist(o, e) < mindist(q, e))
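The synthetic data generation can be sketched as below. The sketch covers only one combination, uniform center locations with uniform radii, which may correspond to the lUrU data set (an assumption about the naming); the function name and parameters are hypothetical.

```python
import random

def generate_uncertain_objects(n, d, r_min, r_max, seed=0):
    """Generate n hyperspherical uncertain objects as (center, radius) pairs.

    Centers are uniform in [0, 1000]^d and radii are uniform in [r_min, r_max].
    """
    rng = random.Random(seed)
    objects = []
    for _ in range(n):
        center = tuple(rng.uniform(0.0, 1000.0) for _ in range(d))
        radius = rng.uniform(r_min, r_max)
        objects.append((center, radius))
    return objects

# Small sample with d = 3 and radius range [0, 5].
for center, radius in generate_uncertain_objects(n=3, d=3, r_min=0.0, r_max=5.0):
    print(center, radius)
```

Fixing the seed makes a generated data set reproducible across runs, which helps when comparing competitors on the same input.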
Performance vs. b • Settings: data size N = 100K, dimensionality d = 3, radius range [r_min, r_max] = [0, 5], and probabilistic threshold a = 1
Summary • We formulate the problem of probabilistic queries over uncertain databases • We propose effective pruning methods to reduce the search space of probabilistic queries • We integrate pruning methods into an efficient query procedure • We verify the efficiency of our proposed approaches through extensive experiments