Probabilistic Data Management

Probabilistic Data Management Chapter 5: Probabilistic Query Answering (3)

Objectives • In this chapter, you will: • Learn the definition and query processing techniques of a probabilistic query type • Probabilistic Reverse Nearest Neighbor Query

Recall: Probabilistic Query Types • Uncertain/probabilistic database • Probabilistic range query • Probabilistic k-nearest neighbor query • Probabilistic group nearest neighbor (PGNN) query • Probabilistic reverse k-nearest neighbor query • Probabilistic spatial join / similarity join • Probabilistic top-k query (or ranked query) • Probabilistic skyline query • Probabilistic reverse skyline query • Categories: Probabilistic Spatial Query, Probabilistic Preference Query

Probabilistic Reverse Nearest Neighbor Queries in Uncertain Databases Very Large Data Bases Journal (VLDBJ), 2009

Outline • Introduction • Related Work • Problem Definition • PRNN Query Processing • Experimental Evaluation • Summary

Reverse Nearest Neighbor Query (RNN) • Rescue tasks in oceans • In the case of emergency, a ship will ask its nearest ship for help • A rescue ship needs to monitor those ships that have itself as their nearest neighbors • In other words, the rescue ship needs to obtain its reverse nearest neighbors (RNNs)

Introduction • Reverse Nearest Neighbor Query (RNN) • Given a database D and a query object q, an RNN query retrieves those data objects o ∈ D that have q as their nearest neighbor • [Figure: query object q and data objects o1 to o5]
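The definition above can be illustrated with a brute-force check on certain (exact) points. This is a minimal sketch, not the TPL approach; the function name, the point ids, and the coordinates are made up for illustration.

```python
import math

def rnn_brute_force(points, q):
    """Return ids of points whose nearest neighbor (among q and the other points) is q."""
    result = []
    for oid, o in points.items():
        d_q = math.dist(o, q)
        # o is an RNN of q if every other data point is strictly farther from o than q is
        if all(math.dist(o, p) > d_q for pid, p in points.items() if pid != oid):
            result.append(oid)
    return result

# Hypothetical 2D data, chosen only for illustration
points = {"o1": (1.0, 0.0), "o2": (5.0, 5.0), "o3": (5.5, 5.0), "o4": (-4.0, 0.0)}
q = (0.0, 0.0)
print(rnn_brute_force(points, q))  # ['o1', 'o4']
```

Here o2 and o3 are each other's nearest neighbors, so neither has q as its nearest neighbor. The check costs O(N^2) distance computations, which is what pruning-based methods try to avoid.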

RNN Processing on Certain Data Points • TPL Approach [VLDB'04] • [Figure: query point q, data points o1 to o5, RNN candidates, and the pruning region]

Probabilistic Reverse Nearest Neighbor Query (PRNN) • Due to the limited accuracy of positioning devices (e.g., GPS) or the movement of the ships, the reported positions of ships are imprecise • Therefore, it is important to answer RNN queries over uncertain data effectively and efficiently

Another Application of PRNN • Mixed-reality game • Each player tends to shoot his/her nearest neighbor • A query player needs to monitor those players (RNNs) who have himself/herself as their nearest neighbor • Due to the movement of players, their positions can be imprecise and uncertain, so the RNN query is conducted on uncertain objects

RNN Queries in Uncertain Databases

PRNN Definition • Probabilistic Reverse Nearest Neighbor (PRNN) Queries
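A formulation consistent with the RNN definition and with the threshold test used in the straightforward method can be written as follows. The positions of o and p are random variables within their uncertainty regions, a is the probabilistic threshold, and the exact notation is an assumption rather than the paper's wording.

```latex
P_{PRNN}(q, o) = \Pr\{\, \forall p \in D \setminus \{o\} : dist(o, q) < dist(o, p) \,\}

PRNN(q, a) = \{\, o \in D \mid P_{PRNN}(q, o) \ge a \,\}, \quad a \in (0, 1]
```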

A Straightforward Method • For every uncertain object o in the database • Sequentially scan all the objects in the database • Calculate the PRNN probability, P_PRNN(q, o), that o is an RNN of q • If P_PRNN(q, o) is greater than or equal to the probabilistic threshold a, then o is an answer; otherwise, o is discarded • Analysis • Complexity: O(N^2), where N is the database size • The computation of the probability P_PRNN(q, o) is very costly
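The scan described above can be sketched as follows. The sketch assumes a discrete model that the slides do not prescribe: each uncertain object is a list of equally likely sample positions, and objects are independent. Under that assumption the PRNN probability can be computed exactly by enumeration. All names and data are hypothetical.

```python
import math

def prnn_probability(q, o_samples, others):
    """Probability that q is the nearest neighbor of o, under the discrete sample model."""
    total = 0.0
    for s in o_samples:
        d_q = math.dist(s, q)
        prob = 1.0
        for p_samples in others:
            # Fraction of p's possible positions that are farther from s than q is
            farther = sum(1 for t in p_samples if math.dist(s, t) > d_q)
            prob *= farther / len(p_samples)
            if prob == 0.0:
                break
        total += prob
    return total / len(o_samples)

def prnn_linear_scan(db, q, a):
    """Return the ids of objects whose PRNN probability is at least the threshold a."""
    answers = []
    for oid, o_samples in db.items():
        others = [s for pid, s in db.items() if pid != oid]  # scan all other objects
        if prnn_probability(q, o_samples, others) >= a:
            answers.append(oid)
    return answers

# Hypothetical database: three uncertain objects with two samples each
db = {
    "o1": [(1.0, 0.0), (2.0, 0.0)],
    "o2": [(3.0, 0.0), (10.0, 0.0)],
    "o3": [(20.0, 20.0), (21.0, 20.0)],
}
q = (0.0, 0.0)
print(prnn_linear_scan(db, q, 0.5))  # ['o1']
```

The nested loops make the quadratic cost visible: every object is compared against every other object, and each comparison itself iterates over samples.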

Pruning Techniques • Geometric Pruning (GP) • GP0 method • The object distribution in the uncertainty region can be either known or unknown • Prune those data objects that definitely cannot be RNNs of q • GPb method (b ∈ (0, 1]) • The object distribution in the uncertainty region is known and pre-computation is allowed • Prune those objects with a PRNN probability smaller than b

Heuristics of GP0 Method • Data objects always reside within their uncertainty regions • [Figure: conservative pruning region (CPR)]

Heuristics of GP0 Method (cont.) • No false dismissals are introduced by the hypersphere approximation • [Figure: candidate o]

Conditions of GP0 Method • Pruning Conditions • dist(P, q) - dist(P, Co) > ro • mindist(P, D) ≥ rp • In other words, if object p is fully contained in the pruning region CPR'(q, o), then p can be safely pruned
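The first condition can be evaluated directly for a single point P: it holds when P is closer to every possible position of o (a hypersphere with center Co and radius ro) than to q. The sketch below also includes an object-level helper based on the triangle inequality. That helper is a conservative substitute introduced here for illustration, not the slide's second condition, and all function names are hypothetical.

```python
import math

def point_in_pruning_region(P, q, Co, ro):
    """True if dist(P, q) - dist(P, Co) > ro, i.e. P is farther from q than from any point of o."""
    return math.dist(P, q) - math.dist(P, Co) > ro

def sphere_surely_in_pruning_region(Cp, rp, q, Co, ro):
    """Sufficient (not necessary) test that every point of sphere (Cp, rp) satisfies the condition.

    For any x in the sphere: dist(x, q) >= dist(Cp, q) - rp and dist(x, Co) <= dist(Cp, Co) + rp.
    """
    return math.dist(Cp, q) - math.dist(Cp, Co) > ro + 2 * rp

q, Co, ro = (0.0, 0.0), (10.0, 0.0), 1.0
print(point_in_pruning_region((12.0, 0.0), q, Co, ro))               # True
print(point_in_pruning_region((4.0, 0.0), q, Co, ro))                # False
print(sphere_surely_in_pruning_region((12.0, 0.0), 1.0, q, Co, ro))  # True
print(sphere_surely_in_pruning_region((6.0, 0.0), 1.0, q, Co, ro))   # False
```

Because the object-level test is only sufficient, it may keep some objects that an exact containment test would prune, but it never prunes an object incorrectly.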

Heuristics of GPb Method (b ∈ (0, 1]) • GPb prunes those objects with a PRNN probability smaller than b (< a) • [Figure: candidate o and an object p that can be pruned by GPb]

Refinement Phase • After applying geometric pruning methods, we can obtain a candidate set • For each candidate o, we retrieve those uncertain objects p' intersecting with PR and compute the probability that o is an RNN of q

PRNN Query Processing • Maintain a multidimensional index structure over the uncertain database // indexing phase • For each PRNN query • Apply geometric pruning methods during the index traversal // pruning phase • Refine candidates and return the answer set // refinement phase

PRNN Query Processing • Index uncertain data with an R-tree

PRNN Query Procedure • Traverse the R-tree index by maintaining a minimum heap (keyed by the minimum distance from the query point to the node) • For each node/object Ni we encounter • Check whether or not Ni can be pruned by the GP methods • If the answer is no, then we either further check the children of node Ni, or add it to the PRNN candidate set Scand in case Ni is an object • After the index traversal, we refine the candidates in Scand by calculating their actual PRNN probabilities
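The traversal above can be sketched with a min-heap as follows. Several simplifications are assumptions made for illustration: index entries are bounded by hyperspheres rather than R-tree rectangles, the `Entry` class is hypothetical, and the GP methods are replaced by the simpler test maxdist(o, e) < mindist(q, e) against the candidates found so far.

```python
import heapq
import itertools
import math
from dataclasses import dataclass, field

@dataclass
class Entry:
    center: tuple
    radius: float
    children: list = field(default_factory=list)  # empty list means an uncertain object
    name: str = ""

def mindist(q, e):
    return max(0.0, math.dist(q, e.center) - e.radius)

def maxdist(a, b):
    return math.dist(a.center, b.center) + a.radius + b.radius

def can_prune(e, q, candidates):
    # Stand-in for the GP methods: every point of e is closer to some candidate o than to q
    return any(maxdist(o, e) < mindist(q, e) for o in candidates)

def prnn_candidates(root, q):
    counter = itertools.count()  # tie-breaker so the heap never compares Entry objects
    heap = [(mindist(q, root), next(counter), root)]
    candidates = []
    while heap:
        _, _, e = heapq.heappop(heap)
        if can_prune(e, q, candidates):
            continue
        if e.children:  # internal node: check its children later
            for c in e.children:
                heapq.heappush(heap, (mindist(q, c), next(counter), c))
        else:           # object: add it to the candidate set
            candidates.append(e)
    return candidates

# Hypothetical two-level index over four uncertain objects
a = Entry((3.0, 0.0), 0.1, name="a")
d = Entry((0.0, -3.0), 0.1, name="d")
b = Entry((20.0, 0.0), 0.1, name="b")
c = Entry((21.0, 0.0), 0.1, name="c")
n1 = Entry((1.5, -1.5), 2.3, [a, d], "N1")
n2 = Entry((20.5, 0.0), 0.6, [b, c], "N2")
root = Entry((10.0, 0.0), 12.0, [n1, n2], "root")
print([e.name for e in prnn_candidates(root, (0.0, 0.0))])  # ['a', 'd']
```

In this run node N2 is pruned as a whole by candidate a, so objects b and c are never visited. The sketch returns only the candidate set; the refinement phase would still need the objects that can affect each candidate's probability.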

PRNN Query Processing (cont'd)

Experimental Evaluation • Experimental Settings • Real data sets: LB, MG, TCB, and CAR • Synthetic data sets: • Generate the center location Co of uncertain object o in a data space [0, 1,000]^d • Produce the radius ro ∈ [rmin, rmax] for the uncertainty region UR(o) • Four types of data sets: lUrU, lUrG, lSrU, and lSrG • Competitors: • Linear scan (worse than ours by 5-9 orders of magnitude) • Naïve pruning (pruning condition: given a PRNN candidate o, a node/object e can be pruned if maxdist(o, e) < mindist(q, e))
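The synthetic setup above could be reproduced roughly as follows. The sketch covers only one of the four data set types and reads lUrU as "uniform center locations, uniform radii", which is an assumption about the naming; the function name and the seed parameter are hypothetical.

```python
import random

def generate_lUrU(n, d, r_min, r_max, seed=0):
    """Generate n uncertain objects as (center, radius) pairs in the data space [0, 1000]^d."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    objects = []
    for _ in range(n):
        center = tuple(rng.uniform(0.0, 1000.0) for _ in range(d))  # center location Co
        radius = rng.uniform(r_min, r_max)                          # radius ro in [r_min, r_max]
        objects.append((center, radius))
    return objects

data = generate_lUrU(n=1000, d=3, r_min=0.0, r_max=5.0)
print(len(data), data[0])
```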

Performance vs. b • Settings: data size N = 100K, dimensionality d = 3, radius range [rmin, rmax] = [0, 5], and probabilistic threshold a = 1

Summary • We formulate the problem of probabilistic queries over uncertain databases • We propose effective pruning methods to reduce the search space of probabilistic queries • We integrate pruning methods into an efficient query procedure • We verify the efficiency of our proposed approaches through extensive experiments
