Explore various probabilistic query types in data management, including skyline and reverse skyline queries. Learn about dominance probability, skyline probability, and example calculations on uncertain data. Understand monochromatic and bichromatic probabilistic reverse skyline search over uncertain databases.
Probabilistic Data Management Chapter 7: Probabilistic Query Answering (5)
Objectives • In this chapter, you will: • Explore the definitions of more probabilistic query types • Probabilistic skyline query • Probabilistic reverse skyline query
Recall: Probabilistic Query Types • Over an uncertain/probabilistic database: • Probabilistic range query • Probabilistic k-nearest neighbor query • Probabilistic group nearest neighbor (PGNN) query • Probabilistic reverse k-nearest neighbor query • Probabilistic spatial join / similarity join • Probabilistic top-k query (or ranked query) • Probabilistic skyline query • Probabilistic reverse skyline query • Categories: probabilistic spatial queries and probabilistic preference queries
Probabilistic Skyline on Uncertain Data Very Large Data Bases (VLDB), 2007
Skyline Query • Skyline definition • Point $X = (X_1, X_2, \ldots, X_d)$ dominates point $Y = (Y_1, Y_2, \ldots, Y_d)$ iff: • 1) $X_i \le Y_i$ for all $1 \le i \le d$; • 2) $X_j < Y_j$ for some $1 \le j \le d$ • Point X is a skyline point if X is not dominated by any other point • (Figure: skyline points)
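The definition above translates directly into code. The following Python sketch (function names are illustrative, not from the slides) checks dominance under the smaller-is-better convention used here and computes a skyline by brute force; it is quadratic in the number of points, whereas practical skyline algorithms rely on sorting or indexing.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(x: Sequence[float], y: Sequence[float]) -> bool:
    """True if x dominates y: x_i <= y_i for all i and x_j < y_j for some j."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep every point that no other point dominates.

    A point never dominates itself (no strict inequality holds), so comparing
    each point against the whole list, itself included, is safe.
    """
    return [p for p in points if not any(dominates(o, p) for o in points)]


if __name__ == "__main__":
    pts = [(1, 4), (2, 2), (3, 3), (4, 1), (2, 5)]
    print(skyline(pts))  # [(1, 4), (2, 2), (4, 1)]
```

Here (3, 3) is dominated by (2, 2), and (2, 5) is dominated by (1, 4).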
Motivation of Probabilistic Skyline Query • Motivation example • NBA dataset
Terminology • U, V – uncertain objects • u, v – instances of U and V, respectively • V ≺ U (v ≺ u) – the former dominates the latter • Pei J. et al. Probabilistic Skyline on Uncertain Data. In VLDB, 2007
Probabilistic Skyline Query • Dominance probability: • Continuous case: $\Pr[V \prec U] = \int_{u \in U} f_U(u) \left( \int_{v \prec u} f_V(v)\, dv \right) du$, where $f_U$ and $f_V$ are the probability density functions of U and V • Discrete case (U has $l_1$ instances and V has $l_2$ instances, each equally likely): $\Pr[V \prec U] = \frac{1}{l_1 l_2} \sum_{u \in U} |\{v \in V \mid v \prec u\}|$
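For the discrete case, here is a minimal sketch, assuming that every instance of an object is equally likely, so each pair (v, u) carries probability 1/(l1·l2). The function name `dominance_probability` is hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(x: Sequence[float], y: Sequence[float]) -> bool:
    """x dominates y: no worse in every dimension, strictly better in one."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def dominance_probability(V: List[Point], U: List[Point]) -> float:
    """Pr[V ≺ U] in the discrete case, assuming equally likely instances.

    Each instance pair (v, u) has probability 1 / (l1 * l2), so the result is
    the fraction of instance pairs in which v dominates u.
    """
    dominating_pairs = sum(1 for u in U for v in V if dominates(v, u))
    return dominating_pairs / (len(U) * len(V))


if __name__ == "__main__":
    V = [(1, 1), (5, 5)]
    U = [(2, 2), (3, 3)]
    print(dominance_probability(V, U))  # 0.5
```

In the example, instance (1, 1) of V dominates both instances of U while (5, 5) dominates neither, giving 2 of 4 pairs.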
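Probabilistic Skyline Query (cont'd) • Skyline probability: the probability that U is not dominated by any other object (objects are assumed independent) • Continuous case: $\Pr(U) = \int_{u \in U} f_U(u) \prod_{V \neq U} \left( 1 - \int_{v \prec u} f_V(v)\, dv \right) du$ • Discrete case (U has l instances): $\Pr(U) = \frac{1}{l} \sum_{u \in U} \Pr(u)$, where $\Pr(u) = \prod_{V \neq U} \left( 1 - \frac{|\{v \in V \mid v \prec u\}|}{|V|} \right)$ • p-skyline: the set of uncertain objects U with $\Pr(U) \ge p$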
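A sketch of the discrete skyline probability and the p-skyline, assuming independent objects and equally likely instances. Objects are given as a dictionary from a hypothetical object name to its list of instances; all function names are illustrative.

```python
from math import prod
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(x: Sequence[float], y: Sequence[float]) -> bool:
    """x dominates y: no worse in every dimension, strictly better in one."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def instance_probability(u: Point, others: List[List[Point]]) -> float:
    """Pr(u): probability that instance u is dominated by no other object.

    Assumes objects are independent and each object's instances equally likely.
    """
    return prod(1 - sum(dominates(v, u) for v in V) / len(V) for V in others)


def skyline_probability(name: str, objects: Dict[str, List[Point]]) -> float:
    """Pr(U) = (1/l) * sum of Pr(u) over the l instances of U."""
    U = objects[name]
    others = [V for other, V in objects.items() if other != name]
    return sum(instance_probability(u, others) for u in U) / len(U)


def p_skyline(objects: Dict[str, List[Point]], p: float) -> List[str]:
    """Names of the objects whose skyline probability is at least p."""
    return [name for name in objects if skyline_probability(name, objects) >= p]


if __name__ == "__main__":
    objects = {"A": [(1, 1), (4, 4)], "B": [(2, 2), (3, 3)], "C": [(5, 5)]}
    print(p_skyline(objects, 0.5))  # ['A', 'B']
```

Here Pr(A) = Pr(B) = 0.5, while every instance of A dominates C's only instance, so Pr(C) = 0.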
Example of Calculating Skyline Probability • The probability Pr(D) that D is not dominated by other objects is given by: $\Pr(D) = \frac{1}{3} \sum_{d \in D} \prod_{V \in \{A, B, C\}} \left( 1 - \frac{|\{v \in V \mid v \prec d\}|}{|V|} \right)$ • (Figure: 4 instances of A, 3 instances of B, 3 instances of C, 3 instances of D)
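Basic Pruning Rule • Bounding skyline probability: • $\Pr(U_{max}) \le \Pr(U) \le \Pr(U_{min})$, where $U_{min}$ and $U_{max}$ are the corners of U's bounding box with the smallest and largest coordinates in every dimension • If $\Pr(U_{min}) < p$, then U can be pruned; if $\Pr(U_{max}) \ge p$, then U is in the final result • (Figure: $U_{max}$, U, $U_{min}$)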
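The pruning rule can be sketched as follows, treating each bounding-box corner as a single virtual instance. Anything that dominates U_min also dominates every instance of U, and anything that dominates an instance of U also dominates U_max. As a result, the two corner probabilities bound Pr(U). This sketch assumes independent objects with equally likely instances, and names such as `classify` are illustrative.

```python
from math import prod
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(x: Sequence[float], y: Sequence[float]) -> bool:
    """x dominates y: no worse in every dimension, strictly better in one."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def corner_probability(corner: Point, others: List[List[Point]]) -> float:
    """Probability that a virtual instance at `corner` is not dominated."""
    return prod(1 - sum(dominates(v, corner) for v in V) / len(V) for V in others)


def classify(name: str, objects: Dict[str, List[Point]], p: float) -> str:
    """Apply the basic pruning rule using only the two corners of U's bounding box."""
    U = objects[name]
    others = [V for other, V in objects.items() if other != name]
    dims = range(len(U[0]))
    u_min = tuple(min(u[i] for u in U) for i in dims)  # best corner
    u_max = tuple(max(u[i] for u in U) for i in dims)  # worst corner
    upper = corner_probability(u_min, others)  # Pr(U_min) >= Pr(U)
    lower = corner_probability(u_max, others)  # Pr(U_max) <= Pr(U)
    if upper < p:
        return "pruned"
    if lower >= p:
        return "answer"
    return "candidate"  # bounds inconclusive: compute Pr(U) exactly


if __name__ == "__main__":
    objects = {"A": [(1, 1), (4, 4)], "B": [(2, 2), (3, 3)]}
    print(classify("A", objects, 0.5))  # candidate
    print(classify("B", objects, 0.5))  # answer
    print(classify("B", objects, 0.6))  # pruned
```

Objects left as "candidate" still need the exact skyline probability computation.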
Monochromatic and Bichromatic Reverse Skyline Search over Uncertain Databases ACM Conference on the Management of Data (SIGMOD), 2008
Recall: Static Skyline Problem • Point $o = (o_1, o_2, \ldots, o_d)$ dominates point $p = (p_1, p_2, \ldots, p_d)$ iff • $o_i \le p_i$ for all $1 \le i \le d$; • $o_j < p_j$ for some $1 \le j \le d$ • Point o is a skyline point if o is not dominated by any other point • (Figure: static skyline points)
Dynamic Skyline [Dellis and Seeger, VLDB07] • Skyline with dynamic attributes • Dynamic dominance: o dynamically dominates p with respect to u iff • $|o_i - u_i| \le |p_i - u_i|$ for all $1 \le i \le d$; • $|o_j - u_j| < |p_j - u_j|$ for some $1 \le j \le d$ • Goal: obtain all the objects in the database that are not dynamically dominated by other objects with respect to the query object u • (Figure: dominating region of o, with distances $|o_1 - u_1|$ and $|o_2 - u_2|$, and a dominated point p)
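Dynamic dominance is ordinary dominance applied to the coordinate-wise distances $|x_i - u_i|$. Below is a brute-force Python sketch of dynamic dominance and the dynamic skyline with respect to a query object u (illustrative names; no index is used).

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dyn_dominates(o: Sequence[float], p: Sequence[float], u: Sequence[float]) -> bool:
    """True if o dynamically dominates p with respect to u."""
    do = [abs(a - c) for a, c in zip(o, u)]  # distances of o to u per dimension
    dp = [abs(b - c) for b, c in zip(p, u)]  # distances of p to u per dimension
    return all(x <= y for x, y in zip(do, dp)) and any(x < y for x, y in zip(do, dp))


def dynamic_skyline(points: List[Point], u: Point) -> List[Point]:
    """Points not dynamically dominated by any other point w.r.t. u (O(n^2))."""
    return [p for p in points if not any(dyn_dominates(o, p, u) for o in points)]


if __name__ == "__main__":
    pts = [(4, 6), (7, 7), (5, 9), (2, 5)]
    print(dynamic_skyline(pts, (5, 5)))  # [(4, 6), (5, 9), (2, 5)]
```

In the example, (7, 7) is at distance (2, 2) from u and is dominated by (4, 6) at distance (1, 1).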
Reverse Skyline Query [Dellis and Seeger, VLDB07] • Given a query point q, a reverse skyline query obtains all the objects u such that the dynamic skyline of u includes query point q • Example (figure): q belongs to the dynamic skyline of point b, so b is a reverse skyline point of q
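On certain (non-probabilistic) data, the definition can be checked by brute force. Here, u is treated as a reverse skyline point of q when no other object dynamically dominates q with respect to u. This is a minimal sketch with illustrative names, and ties (equal distances in every dimension) are not counted as dominance.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dyn_dominates(o: Sequence[float], p: Sequence[float], u: Sequence[float]) -> bool:
    """True if o dynamically dominates p with respect to u."""
    do = [abs(a - c) for a, c in zip(o, u)]
    dp = [abs(b - c) for b, c in zip(p, u)]
    return all(x <= y for x, y in zip(do, dp)) and any(x < y for x, y in zip(do, dp))


def reverse_skyline(points: List[Point], q: Point) -> List[Point]:
    """Objects u whose dynamic skyline (over the other objects plus q) contains q."""
    result = []
    for i, u in enumerate(points):
        others = points[:i] + points[i + 1:]
        if not any(dyn_dominates(p, q, u) for p in others):
            result.append(u)
    return result


if __name__ == "__main__":
    print(reverse_skyline([(4, 4), (1, 1), (6, 6)], (5, 5)))  # [(4, 4), (6, 6)]
```

Point (1, 1) is excluded because, from its perspective, (4, 4) is closer than q in both dimensions.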
Motivation Example • In a laptop market, each model corresponds to a 2D point in a price-performance space • Customers who are interested in f are very likely to be interested in a and c as well • If a company wants to produce a new model, … • (Figure: new model q; f, a model that customers prefer)
The Laptop Market Next Year • What will the laptop market look like in the coming year? • The performance or price attribute of each model may vary • This leads to the monochromatic probabilistic reverse skyline (MPRS) problem over uncertain data
The Bichromatic Case (BPRS) • (Figure: data set A and data set B)
Outline • Introduction • Problem Definition • Monochromatic PRS Query Processing • Bichromatic PRS Query Processing • Experimental Results • Summary
Introduction • In the context of uncertain databases, each uncertain object o is usually modeled as an uncertainty region UR(o) • An uncertain object can reside anywhere within its uncertainty region, following an arbitrary probability distribution
Monochromatic Probabilistic Reverse Skyline (MPRS) Query • MPRS Query • d-dimensional uncertain database D • query object q • probability threshold $\alpha \in (0, 1]$ • MPRS query retrieves all the objects $u \in D$ such that u is a reverse skyline point of q with probability greater than or equal to α, that is, $P_{MPRS}(u) = \Pr\{u \text{ is a reverse skyline point of } q\} \ge \alpha$
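The slides model each object as a continuous uncertainty region. As an illustration only, the sketch below instead assumes that every object is given by equally likely discrete instances and that objects are independent. Under these assumptions, $P_{MPRS}(u)$ is the average, over the instances u' of u, of the probability that no other object p dynamically dominates q with respect to u'. `mprs_probability` is a hypothetical name.

```python
from math import prod
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def dyn_dominates(o: Sequence[float], p: Sequence[float], u: Sequence[float]) -> bool:
    """True if o dynamically dominates p with respect to u."""
    do = [abs(a - c) for a, c in zip(o, u)]
    dp = [abs(b - c) for b, c in zip(p, u)]
    return all(x <= y for x, y in zip(do, dp)) and any(x < y for x, y in zip(do, dp))


def mprs_probability(name: str, objects: Dict[str, List[Point]], q: Point) -> float:
    """Discrete estimate of P_MPRS(u) (assumes equally likely instances, independence)."""
    U = objects[name]
    others = [P for other, P in objects.items() if other != name]
    total = 0.0
    for u in U:
        # Probability that no other object dynamically dominates q w.r.t. instance u.
        total += prod(1 - sum(dyn_dominates(p, q, u) for p in P) / len(P) for P in others)
    return total / len(U)


if __name__ == "__main__":
    objects = {"u": [(4, 4), (1, 1)], "p": [(3, 3), (6, 6)]}
    print(mprs_probability("u", objects, (5, 5)))  # 0.75
```

For instance (4, 4), no instance of p is strictly closer than q, contributing 1; for (1, 1), instance (3, 3) dominates q, contributing 0.5.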
Bichromatic Probabilistic Reverse Skyline (BPRS) Query • BPRS Query • two d-dimensional uncertain databases A and B • query object q • probability threshold $\alpha \in (0, 1]$ • BPRS query obtains all the objects $u \in A$ such that u is a reverse skyline point of q in B with probability greater than or equal to α, that is, $P_{BPRS}(u) = \Pr\{u \text{ is a reverse skyline point of } q \text{ in } B\} \ge \alpha$
Linear Scan Method • For each object u in uncertain database D (or A in the bichromatic case): • sequentially scan the objects in D (or B) to calculate the probability $P_{MPRS}(u)$ (or $P_{BPRS}(u)$) • return object u as a PRS answer if $P_{MPRS}(u) \ge \alpha$ (or $P_{BPRS}(u) \ge \alpha$)
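Here is a sketch of the linear scan, assuming discrete, equally likely instances and independent objects; all names are illustrative. Passing a second object set B switches from the monochromatic to the bichromatic case. Each candidate is compared against every competing object, so the cost grows quadratically with the data size, which motivates the pruning techniques.

```python
from math import prod
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Objects = Dict[str, List[Point]]


def dyn_dominates(o: Sequence[float], p: Sequence[float], u: Sequence[float]) -> bool:
    """True if o dynamically dominates p with respect to u."""
    do = [abs(a - c) for a, c in zip(o, u)]
    dp = [abs(b - c) for b, c in zip(p, u)]
    return all(x <= y for x, y in zip(do, dp)) and any(x < y for x, y in zip(do, dp))


def prs_probability(U: List[Point], competitors: List[List[Point]], q: Point) -> float:
    """Discrete estimate of the probability that U is a reverse skyline point of q."""
    return sum(
        prod(1 - sum(dyn_dominates(p, q, u) for p in P) / len(P) for P in competitors)
        for u in U
    ) / len(U)


def prs_linear_scan(A: Objects, q: Point, alpha: float, B: Optional[Objects] = None) -> List[str]:
    """MPRS over A when B is None; otherwise BPRS of A's objects against B."""
    answers = []
    for name, U in A.items():
        if B is None:  # monochromatic: every other object of A competes
            competitors = [P for other, P in A.items() if other != name]
        else:          # bichromatic: the objects of B compete
            competitors = list(B.values())
        if prs_probability(U, competitors, q) >= alpha:
            answers.append(name)
    return answers


if __name__ == "__main__":
    A = {"a": [(4, 4)], "b": [(3, 3), (0, 0)]}
    B = {"c": [(6, 6), (9, 9)]}
    print(prs_linear_scan(A, (5, 5), 0.5))     # ['a']
    print(prs_linear_scan(A, (5, 5), 0.5, B))  # ['a', 'b']
```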
Pruning Techniques • Spatial Pruning • Probabilistic Pruning
Spatial Pruning • Assume uncertain object p is an MPRS candidate and $N_p$ is the point in UR(p) farthest from q • Point $M_p$ is the midpoint between q and $N_p$ • Any object o fully contained in the pruning region determined by $M_p$ can be safely pruned
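The slide defines the pruning region through a figure. The sketch below makes two assumptions. First, UR(p) lies entirely on one side of q in every dimension. Second, the pruning region is the part of space beyond $M_p$ on p's side. Under these assumptions, for any positions of o and p, $|p_i - o_i| \le |q_i - o_i|$ holds in every dimension, so p dynamically dominates q with respect to o (up to boundary ties) and o cannot be a reverse skyline point of q. Uncertainty regions are axis-aligned boxes given as (lower corner, upper corner), and the names are illustrative.

```python
from typing import Sequence, Tuple

Box = Tuple[Tuple[float, ...], Tuple[float, ...]]  # (lower corner, upper corner)


def farthest_corner(box: Box, q: Sequence[float]) -> Tuple[float, ...]:
    """N_p: the corner of the box farthest from q (chosen per dimension)."""
    lo, hi = box
    return tuple(a if abs(a - c) >= abs(b - c) else b for a, b, c in zip(lo, hi, q))


def spatially_pruned(o: Box, p: Box, q: Sequence[float]) -> bool:
    """True if UR(o) lies entirely in the ASSUMED pruning region of candidate p."""
    n = farthest_corner(p, q)
    for o_lo, o_hi, p_lo, p_hi, n_i, q_i in zip(o[0], o[1], p[0], p[1], n, q):
        m_i = (q_i + n_i) / 2  # coordinate of M_p in this dimension
        if p_lo >= q_i:        # UR(p) above q: o must lie at or above M_p
            if o_lo < m_i:
                return False
        elif p_hi <= q_i:      # UR(p) below q: o must lie at or below M_p
            if o_hi > m_i:
                return False
        else:                  # UR(p) straddles q: this sketch does not prune
            return False
    return True


if __name__ == "__main__":
    q = (0, 0)
    p = ((2, 2), (4, 4))                          # N_p = (4, 4), M_p = (2, 2)
    print(spatially_pruned(((3, 3), (5, 5)), p, q))  # True
    print(spatially_pruned(((1, 3), (5, 5)), p, q))  # False
```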
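Probabilistic Pruning • For each uncertain object o, we pre-compute an inner rectangle, called the $(1-\beta)$-hyperrectangle $UR_{1-\beta}(o)$, such that o lies in $UR_{1-\beta}(o)$ with probability $1-\beta$, where $\beta \in [0, \alpha)$ • Any object o whose $UR_{1-\beta}(o)$ is completely contained in the pruning region can be safely pruned, since o is then a reverse skyline point of q with probability at most $\beta < \alpha$ • (Figure: $(1-\beta)$-hyperrectangle $UR_{1-\beta}(o)$)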
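How the $(1-\beta)$-hyperrectangle is pre-computed depends on the object's distribution. The sketch below assumes, for illustration only, that o is uniformly distributed over a box UR(o). Under that assumption, a centered sub-box whose sides are scaled by s holds probability $s^d$, so $s = (1-\beta)^{1/d}$. The pruning region is passed in as a box, possibly with infinite bounds, and the function names are hypothetical.

```python
from typing import Tuple

Box = Tuple[Tuple[float, ...], Tuple[float, ...]]  # (lower corner, upper corner)


def inner_hyperrectangle(ur: Box, beta: float) -> Box:
    """(1 - beta)-hyperrectangle of o, ASSUMING o is uniform over UR(o)."""
    lo, hi = ur
    d = len(lo)
    s = (1.0 - beta) ** (1.0 / d)  # side scale so that the sub-box holds mass 1 - beta
    center = [(a + b) / 2 for a, b in zip(lo, hi)]
    half = [(b - a) / 2 * s for a, b in zip(lo, hi)]
    return (tuple(c - h for c, h in zip(center, half)),
            tuple(c + h for c, h in zip(center, half)))


def probabilistically_pruned(ur: Box, region: Box, beta: float, alpha: float) -> bool:
    """Prune o if its (1 - beta)-hyperrectangle lies inside the pruning region.

    In that case o is a reverse skyline point of q with probability <= beta < alpha.
    """
    if not 0.0 <= beta < alpha:
        raise ValueError("beta must satisfy 0 <= beta < alpha")
    i_lo, i_hi = inner_hyperrectangle(ur, beta)
    r_lo, r_hi = region
    return all(rl <= a and b <= rh for a, b, rl, rh in zip(i_lo, i_hi, r_lo, r_hi))


if __name__ == "__main__":
    inf = float("inf")
    ur = ((0.0, 0.0), (4.0, 4.0))  # inner box for beta = 0.75 is about (1, 1)-(3, 3)
    print(probabilistically_pruned(ur, ((0.5, 0.5), (inf, inf)), 0.75, 0.8))  # True
    print(probabilistically_pruned(ur, ((1.5, 0.0), (inf, inf)), 0.75, 0.8))  # False
```

With β = 0, the inner box equals UR(o), and the test reduces to plain spatial pruning.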
Framework for PRS • Indexing Phase • Construct a multidimensional index (e.g., an R-tree) over the uncertain data • Pruning Phase • Traverse the index and perform spatial and probabilistic pruning • Refinement Phase • Refine the PRS candidates and return the answer set
MPRS Query Processing • Traversal of the Index • For each entry/object $e_i$ encountered in the index nodes, we check whether it is fully contained in the pruning regions defined by the candidates seen so far (spatial pruning) • In addition, for each encountered object o, we apply probabilistic pruning using its $(1-\beta)$-hyperrectangle $UR_{1-\beta}(o)$
MPRS Query Processing (cont.) • Refinement • Only objects that intersect the refinement region need to be considered
BPRS Query Processing • (Figure: index construction and index traversal)
Experimental Evaluation • Experimental Settings • Synthetic data sets • Generate the center location $C_o$ of each uncertain object o in the data space $[0, 1000]^d$ • Produce a radius $r_o \in [r_{min}, r_{max}]$ for the uncertainty region UR(o) • Randomly generate a hyperrectangle within the sphere centered at $C_o$ with radius $r_o$ • Four types of data sets: lUrU, lUrG, lSrU, lSrG • Measures: • Filtering time (including CPU time and I/O cost) • Speed-up ratio compared with the linear scan method
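Here is a sketch of generating one synthetic object. It assumes uniformly distributed centers and radii as one possible setting, and a box centered at $C_o$. Keeping the box's half-diagonal no longer than $r_o$ keeps the whole box inside the sphere. The generator's name and the way the box shape is randomized are assumptions, not the authors' procedure.

```python
import math
import random
from typing import List, Tuple


def generate_uncertain_object(
    d: int, r_min: float, r_max: float, rng: random.Random
) -> Tuple[List[float], float, List[float], List[float]]:
    """Return (center C_o, radius r_o, lower corner, upper corner) of UR(o).

    ASSUMPTIONS: uniform center in [0, 1000]^d, uniform radius, box centered at C_o.
    """
    center = [rng.uniform(0.0, 1000.0) for _ in range(d)]
    radius = rng.uniform(r_min, r_max)
    raw = [rng.random() + 1e-9 for _ in range(d)]  # random positive box shape
    norm = math.sqrt(sum(x * x for x in raw))
    scale = radius * rng.random() / norm           # half-diagonal length in [0, r_o)
    half = [x * scale for x in raw]
    lo = [c - h for c, h in zip(center, half)]
    hi = [c + h for c, h in zip(center, half)]
    return center, radius, lo, hi


if __name__ == "__main__":
    rng = random.Random(42)
    data = [generate_uncertain_object(3, 5.0, 10.0, rng) for _ in range(5)]
    for center, radius, lo, hi in data:
        print([round(c, 1) for c in center], round(radius, 2))
```

Seeding a `random.Random` instance keeps the generated data set reproducible across runs.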
MPRS Query Performance lUrU data set (data size = 100K, dimensionality d = 3)
BPRS Query Performance lUrG – lUrG (dimensionality d = 3)
Summary • MPRS and BPRS queries over uncertain data • Spatial and probabilistic pruning • PRS query processing with pre-computation • Experimental evaluation