Probabilistic Data Management

Explore various probabilistic query types in data management, including skyline and reverse skyline queries. Learn about dominance probability, skyline probability, and example calculations on uncertain data. Understand Monochromatic and Bichromatic reverse skyline search in databases.

joethompson


Presentation Transcript


  1. Probabilistic Data Management Chapter 7: Probabilistic Query Answering (5)

  2. Objectives • In this chapter, you will: • Explore the definitions of more probabilistic query types • Probabilistic skyline query • Probabilistic reverse skyline query

  3. Recall: Probabilistic Query Types • Uncertain/probabilistic database • Probabilistic spatial queries: probabilistic range query, probabilistic k-nearest neighbor query, probabilistic group nearest neighbor (PGNN) query, probabilistic reverse k-nearest neighbor query, probabilistic spatial join/similarity join • Probabilistic preference queries: probabilistic top-k query (or ranked query), probabilistic skyline query, probabilistic reverse skyline query

  4. Probabilistic Skyline on Uncertain Data • Very Large Data Bases (VLDB), 2007

  5. Skyline Query • Skyline definition • Point X(X1, X2, …, Xd) dominates point Y(Y1, Y2, …, Yd) iff it holds that: • 1) Xi ≤ Yi for all 1 ≤ i ≤ d; • 2) Xj < Yj for some 1 ≤ j ≤ d • Point X is a skyline point if X is not dominated by any other point
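
The dominance test and skyline definition above can be sketched in a few lines of Python. This is a minimal brute-force illustration, assuming smaller values are preferred in every dimension (as in the definition); the function names `dominates` and `skyline` are hypothetical and not taken from the slides.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(x: Sequence[float], y: Sequence[float]) -> bool:
    """Return True if x dominates y: x <= y in all dimensions, x < y in at least one."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def skyline(points: List[Point]) -> List[Point]:
    """Brute-force O(n^2) skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


pts = [(1, 5), (2, 2), (5, 1), (4, 4), (3, 6)]
result = skyline(pts)  # (4, 4) is dominated by (2, 2); (3, 6) is dominated by (1, 5)
```

The quadratic scan is chosen only for readability; it is not meant to reflect how a database system would evaluate the query.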

  6. Motivation of Probabilistic Skyline Query • Motivation example • NBA dataset

  7. Terminology • U, V – uncertain objects • u, v – an instance of U or V, respectively • V ≺ U (v ≺ u) – the former dominates the latter [Pei J. et al. Probabilistic Skyline on Uncertain Data. In VLDB, 2007]

  8. Probabilistic Skyline Query • Dominance probability Pr[V ≺ U]: the probability that V dominates U, defined for both the continuous case and the discrete case (U has l1 instances, and V has l2 instances)
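
For the discrete case, a common way to write the dominance probability is Pr[V ≺ U] = (1 / (l1 · l2)) · Σ over u ∈ U of |{v ∈ V : v ≺ u}|, that is, the fraction of instance pairs (u, v) in which v dominates u. The sketch below implements that reading under the assumption that every instance of an object is equally likely; the formula and the name `dominance_probability` are assumptions, not quoted from the slide.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(x: Sequence[float], y: Sequence[float]) -> bool:
    """x dominates y: no worse in every dimension, strictly better in at least one."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def dominance_probability(V: List[Point], U: List[Point]) -> float:
    """Pr[V dominates U], assuming equally likely instances (an assumption)."""
    pairs = sum(1 for u in U for v in V if dominates(v, u))
    return pairs / (len(U) * len(V))


U = [(3, 3), (1, 4)]  # l1 = 2 instances
V = [(2, 2), (4, 4)]  # l2 = 2 instances
pr_v_dominates_u = dominance_probability(V, U)  # only (2, 2) dominates (3, 3): 1/4
pr_u_dominates_v = dominance_probability(U, V)  # both U instances dominate (4, 4): 2/4
```

Note that the two directions do not need to sum to 1, because many instance pairs are incomparable.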

  9. Probabilistic Skyline Query (cont'd) • Skyline probability Pr(U): the probability that U is not dominated by any other object, defined for both the continuous case and the discrete case (U has l instances) • p-skyline: the set of uncertain objects U with Pr(U) ≥ p
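
A minimal sketch of the discrete skyline probability and the p-skyline follows. It assumes that objects are independent and that each instance of an object is equally likely, so that Pr(U) = (1 / l) · Σ over u ∈ U of Π over V ≠ U of (1 − |{v ∈ V : v ≺ u}| / |V|). Both the formula as written here and the names `skyline_probability` and `p_skyline` are assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(x: Sequence[float], y: Sequence[float]) -> bool:
    """x dominates y: no worse in every dimension, strictly better in at least one."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def skyline_probability(name: str, objects: Dict[str, List[Point]]) -> float:
    """Average, over the instances u of one object, of the chance that no other object dominates u."""
    U = objects[name]
    total = 0.0
    for u in U:
        not_dominated = 1.0
        for other, V in objects.items():
            if other == name:
                continue
            dominated_fraction = sum(1 for v in V if dominates(v, u)) / len(V)
            not_dominated *= 1.0 - dominated_fraction
        total += not_dominated
    return total / len(U)


def p_skyline(objects: Dict[str, List[Point]], p: float) -> List[str]:
    """Names of all objects whose skyline probability is at least p."""
    return [n for n in objects if skyline_probability(n, objects) >= p]


objects = {
    "A": [(1, 1), (2, 2)],
    "B": [(3, 3), (0, 5)],
    "C": [(4, 4), (6, 6)],
}
answer = p_skyline(objects, 0.5)  # A has probability 1.0, B 0.5, C 0.0
```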

  10. Example of Calculating Skyline Probability • The example computes the probability Pr(D) that D is not dominated by other objects, for a data set with 4 instances of A, 3 instances of B, 3 instances of C, and 3 instances of D

  11. Basic Pruning Rule • Bounding skyline probability • Pr(Umax) ≤ Pr(U) ≤ Pr(Umin) • If Pr(Umin) < p, then U can be pruned; if Pr(Umax) ≥ p, then U is in the final result
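
The pruning rule can be sketched as a three-way decision. The code below assumes that Umin and Umax are the minimum and maximum corners of the bounding box of U's instances, and that the skyline probability of a single point is the product, over the other objects, of the fraction of their instances that do not dominate it. These are assumptions for illustration, and the names `point_skyline_probability` and `classify` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(x: Sequence[float], y: Sequence[float]) -> bool:
    """x dominates y: no worse in every dimension, strictly better in at least one."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def point_skyline_probability(pt: Point, others: List[List[Point]]) -> float:
    """Probability that no other (independent, equally weighted) object dominates pt."""
    prob = 1.0
    for V in others:
        prob *= 1.0 - sum(1 for v in V if dominates(v, pt)) / len(V)
    return prob


def classify(U: List[Point], others: List[List[Point]], p: float) -> str:
    """Apply Pr(Umax) <= Pr(U) <= Pr(Umin) to decide without computing Pr(U) exactly."""
    u_min = tuple(min(c) for c in zip(*U))  # assumed: lower corner of U's bounding box
    u_max = tuple(max(c) for c in zip(*U))  # assumed: upper corner of U's bounding box
    if point_skyline_probability(u_min, others) < p:
        return "prune"    # even the upper bound on Pr(U) is below p
    if point_skyline_probability(u_max, others) >= p:
        return "accept"   # even the lower bound on Pr(U) reaches p
    return "refine"       # bounds are inconclusive; Pr(U) must be computed


case_prune = classify([(5, 5), (6, 6)], [[(1, 1), (2, 2)]], 0.5)
case_accept = classify([(1, 1), (2, 2)], [[(5, 5), (6, 6)]], 0.5)
case_refine = classify([(1, 5), (5, 1)], [[(3, 3), (9, 9)]], 0.8)
```

Only the "refine" outcome requires the full per-instance computation, which is the point of the bound.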

  12. Monochromatic and Bichromatic Reverse Skyline Search over Uncertain Databases • ACM Conference on the Management of Data (SIGMOD), 2008

  13. Recall: Static Skyline Problem • Point o(o1, o2, …, od) dominates point p(p1, p2, …, pd) iff • oi ≤ pi for all 1 ≤ i ≤ d; • oj < pj for some 1 ≤ j ≤ d • Point o is a (static) skyline point if o is not dominated by any other point

  14. Dynamic Skyline [Dellis and Seeger, VLDB07] • Skyline with dynamic attributes • Dynamic dominance: o dynamically dominates p with respect to query object u iff • |oi - ui| ≤ |pi - ui| for all 1 ≤ i ≤ d • |oj - uj| < |pj - uj| for some j • A dynamic skyline query obtains all the objects in the database that are not dynamically dominated by other objects with respect to query object u (figure: dominating regions in the space of |o1 – u1| and |o2 – u2|)
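
Dynamic dominance is ordinary dominance applied to coordinate-wise absolute distances from the query object u. A minimal brute-force sketch, with the hypothetical names `dyn_dominates` and `dynamic_skyline`:

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dyn_dominates(o: Sequence[float], p: Sequence[float], u: Sequence[float]) -> bool:
    """o dynamically dominates p w.r.t. u: o is at least as close to u in every dimension, closer in one."""
    d_o = [abs(a - c) for a, c in zip(o, u)]
    d_p = [abs(b - c) for b, c in zip(p, u)]
    return all(x <= y for x, y in zip(d_o, d_p)) and any(x < y for x, y in zip(d_o, d_p))


def dynamic_skyline(points: List[Point], u: Point) -> List[Point]:
    """Points that no other point dynamically dominates with respect to u."""
    return [o for o in points if not any(dyn_dominates(p, o, u) for p in points)]


u = (5, 5)
pts = [(4, 6), (8, 8), (5, 9), (2, 5)]
# Distance vectors from u: (1, 1), (3, 3), (0, 4), (3, 0); only (3, 3) is dominated.
result = dynamic_skyline(pts, u)
```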

  15. Reverse Skyline Query [Dellis and Seeger, VLDB07] • Given a query point q, a reverse skyline query obtains all the objects u such that the dynamic skyline of u includes query point q • Example (figure): q is in the dynamic skyline of point b, so b is a reverse skyline point of q
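
Read literally, the definition gives a simple quadratic procedure: u is a reverse skyline point of q when no other data point dynamically dominates q with respect to u. The sketch below is a brute-force illustration over certain (non-probabilistic) points; the name `reverse_skyline` is hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dyn_dominates(o: Sequence[float], p: Sequence[float], u: Sequence[float]) -> bool:
    """o dynamically dominates p w.r.t. u (compare absolute coordinate distances to u)."""
    d_o = [abs(a - c) for a, c in zip(o, u)]
    d_p = [abs(b - c) for b, c in zip(p, u)]
    return all(x <= y for x, y in zip(d_o, d_p)) and any(x < y for x, y in zip(d_o, d_p))


def reverse_skyline(data: List[Point], q: Point) -> List[Point]:
    """All u in data such that q belongs to the dynamic skyline of u."""
    result = []
    for i, u in enumerate(data):
        others = data[:i] + data[i + 1:]
        if not any(dyn_dominates(p, q, u) for p in others):
            result.append(u)
    return result


q = (5, 5)
data = [(4, 4), (7, 7), (10, 10), (1, 9)]
# With respect to (10, 10), the point (7, 7) is closer than q in both dimensions,
# so q is not in the dynamic skyline of (10, 10).
result = reverse_skyline(data, q)
```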

  16. Motivation Example • In a laptop market, each model corresponds to a 2D point in a price and performance space • Customers who are interested in f are very likely to be interested in a and c • If a company wants to produce a new model, … (figure labels: new model q; f, a model that customers prefer)

  17. The Laptop Market Next Year • How about the laptop market in the coming year? • The performance or price attribute of each model may vary • Monochromatic reverse skyline problem over uncertain data (MPRS)

  18. The Bichromatic Case (BPRS) • Figure: data set A and data set B

  19. Outline • Introduction • Problem Definition • Monochromatic PRS Query Processing • Bichromatic PRS Query Processing • Experimental Results • Summary

  20. Introduction • In the context of uncertain databases, each uncertain object o is usually modeled as an uncertainty region UR(o) • An uncertain object can reside anywhere within its uncertainty region, following any data distribution

  21. Monochromatic Probabilistic Reverse Skyline (MPRS) Query • MPRS Query • d-dimensional uncertain database D • query object q • probability threshold α ∈ (0, 1] • An MPRS query retrieves all the objects u ∈ D such that u is a reverse skyline point of q with probability greater than or equal to α, that is, PMPRS(u) ≥ α

  22. Bichromatic Probabilistic Reverse Skyline (BPRS) Query • BPRS Query • two d-dimensional uncertain databases A and B • query object q • probability threshold α ∈ (0, 1] • A BPRS query obtains all the objects u ∈ A such that u is a reverse skyline point of q in B with probability greater than or equal to α, that is, PBPRS(u) ≥ α

  23. Linear Scan Method • For each object u in uncertain database D (or A in the bichromatic case) • sequentially scan objects in D (or B) to calculate the probability PMPRS(u) (or PBPRS(u)) • return object u as a PRS answer if PMPRS(u) ≥ α (or PBPRS(u) ≥ α)
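
The linear scan for the monochromatic case can be sketched as follows. The sketch assumes each uncertain object is represented by a finite list of equally likely samples and that objects are independent, so that the probability for u is the average, over samples u' of u, of the product over other objects p of (1 − Pr{p dynamically dominates q w.r.t. u'}). The sampled representation, the parameter name `alpha` for the probability threshold, and the function names are assumptions for illustration only.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
UncertainObject = List[Point]  # assumed: equally weighted samples of one object


def dyn_dominates(o: Sequence[float], p: Sequence[float], u: Sequence[float]) -> bool:
    """o dynamically dominates p w.r.t. u (compare absolute coordinate distances to u)."""
    d_o = [abs(a - c) for a, c in zip(o, u)]
    d_p = [abs(b - c) for b, c in zip(p, u)]
    return all(x <= y for x, y in zip(d_o, d_p)) and any(x < y for x, y in zip(d_o, d_p))


def prs_probability(idx: int, db: List[UncertainObject], q: Point) -> float:
    """Probability that q lies in the dynamic skyline of object db[idx]."""
    total = 0.0
    for u in db[idx]:
        prob = 1.0
        for j, other in enumerate(db):
            if j == idx:
                continue
            hits = sum(1 for p in other if dyn_dominates(p, q, u))
            prob *= 1.0 - hits / len(other)
        total += prob
    return total / len(db[idx])


def linear_scan_mprs(db: List[UncertainObject], q: Point, alpha: float) -> List[int]:
    """Indices of the objects whose reverse skyline probability is at least alpha."""
    return [i for i in range(len(db)) if prs_probability(i, db, q) >= alpha]


q = (5, 5)
db = [
    [(4, 4), (6, 6)],
    [(8, 8), (20, 20)],
    [(10, 10), (12, 12)],
]
answers = linear_scan_mprs(db, q, 0.2)  # probabilities are 1.0, 0.125, 0.25
```

For the bichromatic case, the same loop would take candidates from A and scan the objects of B instead of the other objects of D.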

  24. Pruning Techniques • Spatial Pruning • Probabilistic Pruning

  25. Spatial Pruning • Assume uncertain object p is an MPRS candidate and Np is the point in UR(p) farthest from q • Point Mp is the midpoint between q and Np • Any object o fully contained in the pruning region can be safely pruned
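
The slide does not spell out the shape of the pruning region, so the sketch below rests on one plausible reading: UR(p) and UR(o) are axis-aligned rectangles, and the pruning region is the set of points that lie, in every dimension, on the far side of Mp as seen from q. Under that reading, every possible location of p is at least as close to any such o as q is in each dimension, so q cannot be in o's dynamic skyline. The rectangle representation, the conservative refusal to prune when UR(p) straddles q in some dimension, and all function names are assumptions.

```python
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def farthest_corner(lower: Sequence[float], upper: Sequence[float], q: Sequence[float]) -> Point:
    """Corner of the rectangle [lower, upper] that is farthest from q in every dimension."""
    return tuple(lo if abs(lo - qi) >= abs(hi - qi) else hi
                 for lo, hi, qi in zip(lower, upper, q))


def in_pruning_region(o_lower: Sequence[float], o_upper: Sequence[float],
                      p_lower: Sequence[float], p_upper: Sequence[float],
                      q: Sequence[float]) -> bool:
    """True if UR(o) lies fully inside the (assumed) pruning region defined by candidate p."""
    # Conservative guard: do not prune when UR(p) straddles q in some dimension.
    if any(lo < qi < hi for lo, hi, qi in zip(p_lower, p_upper, q)):
        return False
    n_p = farthest_corner(p_lower, p_upper, q)
    m_p = tuple((n + qi) / 2 for n, qi in zip(n_p, q))  # midpoint between q and Np
    for lo, hi, m, n, qi in zip(o_lower, o_upper, m_p, n_p, q):
        if n >= qi:
            if lo < m:   # UR(o) must lie at or beyond Mp on the upper side
                return False
        elif hi > m:     # UR(o) must lie at or beyond Mp on the lower side
            return False
    return True


pruned = in_pruning_region((3, 5), (6, 7), (2, 2), (4, 4), (0, 0))  # Mp = (2, 2)
kept = in_pruning_region((1, 5), (3, 6), (2, 2), (4, 4), (0, 0))    # crosses Mp in dimension 1
```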

  26. Probabilistic Pruning • For uncertain object o, we pre-compute an inner rectangle, called the (1-β)-hyperrectangle UR1-β(o), such that o is located in UR1-β(o) with probability (1-β), where β ∈ [0, α) • Any object o whose UR1-β(o) is completely contained in the pruning region can be safely pruned

  27. Framework for PRS • Indexing Phase • Construct a multidimensional index (e.g. R-tree) over the uncertain data • Pruning Phase • Traverse the index and perform the spatial and probabilistic pruning • Refinement Phase • Refine the PRS candidates and return the answer set

  28. MPRS Query Processing • Traversal of the Index • For each encountered entry/object ei in nodes, we check whether or not it is fully contained in the pruning regions defined by candidates seen so far (via spatial pruning) • In addition, for each encountered object o, we apply the probabilistic pruning by considering the (1-β)-hyperrectangle UR1-β(o)

  29. MPRS Query Processing (cont.) • Refinement • Only considering objects that intersect with the refinement region

  30. MPRS via Pre-Computation

  31. BPRS Query Processing • Index construction • Index traversal

  32. Experimental Evaluation • Experimental Settings • Synthetic data sets • Generate center location Co of uncertain object o in a data space [0, 1000]^d • Produce radius ro ∈ [rmin, rmax] for uncertainty region UR(o) • Randomly generate a hyperrectangle within the sphere centered at Co and with radius ro • Four types of data sets: lUrU, lUrG, lSrU, lSrG • Measures: • Filtering time (including CPU time and I/O cost) • Speed-up ratio compared with the linear scan method
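
The synthetic data generation can be sketched roughly as below for the uniform-location, uniform-radius setting. The slide does not say how the hyperrectangle is drawn inside the sphere, so this sketch makes its own choice: a rectangle centered at Co whose half-diagonal is at most ro. That choice, the shrink factor range, and the names `make_uncertain_object`, `r_min`, and `r_max` are assumptions.

```python
import math
import random
from typing import List, Tuple


def make_uncertain_object(d: int, r_min: float, r_max: float,
                          rng: random.Random) -> Tuple[List[float], float, List[float], List[float]]:
    """Return (center, radius, lower corner, upper corner) of one uncertainty region."""
    center = [rng.uniform(0.0, 1000.0) for _ in range(d)]  # uniform location in [0, 1000]^d
    radius = rng.uniform(r_min, r_max)                     # uniform radius in [r_min, r_max]
    # Random side proportions, rescaled so the half-diagonal stays within the sphere.
    raw = [rng.uniform(0.1, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in raw))
    scale = radius * rng.uniform(0.5, 1.0) / norm
    half = [x * scale for x in raw]
    lower = [c - h for c, h in zip(center, half)]
    upper = [c + h for c, h in zip(center, half)]
    return center, radius, lower, upper


rng = random.Random(42)  # fixed seed so the generated data set is reproducible
dataset = [make_uncertain_object(3, 1.0, 5.0, rng) for _ in range(100)]
```

Skewed locations or Gaussian radii (the other three data set types) would replace the two `uniform` draws with the corresponding distributions.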

  33. MPRS Query Performance • lUrU data set (data size = 100K, dimensionality d = 3)

  34. BPRS Query Performance • lUrG – lUrG (dimensionality d = 3)

  35. Summary • MPRS and BPRS queries over uncertain data • Spatial and probabilistic pruning • PRS query processing with pre-computation • Experimental evaluation
