
Probabilistic Data Management

Explore different types of probabilistic queries such as top-k, range, nearest neighbor, and group nearest neighbor queries. Learn about the semantics and algorithms for handling uncertain data in probabilistic databases.

mcveigha

Presentation Transcript


  1. Probabilistic Data Management Chapter 8: Probabilistic Query Answering (6)

  2. Objectives
     In this chapter, you will:
     • Explore the definitions of more probabilistic query types
     • Probabilistic top-k query

  3. Recall: Probabilistic Query Types (over an uncertain/probabilistic database)
     Probabilistic spatial queries:
     • Probabilistic range query
     • Probabilistic k-nearest neighbor query
     • Probabilistic group nearest neighbor (PGNN) query
     • Probabilistic reverse k-nearest neighbor query
     • Probabilistic spatial join / similarity join
     Probabilistic preference queries:
     • Probabilistic top-k query (or ranked query)
     • Probabilistic skyline query
     • Probabilistic reverse skyline query

  4. Motivation Example
     • In a coal mine surveillance application, a number of sensors are deployed to detect gas density, temperature, and so on
     • Assume we have a preference function f(O) = O.temp + O.den
     • Top-k query: retrieve the k sensors with the highest scores (the most dangerous)

  5. Motivation Example (cont'd)
     • Sensor data usually contain noise
     • The reported data can be modeled as uncertain objects
     • Goal: obtain top-k query answers over uncertain data with high confidence

  6. Background of Probabilistic Top-k Query
     Under possible worlds semantics:
     • Each tuple t is associated with a score t.score
     • Each tuple t is associated with an existence probability t.prob
     [Figure: possible worlds and the query answer in each possible world]
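Possible worlds semantics can be made concrete by brute-force enumeration. The sketch below assumes independent tuples; the four tuples, their scores, and their existence probabilities are hypothetical values chosen for illustration (they are not taken from the slides), and `possible_worlds` and `top_k` are illustrative helper names.

```python
from itertools import product

# Hypothetical tuples: (name, score, existence probability). Independence is assumed.
TUPLES = [("t1", 40, 0.4), ("t2", 30, 0.5), ("t3", 20, 1.0), ("t4", 10, 1.0)]


def possible_worlds(tuples):
    """Yield (world, probability) for every subset of tuples with non-zero probability."""
    for mask in product([True, False], repeat=len(tuples)):
        prob = 1.0
        for present, (_, _, p) in zip(mask, tuples):
            prob *= p if present else 1.0 - p
        if prob > 0.0:
            world = [t for present, t in zip(mask, tuples) if present]
            yield world, prob


def top_k(world, k):
    """Names of the k highest-scoring tuples in one possible world, best first."""
    ranked = sorted(world, key=lambda t: t[1], reverse=True)
    return tuple(name for name, _, _ in ranked[:k])


worlds = list(possible_worlds(TUPLES))
for world, prob in worlds:
    # Each possible world has its own ordinary (certain) top-k answer.
    print(top_k(world, 2), round(prob, 3))
```

Enumeration is exponential in the number of tuples, so this is only suitable for toy inputs; it is meant to show what the semantics compute, not how to compute them efficiently.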

  7. Different Semantics of Probabilistic Top-k Query
     Top-k query in probabilistic databases:
     • Consider each possible world, from which top-k answers are retrieved
     • Aggregate the top-k answers (weighted by the probabilities of the possible worlds)
     Aggregation semantics:
     • Uncertain Top-k (U-Topk) [Soliman et al., ICDE 2007]
     • Uncertain Rank-k (U-kRanks) [Soliman et al., ICDE 2007]
     • Probabilistic Threshold Top-k (PT(h)) [Hua et al., SIGMOD 2008]
     • Expected Ranks (Exp-Rank) [Cormode et al., ICDE 2009]
     • Expected Score (E-Score) [Cormode et al., ICDE 2009]

  8. Uncertain Top-k (U-Topk) [Soliman et al., ICDE 2007]
     • Group the possible worlds by their top-k answer vectors
     • Find the one top-k answer vector that appears in possible worlds with the highest total probability
     [Figure: probabilistic database → possible worlds → top-k answer vectors → U-Topk answers]

  9. Example of U-Topk
     Given the uncertain database and k = 2:
     • Pr[{t1, t2}] = 0.2
     • Pr[{t1, t3}] = 0.2
     • Pr[{t2, t3}] = 0.3
     • Pr[{t3, t4}] = 0.3
     Final result: {t2, t3} or {t3, t4} (tied at probability 0.3)
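The U-Topk aggregation can be sketched by grouping possible worlds by their top-k vector and summing probabilities. The database below is an assumption: the original table is not part of this transcript, so four independent tuples were chosen such that the top-2 vector probabilities come out as 0.2, 0.2, 0.3, and 0.3, as in the example above. The function names are illustrative.

```python
from collections import defaultdict
from itertools import product

# Hypothetical database: (name, score, existence probability), tuples assumed independent.
DB = [("t1", 40, 0.4), ("t2", 30, 0.5), ("t3", 20, 1.0), ("t4", 10, 1.0)]


def possible_worlds(tuples):
    """Yield (world, probability) for every subset with non-zero probability."""
    for mask in product([True, False], repeat=len(tuples)):
        prob = 1.0
        for present, (_, _, p) in zip(mask, tuples):
            prob *= p if present else 1.0 - p
        if prob > 0.0:
            yield [t for present, t in zip(mask, tuples) if present], prob


def u_topk(tuples, k):
    """Return (probability of each top-k vector, list of most probable vectors)."""
    vector_prob = defaultdict(float)
    for world, prob in possible_worlds(tuples):
        ranked = sorted(world, key=lambda t: t[1], reverse=True)
        vector_prob[tuple(name for name, _, _ in ranked[:k])] += prob
    best = max(vector_prob.values())
    # Keep every vector tied for the highest probability.
    winners = [v for v, p in vector_prob.items() if abs(p - best) < 1e-12]
    return dict(vector_prob), winners


vector_prob, winners = u_topk(DB, 2)
print(vector_prob)
print(winners)
```

Returning all tied vectors, rather than an arbitrary one, mirrors the "or" in the example's final result.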

  10. Uncertain Rank-k (U-kRanks) [Soliman et al., ICDE 2007]
      • For each j ∈ [1, k], group the possible worlds by the tuple at the j-th rank
      • For each j ∈ [1, k], find the one tuple that has the j-th rank in possible worlds with the highest probability
      [Figure: probabilistic database → possible worlds → tuple with the j-th rank → U-kRanks answers]

  11. Example of U-kRanks
      Given the uncertain database and k = 2:
      • At rank i = 1: Pr[t1] = 0.4, Pr[t2] = 0.3, Pr[t3] = 0.3
      • At rank i = 2: Pr[t2] = 0.2, Pr[t3] = 0.5, Pr[t4] = 0.3
      Final result: {t1, t3}
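U-kRanks can be sketched the same way: for each rank j, accumulate the probability that each tuple occupies that rank, then pick the most probable tuple per rank. The database is hypothetical (four independent tuples chosen so that the per-rank probabilities agree with the example above), and the function names are illustrative.

```python
from collections import defaultdict
from itertools import product

# Hypothetical database: (name, score, existence probability), tuples assumed independent.
DB = [("t1", 40, 0.4), ("t2", 30, 0.5), ("t3", 20, 1.0), ("t4", 10, 1.0)]


def possible_worlds(tuples):
    """Yield (world, probability) for every subset with non-zero probability."""
    for mask in product([True, False], repeat=len(tuples)):
        prob = 1.0
        for present, (_, _, p) in zip(mask, tuples):
            prob *= p if present else 1.0 - p
        if prob > 0.0:
            yield [t for present, t in zip(mask, tuples) if present], prob


def rank_probabilities(tuples, k):
    """rank_prob[j][name] = Pr(name is at rank j), for j = 1..k."""
    rank_prob = {j: defaultdict(float) for j in range(1, k + 1)}
    for world, prob in possible_worlds(tuples):
        ranked = sorted(world, key=lambda t: t[1], reverse=True)
        for j, (name, _, _) in enumerate(ranked[:k], start=1):
            rank_prob[j][name] += prob
    return rank_prob


def u_kranks(tuples, k):
    """For each rank j = 1..k, the tuple most likely to appear at that rank."""
    rank_prob = rank_probabilities(tuples, k)
    return [max(rank_prob[j], key=rank_prob[j].get) for j in range(1, k + 1)]


answer = u_kranks(DB, 2)
print(answer)
```

Note that the same tuple may win at more than one rank under this semantics, since each rank is decided independently.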

  12. Probabilistic Threshold Top-k (PT(h)) [Hua et al., SIGMOD 2008]
      • Group the possible worlds by the tuples in their top-h answer sets
      • Find the k tuples that are in the top-h answer sets of possible worlds with the highest probabilities
      [Figure: probabilistic database → possible worlds → top-h answer sets → PT(h) answers]

  13. Example of PT-k
      Given the uncertain database, k = 2, and threshold = 0.5:
      • Probability of being in the top-2: Pr[t1] = 0.4, Pr[t2] = 0.5, Pr[t3] = 0.8, Pr[t4] = 0.3
      • Tuples meeting the threshold of 0.5: Pr[t2] = 0.5, Pr[t3] = 0.8
      Final result: {t2, t3}
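The threshold variant in the example can be sketched by computing, for each tuple, the total probability of the possible worlds in which it is among the top k, and then filtering by the threshold. The database is hypothetical (independent tuples chosen so that the top-2 probabilities match the example above); `topk_probabilities` and `pt_k` are illustrative names.

```python
from collections import defaultdict
from itertools import product

# Hypothetical database: (name, score, existence probability), tuples assumed independent.
DB = [("t1", 40, 0.4), ("t2", 30, 0.5), ("t3", 20, 1.0), ("t4", 10, 1.0)]


def possible_worlds(tuples):
    """Yield (world, probability) for every subset with non-zero probability."""
    for mask in product([True, False], repeat=len(tuples)):
        prob = 1.0
        for present, (_, _, p) in zip(mask, tuples):
            prob *= p if present else 1.0 - p
        if prob > 0.0:
            yield [t for present, t in zip(mask, tuples) if present], prob


def topk_probabilities(tuples, k):
    """Pr(tuple is among the k highest-scoring tuples), summed over possible worlds."""
    in_topk = defaultdict(float)
    for world, prob in possible_worlds(tuples):
        ranked = sorted(world, key=lambda t: t[1], reverse=True)
        for name, _, _ in ranked[:k]:
            in_topk[name] += prob
    return dict(in_topk)


def pt_k(tuples, k, threshold):
    """Tuples whose top-k probability reaches the threshold (small float tolerance)."""
    probs = topk_probabilities(tuples, k)
    return sorted(name for name, p in probs.items() if p >= threshold - 1e-12)


answer = pt_k(DB, k=2, threshold=0.5)
print(topk_probabilities(DB, 2))
print(answer)
```

Unlike U-Topk, the answer size here is not fixed at k: it depends on how many tuples clear the threshold.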

  14. Expected Ranks (Exp-Rank) [Cormode et al., ICDE 2009]
      • Expected rank of t1: E[R(t1)] = Σ_pw r_pw(t1) · Pr(pw), where r_pw(t1) is the rank of t1 in possible world pw
      • Find the k tuples with the best (numerically smallest) expected ranks
      [Figure: probabilistic database with alternatives t1, t2, … → possible worlds]

  15. Expected Score (E-Score) [Cormode et al., ICDE 2009]
      • Expected score of t1: Σ_pw score(t1) · Pr(pw), summed over the possible worlds pw that contain t1
      • Find the k tuples with the highest expected scores
      [Figure: probabilistic database with alternatives t1, t2, … → possible worlds]

  16. Example of Expected Ranks
      Given the uncertain database and k = 2:
      • If a tuple doesn't appear in a world, its rank is considered to be the last one
      • E[R(t1)] = 1×0.2 + 1×0.2 + 3×0.3 + 3×0.3 = 2.2
      • E[R(t2)] = 2.4
      • E[R(t3)] = 1.9
      • E[R(t4)] = 2.9
      Final result: {t3, t1}
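A minimal sketch of the expected-rank computation by enumeration follows. Two things are assumptions here: the database (four hypothetical independent tuples, since the original table is not part of this transcript) and the reading of "the last one" as rank |W| + 1 for a tuple missing from a world W with |W| tuples. Under a different reading, or with the original table, the values can differ from those listed in the example above.

```python
from itertools import product

# Hypothetical database: (name, score, existence probability), tuples assumed independent.
DB = [("t1", 40, 0.4), ("t2", 30, 0.5), ("t3", 20, 1.0), ("t4", 10, 1.0)]


def possible_worlds(tuples):
    """Yield (world, probability) for every subset with non-zero probability."""
    for mask in product([True, False], repeat=len(tuples)):
        prob = 1.0
        for present, (_, _, p) in zip(mask, tuples):
            prob *= p if present else 1.0 - p
        if prob > 0.0:
            yield [t for present, t in zip(mask, tuples) if present], prob


def expected_ranks(tuples):
    """E[R(t)] = sum over worlds of rank(t in world) * Pr(world), ranks are 1-based."""
    exp_rank = {name: 0.0 for name, _, _ in tuples}
    for world, prob in possible_worlds(tuples):
        ranked = [name for name, _, _ in sorted(world, key=lambda t: t[1], reverse=True)]
        for name in exp_rank:
            # Assumed convention: an absent tuple is placed just after the last present one.
            rank = ranked.index(name) + 1 if name in ranked else len(ranked) + 1
            exp_rank[name] += rank * prob
    return exp_rank


def exp_rank_top_k(tuples, k):
    """The k tuples with the smallest expected rank, best first."""
    exp_rank = expected_ranks(tuples)
    return sorted(exp_rank, key=exp_rank.get)[:k]


print(expected_ranks(DB))
print(exp_rank_top_k(DB, 2))
```

Because a smaller rank is better, the top-k answer is taken from the low end of the expected-rank values.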

  17. Unified Ranking Functions
      • Parameterized Ranking Function (PRF), defined through a weight function w(t, i)
      • A probabilistic top-k query returns the k tuples with the highest |gw| values
      Reference: Li, J., Deshpande, A. A Unified Approach to Ranking in Probabilistic Databases. In VLDB, 2009.

  18. Unified Ranking Functions (cont'd)
      • When w(t, i) = 1, the result is the set of k tuples with the highest existence probability
      • When w(t, i) = score(t): E-Score
      • When w(t, i) = 1 for i ≤ h and 0 otherwise: PT(h)
      • When w(t, i) = 1 for i = j and 0 otherwise: U-Rank (at rank j)
      • PRF cannot simulate U-Topk
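The special cases above can be tried out with a small sketch. It assumes the PRF value of a tuple t is the weighted sum Σ_i w(t, i) · Pr(r(t) = i) over ranks i, as in the cited Li and Deshpande paper, and that the weights for PT(h) and U-Rank are the indicator functions "i ≤ h" and "i = j". The database is hypothetical, the rank distribution is obtained by brute-force enumeration rather than by the paper's algorithms, and the function names are illustrative.

```python
from collections import defaultdict
from itertools import product

# Hypothetical database: (name, score, existence probability), tuples assumed independent.
DB = [("t1", 40, 0.4), ("t2", 30, 0.5), ("t3", 20, 1.0), ("t4", 10, 1.0)]


def rank_distribution(tuples):
    """dist[name][i] = Pr(name exists and has rank i), ranks are 1-based."""
    dist = {name: defaultdict(float) for name, _, _ in tuples}
    for mask in product([True, False], repeat=len(tuples)):
        prob = 1.0
        for present, (_, _, p) in zip(mask, tuples):
            prob *= p if present else 1.0 - p
        if prob == 0.0:
            continue
        world = sorted((t for present, t in zip(mask, tuples) if present),
                       key=lambda t: t[1], reverse=True)
        for i, (name, _, _) in enumerate(world, start=1):
            dist[name][i] += prob
    return dist


def prf_top_k(tuples, weight, k):
    """Rank tuples by |sum_i weight(t, i) * Pr(r(t) = i)| and return the best k names."""
    dist = rank_distribution(tuples)
    value = {t[0]: sum(weight(t, i) * p for i, p in dist[t[0]].items()) for t in tuples}
    return sorted(value, key=lambda name: abs(value[name]), reverse=True)[:k]


by_probability = prf_top_k(DB, lambda t, i: 1.0, 2)                   # w = 1
by_e_score = prf_top_k(DB, lambda t, i: t[1], 2)                      # w = score(t)
by_pt2 = prf_top_k(DB, lambda t, i: 1.0 if i <= 2 else 0.0, 2)        # PT(h) with h = 2
by_urank2 = prf_top_k(DB, lambda t, i: 1.0 if i == 2 else 0.0, 1)     # U-Rank at rank j = 2
print(by_probability, by_e_score, by_pt2, by_urank2)
```

Passing the weight as a function keeps the ranking code fixed while the semantics change, which is the point of a parameterized ranking function.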

  19. Unified Ranking Functions (cont'd)
      Two new semantics, PRFw(h) and PRFe(h):
      • PRFw(h): w(t, i) = w_i for i ≤ h, and w(t, i) = 0 for i > h
      • PRFe(h): w(t, i) = a^i, where a can be a real or complex number

  20. Ranking Algorithms
      • Assuming tuple independence
      • Compute the probability that a tuple ti has the j-th rank
      • Observation: the coefficient cj of x^j in a generating function Fi(x) is exactly the probability that ti is at rank j

  21. Example: consider the rank of a tuple t3. Incremental computation of Fi(x): .4x
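The generating-function observation can be sketched as polynomial multiplication. The code assumes the form commonly used for independent tuples sorted by descending score, Fi(x) = (∏ over higher-scored tuples tj of (1 − pj + pj·x)) · pi·x: each higher-scored tuple either is absent (factor 1 − pj) or pushes ti down one rank (factor pj·x). The probabilities are hypothetical and `rank_distribution_gf` is an illustrative name.

```python
# Hypothetical existence probabilities of t1..t4, listed in descending score order.
PROBS = [0.4, 0.5, 1.0, 1.0]


def rank_distribution_gf(probs, i):
    """Rank distribution of the tuple at 0-based position i (tuples assumed independent).

    Returns a list whose entry j - 1 is the coefficient of x^j in Fi(x),
    i.e. the probability that the tuple exists and is at rank j.
    """
    coeffs = [1.0]  # the constant polynomial 1; coeffs[d] is the coefficient of x^d
    for p in probs[:i]:
        # Incrementally multiply by (1 - p) + p * x for each higher-scored tuple.
        nxt = [0.0] * (len(coeffs) + 1)
        for d, c in enumerate(coeffs):
            nxt[d] += c * (1.0 - p)
            nxt[d + 1] += c * p
        coeffs = nxt
    # Multiplying by p_i * x scales the coefficients and shifts every degree up by one,
    # so position d in the returned list corresponds to x^(d + 1), i.e. rank d + 1.
    return [probs[i] * c for c in coeffs]


t3_ranks = rank_distribution_gf(PROBS, 2)
print(t3_ranks)
```

Each step multiplies by one linear factor, so the rank distribution of the i-th tuple takes O(i²) arithmetic operations in this naive form, compared with the exponential cost of enumerating possible worlds.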

  22. Ranking Algorithms (cont'd)
      • Assuming a correlated database represented by an and/xor tree
      • Generating functions on the and/xor tree
      • Observation: the coefficient cj of the term x^(j-1) y is Pr(r(ti) = j)

  23. Summary
      • Probabilistic top-k query
      • Different semantics w.r.t. ranks and probabilities in possible worlds
      • A unified approach
