Explore different types of probabilistic queries such as top-k, range, nearest neighbor, and group nearest neighbor queries. Learn about the semantics and algorithms for handling uncertain data in probabilistic databases.
Probabilistic Data Management Chapter 8: Probabilistic Query Answering (6)
Objectives
• In this chapter, you will:
• Explore the definitions of more probabilistic query types
• Probabilistic top-k query
Recall: Probabilistic Query Types
Uncertain/probabilistic database
Probabilistic spatial queries:
• Probabilistic range query
• Probabilistic k-nearest neighbor query
• Probabilistic group nearest neighbor (PGNN) query
• Probabilistic reverse k-nearest neighbor query
• Probabilistic spatial join / similarity join
Probabilistic preference queries:
• Probabilistic top-k query (or ranked query)
• Probabilistic skyline query
• Probabilistic reverse skyline query
Motivation Example
• In a coal mine surveillance application, a number of sensors are deployed to detect gas density, temperature, and so on
• Assume we have a preference function f(O) = O.temp + O.den
• Top-k query: retrieve the k sensors with the highest scores (most dangerous)
Motivation Example (cont'd)
• Sensor data usually contain noise
• The reported data can be modeled as uncertain objects
• Obtain top-k query answers over uncertain data with high confidence
Background of Probabilistic Top-k Query
• Under possible worlds semantics:
• Each tuple t is associated with a score t.score
• Each tuple t is associated with an existence probability t.prob
[Figure: possible worlds and the query answer in each possible world]
Different Semantics of Probabilistic Top-k Query
Top-k query in probabilistic databases
• Consider each possible world, from which top-k answers are retrieved
• Aggregate the top-k answers (weighted by the probabilities of the possible worlds)
Aggregation semantics
• Uncertain Top-k (U-Topk) [Soliman et al., ICDE 2007]
• Uncertain Rank-k (U-kRanks) [Soliman et al., ICDE 2007]
• Probabilistic Threshold Top-k (PT(h)) [Hua et al., SIGMOD 2008]
• Expected Ranks (Exp-Rank) [Cormode et al., ICDE 2009]
• Expected Score (E-Score) [Cormode et al., ICDE 2009]
Uncertain Top-k (U-Topk) [Soliman et al., ICDE 2007]
• Group the possible worlds by their top-k answer vectors
• Find the one top-k answer vector that appears in possible worlds with the highest total probability
[Figure: probabilistic database, its possible worlds, the top-k answer vector of each world, and the U-Topk answers]
Example of U-Topk
Given the uncertain database and k = 2:
• Pr[{t1, t2}] = 0.2
• Pr[{t1, t3}] = 0.2
• Pr[{t2, t3}] = 0.3
• Pr[{t3, t4}] = 0.3
Final result: {t2, t3} or {t3, t4}
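The grouping step in the example above can be sketched in a few lines of Python. This is a minimal sketch, not the algorithm of Soliman et al.: it enumerates possible worlds explicitly, and it assumes (hypothetically, since the database table is not reproduced here) that each world contains exactly the two tuples of one answer vector, already sorted by descending score. The names `worlds` and `u_topk` are illustrative.

```python
from collections import defaultdict

# Hypothetical possible worlds: (tuples in descending score order, probability).
# Assumption: one world per answer vector listed in the example.
worlds = [
    (("t1", "t2"), 0.2),
    (("t1", "t3"), 0.2),
    (("t2", "t3"), 0.3),
    (("t3", "t4"), 0.3),
]

def u_topk(worlds, k):
    """Return all top-k answer vectors with maximal total probability."""
    vector_prob = defaultdict(float)
    for ranked_tuples, prob in worlds:
        # Group worlds by their top-k answer vector and sum the probabilities.
        vector_prob[ranked_tuples[:k]] += prob
    best = max(vector_prob.values())
    # Keep every vector tied for the highest probability.
    winners = [v for v, p in vector_prob.items() if abs(p - best) < 1e-12]
    return winners, best

winners, best = u_topk(worlds, k=2)
print(winners, best)
```

Under these assumptions the two vectors {t2, t3} and {t3, t4} tie at probability 0.3, which is why the example lists both as valid answers.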
Uncertain Rank-k (U-kRanks) [Soliman et al., ICDE 2007]
• For each j ∈ [1, k], group the possible worlds by the tuple at the j-th rank
• For each j ∈ [1, k], find the one tuple that has the j-th rank in possible worlds with the highest probability
[Figure: probabilistic database, its possible worlds, the tuple at the j-th rank in each world, and the U-kRanks answers]
Example of U-kRanks
Given the uncertain database and k = 2:
• At rank i = 1: Pr[t1] = 0.4, Pr[t2] = 0.3, Pr[t3] = 0.3
• At rank i = 2: Pr[t2] = 0.2, Pr[t3] = 0.5, Pr[t4] = 0.3
Final result: {t1, t3}
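The per-rank probabilities in this example can be reproduced by brute force over possible worlds. The sketch below is illustrative only: the `worlds` list is a hypothetical reconstruction (one two-tuple world per U-Topk answer vector, sorted by descending score), and `u_kranks` is a made-up helper name.

```python
from collections import defaultdict

# Hypothetical possible worlds: (tuples in descending score order, probability).
worlds = [
    (("t1", "t2"), 0.2),
    (("t1", "t3"), 0.2),
    (("t2", "t3"), 0.3),
    (("t3", "t4"), 0.3),
]

def u_kranks(worlds, k):
    """For each rank j = 1..k, return the tuple most likely to hold rank j."""
    rank_prob = [defaultdict(float) for _ in range(k)]
    for ranked_tuples, prob in worlds:
        for j, t in enumerate(ranked_tuples[:k]):
            rank_prob[j][t] += prob      # t holds rank j + 1 in this world
    answers = [max(dist, key=dist.get) for dist in rank_prob]
    return answers, [dict(dist) for dist in rank_prob]

answers, rank_prob = u_kranks(worlds, k=2)
print(answers)
print(rank_prob)
```

Note that the winners at different ranks are chosen independently, so the returned tuples need not appear together in any single possible world.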
Probabilistic Threshold Top-k (PT(h)) [Hua et al., SIGMOD 2008]
• Group by the tuples in top-h answer sets
• Find the k tuples that are in the top-h answer sets of possible worlds with the highest probabilities
[Figure: probabilistic database, its possible worlds, the top-h answer set of each world, and the PT(h) answers]
Example of PT-k
Given the uncertain database, k = 2, and threshold = 0.5:
• Top-k probabilities: Pr[t1] = 0.4, Pr[t2] = 0.5, Pr[t3] = 0.8, Pr[t4] = 0.3
• Tuples that meet the threshold of 0.5: Pr[t2] = 0.5, Pr[t3] = 0.8
Final result: {t2, t3}
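A brute-force sketch of the threshold semantics follows. It assumes the same hypothetical two-tuple worlds as the U-Topk answer vectors (the database table itself is not reproduced here); `pt_k` is an illustrative name and not the pruning-based algorithm of Hua et al.

```python
from collections import defaultdict

# Hypothetical possible worlds: (tuples in descending score order, probability).
worlds = [
    (("t1", "t2"), 0.2),
    (("t1", "t3"), 0.2),
    (("t2", "t3"), 0.3),
    (("t3", "t4"), 0.3),
]

def pt_k(worlds, k, threshold):
    """Return tuples whose probability of being in the top-k is >= threshold."""
    topk_prob = defaultdict(float)
    for ranked_tuples, prob in worlds:
        for t in ranked_tuples[:k]:
            topk_prob[t] += prob         # t is in the top-k of this world
    # Small tolerance guards against floating-point rounding at the threshold.
    answers = sorted(t for t, p in topk_prob.items() if p >= threshold - 1e-12)
    return answers, dict(topk_prob)

answers, topk_prob = pt_k(worlds, k=2, threshold=0.5)
print(answers)
print(topk_prob)
```

Unlike U-Topk, the size of the answer is not fixed: raising the threshold above 0.8 would return an empty result on this data.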
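Expected Ranks (Exp-Rank) [Cormode et al., ICDE 2009]
• Expected rank of t1: $\sum_{pw} r_{pw}(t_1) \cdot \Pr(pw)$, where $r_{pw}(t_1)$ is the rank of t1 in possible world pw
• Find the k tuples with the highest expected ranks (that is, the numerically smallest expected rank values)
[Figure: probabilistic database with alternatives and its possible worlds]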
Expected Score (E-Score) [Cormode et al., ICDE 2009]
• Expected score of t1: $\sum_{pw} score(t_1) \cdot \Pr(pw)$
• Find the k tuples with the highest expected scores
[Figure: probabilistic database with alternatives and its possible worlds]
Example of Expected Ranks
Given the uncertain database and k = 2
If a tuple doesn't appear in a world, its rank is considered to be the last one
• E[R(t1)] = 1×0.2 + 1×0.2 + 3×0.3 + 3×0.3 = 2.2
• E[R(t2)] = 2.4
• E[R(t3)] = 1.9
• E[R(t4)] = 2.9
Final result: {t3, t1}
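The expected-rank computation can be sketched as follows. The worlds below are hypothetical and unrelated to the example database (which is not reproduced here), so the numbers are not meant to match the values above. The sketch also assumes that "the last one" means one position past the final rank of the world in question; `expected_ranks` is an illustrative name.

```python
from collections import defaultdict

# Hypothetical possible worlds over tuples a, b, c:
# (tuples in descending score order, probability of the world).
worlds = [
    (("a", "b"), 0.5),
    (("b", "c"), 0.3),
    (("c",), 0.2),
]

def expected_ranks(worlds):
    """Expected rank of every tuple, summing rank * Pr(world) over all worlds."""
    tuples = {t for ranked, _ in worlds for t in ranked}
    exp_rank = defaultdict(float)
    for ranked, prob in worlds:
        for t in tuples:
            # 1-based rank if present; otherwise one past the world's last rank
            # (an assumption about what "the last one" means).
            rank = ranked.index(t) + 1 if t in ranked else len(ranked) + 1
            exp_rank[t] += rank * prob
    return dict(exp_rank)

ranks = expected_ranks(worlds)
best_two = sorted(ranks, key=ranks.get)[:2]   # smallest expected rank first
print(ranks)
print(best_two)
```

For instance, tuple a has rank 1 in the first world and is absent from the other two, so its expected rank is 1×0.5 + 3×0.3 + 2×0.2 = 1.8.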
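Unified Ranking Functions
• Parameterized Ranking Function (PRF): $\Upsilon_w(t) = \sum_{i>0} w(t, i) \cdot \Pr(r(t) = i)$, where w(t, i) is a weight function and r(t) is the rank of tuple t
• A probabilistic top-k query returns the k tuples with the highest $|\Upsilon_w(t)|$ values
Li, J., Deshpande, A. A Unified Approach to Ranking in Probabilistic Databases. In VLDB, 2009.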
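Unified Ranking Functions (cont'd)
• When w(t, i) = 1, the result is the set of k tuples with the highest existence probability
• When w(t, i) = score(t), the result is E-Score
• When w(t, i) = 1 for i ≤ h and w(t, i) = 0 for i > h, the result is PT(h)
• When w(t, i) = 1 for i = j and w(t, i) = 0 otherwise, the result is the rank-j answer of U-Rank
• PRF cannot simulate U-Topk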
Unified Ranking Functions (cont'd)
Two new semantics: PRFw(h) and PRFe(h)
• PRFw(h): $w(t, i) = w_i$ for i ≤ h, and w(t, i) = 0 for i > h
• PRFe(h): $w(t, i) = \alpha^i$, where α can be a real or complex number
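To see how one weighted sum covers several semantics, the sketch below evaluates a PRF-style score on the rank probabilities from the U-kRanks example. It assumes the PRF value of a tuple is the sum over ranks i of w(t, i) · Pr(r(t) = i). Only ranks 1 and 2 are listed in that example, so the α^i variant here is a truncated sum for illustration, and α = 0.5 is an arbitrary choice. The names `rank_dist`, `prf`, and `top_k` are hypothetical.

```python
# Rank distributions Pr(r(t) = i), taken from the U-kRanks example.
# Assumption: ranks beyond 2 are not listed there and are ignored here.
rank_dist = {
    "t1": {1: 0.4},
    "t2": {1: 0.3, 2: 0.2},
    "t3": {1: 0.3, 2: 0.5},
    "t4": {2: 0.3},
}

def prf(rank_dist, weight):
    """Weighted sum of rank probabilities for every tuple."""
    return {t: sum(weight(t, i) * p for i, p in dist.items())
            for t, dist in rank_dist.items()}

def top_k(values, k):
    """The k tuples with the largest absolute PRF value."""
    return sorted(values, key=lambda t: abs(values[t]), reverse=True)[:k]

pt2 = prf(rank_dist, lambda t, i: 1.0 if i <= 2 else 0.0)    # PT(h), h = 2
rank2 = prf(rank_dist, lambda t, i: 1.0 if i == 2 else 0.0)  # U-Rank, j = 2
prfe = prf(rank_dist, lambda t, i: 0.5 ** i)                 # alpha^i, alpha = 0.5

print(top_k(pt2, 2))    # tuples most likely to be in the top-2
print(top_k(rank2, 1))  # tuple most likely to be at rank 2
print(top_k(prfe, 1))
```

With the step weight, the scores reduce to the top-2 probabilities of the PT-k example (0.4, 0.5, 0.8, 0.3); with the indicator weight they reduce to the rank-2 probabilities of the U-kRanks example.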
Ranking Algorithms
• Assuming tuple independence
• Compute the probability that a tuple $t_i$ has the j-th rank
• Observation: the coefficient $c_j$ of $x^j$ in a generating function $F^i(x)$ is exactly the probability that $t_i$ is at rank j
Example
• Consider the rank of a tuple t3
• Incremental computation of $F^i(x)$: .4x
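The generating-function observation can be sketched for independent tuples as follows. The sketch assumes the usual construction for tuple-independent data: with tuples sorted by descending score, multiply one factor (1 − p + p·x) per higher-scored tuple and then multiply by p_i·x for the tuple itself. The existence probabilities 0.5, 0.6, and 0.4 are hypothetical, and `rank_distribution` is an illustrative name.

```python
def rank_distribution(probs, i):
    """Rank probabilities of tuple i under tuple independence.

    probs: existence probabilities, sorted by descending score.
    i:     0-based index of the tuple of interest.
    Returns a list c where c[j] = Pr(tuple i is at rank j); c[0] is always 0.
    """
    poly = [1.0]                          # poly[d] = coefficient of x^d
    for p in probs[:i]:
        nxt = [0.0] * (len(poly) + 1)
        for d, c in enumerate(poly):
            nxt[d] += c * (1.0 - p)       # higher-scored tuple is absent
            nxt[d + 1] += c * p           # present: pushes tuple i down one rank
        poly = nxt
    # Multiply by p_i * x: the tuple itself must exist and occupies one rank.
    return [0.0] + [c * probs[i] for c in poly]

probs = [0.5, 0.6, 0.4]                   # hypothetical probabilities of t1, t2, t3
coeffs = rank_distribution(probs, 2)      # rank distribution of t3
print(coeffs)
```

Here the polynomial for t3 is (0.5 + 0.5x)(0.4 + 0.6x)(0.4x) = 0.08x + 0.2x² + 0.12x³, so t3 is at rank 1, 2, or 3 with probability 0.08, 0.2, or 0.12. Processing tuples in score order lets each polynomial be extended from the previous one by a single factor, which is the incremental computation the example refers to.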
Ranking Algorithms (cont'd)
• Assuming a correlated database represented by an and/xor tree
• Generating functions on the and/xor tree
• Observation: the coefficient $c_j$ of the term $x^{j-1}y$ is $\Pr(r(t_i) = j)$
Summary
• Probabilistic top-k query
• Different semantics w.r.t. ranks and probabilities in possible worlds
• A unified approach