Continuous Top-k Dominating Queries. Maria Kontaki, Apostolos N. Papadopoulos, Yannis Manolopoulos. TKDE 2011.
Outline • Introduction • Background • Method • EVA (Event-Based Algorithm) • ADA (Advanced Algorithm) • AHBA (Approximate Hoeffding Bound Algorithm) • AMSA (Approximate Minimum Score Algorithm) • Experiment • Conclusion
Introduction • A top-k dominating query combines the top-k query and the skyline query. • Advantages: • The output size can be controlled. • No ranking function needs to be specified by the user.
Background • Adopts the count-based sliding window • Sliding window size: n • Arrival time of pi: pi.arr • Expiration time of pi: pi.exp • pi.exp = pi.arr + n • j-th best score: score_j
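The window bookkeeping above can be sketched as follows. This is a minimal illustration, assuming exactly one point arrives per time unit; the class name `CountWindow` and its `insert` method are hypothetical and not taken from the paper.

```python
from collections import deque


class CountWindow:
    """Count-based sliding window that keeps the n most recent points."""

    def __init__(self, n):
        self.n = n
        self.now = 0
        self.points = deque()  # entries are (point_id, arr, exp), oldest first

    def insert(self, point_id):
        """Advance time by one arrival, evict expired points, add the new one.

        Returns the ids of the points that expired at this time instant.
        """
        self.now += 1
        expired = []
        # A point expires once the current time reaches pi.exp = pi.arr + n.
        while self.points and self.points[0][2] <= self.now:
            expired.append(self.points.popleft()[0])
        self.points.append((point_id, self.now, self.now + self.n))
        return expired
```

With n = 3, the fourth arrival evicts the first point, so the window never holds more than n points.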
Cont. • C6 fully dominates C11, C12, C15, C16. • C6 partially dominates C6, C7, C8, C10, C14.
Cont. • Exact score computation • p4 dominates p6 and p11 (in partially dominated cells) and p8, p9, p12 (in fully dominated cells). • score(p4) = 2 + 3 = 5
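For reference, the score being computed is the number of points a point dominates. A brute-force sketch follows, without the grid of cells used on the slides. It assumes smaller values are preferable in every dimension; the function names are illustrative.

```python
def dominates(p, q):
    """True if p dominates q: no worse in every dimension, strictly better in one.

    Assumption: smaller coordinate values are preferable.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def score(p, points):
    """Number of points in `points` that p dominates."""
    return sum(1 for q in points if dominates(p, q))


def top_k_dominating(points, k):
    """The k points with the highest dominance scores (quadratic brute force)."""
    return sorted(points, key=lambda p: score(p, points), reverse=True)[:k]
```

The grid avoids this quadratic scan: points in fully dominated cells are counted per cell, and only points in partially dominated cells need individual dominance checks.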
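Method--EVA • Event-Based Processing • Terms in the event formula: the k-th highest score (score_k); the score of pi; the expiration time of the top-k point that expires soonest (exp1); the current time (now)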
Cont. • Event processing time, denoted ei.ept: ei.ept = min((score_k − score(pi))/2 + now, exp1) • Event generation time, denoted ei.egt • The score of pi at time ei.egt, denoted ei.score
Cont. • Upper-Bounding Scores of Existing Points • score(pi) ≤ ei.score + ei.ept − ei.egt • score(p10) = 9, score(p3) = 8, score(p8) = 7
Cont. ei.ept = min((score_k − score(pi))/2 + now, exp1) • Computing the event e7 of p7: • score(p7) = 3, e7.egt = 10 • e7.ept = min((7 − 3)/2 + 10, 13) = min(12, 13) = 12
Cont. ei.ept = min((score_k − score(pi))/2 + now, exp1); score(pi) ≤ ei.score + ei.ept − ei.egt • When e7 is processed at time 12: score(p7) ≤ e7.score + e7.ept − e7.egt = 3 + 12 − 10 = 5 • New e7.egt = 12 • New e7.ept = min((7 − 5)/2 + 12, 13) = min(13, 13) = 13
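The e7 example can be replayed in a few lines. This is a sketch only: the function names are illustrative, and rounding the half-gap up to an integer time instant is an assumption that matches the numbers on the slides.

```python
import math


def event_processing_time(score_k, score_pi, now, exp1):
    """ei.ept = min((score_k - score(pi))/2 + now, exp1).

    Assumption: the half-gap is rounded up to the next integer time instant.
    """
    return min(math.ceil((score_k - score_pi) / 2) + now, exp1)


def score_upper_bound(e_score, e_ept, e_egt):
    """Upper bound on score(pi) when the event is processed:
    score(pi) <= ei.score + ei.ept - ei.egt."""
    return e_score + e_ept - e_egt


# Event e7 of p7: score_k = 7, score(p7) = 3, generated at time 10, exp1 = 13.
first_ept = event_processing_time(7, 3, 10, 13)      # 12
bound = score_upper_bound(3, first_ept, 10)          # 5
second_ept = event_processing_time(7, bound, first_ept, 13)  # 13
```

Because the bound 5 is still below score_k = 7, no exact score computation is needed at time 12 and the event is simply rescheduled.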
Cont. • When an event ei is processed: if the upper bound on score(pi) exceeds score_k, compute the exact score of pi; otherwise reschedule ei.
Cont. • Upper-Bounding Scores of Incoming Points • A new point px arrives. • Determine a point pr such that: • (i) pr dominates the incoming point px • (ii) pr is not part of TOPK.
Cont. • score(pr) ≤ er.score + now − er.egt • score(px) ≤ score(pr) − 1
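The two inequalities chain into a single bound for the incoming point. The sketch below assumes smaller values are preferable and that each stored event carries the point's coordinates. The dictionary layout and function names are hypothetical. When several points qualify as pr, it takes the tightest bound, which is a choice made here, not something the slides specify.

```python
def dominates(p, q):
    """True if p dominates q (assumption: smaller values are preferable)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def incoming_upper_bound(px, events, topk_ids, now):
    """Upper bound on score(px) from dominators pr outside TOPK.

    score(px) <= score(pr) - 1 <= er.score + now - er.egt - 1.
    Returns None when no qualifying pr exists.
    """
    bounds = [
        e["score"] + now - e["egt"] - 1
        for pid, e in events.items()
        if pid not in topk_ids and dominates(e["coords"], px)
    ]
    return min(bounds) if bounds else None
```

If no such pr exists, the bound is unavailable and the caller has to treat px some other way, for example by computing its exact score.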
ADA • Φ(pi) = Φ+(pi) ∪ Φ−(pi) • Φ+(pi): the points that are part of the top-k • Φ−(pi): the points that are not part of the top-k
Cont. ei.ept = min((score_(k−r) − score(pi))/2 + now, exp2), with r = |Φ−(pi)| • |Φ−(p4)| = 1, so use the (k−1)-th best score, score_(k−1), and the second minimum expiration time of the top-k points. • e4.ept = min((8 − 5)/2 + 10, 18) = min(12, 18) = 12 (using Lemma 2; 11.5 is rounded up to 12)
Cont. ei.ept = min((score_k − score(pi))/2 + now, exp1); score(pi) ≤ ei.score + ei.ept − ei.egt • For comparison, using Lemma 1: e4.ept = min((7 − 5)/2 + 10, 13) = 11
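The two event times for e4 can be compared side by side. This sketch makes several assumptions: the half-gap is rounded up; for general r it uses the (r+1)-th minimum expiration time, whereas the slides only show r = 1; and r is smaller than k. The third expiration time in the usage lines (20) is a made-up placeholder, since the slides give only 13 and 18.

```python
import math


def ept_with_dominators(topk_scores, topk_exps, score_pi, now, r):
    """Event processing time using score_(k-r) and the (r+1)-th minimum expiration.

    topk_scores: top-k scores in descending order (score_1 ... score_k).
    topk_exps:   top-k expiration times in ascending order.
    r:           |Φ−(pi)|; r = 0 reduces to the Lemma 1 formula. Assumes r < k.
    """
    k = len(topk_scores)
    score_ref = topk_scores[k - r - 1]  # score_(k-r), converted to 0-based index
    exp_ref = topk_exps[r]              # (r+1)-th minimum expiration time
    return min(math.ceil((score_ref - score_pi) / 2) + now, exp_ref)


scores = [9, 8, 7]   # score_1..score_3 from the slides
exps = [13, 18, 20]  # 20 is a hypothetical value for illustration
lemma1 = ept_with_dominators(scores, exps, 5, 10, r=0)  # 11
lemma2 = ept_with_dominators(scores, exps, 5, 10, r=1)  # 12
```

Lemma 2 schedules the event later than Lemma 1 (12 versus 11), so each event is revisited less often.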
Cont. • ADA maintains: (1) a counter for each cell, cj.counter; (2) a counter for each point, pi.counter.
Cont. • n = 4 and k = 1 • p5 arrives. • Increase by one the counters of all cells fully dominated by the cell containing p5. • p5.counter = c6.counter
Cont. • When pj is inserted into TOPK, increase by one the counters of all points that are dominated by pj and expire before pj. • When a top-k point expires, the counters of these points are decreased by one.
Cont. • Before update (dominated by p1) (p1, p4): • p3.counter = 1, c10.counter = 2 • c10.counter − p3.counter = 1 • After update (dominated by p4) (p4, p5): • p3.counter = 1, c10.counter = 2 • c10.counter − p3.counter = 0 • |Φ−(p3)| = 1
Cont. • Using Candidate Points • A top-3 dominating query is applied to 1000 points. • score_3 = 200 • score(pi) = 194, so ei will be processed in (200 − 194)/2 = 3 time units.
Cont. • If ei.ept ≤ now + Δt, insert pi into the set of candidate points and ignore the event. • Δt is initialized to n/1000. • Δt decreases by one if the number of candidate points is greater than or equal to maxcand (1% of the sliding window size). • Δt increases by one if exact score computations are taking place.
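The candidate test and the Δt adjustment rules above can be sketched as below. The function names are illustrative, and clamping Δt at zero is an assumption added here so the threshold never goes negative.

```python
def is_candidate(ept, now, delta_t):
    """A point becomes a candidate when its event is due within delta_t time units."""
    return ept <= now + delta_t


def update_delta_t(delta_t, num_candidates, maxcand, exact_computation_happened):
    """Adjust delta_t by the two rules on the slide.

    Assumption: delta_t is clamped at 0 (not stated in the source).
    """
    if num_candidates >= maxcand:
        delta_t -= 1  # too many candidates: narrow the horizon
    if exact_computation_happened:
        delta_t += 1  # exact computations still occur: widen the horizon
    return max(delta_t, 0)
```

For a window of n = 10000 points, the stated defaults give an initial Δt of 10 and maxcand of 100.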
AHBA • The key idea is to maintain a random sample of points in a cell and compute their mean score.
Cont. • Assume • s independent random variables Z1, ..., Zs with α ≤ Zi ≤ β, i = 1, ..., s • Z = (1/s) Σ Zi (the sample mean) • ξ = error bound • μ = E[Z] (the expectation of Z)
Cont. Hoeffding inequality • Set δ = Prob{μ − Z ≤ ξ} • α = minscore, β = maxscore
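For reference, here is the standard one-sided Hoeffding bound for the mean Z of s independent variables in [α, β], followed by the sample size it implies. This takes δ exactly as written on the slide, that is, as the probability that the error stays within ξ; the paper's own statement of the bound may be arranged differently.

```latex
\Pr\{\mu - Z \ge \xi\} \;\le\; \exp\!\left(-\frac{2 s \xi^{2}}{(\beta-\alpha)^{2}}\right)
\quad\Longrightarrow\quad
\Pr\{\mu - Z \le \xi\} \;\ge\; 1 - \exp\!\left(-\frac{2 s \xi^{2}}{(\beta-\alpha)^{2}}\right)

\text{Requiring the right-hand side to be at least } \delta:\qquad
s \;\ge\; \frac{(\beta-\alpha)^{2}\,\ln\!\bigl(1/(1-\delta)\bigr)}{2\,\xi^{2}}
```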
Cont. • The bound is used to compute the number of samples that must be maintained for each cell cj in order to monitor the average score in cj. • minscore(cj) ≤ score(pi) ≤ maxscore(cj) • Scores of points in c1: 3 ≤ score(pi) ≤ 5
Cont. • Set ξ = ε(β − α) • ε is the error bound expressed as a percentage of the score range
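With ξ = ε(β − α), the score range cancels out of the standard Hoeffding sample-size bound, leaving s ≥ ln(1/(1 − δ)) / (2ε²). The sketch below evaluates that expression. It reads δ as the probability that the error stays within ξ, as on the earlier slide, and the function name is illustrative.

```python
import math


def hoeffding_sample_size(eps, delta):
    """Smallest s with s >= ln(1 / (1 - delta)) / (2 * eps**2).

    eps:   error bound as a fraction of the score range (0 < eps < 1).
    delta: required probability that the error stays within the bound
           (0 < delta < 1); this reading of delta is an assumption.
    """
    if not (0 < eps < 1 and 0 < delta < 1):
        raise ValueError("eps and delta must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 / (1.0 - delta)) / (2.0 * eps ** 2))


samples = hoeffding_sample_size(eps=0.1, delta=0.95)  # 150
```

Since β − α cancels, the same sample size applies to every cell, whatever its score range.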
Cont. • Keep a sample for cj if (1) maxscore(cj) > score_k − Δt and (2) cj is not dominated by another cell with a sample.
AMSA • If the score of a point is low, we expect that it will remain low during its lifetime. • If score_1 − score_k < score_k − minscore(cj), all points belonging to cj are excluded from event processing.
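The exclusion rule is a single comparison per cell. In this sketch, the function names and the dictionary of per-cell minimum scores are hypothetical.

```python
def amsa_excludes_cell(score_1, score_k, minscore_cj):
    """True if all points of cell cj are excluded from event processing:
    score_1 - score_k < score_k - minscore(cj)."""
    return (score_1 - score_k) < (score_k - minscore_cj)


def cells_to_process(score_1, score_k, minscores):
    """Ids of the cells that are still subject to event processing.

    minscores: mapping cell id -> minscore(cj) (illustrative layout).
    """
    return [cid for cid, m in minscores.items()
            if not amsa_excludes_cell(score_1, score_k, m)]
```

Put another way, a cell is skipped when its minimum score lies further below score_k than the best score lies above it.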
Conclusion • This paper is the first study of top-k dominating query processing algorithms in a streaming environment. • The three algorithms, ADA, AHBA and AMSA, can work in combination.