
Continuous Top-k Dominating Queries

Maria Kontaki, Apostolos N. Papadopoulos, Yannis Manolopoulos. TKDE 2011.

Presentation Transcript


  1. Continuous Top-k Dominating Queries Maria Kontaki, Apostolos N. Papadopoulos, Yannis Manolopoulos TKDE.2011

  2. Outline • Introduction • Background • Method • EVA (Event-Based Algorithm) • ADA (Advanced Algorithm) • AHBA (Approximate Hoeffding Bound Algorithm) • AMSA (Approximate Minimum Score Algorithm) • Experiment • Conclusion

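  3. Introduction • Combines top-k and skyline queries: the score of a point is the number of points it dominates, and the query returns the k points with the highest scores. Advantages: • The output size can be controlled through k. • No ranking function needs to be specified by users.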
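
To make the definition concrete, here is a brute-force sketch over a static set of points (no window, no grid). It assumes smaller coordinate values are preferred in every dimension, which the slides do not state; `dominates`, `score` and `top_k_dominating` are illustrative names, not the paper's.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """True if p dominates q, assuming smaller values are preferred in every dimension."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def score(p: Point, points: List[Point]) -> int:
    """Dominance score: the number of points in `points` that p dominates."""
    return sum(1 for q in points if dominates(p, q))


def top_k_dominating(points: List[Point], k: int) -> List[Tuple[Point, int]]:
    """Brute-force O(N^2) top-k dominating query over a static point set."""
    scored = [(p, score(p, points)) for p in points]
    scored.sort(key=lambda item: item[1], reverse=True)  # stable: ties keep input order
    return scored[:k]


data = [(1, 1), (2, 3), (3, 2), (4, 4), (5, 1)]
print(top_k_dominating(data, 2))  # [((1, 1), 4), ((2, 3), 1)]
```

Recomputing every score like this on each window update is the kind of cost the continuous algorithms below are designed to avoid.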

  4. Background • Adopts a count-based sliding window • Sliding window size: n • Arrival time of pi: pi.arr • Expiration time of pi: pi.exp, where pi.exp = pi.arr + n • j-th best score: score_j
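
A minimal sketch of the count-based window, assuming exactly one point arrives per time instant so that pi.exp = pi.arr + n; the class and method names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Tuple


@dataclass
class StreamPoint:
    coords: Tuple[float, ...]
    arr: int  # pi.arr: arrival time
    exp: int  # pi.exp = pi.arr + n


class CountBasedWindow:
    """Holds the n most recent points, assuming exactly one arrival per time instant."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.now = 0
        self.points: Deque[StreamPoint] = deque()

    def insert(self, coords: Tuple[float, ...]) -> Tuple[StreamPoint, List[StreamPoint]]:
        """Advance the clock, append the new point, and return it with any expired points."""
        self.now += 1
        p = StreamPoint(coords, arr=self.now, exp=self.now + self.n)
        self.points.append(p)
        expired: List[StreamPoint] = []
        # Points arrive in order, so the earliest-expiring point is always at the left.
        while self.points and self.points[0].exp <= self.now:
            expired.append(self.points.popleft())
        return p, expired


w = CountBasedWindow(n=4)
for t in range(5):
    p, gone = w.insert((t, t))
print(p.arr, p.exp, [q.arr for q in gone])  # 5 9 [1]
```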

  5. Cont. • The data space is partitioned into grid cells • C6 fully dominates C11, C12, C15, C16 • C6 partially dominates C6, C7, C8, C10, C14
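
The slide's lists can be reproduced under an assumed layout: a 4 × 4 grid with cells numbered C1 to C16 row by row, and smaller values preferred, so a cell fully dominates every cell with strictly larger row and column indices and partially dominates the cells in its own row or column (itself included) toward larger indices. `cell_id` and `dominated_cells` are hypothetical names.

```python
from typing import List, Tuple


def cell_id(row: int, col: int, m: int) -> int:
    """1-based, row-major cell number in an m x m grid (C1 ... C{m*m})."""
    return row * m + col + 1


def dominated_cells(cid: int, m: int) -> Tuple[List[int], List[int]]:
    """Return (fully dominated, partially dominated) cells for cell `cid`."""
    r, c = divmod(cid - 1, m)
    full: List[int] = []
    partial: List[int] = []
    for row in range(r, m):
        for col in range(c, m):
            if row > r and col > c:
                full.append(cell_id(row, col, m))     # every point inside is dominated
            else:
                partial.append(cell_id(row, col, m))  # same row or column, or the cell itself
    return full, partial


print(dominated_cells(6, 4))  # ([11, 12, 15, 16], [6, 7, 8, 10, 14])
```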

  6. Cont. • Exact score computation • p4 dominates p6, p11 (in partially dominated cells) and p8, p9, p12 (in fully dominated cells) • score(p4) = 2 + 3 = 5
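
A sketch of the exact score computation split used in this example: points in fully dominated cells are counted from the cell sizes alone, and dominance tests are needed only in partially dominated cells. It assumes two dimensions, smaller-is-better, and a uniform m × m grid over [0, size)²; `Grid` and `exact_score` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def dominates(p: Point, q: Point) -> bool:
    """Smaller is better in both dimensions (assumption)."""
    return p[0] <= q[0] and p[1] <= q[1] and (p[0] < q[0] or p[1] < q[1])


class Grid:
    """m x m grid over [0, size) x [0, size); each cell keeps its points."""

    def __init__(self, m: int, size: float) -> None:
        self.width = size / m
        self.cells: Dict[Tuple[int, int], List[Point]] = defaultdict(list)

    def cell_of(self, p: Point) -> Tuple[int, int]:
        return int(p[1] // self.width), int(p[0] // self.width)  # (row, col)

    def insert(self, p: Point) -> None:
        self.cells[self.cell_of(p)].append(p)

    def exact_score(self, p: Point) -> int:
        r, c = self.cell_of(p)
        total = 0
        for (row, col), pts in self.cells.items():
            if row > r and col > c:
                total += len(pts)  # fully dominated cell: count, no comparisons
            elif row >= r and col >= c:
                total += sum(dominates(p, q) for q in pts)  # partially dominated: test each
        return total  # remaining cells cannot contain points dominated by p
```

For example, inserting every point into `Grid(4, 4.0)` makes `exact_score(p)` equal a brute-force count over all points, while skipping comparisons for whole cells.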

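  7. Method: EVA • Event-based processing • Terms of the event formula: score_k = the k-th highest score; score(pi) = the score of pi; exp_1 = the expiration time of the top-k point that expires first; now = the current time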

  8. Cont. • Event processing time, denoted ei.ept: ei.ept = min((score_k − score(pi))/2 + now, exp_1) • Event generation time, denoted ei.egt • The score of pi at time ei.egt, denoted ei.score
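
The event processing time can be computed as below. Our reading of the division by 2 (not stated on the slides): per time instant, score(pi) can grow by at most one and score_k can drop by at most one while no top-k point expires, so the gap closes by at most two per instant; exp_1 caps the time because a top-k expiration can change score_k abruptly. Rounding a fractional result up is an assumption that reproduces the slide values; `event_processing_time` is a hypothetical name.

```python
import math


def event_processing_time(score_k: int, score_pi: int, now: int, exp_1: int) -> int:
    """ei.ept = min((score_k - score(pi)) / 2 + now, exp_1).

    Assumption: a fractional result is rounded up to the next time instant.
    """
    return min(math.ceil((score_k - score_pi) / 2) + now, exp_1)


print(event_processing_time(7, 3, 10, 13))  # 12 (slide 10)
print(event_processing_time(7, 5, 12, 13))  # 13 (slide 11)
```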

  9. Cont. • Upper-bounding scores of existing points • score(pi) ≤ ei.score + ei.ept − ei.egt • score(p10) = 9, score(p3) = 8, score(p8) = 7

  10. Cont. ei.ept = min((score_k − score(pi))/2 + now, exp_1) • Computation of the event e7 of p7 • score(p7) = 3, e7.egt = 10 • e7.ept = min((7 − 3)/2 + 10, 13) = min(12, 13) = 12

  11. Cont. ei.ept = min((score_k − score(pi))/2 + now, exp_1); score(pi) ≤ ei.score + ei.ept − ei.egt • At time 12: score(p7) ≤ e7.score + e7.ept − e7.egt = 3 + 12 − 10 = 5 • e7 is rescheduled with e7.score = 5 and e7.egt = 12 • e7.ept = min((7 − 5)/2 + 12, 13) = min(13, 13) = 13

  12. Cont. • When ei is processed: if the upper bound on score(pi) exceeds score_k, the exact score of pi is computed; otherwise ei is rescheduled.
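
This rule fits a priority queue of events keyed by ei.ept. The sketch uses Python's `heapq`; `Event`, `make_event` and `process_due_events` are hypothetical names. The bound is taken at the processing time `now` (equal to ei.ept when the event is handled on time), rounding up is assumed as before, and computing exact scores and updating TOPK are left to the caller. Rescheduled events are pushed only after the loop, so an event whose new ept equals `now` is not popped again in the same call.

```python
import heapq
import math
from dataclasses import dataclass, field
from typing import List


@dataclass(order=True)
class Event:
    ept: int                           # ei.ept, the heap key
    egt: int = field(compare=False)    # ei.egt
    score: int = field(compare=False)  # ei.score (score, or score bound, of pi at egt)
    pid: str = field(compare=False)    # the point the event belongs to


def make_event(pid: str, score: int, now: int, score_k: int, exp_1: int) -> Event:
    ept = min(math.ceil((score_k - score) / 2) + now, exp_1)  # rounding up: assumption
    return Event(ept, now, score, pid)


def process_due_events(queue: List[Event], now: int, score_k: int, exp_1: int) -> List[str]:
    """Handle all events with ept <= now; return ids of points needing an exact score."""
    need_exact: List[str] = []
    rescheduled: List[Event] = []
    while queue and queue[0].ept <= now:
        e = heapq.heappop(queue)
        bound = e.score + now - e.egt  # equals ei.score + ei.ept - ei.egt when now == ept
        if bound > score_k:
            need_exact.append(e.pid)
        else:
            rescheduled.append(make_event(e.pid, bound, now, score_k, exp_1))
    for e in rescheduled:
        heapq.heappush(queue, e)
    return need_exact


queue = [make_event("p7", score=3, now=10, score_k=7, exp_1=13)]  # e7.ept = 12
print(process_due_events(queue, now=12, score_k=7, exp_1=13), queue[0])
# [] Event(ept=13, egt=12, score=5, pid='p7')
```

The printed state matches slides 10 and 11: e7 is handled at time 12, its bound 5 does not exceed score_k = 7, and it is rescheduled for time 13.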

  13. Cont. • Upper-bounding scores of incoming points • When px arrives, determine a point pr that (i) dominates the incoming point px and (ii) is not part of TOPK.

  14. Cont. • score(pr) ≤ er.score + now − er.egt • score(px) ≤ score(pr) − 1, since pr dominates px itself and, by transitivity, every point that px dominates
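
A sketch of the bound for an incoming point. The slides pick one qualifying pr; since every non-top-k point that dominates px gives a valid bound, this version (our extension) keeps the smallest. Locating the dominators is left to the caller, and `PendingEvent` and `incoming_upper_bound` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class PendingEvent:
    score: int  # er.score
    egt: int    # er.egt


def incoming_upper_bound(dominators: Iterable[PendingEvent], now: int) -> Optional[int]:
    """Upper bound on score(px) from events of non-top-k points pr that dominate px.

    Each pr yields score(px) <= er.score + now - er.egt - 1; the smallest value is kept.
    Returns None when px has no such dominator.
    """
    bounds = [e.score + now - e.egt - 1 for e in dominators]
    return min(bounds) if bounds else None


print(incoming_upper_bound([PendingEvent(4, 12), PendingEvent(3, 10)], now=13))  # 4
```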

  15. Cont.

  16. ADA • Φ(pi) = Φ+(pi) ∪ Φ−(pi) • Φ+(pi): part of the top-k • Φ−(pi): not part of the top-k

  17. Cont. ei.ept = min((score_{k−r} − score(pi))/2 + now, exp_2), where r = |Φ−(pi)| • |Φ−(p4)| = 1, so the (k−1)-th score score_{k−1} and the second minimum expiration time of the top-k points, exp_2, are used. • e4.ept = min((8 − 5)/2 + 10, 18) = min(11.5, 18), rounded up to 12 (Lemma 2)

  18. Cont. ei.ept = min((score_k − score(pi))/2 + now, exp_1); score(pi) ≤ ei.score + ei.ept − ei.egt • With Lemma 1: e4.ept = min((7 − 5)/2 + 10, 13) = 11
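
Slides 17 and 18 compute the same event e4 (score(p4) = 5 at now = 10) with ADA's Lemma 2 and with Lemma 1. The sketch plugs in the slide values (score_k = 7, exp_1 = 13; score_{k−1} = 8, exp_2 = 18) and, as an assumption, rounds fractional times up, which is what turns 11.5 into 12. It covers only the case |Φ−(p4)| = 1 shown on the slides; `ept` is a hypothetical helper.

```python
import math


def ept(score_target: int, score_pi: int, now: int, exp_cap: int) -> int:
    """min((score_target - score(pi)) / 2 + now, exp_cap), rounding up (assumption)."""
    return min(math.ceil((score_target - score_pi) / 2) + now, exp_cap)


# Event e4 of p4: score(p4) = 5, now = 10 (values from slides 17 and 18).
lemma1 = ept(score_target=7, score_pi=5, now=10, exp_cap=13)  # score_k, exp_1
lemma2 = ept(score_target=8, score_pi=5, now=10, exp_cap=18)  # score_{k-1}, exp_2
print(lemma1, lemma2)  # 11 12
```

With these numbers, Lemma 2 postpones e4 from time 11 to time 12.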

  19. Cont. • ADA maintains (1) a counter for each cell, cj.counter, and (2) a counter for each point, pi.counter.

  20. Cont. • Example with n = 4 and k = 1: p5 arrives • Increase by one the counters of all cells fully dominated by the cell containing p5 • Set p5.counter = c6.counter

  21. Cont. • When pj is inserted into TOPK, increase by one the counters of all points that are dominated by pj and expire before pj. • When a top-k point expires, the counters of these points are decreased by one.

  22. Cont. • Before the update (dominated by p1), (p1, p4): p3.counter = 1, c10.counter = 2, c10.counter − p3.counter = 1 • After the update (dominated by p4), (p4, p5): p3.counter = 1, c10.counter = 2, c10.counter − p3.counter = 0 • |Φ−(p3)| = 1

  23. Cont. • Using candidate points • Example: a top-3 dominating query is applied to 1000 points • score_3 = 200 • score(pi) = 194, so ei will be processed in (200 − 194)/2 = 3 time units.

  24. Cont. • If ei.ept ≤ now + Δt, insert pi into the set of candidate points and ignore the event. • Δt is initialized to n/1000. • Δt decreases by one if the number of candidate points is greater than or equal to maxcand (1% of the sliding window size). • Δt increases by one if exact score computations are taking place.
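
The candidate-point heuristic can be sketched as a small helper class. Everything beyond the slide's rules is an assumption: Δt is kept as an integer (n // 1000), it never drops below 0, the decrease takes priority when both adjustment conditions hold in the same instant, and how candidates are later rechecked is not shown. `CandidateFilter`, `accept_event` and `adapt` are hypothetical names.

```python
from typing import Set


class CandidateFilter:
    """Candidate-point heuristic: events due within delta_t instants are not queued."""

    def __init__(self, n: int) -> None:
        self.delta_t = n // 1000            # initialized to n/1000 (integer: assumption)
        self.max_cand = max(1, n // 100)    # maxcand = 1% of the sliding window size
        self.candidates: Set[str] = set()

    def accept_event(self, pid: str, ept: int, now: int) -> bool:
        """Return True to queue the event, False if pi is made a candidate instead."""
        if ept <= now + self.delta_t:
            self.candidates.add(pid)        # event ignored; pi joins the candidate set
            return False
        return True

    def adapt(self, exact_computations: int) -> None:
        """Adjust delta_t once per time instant."""
        if len(self.candidates) >= self.max_cand:
            self.delta_t = max(0, self.delta_t - 1)  # too many candidates: shrink
        elif exact_computations > 0:
            self.delta_t += 1                        # exact computations happened: grow


f = CandidateFilter(n=100_000)
print(f.delta_t, f.max_cand)                   # 100 1000
print(f.accept_event("p1", ept=150, now=100))  # False (p1 becomes a candidate)
print(f.accept_event("p2", ept=250, now=100))  # True
```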

  25. AHBA • The key idea is to maintain a random sample of points in a cell and compute their mean score.

  26. Cont. • Assume s independent random variables Z1, ..., Zs with α ≤ Zi ≤ β, i = 1, ..., s • Z = (Z1 + ... + Zs)/s, the sample mean • ξ = error bound • μ = E[Z] (the expectation of Z)

  27. Cont. • Hoeffding inequality: Prob{μ − Z ≥ ξ} ≤ exp(−2sξ²/(β − α)²) • Set δ = Prob{μ − Z ≤ ξ} • α = minscore, β = maxscore

  28. Cont. • The bound is used to compute the number of samples that must be maintained for each cell cj in order to monitor the average score in cj, where minscore(cj) ≤ score(pi) ≤ maxscore(cj) • Example: for the points in c1, 3 ≤ score(pi) ≤ 5.
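
One way to obtain bounds of the form minscore(cj) ≤ score(pi) ≤ maxscore(cj) from per-cell point counts, assuming smaller-is-better and cells indexed by (row, col) with larger indices dominated: every point in a fully dominated cell is always dominated, and at most all points in the partially dominated cells (the cell of pi included, minus pi itself) can be. This is a sketch, not necessarily the paper's exact definition; `cell_score_bounds` is a hypothetical name.

```python
from typing import Dict, Tuple

Cell = Tuple[int, int]  # (row, col)


def cell_score_bounds(cell: Cell, counts: Dict[Cell, int]) -> Tuple[int, int]:
    """Return (minscore, maxscore) for any point in `cell`, given per-cell point counts."""
    r, c = cell
    # Fully dominated cells: every point is always dominated.
    full = sum(n for (row, col), n in counts.items() if row > r and col > c)
    # Partially dominated cells (same row or column, own cell included): maybe dominated.
    partial = sum(n for (row, col), n in counts.items()
                  if row >= r and col >= c and not (row > r and col > c))
    return full, full + partial - 1  # -1: a point never dominates itself


counts = {(0, 0): 2, (0, 1): 1, (1, 0): 3, (1, 1): 4}
print(cell_score_bounds((0, 0), counts))  # (4, 9)
```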

  29. Cont. • Set ξ = ε(β − α) • ε is a percentage of the score range β − α
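
Combining slides 27 and 29 gives a per-cell sample size. From the standard one-sided Hoeffding bound, requiring exp(−2sξ²/(β − α)²) ≤ 1 − δ and substituting ξ = ε(β − α) yields s ≥ ln(1/(1 − δ))/(2ε²), which no longer depends on the score range. This derivation is ours and the paper's exact formula may differ; `hoeffding_sample_size` is a hypothetical name.

```python
import math


def hoeffding_sample_size(delta: float, eps: float) -> int:
    """Smallest s with exp(-2 s xi^2 / (beta - alpha)^2) <= 1 - delta, xi = eps * (beta - alpha).

    The range (beta - alpha) cancels, giving s >= ln(1 / (1 - delta)) / (2 eps^2).
    """
    if not (0 < delta < 1 and eps > 0):
        raise ValueError("need 0 < delta < 1 and eps > 0")
    return math.ceil(math.log(1 / (1 - delta)) / (2 * eps ** 2))


print(hoeffding_sample_size(0.95, 0.1))  # 150
```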

  30. Cont. • Keep a sample for cj if (1) maxscore(cj) > score_k − Δt and (2) cj is not dominated by another cell that keeps a sample.

  31. AMSA • If the score of a point is low, we expect it to remain low during its lifetime. • If score_1 − score_k < score_k − minscore(cj), all points belonging to cj are excluded from event processing.
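
The AMSA test can be applied to every cell at once. This sketch takes the current top-k scores and a map from cell ids to minscore(cj), and returns the cells whose points would be skipped; the input format and the name `amsa_excluded_cells` are assumptions.

```python
from typing import Dict, List


def amsa_excluded_cells(topk_scores: List[int], minscore: Dict[str, int]) -> List[str]:
    """Return the cells whose points AMSA excludes from event processing.

    A cell cj is excluded when score_1 - score_k < score_k - minscore(cj).
    """
    score_1, score_k = max(topk_scores), min(topk_scores)  # best and k-th best scores
    return [cj for cj, low in minscore.items() if score_1 - score_k < score_k - low]


print(amsa_excluded_cells([9, 8, 7], {"c1": 3, "c2": 6}))  # ['c1']
```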

  32. Experiment

  33. Cont.

  34. Cont.

  35. Conclusion • This paper is the first study of top-k dominating query processing algorithms in a streaming environment. • The three algorithms, ADA, AHBA and AMSA, can work in combination.
