Best-Effort Top-k Query Processing Under Budgetary Constraints

Michal Shmueli-Scheuer (IBM Haifa Research Lab and UCI), Yosi Mass, Haggai Roitman, Chen Li, Ralf Schenkel, Gerhard Weikum.

Presentation Transcript


  1. Best-Effort Top-k Query Processing Under Budgetary Constraints. Michal Shmueli-Scheuer (IBM Haifa Research Lab and UCI), Yosi Mass, Haggai Roitman, Chen Li, Ralf Schenkel, Gerhard Weikum.

  2. Motivating Example
      • Mobile applications: highly impatient users, need fast results.
      • Mediation systems: achieve high query throughput.
      • Online analytics (e.g. logs): achieve high query throughput.
      [Figure: top-k queries are sent to an engine, which returns top-k results.]
      Michal Shmueli-Scheuer

  3. Traditional top-k query
      • Pre-computed lists over multiple attributes.
      • Combine scores by some monotonic aggregation function.
      • Two access modes: sorted access (cost Cs) and random access (cost Cr).
      • Objective: compute the k objects with the highest scores.
      [Figure: m sorted lists over n objects.]

  4. NRA algorithm (Fagin et al.), top-2 example with f = SUM. [Figure labels: worst score and best score per object, highi per list, mink, top-k queue, candidates.] Stopping condition: mink > best score of the candidates.

  5. NRA algorithm (Fagin et al.), contd. Same top-2 example and stopping condition: mink > best score of the candidates.

  6. NRA algorithm (Fagin et al.), contd. Same top-2 example and stopping condition: mink > best score of the candidates.
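
The NRA slides keep only their labels, so the following is a minimal Python sketch of the standard no-random-access idea they refer to, not the presenters' code. It assumes SUM aggregation, round-robin sorted accesses, and that an object missing from a list scores 0 there; the function name `nra_top_k` and the sample lists are hypothetical.

```python
def nra_top_k(lists, k):
    """Top-k by SUM using sorted accesses only.

    lists: m lists of (object_id, score) pairs, each sorted by descending score.
    Assumption: an object missing from a list scores 0 in that list.
    Returns (top_k_ids, number_of_sorted_accesses).
    """
    m = len(lists)
    seen = {}          # object_id -> {list_index: score}
    high = [0.0] * m   # high_i: last score read from list i
    accesses = 0
    ranked = []
    for depth in range(max(len(lst) for lst in lists)):
        for i, lst in enumerate(lists):  # one round-robin pass of sorted accesses
            if depth < len(lst):
                obj, score = lst[depth]
                seen.setdefault(obj, {})[i] = score
                high[i] = score
                accesses += 1
            else:
                high[i] = 0.0  # exhausted list: nothing unseen remains in it
        # Worst score: sum of the scores seen so far.
        worst = {o: sum(s.values()) for o, s in seen.items()}
        # Best score: worst score plus high_i for every list not yet seen.
        best = {o: worst[o] + sum(high[i] for i in range(m) if i not in s)
                for o, s in seen.items()}
        ranked = sorted(seen, key=worst.get, reverse=True)
        if len(ranked) >= k:
            min_k = worst[ranked[k - 1]]
            # Rivals: seen candidates outside the top-k, plus any unseen object.
            rivals = [best[o] for o in ranked[k:]] + [sum(high)]
            if min_k > max(rivals):  # stopping condition from the slide
                break
    return ranked[:k], accesses


L1 = [("a", 0.9), ("b", 0.8), ("c", 0.1)]
L2 = [("b", 0.9), ("a", 0.7), ("c", 0.2)]
result = nra_top_k([L1, L2], k=1)
```

On this toy input the loop stops after two rounds, once the worst score of the current top-1 exceeds the best score any other object could still reach.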

  7. Top-k with Budget Constraints
      • Access costs: sorted access cost Cs, random access cost Cr.
      • Given budget B, maximize result quality.
      • Top-2 example with f = SUM, Cs = 1, Cr = 3:
        NRA: 12 Cs = 12, precision = 0.5
        TA: 7 Cs + 7 Cr = 28, precision = 0
      • Budget = 10?
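
The cost figures on this slide follow from simple arithmetic, sketched below in Python. The access counts (12 sorted accesses for NRA; 7 sorted plus 7 random for TA) and the costs Cs = 1, Cr = 3 come from the slide; the helper name `access_cost` is hypothetical.

```python
def access_cost(n_sorted, n_random, cs, cr):
    """Total cost of a run that performs the given numbers of accesses."""
    return n_sorted * cs + n_random * cr


CS, CR, BUDGET = 1, 3, 10

nra_cost = access_cost(12, 0, CS, CR)  # 12 * 1
ta_cost = access_cost(7, 7, CS, CR)    # 7 * 1 + 7 * 3

# Neither algorithm can run to completion within the budget of 10.
nra_fits = nra_cost <= BUDGET
ta_fits = ta_cost <= BUDGET
```

Both complete runs cost more than the budget of 10, which is the situation the talk sets out to address.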

  8. Contributions
      • Sorted accesses: efficient plan; solution with adaptive α.
      • Sorted and random accesses: efficient plan; solution with adaptive α.
      • Experiments.

  9. Results Under Limited Budget. [Figure: the results for a limited budget compared with the k results for an unlimited budget.]

  10. Efficient Plan: Sorted Accesses
      • Assume that we know the k results for an unlimited budget (REXACT).
      • Interesting positions: where the k objects appear in the lists.
      • Plan: {L1, 4}, {L2, 2}.
      [Figure: top-2 example over lists L1 and L2 with entries such as (o8, SL1) and (o2, SL2); objects o1 and o5 are highlighted; interesting positions are marked P1, P2 and Q1, Q2.]

  11. Efficient Plan: Sorted Accesses
      • Goal: find plan t, such that: [equation not recoverable]. Denoted as ROPT.
      • Plans for B = 5; plan: {L1, 2}, {L2, 3}.
      [Figure: lists L1 and L2 with interesting positions P1, P2 and Q1, Q2.]
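
The notion of interesting positions can be illustrated with a short Python sketch. The list contents below are one assumed reading of the flattened two-list figure (scores omitted), REXACT is taken to be {o1, o5}, and every sorted access costs Cs = 1; the function names are hypothetical. The sketch only enumerates plans that stop at interesting positions and fit the budget; it does not reproduce the selection criterion, which the transcript has lost.

```python
from itertools import product


def interesting_positions(lists, r_exact):
    """1-based positions in each list where an object of r_exact appears."""
    return [[pos for pos, obj in enumerate(lst, start=1) if obj in r_exact]
            for lst in lists]


def feasible_plans(lists, r_exact, budget, cs=1):
    """Plans (one depth per list) that stop at depth 0 or at an interesting
    position and whose total sorted-access cost fits the budget."""
    options = [[0] + positions
               for positions in interesting_positions(lists, r_exact)]
    return [plan for plan in product(*options) if sum(plan) * cs <= budget]


# Assumed list order, read off the slide's figure residue.
L1 = ["o8", "o1", "o6", "o5"]
L2 = ["o2", "o4", "o5", "o3", "o1"]
R_EXACT = {"o1", "o5"}

positions = interesting_positions([L1, L2], R_EXACT)
plans = feasible_plans([L1, L2], R_EXACT, budget=5)
```

Under these assumptions the plan (2, 3), i.e. {L1, 2}, {L2, 3}, is among the feasible plans for B = 5 and uses the budget exactly.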

  12. Sorted Accesses, observations: prefer high scores. [Figure: lists L1, L2, L3 with entries for O1 and O2.]

  13. Observations, contd.: prefer large score reductions. [Example: title=“war”, description=“weapon”.]

  14. Score Utilities. Score gain: [equation not recoverable]. Score reduction: [equation not recoverable]; y = 3. [Example list: (o2, 1), (o4, 0.9), (o5, 0.8), (o3, 0.7), (o1, 0.6).]

  15. Optimization Problem
      • Bi-objective optimization problem: util(Li, x) = α * gain + (1 - α) * reduction.
      • Heuristics: Fair heuristic and Rank heuristic, where m is the number of lists.
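
The weighted utility on this slide can be sketched in Python as follows. The gain and reduction definitions are not preserved in the transcript, so they are passed in as plain numbers here; the names `util`, `pick_list`, `alpha` and the sample values are hypothetical.

```python
def util(gain, reduction, alpha):
    """Weighted utility: alpha * gain + (1 - alpha) * reduction."""
    return alpha * gain + (1 - alpha) * reduction


def pick_list(candidates, alpha):
    """Return the list name with the highest utility.

    candidates: dict mapping a list name to a (gain, reduction) pair.
    """
    return max(candidates, key=lambda name: util(*candidates[name], alpha))


# Made-up (gain, reduction) values for two lists.
candidates = {"L1": (0.9, 0.1), "L2": (0.4, 0.7)}

gain_heavy = pick_list(candidates, alpha=0.8)       # weight mostly on gain
reduction_heavy = pick_list(candidates, alpha=0.2)  # weight mostly on reduction
```

With these made-up numbers a large weight favours the high-gain list L1 and a small weight favours the high-reduction list L2, which is why the choice of weight matters.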

  16. Adaptive α. [Figure: the weights α (gain) and 1 - α (reduction) over time.]

  17. Adaptive α (Theobald et al. VLDB04). [Figure: lists L1, L2, L3; top-k queue with o1 [ws, bs], o2 [ws, bs], o3 [0.8, bs]; candidates o4 [0.6, bs] and o6 [ws, bs]; high values at times t1 and t2; d(o4) = 0.8 - 0.6 = 0.2.]

  18. Adaptive α. [Plot: TREC query, k = 100.]

  19. Efficient Plan: Random Accesses
      • Observation: random accesses always occur after the sorted accesses have finished.
        schedule 1: {SA……RA……SA….}
        schedule 2: {SA……SA……RA….}
        precision(schedule 1) = precision(schedule 2)

  20. Observations, contd.: random accesses are only useful for objects in REXACT. [Figure labels: top-k and candidates queues with [ws, bs] intervals; list L2; o5 marked "Not in REXACT"; "Precision reduced"; "Precision remains the same".]

  21. Random Accesses: when to switch from SA to RA?
      • Gathering with sorted accesses: not enough good candidates, RA is wasted.
      • Probing with random accesses: not enough RAs to prune the candidates.
      [Figure: α and 1 - α over time.]

  22. Random Accesses
      • Switch from sorted to random: R = (1 - α) * S, with the condition S + R > B.
        S: total cost of sorted accesses. R: total cost of random accesses.
      • Which items to access? Maximize expected score.
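
One possible reading of the switching rule on this slide, sketched in Python: the random-access share is tied to the sorted-access cost by R = (1 - alpha) * S, and sorted accesses stop once S + R would exceed the budget B. This interpretation, the closed form B / (2 - alpha) derived from it, and the names `should_switch`, `split_budget` and `alpha` are assumptions, not something the transcript states.

```python
def should_switch(sorted_cost, alpha, budget):
    """True once S + R exceeds the budget, with R = (1 - alpha) * S."""
    reserved_for_random = (1 - alpha) * sorted_cost
    return sorted_cost + reserved_for_random > budget


def split_budget(budget, alpha):
    """Largest S with S + (1 - alpha) * S <= budget, and the matching R.

    S * (2 - alpha) <= budget, so S = budget / (2 - alpha).
    """
    s = budget / (2 - alpha)
    r = (1 - alpha) * s
    return s, r


even_split = split_budget(100, alpha=0.0)   # half sorted, half random
sorted_only = split_budget(100, alpha=1.0)  # everything goes to sorted accesses
```

Under this reading the weight decides how much of the budget is held back for random accesses: nothing when it is 1, and as much as was spent on sorted accesses when it is 0.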

  23. Experimental Data
      • TREC Terabyte: 25M web pages; 50 queries with an average length of 3 words.
      • IMDB: 375,000 movies; 20 queries, each with 4 attributes: {Title, Genre, Actors, Description}.
      • Synthetic data: Zipf, #lists = [2, 6], #objects = [10000, 1000000].
      • Aggregate function: Sum.

  24. Evaluation Methods
      • Percentage of optimal precision. [Figure labels: Ropt, Rexact, Ralg.]
      • SME
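
The transcript keeps only the labels Ropt, Rexact and Ralg, not the formula, so the Python sketch below shows one plausible reading rather than the presenters' definition: precision is the fraction of Rexact that a result recovers, and the reported measure is the algorithm's precision as a percentage of the optimal plan's precision. Both function names and the sample results are hypothetical.

```python
def precision(result, r_exact):
    """Fraction of the exact top-k that appears in the result."""
    return len(set(result) & set(r_exact)) / len(r_exact)


def pct_of_optimal(r_alg, r_opt, r_exact):
    """Precision of r_alg as a percentage of the precision of r_opt.

    Assumed definition; returns 0.0 when the optimal plan itself finds nothing.
    """
    optimal = precision(r_opt, r_exact)
    if optimal == 0:
        return 0.0
    return 100.0 * precision(r_alg, r_exact) / optimal


R_EXACT = ["o1", "o5"]   # exact top-2 with unlimited budget
R_OPT = ["o1", "o5"]     # best result achievable within the budget
R_ALG = ["o1", "o2"]     # result of a budget-aware heuristic

alg_precision = precision(R_ALG, R_EXACT)
score = pct_of_optimal(R_ALG, R_OPT, R_EXACT)
```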

  25. Results: Sorted Accesses. [Plot: TREC, k = 100.] Less budget, more improvement.

  26. Varied k. [Plot: IMDB, B = 400.] Lower k, more improvement.

  27. Number of Lists. [Plot: Zipf, k = 100, B = 4000.] More lists, more improvement.

  28. Results: Random Accesses. [Plots: TREC, k = 100, Cr = 10; TREC, k = 100, Cr = 100.]

  29. Related Work
      • Minimize budget for optimal results: the algorithm computes the exact results with minimum cost (Bast et al. VLDB06, Bruno et al. ICDE02, Chang et al. SIGMOD02). This is the dual problem.
      • Anytime top-k: the algorithm collects statistics during processing, which can be used to provide probabilistic guarantees at any time during processing (Arai et al. VLDB07). It does not do any optimizations.
      • Approximate top-k: approximate results with probabilistic guarantees (Theobald et al. VLDB04, Fagin et al. 2001).

  30. Conclusions
      • First attempt to deal with budget constraints.
      • For SA only, average precision is around 70%.
      • Tradeoff between RAs and SAs: for a relatively low cost of RA, RA schedules are improved.

  31. Thank You !

  32. Top-k query
      • Given a set of n objects and m scoring lists sorted in decreasing order, find the top-k objects according to a scoring function f.
      • Top-k: a set T of k objects such that f(rj1, …, rjm) ≤ f(ri1, …, rim) for every object Xi in T and every object Xj not in T.
      • Assumption: the scoring function f is monotone: f(r1, …, rm) ≤ f(r1', …, rm') if ri ≤ ri' for all i.
      • Two access modes: sorted access (Cs) and random access (Cr).
      • Objective: compute the top-k with the minimum cost.
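
The definition above can be checked directly on a small example. The Python sketch below computes the top-k by brute force (reading every score, so it ignores access costs entirely) and then verifies the defining property of the set T; the object names, scores and function names are hypothetical, and SUM stands in for the monotone function f.

```python
import heapq


def top_k(scores, k, f=sum):
    """Brute-force top-k: the k objects with the highest aggregated score.

    scores: dict mapping an object id to its tuple of m per-list scores.
    f: monotone aggregation function over that tuple (SUM by default).
    """
    return heapq.nlargest(k, scores, key=lambda obj: f(scores[obj]))


def satisfies_definition(scores, result, f=sum):
    """Check that every object in T scores at least as high as every object outside T."""
    inside = [f(scores[o]) for o in result]
    outside = [f(scores[o]) for o in scores if o not in result]
    return not outside or min(inside) >= max(outside)


scores = {"o1": (0.9, 0.8), "o2": (0.5, 1.0), "o3": (0.2, 0.7)}
T = top_k(scores, k=2)
valid = satisfies_definition(scores, T)
```

Algorithms such as NRA and TA aim for the same set T while reading as few list entries as possible.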

  33. Sorted Accesses, observations:
      • An object with high scores has a higher potential to be part of the top-k.
      • An object with “mediocre” scores does not help.
      • Prefer high scores.
      [Figure: lists L1, L2, L3 with entries (O1, SL1), (O1, SL2), (O1, SL3).]

  34. Example. [Figure labels: Q, wireless zone, useless.]

  35. Applications
      • Mobile applications: highly impatient users, need fast results.
      • Mediation systems: achieve high query throughput.
      • Online analytics (e.g. logs): achieve high query throughput.

  36. Motivating Example: query throughput. Given the number of queries per time unit, allocate time for each query. [Figure: user query, mediator engine, servers.]

  37. Terminology: sorted access; random access; highi; top-k queue; candidates queue; mink; worstScore(d); bestScore(d).

  38. Efficient Offline Solution: Sorted
      • Goal: find trace t, such that: [equation not recoverable]. Denoted as ROPT.
      • Example with B = 5.
      [Figure: lists L1 and L2 with marked positions P1 and P2 in each list.]

  39. Efficient Offline Solution: Sorted
      • Goal: find trace t, such that: [equation not recoverable]. Example with B = 5.
      • Feasible for k up to 100 and m up to 10.
      [Figure: lists L1 and L2 with marked positions P1 and P2 in each list.]

  40. Efficient Offline Solution: Sorted
      • Proof (by contradiction): assume that t does not exist, and choose a trace s that is within the budget and has optimal precision. Take s' with depths s'i that are the largest positions of Pi less than or equal to si.
      • By construction, the score of any object in S is the same as in S'.

  41. Fair Heuristic: assume budget = b; runs in batches.

  42. d Rexact best(o)-mink (best(o) = wosrt(o)+RA) o5, S o8, S o7, S o9, S …. …. Efficient Offline Solution- Random • Budget for RAs =(B-|t|*Cs) Top-k o1, S o2, S o3, S o4, S o10, S o14, S ….

  43. Motivation: many applications work in budget-constrained environments. Still, they wish to perform top-k queries. [Figure: user query, mediator engine with budget-aware query processing, servers.]

  44. Future work
      • Different access costs for different lists.
      • Time-aware top-k.
      • Top-k with budget constraints for P2P.
