This presentation covers the evaluation of top-k queries with probabilistic guarantees. It reviews past algorithms, introduces the new approach, and discusses results and conclusions. The focus is on precision and relevance in top-k query search. The algorithms reviewed are Fagin’s TA algorithm, TA-Random, and TA-Sorted; the new contribution is a probabilistic threshold test. Conservative, aggressive, progressive, and smart variants are explored for efficient top-k query computation. The conclusion is that probabilistic score predictions can reduce execution time in exchange for some loss of top-k result quality.
Top-k Query Evaluation with Probabilistic Guarantees By Martin Theobald, Gerhard Weikum, Ralf Schenkel
Content • Problem • Past algorithms • Contribution in this paper • Approach • Differences • Results, Observation and Conclusion
Relevance Searching • Interested in only one or a few relevant and novel data items/links • User may not care if some of the links are not that useful • Precision: the fraction of the returned top-k that is actually in the true top-k
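The precision measure defined above can be written down in a few lines. This is a minimal sketch; the function name `precision_at_k` and the document identifiers are made up for illustration and do not come from the slides.

```python
def precision_at_k(approx_top_k, true_top_k):
    """Fraction of the returned top-k that also appears in the true top-k."""
    if not approx_top_k:
        return 0.0
    true_set = set(true_top_k)
    hits = sum(1 for item in approx_top_k if item in true_set)
    return hits / len(approx_top_k)


# Hypothetical example: 3 of the 4 returned documents are in the true top-4.
approx = ["d1", "d4", "d7", "d9"]
exact = ["d1", "d2", "d4", "d7"]
print(precision_at_k(approx, exact))  # 0.75
```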
Algorithms we have learned … • Fagin’s TA algorithm • TA-Random • Problem with TA-Random: random accesses are expensive • TA-Sorted • Problem with TA-Sorted: sorted indices may not always be available
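As a reminder of how the threshold algorithm with random accesses works, here is a minimal sketch. It assumes summation as the aggregation function, non-negative scores, and in-memory lists; the function name `threshold_algorithm` and the data layout are illustrative assumptions, not taken from the slides or the paper.

```python
import heapq


def threshold_algorithm(lists, k):
    """Sketch of TA with random accesses (assumes sum aggregation).

    lists: one list per query dimension, each holding (doc_id, score)
           pairs sorted by descending score; scores are assumed >= 0.
    Returns the top-k as (total_score, doc_id), best first.
    """
    # Dictionaries stand in for the (expensive) random-access index.
    lookup = [dict(lst) for lst in lists]
    top = []       # min-heap of (total_score, doc_id); top[0] is min-k
    seen = set()
    max_depth = max((len(lst) for lst in lists), default=0)

    for depth in range(max_depth):
        last_seen = []
        for lst in lists:
            if depth >= len(lst):
                last_seen.append(0.0)  # list exhausted
                continue
            doc, score = lst[depth]    # sorted access
            last_seen.append(score)
            if doc not in seen:
                seen.add(doc)
                # Random accesses fetch the missing scores right away.
                total = sum(index.get(doc, 0.0) for index in lookup)
                if len(top) < k:
                    heapq.heappush(top, (total, doc))
                elif total > top[0][0]:
                    heapq.heapreplace(top, (total, doc))
        # No unseen document can score above the sum of the last seen scores.
        threshold = sum(last_seen)
        if len(top) == k and top[0][0] >= threshold:
            break
    return sorted(top, reverse=True)


# Hypothetical index lists for two query terms.
list1 = [("a", 0.9), ("b", 0.8), ("c", 0.1)]
list2 = [("b", 0.7), ("c", 0.6), ("a", 0.2)]
print([doc for _, doc in threshold_algorithm([list1, list2], 2)])  # ['b', 'a']
```

Every newly seen document triggers one random access per remaining list, which is the cost that the TA-Random bullet above refers to.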
Contribution • Probabilistic threshold test p(d) • Looking at the currently seen part of the score: “What is the probability that the tuple can be in the final top-k?”
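One way to write the test down, as a sketch: the symbols below are notation assumed for illustration, since the slides do not define them. $E(d)$ is the set of lists in which $d$ has already been seen, $s_i(d)$ is the known score of $d$ in list $i$, $S_i$ is a random variable for the still-unknown score in list $i$, $\text{min-}k$ is the score of the current rank-$k$ result, and $\epsilon$ is a user-chosen probability bound.

```latex
p(d) \;=\; P\!\left[\, \sum_{i \in E(d)} s_i(d) \;+\; \sum_{i \notin E(d)} S_i \;>\; \text{min-}k \,\right],
\qquad \text{drop } d \text{ if } p(d) < \epsilon
```

A candidate is kept only while the probability of it still reaching the top-k is at least $\epsilon$; with $\epsilon = 0$ nothing is dropped on probabilistic grounds.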
Approach • Probabilistic score prediction • Uniform distribution • Histograms • Poisson Distributions • Approximation technique which is computationally cheaper than histograms
Histogram • [Figure: probability per bucket (y-axis) over buckets and value ranges (x-axis, 0 to 150); the bucket probabilities sum to 1]
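The histogram-based score prediction can be sketched as a convolution of per-list histograms followed by a tail-probability lookup. This is an illustrative simplification, not the paper's implementation: it assumes independent lists, equal-width buckets, and that every score in bucket i takes the value i * bucket_width. The names `convolve`, `prob_exceeds`, and `passes_threshold_test` are hypothetical.

```python
def convolve(hist_a, hist_b):
    """Distribution of the sum of two independent bucketized scores."""
    out = [0.0] * (len(hist_a) + len(hist_b) - 1)
    for i, pa in enumerate(hist_a):
        for j, pb in enumerate(hist_b):
            out[i + j] += pa * pb
    return out


def prob_exceeds(histograms, bucket_width, delta):
    """Estimate P[sum of the unseen scores > delta].

    histograms: one list of bucket probabilities per unseen list
                (each sums to 1). Assumption: a score in bucket i
                is treated as exactly i * bucket_width.
    """
    dist = [1.0]  # distribution of an empty sum: all mass at 0
    for hist in histograms:
        dist = convolve(dist, hist)
    return sum(p for n, p in enumerate(dist) if n * bucket_width > delta)


def passes_threshold_test(seen_score, min_k, histograms, bucket_width, epsilon):
    """Keep a candidate only if it can still beat min-k with probability >= epsilon."""
    delta = min_k - seen_score  # score mass the unseen lists must still add
    return prob_exceeds(histograms, bucket_width, delta) >= epsilon


# Hypothetical example: two unseen lists, two buckets each, width 1.0.
hists = [[0.5, 0.5], [0.5, 0.5]]
print(prob_exceeds(hists, 1.0, 0.5))  # 0.75
```

The convolution cost grows with the number of buckets per list, which is the kind of overhead a cheaper approximation is meant to avoid.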
Algorithms • Conservative Algorithm • Aggressive Algorithm • Progressive Algorithm • Smart Algorithm
Conservative Algorithm • Simply predicts the scores of each candidate object in every step • Maintains a priority queue for each group of candidates with the same unseen part • Incurs very high overhead for the probabilistic threshold test
Aggressive Algorithm • If the score of an object falls below the threshold min-k, the algorithm stops immediately • Minimal overhead, but result precision is low
Progressive Algorithm • Between conservative and aggressive • Tracks changes in the best score at uniform intervals • Maintains a single priority queue
Smart Algorithm • Rebuilding the entire queue is a costly operation when the queue is large, as with big datasets • Maintains only a bounded priority queue: whenever it is rebuilt, only the best b elements are kept
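The bounded rebuild described above can be sketched with Python's `heapq` module. This is only an illustration of keeping the best b elements; the function name `rebuild_bounded_queue` and the (best_score, doc_id) layout are assumptions, not details from the slides.

```python
import heapq


def rebuild_bounded_queue(candidates, b):
    """Rebuild the candidate queue, keeping only the best b elements.

    candidates: iterable of (best_score, doc_id) pairs.
    Returns a heap whose first pop yields the highest-scoring candidate.
    """
    best = heapq.nlargest(b, candidates)
    # heapq is a min-heap, so scores are negated to pop the best one first.
    queue = [(-score, doc) for score, doc in best]
    heapq.heapify(queue)
    return queue


# Hypothetical candidates with their best possible scores.
cands = [(0.3, "a"), (0.9, "b"), (0.5, "c"), (0.1, "d")]
queue = rebuild_bounded_queue(cands, 2)
neg_score, doc = heapq.heappop(queue)
print(doc, -neg_score)  # b 0.9
```

Bounding the queue caps the rebuild cost at b elements, at the risk of discarding a candidate that would later have qualified.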
Conclusion • Probabilistic score predictions can substantially reduce execution time in exchange for some loss of top-k result quality