Top-k Query Evaluation with Probabilistic Guarantees

By Martin Theobald, Gerald Weikum, Ralf Schenkel

Content • Problem • Past algorithms • Contribution in this paper • Approach • Differences • Results, Observation and Conclusion

Relevance Searching • Interested in only one or a few relevant and novel data items/links • User may not care if some of the links are not that useful • Precision: the fraction of the returned top-k that is actually in the true top-k
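The precision measure above can be written down directly. This is a minimal sketch; the function name and the example document ids are made up for illustration and do not come from the paper.

```python
def precision_at_k(approx_top_k, true_top_k):
    """Fraction of the returned top-k items that also appear in the true top-k."""
    if not approx_top_k:
        return 0.0
    true_set = set(true_top_k)
    hits = sum(1 for item in approx_top_k if item in true_set)
    return hits / len(approx_top_k)


# Hypothetical example: two of the four returned items are truly in the top-4.
example_precision = precision_at_k(["d1", "d2", "d7", "d9"],
                                   ["d1", "d2", "d3", "d4"])
```

A precision of 1.0 means the approximate result matches the exact top-k as a set; lower values quantify how much result quality was traded away.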

Algorithms we have learned … • Fagin’s TA algorithm • TA-Random • Problem with TA-Random: random accesses are expensive • TA-Sorted • Problem with TA-Sorted: sorted indices may not always be available
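As a reminder of the baseline, here is a small sketch of the threshold algorithm with random accesses. It assumes summation as the aggregation function, in-memory lists already sorted by descending score, and a score of 0 for a document missing from a list; the dictionaries stand in for the expensive random accesses.

```python
import heapq


def threshold_algorithm(lists, k):
    """lists: one list of (doc_id, score) pairs per query term, sorted by score descending."""
    lookup = [dict(lst) for lst in lists]  # stands in for random access per list
    totals = {}                            # doc_id -> fully aggregated score
    max_depth = max(len(lst) for lst in lists)
    for depth in range(max_depth):
        last_seen = []
        for lst in lists:
            if depth < len(lst):
                doc, score = lst[depth]    # sorted access
                last_seen.append(score)
                if doc not in totals:      # random accesses to the other lists
                    totals[doc] = sum(idx.get(doc, 0.0) for idx in lookup)
            else:
                last_seen.append(0.0)      # list exhausted: nothing unseen remains
        # No unseen document can score higher than the sum of the last-seen scores.
        threshold = sum(last_seen)
        top = heapq.nlargest(k, totals.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top
    return heapq.nlargest(k, totals.items(), key=lambda kv: kv[1])


# Hypothetical two-term query over three documents.
ta_result = threshold_algorithm(
    [[("a", 0.9), ("b", 0.8), ("c", 0.1)],
     [("b", 0.7), ("a", 0.3), ("c", 0.2)]],
    k=1,
)
```

In this example the scan stops after two rounds, once the best total seen so far is at least the threshold, without ever reading the last entry of either list.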

Contribution • Probabilistic threshold test p(d) • Looking at the part of the score seen so far: “What is the probability that the tuple can be in the final top-k?”
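To make the question concrete, the sketch below works through the simplest case: a candidate that is still unseen in exactly one list, with the missing score assumed to be uniformly distributed between 0 and the last score read from that list. The names `partial_score`, `high`, `min_k` and `epsilon` are illustrative choices, not notation taken from the slides.

```python
def survival_probability(partial_score, high, min_k):
    """P(partial_score + S >= min_k), assuming S ~ Uniform(0, high)."""
    needed = min_k - partial_score  # how much the unseen list must still contribute
    if needed <= 0:
        return 1.0                  # candidate already reaches min_k
    if high <= 0 or needed >= high:
        return 0.0                  # even the upper bound is not enough
    return 1.0 - needed / high


def passes_threshold_test(partial_score, high, min_k, epsilon):
    """Keep the candidate only if it can still reach the top-k with probability >= epsilon."""
    return survival_probability(partial_score, high, min_k) >= epsilon


# Hypothetical numbers: the candidate needs 0.3 more, the unseen score is at most 0.4.
p_d = survival_probability(partial_score=0.5, high=0.4, min_k=0.8)
```

Here p(d) is about 0.25, so the candidate is kept for epsilon = 0.1 and dropped for epsilon = 0.3. A plain threshold test would keep it in both cases, because its upper bound of 0.9 exceeds min-k.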

Approach • Probabilistic score prediction • Uniform distribution • Histograms • Poisson Distributions • Approximation technique which is computationally cheaper than histograms

Histogram • [Figure: histogram of probability over buckets and value ranges; ∑ Probability = 1]
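When a candidate is unseen in several lists, the per-list histograms have to be combined into a distribution for the sum of the missing scores. One way to sketch this is a discrete convolution. The code assumes that scores are discretized into integer units, so that entry i of a histogram is the probability of a score of exactly i units, and that the lists are independent; both are simplifications made for this example.

```python
from functools import reduce


def convolve(hist_a, hist_b):
    """Distribution of the sum of two independent discretized scores."""
    out = [0.0] * (len(hist_a) + len(hist_b) - 1)
    for i, pa in enumerate(hist_a):
        for j, pb in enumerate(hist_b):
            out[i + j] += pa * pb
    return out


def prob_sum_exceeds(histograms, needed_units):
    """P(sum of the unseen scores > needed_units) for the given histograms."""
    if not histograms:
        return 1.0 if needed_units < 0 else 0.0
    total = reduce(convolve, histograms)
    return sum(p for units, p in enumerate(total) if units > needed_units)


# Hypothetical example: two unseen lists, each score is 0 or 1 unit with probability 0.5.
two_lists = [[0.5, 0.5], [0.5, 0.5]]
p_more_than_one = prob_sum_exceeds(two_lists, 1)
```

The nested loops make the cost quadratic in the number of buckets, which is consistent with the interest in a cheaper approximation mentioned on the Approach slide.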

Algorithms • Conservative Algorithm • Aggressive Algorithm • Progressive Algorithm • Smart Algorithm

Conservative Algorithm • Simply predicts the score of each candidate object in every step • Maintains a priority queue for each group of candidates, grouped by the unseen part of their score • Incurs very high overhead for the probabilistic threshold test

Aggressive Algorithm • If the score of an object falls below the threshold min-k, the algorithm stops immediately • Minimal overhead, but result precision is low

Progressive Algorithm • Sits between conservative and aggressive • Tracks the best-score changes at uniform intervals • Maintains a single priority queue

Smart Algorithm • Rebuilding the entire queue is also costly when the queue is large, as with big datasets • Maintains only a bounded priority queue: whenever it is rebuilt, only the best b elements are kept
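The bounded rebuild can be sketched with Python's heapq module. The example assumes candidates are ranked by their best possible score and represented as (best_score, doc_id) pairs; the slide does not specify either detail, so treat both as illustrative.

```python
import heapq


def rebuild_bounded_queue(candidates, b):
    """Rebuild the candidate queue, keeping only the b candidates with the highest best score.

    candidates: iterable of (best_score, doc_id) pairs (assumed layout).
    Returns a heap of (-best_score, doc_id), so the best candidate is popped first.
    """
    best = heapq.nlargest(b, candidates)
    queue = [(-score, doc) for score, doc in best]
    heapq.heapify(queue)
    return queue


# Hypothetical candidates; with b = 2 only "a" and "c" survive the rebuild.
bounded = rebuild_bounded_queue(
    [(0.9, "a"), (0.2, "b"), (0.7, "c"), (0.5, "d")], b=2)
```

Candidates that fall outside the best b are discarded at rebuild time, which is where the loss in result quality can come from.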

Experiment

Conclusion • Probabilistic score predictions can substantially reduce execution time in exchange for some loss of top-k result quality
