
Top-k Queries on Uncertain Data




  1. Top-k Queries on Uncertain Data • Advisor: Prof. 陳良弼 • Presenter: 鄧雅文 (97753034)

  2. Outline • Introduction • Related Work • Problem Formulation • Future Work

  3. Introduction • Top-k query on certain data • Rank results according to a user-defined score • Important for exploring large databases • E.g., top-2 = {T1, T2}
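
On certain data, a top-k query amounts to ranking by the user-defined score and keeping the first k results. A minimal Python sketch, assuming a hypothetical table of (tuple id, score) pairs; `top_k` and `table` are illustrative names only:

```python
import heapq

def top_k(tuples, k, score):
    """Return the k tuples with the highest user-defined score (certain data)."""
    return heapq.nlargest(k, tuples, key=score)

# Hypothetical certain table of (tuple id, score) pairs.
table = [("T1", 95), ("T2", 88), ("T3", 72), ("T4", 60)]
print(top_k(table, 2, score=lambda t: t[1]))  # [('T1', 95), ('T2', 88)]
```

`heapq.nlargest` keeps a heap of size k instead of sorting the whole table, which helps when k is small relative to the database.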

  4. Introduction (cont.) • Uncertain database • How to define top-k on uncertain data? • Mutually exclusive rules • E.g., T1 ⊕ T4 (T1 and T4 never appear in the same possible world)
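
An uncertain table with mutually exclusive rules can be modeled as membership probabilities plus groups of tuple ids, at most one of which appears in any possible world. A minimal Python sketch of that model; the class name `UncertainTable` and the probabilities are hypothetical, chosen only to illustrate the rule T1 ⊕ T4:

```python
from dataclasses import dataclass, field

@dataclass
class UncertainTable:
    """Hypothetical model: membership probabilities plus mutually exclusive rules.

    Each rule is a set of tuple ids of which at most one appears in any
    possible world; tuples outside every rule are independent.
    """
    prob: dict
    rules: list = field(default_factory=list)

    def validate(self):
        """The probabilities within one rule must sum to at most 1."""
        for rule in self.rules:
            total = sum(self.prob[t] for t in rule)
            if total > 1.0 + 1e-9:
                raise ValueError(f"rule {sorted(rule)} sums to {total:.3f} > 1")

    def can_coexist(self, a, b):
        """Two distinct tuples can co-exist unless some rule contains both."""
        return a != b and not any(a in rule and b in rule for rule in self.rules)

# Hypothetical data: the rule T1 ⊕ T4 with made-up probabilities.
table = UncertainTable(prob={"T1": 0.4, "T2": 0.7, "T4": 0.5}, rules=[{"T1", "T4"}])
table.validate()
print(table.can_coexist("T1", "T4"))  # False
print(table.can_coexist("T1", "T2"))  # True
```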

  5. Related Work • C. C. Aggarwal and P. S. Yu. A Survey of Uncertain Data Algorithms and Applications. In TKDE, 2009. • Causes of uncertainty: • Sensor networks, privacy, trajectory prediction, … • Main research areas on uncertain data: • Modeling of uncertain data • Uncertain data management • Top-k queries, range queries, NN queries, … • Uncertain data mining • Clustering, classification, frequent patterns, outliers, …

  6. Related Work (cont.) • M. Soliman, I. Ilyas, and K. Chang. Top-k Query Processing in Uncertain Databases. In ICDE, 2007. • Possible Worlds
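
Under possible-worlds semantics, each way of choosing one tuple (or none) from every exclusive rule gives a world, whose probability is the product of the chosen probabilities. A brute-force Python sketch, assuming each rule is a list of hypothetical `(tuple id, score, probability)` triples and an independent tuple is a one-element rule; enumeration is exponential, so this only suits toy examples:

```python
from itertools import product
from math import prod

def possible_worlds(rules):
    """Enumerate the possible worlds of an uncertain table.

    rules: list of mutually exclusive groups, each a list of
    (tuple_id, score, prob); an independent tuple is a one-element group.
    Yields (world, probability), where world is a list of (tuple_id, score).
    """
    choices = []
    for rule in rules:
        opts = [((tid, score), p) for tid, score, p in rule]
        rest = 1.0 - sum(p for _, _, p in rule)
        if rest > 1e-12:  # probability that the rule contributes no tuple
            opts.append((None, rest))
        choices.append(opts)
    for combo in product(*choices):
        world = [t for t, _ in combo if t is not None]
        yield world, prod(p for _, p in combo)

# Hypothetical table (not the slides' data); T1 ⊕ T4 is an exclusive rule.
RULES = [
    [("T1", 90, 0.4), ("T4", 60, 0.5)],
    [("T2", 80, 0.7)],
    [("T3", 70, 0.6)],
]
worlds = list(possible_worlds(RULES))
print(len(worlds), round(sum(p for _, p in worlds), 6))  # 12 1.0
```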

  7. Related Work (cont.) • U-Topk query • Return the k tuples that can co-exist in a possible world and have the highest probability of being the top-k together • E.g., {T1, T2} as U-Top2 • U-kRanks query • Return k tuples, each of which is the clear winner at its rank (the tuple most likely to appear at that rank) over all possible worlds • E.g., {T2, T6} as U-2Ranks
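
Both semantics can be evaluated by brute force over the possible worlds. The Python sketch below is illustrative only (Soliman et al. give far more efficient algorithms); the data is hypothetical, and skipping worlds with fewer than k tuples in `u_topk` is an assumption of this sketch:

```python
from collections import defaultdict
from itertools import product
from math import prod

def possible_worlds(rules):
    """Yield (world, prob); each exclusive rule contributes at most one tuple."""
    choices = []
    for rule in rules:
        opts = [((tid, score), p) for tid, score, p in rule]
        rest = 1.0 - sum(p for _, _, p in rule)
        if rest > 1e-12:  # probability that the rule contributes no tuple
            opts.append((None, rest))
        choices.append(opts)
    for combo in product(*choices):
        yield [t for t, _ in combo if t is not None], prod(p for _, p in combo)

def ranked(world):
    """Tuple ids of a world in descending score order."""
    return [tid for tid, _ in sorted(world, key=lambda t: t[1], reverse=True)]

def u_topk(rules, k):
    """Top-k vector with the highest total probability over all worlds."""
    mass = defaultdict(float)
    for world, p in possible_worlds(rules):
        order = ranked(world)
        if len(order) >= k:  # assumption: ignore worlds with fewer than k tuples
            mass[tuple(order[:k])] += p
    return max(mass.items(), key=lambda kv: kv[1])

def u_kranks(rules, k):
    """For each rank i = 1..k, the tuple most likely to appear at rank i."""
    mass = [defaultdict(float) for _ in range(k)]
    for world, p in possible_worlds(rules):
        for i, tid in enumerate(ranked(world)[:k]):
            mass[i][tid] += p
    return [max(m.items(), key=lambda kv: kv[1]) for m in mass]

# Hypothetical table: (tuple id, score, probability); T1 ⊕ T4 is a rule.
RULES = [[("T1", 90, 0.4), ("T4", 60, 0.5)], [("T2", 80, 0.7)], [("T3", 70, 0.6)]]
print(u_topk(RULES, 2))    # (('T1', 'T2'), ~0.28)
print(u_kranks(RULES, 2))  # [('T2', ~0.42), ('T3', ~0.324)]
```

On this toy data the two semantics disagree: T2 is the most likely rank-1 tuple (0.42 vs. 0.40 for T1), yet the most likely top-2 vector starts with T1.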

  8. Related Work (cont.) • M. Hua, J. Pei, W. Zhang, X. Lin. Ranking Queries on Uncertain Data: A Probabilistic Threshold Approach. In SIGMOD, 2008. • PT-k query • Return all tuples whose top-k probability is at least p • E.g., {T1, T2, T5} as PT-2 (with p=0.4)
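
A tuple's top-k probability is the total probability of the worlds in which it ranks within the top k; PT-k keeps every tuple at or above the threshold p. A brute-force Python sketch (not Hua et al.'s efficient algorithm) on hypothetical data; the threshold 0.35 is arbitrary:

```python
from collections import defaultdict
from itertools import product
from math import prod

def possible_worlds(rules):
    """Yield (world, prob); each exclusive rule contributes at most one tuple."""
    choices = []
    for rule in rules:
        opts = [((tid, score), p) for tid, score, p in rule]
        rest = 1.0 - sum(p for _, _, p in rule)
        if rest > 1e-12:  # probability that the rule contributes no tuple
            opts.append((None, rest))
        choices.append(opts)
    for combo in product(*choices):
        yield [t for t, _ in combo if t is not None], prod(p for _, p in combo)

def topk_probabilities(rules, k):
    """Pr(tuple ranks within the top k), summed over all possible worlds."""
    probs = defaultdict(float)
    for world, p in possible_worlds(rules):
        for tid, _ in sorted(world, key=lambda t: t[1], reverse=True)[:k]:
            probs[tid] += p
    return dict(probs)

def pt_k(rules, k, threshold):
    """PT-k: every tuple whose top-k probability is at least the threshold."""
    return {tid for tid, p in topk_probabilities(rules, k).items() if p >= threshold}

# Hypothetical table: (tuple id, score, probability); T1 ⊕ T4 is a rule.
RULES = [[("T1", 90, 0.4), ("T4", 60, 0.5)], [("T2", 80, 0.7)], [("T3", 70, 0.6)]]
print(sorted(pt_k(RULES, 2, 0.35)))  # ['T1', 'T2', 'T3']
```

Here the top-2 probabilities are about 0.40 (T1), 0.70 (T2), 0.432 (T3) and 0.29 (T4). Unlike U-Topk, the PT-k answer size is not fixed at k, and a threshold placed exactly on a computed probability is sensitive to floating-point rounding.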

  9. Related Work (cont.) • T. Ge, S. Zdonik, and S. Madden. Top-k Queries on Uncertain Data: On Score Distribution and Typical Answers. In SIGMOD, 2009. • The tradeoff between reporting high-scoring tuples and tuples with a high probability of being in the top-k • Return a number of typical vectors that efficiently sample the distribution of all potential top-k tuple vectors
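
The score-distribution idea can be illustrated by brute force: record the total score of the top-k vector in every world, choose the c scores that minimize the expected distance from a random top-k score to the nearest chosen score, and report a most-probable vector for each. This is a hedged sketch of the idea only, not Ge et al.'s algorithm, which is more efficient and may treat details (such as worlds with fewer than k tuples) differently; `typical_topk` and the data are hypothetical:

```python
from collections import defaultdict
from itertools import combinations, product
from math import prod

def possible_worlds(rules):
    """Yield (world, prob); each exclusive rule contributes at most one tuple."""
    choices = []
    for rule in rules:
        opts = [((tid, score), p) for tid, score, p in rule]
        rest = 1.0 - sum(p for _, _, p in rule)
        if rest > 1e-12:  # probability that the rule contributes no tuple
            opts.append((None, rest))
        choices.append(opts)
    for combo in product(*choices):
        yield [t for t, _ in combo if t is not None], prod(p for _, p in combo)

def typical_topk(rules, k, c):
    """Pick c top-k vectors whose total scores best cover the score distribution."""
    vec_prob, vec_score = defaultdict(float), {}
    for world, p in possible_worlds(rules):
        top = sorted(world, key=lambda t: t[1], reverse=True)[:k]
        if len(top) == k:  # assumption: skip worlds with fewer than k tuples
            vec = tuple(tid for tid, _ in top)
            vec_prob[vec] += p
            vec_score[vec] = sum(score for _, score in top)
    dist = defaultdict(float)  # total score -> probability (unnormalized)
    for vec, p in vec_prob.items():
        dist[vec_score[vec]] += p

    def expected_distance(chosen):
        return sum(p * min(abs(s - x) for x in chosen) for s, p in dist.items())

    best = min(combinations(sorted(dist), c), key=expected_distance)
    # Represent each chosen score by its most probable top-k vector.
    return [max((v for v in vec_prob if vec_score[v] == s), key=vec_prob.get) for s in best]

# Hypothetical table: (tuple id, score, probability); T1 ⊕ T4 is a rule.
RULES = [[("T1", 90, 0.4), ("T4", 60, 0.5)], [("T2", 80, 0.7)], [("T3", 70, 0.6)]]
print(typical_topk(RULES, 2, 2))  # [('T2', 'T3'), ('T1', 'T2')]
```

Normalizing `dist` would not change which scores are chosen. Exhaustive search over `combinations` is only practical for a handful of distinct scores.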

  10. Problem Formulation • Example: • In an International Tenpin Bowling Championship, the events include singles, doubles, and trios. Because of the budget, the coach can send only 3 players. We therefore want these 3 players to have a relatively high probability of performing well across all three types of events.

  11. Problem Formulation (cont.) • U-Top3 = {T2, T5, T6} • But U-Top2 = {T1, T2} and U-Top1 = {T1}, so the U-Top3 answer does not even contain the U-Top2 answer • How about also considering {T1, T2, T5} as top-3?
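
Whether U-Topk answers are nested for successive k can be checked mechanically. A hedged Python sketch using brute-force U-Topk on made-up data (not the slides' table); skipping worlds with fewer than k tuples is an assumption of this sketch:

```python
from collections import defaultdict
from itertools import product
from math import prod

def possible_worlds(rules):
    """Yield (world, prob); each exclusive rule contributes at most one tuple."""
    choices = []
    for rule in rules:
        opts = [((tid, score), p) for tid, score, p in rule]
        rest = 1.0 - sum(p for _, _, p in rule)
        if rest > 1e-12:  # probability that the rule contributes no tuple
            opts.append((None, rest))
        choices.append(opts)
    for combo in product(*choices):
        yield [t for t, _ in combo if t is not None], prod(p for _, p in combo)

def u_topk(rules, k):
    """Most probable top-k vector over all worlds (worlds with < k tuples skipped)."""
    mass = defaultdict(float)
    for world, p in possible_worlds(rules):
        order = [tid for tid, _ in sorted(world, key=lambda t: t[1], reverse=True)]
        if len(order) >= k:
            mass[tuple(order[:k])] += p
    return max(mass, key=mass.get)

# Hypothetical table: (tuple id, score, probability); T1 ⊕ T4 is a rule.
RULES = [[("T1", 90, 0.4), ("T4", 60, 0.5)], [("T2", 80, 0.7)], [("T3", 70, 0.6)]]
answers = {k: u_topk(RULES, k) for k in (1, 2, 3)}
for k in (2, 3):
    nested = set(answers[k - 1]) <= set(answers[k])
    print(f"U-Top{k - 1} {answers[k - 1]} inside U-Top{k} {answers[k]}: {nested}")
```

On this toy data U-Top1 = (T2,) and U-Top2 = (T1, T2), but U-Top3 = (T2, T3, T4) drops T1, the same kind of non-nesting the slide points out.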

  12. Problem Formulation (cont.) • We choose the answers to a top-k query based not only on the probability (P) but also on the confidence (C). • Confidence: expresses the top-(k-1) probabilities of the sets formed by k-1 tuples of a possible top-k answer • E.g., for k=3, {T1, T2, T3} is a possible top-3 answer with P=0.0356; its C combines, in a way still to be defined, Pr({T1, T2} is the top-2) = 0.2542 and its confidence, Pr({T1, T3} is the top-2) = 0.0218 and its confidence, and Pr({T2, T3} is the top-2) = 0.0512 and its confidence
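
The slide leaves open how C combines the (k-1)-subset probabilities. The Python sketch below fixes one hypothetical choice purely for illustration: the confidence of a single tuple is 1, and C(S) is the mean, over the (|S|-1)-subsets T of S, of Pr(T is the top-|T| set) × C(T), following the recursive "and its confidence" wording. The table is made up, so the numbers differ from the slide's:

```python
from collections import defaultdict
from itertools import combinations, product
from math import prod

def possible_worlds(rules):
    """Yield (world, prob); each exclusive rule contributes at most one tuple."""
    choices = []
    for rule in rules:
        opts = [((tid, score), p) for tid, score, p in rule]
        rest = 1.0 - sum(p for _, _, p in rule)
        if rest > 1e-12:  # probability that the rule contributes no tuple
            opts.append((None, rest))
        choices.append(opts)
    for combo in product(*choices):
        yield [t for t, _ in combo if t is not None], prod(p for _, p in combo)

def top_set_probs(rules):
    """Pr(S is exactly the set of the top-|S| tuples), for every S that occurs."""
    probs = defaultdict(float)
    for world, p in possible_worlds(rules):
        order = [tid for tid, _ in sorted(world, key=lambda t: t[1], reverse=True)]
        for m in range(1, len(order) + 1):
            probs[frozenset(order[:m])] += p
    return probs

def confidence(s, probs):
    """HYPOTHETICAL confidence: mean of Pr_top(T) * confidence(T) over (|S|-1)-subsets T."""
    if len(s) <= 1:
        return 1.0
    subsets = [frozenset(t) for t in combinations(s, len(s) - 1)]
    return sum(probs.get(t, 0.0) * confidence(t, probs) for t in subsets) / len(subsets)

# Hypothetical table: (tuple id, score, probability); T1 ⊕ T4 is a rule.
RULES = [[("T1", 90, 0.4), ("T4", 60, 0.5)], [("T2", 80, 0.7)], [("T3", 70, 0.6)]]
probs = top_set_probs(RULES)
S = frozenset({"T1", "T2", "T3"})
print(round(probs[S], 4), round(confidence(S, probs), 4))  # 0.168 0.0665
```

Any other aggregate (minimum, product, weighted mean) could replace the mean in `confidence`; choosing one is exactly the open "formulate the confidence function" item.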

  13. Problem Formulation (cont.) • Since every possible top-k answer has two features, probability (P) and confidence (C), we return only the non-dominated ones as the result set. • E.g., {T1, T3, T5}: P=0.8, C=0.4; {T1, T4, T7}: P=0.5, C=0.7; {T2, T6, T7}: P=0.3, C=0.2 → dominated (e.g., by {T1, T3, T5}), so it is not returned
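
Returning only the non-dominated answers is a skyline (Pareto) filter over the (P, C) pairs. A quadratic-time Python sketch using the slide's example values; `dominates` and `non_dominated` are hypothetical names, and treating larger P and larger C as better is the natural reading of the slide:

```python
def dominates(a, b):
    """a dominates b if it is at least as good in both P and C and differs (so is better in one)."""
    return a[0] >= b[0] and a[1] >= b[1] and a != b

def non_dominated(candidates):
    """Return the candidates (name -> (P, C)) that no other candidate dominates."""
    return {name: pc for name, pc in candidates.items()
            if not any(dominates(other, pc) for other in candidates.values())}

# (P, C) values from the slide's example.
candidates = {
    "{T1, T3, T5}": (0.8, 0.4),
    "{T1, T4, T7}": (0.5, 0.7),
    "{T2, T6, T7}": (0.3, 0.2),
}
print(list(non_dominated(candidates)))  # ['{T1, T3, T5}', '{T1, T4, T7}']
```

For many candidates, sorting by P and sweeping while tracking the best C seen so far gives the same skyline in O(n log n).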

  14. Future Work • Formulate the confidence function • Find an algorithm to generate the result set • Try to calculate the confidence in an efficient way • Carry out an empirical study on datasets

  15. Thank you!
