Aggregate Query Answering under Uncertain Schema Mappings
Avigdor Gal, Maria Vanina Martinez, Gerardo I. Simari, VS Subrahmanian
Presented by Stephen Lynn
Overview
• Aggregate Queries
• Probabilistic Schema Mapping
• Goals/Objectives
• Aggregate Processing (3 proposals)
• By-Table Algorithm
• By-Tuple Algorithm
• Evaluation
• Analysis
Aggregate Queries
• COUNT, MIN, MAX, SUM, AVG
• Simple PTIME algorithms to compute
By-Table vs By-Tuple
• By-Tuple: consider all possible mappings for each tuple
• By-Table: a single mapping applies to the entire table
• Example mapping probabilities:
  • P(date→postedDate) = 0.7
  • P(date→reducedDate) = 0.3
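The difference between the two semantics can be sketched by enumerating the possible interpretations of a small table. This is only an illustration: the two rows and their values are made up, the probabilities 0.7 and 0.3 are the ones on the slide, and the by-tuple case assumes that each tuple picks its mapping independently, so a combination's probability is the product of the per-tuple probabilities.

```python
from itertools import product

# Hypothetical source rows; each carries both candidate target attributes.
rows = [
    {"postedDate": 1, "reducedDate": 5},
    {"postedDate": 4, "reducedDate": 2},
]

# Mapping probabilities for the uncertain attribute "date" (from the slide).
mappings = {"postedDate": 0.7, "reducedDate": 0.3}


def by_table_worlds(rows, mappings):
    """By-table: one mapping is chosen for the entire table."""
    for attr, p in mappings.items():
        yield [row[attr] for row in rows], p


def by_tuple_worlds(rows, mappings):
    """By-tuple: every tuple chooses a mapping (assumed independent)."""
    attrs = list(mappings)
    for choice in product(attrs, repeat=len(rows)):
        p = 1.0
        for attr in choice:
            p *= mappings[attr]
        yield [row[attr] for row, attr in zip(rows, choice)], p


table_worlds = list(by_table_worlds(rows, mappings))
tuple_worlds = list(by_tuple_worlds(rows, mappings))

# m mappings give m by-table interpretations but m**n by-tuple ones.
print(len(table_worlds), len(tuple_worlds))
```

The by-tuple enumeration grows exponentially in the number of tuples, which is why brute-force enumeration is not a practical way to answer by-tuple queries.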
Goals/Objectives
• Impact Analysis of Probabilistic Schemas on Aggregate Queries
• Aggregate Query Algorithms
• Time Complexity Analysis
• Evaluation
Aggregation Methods
• Range
• Distribution
• Expected Value
Method Relationships
• Distribution
  • Most time consuming
  • Most information
• Range
  • Computed directly from distribution
• Expected Value
  • Computed directly from distribution
• More efficient ways to compute range and expected value exist
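As a small sketch of how the range and the expected value fall out of the distribution, assume the distribution is given as a dictionary from aggregate value to probability. The function names and the sample distribution are illustrative only.

```python
def range_from_distribution(dist):
    """Smallest and largest aggregate values with non-zero probability."""
    values = [v for v, p in dist.items() if p > 0]
    return min(values), max(values)


def expected_from_distribution(dist):
    """Probability-weighted mean of the aggregate values."""
    return sum(v * p for v, p in dist.items())


# Hypothetical distribution of an aggregate answer.
dist = {3: 0.7, 5: 0.3}

answer_range = range_from_distribution(dist)       # (3, 5)
answer_expected = expected_from_distribution(dist)  # approximately 3.6
```

Both derived answers discard information: many different distributions share the same range or the same expected value, which is why the distribution is the most informative of the three.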
By-Table Algorithm
• All PTIME computable
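A minimal sketch of the by-table idea, not the authors' exact algorithm: evaluate the ordinary aggregate once per candidate mapping and attach that mapping's probability to the result. The rows, the function name, and the dictionary representation are assumptions made for illustration.

```python
from collections import defaultdict


def by_table_distribution(rows, mappings, aggregate):
    """Run the aggregate once per mapping and collect value -> probability."""
    dist = defaultdict(float)
    for attr, p in mappings.items():
        value = aggregate(row[attr] for row in rows)
        dist[value] += p  # mappings that yield the same answer are merged
    return dict(dist)


# Hypothetical rows; probabilities taken from the earlier slide.
rows = [
    {"postedDate": 1, "reducedDate": 5},
    {"postedDate": 4, "reducedDate": 2},
]
mappings = {"postedDate": 0.7, "reducedDate": 0.3}

sum_dist = by_table_distribution(rows, mappings, sum)
max_dist = by_table_distribution(rows, mappings, max)
```

With m mappings this costs m ordinary aggregate evaluations, each linear in the number of tuples, which is consistent with the PTIME claim on the slide.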
By-Tuple Algorithm (COUNT)
• O(n * m)
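One reading of the O(n * m) bound, taking n as the number of tuples and m as the number of mappings (the slide does not define them), is a single pass that checks every tuple under every mapping. The sketch below computes the range and the expected value of COUNT that way; it is an illustration under those assumptions, with a made-up predicate and rows, and it does not compute the full distribution.

```python
def by_tuple_count(rows, mappings, predicate):
    """Range and expected value of COUNT under by-tuple semantics."""
    low = high = 0
    expected = 0.0
    for row in rows:                       # n tuples
        p_sat = 0.0                        # probability this tuple is counted
        sat_all, sat_any = True, False
        for attr, p in mappings.items():   # m mappings
            if predicate(row[attr]):
                p_sat += p
                sat_any = True
            else:
                sat_all = False
        low += int(sat_all)    # counted under every mapping
        high += int(sat_any)   # counted under at least one mapping
        expected += p_sat      # linearity of expectation
    return (low, high), expected


# Hypothetical rows and selection condition (value >= 3).
rows = [
    {"postedDate": 1, "reducedDate": 5},
    {"postedDate": 4, "reducedDate": 2},
    {"postedDate": 6, "reducedDate": 7},
]
mappings = {"postedDate": 0.7, "reducedDate": 0.3}

count_range, count_expected = by_tuple_count(rows, mappings, lambda v: v >= 3)
```

Here only the third row satisfies the condition under both mappings, so the count is at least 1 and at most 3, with an expected value of about 2.0.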
Evaluation
• Empirical Evaluation
  • Real-world dataset (eBay)
  • Synthetic dataset
• Evaluate Time Complexity
  • Vary tuple numbers
  • Vary attribute mappings
Analysis
• Strengths
  • Effect of probabilistic schemas on aggregates
  • Nice PTIME algorithms
• Weaknesses
  • Evaluation was obvious
  • By-Table results biased by database optimizations
• Future Work
  • Improve algorithms
  • Extend to sub-queries
  • Heuristics