On Random Sampling over Joins
Surajit Chaudhuri, Rajeev Motwani, Vivek Narasayya
Presented by: Srikantha Nema
Outline
• Semantics of Sample
• Difficulty of Join Sampling
• Algorithms for Sampling
• Sampling Strategies
• New Strategies for Join Sampling
• Experimental Evaluation
• Conclusions
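Terminology
• SAMPLE(R, f) is an SQL operation that returns a random sample of the relation R.
• R is the relation obtained when a query Q is evaluated.
• f is the sampling fraction, i.e., the fraction of the tuples of R to include in the sample.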
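Semantics of Sample (R has n tuples; let r = f·n)
• Sampling with Replacement (WR): r tuples are drawn uniformly and independently, so a tuple may appear more than once.
• Sampling without Replacement (WoR): r distinct tuples are drawn uniformly at random.
• Independent Coin Flips (CF): each tuple is included independently with probability f, so the sample size is r only in expectation.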
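A minimal Python sketch of the three semantics, assuming a relation is an in-memory list of tuples and that the target size for WR and WoR is r = round(f·n) (the rounding rule is our assumption). The function names sample_wr, sample_wor and sample_cf are hypothetical; this illustrates the semantics only and is not the SQL SAMPLE operator itself.

```python
import random

def sample_wr(rel, f, rng=random):
    """WR: r = round(f*n) independent uniform draws; a tuple may repeat."""
    r = round(f * len(rel))
    return [rng.choice(rel) for _ in range(r)]

def sample_wor(rel, f, rng=random):
    """WoR: r = round(f*n) distinct tuples, every r-subset equally likely."""
    r = round(f * len(rel))
    return rng.sample(rel, r)

def sample_cf(rel, f, rng=random):
    """CF: keep each tuple independently with probability f (size is random)."""
    return [t for t in rel if rng.random() < f]

if __name__ == "__main__":
    rng = random.Random(42)
    relation = list(range(20))
    print(sample_wr(relation, 0.25, rng))
    print(sample_wor(relation, 0.25, rng))
    print(sample_cf(relation, 0.25, rng))
```

Only CF does not fix the sample size in advance, which is why its output length varies from run to run.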
Classification of the Join Sampling Problem (for R1 ⋈ R2)
• Case A: no information is available for either R1 or R2.
• Case B: no information is available for R1, but indexes and/or statistics are available for R2.
• Case C: indexes/statistics are available for both R1 and R2.
Algorithms for Sampling
• Unweighted Sequential WR Sampling: Black-Box U1, Black-Box U2
• Weighted Sequential WR Sampling: Black-Box WR1, Black-Box WR2
Unweighted Sequential WR Sampling
• Black-Box U1
• Black-Box U2
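The slides name the two black boxes without giving their bodies. Below is a sketch of one-pass WR sampling, under our assumption that U1 is the variant for a stream whose length n is known in advance and U2 the variant for a stream of unknown length. The function names and the simple O(k) binomial generator are illustrative, not taken from the paper.

```python
import random

def binomial(k, p, rng):
    """Number of successes in k Bernoulli(p) trials (simple O(k) generator)."""
    return sum(rng.random() < p for _ in range(k))

def black_box_u1(stream, n, r, rng=random):
    """One-pass WR sample of size r from a stream of known length n.

    Tuple i (0-based) receives Binomial(remaining, 1/(n - i)) copies, which
    gives the same distribution as r independent uniform draws.
    """
    remaining = r
    for i, t in enumerate(stream):
        if remaining == 0:
            break
        # At the last tuple, 1/(n - i) == 1.0, so all remaining draws land here.
        x = binomial(remaining, 1.0 / (n - i), rng)
        for _ in range(x):
            yield t
        remaining -= x

def black_box_u2(stream, r, rng=random):
    """One-pass WR sample of size r from a stream of unknown length.

    Each of the r reservoir slots is replaced by the i-th tuple with
    probability 1/i, independently of the other slots.
    """
    reservoir = [None] * r
    for i, t in enumerate(stream, start=1):
        for j in range(r):
            if rng.random() < 1.0 / i:
                reservoir[j] = t
    return reservoir  # slots stay None only if the stream was empty
```

U1 emits the sample in stream order. In U2, a slot ends up holding tuple i with probability (1/i)·∏_{k>i}(1 − 1/k) = 1/n, so every slot is an independent uniform draw.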
Weighted Sequential WR Sampling
• Black-Box WR1
• Black-Box WR2
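The weighted variants replace the uniform probability 1/(tuples remaining) with w(t)/(weight remaining). This sketch assumes that WR1 knows the total weight W while WR2 does not, and that weights are non-negative integers (such as join-partner counts), which keeps the last probability exactly 1.0 and avoids floating-point drift. The names black_box_wr1, black_box_wr2 and weight are hypothetical.

```python
import random

def binomial(k, p, rng):
    """Number of successes in k Bernoulli(p) trials (simple O(k) generator)."""
    return sum(rng.random() < p for _ in range(k))

def black_box_wr1(stream, total_weight, r, weight, rng=random):
    """Weighted one-pass WR sample of size r; the total weight W is known.

    Tuple t receives Binomial(remaining, w(t) / weight_not_yet_scanned) copies,
    so each of the r draws picks t with probability w(t)/W.
    """
    remaining, rest = r, total_weight
    for t in stream:
        if remaining == 0:
            break
        w = weight(t)
        x = binomial(remaining, w / rest, rng) if rest > 0 else 0
        for _ in range(x):
            yield t
        remaining -= x
        rest -= w

def black_box_wr2(stream, r, weight, rng=random):
    """Weighted one-pass WR sample of size r; the total weight is unknown.

    Each reservoir slot is replaced by t with probability w(t)/W_i, where W_i
    is the total weight seen so far (including t).
    """
    reservoir = [None] * r
    seen = 0
    for t in stream:
        w = weight(t)
        if w <= 0:
            continue  # zero-weight tuples can never be sampled
        seen += w
        for j in range(r):
            if rng.random() < w / seen:
                reservoir[j] = t
    return reservoir
```

With weight(t) set to the number of R2 tuples that join with t, a weighted WR sample of R1 picks each R1 tuple in proportion to how many join tuples it produces.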
Existing Sampling Strategies
• Strategy Naïve-Sample
• Strategy Olken-Sample
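To contrast the two existing strategies, here is a sketch for R1 ⋈ R2 on a join attribute A. It assumes that tuples are Python tuples whose first field is A, that a dict-based hash index stands in for an index on R2.A, and that the maximum frequency M of any A-value in R2 is available as a statistic. Naïve-Sample materializes the whole join. Olken-Sample instead picks a random R1 tuple, follows the index to a random partner, and accepts with probability m2(v)/M, so every join tuple is equally likely: (1/|R1|)·(1/m2(v))·(m2(v)/M) = 1/(|R1|·M). All function names are hypothetical.

```python
import random
from collections import defaultdict

def build_index(r2):
    """Hash index on R2.A (A is assumed to be the first field of each tuple)."""
    index = defaultdict(list)
    for t2 in r2:
        index[t2[0]].append(t2)
    return index

def naive_sample(r1, r2, r, rng=random):
    """Naive-Sample: materialize R1 join R2 in full, then draw r tuples WR."""
    index = build_index(r2)
    join = [(t1, t2) for t1 in r1 for t2 in index.get(t1[0], [])]
    return [rng.choice(join) for _ in range(r)] if join else []

def olken_sample(r1, r2, r, rng=random):
    """Olken-Sample: accept/reject using an index and the max frequency M on R2.A."""
    index = build_index(r2)
    m = max((len(g) for g in index.values()), default=0)
    if not any(t1[0] in index for t1 in r1):
        return []  # empty join: the rejection loop would never terminate
    out = []
    while len(out) < r:
        t1 = rng.choice(r1)                   # uniform tuple of R1
        matches = index.get(t1[0])
        if not matches:
            continue                          # no join partner: reject
        t2 = rng.choice(matches)              # uniform partner via the index
        if rng.random() < len(matches) / m:   # accept with probability m2(v)/M
            out.append((t1, t2))
    return out
```

When M is much larger than the typical frequency, most of Olken-Sample's draws are rejected, which is where its cost comes from.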
New Strategies for Join Sampling
• Strategy Stream-Sample
• Strategy Group-Sample
• Strategy Frequency-Partition-Sample
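A sketch of our reading of Stream-Sample and Group-Sample, not the paper's exact pseudocode. It assumes that tuples are Python tuples whose first field is the join attribute A and that the frequency statistics m2(v) of R2.A are known. Both strategies draw a WR sample S1 of R1 in which t1 has weight m2(t1.A), then attach one uniformly chosen R2 partner to each sampled t1. A pair (t1, t2) is therefore produced with probability (m2(v)/|J|)·(1/m2(v)) = 1/|J|, where J = R1 ⋈ R2. For brevity, random.choices stands in for the one-pass black boxes WR1/WR2. Stream-Sample probes an index on R2.A for each sampled tuple. Group-Sample, as sketched here, needs no index and instead scans R2 once, grouping only the partners of values that occur in S1. Frequency-Partition-Sample is not covered, and all function names are hypothetical.

```python
import random
from collections import Counter, defaultdict

def weighted_sample_r1(r1, m2, r, rng):
    """WR sample of R1 of size r, with t1 weighted by m2(t1.A).

    random.choices is a stand-in for the one-pass black boxes WR1/WR2.
    """
    weights = [m2[t1[0]] for t1 in r1]
    if sum(weights) == 0:
        return []  # the join is empty
    return rng.choices(r1, weights=weights, k=r)

def stream_sample(r1, r2, r, rng=random):
    """Weighted sample of R1, then one random R2 partner per tuple via an index."""
    m2 = Counter(t2[0] for t2 in r2)   # frequency statistics on R2.A
    index = defaultdict(list)          # stands in for an index on R2.A
    for t2 in r2:
        index[t2[0]].append(t2)
    s1 = weighted_sample_r1(r1, m2, r, rng)
    return [(t1, rng.choice(index[t1[0]])) for t1 in s1]

def group_sample(r1, r2, r, rng=random):
    """Weighted sample of R1, then one scan of R2 grouping partners by A-value."""
    m2 = Counter(t2[0] for t2 in r2)
    s1 = weighted_sample_r1(r1, m2, r, rng)
    wanted = {t1[0] for t1 in s1}
    groups = defaultdict(list)
    for t2 in r2:                      # single scan, no index required
        if t2[0] in wanted:
            groups[t2[0]].append(t2)
    return [(t1, rng.choice(groups[t1[0]])) for t1 in s1]
```

Unlike Olken-Sample, neither sketch rejects any draw: the weighting of R1 already accounts for how many partners each tuple has.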
Conclusions
• Difficulty of join sampling
• Classification of the problem into three cases
• Strategies for join sampling
• New schemes for sequential random sampling, for both uniform and weighted sampling
• More efficient strategies can be developed for the single-join case
• More work is needed to understand the problem of sampling the result of join trees