
On Random Sampling over Joins

Surajit Chaudhuri (Microsoft Research), Rajeev Motwani (Stanford University), Vivek Narasayya (Microsoft Research).


Presentation Transcript


  1. On Random Sampling over Joins. Surajit Chaudhuri (Microsoft Research), Rajeev Motwani (Stanford University), Vivek Narasayya (Microsoft Research)

  2. Outline: • The difficulty of join sampling: an example • Semantics and algorithms of sampling • Two previous sampling strategies • New strategies for join sampling • Experimental results

  3. The Difficulty of Join Sampling -Example: • Suppose that we have the relations

  4. Black-Box U2: Given relation R with n tuples, generate an unweighted WR (with-replacement) sample of size r.
     1. N ← 0.
     2. Initialize reservoir array A[1..r] with r dummy values.
     3. While tuples are streaming by do begin
        (a) get next tuple t;
        (b) N ← N + 1;
        (c) for j = 1 to r, set A[j] to t with probability 1/N
        end.
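The reservoir loop on this slide can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function name `sample_u2` and the `rng` parameter are hypothetical, and it assumes that N counts the tuples seen so far, which is the reading under which the 1/N replacement probability leaves every slot holding a uniformly chosen tuple.

```python
import random

def sample_u2(stream, r, rng=random):
    """One-pass unweighted with-replacement (WR) sample of size r.

    Hypothetical helper; assumes N is the number of tuples seen so far.
    """
    n_seen = 0                 # N on the slide
    reservoir = [None] * r     # A[1..r], filled with dummy values
    for t in stream:
        n_seen += 1
        # Each slot is overwritten independently, which is what makes the
        # result a with-replacement sample: one tuple may fill many slots.
        for j in range(r):
            if rng.random() < 1.0 / n_seen:
                reservoir[j] = t
    return reservoir
```

The first tuple is always accepted into every slot because `rng.random()` is strictly below 1, so no dummy value survives a non-empty stream.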

  5. Black-Box WR2: Given relation R with n tuples, generate a weighted WR sample of size r.
     1. W ← 0.
     2. Initialize reservoir array A[1..r] with r dummy values.
     3. While tuples are streaming by do begin
        (a) get next tuple t with weight w(t);
        (b) W ← W + w(t);
        (c) for j = 1 to r, set A[j] to t with probability w(t)/W
        end.
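A weighted variant of the same loop might look like the sketch below. The names `sample_wr2`, `weight`, and `rng` are hypothetical, and the code assumes W is the running total of the weights seen so far. Skipping zero-weight tuples is an added guard against dividing by zero and is not stated on the slide.

```python
import random

def sample_wr2(stream, r, weight, rng=random):
    """One-pass weighted with-replacement sample of size r.

    Hypothetical helper; `weight` maps a tuple to a non-negative number
    and W is assumed to be the running sum of weights.
    """
    total_w = 0.0              # W on the slide
    reservoir = [None] * r     # A[1..r], filled with dummy values
    for t in stream:
        w = weight(t)
        if w <= 0:
            continue           # assumption: zero-weight tuples are never chosen
        total_w += w
        for j in range(r):
            # Replace slot j with probability w(t)/W.
            if rng.random() < w / total_w:
                reservoir[j] = t
    return reservoir
```

With all weights equal to 1 this reduces to the unweighted black box.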

  6. The Classification of the Problem: • Case A: No information is available for either R1 or R2. • Case B: No information is available for R1, but indexes and/or statistics are available for R2. • Case C: Indexes/statistics are available for both R1 and R2.

  7. Previous Sampling Strategies. Strategy Naive-Sample:
     1. Compute the join J = R1 ⋈ R2.
     2. As the tuples of J stream by, use Black-Box U1 or U2 to produce the sample S.
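A rough Python sketch of the naive strategy follows: materialize the join as a stream and run an unweighted reservoir over it. The names `join_stream`, `naive_sample`, `key1`, and `key2` are hypothetical, the two relations are assumed to be in-memory lists joined on equality of a key, and the hash join is just one convenient way to produce the join stream.

```python
import random
from collections import defaultdict

def join_stream(r1, r2, key1, key2):
    """Yield the tuples of the equi-join of r1 and r2 (simple hash join)."""
    index = defaultdict(list)
    for t2 in r2:
        index[key2(t2)].append(t2)
    for t1 in r1:
        for t2 in index.get(key1(t1), ()):
            yield (t1, t2)

def naive_sample(r1, r2, key1, key2, r, rng=random):
    """Uniform WR sample of size r drawn from the full join output."""
    n_seen = 0
    reservoir = [None] * r
    for joined in join_stream(r1, r2, key1, key2):
        n_seen += 1
        for j in range(r):
            if rng.random() < 1.0 / n_seen:
                reservoir[j] = joined
    # An empty join has nothing to sample from.
    return reservoir if n_seen else []
```

The cost is dominated by producing every join tuple, which is the inefficiency the later strategies try to avoid.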

  8. Previous Sampling Strategies. Strategy Olken-Sample:
     1. Let M be an upper bound on m2(v), the number of tuples of R2 with join-attribute value v, for all v.
     2. Repeat
        (a) Sample a tuple t1 from R1 uniformly at random.
        (b) Sample a random tuple t2 from among all tuples t in R2 that have t.A = t1.A.
        (c) Output t1 ⋈ t2 with probability m2(t2.A)/M, and with the remaining probability reject the sample.
        until r tuples have been produced.
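The rejection loop can be illustrated with the Python sketch below. It rests on several assumptions that are not on the slide: the relations are called `r1` and `r2` and held as in-memory lists, a dictionary stands in for the index on the second relation, the acceptance probability is the number of matching tuples divided by M, and M is taken as the exact maximum group size. All function and parameter names are hypothetical.

```python
import random
from collections import defaultdict

def olken_sample(r1, r2, key1, key2, r, rng=random):
    """Accept/reject sampling of r join tuples (sketch under stated assumptions)."""
    index = defaultdict(list)          # stand-in for an index on r2
    for t2 in r2:
        index[key2(t2)].append(t2)
    # Guard: without at least one joining pair the loop would never end.
    if not r1 or not any(key1(t1) in index for t1 in r1):
        return []
    M = max(len(group) for group in index.values())  # upper bound on matches per value
    out = []
    while len(out) < r:
        t1 = rng.choice(r1)                      # (a) uniform tuple from r1
        matches = index.get(key1(t1), [])
        if not matches:
            continue                             # zero matches: always rejected
        t2 = rng.choice(matches)                 # (b) uniform matching tuple from r2
        if rng.random() < len(matches) / M:      # (c) accept with probability matches/M
            out.append((t1, t2))
    return out
```

The acceptance step cancels the bias toward join values with few matches, at the price of discarding work whenever a candidate is rejected.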

  9. New Strategies for Join Sampling • Strategy Stream-Sample is more efficient than Olken-Sample: 1. No information is required for R1 (Case B). 2. No tuple is rejected after the join is computed. 3. Only one iteration is needed for each output tuple.

  10. New Strategies for Join Sampling. Strategy Stream-Sample:
     1. Use Black-Box WR1 or WR2 to produce a WR sample S1 of R1 of size r, where the weight w(t) for a tuple t in R1 is set to m2(t.A), the number of tuples of R2 that join with t.
     2. While tuples of S1 are streaming by do begin
        (a) get next tuple t1 and let v = t1.A;
        (b) sample a random tuple t2 from among all tuples t in R2 that have t.A = v;
        (c) output t1 ⋈ t2
        end.
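A possible Python rendering of this strategy is sketched below. It assumes the weight of a first-relation tuple is the number of second-relation tuples it joins with, uses a dictionary as a stand-in for the index on the second relation, and inlines a weighted reservoir in place of the WR black box. The names `stream_sample`, `key1`, and `key2` are hypothetical.

```python
import random
from collections import defaultdict

def stream_sample(r1_stream, r2, key1, key2, r, rng=random):
    """One pass over r1, index lookups on r2, no rejections (sketch)."""
    index = defaultdict(list)          # stand-in for an index on r2
    for t2 in r2:
        index[key2(t2)].append(t2)

    # Step 1: weighted WR sample of r1, weight = number of matches in r2.
    total_w = 0
    s1 = [None] * r
    for t1 in r1_stream:
        w = len(index.get(key1(t1), ()))
        if w == 0:
            continue                   # tuples with no join partner are never chosen
        total_w += w
        for j in range(r):
            if rng.random() < w / total_w:
                s1[j] = t1
    if total_w == 0:
        return []                      # empty join

    # Step 2: pair each sampled tuple with one uniformly chosen match.
    return [(t1, rng.choice(index[key1(t1)])) for t1 in s1]
```

Because the first relation is only read once and in order, it can be passed as a generator, which matches the case where nothing is known about it in advance.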

  11. New Strategies for Join Sampling. Strategy Group-Sample:
     1. Use Black-Box WR1 or WR2 to produce a WR sample S1 of R1 of size r, where the weight w(t) for a tuple t in R1 is set to m2(t.A), the number of tuples of R2 that join with t.
     2. Let S1 consist of the tuples s1, ..., sr. Produce S2 = S1 ⋈ R2, whose tuples are grouped by the tuples of S1 that generated them.
     3. Use r invocations of Black-Box U1 or U2 to sample r tuples, one from each group.
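The three steps can be approximated in Python as below. This is only a sketch under assumptions: the per-value match counts are computed by scanning the second relation as a stand-in for real statistics, the grouping of the intermediate join is done implicitly by keeping one size-1 reservoir per sampled tuple, and the names `group_sample`, `key1`, and `key2` are hypothetical.

```python
import random
from collections import Counter, defaultdict

def group_sample(r1_stream, r2, key1, key2, r, rng=random):
    """Weighted sample of r1, then one uniform pick per group (sketch)."""
    m2 = Counter(key2(t2) for t2 in r2)    # stand-in for frequency statistics on r2

    # Step 1: weighted WR sample of r1, weight = match count in r2.
    total_w = 0
    s1 = [None] * r
    for t1 in r1_stream:
        w = m2.get(key1(t1), 0)
        if w == 0:
            continue
        total_w += w
        for j in range(r):
            if rng.random() < w / total_w:
                s1[j] = t1
    if total_w == 0:
        return []

    # Steps 2 and 3: scan r2 once; every sampled slot j owns one group and
    # keeps a size-1 reservoir over the r2 tuples that join with s1[j].
    slots = defaultdict(list)
    for j, t1 in enumerate(s1):
        slots[key1(t1)].append(j)
    seen = [0] * r
    chosen = [None] * r
    for t2 in r2:
        for j in slots.get(key2(t2), ()):
            seen[j] += 1
            if rng.random() < 1.0 / seen[j]:
                chosen[j] = t2
    return list(zip(s1, chosen))
```

Unlike the index-based variant, the second relation is only scanned sequentially here, so no index on it is needed, only the match counts.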

  12. New Strategy for Join Sampling • Strategy Frequency-Partition-Sample

  13. Experimental Results:

  14. Experimental Results:

  15. Experimental Results:

  16. Summary • The difficulty of join sampling: an example. • The classification of the problem: 3 cases. • Previous strategies: Naive-Sample, Olken-Sample. • New strategies: Stream-Sample, Group-Sample, Frequency-Partition-Sample. • Conclusion: The new strategies are better than the earlier techniques.
