Sampling Time-Based Sliding Windows in Bounded Space

Rainer Gemulla and Wolfgang Lehner. SIGMOD 2008.



  1. Sampling Time-Based Sliding Windows in Bounded Space • Rainer Gemulla and Wolfgang Lehner • SIGMOD 2008 • Chen Yi-Chun

  2. Outline • Motivation • Priority sampling • Bounded priority sampling • Correctness and analysis • Sampling multiple items • Experimental results • Conclusion

  3. Motivation • Random sampling is an appealing approach for building synopses of large data streams. • The paper is concerned with sampling schemes that maintain a uniform sample of a time-based sliding window in bounded space. • The main challenge is to guarantee an upper bound on the space consumption of the sample.

  4. Notation • R(t): the set of items from R with a timestamp smaller than or equal to t • W(t): a sliding window of length Δ, i.e. the items of R(t) with a timestamp larger than t − Δ • N(t): the size of the window at time t • Window length: the timespan covered by the window (Δ, fixed) • Window size: the number of items in the window (N(t), varying) • S(t): a uniform random sample of W(t)

  5. Priority sampling • The replacement set is the reason for the unbounded space consumption of the sampling scheme.
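The slide names the replacement set without showing it. The following is a minimal sketch of priority sampling over a time-based window, assuming the usual formulation: every item draws a random priority, the sample is the highest-priority item in the window, and the replacement set holds every item that could still become the sample after the current one expires. The class and method names (`PrioritySampler`, `arrive`, `sample`) are hypothetical and do not come from the slides.

```python
import random
from collections import deque


class PrioritySampler:
    """Illustrative priority sampling over a time-based sliding window."""

    def __init__(self, window_length, rng=None):
        self.window_length = window_length
        self.rng = rng or random.Random()
        # Sample item followed by its replacement set, oldest first.
        # Priorities are non-increasing from left to right.
        self.items = deque()  # entries are (timestamp, priority, value)

    def _expire(self, now):
        # An item is in the window while its timestamp is > now - window_length.
        while self.items and self.items[0][0] <= now - self.window_length:
            self.items.popleft()

    def arrive(self, timestamp, value):
        self._expire(timestamp)
        priority = self.rng.random()
        # Older items with a lower priority can never become the sample,
        # because the new item outlives them and beats them.
        while self.items and self.items[-1][1] < priority:
            self.items.pop()
        self.items.append((timestamp, priority, value))

    def sample(self, now):
        self._expire(now)
        return self.items[0][2] if self.items else None
```

The length of `items` depends on the random priorities and on how many items fall into the window, so no fixed bound on it can be guaranteed in advance. That is the space problem the slide refers to.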

  6. Bounded priority sampling • a) Arrival of item e: e becomes the new candidate item if there is currently no candidate item, or if the priority of e is larger than the priority of the candidate item • b) Expiration of the candidate item: it becomes the test item • c) Double expiration of the test item: discard it
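The three rules above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the names (`BoundedPrioritySampler`, `arrive`, `sample`) are hypothetical, expirations are processed lazily whenever the sampler is touched, and the reporting rule in `sample` (the candidate is returned only if it beats the test item) is an assumption that the slide does not state.

```python
import random


class BoundedPrioritySampler:
    """Illustrative bounded priority sampling: stores at most two items."""

    def __init__(self, window_length, rng=None):
        self.window_length = window_length
        self.rng = rng or random.Random()
        self.candidate = None  # (timestamp, priority, value), inside the window
        self.test = None       # former candidate that has left the window

    def _advance(self, now):
        # b) Expiration of the candidate item: it becomes the test item.
        if self.candidate and self.candidate[0] <= now - self.window_length:
            self.test = self.candidate
            self.candidate = None
        # c) Double expiration of the test item: discard it.
        if self.test and self.test[0] <= now - 2 * self.window_length:
            self.test = None

    def arrive(self, timestamp, value):
        self._advance(timestamp)
        priority = self.rng.random()
        # a) The new item becomes the candidate if there is none,
        #    or if its priority is larger than the candidate's.
        if self.candidate is None or priority > self.candidate[1]:
            self.candidate = (timestamp, priority, value)

    def sample(self, now):
        self._advance(now)
        if self.candidate is None:
            return None
        # Assumed reporting rule: the candidate must beat the test item.
        if self.test is not None and self.test[1] >= self.candidate[1]:
            return None
        return self.candidate[2]
```

Only two slots are ever occupied, so the space bound holds by construction. The price is that `sample` may return nothing even though the window is non-empty.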

  7. Correctness and analysis • [Figure: items e’ and emax with priorities p’ and pmax]

  8. Correctness and analysis (cont.) • In the worst case, e’ equals the highest-priority item in W(t − Δ), where Δ is the window length. • [Figure: items e’ and emax with priorities p’ and pmax]

  9. Sampling multiple items • BPSWOR (BPS without replacement): modify BPS so as to store k candidates and k test items simultaneously. • [Figure: items e1 and e2 with priorities p1 and p2, annotated |Scand| < k]

  10. Experimental results • Each item of the data stream consists of an 8-byte timestamp and 32 bytes of dummy data • A space budget of 32 kbytes • At most 819 items can be stored in 32 kbytes of space
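The figure of 819 items follows from the item size and the space budget. A quick check, assuming that 1 kbyte means 1024 bytes (the constant names are illustrative):

```python
TIMESTAMP_BYTES = 8
PAYLOAD_BYTES = 32
BUDGET_BYTES = 32 * 1024  # assumption: 1 kbyte = 1024 bytes

item_bytes = TIMESTAMP_BYTES + PAYLOAD_BYTES  # 40 bytes per item
max_items = BUDGET_BYTES // item_bytes        # whole items that fit in the budget
print(max_items)  # 819
```

With 1 kbyte taken as 1000 bytes the result would be 800, so the slide's number implies the 1024-byte convention.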

  11. Conclusion • The paper studies bounded-space techniques for maintaining uniform samples over a time-based sliding window of a data stream.
