Sampling Time-Based Sliding Windows in Bounded Space
Rainer Gemulla and Wolfgang Lehner
SIGMOD 2008
Chen Yi-Chun
Outline
• Motivation
• Priority sampling
• Bounded priority sampling
• Correctness and analysis
• Sampling multiple items
• Experimental results
• Conclusion
Motivation
• Random sampling is an appealing approach to building synopses of large data streams.
• The paper is concerned with sampling schemes that maintain a uniform sample of a time-based sliding window in bounded space.
• The main challenge is to guarantee an upper bound on the space consumption of the sample.
Notation definition
• R(t) : the set of items from R with a timestamp smaller than or equal to t
• W(t) : a sliding window of length Δ
• N(t) : the size of the window at time t
• Window length : the timespan covered by the window (Δ, fixed)
• Window size : the number of items in the window (N(t), varying)
• S(t) : uniform random sample of W(t)
Priority sampling
• The replacement set is the reason for the unbounded space consumption of the sampling scheme.
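
The slide does not spell out the scheme itself, so the following is only a minimal Python sketch of priority sampling as it is commonly described: every arriving item receives a random priority, and the sample is the highest-priority item in the window. The class name `PrioritySampler`, the uniform priorities in [0, 1), the parameter name `delta` for the window length, and the deque layout are all assumptions made for illustration. The sketch shows where the replacement set comes from: every item that might still become the sample after the current one expires has to be kept, and nothing bounds how many such items there are.

```python
import random
from collections import deque


class PrioritySampler:
    """Illustrative single-item priority sampler (names are hypothetical)."""

    def __init__(self, delta, rng=None):
        self.delta = delta                  # window length
        self.rng = rng or random.Random()
        # (timestamp, priority, item); timestamps increase and priorities
        # decrease from left to right. The head is the current sample,
        # everything behind it is the replacement set.
        self.items = deque()

    def arrive(self, timestamp, item):
        priority = self.rng.random()
        # A stored item with a lower priority can never become the sample:
        # the new item has a higher priority and expires later.
        while self.items and self.items[-1][1] < priority:
            self.items.pop()
        self.items.append((timestamp, priority, item))

    def sample(self, now):
        # Drop items that have left the window (now - delta, now].
        while self.items and self.items[0][0] <= now - self.delta:
            self.items.popleft()
        return self.items[0][2] if self.items else None

    def stored(self):
        # Sample plus replacement set; this count has no fixed upper bound.
        return len(self.items)
```

Timestamps are assumed to arrive in non-decreasing order. The value returned by `stored()` depends on the random priorities and on the window size, which is exactly the unbounded space consumption the slide refers to.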
Bounded priority sampling
• a) Arrival of item e: e becomes the new candidate item if
  • there is currently no candidate item, or
  • the priority of e is larger than the priority of the candidate item
• b) Expiration of the candidate item: it becomes the test item
• c) Double expiration of the test item: discard it
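
The three rules above can be sketched in Python as follows. This is a hedged illustration, not the authors' code: the class name `BoundedPrioritySampler`, the parameter name `delta` for the window length, and the uniform priorities are assumptions. The reporting rule in `sample` (return the candidate only if its priority is not beaten by the test item, otherwise report nothing) is not stated on the slide; it is my reading of the scheme and should be treated as an assumption as well.

```python
import random


class BoundedPrioritySampler:
    """Illustrative sketch of rules a) to c); stores at most two items."""

    def __init__(self, delta, rng=None):
        self.delta = delta            # window length
        self.rng = rng or random.Random()
        self.candidate = None         # (timestamp, priority, item), current window
        self.test = None              # former candidate that has expired

    def _expire(self, now):
        # b) an expired candidate becomes the test item
        if self.candidate is not None and self.candidate[0] <= now - self.delta:
            self.test = self.candidate
            self.candidate = None
        # c) a test item older than two window lengths is discarded
        if self.test is not None and self.test[0] <= now - 2 * self.delta:
            self.test = None

    def arrive(self, timestamp, item):
        self._expire(timestamp)
        priority = self.rng.random()
        # a) no candidate yet, or the new item has a larger priority
        if self.candidate is None or priority > self.candidate[1]:
            self.candidate = (timestamp, priority, item)

    def sample(self, now):
        # Assumed reporting rule: the candidate counts as the sample only
        # if no stored test item has a larger priority.
        self._expire(now)
        if self.candidate is None:
            return None
        if self.test is not None and self.test[1] > self.candidate[1]:
            return None
        return self.candidate[2]
```

The design trades availability for space: only two items are ever stored, but `sample` may return `None` while a high-priority test item is still blocking the candidate.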
Correctness and analysis
[Figure: items e' and emax with priorities p' and pmax]
Correctness and analysis (cont.)
• In the worst case, e' equals the highest-priority item in W(t − Δ)
[Figure: items e' and emax with priorities p' and pmax]
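
The worst case above can be turned into a worked probability. The slide does not give the formula, so the following is a sketch under two assumptions: priorities are drawn independently from the same continuous distribution, and the current window is non-empty. If e' is the highest-priority item of the previous window, then pmax exceeds p' exactly when the largest of all N(t) + N(t − Δ) priorities belongs to an item of the current window, and each item is equally likely to hold that largest priority:

```latex
\Pr\left[\, p_{\max} > p' \,\right]
  \;=\; \frac{N(t)}{N(t) + N(t-\Delta)}
```

For two consecutive windows of equal size this evaluates to 1/2.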
Sampling Multiple Items
• BPSWOR (BPS without replacement):
• Modify BPS so as to store k candidates and k test items simultaneously.
[Figure: items e1 and e2 with priorities p1 and p2; case |Scand| < k]
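
One way to picture "k candidates and k test items" is the Python sketch below. The slide gives no further detail, so everything beyond the two bounded sets is an assumption: the class name `BPSWOR`, the eviction of the lowest-priority candidate on arrival, and in particular the reporting rule (return the candidates that rank among the k highest priorities of all stored items). Treat it as an illustration of the storage bound, not as the authors' algorithm.

```python
import heapq
import random


class BPSWOR:
    """Illustrative sketch: at most k candidates and k test items are stored."""

    def __init__(self, k, delta, rng=None):
        self.k = k
        self.delta = delta            # window length
        self.rng = rng or random.Random()
        self.cand = []                # (priority, timestamp, item), current window
        self.test = []                # expired former candidates

    def _expire(self, now):
        expired = [c for c in self.cand if c[1] <= now - self.delta]
        self.cand = [c for c in self.cand if c[1] > now - self.delta]
        # Expired candidates become test items; double-expired ones are dropped.
        alive = [x for x in self.test + expired if x[1] > now - 2 * self.delta]
        self.test = heapq.nlargest(self.k, alive, key=lambda x: x[0])

    def arrive(self, timestamp, item):
        self._expire(timestamp)
        entry = (self.rng.random(), timestamp, item)
        if len(self.cand) < self.k:
            self.cand.append(entry)
            return
        lowest = min(self.cand, key=lambda x: x[0])
        if entry[0] > lowest[0]:
            # Assumed rule: the new item replaces the lowest-priority candidate.
            self.cand.remove(lowest)
            self.cand.append(entry)

    def sample(self, now):
        # Assumed reporting rule: candidates among the k highest-priority
        # stored items. Test items are exactly the stored items outside the window.
        self._expire(now)
        top = heapq.nlargest(self.k, self.cand + self.test, key=lambda x: x[0])
        return [x[2] for x in top if x[1] > now - self.delta]
```

The sample can contain fewer than k items, because test items that outrank a candidate take up places among the k highest priorities without being reported.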
Experimental results
• Each item of the data stream consists of an 8-byte timestamp and 32 bytes of dummy data
• A space budget of 32 kbytes
• At most 819 items can be stored in 32 kbytes of space
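
The figure of 819 items follows from the item size and the budget. The short Python check below reproduces it, assuming that 1 kbyte means 1024 bytes and that nothing besides the timestamp and the dummy data is stored per item; the constant names are illustrative.

```python
TIMESTAMP_BYTES = 8
PAYLOAD_BYTES = 32
BUDGET_BYTES = 32 * 1024        # assumption: 1 kbyte = 1024 bytes

item_bytes = TIMESTAMP_BYTES + PAYLOAD_BYTES   # 40 bytes per item
max_items = BUDGET_BYTES // item_bytes         # whole items that fit

print(item_bytes, max_items)
```

With 1 kbyte taken as 1000 bytes the result would be 800 instead, so the slide's 819 is consistent with the 1024-byte reading.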
Conclusion
• The paper studies bounded-space techniques for maintaining uniform samples over a time-based sliding window of a data stream.