Time-Decaying Sketches for Sensor Data Aggregation
Graham Cormode, AT&T Labs, Research
Srikanta Tirthapura, Dept. of Electrical and Computer Engineering, Iowa State University
Bojian Xu, Dept. of Electrical and Computer Engineering, Iowa State University
[Figure: sensors in a network report timestamped temperature readings, e.g. 75F at 11:39, 76F at 11:34, 72F at 11:29. Query: Mean of the Temperatures in the Last 30 Minutes]
[Figure: the readings at each sensor are summarized in a Sketch; pipeline: Sketch, Merging, Answer]
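General Time Decay
• General decay function: $f(x)$, a non-increasing function of the age $x \ge 0$ [Plot: decay function against age, starting at age 0]
• Time-decayed value of element $(v, t, id)$ at time $c$ is: $v \cdot f(c - t)$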
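To make the decay model concrete, here is a minimal Python sketch of two commonly used decay functions. It assumes the decayed value of an element created at time t is v · f(c - t) at current time c. The function names, the window length, and the exponent are illustrative assumptions, not taken from the slides.

```python
def sliding_window(window):
    """Decay f(age) = 1 if 0 <= age <= window, else 0 (integer-valued)."""
    def f(age):
        return 1 if 0 <= age <= window else 0
    return f


def polynomial(alpha):
    """Decay f(age) = (age + 1) ** -alpha; non-increasing, not integer-valued."""
    def f(age):
        return (age + 1) ** (-alpha)
    return f


def decayed_value(v, t, c, f):
    """Decayed value at current time c of an element (v, t) created at time t."""
    return v * f(c - t)
```

With `sliding_window(30)` and times measured in minutes, a reading keeps its full value for 30 minutes and then contributes nothing, which matches the "last 30 minutes" query of the motivating example.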
Formal Model of the Data (on One Sensor)
• Data stream: $e_0 = (v_0, t_0, id_0)$, $e_1 = (v_1, t_1, id_1)$, …
  • v: value
  • t: timestamp of creation
  • id: a unique id of the observation
• User-defined time decay function f()
• Asynchronous arrival: $t_i > t_j$ is possible even when $i < j$
• Duplicates: $id_i = id_j$ is possible
• Assume: if $id_i = id_j$, then $v_i = v_j$ and $t_i = t_j$
Contribution
• First mergeable sketch that combines the following:
Related Work
• S. Nath, P. B. Gibbons, S. Seshan and Z. R. Anderson, “Synopsis diffusion for robust aggregation in sensor networks”, SenSys 2004
• J. Considine, F. Li, G. Kollios and J. Byers, “Approximate Aggregation Techniques for Sensor Databases”, ICDE 2004
• E. Cohen and M. Strauss, “Maintaining time-decaying stream aggregates”, PODS 2003; Journal of Algorithms 2006
• S. Tirthapura, B. Xu and C. Busch, “Sketching Asynchronous Streams Over Sliding Windows”, PODC 2006
Outline
• Problem: Time-decayed sum of distinct elements over an asynchronous stream.
• Focus on the integral decay model: the time-decayed value of an element is always an integer
Estimate of the Sum (on One Sensor)
• Given:
  • Stream: $R = (v_0, t_0, id_0), \ldots, (v_n, t_n, id_n), \ldots$
  • User-defined decay function: f()
• Maintain: the decayed sum $V = \sum_{(v,t,id) \in D} v \cdot f(c - t)$, where
  • c: current time
  • D: set of distinct elements in R
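As a reference point for what the sketch approximates, the following Python sketch computes the decayed sum exactly by storing every distinct element. It assumes the decayed value of an element is v · f(c - t). The function name and the example stream are hypothetical. It uses linear space, which is what the sketch is designed to avoid.

```python
def exact_decayed_sum(stream, f, c):
    """Exact decayed sum over distinct ids at current time c (linear space)."""
    distinct = {}
    for v, t, obs_id in stream:
        # Duplicates carry the same (v, t) by assumption, so overwriting is safe.
        distinct[obs_id] = (v, t)
    # Arrival order is irrelevant: only each element's own timestamp is used.
    return sum(v * f(c - t) for v, t in distinct.values() if t <= c)


# Out-of-order arrivals and one duplicate ("a" appears twice).
stream = [(4, 10, "a"), (8, 12, "b"), (4, 10, "a"), (5, 3, "c")]
window5 = lambda age: 1 if 0 <= age <= 5 else 0
total = exact_decayed_sum(stream, window5, c=14)
```

Here "a" and "b" are within the window of length 5 at time 14 and "c" is not, so `total` counts 4 + 8 once each.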
Estimate of the Sum (cont’d)
• Linear space lower bound on duplicate-insensitive sum (Alon, Matias and Szegedy, STOC 1996) holds for:
  • Deterministic approximate algorithms
  • Randomized algorithms giving an exact result
• Goal: Continuously maintain an $(\epsilon, \delta)$-estimate of $V = \sum_{(v,t,id) \in D} v \cdot f(c - t)$
  • User inputs: $\epsilon$, $\delta$
  • D: set of distinct elements in R
• An $(\epsilon, \delta)$-estimate for X is a random variable Y such that $\Pr[|Y - X| > \epsilon X] < \delta$.
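Algorithm for Sum (High Level Picture)
• [Figure: two elements with values v1 = 4 and v2 = 8; the Sum is computed as a Count, estimated by Random Sampling with SampleRate = p; √ marks the selected integers]
• Count the number of selected integers
• Multiply by 1/p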
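Duplicate Detection
• Random Sampling by Hash Function: the hash value decides whether to select x
• [Figure: Copy 1 and Copy 2 of the same element, with the same integers selected (√)]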
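The two slides above (reduce the sum to a count, then sample with a hash function) can be sketched in Python as follows. This is a minimal illustration under assumptions: an element of integer value v is treated as v distinct integers, SHA-256 stands in for whatever hash family the authors use, and the names `selected` and `estimate_sum` are hypothetical.

```python
import hashlib


def selected(obs_id, k, p):
    """Decide deterministically whether integer k of element obs_id is sampled.

    The pair is hashed to a 64-bit number and selected if it falls below
    p * 2**64, so every copy of the same element makes the same decision.
    """
    digest = hashlib.sha256(f"{obs_id}:{k}".encode()).digest()
    return int.from_bytes(digest[:8], "big") < p * 2**64


def estimate_sum(stream, p):
    """Estimate the sum of values over distinct ids using sample rate p."""
    sample = set()
    for v, obs_id in stream:
        for k in range(v):                  # value v stands for v integers
            if selected(obs_id, k, p):
                sample.add((obs_id, k))     # a set absorbs duplicate copies
    return len(sample) / p                  # count, then multiply by 1/p
```

Because selection depends only on the hash of the element, feeding the same stream twice leaves the estimate unchanged, and with p = 1 the estimate equals the exact sum.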
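Intuition - I
• Each element (v, t, id) is sampled at some sample rate
• By Chebyshev's inequality, an $\epsilon$-approximation of the count with constant probability is obtained when the expected number of selected integers is at least on the order of $1/\epsilon^2$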
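A standard Chebyshev calculation behind this intuition is sketched below; its constants may differ from the bound used by the authors. The symbols n (true count), X (number of selected integers), and Y (estimator) are introduced here for illustration, and selection is assumed independent across integers (pairwise independence is enough for the variance step).

```latex
% Assumptions (not from the slides): n integers, each selected
% independently with probability p; X = number selected; Y = X / p.
\begin{align*}
\mathbb{E}[Y] &= \frac{\mathbb{E}[X]}{p} = n, \\
\operatorname{Var}[Y] &= \frac{n\,p\,(1-p)}{p^{2}} \le \frac{n}{p}, \\
\Pr\bigl[\,|Y - n| > \epsilon n\,\bigr]
  &\le \frac{\operatorname{Var}[Y]}{\epsilon^{2} n^{2}}
   \le \frac{1}{\epsilon^{2}\,p\,n}.
\end{align*}
% Hence p n >= 3 / \epsilon^2 bounds the failure probability by 1/3.
```

In words, the expected sample size pn only needs to grow like 1/ε², independent of n, which is why a bounded-size sample can suffice.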
Intuition - II
• At time t
• At a later time t + Δ
• Sample rate?
Maintain Multiple Samples
• SampleRate pj: p0 = 1, p1 = 1/2, p2 = 1/4, …
• Size of each sample?
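One way to realize samples at rates 1, 1/2, 1/4, … is to compare a single hash value against a threshold that halves at each level, which makes the samples nested. The Python sketch below assumes this construction and uses SHA-256 as a stand-in hash; the slides do not specify either, and all names are hypothetical.

```python
import hashlib


def hash64(obs_id, k):
    """64-bit hash of integer k belonging to element obs_id."""
    digest = hashlib.sha256(f"{obs_id}:{k}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def selected_at_level(obs_id, k, level):
    """Selected at rate 2**-level; selection at level j+1 implies level j."""
    return hash64(obs_id, k) < (1 << (64 - level))


def level_counts(elements, num_levels):
    """Number of selected integers at each level for (value, id) elements."""
    counts = [0] * num_levels
    for v, obs_id in elements:
        for k in range(v):
            for j in range(num_levels):
                if selected_at_level(obs_id, k, j):
                    counts[j] += 1
    return counts


counts = level_counts([(4, "e1"), (8, "e2")], num_levels=3)
```

Level 0 selects every integer, and the counts can only shrink as the level increases, so a query can pick the level whose sample is small enough to store yet large enough to be accurate.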
Faster Sampling
• RangeSample (Pavan & Tirthapura, SICOMP 2007)
• Efficiently compute the number of selected integers
Expiry Time
• Element e = (v, t, id)
• [Figure: the selected integers (√) of e at time t and at a later time t + Δ, where t + Δ is the expiry time of e]
• Binary search over [t, tmax] using RangeSample
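The binary search for the expiry time can be sketched in Python as follows. Several parts are assumptions: the expiry time is taken to be the first time at which none of the element's integers is selected, the element at time c stands for the integers 0 .. v·f(c - t) - 1, and `count_selected` is a brute-force stand-in for RangeSample, not the RangeSample algorithm itself.

```python
import hashlib


def selected(obs_id, k, level):
    """Hash-based selection at rate 2**-level (SHA-256 as a stand-in hash)."""
    digest = hashlib.sha256(f"{obs_id}:{k}".encode()).digest()
    return int.from_bytes(digest[:8], "big") < (1 << (64 - level))


def count_selected(obs_id, n, level):
    """Brute-force stand-in for RangeSample: selected integers among 0..n-1."""
    return sum(selected(obs_id, k, level) for k in range(n))


def expiry_time(v, t, obs_id, level, f, t_max):
    """Smallest time c in [t, t_max + 1] at which no integer is selected.

    A result of t_max + 1 means the element does not expire within
    [t, t_max]. The search is valid because f is non-increasing, so the
    count of selected integers never grows as time advances.
    """
    lo, hi = t, t_max + 1
    while lo < hi:
        mid = (lo + hi) // 2
        if count_selected(obs_id, int(v * f(mid - t)), level) == 0:
            hi = mid          # already expired at mid: look earlier
        else:
            lo = mid + 1      # still alive at mid: look later
    return lo
```

For example, with a sliding window of length 10 and level 0 (every integer selected), an element created at time 5 is alive through time 15 and expires at time 16.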
Sketch Structure
• Level 0, Level 1, Level 2, …: one sample per level, with sample rates p = 1, 1/2, 1/4, 1/8, …
• t0, t1, t2, …: for each level, the largest expiry time of all the elements discarded from the sample
Inserting elements into the sketch (Level 0, Level 1, Level 2 with p = 1, 1/2, 1/4; entries are (element, expiry time))
• e1 arrives. Level 0: (e1,22). Level 1: (e1,19)
• e2 and e3 arrive. Level 0: (e1,22) (e2,23) (e3,21). Level 1: (e1,19) (e2,21)
• e4 arrives. Level 0 is full: discard the element with smallest expiry time, (e3,21). Level 0: (e1,22) (e2,23) (e4,23). Level 1: (e4,21) (e1,19) (e2,21)
• Record t0 = 21. Level 0: (e1,22) (e2,23) (e4,23). Level 1: (e4,21) (e1,19) (e2,21)
• Duplicate arrives: sketch unchanged. t0 = 21. Level 0: (e1,22) (e2,23) (e4,23). Level 1: (e4,21) (e1,19) (e2,21)
Answer a Query for the Decayed Sum
• Current time = 20
• Level 0 (p = 1): t0 = 21; (e1,22) (e2,23) (e4,23)
• Level 1 (p = 1/2): (e4,21) (e1,19) (e2,21); this is the level used to answer the query, with e2 and e4 selected (√)
• Level 2 (p = 1/4)
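The insertion and query walkthrough above can be replayed with a small Python sketch. Two rules in it are assumptions inferred from the example rather than stated on the slides: an element is alive at time c when its expiry time is greater than c, and a level is usable at time c when its recorded discard time does not exceed c. The class and function names are hypothetical, expiry times are supplied directly instead of being computed, and the final step (counting the selected integers of the alive elements and multiplying by 1/p) is not shown.

```python
class Level:
    """One sample of the sketch: at most `capacity` (id, expiry) pairs."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.expiry = {}          # element id -> expiry time at this level
        self.t_discard = None     # largest expiry time among discarded elements

    def insert(self, elem_id, expiry):
        if elem_id in self.expiry:            # duplicate: nothing changes
            return
        self.expiry[elem_id] = expiry
        if len(self.expiry) > self.capacity:  # discard the smallest expiry time
            victim = min(self.expiry, key=self.expiry.get)
            t = self.expiry.pop(victim)
            if self.t_discard is None or t > self.t_discard:
                self.t_discard = t

    def usable_at(self, c):
        """Assumed rule: no discarded element may still be alive at time c."""
        return self.t_discard is None or self.t_discard <= c

    def alive(self, c):
        return sorted(e for e, x in self.expiry.items() if x > c)


def query_level(levels, c):
    """Index of the first (highest sample rate) level usable at time c."""
    for j, level in enumerate(levels):
        if level.usable_at(c):
            return j
    return None


# Replay of the slide example: (id, expiry at level 0, expiry at level 1).
levels = [Level(3), Level(3)]
arrivals = [("e1", 22, 19), ("e2", 23, 21), ("e3", 21, None),
            ("e4", 23, 21), ("e1", 22, 19)]   # the last arrival is a duplicate
for elem_id, x0, x1 in arrivals:
    levels[0].insert(elem_id, x0)
    if x1 is not None:                        # e3 is not sampled at level 1
        levels[1].insert(elem_id, x1)
```

At current time 20, level 0 is skipped because its discard time 21 lies in the future, so level 1 answers the query using the alive elements e2 and e4, as in the slide.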
Over the Whole Sensor Network
• Sketch 1: (e1,6) (e2,9) (e3,13)
• Sketch 2: (e4,6) (e5,10) (e3,13)
• Result of merging sketches 1 & 2 (union): (e2,9) (e5,10) (e3,13)
• Each sample keeps the 3 distinct items with largest expiry time.
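The merge step for one level can be sketched in Python as a union followed by truncation. This assumes all sensors use the same hash function, so that an element has the same expiry time at a given level everywhere. Updating the discard time during the merge is also an assumption, since the slide does not show it. The function name and data layout are hypothetical.

```python
def merge_samples(sample_a, sample_b, capacity=3):
    """Union two same-level samples; keep `capacity` ids with largest expiry.

    Each sample is (items, t_discard): items maps id -> expiry time and
    t_discard is the largest expiry time among discarded elements, or None.
    """
    items_a, t_a = sample_a
    items_b, t_b = sample_b
    union = {**items_a, **items_b}   # a shared id has the same expiry in both
    ranked = sorted(union.items(), key=lambda kv: kv[1], reverse=True)
    kept, dropped = ranked[:capacity], ranked[capacity:]
    candidates = [t for t in (t_a, t_b) if t is not None]
    candidates += [x for _, x in dropped]
    return dict(kept), (max(candidates) if candidates else None)


sketch1 = ({"e1": 6, "e2": 9, "e3": 13}, None)
sketch2 = ({"e4": 6, "e5": 10, "e3": 13}, None)
merged, t_discard = merge_samples(sketch1, sketch2)
```

On the slide's data the merged sample holds e2, e5 and e3, and the discarded elements e1 and e4 set the discard time to 6. Because the result depends only on the union of the inputs, merging is insensitive to the order in which sketches are combined and to duplicates across sensors.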
Algorithm Complexity
• Space complexity
• Time complexity
  • Expected time for processing one item
  • Time for answering a query
  • Time for merging two sketches
Conclusion
• First sketch that combines the following:
Ongoing and Future Work
• Implementation
  • Observed results better than theoretical predictions
• Better duplicate-insensitive sketches for specific decay models?
• Other aggregates, such as variance and clustering?