Medians and Beyond: New Aggregation Techniques for Sensor Networks

CS851 Seminar Presentation

Outline • Motivations, State of the Art, Contributions • The Q-Digest Scheme • Queries on Q-Digest • Experimental Evaluation • Conclusions • Be prepared! I have questions for you!

Motivations • Trade Computation for Communication • Transmitting one bit over radio is at least three orders of magnitude more expensive in terms of energy consumption than executing a single instruction • Support Aggregation Queries • Need aggregated answer, not a single raw reading • Quantile query • Nth → value • Reverse quantile query • Value → Nth • Consensus query • Most frequent? • Histogram

State of the Art • TinyDB project at Berkeley & Cougar project at Cornell • Pros: • Energy-efficient in-network data aggregation • Work very well on singleton sensor values • MIN, MAX, AVERAGE, SUM, COUNT • Cons: • Do not deal with complex aggregate measures • Median, Quantile, Reverse Quantile, Consensus • [Zhao et al. 2003] • Algorithms for constructing summaries like MAX, AVG • Focus more on network monitoring and maintenance • [Przydatek et al. 2003] • Secure aggregation

Contributions • Propose Q-Digest for Approximate Aggregation • Provide Strict Theoretical Guarantees on the Approximation Quality of the Queries in Terms of the Message Size • Evaluate the Performance of Q-Digest in Simulation

Roadmap • Motivations, State of the Art, Contributions • The Q-Digest Scheme • Queries on Q-Digest • Experimental Evaluation • Conclusions and Discussions

Properties of Q-Digest • Each node v in tree T is a bucket; • Its range [v.min, v.max] defines the position and width of the bucket; • It has a counter count(v); • Given the compression parameter K and n readings in total, a node v is in the q-digest iff it satisfies: • (1) Unless v is a leaf, it has no high count: count(v) ≤ ⌊n/K⌋; • (2) Unless v is the root, v together with its parent and sibling (equivalently, a node and its two children) should not have a low count: count(v) + count(parent) + count(sibling) > ⌊n/K⌋; • A q-digest is a set of buckets of different sizes and their associated counts;

Building a Q-Digest • Go bottom up, checking whether any node violates digest property (2) • If so, delete the node and its sibling, and merge their counts into the parent; • Key feature of q-digest: detailed information about frequently occurring data values is preserved in the digest, while less frequently occurring values are lumped into larger buckets, resulting in information loss.
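The bottom-up pass can be sketched in Python as follows. This is a minimal sketch, not the presenter's code: the heap-style node numbering (root = 1, children of v are 2v and 2v+1), the name `compress`, and the toy readings are all assumptions made for illustration, and the value range size `sigma` is assumed to be a power of two.

```python
def compress(counts, n, k, sigma):
    """Bottom-up compression pass over a q-digest tree (illustrative sketch).

    counts: dict mapping node id -> count, using an assumed heap-style
            numbering: root = 1, children of v are 2v and 2v+1,
            leaves are sigma .. 2*sigma-1 (leaf for value x is sigma + x - 1).
    n:      total number of readings summarized.
    k:      compression parameter.
    sigma:  size of the value range [1, sigma], assumed a power of two >= 2.
    """
    threshold = n // k
    digest = {v: c for v, c in counts.items() if c > 0}
    level_start = sigma  # first node id on the current level
    while level_start > 1:
        # Visit each sibling pair (left child has an even id) on this level.
        for left in range(level_start, 2 * level_start, 2):
            right, parent = left + 1, left // 2
            total = (digest.get(left, 0) + digest.get(right, 0)
                     + digest.get(parent, 0))
            # Property (2) is violated: fold both children into the parent.
            if 0 < total <= threshold:
                digest[parent] = total
                digest.pop(left, None)
                digest.pop(right, None)
        level_start //= 2
    return digest


# Hypothetical toy input: 15 readings over the value range [1, 8].
readings = [1, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 6, 7, 8]
sigma = 8
leaf_counts = {}
for x in readings:
    leaf = sigma + x - 1
    leaf_counts[leaf] = leaf_counts.get(leaf, 0) + 1

digest = compress(leaf_counts, n=len(readings), k=5, sigma=sigma)
```

In this toy run the frequent values 3 and 4 keep their own leaf buckets, while the rare values end up in wider buckets higher in the tree.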

Merging Q-Digest • The parent node merges Q1(n1,K) and Q2(n2,K) from its children • How about merging Q1(n1,K1) and Q2(n2,K2)? • Each node has different communication ability • Each node has different power level • A powerful node can have a bigger K while a less powerful node can have a smaller K. • Can we still get the same accuracy? Is that feasible?
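For the same-K case, a merge can be sketched as: add the counts of matching buckets, then rerun the bottom-up compression with n = n1 + n2. The sketch below assumes heap-style node ids (root = 1, children 2v and 2v+1) and a power-of-two range size `sigma`; the function names and the two input digests are hypothetical. It does not address the different-K question raised on the slide.

```python
def compress(counts, n, k, sigma):
    """Bottom-up compression with heap-style node ids (assumed numbering)."""
    threshold = n // k
    digest = {v: c for v, c in counts.items() if c > 0}
    level_start = sigma
    while level_start > 1:
        for left in range(level_start, 2 * level_start, 2):
            right, parent = left + 1, left // 2
            total = (digest.get(left, 0) + digest.get(right, 0)
                     + digest.get(parent, 0))
            if 0 < total <= threshold:
                digest[parent] = total
                digest.pop(left, None)
                digest.pop(right, None)
        level_start //= 2
    return digest


def merge(q1, n1, q2, n2, k, sigma):
    """Merge two q-digests built with the same k over the same value range."""
    union = dict(q1)
    for node, count in q2.items():
        union[node] = union.get(node, 0) + count  # add counts bucket by bucket
    n = n1 + n2
    return compress(union, n, k, sigma), n


# Hypothetical digests received from two children (sigma = 8, k = 5).
q1 = {1: 1, 6: 2, 7: 2, 10: 4, 11: 6}   # summarizes 15 readings
q2 = {1: 1, 12: 3, 13: 2}               # summarizes 6 readings
merged, n = merge(q1, 15, q2, 6, k=5, sigma=8)
```

Because the threshold ⌊n/K⌋ grows with the combined n, buckets that were acceptable in a child digest may be folded into their parent after the merge, as happens to nodes 6 and 7 here.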

Space Complexity and Error Bound (1/4) • What does 3K mean? 3K bytes? • 3K means 3K <nodeID(v), count(v)> pairs • What about the root node, which does not satisfy property (2)?

Space Complexity and Error Bound (2/4) What about the leaf node, which does not satisfy property (1)? It doesn’t matter, because a leaf node is not the ancestor of any node.

Space Complexity and Error Bound (3/4)

Space Complexity and Error Bound (4/4)

Representation of a Q-Digest • To transmit the q-digest, we send a set of tuples of the form <nodeID(v), count(v)>, which requires a total of log(2σ) + log n bits for each tuple, where σ is the size of the value range and n is the number of readings.
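A small sketch of what such a tuple encoding could look like. It assumes nodes are numbered in heap order (root = 1, children 2v and 2v+1, so ids run from 1 to 2σ-1), that the value range is [1, σ] with σ a power of two, and that ids and counts are sent as fixed-width integers. The helper names `node_range` and `tuple_bits` are hypothetical.

```python
def node_range(node_id, sigma):
    """Recover the bucket range [min, max] from a heap-style node id."""
    depth = node_id.bit_length() - 1      # root is at depth 0
    width = sigma >> depth                # bucket width halves per level
    index = node_id - (1 << depth)        # position within its level
    lo = index * width + 1
    return lo, lo + width - 1


def tuple_bits(sigma, n):
    """Bits for one <nodeID, count> tuple under a fixed-width encoding."""
    id_bits = (2 * sigma - 1).bit_length()   # ids range over 1 .. 2*sigma-1
    count_bits = n.bit_length()              # counts range over 0 .. n
    return id_bits + count_bits


root = node_range(1, 8)       # the root covers the whole range
inner = node_range(6, 8)      # an internal node two levels down
leaf = node_range(10, 8)      # a leaf covers a single value
bits = tuple_bits(8, 15)
```

With this numbering the receiver needs only the id to reconstruct a bucket's position and width, so no explicit range has to be transmitted.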

Quantile Query (1/3) • Quantile query: • Given a fraction 0<q<1, find the value whose rank in the sorted sequence of the n values is qn. • Answering the query: • Sort the nodes in the q-digest by increasing v.max, breaking ties by putting smaller ranges first; • Scan the sorted list, adding up the counts of the nodes; • At some node v the sum exceeds qn; v.max is reported as the estimate of the quantile;
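The sort-and-scan procedure can be sketched as below. The heap-style node ids (root = 1, children 2v and 2v+1), the value range [1, σ] with σ a power of two, the function names, and the toy digest are assumptions made for this illustration.

```python
def node_range(node_id, sigma):
    """Bucket range [min, max] for a heap-style node id (assumed numbering)."""
    depth = node_id.bit_length() - 1
    width = sigma >> depth
    lo = (node_id - (1 << depth)) * width + 1
    return lo, lo + width - 1


def quantile(digest, sigma, n, q):
    """Estimate the q-quantile (0 < q < 1) from a q-digest."""
    def sort_key(node_id):
        lo, hi = node_range(node_id, sigma)
        return (hi, hi - lo)  # increasing max; smaller ranges first on ties

    running = 0
    hi = None
    for node_id in sorted(digest, key=sort_key):
        running += digest[node_id]
        lo, hi = node_range(node_id, sigma)
        if running > q * n:
            return hi         # v.max of the first node that pushes the sum past qn
    return hi                 # fallback: largest max seen (None for an empty digest)


# Hypothetical digest over [1, 8] summarizing n = 15 readings.
digest = {1: 1, 6: 2, 7: 2, 10: 4, 11: 6}
median = quantile(digest, sigma=8, n=15, q=0.5)
p90 = quantile(digest, sigma=8, n=15, q=0.9)
```

For the median, the scan visits the buckets for values 3 and 4 first; their counts (4, then 10 in total) pass qn = 7.5 at the second bucket, so 4 is reported.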

Quantile Query (2/3) • The confidence factor • Why do we need it? • The theoretical error bound is a worst-case estimate, which is reached only for very pathological inputs • What is it? • The confidence factor is defined as: (maximum weight of any path from root to leaf in Q)/n
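A sketch of how the confidence factor could be computed from a digest. It assumes heap-style node ids (root = 1, parent of v is v // 2) and uses a hypothetical toy digest. Since counts are non-negative, the heaviest root-to-leaf path is found by walking up from each stored node to the root and summing the counts met along the way.

```python
from fractions import Fraction


def confidence_factor(digest, n):
    """(Maximum weight of any root-to-leaf path in the digest) / n."""
    heaviest = 0
    for node_id in digest:
        weight = 0
        v = node_id
        while v >= 1:                 # walk from the node up to the root
            weight += digest.get(v, 0)
            v //= 2
        heaviest = max(heaviest, weight)
    return Fraction(heaviest, n)


# Hypothetical digest summarizing n = 15 readings.
digest = {1: 1, 6: 2, 7: 2, 10: 4, 11: 6}
theta = confidence_factor(digest, n=15)
```

Here the heaviest path runs from the root (count 1) down to node 11 (count 6), giving a weight of 7 out of 15 readings.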

Confidence Factor Example • N=15, K=5, σ=8 • Values shown in the figure (apparently the root-to-leaf path weights): 1, 1, 5, 7, 3, 3, 3, 3 • (maximum weight of any path from root to leaf in Q)/n = 7/15 ≤ 3·log₂ 8 / (3K) = (3·3)/(3·5) = 9/15

Performance Evaluation • Settings • Routing tree • Breadth-first search tree • Sensor field • 1000 x 1000 area with 1000 sensor nodes • 2000 x 2000 area with 4000 sensor nodes • Sensor value • Random • Correlated: • United States Geological Survey • Compare with List scheme: • List: Report all (value, count) pairs back to the base station; no in-network aggregation;

Error and Message Size • A message size of 160 bytes gives about 5% error • A message size of 400 bytes gives about 2% error

Total Data Transmission • Q-digest transmits less data than List • Random input needs more transmission than correlated data

Residual Power • For every byte transmitted, one unit of power out of 40000 units is depleted. • (How about reception?) • In List, 0.02% of nodes have a residual power fraction less than ½. • (???)

Conclusions • Propose Q-Digest for Approximate Aggregation • Provide Strict Theoretical Guarantees on the Approximation Quality of the Queries in Terms of the Message Size • Evaluate the Performance of Q-Digest in Simulation

Thank you!
