570 likes | 661 Views
CPSC 689: Discrete Algorithms for Mobile and Wireless Systems. Spring 2009 Prof. Jennifer Welch. Lecture 33. Topic: Data Aggregation in Sensor Networks Sources: Nath, Gibbons, Seshan & Anderson Shrivastava, Buragohain, Agrawal & Suri. Aggregation Problem.
E N D
CPSC 689: Discrete Algorithms for Mobile and Wireless Systems Spring 2009 Prof. Jennifer Welch
Lecture 33 • Topic: • Data Aggregation in Sensor Networks • Sources: • Nath, Gibbons, Seshan & Anderson • Shrivastava, Buragohain, Agrawal & Suri Discrete Algs for Mobile Wireless Sys
Aggregation Problem • How to compute the answer to a query in a sensor network that requires aggregating data from all (or many) sensors? • Example: Suppose the nodes take temperature readings and queries ask for min/max/average temperature • Data has to flow through the network to the node that issues the query • In some cases, data can be aggregated on the way • save bandwidth and energy • Example: to find max temp., each node propagates largest temp. it has learned about Discrete Algs for Mobile Wireless Sys
Communication to Support Aggregation • Need to propagate sensor readings in some orderly way • Example: send data over a spanning tree rooted at the querying node • not robust: link or node failure will partition the tree, lose contact with sensors in subtree • Prefer to use multipath routing (message is sent on several paths) • redundancy provides more resilience • But duplication causes problems for aggregation • OK for max, but what about average? Discrete Algs for Mobile Wireless Sys
Overview of Algorithm • Provides framework for synopses of the data to be sent over multiple paths and then reconstructing correct answer • Phase 1: aggregate query is flooded through the network and an aggregation topology is constructed • Phase 2: aggregate values are continually routed toward the querying node: • each node converts its sensor data to a synopsis (SG function) • nodes merge two synopses into one (SF function) • querying node converts synopsis back to final answer (SE function) Discrete Algs for Mobile Wireless Sys
Specific Aggregation Topology • Rings: • kind of like levels in breadth-first search • Nodes are partitioned into rings during Phase 1: • querying node q is in ring 0 • a node is in ring i if it receives the query first from a node in ring i–1 • Phase 2 is divided into epochs, one aggregate answer per epoch • each node in outer ring (farthest distance from q) computes s := SG(r), where r is its sensor reading, and broadcasts s • each node in ring i computes s := SG(r), where r is its sensor reading, and updates s := SF(s,s'), where s' is each synopsis received from a neighbor, then broadcasts s • querying node computes SE(s) • Synchronous algorithm Discrete Algs for Mobile Wireless Sys
Figure R2 R0 R1 q A C B Discrete Algs for Mobile Wireless Sys
Analysis of Framework • Complexity: each node broadcasts once per epoch • Same as spanning-tree-based approach • More resilient than spanning-tree-based approach Discrete Algs for Mobile Wireless Sys
The Functions • What should SG, SF, and SE be in order to give the "correct" answer? • First, give a condition on the functions that is intuitive • Then show there are 4 simple checks that can be done on proposed functions • These conditions are necessary and sufficient to preserve correctness Discrete Algs for Mobile Wireless Sys
ODI-Correctness • Final result should be independent of how the data was routed to querier: • same no matter in which order the readings are combined and how many times they are included (duplicated) during the routing • Sensor reading r : <measurement, metadata> • assumed to be unique • Suppose we have SG, SF and SE • Define synopsis label SL(s) = {r} if s = SG(r ) and SL(s) = SL(s1) Ums SL(s2) if s = SF(s1,s2) Discrete Algs for Mobile Wireless Sys
ODI-Correctness (cont'd) • What constitutes a "duplicate" depends on what is being computed • Ex: average temp vs. number of distinct temps • q : multiset of sensor readings set of (unique) values • q(SL(s)) = set of unique values in all the sensor readings that formed the synopsis Discrete Algs for Mobile Wireless Sys
ODI-Correctness Definition • Let {v1,…,vk} be set of values in the label of s, i.e., q(SL(s)). • Then s must be same as computation on "canonical left-deep tree": • s := SG(v1) • for i = 2 to k do • s := SF(s,SG(vi)) • I.e., regardless of redundancy caused by multipath routing, the final synopsis is the same as if each distinct value is included just once Discrete Algs for Mobile Wireless Sys
s s r5 SF SF SF SF SF SF SF SF SF SF SF r4 SG SG SG SG SG SG SG SG SG SG r3 r2 r1 r5 r3 r1 r4 r2 ODI-Correctness Figure Canonical left-deep tree Aggregation DAG Discrete Algs for Mobile Wireless Sys
A Simple Test for ODI-Correctness • duplicate preservation: q({r1}) = q({r2}) SG(r1) = SG(r2) • if two readings are considered duplicates, then the same synopsis is generated • commutativity: SF(s1,s2) = SF(s2,s1) • associativity: SF(s1,SF(s2,s3)) = SF(SF(s1,s2),s3) • idempotence: SF(s,s) = s Discrete Algs for Mobile Wireless Sys
More About the Conditions • Theorem: The previous 4 conditions are necessary and sufficient for the SG and SF functions to ensure ODI-correctness. • Proof Sketch: • sufficiency: If SG and SF satisfy the 4 conditions, then show that any computation DAG can be transformed into a canonical left-deep binary tree that produces the same output • necessity: Argue that the 4 conditions follow from the definition of ODI-correctness. Discrete Algs for Mobile Wireless Sys
Count Example • Query: What is the (approximate) total number of sensor nodes in the network? • Synopsis: a bit vector of length k > log N, where N is an upper bound on the number of nodes • N could be original number of nodes deployed, or some function of the size of the id space Discrete Algs for Mobile Wireless Sys
SG for Count Example • No sensor is actually read for this example. • Let SG return vector s[1..k], where • a certain entry is 1 • rest of the entries are 0 • How to decide which entry should be 1: • entry CT(k), where CT(k) is a random variable that returns value i with probability 1/2i, 1 ≤ i < k. • How to compute CT(k): • Toss a fair coin until either the first head occurs or k coin tosses have occurred with no heads; return number of tosses Discrete Algs for Mobile Wireless Sys
Computation of CT(k) • Why does the coin-tossing protocol give the desired random variable? • Proof by Example: Suppose k = 4. • First toss is H, and 1 is returned, with probability 1/2 • Otherwise, second toss is H, and 2 is returned, with probability 1/4 • Otherwise third toss is H and 3 is returned, with probability 1/8 • (and then 4 is returned with probability 1/8, but the definition of CT(4) only cares about 1 through 3) Discrete Algs for Mobile Wireless Sys
SF and SE for Count Example • SF(s,s'): • s[i] := s[i] OR s'[i], 1 ≤ i ≤ k • return s • SE(s): • return 2i-1/.77351, where i is the minimum index such that s[i] = 0 Discrete Algs for Mobile Wireless Sys
Intuition for Count Synopsis Functions • Suppose all (live) sensors have a failure-free path to the querier. • The final bit vector to which SE is applied indicates which bit positions have been set by at least one node • The probability of n nodes failing to set the i-th bit is (1–2i)n by definition of SG • Thus the number of (live) nodes is proportional to 2i–1 • constant of proportionality is 1/.77351 Discrete Algs for Mobile Wireless Sys
Intuition for Count Synopsis Functions • Alternatively… • We expect half the nodes to set the 1st bit, a quarter of the nodes to set the 2nd bit, an eighth of the nodes to set the 3rd bit, etc. • If there are n distinct nodes, then we might expect log n bits to be set • I.e., if log n = i bits are set, then we might expect there to be about n = 2i nodes Discrete Algs for Mobile Wireless Sys
Count Algorithm is ODI-Correct • Note that ODI-correctness says nothing about the SE function, only that SE will return the same result as in the canonical tree. • "Clever algorithms are still required to get provably good approximations, although the task has been simplified…" • Commutativity, associativity, and idempotence follow from properties of Boolean OR Discrete Algs for Mobile Wireless Sys
Count Algorithm is ODI-Correct • Why does SG preserve duplicates? • Assume each node calls SG only once. • Show that if sensor readings are considered duplicates, then the synopsis generated by SG is the same. • Since there is no actual sensor reading for this algorithm, we just use ids for the readings. • Assumption that each node calls SG only once ensures the property. Discrete Algs for Mobile Wireless Sys
Implicit Acknowledgments • When a node broadcasts a synopsis, avoid overhead of explicit acknowledgments from receivers this way: • node u broadcasts its synopsis • node u snoops (listens to) subsequent broadcasts by its parent nodes (nodes closer to the querying node) • if the synopsis broadcast by a parent "effectively includes" u's synopsis, u does not need to rebroadcast, otherwise rebroadcast (or adapt the topology) Discrete Algs for Mobile Wireless Sys
Implicit Acknowledgments (cont'd) • How can u accurately infer if its broadcasts was "effectively included"? • Suppose u's synopsis was x and the parent's was z. • If SF(x,z) = z, then x is effectively included. • Why? Since SF is commutative, associative, and idempotent, it is a "semi-lattice". • in a semi-lattice, every 2 elements x and y have a least upper bound z, and SF(x,z) = z = SF(y,z) • Count example: check if appropriate bits are set Discrete Algs for Mobile Wireless Sys
Error Bounds of Approximate Answers • Sources of error: • communication error: some nodes have no failure-free propagation path to querier • approximation error: introduced by SG, SF and SE functions. • defined as relative error of computed answer w.r.t. exact algorithm using the same readings • Argue that communication error can be made negligible by deploying sensor nodes sufficiently densely Discrete Algs for Mobile Wireless Sys
Error Bounds of Approximate Answers (cont'd) • Approximation error analysis for the centralized data stream model work in this model, since synposis is ODI-correct • canonical left-deep tree corresponds to processing a data stream of sensor readings in a centralized location • Thus, e.g., Count algorithm has same approximation error guarantees as computed by Flajolet & Martin Discrete Algs for Mobile Wireless Sys
More Examples • Max and Min: easy. • SG is the value, SF takes larger/smaller, SE is identity • Sum: cf. paper by Considine et al. which adapts Count algorithm • Average, Standard deviation, Second Moment: cf. paper by Considine et al. which uses Sum • Count Distinct: modification of Count Discrete Algs for Mobile Wireless Sys
Uniform Sample Example • Compute a uniform sample of a given size K of the values occurring at all nodes in the network • Synopsis: a sample of size K tuples (or fewer initially) • SG: output (val,r,id) where • val is the sensor reading of the node • r is a random number drawn uniformly from [0,1] • id is the node's id • SF(s,s'): list the tuples in s U s' in decreasing order of r-value, and output the first K (or all, if less than K total) • U is set union, removes duplicates • SE(s): output the set of values in the tuples of s Discrete Algs for Mobile Wireless Sys
Uniform Sample Example (cont'd) • SG labels each reading with a random number, thus placing it in a random position in the global ordering of all readings • So taking first K in the ordering gives a uniform sample. • Uniform sample can then be used… Discrete Algs for Mobile Wireless Sys
More Examples • Use uniform samples to compute these aggregates: • k-th statistical moment (k = 1 is the mean) • k-th percentile value (k = 50 is the median) with certain error and probability, by choosing the sample size appropriately (cf. Bar-Yossef et al.) • Compute the k most frequent values (k = 1 is the mode): run an ODI-correct Count algorithm for each value Discrete Algs for Mobile Wireless Sys
Adapting the Topology • If message loss is detected as occurring "too frequently", nodes can adapt the Ring topology • Idea: use a heuristic that tries to assign a node u to a ring so that there are plenty of ndoes in the next ring to forward u's synopsis to the querier • ODI-correct synopses are helpful: • implicit acks are used to detect message loss energy-efficiently • duplicates that occur during the adaptation of the topology are not a problem Discrete Algs for Mobile Wireless Sys
Simulation Results • Extensive! • Synopsis diffusion • reduces answer errors in lossy environments • helps address challenges from correlated node failures • does not use significantly more power • What topology to use? • Adaptive Rings has same overhead as Rings but much better accuracy • Adaptive Rings gets about 90% of the sensor readings most of the time vs. 100% with Flooding, but uses much less power Discrete Algs for Mobile Wireless Sys
Medians and Beyond [SBAS] • Extend beyond min/max/sum the class of queries that can be answered in sensor networks to include • approximate quantiles (including median) • most frequent data values (including consensus) • histogram of data distribution • range queries • Provide strict theoretical guarantees on the approximation quality of the answers in terms of message size Discrete Algs for Mobile Wireless Sys
Comparison with Nath Paper • Some of the same problems are considered • "Medians and Beyond" is concerned with efficiency of message size and its tradeoff with quality of approximation • Nath paper was concerned with handling arbitrary ordering and duplicates • "Medians and Beyond" assumes no duplicates Discrete Algs for Mobile Wireless Sys
Overview • Assume we have a tree rooted at the querying node • To compute Average: each node sends to its parent the sum of thedata values of its descendants and its number of descendants • constant size messages • To compute Median, need to keep track of all distinct values • size of messages, and memory, grows linearly • Trade off memory and bandwidth with accuracy of approximations Discrete Algs for Mobile Wireless Sys
Q-Digests • Assume sensor readings are integers in the range [1,s] • Introduce q-digest data structure to answer quantile queries with • messages of size m • error O((log s)/m) • Users specify message size vs. error tradeoff • q-digest measures maximum error accumulated so far • Once q -digest query is done, use it to compute quantiles, data distribution,… Discrete Algs for Mobile Wireless Sys
More on q-Digest • Compute a compressed view of the complete distribution of values (instead of just a function of the values) • Use this view of the distribution to compute approximations of various functions • Basic idea: Essentially compute a histogram, but • equally large, instead of equally spaced, buckets • buckets can overlap • size of buckets gives accuracy vs. communication tradeoff Discrete Algs for Mobile Wireless Sys
Definition of q-Digest • Group values into variable-sized buckets of almost equal weights • size refers to range • weight refers to number of elements • q-digest consists of a set of buckets • Build a complete binary tree • 1,…,s at the leaves • every tree node is a bucket, its range is all the leaves in its subtree • At any given point, only some of the buckets are being used Discrete Algs for Mobile Wireless Sys
Example 1 data range 1-8 15 data items 5 buckets 2 2 4 6 1 2 3 4 5 6 7 8 Discrete Algs for Mobile Wireless Sys
Definition of q-Digest • Given compression parameter k and number of data items n, a (tree) node v is in the q-digest iff: • count(v) ≤ n/k • node should not have a high count • count(v) + count(parent(v)) + count(sibling(v)) > n/k • if a node and its children have low total count then combine using Compress algorithm • For a leaf node, if count > n/k, then it is in the q-digest • Root only needs to satisfy first condition Discrete Algs for Mobile Wireless Sys
1 2 2 4 6 1 2 3 4 5 6 7 8 Example check that this has k = 5; n/k = 3 Discrete Algs for Mobile Wireless Sys
Centralized Construction of q-Digest • Go through all the tree nodes bottom up • Check which ones satisfy the 2 properties. • If a node v has a child that violates 2nd property then merge v with both its children • Detailed info about values which occur frequently is preserved, while less frequently occurring values are lumped into larger buckets resulting in info loss Discrete Algs for Mobile Wireless Sys
2 2 1 1 4 6 1 1 1 4 6 8 1 2 3 4 5 6 8 7 1 2 3 4 5 6 7 1 1 2 2 2 2 4 6 4 6 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 1
Distributed Construction of q-Digest • Represent a q-digest by numbering the nodes of the digest tree and sending a set of (node id, count) pairs • q-digests move up the spanning tree, being merged as they go. • To merge 2 q-digests: • take their union • add the counts of buckets with the same range • compress the result • Merging can cause information loss. Discrete Algs for Mobile Wireless Sys
Analysis of Q-Digest • Lemma 1: A q-digest with parameter k has size (number of buckets) at most 3k. • because the count of a node and its children can't be too small • Lemma 2: In a q-digest with parameter k, the maximum error in the count of any node is n(log s)/k. • because in the worst case the count of a node can deviate from the actual value by the sum of the counts of its ancestors • Lemma 3: Merging multiple q-digests gives the same error as in Lemma 2. Discrete Algs for Mobile Wireless Sys
Quantile Queries • Problem Statement: Given a fraction q between 0 and 1, find the value whose rank in sorted sequence of the n values is qn. • Median is when q = 1/2 • Relative error is defined to be |r – qn|/n, where r is the true rank of the returned value Discrete Algs for Mobile Wireless Sys
Using Q-Digest to Answer a Quantile Query • Goal: find q-th quantile • Sort the nodes of the q-digest in increasing order of max values (right endpoints); break ties by putting smaller ranges first • this gives post-order traversal of the tree • Scan sorted list and add up the counts • Let v be the first node at which the running sum exceeds qn • Return the max value of node v Discrete Algs for Mobile Wireless Sys
Error Analysis • Answer returned is v.max • There are at least qn values less than or equal to v.max, by choice of v • Error comes from values that are less than v.max but are stored in ancestors of v (these buckets are listed after v) • But this error is at most n(log s)/k • Note that estimate is always at least as great as the eact answer Discrete Algs for Mobile Wireless Sys
Example 1 • Find Median (q = 1/2); recall n = 15 so look for 7.5 • Sorted list is (j,4), (k,6), (f,2), (g,2), (a,1) • Running sums of counts are 4, 10 - done! • Return max value in tree node k, which is 4 • Error is at most sum of counts on path from k to root, which is 1 a a through o are the ids of the digest tree nodes: j = [3:3] k = [4:4] f = [5:6] g = [7:8] a = [1:8] b c 2 2 d e g f 4 6 k m o i j n l h 1 2 3 4 5 6 7 8 Discrete Algs for Mobile Wireless Sys