In-Network Processing: When processing is cheaper than transmitting. Daniel V. Uhlig and Maryam Rahmaniheris
Basic Problem • How to gather interesting data from thousands of motes? • Tens to thousands of motes, each unreliable individually • Goal: collect and analyze data over a long-term, low-energy deployment • Can use the processing power at each mote • Analyze data locally before sharing it
Costs • Transmitting data is expensive compared to CPU cycles • 1 Kb transmitted 100 meters costs roughly the same energy as 3 million CPU instructions • An AA-powered mote can transmit 1 message per day for about two months (assuming no other power draws) • Battery energy density is growing very slowly compared to computation power, storage, etc. • So analyze and process locally, transmitting only what is required
Framework of Problem • Minimize communication • Minimize broadcast/receive time • Minimize message size • Move computation to individual nodes • Nodes pass data in a multi-hop fashion towards a root • Select connectivity so the graph helps with processing • Handle faulty nodes within the network
Example of Problem (MAX) • (Figure: nodes A–F, each with several local readings; every node forwards only the maximum it has seen toward the root, so the root learns the network-wide maximum of 10 without collecting every reading.)
Complications • Max is very simple • What about Count? • Need to avoid double counting due to redundant paths • What about spatial events? • Need to evaluate readings across multiple sensors • Correlation between events • Node failures can lose entire branches of the tree
Design Decisions • Connectivity graph: unstructured, or how to structure it • Diffusion of requests and how to combine data • Maintenance messages vs. query messages • Reliability of results • Load balancing of message traffic and storage • Storage costs at different nodes
TAG: a Tiny Aggregation Service for Ad-Hoc Sensor Networks. S. Madden, M. Franklin, J. Hellerstein, and W. Hong, Intel Research, 2002
TAG • Aggregates values in a low-power, distributed network • Implemented on TinyOS motes • SQL-like language to search for values or sets of values • Simple declarative language • Energy savings • Tree-based methodology • The root node generates requests and disseminates them down to the children
TAG Functions • Three functions compute an aggregate • f (merge function) • Each node runs f to combine partial state records • <z> = f(<x>, <y>) • EX: <SUM1+SUM2, COUNT1+COUNT2> = f(<SUM1, COUNT1>, <SUM2, COUNT2>) • i (initialize function) • Generates a partial state record from a single reading at the lowest level of the tree • EX: <SUM, COUNT> = <value, 1> • e (evaluator function) • The root uses e to generate the final result • RESULT = e(<z>) • EX: AVERAGE = SUM/COUNT • Functions must be preloaded on motes or distributed via software update protocols
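As a concrete illustration of the three functions, here is a minimal Python sketch for AVERAGE, where the partial state record is a <SUM, COUNT> pair. The function and variable names are ours; TAG itself runs as nesC/TinyOS code on the motes.

```python
# Minimal sketch of TAG's three aggregate functions for AVERAGE (illustrative names).
# Partial state record: a (sum, count) pair.

def init(reading):
    """i: turn one sensor reading into a partial state record."""
    return (reading, 1)

def merge(a, b):
    """f: combine two partial state records into one."""
    return (a[0] + b[0], a[1] + b[1])   # sums add, counts add

def evaluate(state):
    """e: turn the root's final partial state record into the answer."""
    total, count = state
    return total / count if count else None

# A leaf initializes, every parent merges its children's records with its own,
# and only the root evaluates:
root = merge(merge(init(7.0), init(5.0)), init(6.0))
print(evaluate(root))   # -> 6.0
```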
TAG • (Figure: tree-based aggregation example; each leaf starts with a partial count of 1, parents merge their children's partial results, and the root obtains the total count of 10.)
TAG Taxonomy • Aggregates have different properties that affect performance • Duplicate insensitive – unaffected by double counting (Max, Min) vs. duplicate sensitive (Count, Average), which restricts the allowable network topologies • Exemplary – return one representative value (Max/Min); sensitive to failure • Summary – a computation over all values (Average); less sensitive to failure
TAG Taxonomy • Distributive – partial states are the same as the final state (Max) • Algebraic – partial states are of fixed size but differ from the final state (Average: Sum, Count) • Holistic – partial states contain all sub-records (Median) • Unique – similar to holistic, but partial records may be smaller than holistic ones • Content Sensitive – the size of partial records depends on the content (Count Distinct)
TAG • Queries are diffused down the tree, then results are collected back up • Each epoch is subdivided into time slots, one per tree level, so a parent listens exactly when its children transmit • Saves energy (radios can idle outside their slots) • Limits the rate of data flow
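A rough sketch of that per-level slot idea, assuming a fixed epoch length and a tree depth known to every node (the constants and helper name are illustrative, not TAG's actual scheduling code):

```python
# Illustrative epoch scheduling: the deepest nodes transmit in the first slot,
# each parent listens during its children's slot, and the root reports last.

EPOCH_MS = 4000   # example epoch length (the real-world TAG demo used 4 s epochs)

def transmit_slot(depth, max_depth, epoch_ms=EPOCH_MS):
    """Return the (start_ms, end_ms) transmit window for a node at the given depth."""
    slot_len = epoch_ms // (max_depth + 1)
    slot_index = max_depth - depth        # depth == max_depth -> slot 0, root -> last slot
    return slot_index * slot_len, (slot_index + 1) * slot_len

# In a 4-level tree, a node at depth 3 transmits in the second slot
# and listens during the first slot, when its depth-4 children report:
print(transmit_slot(depth=3, max_depth=4))   # -> (800, 1600)
```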
TAG Optimizations • Snooping – messages are broadcast, so neighbors can overhear them • Rejoin the tree under a new parent if the old parent fails • Listen to other broadcasts and only transmit if your own value is still needed • For MAX, do not broadcast if a peer has already transmitted a higher value • Hypothesis testing – the root guesses at the answer to suppress traffic from nodes that cannot change it
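A small sketch of the snooping suppression rule for MAX (the function name is ours; the decision rule follows the bullet above):

```python
# Snooping sketch for an exemplary, duplicate-insensitive aggregate (MAX):
# a node stays silent if a sibling has already reported a value at least as large.

def should_transmit_max(my_partial_max, overheard_values):
    """Transmit only if this node's partial MAX could still change the result."""
    return all(my_partial_max > v for v in overheard_values)

print(should_transmit_max(17, [23, 9]))   # False: a peer already sent 23
print(should_transmit_max(42, [23, 9]))   # True: 42 may be the new maximum
```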
TAG Results • Theoretical results for 2,500 nodes • Savings depend on the aggregate function • Duplicate-insensitive and summary aggregates do best • Distributive aggregates help • Holistic aggregates are the worst
TAG Real-World Results • 16-mote network • Count the number of motes in 4-second epochs • No optimizations enabled • TAG's count quality is better largely because of reduced radio contention • The centralized approach used 4,685 messages vs. TAG's 2,330 • Roughly a 50% reduction, but less than the theoretical results predict • Differences stem from the loss model and node placement
Advantages/Disadvantages • Losing a node loses its whole subtree • Maintenance overhead for the structured connectivity graph • Only a single message per node per epoch • Message size may increase at higher-level nodes • The root gets overloaded (does that always matter?) • Epochs give a method for idling nodes • Snooping was not included; timing issues
Synopsis Diffusion for Robust Aggregation in Sensor Networks. S. Nath, P. Gibbons, S. Seshan, and Z. Anderson, Microsoft Research, 2008
Motivation • TAG • Not robust against node or link failure • A single node failure loses the entire subtree's data • Synopsis Diffusion • Exploits the broadcast nature of the wireless medium to enhance reliability • Separates routing from aggregation • The final aggregated value at the sink is independent of the underlying routing topology • Synopsis diffusion can be used on top of any routing structure • The order of evaluation, and the number of times each datum is included in the result, are irrelevant
TAG • (Figure: the same count-over-a-tree example with a failed link; the failed subtree's entire contribution is lost.) • Not robust against node or link failure
Synopsis Diffusion • (Figure: COUNT aggregated over a multi-path topology.) • Multi-path routing • Benefits: robust, energy-efficient • Challenges: duplicate sensitivity, order sensitivity
Contributions • A novel aggregation framework • ODI (order- and duplicate-insensitive) synopses: small-sized digests of the partial results • Bit vectors • Samples • Histograms • Better aggregation topologies • Multi-path routing • Implicit acknowledgment • Adaptive rings • Example aggregates • Performance evaluation
Aggregation • SG: Synopsis Generation, SF: Synopsis Fusion, SE: Synopsis Evaluation • The exact definition of these functions depends on the particular aggregate: • SG(·) takes a sensor reading and generates a synopsis • SF(·,·) takes two synopses and generates a new one • SE(·) translates a synopsis into the final answer
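A sketch of how a node in the aggregation phase uses only these three functions: it generates its own synopsis, fuses in everything it has heard, and forwards the result; only the sink calls SE. The plumbing below is ours; the paper defines just SG/SF/SE per aggregate.

```python
# Generic per-node fusion in the aggregation phase (illustrative plumbing).

def node_partial_result(own_reading, heard_synopses, SG, SF):
    """What a non-sink node broadcasts toward the next ring."""
    s = SG(own_reading)
    for h in heard_synopses:
        s = SF(s, h)          # ODI: order and duplicates must not change the result
    return s

def sink_answer(own_reading, heard_synopses, SG, SF, SE):
    """Only the sink evaluates the fused synopsis into the final answer."""
    return SE(node_partial_result(own_reading, heard_synopses, SG, SF))

# Tiny demo with MAX, where a synopsis is just a number:
print(sink_answer(6, [4, 9, 9], SG=lambda x: x, SF=max, SE=lambda s: s))   # -> 9
```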
Synopsis Diffusion Algorithm • Distribution phase • The aggregate query is flooded • The aggregation topology is constructed • Aggregation phase • Aggregated values are routed toward the sink • The SG() and SF() functions are used to create partial results
Ring Topology • The sink is in ring R0 • A node is in ring Ri if it is i hops away from the sink • Nodes in Ri-1 should hear the broadcasts of nodes in Ri • Only loose synchronization is needed between nodes in different rings • Each node transmits only once • Energy cost is the same as a tree • (Figure: concentric rings R0–R3 around the sink.)
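Ring assignment falls out of the query flood: a node's ring is simply its hop count from the sink. A BFS sketch over a known adjacency map (the topology and names below are made up for illustration):

```python
from collections import deque

def assign_rings(adjacency, sink):
    """Ring i = minimum hop count from the sink, discovered during the query flood."""
    ring = {sink: 0}
    queue = deque([sink])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in ring:
                ring[v] = ring[u] + 1
                queue.append(v)
    return ring

# Tiny example topology: sink S, nodes A/B one hop out, C two hops out.
adjacency = {'S': ['A', 'B'], 'A': ['S', 'C'], 'B': ['S', 'C'], 'C': ['A', 'B']}
print(assign_rings(adjacency, 'S'))   # -> {'S': 0, 'A': 1, 'B': 1, 'C': 2}
```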
Example: COUNT • Coin-tossing experiment CT(x) used in Flajolet and Martin's algorithm: • For i = 1, …, x-1: CT(x) = i with probability 2^(-i); CT(x) = x otherwise • Simulates the behavior of the exponential hash function • Synopsis: a bit vector of length k > log(n) • n is an upper bound on the number of sensor nodes in the network • SG(): a bit vector of length k with only the CT(k)-th bit set • SF(): bit-wise Boolean OR • SE(): let i be the index of the lowest-order 0 in the bit vector; the estimate is 2^(i-1) / 0.77351 (the Flajolet–Martin magic constant)
Example: COUNT (intuition) • (Figure: several per-node bit-vector synopses are OR-ed together into one fused bit vector.) • The number of live sensor nodes, N, is proportional to 2^(i-1), where i is the index of the lowest-order 0 bit • Intuition: the probability that all N nodes fail to set the i-th bit is (1 - 2^(-i))^N, which is approximately 0.37 when N = 2^i and even smaller for larger N
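Putting the COUNT pieces together, a compact Python sketch of CT(), SG(), SF(), and SE(). The bit-vector length and the 0.77351 constant follow the Flajolet–Martin description above; everything else is illustrative.

```python
import random

K = 32          # synopsis length in bits; must exceed log2(upper bound on node count)
PHI = 0.77351   # Flajolet-Martin correction constant

def CT(k):
    """Coin tossing: return i in 1..k-1 with probability 2^-i, otherwise k."""
    for i in range(1, k):
        if random.random() < 0.5:
            return i
    return k

def SG(_reading=None):
    """Synopsis generation: a k-bit vector with only the CT(k)-th bit set.
    Each node calls this once per query; re-merging the same synopsis is harmless."""
    return 1 << (CT(K) - 1)

def SF(s1, s2):
    """Synopsis fusion: bitwise OR, so order and duplicates do not matter."""
    return s1 | s2

def SE(s):
    """Synopsis evaluation: if i is the lowest-order 0 bit, count ~ 2^(i-1) / PHI."""
    i = 1
    while s & (1 << (i - 1)):
        i += 1
    return 2 ** (i - 1) / PHI

# Simulate 1000 live nodes; the estimate is coarse because a single synopsis
# has high variance (several independent synopses can be combined to tighten it).
fused = 0
for _ in range(1000):
    fused = SF(fused, SG())
print(round(SE(fused)))
```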
ODI-Correctness • For any aggregation DAG, the resulting synopsis is identical to the synopsis produced by the canonical left-deep tree • (Figure: an aggregation DAG over readings r1–r5 and the equivalent canonical left-deep tree of SG/SF applications, both yielding the same synopsis s.)
A Simple Test for ODI-Correctness • Theorem: Properties P1-P4 are necessary and sufficient for ODI-correctness • P1: SG() preserves duplicates • If two readings are considered duplicates, the same synopsis is generated • P2: SF() is commutative • SF(s1, s2) = SF(s2, s1) • P3: SF() is associative • SF(s1, SF(s2, s3)) = SF(SF(s1, s2), s3) • P4: SF() is same-synopsis idempotent • SF(s, s) = s
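P2–P4 can be spot-checked mechanically on a candidate synopsis. Here is a toy randomized check against the COUNT synopsis sketched above; passing random trials is evidence, not a proof, and P1 still has to be argued from the definition of SG().

```python
# Toy randomized spot-check of P2-P4, reusing SG and SF from the COUNT sketch above.

def check_odi_properties(SG, SF, trials=1000):
    for _ in range(trials):
        s1, s2, s3 = SG(), SG(), SG()
        assert SF(s1, s2) == SF(s2, s1)                    # P2: commutative
        assert SF(s1, SF(s2, s3)) == SF(SF(s1, s2), s3)    # P3: associative
        assert SF(s1, s1) == s1                            # P4: same-synopsis idempotent
    return True

print(check_odi_properties(SG, SF))   # -> True
```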
More Examples • Uniform Sample of Readings • Synopsis: a sample of size K of <value, random number, sensor id> tuples • SG(): output the tuple <val_u, r_u, id_u>, where r_u is a freshly drawn random number • SF(s, s'): output the K tuples in s ∪ s' with the largest r_i • SE(s): output the set of values val_i in s • Useful for holistic aggregates
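A sketch of the uniform-sample synopsis (the tuple layout follows the description above; K, the helper names, and the deduplication-by-id detail are ours):

```python
import random

K = 5   # desired sample size (illustrative)

def sample_SG(value, sensor_id):
    """One <value, random tag, id> tuple per node; the tag decides who survives fusion."""
    return [(value, random.random(), sensor_id)]

def sample_SF(s1, s2):
    """Keep the K tuples with the largest random tags; copies of a node's tuple collapse."""
    unique = {t[2]: t for t in s1 + s2}       # same id -> identical tuple, kept once
    return sorted(unique.values(), key=lambda t: t[1], reverse=True)[:K]

def sample_SE(s):
    """The surviving values are a uniform sample of the live nodes' readings."""
    return [t[0] for t in s]

# 100 nodes, one reading each; the sink ends up with ~K uniformly chosen readings.
fused = []
for node_id in range(100):
    fused = sample_SF(fused, sample_SG(random.gauss(20.0, 5.0), node_id))
print(sample_SE(fused))
```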
More Examples • Frequent Items (items occurring at least T times) • Synopsis: a set of <val, weight> pairs; the values are unique and the weights are at least log(T) • SG(): compute CT(k), where k > log(n), call this the weight, and output <val, weight> only if the weight is at least log(T) • SF(s, s'): for each distinct value, discard all but the pair <value, weight> with the maximum weight; output the remaining pairs • SE(s): output each <val, weight> pair in s as a frequent value together with an approximate count derived from its weight • Intuition: a value occurring at least T times is expected to have at least one of its calls to CT() return at least log(T), since each call does so with probability p = 1/T
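A sketch of the frequent-items synopsis built on the same CT() coin tossing. The threshold and test data are made up, and SE here reports the surviving values with their raw weights rather than a calibrated approximate count.

```python
import math

T = 64   # report values occurring at least T times (illustrative threshold)

def freq_SG(value, k=K):
    """Emit <value, weight> only if the coin-tossing weight reaches log2(T)."""
    weight = CT(k)                          # CT and K from the COUNT sketch above
    return {value: weight} if weight >= math.log2(T) else {}

def freq_SF(s1, s2):
    """For each distinct value keep only the pair with the maximum weight."""
    fused = dict(s1)
    for value, weight in s2.items():
        fused[value] = max(fused.get(value, 0), weight)
    return fused

def freq_SE(s):
    """Surviving values are reported as (approximately) frequent."""
    return dict(s)

# 500 readings of 'A' and 10 of 'B': 'A' nearly always ends up in the synopsis,
# 'B' only sometimes does.
synopsis = {}
for value in ['A'] * 500 + ['B'] * 10:
    synopsis = freq_SF(synopsis, freq_SG(value))
print(freq_SE(synopsis))
```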
Error Bounds of Approximation • Communication error • Measured as 1 - (percent of nodes contributing to the final answer) • h: height of the aggregation DAG • k: the number of neighbors (potential parents) each node has • p: per-link probability of loss • Overall communication error upper bound: a node's contribution is lost at a hop only if all k parents lose it, giving roughly 1 - (1 - p^k)^h ≈ h·p^k • If p = 0.1 and h = 10, the error is negligible (about 1%) with k = 3 • Approximation error • Introduced by the SG(), SF(), and SE() functions • Theorem 2: any approximation error guarantee provided for the centralized data-stream scenario immediately applies to a synopsis diffusion algorithm, as long as the data-stream synopsis is ODI-correct
Adaptive Rings • Implicit acknowledgement is provided by the ODI synopses: a node can check whether its synopsis was included in an overheard parent's transmission • Retransmission has high energy cost and delay, so instead adapt the topology • When the number of times a node's transmission is included in its parents' transmissions falls below a threshold, reassign the node to a ring where it can have a good number of parents • Assign a node in ring i, with probability p, to: • ring i+1 if • n_i > n_{i-1} • n_{i+1} > n_{i-1} and n_{i+2} > n_i • ring i-1 if • n_{i-2} > n_{i-1} • n_{i-1} < n_{i+1} and n_{i-2} > n_i
Effectiveness of Adaptation • Random placement of sensors in a 20×20 grid with a realistic communication model • (Figure: Rings vs. Adaptive Rings; the solid squares indicate the nodes not accounted for in the final answer.)
Realistic Loss Experiment • The algorithms are implemented in the TAG simulator • 600 sensors deployed randomly in a 20 ft × 20 ft grid • The query node is in the center • Loss probabilities are assigned based on the distance between nodes
Impact of Packet Loss • (Figure: RMS error and percentage of values included, plotted against packet loss rate.)
Synopsis Diffusion • Pros • High reliability and robustness • More accurate answers • Implicit acknowledgment • Dynamic topology adaptation • Only moderately affected by mobility • Cons • Approximation error • Low node density decreases the benefits • Fusion functions must be defined separately for each aggregate • Increased message size
Overall Discussion Points • Is there any benefit in coupling routing with aggregation? • Choosing the paths and finding the optimal aggregation points • Routing the sensed data along a longer path to maximize aggregation • Finding the optimal routing structure, considering the energy cost of links, is NP-complete • Heuristics exist (e.g., Greedy Incremental) • Considering data correlation (spatial and temporal) in the aggregation process • Defining a suppression threshold (e.g., TiNA)
Overall Discussion Points • Could the energy savings gained by aggregation be outweighed by its costs? • Aggregation function cost: storage cost and computation cost (number of CPU cycles) • No mobility is assumed: static aggregation tree • Structure-less or structured? That is the question… • Continuous vs. on-demand queries
Generalizing the Problem to Other Areas • Transmitting large amounts of data over the Internet is slow • Better to process locally and transmit only the interesting parts
Overall Discussion Points • How does query rate affect design decisions? • Load balancing between levels of the tree • The root and main relay nodes get overloaded • How will the video capabilities of the Imote affect aggregation models?