Probabilistic Data Aggregation
Ling Huang, Ben Zhao, Anthony Joseph
Sahara Retreat, January 2004
Motivation
• Data aggregation: computing summary values (e.g., MIN, MAX, COUNT) over data spread across many nodes
• An important function for network infrastructures
• Exact results are not achievable in the face of loss and faults
• Low-overhead, accurate approximation is crucial in:
  • Sensor networks
  • P2P networks
  • Network monitoring and intrusion detection systems
• But it is difficult to achieve
  • Many problems in existing approaches
Background
• Aggregate functions
  • MIN, MAX, AVG, COUNT, etc.
• In-network hierarchical processing to reduce overhead
  • Query propagation
  • Tree construction
  • Aggregate calculation
• Addressing fault tolerance
  • Multiple roots
  • Multiple trees
  • Reliable transmission
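The in-network hierarchical processing above can be sketched in Python. This is a minimal illustration, not the authors' implementation: it assumes the tree is a hypothetical `dict` mapping each node to its children, and it carries a small partial record (minimum, maximum, sum, count) up the tree so that MIN, MAX, COUNT and AVG all come out of one bottom-up pass, with AVG derived as sum divided by count.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical tree representation: node id -> list of child ids.
Tree = Dict[str, List[str]]


@dataclass(frozen=True)
class Partial:
    """Mergeable partial aggregate for one subtree."""
    lo: float
    hi: float
    total: float
    count: int

    @staticmethod
    def of(value: float) -> "Partial":
        # A single reading is a subtree of size one.
        return Partial(value, value, value, 1)

    def merge(self, other: "Partial") -> "Partial":
        return Partial(min(self.lo, other.lo), max(self.hi, other.hi),
                       self.total + other.total, self.count + other.count)

    @property
    def avg(self) -> float:
        # AVG is derived from SUM and COUNT, not merged directly.
        return self.total / self.count


def aggregate(tree: Tree, values: Dict[str, float], node: str) -> Partial:
    """Bottom-up aggregation: merge the node's own reading with the
    partial results returned by each child subtree."""
    result = Partial.of(values[node])
    for child in tree.get(node, []):
        result = result.merge(aggregate(tree, values, child))
    return result


tree = {"root": ["a", "b"], "a": ["c"]}
values = {"root": 5.0, "a": 1.0, "b": 3.0, "c": 4.0}
p = aggregate(tree, values, "root")
print(p.lo, p.hi, p.count, p.avg)  # 1.0 5.0 4 3.25
```

Averaging the children's averages would be wrong whenever subtrees differ in size, which is why the sketch forwards the sum and count instead.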
Problems in Existing Approaches
• Few approaches are designed to handle data loss and corruption
  • Only simple algorithms for data-loss recovery
  • Fragile for large process groups
  • Require all relevant nodes to participate
• Difficult to trade accuracy for communication overhead
  • Many applications need this tradeoff
  • They only need an approximation
  • But they must minimize resource consumption
Our Approach
• Probabilistic data aggregation: a scalable and robust approach
  • Model loss on links and failures on nodes
  • Apply statistical learning theory (SLT) to aggregation
• Develop a protocol that handles loss and failures as an essential part of normal operation
  • Self-repairing algorithm for aggregation tree maintenance
  • Nodes participate in aggregation and communication according to a statistical sampling algorithm
  • In the absence of data, estimate values using a statistical learning algorithm
Design & System Architecture
[Figure: system architecture with components Aggregator, Distribution Estimator, Data Predictor, Sampler, Tree Constructor]
• Building blocks
  • Spanning tree with fault detection and a self-repairing algorithm for tree construction and maintenance
  • Statistical sampling for low overhead and scalability without much loss of accuracy
  • Distribution estimation to provide information for workload analysis, data prediction and outlier detection
  • Data prediction to compensate for data loss caused by sampling, as well as uncontrolled loss on links
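The slides do not say which distribution estimator is used. One lightweight option, assumed here purely for illustration, is Welford's online algorithm: it keeps a per-child running mean and variance in constant memory, so the mean can serve as a fallback prediction and the z-score as a simple outlier signal. The class name `RunningStats` is hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Online mean/variance estimator (Welford's algorithm)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Population variance; reported as 0.0 until two samples exist.
        return self.m2 / self.n if self.n > 1 else 0.0

    def zscore(self, x: float) -> float:
        # Distance of x from the mean in standard deviations (0.0 if no spread).
        sd = math.sqrt(self.variance)
        return 0.0 if sd == 0 else (x - self.mean) / sd


stats = RunningStats()
for reading in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.update(reading)
print(round(stats.mean, 6), round(stats.variance, 6), round(stats.zscore(9), 6))  # 5.0 4.0 2.0
```

Because each update is O(1) and stores no history, a parent can afford one such estimator per child even on constrained sensor nodes.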
Statistical Sampling
• A simple approach: sampling on the aggregation tree
  • Every child node reports the aggregate of its subtree to its parent with a certain probability, which is the design parameter of the algorithm
  • Low control-traffic overhead and easy to implement
  • Might result in high data loss close to the root
• Distribution of sampling rates on the tree
  • Uniform distribution on each level
  • Linear distribution on each level
  • Proportional to the number of nodes in the subtree
  • Value-based sampling
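The concern that sampling causes high data loss close to the root can be quantified with a short calculation. Assuming, for illustration only, that each node's report to its parent is an independent Bernoulli trial with probability q, a reading at depth d reaches the root only if every hop on its path succeeds, so under a uniform rate it survives with probability q^d. The sketch computes the expected fraction of readings reflected at the root for two of the listed rate distributions. The tree encoding, the policy functions and the scaling constant `c` in the subtree-proportional policy are hypothetical choices, and prediction at parents (which would soften this loss) is deliberately left out.

```python
from typing import Callable, Dict, List

Tree = Dict[str, List[str]]  # hypothetical: node id -> child ids
# A policy maps (node, depth, subtree sizes) to that node's report probability.
Policy = Callable[[str, int, Dict[str, int]], float]


def subtree_sizes(tree: Tree, root: str) -> Dict[str, int]:
    sizes: Dict[str, int] = {}

    def visit(node: str) -> int:
        sizes[node] = 1 + sum(visit(c) for c in tree.get(node, []))
        return sizes[node]

    visit(root)
    return sizes


def uniform(q: float) -> Policy:
    # Same report probability for every node.
    return lambda node, depth, sizes: q


def proportional(c: float) -> Policy:
    # Probability grows with subtree size, capped at 1, so nodes that carry
    # many readings (typically those near the root) report more reliably.
    return lambda node, depth, sizes: min(1.0, c * sizes[node])


def expected_coverage(tree: Tree, root: str, policy: Policy) -> float:
    """Expected fraction of nodes whose reading is included at the root."""
    sizes = subtree_sizes(tree, root)

    def visit(node: str, depth: int, path_prob: float) -> float:
        # path_prob: probability that every hop from node up to the root delivered.
        total = path_prob
        for child in tree.get(node, []):
            q = policy(child, depth + 1, sizes)
            total += visit(child, depth + 1, path_prob * q)
        return total

    return visit(root, 0, 1.0) / sizes[root]


chain = {"root": ["a"], "a": ["b"]}
print(expected_coverage(chain, "root", uniform(0.5)))       # (1 + 0.5 + 0.25) / 3
print(expected_coverage(chain, "root", proportional(0.5)))  # (1 + 1 + 0.5) / 3
```

A per-level policy, such as the linear distribution on the slide, would read the `depth` argument instead of the subtree size.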
Prediction Algorithm
• Naive algorithm: use the value from the previous epoch as the current one
• Linear prediction: a linear predictor whose coefficients are chosen by minimum mean square error (MMSE) estimation
• More sophisticated algorithms, such as the Kalman filter, can be used to achieve better predictions
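The slide's own predictor equation is not available here, so the sketch below does not reproduce it. As an assumed stand-in, it uses a first-order linear predictor `x[t] = a * x[t-1] + b` whose coefficients are fitted by least squares over the observed history, the sample counterpart of minimizing the mean squared prediction error; the naive predictor is included for comparison. Function names are hypothetical.

```python
from statistics import fmean
from typing import Sequence


def naive_predict(history: Sequence[float]) -> float:
    """Naive predictor: reuse the previous epoch's value (history must be non-empty)."""
    return history[-1]


def linear_predict(history: Sequence[float]) -> float:
    """First-order linear predictor x[t] ~ a * x[t-1] + b, with a and b chosen
    by least squares (minimum mean squared one-step error on the history)."""
    if len(history) < 3:
        return naive_predict(history)  # too few pairs to fit two coefficients
    xs, ys = history[:-1], history[1:]  # (x[t-1], x[t]) pairs
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return naive_predict(history)  # constant inputs: slope is undefined
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    b = my - a * mx
    return a * history[-1] + b


readings = [1.0, 2.0, 3.0, 4.0, 5.0]
print(naive_predict(readings), linear_predict(readings))  # 5.0 6.0
```

On a steady trend the naive rule lags one step behind, while the fitted predictor extrapolates it. Swapping in a higher-order fit or a Kalman filter only changes the function body; callers just need a mapping from history to a value.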
The Protocol
• Tree construction and query propagation start from the root of the query
• Aggregates are computed bottom-up in each epoch
  • When a node receives data from a child, it updates the distribution statistics using the distribution estimator
  • If a node receives data from all its children in the epoch, it performs normal data aggregation
  • If a node does not receive data from a child by the end of the epoch, it predicts a value for that child and then performs the aggregation
• Aggregates are reported from children to parents with a certain probability
• If necessary, a node may perform outlier detection on the data from a child. However:
  • It is very dangerous to discard data
  • Assuming neighboring nodes have physical locality, a parent can use both temporal and spatial statistics for outlier detection
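One epoch at a single node can be sketched as follows, under several simplifying assumptions not stated on the slide: the aggregate is SUM, each child's past values are kept in a plain list, the predictor is the naive last-value rule, and outliers are only flagged (never dropped, in line with the warning against discarding data) when they fall more than `k` standard deviations from that child's history. Only the temporal half of the outlier check is shown. `NodeState`, `end_epoch` and `k` are hypothetical names.

```python
import random
from statistics import fmean, pstdev
from typing import Dict, List, Tuple


class NodeState:
    """Hypothetical per-node state for the epoch-based SUM protocol."""

    def __init__(self, children: List[str], report_prob: float, seed: int = 0):
        self.history: Dict[str, List[float]] = {c: [] for c in children}
        self.report_prob = report_prob
        self.rng = random.Random(seed)

    def predict(self, child: str) -> float:
        # Naive predictor: last value received from this child (0.0 if none yet).
        h = self.history[child]
        return h[-1] if h else 0.0

    def is_outlier(self, child: str, value: float, k: float = 3.0) -> bool:
        # Temporal check only: compare against this child's own past values.
        h = self.history[child]
        if len(h) < 2:
            return False
        sd = pstdev(h)
        return sd > 0 and abs(value - fmean(h)) > k * sd

    def end_epoch(self, received: Dict[str, float],
                  own_value: float) -> Tuple[float, bool, List[str]]:
        """Return (subtree SUM, whether to report upward, flagged children)."""
        total = own_value
        flagged: List[str] = []
        for child, h in self.history.items():
            if child in received:
                value = received[child]
                if self.is_outlier(child, value):
                    flagged.append(child)  # flagged, but still aggregated
                h.append(value)            # update this child's statistics
            else:
                value = self.predict(child)  # missing report: estimate it
            total += value
        # Report to the parent only with probability report_prob.
        report = self.rng.random() < self.report_prob
        return total, report, flagged


node = NodeState(["a", "b"], report_prob=1.0)
print(node.end_epoch({"a": 1.0, "b": 2.0}, own_value=10.0))  # (13.0, True, [])
print(node.end_epoch({"a": 3.0}, own_value=10.0))            # (15.0, True, [])
```

In the second epoch child `b` is silent, so its last known value stands in for it and the subtree sum stays plausible instead of dropping by 2.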
Future Work
• Integrated optimization combining tree construction with statistical learning theory
• Sampling on the graph before tree construction
• Non-linear estimation algorithms for data prediction
• Evaluation of outlier detection in data aggregation
• System implementation
• System deployment and evaluation in a real environment