DIMENSIONS: Why do we need a new Data Handling architecture for sensor networks?
Deepak Ganesan, Deborah Estrin (UCLA), John Heidemann (USC/ISI)
Presenter: Vijay Sundaram
Deployment: Microclimate monitoring at James Reserve (UC Riverside)
Example queries a weather sensor network should support:
• How well does the data fit model <M> of the variation of temperature with altitude?
• Send a robotic agent to the edge between the low- and high-precipitation regions.
• "Hmm… I wonder why packet loss is so high." Get a connectivity map of the network for all transmit power settings.
• Get detailed data from the node with maximum precipitation from Sept to Dec 2003.
Goals
• Flexible spatio-temporal querying
  • Provide the ability to mine data for interesting patterns and features.
  • Drill down on details.
• Distributed, long-term networked data storage
  • Preserve the ability for long-term data mining while catering to node storage constraints.
• Performance
  • Reasonable accuracy for a wide range of queries.
  • Low communication (energy) overhead.
How can we achieve these goals?
• Exploit redundancy in data
  • Potentially huge gains from lossy compression that exploits spatio-temporal correlation.
• Exploit the rarity of interesting features
  • Preserve only the interesting features.
• Exploit the scale of the sensor network
  • Large aggregate distributed storage, despite limited local storage.
• Exploit the low cost of approximate query processing
  • Allow approximate queries that obtain sufficiently accurate responses.
Data Correlation vs. Decentralization
[Chart: systems plotted by degree of decentralization (centralized → hierarchical → fully distributed) against the data correlation exploited (none → temporal → spatial). Geo-spatial data mining and streaming media (MPEG-2) exploit spatio-temporal correlation but rely on centralized data collection; P2P systems (DHTs, Gnutella) and web caches are decentralized but exploit no data correlation. Wireless sensor networks need both.]
Can existing systems satisfy the design goals?
DIMENSIONS Design: Key Ideas
• Construct a hierarchy of lossy compressed summaries of the data using wavelet compression.
• Queries "drill down" from the root of the hierarchy to focus the search on small portions of the network.
• Progressively age lossy data along the spatio-temporal hierarchy to enable long-term storage.
[Figure: summaries become progressively lossier, and are progressively aged, from level 0 up to level 2 of the hierarchy.]
Roadmap
• Why wavelets?
• Example: precipitation hierarchy
• Spatial and temporal processing internals
• Initial results: precipitation dataset
Enabling Technique: Wavelets
• A very popular signal-processing approach that provides good time and frequency localization.
• Used in JPEG2000 and geo-spatial data mining.
• Preserves spatio-temporal features (edges, discontinuities) while providing a good approximation of long-term trends in the data.
• An efficient distributed implementation is possible.
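To make the idea concrete, here is a minimal sketch of one level of the Haar transform, the simplest wavelet filter mentioned later in the talk. The function names are illustrative, not from the DIMENSIONS implementation; the averages capture the long-term trend while the differences capture local features.

```python
def haar_decompose(signal):
    """One level of the Haar wavelet transform.

    Splits an even-length signal into coarse pairwise averages
    (approximation, the trend) and pairwise differences
    (detail coefficients, the local features/edges).
    """
    assert len(signal) % 2 == 0
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_reconstruct(approx, detail):
    """Invert one level of the Haar transform exactly."""
    signal = []
    for a, d in zip(approx, detail):
        signal.extend([a + d, a - d])
    return signal
```

Dropping small detail coefficients before transmission is exactly the lossy step: the approximation still summarizes the trend, while large retained details preserve discontinuities.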
Sample Architecture: Precipitation Hierarchy
Example query: What is the maximum precipitation between Sept and Dec 2002?
• Local processing: construct a lossy time-series summary (zero communication cost).
• Spatial data processing: hierarchical lossy compression.
  • Organize the network into a hierarchy; at each higher level, reduce the number of participating nodes by a factor of 4.
  • At each step of the hierarchy, summarize the wavelet coefficients from the 4 quadrants and propagate them upward; spatial and temporal resolution decrease up the hierarchy.
• Query processing: direct the query to the quadrant whose summary best matches the query.
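The drill-down step can be sketched as follows. The tree layout (`summary`, `children`) and the scoring callback are assumptions for illustration; in DIMENSIONS the "score" would come from comparing the query against each quadrant's wavelet summary.

```python
def drill_down(node, score, max_depth):
    """Route a query down the summary hierarchy.

    `node` is a dict with a 'summary' and, for interior nodes, a
    list of four 'children' (one per quadrant). `score` ranks how
    well a summary matches the query (higher is better). At each
    level the query is forwarded only to the best-matching quadrant,
    so only a small fraction of the network is ever visited.
    """
    path = [node]
    while node.get("children") and len(path) <= max_depth:
        node = max(node["children"], key=lambda c: score(c["summary"]))
        path.append(node)
    return path  # visited nodes; the leaf holds the finest-grained data
```

For a max-precipitation query the score is simply the summarized maximum, so the query descends toward the wettest quadrant at every level.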
Spatial Decomposition
• Recursively split the network into non-overlapping square grids.
• At each level of the hierarchy:
  • Elect a clusterhead.
  • The clusterhead combines and summarizes data from the 4 quadrants.
  • The clusterhead propagates the compressed data to the next level of the hierarchy.
• Routing protocol: GPSR variant (DCS, Ratnasamy et al.)
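A minimal sketch of the recursive grid decomposition, assuming a square deployment area. The clusterhead election rule here (smallest node position per cell) is a deliberately simple stand-in for the GPSR/DCS-based election the slide refers to.

```python
def grid_cell(x, y, level, size):
    """Return the (row, col) of the non-overlapping square cell
    containing point (x, y) at a given hierarchy level.
    Level 0 is one cell covering the whole deployment of edge
    `size`; each deeper level splits every cell into a 2x2 grid."""
    cells = 2 ** level
    edge = size / cells
    return int(y // edge), int(x // edge)

def clusterheads(nodes, level, size):
    """Group nodes by cell and elect one clusterhead per cell.
    Election rule (lexicographically smallest position) is a
    placeholder for the routing-protocol-based election."""
    heads = {}
    for x, y in nodes:
        c = grid_cell(x, y, level, size)
        if c not in heads or (x, y) < heads[c]:
            heads[c] = (x, y)
    return heads
```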
Wavelet Compression Internals
Pipeline: input data (x, y, time) → wavelet subband decomposition (filter) → thresholding + quantization + subband dropping → lossless encoder → compressed output.
• Cost metric: communication budget, or error bound.
• Filters: Haar filter, or Daubechies 9/7 filter.
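The middle, lossy stage of the pipeline can be sketched as below. The parameter names (`threshold`, `step`) are illustrative; the slide notes they were hand-picked in the initial experiments.

```python
def compress_coeffs(coeffs, threshold, step):
    """Threshold, then uniformly quantize, wavelet coefficients.

    Coefficients below `threshold` in magnitude are dropped (set to
    zero); the rest are mapped to integer bins of width `step`. The
    resulting integer stream, dominated by zeros, is what a lossless
    encoder (e.g. Huffman) then compresses very effectively.
    """
    out = []
    for c in coeffs:
        out.append(0 if abs(c) < threshold else round(c / step))
    return out

def dequantize(qcoeffs, step):
    """Approximate reconstruction of the coefficients at the receiver."""
    return [q * step for q in qcoeffs]
```

Tightening the threshold or widening the quantization step trades accuracy for communication, which is how the cost metric (budget or error bound) steers the pipeline.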
Initial Results with Precipitation Dataset: Communication Overhead
• 15x12 grid (50 km edge) of precipitation data from 1949–1994 for the Pacific Northwest†; gridded before processing.
• Hand-picked choice of threshold, quantization intervals, and subbands to drop; Huffman encoder at the output.
• Very large compression ratios up the hierarchy.
†M. Widmann and C. Bretherton. 50 km resolution daily precipitation for the Pacific Northwest, 1949–94.
Query: Find the maximum annual precipitation for each year.
• Exact answer for 89% of queries; within 90% of the exact answer for >95% of queries.
• Queries require less than 3% of the network.
• Good performance on average with very low lookup overhead.
Query: Locate the boundary in annual precipitation between low- and high-precipitation areas.
• Error metric: number of nodes more than 1 pixel from the drill-down boundary.
• Accuracy: within 25% error for 93% of the queries (or within 13% error for 75% of the queries).
• Less than 5% of the network queried.
Open Issues
• Load balancing and robustness
  • Hierarchical model vs. peer model: much prior work in P2P systems.
• Irregular node placement
  • Use wavelet extensions for irregular node placement (computationally more expensive), or
  • Grid the dataset with interpolation.
• Providing query guarantees
  • Can we bound the error in the response obtained for a drill-down query at a particular level of the hierarchy?
• Implementation on an iPAQ/mote network
Summary
• DIMENSIONS provides a holistic data-handling architecture for sensor networks that can:
  • Support a wide range of sensor-network usage and query models (drill-down querying of wavelet summaries).
  • Provide a gracefully degrading lossy storage model (progressively ageing summaries).
  • Offer the ability to trade energy expended against query performance (tunable lossy compression).
Other Examples: Packet Loss
• A different example of a dataset that exhibits spatial correlation:
  • Throughput from one transmitter to proximate receivers is correlated.
  • Throughput from multiple proximate transmitters to one receiver is correlated.
• Typically, what we want to query is the deviation from normal and average throughput.
Packet-Loss Dataset: Get a Throughput vs. Distance Map
• An exact answer involves the expensive transfer of a 12x14 map from each node.
• Good approximate results can be obtained by querying the compressed data instead.
Long-term Storage: Concepts
• Data is progressively aged, both locally and along the hierarchy.
• Summaries that cover larger areas and longer time periods are retained much longer than the raw time series; wavelet coefficients age more slowly than raw data.
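One way to sketch such an ageing schedule: retain each summary for a period that grows with its level in the hierarchy, so coarse wide-area summaries outlive the raw time series. The base period and growth factor below are illustrative assumptions, not values from the paper.

```python
def retention_days(level, base=30, factor=4):
    """Retention period for a summary at a given hierarchy level.

    Raw time series (level 0) is aged out first; each higher level,
    covering 4x the area, is kept `factor` times longer. `base` and
    `factor` are hypothetical tuning knobs for this sketch.
    """
    return base * factor ** level

def evict(store, now):
    """Drop entries whose retention period has expired.
    `store` maps (level, creation_day) -> summary."""
    return {k: v for k, v in store.items()
            if now - k[1] < retention_days(k[0])}
```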
Load Balancing and Robustness: Concepts
• Hierarchical model
  • Naturally fits wavelet processing.
  • Strict hierarchies are vulnerable to node failures; failures near the root of the hierarchy can be expensive to repair.
• Decentralized peer model
  • Summaries are communicated to multiple nodes probabilistically.
  • Better robustness, but greater communication overhead.
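The peer model's probabilistic replication can be sketched as below. The uniform-random peer selection is an assumption for illustration; the slide leaves the actual selection policy as an open design question, and the trade-off is visible directly: robustness grows with `k`, and so does the number of transmissions.

```python
import random

def replicate(summary, peers, k, rng=None):
    """Send a summary to k peers chosen uniformly at random
    (a simple stand-in for the probabilistic peer model).
    Returns the map of peer -> stored copy; its size is also
    the communication cost in transmissions."""
    rng = rng or random.Random()
    return {p: summary for p in rng.sample(peers, k)}
```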