Towards a Model-Based Data Collection Framework for Environmental Monitoring Networks
Research Proposal
Jayant Gupchup, Department of Computer Science, Johns Hopkins University
Background – II (motes)
• Communication (radio)
• Computing, Storage
• Sensors
• Battery: 3.6 V, 19.0 Ah
“Sending one packet costs same energy as thousands of CPU cycles” – Matt Welsh, Harvard
Task list
• Define “Informative Periods”
• Algorithm: Find informative (or interesting) periods
• Algorithm: Sampling planner based on the interesting periods
• Evaluation
Initial Direction & Main Results
• Principal Component Analysis (PCA) based approach
• Classification-based approach to detecting events
PCA-based approach: Motivation
• Observations:
  • Well-behaved days show a typical signature (bell-shaped pattern)
  • Rainy days (or periods) deviate from this signature
  • Strong trend component from one day to the next
  • Diurnal and trend features are seen in most environmental modalities
• PCA is good at capturing variation in a collection of similar curves
PCA – Toy Example
[Figure: scatter plot of Variable #1 against Variable #2 with the first principal component marked]
• Finds directions of maximum variance
• Reduces dimensionality (truncate to the first “p” directions)
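The toy example can be reproduced numerically. The sketch below is only an illustration: the function name `pca_2d` and the five data points are made up for this purpose, and it relies on the closed-form eigen-decomposition of a 2x2 covariance matrix rather than any library routine.

```python
import math

def pca_2d(points):
    """Return ((lam1, lam2), first_component) for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    half_trace = (sxx + syy) / 2.0
    spread = math.hypot((sxx - syy) / 2.0, sxy)
    lam1, lam2 = half_trace + spread, half_trace - spread
    # Orientation of the direction of maximum variance.
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (lam1, lam2), (math.cos(angle), math.sin(angle))

# Hypothetical data: two strongly correlated variables.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]
(lam1, lam2), pc1 = pca_2d(points)
explained = lam1 / (lam1 + lam2)  # share of variance kept when p = 1
```

Here `explained` is the fraction of total variance retained when the data is truncated to the first direction only, which is the sense in which truncation reduces dimensionality.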
Eigenmodes for Air Temperature
[Figure: eigenmodes for air temperature, i.e. the directions of maximum variance]
Discriminating event and well-behaved days [5]
• Well-behaved days: “Fits model well”
• Event day: “Large residuals”
[5] J. Gupchup, R. Burns, A. Terzis, and A. Szalay. Model-Based Event Detection in Wireless Sensor Networks. Proceedings of the Workshop on Data Sharing and Interoperability on the World-Wide Sensor Web (DSI), ACM/IEEE, 2007.
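One way to read “fits model well” versus “large residuals” is as a threshold on the part of a day's signal that a truncated basis cannot reconstruct. The sketch below assumes an orthonormal basis and a mean curve are already available; the four-sample days, the single basis vector, and the threshold of 1.0 are hypothetical values chosen only to make the arithmetic easy to follow, not values from the cited paper.

```python
import math

def residual_norm(day, mean, basis):
    """Norm of the part of `day` that the truncated basis cannot explain."""
    centered = [x - m for x, m in zip(day, mean)]
    recon = [0.0] * len(day)
    for vec in basis:  # basis vectors are assumed to be orthonormal
        coeff = sum(c * v for c, v in zip(centered, vec))
        recon = [r + coeff * v for r, v in zip(recon, vec)]
    return math.sqrt(sum((c - r) ** 2 for c, r in zip(centered, recon)))

def is_event_day(day, mean, basis, threshold):
    """Flag a day whose residual exceeds the (assumed) threshold."""
    return residual_norm(day, mean, basis) > threshold

# Toy setup: 4 samples per day, one basis vector describing a daytime bump.
mean = [10.0, 15.0, 15.0, 10.0]
basis = [[0.0, 1 / math.sqrt(2), 1 / math.sqrt(2), 0.0]]
typical_day = [10.0, 17.0, 17.0, 10.0]  # warmer, but the same shape
rainy_day = [10.0, 11.0, 16.0, 9.0]     # shape departs from the signature
threshold = 1.0                          # hypothetical cut-off
```

The typical day differs from the mean only along the basis vector, so its residual is essentially zero, while the rainy day leaves a residual of about 3.67 and is flagged.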
Offline to Online
• Offline
  • Basis locked from midnight to midnight
  • Access to the complete 24-hour signal
• Online
  • Access to the signal only up to the current hour “d”
  • Basis locked from hour “d” on one day to hour “d” on the next
  • Basis vectors are cyclically shifted by “d”
  • Eigenvalues remain the same
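The claim that shifting changes the vectors but not the eigenvalues follows from the fact that a cyclic shift only permutes the rows and columns of the covariance matrix. The sketch below checks this on synthetic data with 6 samples per day; the data, the shift `d = 2`, and the helper names are assumptions for illustration, and it simplifies the online setting by shifting each day vector in place rather than building windows that span two calendar days.

```python
def cov_matrix(rows):
    """Sample covariance matrix of the columns of `rows`."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(k)] for i in range(k)]

def top_eigenpair(matrix, iters=500):
    """Power iteration for the dominant eigenpair of a symmetric PSD matrix."""
    k = len(matrix)
    vec = [1.0 / (i + 1) for i in range(k)]  # arbitrary positive start vector
    for _ in range(iters):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = sum(x * x for x in nxt) ** 0.5
        vec = [x / norm for x in nxt]
    mv = [sum(matrix[i][j] * vec[j] for j in range(k)) for i in range(k)]
    return sum(a * b for a, b in zip(mv, vec)), vec  # Rayleigh quotient, vector

def shift(seq, d):
    """Cyclic left shift: the window starts at sample d instead of sample 0."""
    return list(seq[d:]) + list(seq[:d])

# Synthetic days (rows) with 6 samples each; values are made up.
days = [
    [10.0, 12.0, 16.0, 18.0, 14.0, 11.0],
    [11.0, 14.0, 19.0, 22.0, 16.0, 12.0],
    [9.0, 10.0, 13.0, 14.0, 12.0, 10.0],
    [12.0, 15.0, 21.0, 24.0, 18.0, 13.0],
    [10.0, 11.0, 15.0, 17.0, 13.0, 10.0],
]
d = 2
lam, v = top_eigenpair(cov_matrix(days))
lam_shifted, v_shifted = top_eigenpair(cov_matrix([shift(r, d) for r in days]))
```

Up to numerical tolerance, `lam_shifted` matches `lam`, and `v_shifted` matches `shift(v, d)` up to sign, so under this simplification the shifted basis can be obtained without recomputing the decomposition.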
Summary
• PCA model is effective in finding informative periods
• Need to know the shift value, “d”
  • “Sundial” [6]
• But … why not use barometric pressure too?
[6] Jayant Gupchup, Razvan Musăloiu-E, Alex Szalay, Andreas Terzis. Sundial: Using Sunlight to Reconstruct Global Timestamps. To appear in the Proceedings of the 6th European Conference on Wireless Sensor Networks (EWSN 2009).
Classification-Based Approach
• Two-class problem: {Rainy, Sunny}
• Most classifiers provide class probabilities
• Sample based on those probabilities
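“Sample based on those probabilities” leaves the mapping from probability to sampling rate open. A minimal sketch of one possible mapping, assuming a linear interpolation between a slow and a fast sampling interval, is shown below; the function name and the 30 s and 600 s bounds are hypothetical and do not come from the slides.

```python
def sampling_interval(p_event, min_interval_s=30.0, max_interval_s=600.0):
    """Map an event probability in [0, 1] to a sampling interval in seconds."""
    if not 0.0 <= p_event <= 1.0:
        raise ValueError("p_event must lie in [0, 1]")
    # Linear interpolation: p = 0 gives the slowest rate, p = 1 the fastest.
    return max_interval_s - p_event * (max_interval_s - min_interval_s)

# A classifier that reports P(Rainy) = 0.5 would sample every 315 seconds.
interval = sampling_interval(0.5)
```

Other mappings (step functions, or ones weighted by acquisition cost) would fit the same interface; the linear form is used here only because it is the simplest.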
Future Work - I
• Task 1: Model improvement
  • Study the effect (or correlation) of
    • Event magnitude
    • Inter-arrival time
  • Explore incremental and robust PCA [7], [8]
  • Explore label-based classifiers
  • Combine air temperature, barometric pressure, and light modalities (joint work with Zhiliang Ma, Dept. of Applied Mathematics and Statistics)
• Task 2: Sampling planner
  • Prediction error and/or Probability of Event (PoE)
  • Neighbor opinion(s)
  • Acquisition cost of each sensor
[7] Tamas Budavari, Vivienne Wild, Alexander S. Szalay, Laszlo Dobos, Ching-Wa Yip. Reliable Eigenspectra for New Generation Surveys. MNRAS, accepted for publication.
[8] A. J. Connolly, A. S. Szalay. A Robust Classification of Galaxy Spectra: Dealing with Noisy and Incomplete Data. Astronomical Journal.
Future Work - II
• Task 3: Evaluation
  • Define cost and benefit functions
  • Compare the proposed approach with existing systems
• Task 4: Application and extensions
  • Identify the class of applications where the framework can be used
Questions?
Overview: Proposed Framework
[Figure: block diagram with the labels Mote Storage, <X1, X2, ..., Xt>, Model <θ1, θ2, ..., θn>, <Xt+1, Xt+2, ..., Xt+h>, Prediction Error, Prob(Event), Sampling Scheduler, Update Model]
Properties of our PCA model
• Transformation: Y = X*V
  • Projected variables are uncorrelated
• Compression/Multi-resolution
  • Achieves substantial compression
  • From the previous slide, compression ratio = 96/4 = 24X
• Online basis
  • Basis for any “d”-to-“d” hour window via cyclic shifting
• Re-projection error bounds
  • Sum of the “left out” eigenvalues
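These properties can be written out explicitly. The block below is a sketch under assumptions that the slide does not state outright: that each day is a vector of 96 samples, that p = 4 components are retained, that X holds n mean-centered days as rows, and that C is the sample covariance with eigenvalues λ_1 ≥ … ≥ λ_96 and V_p the matrix of the first p eigenvectors.

```latex
\begin{align}
Y &= X V_p, \qquad X \in \mathbb{R}^{n \times 96},\quad V_p \in \mathbb{R}^{96 \times p},\quad p = 4 \\
\operatorname{Cov}(Y) &= V_p^{\top} C\, V_p = \operatorname{diag}(\lambda_1, \dots, \lambda_p)
  \qquad \text{(projected variables are uncorrelated)} \\
\text{compression ratio} &= \frac{96}{4} = 24 \\
\frac{1}{n-1} \sum_{k=1}^{n} \bigl\lVert x_k - V_p V_p^{\top} x_k \bigr\rVert^2
  &= \sum_{i=p+1}^{96} \lambda_i
  \qquad \text{(average re-projection error on the training days)}
\end{align}
```

The last line holds on average over the days used to build the basis; the error for any single day can be larger or smaller than this sum.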
Preliminary Results
• Rain prediction
  • Uses barometric pressure
  • Simple linear classifiers perform well
  • Classification accuracy approaches 76%
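The slides do not say which linear classifier was used. As one example of the family, the sketch below fits a one-feature logistic regression by batch gradient descent; the feature (pressure drop in hPa), the eight training points, and the hyperparameters are all synthetic and chosen for illustration, so the result says nothing about the reported accuracy.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit P(rain | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # gradient of the log-loss w.r.t. z
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def rain_probability(x, w, b):
    """Probability of the Rainy class for a pressure-drop feature x."""
    return sigmoid(w * x + b)

# Synthetic feature: recent drop in barometric pressure, in hPa (made up).
pressure_drop = [0.1, 0.3, 0.2, 0.5, 1.8, 2.2, 1.5, 2.6]
rained = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = train_logistic(pressure_drop, rained)
```

Because the output is a probability rather than a hard label, it can feed directly into a probability-driven sampling policy.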
Literature Survey
• Barbie-Query (BBQ) [1]
  • Approximate query answering (range and value queries)
  • Sensing cost differential … energy-saving opportunities!
  • When predictions fall outside the confidence interval, collect samples
  • Shortcomings
    • NOT aimed at collecting long-term environmental data
    • Does not consider the role played by events
• PRESTO [2]
  • Reduce storage costs => reduce communication costs
  • Seasonal AutoRegressive Integrated Moving Average (S-ARIMA) [3] model for predictions
  • Model known to both the node and the basestation
  • When predictions are within confidence bounds, do not store collected samples
  • Basestation can reconstruct missing samples
  • Shortcomings
    • No adaptive sampling on interesting events
[1] Amol Deshpande, et al. Model-Driven Data Acquisition in Sensor Networks. VLDB 2004.
[2] Ming Li, Deepak Ganesan, and Prashant Shenoy. PRESTO: Feedback-driven Data Management in Sensor Networks. USENIX 2006.
[3] P. J. Brockwell, R. A. Davis. Introduction to Time Series and Forecasting. 2002.
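The shared-model idea described for PRESTO (keep a sample only when the model fails to predict it, and let the basestation fill the gaps from the same model) can be sketched as follows. This is not PRESTO's implementation: a simple persistence model ("next value equals the last reported value") stands in for S-ARIMA, and the readings, the tolerance, and the function names are hypothetical.

```python
def suppress_predictable(samples, tolerance):
    """Return the (index, value) pairs the node must keep and report."""
    reported = [(0, samples[0])]  # the first sample is always reported
    last = samples[0]
    for i in range(1, len(samples)):
        prediction = last  # persistence model, a stand-in for S-ARIMA
        if abs(samples[i] - prediction) > tolerance:
            reported.append((i, samples[i]))
            last = samples[i]
    return reported

def reconstruct(reported, length):
    """Basestation side: fill unreported slots with the model's prediction."""
    known = dict(reported)
    values, last = [], None
    for i in range(length):
        if i in known:
            last = known[i]
        values.append(last)
    return values

# Made-up temperature readings with one abrupt excursion.
samples = [20.0, 20.1, 20.2, 20.1, 23.5, 23.6, 20.0]
tolerance = 0.5
reported = suppress_predictable(samples, tolerance)
rebuilt = reconstruct(reported, len(samples))
```

Only three of the seven samples are reported here, and every reconstructed value stays within `tolerance` of the true reading, because both sides apply the same prediction rule to the same reported values.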
Related Work
• Near-Optimal Sensor Placement [4]
  • Find the most informative locations to place sensors
  • At the same time … keep the network connected
  • Solution: information-theoretic (entropy) criterion & Steiner tree approximation
• Differences
  • Focus is on finding informative locations in an offline fashion
  • Solution addresses spatial variability
  • Sampling rate does not change once locations are fixed
[4] A. Krause, C. Guestrin, A. Gupta, J. Kleinberg. Near-optimal Sensor Placements: Maximizing Information while Minimizing Communication Cost. In Proc. of Information Processing in Sensor Networks (IPSN), 2006.