Model-driven Data Acquisition in Sensor Networks
Amol Deshpande (1,4), Carlos Guestrin (4,2), Sam Madden (4,3), Joe Hellerstein (1,4), Wei Hong (4)
(1) UC Berkeley, (2) Carnegie Mellon University, (3) MIT, (4) Intel Research - Berkeley
Sensor networks and distributed systems
• A collection of devices that can sense, actuate, and communicate over a wireless network
• Sensors for temperature, humidity, pressure, sound, magnetic fields, acceleration, visible and ultraviolet light, etc.
• Available resources
  • 4 MHz, 8-bit CPU
  • 40 Kbps wireless
  • 3 V battery (lasts days or months)
• Analogous issues in other distributed systems, including streams and the Internet
Real deployments
• Great Duck Island (Leach's Storm Petrel)
• Redwoods
• Precision agriculture
• Fabrication monitoring
Analogy: Sensor net as a database
[Figure: an SQL-style query is submitted to TinyDB, which distributes the query and collects the query answer or data at every time step]
Declarative interface:
• Sensor nets are not just for PhDs
• Decrease deployment time
Data aggregation:
• Can reduce communication
Limitations of existing approach
[Figure: an SQL-style query is submitted to TinyDB, which distributes the query and collects data at every time step; the process is redone every time the query changes]
Query distribution:
• Every node must receive query
Data collection:
• Every node must wake up at every time step
• Data loss ignored
• No quality guarantees
• Data inefficient: ignores correlations
Sensor net data is correlated
[Figure panels: spatial-temporal correlation; inter-attribute correlation]
• Data is not i.i.d., so we shouldn't ignore missing data
• Observing one sensor gives information about other sensors (and future values)
• Observing one attribute gives information about other attributes
Model-driven data acquisition: overview
[Figure: SQL-style query with desired confidence → probabilistic model → data gathering plan → condition on new observations Dt → posterior belief; a new query goes back to the probabilistic model]
• Strengths of model-based data acquisition
  • Observe fewer attributes
  • Exploit correlations
  • Reuse information between queries
  • Directly deal with missing data
  • Answer more complex (probabilistic) queries
Probabilistic models and queries
User's perspective:
• Query: SELECT nodeId, temp ± 0.5°C, conf(.95) FROM sensors WHERE nodeId in {1..8}
• System selects and observes subset of nodes
• Observed nodes: {3,6,8}
• Query result [Figure: result table not preserved]
Probabilistic models and queries
• Joint distribution P(X1,…,Xn)
  • Learn from historical data
• Probabilistic query
  • Example: Value of X2 ± ε with prob. > 1-δ
• Prob. below 1-δ? Observe attributes
  • Example: Observe X1=18, which gives P(X2|X1=18)
• Higher prob., could answer query
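The conditioning step on this slide can be illustrated for the simplest case of two jointly Gaussian attributes. This is a minimal sketch, not the authors' code: the function names and all numeric parameters are hypothetical, and it assumes a bivariate Gaussian model so that the conditional distribution has a closed form.

```python
import math


def condition_bivariate(mu1, mu2, var1, var2, cov12, x1):
    """Mean and variance of X2 given X1 = x1 under a bivariate Gaussian."""
    gain = cov12 / var1                  # regression coefficient of X2 on X1
    cond_mean = mu2 + gain * (x1 - mu1)
    cond_var = var2 - gain * cov12       # never larger than the prior variance
    return cond_mean, cond_var


def prob_within(var, eps):
    """P(|X - mean| <= eps) for a Gaussian X with the given variance."""
    return math.erf(eps / math.sqrt(2.0 * var))


# Hypothetical model parameters (assumed for illustration, not from the slides).
mu1, mu2, var1, var2, cov12 = 20.0, 21.0, 4.0, 4.0, 3.6

prior_conf = prob_within(var2, eps=0.5)
post_mean, post_var = condition_bivariate(mu1, mu2, var1, var2, cov12, x1=18.0)
post_conf = prob_within(post_var, eps=0.5)
print(prior_conf, post_mean, post_var, post_conf)
```

Observing X1 shrinks the variance of X2, so the confidence that X2 lies within ± ε of its (updated) mean goes up. If it is still below 1-δ, more attributes would have to be observed.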
Dynamic models: filtering
• Joint distribution at time t
  • Learn from historical data
• Observe attributes
  • Example: Observe X1=18
• Condition on observations
• Fewer obs. in future queries
• Example: Kalman filter
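As a rough illustration of filtering, here is a one-dimensional Kalman filter step. It is a sketch under assumed dynamics (a scalar linear model with hypothetical parameters `a`, `q`, and `r`), not the model used in the talk. The point it shows is that a time step can be taken with or without a new reading, and that a reading taken now keeps the belief tighter later.

```python
def predict(mean, var, a, q):
    """Advance the belief one time step: x_next = a * x + noise of variance q."""
    return a * mean, a * a * var + q


def update(mean, var, z, r):
    """Condition the belief on a reading z with measurement-noise variance r."""
    k = var / (var + r)                  # Kalman gain
    return mean + k * (z - mean), (1.0 - k) * var


def step(mean, var, a, q, z=None, r=0.0):
    """One filtering step; the observation z is optional."""
    mean, var = predict(mean, var, a, q)
    if z is not None:
        mean, var = update(mean, var, z, r)
    return mean, var


# Hypothetical numbers: without a reading the variance grows, with one it shrinks.
no_obs = step(20.0, 1.0, a=1.0, q=0.5)
with_obs = step(20.0, 1.0, a=1.0, q=0.5, z=18.0, r=0.5)
print(no_obs, with_obs)
```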
Supported queries
• Value query
  • Xi ± ε with prob. at least 1-δ
• SELECT and Range query
  • Xi ∈ [a,b] with prob. at least 1-δ
  • which sensors have temperature greater than 25°C?
• Aggregation
  • average ± ε of subset of attribs. with prob. > 1-δ
  • combine aggregation and selection
  • probability that more than 10 sensors have temperature greater than 25°C?
• Queries require solution to integrals
  • Many queries computed in closed form
  • Some require numerical integration/sampling
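Two of the closed-form cases can be sketched for Gaussian beliefs: a range query on one attribute, and the distribution of an average over several attributes. The helper names and numbers are hypothetical. The sketch assumes a Gaussian model, where the average of jointly Gaussian variables is itself Gaussian.

```python
import math


def normal_cdf(x, mean, var):
    """CDF of a Gaussian with the given mean and variance."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))


def range_prob(mean, var, a, b):
    """P(X in [a, b]) for X ~ N(mean, var)."""
    return normal_cdf(b, mean, var) - normal_cdf(a, mean, var)


def average_distribution(means, cov):
    """Mean and variance of the average of jointly Gaussian attributes."""
    n = len(means)
    avg_mean = sum(means) / n
    avg_var = sum(sum(row) for row in cov) / (n * n)   # (1/n^2) * sum of all covariances
    return avg_mean, avg_var


# Hypothetical belief over two temperature attributes.
means = [20.0, 22.0]
cov = [[4.0, 2.0],
       [2.0, 4.0]]
avg_mean, avg_var = average_distribution(means, cov)
print(range_prob(means[0], cov[0][0], 18.0, 25.0))
print(range_prob(avg_mean, avg_var, avg_mean - 0.5, avg_mean + 0.5))
```

Queries such as "more than 10 sensors above 25°C" do not reduce to a single Gaussian CDF, which is where numerical integration or sampling comes in.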
Model-driven data acquisition: overview
[Figure: SQL-style query with desired confidence → probabilistic model → data gathering plan → condition on new observations Dt → posterior belief]
• What sensors do we observe?
• How do we collect observations?
Acquisition costs
• Attributes have different acquisition costs
• Exploit correlation through probabilistic model
• Must consider networking cost
[Figure: network of sensor nodes 1 to 6, annotated "cheaper?"]
Network model and plan format
• Assume known (quasi-static) network topology
• Define traversal using (1.5-approximate) TSP
• Ct(S) is expected cost of TSP (lossy communication)
Cost of collecting subset S of sensor values: C(S) = Ca(S) + Ct(S)
Goal: Find subset S that is sufficient to answer query at minimum cost C(S)
[Figure: network of sensor nodes 1 to 12]
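The cost C(S) = Ca(S) + Ct(S) can be sketched as follows. Several things here are assumptions rather than the talk's method: Ca(S) is read as the sum of per-node acquisition costs, a lossy hop is modeled as being retried until it succeeds (expected cost = hop cost / success probability), and the tour is found by brute force over permutations instead of the 1.5-approximate TSP, which is only practical for tiny examples. All names and numbers are hypothetical.

```python
from itertools import permutations


def link(u, v):
    """Dictionary key for an undirected link between nodes u and v."""
    return (u, v) if u < v else (v, u)


def traversal_cost(nodes, hop_cost, success_prob, base=0):
    """Expected cost of the cheapest tour base -> nodes -> base (brute force)."""
    if not nodes:
        return 0.0
    best = float("inf")
    for order in permutations(sorted(nodes)):
        path = (base,) + order + (base,)
        # Assumed loss model: expected cost of a hop is hop_cost / success_prob.
        total = sum(hop_cost[link(u, v)] / success_prob[link(u, v)]
                    for u, v in zip(path, path[1:]))
        best = min(best, total)
    return best


def collection_cost(nodes, acquisition_cost, hop_cost, success_prob, base=0):
    """C(S) = Ca(S) + Ct(S) for a set of nodes S."""
    ca = sum(acquisition_cost[n] for n in nodes)
    ct = traversal_cost(nodes, hop_cost, success_prob, base)
    return ca + ct


# Hypothetical three-node network; node 0 is the base station.
hop_cost = {(0, 1): 1.0, (0, 2): 2.0, (1, 2): 1.5}
success_prob = {(0, 1): 1.0, (0, 2): 0.5, (1, 2): 1.0}
acquisition_cost = {1: 0.5, 2: 1.0}
print(collection_cost({1}, acquisition_cost, hop_cost, success_prob))
print(collection_cost({1, 2}, acquisition_cost, hop_cost, success_prob))
```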
Choosing observation plan
Is a subset S sufficient?
• Query: Xi ∈ [a,b] with prob. > 1-δ
• If we observe S=s: Ri(s) = max{ P(Xi ∈ [a,b] | s), 1 - P(Xi ∈ [a,b] | s) }
• Value of S is unknown: Ri(S) = ∫ P(s) Ri(s) ds
Optimization problem: find the subset S of minimum cost C(S) such that Ri(S) > 1-δ
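The integral Ri(S) = ∫ P(s) Ri(s) ds can be approximated by sampling. The sketch below does this for the smallest possible case, a bivariate Gaussian where S = {X1} and the query is about X2. The names, model parameters, and the use of Monte Carlo are illustrative assumptions, not taken from the talk.

```python
import math
import random


def normal_cdf(x, mean, var):
    """CDF of a Gaussian with the given mean and variance."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))


def benefit_given(s, model, a, b):
    """Ri(s) = max{P(X2 in [a,b] | X1 = s), 1 - P(X2 in [a,b] | X1 = s)}."""
    mu1, mu2, var1, var2, cov12 = model
    gain = cov12 / var1
    mean = mu2 + gain * (s - mu1)        # conditional mean of X2
    var = var2 - gain * cov12            # conditional variance of X2
    p = normal_cdf(b, mean, var) - normal_cdf(a, mean, var)
    return max(p, 1.0 - p)


def expected_benefit(model, a, b, samples=5000, seed=0):
    """Monte Carlo estimate of Ri(S): average Ri(s) over draws s ~ P(X1)."""
    mu1, _, var1, _, _ = model
    rng = random.Random(seed)
    sd1 = math.sqrt(var1)
    total = sum(benefit_given(rng.gauss(mu1, sd1), model, a, b)
                for _ in range(samples))
    return total / samples


def benefit_without_observation(model, a, b):
    """Ri for the empty plan: use the prior on X2 directly."""
    _, mu2, _, var2, _ = model
    p = normal_cdf(b, mu2, var2) - normal_cdf(a, mu2, var2)
    return max(p, 1.0 - p)


# Hypothetical model: (mu1, mu2, var1, var2, cov12), strongly correlated.
model = (20.0, 20.0, 4.0, 4.0, 3.6)
r_empty = benefit_without_observation(model, 20.0, 25.0)
r_obs = expected_benefit(model, 20.0, 25.0)
print(r_empty, r_obs)
```

With these numbers the prior alone is nearly uninformative about whether X2 is in range, while a plan that observes X1 is expected to settle the question much more often.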
BBQ system
• Probabilistic model
  • Multivariate Gaussians
  • Learn from historical data
• Query: SQL-style query with desired confidence
  • Value
  • Range
  • Average
  • Simple matrix operations
• Data gathering plan
  • Exhaustive or greedy search
  • Factor 1.5 TSP approximation
• Condition on new observations Dt to get posterior belief
  • Equivalent to Kalman filter
  • Simple matrix operations
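One way a greedy search over observation plans could look is sketched below. This is an illustrative heuristic (add the sensor with the best benefit gain per unit of extra cost until the confidence target is met), not necessarily the rule BBQ uses. The `benefit` and `cost` callables, the toy tables, and all names are hypothetical stand-ins for Ri(S) and C(S).

```python
def greedy_plan(candidates, benefit, cost, target):
    """Grow a plan greedily until benefit(plan) >= target; None if impossible."""
    chosen = frozenset()
    while benefit(chosen) < target:
        remaining = [c for c in candidates if c not in chosen]
        if not remaining:
            return None                  # every sensor observed, still not enough

        def score(c):
            bigger = chosen | {c}
            gain = benefit(bigger) - benefit(chosen)
            extra = cost(bigger) - cost(chosen)
            return gain / extra if extra > 0 else float("inf")

        chosen = chosen | {max(remaining, key=score)}
    return chosen


# Toy stand-ins: additive benefit capped at 1, additive cost.
toy_gain = {1: 0.2, 2: 0.3, 3: 0.1}
toy_cost = {1: 1.0, 2: 1.0, 3: 5.0}


def toy_benefit(plan):
    return min(1.0, 0.5 + sum(toy_gain[n] for n in plan))


def toy_plan_cost(plan):
    return sum(toy_cost[n] for n in plan)


plan = greedy_plan([1, 2, 3], toy_benefit, toy_plan_cost, target=0.95)
print(sorted(plan))
```

An exhaustive search would instead evaluate every subset and keep the cheapest sufficient one, which is exact but exponential in the number of sensors.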
Experimental results
• Redwood trees and Intel Lab datasets
• Learned models from data
  • Static model
  • Dynamic model: Kalman filter, time-indexed transition probabilities
• Evaluated on a wide range of queries
Obtaining approximate values
• Query: True temperature value ± ε with confidence 95%
Approximate range queries
• Query: Temperature in [T1,T2] with confidence 95%
BBQ system
• Probabilistic model
  • Multivariate Gaussians
  • Learn from historical data
• Query: SQL-style query with desired confidence
  • Value
  • Range
  • Average
  • Simple matrix operations
• Data gathering plan
  • Exhaustive or greedy search
  • Factor 1.5 TSP approximation
• Condition on new observations Dt to get posterior belief
  • Equivalent to Kalman filter
  • Simple matrix operations
• Extensions
  • More complex queries
  • Other probabilistic models
  • More advanced planning
  • Outlier detection
  • Dynamic networks
  • Continuous queries
  • …
Conclusions
• Model-driven data acquisition
  • Observe fewer attributes
  • Exploit correlations
  • Reuse information between queries
  • Directly deal with missing data
  • Answer more complex (probabilistic) queries
• Basis for future sensor network systems