Model-driven Data Acquisition in Sensor Networks

Amol Deshpande (1,4), Carlos Guestrin (4,2), Sam Madden (4,3), Joe Hellerstein (1,4), Wei Hong (4) • (1) UC Berkeley • (2) Carnegie Mellon University • (3) MIT • (4) Intel Research - Berkeley

Sensor networks and distributed systems • A collection of devices that can sense, actuate, and communicate over a wireless network • Sensors for temperature, humidity, pressure, sound, magnetic fields, acceleration, visible and ultraviolet light, etc. • Available resources • 4 MHz, 8 bit CPU • 40 Kbps wireless • 3V battery (lasts days or months) • Analogous issues in other distributed systems, including streams and the Internet

Real deployments • Great Duck Island (Leach's Storm Petrel) • Redwoods • Precision agriculture • Fabrication monitoring

Example: Intel Berkeley Lab deployment

Analogy: Sensor net as a database • TinyDB: SQL-style query • Distribute query • Collect query answer or data at every time step • Declarative interface: • Sensor nets are not just for PhDs • Decrease deployment time • Data aggregation: • Can reduce communication

Limitations of existing approach • TinyDB: SQL-style query (Query, New Query) • Distribute query • Collect data at every time step • Query distribution: • Every node must receive query • Redo process every time query changes • Data collection: • Every node must wake up at every time step • Data loss ignored • No quality guarantees • Data inefficient: ignores correlations

Sensor net data is correlated • Spatial-temporal correlation • Inter-attribute correlation • Data is not i.i.d. → shouldn't ignore missing data • Observing one sensor → information about other sensors (and future values) • Observing one attribute → information about other attributes

Model-driven data acquisition: overview • Diagram: Query / New Query (SQL-style query with desired confidence) • Probabilistic Model • Data gathering plan • Condition on new observations Dt • Posterior belief • Strengths of model-based data acquisition • Observe fewer attributes • Exploit correlations • Reuse information between queries • Directly deal with missing data • Answer more complex (probabilistic) queries

Probabilistic models and queries • User's perspective: • Query: SELECT nodeId, temp ± 0.5°C, conf(.95) FROM sensors WHERE nodeId in {1..8} • System selects and observes subset of nodes • Observed nodes: {3,6,8} • Query result

Probabilistic models and queries • Joint distribution P(X1,…,Xn) • Learn from historical data • Probabilistic query • Example: Value of X2 ± ε with prob. > 1-δ • Prob. below 1-δ? Observe attributes • Example: Observe X1=18, giving P(X2|X1=18) • Higher prob., could answer query
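The conditioning step on this slide can be sketched for the simplest case, a two-variable Gaussian. This is a minimal illustration, not the authors' code: the prior means, variances and covariance below are made-up values, and the function names are hypothetical. It shows how observing X1=18 narrows the belief about X2 and raises the probability that X2 lies within ± ε of its estimate.

```python
import math

def condition_bivariate(mu1, mu2, var1, var2, cov12, x1):
    """Mean and variance of X2 given X1 = x1 under a bivariate Gaussian."""
    mean = mu2 + cov12 / var1 * (x1 - mu1)
    var = var2 - cov12 ** 2 / var1
    return mean, var

def prob_within(var, eps):
    """P(|X - mean| <= eps) for a Gaussian with the given variance."""
    return math.erf(eps / math.sqrt(2.0 * var))

# Hypothetical prior over (X1, X2); in the slides the model is learned from historical data.
mu1, mu2 = 20.0, 21.0
var1, var2, cov12 = 4.0, 4.0, 3.6
eps = 0.5

prior_conf = prob_within(var2, eps)                     # confidence before any observation
post_mean, post_var = condition_bivariate(mu1, mu2, var1, var2, cov12, 18.0)
post_conf = prob_within(post_var, eps)                  # confidence after observing X1 = 18

print(f"P(X2 | X1=18): mean={post_mean:.2f}, var={post_var:.2f}")
print(f"confidence in X2 ± {eps}: {prior_conf:.3f} before, {post_conf:.3f} after")
```

With these made-up numbers the confidence rises but may still fall short of a target 1-δ, which is the case where the slide says to observe further attributes.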

Condition on observations t Dynamic models: filtering Joint distribution at time t Fewer obs. in future queries • Example: Kalman filter • Learn from historical data Observe attributes Example: Observe X1=18

Supported queries • Value query • Xi ± ε with prob. at least 1-δ • SELECT and Range query • Xi ∈ [a,b] with prob. at least 1-δ • Which sensors have temperature greater than 25°C? • Aggregation • Average ± ε of subset of attribs. with prob. > 1-δ • Combine aggregation and selection • Probability that more than 10 sensors have temperature greater than 25°C? • Queries require solution to integrals • Many queries computed in closed form • Some require numerical integration/sampling
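Two of the closed-form cases can be sketched for a Gaussian belief: a range query on one attribute, and the belief about the average of a subset of attributes. The means and covariance matrix below are invented for illustration, and the helper names are hypothetical.

```python
import math

def normal_cdf(x, mean, var):
    """Cumulative distribution function of a Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))

def range_prob(mean, var, a, b):
    """P(a <= X <= b) for a Gaussian attribute (range query)."""
    return normal_cdf(b, mean, var) - normal_cdf(a, mean, var)

def average_belief(means, cov, subset):
    """Mean and variance of the average of the attributes in `subset`.

    The average is a linear function of a Gaussian vector, so it is Gaussian
    with variance (1/k^2) * sum of the covariance entries over the subset.
    """
    k = len(subset)
    avg_mean = sum(means[i] for i in subset) / k
    avg_var = sum(cov[i][j] for i in subset for j in subset) / (k * k)
    return avg_mean, avg_var

# Hypothetical belief over three temperature attributes.
means = [20.0, 22.0, 24.0]
cov = [[1.0, 0.5, 0.0],
       [0.5, 1.0, 0.5],
       [0.0, 0.5, 1.0]]

p_range = range_prob(means[1], cov[1][1], 21.0, 23.0)   # P(X2 in [21, 23])
avg_mean, avg_var = average_belief(means, cov, [0, 1, 2])
p_avg = range_prob(avg_mean, avg_var, avg_mean - 0.5, avg_mean + 0.5)

print(f"P(X2 in [21,23]) = {p_range:.4f}")
print(f"average: mean={avg_mean:.2f}, var={avg_var:.4f}, P(within ±0.5) = {p_avg:.4f}")
```

Queries such as "more than 10 sensors above 25°C" do not reduce to a single Gaussian integral like this, which is where the slide's numerical integration or sampling comes in.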

Model-driven data acquisition: overview • Diagram: Query (SQL-style query with desired confidence) • Probabilistic Model • Data gathering plan • Condition on new observations Dt • Posterior belief • What sensors do we observe? • How do we collect observations?

Acquisition costs • Attributes have different acquisition costs • Exploit correlation through probabilistic model • Must consider networking cost • (Figure: sensor nodes 1-6, labeled "cheaper?")

Network model and plan format • Assume known (quasi-static) network topology • Define traversal using (1.5-approximate) TSP • Ct(S) is expected cost of TSP (lossy communication) • Cost of collecting subset S of sensor values: C(S) = Ca(S) + Ct(S) • Goal: Find subset S that is sufficient to answer query at minimum cost C(S) • (Figure: network of sensor nodes 1-12)
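The cost C(S) = Ca(S) + Ct(S) can be illustrated with a toy calculation. This sketch makes several simplifying assumptions that are not in the slides: node positions and per-node acquisition costs are invented, traversal cost is plain Euclidean tour length (no lossy-link expectation), and the tour is found by brute force rather than by the 1.5-approximate TSP algorithm the slide mentions, so it only suits very small subsets.

```python
import itertools
import math

def tour_cost(base, nodes, pos):
    """Length of the shortest round trip from `base` through all `nodes` (brute force)."""
    if not nodes:
        return 0.0
    best = math.inf
    for order in itertools.permutations(nodes):
        path = [base, *order, base]
        length = sum(math.dist(pos[u], pos[v]) for u, v in zip(path, path[1:]))
        best = min(best, length)
    return best

def plan_cost(subset, acq_cost, pos, base=0):
    """C(S) = Ca(S) + Ct(S): acquisition cost plus traversal cost."""
    ca = sum(acq_cost[n] for n in subset)
    ct = tour_cost(base, sorted(subset), pos)
    return ca + ct

# Hypothetical layout: node 0 is the base station, nodes 1-3 are sensors.
pos = {0: (0.0, 0.0), 1: (3.0, 0.0), 2: (3.0, 4.0), 3: (0.0, 4.0)}
acq_cost = {1: 1.0, 2: 1.0, 3: 2.0}

cost_two = plan_cost({1, 2}, acq_cost, pos)
cost_all = plan_cost({1, 2, 3}, acq_cost, pos)
print(cost_two, cost_all)
```

Visiting sensors 1 and 2 costs less than visiting all three, so a planner would prefer the smaller subset whenever it is sufficient to answer the query.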

Optimization problem: Choosing observation plan • Is a subset S sufficient? Xi ∈ [a,b] with prob. > 1-δ • If we observe S=s: Ri(s) = max{ P(Xi ∈ [a,b] | s), 1 − P(Xi ∈ [a,b] | s) } • Value of S is unknown: Ri(S) = ∫ P(s) Ri(s) ds
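For a single observed attribute, the expected value Ri(S) can be approximated by one-dimensional numerical integration. The sketch below assumes a two-variable Gaussian with made-up parameters, S = {X1}, and a range query on X2; the midpoint rule and the helper names are illustrative choices, not the method used in the talk.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def normal_cdf(x, mean, var):
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))

def reward(p):
    """R = max{p, 1 - p}: confidence in the better of the two answers."""
    return max(p, 1.0 - p)

def reward_given_obs(x1, model, a, b):
    """Ri(s): reward for the query X2 in [a, b] after observing X1 = x1."""
    mu1, mu2, var1, var2, cov12 = model
    mean = mu2 + cov12 / var1 * (x1 - mu1)
    var = var2 - cov12 ** 2 / var1
    return reward(normal_cdf(b, mean, var) - normal_cdf(a, mean, var))

def expected_reward(model, a, b, steps=2000, width=8.0):
    """Ri(S): integral of P(s) * Ri(s) ds, midpoint rule over +/- width std. devs."""
    mu1, _, var1, _, _ = model
    sd = math.sqrt(var1)
    lo = mu1 - width * sd
    h = 2.0 * width * sd / steps
    total = 0.0
    for k in range(steps):
        x1 = lo + (k + 0.5) * h
        total += normal_pdf(x1, mu1, var1) * reward_given_obs(x1, model, a, b) * h
    return total

# Hypothetical model: (mu1, mu2, var1, var2, cov12); query: X2 in [20, 22].
model = (20.0, 21.0, 4.0, 4.0, 3.6)
a, b = 20.0, 22.0

prior_reward = reward(normal_cdf(b, 21.0, 4.0) - normal_cdf(a, 21.0, 4.0))
planned_reward = expected_reward(model, a, b)
print(f"no observation: {prior_reward:.3f}, plan to observe X1: {planned_reward:.3f}")
```

Under this reading, a subset S is judged sufficient when its expected reward reaches the requested 1-δ.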

BBQ system • Probabilistic Model: • Multivariate Gaussians • Learn from historical data • Query (SQL-style query with desired confidence): • Value • Range • Average • Simple matrix operations • Data gathering plan: • Exhaustive or greedy search • Factor 1.5 TSP approximation • Condition on new observations Dt (posterior belief): • Equivalent to Kalman filter • Simple matrix operations
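A greedy search over observation plans can be sketched for value queries on a multivariate Gaussian. This is an illustrative simplification, not BBQ's planner: observations are assumed noise-free, only acquisition cost is counted (no traversal cost), the covariance matrix and costs are invented, and the selection rule (largest total confidence gain per unit cost) is one plausible greedy criterion. It relies on the fact that, for a Gaussian, the posterior variance does not depend on the observed value.

```python
import math

def condition_on(cov, j):
    """Covariance after observing variable j exactly (rank-one update)."""
    n = len(cov)
    return [[cov[r][c] - cov[r][j] * cov[j][c] / cov[j][j] for c in range(n)]
            for r in range(n)]

def confidence(var, eps):
    """P(|Xi - mean| <= eps); an already observed variable has confidence 1."""
    if var <= 1e-12:
        return 1.0
    return math.erf(eps / math.sqrt(2.0 * var))

def greedy_plan(cov, costs, eps, delta):
    """Pick sensors until every Xi is known to ± eps with probability >= 1 - delta."""
    n = len(cov)
    chosen = []

    def confs(c):
        return [confidence(c[i][i], eps) for i in range(n)]

    while min(confs(cov)) < 1.0 - delta:
        base = sum(confs(cov))
        candidates = [j for j in range(n) if j not in chosen and cov[j][j] > 1e-12]
        if not candidates:
            break
        # Illustrative rule: largest total confidence gain per unit acquisition cost.
        best = max(candidates,
                   key=lambda j: (sum(confs(condition_on(cov, j))) - base) / costs[j])
        chosen.append(best)
        cov = condition_on(cov, best)
    return chosen

# Hypothetical model: sensors 0/1 and 2/3 form two strongly correlated pairs.
cov = [[4.0, 3.98, 0.0, 0.0],
       [3.98, 4.0, 0.0, 0.0],
       [0.0, 0.0, 4.0, 3.98],
       [0.0, 0.0, 3.98, 4.0]]
costs = [1.0, 2.0, 1.0, 2.0]

plan = greedy_plan(cov, costs, eps=0.5, delta=0.05)
print("observe sensors:", plan)
```

In this toy model one cheap sensor per correlated pair is enough, so only half of the sensors are read.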

Experimental results • Redwood trees and Intel Lab datasets • Learned models from data • Static model • Dynamic model – Kalman filter, time-indexed transition probabilities • Evaluated on a wide range of queries

Cost versus Confidence level

Obtaining approximate values Query: True temperature value ± epsilon with confidence 95%

Approximate range queries Query: Temperature in [T1,T2] with confidence 95%

Comparison to other methods

Intel Lab traversals

BBQ system • Probabilistic Model: • Multivariate Gaussians • Learn from historical data • Query (SQL-style query with desired confidence): • Value • Range • Average • Simple matrix operations • Data gathering plan: • Exhaustive or greedy search • Factor 1.5 TSP approximation • Condition on new observations Dt (posterior belief): • Equivalent to Kalman filter • Simple matrix operations • Extensions • More complex queries • Other probabilistic models • More advanced planning • Outlier detection • Dynamic networks • Continuous queries • …

Conclusions • Model-driven data acquisition • Observe fewer attributes • Exploit correlations • Reuse information between queries • Directly deal with missing data • Answer more complex (probabilistic) queries • Basis for future sensor network systems

Model-Driven Data Acquisition in Sensor Networks - Amol Deshpande et al., VLDB ‘04