Data Provenance in Remote Environmental Monitoring. Dr. Christian Skalka, University of Vermont, USA.
Data Provenance in Remote Environmental Monitoring (REM) REM = automated collection of data from the natural environment in remote settings. Central points: • Data provenance is fundamental to REM. • Data source, times, ownership are intrinsic. • REM hardware and software architectures pose unique challenges for establishing provenance. • Heterogeneous, distributed, low-power systems.
Outline Two REM case studies and problem statements: • Snowpack monitoring (SnowMAN) • The SnowMAN project summary. • Microcosmic provenance issues, challenges. • SnowMAN provenance “coping mechanisms”. • Sagehen Creek Field Station network • Overview of project setting. • Macrocosmic provenance issues, challenges. • Possible approaches to central challenges.
How Much Snow is Out There? • Snow/Water Equivalent (SWE): measurement of water content in snowpack • Not the same as snow height.
How Much Snow is Out There? • Regional snowpack profiles are critically important to natural resource planning and public safety. • Real-world measurement is complicated by terrain, forest canopies, wind, exposure. • Accurate real-time SWE measurement is a “holy grail” of REM.
The UVM SnowMAN Project • A new approach to SWE measurement • Use modern computer technology for data acquisition and retrieval • A multi-modal approach to SWE approximation • Lightweight, low cost, robust, adaptable • Improved spatial and temporal resolution
Multimodal Sensor Fusion • Algorithms on sensing nodes combine multiple sensing technologies of variable power cost: • Snow height via ultrasound (cheap) • Snow density via microwave absorption (moderate) • Snow density via gamma ray attenuation (expensive)
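The height and density readings listed above combine into SWE through the standard relation SWE = snow height × (snow density / water density). The following is a minimal sketch of that arithmetic only, assuming a single uniform density for the whole snowpack; the function name and units are illustrative and are not taken from the SnowMAN code.

```python
WATER_DENSITY_KG_M3 = 1000.0  # density of liquid water


def snow_water_equivalent_mm(snow_height_m: float, snow_density_kg_m3: float) -> float:
    """Return SWE in millimetres of water, assuming uniform snow density."""
    if snow_height_m < 0 or snow_density_kg_m3 < 0:
        raise ValueError("height and density must be non-negative")
    # Depth of water (in metres) obtained by melting the snow column.
    swe_m = snow_height_m * snow_density_kg_m3 / WATER_DENSITY_KG_M3
    return swe_m * 1000.0


# Example: 1.2 m of snow at 300 kg/m^3 holds about 360 mm of water.
print(snow_water_equivalent_mm(1.2, 300.0))
```

The example shows why snow height alone is not enough: the same 1.2 m of snow at 150 kg/m^3 would hold only half as much water.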
SnowMAN System Architecture • Multiple data gathering-and-processing nodes connected via a Wireless Sensor Network (WSN) • Arduino-based on-site gateway provides datalogging via SD card, data processing • Remote data retrieval via TCP/IP over cellmodem
Provenance Issues in SnowMAN • Data reported by sensors meaningless without provenance information: • Time of sampling event • Location of sample • Type and ADC conversion formula of sensor • Refinement of multimodal fusion algorithm requires history/cause of sampling event.
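One way to picture the point above is a reading that carries its provenance with it. This is a hedged sketch, not the SnowMAN data model: the field names, the use of a node ID as a stand-in for location, and the linear calibration function are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SensorReading:
    raw_adc: int                      # unconverted ADC count from the sensor
    sampled_at: int                   # time of sampling, seconds since the Unix epoch
    node_id: int                      # node that took the sample (stands in for location)
    sensor_type: str                  # sensing modality, e.g. "ultrasound_height"
    convert: Callable[[int], float]   # ADC conversion formula for this sensor

    def value(self) -> float:
        """Apply the sensor's conversion formula to the raw count."""
        return self.convert(self.raw_adc)


def ultrasound_height_m(adc: int) -> float:
    # Hypothetical calibration: a 10-bit ADC spanning 0 to 5 metres.
    return adc / 1023 * 5.0


reading = SensorReading(
    raw_adc=512,
    sampled_at=1_700_000_000,
    node_id=7,
    sensor_type="ultrasound_height",
    convert=ultrasound_height_m,
)
print(reading.sensor_type, reading.node_id, round(reading.value(), 3))
```

Without the last four fields, the bare count `512` cannot be turned into a height, placed on a map, or ordered in time.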
Provenance Challenges in SnowMAN • Low-bandwidth requirements in WSNs • Messages must be small, infrequent. • Volatility of low-cost devices • WSN node failures require data reliability solutions • Heterogeneous network architecture • Data formats must be converted in network communications • Time synchronization
Managing Provenance in SnowMAN • Reliability ensured by datalogging on gateway, replication within WSN. • Requires data source, time to be stored with readings. • Provenance information reported with data readings. • Component of packet format; not onerously large. • Data converted at “protocol boundaries”. • 802.15.4 to RS232 to TCP/IP to SQL. • Time synchronization handled by simple protocols. • Low precision sufficient; cellmodem provides “true” time.
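To give a sense of scale for "not onerously large", the sketch below packs a reading together with its source and time into a fixed binary layout. The layout is hypothetical (the actual SnowMAN packet format is not given in these slides); it only illustrates that a few provenance fields add a handful of bytes, small next to the 127-byte maximum frame size of IEEE 802.15.4.

```python
import struct

# Hypothetical layout, little-endian with no padding:
#   node_id (uint16) | sensor_type (uint8) | sampled_at (uint32) | raw_adc (uint16)
PACKET_FORMAT = "<HBIH"
PACKET_SIZE = struct.calcsize(PACKET_FORMAT)  # 2 + 1 + 4 + 2 bytes


def pack_reading(node_id: int, sensor_type: int, sampled_at: int, raw_adc: int) -> bytes:
    """Serialize one reading plus its provenance fields."""
    return struct.pack(PACKET_FORMAT, node_id, sensor_type, sampled_at, raw_adc)


def unpack_reading(packet: bytes) -> tuple:
    """Recover (node_id, sensor_type, sampled_at, raw_adc) from a packet."""
    return struct.unpack(PACKET_FORMAT, packet)


packet = pack_reading(node_id=7, sensor_type=1, sampled_at=1_700_000_000, raw_adc=512)
print(PACKET_SIZE, unpack_reading(packet))
```

In this layout the reading itself is 2 bytes and its provenance is 7; a coarser timestamp or a shorter node ID would shrink that further if bandwidth demanded it.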
Outstanding Provenance Issues in SnowMAN • How to verify that data is converted properly at protocol boundaries? • How to encode history of multi-modal readings, for analysis and refinement of algorithms? • How to detect errors in data readings, due to sensor, time synchronization, node failure?
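The first question above is left open in the talk. One simple check that could be tried, offered here only as an illustration and not as the project's answer, is a round-trip test: convert a record across a boundary and back, and confirm the original bytes are recovered. The binary layout and the comma-separated text form below are both hypothetical stand-ins for the two sides of a boundary.

```python
import struct

# Hypothetical binary layout: node_id, sensor_type, sampled_at, raw_adc.
PACKET_FORMAT = "<HBIH"


def binary_to_line(packet: bytes) -> str:
    """Convert a binary packet to a text line, as a serial link might carry it."""
    node_id, sensor_type, sampled_at, raw_adc = struct.unpack(PACKET_FORMAT, packet)
    return f"{node_id},{sensor_type},{sampled_at},{raw_adc}"


def line_to_binary(line: str) -> bytes:
    """Inverse conversion, from the text line back to the binary packet."""
    node_id, sensor_type, sampled_at, raw_adc = (int(field) for field in line.split(","))
    return struct.pack(PACKET_FORMAT, node_id, sensor_type, sampled_at, raw_adc)


def converts_losslessly(packet: bytes) -> bool:
    """True if crossing the boundary and coming back reproduces the packet."""
    return line_to_binary(binary_to_line(packet)) == packet


packet = struct.pack(PACKET_FORMAT, 7, 1, 1_700_000_000, 512)
print(binary_to_line(packet), converts_losslessly(packet))
```

A round-trip test only shows that no information was lost; it cannot show that a field was interpreted with the right units or meaning, so it addresses part of the question at best.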
REM in Macrocosm: Sagehen Creek Field Station Sagehen Creek Field Station and Experimental Forest located near Truckee, CA • Research and Teaching Facility of UC Berkeley • 9,000 acres of undisturbed wilderness, extensive REM technology
REM in Macrocosm: Sagehen Creek Field Station • Literally hundreds of various sensor devices • Temperature, wind, humidity • Streamflow, Stream temperature • Snow height, SWE • Video • 9 hubs with (programmable) dataloggers, power, wireless transmission • Goal: wireless connectivity to field house and internet, off-site data warehousing • Multiple user, administration groups
Provenance Issues at Sagehen • Inherits microcosmic issues (time, location, sensor modality essential to data). • Video triggering events should be reported. • Group data ownership now important to report (and maintain through data cycle). • Sagehen provenance should be credited in myriad end-uses of data. • Diagnostics of network functionality and services.
Provenance Challenges at Sagehen Inherits microcosmic challenges, but: • Increased sampling rates, network traffic • Time synchronization much more complex • GPS auto-location for some sensors, manual for others • Much greater diversity of devices, communication media (wired, wireless) • More protocol boundaries • Multimedia
Sagehen Provenance Issues: Scalability Sagehen network modeled as source-to-sink dataflow, from sensors to end-users. • Sources extensible by user groups • New sensors, sensor networks (e.g. WSNs) • New remote datalogging/replication architecture • Sink usable by end-user groups • Arbitrary visualization technologies • Diverse research and education applications
Sagehen Network: The Current Reality • Establishing data communications backbone over IEEE802.11 wireless LAN. • Limited data collection over network (one-hop) via canned proprietary software. • Most data collection being done manually from dataloggers. • Sensors hardwired to dataloggers, no WSNs in the field. • Some one-hop connectivity between hubs.
Sagehen Network: The Vision • Seamless source-to-sink dataflow. • From sensors in the field to off-site, permanent data warehouse. • Also accessible onsite at remote hubs (reliable). • Wireless sensor network capabilities in the field. • Attribution of data to source groups and Sagehen. • Easy extensibility of network at source end, to allow addition of new sensors (and WSNs).
Some Ideas for Supporting Provenance in the Sagehen Software Architecture Treating data like messages on a protocol stack. • Stack defined across device (protocol) boundaries: • Sensor data is “raw”, collects more provenance information as it moves towards the sink. • Higher layers of provenance (time, ownership) encapsulate lower layers. • Allows compositional (principled) treatment of cross-protocol data transformation.
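The encapsulation idea above can be sketched as nested envelopes, where each boundary wraps what it received and adds the provenance it knows. This is an illustrative sketch only: the layer names, the metadata keys, and the `wrap` and `provenance_chain` helpers are assumptions, not part of any Sagehen software.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Layer:
    name: str       # which boundary added this layer
    metadata: dict  # provenance contributed at that boundary
    payload: Any    # the lower layer, or raw sensor data at the bottom


def wrap(payload: Any, name: str, **metadata: Any) -> Layer:
    """Encapsulate a lower layer (or raw data) with new provenance."""
    return Layer(name=name, metadata=metadata, payload=payload)


def provenance_chain(message: Any) -> list:
    """List (layer name, metadata) pairs from the outermost layer inward."""
    chain = []
    while isinstance(message, Layer):
        chain.append((message.name, message.metadata))
        message = message.payload
    return chain


def raw_data(message: Any) -> Any:
    """Strip every layer and return the original sensor data."""
    while isinstance(message, Layer):
        message = message.payload
    return message


raw = 512  # raw sensor reading
at_node = wrap(raw, "node", node_id=7, sensor_type="ultrasound_height")
at_gateway = wrap(at_node, "gateway", sampled_at=1_700_000_000)
at_sink = wrap(at_gateway, "warehouse", owner="example-group")

for name, metadata in provenance_chain(at_sink):
    print(name, metadata)
print(raw_data(at_sink))
```

Because each layer leaves the one beneath it untouched, a conversion at one boundary can be reasoned about on its own, which is the compositional property the slide points to.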
Some Ideas for Supporting Provenance in the Sagehen Software Architecture Watermarking data to establish Sagehen and group ownership. • Easily done for video media. • Video retrieved only from the internet; watermarking performed on traditional platform. • Watermarking sensor data?? • Sensor data must be preserved intact, which traditional techniques may not allow. • In-the-field retrieval requires in-the-field watermarking.
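For sensor data, one alternative that leaves the readings bit-for-bit unchanged is to attach a keyed authentication tag instead of embedding a mark in the values. To be clear, this is a different technique from watermarking and is not proposed in the talk: the tag travels beside the record and is lost if someone strips it. The sketch below uses Python's standard `hmac` module; the key and record contents are placeholders.

```python
import hashlib
import hmac


def tag_reading(record: bytes, group_key: bytes) -> bytes:
    """Compute a keyed tag over a record without modifying the record."""
    return hmac.new(group_key, record, hashlib.sha256).digest()


def verify_reading(record: bytes, tag: bytes, group_key: bytes) -> bool:
    """Check that the record was tagged with this group's key and is unaltered."""
    return hmac.compare_digest(tag_reading(record, group_key), tag)


group_key = b"placeholder-group-key"      # hypothetical per-group secret
record = b"7,1,1700000000,512"            # hypothetical serialized reading

tag = tag_reading(record, group_key)
print(verify_reading(record, tag, group_key))                  # untouched record
print(verify_reading(b"7,1,1700000000,513", tag, group_key))   # altered record
```

A SHA-256 tag is 32 bytes per record, which may be too costly on the low-power nodes themselves; whether tagging could happen at a hub or gateway instead is a design question the slides leave open.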
Conclusion • Remote environmental monitoring requires provenance for correct interpretation of data. • REM networks heterogeneous, some components computationally “weak”. • Power, cost restrictions. • Protocol hodgepodge! • Adapting to REM environment a unique challenge for provenance in software.
Conclusion Two case studies: • SnowMAN: lightweight, low cost SWE monitoring. • Sagehen Creek Field Station: REM in macrocosm. http://www.cs.uvm.edu/~skalka http://sagehen.ucnrs.org/