Data Management Systems for Sensor Data
Magdalena Balazinska
University of Washington
Processing Sensor Data Outside Sensor Networks
• Approach 1: Store data first, then process it
  • Traditional databases, IrisNet
• Approach 2: Process data as it streams
  • Stream processing engines: Borealis, STREAM, etc.
• But users want both at the same time:
  • Need to integrate live streams with stream archives
  • Challenges: archive size, stream speed, distribution, federation, fault-tolerance, etc.
  • Moirae project at the University of Washington http://data.cs.washington.edu/moirae/
  • Other projects: HiFi and LATTE
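To make "integrate live streams with stream archives" concrete, here is a minimal sketch of one query that spans both: a sliding-window average whose window is seeded from archived readings before live readings arrive. It is an illustration only, not how Moirae or any system named above works. The function name `windowed_average`, the `(timestamp, value)` reading format, and the assumptions that timestamps are non-decreasing and that the archive ends before the live stream begins are all hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple

Reading = Tuple[float, float]  # (timestamp in seconds, sensor value); assumed format


def windowed_average(
    archive: Iterable[Reading],
    live: Iterable[Reading],
    window: float,
) -> Iterator[Tuple[float, float]]:
    """Yield (timestamp, average over the last `window` seconds) per live reading.

    Assumes window > 0, non-decreasing timestamps, and that every archived
    reading is older than every live reading.
    """
    buf: Deque[Reading] = deque()
    total = 0.0
    for is_live, readings in ((False, archive), (True, live)):
        for ts, value in readings:
            buf.append((ts, value))
            total += value
            # Evict readings that have fallen out of the window.
            while buf[0][0] <= ts - window:
                _, old = buf.popleft()
                total -= old
            # Archived readings only seed the window; results are
            # produced for live readings.
            if is_live:
                yield ts, total / len(buf)


archive = [(0.0, 10.0), (5.0, 20.0)]
live = [(8.0, 30.0), (12.0, 40.0)]
results = list(windowed_average(archive, live, window=10.0))
```

The sketch keeps everything in memory on one machine, so it sidesteps exactly the challenges the slide lists: archive size, stream speed, distribution, and federation.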
Sensor Data Cleaning
• Sensor data contains errors
  • Can clean some but not all errors inside the sensor network
• Need to clean data at higher levels of abstraction
  • Using deterministic techniques (e.g., ESP framework)
  • Using models (e.g., BBQ project and follow-ons)
  • Using integrity constraints (StreamClean project at UW) http://data.cs.washington.edu/streamclean/
• Cannot clean all errors deterministically
  • Need to build systems that can handle probabilistic data
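The last two points can be illustrated with a small sketch: an integrity constraint ("a tag is in exactly one location at any time step") detects conflicting readings, and instead of picking a winner deterministically, the output keeps a probability for each candidate location. This is a toy under stated assumptions, not the StreamClean algorithm. The function name `clean_with_constraint`, the `(tag_id, time_step, location)` reading format, and the choice to weight locations by how often they were read are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Reading = Tuple[str, int, str]  # (tag_id, time_step, location); assumed format


def clean_with_constraint(
    readings: Iterable[Reading],
) -> Dict[Tuple[str, int], Dict[str, float]]:
    """Map each (tag, time step) to a probability distribution over locations.

    Constraint: a tag occupies exactly one location per time step.
    Conflicting readings are resolved probabilistically, weighting each
    location by its share of the readings (an illustrative assumption).
    """
    counts: Dict[Tuple[str, int], Counter] = defaultdict(Counter)
    for tag, step, location in readings:
        counts[(tag, step)][location] += 1

    cleaned = {}
    for key, locations in counts.items():
        total = sum(locations.values())
        cleaned[key] = {loc: n / total for loc, n in locations.items()}
    return cleaned


raw = [
    ("tag1", 0, "roomA"),
    ("tag1", 0, "roomA"),
    ("tag1", 0, "roomB"),  # conflicts with the two roomA readings
    ("tag1", 1, "roomB"),
]
cleaned = clean_with_constraint(raw)
```

Downstream operators then have to accept distributions rather than single values, which is the sense in which the system must "handle probabilistic data".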
Extracting High-Level Information from Sensor Data
• Sensors produce ambiguous, low-level information
• But applications are interested in high-level events
  • These events are increasingly sophisticated as sensor deployments and sensor diversity grow
• Need new languages and systems to extract events
  • Probabilistic Event EXtraction: PEEX project at UW http://data.cs.washington.edu/peex/
  • Other projects: SASE, activity inference in AI
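As a rough illustration of the gap between low-level readings and high-level events, the sketch below derives the probability of an "entered the room" event from per-step probabilities that a person is in the room. It is not the PEEX language or algorithm. The function names `entered_probability` and `detect_entries` and the threshold are hypothetical, and the sketch makes the strong simplifying assumption that consecutive time steps are independent.

```python
from typing import List, Tuple


def entered_probability(p_in_room: List[float]) -> List[float]:
    """Probability of an 'entered' event at each step after the first.

    An entry at step t means: not in the room at t-1 and in the room at t.
    Assumes the two steps are independent (a simplification).
    """
    return [(1.0 - prev) * cur for prev, cur in zip(p_in_room, p_in_room[1:])]


def detect_entries(p_in_room: List[float], threshold: float) -> List[Tuple[int, float]]:
    """Return (step, probability) for entry events at or above `threshold`."""
    return [
        (step, p)
        for step, p in enumerate(entered_probability(p_in_room), start=1)
        if p >= threshold
    ]


p_in_room = [0.0, 0.5, 1.0, 1.0]
entries = detect_entries(p_in_room, threshold=0.4)
```

Real event definitions are richer than this two-step pattern (sequences, durations, multiple sensors), which is why the slide calls for dedicated languages and systems.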
Summary
• As sensor deployments
  • Become commonplace
  • Are used for long-lasting applications
• Need new, powerful data management systems
• Requirements include
  • Integrate live data streams with stream archives
  • Perform data cleaning
  • Extract high-level information from low-level sensor data
• All this in a distributed and federated environment