Sensor Data Management: Challenges and (some) Solutions
Amol Deshpande, University of Maryland
Motivation • Unprecedented, and rapidly increasing, instrumentation of our everyday world • Examples: RFID, distributed measurement networks (e.g. GPS), wireless sensor networks, industrial monitoring
Sensor Data Processing: Now • Readings flow from the sensor network into a database table (raw-data), and the user must: • Extract all readings into a file • Run MATLAB/R/other data processing tools • Write the output to a file or back to the database • Write data processing tools to process/aggregate the output (maybe using the DB) • Decide what new data to acquire • Repeat
Sensor Data Processing: What we want • Readings flow from the sensor network into a database table (raw-data), and models derive a processed-data table from it • Models applied to the data in real time (at least simple ones) • Continuous (standing) queries, e.g. alert monitoring, with results delivered to the user continuously • Ad hoc queries (possibly against processed, modeled data) • Tasks sent back to the sensor network
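A continuous (standing) query is registered once and then evaluated against every new reading as it arrives, instead of being run once over stored data. The following is a minimal Python sketch of alert monitoring under that model, assuming readings arrive as `(sensor_id, value)` pairs. The names `StandingQuery`, `ContinuousQueryEngine`, and `on_reading` are hypothetical and are not taken from any system mentioned in this talk.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Reading = Tuple[str, float]  # hypothetical schema: (sensor_id, value)

@dataclass
class StandingQuery:
    """A query registered once and evaluated against every new reading."""
    name: str
    predicate: Callable[[Reading], bool]
    results: List[Reading] = field(default_factory=list)

class ContinuousQueryEngine:
    def __init__(self) -> None:
        self.queries: List[StandingQuery] = []

    def register(self, query: StandingQuery) -> None:
        self.queries.append(query)

    def on_reading(self, reading: Reading) -> None:
        # Push each arriving reading through every standing query;
        # matching readings are appended to that query's result stream.
        for q in self.queries:
            if q.predicate(reading):
                q.results.append(reading)

engine = ContinuousQueryEngine()
overheat = StandingQuery("overheat", lambda r: r[1] > 30.0)
engine.register(overheat)

for reading in [("s1", 22.5), ("s2", 31.2), ("s1", 35.0)]:
    engine.on_reading(reading)

print(overheat.results)  # [('s2', 31.2), ('s1', 35.0)]
```

The data is "pushed" through the queries rather than the queries "pulling" from stored data, which is the inversion that distinguishes stream processing from ad hoc querying.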
Data Management Challenges • Very, very large scale • Spatio-temporal querying is essential • Need new indexing techniques, data description formats, and techniques for “data ingest” (cleaning the data, etc.) • Much work in scientific data management • E.g. SkyServer • Data is typically imprecise, unreliable, or incomplete (data quality) • Measurement noise, failures in sensor/GPS data • High message loss rates in wireless/RFID networks
Balazinska et al.; Data Management in the Worldwide Sensor Web; IEEE Pervasive, 2007.
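Spatio-temporal queries typically ask for all readings inside a region and a time window. One simple indexing approach, shown here as a hedged sketch and not as a technique proposed in this talk, is to bucket readings into a uniform grid over (x, y, t) so that a range query scans only overlapping buckets. The reading schema `(x, y, t, value)` and the `GridIndex` class are hypothetical; the cell width and time-bucket width are tuning assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical reading schema: (x, y, t, value).
Reading = Tuple[float, float, float, float]
Key = Tuple[int, int, int]

class GridIndex:
    """Buckets readings by (x cell, y cell, time bucket) so a spatio-temporal
    range query scans only the buckets that overlap the query box."""

    def __init__(self, cell: float, period: float) -> None:
        self.cell = cell        # spatial cell width (same units as x, y)
        self.period = period    # temporal bucket width (same units as t)
        self.buckets: Dict[Key, List[Reading]] = defaultdict(list)

    def _key(self, x: float, y: float, t: float) -> Key:
        return (int(x // self.cell), int(y // self.cell), int(t // self.period))

    def insert(self, r: Reading) -> None:
        self.buckets[self._key(r[0], r[1], r[2])].append(r)

    def range_query(self, x0: float, x1: float, y0: float, y1: float,
                    t0: float, t1: float) -> List[Reading]:
        lo = self._key(x0, y0, t0)
        hi = self._key(x1, y1, t1)
        out: List[Reading] = []
        for i in range(lo[0], hi[0] + 1):
            for j in range(lo[1], hi[1] + 1):
                for k in range(lo[2], hi[2] + 1):
                    # .get avoids creating empty buckets in the defaultdict.
                    for r in self.buckets.get((i, j, k), []):
                        # Buckets may extend past the box, so re-check each reading.
                        if x0 <= r[0] <= x1 and y0 <= r[1] <= y1 and t0 <= r[2] <= t1:
                            out.append(r)
        return out

idx = GridIndex(cell=10.0, period=60.0)
idx.insert((1.0, 2.0, 5.0, 20.1))
idx.insert((15.0, 2.0, 5.0, 21.3))
idx.insert((3.0, 4.0, 120.0, 19.8))
print(idx.range_query(0.0, 10.0, 0.0, 10.0, 0.0, 60.0))  # [(1.0, 2.0, 5.0, 20.1)]
```

A uniform grid works well when readings are spread fairly evenly; skewed deployments usually call for adaptive structures such as R-trees or quadtrees.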
Data Management Challenges • Data is generated continuously and must be processed in real time (distributed data streams) • Need different query processing paradigms • Typically very high data rates • Must be able to handle a large number of continuous queries efficiently • Much recent work on “data streams” • Research systems: TelegraphCQ [Berkeley], STREAM [Stanford], Aurora [Brown/MIT/Brandeis], etc. • Commercial systems: StreamBase, TruViso, …
Balazinska et al.; Data Management in the Worldwide Sensor Web; IEEE Pervasive, 2007.
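Handling a large number of continuous queries efficiently usually means sharing work across them rather than evaluating each query separately per reading. A common idea in the data-stream literature is to index the queries' predicates. The sketch below assumes every standing query has the hypothetical form `value > threshold`: sorting the thresholds lets one binary search find all matching queries, in O(log n + k) time instead of O(n). `GroupedThresholdFilter` is an illustrative name, not an API from the systems listed above.

```python
import bisect
from typing import List, Tuple

class GroupedThresholdFilter:
    """Indexes many standing queries of the form `value > threshold` so that
    each reading is matched against all of them with one binary search."""

    def __init__(self, queries: List[Tuple[str, float]]) -> None:
        # Sort (threshold, query_id) pairs by threshold.
        pairs = sorted((thr, qid) for qid, thr in queries)
        self.thresholds = [thr for thr, _ in pairs]
        self.query_ids = [qid for _, qid in pairs]

    def match(self, value: float) -> List[str]:
        # bisect_left counts the thresholds strictly below `value`; those are
        # exactly the queries whose predicate `value > threshold` holds.
        n = bisect.bisect_left(self.thresholds, value)
        return self.query_ids[:n]

f = GroupedThresholdFilter([("q1", 25.0), ("q2", 30.0), ("q3", 20.0)])
print(f.match(27.0))  # ['q3', 'q1']
```

Real stream systems generalize this to ranges, equality predicates, and joins, but the principle is the same: index the queries so that each reading probes them once.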
Data Management Challenges • Need for real-time statistical modeling of data • Eliminate spatial/temporal biases, handle missing data through extrapolation (e.g. regression, interpolation models) • Filter measurement noise (e.g. Kalman filters) • Infer hidden variables, pattern recognition (e.g. HMMs) • Fault or anomaly detection • Forecasting/prediction (e.g. ARIMA)
[Figures: temperature monitoring with regression/interpolation models; GPS data with Kalman filters; …]
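To make "filter measurement noise (e.g. Kalman filters)" concrete, here is a minimal scalar Kalman filter sketch. It assumes a random-walk state model, x_t = x_{t-1} + w_t with process-noise variance `q`, and noisy observations z_t = x_t + v_t with measurement-noise variance `r`. The function name and the choice of model are illustrative assumptions; a GPS tracker would use a multi-dimensional state (position and velocity), but the predict/update structure is the same.

```python
from typing import Iterable, List

def kalman_1d(measurements: Iterable[float], q: float, r: float,
              x0: float, p0: float) -> List[float]:
    """Filter noisy scalar readings under a random-walk model:
    x_t = x_{t-1} + w_t (variance q), z_t = x_t + v_t (variance r).
    x0, p0 are the prior mean and variance of the state."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state carries over, and its uncertainty grows by q.
        p = p + q
        # Update: blend prediction and measurement using the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1 - k) * p
        estimates.append(x)
    return estimates

print(kalman_1d([2.0, 2.0], q=0.0, r=1.0, x0=0.0, p0=1.0))
# [1.0, 1.3333333333333333], i.e. the mean of the prior 0.0 and the two readings
```

With `q = 0` the model says the true value never changes, so the filter reduces to a weighted running average; a larger `q` makes it track changes faster at the cost of passing through more noise.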
Data Management Challenges • The applications have strong acquisitional aspects • Data has to be actively acquired as needed • Typically high data acquisition costs (e.g. energy consumption in battery-powered devices) • Data provenance • Being able to trace something back to its origins • Data exploration and visualization • Data interoperability • Data security and privacy • …
Balazinska et al.; Data Management in the Worldwide Sensor Web; IEEE Pervasive, 2007.
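When acquiring each sensor value has a cost (e.g. energy), the order in which a query's predicates are checked matters: if a cheap sensor already rejects the reading, the expensive one never has to be sampled. The sketch below illustrates a classic rule for independent, short-circuited filters, ordering predicates by cost / (1 - selectivity), and checks it against brute force. The sensor names, costs, and selectivities are made-up numbers, and the independence assumption often fails for real, correlated sensor readings.

```python
from itertools import permutations
from typing import List, Tuple

# Hypothetical predicates: (name, acquisition cost per reading, selectivity),
# where selectivity is the fraction of readings that pass the predicate.
Pred = Tuple[str, float, float]

def expected_cost(order: List[Pred]) -> float:
    """Expected acquisition cost per tuple when predicates are evaluated in
    `order` and evaluation stops at the first predicate that fails."""
    cost, p_reach = 0.0, 1.0
    for _, c, s in order:
        cost += p_reach * c   # pay for this sensor only if all earlier ones passed
        p_reach *= s
    return cost

def rank_order(preds: List[Pred]) -> List[Pred]:
    """Sort by c / (1 - s): cheap predicates that reject many readings go first."""
    def rank(p: Pred) -> float:
        _, c, s = p
        return c / (1.0 - s) if s < 1.0 else float("inf")
    return sorted(preds, key=rank)

preds = [("light", 1.0, 0.9), ("temp", 2.0, 0.5), ("accel", 10.0, 0.1)]
best = rank_order(preds)
brute = min(permutations(preds), key=lambda o: expected_cost(list(o)))
print([p[0] for p in best], expected_cost(best))  # ['temp', 'light', 'accel'] 7.0
print(expected_cost(list(brute)))                  # 7.0
```

The rank rule needs only a sort, whereas brute force is factorial in the number of predicates; the brute-force line is there only to check the result on this small example.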
My Research Interests • Managing imprecise and incomplete data • Support statistical modeling and querying of sensor data in relational databases • Clean, declarative abstractions • Real-time processing of streaming data • Probabilistic databases • Store and query data annotated with probabilities • Energy-efficient algorithms for wireless sensornets • Data acquisition, target monitoring, data compression, … • In-network query processing
MauveDB • Written using Apache Derby, a Java open-source DBMS • Supports an abstraction called model-based views • Declarative specification of models to be applied • Can query the output of the models using SQL • Models kept updated as new data/measurements arrive • Status: • Support for regression- and interpolation-based views • Currently building support for views based on dynamic Bayesian networks (Kalman filters, HMMs, etc.) • Ongoing work: • Query processing and optimization, continuous queries • APIs for arbitrary models …
A. Deshpande, S. Madden; MauveDB: Supporting Model-based User Views in Database Systems; SIGMOD 2006.
B. Kanagal, A. Deshpande; Online Filtering, Smoothing and Probabilistic Modeling of Streaming Data; ICDE 2008.
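The idea behind a regression-based view is that users query the model's output, not the raw readings, and the model is refreshed as new measurements arrive. MauveDB exposes this through declarative view definitions queried with SQL (see the SIGMOD 2006 paper); the Python below is only an illustration of the concept, not MauveDB's interface or implementation. It assumes a hypothetical per-sensor linear model value ≈ a + b·t, maintained incrementally from running sums so each insert is O(1), and `RegressionView` is an invented name.

```python
from typing import Dict, List, Tuple

class RegressionView:
    """Hypothetical per-sensor linear model value ~ a + b * t. Running sums are
    kept per sensor so each new reading is an O(1) update, and the model can be
    queried at any time t, including times with no raw reading."""

    def __init__(self) -> None:
        # sensor_id -> [n, sum_t, sum_v, sum_tt, sum_tv]
        self.stats: Dict[str, List[float]] = {}

    def insert(self, sensor: str, t: float, v: float) -> None:
        s = self.stats.setdefault(sensor, [0.0] * 5)
        s[0] += 1.0
        s[1] += t
        s[2] += v
        s[3] += t * t
        s[4] += t * v

    def coefficients(self, sensor: str) -> Tuple[float, float]:
        # Closed-form least-squares fit of a straight line.
        n, st, sv, stt, stv = self.stats[sensor]
        denom = n * stt - st * st
        if denom == 0.0:  # fewer than two distinct times: fall back to the mean
            return sv / n, 0.0
        b = (n * stv - st * sv) / denom
        a = (sv - b * st) / n
        return a, b

    def query(self, sensor: str, times: List[float]) -> List[Tuple[str, float, float]]:
        # Rows of the "view": (sensor, t, modeled value).
        a, b = self.coefficients(sensor)
        return [(sensor, t, a + b * t) for t in times]

view = RegressionView()
view.insert("s1", 0.0, 20.0)
view.insert("s1", 2.0, 24.0)
print(view.query("s1", [1.0, 3.0]))  # [('s1', 1.0, 22.0), ('s1', 3.0, 26.0)]
```

Because the view is defined by the model, a query at t = 1.0 returns a value even though no sensor reported at that time, which is how such views can fill in missing data.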
Probabilistic Databases • Motivation: increasing amounts of uncertain data • From sensor networks • Imprecise data, data with confidence/accuracy bounds • Human-observed data • Statistical modeling/machine learning • Many models provide a distribution over a set of labels (e.g. HMMs) • Information extraction from text • Social networks • How to manage and query such data in relational databases? • Different types of uncertainties • Complex correlation patterns • Much work in the database community over the last few years
P. Sen, A. Deshpande; Representing and Querying Correlated Tuples in Probabilistic Databases; ICDE 2007.
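Why correlations matter can be shown with possible-worlds semantics: a probabilistic database defines a distribution over ordinary databases ("worlds"), and a query's probability is the total probability of the worlds in which it is true. The sketch below uses two hypothetical uncertain tuples, each present with marginal probability 0.5, and compares an independent joint distribution with a perfectly correlated one. The tuples and function names are illustrative assumptions; enumerating worlds is exponential in the number of tuples, so practical systems avoid it.

```python
from itertools import product
from typing import Callable, Dict, Tuple

World = Tuple[bool, bool]  # (tuple t1 present, tuple t2 present)

def query_probability(joint: Dict[World, float],
                      query: Callable[[World], bool]) -> float:
    """Possible-worlds semantics: sum the probabilities of the worlds
    in which the query is true."""
    return sum(p for world, p in joint.items() if query(world))

def at_least_one(world: World) -> bool:
    # Boolean query: "does at least one of the two tuples exist?"
    return world[0] or world[1]

# Both scenarios give each tuple a marginal probability of 0.5.
independent = {w: 0.25 for w in product([False, True], repeat=2)}
# Perfectly positively correlated: both tuples present or both absent.
correlated = {(False, False): 0.5, (True, True): 0.5,
              (False, True): 0.0, (True, False): 0.0}

print(query_probability(independent, at_least_one))  # 0.75
print(query_probability(correlated, at_least_one))   # 0.5
```

The marginals are identical in both cases, yet the query answers differ, so a system that stores only per-tuple probabilities and assumes independence can return wrong answers when the data is correlated.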
Thanks! • Questions?