Explore shared infrastructure, applications, and services in scientific and business environments. Discuss data cleaning, core services, and future prospects in data management for sensor networks.
Abstractions for Shared Sensor Networks Michael J. Franklin DMSN September 2006
Outline • Perspective on shared infrastructure • Scientific Applications • Business Environments • Data Cleaning as a Shared Service • Other Core Services • What’s core and what isn’t? • Conclusions Mike Franklin UC Berkeley EECS
Scientific Instruments • Cost: moderate • Users: one • Use: defense/navigation • Scheduling: ad hoc • Data Cleaning: cloth
Scientific Instruments • Cost: more • Users: one • Use: science • Scheduling: ad hoc • Data Cleaning: religious
Scientific Instruments • Cost: 100’s K$ (1880s $) • Users: 100’s • Use: science • Scheduling: by committee • Data Cleaning: grad students
Scientific Instruments • Cost: 100’s M$ (2010s $) • Users: 1000’s-millions • Use: science and education • Scheduling: mostly static - SURVEY • Data Cleaning: mostly algorithmic • Key Point: Enabled by modern (future) Data Management!
Shared Infrastructure • Sharing dictated by costs • Costs of hardware • Costs of deployment • Costs of maintenance • Pooled Resource Management • Competitively Scheduled • Statically Scheduled (surveys) • Data Cleaning • At the instrument • By the applications (or end users) • Other Services
Shared Sensor Nets • Macroscopes are expensive: • to design • to build • to deploy • to operate and maintain • They will be shared resources: • across organizations • across apps w/in organizations • Q: What are the right abstractions to support them?
Traditional Shared Data Mgmt • All users/apps see only cleaned data: a.k.a. “TRUTH” • [Diagram labels: Operational Systems (Point of Sale, Inventory, etc.); Data Feeds; Extract, Transform, Load (Cleaning, Auditing, …); Data Warehouse; Data Marts; Users (Reports, Business Intelligence, Dashboards, ad hoc Queries)]
Shared SensorNet Services • Data Collection • Query & Reporting • Data Cleaning • Quality Estimation • Scheduling • Provisioning • Tasking/Programming • Evolution • Monitoring • Actuation • We will need to understand the shared/custom tradeoffs for all of these.
Data Cleaning as a Shared Service
Some Data Quality Problems with Sensors • 1. (Cheap) sensors are failure and error prone (and people want their sensors to be really cheap). • 2. Device interface is too low level for applications. • 3. They produce too much (uninteresting) data. • 4. They produce some interesting data, and it’s hard to tell case #3 from case #4. • 5. Sensitive to environmental conditions.
Problem 1a: Sensors are Noisy • A simple RFID Experiment • 2 adjacent shelves, 6 ft. wide • 10 EPC-tagged items each, plus 5 moved between them • RFID antenna on each shelf
Shelf RFID Test - Ground Truth
Actual RFID Readings • “Restock every time inventory goes below 5”
Problem 1b: Sensors “Fail Dirty” • 3 temperature-sensing motes in the same room • [Figure labels: Outlier Mote; Average]
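A minimal sketch of why a mote that fails dirty matters, assuming three motes in one room and a fixed tolerance around the room median. The mote names, readings, and the `tolerance` parameter are hypothetical and not taken from the experiment on the slide.

```python
from statistics import median

def flag_outliers(readings, tolerance):
    """Return ids of motes whose reading is more than `tolerance`
    away from the median of all motes in the same room."""
    mid = median(readings.values())
    return sorted(m for m, v in readings.items() if abs(v - mid) > tolerance)

def robust_average(readings, tolerance):
    """Average only the motes that agree with the room median."""
    bad = set(flag_outliers(readings, tolerance))
    good = [v for m, v in readings.items() if m not in bad]
    if not good:  # every mote disagrees: fall back to the median itself
        return median(readings.values())
    return sum(good) / len(good)

# Hypothetical readings in degrees C: mote3 has failed dirty.
room = {"mote1": 21.0, "mote2": 22.0, "mote3": 95.0}
print(flag_outliers(room, tolerance=5.0))   # ['mote3']
print(sum(room.values()) / len(room))       # 46.0, naive average is pulled up
print(robust_average(room, tolerance=5.0))  # 21.5
```

The failed mote keeps reporting plausible-looking numbers, so a plain average silently absorbs the error; comparing each mote against its neighbors is one simple way to catch it.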
Problem 2: Low-level Interface • Lack of good support for devices increases the complexity of sensor-based applications.
Problems 3 and 4: The Wheat from the Chaff Shelf RFID reports (50 times/sec): • there are 100 items on the shelf • the 100 items are still on the shelf • the 100 items are still on the shelf • the 100 items are still on the shelf • the 100 items are still on the shelf • the 100 items are still on the shelf • the 100 items are still on the shelf • the 100 items are still on the shelf • the 100 items are still on the shelf • the 100 items are still on the shelf • the 100 items are still on the shelf • the 100 items are still on the shelf • the 100 items are still on the shelf • the 100 items are still on the shelf • the 100 items are still on the shelf • the 100 items are still on the shelf • the 100 items are still on the shelf • the 100 items are still on the shelf • the 100 items are still on the shelf • there are 99 items on the shelf • the 99 items are still on the shelf
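One way to separate wheat from chaff is to report only changes. The sketch below assumes the shelf reader emits a plain item count per report; the function name and the sample stream are hypothetical.

```python
def report_changes(counts):
    """Yield (report_index, count) only when the count differs
    from the last value that was reported."""
    last = None
    for i, c in enumerate(counts):
        if c != last:
            yield i, c
            last = c

# Hypothetical stream shaped like the slide: many identical reports, then one change.
stream = [100] * 19 + [99] * 2
print(list(report_changes(stream)))  # [(0, 100), (19, 99)]
```

Twenty-one raw reports collapse to two interesting ones. Note that this only works on data that is already clean: a single dropped reading would show up as a spurious change.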
Problem 5: Environment • [Figure: Read Rate vs. Distance, Alien I2 Tag in a room on the 4th floor of Soda Hall] • [Figure: Read Rate vs. Distance using same reader and tag in the room next door]
VICE: Virtual Device Interface [Jeffery et al., Pervasive 2006] • “Metaphysical Data Independence” • Goal: Hide messy details of underlying physical devices. • Error characteristics • Failure • Calibration • Sampling Issues • Device Management • Physical vs. Virtual • Fundamental abstractions: • Spatial & temporal granules
VICE - A Virtual Device Layer • “Virtual Device (VICE) API” • The VICE API is a natural place to hide much of the complexity arising from physical devices.
The VICE Query Pipeline • VICE stages, in order of increasing generalization: • Clean (Single Tuple) • Smooth (Window) • Arbitrate (Multiple Receptors) • Validate (Join w/Stored Data) • Analyze (On-line Data Mining)
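A minimal sketch of how such a staged pipeline could be composed, assuming each stage is a function from a list of readings to a list of readings. Only the stage names come from the slide; the stage bodies, field names, and thresholds are hypothetical, and the Validate and Analyze stages are omitted.

```python
from collections import defaultdict, deque
from functools import reduce

def clean(readings):
    # Clean (single tuple): drop physically implausible values.
    return [r for r in readings if -40.0 <= r["temp"] <= 60.0]

def smooth(readings, window=2):
    # Smooth (window): per-sensor moving average over the last `window` readings.
    recent = defaultdict(lambda: deque(maxlen=window))
    out = []
    for r in readings:
        vals = recent[r["sensor"]]
        vals.append(r["temp"])
        out.append({**r, "temp": sum(vals) / len(vals)})
    return out

def arbitrate(readings):
    # Arbitrate (multiple receptors): one averaged value per timestamp.
    by_time = defaultdict(list)
    for r in readings:
        by_time[r["t"]].append(r["temp"])
    return [{"t": t, "temp": sum(v) / len(v)} for t, v in sorted(by_time.items())]

def pipeline(*stages):
    # Compose stages left to right into a single function.
    return lambda readings: reduce(lambda data, stage: stage(data), stages, readings)

raw = [
    {"t": 0, "sensor": "a", "temp": 20.0},
    {"t": 0, "sensor": "b", "temp": 22.0},
    {"t": 1, "sensor": "a", "temp": 999.0},  # glitch, removed by clean()
    {"t": 1, "sensor": "b", "temp": 24.0},
]
vice = pipeline(clean, smooth, arbitrate)
print(vice(raw))  # [{'t': 0, 'temp': 21.0}, {'t': 1, 'temp': 23.0}]
```

Because every stage has the same shape, an application can drop, reorder, or replace stages without touching the others.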
RFID Smoothing w/Queries • RFID data has many dropped readings • Typically, use a smoothing filter to interpolate • Smoothing Filter: SELECT DISTINCT tag_id FROM RFID_stream [RANGE '5 sec'] GROUP BY tag_id • [Figure: raw readings and smoothed output over time]
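The windowed query on this slide can be imitated in a few lines of Python. This is a sketch under the assumption that readings arrive as (timestamp in seconds, tag_id) pairs; the function name and the sample stream are hypothetical.

```python
def tags_in_window(reads, now, window=5.0):
    """Distinct tag ids read in the interval (now - window, now]."""
    return {tag for t, tag in reads if now - window < t <= now}

# Hypothetical raw stream: tag "A" is missed at t = 2, 3, 4
# although it never left the shelf.
reads = [(0, "A"), (1, "A"), (5, "A"), (6, "A"), (6, "B")]

raw = ["A" in tags_in_window(reads, t, window=1.0) for t in range(7)]
smoothed = ["A" in tags_in_window(reads, t, window=5.0) for t in range(7)]
print(raw)       # [True, True, False, False, False, True, True]
print(smoothed)  # [True, True, True, True, True, True, True]
```

The 5-second window fills in the dropped readings, at the cost of reporting a tag as present for up to 5 seconds after it really leaves.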
After VICE Processing • “Restock every time inventory goes below 5”
Adaptive Smoothing [Jeffery et al., VLDB 2006]
Ongoing Work: Spatial Smoothing • With multiple readers, more complicated • Two rooms, two readers per room (readers A, B, C, D) • Reinforcement: A? B? A ∪ B? A ∩ B? • Arbitration: A? C? • All are addressed by statistical framework!
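To make the arbitration question concrete, here is one very simple rule: assign each tag to the reader that reported it most often in a window. This majority rule is an illustrative assumption, not the statistical framework the slide refers to; reader and tag names are hypothetical.

```python
from collections import Counter

def arbitrate(reads):
    """Assign each tag to the reader that reported it most often.

    `reads` is a list of (reader_id, tag_id) pairs from one time window.
    Ties go to the reader that appears first in `reads`."""
    best = {}
    for (reader, tag), n in Counter(reads).items():
        if tag not in best or n > best[tag][1]:
            best[tag] = (reader, n)
    return {tag: reader for tag, (reader, _) in best.items()}

# Hypothetical window: tag t1 is seen by readers A and C, tag t2 only by C.
window = [("A", "t1")] * 8 + [("C", "t1")] * 2 + [("C", "t2")] * 5
print(arbitrate(window))  # {'t1': 'A', 't2': 'C'}
```

A count-based rule like this ignores how reliable each reader is, which is one reason a statistical treatment is attractive.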
Problems with a single Truth • If you knew what was going to happen, you wouldn’t need sensors • [Images: upside down airplane; ozone layer hole] • Monitoring vs. Needle-in-a-haystack • Probability-based smoothing may remove unlikely, but real events!
Risks of too little cleaning • GIGO (garbage in, garbage out) • Complexity: burden on app developers • Efficiency (repeated work) • Too much opportunity for error
Risks of too much cleaning • “The appearance of a hole in the earth's ozone layer over Antarctica, first detected in 1976, was so unexpected that scientists didn't pay attention to what their instruments were telling them; they thought their instruments were malfunctioning.” (National Center for Atmospheric Research) • In fact, the data were rejected as unreasonable by data quality control algorithms.
One Truth for Sensor Nets? • How clean is “clean-enough”? • How much cleaning is too much? • Answers are likely to be: • domain-specific • sensor-specific • application-specific • user-specific • all of the above? • How to split between shared and application-specific cleaning?
Fuzzy Truth • One solution is to make the shared interface richer. • Probabilistic Data Management is also the key to “Calm Computing”
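A minimal sketch of what a richer shared interface might look like, assuming the shared layer attaches a confidence to each result and lets every application pick its own cut-off. The `Estimate` type, its fields, and the thresholds are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Estimate:
    tag_id: str
    present_prob: float  # hypothetical confidence that the tag is present

def present_tags(estimates, threshold):
    """Tags whose presence probability meets the application's threshold."""
    return sorted(e.tag_id for e in estimates if e.present_prob >= threshold)

shelf = [Estimate("A", 0.98), Estimate("B", 0.60), Estimate("C", 0.05)]

# An inventory application wants high confidence before counting an item.
print(present_tags(shelf, threshold=0.9))   # ['A']
# A needle-in-a-haystack application keeps unlikely but possible events.
print(present_tags(shelf, threshold=0.01))  # ['A', 'B', 'C']
```

The shared layer does the cleaning work once, while the decision about how much to trust the result stays with the application.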
Adding Quality Assessment • A. Das Sarma, S. Jeffery, M. Franklin, J. Widom, “Estimating Data Stream Quality for Object-Detection Applications”, 3rd Intl ACM SIGMOD Workshop on Information Quality in Info Sys, 2006
“Data Furnace” Architecture [Garofalakis et al., D.E. Bulletin, 3/06] • Service Layer • Probabilistic Reasoning • Uncertainty Management • Data Model Learning • Complex Event Processing • Data Archiving and Streaming
Rethinking Service Abstractions • Query-Data Collection • Quality Estimation • Data Cleaning • Scheduling • Provisioning • Tasking/Programming • Evolution • Monitoring • Actuation • We will need to understand the shared/custom tradeoffs for all of these.
Conclusions • Much current sensor research is focused on the “single user” or “single app” model. • Sensor networks will be shared resources. • Can leverage some ideas from current shared Data Management infrastructures. • But, new solutions, abstractions, and architectures will be required.