
Abstractions for Shared Sensor Networks

Explore shared infrastructure, applications, and services in scientific and business environments. Discuss data cleaning, core services, and future prospects in data management for sensor networks.


Presentation Transcript


  1. Abstractions for Shared Sensor Networks • Michael J. Franklin • DMSN, September 2006

  2. Outline • Perspective on shared infrastructure • Scientific Applications • Business Environments • Data Cleaning as a Shared Service • Other Core Services • What’s core and what isn’t? • Conclusions Mike Franklin UC Berkeley EECS

  3. Scientific Instruments • Cost: moderate • Users: one • Use: defense/navigation • Scheduling: ad hoc • Data Cleaning: cloth

  4. Scientific Instruments • Cost: more • Users: one • Use: science • Scheduling: ad hoc • Data Cleaning: religious

  5. Scientific Instruments • Cost: 100’s K$ (1880s $) • Users: 100’s • Use: science • Scheduling: by committee • Data Cleaning: grad students

  6. Scientific Instruments • Cost: 100’s M$ (2010s $) • Users: 1000’s-millions • Use: science and education • Scheduling: mostly static - SURVEY • Data Cleaning: mostly algorithmic • Key Point: Enabled by modern (future) Data Management!

  7. Shared Infrastructure • Sharing dictated by costs • Costs of hardware • Costs of deployment • Costs of maintenance • Pooled Resource Management • Competitively Scheduled • Statically Scheduled (surveys) • Data Cleaning • At the instrument • By the applications (or end users) • Other Services

  8. Shared Sensor Nets • Macroscopes are expensive: • to design • to build • to deploy • to operate and maintain • They will be shared resources: • across organizations • across apps within organizations • Q: What are the right abstractions to support them?

  9. Traditional Shared Data Mgmt • All users/apps see only cleaned data: a.k.a. “TRUTH” [Diagram labels: Data Feeds; Operational Systems (Point of Sale, Inventory, Etc.); Extract, Transform, Load; Cleaning, Auditing, …; Data Warehouse; Data Mart, Data Mart; Users; Business Intelligence; Reports; Dashboards; ad hoc Queries]

  10. Shared SensorNet Services [Diagram labels: Data Collection; Query & Reporting; Data Cleaning; Quality Estimation; Scheduling; Provisioning; Tasking/Programming; Evolution; Monitoring; Actuation] We will need to understand the shared/custom tradeoffs for all of these.

  11. Data Cleaning as a Shared Service

  12. Some Data Quality Problems with Sensors • (1) (Cheap) sensors are failure and error prone (and people want their sensors to be really cheap). • (2) Device interface is too low level for applications. • (3) They produce too much (uninteresting) data. • (4) They produce some interesting data, and it’s hard to tell case #3 from case #4. • (5) Sensitive to environmental conditions.

  13. Problem 1a: Sensors are Noisy • A simple RFID Experiment • 2 adjacent shelves, 6 ft. wide • 10 EPC-tagged items each, plus 5 moved between them • RFID antenna on each shelf

  14. Shelf RFID Test - Ground Truth

  15. Actual RFID Readings • “Restock every time inventory goes below 5”

  16. Problem 1b: Sensors “Fail Dirty” • 3 temperature-sensing motes in the same room [Plot labels: Outlier Mote; Average]
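
One way to picture the "fail dirty" problem: a mote that has failed keeps reporting plausible-looking numbers, so a plain average is pulled toward the bad value. The sketch below is only an illustration, not the method used in the talk. The mote names, the readings, and the `max_dev` threshold are all hypothetical. It drops any mote that sits too far from the median before averaging.

```python
from statistics import mean, median

def robust_average(readings, max_dev=5.0):
    """readings: dict of mote id -> temperature for one time step.

    Returns (average of the motes kept, sorted list of motes flagged as outliers).
    max_dev is an assumed threshold, in the same units as the readings.
    """
    mid = median(readings.values())
    # A mote is flagged when it deviates from the median by more than max_dev.
    outliers = sorted(m for m, t in readings.items() if abs(t - mid) > max_dev)
    kept = [t for m, t in readings.items() if m not in outliers]
    return mean(kept), outliers

# Hypothetical readings: two healthy motes and one that has failed dirty.
avg, bad = robust_average({"mote1": 21.0, "mote2": 22.0, "mote3": 95.0})
```

With only three motes the median is a reasonable reference point as long as at most one mote has failed; with two failed motes this simple rule breaks down.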

  17. Problem 2: Low-level Interface • Lack of good support for devices increases the complexity of sensor-based applications.

  18. Problems 3 and 4: The Wheat from the Chaff Shelf RFID reports (50 times/sec): • there are 100 items on the shelf • the 100 items are still on the shelf • the 100 items are still on the shelf • the 100 items are still on the shelf • the 100 items are still on the shelf • the 100 items are still on the shelf • the 100 items are still on the shelf • the 100 items are still on the shelf • the 100 items are still on the shelf • the 100 items are still on the shelf • the 100 items are still on the shelf • the 100 items are still on the shelf • the 100 items are still on the shelf • the 100 items are still on the shelf • the 100 items are still on the shelf • the 100 items are still on the shelf • the 100 items are still on the shelf • the 100 items are still on the shelf • the 100 items are still on the shelf • there are 99 items on the shelf • the 99 items are still on the shelf Mike Franklin UC Berkeley EECS
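
The report stream on this slide is almost entirely redundant: only the first report and the change from 100 to 99 carry information. A minimal sketch of one way to separate the two, assuming the reports have already been reduced to a sequence of item counts (the function name and the sample sequence are hypothetical):

```python
def suppress_unchanged(counts):
    """Yield (index, count) only when the count differs from the previous report."""
    last = None
    for i, c in enumerate(counts):
        if c != last:
            yield (i, c)
            last = c

# Hypothetical stream: many identical reports, then one item leaves the shelf.
reports = [100] * 19 + [99, 99]
events = list(suppress_unchanged(reports))
```

This only works once the counts themselves are trustworthy; applied to raw, noisy readings it would forward every dropped read as a "change".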

  19. Problem 5: Environment [Two plots: “Read Rate vs. Distance, Alien I2 Tag in a room on the 4th floor of Soda Hall” and “Read Rate vs. Distance using same reader and tag in the room next door”]

  20. VICE: Virtual Device Interface [Jeffery et al., Pervasive 2006] • “Metaphysical Data Independence” • Goal: Hide messy details of underlying physical devices. • Error characteristics • Failure • Calibration • Sampling Issues • Device Management • Physical vs. Virtual • Fundamental abstractions: • Spatial & temporal granules

  21. VICE - A Virtual Device Layer [Diagram label: “Virtual Device (VICE) API”] • The VICE API is a natural place to hide much of the complexity arising from physical devices.

  22. The VICE Query Pipeline [Diagram, VICE Stages and their Generalization: Clean (Single Tuple); Smooth (Window); Arbitrate (Multiple Receptors); Validate (Join w/Stored Data); Analyze (On-line Data Mining)]
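
A pipeline of this kind can be pictured as a chain of stream transformations, each consuming the output of the one before. The sketch below is an assumption-laden illustration, not the VICE implementation: the record layout (`tag_id`, `reader`), the two toy stages, and the `pipeline` helper are all hypothetical. It shows a single-tuple "clean" stage and a "validate" stage that checks readings against stored data.

```python
from functools import reduce

def clean(stream):
    # Single-tuple stage: drop readings that carry no tag id.
    for r in stream:
        if r.get("tag_id"):
            yield r

def validate(stream, known_tags):
    # Join-with-stored-data stage: keep only tags present in a known set.
    for r in stream:
        if r["tag_id"] in known_tags:
            yield r

def pipeline(stream, stages):
    # Feed the stream through each stage in order.
    return reduce(lambda s, stage: stage(s), stages, stream)

# Hypothetical raw readings: one good, one with no tag id, one unknown tag.
raw = [
    {"tag_id": "A", "reader": 1},
    {"tag_id": None, "reader": 1},
    {"tag_id": "Z", "reader": 2},
]
out = list(pipeline(raw, [clean, lambda s: validate(s, {"A", "B"})]))
```

Because each stage is a generator, readings flow through one at a time, which suits a streaming setting better than materializing intermediate lists.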

  23. RFID Smoothing w/Queries • RFID data has many dropped readings • Typically, use a smoothing filter to interpolate • Smoothing Filter: SELECT distinct tag_id FROM RFID_stream [RANGE ‘5 sec’] GROUP BY tag_id [Plot: raw readings vs. smoothed output over time]
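
The windowed query on this slide reports a tag as present if it was read at any point in the last 5 seconds, which papers over short gaps in the raw readings. A minimal sketch of that idea in Python, assuming readings arrive as `(timestamp, tag_id)` pairs; the function name, the sample data, and the choice of a half-open window `(t - window, t]` are assumptions, not the stream engine's actual semantics:

```python
def smooth(readings, times, window=5.0):
    """For each evaluation time t, return the set of tags read in (t - window, t].

    readings: list of (timestamp_seconds, tag_id) pairs.
    times: evaluation times at which the window is sampled.
    """
    return [
        {tag for ts, tag in readings if t - window < ts <= t}
        for t in times
    ]

# Hypothetical raw readings with a gap: tag1 is missed between t=1.0 and t=6.5.
raw = [(0.0, "tag1"), (1.0, "tag1"), (2.0, "tag2"), (6.5, "tag1")]
out = smooth(raw, times=[2.0, 5.0, 8.0])
```

The window size is the key trade-off: a larger window hides more dropped readings but is slower to notice that a tag has really left.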

  24. After VICE Processing • “Restock every time inventory goes below 5”

  25. Adaptive Smoothing [Jeffery et al., VLDB 2006]

  26. Ongoing Work: Spatial Smoothing • With multiple readers, more complicated • Two rooms, two readers per room (A, B, C, D) • Reinforcement: A? B? A U B? A B? • Arbitration: A? C? • All are addressed by statistical framework!
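
To make the two cases concrete: readers in the same room can reinforce each other, while readers in different rooms that both see a tag need arbitration. The sketch below uses a plain majority vote per room as a stand-in; it is not the statistical framework the slide refers to. The reader-to-room mapping and the sample reads are hypothetical.

```python
from collections import Counter

def arbitrate(reads, reader_room):
    """Assign each tag to the room whose readers reported it most often.

    reads: list of (reader_id, tag_id) pairs collected over one window.
    reader_room: dict mapping reader id -> room name.
    Ties go to the room that was seen first for that tag.
    """
    votes = {}
    for reader, tag in reads:
        # Reads from readers in the same room are pooled (reinforcement).
        votes.setdefault(tag, Counter())[reader_room[reader]] += 1
    # The room with the most pooled reads wins (arbitration).
    return {tag: c.most_common(1)[0][0] for tag, c in votes.items()}

# Hypothetical layout: readers A and B in room1, C and D in room2.
reader_room = {"A": "room1", "B": "room1", "C": "room2", "D": "room2"}
reads = [("A", "t1"), ("B", "t1"), ("C", "t1"), ("D", "t2")]
location = arbitrate(reads, reader_room)
```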

  27. Problems with a single Truth • If you knew what was going to happen, you wouldn’t need sensors [Images: upside down airplane; ozone layer hole] • Monitoring vs. Needle-in-a-haystack • Probability-based smoothing may remove unlikely, but real events!

  28. Risks of too little cleaning • GIGO • Complexity: burden on app developers • Efficiency (repeated work) • Too much opportunity for error

  29. Risks of too much cleaning • “The appearance of a hole in the earth's ozone layer over Antarctica, first detected in 1976, was so unexpected that scientists didn't pay attention to what their instruments were telling them; they thought their instruments were malfunctioning.” (National Center for Atmospheric Research) • In fact, the data were rejected as unreasonable by data quality control algorithms

  30. One Truth for Sensor Nets? • How clean is “clean-enough”? • How much cleaning is too much? • Answers are likely to be: • domain-specific • sensor-specific • application-specific • user-specific • all of the above? • How to split between shared and application-specific cleaning?

  31. Fuzzy Truth • One solution is to make the shared interface richer. • Probabilistic Data Management is also the key to “Calm Computing”
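
A richer shared interface could return a confidence alongside each answer and let the application pick its own threshold. The sketch below is a crude illustration only: it uses the fraction of epochs in which a tag was read as a stand-in for confidence, which is not a calibrated probability and not the approach of the papers cited in this talk. The function name and sample data are hypothetical.

```python
from collections import Counter

def presence_confidence(reads, n_epochs):
    """Return tag -> fraction of epochs in which the tag was read at least once.

    reads: iterable of (epoch_index, tag_id) pairs; duplicates within an epoch
    are counted once.
    n_epochs: number of read epochs in the window (assumed > 0).
    """
    seen = Counter(tag for _, tag in set(reads))
    return {tag: n / n_epochs for tag, n in seen.items()}

# Hypothetical window of 4 epochs: t1 read every epoch, t2 read once.
reads = [(0, "t1"), (1, "t1"), (2, "t1"), (3, "t1"), (1, "t2")]
conf = presence_confidence(reads, n_epochs=4)

# Each application applies its own threshold instead of sharing one "truth".
inventory_view = {t for t, p in conf.items() if p >= 0.5}
security_view = {t for t, p in conf.items() if p > 0.0}
```

The two views at the end show the point of the slide: an inventory application may ignore a tag seen once, while a needle-in-a-haystack application may care about exactly that reading.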

  32. Adding Quality Assessment • A. Das Sarma, S. Jeffery, M. Franklin, J. Widom, “Estimating Data Stream Quality for Object-Detection Applications”, 3rd Intl ACM SIGMOD Workshop on Information Quality in Info Sys, 2006

  33. “Data Furnace” Architecture [Garofalakis et al., D.E. Bulletin, 3/06] • Service Layer • Probabilistic Reasoning • Uncertainty Management • Data Model Learning • Complex Event Processing • Data Archiving and Streaming

  34. Rethinking Service Abstractions [Diagram labels: Query-Data Collection; Quality Estimation; Data Cleaning; Scheduling; Provisioning; Tasking/Programming; Evolution; Monitoring; Actuation] We will need to understand the shared/custom tradeoffs for all of these.

  35. Conclusions • Much current sensor research is focused on the “single user” or “single app” model. • Sensor networks will be shared resources. • Can leverage some ideas from current shared Data Management infrastructures. • But, new solutions, abstractions, and architectures will be required.
