Data Quality and Query Cost in Pervasive Sensing Systems

Data Quality and Query Cost in Pervasive Sensing Systems. David J. Yates. Bentley College Computer Information Systems Dept. Waltham, Massachusetts, USA dyates@bentley.edu. Joint Work With …. Erich Nahum IBM T.J. Watson Research Center 19 Skyline Drive Hawthorne, New York, USA


Presentation Transcript


  1. Data Quality and Query Cost in Pervasive Sensing Systems David J. Yates Bentley College, Computer Information Systems Dept. Waltham, Massachusetts, USA dyates@bentley.edu David Yates

  2. Joint Work With … Erich Nahum, IBM T.J. Watson Research Center, 19 Skyline Drive, Hawthorne, New York, USA; James Kurose and Prashant Shenoy, Dept. of Computer Science, University of Massachusetts, Amherst, Massachusetts, USA

  3. Talk Outline • Data quality and query cost for pervasive sensing systems • Motivation and introduction • Pervasive sensing applications • Resource-constrained sensor fields • Sensor networks and backbone networks • Data management techniques to conserve resources • Sensor network data server and cache • Query cost, data quality, delay, value deviation • Cost and quality performance • Summary and Conclusions

  4. Research Contributions • Define and quantify data quality and query cost performance in pervasive sensing systems • Develop policies that approximate sensor field values using cached values for nearby locations • Prove analytic upper bound on sensor field query rate • Show cost and quality win-win for pervasive sensing applications for which response time is most important • Show cost vs. quality tradeoff for sensing applications for which accuracy is most important • Results are robust with respect to the manner in which the query workload changes

  5. Pervasive Sensing Applications • Microsensors, on-board processing, and wireless interfaces are feasible at very small scale – can monitor phenomena “up close” • Enables spatially and temporally dense monitoring and control • Pervasive sensing will reveal previously unobservable phenomena • Example applications: data center management, manufacturing engineering, environmental monitoring, natural disaster response • Embedded, energy-constrained (wireless, small form-factor), unattended systems

  6. Sensors Embedded in Infrastructure • The day after a moderate earthquake jolts the city of San Francisco, building inspectors check on the structural integrity of an office building in the financial district. Sensors embedded in the walls of the building to monitor and record vibration data confirm that the structure is safe to enter. (Intel 2005)

  7. From Sensor Networks to Applications • Sensor fields (blue), backbone (yellow), monitoring & control applications (red) • Queries submitted from sensing applications • Replies received from sensor fields • Our focus – Data management at data server [Diagram labels: Routers & Switches; Sensing Application; Data server / Gateway (and cache); sensor fields (Sound …, Light …); Embedded, energy-constrained (wireless, small form-factor), unattended systems]

  8. Data Server Node Without Cache [Diagram: sensor field containing sensors (s) and query locations l1 and l2 with sample timestamps {t1} and {t2}; Queries enter through the sensor network query queue; Replies leave through the gateway reply queue.] Legend: li = query location i; ti = timestamp associated with the value sampled in the sensor field at location i; s = sensor

  9. Data Server Node Without Cache • End-to-end delay occurs between Querym and Replym. • Value deviation is between the value in Replym and the value at li as Replym leaves the gateway reply queue. [Diagram: sensor field containing sensors (s) and query locations l1 and l2 with sample timestamps {t1} and {t2}; Querym enters through the sensor network query queue; Replym leaves through the gateway reply queue.] Legend: li = query location i; ti = timestamp associated with the value sampled in the sensor field at location i; s = sensor

  10. Data Server Node With Cache • For a cache hit or a miss, end-to-end delay occurs between Querym and Replym. • Also, value deviation is between the value in Replym and the value at li as Replym leaves the gateway reply queue. [Diagram: sensor field containing sensors (s) and query locations l1, l2, and l3; Querym arrives through the gateway query queue at the cache, which holds entries el1 and el2; a Hit is answered from the cache; a Miss or Prefetch goes to the sensor network query queue; Updates or replies return through the cache update queue; Replym leaves through the gateway reply queue.] Legend: li = query location; eli = {li, vi, ti} = cache entry for query location li; vi = value in cache associated with location i; ti = timestamp of value associated with location i; s = sensor. Locations l1 and l2 are cached in entries el1 and el2
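The hit and miss paths on the slide above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the names `CacheEntry`, `GatewayCache`, `max_age`, and `query_sensor_field` are hypothetical, and the sketch assumes a sample's timestamp is the time its reply reaches the gateway.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Location = Tuple[float, float]  # assumed 2-D coordinates for a query location l_i


@dataclass
class CacheEntry:
    """One cache entry e_li = {l_i, v_i, t_i} as defined in the slide legend."""
    location: Location  # l_i
    value: float        # v_i
    timestamp: float    # t_i


class GatewayCache:
    """Hypothetical gateway cache that answers from memory while an entry is fresh."""

    def __init__(self, max_age: float,
                 query_sensor_field: Callable[[Location], float]) -> None:
        self.max_age = max_age                        # age parameter T, in seconds
        self.query_sensor_field = query_sensor_field  # stand-in for the sensor network
        self.entries: Dict[Location, CacheEntry] = {}

    def lookup(self, location: Location, now: float) -> Tuple[float, bool]:
        """Return (value, was_hit) for a precise lookup of one location."""
        entry = self.entries.get(location)
        if entry is not None and now - entry.timestamp <= self.max_age:
            return entry.value, True                  # hit: no sensor field traffic
        value = self.query_sensor_field(location)     # miss: query the sensor field
        # Assumption: the sample time is approximated by the current time.
        self.entries[location] = CacheEntry(location, value, now)
        return value, False
```

Setting `max_age` to 0 makes nearly every lookup a miss, while `float("inf")` makes every lookup after the first a hit, which mirrors the range of T values named on the policies slide.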

  11. Query Cost and Data Quality • Cost to query location li is normalized such that [equation not preserved in this transcript] • Normalized quality using softmax normalization [equation not preserved in this transcript]
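For readers unfamiliar with the term, softmax normalization (also called softmax scaling) usually means passing a standardized value through a logistic function so that results fall strictly between 0 and 1. The sketch below shows that common textbook form only; whether the talk used exactly this form, and which raw quantities it was applied to, is an assumption. The function name `softmax_normalize` is hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import List


def softmax_normalize(samples: List[float]) -> List[float]:
    """Map raw samples into (0, 1) with a logistic squashing of their z-scores.

    Assumption: this is the common textbook form of softmax scaling,
    not necessarily the exact expression used in the talk.
    """
    mu = mean(samples)
    sigma = pstdev(samples)  # population standard deviation
    if sigma == 0:
        # All samples identical: every z-score is 0, which maps to 0.5.
        return [0.5 for _ in samples]
    return [1.0 / (1.0 + math.exp(-(x - mu) / sigma)) for x in samples]
```

A value equal to the mean maps to 0.5, and outliers are compressed toward 0 or 1 instead of stretching the scale, which is one reason to prefer this over min-max scaling when the range of values is large.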

  12. Caching and Lookup Policies • All hits • All misses • Simple lookup • Piggyback queries • Greedy age-based lookup • Greedy distance-based lookup • Median-of-3 lookup [Side labels: precise lookups and queries (top of list); approximate lookups and queries (bottom of list)] Policies incorporate an age parameter T; T can be 0, finite, or infinite
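The slide names the approximate policies without defining them. As a rough illustration of the idea of answering from cached values for nearby locations, the sketch below shows one plausible reading of two of them. All details are assumptions: the function names, the tuple layout, the use of Euclidean distance, and the choice to treat "median-of-3" as the median value of the three nearest fresh entries.

```python
import math
from statistics import median
from typing import List, Optional, Tuple

# Hypothetical cache entry layout: (location l_i, value v_i, timestamp t_i)
Entry = Tuple[Tuple[float, float], float, float]


def fresh_entries(entries: List[Entry], now: float, max_age: float) -> List[Entry]:
    """Keep only entries whose age does not exceed the age parameter T."""
    return [e for e in entries if now - e[2] <= max_age]


def greedy_distance_lookup(entries: List[Entry], target: Tuple[float, float],
                           now: float, max_age: float) -> Optional[float]:
    """Assumed reading: answer with the fresh cached value nearest to the target."""
    candidates = fresh_entries(entries, now, max_age)
    if not candidates:
        return None  # treated as a miss; the caller would query the sensor field
    nearest = min(candidates, key=lambda e: math.dist(e[0], target))
    return nearest[1]


def median_of_3_lookup(entries: List[Entry], target: Tuple[float, float],
                       now: float, max_age: float) -> Optional[float]:
    """Assumed reading: median value of the three nearest fresh entries."""
    candidates = sorted(fresh_entries(entries, now, max_age),
                        key=lambda e: math.dist(e[0], target))
    if len(candidates) < 3:
        return None  # not enough fresh neighbours; treated as a miss
    return median(e[1] for e in candidates[:3])
```

Taking a median instead of the single nearest value trades some locality for robustness against one stale or anomalous neighbour.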

  13. Research Contributions • Defined and quantified data quality and query cost performance in pervasive sensing systems • Developed policies that approximate sensor field values using cached values for nearby locations • Prove analytic upper bound on sensor field query rate • Show cost and quality win-win for pervasive sensing applications for which response time is most important • Show cost vs. quality tradeoff for sensing applications for which accuracy is most important • Results are robust with respect to the manner in which the query workload changes

  14. Lab Trace Data • Trace data from multi-sensor motes deployed at Intel Berkeley lab (Deshpande 2004)

  15. Lab Environment and Workload • 2.3 million readings taken over 35+ days • Use readings with largest changes in value in our simulator (light measured in Lux) • Changes occur slowly relative to correlated changes (about 1 location every 1.4 seconds) • But, range of values is large • Applications determine values for A and T

  16. Bounded Resource Consumption • N is the set of locations in the sensor field • The cache entry for each location is used by multiple queries for periods of T seconds (requires blocking behind pending queries) • Sensor field query rate can be bounded by: [expression not preserved in this transcript] queries per second • Proof: Induction on size of N • Sensor field transmissions dominate resource consumption
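A bound consistent with the bullets on this slide, offered as an assumption rather than the slide's own expression, follows from a counting argument: if queries for a location block behind a pending one and the resulting cache entry then serves all queries for T seconds, each location causes at most one sensor field query per T seconds. Here \lambda_{\text{field}} is a hypothetical symbol for the sensor field query rate.

```latex
\lambda_{\text{field}} \;\le\; \sum_{i \in N} \frac{1}{T} \;=\; \frac{|N|}{T}
\quad \text{queries per second}, \qquad T > 0
```

Under this reading the bound does not depend on the application query rate, which is what makes resource consumption in the sensor field predictable.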

  17. Data Quality Driven by Response Time • Picking a large value of A means delay is more important than value deviation • Consider normalized quality when A = 0.9
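The role of A can be illustrated with a simple weighted combination. This is a sketch under stated assumptions, not the talk's definition: it assumes delay and value deviation have already been normalized to [0, 1], that A weights delay and (1 - A) weights value deviation, and that quality is one minus the weighted penalty. The function name `combined_quality` is hypothetical.

```python
def combined_quality(norm_delay: float, norm_deviation: float, a: float) -> float:
    """Hypothetical quality score in [0, 1]; higher is better.

    Assumptions: both inputs are already normalized to [0, 1], and the
    weight A trades delay off against value deviation linearly.
    """
    if not 0.0 <= a <= 1.0:
        raise ValueError("A must lie in [0, 1]")
    penalty = a * norm_delay + (1.0 - a) * norm_deviation
    return 1.0 - penalty


# A fast but inaccurate reply: low normalized delay, high normalized deviation.
fast_inaccurate_when_delay_matters = combined_quality(0.1, 0.8, a=0.9)
fast_inaccurate_when_accuracy_matters = combined_quality(0.1, 0.8, a=0.1)
```

With A = 0.9 the fast but inaccurate reply scores well, because delay dominates the penalty; with A = 0.1 the same reply scores poorly, because value deviation dominates.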

  18. Cost and Quality Performance when Response Time drives Quality • Trace-driven Changes: A = 0.9, T = 90 sec, Query rate = 0.9 lps, Change rate = 1.4 lps • Approximate greedy lookups outperform other policies • There is a win-win here!

  19. Delay when Response Time drives Quality • Trace-driven Changes

  20. Research Contributions • Defined and quantified data quality and query cost performance in pervasive sensing systems • Developed policies that approximate sensor field values using cached values for nearby locations • Proved analytic upper bound on sensor field query rate • Showed cost and quality win-win for pervasive sensing applications for which response time is most important • Show cost vs. quality tradeoff for sensing applications for which accuracy is most important • Results are robust with respect to the manner in which the query workload changes

  21. Data Quality Driven by Accuracy • Choosing a small value of A means value deviation is more important to data quality than delay • For example, consider normalized quality when A = 0.1

  22. Cost vs. Quality when Accuracy drives Quality • Trace-driven Changes: A = 0.1, T = 90 sec, Query rate = 0.9 lps, Change rate = 1.4 lps • There is a tradeoff between cost and quality here

  23. Value Deviation when Accuracy drives Quality • Trace-driven Changes • Significant differences in accuracy between policies

  24. Cost and Quality Trends when Response Time drives Quality • Trace-driven Changes: A = 0.9, T = 9 sec, Query rate = 90, 9, and 0.9 lps • Again, there is a win-win here!

  25. Cost vs. Quality Trends when Accuracy drives Quality • Trace-driven Changes: A = 0.1, T = 9 sec, Query rate = 90, 9, and 0.9 lps • Same relative performance

  26. Talk Summary • Define and quantify data quality and query cost performance in pervasive sensing systems • Develop policies that approximate sensor field values using cached values for nearby locations • Prove analytic upper bound on sensor field query rate • Show cost and quality win-win for pervasive sensing applications for which response time is most important • Show cost vs. quality tradeoff for sensing applications for which accuracy is most important • Results are robust with respect to the manner in which the query workload changes

  27. Thank You! • Further questions? • … David J. Yates Bentley College, Computer Information Systems Dept. Waltham, Massachusetts, USA dyates@bentley.edu
