Data Quality and Query Cost in Pervasive Sensing Systems
David J. Yates
Bentley College, Computer Information Systems Dept.
Waltham, Massachusetts, USA
dyates@bentley.edu

Joint Work With …
• Erich Nahum, IBM T.J. Watson Research Center, 19 Skyline Drive, Hawthorne, New York, USA
• James Kurose and Prashant Shenoy, Dept. of Computer Science, University of Massachusetts, Amherst, Massachusetts, USA

Talk Outline
• Data quality and query cost for pervasive sensing systems
• Motivation and introduction
• Pervasive sensing applications
• Resource-constrained sensor fields
• Sensor networks and backbone networks
• Data management techniques to conserve resources
• Sensor network data server and cache
• Query cost, data quality, delay, value deviation
• Cost and quality performance
• Summary and Conclusions

Research Contributions
• Define and quantify data quality and query cost performance in pervasive sensing systems
• Develop policies that approximate sensor field values using cached values for nearby locations
• Prove analytic upper bound on sensor field query rate
• Show cost and quality win-win for pervasive sensing applications for which response time is most important
• Show cost vs. quality tradeoff for sensing applications for which accuracy is most important
• Results are robust with respect to the manner in which the query workload changes

Pervasive Sensing Applications
• Microsensors, on-board processing, and wireless interfaces are feasible at very small scale, so they can monitor phenomena “up close”
• This enables spatially and temporally dense monitoring and control
• Pervasive sensing will reveal previously unobservable phenomena
• Application areas: data center management, manufacturing engineering, environmental monitoring, natural disaster response
• Embedded, energy-constrained (wireless, small form-factor), unattended systems

Sensors Embedded in Infrastructure
• The day after a moderate earthquake jolts the city of San Francisco, building inspectors check on the structural integrity of an office building in the financial district. Sensors embedded in the walls of the building to monitor and record vibration data confirm that the structure is safe to enter. (Intel 2005)

From Sensor Networks to Applications
• Sensor fields (blue), backbone (yellow), monitoring & control applications (red)
• Queries submitted from sensing applications
• Replies received from sensor fields
• Our focus: data management at the data server
[Figure labels: routers & switches; sensing application; data server / gateway (and cache); sensor fields measuring sound … and light …; embedded, energy-constrained (wireless, small form-factor), unattended systems]

Data Server Node Without Cache
[Figure: a sensor field containing sensors (s) and query locations l1 and l2 with timestamps {t1} and {t2}. Queries enter through the sensor network query queue; replies leave through the gateway reply queue.]
Legend: li = query location i; ti = timestamp associated with the value sampled in the sensor field at location i; s = sensor

Data Server Node Without Cache
• End-to-end delay is the time between Query m and Reply m.
• Value deviation is the difference between the value in Reply m and the value at li at the moment Reply m leaves the gateway reply queue.
[Figure: the same sensor field, with Query m entering through the sensor network query queue and Reply m leaving through the gateway reply queue.]
Legend: li = query location i; ti = timestamp associated with the value sampled in the sensor field at location i; s = sensor

Data Server Node With Cache
• For a cache hit or a miss, end-to-end delay is the time between Query m and Reply m.
• Value deviation is the difference between the value in Reply m and the value at li at the moment Reply m leaves the gateway reply queue.
[Figure: Query m enters the gateway query queue and is looked up in the cache. A hit is answered from the cache; a miss or prefetch goes through the sensor network query queue to the sensor field. Updates or replies from the sensor field return through the cache update queue, and Reply m leaves through the gateway reply queue. The sensor field contains locations l1, l2, and l3; locations l1 and l2 are cached in entries el1 and el2.]
Legend: li = query location i; eli = {li, vi, ti} = cache entry for query location i; vi = value in cache associated with location i; ti = timestamp of value associated with location i; s = sensor

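The hit and miss paths on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' data server: the names `CacheEntry`, `GatewayCache`, and `query_sensor` are hypothetical, the age threshold stands in for the parameter T used later in the talk, and queueing delay is ignored, so a fetched value is stamped with the lookup time.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Location = Tuple[float, float]  # hypothetical 2-D coordinates for a query location


@dataclass
class CacheEntry:
    location: Location  # l_i
    value: float        # v_i
    timestamp: float    # t_i, when the value was sampled (assumed: time of fetch)


class GatewayCache:
    """Toy gateway cache: answer from the cache if fresh, else query the field."""

    def __init__(self, max_age: float, query_sensor: Callable[[Location], float]):
        self.max_age = max_age            # age threshold T, in seconds
        self.query_sensor = query_sensor  # stand-in for a sensor field query
        self.entries: Dict[Location, CacheEntry] = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, location: Location, now: float) -> float:
        entry = self.entries.get(location)
        if entry is not None and now - entry.timestamp <= self.max_age:
            self.hits += 1                # hit: reply directly from the cache
            return entry.value
        self.misses += 1                  # miss: forward the query to the sensor field
        value = self.query_sensor(location)
        self.entries[location] = CacheEntry(location, value, now)
        return value
```

With `max_age = 90.0`, a second lookup of the same location 50 seconds after the first is a hit, and a lookup 200 seconds later is a miss that refreshes the entry.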
Query Cost and Data Quality
• Cost to query location li is normalized such that [equation not preserved]
• Normalized quality uses softmax normalization [equation not preserved]

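The slide's formulas did not survive, so the exact normalization used in the talk is unknown. As a rough illustration only, the sketch below applies one common form of softmax (logistic) normalization, which centers samples on their mean, scales by their standard deviation, and squashes the result into the interval (0, 1). The function name and the choice of the population standard deviation are assumptions.

```python
import math
from statistics import mean, pstdev
from typing import List


def softmax_normalize(samples: List[float]) -> List[float]:
    """Map raw measurements (e.g. delays) into (0, 1) with a logistic squash.

    Assumed form: 1 / (1 + exp(-(x - mean) / stddev)). The talk may use a
    different variant; this is only one common definition.
    """
    mu = mean(samples)
    sigma = pstdev(samples)
    if sigma == 0:
        # All samples identical: place every one at the midpoint.
        return [0.5 for _ in samples]
    return [1.0 / (1.0 + math.exp(-(x - mu) / sigma)) for x in samples]
```

A sample equal to the mean maps to 0.5, and outliers approach 0 or 1 without ever reaching them, which keeps a few extreme delays or deviations from dominating the scale.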
Caching and Lookup Policies
• All hits
• All misses
• Simple lookup
• Piggyback queries
• Greedy age-based lookup
• Greedy distance-based lookup
• Median-of-3 lookup
Policies are grouped into precise lookups and queries and approximate lookups and queries.
Policies incorporate an age parameter T; T can be 0, finite, or infinite.

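The slide names the approximate policies but does not define them. The sketch below shows one plausible reading of the three approximate lookups, using cached values from nearby locations that are no older than T. All function names, the `radius` parameter, and the exact selection rules are assumptions made for illustration, not the authors' definitions.

```python
import math
from statistics import median
from typing import Dict, List, Optional, Tuple

Location = Tuple[float, float]
# Hypothetical cache layout: location -> (value v_i, timestamp t_i)
Cache = Dict[Location, Tuple[float, float]]


def fresh_entries(cache: Cache, now: float, max_age: float) -> List[Tuple[Location, float, float]]:
    """Return (location, value, timestamp) for entries no older than max_age (T)."""
    return [(loc, v, t) for loc, (v, t) in cache.items() if now - t <= max_age]


def greedy_distance_lookup(cache: Cache, target: Location, now: float, max_age: float) -> Optional[float]:
    """Assumed rule: use the fresh entry closest to the queried location."""
    candidates = fresh_entries(cache, now, max_age)
    if not candidates:
        return None  # caller would fall back to a sensor field query
    return min(candidates, key=lambda e: math.dist(e[0], target))[1]


def greedy_age_lookup(cache: Cache, target: Location, now: float, max_age: float, radius: float) -> Optional[float]:
    """Assumed rule: among fresh entries within `radius`, use the newest one."""
    candidates = [e for e in fresh_entries(cache, now, max_age)
                  if math.dist(e[0], target) <= radius]
    if not candidates:
        return None
    return max(candidates, key=lambda e: e[2])[1]


def median_of_3_lookup(cache: Cache, target: Location, now: float, max_age: float) -> Optional[float]:
    """Assumed rule: median value of the three nearest fresh entries."""
    nearest = sorted(fresh_entries(cache, now, max_age),
                     key=lambda e: math.dist(e[0], target))[:3]
    if len(nearest) < 3:
        return None
    return median(v for _, v, _ in nearest)
```

Each function returns `None` when no suitable cached value exists, leaving it to the caller to forward the query to the sensor field. Setting `max_age` to 0 makes almost every lookup fall through, and an infinite `max_age` never expires entries, matching the slide's note that T can be 0, finite, or infinite.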
Research Contributions
• Defined and quantified data quality and query cost performance in pervasive sensing systems
• Developed policies that approximate sensor field values using cached values for nearby locations
• Prove analytic upper bound on sensor field query rate
• Show cost and quality win-win for pervasive sensing applications for which response time is most important
• Show cost vs. quality tradeoff for sensing applications for which accuracy is most important
• Results are robust with respect to the manner in which the query workload changes

Lab Trace Data
Trace data from multi-sensor motes deployed at the Intel Berkeley lab (Deshpande 2004)

Lab Environment and Workload
• 2.3 million readings taken over 35+ days
• Use readings with largest changes in value in our simulator (light measured in lux)
• Changes occur slowly relative to correlated changes (about 1 location every 1.4 seconds)
• But, range of values is large
• Applications determine values for A and T

Bounded Resource Consumption
• N is the set of locations in the sensor field
• The cache entry for each location is used by multiple queries for periods of T seconds (requires blocking behind pending queries)
• Sensor field query rate can be bounded by: [expression not preserved] queries per second
• Proof: induction on the size of N
• Sensor field transmissions dominate resource consumption

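The bound itself is missing from the slide. One reading consistent with the bullets is that if each location's cache entry serves all queries for T seconds, each location is fetched from the sensor field at most once per T seconds, giving at most |N| / T sensor field queries per second. The sketch below checks that reading with a small simulation. The |N| / T expression, the Poisson query arrivals, the uniform choice of location, and the instantaneous fetches are all assumptions, not the authors' proof or workload.

```python
import random


def sensor_query_rate_bound(num_locations: int, max_age: float) -> float:
    """Assumed bound: |N| / T sensor field queries per second."""
    return num_locations / max_age


def simulate_miss_rate(num_locations: int, max_age: float, query_rate: float,
                       duration: float, seed: int = 0) -> float:
    """Count sensor field fetches per second under a simple age-T cache."""
    rng = random.Random(seed)
    last_fetch = {}  # location index -> time of the most recent field query
    misses = 0
    now = 0.0
    while True:
        now += rng.expovariate(query_rate)   # next query arrival (Poisson process)
        if now >= duration:
            break
        loc = rng.randrange(num_locations)   # uniformly chosen query location
        if loc not in last_fetch or now - last_fetch[loc] > max_age:
            misses += 1                      # entry absent or older than T: query the field
            last_fetch[loc] = now
    return misses / duration
```

For example, `simulate_miss_rate(50, 90.0, 90.0, 900.0)` stays at or below `sensor_query_rate_bound(50, 90.0)` no matter how high the application query rate is, because successive fetches for one location are always more than T seconds apart.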
Data Quality Driven by Response Time
• Picking a large value of A means delay is more important than value deviation
• Consider normalized quality when A = 0.9

Cost and Quality Performance when Response Time drives Quality
• Trace-driven changes: A = 0.9, T = 90 sec, query rate = 0.9 lps, change rate = 1.4 lps
• Approximate greedy lookups outperform other policies
• There is a win-win here!

Delay when Response Time drives Quality
[Figure: delay under trace-driven changes]

Research Contributions
• Defined and quantified data quality and query cost performance in pervasive sensing systems
• Developed policies that approximate sensor field values using cached values for nearby locations
• Proved analytic upper bound on sensor field query rate
• Showed cost and quality win-win for pervasive sensing applications for which response time is most important
• Show cost vs. quality tradeoff for sensing applications for which accuracy is most important
• Results are robust with respect to the manner in which the query workload changes

Data Quality Driven by Accuracy
• Choosing a small value of A means value deviation is more important to data quality than delay
• For example, consider normalized quality when A = 0.1

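The talk uses A = 0.9 when response time drives quality and A = 0.1 when accuracy drives it, but the quality formula itself is not preserved. The sketch below assumes a simple weighted sum in which A weights normalized delay and 1 − A weights normalized value deviation, with both inputs already scaled to [0, 1] and lower being better. The formula, the function name, and the two example replies are hypothetical.

```python
def normalized_quality(a: float, norm_delay: float, norm_deviation: float) -> float:
    """Assumed quality score in [0, 1]; higher is better.

    a              -- weight A on delay (1 - A goes to value deviation)
    norm_delay     -- delay already normalized to [0, 1]
    norm_deviation -- value deviation already normalized to [0, 1]
    """
    if not 0.0 <= a <= 1.0:
        raise ValueError("A must lie in [0, 1]")
    return a * (1.0 - norm_delay) + (1.0 - a) * (1.0 - norm_deviation)


# Hypothetical replies: (normalized delay, normalized value deviation)
cached_reply = (0.1, 0.6)   # fast, but the cached value has drifted
field_reply = (0.8, 0.05)   # slow, but nearly exact

for a in (0.9, 0.1):
    print(a,
          round(normalized_quality(a, *cached_reply), 3),
          round(normalized_quality(a, *field_reply), 3))
```

Under this assumed formula the cached reply scores higher when A = 0.9 and the sensor field reply scores higher when A = 0.1, which mirrors the talk's distinction between response-time-driven and accuracy-driven applications.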
Cost vs. Quality when Accuracy drives Quality
• Trace-driven changes: A = 0.1, T = 90 sec, query rate = 0.9 lps, change rate = 1.4 lps
• There is a tradeoff between cost and quality here

Value Deviation when Accuracy drives Quality
• Trace-driven changes
• Significant differences in accuracy between policies

Cost and Quality Trends when Response Time drives Quality
• Trace-driven changes: A = 0.9, T = 9 sec, query rate = 90, 9, and 0.9 lps
• Again, there is a win-win here!

Cost vs. Quality Trends when Accuracy drives Quality
• Trace-driven changes: A = 0.1, T = 9 sec, query rate = 90, 9, and 0.9 lps
• Same relative performance

Talk Summary
• Define and quantify data quality and query cost performance in pervasive sensing systems
• Develop policies that approximate sensor field values using cached values for nearby locations
• Prove analytic upper bound on sensor field query rate
• Show cost and quality win-win for pervasive sensing applications for which response time is most important
• Show cost vs. quality tradeoff for sensing applications for which accuracy is most important
• Results are robust with respect to the manner in which the query workload changes

Thank You!
• Further questions?
• …
David J. Yates
Bentley College, Computer Information Systems Dept.
Waltham, Massachusetts, USA
dyates@bentley.edu