
ICS280 Presentation by Suraj Nagasrinivasa

(1) Evaluating Probabilistic Queries over Imprecise Data (SIGMOD 2003) by R. Cheng, D. Kalashnikov, S. Prabhakar. (2) Model-Driven Data Acquisition in Sensor Networks (VLDB 2004) by A. Deshpande, C. Guestrin, J. Hellerstein, W. Hong, S. Madden.


Presentation Transcript


  1. ICS280 Presentation by Suraj Nagasrinivasa
  (1) Evaluating Probabilistic Queries over Imprecise Data (SIGMOD 2003) by R. Cheng, D. Kalashnikov, S. Prabhakar
  (2) Model-Driven Data Acquisition in Sensor Networks (VLDB 2004) by A. Deshpande, C. Guestrin, J. Hellerstein, W. Hong, S. Madden
  Acknowledgements: Dmitri Kalashnikov and Michal Kapalka

  2. In typical sensor applications...
  • Sensors monitor the external environment continuously
  • Sensor readings are sent back to the application
  • Decisions are often made based on these readings

  3. However, we face uncertainty…
  • Typically, a DB/server collects the sensor readings
  • The DB cannot store the "true" sensor value at all points in time:
    • Scarce battery power
    • Limited network bandwidth
  • So readings are recorded only at discrete time points, while the value of the phenomenon changes continuously
  • As a result, the stored reading is usually stale

  4. Scenario: Answering a Minimum Query with discrete DB-stored readings
  [Figure: recorded readings (x0, y0) vs. current temperatures (x1, y1) of sensors x and y]
  • Recorded values: x0 < y0, so x appears to give the minimum
  • Current values: y1 < x1, so y actually gives the minimum
  • Result: a wrong query answer

  5. Scenario: Answering a Minimum Query with error-bound readings I
  [Figure: recorded readings x0, y0 with bounds on the current temperature; the bounds do not overlap]
  • x certainly gives the minimum temperature reading

  6. Scenario: Answering a Minimum Query with error-bound readings II
  [Figure: recorded readings x0, y0 with overlapping bounds on the current temperature]
  • Both x and y have a chance of yielding the minimum value
  • Which one has the higher probability?

  7. Probabilistic Queries
  • Based on the variation characteristics of the sensor value over time:
    • Bounds can be estimated for the possible values
    • A probability distribution of the values is defined within the bounds
  • Evaluate a probability for each query answer
  • Probabilistic queries thus give a correct (probabilistic) answer instead of a single, potentially incorrect one

  8. Rest of the paper…
  • Notation & Uncertainty Model
  • Classification of Probabilistic Queries
  • Evaluating Probabilistic Queries
  • Quality of Probabilistic Queries
  • Object Refreshment Policies
  • Experimental Results

  9. Notation
  • T : a set of DB objects (e.g. sensors)
  • a : a dynamic attribute (e.g. pressure)
  • Ti : the i-th object of T
  • Ti.a(t) : the value of a in Ti at time t

  10. Uncertainty Model
  • Uncertainty interval Ui(t) = [li(t), ui(t)] : Ti.a(t) is guaranteed to lie within it
  • Uncertainty pdf fi(x,t) : the probability density of Ti.a(t) inside Ui(t)
  • The model can be extended to n dimensions
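A minimal Python sketch of this model, assuming a uniform pdf inside the uncertainty interval (the paper allows arbitrary pdfs; all names here are illustrative):

```python
from dataclasses import dataclass

@dataclass
class UncertainReading:
    """Uncertainty model: the true value Ti.a(t) lies in [l, u] = Ui(t)
    with density fi(x,t); a uniform density is assumed here."""
    l: float  # lower bound li(t)
    u: float  # upper bound ui(t)

    def pdf(self, x: float) -> float:
        # fi(x,t): uniform inside the uncertainty interval, zero outside
        return 1.0 / (self.u - self.l) if self.l <= x <= self.u else 0.0

    def cdf(self, x: float) -> float:
        # P(Ti.a(t) <= x); used below when evaluating min/NN probabilities
        if x <= self.l:
            return 0.0
        if x >= self.u:
            return 1.0
        return (x - self.l) / (self.u - self.l)
```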

  11. Classification of Probabilistic Queries
  • Type of result
    • Value-based: returns a single value, e.g. Minimum query ([l,u], pdf)
    • Entity-based: returns a set of objects, e.g. Range query ({(Ti, pi) : pi > 0})
  • Aggregation
    • Non-aggregate: the query result for an object is independent of the other objects, e.g. Range query
    • Aggregate: the query result is computed from a set of objects, e.g. Nearest Neighbor query

  12. Classification of Probabilistic Queries
  • Query evaluation algorithms and quality metrics are developed for each class

  13. ENNQ algorithm: Projection, Pruning, Bounding & Evaluation

  14. ENNQ algorithm
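The slides present ENNQ graphically; as a rough 1-D illustration of the same prune-then-integrate idea (not the paper's exact algorithm), the sketch below evaluates the probabilistic minimum query from the earlier scenario, reusing the hypothetical UncertainReading class above:

```python
def min_query_probabilities(readings, steps=1000):
    """For each uncertain reading, P(it yields the minimum value),
    mirroring the four phases in 1-D:
      projection -> the values are already 1-D intervals here
      bounding   -> no value above f = min_j u_j can be the minimum
      pruning    -> discard objects whose interval lies wholly above f
      evaluation -> integrate f_i(x) * prod_{j != i} P(X_j > x) over [l_i, f]
    """
    f = min(r.u for r in readings)                  # bounding value
    candidates = [r for r in readings if r.l <= f]  # pruning
    probs = []
    for r in readings:
        if r.l > f:
            probs.append(0.0)  # pruned: cannot be the minimum
            continue
        dx = (f - r.l) / steps
        p = 0.0
        for k in range(steps):
            x = r.l + (k + 0.5) * dx  # midpoint rule
            term = r.pdf(x)
            for other in candidates:
                if other is not r:
                    term *= 1.0 - other.cdf(x)  # P(other > x)
            p += term * dx
        probs.append(p)
    return probs

# Slide-6 scenario with overlapping bounds:
x, y = UncertainReading(0.0, 4.0), UncertainReading(2.0, 6.0)
print(min_query_probabilities([x, y]))  # ~[0.875, 0.125]: x is more likely the minimum
```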

  15. "Is reading of sensor i in range [l,u] ?" Quality of Probabilistic Result • Introduce a notion of “quality of answer” • Proposed metrics for different classes of queries • regular range query • "yes" or "no" with 100% • probabilistic query ERQ • yes with pi = 95%: OK • yes with pi = 5%: OK (95% it is not in [l, u]) • yes with pi = 50%: NOT OK (not certain!)

  16. Quality for Entity-Aggregate Queries
  "Which sensor, among n, has the minimum reading?"
  • Recall:
    • Result set R = {(Ti, pi)}, e.g. {(T1, 30%), (T2, 40%), (T3, 30%)}
    • B is an interval bounding all possible values, e.g. the minimum is somewhere in B = [10, 20]
  • Our metrics for the aggregate queries Min, Max, and NN:
    • Objects cannot be treated independently (as they are in the ERQ metric)
    • A uniform distribution over the result set is the worst case
    • The metrics are based on entropy

  17. Quality for Entity-Aggregate Queries
  • H(X) is the entropy of a random variable X with outcomes X1, …, Xn and probabilities p(X1), …, p(Xn): H(X) = -Σi p(Xi) log2 p(Xi)
  • Entropy is smallest (i.e., 0) iff p(Xi) = 1 for some i
  • Entropy is largest (i.e., log2(n)) iff all Xi are equally likely
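A minimal sketch of the entropy-based metric (standard Shannon entropy; the function name is illustrative):

```python
import math

def result_entropy(probs):
    """Shannon entropy H(X) = -sum_i p(Xi) * log2(p(Xi)) of a result set.
    Lower entropy means a higher-quality answer; 0 when one object is certain."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

# Slide-16 example: {(T1, 30%), (T2, 40%), (T3, 30%)}
print(result_entropy([0.3, 0.4, 0.3]))  # ~1.571, near the worst case log2(3) ~ 1.585
print(result_entropy([1.0, 0.0, 0.0]))  # 0.0: best possible quality
```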

  18. Improving Answer Quality
  • It is important to pick the right update policies to improve answer quality
  • Global choice:
    • Glb_RR (pick a sensor at random)
  • Local choice:
    • Loc_RR (pick a sensor at random)
    • MaxUnc (heuristic: choose the object with the largest uncertainty interval)
    • MinExpEntropy (heuristic: choose the object with the minimum expected entropy)
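A minimal sketch of the MaxUnc policy, reusing the hypothetical UncertainReading class (the paper defines the policies more generally):

```python
def pick_max_uncertainty(readings):
    """MaxUnc heuristic: refresh the sensor whose uncertainty interval
    [l, u] is currently the widest."""
    return max(range(len(readings)), key=lambda i: readings[i].u - readings[i].l)
```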

  19. Experiments: Simulation Set-up
  • 1 server, 1000 sensors, limited network bandwidth; "Min" queries tested
  • Query arrivals follow a Poisson process
  • Each query ranges over a random set of 100 sensors

  20. Results

  21. Conclusions
  • Probabilistic querying handles the uncertainty inherent in sensor DBs
  • Classification, evaluation algorithms, and quality-of-answer metrics are given for the various query types
  • The uncertainty model is very general, which makes the algorithms not directly implementable in a real sensor network
  • Moreover, to achieve any reasonable energy efficiency, the application and network requirements that dictate when sensor nodes must be awake have to be tightly coordinated, especially in the case of multi-hop routing

  22. Outline for 'Model-Driven Data Acquisition in Sensor Networks'
  • Introduction
    • Motivation for Model-Based Queries
    • Framework Concept
  • Model Example – Multivariate Gaussian
  • Algorithm
    • Resolving Model-Based Queries
    • Incorporating Dynamicity
    • Observation Plan / Cost Model
  • Experiments
    • BBQ System
    • Results
  • Conclusions

  23. Motivation for Model-Based Queries
  • Declarative queries have been adopted as the key programming paradigm for large sensor nets
  • However, interpreting a sensor net as a database raises two major problems:
    • Misinterpretation of data: the physically observable world is a set of phenomena that are continuous in both time and space, and sensor readings are unlikely to be random samples of them
    • Inefficient approximate queries: if sensor readings are not the "true" values, uncertainty must be quantified to provide reliable answers

  24. Motivation for Model-Based Queries
  • Paper contribution: incorporate statistical models of real-world processes into the sensor-net query-processing architecture
  • Models help in:
    • Accounting for biases in spatial sampling
    • Identifying sensors providing faulty data
    • Extrapolating the values of missing sensors

  25. Framework Concept
  • Goal: given a query and a model, devise an efficient data-acquisition plan that provides the "best" possible answer
  • Major dependencies:
    • Correlations between sensors, captured by the statistical model:
      • Correlation between attributes for a given sensor
      • Correlation between sensors for a given attribute
    • The specific connectivity of the wireless network

  26. Framework Concept
  • Observation-plan parameters:
    • Correlations in value
    • Cost differential

  27. Framework Concept

  28. Model Example – Multivariate Gaussian
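The slide presents the model graphically; as a sketch, the core operation on a multivariate Gaussian is conditioning on observed attributes, which has a standard closed form (numpy-based; the variable names are illustrative):

```python
import numpy as np

def condition_gaussian(mu, sigma, obs_idx, obs_vals):
    """Condition N(mu, sigma) on X[obs_idx] = obs_vals; returns the mean and
    covariance of the remaining (hidden) attributes:
        mu'    = mu1 + S12 S22^-1 (o - mu2)
        sigma' = S11 - S12 S22^-1 S21
    """
    hid = [i for i in range(len(mu)) if i not in obs_idx]
    k = sigma[np.ix_(hid, obs_idx)] @ np.linalg.inv(sigma[np.ix_(obs_idx, obs_idx)])
    mu_c = mu[hid] + k @ (obs_vals - mu[obs_idx])
    sigma_c = sigma[np.ix_(hid, hid)] - k @ sigma[np.ix_(obs_idx, hid)]
    return mu_c, sigma_c
```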

  29. Resolving Model-Based Queries (Range Queries)

  30. Resolving Model-Based Queries (Value Queries)
  • To compute the value of Xi with maximum error e and confidence 1 - δ:
    • Compute the mean of Xi given the observations o
    • As in range queries, find the probability that Xi lies within ±e of that mean, and check that it is at least 1 - δ
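A minimal sketch of this confidence check for a Gaussian marginal Xi ~ N(mean, var) (standard normal CDF via the error function; the names are illustrative):

```python
import math

def value_query(mean, var, e, delta):
    """Report `mean` as the value of Xi if P(|Xi - mean| <= e) >= 1 - delta;
    otherwise signal that more observations are needed."""
    conf = math.erf(e / math.sqrt(2.0 * var))  # P(|Xi - mean| <= e) for a Gaussian
    return (mean, conf) if conf >= 1.0 - delta else None
```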

  31. Range Queries for Gaussian
  • Projection for a Gaussian is simple: just drop the unneeded entries from the mean vector and covariance matrix
  • The range integral P(Xi ∈ [l, u]) then has to be computed over the resulting marginal
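A minimal sketch of that integral for the 1-D marginal Xi ~ N(mu, var) produced by the projection (standard closed form):

```python
import math

def normal_cdf(x, mu, var):
    # Phi((x - mu) / sigma), via the error function
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0 * var)))

def prob_in_range(mu, var, l, u):
    """P(Xi in [l, u]) for Xi ~ N(mu, var)."""
    return normal_cdf(u, mu, var) - normal_cdf(l, mu, var)
```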

  32. Incorporating Dynamicity
  • Use historical measurements to improve the confidence of answers
  • Given the pdf at time t, compute the pdf at time t+1

  33. Incorporating Dynamicity
  • Assumption: Markovian model
  • The dynamics are summarized by a "transition model" p(X_{t+1} | X_t)
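A minimal sketch of the resulting prediction step, assuming a linear-Gaussian transition model X_{t+1} = A X_t + w with w ~ N(0, Q) (a Kalman-filter-style update; A and Q are illustrative names for the transition matrix and noise covariance):

```python
import numpy as np

def predict(mu_t, sigma_t, A, Q):
    """Push the belief N(mu_t, sigma_t) one step through the transition model:
        mu_{t+1}    = A mu_t
        sigma_{t+1} = A sigma_t A^T + Q
    """
    return A @ mu_t, A @ sigma_t @ A.T + Q
```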

  34. Observation Plan / Cost Model
  • What is the cost of making a set of observations o?
  • C(o) = acquisition cost + transmission cost
    • Acquisition cost: a constant per attribute
    • Transmission cost: derived from the network graph, with edge weights reflecting link quality
  • The paths actually taken may be sub-optimal

  35. Observation Plan / Cost Model
  • The set of attributes θ to observe is determined by computing the expected benefit of candidate observation sets and finding the one with the best benefit-to-cost trade-off (a greedy sketch follows)
  • This problem is similar to the traveling-salesman problem, so it is best handled with heuristic algorithms
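A minimal greedy sketch of this idea (illustrative only; the paper searches over candidate observation sets rather than using this exact loop):

```python
def plan_observations(candidates, benefit, cost, confidence, target):
    """Greedily add the attribute with the best expected benefit-per-cost
    ratio until the query's confidence target is met. `benefit`, `cost`,
    and `confidence` are caller-supplied model-based estimates."""
    chosen = set()
    while confidence(chosen) < target:
        remaining = [a for a in candidates if a not in chosen]
        if not remaining:
            break
        chosen.add(max(remaining, key=lambda a: benefit(chosen, a) / cost(chosen, a)))
    return chosen
```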

  36. BBQ System
  • BBQ: a Tiny-model Query system
  • Uses multivariate Gaussians
  • Has 24 transition models, one for each hour of the day

  37. Results
  • Experiment: 11 sensors on a tree, 83,000 measurements; 2/3 used for training and 1/3 for testing
  • Methodology:
    • BBQ builds a model from the training data
    • One random query is issued per hour; observations are made if needed and the model is updated
    • The answer is compared to the measured value
  • Compared against two other methods:
    • TinyDB: each query is broadcast over the sensor network using an overlay tree
    • Approximate caching: the base station maintains a view of the sensor readings

  38. Results

  39. Results

  40. Conclusion
  • Approximate queries can be optimized well, but a model of the physical phenomenon is needed
  • Defining an appropriate model is a challenge
  • The framework works well for "fairly steady" sensor data values
  • The statistical model is largely static, with refinements based on incoming queries and the observations made as a result
