1 / 62

Implementation and Research Issues in Query Processing for Wireless Sensor Networks

Implementation and Research Issues in Query Processing for Wireless Sensor Networks. Wei Hong Intel Research, Berkeley whong@intel-research.net. Sam Madden MIT madden@csail.mit.edu. Adapted by L.B. Declarative Queries. Programming Apps is Hard Limited power budget

garnet
Download Presentation

Implementation and Research Issues in Query Processing for Wireless Sensor Networks

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Implementation and Research Issues in Query Processing for Wireless Sensor Networks Wei Hong Intel Research, Berkeley whong@intel-research.net Sam Madden MIT madden@csail.mit.edu Adapted by L.B.

  2. Declarative Queries • Programming Apps is Hard • Limited power budget • Lossy, low bandwidth communication • Require long-lived, zero admin deployments • Distributed Algorithms • Limited tools, debugging interfaces • Queries abstract away much of the complexity • Burden on the database developers • Users get: • Safe, optimizable programs • Freedom to think about apps instead of details

  3. TinyDB: Prototype declarativequery processor • Platform: Berkeley Motes + TinyOS • Continuous variant of SQL : TinySQL • Power and data-acquisition based in-network optimization framework • Extensible interface for aggregates, new types of sensors

  4. Sensor Network TinyDB Revisited SELECT MAX(mag) FROM sensors WHERE mag > thresh SAMPLE PERIOD 64ms • High level abstraction: • Data centric programming • Interact with sensor network as a whole • Extensible framework • Under the hood: • Intelligent query processing: query optimization, power efficient execution • Fault Mitigation: automatically introduce redundancy, avoid problem areas App Query, Trigger Data TinyDB

  5. Feature Overview • Declarative SQL-like query interface • Metadata catalog management • Multiple concurrent queries • Network monitoring (via queries) • In-network, distributed query processing • Extensible framework for attributes, commands and aggregates • In-network, persistent storage

  6. Architecture TinyDB GUI JDBC TinyDB Client API DBMS PC side 0 Mote side 0 TinyDB query processor 2 1 3 8 4 5 6 Sensor network 7

  7. Data Model • Entire sensor network as one single, infinitely-long logical table: sensors • Columns consist of all the attributes defined in the network • Typical attributes: • Sensor readings • Meta-data: node id, location, etc. • Internal states: routing tree parent, timestamp, queue length, etc. • Nodes return NULL for unknown attributes • On server, all attributes are defined in catalog.xml • Discussion: other alternative data models?

  8. Query Language (TinySQL) SELECT <aggregates>, <attributes> [FROM {sensors | <buffer>}] [WHERE <predicates>] [GROUP BY <exprs>] [SAMPLE PERIOD <const> | ONCE] [INTO <buffer>] [TRIGGER ACTION <command>]

  9. Comparison with SQL • Single table in FROM clause • Only conjunctive comparison predicates in WHERE and HAVING • No subqueries • No column alias in SELECT clause • Arithmetic expressions limited to column op constant • Only fundamental difference: SAMPLE PERIOD clause

  10. TinySQL Examples SELECT nodeid, nestNo, light FROM sensors WHERE light > 400 EPOCH DURATION 1s “Find the sensors in bright nests.” Sensors 1

  11. 2 SELECT AVG(sound) FROM sensors EPOCH DURATION 10s • SELECT region, CNT(occupied) AVG(sound) • FROM sensors • GROUP BY region • HAVINGAVG(sound) > 200 • EPOCH DURATION 10s 3 Regions w/ AVG(sound) > 200 TinySQL Examples (cont.) “Count the number occupied nests in each loud region of the island.”

  12. Event-based Queries • ON event SELECT … • Run query only when interesting events happens • Event examples • Button pushed • Message arrival • Bird enters nest • Analogous to triggers but events are user-defined

  13. Query over Stored Data • Named buffers in Flash memory • Store query results in buffers • Query over named buffers • Analogous to materialized views • Example: • CREATE BUFFER name SIZE x (field1 type1, field2 type2, …) • SELECT a1, a2 FROM sensors SAMPLE PERIOD d INTO name • SELECT field1, field2, … FROM name SAMPLE PERIOD d

  14. SELECT AVG(temp) WHERE light > 400 T:1, AVG: 225 T:2, AVG: 250 Queries Results Aggavg(temp) Name: temp Time to sample: 50 uS Cost to sample: 90 uJ Calibration Table: 3 Units: Deg. F Error: ± 5 Deg F Get f: getTempFunc()… got(‘temp’) get (‘temp’) Tables Samples getTempFunc(…) Inside TinyDB Multihop Network Query Processor ~10,000 Lines Embedded C Code ~5,000 Lines (PC-Side) Java ~3200 Bytes RAM (w/ 768 byte heap) ~58 kB compiled code (3x larger than 2nd largest TinyOS Program) Filterlight > 400 Schema TinyOS TinyDB

  15. Q:SELECT … A Q Q R:{…} R:{…} Q B C Q Q Q Q R:{…} D R:{…} Q R:{…} Q Q Q F E Q Tree-based Routing • Tree-based routing • Used in: • Query delivery • Data collection • In-network aggregation • Relationship to indexing?

  16. Sensor Network Research • Very active research area • Can’t summarize it all • Focus: database-relevant research topics • Some outside of Berkeley • Other topics that are itching to be scratched • But, some bias towards work that we find compelling

  17. Topics • In-network aggregation • Acquisitional Query Processing • Heterogeneity • Intermittent Connectivity • In-network Storage • Statistics-based summarization and sampling • In-network Joins • Adaptivity and Sensor Networks • Multiple Queries

  18. Topics • In-network aggregation • Acquisitional Query Processing • Heterogeneity • Intermittent Connectivity • In-network Storage • Statistics-based summarization and sampling • In-network Joins • Adaptivity and Sensor Networks • Multiple Queries

  19. Tiny Aggregation (TAG) • In-network processing of aggregates • Common data analysis operation • Aka gather operation or reduction in || programming • Communication reducing • Operator dependent benefit • Across nodes during same epoch • Exploit query semantics to improve efficiency! Madden, Franklin, Hellerstein, Hong. Tiny AGgregation (TAG), OSDI 2002.

  20. 1 2 3 4 5 Basic Aggregation • In each epoch: • Each node samples local sensors once • Generates partial state record (PSR) • local readings • readings from children • Outputs PSR during assigned comm. interval • At end of epoch, PSR for whole network output at root • New result on each successive epoch • Extras: • Predicate-based partitioning via GROUP BY

  21. 1 2 3 4 5 Illustration: Aggregation SELECT COUNT(*) FROM sensors Interval 4 Sensor # Epoch Interval # 1

  22. 1 2 3 4 5 Illustration: Aggregation SELECT COUNT(*) FROM sensors Interval 3 Sensor # 2 Interval #

  23. 1 2 3 4 5 Illustration: Aggregation SELECT COUNT(*) FROM sensors Interval 2 Sensor # 1 3 Interval #

  24. 1 2 3 4 5 Illustration: Aggregation SELECT COUNT(*) FROM sensors Interval 1 5 Sensor # Interval #

  25. 1 2 3 4 5 Illustration: Aggregation SELECT COUNT(*) FROM sensors Interval 4 Sensor # Interval # 1

  26. Aggregation Framework • As in extensible databases, TinyDB supports any aggregation function conforming to: Aggn={finit, fmerge, fevaluate} Finit {a0}  <a0> Fmerge {<a1>,<a2>}  <a12> Fevaluate {<a1>}  aggregate value Partial State Record (PSR) Example: Average AVGinit {v}  <v,1> AVGmerge {<S1, C1>, <S2, C2>}  < S1 + S2 , C1 + C2> AVGevaluate{<S, C>}  S/C Restriction: Merge associative, commutative

  27. Taxonomy of Aggregates • TAG insight: classify aggregates according to various functional properties • Yields a general set of optimizations that can automatically be applied Drives an API!

  28. R R R B B B C C C c/n c c/n A A A n = 2 Use Multiple Parents • Use graph structure • Increase delivery probability with no communication overhead • For duplicate insensitive aggregates, or • Aggs expressible as sum of parts • Send (part of) aggregate to all parents • In just one message, via multicast • Assuming independence, decreases variance SELECT COUNT(*) # of parents = n E(cnt) = n * (c/n * p2) Var(cnt) = n * (c/n)2 * p2 * (1 – p2) = V/n P(link xmit successful) = p P(success from A->R) = p2 E(cnt) = c * p2 Var(cnt) = c2 * p2 * (1 – p2) V

  29. With Splitting No Splitting Critical Link! Multiple Parents Results • Better than previous analysis expected! • Losses aren’t independent! • Insight: spreads data over many links

  30. Acquisitional Query Processing (ACQP) • TinyDB acquires AND processes data • Could generate an infinite number of samples • An acqusitional query processor controls • when, • where, • and with what frequency data is collected! • Versus traditional systems where data is provided a priori Madden, Franklin, Hellerstein, and Hong. The Design of An Acqusitional Query Processor. SIGMOD, 2003.

  31. ACQP: What’s Different? • How should the query be processed? • Sampling as a first class operation • How does the user control acquisition? • Rates or lifetimes • Event-based triggers • Which nodes have relevant data? • Index-like data structures • Which samples should be transmitted? • Prioritization, summary, and rate control

  32. Correct ordering (unless pred1 is very selective and pred2 is not): Traditional DBMS (pred2) (pred1) (pred1) Costly mag light (pred2) ACQP (pred2) (pred1) light Cheap mag mag light Operator Ordering: Interleave Sampling + Selection SELECT light, mag FROM sensors WHERE pred1(mag) AND pred2(light) EPOCH DURATION 1s At 1 sample / sec, total power savings could be as much as 3.5mW  Comparable to processor! • E(sampling mag) >> E(sampling light) • 1500 uJ vs. 90 uJ

  33. WINMAX (mag>x) Traditional DBMS WINMAX mag (mag>x) ACQP (light > MAX) light mag light Exemplary Aggregate Pushdown SELECT WINMAX(light,8s,8s) FROM sensors WHERE mag > x EPOCH DURATION 1s • Novel, general pushdown technique • Mag sampling is the most expensive operation!

  34. Topics • In-network aggregation • Acquisitional Query Processing • Heterogeneity • Intermittent Connectivity • In-network Storage • Statistics-based summarization and sampling • In-network Joins • Adaptivity and Sensor Networks • Multiple Queries

  35. Heterogeneous Sensor Networks • Leverage small numbers of high-end nodes to benefit large numbers of inexpensive nodes • Still must be transparent and ad-hoc • Key to scalability of sensor networks • Interesting heterogeneities • Energy: battery vs. outlet power • Link bandwidth: Chipcon vs. 802.11x • Computing and storage: ATMega128 vs. Xscale • Pre-computed results • Sensing nodes vs. QP nodes

  36. Computing Heterogeneity with TinyDB • Separate query processing from sensing • Provide query processing on a small number of nodes • Attract packets to query processors based on “service value” • Compare the total energy consumption of the network • No aggregation • All aggregation • Opportunistic aggregation • HSN proactive aggregation Mark Yarvis and York Liu, Intel’s Heterogeneous Sensor Network Project,ftp://download.intel.com/research/people/HSN_IR_Day_Poster_03.pdf.

  37. 5x7 TinyDB/HSN Mica2 Testbed

  38. 11% aggregators achieve 72% of max data reduction Optimal placement 2/3 distance from sink. Data Packet Saving • How many aggregators are desired? • Does placement matter?

  39. Mobile GTWY Occasionally Connected Sensornets internet TinyDB Server GTWY TinyDB QP Mobile GTWY Mobile GTWY GTWY GTWY TinyDB QP TinyDB QP

  40. Occasionally Connected Sensornets Challenges • Networking support • Tradeoff between reliability, power consumption and delay • Data custody transfer: duplicates? • Load shedding • Routing of mobile gateways • Query processing • Operation placement: in-network vs. on mobile gateways • Proactive pre-computation and data movement • Tight interaction between networking and QP Fall, Hong and Madden, Custody Transfer for Reliable Delivery in Delay Tolerant Networks, http://www.intel-research.net/Publications/Berkeley/081220030852_157.pdf.

  41. Distributed In-network Storage • Collectively, sensornets have large amounts of in-network storage • Good for in-network consumption or caching • Challenges • Distributed indexing for fast query dissemination • Resilience to node or link failures • Graceful adaptation to data skews • Minimizing index insertion/maintenance cost

  42. Functionality Efficient range query for multidimensional data. Approaches Divide sensor field into bins. Locality preserving mapping from m-d space to geographic locations. Use geographic routing such as GPSR. Assumptions Nodes know their locations and network boundary No node mobility E2= <0.6, 0.7> E1 = <0.7, 0.8> Q1=<.5-.7, .5-1> Example: DIM Xin Li, Young Jin Kim, Ramesh Govindan and Wei Hong, Distributed Index for Multi-dimentional Data (DIM) in Sensor Networks, SenSys 2003.

  43. Statistical Techniques • Approximations, summaries, and sampling based on statistics and statistical models • Applications: • Limited bandwidth and large number of nodes -> data reduction • Lossiness -> predictive modeling • Uncertainty -> tracking correlations and changes over time • Physical models -> improved query answering

  44. Correlated Attributes • Data in sensor networks is correlated; e.g., • Temperature and voltage • Temperature and light • Temperature and humidity • Temperature and time of day • etc.

  45. IDSQ • Idea: task sensors in order of best improvement to estimate of some value: • Choose leader(s) • Suppress subordinates • Task subordinates, one at a time • Until some measure of goodness (error bound) is met • E.g. “Mahalanobis Distance” -- Accounts for correlations in axes, tends to favor minimizing principal axis See “Scalable Information-Driven Sensor Querying and Routing for ad hoc Heterogeneous Sensor Networks.” Chu, Haussecker and Zhao. Xerox TR P2001-10113. May, 2001.

  46. Graphical Representation S1 S2 Residual 2 Residual 1 Preferred because it reduces error along principal axis Area of residuals is equal Model location estimate as a point with 2-dimensional Gaussian uncertainty. Principal Axis

  47. MQSN: Model-based Probabilistic Querying over Sensor Networks Joint work with Amol Desphande, Carlos Guestrin, and Joe Hellerstein Query Processor Model 1 3 4 2 5 6 7 8 9

  48. Probabilistic Query select NodeID, Temp ± 0.1C where NodeID in [1..9] with conf(0.95) Observation Plan [Temp, 3], [Temp, 9] MQSN: Model-based Probabilistic Querying over Sensor Networks Query Processor Model Consult Model 1 3 4 2 5 6 7 8 9

  49. Probabilistic Query select NodeID, Temp ± 0.1C where NodeID in [1..9] with conf(0.95) Observation Plan [Temp, 3], [Temp, 9] MQSN: Model-based Probabilistic Querying over Sensor Networks Query Processor Model Consult Model 1 3 4 2 5 6 7 8 9

  50. Data [Temp, 3] = …, [Temp, 9] = … MQSN: Model-based Probabilistic Querying over Sensor Networks Query Results Query Processor Model Update Model 1 3 4 2 5 6 7 8 9

More Related