1 / 56

In-Network Query Processing

In-Network Query Processing. Sam Madden CS294-1 9/30/03. Outline. TinyDB And demo! Aggregate Queries ACQP Break Adaptive Operator Placement …. Outline. TinyDB And demo! Aggregate Queries ACQP Break Adaptive Operator Placement …. Programming Sensor Nets Is Hard.

kimo
Download Presentation

In-Network Query Processing

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. In-Network Query Processing Sam Madden CS294-1 9/30/03

  2. Outline • TinyDB • And demo! • Aggregate Queries • ACQP • Break • Adaptive Operator Placement • …

  3. Outline • TinyDB • And demo! • Aggregate Queries • ACQP • Break • Adaptive Operator Placement • …

  4. Programming Sensor Nets Is Hard • Months of lifetime required from small batteries • 3-5 days naively; can’t recharge often • Interleave sleep with processing • Lossy, low-bandwidth, short range communication • Nodes coming and going • Multi-hop • Remote, zero administration deployments • Highly distributed environment • Limited Development Tools • Embedded, LEDs for Debugging! High-Level Abstraction Is Needed!

  5. A Solution: Declarative Queries • Users specify the data they want • Simple, SQL-like queries • Using predicates, not specific addresses • Our system: TinyDB • Challenge is to provide: • Expressive & easy-to-use interface • High-level operators • “Transparent Optimizations” that many programmers would miss • Sensor-net specific techniques • Power efficient execution framework

  6. TinyDB Demo

  7. SELECT AVG(temp) WHERE light > 400 T:1, AVG: 225 T:2, AVG: 250 Queries Results Aggavg(temp) Name: temp Time to sample: 50 uS Cost to sample: 90 uJ Calibration Table: 3 Units: Deg. F Error: ± 5 Deg F Get f: getTempFunc()… got(‘temp’) get (‘temp’) Tables Samples getTempFunc(…) TinyDB Architecture Multihop Network • Schema: • “Catalog” of commands & attributes Query Processor ~10,000 Lines Embedded C Code ~5,000 Lines (PC-Side) Java ~3200 Bytes RAM (w/ 768 byte heap) ~58 kB compiled code (3x larger than 2nd largest TinyOS Program) Filterlight > 400 Schema TinyOS TinyDB

  8. Declarative Queries for Sensor Networks • Examples: SELECT nodeid, nestNo, light FROM sensors WHERE light > 400 EPOCH DURATION 1s “Find the sensors in bright nests.” Sensors 1

  9. 2 SELECT AVG(sound) FROM sensors EPOCH DURATION 10s • SELECT region, CNT(occupied) AVG(sound) • FROM sensors • GROUP BY region • HAVINGAVG(sound) > 200 • EPOCH DURATION 10s 3 Regions w/ AVG(sound) > 200 Aggregation Queries “Count the number occupied nests in each loud region of the island.”

  10. Benefits of Declarative Queries • Specification of “whole-network” behavior • Simple, safe • Complex behavior via multiple queries, app logic • Optimizable • Exploit (non-obvious) interactions • E.g.: • ACQP operator ordering, Adaptive join operator placement, Lifetime selection, Topology selection • Versus other approaches, e.g., Diffusion • Black box ‘filter’ operators • Intanagonwiwat ,  “Directed Diffusion”, Mobicomm 2000

  11. Outline • TinyDB • And demo! • Aggregate Queries • ACQP • Break • Adaptive Operator Placement • …

  12. Tiny Aggregation (TAG) • Not in today’s reading • In-network processing of aggregates • Common data analysis operation • Aka gather operation or reduction in || programming • Communication reducing • Operator dependent benefit • Exploit query semantics to improve efficiency! Madden, Franklin, Hellerstein, Hong. Tiny AGgregation (TAG), OSDI 2002.

  13. Q:SELECT … A Q Q R:{…} R:{…} Q B C Q Q Q Q R:{…} D R:{…} Q R:{…} Q Q Q F E Q Query Propagation Via Tree-Based Routing • Tree-based routing • Used in: • Query delivery • Data collection • Topology selection is important; e.g. • Krishnamachari, DEBS 2002, Intanagonwiwat, ICDCS 2002, Heidemann, SOSP 2001 • LEACH/SPIN, Heinzelman et al. MOBICOM 99 • SIGMOD 2003 • Continuous process • Mitigates failures

  14. 1 2 3 4 5 Basic Aggregation • In each epoch: • Each node samples local sensors once • Generates partial state record (PSR) • local readings • readings from children • Outputs PSR during assigned comm. interval • Communication scheduling for power reduction • At end of epoch, PSR for whole network output at root • New result on each successive epoch • Extras: • Predicate-based partitioning via GROUP BY

  15. 1 2 3 4 5 Illustration: Aggregation SELECT COUNT(*) FROM sensors Sensor # <- Time 1

  16. 1 2 3 4 5 Illustration: Aggregation SELECT COUNT(*) FROM sensors Sensor # 2 <- Time

  17. 1 2 3 4 5 Illustration: Aggregation SELECT COUNT(*) FROM sensors Sensor # 1 3 <- Time

  18. 1 2 3 4 5 Illustration: Aggregation SELECT COUNT(*) FROM sensors 5 Sensor # <- Time

  19. 1 2 3 4 5 Illustration: Aggregation SELECT COUNT(*) FROM sensors Sensor # <- Time 1

  20. Aggregation Framework • As in extensible databases, TAG supports any aggregation function conforming to: Aggn={finit, fmerge, fevaluate} Finit {a0}  <a0> Fmerge {<a1>,<a2>}  <a12> Fevaluate {<a1>}  aggregate value Partial State Record (PSR) Example: Average AVGinit {v}  <v,1> AVGmerge {<S1, C1>, <S2, C2>}  < S1 + S2 , C1 + C2> AVGevaluate{<S, C>}  S/C Restriction: Merge associative, commutative

  21. Types of Aggregates • SQL supports MIN, MAX, SUM, COUNT, AVERAGE • Any function over a set can be computed via TAG • In network benefit for many operations • E.g. Standard deviation, top/bottom N, spatial union/intersection, histograms, etc. • Compactness of PSR

  22. Taxonomy of Aggregates • TAG insight: classify aggregates according to various functional properties • Yields a general set of optimizations that can automatically be applied Drives an API!

  23. Partial State • Growth of PSR vs. number of aggregated values (n) • Algebraic: |PSR| = 1 (e.g. MIN) • Distributive: |PSR| = c (e.g. AVG) • Holistic: |PSR| = n (e.g. MEDIAN) • Unique: |PSR| = d (e.g. COUNT DISTINCT) • d = # of distinct values • Content Sensitive: |PSR| < n (e.g. HISTOGRAM) “Data Cube”, Gray et. al

  24. Holistic Unique Distributive Algebraic Benefit of In-Network Processing Simulation Results 2500 Nodes 50x50 Grid Depth = ~10 Neighbors = ~20 Uniform Dist over [0,100] • Aggregate & depth dependent benefit!

  25. Outline • TinyDB • And demo! • Aggregate Queries • ACQP • Break • Adaptive Operator Placement • …

  26. Acquisitional Query Processing (ACQP) Traditional DBMS: processes data already in the system Acquisitional DBMS: generates the data in the system! An acquisitional query processor controls • when, • where, • and with what frequency data is collected Versus traditional systems where data is provided a priori

  27. ACQP: What’s Different? • Basic Acquisitional Processing • Continuous queries, with rates or lifetimes • Events for asynchronous triggering • Avoiding Acquisition Through Optimization • Sampling as a query operator • Choosing Where to Sample via Co-acquisition • Index-like data structures • Acquiring data from the network • Prioritization, summary, and rate control

  28. Lifetime Queries • Lifetime vs. sample rate SELECT … EPOCH DURATION 10 s SELECT … LIFETIME 30 days • Extra: Allow a MAX SAMPLE PERIOD • Discard some samples • Sampling cheaper than transmitting

  29. (Single Node) Lifetime Prediction SELECT nodeid, light LIFETIME 24 Weeks

  30. Correct ordering (unless pred1 is very selective and pred2 is not): Traditional DBMS (pred2) (pred1) (pred1) Costly mag light (pred2) ACQP (pred2) (pred1) light Cheap mag mag light Operator Ordering: Interleave Sampling + Selection SELECT light, mag FROM sensors WHERE pred1(mag) AND pred2(light) EPOCH DURATION 1s At 1 sample / sec, total power savings could be as much as 3.5mW  Comparable to processor! • E(sampling mag) >> E(sampling light) • 1500 uJ vs. 90 uJ

  31. WINMAX (mag>x) Traditional DBMS WINMAX mag (mag>x) ACQP (light > MAX) light mag light Exemplary Aggregate Pushdown SELECT WINMAX(light,8s,8s) FROM sensors WHERE mag > x EPOCH DURATION 1s • Novel, general pushdown technique • Mag sampling is the most expensive operation!

  32. In-network storage Placement subject to optimization Event Based Processing • Epochs are synchronous • Might want to issue queries in response to asynchronous events • Avoid unneccessary “polling” CREATE TABLE birds(uint16 cnt) SIZE 1 CIRCULAR ON EVENT bird-enter(…) SELECT b.cnt+1 FROM birds AS b OUTPUT INTO b ONCE

  33. Attribute Driven Network Construction • Goal: co-acquisition -- sensors that sample together route together • Observation: queries often over limited area • Or some other subset of the network • E.g. regions with light value in [10,20] • Idea: build network topology such that like-valued nodes route through each other • For range queries • Relatively static attributes (e.g. location) • Maintenance Issues

  34. SELECT … WHERE a > 5 AND a < 12 • Precomputed intervals • Reported by children as they join 4 a:[1,10] a:[20,40] a:[7,15] 1 2 3 Excluded from query broadcast and result collection! Tracking Co-Acquisition Via Semantic Routing Trees • Idea: send range queries only to participating nodes • Parents maintain ranges of descendants

  35. [7,15] [1,10] [20,40] Other selection policies in paper! 1 2 3 4 [3,6] Parent Selection for SRTs •Idea: Node picks parent whose ancestors’ interval most overlap its descendants’ interval 0 [3,6]  [1,10] = [3,6] [3,6]  [7,15] = ø [3,6]  [20,40] = ø

  36. Simulation Result

  37. Outline • TinyDB • And demo! • Aggregate Queries • ACQP • Break • Adaptive Operator Placement • …

  38. Outline • TinyDB • And demo! • Aggregate Queries • ACQP • Break • Adaptive Operator Placement • …

  39. Rate A Rate B Adaptive & Decentralized Operator Placement • IPSN 2003 Paper • Main Idea • Place operators near data sources • Greater operator rate  Closer placement • For each operator • Explore candidate neighbors • Migrate to lower cost placements • Via extra messages Proper placement depends on path lengths and relative rates!

  40. “Adaptivity” in Databases • Adaptivity : changing query plans on the fly • Typically at the physical level • Where the plan runs • Ordering of operators • Instantiations of operators, e.g. hash join vs merge join • Non-traditional • Conventionally, complete plans are built prior to execution • Using cost estimates (collected from history) • Important in volatile or long running environments • Where a priori estimates are unlikely to be good • E.g., sensor networks

  41. Adaptivity for Operator Placement • Adaptivity comes at a cost • Extra work on each operator, each tuple • In a DBMS, processing per tuple is small • 100’s of instructions per operator • Unless you have to hit the disk! • Costs in this case? • Extra communication hurts • Finding candidate placements (exploration) • Cost advertisements from local node • New costs from candidates • Moving state (migration) • Joins, windowed aggregates

  42. Do Benefits Justify Costs? • Not Evaluated! • 3x reduction on messages vs. external • Excluding exploration & migration costs • Seems somewhat implausible, especially given added complexity • Hard to make migration protocol work • Depends on ability to reliably quiesce child ops. • What else could you do? • Static placement

  43. Summary • Declarative QP • Simplify data collection in sensornets • In-network processing, query optimization for performance • Acquisitional QP • Focus on costs associated with sampling data • New challenge of sensornets, other streaming systems? • Adaptive Join Placement • In-network optimization • Some benefit, but practicality unclear • Operator pushdown still a good idea

  44. Open Problems • Many; a few: • In-network storage and operator placement • Dealing with heterogeneity • Dealing with loss • Need real implementations of many of these ideas • See me! (madden@cs.berkeley.edu)

  45. Questions / Discussion

  46. Making TinyDB REALLY Work • Berkeley Botanical Garden • First “real deployment” • Requirements: • At least 1 month unattended operation • Support for calibrated environmental sensors • Multi-hop routing • What we started with: • Limited power management, no time-synchronization • Motes crashed hard occasionally • Limited, relatively untested multihop routing

  47. Power Consumption in Sensornets • Waking current ~12mA • Fairly evenly spread between sensors, processor, radio • Sleeping current 20-100uA • Power consumption dominated by sensing, reception: • 1s Power up on Mica2Dot sensor board • Most mote apps use “always on” radio • Completely unstructured communication • Bad for battery life

  48. Why Not Use TDMA? • CSMA is very flexible: easy for new nodes to join • Reasonably scalable (relative to Bluetooth) • CSMA implemented, available • We wanted to build something that worked

  49. Power Management Approach Coarse-grained communication scheduling Epoch (10s -100s of seconds) Mote ID 1 … zzz … … zzz … 2 3 4 5 time 2-4s Waking Period

  50. Benefits / Drawbacks • Benefits • Can still use CSMA within waking period • No reservation required: new nodes can join easily! • Waking period duration is easily tunable • Depending on network size • Drawbacks • Longer waking time vs. TDMA? • Could stagger slots based on tree-depth • No “guaranteed” slot reservation • Nothing is guaranteed anyway

More Related