Querying Sensor Networks

Querying Sensor Networks Sam Madden UC Berkeley November 18th, 2002 @ Madison

Introduction • What are sensor networks? • Programming Sensor Networks Is Hard • Especially if you want to build a “real” application • Example: Vehicle tracking application took 2 grad students 2 weeks to build and hundreds of lines of code. • Declarative Queries Are Easy • And, can be faster and more robust than most applications! • Vehicle tracking query: took 2 minutes to write, worked just as well! SELECT MAX(mag) FROM sensors WHERE mag > thresh SAMPLE INTERVAL 64ms

Overview • Sensor Networks • Why Queries in Sensor Nets • TinyDB • Features • Demo • Focus: Tiny Aggregation • The Next Step

Device Capabilities • “Mica Motes” • 8bit, 4Mhz processor • Roughly a PC AT • 40kbit radio • Time to send 1 bit = 800 instrs • Reducing communication is good • 4KB RAM, 128K flash, 512K EEPROM • Sensor board expansion slot • Standard board has light & temperature sensors, accelerometer, magnetometer, microphone, & buzzer • Other more powerful platforms exist • E.g. Sensoria WINS nodes • Trend towards smaller devices • “Smart Dust” – Kris Pister, et al.

Sensor Net Sample Apps Habitat Monitoring. Storm petrels on great duck island, microclimates on James Reserve. Earthquake monitoring in shake-test sites. • Vehicle detection: sensors dropped from UAV along a road, collect data about passing vehicles, relay data back to UAV. • Traditional monitoring apparatus.

Key Constraint: Power • Lifetime from One pair of AA batteries • 2-3 days at full power • 6 months at 2% duty cycle • Communication dominates cost • Because it takes so long (~30ms) to send / receive a message

TinyOS • Operating system from David Culler’s group at Berkeley • C-like programming environment • Provides messaging layer, abstractions for major hardware components • Split phase highly asynchronous, interrupt-driven programming model Hill, Szewczyk, Woo, Culler, & Pister. “Systems Architecture Directions for Networked Sensors.” ASPLOS 2000. See http://webs.cs.berkeley.edu/tos

00 10 21 10 11 12 22 20 21 Communication In Sensor Nets • Radio communication has high link-level losses • typically about 20% @ 5m • Newest versions of TinyOS provide link-level acknowledgments • No end-to-end acknowledgements • Ad-hoc neighbor discovery • Two major routing techniques: tree-based hierarchy and geographic A B C D F E

335 245 278 511 512 513 453 406 442 … … … • SELECT roomNo, AVG(volume) • FROM sensors • GROUP BY roomNo • HAVING AVG(volume) > 200 2 Rooms w/ volume > 200 Declarative Queries for Sensor Networks Light Temp Accel …. T-2 • Examples: SELECT nodeid, light FROM sensors WHERE light > 400 SAMPLE PERIOD 1s 1 T-1 T “epoch”

Declarative Benefits In Sensor Networks • Vastly simplifies execution for large networks • Since locations are described by predicates • Operations are over groups • Enables tolerance to faults • Since system is free to choose where and when operations happen • Data independence • System is free to choose where data lives, how it is represented

Computing In Sensor Nets Is Hard • Why? • Limited power (must optimize for it!) • Lossy communication • Zero administration • Limited processing capabilities, storage, bandwidth • In power-based optimization, we choose: • Where data is processed. • How data is routed • Exploit operator semantics! • Avoid dead nodes • How to order operators, sampling, etc. • What kinds of indices to apply, which data to prioritize …

TinyDB • A distributed query processor for networks of Mica motes • Available today! • Goal: Eliminate the need to write C code for most TinyOS users • Features • Declarative queries • Temporal + spatial operations • Multihop routing • In-network storage

Query {A,B,C,D,E,F} A {B,D,E,F} B C {D,E,F} D F E TinyDB @ 10000 Ft (Almost) All Queries are Continuous and Periodic • Written in SQL-Like Language With Extensions For : • Sample rate • Offline delivery • Temporal Aggregation

TinyDB Demo

Applications + Early Adopters • Some demo apps: • Network monitoring • Vehicle tracking • “Real” future deployments: • Environmental monitoring @ GDI (and James Reserve?) • Generic Sensor Kit • Building Monitoring Demo!

TinyDB Architecture (Per node) SelOperator AggOperator • TupleRouter: • Fetches readings (for ready queries) • Builds tuples • Applies operators • Deliver results (up tree) TupleRouter • AggOperator: • Combines local & neighbor readings Network ~10,000 Lines C Code ~5,000 Lines Java ~3200 Bytes RAM (w/ 768 byte heap) ~58 kB compiled code (3x larger than 2nd largest TinyOS Program) • SelOperator: • Filters readings Radio Stack Schema TinyAllloc • Schema: • “Catalog” of commands & attributes (more later) • TinyAlloc: • Reusable memory allocator!

TAG • In-network processing of aggregates • Aggregates are common operation • Reduces costs depending on type of aggregates • Focus on “spatial aggregation” (Versus “temporal aggregation”) • Exploitation of operator, functional semantics Tiny AGgregation (TAG), Madden, Franklin, Hellerstein, Hong. OSDI 2002 (to appear).

Aggregation Framework • As in extensible databases, we support any aggregation function conforming to: Aggn={fmerge, finit, fevaluate} Fmerge{<a1>,<a2>}  <a12> finit{a0}  <a0> Fevaluate{<a1>}  aggregate value (Merge associative, commutative!) Partial State Record (PSR) Just like parallel database systems – e.g. Bubba! Example: Average AVGmerge {<S1, C1>, <S2, C2>}  < S1 + S2 , C1 + C2> AVGinit{v}  <v,1> AVGevaluate{<S1, C1>}  S1/C1

A B C D F E Query Propagation Review SELECT AVG(light)…

1 2 3 4 5 Pipelined Aggregates Value from 2 produced at time t arrives at 1 at time (t+1) • After query propagates, during each epoch: • Each sensor samples local sensors once • Combines them with PSRs from children • Outputs PSR representing aggregate state in the previous epoch. • After (d-1) epochs, PSR for the whole tree output at root • d = Depth of the routing tree • If desired, partial state from top k levels could be output in kth epoch • To avoid combining PSRs from different epochs, sensors must cache values from children Value from 5 produced at time t arrives at 1 at time (t+3)

1 2 3 4 5 Illustration: Pipelined Aggregation SELECT COUNT(*) FROM sensors Depth = d

1 2 3 4 5 Illustration: Pipelined Aggregation SELECT COUNT(*) FROM sensors Epoch 1 1 Sensor # 1 1 1 Epoch # 1

Grouping • If query is grouped, sensors apply predicate on each epoch • PSRs tagged with group • When a PSR (with group) is received: • If it belongs to a stored group, merge with existing PSR • If not, just store it • At the end of each epoch, transmit one PSR per group

Group Eviction • Problem: Number of groups in any one iteration may exceed available storage on sensor • Solution: Evict! (Partial Preaggregation*) • Choose one or more groups to forward up tree • Rely on nodes further up tree, or root, to recombine groups properly • What policy to choose? • Intuitively: least popular group, since don’t want to evict a group that will receive more values this epoch. • Experiments suggest: • Policy matters very little • Evicting as many groups as will fit into a single message is good * Per-Åke Larson. Data Reduction by Partial Preaggregation. ICDE 2002.

TAG Advantages • In network processing reduces communication • Important for power and contention • Continuous stream of results • In the absence of faults, will converge to right answer • Lots of optimizations • Based on shared radio channel • Semantics of operators

Simulation Environment • Chose to simulate to allow 1000’s of nodes and control of topology, connectivity, loss • Java-based simulation & visualization for validating algorithms, collecting data. • Coarse grained event based simulation • Sensors arranged on a grid, radio connectivity by Euclidian distance • Communication model • Lossless: All neighbors hear all messages • Lossy: Messages lost with probability that increases with distance • Symmetric links • No collisions, hidden terminals, etc.

Simulation Result Simulation Results 2500 Nodes 50x50 Grid Depth = ~10 Neighbors = ~20 Some aggregates require dramatically more state!

Taxonomy of Aggregates • TAG insight: classify aggregates according to various functional properties • Yields a general set of optimizations that can automatically be applied

Optimization: Channel Sharing (“Snooping”) • Insight: Shared channel enables optimizations • Suppress messages that won’t affect aggregate • E.g., in a MAX query, sensor with value v hears a neighbor with value ≥ v, so it doesn’t report • Applies to all exemplary, monotonic aggregates • Can be applied to summary aggregates also if imprecision is allowed • Learn about query advertisements it missed • If a sensor shows up in a new environment, it can learn about queries by looking at neighbors messages. • Root doesn’t have to explicitly rebroadcast query!

Optimization: Hypothesis Testing • Insight: Root can provide information that will suppress readings that cannot affect the final aggregate value. • E.g. Tell all the nodes that the MIN is definitely < 50; nodes with value ≥ 50 need not participate. • Depends on monotonicity • How is hypothesis computed? • Blind guess • Statistically informed guess • Observation over first few levels of tree / rounds of aggregate

Experiment: Hypothesis Testing Uniform Value Distribution, Dense Packing, Ideal Communication

B C B B C C B B C C 1 A A A 1/2 1/2 A A Optimization: Use Multiple Parents • For duplicate insensitive aggregates • Or aggregates that can be expressed as a linear combination of parts • Send (part of) aggregate to all parents • Decreases variance • Dramatically, when there are lots of parents No splitting: E(count) = c * p Var(count) = c2 * p * (1-p) With Splitting: E(count) = 2 * c/2 * p Var(count) = 2 * (c/2)2 * p * (1-p)

With Splitting No Splitting Critical Link! Multiple Parents Results • Interestingly, this technique is much better than previous analysis predicted! • Losses aren’t independent! • Instead of focusing data on a few critical links, spreads data over many links

Fun Stuff • Sophisticated, sensor network specific aggregates • Temporal aggregates

Temporal Aggregates • TAG was about “spatial” aggregates • Inter-node, at the same time • Want to be able to aggregate across time as well • Two types: • Windowed: AGG(size,slide,attr) • Decaying: AGG(comb_func, attr) • Demo! size =4 slide =2 … R1 R2 R3 R4 R5 R6 …

Isobar Finding

TAG Summary • In-network query processing a big win for many aggregate functions • By exploiting general functional properties of operators, optimizations are possible • Requires new aggregates to be tagged with their properties • Up next: non-aggregate query processing optimizations – a flavor of things to come!

Laptops! Distributed DBs! Moore’s Law! Acquisitional Query Processing • Cynical question: what’s really different about sensor networks? • Low Power? • Lots of Nodes? • Limited Processing Capabilities?

Answer • Long running queries on physically embedded devices that control when and and with what frequency data is collected! • Versus traditional systems where data is provided a priori • Next: an acquisitional teaser…

ACQP: What’s Different? • How does the user control acquisition? • Specify rates or lifetimes • Trigger queries in response to events • Which nodes have relevant data? • Need a node index • Construct topology such that nodes that are queried together route together • What sensors should be sampled? • Treat sampling at an operator • Sample cheapest sensors first • Which samples should be transmitted? • Not all of them, if bandwidth or power is limited • Those that are most “valuable”?

Operator Ordering: Interleave Sampling + Selection SELECT light, mag FROM sensors WHERE pred1(mag) AND pred2(light) SAMPLE INTERVAL 1s At 1 sample / sec, total power savings could be as much as 4mW, same as the processor! • Energy cost of sampling mag >> cost of sampling light • 1500 uJ vs. 90 uJ • Correct ordering (unless pred1 is very selective): • 1. Sample light • Sample mag • Apply pred1 • Apply pred2 • 2. Sample light • Apply pred2 • Sample mag • Apply pred1 • 3. Sample mag • Apply pred1 • Sample light • Apply pred2

Querying Sensor Networks