1.18k likes | 1.37k Views
Querying Sensor Networks. Sam Madden UC Berkeley December 13 th , New England Database Seminar. TinyDB Introduction. What is a sensor network? Programming sensor nets is hard! Declarative queries are easy TinyDB : In-network processing via declarative queries Example:
E N D
Querying Sensor Networks Sam Madden UC Berkeley December 13th, New England Database Seminar
TinyDB Introduction • What is a sensor network? • Programming sensor nets is hard! • Declarative queries are easy • TinyDB: In-network processing via declarative queries • Example: • Vehicle tracking application: 2 weeks for 2 students • Vehicle tracking query: took 2 minutes to write, worked just as well! SELECT nodeid FROM sensors WHERE mag > thresh EPOCH DURATION 64ms
Overview • Sensor Networks • Why Queries in Sensor Nets • TinyDB • Features • Demo • Focus: Acquisitional Query Processing
Overview • Sensor Networks • Why Queries in Sensor Nets • TinyDB • Features • Demo • Focus: Acquisitional Query Processing
Device Capabilities • “Mica Motes” • 8bit, 4Mhz processor • Roughly a PC AT • 40kbit CSMA radio • 4KB RAM, 128K flash, 512K EEPROM • Sensor board expansion slot • Standard board has light & temperature sensors, accelerometer, magnetometer, microphone, & buzzer • Other more powerful platforms exist • E.g. UCLA WINS, Medusa, MIT Cricket, Princeton Zebranet • Trend towards smaller devices • “Smart Dust” – Kris Pister, et al.
Sensor Net Sample Apps Habitat Monitoring. Storm petrels on great duck island, microclimates on James Reserve. Earthquake monitoring in shake-test sites. • Vehicle detection: sensors dropped from UAV along a road, collect data about passing vehicles, relay data back to UAV. • Traditional monitoring apparatus.
Metric: Communication • Lifetime from one pair of AA batteries • 2-3 days at full power • 6 months at 2% duty cycle • Communication dominates cost • < few mS to compute • 30mS to send message • Our metric: communication!
TinyOS • Operating system from David Culler’s group at Berkeley • C-like programming environment • Provides messaging layer, abstractions for major hardware components • Split phase highly asynchronous, interrupt-driven programming model Hill, Szewczyk, Woo, Culler, & Pister. “Systems Architecture Directions for Networked Sensors.” ASPLOS 2000. See http://webs.cs.berkeley.edu/tos
A B C D F E Communication In Sensor Nets • Radio communication has high link-level losses • typically about 20% @ 5m • Ad-hoc neighbor discovery • Tree-based routing
Overview • Sensor Networks • Why Queries in Sensor Nets • TinyDB • Features • Demo • Acquisitional Query Processing
Declarative Queries for Sensor Networks Sensors • Examples: SELECT nodeid, light FROM sensors WHERE light > 400 EPOCH DURATION 1s 1
SELECT roomNo, AVG(sound) • FROM sensors • GROUP BY roomNo • HAVINGAVG(sound) > 200 • EPOCH DURATION 10s 3 • SELECTAVG(sound) • FROM sensors • EPOCH DURATION 10s 2 Rooms w/ sound > 200 Aggregation Queries
Declarative Benefits In Sensor Networks • Reduces Complexity • Locations as predicates • Operations are over groups • Fault Tolerance • Control of when & where • Data independence • Control of representation & storage • Indices, join location, materialization points, RAM vs EEPROM, etc.
Computing In Sensor Nets Is Hard • Limited power • Power-based query optimization • Routing indices • In-network computation • Exploitation of operator semantics (TAG, OSDI 2002) • Lossy, low-bandwidth communication • In-network computation • Caching, retransmission, etc. • Data prioritization • Remote, zero administration, long lived deployments • Ad-hoc (fault-tolerant) networking • Lifetime estimation • Semantically aware routing (TAG) • Limited processing capabilities, storage
Overview • Sensor Networks • Why Queries in Sensor Nets • TinyDB • Features • Demo • Focus: Tiny Aggregation • The Next Step
TinyDB • A distributed query processor for networks of Mica motes • Available today! • Goal: Eliminate the need to write C code for most TinyOS users • Features • Declarative queries • Temporal + spatial operations • Multihop routing • In-network storage
Query {A,B,C,D,E,F} A {B,D,E,F} B C {D,E,F} D F E TinyDB @ 10000 Ft (Almost) All Queries are Continuous and Periodic • Written in SQL-Like Language With Extensions For : • Sample rate • Offline delivery • Temporal Aggregation
Applications + Early Adopters • Some demo apps: • Network monitoring • Vehicle tracking • “Real” future deployments: • Environmental monitoring @ GDI (and James Reserve?) • Generic Sensor Kit • Building Monitoring Demo!
Benefit of TinyDB SELECT COUNT(light) SAMPLE PERIOD 4s • Cost metric = #msgs • 16 nodes • 150 Epochs • In-net loss rates: 5% • Centralized loss: 15% • Network depth: 4
TinyDB Architecture (Per node) SelOperator AggOperator • TupleRouter: • Fetches readings (for ready queries) • Builds tuples • Applies operators • Deliver results (up tree) TupleRouter • AggOperator: • Combines local & neighbor readings Network ~10,000 Lines C Code ~5,000 Lines Java ~3200 Bytes RAM (w/ 768 byte heap) ~58 kB compiled code (3x larger than 2nd largest TinyOS Program) • SelOperator: • Filters readings Radio Stack Schema TinyAllloc • Schema: • “Catalog” of commands & attributes (more later) • TinyAlloc: • Reusable memory allocator!
Catalog & Schema Manager • Attribute & Command IF • Components register attributes and commands they support • Commands implemented via wiring • Attributes fetched via accessor command • Catalog API allows local and remote queries over known attributes / commands. • Sensor specific metadata • Power to access attributes • Time to access attribute
Overview • Sensor Networks • Why Queries in Sensor Nets • TinyDB • Features • Demo • Acquisitional Query Processing
Laptops! Distributed DBs! Moore’s Law! Acquisitional Query Processing • Cynical question: what’s really different about sensor networks? • Low Power? • Lots of Nodes? • Limited Processing Capabilities? Being a little bit facetious, but…
Answer • Long running queries on physically embedded devices that control when and and with what frequency data is collected! • Versus traditional systems where data is provided a priori
ACQP: What’s Different? • How does the user control acquisition? • Rates or lifetimes. • Event-based triggers • How should the query be processed? • Sampling as an operator! • Events as joins • Which nodes have relevant data? • Semantic Routing Tree • Nodes that are queried together route together • Which samples should be transmitted? • Pick most “valuable”?
ACQP • How does the user control acquisition? • Rates or lifetimes. • Event-based triggers • How should the query be processed? • Sampling as an operator! • Events as joins • Which nodes have relevant data? • Semantic Routing Tree • Nodes that are queried together route together • Which samples should be transmitted? • Pick most “valuable”?
Implies not all data is xmitted Lifetime Queries • Lifetime vs. sample rate SELECT … LIFETIME 30 days SELECT … LIFETIME 10 days MIN SAMPLE INTERVAL 1s
Processing Lifetimes • At root • Compute SAMPLE PERIOD that satisfies lifetime • If it exceeds MIN SAMPLE PERIOD (MSP), use MSP and compute transmission rate • At other nodes • Use root’s values or slower • Root = bottleneck • Multiple roots? • Adaptive roots?
In-network storage Subject to optimization Event Based Processing • ACQP – want to initiate queries in response to events CREATE BUFFER birds(uint16 cnt) SIZE 1 ON EVENT bird-enter(…) SELECT b.cnt+1 FROM birds AS b OUTPUT INTO b ONCE
More Events ON EVENT bird_detect(loc) AS bd SELECT AVG(s.light), AVG(s.temp) FROM sensors AS s WHERE dist(bd.loc,s.loc) < 10m SAMPLE PERIOD 1s for 10 [Coming soon!]
ACQP • How does the user control acquisition? • Rates or lifetimes. • Event-based triggers • How should the query be processed? • Sampling as an operator! • Events as joins • Which nodes have relevant data? • Semantic Routing Tree • Nodes that are queried together route together • Which samples should be transmitted? • Pick most “valuable”?
Operator Ordering: Interleave Sampling + Selection SELECT light, mag FROM sensors WHERE pred1(mag) AND pred2(light) SAMPLE INTERVAL 1s At 1 sample / sec, total power savings could be as much as 4mW, same as the processor! • E(mag) >> E(light) • 1500 uJ vs. 90 uJ • Possible orderings: • 1. Sample light • Sample mag • Apply pred1 • Apply pred2 • 2. Sample light • Apply pred2 • Sample mag • Apply pred1 • 3. Sample mag • Apply pred1 • Sample light • Apply pred2
Optimizing in ACQP • Sampling = “expensive predicate” • Some subtleties: • Which predicate to “charge”? • Can’t operate without samples • Solution: • Treat sampling as a separate task • Build a partial order • Solve for cheapest schedule using series-parallel scheduling algorithm • Monma & Sidney, 1979, as in Ibaraki & Kameda, TODS, 1984, or Hellerstein, TODS, 1998.
Exemplary Aggregate Pushdown SELECT WINMAX(light,8s,8s) FROM sensors WHERE mag > x SAMPLE INTERVAL 1s Unless > x is very selective, correct ordering is: Sample light Check if it’s the maximum If it is: Sample mag Check predicate If satisfied, update maximum
d t d/2 d Event-Join Duality • Problem: multiple outstanding queries (lots of samples) ON EVENTE(nodeid) SELECT a FROM sensors AS s WHERE s.nodeid = e.nodeid SAMPLE INTERVAL d FOR k • High event frequency → Use Rewrite • Rewrite problem: phase alignment! SELECT s.a FROM sensors AS s, events AS e WHERE s.nodeid = e.nodeid AND e.type = E AND s.time – e.time < k AND s.time > e.time SAMPLE INTERVAL d • Solution: subsample
ACQP • How does the user control acquisition? • Rates or lifetimes. • Event-based triggers • How should the query be processed? • Sampling as an operator! • Events as joins • Which nodes have relevant data? • Semantic Routing Tree • Nodes that are queried together route together • Which samples should be transmitted? • Pick most “valuable”?
Attribute Driven Topology Selection • Observation: internal queries often over local area • Or some other subset of the network • E.g. regions with light value in [10,20] • Idea: build topology for those queries based on values of range-selected attributes • For range queries • Relatively static trees • Maintenance Cost
Attribute Driven Query Propagation SELECT … WHERE a > 5 AND a < 12 Precomputed intervals = Semantic Routing Tree (SRT) 4 [1,10] [20,40] [7,15] 1 2 3
Attribute Driven Parent Selection Even without intervals, expect that sending to parent with closest value will help 1 2 3 [1,10] [20,40] [7,15] [3,6] [1,10] = [3,6] [3,7] [7,15] = ø [3,7] [20,40] = ø 4 [3,6]
Simulation Result ~14% Reduction
ACQP • How does the user control acquisition? • Rates or lifetimes. • Event-based triggers • How should the query be processed? • Sampling as an operator! • Events as joins • Which nodes have relevant data? • Semantic Routing Tree • Nodes that are queried together route together • Which samples should be transmitted? • Pick most “valuable”?
Adaptive = 2x Successful Xmissions Adaptive Rate Control
Delta Encoding • Must pick most valuable data • How? • Domain Dependent • E.g., largest, average, shape preserving, frequency preserving, most samples, etc. • Simple idea for time-series: order biggest-change-first
Choosing Data To Send • Score each item • Send largest score • Out of order -> Priority Queue • Discard / aggregate when full Time Value t=4 t=5 [5,2.5] [1,2] [2,6] [3,15] [4,1] [5,4]
Choosing Data To Send t=1 [1,2]
Choosing Data To Send t=5 [1,2] |2-15| = 13 [2,6] [3,15] [4,1] |2-4| = 2 |2-6| = 4
Choosing Data To Send t=5 [3,15] [1,2] [2,6] [4,1] |15-4| = 11 |2-6| = 4
Choosing Data To Send t=5 [3,15] [1,2] [4,1] [2,6]