Queries Over Streaming Sensor Data • Samuel Madden • Qualifying Exam • University of California, Berkeley • May 14th, 2002
Introduction • Sensor networks are here • Berkeley on the cutting edge • Data collection and monitoring are driving applications • My research: query processing for sensor networks • Server (DBMS) side issues • In-network issues • Goal: understand how to pose, distribute, and process queries over streaming, lossy, wireless, and power-constrained data sources such as sensor networks.
Overview • Introduction • Sensor Networks & TinyOS • Research Goals • Completed Research: Sensor Network QP • Central Query Processor • In Network, on Sensors • Research Plan • Future Implementation & Research Efforts • Time line • Related Work
Sensor Networks & TinyOS • A collection of small, radio-equipped, battery-powered, networked microprocessors • Typically ad-hoc & multihop networks • Single devices unreliable • Very low power; tiny batteries or solar cells power them for months • Berkeley’s version: ‘Mica Motes’ • TinyOS operating system (services) • 4K RAM, 512K EEPROM, 128K code space • Lossy: 20% loss @ 5m in Ganesan et al.’s experiments • Communication very expensive: 800 instrs/bit transmitted • Apps: environment monitoring, personal nets, object tracking • Data processing plays a key role!
Overview • Introduction • Sensor Networks & TinyOS • Research Goals • Completed Research: Sensor Network QP • Central Query Processor • In Network, on Sensors • Visualizations • Research Plan • Future Implementation & Research Efforts • Time line • Related Work
Motivation • Why apply database approach to sensor network data processing? • Declarative Queries • Data independence • Optimization opportunities • Hide low-level complexities • Familiar Interface • Work sharing • Adaptivity • Proper interfaces can leverage existing database systems • TeleTiny architecture offers all of these • Suitable for a variety of lossy, streaming environments (not just TinyOS!) • Sharing & Adaptivity are Themes
Lots of help! • Fjords: ICDE 2002, with Franklin • CACQ: SIGMOD 2002, with Shah, Hellerstein, Raman, Franklin • TAG: WMCSA 2002, with Szewczyk, Culler, Franklin, Hellerstein, Hong • Catalog: with Hong • [Architecture diagram: completed research — the Telegraph query processor with CACQ (long-running queries that share work) and Fjords (handle push-based data), plus visualizations + interfaces; partially complete + future work — the sensor proxy mediating between sensors and the QP, TAG in-network aggregation, the catalog + sensor schema, a real-world deployment @ Intel Berkeley, and the TeleTiny implementation. Queries flow from the user workstation to the QP; answers flow back.]
Overview • Introduction • Sensor Networks & TinyOS • Research Goals • Completed Research: Sensor Network QP • Central Query Processor • In Network, on Sensors • Research Plan • Future Implementation & Research Efforts • Time line • Related Work
Sensor Network Query Processing Challenges • Query processor must be able to: • Tolerate lossy data delivery • Handle failure of individual data sources • Conserve power on devices whenever possible • Perhaps by using on-board processing • E.g. applying selection predicates in-network • Or by sharing work wherever possible • Handle push-based data • Handle streaming data
Server-Side Sensor QP • Mechanisms: • Continuous Queries • Sensor Proxies • Fjord Query Plan Architecture • Stream-Sensitive Operators • [Diagram: queries flow from the user workstation into Telegraph + CACQ (long-running queries that share work); Fjords handle push-based data; the sensor proxy mediates between sensors and the QP; answers, visualizations, and simulations return to the user.]
Continuous Queries (CQ) • Long-running queries • User installs a query and continuously receives answers until deinstallation • Common in the streaming domain • Instantaneous snapshots don’t tell you much; users may not be interested in history • Monitoring queries: • Examine light levels and locate rooms that are in use • Monitor the temperature in my workspace and adjust the temperature to be in the range (x,y)
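The install-once, answer-forever idea can be sketched in a few lines of Python (a hand-rolled illustration, not TeleTiny code; the `continuous_query` helper, the reading layout, and the light threshold are all assumptions):

```python
# Minimal sketch of a continuous query: installed once, it keeps emitting
# answers as tuples stream in, instead of computing a single snapshot result.
def continuous_query(stream, predicate):
    """Yield every tuple of the stream that satisfies the query predicate."""
    for t in stream:  # in practice the stream never ends
        if predicate(t):
            yield t

# Hypothetical monitoring query: rooms whose light level suggests occupancy.
readings = [{"room": 101, "light": 850},
            {"room": 102, "light": 40},
            {"room": 103, "light": 900}]
occupied = [t["room"] for t in continuous_query(iter(readings),
                                                lambda t: t["light"] > 500)]
print(occupied)  # [101, 103]
```

A real deployment would feed the generator from a live sensor stream rather than a fixed list.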
Continuously Adaptive Continuous Queries (CACQ) • Given user queries over current sensor data • Expect that many queries will be over the same data sources (e.g. traffic sensors) • Queries over current data are always looking at the same tuples • Those queries can share: • Current tuples • Work (e.g. selections) • Sharing reduces computation and communication • Continuously adaptive: • When sharing work, queries come and go • Over long periods of time, selectivities change • Assumptions that were valid at the start of a query are no longer valid
CACQ Overview • Three queries over relation R: SELECT * FROM R WHERE s1(a), s2(b); SELECT * FROM R WHERE s3(a), s4(b); SELECT * FROM R WHERE s5(a), s6(b) • [Diagram: tuples of R flow once through shared selection operators over attributes a and b, serving all three queries at once.]
CACQ - Adaptivity • Work sharing via tuple lineage • Q1: SELECT * FROM S WHERE A, B, C • Q2: SELECT * FROM S WHERE A, B, D • [Diagram: conventional queries run a separate operator chain per query over data stream S; NiagaraCQ shares the common A-B prefix but fixes the plan; in CACQ, each tuple of S carries its lineage — from s(C, D, B, A) down to s() — recording which of the filters A, B, C, D it has already passed, so operators can be reordered per tuple and rejections shared.]
CACQ Contributions • Continuous adaptivity (operator reordering) via eddies • All queries within the same eddy • Routing policies to enable that reordering • Explicit tuple lineage • Within each tuple, store where it has been and where it must go • Maximizes sharing of tuples between queries • Grouped filter • Predicate index that applies range & equality selections for multiple queries at the same time
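The grouped filter can be illustrated with a toy Python predicate index (a linear-scan sketch; the real CACQ structure indexes predicate boundaries for efficiency, but the interface is the point — one probe answers all queries):

```python
# Toy grouped filter: hold the range predicates of many queries over one
# attribute and match each incoming value against all of them in one probe.
# (Illustrative only; class and method names are assumptions.)
class GroupedFilter:
    def __init__(self):
        self.ranges = []  # (query_id, lo, hi) range predicates

    def add_query(self, query_id, lo, hi):
        self.ranges.append((query_id, lo, hi))

    def matches(self, value):
        """Ids of all queries whose predicate accepts this value."""
        return [qid for qid, lo, hi in self.ranges if lo <= value <= hi]

gf = GroupedFilter()
gf.add_query("q1", 0, 50)
gf.add_query("q2", 40, 100)
print(gf.matches(45))  # ['q1', 'q2'] -- one probe serves both queries
```

Without grouping, each query would apply its own selection to every tuple; here the per-tuple work grows with the number of *matching* queries, not the number of installed ones.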
CACQ vs. NiagaraCQ • Performance comparable for one experiment in the NCQ paper • Example where CACQ destroys NCQ: SELECT stocks.sym, articles.text FROM stocks, articles WHERE stocks.sym = articles.sym AND UDF(stocks) — with an expensive UDF and |result| > |stocks| • [Diagram: three queries, each joining Stocks and Articles on sym, each with its own UDF over stocks.]
CACQ vs. NiagaraCQ #2 • [Diagram: NiagaraCQ option #1 shares the Stocks ⋈ Articles join but must then apply the expensive UDF1, UDF2, UDF3 to the join result, which is large since |result| > |stocks|; option #2 applies each query’s UDF to Stocks first but gives up sharing the join; CACQ shares the join and applies the UDFs to the smaller stocks stream, routing each tuple only through the UDFs its queries require.]
CACQ Review • Many Queries, One Eddy • Fine Grained Adaptivity • Grouped Filter Predicate Index • Tuple Lineage
Sensor Proxy • CQ is a query processing mechanism; need to get data from sensors • Mediates between sensors and the query processor • Pushes operators out to sensors • Hides query processing and knowledge of multiple queries from the sensors • Hides details of the sensors from the query processor • Enables power-sensitivity • [Diagram: the query processor registers parsed queries [sources, ops] with the proxy; the proxy installs [fields, filters, aggregates, rates] on the sensors and returns query [tuples] to the query processor.]
Fjording the Stream • Sensors, even through the proxy, deliver data unusually: they push tuples at their own rates • Query plan implementation • Useful for streams and distributed environments • Combines push (streaming) data and pull (static) data • E.g. traffic sensors with CHP accident reports
Summary of Server Side QP • CACQ • Enables sharing of work between long running queries • Enable adaptivity for long running queries • Sensor Proxy • Hides QP complexity from sensors, power issues from QP • Fjords • Enable combination of push and pull data • Non-blocking processing integral to the query processor SIGMOD ICDE
Sensor-Side Sensor QP • Research thus far allows the central QP to play nice with sensors, but doesn’t address how sensors can help with QP: • Use their processors to process queries • Advertise their capabilities and data sources • Control data delivery rates • Detect, report, and mitigate errors and failures • Two pieces thus far: • Tiny Aggregation (TAG): WMCSA paper, resubmission in progress • Catalog • Lots of work in progress! • [Diagram: TAG in-network aggregation and the catalog + sensor schema sit beneath the Telegraph query processor; real-world deployment @ Intel Berkeley; TeleTiny implementation.]
Catalog • Problem: Given a heterogeneous environment full of motes, how do I know what data they can provide or process? • Solution: Store a small catalog on each device describing its capabilities • Mirror that catalog centrally to avoid overloading sensors • Enables data independence • Catalog Content: • For each attribute: • Name, Type, Size • Units (e.g. farenheit) • Resolution (e.g. 10 bits) • Calibration Information • Accessor functions • Cost information • Power, time, maximum sample rate
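A per-attribute catalog record might look like the following Python dataclass (field names are illustrative, mirroring the list above, and are not the actual on-mote layout):

```python
from dataclasses import dataclass

# Illustrative per-attribute catalog record: name/type/size, units,
# resolution, and cost information, with hypothetical field names.
@dataclass
class CatalogAttribute:
    name: str             # e.g. "temperature"
    type_name: str        # e.g. "int16"
    resolution_bits: int  # e.g. a 10-bit ADC reading
    units: str            # e.g. "fahrenheit"
    sample_cost_uj: float # energy per sample (hypothetical units)
    max_sample_hz: float  # maximum sample rate

temp = CatalogAttribute("temperature", "int16", 10, "fahrenheit", 90.0, 10.0)
print(temp.name, temp.units)
```

A central mirror of these records is what lets the query processor plan over heterogeneous motes without probing each device per query.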
Tiny Aggregation (TAG) • How can sensors be leveraged in query processing? • Insight: Aggregate queries common case! • Users want summaries of information across hundreds or thousands of nodes • Information from individual nodes: • Often uninteresting • Could be expensive to retrieve at fine granularity • Take advantage of tree-based multihop routing • Common way to collect data at a centralized location • Combine data at each level to compute aggregates in network
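The combine-at-each-level idea can be sketched for COUNT (a toy simulation, not mote code; the six-node routing tree follows the example slides):

```python
# TAG-style in-network COUNT: each node adds its own reading to its
# children's partial counts and forwards ONE partial aggregate, instead of
# forwarding every raw tuple to the root.
def tag_count(tree, node):
    """Partial COUNT for the subtree rooted at node (one message per node)."""
    return 1 + sum(tag_count(tree, child) for child in tree.get(node, []))

# A small routing tree: node 1 is the root.
tree = {1: [2, 3], 2: [4, 5], 3: [6]}
print(tag_count(tree, 1))  # 6 -- six messages total, one per node
```

With centralized collection, each reading would be relayed once per hop toward the root; combining in-network caps the cost at one message per node per epoch.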
Advantages of TAG • Order-of-magnitude decrease in communication for some aggregates • Streaming results: • Converge after transient errors • Successive results take half the messages of the initial result • Reduced burden on the upper levels of the routing tree • Declarative queries enable: • Optimizations based on a classification of aggregate properties • Very simple to deploy and use
TAG Example • SELECT COUNT(*) FROM sensors • [Diagram: a routing tree of six motes, node 1 at the root with children 2 and 3; node 2’s children are 4 and 5, node 3’s child is 6.]
TAG Example, Epoch 0 • SELECT COUNT(*) FROM sensors • Leaf nodes report (Sensor ID, Epoch, Count): (4, 0, 1), (5, 0, 1), (6, 0, 1)
TAG Example, Epoch 1 • Interior nodes forward partial counts for epoch 0 — (2, 0, 2), (3, 0, 2) — while the leaves report (4, 1, 1), (5, 1, 1), (6, 1, 1)
TAG Example, Epoch 2 • The root now has the full epoch-0 answer: (1, 0, 6) • Interior partials: (2, 1, 3), (3, 1, 2); leaves: (4, 2, 1), (5, 2, 1), (6, 2, 1)
TAG Example, Epoch 3 • Root emits (1, 1, 6) • Value at root is (d-1) epochs old • New value every epoch • Nodes must cache old values • Interior partials: (2, 2, 3), (3, 2, 2); leaves: (4, 3, 1), (5, 3, 1), (6, 3, 1)
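The epoch pipelining in the example can be simulated in a few lines (a toy model assuming each node reports 1 plus the child partials it cached from the previous epoch):

```python
# Pipelined TAG epochs: every epoch, each node transmits 1 plus the partial
# counts its children sent LAST epoch (which it has cached). The root's
# value therefore converges after (depth - 1) epochs and refreshes every
# epoch thereafter.
def simulate_epochs(tree, root, epochs):
    nodes = set(tree) | {c for kids in tree.values() for c in kids}
    last = {n: 0 for n in nodes}  # cached child partials start empty
    root_values = []
    for _ in range(epochs):
        current = {n: 1 + sum(last[c] for c in tree.get(n, []))
                   for n in nodes}
        root_values.append(current[root])
        last = current
    return root_values

tree = {1: [2, 3], 2: [4, 5], 3: [6]}
print(simulate_epochs(tree, 1, 4))  # [1, 3, 6, 6] -- correct from epoch 2 on
```

With three levels in the tree, the root’s count is correct two epochs in, matching the (d-1)-epoch staleness noted above.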
TAG: Optimizations + Loss Tolerance • Optimizations to Decrease Message Overhead • When computing a MAX, nodes can suppress their own transmissions if they hear neighbors with greater values • Or, root can propagate down a ‘hypothesis’ • Suppress values that don’t change between epochs • Techniques to Handle Lossiness of Network • Cache child results • Send results up multiple paths in the routing tree • Grouping • Techniques for handling too many groups (aka group eviction)
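The MAX-suppression optimization can be sketched as follows (the overhear order and message counting are illustrative assumptions, not the TAG protocol itself):

```python
# Sketch of MAX suppression: a node that has overheard a neighbor report a
# value at least as large as its own stays silent, since its reading cannot
# change the MAX.
def max_with_suppression(readings):
    best_heard = float("-inf")
    messages = 0
    for value in readings:       # nodes report in some overhear order
        if value > best_heard:   # transmit only if we'd raise the MAX
            best_heard = value
            messages += 1
        # otherwise: suppress -- the aggregate is unaffected
    return best_heard, messages

print(max_with_suppression([10, 30, 20, 45, 45, 5]))  # (45, 3)
```

Half the nodes in this toy run never transmit, yet the MAX is exact; a root-propagated ‘hypothesis’ works the same way with the threshold seeded from above.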
Experiment: Basic TAG • [Graph: simulation results for basic TAG under dense packing and ideal communication.]
Sensor QP Summary • In-Sensor Query Processing Consists of • TAG, for in-network aggregation • Order of magnitude reduction in communication costs for simple aggregates. • Techniques for grouping, loss tolerance, and further reduction in costs • Catalog, for tracking queryable attributes of sensors • In upcoming implementation • Selection predicates • Multiplexing multiple queries over network
Overview • Introduction • Sensor Networks & TinyOS • Research Goals • Completed Research: Sensor Network QP • Central Query Processor • In Network, on Sensors • Research Plan • Future Implementation & Research Efforts • Time line • Related Work
What’s Left? • Development Tasks • TeleTiny Implementation • Sensor Proxy Policies & Implementation • Telegraph (or some adaptive QP) Interface • Research Tasks • Publish / Follow-on to TAG • Query Semantics • Real-world Deployment Study • Techniques for Reporting & Managing Resources + Loss
TeleTiny Implementation • In Progress (Goal: Ready for SIGMOD ’02 Demo) • In TinyOS, for Mica Motes, with Wei Hong & JMH • Features: • SELECT and aggregate queries processed in-network • Ability to query arbitrary attributes • Including power, signal strength, etc. • Flexible architecture that can be extended with additional operators • Multiple simultaneous queries • UDF / UDAs via VM • Status: • Aggregation & Selection engine built • No UDFs • Primitive routing • No optimizations • Catalog interface designed, stub implementation • 20kb of code space!
Sensor Proxy • Sensor proxy issues: • How to choose what runs centrally and what runs on the motes? • Some operators are obvious (e.g. join?): • Storage or computation demands preclude running in-network • For other operators there is a choice: • Limited resources mean motes will not have capacity for all pushable operators • So which subset of operators to push?
Sensor Proxy (cont) • Cost-based query optimization problem; what to optimize? • Power load on network • Central CPU costs • Basic approach: • Push down as much as possible • Push high-update rate, low-state aggregate queries first • Benefit most from TAG • Satisfy other queries by sampling at minimum rate that can satisfy all queries, processing centrally
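For a single attribute, the "minimum rate that satisfies all queries" rule reduces to sampling at the fastest requested rate (a deliberately simple sketch; the real policy must also weigh power and per-operator state):

```python
# Fold many queries' requested sample rates over one attribute into a single
# device-side rate: sample at the fastest requested rate, and let the proxy
# downsample centrally for the slower queries.
def device_sample_rate_hz(requested_rates_hz):
    """Lowest single rate that can still satisfy every query."""
    return max(requested_rates_hz)

print(device_sample_rate_hz([0.1, 1.0, 0.5]))  # 1.0 -- sample once per second
```

The mote then pays for one sample stream regardless of how many queries are registered, which is the whole point of mediating through the proxy.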
Research: Real World Study • Goal: Characterize performance of TeleTiny on a building monitoring network running in the Intel-Research Lab in the PowerBar™ building. • To: • Demonstrate effectiveness of our approach • Derive a number of important workload and real-world parameters that we can only speculate about • Be cool. • Also, Telegraph Integration, which should offer: • CACQ over real sensors • Historical data interface • Queries that combine historical data and streaming sensor data • Fancy adaptive / interactive features • E.g. adjust sample rates on user demand
Real World Study (Cont.) • Measurements to obtain: • Types of queries • Snapshot vs. continuous • Loss + Failure Characteristics • % lost messages, frequency of disconnection • Power Characteristics • Amount of Storage • Server Load • Variability in Data Rates • Is adaptivity really needed? • Lifetime of Queries
Research: Reporting & Mitigating Resource Consumption + Loss • Resource scarcity & loss are endemic to the domain • Problem: What techniques can be used to • Accommodate desired workload despite limited resources? • Mitigate + inform users of losses? • Key Issue because: • Dramatically affects usability of system • Otherwise users will roll-their-own • Dramatically affects quality of system • Results are poor without some additional techniques • Within themes of my research • Sharing of resources • Adaptivity to losses
Some Resource + Loss Tolerance Techniques • Identify locations of loss • E.g. annotate reported values with information about lost children • Provide user with tradeoffs for smoothing loss • TAG: • Cache results: temporal smearing • Send to multiple parents: more messages, less variance • Or, as in the STREAM project, compute lossy summaries of streams • Offer user alternatives to unanswerable queries • E.g. ask if a lower sample rate would be OK? • Or if a nearby set of sensors would suffice? • Educate. (Lower expectations!) • Employ admission control, leases
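Annotating reported values with loss information (the first technique above) might look like this (the record shape and function name are hypothetical):

```python
# Annotate a partial COUNT with coverage information, so the user can see
# where loss occurred: how many children contributed vs. how many were
# expected this epoch.
def annotated_count(child_counts, expected_children):
    """child_counts: the partial counts actually heard this epoch."""
    return {"count": 1 + sum(child_counts),  # include this node's own reading
            "children_heard": len(child_counts),
            "children_lost": expected_children - len(child_counts)}

print(annotated_count([3, 2], expected_children=3))
```

A root receiving `children_lost > 0` can report the answer as a lower bound rather than silently presenting a lossy count as exact.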
Timeline • May - June 2002: • Complete sensor-side software • Schema API • Catalog Server • UDFs • SIGMOD Demo • ICDE Paper on stream semantics • Resubmit TAG (to OSDI, hopefully.) • June - August 2002: • Telegraph Integration • Sensor proxy implementation • Instrument + Deploy Lab Monitoring, Begin Data Collection
Timeline (cont.) • August - November 2002 • Telegraph historical results integration / implementation • SIGMOD paper on Lab Monitoring deployment • August - January 2003 • Explore and implement mechanisms for handling resource constraints + faults • February 2003 • VLDB Paper on Resource Constraints • February - June 2003 • Complete Dissertation
Overview • Introduction • Sensor Networks & TinyOS • Research Goals • Completed Research: Sensor Network QP • Central Query Processor • In Network, on Sensors • Research Plan • Future Implementation & Research Efforts • Time line • Related Work
Related Work • Database Research • Cougar (Cornell) • Sequences + Streams • SEQ (Wisconsin) + Temporal Database Systems • Stanford STREAM • Architecture similar to CACQ • State management • Query semantics • Continuous Queries • NiagaraCQ (Wisconsin) • PSoup (Chandrasekaran & Franklin) • X/YFilter (Altinel & Franklin, Diao & Franklin) • Adaptive / Interactive Query Processing • CONTROL (Hellerstein et al.) • Eddies (Avnur & Hellerstein) • XJoin / Volcano (Urhan & Franklin, Graefe)
Related Work (Cont.) • Sensor / Networking Research • UCLA / ISI / USC (Estrin, Heidemann, et al.) • Diffusion: Sensor-fusion + Routing • Low-level naming: Mechanisms for data collection, joins? • Application specific aggregation • Impact of Network Density on Data Aggregation • Aka Greedy Aggregation, or how to choose a good topology • Network measurements (Ganesan, et al.) • MIT (Balakrishnan, Morris, et al.) • Fancy routing protocols (LEACH / Span) • Insights into data delivery scheduling for power efficiency • Intentional Naming System (INS) • Berkeley / Intel • TinyOS (Hill, et al.), lots of discussion & ideas
Summary • Query processing is a key feature for improving usability of sensor networks • TeleTiny Solution Brings: • On the query processor • Ability to combine + query data as it streams in • Adaptivity and performance • In the sensor network • Power efficiency via in-network evaluation • Catalog • Upcoming research work: • Real world deployment + study • Evaluation of techniques for resource usage + loss mitigation • TAG resubmission • Graduation, Summer 2003!