HiFi Systems: Network-Centric Query Processing for the Physical World. Michael Franklin, UC Berkeley, 2.13.04
Introduction • Continuing improvements in sensor devices • Wireless motes • RFID • Cellular-based telemetry • Cheap devices can monitor the environment at a high rate. • Connectivity enables remote monitoring at many different scales. • Widely different concerns at each of these levels and scales.
Plan of Attack • Motivation/Applications/Examples • Characteristics of HiFi Systems • Foundational Components • TelegraphCQ • TinyDB • Research Issues • Conclusions
The Canonical HiFi System [figure: a high fan-in tree, with many edge devices feeding successively fewer, more central nodes]
RFID - Retail Scenario • “Smart Shelves” continuously monitor item addition and removal. • Info is sent back through the supply chain.
“Extranet” Information Flow [figure: Retailers A and B send data to an Aggregation/Distribution Service, which passes it on to Manufacturers C and D]
M2M - Telemetry/Remote Monitoring • Energy Monitoring - Demand Response • Traffic • Power Generation • Remote Equipment
Time-Shift Trend Prediction • National companies can exploit East Coast/ West Coast time differentials to optimize West Coast operations.
Virtual Sensors • Sensors don’t have to be physical sensors. • Network Monitoring algorithms for detecting viruses, spam, DoS attacks, etc. • Disease outbreak detection
HiFi System Properties • High Fan-In, globally-distributed architecture. • Large data volumes generated at edges. • Filtering and cleaning must be done there. • Successive aggregation as you move inwards. • Summaries/anomalies continually, details later. • Strong temporal focus. • Strong spatial/geographic focus. • Streaming data and stored data. • Integration within and across enterprises.
One View of the Design Space [figure: along a time-scale axis from seconds to years, processing moves from on-the-fly filtering, cleaning, and alerts, through monitoring, time-series, and data mining over recent history (combined stream/disk processing), to disk-based archiving (provenance and schema evolution)]
Another View of the Design Space [figure: along a geographic-scope axis from local to global, several readers handle filtering, cleaning, and alerts; regional centers handle monitoring, time-series, and data mining over recent history; and the central office handles archiving (provenance and schema evolution)]
One More View of the Design Space [figure: degree of detail versus aggregate data volume; duplicate elimination at the edges keeps hours of history, interesting-event detection keeps days, and trends/archive keep years, again spanning filtering/cleaning/alerts, monitoring/time-series/data mining (recent history), and archiving (provenance and schema evolution)]
Building Blocks: TelegraphCQ and TinyDB
TelegraphCQ: Monitoring Data Streams • Streaming Data • Network monitors • Sensor Networks • News feeds • Stock tickers • B2B and Enterprise apps • Supply-Chain, CRM, RFID • Trade Reconciliation, Order Processing etc. • (Quasi) real-time flow of events and data • Must manage these flows to drive business (and other) processes. • Can mine flows to create/adjust business rules or to perform on-line analysis.
TelegraphCQ (Continuous Queries) • An adaptive system for large-scale shared dataflow processing. • Based on an extensible set of operators: 1) Ingress (data access) operators • Wrappers, File readers, Sensor Proxies 2) Non-blocking data processing operators • Selections (filters), XJoins, … 3) Adaptive routing operators • Eddies, STeMs, Flux, etc. • Operators are connected through “Fjords” • a queue-based framework unifying push and pull. • Fjords also allow us to easily mix and match streaming and stored data sources.
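To make the “Fjords” idea concrete, here is a minimal Python sketch (not TelegraphCQ code) of a bounded queue that a push-based producer such as a sensor proxy writes into and a pull-based consumer drains when it is ready; the class name, queue size, and sample tuples are illustrative assumptions.

```python
from collections import deque

class FjordQueue:
    """Bounded queue in the spirit of a Fjord connector: a push-based
    producer (e.g., a sensor proxy) enqueues tuples as they arrive, and a
    pull-based consumer (e.g., a join operator) dequeues when it is ready.
    An empty queue returns None instead of blocking the consumer."""

    def __init__(self, maxlen=1024):
        self.buf = deque(maxlen=maxlen)   # bounded: oldest tuples drop under overload

    def push(self, tup):                  # called by the upstream (push) side
        self.buf.append(tup)

    def pull(self):                       # called by the downstream (pull) side
        return self.buf.popleft() if self.buf else None

# Hypothetical usage: a sensor proxy pushes readings; a filter pulls them.
q = FjordQueue()
q.push({"sensor": 7, "mag": 0.42})
q.push({"sensor": 8, "mag": 0.17})
while (t := q.pull()) is not None:
    if t["mag"] > 0.4:
        print("alert:", t)
```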
Extreme Adaptivity [figure: a spectrum of adaptivity, from static plans (current DBMSs), through late binding (dynamic, parametric, and competitive optimization), inter-operator adaptivity (query scrambling, mid-query re-optimization), and intra-operator adaptivity (XJoin, DPHJ, convergent QP), to per-tuple adaptivity (Eddies, CACQ, PSoup)] • The per-tuple end of the spectrum is the region we are exploring in the Telegraph project. • Traditional query optimization depends on statistical knowledge of the data and a stable environment; the streaming world has neither.
Adaptivity Overview [Avnur & Hellerstein 2000] [figure: a static dataflow pushes every tuple through operators A, B, C, D in a fixed order, while an eddy routes each tuple among the operators adaptively] • How to order and reorder operators over time? • Traditionally, use performance and economic/administrative feedback • that won't work for never-ending queries over volatile streams • Instead, use adaptive record routing. • Reoptimization = a change in routing policy
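A minimal Python sketch of eddy-style adaptive routing, assuming invented operators and a crude "prefer the most selective operator" policy in place of the real lottery-based scheduling; it is meant only to show that reoptimization reduces to changing where tuples are routed next.

```python
import random

# Illustrative eddy: route each tuple to the eligible operator with the
# lowest observed pass rate, so unpromising tuples are dropped early.
# The predicates, stream, and statistics below are made up for the example.

def make_op(name, predicate):
    return {"name": name, "pred": predicate, "seen": 1.0, "passed": 1.0}

ops = [
    make_op("A", lambda t: t["mag"] > 0.5),
    make_op("B", lambda t: t["sensor"] % 2 == 0),
]

def eddy_route(tup):
    done = set()
    while len(done) < len(ops):
        candidates = [o for o in ops if o["name"] not in done]
        op = min(candidates, key=lambda o: o["passed"] / o["seen"])
        op["seen"] += 1
        if not op["pred"](tup):
            return None          # tuple rejected; stop routing it
        op["passed"] += 1
        done.add(op["name"])
    return tup                   # passed every operator: emit as output

stream = [{"sensor": i, "mag": random.random()} for i in range(10)]
results = [t for t in stream if eddy_route(t) is not None]
print(results)
```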
The TelegraphCQ Architecture [figure: a TelegraphCQ Front End (listener, parser, planner, proxy, mini-executor, catalog) communicates with TelegraphCQ Back Ends through shared-memory query plan and control queues; the Back Ends run CQEddies over scan and split modules and a shared-memory buffer pool backed by disk, a TelegraphCQ Wrapper ClearingHouse hosts the wrappers that bring in external streams, and results flow back through query result queues] A single CQEddy can encode multiple queries.
The StreaQuel Query Language SELECT projection_list FROM from_list WHERE selection_and_join_predicates ORDERED BY … TRANSFORM … TO … WINDOW … BY … • Target language for TelegraphCQ • Windows can be applied to individual streams • Window movement is expressed using a “for loop” construct in the “transform” clause • We're not completely happy with our syntax at this point.
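Since the window syntax is still being settled, here is a small Python sketch (not StreaQuel) of what a sliding-window aggregate computes as the window moves over a stream; the window size, slide, and readings are illustrative assumptions.

```python
from collections import deque

def sliding_avg(stream, window=5, slide=1):
    """Yield (window_end_index, average) as a window of `window` items
    slides over `stream` by `slide` items at a time, i.e., the effect a
    declarative window-movement clause would describe."""
    buf = deque(maxlen=window)
    for i, value in enumerate(stream):
        buf.append(value)
        if len(buf) == window and (i - window + 1) % slide == 0:
            yield i, sum(buf) / window

readings = [0.2, 0.4, 0.9, 0.3, 0.5, 0.7, 0.1, 0.8]
for end, avg in sliding_avg(readings, window=3, slide=1):
    print(f"window ending at item {end}: avg={avg:.2f}")
```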
Current Status - TelegraphCQ • System developed by modifying PostgreSQL. • Initial Version released Aug 03 • Open Source (PostgreSQL license) • Shared joins with windows and aggregates • Archived/unarchived streams • Next major release planned this summer. • Initial users include • Network monitoring project at LBL (Netlogger) • Intrusion detection project at Eurecom (France) • Our own project on Sensor Data Processing • Class projects at Berkeley, CMU, and ??? Visit http://telegraph.cs.berkeley.edu for more information.
TinyDB [figure: an application poses a query such as SELECT MAX(mag) FROM sensors WHERE mag > thresh SAMPLE PERIOD 64ms to a TinyDB sensor network, and data and triggers flow back] • Query-based interface to sensor networks • Developed on TinyOS/Motes • Benefits • Ease of programming and retasking • Extensible aggregation framework • Power-sensitive optimization and adaptivity • Sam Madden (Ph.D. Thesis) in collaboration with Wei Hong (Intel). http://telegraph.cs.berkeley.edu/tinydb
Declarative Queries in Sensor Nets • Many sensor network applications can be described using query language primitives. • Potential for tremendous reductions in development and debugging effort. SELECT nestNo, light FROM sensors WHERE light > 400 EPOCH DURATION 1s “Report the light intensities of the bright nests.”
Aggregation Query Example [figure: regions with AVG(sound) > 200] “Count the number of occupied nests in each loud region of the island.” SELECT region, CNT(occupied), AVG(sound) FROM sensors GROUP BY region HAVING AVG(sound) > 200 EPOCH DURATION 10s
Query Language (TinySQL) SELECT <aggregates>, <attributes> [FROM {sensors | <buffer>}] [WHERE <predicates>] [GROUP BY <exprs>] [SAMPLE PERIOD <const> | ONCE] [INTO <buffer>] [TRIGGER ACTION <command>]
Sensor Queries @ 10000 Ft [figure: the query is disseminated down the routing tree of motes, each node forwarding it to the subtree below it] (Almost) All Queries are Continuous and Periodic • Written in SQL • With extensions for: • Sample rate • Offline delivery • Temporal aggregation
In-Network Processing: Aggregation SELECT COUNT(*) FROM sensors [figure, animated over one epoch for a five-sensor routing tree: in each interval a different level of the tree transmits its partial count one hop toward the root, so counts of 1, 2, and 1+3 flow upward until the root obtains the total of 5 in the final interval, and the schedule then repeats in the next epoch]
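The effect of in-network aggregation can be sketched in a few lines of Python; the five-node routing tree below is invented, and a real TinyDB network would compute the same partial counts interval by interval rather than via recursion.

```python
# Illustrative in-network COUNT: each mote sends its parent a single partial
# count (its own 1 plus its children's partials) instead of a raw reading,
# so the root obtains COUNT(*) with one message per non-root node.
# The topology below is an invented five-node routing tree.

children = {1: [2, 3], 2: [4], 3: [5], 4: [], 5: []}

def partial_count(node):
    # In TinyDB terms: merge the children's partial-state records, then
    # fold in this node's own contribution.
    return 1 + sum(partial_count(child) for child in children[node])

print(partial_count(1))   # root's answer: 5, using 4 in-network messages
```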
In-Network Aggregation: Example Benefits [figure: benefits for a 2500-node, 50x50 grid network with depth ≈ 10 and ≈ 20 neighbors per node]
Taxonomy of Aggregates • TinyDB insight: classify aggregates according to various functional properties • Yields a general set of optimizations that can automatically be applied
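One property such a taxonomy captures is whether an aggregate decomposes into small partial-state records with initialize/merge/evaluate functions; the Python sketch below shows that decomposition for MAX and AVG and is illustrative rather than TinyDB's actual interface.

```python
# Sketch of the partial-state idea: an aggregate is described by how to
# initialize state from one reading, merge two states, and evaluate a final
# answer. AVG carries a (sum, count) pair; MAX carries a single value.
# The decomposition shown here is an illustrative assumption.

AGGREGATES = {
    "MAX": {
        "init": lambda v: v,
        "merge": max,
        "eval": lambda s: s,
    },
    "AVG": {
        "init": lambda v: (v, 1),
        "merge": lambda a, b: (a[0] + b[0], a[1] + b[1]),
        "eval": lambda s: s[0] / s[1],
    },
}

def aggregate(name, readings):
    agg = AGGREGATES[name]
    state = agg["init"](readings[0])
    for v in readings[1:]:
        state = agg["merge"](state, agg["init"](v))
    return agg["eval"](state)

print(aggregate("MAX", [3, 9, 4]))   # 9
print(aggregate("AVG", [3, 9, 4]))   # 5.333...
```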
Current Status - TinyDB • System built on top of TinyOS (~10K lines of embedded C code) • Latest release 9/2003 • Several deployments, including redwoods at the UC Botanical Garden [figure: motes 101-111 placed at heights from 10m to 36m on a single redwood] Visit http://telegraph.cs.berkeley.edu/tinydb for more information.
Putting It All Together? [figure: TinyDB and TelegraphCQ as the building blocks of HiFi Systems]
Ursa - A HiFi Implementation [figure: Ursa-Minor (TinyDB-based) at the edges, a mid-tier (???), and Ursa-Major (TelegraphCQ with archiving) at the center] • Current effort toward building an integrated infrastructure that spans large scales in: • Time • Geography • Resources
TelegraphCQ/TinyDB Integration • Fjords [Madden & Franklin 02] provide the dataflow plumbing necessary to use TinyDB as a data stream. • Main issues revolve around what to run where. • TCQ is a query processor • TinyDB is also a query processor • Optimization criteria include: total cost, response time, answer quality, answer likelihood, power conservation on motes, … • Project on-going, should work by summer. • Related work: Gigascope work at AT&T
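As a toy illustration of the "what to run where" question, the Python sketch below decides whether to push a single filter down into TinyDB based on invented per-sample energy numbers; the real optimization must weigh the criteria listed above across whole query plans.

```python
# Back-of-the-envelope placement decision for one filter: push the predicate
# onto the motes if the radio messages it suppresses outweigh the extra
# per-sample processing cost there. All costs below are invented.

def push_filter_to_motes(selectivity, msg_cost_uj=20.0, filter_cost_uj=1.0):
    """Return True if evaluating the filter on the mote saves energy
    per sample; `selectivity` is the fraction of samples that pass."""
    radio_savings = (1.0 - selectivity) * msg_cost_uj   # messages avoided
    return radio_savings > filter_cost_uj

print(push_filter_to_motes(selectivity=0.1))   # very selective: push down
print(push_filter_to_motes(selectivity=0.99))  # passes almost everything: run at TCQ
```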
TCQ-based Overlay Network • TCQ is primarily a single node system • Flux operators [Shah et al 03] support cluster-based processing. • Want to run TCQ at each internal node. • Primary issue is support for wide-area temporal and geographic aggregation. • In an adaptive manner, of course • Currently under design. • Related work: Astrolabe, IRISNet, DBIS, …
Querying the Past, Present, and Future • Need to handle archived data • Adaptive compression can reduce processing time. • Historical queries • Joins of Live and Historical Data • Deal with later arriving detail info • Archiving Storage Manager - A Split-stream SM for stream and disk-based processing. • Initial version of new SM running. • Related Work: Temporal and Time-travel DBs
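A toy Python sketch of joining live readings with archived detail, assuming a made-up in-memory archive keyed by sensor id; the actual Archiving Storage Manager splits each stream between memory and disk and must also cope with detail that arrives later.

```python
# Toy live/historical join: enrich each live reading with archived detail
# about the same sensor. The archive contents and join key are invented;
# a split-stream storage manager would pull the historical side from disk.

archive = {                       # stand-in for disk-resident detail
    101: {"location": "nest A", "installed": "2003-06"},
    102: {"location": "nest B", "installed": "2003-07"},
}

live_stream = [
    {"sensor": 101, "light": 455},
    {"sensor": 102, "light": 210},
    {"sensor": 103, "light": 380},   # no archived detail yet (arrives later)
]

for reading in live_stream:
    detail = archive.get(reading["sensor"])
    print(reading, "+", detail if detail else "detail pending")
```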
XML, Integration, and Other Realities • Eventually need to support XML • Must integrate with existing enterprise apps; in many areas, standardization is well underway • Augmenting moving data • Related Work: YFilter [Diao & Franklin 03], Mutant Queries [Papadimos et al. OGI], 30+ years of data integration research, 10+ years of XML research, … [figure: a high fan-in collection network feeding a high fan-out dissemination network]
HiFi Systems Conclusions • Sensors, RFIDs, and other data collection devices enable real-time enterprises. • These will create high fan-in systems. • Can exploit recent advances in streaming and sensor data management. • Lots to do!