HiFi Systems: Network-Centric Query Processing for the Physical World. Michael Franklin, UC Berkeley, 2.13.04
Introduction • Continuing improvements in sensor devices • Wireless motes • RFID • Cellular-based telemetry • Cheap devices can monitor the environment at a high rate. • Connectivity enables remote monitoring at many different scales. • Widely different concerns at each of these levels and scales.
Plan of Attack • Motivation/Applications/Examples • Characteristics of HiFi Systems • Foundational Components • TelegraphCQ • TinyDB • Research Issues • Conclusions
The Canonical HiFi System [figure: a high fan-in tree, with many edge devices feeding successively fewer, more central nodes]
RFID - Retail Scenario • “Smart Shelves” continuously monitor item addition and removal. • Info is sent back through the supply chain.
“Extranet” Information Flow [figure: Retailers A and B send data to an Aggregation/Distribution Service, which passes it on to Manufacturers C and D]
M2M - Telemetry/Remote Monitoring • Energy Monitoring - Demand Response • Traffic • Power Generation • Remote Equipment
Time-Shift Trend Prediction • National companies can exploit East Coast/ West Coast time differentials to optimize West Coast operations.
Virtual Sensors • Sensors don’t have to be physical sensors. • Network Monitoring algorithms for detecting viruses, spam, DoS attacks, etc. • Disease outbreak detection
HiFi System Properties • High Fan-In, globally-distributed architecture. • Large data volumes generated at edges. • Filtering and cleaning must be done there. • Successive aggregation as you move inwards. • Summaries/anomalies continually, details later. • Strong temporal focus. • Strong spatial/geographic focus. • Streaming data and stored data. • Integration within and across enterprises.
One View of the Design Space [figure: along a time-scale axis from seconds to years, processing moves from on-the-fly filtering, cleaning, and alerts, through monitoring, time-series, and data mining over recent history (combined stream/disk processing), to disk-based archiving (provenance and schema evolution)]
Another View of the Design Space [figure: along a geographic-scope axis from local to global, several readers handle filtering, cleaning, and alerts; regional centers handle monitoring, time-series, and data mining over recent history; and the central office handles archiving (provenance and schema evolution)]
One More View of the Design Space [figure: degree of detail versus aggregate data volume; duplicate elimination at the edges keeps hours of history, interesting-event detection keeps days, and trends/archive keep years, again spanning filtering/cleaning/alerts, monitoring/time-series/data mining (recent history), and archiving (provenance and schema evolution)]
Building Blocks: TelegraphCQ and TinyDB
TelegraphCQ: Monitoring Data Streams • Streaming Data • Network monitors • Sensor Networks • News feeds • Stock tickers • B2B and Enterprise apps • Supply-Chain, CRM, RFID • Trade Reconciliation, Order Processing etc. • (Quasi) real-time flow of events and data • Must manage these flows to drive business (and other) processes. • Can mine flows to create/adjust business rules or to perform on-line analysis.
TelegraphCQ (Continuous Queries) • An adaptive system for large-scale shared dataflow processing. • Based on an extensible set of operators: 1) Ingress (data access) operators • Wrappers, File readers, Sensor Proxies 2) Non-blocking data processing operators • Selections (filters), XJoins, … 3) Adaptive routing operators • Eddies, STeMs, Flux, etc. • Operators are connected through “Fjords” • a queue-based framework unifying push and pull. • Fjords also allow us to easily mix and match streaming and stored data sources.
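To make the “Fjords” idea concrete, here is a minimal Python sketch (not TelegraphCQ code) of a bounded queue that a push-based producer such as a sensor proxy writes into and a pull-based consumer drains when it is ready; the class name, queue size, and sample tuples are illustrative assumptions.

```python
from collections import deque

class FjordQueue:
    """Bounded queue in the spirit of a Fjord connector: a push-based
    producer (e.g., a sensor proxy) enqueues tuples as they arrive, and a
    pull-based consumer (e.g., a join operator) dequeues when it is ready.
    An empty queue returns None instead of blocking the consumer."""

    def __init__(self, maxlen=1024):
        self.buf = deque(maxlen=maxlen)   # bounded: oldest tuples drop under overload

    def push(self, tup):                  # called by the upstream (push) side
        self.buf.append(tup)

    def pull(self):                       # called by the downstream (pull) side
        return self.buf.popleft() if self.buf else None

# Hypothetical usage: a sensor proxy pushes readings; a filter pulls them.
q = FjordQueue()
q.push({"sensor": 7, "mag": 0.42})
q.push({"sensor": 8, "mag": 0.17})
while (t := q.pull()) is not None:
    if t["mag"] > 0.4:
        print("alert:", t)
```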
Extreme Adaptivity [figure: a spectrum of adaptivity, from static plans (current DBMSs), through late binding (dynamic, parametric, and competitive optimization), inter-operator adaptivity (query scrambling, mid-query re-optimization), and intra-operator adaptivity (XJoin, DPHJ, convergent QP), to per-tuple adaptivity (Eddies, CACQ, PSoup)] • The per-tuple end of the spectrum is the region we are exploring in the Telegraph project. • Traditional query optimization depends on statistical knowledge of the data and a stable environment; the streaming world has neither.
Adaptivity Overview [Avnur & Hellerstein 2000] [figure: a static dataflow pushes every tuple through operators A, B, C, D in a fixed order, while an eddy routes each tuple among the operators adaptively] • How to order and reorder operators over time? • Traditionally, use performance and economic/administrative feedback • that won't work for never-ending queries over volatile streams • Instead, use adaptive record routing. • Reoptimization = a change in routing policy
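A minimal Python sketch of eddy-style adaptive routing, assuming invented operators and a crude "prefer the most selective operator" policy in place of the real lottery-based scheduling; it is meant only to show that reoptimization reduces to changing where tuples are routed next.

```python
import random

# Illustrative eddy: route each tuple to the eligible operator with the
# lowest observed pass rate, so unpromising tuples are dropped early.
# The predicates, stream, and statistics below are made up for the example.

def make_op(name, predicate):
    return {"name": name, "pred": predicate, "seen": 1.0, "passed": 1.0}

ops = [
    make_op("A", lambda t: t["mag"] > 0.5),
    make_op("B", lambda t: t["sensor"] % 2 == 0),
]

def eddy_route(tup):
    done = set()
    while len(done) < len(ops):
        candidates = [o for o in ops if o["name"] not in done]
        op = min(candidates, key=lambda o: o["passed"] / o["seen"])
        op["seen"] += 1
        if not op["pred"](tup):
            return None          # tuple rejected; stop routing it
        op["passed"] += 1
        done.add(op["name"])
    return tup                   # passed every operator: emit as output

stream = [{"sensor": i, "mag": random.random()} for i in range(10)]
results = [t for t in stream if eddy_route(t) is not None]
print(results)
```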
The TelegraphCQ Architecture [figure: a TelegraphCQ Front End (listener, parser, planner, proxy, mini-executor, catalog) communicates with TelegraphCQ Back Ends through shared-memory query plan and control queues; the Back Ends run CQEddies over scan and split modules and a shared-memory buffer pool backed by disk, a TelegraphCQ Wrapper ClearingHouse hosts the wrappers that bring in external streams, and results flow back through query result queues] A single CQEddy can encode multiple queries.
The StreaQuel Query Language SELECT projection_list FROM from_list WHERE selection_and_join_predicates ORDERED BY … TRANSFORM … TO … WINDOW … BY … • Target language for TelegraphCQ • Windows can be applied to individual streams • Window movement is expressed using a “for loop” construct in the “transform” clause • We're not completely happy with our syntax at this point.
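Since the window syntax is still being settled, here is a small Python sketch (not StreaQuel) of what a sliding-window aggregate computes as the window moves over a stream; the window size, slide, and readings are illustrative assumptions.

```python
from collections import deque

def sliding_avg(stream, window=5, slide=1):
    """Yield (window_end_index, average) as a window of `window` items
    slides over `stream` by `slide` items at a time, i.e., the effect a
    declarative window-movement clause would describe."""
    buf = deque(maxlen=window)
    for i, value in enumerate(stream):
        buf.append(value)
        if len(buf) == window and (i - window + 1) % slide == 0:
            yield i, sum(buf) / window

readings = [0.2, 0.4, 0.9, 0.3, 0.5, 0.7, 0.1, 0.8]
for end, avg in sliding_avg(readings, window=3, slide=1):
    print(f"window ending at item {end}: avg={avg:.2f}")
```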
Current Status - TelegraphCQ • System developed by modifying PostgreSQL. • Initial Version released Aug 03 • Open Source (PostgreSQL license) • Shared joins with windows and aggregates • Archived/unarchived streams • Next major release planned this summer. • Initial users include • Network monitoring project at LBL (Netlogger) • Intrusion detection project at Eurecom (France) • Our own project on Sensor Data Processing • Class projects at Berkeley, CMU, and ??? Visit http://telegraph.cs.berkeley.edu for more information.
TinyDB [figure: an application poses a query such as SELECT MAX(mag) FROM sensors WHERE mag > thresh SAMPLE PERIOD 64ms to a TinyDB sensor network, and data and triggers flow back] • Query-based interface to sensor networks • Developed on TinyOS/Motes • Benefits • Ease of programming and retasking • Extensible aggregation framework • Power-sensitive optimization and adaptivity • Sam Madden (Ph.D. Thesis) in collaboration with Wei Hong (Intel). http://telegraph.cs.berkeley.edu/tinydb
Declarative Queries in Sensor Nets • Many sensor network applications can be described using query language primitives. • Potential for tremendous reductions in development and debugging effort. SELECT nestNo, light FROM sensors WHERE light > 400 EPOCH DURATION 1s “Report the light intensities of the bright nests.”
Aggregation Query Example [figure: regions with AVG(sound) > 200] “Count the number of occupied nests in each loud region of the island.” SELECT region, CNT(occupied), AVG(sound) FROM sensors GROUP BY region HAVING AVG(sound) > 200 EPOCH DURATION 10s
Query Language (TinySQL) SELECT <aggregates>, <attributes> [FROM {sensors | <buffer>}] [WHERE <predicates>] [GROUP BY <exprs>] [SAMPLE PERIOD <const> | ONCE] [INTO <buffer>] [TRIGGER ACTION <command>]
Sensor Queries @ 10000 Ft [figure: the query is disseminated down the routing tree of motes, each node forwarding it to the subtree below it] (Almost) All Queries are Continuous and Periodic • Written in SQL • With extensions for: • Sample rate • Offline delivery • Temporal aggregation
In-Network Processing: Aggregation SELECT COUNT(*) FROM sensors [figure, animated over one epoch for a five-sensor routing tree: in each interval a different level of the tree transmits its partial count one hop toward the root, so counts of 1, 2, and 1+3 flow upward until the root obtains the total of 5 in the final interval, and the schedule then repeats in the next epoch]
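The effect of in-network aggregation can be sketched in a few lines of Python; the five-node routing tree below is invented, and a real TinyDB network would compute the same partial counts interval by interval rather than via recursion.

```python
# Illustrative in-network COUNT: each mote sends its parent a single partial
# count (its own 1 plus its children's partials) instead of a raw reading,
# so the root obtains COUNT(*) with one message per non-root node.
# The topology below is an invented five-node routing tree.

children = {1: [2, 3], 2: [4], 3: [5], 4: [], 5: []}

def partial_count(node):
    # In TinyDB terms: merge the children's partial-state records, then
    # fold in this node's own contribution.
    return 1 + sum(partial_count(child) for child in children[node])

print(partial_count(1))   # root's answer: 5, using 4 in-network messages
```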
In-Network Aggregation: Example Benefits [figure: benefits for a 2500-node, 50x50 grid network with depth ≈ 10 and ≈ 20 neighbors per node]
Taxonomy of Aggregates • TinyDB insight: classify aggregates according to various functional properties • Yields a general set of optimizations that can automatically be applied
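One property such a taxonomy captures is whether an aggregate decomposes into small partial-state records with initialize/merge/evaluate functions; the Python sketch below shows that decomposition for MAX and AVG and is illustrative rather than TinyDB's actual interface.

```python
# Sketch of the partial-state idea: an aggregate is described by how to
# initialize state from one reading, merge two states, and evaluate a final
# answer. AVG carries a (sum, count) pair; MAX carries a single value.
# The decomposition shown here is an illustrative assumption.

AGGREGATES = {
    "MAX": {
        "init": lambda v: v,
        "merge": max,
        "eval": lambda s: s,
    },
    "AVG": {
        "init": lambda v: (v, 1),
        "merge": lambda a, b: (a[0] + b[0], a[1] + b[1]),
        "eval": lambda s: s[0] / s[1],
    },
}

def aggregate(name, readings):
    agg = AGGREGATES[name]
    state = agg["init"](readings[0])
    for v in readings[1:]:
        state = agg["merge"](state, agg["init"](v))
    return agg["eval"](state)

print(aggregate("MAX", [3, 9, 4]))   # 9
print(aggregate("AVG", [3, 9, 4]))   # 5.333...
```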
Current Status - TinyDB • System built on top of TinyOS (~10K lines of embedded C code) • Latest release 9/2003 • Several deployments, including redwoods at the UC Botanical Garden [figure: motes 101-111 placed at heights from 10m to 36m on a single redwood] Visit http://telegraph.cs.berkeley.edu/tinydb for more information.
Putting It All Together? [figure: TinyDB and TelegraphCQ as the building blocks of HiFi Systems]
Ursa - A HiFi Implementation [figure: Ursa-Minor (TinyDB-based) at the edges, a mid-tier (???), and Ursa-Major (TelegraphCQ with archiving) at the center] • Current effort toward building an integrated infrastructure that spans large scales in: • Time • Geography • Resources
TelegraphCQ/TinyDB Integration • Fjords [Madden & Franklin 02] provide the dataflow plumbing necessary to use TinyDB as a data stream. • Main issues revolve around what to run where. • TCQ is a query processor • TinyDB is also a query processor • Optimization criteria include: total cost, response time, answer quality, answer likelihood, power conservation on motes, … • Project on-going, should work by summer. • Related work: Gigascope work at AT&T
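As a toy illustration of the "what to run where" question, the Python sketch below decides whether to push a single filter down into TinyDB based on invented per-sample energy numbers; the real optimization must weigh the criteria listed above across whole query plans.

```python
# Back-of-the-envelope placement decision for one filter: push the predicate
# onto the motes if the radio messages it suppresses outweigh the extra
# per-sample processing cost there. All costs below are invented.

def push_filter_to_motes(selectivity, msg_cost_uj=20.0, filter_cost_uj=1.0):
    """Return True if evaluating the filter on the mote saves energy
    per sample; `selectivity` is the fraction of samples that pass."""
    radio_savings = (1.0 - selectivity) * msg_cost_uj   # messages avoided
    return radio_savings > filter_cost_uj

print(push_filter_to_motes(selectivity=0.1))   # very selective: push down
print(push_filter_to_motes(selectivity=0.99))  # passes almost everything: run at TCQ
```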
TCQ-based Overlay Network • TCQ is primarily a single node system • Flux operators [Shah et al 03] support cluster-based processing. • Want to run TCQ at each internal node. • Primary issue is support for wide-area temporal and geographic aggregation. • In an adaptive manner, of course • Currently under design. • Related work: Astrolabe, IRISNet, DBIS, …
Querying the Past, Present, and Future • Need to handle archived data • Adaptive compression can reduce processing time. • Historical queries • Joins of Live and Historical Data • Deal with later arriving detail info • Archiving Storage Manager - A Split-stream SM for stream and disk-based processing. • Initial version of new SM running. • Related Work: Temporal and Time-travel DBs
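A toy Python sketch of joining live readings with archived detail, assuming a made-up in-memory archive keyed by sensor id; the actual Archiving Storage Manager splits each stream between memory and disk and must also cope with detail that arrives later.

```python
# Toy live/historical join: enrich each live reading with archived detail
# about the same sensor. The archive contents and join key are invented;
# a split-stream storage manager would pull the historical side from disk.

archive = {                       # stand-in for disk-resident detail
    101: {"location": "nest A", "installed": "2003-06"},
    102: {"location": "nest B", "installed": "2003-07"},
}

live_stream = [
    {"sensor": 101, "light": 455},
    {"sensor": 102, "light": 210},
    {"sensor": 103, "light": 380},   # no archived detail yet (arrives later)
]

for reading in live_stream:
    detail = archive.get(reading["sensor"])
    print(reading, "+", detail if detail else "detail pending")
```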
XML, Integration, and Other Realities • Eventually need to support XML • Must integrate with existing enterprise apps; in many areas, standardization is well underway • Augmenting moving data • Related Work: YFilter [Diao & Franklin 03], Mutant Queries [Papadimos et al. OGI], 30+ years of data integration research, 10+ years of XML research, … [figure: a high fan-in collection network feeding a high fan-out dissemination network]
HiFi Systems Conclusions • Sensors, RFIDs, and other data collection devices enable real-time enterprises. • These will create high fan-in systems. • Can exploit recent advances in streaming and sensor data management. • Lots to do!