
HiFi Systems: Network-Centric Query Processing for the Physical World

Presentation Transcript


  1. High Fan-in. HiFi Systems: Network-Centric Query Processing for the Physical World. Michael Franklin, UC Berkeley, 2/13/04

  2. Introduction • Continuing improvements in sensor devices • Wireless motes • RFID • Cellular-based telemetry • Cheap devices can monitor the environment at a high rate. • Connectivity enables remote monitoring at many different scales. • Widely different concerns at each of these levels and scales.

  3. Plan of Attack • Motivation/Applications/Examples • Characteristics of HiFi Systems • Foundational Components • TelegraphCQ • TinyDB • Research Issues • Conclusions

  4. High Fan-in: The Canonical HiFi System

  5. RFID - Retail Scenario • “Smart Shelves” continuously monitor item addition and removal. • Info is sent back through the supply chain.

  6. “Extranet” Information Flow [diagram: Retailers A and B and Manufacturers C and D exchange data through a shared Aggregation/Distribution Service]

  7. M2M - Telemetry/Remote Monitoring • Energy Monitoring - Demand Response • Traffic • Power Generation • Remote Equipment

  8. Time-Shift Trend Prediction • National companies can exploit East Coast/West Coast time differentials to optimize West Coast operations.

  9. Virtual Sensors • Sensors don’t have to be physical sensors. • Network Monitoring algorithms for detecting viruses, spam, DoS attacks, etc. • Disease outbreak detection

  10. HiFi System Properties • High Fan-In, globally-distributed architecture. • Large data volumes generated at edges. • Filtering and cleaning must be done there. • Successive aggregation as you move inwards. • Summaries/anomalies continually, details later. • Strong temporal focus. • Strong spatial/geographic focus. • Streaming data and stored data. • Integration within and across enterprises.

  11. One View of the Design Space [diagram: a Time Scale axis running from seconds to years; Filtering/Cleaning/Alerts use on-the-fly processing at the shortest time scales, Monitoring/Time-series/Data mining (recent history) use combined stream/disk processing, and Archiving (provenance and schema evolution) uses disk-based processing at the longest time scales]

  12. Another View of the Design Space [diagram: a Geographic Scope axis running from local to global; Filtering/Cleaning/Alerts happen at the several readers, Monitoring/Time-series/Data mining (recent history) at regional centers, and Archiving (provenance and schema evolution) at the central office]

  13. One More View of the Design Space [diagram: Degree of Detail vs. Aggregate Data Volume; Filtering/Cleaning/Alerts perform duplicate elimination over hours of history, Monitoring/Time-series/Data mining capture interesting events over days of history, and Archiving (provenance and schema evolution) keeps trends and archives over years of history]

  14. Building Blocks: TinyDB and TelegraphCQ

  15. TelegraphCQ: Monitoring Data Streams • Streaming Data • Network monitors • Sensor Networks • News feeds • Stock tickers • B2B and Enterprise apps • Supply-Chain, CRM, RFID • Trade Reconciliation, Order Processing etc. • (Quasi) real-time flow of events and data • Must manage these flows to drive business (and other) processes. • Can mine flows to create/adjust business rules or to perform on-line analysis.

  16. TelegraphCQ (Continuous Queries) • An adaptive system for large-scale shared dataflow processing. • Based on an extensible set of operators: 1) Ingress (data access) operators • Wrappers, File readers, Sensor Proxies 2) Non-Blocking data processing operators • Selections (filters), XJoins, … 3) Adaptive Routing Operators • Eddies, STeMs, FLuX, etc. • Operators connected through “Fjords” • queue-based framework unifying push & pull. • Fjords will also allow us to easily mix and match streaming and stored data sources.
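
The push/pull unification that Fjords provide can be illustrated with a small sketch. This is not the TelegraphCQ implementation (which lives inside PostgreSQL); it is a minimal Python illustration, with invented names, of a pair of queue endpoints that give a downstream operator one uniform interface whether an input is pulled from a stored source or pushed by a streaming ingress operator.

import queue

class PullQueue:
    """Pull endpoint: the downstream operator asks a stored source
    (e.g., a table scan) for its next tuple on demand."""
    def __init__(self, scan_iter):
        self.scan = scan_iter

    def get(self):
        return next(self.scan, None)    # None signals end of input

class PushQueue:
    """Push endpoint: an ingress operator (sensor proxy, wrapper)
    enqueues tuples as they arrive, without blocking on the consumer."""
    def __init__(self):
        self.q = queue.Queue()

    def push(self, tup):
        self.q.put(tup)

    def get(self):
        try:
            return self.q.get_nowait()  # may be None: stream has no data yet
        except queue.Empty:
            return None

def poll_inputs(inputs):
    """A downstream operator sees one uniform get() interface,
    whether each input is pushed (stream) or pulled (stored)."""
    return [q.get() for q in inputs]

# Usage: mix a stored relation (pull) with a live stream (push).
stored = PullQueue(iter([("item1", 10), ("item2", 20)]))
live = PushQueue()
live.push(("rfid_read", "shelf_4"))
print(poll_inputs([stored, live]))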

  17. Extreme Adaptivity [diagram: a spectrum of adaptivity running from static plans, through late binding, inter-operator, and intra-operator adaptivity, to per-tuple adaptivity; example systems range from current DBMSs and Dynamic/Parametric/Competitive optimization, through Query Scrambling and Mid-Query Re-opt, XJoin, DPHJ, and Convergent QP, to Eddies, CACQ, and PSoup] • The per-tuple end of this spectrum is the region that we are exploring in the Telegraph project. • Traditional query optimization depends on statistical knowledge of the data and a stable environment; the streaming world has neither.

  18. Adaptivity Overview [Avnur & Hellerstein 2000] [diagram: a static dataflow pushes tuples through operators A, B, C, D in a fixed order, while an eddy routes each tuple among A, B, C, D in an order it chooses per tuple] • How to order and reorder operators over time? • Traditionally, use performance, economic/admin feedback • That won't work for never-ending queries over volatile streams • Instead, use adaptive record routing. • Reoptimization = change in routing policy
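
Eddy-style routing can be sketched in a few lines. The sketch below is only an illustration of the idea from [Avnur & Hellerstein 2000], not TelegraphCQ code: each tuple is routed among the operators it still has to visit, and the routing policy prefers the operator that has recently dropped the most tuples (a crude stand-in for the eddy's ticket/lottery accounting). The operators and statistics here are invented for illustration.

# Hypothetical filter operators; in a real query these would be
# selection predicates or join probes over the stream.
operators = {
    "A": lambda t: t["mag"] > 10,
    "B": lambda t: t["region"] == "north",
}

# Crude per-operator pass-rate statistics, updated as tuples flow through.
passed = {name: 1 for name in operators}
seen = {name: 2 for name in operators}

def route(tup):
    """Route one tuple through every operator, choosing the next hop
    adaptively: visit the pending operator most likely to drop it."""
    todo = set(operators)
    while todo:
        name = min(todo, key=lambda n: passed[n] / seen[n])
        seen[name] += 1
        if not operators[name](tup):
            return None                 # tuple eliminated early
        passed[name] += 1
        todo.remove(name)
    return tup                          # tuple satisfied every operator

for tup in [{"mag": 12, "region": "north"}, {"mag": 3, "region": "south"}]:
    print(route(tup))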

  19. The TelegraphCQ Architecture [diagram: a TelegraphCQ Front End (Listener, Parser, Planner, Proxy, Mini-Executor, Catalog) communicates through shared memory (query plan queue, eddy control queue, query result queues, buffer pool) with TelegraphCQ Back Ends running CQEddy, Split, and Scan modules, and with a TelegraphCQ Wrapper ClearingHouse that manages wrappers and disk access] A single CQEddy can encode multiple queries.

  20. The StreaQuel Query Language SELECT projection_list FROM from_list WHERE selection_and_join_predicates ORDERED BY TRANSFORM…TO WINDOW…BY • Target language for TelegraphCQ • Windows can be applied to individual streams • Window movement is expressed using a “for loop” construct in the “transform” clause • We’re not completely happy with our syntax at this point.
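
The window semantics behind the TRANSFORM/WINDOW clauses can be illustrated without committing to syntax. The Python sketch below, with invented helper names, shows the difference between a sliding window (both endpoints move each step) and a landmark window (the left edge stays pinned, as in the example on the next slide); it illustrates the semantics only, not StreaQuel itself.

def windows(start, stop, width, slide, landmark=False):
    """Yield (lo, hi) time intervals as the window moves over [start, stop].
    landmark=True pins the left edge at `start`; otherwise the window slides."""
    hi = start + width
    while hi <= stop:
        lo = start if landmark else hi - width
        yield (lo, hi)
        hi += slide

stream = [(t, t * t) for t in range(10)]          # (timestamp, value) tuples

# Sliding window: SUM(value) over the most recent 3 time units, advancing by 1.
for lo, hi in windows(0, 10, width=3, slide=1):
    print("sliding ", (lo, hi), sum(v for t, v in stream if lo <= t < hi))

# Landmark window: SUM(value) from time 0 up to a moving right edge.
for lo, hi in windows(0, 10, width=3, slide=1, landmark=True):
    print("landmark", (lo, hi), sum(v for t, v in stream if lo <= t < hi))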

  21. Example Window Query: Landmark

  22. Current Status - TelegraphCQ • System developed by modifying PostgreSQL. • Initial Version released Aug 03 • Open Source (PostgreSQL license) • Shared joins with windows and aggregates • Archived/unarchived streams • Next major release planned this summer. • Initial users include • Network monitoring project at LBL (Netlogger) • Intrusion detection project at Eurecom (France) • Our own project on Sensor Data Processing • Class projects at Berkeley, CMU, and ??? Visit http://telegraph.cs.berkeley.edu for more information.

  23. TinyDB [diagram: an application issues queries and triggers, e.g. SELECT MAX(mag) FROM sensors WHERE mag > thresh SAMPLE PERIOD 64ms, to TinyDB running over a sensor network, and data flows back to the app] • Query-based interface to sensor networks • Developed on TinyOS/Motes • Benefits • Ease of programming and retasking • Extensible aggregation framework • Power-sensitive optimization and adaptivity • Sam Madden (Ph.D. thesis) in collaboration with Wei Hong (Intel). http://telegraph.cs.berkeley.edu/tinydb

  24. Declarative Queries in Sensor Nets • Many sensor network applications can be described using query language primitives. • Potential for tremendous reductions in development and debugging effort. • “Report the light intensities of the bright nests”: SELECT nestNo, light FROM sensors WHERE light > 400 EPOCH DURATION 1s

  25. Aggregation Query Example [map: regions with AVG(sound) > 200] “Count the number of occupied nests in each loud region of the island.” SELECT region, CNT(occupied), AVG(sound) FROM sensors GROUP BY region HAVING AVG(sound) > 200 EPOCH DURATION 10s

  26. Query Language (TinySQL) SELECT <aggregates>, <attributes> [FROM {sensors | <buffer>}] [WHERE <predicates>] [GROUP BY <exprs>] [SAMPLE PERIOD <const> | ONCE] [INTO <buffer>] [TRIGGER ACTION <command>]

  27. Sensor Queries @ 10000 Ft [diagram: a query is disseminated down a routing tree of nodes A through F] (Almost) All Queries are Continuous and Periodic • Written in SQL • With Extensions For: • Sample rate • Offline delivery • Temporal Aggregation

  28–33. In-Network Processing: Aggregation. SELECT COUNT(*) FROM sensors [animation across slides 28–33: five sensors and successive epoch intervals; in each interval, partial COUNT values flow up the routing tree, each node adding the counts received from its children to its own before reporting to its parent]
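
The animation on the preceding slides reduces to a simple rule: each node combines the partial counts received from its children with its own reading and forwards a single value to its parent. A minimal Python sketch of that rule, using an invented five-node routing tree:

# Hypothetical routing tree: child -> parent (node 1 is the root/base station).
parent = {2: 1, 3: 1, 4: 2, 5: 4}
nodes = [1, 2, 3, 4, 5]

def depth(n):
    """Hop count from node n to the root."""
    d = 0
    while n in parent:
        n = parent[n]
        d += 1
    return d

def in_network_count():
    """Each node sends one partial COUNT to its parent instead of
    forwarding every raw reading, so only one message per node per
    epoch crosses each link."""
    partial = {n: 1 for n in nodes}                   # every node counts itself
    for n in sorted(nodes, key=depth, reverse=True):  # leaves report first
        if n in parent:
            partial[parent[n]] += partial[n]          # merge the child's partial count
    return partial[1]                                 # the root holds COUNT(*)

print(in_network_count())   # -> 5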

  34. In-Network Aggregation: Example Benefits [simulation chart: 2500 nodes in a 50x50 grid, tree depth ~10, ~20 neighbors per node]

  35. Taxonomy of Aggregates • TinyDB insight: classify aggregates according to various functional properties • Yields a general set of optimizations that can automatically be applied
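
One functional property such a taxonomy relies on is whether an aggregate can be carried as compact, mergeable partial state (COUNT, SUM, and AVG can; MEDIAN cannot). A small sketch of that property, writing AVG as an (initialize, merge, evaluate) triple; the function names are illustrative, not TinyDB's API.

# AVG expressed as mergeable partial state: (sum, count) pairs can be
# combined in any order inside the network; only the root divides.
def avg_init(value):
    return (value, 1)

def avg_merge(a, b):
    return (a[0] + b[0], a[1] + b[1])

def avg_evaluate(state):
    s, c = state
    return s / c

readings = [10.0, 20.0, 30.0, 40.0]
state = avg_init(readings[0])
for v in readings[1:]:
    state = avg_merge(state, avg_init(v))
print(avg_evaluate(state))   # 25.0, regardless of the order of the merges

# By contrast, an aggregate like MEDIAN has no constant-size partial state:
# every reading must reach the root, so it gains little from in-network merging.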

  36. Current Status - TinyDB • System built on top of TinyOS (~10K lines of embedded C code) • Latest release 9/2003 • Several deployments, including redwoods at the UC Botanical Garden [diagram: sensor nodes 101–111 placed at heights of 10m to 33m on a 36m redwood] Visit http://telegraph.cs.berkeley.edu/tinydb for more information.

  37. Putting It All Together? [diagram: TinyDB and TelegraphCQ combined to form HiFi Systems]

  38. Ursa - A HiFi Implementation [diagram: Ursa-Minor (TinyDB-based) at the edges, a mid-tier (???), and Ursa-Major (TelegraphCQ w/Archiving) at the core] • Current effort towards building an integrated infrastructure that spans large scales in: • Time • Geography • Resources

  39. TelegraphCQ/TinyDB Integration • Fjords [Madden & Franklin 02] provide the dataflow plumbing necessary to use TinyDB as a data stream. • Main issues revolve around what to run where: • TCQ is a query processor • TinyDB is also a query processor • Optimization criteria include: total cost, response time, answer quality, answer likelihood, power conservation on motes, … • Project ongoing; should work by summer. • Related work: Gigascope work at AT&T
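
A toy illustration of the "what to run where" question, using entirely made-up cost numbers rather than the real optimization criteria listed above: pushing a selective filter down to the motes trades a little extra per-sample work on the mote against radio messages saved, and the better placement flips as the filter becomes less selective. This is only a back-of-envelope sketch, not the actual TelegraphCQ/TinyDB optimizer.

def mote_energy(run_filter_on_mote, selectivity,
                radio_cost=20.0,   # hypothetical energy units per message sent
                filter_cost=2.0,   # hypothetical energy units per local predicate test
                samples=100):
    """Rough per-epoch energy spent on the motes for one placement choice."""
    if run_filter_on_mote:
        # Evaluate the predicate locally; transmit only the tuples that pass.
        return samples * (filter_cost + selectivity * radio_cost)
    # Ship every sample; TelegraphCQ applies the filter at the server.
    return samples * radio_cost

for sel in (0.05, 0.5, 0.95):
    at_mote = mote_energy(True, sel)
    at_server = mote_energy(False, sel)
    better = "mote" if at_mote < at_server else "server"
    print(f"selectivity={sel}: mote={at_mote:.0f}  server={at_server:.0f}  -> {better}")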

  40. TCQ-based Overlay Network • TCQ is primarily a single-node system • Flux operators [Shah et al 03] support cluster-based processing. • Want to run TCQ at each internal node. • Primary issue is support for wide-area temporal and geographic aggregation. • In an adaptive manner, of course • Currently under design. • Related work: Astrolabe, IRISNet, DBIS, …

  41. Querying the Past, Present, and Future • Need to handle archived data • Adaptive compression can reduce processing time. • Historical queries • Joins of Live and Historical Data • Deal with later-arriving detail info • Archiving Storage Manager - A Split-stream SM for stream and disk-based processing. • Initial version of new SM running. • Related Work: Temporal and Time-travel DBs
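
A minimal sketch of joining live and historical data, assuming a hypothetical in-memory archive keyed by sensor id: each arriving tuple probes that sensor's stored history and is then archived itself, so detail that arrives later can still be matched. This is only an illustration, not the Archiving Storage Manager design.

# Hypothetical archive: sensor id -> list of (timestamp, value) already on disk.
archive = {
    "s1": [(100, 17.2), (160, 18.0)],
    "s2": [(100, 21.5)],
}

def join_live_with_history(live_stream):
    """For each arriving tuple, emit join results against that sensor's
    archived readings, then append the new tuple to the archive so that
    later arrivals can also match it."""
    for sensor, ts, value in live_stream:
        for old_ts, old_value in archive.get(sensor, []):
            yield (sensor, ts, value, old_ts, old_value)
        archive.setdefault(sensor, []).append((ts, value))

live = [("s1", 200, 18.4), ("s3", 200, 5.0), ("s1", 260, 18.9)]
for row in join_live_with_history(live):
    print(row)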

  42. XML, Integration, and Other Realities • Eventually need to support XML • Must integrate with existing enterprise apps; in many areas, standardization is well underway • Augmenting moving data • Related Work: YFilter [Diao & Franklin 03], Mutant Queries [Papadimos et al. OGI], 30+ years of data integration research, 10+ years of XML research, … • High Fan-in → High Fan-out

  43. HiFi Systems Conclusions • Sensors, RFIDs, and other data collection devices enable real-time enterprises. • These will create high fan-in systems. • Can exploit recent advances in streaming and sensor data management. • Lots to do!
