
Towards Adaptive Dataflow Infrastructure

Joe Hellerstein, UC Berkeley


Presentation Transcript


  1. Towards Adaptive Dataflow Infrastructure Joe Hellerstein, UC Berkeley

  2. Online Query Processing: The CONTROL Project (’96-’01)
     • Data Analysis on massive datasets takes forever
     • No feedback, 100% accuracy
     • Challenge: make queries more like image delivery
     • But images are pre-encoded in progressive format
     • Query is ad hoc
     • Solution: Online Aggregation
     • Continuous sampling w/o replacement
     • New pipelining query processing algorithms with good statistical properties (e.g. Ripple Joins) and user control (Online Reordering – “Juggle”)
     • Estimators and confidence intervals for aggregates
     • Streaming samples, streaming answers
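The idea of streaming samples in and streaming a running answer out can be sketched in a few lines. This is a minimal illustration, not the CONTROL estimators themselves: it assumes a simple AVG aggregate, a normal-approximation confidence interval with a finite-population correction, and an in-memory list standing in for the table. The function name `online_average` and its parameters are hypothetical.

```python
import math
import random


def online_average(values, z=1.96, report_every=1000, seed=0):
    """Yield (n, estimate, half_width) while scanning values in random order.

    Assumption: a normal-approximation interval (z=1.96 is roughly 95%),
    which is only one of many possible estimators.
    """
    rng = random.Random(seed)
    order = list(range(len(values)))
    rng.shuffle(order)  # a random permutation is sampling without replacement
    total = len(values)
    n, mean, m2 = 0, 0.0, 0.0
    for idx in order:
        x = values[idx]
        n += 1
        # Welford's update keeps a running mean and sum of squared deviations.
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n % report_every == 0 or n == total:
            var = m2 / (n - 1) if n > 1 else 0.0
            # Finite-population correction: the interval shrinks to zero
            # once every row has been seen.
            fpc = (total - n) / total
            yield n, mean, z * math.sqrt(var / n * fpc)


if __name__ == "__main__":
    data = [float(i % 97) for i in range(10_000)]
    for n, est, half in online_average(data, report_every=2000):
        print(f"after {n:>5} rows: AVG = {est:.3f} +/- {half:.3f}")
```

A user watching the output can stop the scan as soon as the interval is tight enough, which is the "progressive image" behaviour the slide asks for.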

  3. Images Are Aggregates

  4. Can do Online “Enumeration” Too
     • “Potter’s wheel”

  5. Volatility in Streaming Queries: Analogies for Sensors
     • Query engines map queries to dataflows
     • Flow graph laid out by a query optimizer (typically on cluster)
     • Query executor runs the flow
     • User priorities change during CONTROL queries
     • Breaks “compile-then-run” query optimization paradigm
     • Dynamic reordering of commutative tasks: f(g(x))? g(f(x))?
     • Dynamic reordering of data objects: x1, x2, x3, …
     • Requires dynamic competition among choices: f(x) or f’(x)?
     • Volatile networks are similar
     • Hard to predict rates of consumption/production a priori
     • Volatile over time, and queries may run “forever”
     • Imagine interactive user “cockpit” on the sensor net!
     • Added metrics of power and data quality
     • And different kinds of volatility, no doubt
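The "f(g(x)) or g(f(x))" choice for commutative tasks can be made at run time rather than at compile time. The sketch below is a deliberately simple greedy policy, assumed for illustration only: it reorders boolean filters per item by their observed pass rate, so the most selective filter runs first. It is not the Eddies routing algorithm, and the class name `AdaptiveFilterChain` is hypothetical.

```python
class AdaptiveFilterChain:
    """Apply commutative filters in an order that adapts to observed data."""

    def __init__(self, predicates):
        # predicates: iterable of (name, callable) pairs; order is only a
        # starting point and is revised as statistics accumulate.
        self.predicates = dict(predicates)
        self.seen = {name: 0 for name in self.predicates}
        self.passed = {name: 0 for name in self.predicates}

    def pass_rate(self, name):
        # Smoothed estimate so that unseen filters start at 0.5.
        return (self.passed[name] + 1) / (self.seen[name] + 2)

    def current_order(self):
        # Filters that reject the most items go first, so later filters
        # do less work.
        return sorted(self.predicates, key=self.pass_rate)

    def accept(self, item):
        for name in self.current_order():
            self.seen[name] += 1
            if not self.predicates[name](item):
                return False
            self.passed[name] += 1
        return True


if __name__ == "__main__":
    chain = AdaptiveFilterChain([
        ("even", lambda x: x % 2 == 0),
        ("small", lambda x: x < 10),
    ])
    kept = [x for x in range(1000) if chain.accept(x)]
    print(kept, chain.current_order())
```

Because the filters commute, the result is the same under any order; only the cost changes. One known weakness of this greedy version is that a filter pushed to the back stops receiving fresh statistics, which is the kind of problem a competition-based scheme is meant to address.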

  6. Adaptive Dataflow: Convergence of DBs/Nets
     • The idea from two angles
     • Queries are flows, query optimization is routing
     • Sensor queries need nets-style adaptivity
     • New networking SW looks like a query engine
     • Click, Scout. Also CANs.
     • Sensor Qs need DB-style semantic optimization (up to app)
     • Telegraph: An Adaptive Dataflow System
     • Boxes & Arrows dataflow programming
     • Adaptive reoptimization of the flow graph (Eddies)
     • Adaptive prioritization of the delivery (Juggle)
     • Adaptive load-balancing/FT across nodes (FLuX)
     • Mix Push/Pull to blend streams and pools (Fjords)
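Adaptive prioritization of delivery can be pictured as a buffer that sits between a producer and a consumer and hands out whichever buffered item the user currently cares about most. The sketch below assumes a strict highest-priority-first policy over named groups; the slides do not describe how Juggle actually chooses, so the class `PriorityReorderBuffer` and its methods are hypothetical stand-ins.

```python
from collections import defaultdict, deque


class PriorityReorderBuffer:
    """Buffer items per group and deliver the highest-priority group first."""

    def __init__(self):
        self.buffers = defaultdict(deque)         # group -> pending items
        self.priority = defaultdict(lambda: 1.0)  # group -> current weight

    def push(self, group, item):
        self.buffers[group].append(item)

    def set_priority(self, group, weight):
        # Priorities may change at any time, including mid-query.
        self.priority[group] = weight

    def pop(self):
        candidates = [g for g, q in self.buffers.items() if q]
        if not candidates:
            return None
        best = max(candidates, key=lambda g: self.priority[g])
        return self.buffers[best].popleft()


if __name__ == "__main__":
    buf = PriorityReorderBuffer()
    for group, item in [("a", "a1"), ("b", "b1"), ("a", "a2"), ("b", "b2")]:
        buf.push(group, item)
    buf.set_priority("b", 5.0)   # user shows interest in group b
    print(buf.pop())
    buf.set_priority("a", 10.0)  # interest shifts to group a mid-stream
    print(buf.pop(), buf.pop(), buf.pop())
```

Within a group, arrival order is preserved; only the interleaving across groups reacts to the user.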

  7. Extra Slides on Telegraph

  8. Telegraph Apps to Date
     • Web Queries: Election 2000
     • http://fff.cs.berkeley.edu
     • Enhanced P2P functionality
     • Query by album or artist, via joins with web data
     • Working on pure P2P query processing
     • Initial sensor app
     • Join I-80 traffic movement with webcams and incidents
     • Smart Dust Mote simulations

  9. Telenap: Amazon Meets Napster

  10. Movie Stars Who Donated to Bush

  11. Query >> Search: http://fff.cs.berkeley.edu
     • “Federated Facts and Figures”
     • Yahoo join FECInfo

  12. Query >> Search: http://fff.cs.berkeley.edu
     • “Federated Facts and Figures”
     • APBNews join FECInfo
