Towards Adaptive Dataflow Infrastructure Joe Hellerstein, UC Berkeley
Online Query Processing: The CONTROL Project (’96-’01)
• Data analysis on massive datasets takes forever
• No feedback, 100% accuracy
• Challenge: make queries more like image delivery
• But images are pre-encoded in progressive format
• Query is ad hoc
• Solution: Online Aggregation
• Continuous sampling w/o replacement
• New pipelining query processing algorithms with good statistical properties (e.g. Ripple Joins) and user control (Online Reordering – “Juggle”)
• Estimators and confidence intervals for aggregates
• Streaming samples, streaming answers
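The online aggregation idea above (sample without replacement, report a running estimate with a confidence interval) can be illustrated with a minimal sketch. This is not the CONTROL implementation: the function name `online_average`, the `report_every` parameter, and the choice of a normal-approximation interval with a finite-population correction are all assumptions made for illustration.

```python
import math
import random


def online_average(values, z=1.96, report_every=1000, seed=0):
    """Yield (rows_seen, running_mean, ci_half_width) while scanning in random order.

    Hypothetical helper: a shuffled scan stands in for sampling without
    replacement, and the interval uses a normal approximation (z = 1.96 for
    roughly 95%) with a finite-population correction.
    """
    rng = random.Random(seed)
    order = list(values)
    rng.shuffle(order)  # a random permutation is a sample without replacement
    total = len(order)
    n, mean, m2 = 0, 0.0, 0.0
    for x in order:
        # Welford's update keeps the running mean and sum of squared deviations.
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n % report_every == 0 or n == total:
            var = m2 / (n - 1) if n > 1 else 0.0
            # The correction shrinks the interval to zero once every row is seen.
            fpc = (total - n) / (total - 1) if total > 1 else 0.0
            yield n, mean, z * math.sqrt(var / n * fpc)


if __name__ == "__main__":
    for seen, estimate, half in online_average(range(10000)):
        print(f"{seen:>6} rows: AVG ~ {estimate:.1f} +/- {half:.1f}")
```

Each yielded triple is a "streaming answer": the user can stop as soon as the interval is narrow enough, rather than waiting for the full scan.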
Can do Online “Enumeration” Too
• “Potter’s wheel”
Volatility in Streaming Queries: Analogies for Sensors
• Query engines map queries to dataflows
• Flow graph laid out by a query optimizer (typically on cluster)
• Query executor runs the flow
• User priorities change during CONTROL queries
• Breaks “compile-then-run” query optimization paradigm
• Dynamic reordering of commutative tasks: f(g(x)) or g(f(x))?
• Dynamic reordering of data objects: x1, x2, x3, …
• Requires dynamic competition among choices: f(x) or f’(x)?
• Volatile networks are similar
• Hard to predict rates of consumption/production a priori
• Volatile over time, and queries may run “forever”
• Imagine interactive user “cockpit” on the sensor net!
• Added metrics of power and data quality
• And different kinds of volatility, no doubt
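To make "dynamic reordering of commutative tasks" concrete, here is a small sketch that reorders side-effect-free filters at run time based on observed pass rates. It is a deliberately simplified stand-in, not the routing policy of any particular system: the class name `AdaptiveFilterChain`, the `reorder_every` parameter, and the default pass rate of 0.5 for filters with no observations are assumptions.

```python
class AdaptiveFilterChain:
    """Apply commutative predicates, periodically moving the most selective first.

    Assumes every predicate is side-effect free, so f(g(x)) and g(f(x))
    accept exactly the same items and only the work done differs.
    """

    def __init__(self, predicates, reorder_every=100):
        self.stats = [{"pred": p, "seen": 0, "passed": 0} for p in predicates]
        self.reorder_every = reorder_every
        self.processed = 0

    @staticmethod
    def _pass_rate(stat):
        # Assumed default of 0.5 for a predicate that has not been observed yet.
        return stat["passed"] / stat["seen"] if stat["seen"] else 0.5

    def accept(self, item):
        self.processed += 1
        if self.processed % self.reorder_every == 0:
            # Lowest observed pass rate first: reject items as early as possible.
            self.stats.sort(key=self._pass_rate)
        for stat in self.stats:
            stat["seen"] += 1
            if not stat["pred"](item):
                return False
            stat["passed"] += 1
        return True


if __name__ == "__main__":
    always = lambda x: True             # passes everything
    tenth = lambda x: x % 10 == 0       # passes one item in ten
    chain = AdaptiveFilterChain([always, tenth])
    kept = [x for x in range(1000) if chain.accept(x)]
    print(len(kept), chain.stats[0]["pred"] is tenth)
```

The output set never changes; only the order of evaluation adapts, which is what lets the plan be revised mid-query instead of being fixed at compile time.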
Adaptive Dataflow: Convergence of DBs/Nets
• The idea from two angles
• Queries are flows, query optimization is routing
• Sensor queries need nets-style adaptivity
• New networking SW looks like a query engine
• Click, Scout. Also CANs.
• Sensor Qs need DB-style semantic optimization (up to app)
• Telegraph: An Adaptive Dataflow System
• Boxes & Arrows dataflow programming
• Adaptive reoptimization of the flow graph (Eddies)
• Adaptive prioritization of the delivery (Juggle)
• Adaptive load-balancing/FT across nodes (FLuX)
• Mix Push/Pull to blend streams and pools (Fjords)
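The last bullet, mixing push and pull to blend streams with stored data, can be sketched loosely as follows. This is only an illustration of the general idea, not the Telegraph or Fjords code: `PushQueue`, `join_stream_with_table`, the drop-oldest policy, and the sample field names are all hypothetical.

```python
from collections import deque


class PushQueue:
    """Bounded buffer that a streaming source pushes into.

    Assumed policy: when the buffer is full, the oldest reading is dropped,
    so a fast sensor never blocks and the consumer never waits on it.
    """

    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)

    def push(self, item):
        self.buf.append(item)

    def poll(self):
        # Non-blocking: returns None when nothing is queued right now.
        return self.buf.popleft() if self.buf else None


def join_stream_with_table(queue, table, key):
    """Drain the queued readings and look each one up in stored (pull-side) data."""
    joined = []
    while (reading := queue.poll()) is not None:
        row = table.get(reading[key])  # pull-style lookup against the static pool
        if row is not None:
            joined.append({**row, **reading})
    return joined


if __name__ == "__main__":
    sensors = {1: {"location": "on-ramp"}, 2: {"location": "bridge"}}
    q = PushQueue(capacity=2)
    for reading in ({"id": 1, "speed": 60}, {"id": 2, "speed": 35}, {"id": 1, "speed": 58}):
        q.push(reading)  # the first reading is dropped once capacity is exceeded
    print(join_stream_with_table(q, sensors, "id"))
```

Dropping the oldest item is just one possible choice for a full buffer; a real system might instead sample, summarize, or apply back-pressure.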
Telegraph Apps to Date
• Web Queries: Election 2000
• http://fff.cs.berkeley.edu
• Enhanced P2P functionality
• Query by album or artist, via joins with web data
• Working on pure P2P query processing
• Initial sensor app
• Join I-80 traffic movement with webcams and incidents
• Smart Dust Mote simulations
Query >> Search: http://fff.cs.berkeley.edu
• “Federated Facts and Figures”
• Yahoo join FECInfo
Query >> Search: http://fff.cs.berkeley.edu
• “Federated Facts and Figures”
• APBNews join FECInfo