TelegraphCQ: Continuous Dataflow Processing for an Uncertain World
Sirish Chandrasekaran, Owen Cooper, Amol Deshpande, Michael J. Franklin, Joseph M. Hellerstein, Wei Hong*, Sailesh Krishnamurthy, Sam Madden, Vijayshankar Raman**, Fred Reiss, and Mehul Shah
University of California, Berkeley
*Intel Berkeley Laboratory
**IBM Almaden Research Center
http://telegraph.cs.berkeley.edu/
Contents
• Background and Motivation
• Telegraph – Architecture
• Window Semantics in TelegraphCQ
• TelegraphCQ – Design Overview
• TelegraphCQ – Architecture
• Conclusion
• All diagrams and contents are directly adapted/taken from the paper itself!
TelegraphCQ – Background and Motivation
• Adaptive dataflow architecture: systems that can adjust their processing on the fly in response to
  • Changes in user needs [HACO+99]
  • Intermittent delays in accessing data across WANs [UFA98]
• Shared processing
  • CACQ [MSHR02]
  • PSoup [CF02]
• Limitations of these earlier systems
  • Processing is restricted to in-memory data
  • No scheduling or resource management for queries with little or no overlap
  • No Quality of Service (QoS) support for adapting to resource limitations
  • No way to tune the tradeoff between flexibility and overhead
Telegraph – Architecture
• Extensible set of composable dataflow modules/operators
• Producer-consumer design with the Fjords API
• Push as well as pull queues
• Ingress and caching
• Query processing
• Adaptive routing
Adaptive Processing – Eddies & SteMs
• Eddy
  • Continuously routes tuples among operators according to a routing policy
  • Routing decisions are made per tuple, so each tuple carries its own routing state
• SteMs (State Modules)
  • Temporary repositories of tuples
  • Each SteM stores homogeneous tuples
  • Support build (insert), probe (search), and eviction (delete) operations
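The Eddy and SteM roles above can be illustrated with a minimal Python sketch. It is not the TelegraphCQ implementation: the class names, the `route` method, and the "most selective operator first" policy are assumptions chosen for illustration, and the operators here are simple filters rather than joins.

```python
from collections import defaultdict


class SteM:
    """Hypothetical state module: holds tuples from one source, indexed on one key."""

    def __init__(self, key):
        self.key = key
        self.index = defaultdict(list)

    def build(self, tup):
        # Insert a tuple under its key value.
        self.index[tup[self.key]].append(tup)

    def probe(self, value):
        # Return every stored tuple whose key equals `value`.
        return list(self.index.get(value, []))

    def evict(self, predicate):
        # Delete every stored tuple for which `predicate` is true.
        for k in list(self.index):
            self.index[k] = [t for t in self.index[k] if not predicate(t)]
            if not self.index[k]:
                del self.index[k]


class Eddy:
    """Hypothetical eddy: routes each tuple through filter operators one at a time."""

    def __init__(self, operators):
        self.operators = operators  # name -> predicate over a tuple
        self.seen = {name: 0 for name in operators}
        self.passed = {name: 0 for name in operators}

    def _next_operator(self, done):
        # Assumed policy: try the operator with the lowest observed pass rate first.
        pending = [n for n in self.operators if n not in done]
        return min(pending, key=lambda n: (self.passed[n] + 1) / (self.seen[n] + 1))

    def route(self, tup):
        done = set()  # per-tuple routing state: operators already visited
        while len(done) < len(self.operators):
            name = self._next_operator(done)
            self.seen[name] += 1
            if not self.operators[name](tup):
                return None  # the tuple is dropped
            self.passed[name] += 1
            done.add(name)
        return tup  # the tuple passed every operator
```

Because the statistics are updated on every routing step, the visiting order can change from one tuple to the next, which is the point of per-tuple routing.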
Fjords – Inter-Module Communication
• Allow a mixture of push and pull connections between modules
• A pull-queue is implemented using a blocking dequeue on the consumer side and a blocking enqueue on the producer side
• A push-queue is implemented using non-blocking enqueue and dequeue; control is returned to the consumer when the queue is empty
• Queries can execute over any combination of streaming and static data sources
Flux – Scaling Up Dataflow Processing
• Interposed between a producer-consumer operator pair in a pipelined, partitioned dataflow
• Fault-tolerant, Load-balancing eXchange
• Load balancing via online repartitioning of the input stream and the corresponding operator state
• Fault tolerance by leveraging these state-movement mechanisms to replicate an operator’s internal state and in-flight data
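The difference between the two Fjords queue types can be sketched with Python's standard `queue` module. This is only an illustration of the blocking versus non-blocking behaviour described for Fjords: the class names, the `capacity` parameter, and the choice to return `False` when a push-queue is full are assumptions, not part of the Fjords API.

```python
import queue


class PullQueue:
    """Sketch of a pull-queue: both ends block."""

    def __init__(self, capacity=16):
        self._q = queue.Queue(maxsize=capacity)

    def enqueue(self, item):
        self._q.put(item)  # producer blocks while the queue is full

    def dequeue(self):
        return self._q.get()  # consumer blocks while the queue is empty


class PushQueue:
    """Sketch of a push-queue: neither end blocks."""

    def __init__(self, capacity=16):
        self._q = queue.Queue(maxsize=capacity)

    def enqueue(self, item):
        try:
            self._q.put_nowait(item)
            return True
        except queue.Full:
            return False  # assumption: report failure instead of blocking

    def dequeue(self):
        try:
            return self._q.get_nowait()
        except queue.Empty:
            return None  # control returns to the consumer immediately
```

A consumer reading from a `PushQueue` can do other work when `dequeue()` returns `None`, which is what lets one operator serve a slow streaming source and a static table at the same time.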
Initial CQ Approaches
• CACQ
  • First CQ engine to exploit the adaptive query processing framework
  • Modifies Eddies to execute multiple queries as a single “super”-query, the disjunction of all the queries
  • Tuple lineage: state carried with each tuple to determine which clients should receive it
  • Grouped filters: an index over single-variable Boolean factors on the same attribute, used to optimize selections in the shared execution
• PSoup
  • Extends CACQ
  • Allows queries to access historical data by treating data and queries symmetrically
  • Adds support for disconnected operation: users can register queries and return intermittently to retrieve the latest results
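The grouped-filter idea can be sketched as an index that, for one attribute value, returns every registered query whose predicate on that attribute is satisfied. The sketch below is an assumption-laden illustration, not CACQ's data structure: the `GroupedFilter` class, its method names, and the restriction to `>`, `<`, and `==` predicates are all hypothetical.

```python
import bisect
from collections import defaultdict


class GroupedFilter:
    """Index over predicates 'attr OP constant' from many queries on one attribute."""

    def __init__(self):
        # Parallel sorted lists: constants and the query ids that own them.
        self.gt_keys, self.gt_ids = [], []  # predicates 'attr > constant'
        self.lt_keys, self.lt_ids = [], []  # predicates 'attr < constant'
        self.eq = defaultdict(set)          # predicates 'attr == constant'

    @staticmethod
    def _insert(keys, ids, constant, query_id):
        pos = bisect.bisect_right(keys, constant)
        keys.insert(pos, constant)
        ids.insert(pos, query_id)

    def add(self, query_id, op, constant):
        if op == ">":
            self._insert(self.gt_keys, self.gt_ids, constant, query_id)
        elif op == "<":
            self._insert(self.lt_keys, self.lt_ids, constant, query_id)
        elif op == "==":
            self.eq[constant].add(query_id)
        else:
            raise ValueError(f"unsupported operator: {op}")

    def matching_queries(self, value):
        matched = set(self.eq.get(value, ()))
        # 'attr > c' holds for every constant strictly below the value.
        cut = bisect.bisect_left(self.gt_keys, value)
        matched.update(self.gt_ids[:cut])
        # 'attr < c' holds for every constant strictly above the value.
        cut = bisect.bisect_right(self.lt_keys, value)
        matched.update(self.lt_ids[cut:])
        return matched
```

One lookup per tuple replaces one predicate evaluation per query, and the returned set of query ids is the kind of information tuple lineage would carry forward.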
Window Semantics in TelegraphCQ
• Rich windowing schemes over both already-arrived and incoming data
• The supported window semantics are:
  • Snapshot query: executes exactly once over one window, e.g. “Select the closing prices for MSFT on the first five days of trading”
  • Landmark query: fixed beginning point and a forward-moving endpoint, e.g. “Select all the days after the hundredth trading day, on which the closing price of MSFT has been greater than $50. Keep this query standing in the system for a thousand trading days”
  • Sliding query: forward-moving beginning and end, e.g. “On every fifth trading day starting today, calculate the average closing price of MSFT for the five most recent trading days. Keep the query standing for fifty trading days”
  • Temporal band-join: joins tuples in one stream with those in another based on timestamp, e.g. “For the five most recent trading days starting today, select all stocks that closed higher than MSFT on a given day. Keep the query standing for twenty trading days”
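The snapshot, landmark, and sliding semantics differ only in how the two window edges move over time. A minimal Python sketch, assuming inclusive trading-day numbers as timestamps, makes this concrete; the function names and parameters are hypothetical and are not TelegraphCQ's query syntax.

```python
def snapshot(left, right):
    """One fixed window, evaluated exactly once."""
    yield (left, right)


def landmark(start, first_end, last_end):
    """Fixed left edge; the right edge advances one day at a time."""
    for t in range(first_end, last_end + 1):
        yield (start, t)


def sliding(start, duration, width, hop):
    """Both edges advance: every `hop` days, cover the `width` most recent days."""
    for t in range(start, start + duration, hop):
        yield (t - width + 1, t)
```

Under these assumptions, the sliding example from the slide (every fifth day, five most recent days, standing for fifty days) corresponds to `sliding(today, 50, 5, 5)`, and the landmark example to `landmark(101, 101, 1100)`.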
TelegraphCQ – Design Overview
• Adapted the architecture of PostgreSQL
• Implemented the new system in C/C++ to leverage the open-source PostgreSQL code base
• Reused PostgreSQL components, with varying degrees of modification
TelegraphCQ – Architecture
• Three processes comprise the TelegraphCQ server
  • FrontEnd
  • Wrapper
    • Provides an abstraction of external sources
    • Runs as a separate process (non-blocking)
  • Executor
    • Execution Object: provides the execution context for multiple queries
    • Dispatch Unit: performs the actual work
Conclusion
• TelegraphCQ provides an adaptive dataflow and shared processing architecture
• Eddies and SteMs form the building blocks for adaptive processing
• Fjords provide inter-module communication (push and pull connections); Flux is the Fault-tolerant, Load-balancing eXchange
• CACQ (tuple lineage and grouped filters) and PSoup (symmetric treatment of data and queries) provide shared query processing
• Built over the PostgreSQL framework
Thank you