TelegraphCQ

TelegraphCQ 49221052 施賀傑 69521041 何承恩

Outline • Introduction • Data Movement Implies Adaptivity • Telegraph - an Ancestor of TelegraphCQ • Adaptive Building Blocks • Initial CQ Approaches • TelegraphCQ • Conclusion and Future Work

Introduction • TelegraphCQ is an extension to the Telegraph project • Handling large streams of continuous queries over high-volume, highly-variable data streams • Traditional data processing environment is not suitable for motion data • Large scale • Unpredictability of the environment • Need for close interaction with users

Data Movement Implies Adaptivity Traditional database are inappropriate for dataflow processing • Streaming data • Pushing, instead of pulling • Have to be processed on the fly • Continuous Queries (CQ) • Queries are continuously active • Data initiates access to queries

Data Movement Implies Adaptivity • Shared processing • Avoid blocking or interrupt dataflow • Processing each query individually can be slow and wasteful of resources • Queries should have some commonalities • Other Sources of Unpredictability • Deeply networked environment • User may need to adjust the query on the fly based on the previous result

Telegraph - an Ancestor of TelegraphCQ • Designed to provide adaptability to individual dataflow graphs • Two new prototypes to extend Telegraph to support shared processing over streams • CACQ • PSoup

Adaptive Building Blocks Telegraph consist a set of modules • Module Types • Ingress and Caching • Query Processing • Adaptive Routing

Adaptive Building Blocks

Adaptive Building Blocks • Ingress and Caching • Interface with external data sources • HTML/XML screen scraper (TeSS) • Proxy for fetching data from peer-to-peer networks (TeleNap) • Query Processing • Routing tuples through query modules on a tuple-by-tuple basis • A special type of module known as a State Module (SteM)

Adaptive Building Blocks • Adaptive Routing • Construct a query plan that contains adaptive routing modules • Be able to re-optimize the plan while a query is running • Eddy : route data to other query operators • Juggle : perform online reordering • FLuX : route tupples to support parallelism with load-balancing and fault-tolerance

Eddy • Continuously route tuples among a set of other modules according to a routing policy

Eddy • Routing policy • Naive Eddy: • Handle only operators with different costs but equal selectivity • Deliver tuples to the two selection equally • Fast Eddy: • Improve the Naive Eddy with Lottery Scheduling • Tuple to operator → costs a “ticket” • Operator return a tuple → a “ticket” is debited • Benefit: nearly optimal performance with less effort

Eddy • Our query

Eddy • Suppose s1 and s2 have the same selectivity • Set s2 cost 5 delay units

Eddy • Suppose s1 and s2 have the same cost • Set the selectivity of s2 fixed at 50%

Eddy

SteMs • SteMs • A temporary repository of tuples

Fjords • An inter-module communications API • Allow query plans to use a mixture of push and pull connections between modules

Initial CQ Approaches • CACQ • PSoup • Limitation of CACQ and PSoup • Restricted their processing to data that could fit in memory • Did not investigate scheduling and resource management issues for queries with little or no overlap • Did not explicitly deal with the notion of QoS for adapting to resource limitation • Did not explore opportunities for varying the degree of adaptivity to tradeoff flexibility and overhead

CACQ First continuous query engine to exploit the adaptive query processing framework of Telegraph Modify Eddies to execute multiple queries simultaneously Use grouped filters to optimize selections in the shared execution of the individual queries.

CACQ

PSoup • Extend the mechanisms developed in CACQ in two main ways • Allow queries to access historical data • Support disconnected operation • New queries can be applied to old data • New data can be applied to old queries • Accomplished by creating a query SteM

PSoup

PSoup • For example: add a new query

PSoup

PSoup • Exercise: add a new data using PSoup with example step by step • R.a=3 R.b=6

PSoup

TelegraphCQ

TelegraphCQ

Presentation Transcript

TelegraphCQ: Continuous Dataflow Processing for an Uncertain World

TelegraphCQ