270 likes | 360 Views
TelegraphCQ. 49221052 施賀傑 69521041 何承恩. Outline. Introduction Data Movement Implies Adaptivity Telegraph - an Ancestor of TelegraphCQ Adaptive Building Blocks Initial CQ Approaches TelegraphCQ Conclusion and Future Work. Introduction.
E N D
TelegraphCQ 49221052 施賀傑 69521041 何承恩
Outline • Introduction • Data Movement Implies Adaptivity • Telegraph - an Ancestor of TelegraphCQ • Adaptive Building Blocks • Initial CQ Approaches • TelegraphCQ • Conclusion and Future Work
Introduction • TelegraphCQ is an extension to the Telegraph project • Handling large streams of continuous queries over high-volume, highly-variable data streams • Traditional data processing environment is not suitable for motion data • Large scale • Unpredictability of the environment • Need for close interaction with users
Data Movement Implies Adaptivity Traditional database are inappropriate for dataflow processing • Streaming data • Pushing, instead of pulling • Have to be processed on the fly • Continuous Queries (CQ) • Queries are continuously active • Data initiates access to queries
Data Movement Implies Adaptivity • Shared processing • Avoid blocking or interrupt dataflow • Processing each query individually can be slow and wasteful of resources • Queries should have some commonalities • Other Sources of Unpredictability • Deeply networked environment • User may need to adjust the query on the fly based on the previous result
Telegraph - an Ancestor of TelegraphCQ • Designed to provide adaptability to individual dataflow graphs • Two new prototypes to extend Telegraph to support shared processing over streams • CACQ • PSoup
Adaptive Building Blocks Telegraph consist a set of modules • Module Types • Ingress and Caching • Query Processing • Adaptive Routing
Adaptive Building Blocks • Ingress and Caching • Interface with external data sources • HTML/XML screen scraper (TeSS) • Proxy for fetching data from peer-to-peer networks (TeleNap) • Query Processing • Routing tuples through query modules on a tuple-by-tuple basis • A special type of module known as a State Module (SteM)
Adaptive Building Blocks • Adaptive Routing • Construct a query plan that contains adaptive routing modules • Be able to re-optimize the plan while a query is running • Eddy : route data to other query operators • Juggle : perform online reordering • FLuX : route tupples to support parallelism with load-balancing and fault-tolerance
Eddy • Continuously route tuples among a set of other modules according to a routing policy
Eddy • Routing policy • Naive Eddy: • Handle only operators with different costs but equal selectivity • Deliver tuples to the two selection equally • Fast Eddy: • Improve the Naive Eddy with Lottery Scheduling • Tuple to operator → costs a “ticket” • Operator return a tuple → a “ticket” is debited • Benefit: nearly optimal performance with less effort
Eddy • Our query
Eddy • Suppose s1 and s2 have the same selectivity • Set s2 cost 5 delay units
Eddy • Suppose s1 and s2 have the same cost • Set the selectivity of s2 fixed at 50%
SteMs • SteMs • A temporary repository of tuples
Fjords • An inter-module communications API • Allow query plans to use a mixture of push and pull connections between modules
Initial CQ Approaches • CACQ • PSoup • Limitation of CACQ and PSoup • Restricted their processing to data that could fit in memory • Did not investigate scheduling and resource management issues for queries with little or no overlap • Did not explicitly deal with the notion of QoS for adapting to resource limitation • Did not explore opportunities for varying the degree of adaptivity to tradeoff flexibility and overhead
CACQ First continuous query engine to exploit the adaptive query processing framework of Telegraph Modify Eddies to execute multiple queries simultaneously Use grouped filters to optimize selections in the shared execution of the individual queries.
PSoup • Extend the mechanisms developed in CACQ in two main ways • Allow queries to access historical data • Support disconnected operation • New queries can be applied to old data • New data can be applied to old queries • Accomplished by creating a query SteM
PSoup • For example: add a new query
PSoup • Exercise: add a new data using PSoup with example step by step • R.a=3 R.b=6