Streaming Queries over Streaming Data. Authors: Sirish Chandrasekaran, UC Berkeley; Michael J. Franklin, UC Berkeley. Originally presented at the 28th VLDB Conference, Hong Kong, China, 2002. Presented by Amit Choudhri for CMSC 891 J.
I
• PSoup is a Java-based system
• Processes both ad hoc and continuous queries
• Treats data and queries symmetrically, as streams
• The data stream and the query stream are duals of each other
• New queries can be applied to old data
• New data can be applied to old queries
PSoup uses query specifications of the form:
SELECT select_list
FROM from_list
WHERE conjoined_boolean_factors
BEGIN begin_time
END end_time
• The SELECT-FROM-WHERE part is the Standing Query Clause (SQC)
• The BEGIN-END part specifies the input window over which results have to be computed
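To make the split between the SQC and the input window concrete, here is a minimal Python sketch of the query form above. It is not PSoup's code (PSoup is Java-based). The class name `QuerySpec`, the `ts` timestamp field, and the choice to express BEGIN and END as offsets relative to the invocation time are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, float]


@dataclass
class QuerySpec:
    """Hypothetical stand-in for a SELECT-FROM-WHERE-BEGIN-END specification."""
    select_list: List[str]                    # SELECT columns
    predicates: List[Callable[[Row], bool]]   # WHERE: conjoined boolean factors
    begin_offset: int                         # BEGIN, assumed relative to "now"
    end_offset: int                           # END, assumed relative to "now"

    def satisfies_sqc(self, row: Row) -> bool:
        # The SQC holds only if every conjoined factor holds.
        return all(p(row) for p in self.predicates)

    def evaluate(self, rows: List[Row], now: int) -> List[Row]:
        # Apply the input window first, then the SQC, then project.
        lo, hi = now + self.begin_offset, now + self.end_offset
        return [
            {col: row[col] for col in self.select_list}
            for row in rows
            if lo <= row["ts"] <= hi and self.satisfies_sqc(row)
        ]


rows = [
    {"ts": 3, "sensor": 1, "temp": 25},
    {"ts": 6, "sensor": 1, "temp": 22},
    {"ts": 8, "sensor": 1, "temp": 18},
    {"ts": 9, "sensor": 2, "temp": 30},
    {"ts": 10, "sensor": 1, "temp": 21},
]
query = QuerySpec(
    select_list=["temp"],
    predicates=[lambda r: r["temp"] > 20, lambda r: r["sensor"] == 1],
    begin_offset=-5,
    end_offset=0,
)
window_results = query.evaluate(rows, now=10)
```

With `now=10` the window covers timestamps 5 through 10, so only the rows at timestamps 6 and 10 pass both the window and the SQC.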
State Modules (SteMs):
• Query SteM (one for all query specifications)
• Data SteM (one per data stream)
• Historical data querying: new queries probe old data
• Continuous querying: new data probes old queries
RESULT MATERIALIZATION
Mechanism
• Each registered query gets a unique QueryID
• The client uses the QueryID as a handle for later invocations
• Between invocations, PSoup matches data against query predicates and stores the matches in a Results Structure
• On invocation, the input window is applied to the Results Structure to materialize the current results and return them
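The mechanism above can be sketched in a few lines of Python. This is an illustrative model only, not PSoup's implementation. The names `MiniPSoup`, `register`, `on_data`, and `invoke` are hypothetical, a single data stream is assumed, and window bounds are assumed to be offsets from the invocation time.

```python
import itertools
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]
Predicate = Callable[[Row], bool]


class MiniPSoup:
    """Toy model: one Query SteM, one Data SteM, one Results Structure."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._query_stem: Dict[int, Tuple[Predicate, int, int]] = {}
        self._data_stem: List[Row] = []
        self._results: Dict[int, List[Row]] = {}  # QueryID -> matching rows

    def register(self, predicate: Predicate, begin_offset: int, end_offset: int) -> int:
        """New query: store it, then probe old data with it."""
        query_id = next(self._ids)
        self._query_stem[query_id] = (predicate, begin_offset, end_offset)
        self._results[query_id] = [r for r in self._data_stem if predicate(r)]
        return query_id

    def on_data(self, row: Row) -> None:
        """New data: store it, then probe old queries with it."""
        self._data_stem.append(row)
        for query_id, (predicate, _, _) in self._query_stem.items():
            if predicate(row):
                self._results[query_id].append(row)

    def invoke(self, query_id: int, now: int) -> List[Row]:
        """Apply the input window to the stored matches; no predicate is re-evaluated."""
        _, begin_offset, end_offset = self._query_stem[query_id]
        lo, hi = now + begin_offset, now + end_offset
        return [r for r in self._results[query_id] if lo <= r["ts"] <= hi]


engine = MiniPSoup()
engine.on_data({"ts": 1, "v": 5})                      # arrives before the query
qid = engine.register(lambda r: r["v"] > 3, -2, 0)      # matches the old row
engine.on_data({"ts": 2, "v": 1})
engine.on_data({"ts": 3, "v": 9})
engine.on_data({"ts": 4, "v": 7})
current = engine.invoke(qid, now=4)                     # window covers ts 2..4
earlier = engine.invoke(qid, now=2)                     # window covers ts 0..2
```

Because matching happens when data or queries arrive, `invoke` only filters stored matches by timestamp, which is what lets a client disconnect and later fetch results through its QueryID.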
Implementation: extensions to Telegraph for PSoup
• Eddy: tuple router
• SteM: data structure that supports insert and probe methods over its contents
• Red-black tree based structure for SteM indexing
• Results Structure: stores metadata about the tuples that satisfied the SQC
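As a rough illustration of why an ordered index over query predicates helps, the sketch below indexes predicates of the form `attr > constant` so that one probe with a data value returns every matching query. It deliberately swaps the red-black tree for a sorted Python list with `bisect`, which gives the same ordered-probe behaviour but linear-time inserts. The class name `GreaterThanIndex` and the single-predicate form are assumptions, not details from the slides.

```python
import bisect
from typing import List


class GreaterThanIndex:
    """Ordered index over predicates of the form `attr > constant`."""

    def __init__(self) -> None:
        self._constants: List[float] = []  # kept sorted
        self._query_ids: List[str] = []    # parallel to _constants

    def insert(self, query_id: str, constant: float) -> None:
        # Keep both lists ordered by constant.
        pos = bisect.bisect_right(self._constants, constant)
        self._constants.insert(pos, constant)
        self._query_ids.insert(pos, query_id)

    def probe(self, value: float) -> List[str]:
        # `value > constant` holds exactly for the constants strictly below value,
        # which form a prefix of the sorted list.
        end = bisect.bisect_left(self._constants, value)
        return self._query_ids[:end]


index = GreaterThanIndex()
index.insert("q1", 10)
index.insert("q2", 20)
index.insert("q3", 15)
matches_16 = index.probe(16)   # q1 (16 > 10) and q3 (16 > 15)
matches_10 = index.probe(10)   # nothing: 10 > 10 is false
```

A balanced tree such as a red-black tree would keep both insert and probe positioning logarithmic, which matters when queries arrive continuously.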
Performance Analysis
• Compares NoMat vs. PSoup-P vs. PSoup-C
• PSoup-P uses a bit array for the Results Structure
• PSoup-C uses a linked list for the Results Structure
• Materializing the results of queries supports higher query invocation rates
• Indexing queries and lazily applying input windows improves the maximum data throughput
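The two Results Structure layouts can be contrasted with a small Python sketch. The slides state only which data structure each variant uses. The sketch further assumes that the bit-array variant applies the window lazily at invocation time and that the linked-list variant keeps only current-window results, expiring old entries as data arrives. Class names are hypothetical, and `collections.deque` stands in for a linked list.

```python
from collections import deque
from typing import Deque, Dict, Iterable, List


class BitArrayResults:
    """One bitmask per data tuple, one bit per query; window applied on invoke."""

    def __init__(self) -> None:
        self.timestamps: List[int] = []
        self.masks: List[int] = []  # Python ints used as bit arrays

    def record(self, ts: int, matching_query_ids: Iterable[int]) -> None:
        mask = 0
        for query_id in matching_query_ids:
            mask |= 1 << query_id
        self.timestamps.append(ts)
        self.masks.append(mask)

    def invoke(self, query_id: int, lo: int, hi: int) -> List[int]:
        # Lazy: the window is only looked at when the client asks.
        return [
            ts for ts, mask in zip(self.timestamps, self.masks)
            if lo <= ts <= hi and (mask >> query_id) & 1
        ]


class LinkedListResults:
    """One list per query holding only results inside that query's window."""

    def __init__(self, window_length: Dict[int, int]) -> None:
        self.window_length = dict(window_length)
        self.lists: Dict[int, Deque[int]] = {q: deque() for q in window_length}

    def record(self, ts: int, matching_query_ids: Iterable[int]) -> None:
        for query_id in matching_query_ids:
            self.lists[query_id].append(ts)
        # Eager: every arrival also expires results that fell out of a window.
        for query_id, results in self.lists.items():
            while results and results[0] < ts - self.window_length[query_id]:
                results.popleft()

    def invoke(self, query_id: int) -> List[int]:
        return list(self.lists[query_id])


lazy = BitArrayResults()
eager = LinkedListResults({0: 2, 1: 5})
for ts, matched in [(1, [0, 1]), (2, [1]), (3, [0]), (4, [0, 1])]:
    lazy.record(ts, matched)
    eager.record(ts, matched)

lazy_q0 = lazy.invoke(0, lo=2, hi=4)
eager_q0 = eager.invoke(0)
```

Under these assumptions the trade-off is visible in where the work lands: the lazy layout does little per arriving tuple and more per invocation, while the eager layout does the reverse.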
Optimization: removing redundancy in join processing
• Using "single query, multiple data" composite tuples for common predicates
• Using "single data, multiple query" composite tuples for new data insertion
Conclusion
• PSoup supports queries that require access to data that arrived both before and after the query specification
• PSoup supports disconnected operation by separating the computation of results from their delivery, using result materialization
Further work
• Make PSoup capable of archiving data streams to disk; the current implementation is a main-memory system
• Allow PSoup to be used as a query browser for temporal data, instead of only for current-window calculations