
Streaming Queries over Streaming Data

Authors: Sirish Chandrasekaran and Michael J. Franklin, UC Berkeley. Originally presented at the 28th VLDB Conference, Hong Kong, China, 2002. Presented by Amit Choudhri for CMSC 891 J.


Presentation Transcript


  1. Streaming Queries over Streaming Data
     Authors:
     • Sirish Chandrasekaran, UC Berkeley
     • Michael J. Franklin, UC Berkeley
     Originally presented at the 28th VLDB Conference, Hong Kong, China, 2002
     • Presented by Amit Choudhri for CMSC 891 J

  2. I
     • Java-based system called PSoup
     • Processes both ad hoc and continuous queries
     • Treats data and queries symmetrically as streams
     • Streams are duals of each other
     • New queries can be applied to old data
     • New data can be applied to old queries

  3. PSoup uses query specifications of the form:
       SELECT select_list
       FROM from_list
       WHERE conjoined_boolean_factors
       BEGIN begin_time
       END end_time
     • SELECT-FROM-WHERE is the Standing Query Clause (SQC)
     • BEGIN-END specifies the input window over which results are computed
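To make the query form above concrete, here is a minimal Python sketch of a specification with a Standing Query Clause and an input window. All names (`QuerySpec`, `evaluate`) are hypothetical and not part of PSoup; timestamps are assumed to be integers, and the window is assumed to include both endpoints.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]  # one stream tuple, keyed by column name


@dataclass(frozen=True)
class QuerySpec:
    """Hypothetical container for SELECT-FROM-WHERE plus BEGIN-END."""
    select_list: List[str]
    predicates: List[Callable[[Row], bool]]  # the conjoined boolean factors
    begin_time: int
    end_time: int


def evaluate(spec: QuerySpec, stream: List[Tuple[int, Row]]) -> List[Row]:
    """Return projected rows that fall in the window and satisfy every factor."""
    out = []
    for ts, row in stream:
        # Assumption: the window is inclusive at both ends.
        in_window = spec.begin_time <= ts <= spec.end_time
        if in_window and all(p(row) for p in spec.predicates):
            out.append({col: row[col] for col in spec.select_list})
    return out


stream = [
    (1, {"sym": "A", "price": 10}),
    (2, {"sym": "B", "price": 30}),
    (3, {"sym": "A", "price": 50}),
    (4, {"sym": "A", "price": 70}),
]
spec = QuerySpec(
    select_list=["sym", "price"],
    predicates=[lambda r: r["sym"] == "A", lambda r: r["price"] > 20],
    begin_time=2,
    end_time=3,
)
result = evaluate(spec, stream)
```

Only the tuple at time 3 is returned: the tuple at time 4 satisfies the predicates but lies outside the window.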

  4. State Modules (SteMs):
     • Query SteM (one for all query specifications)
     • Data SteM (one per data stream)
     • Historical data querying: new queries probe old data
     • Continuous querying: new data probes old queries
     • Result materialization

  5. Mechanism
     • Each registered query gets a unique QueryID
     • The client uses the QueryID as a handle for later invocations
     • Between invocations, PSoup matches data against query predicates and stores the matches in a Results Structure
     • On invocation, the input window is applied to the Results Structure to materialize the current results and return them
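The mechanism above can be sketched in a few lines of Python. This is an illustrative toy, not PSoup's implementation: the class and method names are hypothetical, it handles single-stream selection queries only, it scans linearly instead of using indexes, and it assumes integer timestamps with an inclusive window.

```python
import itertools


class MiniPSoup:
    """Toy model: queries and data probe each other; results are materialized."""

    def __init__(self):
        self.data_stem = []    # (timestamp, row) in arrival order
        self.query_stem = {}   # query_id -> predicate
        self.results = {}      # query_id -> positions of matching rows
        self._ids = itertools.count(1)

    def register_query(self, predicate):
        """New query probes old data; returns the handle for later invocations."""
        qid = next(self._ids)
        self.query_stem[qid] = predicate
        self.results[qid] = [
            i for i, (_, row) in enumerate(self.data_stem) if predicate(row)
        ]
        return qid

    def insert_data(self, ts, row):
        """New data probes old queries; matches are recorded, not delivered."""
        pos = len(self.data_stem)
        self.data_stem.append((ts, row))
        for qid, predicate in self.query_stem.items():
            if predicate(row):
                self.results[qid].append(pos)

    def invoke(self, qid, begin, end):
        """Apply the input window lazily to the stored matches."""
        return [
            self.data_stem[i][1]
            for i in self.results[qid]
            if begin <= self.data_stem[i][0] <= end
        ]


ps = MiniPSoup()
ps.insert_data(1, {"temp": 15})
ps.insert_data(2, {"temp": 25})
qid = ps.register_query(lambda r: r["temp"] > 20)  # sees the two old rows
ps.insert_data(3, {"temp": 30})
ps.insert_data(4, {"temp": 10})
hot = ps.invoke(qid, 2, 3)
```

Here `hot` contains the row that arrived before the query was registered (time 2) and the one that arrived after it (time 3), which is the symmetry between queries and data that the slides describe.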

  6. Selection queries over a single stream

  7. Selection query processing : entry of new data

  8. Join Queries over Multiple Streams

  9. Implementation: extensions to Telegraph for PSoup
     • Eddy: tuple router
     • SteM: data structures with probe and insert methods over their contents
     • Red-black tree-based structure for SteM indexing
     • Results Structure: stores metadata about the tuples that satisfied the SQC
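As a rough illustration of why an ordered index over queries helps, the sketch below indexes predicates of the form `attr > constant` so that a new data value finds all satisfied queries with one binary search. The slide names a red-black tree; Python's standard library has none, so a sorted list with `bisect` stands in here (same ordered lookup, but O(n) insertion). The class name and the restriction to a single comparison operator are assumptions made for the example.

```python
import bisect


class GreaterThanQueryIndex:
    """Indexes queries whose predicate is `attr > constant` (hypothetical)."""

    def __init__(self):
        self._entries = []  # sorted list of (constant, query_id)

    def insert(self, query_id, constant):
        bisect.insort(self._entries, (constant, query_id))

    def probe(self, value):
        """Return ids of all queries satisfied by `value`, i.e. constant < value."""
        # (value,) sorts before every (value, query_id), so the cut excludes
        # entries whose constant equals `value`.
        cut = bisect.bisect_left(self._entries, (value,))
        return [qid for _, qid in self._entries[:cut]]


index = GreaterThanQueryIndex()
index.insert(1, 10)   # query 1: attr > 10
index.insert(2, 50)   # query 2: attr > 50
index.insert(3, 30)   # query 3: attr > 30
matched = index.probe(40)
```

A value of 40 satisfies queries 1 and 3 but not query 2, and the probe never evaluates the predicate of query 2 individually.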

  10. Performance Analysis
     • NoMat vs. PSoup-P vs. PSoup-C:
     • PSoup-P uses a bit array for the results structure
     • PSoup-C uses a linked list for the results structure
     • Materializing query results supports higher query invocation rates
     • Indexing queries and lazily applying input windows improves the maximum data throughput
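The slide names the two results-structure layouts but not their details, so the following is only one plausible reading, with hypothetical class names: a dense layout that keeps one match flag per data tuple, and a sparse layout that keeps only the positions of matching tuples. A `bytearray` (one byte per flag) stands in for a true bit array and a `deque` for a linked list; windows are expressed as inclusive tuple positions with `begin >= 0`.

```python
from collections import deque


class BitArrayResults:
    """Dense layout: one flag per data tuple, matched or not."""

    def __init__(self):
        self._flags = bytearray()

    def record(self, matched):
        self._flags.append(1 if matched else 0)

    def window(self, begin, end):
        last = min(end, len(self._flags) - 1)
        return [i for i in range(begin, last + 1) if self._flags[i]]


class LinkedListResults:
    """Sparse layout: only the positions of matching tuples are kept."""

    def __init__(self):
        self._matches = deque()
        self._next_pos = 0

    def record(self, matched):
        if matched:
            self._matches.append(self._next_pos)
        self._next_pos += 1

    def window(self, begin, end):
        return [i for i in self._matches if begin <= i <= end]


dense, sparse = BitArrayResults(), LinkedListResults()
for matched in [True, False, True, True, False]:
    dense.record(matched)
    sparse.record(matched)

dense_hits = dense.window(1, 3)
sparse_hits = sparse.window(1, 3)
```

Both layouts return the same positions for a given window. In this sketch the dense form costs space proportional to the number of tuples seen, while the sparse form costs space proportional to the number of matches.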

  11. Optimization: removing redundancy in join processing
     • Using “single query, multiple data” composite tuples for common predicates
     • Using “single data, multiple query” composite tuples for new data insertion

  12. Conclusion
     • PSoup supports queries that require access to data that appeared both before and after the query specification
     • PSoup supports disconnected operation by separating the computation of results from their delivery, using result materialization

  13. Further work
     • Make PSoup capable of archiving data streams to disk; the current implementation is a main-memory system
     • Allow PSoup to be used as a query browser for temporal data, not only for current-window calculations
