Chapter 10: Stream-based Data Management

Title: Design, Implementation, and Evaluation of the Linear Road Benchmark on the Stream Processing Core. Authors: Navendu Jain, Lisa Amini, et al.

Presentation Transcript


  1. Chapter 10: Stream-based Data Management • Title: Design, Implementation, and Evaluation of the Linear Road Benchmark on the Stream Processing Core • Authors: Navendu Jain, Lisa Amini, et al.

  2. Design, Implementation, and Evaluation of the Linear Road Benchmark on the Stream Processing Core • Problem • Problem Statement • Why is this problem important? • Why is this problem hard? • Approaches • Approach description, key concepts • Contributions (novelty, improvements) • Assumptions

  3. Problem Statement • Given • Stream data and continuous queries in large-scale distributed environments • A streaming data application (Linear Road) • Stream processing middleware (Stream Processing Core, SPC) • Find • Performance bottlenecks of streaming data applications • Objectives • Understand the performance characteristics of the streaming data application • Constraints • SPC is constantly overloaded with respect to the available resources. • Processing elements (PEs) are a mix of I/O-bound and CPU-bound. • It is unrealistic for applications to store the full history of a stream in memory → memory-bound.
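
The memory-bound constraint means an operator can keep only a bounded window of recent data rather than the whole stream. A minimal Python sketch, assuming a count-based window and a running average as the query; the class name `SlidingWindowAverage` is hypothetical and not taken from the paper:

```python
from collections import deque


class SlidingWindowAverage:
    """Hypothetical helper: average over the most recent `size` stream values.

    Memory use is O(size) however long the stream runs, so the full
    history of the stream is never stored.
    """

    def __init__(self, size: int) -> None:
        if size <= 0:
            raise ValueError("size must be positive")
        # deque with maxlen evicts the oldest item automatically on append.
        self.window = deque(maxlen=size)
        self.total = 0.0

    def push(self, value: float) -> float:
        # When the window is full, the append below evicts window[0],
        # so remove that value from the running total first.
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]
        self.window.append(value)
        self.total += value
        return self.total / len(self.window)
```

For example, with `size=3` the pushes 1, 2, 3, 4 return 1.0, 1.5, 2.0, 3.0; the last value averages only 2, 3 and 4.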

  4. Why is this problem important? • High-volume, continuous data are ubiquitous. • Text and transactional data • Digital audio, video, and images • Instant messages, network packet traces • Sensor data • Stream processing applications have become important in the networking and database communities.

  5. Why is this problem hard? • Stream data are • Large in volume • Arriving at high data rates • Generated by multiple distributed data sources • Rapidly updated • Processing stream data requires • Filtering • Aggregation • Correlation • A system supporting stream data processing applications should consider • Scalability • Latency • Resource utilization

  6. Novelty of Contribution • Related Work • DataCutter, StreamIt: connections between applications are statically determined. • TelegraphCQ, Aurora, Borealis, STREAM: support stream data manipulation from a database-centric perspective, but process streams of tuples individually (i.e., at small scale). • Benchmarks: previous work on Linear Road did not report any performance numbers. • Contributions • SPC supports dynamic application composition. • Evaluates SPC using the Linear Road application in multiple distributed configurations → a highly scalable implementation of the Linear Road application • Studies the behavior of the streaming infrastructure for large-scale continuous and historical queries → identifies performance bottlenecks and tunes them.

  7. SPC Architecture • Publish-subscribe model • Each processing element (PE) that consumes or produces stream data specifies the characteristics of its streams. • SPC dynamically determines the stream connections by matching stream descriptors as new applications and data sources join and leave the system. • Reusing streams • Yields significant resource savings. • Enables discovery of useful information over an ever-changing set of data sources.
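
The descriptor matching in slide 7 can be sketched as follows. This is an illustrative model, not SPC's API: `PE`, `StreamRegistry`, and the dictionary-based descriptors are hypothetical, and a stream is assumed to match a subscription when it carries every attribute/value pair the subscriber asks for.

```python
from dataclasses import dataclass, field


@dataclass
class PE:
    """Hypothetical processing element: what it publishes and what it consumes."""
    name: str
    produces: dict = field(default_factory=dict)  # descriptor of its output stream
    wants: dict = field(default_factory=dict)     # attributes required of its input


def matches(descriptor: dict, pattern: dict) -> bool:
    # A stream matches if it has every attribute/value pair the pattern requires.
    return all(descriptor.get(k) == v for k, v in pattern.items())


class StreamRegistry:
    """Connects producers to consumers by descriptor matching as PEs join and leave."""

    def __init__(self) -> None:
        self.pes = {}       # name -> PE
        self.edges = set()  # (producer name, consumer name)

    def join(self, pe: PE) -> None:
        self.pes[pe.name] = pe
        for other in self.pes.values():
            if other is pe:
                continue
            # The new PE consumes an existing stream (stream reuse).
            if pe.wants and other.produces and matches(other.produces, pe.wants):
                self.edges.add((other.name, pe.name))
            # The new PE's output feeds an existing consumer.
            if other.wants and pe.produces and matches(pe.produces, other.wants):
                self.edges.add((pe.name, other.name))

    def leave(self, name: str) -> None:
        # Drop the PE and every connection it participated in.
        self.pes.pop(name, None)
        self.edges = {e for e in self.edges if name not in e}
```

Matching in both directions on every join is what lets a newly arriving consumer attach to a stream that already exists, so one stream can serve several applications without a second copy of its source.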

  8. Performance Challenges and Optimizations in SPC • Challenges • PE workloads range from • A small amount of processing on large volumes of data • To a large amount of processing on lower volumes of data • Thus, a mix of I/O-bound and CPU-bound PEs • Impossible to store stream history in memory → memory-bound • Optimizations • SDO filtering: SPC can filter out unwanted objects → saves resources. • Events: PEs can subscribe to system events → a PE can adapt its algorithm. • Dynamic copies of PEs
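
SDO filtering can be illustrated by attaching a predicate to each subscription so that unwanted objects are discarded before they reach a consumer. A minimal sketch, assuming stream data objects are plain dictionaries; `FilteringStream`, `subscribe`, and `publish` are hypothetical names, and where SPC actually evaluates its filters is not specified here:

```python
from typing import Callable

Predicate = Callable[[dict], bool]
Handler = Callable[[dict], None]


class FilteringStream:
    """Hypothetical stream that applies each subscriber's filter before delivery."""

    def __init__(self) -> None:
        self.subscribers = []  # list of (predicate, handler) pairs
        self.dropped = 0       # objects that no subscriber's filter accepted

    def subscribe(self, handler: Handler, keep: Predicate = lambda sdo: True) -> None:
        self.subscribers.append((keep, handler))

    def publish(self, sdo: dict) -> None:
        delivered = False
        for keep, handler in self.subscribers:
            # The filter runs before the object is handed to the consumer,
            # so rejected objects cost the consumer no processing.
            if keep(sdo):
                handler(sdo)
                delivered = True
        if not delivered:
            self.dropped += 1
```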

  9. Linear Road Benchmark • Simulates the traffic characteristics of a simple urban expressway system. • The input to the Linear Road benchmark is in stream data format. • Requires a stream-based data management system (SDMS) to process a set of continuous and historical queries.
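
A continuous query of the kind Linear Road poses can be sketched as an incremental aggregation over position reports: per-segment vehicle counts and average speeds for each simulated minute, the sort of statistics the benchmark's variable-toll computation depends on. The `PositionReport` fields below are a simplified assumption rather than the benchmark's full record format, and reports are assumed to arrive in non-decreasing time order:

```python
from collections import defaultdict
from typing import NamedTuple


class PositionReport(NamedTuple):
    """Simplified position report (assumed fields, not the full benchmark schema)."""
    time: int       # seconds since simulation start
    vid: int        # vehicle id
    speed: int      # mph
    xway: int       # expressway number
    direction: int  # 0 or 1
    seg: int        # segment number


def segment_stats(reports):
    """Yield (minute, {(xway, direction, seg): (vehicle_count, avg_speed)}) as each minute closes.

    Only the current minute's state is kept in memory.
    """
    current_minute = None
    vehicles = defaultdict(set)
    speeds = defaultdict(list)

    def flush():
        return current_minute, {
            key: (len(vehicles[key]), sum(speeds[key]) / len(speeds[key]))
            for key in speeds
        }

    for r in reports:
        minute = r.time // 60
        if current_minute is not None and minute != current_minute:
            yield flush()  # the previous minute is complete
            vehicles.clear()
            speeds.clear()
        current_minute = minute
        key = (r.xway, r.direction, r.seg)
        vehicles[key].add(r.vid)
        speeds[key].append(r.speed)
    if current_minute is not None:
        yield flush()  # emit the final, possibly partial, minute
```

Because results are emitted as soon as a minute closes, downstream PEs can act on them without waiting for the whole input, which is the continuous-query behavior the benchmark exercises.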

  10. Prototype Implementation • Design principles • Modularity • Data Aggregation • Network and Data Locality • Flexible Programming Environment • Linear Road in SPC • The figure shows the query network infrastructure comprising 15 PEs.

  11. Experiments • Input load increases over time to stress-test the system • Scalability

  12. Experiments • Analyzing Bottleneck PEs • PE Placement Policy

  13. Summary • Paper’s focus • Understanding the performance characteristics of stream processing applications in a distributed setup • Ideas • Design and implementation of the Linear Road benchmark on the SPC middleware. • Identify the main performance bottlenecks to achieve scalability and low query response latency • Contributions • Demonstrate a scalable distributed implementation of Linear Road • Highlight the importance of addressing performance bottlenecks • Analytical Validation • Experiments • Prototyping

  14. Assumptions, Rewrite today • Assumptions • Restricts evaluation to SPC support for the Linear Road application, assuming that the design decisions and performance results are applicable to other streaming applications. • The system is constantly overloaded with respect to the available resources. • PEs are I/O-, CPU-, and memory-bound. • Rewrite today • Apply the ideas to other types of streaming applications. • Run more extensive experiments on performance tuning.