Chapter 10: Stream-based Data Management. Title: Design, Implementation, and Evaluation of the Linear Road Benchmark on the Stream Processing Core. Authors: Navendu Jain, Lisa Amini, et al.
Design, Implementation, and Evaluation of the Linear Road Benchmark on the Stream Processing Core • Problem • Problem Statement • Why is this problem important? • Why is this problem hard? • Approaches • Approach description, key concepts • Contributions (novelty, improved) • Assumptions
Problem Statement • Given • Stream data and continuous queries in large-scale distributed environments • A streaming data application (Linear Road) • A stream processing middleware (Stream Processing Core, SPC) • Find • Performance bottlenecks of streaming data applications • Objectives • Understand the performance characteristics of the stream data application • Constraints • SPC is constantly overloaded with respect to the available resources. • Processing elements are a mix of I/O-bound and CPU-bound. • Applications cannot realistically store the full history of a stream in memory, so they are also memory-bound.
Why is this problem important? • High-volume, continuous data is ubiquitous. • Text and transactional data • Digital audio, video, and images • Instant messages, network packet traces • Sensor data • Stream processing applications have become important in the networking and database communities.
Why is this problem hard? • Stream data are • Large in volume • Produced at high data rates • Generated by multiple distributed data sources • Rapidly updated • Processing stream data requires • Filtering • Aggregation • Correlation • A system supporting stream data processing applications must consider • Scalability • Latency • Resource utilization
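A minimal sketch of one of the operations listed above, aggregation over a stream, written so that memory stays bounded instead of holding the full stream history. The class name `SlidingAverage` and the time-based window are illustrative assumptions, not part of SPC or the paper.

```python
from collections import deque


class SlidingAverage:
    """Running average over the last `window_seconds` of a stream.

    Hypothetical helper for illustration; not an SPC API.
    """

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.items = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0

    def add(self, timestamp, value):
        """Add one reading and return the average over the current window."""
        self.items.append((timestamp, value))
        self.total += value
        # Evict readings that have fallen out of the window, so memory
        # is proportional to the window size, not to the stream length.
        cutoff = timestamp - self.window_seconds
        while self.items[0][0] <= cutoff:
            _, old_value = self.items.popleft()
            self.total -= old_value
        return self.total / len(self.items)
```

Each call to `add` does a constant amount of amortized work, which matters when data rates are high and every reading must be handled as it arrives.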
Novelty of Contribution • Related Work • DataCutter, StreamIt: Connections between applications are statically determined. • TelegraphCQ, Aurora, Borealis, STREAM: Support stream data manipulation from a database-centric perspective, but process streams of tuples individually (i.e., at small scale). • Benchmarks: Previous work on Linear Road did not report any performance numbers. • Contributions • SPC supports dynamic application composition. • Evaluate SPC using the Linear Road application in multiple distributed configurations, yielding a highly scalable implementation of Linear Road. • Study the behavior of the streaming infrastructure support for large-scale continuous and historical queries, identifying performance bottlenecks and tuning them.
SPC Architecture • Publish-subscribe model • Each processing element (PE) that consumes and produces stream data specifies the characteristics of its streams. • SPC dynamically determines the stream connections by matching stream descriptors as new applications and new data sources join and leave the system. • Reusing streams • Results in significant resource savings. • Discovers useful information over an ever-changing set of data sources.
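The descriptor matching described above can be sketched as a small registry that recomputes connections whenever streams or subscribers change. This is only an illustration under simple assumptions (descriptors are flat key-value dictionaries and matching is exact equality); `StreamRegistry` and its methods are hypothetical names, not the SPC interface.

```python
class StreamRegistry:
    """Toy publish-subscribe registry; hypothetical, not the SPC API."""

    def __init__(self):
        self.streams = {}      # stream name -> descriptor dict
        self.subscribers = {}  # PE name -> dict of required attributes

    def publish(self, stream, descriptor):
        self.streams[stream] = dict(descriptor)

    def withdraw(self, stream):
        self.streams.pop(stream, None)

    def subscribe(self, pe, wanted):
        self.subscribers[pe] = dict(wanted)

    @staticmethod
    def _matches(descriptor, wanted):
        # A stream matches when it carries every attribute the PE asked for.
        return all(descriptor.get(k) == v for k, v in wanted.items())

    def connections(self):
        """Return the (stream, PE) pairs implied by the current descriptors."""
        return sorted(
            (stream, pe)
            for stream, descriptor in self.streams.items()
            for pe, wanted in self.subscribers.items()
            if self._matches(descriptor, wanted)
        )
```

Because connections are derived from descriptors rather than wired by hand, a newly published stream is picked up by existing subscribers without any change to them, and one stream can feed several PEs.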
Performance Challenges and Optimizations in SPC • Challenges • PEs perform either • A small amount of processing on large volumes of data, or • A large amount of processing on lower volumes of data • Thus, a mix of I/O-bound and CPU-bound work • Impossible to store stream history in memory, hence memory-bound • Optimizations • SDO filtering: SPC can filter out unwanted objects, saving resources. • Events: PEs can subscribe to system events, and a PE can adapt its algorithm in response. • Dynamic copies of PEs
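The SDO filtering optimization can be pictured as a predicate applied before an object reaches the PE, so the PE spends no cycles on data it would discard anyway. The sketch below is an assumption about the general idea only; the function `deliver`, the dictionary-shaped objects, and the predicate are hypothetical and do not reflect how SPC implements filtering.

```python
def deliver(sdos, predicate, handler):
    """Pass matching objects to `handler`; return how many were filtered out.

    Hypothetical illustration of filter-before-deliver, not SPC code.
    """
    dropped = 0
    for sdo in sdos:
        if predicate(sdo):
            handler(sdo)
        else:
            dropped += 1
    return dropped


# Usage: a PE that only cares about expressway 0.
received = []
reports = [
    {"xway": 0, "speed": 55},
    {"xway": 1, "speed": 30},
    {"xway": 0, "speed": 42},
]
dropped = deliver(reports, lambda sdo: sdo["xway"] == 0, received.append)
```

In this usage, two reports reach the PE and one is dropped before delivery.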
Linear Road Benchmark • Simulates the traffic characteristics of a simple urban expressway system. • Input to the Linear Road benchmark is in a stream data format. • Requires a stream-based data management system (SDMS) to process a set of continuous and historical queries.
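As an example of a continuous query in this benchmark, the sketch below computes a per-segment toll. The rule and the thresholds (40 mph, 50 vehicles, quadratic toll) follow the toll rule as it is usually stated in the original Linear Road benchmark specification, not this slide deck, so treat them as assumptions to check against that specification. The function name and parameters are hypothetical.

```python
def segment_toll(avg_speed_mph, num_vehicles, accident_nearby):
    """Toll for one expressway segment (assumed Linear Road rule).

    A toll is charged only when the segment is congested: average speed
    below 40 mph, more than 50 vehicles, and no accident nearby.
    """
    if accident_nearby or avg_speed_mph >= 40 or num_vehicles <= 50:
        return 0
    return 2 * (num_vehicles - 50) ** 2
```

The inputs (recent average speed and vehicle count per segment) are themselves aggregates over the position-report stream, which is why the query has to be evaluated continuously as new reports arrive.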
Prototype Implementation • Design principles • Modularity • Data Aggregation • Network and Data Locality • Flexible Programming Environment • Linear Road in SPC • The figure shows the query network infrastructure comprising 15 PEs.
Experiments • The input data rate increases over time to stress-test the system • Scalability
Experiments • Analyzing Bottleneck PEs • PE Placement Policy
Summary • Paper’s focus • Understanding the performance characteristics of stream processing applications in a distributed setup • Ideas • Design and implementation of the Linear Road benchmark on the SPC middleware. • Identify the main performance bottlenecks to achieve scalability and low query response latency • Contributions • Demonstrate a scalable distributed implementation of Linear Road • Highlight the importance of addressing performance bottlenecks • Analytical Validation • Experiments • Prototyping
Assumptions, Rewrite today • Assumptions • The evaluation is restricted to SPC support for the Linear Road application, assuming that the design decisions and performance results are applicable to other streaming applications. • The system is constantly overloaded with respect to the available resources. • PEs are I/O-, CPU-, and memory-bound. • Rewrite today • Apply the ideas to other types of streaming applications. • Run more extensive experiments on performance tuning.