Chapter 10: Stream-based Data Management


Presentation Transcript


  1. Chapter 10: Stream-based Data Management • Title: Retrospective on Aurora • Authors: Hari Balakrishnan et al.

  2. Design, Implementation, and Evaluation of the Linear Road Benchmark on the Stream Processing Core • Problem • Problem Statement • Why is this problem important? • Why is this problem hard? • Approaches • Approach description, key concepts • Contributions (novelty, improved) • Assumptions

  3. Problem Statement • Given • Stream data • Experience from developing five stream-based applications using the Aurora stream processing engine • Find: • Key requirements of streaming applications • Objectives • Reflect on the design of Aurora based on this experience • Eliminate the limitations and address new challenges in a follow-on project, Borealis • Constraints • Data streams arrive in no particular order. • Data streams arrive without any temporal regularity.

  4. Why is this problem important? • Stream-processing applications • Financial Services – stock ticker • Transportation – congestion pricing, dynamic tolls • Sensor Networks – Environment monitoring • Defense – Battalion monitoring

  5. Why is this problem hard? • High update rate • Time series • Streaming applications entail time series. • Time-series operations are not well supported by current DBMSs. • Real-time constraints • Outbound processing, where data are stored before being processed, cannot deliver real-time latency. • SPEs must adopt inbound processing, where query processing is performed directly on incoming messages. • Spikes in message load • Incoming traffic is bursty. • Quality of Service (QoS) requirements

  6. Novel Contributions • Comparison with SQL-centric related work: • Data Flow Network (DFN) centric • Developer – composes the DFN using a graphical user interface • Optimizer – rearranges the DFN, e.g. swaps boxes • Compiler – translates the DFN to an intermediate representation • Run-time – schedules tasks based on QoS requirements • Other Contributions – Lessons Learnt • Identify characteristics of streaming applications • from five case studies • Identify core performance tuning ideas

  7. Aurora Architecture • Aurora is based on a dataflow-style ‘boxes & arrows’ paradigm, unlike other systems that use an SQL-style query interface (where passing queries back and forth adds system overhead and latency). • Can be spread across any number of machines for scalability and availability. [Figure: Aurora operators (input, operator, output) and the Aurora GUI]
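The 'boxes & arrows' idea on slide 7 can be sketched as a chain of operators through which each tuple flows as it arrives. This is only a minimal Python illustration and not Aurora's actual API: the names `filter_box`, `map_box`, and `connect`, and the stock-tick fields, are hypothetical.

```python
# Hypothetical sketch of a boxes-and-arrows dataflow network (not Aurora's API).
# Each "box" takes an input stream (any iterable of dicts) and yields an output stream.

def filter_box(predicate):
    """Box that passes through only the tuples satisfying the predicate."""
    def box(stream):
        return (t for t in stream if predicate(t))
    return box

def map_box(fn):
    """Box that applies a user-defined function to every tuple."""
    def box(stream):
        return (fn(t) for t in stream)
    return box

def connect(*boxes):
    """Wire boxes together with 'arrows': the output of one feeds the next."""
    def network(stream):
        for box in boxes:
            stream = box(stream)
        return stream
    return network

# Usage: keep expensive ticks and tag them (field names are assumptions).
network = connect(
    filter_box(lambda t: t["price"] > 100),
    map_box(lambda t: {**t, "flag": "high"}),
)
ticks = [{"sym": "A", "price": 90}, {"sym": "B", "price": 120}]
result = list(network(ticks))
```

Because the boxes are lazy generators, each tuple is processed as it is pulled through the network rather than being stored first, which loosely mirrors the inbound processing model described on slide 5.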

  8. Aurora Case Study 1: Financial Services • An application detects feed problems and triggers a switch between feeds in real time. • Hierarchical Alarm • A low alarm is triggered when an update is delayed beyond a threshold (e.g., 5 sec). • A high alarm is triggered when low alarms accumulate beyond a threshold (e.g., 100 times). • Boxes in the red circle separate the alarms from both Reuters and Comstock into alarms from NYSE and alarms from NASDAQ, using filter and merge techniques. • This case study illustrates the ability to detect stream imperfections and extend functionality using user-defined Map functions.
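The hierarchical alarm on slide 8 can be sketched as a small stateful operator. The sketch below assumes each update carries a send time and an arrival time, and that the low-alarm counter resets after a high alarm fires; neither detail is stated on the slide, and the class name `FeedMonitor` is hypothetical.

```python
# Hypothetical sketch of the two-level alarm; thresholds default to the slide's examples.

class FeedMonitor:
    def __init__(self, delay_threshold=5.0, low_alarm_threshold=100):
        self.delay_threshold = delay_threshold          # seconds of tolerated delay
        self.low_alarm_threshold = low_alarm_threshold  # low alarms tolerated before escalating
        self.low_alarms = 0

    def on_update(self, sent_time, arrival_time):
        """Return the alarms raised by one update: [], ["low"], or ["low", "high"]."""
        alarms = []
        if arrival_time - sent_time > self.delay_threshold:
            self.low_alarms += 1
            alarms.append("low")
            if self.low_alarms > self.low_alarm_threshold:
                alarms.append("high")
                self.low_alarms = 0  # assumption: start counting afresh after escalation
        return alarms
```

A high alarm would then be the point at which the application switches to the other feed.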

  9. Aurora Case Study 2: Linear Road Benchmark • Linear Road is a benchmark for stream processing engines. • Simulates an urban highway system that uses ‘variable tolling’ (i.e., congestion-based pricing). • Linear Road must support • Two continuous queries • Calculate a segment toll every time a vehicle enters the segment. • Detect and report accidents and adjust tolls accordingly. • Three historical queries • Request an account balance • Day’s total expenditure for a given vehicle • Prediction of travel time between two segments using historical data • Each of these queries must be answered with a specified accuracy and within a specified response time.
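The interplay between a continuous query (charging a toll on segment entry) and two of the historical queries (account balance, daily expenditure) on slide 9 can be sketched as follows. The toll rule passed in as `toll_fn` is a made-up placeholder, not the benchmark's actual formula, and treating the account balance as the sum of tolls charged so far is an assumption; `TollLedger` is a hypothetical name.

```python
# Hypothetical sketch: continuous toll charging plus two historical queries.
from collections import defaultdict

class TollLedger:
    def __init__(self, toll_fn):
        self.toll_fn = toll_fn            # placeholder congestion-pricing rule (assumption)
        self.charges = defaultdict(list)  # vehicle -> list of (day, toll)

    def on_segment_entry(self, vehicle, day, segment, vehicles_in_segment):
        """Continuous query: compute and record a toll when a vehicle enters a segment."""
        toll = self.toll_fn(segment, vehicles_in_segment)
        self.charges[vehicle].append((day, toll))
        return toll

    def account_balance(self, vehicle):
        """Historical query: total tolls charged to the vehicle so far."""
        return sum(toll for _, toll in self.charges.get(vehicle, []))

    def daily_expenditure(self, vehicle, day):
        """Historical query: total tolls charged to the vehicle on one day."""
        return sum(toll for d, toll in self.charges.get(vehicle, []) if d == day)

# Usage with a made-up rule: charge 2 units when a segment holds more than 50 vehicles.
ledger = TollLedger(lambda segment, n: 2 if n > 50 else 0)
ledger.on_segment_entry("v1", day=1, segment=3, vehicles_in_segment=60)
ledger.on_segment_entry("v1", day=1, segment=4, vehicles_in_segment=10)
ledger.on_segment_entry("v1", day=2, segment=3, vehicles_in_segment=80)
```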

  10. Aurora Case Study 3: Battalion Monitoring • Aircraft gather data and send them to monitoring stations. • Enemy units crossing a given line signal an attack. • The limited resource is the bandwidth between aircraft and ground. When an attack is initiated, selective dropping of data is allowed in order to serve important classes. • Authors could test their load-shedding techniques. • Insert random drop boxes to discard a fraction of their input tuples. • Insert semantic, predicate-based drop filters. • Observations • The semantic load-shedding techniques achieve the least utility loss. • As load increases, two techniques show similar performance. • At high loads, all algorithms converge to the same loss levels.
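The two load-shedding styles on slide 10 differ only in how they choose which tuples to discard. The sketch below is a simplified illustration rather than the authors' algorithms: the per-tuple `value` field used to measure utility and the function names are assumptions.

```python
# Hypothetical sketch of random vs. semantic (predicate-based) drop boxes.
import random

def random_drop(tuples, drop_fraction, rng):
    """Discard each tuple independently with probability drop_fraction."""
    return [t for t in tuples if rng.random() >= drop_fraction]

def semantic_drop(tuples, keep_predicate):
    """Discard the tuples that fail a predicate, so important classes survive."""
    return [t for t in tuples if keep_predicate(t)]

def utility(tuples):
    """Total value delivered; 'value' is an assumed per-tuple importance score."""
    return sum(t["value"] for t in tuples)

# Usage: under load, keep only high-value readings instead of dropping blindly.
readings = [
    {"unit": "tank", "value": 5},
    {"unit": "truck", "value": 1},
    {"unit": "tank", "value": 5},
    {"unit": "truck", "value": 1},
]
kept_semantic = semantic_drop(readings, lambda t: t["value"] >= 5)
kept_random = random_drop(readings, 0.5, random.Random(0))
```

Both calls shed load, but the semantic filter guarantees the retained tuples are the high-value ones, which is the intuition behind its lower utility loss.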

  11. Aurora Case Study 4: Environmental Monitoring • Monitoring toxins in water. • Stream data is fish behavior (e.g., breathing rate) and water quality (e.g., temperature). • When the fish behave abnormally, an alarm is sounded. • The water data are analyzed over 1-, 2-, and 4-hour sliding windows. • Ease of developing stream applications • Aurora proved very convenient for sliding window calculation. • Aurora’s GUI proved invaluable.
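A time-based sliding window like those on slide 11 can be maintained incrementally. The sketch below computes a running mean and assumes timestamps are in seconds and arrive in non-decreasing order, which real streams may violate; the class name `SlidingWindowMean` is hypothetical.

```python
# Hypothetical sketch of a time-based sliding-window mean.
from collections import deque

class SlidingWindowMean:
    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.items = deque()  # (timestamp, value) pairs currently inside the window
        self.total = 0.0

    def add(self, timestamp, value):
        """Add a reading, evict readings older than the window, return the mean."""
        self.items.append((timestamp, value))
        self.total += value
        cutoff = timestamp - self.window_seconds
        # Assumes non-decreasing timestamps, so the oldest readings sit at the left.
        while self.items and self.items[0][0] <= cutoff:
            _, old_value = self.items.popleft()
            self.total -= old_value
        return self.total / len(self.items)

# Usage: one operator per window length (1, 2, and 4 hours).
windows = {hours: SlidingWindowMean(hours * 3600) for hours in (1, 2, 4)}
```

Each reading would be fed to all three windows, and an alarm rule could then compare the resulting means against thresholds.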

  12. Aurora Case Study 5: Medusa • Medusa is a distributed stream-processing system built on Aurora. • Takes Aurora queries and distributes them across multiple nodes. • Offers several benefits: • Incremental scalability over multiple nodes. • High availability through mutual monitoring between nodes. • Composition of stream feeds from different participants. • Handling of load spikes by the federated system.

  13. Lessons Learnt: Application Characteristics • Common Queries • Historical data using an open window • Last 10 weeks’ worth of toll data for each driver • Aggregate: How much has a driver spent on tolls over the past 10 weeks? • Tables of historical data with arbitrary update patterns • Synchronization • Stream applications rely on shared data and computation. • WaitFor (P: Predicate, T: Timeout) • Unpredictable stream behavior • Financial services application detects the arrival rate of a stream. • Military application adjusts resources during times of stress.
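The slide gives only the signature WaitFor (P: Predicate, T: Timeout). One plausible reading, sketched below, is a two-input operator that buffers tuples from a data stream and releases each one when a tuple on a second stream satisfies P together with it, discarding buffered tuples once T has elapsed. This interpretation, the method names, and the explicit `now` argument are all assumptions.

```python
# Hypothetical sketch of a WaitFor(P, T) synchronization operator.

class WaitFor:
    def __init__(self, predicate, timeout):
        self.predicate = predicate  # P(data_tuple, signal_tuple) -> bool
        self.timeout = timeout      # T, in the same units as 'now'
        self.buffer = []            # (arrival_time, data_tuple) pairs still waiting

    def _expire(self, now):
        """Drop buffered tuples that have waited longer than the timeout."""
        self.buffer = [(ts, t) for ts, t in self.buffer if now - ts <= self.timeout]

    def on_data(self, now, t):
        """Buffer a tuple from the data stream until a matching signal arrives."""
        self._expire(now)
        self.buffer.append((now, t))

    def on_signal(self, now, s):
        """Release every buffered tuple that satisfies P together with signal s."""
        self._expire(now)
        released, waiting = [], []
        for ts, t in self.buffer:
            if self.predicate(t, s):
                released.append(t)
            else:
                waiting.append((ts, t))
        self.buffer = waiting
        return released
```

For example, with `P` matching on a shared `id` field and `T = 10`, a tuple buffered at time 5 is released by a matching signal at time 8 but silently dropped if the signal only arrives at time 20.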

  14. Lessons Learnt: Performance Tuning • Requirements • Main memory implementation • Data movement across DFN elements • Scheduling of DFN elements • Performance Decisions • Memory copying – memcpy() implementations • Scheduler • Reduce scheduler overheads by aggressive profiling • Tight loops • Keep unnecessary housekeeping out of tight loops • Data structures • Optimize data structures used to implement DFN elements

  15. Future Plans: Borealis • Dynamic revision of query results • Intelligently corrects query results that have already been emitted, using corrected data that arrive later. • Dynamic query modification • E.g., traders wish to be alerted of interesting events, where the definition of ‘interesting’ varies. • Distributed optimization • Server-heavy and sensor-heavy optimization problems are emerging. • More flexible optimization to handle a very large number of devices • Implementation plans

  16. Summary • Paper’s focus • Identify the requirements of stream applications from experience designing and implementing the Aurora stream-processing engine • Ideas • Describe five applications and their implementation in detail. • Reflect on the design of Aurora based on the experience. • Discuss future ideas for the follow-on project. • Contributions • Identify key requirements of streaming applications • Analytical Validation • Case study

  17. Assumptions, Rewrite today • Assumptions • Archiving is not necessary! • Performance more important than declarative query language • Rewrite today • Compare performance with competition, e.g. STREAM • Allow archiving along with stream processing • Consider other applications • RFID, cell phone applications • Include current status of Borealis implementation.
