320 likes | 552 Views
Monitoring Streams -- A New Class of Data Management Applications. Don Carney Brown University Uğur Çetintemel Brown University Mitch Cherniack Brandeis University Christian Convey Brown University Sangdon Lee Brown University Greg Seidman Brown University Michael Stonebraker MIT
E N D
Monitoring Streams -- A New Class of Data Management Applications Don Carney Brown University Uğur ÇetintemelBrown University Mitch Cherniack Brandeis University Christian Convey Brown University Sangdon Lee Brown University Greg Seidman Brown University Michael Stonebraker MIT Nesime Tatbul Brown University Stan Zdonik Brown University
Background • MIT/Brown/Brandeis team • First Aurora, then Borealis • Practical system • Designed for Scalablility: 106 stream inputs, queries • QoS-Driven Resource Management • Stream Storage Management • Realiability/ Fault Tolerance • Distribution and Adaptivity • First stream startup: StreamBase • Financial applications
Example Stream Applications • Market Analysis • Streams of Stock Exchange Data • Critical Care • Streams of Vital Sign Measurements • Physical Plant Monitoring • Streams of Environmental Readings • Biological Population Tracking • Streams of Positions from Individuals of a Species
Not Your Average DBMS • External, Autonomous Data Sources • Querying Time-Series • Triggers-in-the-large • Real-time response requirements • Noisy Data, Approximate Query Results
Outline 2. Aurora Overview/ Query Model • Runtime Operation • Adaptivity ®
App App App QoS QoS QoS • Each Provides: • Aover input data streams • A Quality-Of-Service Specification ( ) • (specifies utility of partial or late results) Application Query QoS Aurora from 100,000 Feet Query . . . . . . . . . Query . . . . . . . . . . . . Query
App App App QoS QoS QoS Aurora from 100 Feet Slide s s . . . . . . s s m . . . . . . . . . È m Tumble s m • Query Operators (Boxes) • Simple: FILTER, MAP, RESTREAM • Binary: UNION, JOIN, RESAMPLE • Windowed: TUMBLE, SLIDE, XSECTION, WSORT • Queries = Workflow (Boxes and Arcs) • Workflow Diagram = “Aurora Network” • Boxes = Query Operators • Arcs = Streams • Streams (Arcs) • stream: tuple sequence from common source • (e.g., sensor) • tuples timestamped on arrival (Internal use: QoS)
App App App QoS QoS QoS Aurora in Action Slide s s s s s s . . . . . . s s s s s s s App m s s s s m m s . . . . . . È È È . . . È È È È m m m App Tumble Tumble Tumble s s s s m m s m Arcs ® Tuple Queues “Box-at-a-time” Scheduling Outputs Monitored for QoS
Queues … … … … … O1 O2 O3 App continuous query QoS QoS QoS O4 O5 App … ad-hoc query O8 O9 O7 3 Days view Continuous and Historical Queries 1 Hour Connection Point
Quality-of-Service (QoS) B C A Specifies “Utility” Of Imperfect Query Results Delay-Based (specify utility of late results) Delivery-Based, Value-Based (specify utility of partial results) QoS Influences… Scheduling, Storage Management, Load Shedding %TuplesDelivered Output Value Delay
Talk Outline • Introduction 2. Aurora Overview 3. Runtime Operation 4. Adaptivity 5. Related Work and Conclusions ®
inputs outputs Storage Manager q1 q2 . . . s s qi m Buffer . . . . . . È È Persistent Store Catalog q1 q2 . . . qn … … … … … … Runtime OperationBasic Architecture Router Scheduler Box Processors QOS Monitor
Runtime OperationScheduling: Maximize Overall QoS Delay = 2 sec Utility = 0.5 Choice 1: A: Cost: 1 sec (…, age: 1 sec) Delay = 5 sec Utility = 0.8 B: Cost: 2 sec Choice 2: (…, age: 3 sec) Schedule Box A now rather than later Ideal: Maximize Overall Utility Presently exploring scalable heuristics (e.g., feedback-based)
… … … z z z y y y x x x AB B (A (z)) B (A (y)) B (A (x)) Box Trains: B B (A (z), A (y), A (x)) A A (z, y, x) Tuple Trains: Runtime OperationScheduling: Minimizing Per Tuple Processing Overhead Train Scheduling: B A A (z) A (y) A (x) B (A (z)) B (A (y)) B (A (x)) Default Operation: = Context Switch
Runtime OperationStorage Management • Run-time Queue Management Prefetch Queues Prior to Being Scheduled Drop Tuples from Queues to Improve QoS 2. Connection Point Management Support Efficient (Pull-Based) Access to Historical Data E.g., indexing, sorting, clustering, …
Talk Outline • Introduction 2. Aurora Overview 3. Runtime Operation 4. Adaptivity 5. Related Work and Conclusions ®
Stream Query Optimization • Differences with Traditional Query Optimization?
Stream Query Optimization • New classes of operators (windows) may mean new rewrites • New execution modes (continuous/pipelining) • More dynamic fluctuations in statistics compile time optimization not possible • Global optimization not practical; as huge query networks Adaptive optimization. • Other cost models taking memory into account, not throughput but output rate, etc. • Query optimization and load shedding
Query Optimization Compile-time, Global Optimization Infeasible Too Many Boxes Too Much Volatility in Network, Data Dynamic, Local Optimization Threshold re when to optimize
Motivation of ‘Query Migration’ • Continuous query over streams • Statistics unknown before start • Statistics changing during execution • Stream rates, arrival pattern, distribution, etc • Need for dynamic adaptation • Plan re-optimization • Change the shape of query plan tree
Run-time Plan Re-Optimization • Step 1 - Decide when to optimize • Statistics Monitoring • Step 2 – Generate new query plan • Query Optimization • Step 3 – Replace current plan by new plan • Plan Migration
Adaptivity in Query Optimization Dynamic Optimization : Migration 1. Identify Subnetwork 2. Buffer Inputs 3. Drain Subnetwork 4. Optimize Subnetwork 5. Turn on Taps
Stateful Operator in CQ Example: Symmetric NL join w/ window constraints • Why stateful • Need non-blocking operators in CQ • Operator needs to output partial results • State data structure keep received tuples ax b2 ax b3 State A State B Key Observation: The purge of tuples in states relies on processing of new tuples. AB b1 b2 b3 b4 b5 ax A B ax
Naïve Migration Strategy Revisited BC AB Deadlock Waiting Problem: • Steps (1) Pause execution of old plan (2) Drain out alltuples inside old plan (3) Replace old plan by new plan (4) Resume execution of new plan A B C (2) All tuples drained (3) Old Replaced By new (4) Processing Resumed
AdaptivityQuery Optimization State Movement Protocol Parallel Track Protocol
Moving State Strategy • Basic idea • Share common states between two migration boxes • Key steps • State Matching • Match states based on IDs. • State Moving • Create new pointers for matched states in new box • What’s left? • Unmatched states in new box QABCD QABCD CD AB SABC SBCD SD SA CD BC SD SBC SAB SC BC AB SB SC SA SB QA QB QC QD QA QB QC QD Old Box New Box
Parallel Track Strategy • Basic idea • Execute both plans in parallel and gradually “push” old tuples out of old box by purging • Key steps • Connect boxes • Execute in parallel • Until old box “expired” (no old tuple or sub-tuple) • Disconnect old box • Start execute new box only QABCD QABCD SABC SD SBCD SA CD AB SBC SAB SD SC BC CD SA SB SB SC BC AB QA QB QC QD QD QA QB QC
AdaptivityLoad Shedding 1. Two Load Shedding Techniques: • Random Tuple Drops Add DROP box to network(DROP a special case of FILTER) Position to affect queries w/ tolerant delivery-based QoS reqts • Semantic Load Shedding FILTER values with low utility (acc to value-based QoS) 2. Triggered by QoS Monitor e.g., after Latency Analysis reveals certain applications are continuously receiving poor QoS
Output rate = min (1/c, r) * s Monitor each application’s Delay-based QoS I I I I I I I I I O O O O O O O O O C,S C,S C,S C,S C,S C,S C,S C,S C,S Problem: Too many apps in “bad zone” P P P P P P P P P AdaptivityDetecting Overload Throughput Analysis Cost = c Selectivity = s Input rate = r 1/c > r Þ Problem Latency Analysis
5 3 2 1 4 0 6 ImplementationRuntime
Conclusions Aurora Stream Query Processing System • Designed for Scalability • QoS-Driven Resource Management • Continuous and Historical Queries • Stream Storage Management • Implemented Prototype Web site: www.cs.brown.edu/research/aurora/