Modeling Stream Processing Applications for Dependability Evaluation. Gabriela Jacques-Silva †‡ , Zbigniew Kalbarczyk † , Bugra Gedik ‡ , Henrique Andrade ‡ , Kun-Lung Wu ‡ , Ravishankar K. Iyer †. † University of Illinois at Urbana-Champaign ‡ IBM Research – T. J. Watson Research Center.
Outline • Streaming applications • Modeling a streaming application • Stream operator, stream connections and tuples • Representation of faults and error propagation • Extending model to include fault tolerance techniques • Evaluation
Extract knowledge from live data streams on-the-fly. [Figure: data streams of tuples flow through stream operators; example output is the percentage of positive feedback: 9.57%, 5.42%, 3.16%, 2.52%, 1.28%]
Different approaches to fault tolerance have different resource consumption and performance impact. • Some techniques aim to guarantee no data loss and no data duplication [Figure: percentage of positive feedback: 9.57%, 5.42%, 3.16%, 2.52%, 1.28%]
Different approaches to fault tolerance have different resource consumption and performance impact. • Partial fault tolerance • Decreases the time to achieve stable output compared to no recovery • Achieves approximate results, which some streaming applications can tolerate [Figure: percentage of positive feedback: 9.32%, 5.11%, 2.84%, 2.27%, 1.09%]
An evaluation framework helps to understand the relative merits of different techniques. • Previous approaches focus on performance evaluation • Fault injection may be expensive, especially when evaluating the application under different setups and parameters [Figure: checkpoint and partial graph replication]
Summary of contributions • Modeling framework for evaluating streaming applications under faults that lead to data loss and data corruption • Considers consequences of error propagation • Based on generic models specified via Stochastic Activity Networks (SAN) • Abstractions for stream operators, stream connections, and tuples • Modeled three fault tolerance techniques • Checkpointing, partial replication, and full replication
Modeling framework uses the Stochastic Activity Network formalism. • SANs can express the non-deterministic behavior and parallel execution of streaming applications • Nomenclature • Place: container for a natural number • Activity: transition between places • Token: item in a place • Input gate: enforces a condition on an activity • Output gate: executes a function after an activity
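The nomenclature above can be illustrated with a minimal Python sketch. It is not the authors' tooling: the names `Activity`, `fire_enabled`, and `move`, the three place names, and the untimed firing rule are all assumptions made for illustration. Real SAN activities also carry stochastic delays, which this sketch leaves out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A marking maps each place name to its token count (a natural number).
Marking = Dict[str, int]


@dataclass
class Activity:
    """Transition between places (hypothetical, untimed simplification)."""
    name: str
    input_gate: Callable[[Marking], bool]   # condition that enables the activity
    output_gate: Callable[[Marking], None]  # function executed after it fires


def move(src: str, dst: str) -> Callable[[Marking], None]:
    """Build an output gate that moves one token from src to dst."""
    def gate(marking: Marking) -> None:
        marking[src] -= 1
        marking[dst] += 1
    return gate


def fire_enabled(marking: Marking, activities: List[Activity]) -> List[str]:
    """Fire each enabled activity once, in list order; return the names fired."""
    fired = []
    for act in activities:
        if act.input_gate(marking):
            act.output_gate(marking)
            fired.append(act.name)
    return fired


# Two tokens (tuples) wait in "input"; the operator handles one at a time.
marking = {"input": 2, "processing": 0, "output": 0}
activities = [
    Activity("start",
             lambda m: m["input"] > 0 and m["processing"] == 0,
             move("input", "processing")),
    Activity("finish",
             lambda m: m["processing"] > 0,
             move("processing", "output")),
]

trace = []
while True:
    fired = fire_enabled(marking, activities)
    if not fired:
        break
    trace.extend(fired)

print(trace)
print(marking)
```

Both tokens end up in the `output` place, and the loop stops once no input gate holds.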
Framework is based on the abstraction of three key components of a stream processing application (SPA). • Stream operator: state transition model • Captures arity, selectivity and processing time [Figure: SAN model of an example operator F1 with predicate int < 9; labeled elements are input stream connections, input gate IG1, waiting for input, processing tuple, sending output, output buffer, output gates OG1 and OG2, and output stream connections]
Framework is based on the abstraction of three key components of a SPA. • Stream connections: state sharing between output and input streams
Framework is based on the abstraction of three key components of a SPA. • Tuples: tokens flowing through input and output streams • Tuple sizes are represented, but attribute values are not
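As a rough illustration of the three abstractions, the Python sketch below models tuples that carry only a size, connections as state shared between two operators, and an operator described by its arity, selectivity, and processing time. The class names, the per-tuple coin flip used for selectivity, and the time unit are assumptions, not the SAN models used in the framework.

```python
import random
from collections import deque
from dataclasses import dataclass, field
from typing import List


@dataclass
class StreamTuple:
    size_bytes: int  # only the size is modeled, not attribute values


@dataclass
class Connection:
    """State shared by an upstream output and a downstream input."""
    queue: deque = field(default_factory=deque)


@dataclass
class Operator:
    inputs: List[Connection]
    outputs: List[Connection]
    selectivity: float       # probability that an input tuple yields an output
    processing_time: float   # time spent per input tuple (hypothetical unit)
    busy_time: float = 0.0

    @property
    def arity(self) -> int:
        """Number of input stream connections."""
        return len(self.inputs)

    def step(self, rng: random.Random) -> None:
        """Consume one tuple from each non-empty input connection."""
        for conn in self.inputs:
            if not conn.queue:
                continue
            tup = conn.queue.popleft()
            self.busy_time += self.processing_time
            if rng.random() < self.selectivity:
                for out in self.outputs:
                    out.queue.append(StreamTuple(tup.size_bytes))


# Demo: a filter-like operator that forwards about half of its input.
rng = random.Random(0)
upstream, downstream = Connection(), Connection()
upstream.queue.extend(StreamTuple(size_bytes=64) for _ in range(1000))
filt = Operator([upstream], [downstream], selectivity=0.5, processing_time=0.001)
while upstream.queue:
    filt.step(rng)
print(len(downstream.queue), "of 1000 tuples forwarded")
```

Because both operators hold a reference to the same `Connection`, whatever one appends is immediately visible to the other, which is the state sharing the slide refers to.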
Stream operator failure model considers crashes and silent data corruptions (SDCs). • Crash: data loss for partial fault tolerance techniques [Figure values: 9.32%, 5.11%, 2.84%, 2.27%, 1.09%] • Silent data corruption: corruption of attribute values [Figure values: 9.53%, 5.42%, 3.14%, 2.52%, 1.28%]
Base model is augmented to represent error propagation. • Once a fault occurs, operators may generate inaccurate data • Represented via tainted tuples and tainted stream connections [Figure: operator model (input stream connection, waiting for input, processing tuple, sending output, output stream connection) extended with a tainted input stream connection, a processing tainted tuple state, an "is tainted" condition, and a tainted output stream connection]
Stateless operators do not generate tainted tuples after crash and restore. • Once the operator recovers, the data is accurate [Figure: operator F1 with predicate int < 9, shown without a crash and with a crash]
Stateful operators generate tainted tuples after crash and restore. • After recovery, the operator produces tainted tuples until its internal state refreshes [Figure: stateful operator F1, shown without a crash and after a crash and restore]
Stateful operators generate tainted data upon crash of any operator in the upstream set. • No crash: the internal state changes • Crash: the internal state is unchanged • After the crashed operator recovers, the stateful operator produces tainted tuples until its internal state refreshes [Figure: upstream operator F1 with predicate int < 9 feeding a stateful operator, shown without a crash and with a crash]
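The difference between stateless and stateful operators after a crash can be sketched in Python as follows. The count-based sliding window, the `WindowSum` and `StatelessFilter` classes, and the rule that the state counts as refreshed once the window holds only post-restore tuples are assumptions chosen for illustration; the filter predicate mirrors the `int < 9` example from the slides.

```python
from collections import deque


class StatelessFilter:
    """Keeps no state, so a crash and restore cannot taint later output."""

    def __init__(self, predicate):
        self.predicate = predicate

    def crash_and_restore(self):
        pass  # nothing to lose

    def process(self, value):
        """Return (value, tainted), or None when the tuple is filtered out."""
        return (value, False) if self.predicate(value) else None


class WindowSum:
    """Stateful: emits the sum over the last `window` input values."""

    def __init__(self, window):
        self.window = window
        self.values = deque(maxlen=window)
        self.stale = 0  # outputs still affected by the lost state

    def crash_and_restore(self):
        # State is lost (no checkpoint in this sketch). The next
        # window - 1 outputs are computed over an incomplete window.
        self.values.clear()
        self.stale = self.window - 1

    def process(self, value):
        self.values.append(value)
        tainted = self.stale > 0
        if tainted:
            self.stale -= 1
        return (sum(self.values), tainted)


filt = StatelessFilter(lambda v: v < 9)
filt.crash_and_restore()
filtered = [filt.process(v) for v in [10, 5, 6, 3]]

agg = WindowSum(window=3)
before = [agg.process(v) for v in [1, 2, 3, 4]]
agg.crash_and_restore()
after = [agg.process(v) for v in [5, 6, 7, 8]]

print(filtered)
print(before)
print(after)
```

In this sketch the first two outputs after the restore are marked tainted, and the third is accurate again because the window then contains only tuples received after recovery.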
Checkpoint of Operator State • Model is parameterized to capture how long it takes to produce good results after a failure [Figure: stateful operator F1, shown without a crash and after a crash and restore] G. Jacques-Silva et al. “Language Level Checkpointing Support for Stream Processing Applications”. DSN 2009.
Partial Graph Replication • Replicated operators and stream connections on the composed application model • Extra logic in replicated operators to perform replica failover [Figure: failover between active replica op1,A and backup replica op1,B, with a deactivate step] G. Jacques-Silva et al. “Language Level Checkpointing Support for Stream Processing Applications”. DSN 2009.
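A minimal sketch of the replica failover logic, assuming one active and one backup replica per replicated operator as in the figure labels (op1,A and op1,B). The `ReplicaPair` class, its method names, and the policy of promoting whichever replica is restored first when both are down are hypothetical.

```python
class ReplicaPair:
    """Two replicas of one operator; only the active one processes tuples."""

    def __init__(self):
        self.alive = {"A": True, "B": True}
        self.active = "A"  # "B" starts as the deactivated backup

    def crash(self, replica):
        self.alive[replica] = False
        if replica == self.active:
            other = "B" if replica == "A" else "A"
            # Fail over only if the other replica is itself alive.
            self.active = other if self.alive[other] else None

    def restore(self, replica):
        self.alive[replica] = True
        if self.active is None:
            self.active = replica  # first restored replica takes over

    def process(self, tup):
        """Return (replica, tuple), or None if the tuple is lost."""
        return None if self.active is None else (self.active, tup)


pair = ReplicaPair()
out1 = pair.process("t1")
pair.crash("A")
out2 = pair.process("t2")   # handled by the backup after failover
pair.crash("B")
out3 = pair.process("t3")   # both replicas down: tuple is lost
pair.restore("A")
out4 = pair.process("t4")
print(out1, out2, out3, out4)
```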
Full graph replication • Extra logic for operators to perform de-duplication on tuples coming from redundant streams • Aims at no tuple loss and no duplicate delivery J.-H. Hwang et al. “Fast and highly-available stream processing over wide area networks”. ICDE 2008.
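De-duplication over redundant streams can be sketched as below. The sketch assumes every tuple carries a sequence number that is identical across replicas; the slides do not say how duplicates are identified, so the `Deduplicator` class and the `(seq, payload)` layout are illustrative only.

```python
class Deduplicator:
    """Forward the first copy of each sequence number, drop later copies."""

    def __init__(self):
        self.seen = set()

    def accept(self, seq, payload):
        """Return the payload if seq is new, otherwise None."""
        if seq in self.seen:
            return None
        self.seen.add(seq)
        return payload


# Replica 1 delivers every tuple; replica 2 lost tuple 2 (e.g. to a crash).
replica1 = [(1, "a"), (2, "b"), (3, "c")]
replica2 = [(1, "a"), (3, "c")]

# One possible arrival order at the downstream operator.
arrivals = [replica2[0], replica1[0], replica1[1], replica2[1], replica1[2]]

dedup = Deduplicator()
delivered = []
for seq, payload in arrivals:
    result = dedup.accept(seq, payload)
    if result is not None:
        delivered.append(result)

print(delivered)
```

Each tuple is delivered exactly once even though one replica missed tuple 2, which is the no-loss, no-duplicate goal stated above.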
Checkpoint vs. Partial Replication Under Crashes • Target: Bargain Discovery • Stateless: source, sink, 4 filters • Stateful: aggregate and join • Operator MTTF: 30, 50, 70 and 90 min • Model parameters taken from the application executing in IBM System S [Figure: application graphs for the two configurations, checkpoint only and partial replication + checkpoint]
Evaluation Metrics • Availability: all operators are alive and are not producing tainted data • Total number of tainted tuples: total number of tainted tuples stored by the sink operator • Percentage of tainted tuples: fraction of tainted tuples stored by the sink over the total number of tuples produced by the golden run
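The metrics above could be computed from a simulation trace roughly as in the following Python sketch. Treating availability as a fraction of observed time, the `Interval` record, and all numbers in the demo are assumptions for illustration; they are placeholders, not results from the study.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Interval:
    """One stretch of simulated time with a constant system condition."""
    duration: float
    all_alive: bool          # every operator is up
    producing_tainted: bool  # some operator is emitting tainted data


def availability(intervals: List[Interval]) -> float:
    """Fraction of time with all operators alive and no tainted output."""
    total = sum(i.duration for i in intervals)
    good = sum(i.duration for i in intervals
               if i.all_alive and not i.producing_tainted)
    return good / total


def tainted_percentage(tainted_at_sink: int, golden_total: int) -> float:
    """Tainted tuples stored by the sink, as a percentage of golden-run tuples."""
    return 100.0 * tainted_at_sink / golden_total


# Placeholder trace: normal run, a crash, then recovery with tainted output.
trace = [
    Interval(90.0, all_alive=True, producing_tainted=False),
    Interval(4.0, all_alive=False, producing_tainted=False),
    Interval(6.0, all_alive=True, producing_tainted=True),
]
avail = availability(trace)
pct = tainted_percentage(tainted_at_sink=25, golden_total=1000)
print(avail, pct)
```

Note that the period after recovery counts against availability while tainted data is still being produced, in line with the definition on the slide.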
Partial replication provides better availability than checkpoint.
Partial replication produces fewer tainted tuples than checkpoint.
Impact of SDC on Full Replication Technique • Target: Bargain Discovery • Operator MTTF: 120 min [Figure: fully replicated application graph with replicas 1 and 2]
Percentage of tainted tuples is small when compared to golden run. [Plot: operator MTTF of 120 min]
SDC breaks the non-duplication guarantee promised by the full replication technique. • tainted tuples + non-tainted tuples > non-tainted tuples of golden run + confidence interval [Plot: operator MTTF of 120 min]
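The inequality above can be expressed as a small check. The function and parameter names are hypothetical, the confidence interval is assumed to enter as a half-width measured in tuples, and the demo numbers are placeholders rather than measured results.

```python
def exceeds_golden_run(tainted: int,
                       non_tainted: int,
                       golden_non_tainted: int,
                       ci_half_width: float) -> bool:
    """True when more tuples reach the sink than the golden run can explain.

    Implements: tainted + non_tainted > golden_non_tainted + ci_half_width.
    """
    return tainted + non_tainted > golden_non_tainted + ci_half_width


# Placeholder counts: extra tuples beyond the interval suggest duplicates.
duplicated = exceeds_golden_run(tainted=40, non_tainted=1000,
                                golden_non_tainted=1000, ci_half_width=15.0)
within_noise = exceeds_golden_run(tainted=5, non_tainted=1000,
                                  golden_non_tainted=1000, ci_half_width=15.0)
print(duplicated, within_noise)
```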
Summary • Modeling framework to evaluate the dependability provided by different techniques • Assemble applications by composing stream operators, stream connections and tuples • Demonstrated framework with three fault tolerance techniques • Validation by comparing results with real fault injections and application executing in IBM System S • Future • Automatic model composition based on application source code and physical deployment