Design Optimization of Time- and Cost-Constrained Fault-Tolerant Distributed Embedded Systems
Viaceslav Izosimov, Paul Pop, Petru Eles, Zebo Peng
Embedded Systems Lab (ESLAB), Linköping University, Sweden
Motivation

• Hard real-time applications
• Timing constraints
• Cost constraints
• Faults
• Predictable
• Transient
• Intermittent

Hardware solutions
• MARS, TTA, X-by-Wire
• Permanent faults
• Costly for transient faults

vs.

Software solutions
• Re-execution/rollback recovery
• Checkpointing/rollback recovery
• Replication, primary-backup…

Online preemptive
• Flexible

vs.

Off-line non-preemptive
• Predictable
Outline

• Motivation
• System architecture and fault-model
• Fault-tolerance techniques
• Problem formulation
• Motivational examples
• Tabu-search optimization strategy
• Experimental results
• Contributions and Message
Fault-Tolerant Time-Triggered Systems

• Transient faults
• Processes: Re-execution and replication
• Processes: Static cyclic scheduling
• Messages: Static schedule table
• Messages: Fault-tolerant protocol

Time Triggered Protocol (TTP)
• Bus access scheme: time-division multiple-access (TDMA)
• Schedule table located in each TTP controller: message descriptor list (MEDL)

[Figure: TDMA bus access. Each TDMA round consists of the slots S1, S2, S3 and S4; the figure shows a cycle of two rounds.]
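The TDMA bus access scheme above can be illustrated with a small lookup. This is a minimal sketch, not the TTP controller logic: it assumes that every slot in a round has the same length and belongs to exactly one node, and the names `TdmaBus`, `owner_at` and `next_slot_start` as well as the slot length are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TdmaBus:
    """Toy model of a TDMA round (assumption: equal-length slots, one owner each)."""
    slot_owners: tuple   # owner of slots S1, S2, ... within one round
    slot_length: int     # slot duration in abstract time units (hypothetical)

    @property
    def round_length(self):
        return len(self.slot_owners) * self.slot_length

    def owner_at(self, t):
        """Node allowed to transmit at time t."""
        return self.slot_owners[(t % self.round_length) // self.slot_length]

    def next_slot_start(self, node, t):
        """Earliest time >= t at which a slot owned by `node` starts."""
        offset = self.slot_owners.index(node) * self.slot_length
        # Ceiling division: number of whole rounds to skip so that the start is >= t.
        rounds = max(0, -(-(t - offset) // self.round_length))
        return offset + rounds * self.round_length


bus = TdmaBus(slot_owners=("N1", "N2", "N3", "N4"), slot_length=10)
print(bus.owner_at(25))
print(bus.next_slot_start("N2", 11))
```

Because the round repeats with a fixed period, the time at which a message can leave a node is known off-line, which is what makes a static schedule table for messages possible.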
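Fault-Tolerant Techniques

[Figure: three ways of tolerating transient faults in a process P1. Re-execution: P1 is executed repeatedly on node N1 (N1: P1, P1, P1). Replication: replicas of P1 run on nodes N1, N2 and N3. Re-executed replicas: replicas of P1 on N1 and N2 are themselves re-executed. The figure is annotated with the value 2.]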
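The trade-off between the two basic techniques can be made concrete with a little arithmetic. The sketch below assumes the usual rule that tolerating k transient faults requires k + 1 executions of a process: in sequence on one node for re-execution, in parallel on k + 1 nodes for replication. Communication delays are ignored, and the function names and the `recovery_overhead` parameter are hypothetical.

```python
def nodes_needed(k, policy):
    """Number of nodes occupied by one process under the given policy."""
    if policy == "re-execution":
        return 1          # all k + 1 executions share one node
    if policy == "replication":
        return k + 1      # one replica per node
    raise ValueError(f"unknown policy: {policy}")


def worst_case_finish(wcet, k, policy, recovery_overhead=0):
    """Worst-case completion time of one process, ignoring communication."""
    if policy == "re-execution":
        # Every one of the k faults forces one more run on the same node.
        return (k + 1) * wcet + k * recovery_overhead
    if policy == "replication":
        # Replicas run in parallel; at least one of the k + 1 survives.
        return wcet
    raise ValueError(f"unknown policy: {policy}")


for policy in ("re-execution", "replication"):
    print(policy, nodes_needed(2, policy), worst_case_finish(40, 2, policy))
```

Under these assumptions re-execution costs time while replication costs nodes, which is why the choice between them has to be made per process.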
Problem Formulation

• Given
  • Fault model
    • Number of transient faults in the system period
  • System architecture
  • Application
    • WCETs, message sizes, periods, deadlines
• Determine
  • Schedulable and fault-tolerant design implementation
    • Fault-tolerance policy assignment
    • Mapping of processes and messages
    • Schedule tables for processes and messages

Fault-model: transient faults
Application: set of process graphs
Architecture: time-triggered system
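One way to picture the inputs and outputs of this formulation is as plain data. The following is a minimal sketch under stated assumptions, not the authors' representation: the class names, field names and the `design_errors` helper are hypothetical, replication is assumed to use k + 1 replicas on distinct nodes, and messages, schedule tables and the schedulability test itself are left out.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    k: int       # number of transient faults to tolerate per system period
    wcet: dict   # wcet[process][node]; a missing node means "cannot be mapped there"


@dataclass
class Design:
    policy: dict    # process -> "re-execution" or "replication"
    mapping: dict   # process -> list of nodes hosting it


def design_errors(problem, design):
    """Return structural inconsistencies; an empty list means none were found."""
    copies = {"re-execution": 1, "replication": problem.k + 1}
    errors = []
    for process, per_node in problem.wcet.items():
        policy = design.policy.get(process)
        nodes = design.mapping.get(process, [])
        if policy not in copies:
            errors.append(f"{process}: no valid policy")
            continue
        if len(nodes) != copies[policy] or len(set(nodes)) != len(nodes):
            errors.append(f"{process}: {policy} needs {copies[policy]} distinct node(s)")
        for node in nodes:
            if node not in per_node:
                errors.append(f"{process}: cannot run on {node}")
    return errors


problem = Problem(k=1, wcet={"P1": {"N1": 40}, "P2": {"N1": 60, "N2": 70}})
design = Design(policy={"P1": "re-execution", "P2": "replication"},
                mapping={"P1": ["N1"], "P2": ["N1", "N2"]})
print(design_errors(problem, design))
```

The check only confirms that a policy assignment and a mapping fit together; whether the resulting design meets its deadlines still has to be decided by scheduling.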
Static Scheduling [Kandasamy et al. 03]

• Transparent re-execution
• Recovery slack
• Root schedules
• Contingency schedules

[Figure: root schedules for processes P1 to P5 and messages m1, m2 on nodes N1, N2 and N3 (N1: S1, N2: S11, N3: S14), with recovery slack for transparent re-execution; contingency schedules entered after a fault (N1: S2, N2: S12, N3: S14); a tree of schedules S1 to S18 rooted at the root schedules S1, S11 and S14. The figure is annotated with the value 2.]
Re-execution vs. Replication

WCETs of the processes on nodes N1 and N2:

      N1   N2
P1    40   50
P2    40   50
P3    60   70

[Figure: schedules on N1, N2 and the TTP bus (slots S1, S2) for two applications, A1 and A2, each made of processes P1, P2, P3 and messages m1, m2. Each schedule is marked "Met" or "Missed" against the deadline. For one application re-execution is better; for the other replication is better. The figure is annotated with the value 1.]
Fault-Tolerant Policy Assignment

WCETs of the processes on nodes N1 and N2:

      N1   N2
P1    40   50
P2    60   80
P3    60   80
P4    40   50

• No fault-tolerance: application crashes
• Optimization of fault-tolerance policy assignment

[Figure: application of processes P1 to P4 with messages m1, m2, m3, scheduled on N1, N2 and the TTP bus (slots S1, S2). Alternative policy assignments are compared against the deadline: two schedules are marked "Missed" and one "Met". The figure is annotated with the value 1.]
Mapping and Fault-Tolerance

WCETs of the processes on nodes N1 and N2:

      N1   N2
P1    40   X
P2    60   70
P3    60   70
P4    40   X

• Best mapping without considering fault-tolerance
• Simultaneous mapping and fault-tolerance

[Figure: application of processes P1 to P4 with messages m1 to m4, scheduled on N1, N2 and the TTP bus (slots S1, S2). The two approaches are compared against the deadline: one schedule is marked "Met" and the other "Missed". The figure is annotated with the value 1.]
Optimization Strategy

• Design optimization:
  • Fault-tolerance policy assignment
  • Mapping of processes and messages
  • Root schedules
• Three tabu-search optimization algorithms:
  • Mapping and Fault-Tolerance Policy assignment (MRX)
    • Re-execution, replication or both
  • Mapping and only Re-Execution (MX)
  • Mapping and only Replication (MR)

[Figure: tabu-search combined with list scheduling.]
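The general shape of a tabu search over mappings can be sketched in a few lines. This is a toy illustration under loud assumptions, not the MRX, MX or MR algorithms: the only move is remapping one process, the cost is the largest node load (a stand-in for the schedule length that list scheduling would produce), fault-tolerance policies are not modelled, and `load_cost`, `tabu_search`, `iterations` and `tenure` are hypothetical names. The WCET values are illustrative.

```python
from collections import deque


def load_cost(mapping, wcet):
    """Toy cost: largest total WCET assigned to any node (stand-in for schedule length)."""
    loads = {}
    for process, node in mapping.items():
        loads[node] = loads.get(node, 0) + wcet[process][node]
    return max(loads.values())


def tabu_search(wcet, start, iterations=50, tenure=2):
    current = dict(start)
    best, best_cost = dict(current), load_cost(current, wcet)
    tabu = deque(maxlen=tenure)   # recently moved processes
    for _ in range(iterations):
        candidates = []
        for process, nodes in wcet.items():
            for node in nodes:
                if node == current[process]:
                    continue
                cost = load_cost({**current, process: node}, wcet)
                # Aspiration: a tabu move is allowed if it beats the best solution so far.
                if process not in tabu or cost < best_cost:
                    candidates.append((cost, process, node))
        if not candidates:
            break
        # Take the best admissible move even if it worsens the current solution.
        cost, process, node = min(candidates)
        current[process] = node
        tabu.append(process)
        if cost < best_cost:
            best, best_cost = dict(current), cost
    return best, best_cost


wcet = {"P1": {"N1": 40, "N2": 50}, "P2": {"N1": 60, "N2": 75},
        "P3": {"N1": 60, "N2": 75}, "P4": {"N1": 40, "N2": 50}}
start = {process: "N1" for process in wcet}
print(tabu_search(wcet, start))
```

Accepting the best non-tabu move even when it is worse than the current solution is what lets the search leave local minima; the tabu list keeps it from immediately undoing that move.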
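MRX Tabu-Search Example

WCETs of the processes on nodes N1 and N2:

      N1   N2
P1    40   50
P2    60   75
P3    60   75
P4    40   50

[Figure: application of processes P1 to P4 with messages m1, m2, m3 on nodes N1 and N2; successive design transformations applied by the tabu search. The figure is annotated with the value 1.]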
Experimental Results

Schedulability improvement under resource constraints

[Plot: average % deviation from MRX (y-axis, 0 to 100) against number of processes (x-axis, up to 100), with curves for mapping and replication (MR), mapping and re-execution (MX), and mapping and policy assignment (MRX). Callout values 80 and 20 appear on the plot.]

• Case study
• Vehicle cruise controller
• MRX: schedulable fault-tolerant application with 65% overhead
Contributions and Message

• Contributions
• Combined re-execution and replication
• Optimization algorithms for fault-tolerance policy assignment
• Efficient contingency schedule generation

Message: Optimization of fault-tolerance policy assignment needed for cost-effective fault tolerance