Lifetime Reliability-Aware Task Allocation and Scheduling for MPSoC Platforms
Lin Huang, Feng Yuan and Qiang Xu
Reliable Computing Laboratory, Department of Computer Science & Engineering, The Chinese University of Hong Kong
DATE’09
Lifetime Reliability of Embedded Multiprocessor Platform
• Multiprocessor system-on-a-chip (MPSoC)
• Platform-based design
• Hardware / software co-synthesis
• Reliability issue
• IC product wear-out lifetime reliability threats
• Time-dependent dielectric breakdown (TDDB), electromigration (EM), stress migration (SM), negative bias temperature instability (NBTI)
• Soft errors
Prior Work
• Prior work in reliability-driven task allocation and scheduling
• Constant failure rate
• Limitation of thermal-aware task scheduling
• Might improve the system’s lifetime reliability implicitly
• Not readily applicable, especially for heterogeneous MPSoC
Problem Motivation Example
[Figure: MPSoC platform with two processors, P1 and P2]
• Electromigration
• Suppose […], and all other parameters are the same
• P1 ages much faster than P2, dominating the MPSoC lifetime
Problem Formulation
[Figure: binding and scheduling maps a task graph (T0 to T4) onto an MPSoC platform (P1, P2), giving a periodical schedule with T2, T4 on P1 and T0, T1, T3 on P2]
• Task allocation and scheduling
• Output
• Aim: to maximize the expected service life (mean time to failure, MTTF) of the MPSoC system under the performance constraint
Lifetime Reliability Estimation
[Figure: temperature variation example]
• Electromigration
• Denote by $R(t)$ the reliability of a single processor at time $t$
• Expected service life: $\mathrm{MTTF} = \int_0^\infty R(t)\,dt$
• Weibull distribution; the slide annotates its parameters as "computed by existing hard error models" and "reflect some important factors (e.g., architecture properties)"
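The link between a Weibull reliability function and the expected service life can be checked numerically. The sketch below assumes the common two-parameter form R(t) = exp(-(t/alpha)^beta); the names `alpha` (scale) and `beta` (shape) and the sample values are illustrative assumptions, not values from the slides.

```python
import math

def weibull_reliability(t, alpha, beta):
    """R(t) = exp(-(t / alpha) ** beta) for a two-parameter Weibull."""
    return math.exp(-((t / alpha) ** beta))

def mttf_closed_form(alpha, beta):
    """MTTF = integral of R(t) from 0 to infinity = alpha * Gamma(1 + 1/beta)."""
    return alpha * math.gamma(1.0 + 1.0 / beta)

def mttf_numeric(alpha, beta, steps=20_000):
    """Trapezoidal integration of R(t) over [0, 10 * alpha]."""
    upper = 10.0 * alpha
    dt = upper / steps
    total = 0.5 * (weibull_reliability(0.0, alpha, beta)
                   + weibull_reliability(upper, alpha, beta))
    for i in range(1, steps):
        total += weibull_reliability(i * dt, alpha, beta)
    return total * dt

# Hypothetical parameters: scale of 10 (e.g. years) and shape 2.
closed = mttf_closed_form(10.0, 2.0)
numeric = mttf_numeric(10.0, 2.0)
```

Both routes should agree closely, which is why the closed form is convenient inside an optimization loop.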
Main Approach – Simulated Annealing
[Figure: periodical schedule with T2, T4 on P1 and T0, T1, T3 on P2]
• Solution representation
• (schedule order sequence; resource assignment sequence)
• For example, (0, 1, 3, 2, 4; P2, P2, P2, P1, P1)
• Schedule order sequence: respects the partial order defined by the task graph
• Every solution corresponds to a feasible schedule
• Schedule reconstruction
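Schedule reconstruction from a (schedule order; resource assignment) pair can be sketched as simple list scheduling. The slides do not give the procedure, so this is one plausible reading: each task starts as soon as its predecessors have finished and its processor is free. The precedence edges and execution times are hypothetical, and communication delays are ignored.

```python
def reconstruct_schedule(order, assignment, exec_time, preds):
    """Build start/finish times from a solution pair.

    order      -- task ids in schedule order (must respect precedence)
    assignment -- assignment[i] is the processor of order[i]
    exec_time  -- exec_time[task] is the task's execution time
    preds      -- preds[task] lists tasks that must finish first
    """
    proc_free = {}   # processor -> time at which it becomes idle
    finish = {}      # task -> finish time
    schedule = {}    # task -> (processor, start, finish)
    for task, proc in zip(order, assignment):
        ready = max((finish[p] for p in preds.get(task, [])), default=0)
        start = max(ready, proc_free.get(proc, 0))
        finish[task] = start + exec_time[task]
        proc_free[proc] = finish[task]
        schedule[task] = (proc, start, finish[task])
    return schedule

# Solution pair from the slide; the graph and times are assumptions.
order = [0, 1, 3, 2, 4]
assignment = ["P2", "P2", "P2", "P1", "P1"]
exec_time = {0: 2, 1: 3, 2: 4, 3: 1, 4: 2}
preds = {2: [0], 3: [0, 1], 4: [1]}
schedule = reconstruct_schedule(order, assignment, exec_time, preds)
makespan = max(f for _, _, f in schedule.values())
```

Because tasks are placed in an order that already respects precedence, every such pair yields a feasible schedule, matching the slide's claim.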
Main Approach – Simulated Annealing
• Transforms of directed acyclic graph
• Expanded task graph
• Undirected complement graph
• Lemma: Given a valid schedule order, swapping two adjacent nodes leads to another valid schedule order, provided there is an edge between these two nodes in the complement graph
[Figure: task graph, expanded task graph, and complement graph over tasks T0 to T4]
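The two graph transforms can be sketched as follows, under the assumption that the expanded task graph is the transitive closure of the task graph and that the complement graph joins every pair of tasks with no path between them in either direction. The slides do not spell out these definitions, and the precedence edges used here are hypothetical.

```python
from itertools import combinations

def transitive_closure(n, edges):
    """reach[u][v] is True when a directed path leads from u to v."""
    reach = [[False] * n for _ in range(n)]
    for u, v in edges:
        reach[u][v] = True
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if reach[i][k] and reach[k][j]:
                    reach[i][j] = True
    return reach

def complement_graph(n, edges):
    """Undirected edges between tasks that are unordered by precedence."""
    reach = transitive_closure(n, edges)
    return {frozenset((u, v)) for u, v in combinations(range(n), 2)
            if not reach[u][v] and not reach[v][u]}

# Hypothetical task graph: T0->T2, T0->T3, T1->T3, T1->T4.
edges = [(0, 2), (0, 3), (1, 3), (1, 4)]
comp = complement_graph(5, edges)
```

An edge in `comp` means the two tasks are independent, so exchanging them when they sit next to each other in the schedule order cannot violate precedence.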
Main Approach – Simulated Annealing
• Theorem: Starting from a valid schedule order, we can reach any other valid schedule order after a finite number of adjacent swaps
• Example schedule orders shown on the slide: (3, 2, 0, 4, 1), (0, 2, 3, 4, 1), (2, 0, 3, 4, 1), (2, 0, 3, 1, 4)
[Figure: task graph, expanded task graph, and complement graph over tasks T0 to T4]
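The reachability claim can be checked by brute force on a small instance: a breadth-first search that only applies validity-preserving adjacent swaps should visit every valid schedule order. The task graph below is a hypothetical five-task example, not the one drawn on the slide.

```python
from collections import deque
from itertools import permutations

EDGES = [(0, 2), (0, 3), (1, 3), (1, 4)]  # hypothetical precedence edges

def is_valid(order, edges=EDGES):
    """A schedule order is valid if every edge u->v has u before v."""
    pos = {task: i for i, task in enumerate(order)}
    return all(pos[u] < pos[v] for u, v in edges)

def reachable_orders(start, edges=EDGES):
    """All valid orders reachable from start by adjacent swaps."""
    seen = {tuple(start)}
    queue = deque([tuple(start)])
    while queue:
        cur = queue.popleft()
        for i in range(len(cur) - 1):
            nxt = list(cur)
            nxt[i], nxt[i + 1] = nxt[i + 1], nxt[i]
            nxt = tuple(nxt)
            if nxt not in seen and is_valid(nxt, edges):
                seen.add(nxt)
                queue.append(nxt)
    return seen

all_valid = {p for p in permutations(range(5)) if is_valid(p)}
reached = reachable_orders((0, 1, 3, 2, 4))
```

On this instance `reached` equals `all_valid`, so a search built on adjacent swaps does not cut off any part of the space of valid orders.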
Main Approach – Simulated Annealing
• Moves
• M1: Swap two adjacent nodes in both the schedule order sequence and the resource assignment sequence, if there is an edge between these two nodes in the complement graph
• M2: Swap two adjacent nodes in the resource assignment sequence
• M3: Change the resource assignment of a task
[Figure: task graph, expanded task graph, and complement graph over tasks T0 to T4]
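The three moves can be written as small functions over the solution pair. This sketch assumes the assignment sequence is position-aligned with the order sequence (entry i is the processor of the i-th scheduled task), which is consistent with the example (0, 1, 3, 2, 4; P2, P2, P2, P1, P1). The complement-graph edges come from a hypothetical task graph.

```python
# Complement-graph edges for a hypothetical task graph with precedence
# edges T0->T2, T0->T3, T1->T3, T1->T4 (an assumption for illustration).
COMP = {frozenset(p) for p in [(0, 1), (0, 4), (1, 2), (2, 3), (2, 4), (3, 4)]}

def move_m1(order, assign, i, comp=COMP):
    """Swap positions i and i+1 in both sequences; None if not allowed."""
    if frozenset((order[i], order[i + 1])) not in comp:
        return None
    order, assign = list(order), list(assign)
    order[i], order[i + 1] = order[i + 1], order[i]
    assign[i], assign[i + 1] = assign[i + 1], assign[i]
    return order, assign

def move_m2(order, assign, i):
    """Swap positions i and i+1 in the assignment sequence only."""
    assign = list(assign)
    assign[i], assign[i + 1] = assign[i + 1], assign[i]
    return list(order), assign

def move_m3(order, assign, i, new_proc):
    """Reassign the task at position i to new_proc."""
    assign = list(assign)
    assign[i] = new_proc
    return list(order), assign

order = [0, 1, 3, 2, 4]
assign = ["P2", "P2", "P2", "P1", "P1"]
```

Under this reading, M1 changes only the order (each task keeps its processor), while M2 and M3 change only the binding.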
Main Approach – Simulated Annealing
• Three moves are defined, so that
• Starting from a valid schedule order A, we can reach any other valid schedule order B after a finite number of adjacent swaps
• Cost function
• First term guarantees that a schedule meets all tasks’ deadlines (annotated on the slide as "significantly large")
• Second term indicates the system lifetime
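The exact cost expression is not recoverable from the slide text, so the following is only a sketch of a two-term cost in the same spirit: a heavily weighted deadline-violation term plus a term that rewards a longer lifetime. `PENALTY`, `cost`, and `accept` are hypothetical names, and the acceptance rule is the standard Metropolis criterion rather than anything stated in the slides.

```python
import math
import random

PENALTY = 1e6  # hypothetical "significantly large" weight

def cost(finish_times, deadlines, mttf):
    """Lower is better: deadline violations dominate, then lifetime."""
    violation = sum(max(0.0, finish_times[t] - deadlines[t])
                    for t in deadlines)
    return PENALTY * violation - mttf

def accept(old_cost, new_cost, temperature, rng=random):
    """Metropolis rule: always take improvements, sometimes take worse."""
    if new_cost <= old_cost:
        return True
    return rng.random() < math.exp((old_cost - new_cost) / temperature)

feasible = cost({0: 5.0}, {0: 6.0}, mttf=10.0)
infeasible = cost({0: 8.0}, {0: 6.0}, mttf=10.0)
```

With a large enough weight, any schedule that misses a deadline costs more than any schedule that meets all of them, so the lifetime term only decides among feasible schedules.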
Main Approach – Simulated Annealing
• Key problem: computation time
• Source of time overhead
• Run the temperature simulator EVERY TIME we reach a new solution
• Simulator is called 3×10^5 times
• Every time, trace the temperature variation for the entire service life
• In the range of years
• Accurate calculation requires a fine-grained variation trace file
• Significant temperature variation within a very short time
• An efficient cost computation strategy is essential!
[Table: SA parameters]
Revisit System Lifetime Reliability Estimation – Speedup I
• It would be better to compute MTTF by tracing the temperature variation of only one period
Revisit System Lifetime Reliability Estimation – Speedup I
• A subdivision of time […]
Revisit System Lifetime Reliability Estimation – Speedup I
• Given […]
• Aging effect in one period: […]
• Property: the aging effect in one period does not vary from period to period
• This property enables us to trace the temperature variation of only ONE period
Revisit System Lifetime Reliability Estimation – Speedup I
• The expected service life of one processor is […]
• Provided there are no redundant processors in the system, the expected service life of the entire system is […]
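The formulas on this slide did not survive extraction, so the sketch below is a reconstruction under stated assumptions rather than the authors' expressions. It assumes each processor follows a Weibull law whose accumulated aging grows by the same amount `A` in every period of length `period`, that all processors share the shape parameter `beta`, and that the system without redundancy fails when its first processor fails.

```python
import math

def processor_mttf(aging_per_period, period, beta):
    """MTTF of one processor, assuming R(t) = exp(-(A * t / period) ** beta).

    The effective Weibull scale parameter is period / A.
    """
    return (period / aging_per_period) * math.gamma(1.0 + 1.0 / beta)

def system_mttf(aging_list, period, beta):
    """MTTF of a series system: R_sys(t) is the product of all R_j(t).

    The product is again a Weibull with rate
    (sum_j (A_j / period) ** beta) ** (1 / beta).
    """
    rate = sum((a / period) ** beta for a in aging_list) ** (1.0 / beta)
    return math.gamma(1.0 + 1.0 / beta) / rate

# Hypothetical aging effects per period for two processors.
p1 = processor_mttf(0.10, period=1.0, beta=2.0)
p2 = processor_mttf(0.05, period=1.0, beta=2.0)
both = system_mttf([0.10, 0.05], period=1.0, beta=2.0)
```

Under these assumptions the system lifetime is shorter than that of its fastest-aging processor, which matches the motivation example where one processor dominates the MPSoC lifetime.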
Revisit System Lifetime Reliability Estimation – Speedup II
• Given […]
• Instead of computing the aging effect in every period, we propose to compute the aging effect of several periods at one time
Revisit System Lifetime Reliability Estimation – Speedup III
• Accurate calculation requires setting the length of the time intervals to a very small value
• Use the steady temperature rather than the accurate temporal temperature
[Figure: temperature variation example for a task schedule]
Revisit System Lifetime Reliability Estimation – Speedup IV
• Otherwise, the temperature simulator must be run every time we reach a new solution
• There can be at most […] kinds of processor usage combinations in task schedules
• Given […] = 3 and […] = 4, only 255 pre-computations are needed, each for a steady temperature
• Estimate processors’ temperature for the various processor usage combinations in the pre-calculation phase only
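The figure of 255 can be reproduced by enumeration if the two lost symbols are read as three task power-consumption types and four processors: each processor is either idle or runs a task of one of the three types, and the all-idle combination is excluded. That reading is an assumption; it is chosen because (3 + 1)^4 - 1 = 255.

```python
from itertools import product

def usage_combinations(num_processors, num_power_types):
    """Enumerate processor usage combinations, excluding all-idle.

    Each entry is a tuple with one state per processor:
    0 means idle, 1..num_power_types is the power type of its task.
    """
    states = range(num_power_types + 1)
    return [combo for combo in product(states, repeat=num_processors)
            if any(combo)]

combos = usage_combinations(num_processors=4, num_power_types=3)
count = len(combos)
```

Each combination would need one steady-temperature estimate, after which the annealing loop can use table lookups instead of simulator runs.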
Revisit System Lifetime Reliability Estimation – Speedup IV
• Time slot […]
• The set of processors in use
• The power consumption of the tasks running on these processors
• Categorize the tasks into […] types according to power consumption
• E.g., […], where the slide labels the processor index under usage and the task power consumption type
Revisit System Lifetime Reliability Estimation – Speedup IV
• Pre-calculate the steady temperature of processor […] in time slot […]
• The aging effect in unit time in this case is therefore […]
• The aging effect of P1 in this schedule in a period is […]
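The per-period aging computation can be sketched as a sum over time slots of slot length times an aging rate looked up from pre-computed steady temperatures. The aging rate below uses the Arrhenius temperature dependence found in Black's equation for electromigration; the activation energy, the temperature table, and all names are hypothetical, and the slides' own expressions are not recoverable.

```python
import math

BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K
ACTIVATION_EV = 0.9      # hypothetical activation energy in eV

def aging_rate(temp_kelvin, scale=1.0):
    """Aging per unit time, proportional to exp(-Ea / (k * T))."""
    return scale * math.exp(-ACTIVATION_EV / (BOLTZMANN_EV * temp_kelvin))

def aging_in_period(slots, steady_temp, proc):
    """Aging effect of processor `proc` over one schedule period.

    slots       -- list of (length, usage); usage holds one power type
                   per processor, with 0 meaning idle
    steady_temp -- pre-computed table: usage -> per-processor temps (K)
    """
    return sum(length * aging_rate(steady_temp[usage][proc])
               for length, usage in slots)

# Hypothetical two-processor schedule and pre-computed temperatures.
steady_temp = {(1, 0): (350.0, 320.0),
               (1, 1): (360.0, 355.0),
               (0, 1): (325.0, 345.0)}
slots = [(2.0, (1, 0)), (3.0, (1, 1)), (1.0, (0, 1))]
aging_p1 = aging_in_period(slots, steady_temp, proc=0)
aging_p2 = aging_in_period(slots, steady_temp, proc=1)
```

No thermal simulation happens inside `aging_in_period`; it only reads the table, which is the point of moving the simulator calls into the pre-calculation phase.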
Revisit System Lifetime Reliability Estimation – Summary
A summary of speedup techniques
• Rewrite the MTTF expression in terms of the aging effect in one period
• Compute the aging effect of several periods at one time
• Approximate the aging effect in one period based on the task changes and using steady temperature
• Call the temperature estimation simulator in the pre-calculation phase only
• The time spent on pre-calculation can be reduced even further
Experimental Setup
• Random task graphs generated by TGFF
• Task numbers range from 20 to 260
• Hypothetical MPSoC platforms
• Processor core numbers range from 2 to 8
• Homogeneous / heterogeneous
• Take the electromigration model in [Goel-IEEEPress07] as an example
• Note that our model also applies to other failure mechanisms
• Compare our method with a thermal-aware task scheduling algorithm proposed in [Xie-JVLSISP06]
Accuracy
• Comparison between the approximated MTTF and the accurate value
Lifetime Reliability of Various Platforms with Various Task Graphs
• Δ: Difference ratio between the MTTF of simulated annealing and that of the thermal-aware method
• DR: Deadline relaxation
Efficiency
• The simulated annealing process requires 50-200s of CPU time on an Intel(R) Core(TM) 2 CPU 2.13GHz for each case
• 4 processors, 49 tasks: 84s
• 8 processors, 101 tasks: 158s
• The CPU time spent on pre-calculation ranges from 3s to 160s
Conclusion
• Technology advancement has brought an adverse impact on the lifetime reliability of MPSoC embedded systems
• Prior work on task allocation and scheduling does not explicitly take wear-out failure into account
• We propose an analytical model to estimate the lifetime reliability of multiprocessor platforms under periodical tasks
• We present a novel lifetime reliability-aware algorithm based on the simulated annealing technique
• We propose several speedup techniques to simplify the design space exploration process with satisfactory solution quality
• Experimental results demonstrate the effectiveness of the approach
Thank you for your attention!