Generalized Reliability-Oriented Energy Management for Real-time Embedded Applications
Baoxian Zhao, Hakan Aydin, Dakai Zhu
Computer Science Department, George Mason University
Computer Science Department, University of Texas at San Antonio
DAC 2011
Sponsored by NSF CNS-1016855, CNS-1016974 and CAREER Awards CNS-0546244, CNS-0953005
Introduction and Motivation
• Dynamic Voltage and Frequency Scaling (DVFS)
  • Adjusts CPU voltage and frequency on the fly to save energy
  • Increases task response times
• Transient faults / soft errors
  • Increasingly common with technology scaling and reduced design margins
  • Reliability: probability of completing the task successfully
• DVFS and transient faults
  • Execution at low frequency/voltage levels has a significant negative effect on system reliability
  • Due to the exponentially increased transient fault rates at low supply voltage and frequency levels [Zhu et al., ICCAD’04]
  • Due to the increased execution time of the task
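The two effects named above (a higher fault rate and a longer execution time at low frequency) can be illustrated numerically. The slide gives no formula, so what follows is a minimal sketch that assumes the exponential fault-rate model commonly used in this line of work, λ(f) = λ0 · 10^(d(1 − f)/(1 − fmin)), with Poisson fault arrivals. The values of λ0, fmin and fmax are taken from the simulation settings; the sensitivity exponent `D_SENS` and the function names are hypothetical.

```python
import math

LAMBDA0 = 1e-6   # average fault rate at f_max (from the simulation settings)
F_MIN = 0.1      # normalized minimum frequency
F_MAX = 1.0      # normalized maximum frequency
D_SENS = 2.0     # assumed sensitivity exponent; not given in the slides


def fault_rate(f):
    """Fault rate at normalized frequency f; grows exponentially as f drops."""
    return LAMBDA0 * 10 ** (D_SENS * (F_MAX - f) / (F_MAX - F_MIN))


def reliability(wcet, f):
    """Probability of zero faults while a task runs at frequency f.

    wcet is the worst-case execution time at f_max, so the task
    occupies the CPU for wcet / f time units at frequency f.
    """
    return math.exp(-fault_rate(f) * wcet / f)


if __name__ == "__main__":
    for f in (1.0, 0.5, 0.1):
        print(f, fault_rate(f), reliability(10.0, f))
```

Under these assumptions, halving the frequency both raises the fault rate and doubles the exposure time, so the reliability loss compounds.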
Existing Solutions
• Reliability-Aware Power Management (RA-PM) [Zhu and Aydin, ICCAD’06, RTAS’07, IEEE TC’09]
  • Use DVFS only for a subset of tasks; no DVFS for others
  • For every scaled task, schedule a separate recovery task
  • Preserve the original reliability of the task set
• Shared Recovery Technique [Zhao et al., ICCAD’09]
  • Single recovery task shared by all tasks
• This Work: Generalized Shared Recovery (GSHR) Technique
  • Targets any reliability level set by the designer
  • May be lower or much higher than the original reliability
  • Use multiple shared recovery tasks as appropriate
Generalized Shared Recovery (GSHR)
• Energy-Optimal Reliability Configuration Problem
  • Determine optimal frequency assignments f1, f2, …, fn and the optimal number (k) of recoveries to:
    • Minimize: Energy
    • Subject to: Reliability Constraint, Deadline Constraint
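The problem statement above can be written compactly as a constrained optimization. The slide does not define the energy, reliability, or timing functions, so the symbols E, R, T, R_target and D below are placeholder names introduced only for illustration.

```latex
\begin{aligned}
\min_{f_1,\dots,f_n,\;k} \quad & E(f_1,\dots,f_n,k) \\
\text{subject to} \quad
  & R(f_1,\dots,f_n,k) \;\ge\; R_{\text{target}}
    && \text{(reliability constraint)} \\
  & T(f_1,\dots,f_n,k) \;\le\; D
    && \text{(deadline constraint)} \\
  & f_i \in \{\text{available frequency levels}\}, \quad k \in \{0,1,2,\dots\}
\end{aligned}
```

Here E is the total energy, R the reliability achieved with k shared recoveries, T the worst-case completion time including the time reserved for recoveries, R_target the designer's reliability goal, and D the deadline.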
Our Solutions
• Uniform Frequency (UF)
  • Assign a single common frequency to all tasks to meet the deadline and reliability constraints
• Incremental Reliability Configuration Search (IRCS)
  • Iteratively scale down tasks one frequency level at a time by comparing their “energy/reliability ratios” (ERRs)
  • ERR is a utility measure giving the energy savings per unit of reliability degradation
• Compared against Exhaustive Search (OPT) and traditional RA-PM schemes
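The two heuristics above can be sketched in a few lines. This is one possible reading of the idea, not the authors' algorithm: the slides do not give the energy model, the exact reliability expression, or the level values, so the six `LEVELS`, the `P_STATIC` power term, the exponent `D_SENS`, the cubic dynamic power, and the simplification that recoveries run at fmax and never fail are all assumptions. Each of the k shared recoveries is assumed to reserve a slot as long as the largest task.

```python
import math

LEVELS = [1.0, 0.8, 0.6, 0.4, 0.2, 0.1]      # hypothetical six normalized levels
LAMBDA0, D_SENS, P_STATIC = 1e-6, 2.0, 0.1   # D_SENS and P_STATIC are assumed


def fail_prob(c, f):
    """Probability that a task with WCET c (at f_max) fails when run at f."""
    rate = LAMBDA0 * 10 ** (D_SENS * (1.0 - f) / (1.0 - LEVELS[-1]))
    return 1.0 - math.exp(-rate * c / f)


def energy(wcets, lv):
    """Assumed model: power = P_STATIC + f^3, execution time = c / f."""
    return sum((P_STATIC + LEVELS[l] ** 3) * c / LEVELS[l]
               for c, l in zip(wcets, lv))


def reliability(wcets, lv, k):
    """P(at most k tasks fail), assuming recoveries themselves never fail."""
    dist = [1.0] + [0.0] * k   # dist[j] = P(exactly j failures so far)
    for c, l in zip(wcets, lv):
        p = fail_prob(c, LEVELS[l])
        dist = [dist[j] * (1 - p) + (dist[j - 1] * p if j else 0.0)
                for j in range(k + 1)]
    return sum(dist)


def feasible(wcets, lv, k, deadline, r_target):
    """Deadline check reserves k recovery slots sized to the largest task."""
    time = sum(c / LEVELS[l] for c, l in zip(wcets, lv)) + k * max(wcets)
    return time <= deadline and reliability(wcets, lv, k) >= r_target


def greedy_for_k(wcets, k, deadline, r_target):
    """Lower one task by one level per step, picking the largest ERR."""
    lv = [0] * len(wcets)      # index 0 = f_max for every task
    if not feasible(wcets, lv, k, deadline, r_target):
        return None
    while True:
        best, best_err = None, 0.0
        e0, r0 = energy(wcets, lv), reliability(wcets, lv, k)
        for i in range(len(wcets)):
            if lv[i] + 1 >= len(LEVELS):
                continue
            trial = lv[:i] + [lv[i] + 1] + lv[i + 1:]
            if not feasible(wcets, trial, k, deadline, r_target):
                continue
            saving = e0 - energy(wcets, trial)
            loss = max(r0 - reliability(wcets, trial, k), 1e-18)
            if saving > 0 and saving / loss > best_err:
                best, best_err = trial, saving / loss
        if best is None:
            return lv
        lv = best


def ircs_sketch(wcets, deadline, r_target, k_max=3):
    """Try each recovery count k and keep the lowest-energy configuration."""
    best = None
    for k in range(k_max + 1):
        lv = greedy_for_k(wcets, k, deadline, r_target)
        if lv is not None and (best is None or energy(wcets, lv) < best[0]):
            best = (energy(wcets, lv), k, [LEVELS[l] for l in lv])
    return best


def uniform_frequency(wcets, deadline, r_target, k_max=3):
    """UF baseline: one common level for all tasks, best feasible (k, level)."""
    n = len(wcets)
    options = [(energy(wcets, [l] * n), k, LEVELS[l])
               for k in range(k_max + 1) for l in range(len(LEVELS))
               if feasible(wcets, [l] * n, k, deadline, r_target)]
    return min(options) if options else None
```

A call such as `ircs_sketch([10, 20, 30], deadline=200, r_target=reliability([10, 20, 30], [0, 0, 0], 0))` asks for at least the original (unscaled, no-recovery) reliability and returns a tuple of energy, number of recoveries, and per-task frequencies. The `saving > 0` test stops scaling once the assumed static power makes a lower level cost more energy.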
Simulation Results
• The six discrete frequency levels are modeled after the Intel XScale processor
• Transient faults follow a Poisson distribution: λ0 = 10^-6, with normalized frequencies fmax = 1 and fmin = 0.1
Conclusions
• GSHR: a general framework for real-time embedded systems
  • Achieves arbitrary reliability levels with minimum energy consumption
  • Recovery tasks shared by all tasks as needed
• Ultimate aim: optimal co-management of energy and reliability
Please see our poster for additional details!