Mitigating Soft Errors in Embedded Systems through Selective Data Protection using Partially Protected Caches (PPC)
Radiation-Induced Soft Errors

Soft Errors
• Radiation-induced soft errors are transient, non-destructive faults in electronic devices
• SER (Soft Error Rate)
  • FIT (Failures in Time)
  • 1K ~ 100K FIT @ 0.18 µm
• $\mathrm{SER} \propto N_{\mathrm{flux}} \times CS \times \exp\{-Q_{\mathrm{critical}} / Q_s\}$, where $Q_{\mathrm{critical}} = C \times V$
[Figure: radiation (alpha, neutron, etc.) striking a transistor between source and drain]
[Figure labels: 28 years MTTF, 1 month MTTF, 17 hours MTTF]

Problem Statement
• Reduce failures due to soft errors in caches
• Minimize power and performance overheads

Selective Data Protection

Observation
• Suppose you could protect pages from soft errors independently
[Figure: application data memory of N KB divided into N x 1 KB pages; panels: No Protection, Protection, Selective Protection; axis: Number of Failures]

Partially Protected Caches

Architecture
[Figure: processor pipeline, Unprotected Main Cache, Protected Mini Cache with Hamming Code (32,6), memory controller, memory]

Page Mapping
[Figure: FC and FNC pages mapped through the memory controller]

Experiments

Experimental Framework
[Figure: application (MiBench etc.), compiler (gcc), executable, page mapping, cache simulator (SimpleScalar), cache configurations (SAFE, UNSAFE, PPC), CACTI, synthesis (Synopsys)]
• Accelerated soft error injection
  • Random error injection
  • 1000 simulations
• Report: failure rate, runtime, energy
• Failure
  • application crashes
  • infinite loop
  • broken header
  • wrong output, etc.
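The FIT and MTTF figures quoted under Soft Errors are two views of the same quantity. The sketch below relates them, assuming a constant failure rate and the usual definition of one FIT as one failure per 10^9 hours of operation. The function names are illustrative and do not come from the poster.

```c
#include <assert.h>

/* One FIT is one failure per 10^9 hours of operation. */
#define HOURS_PER_BILLION 1.0e9
#define HOURS_PER_YEAR    8760.0   /* 365 days, ignoring leap years */

/* Mean time to failure in hours for a failure rate given in FIT
 * (assumes a constant failure rate). */
double fit_to_mttf_hours(double fit)
{
    assert(fit > 0.0);
    return HOURS_PER_BILLION / fit;
}

/* Failure rate in FIT for a mean time to failure given in hours. */
double mttf_hours_to_fit(double mttf_hours)
{
    assert(mttf_hours > 0.0);
    return HOURS_PER_BILLION / mttf_hours;
}

/* Convenience conversion for reporting MTTF in years. */
double hours_to_years(double hours)
{
    return hours / HOURS_PER_YEAR;
}
```

Under these assumptions, the quoted range of 1K to 100K FIT corresponds to an MTTF between roughly 114 years and roughly 1.14 years.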
Soft Errors on Increase
• High integration
• Process technology
• Voltage scaling
• Latitude and altitude

Soft Errors in Caches
• Soft errors are more important in memory
  • No masking in memory
• Redundancy techniques are popular for memory
  • Not applicable to caches, which are sensitive to performance
• Caches are most vulnerable to soft errors
  • Occupy the majority of the chip area
  • Intel Itanium II (0.18 µm): more than 50% of the area

PPC (Partially Protected Caches)
• Unprotected Main Cache
• Protected Mini Cache
  • Smaller than the Unprotected Main Cache
  • Protection technique is orthogonal to PPC
  • Performance and power overheads

Experimental Results
• Only a few pages are sensitive to soft errors
  • Not all pages are equally important
• Failure rate of PPC is close to that of Safe (comparable failure rate)
• Loss in Quality of Service is not a failure
[Figure caption: 32% Reduction]

Compiler Software Support – Data Partitioning
• Map FNC data into Unprotected Main Cache
• Map FC data into Protected Mini Cache
• Failure Critical (FC) Data
  • loop bounds, loop iterators, branch decision variables, etc.
  • Soft errors on FC data may cause failures
• Non Failure Critical (FNC) Data
  • multimedia data (e.g., image pixel values)
  • Only loss in Quality of Service

ECC-based Protection is expensive
• High overheads in terms of power, area, and performance
• e.g., ECC protection: Hamming Code (32,6)
  • Performance: by up to 95%
  • Energy: by up to 22%
  • Area: by more than 18%
• Provide the protection to only FC data in a PPC
[Figure: ECC coding and decoding around a protected cache, next to an unprotected cache]
• Unprotected cache
  • No protection against soft errors
  • High failure rate

Multimedia Applications
• Simple Data Partitioning
  • Multimedia Data: FNC
  • Other Data: FC
• On average, 52% of pages are FNC
• PPC has 32% runtime reduction from Safe
• PPC has 29% energy reduction from Safe

sample code (FNC, FC)
    /* FC: condition and loop (branch decision, loop iterator and bound) */
    /* FNC: MM (multimedia data) */
    if (condition) {
        for (loop = 1; loop < 64; loop++) {
            local = MM[loop] / (2 * constant);
            MM[loop] = min(127, max(-127, MM[loop]));
        }
    }

K. Lee (1), A. Shrivastava (2), I. Issenin (1), N. Dutt (1), N. Venkatasubramanian (1)
(1) Donald Bren School of ICS, University of California, Irvine, CA 92697, {kyoungwl, isse, dutt, nalini}@ics.uci.edu
(2) School of Computing and Informatics, Arizona State University, Tempe, AZ 85281, Aviral.Shrivastava@asu.edu
For more details: http://forge.ics.uci.edu/
June 2007
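The data partitioning rule (FNC data to the Unprotected Main Cache, FC data to the Protected Mini Cache) can be illustrated with a small software model of a page attribute table. This is only a sketch under stated assumptions, not the authors' hardware or compiler implementation. The 1 KB page size comes from the poster. The table, the 64-page memory size, and every name in the code are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 1024u   /* 1 KB pages, as on the poster */
#define NUM_PAGES 64u     /* hypothetical data memory size, in pages */

enum cache_target { UNPROTECTED_MAIN_CACHE, PROTECTED_MINI_CACHE };

/* Hypothetical page attribute table: is_fnc[p] is true when page p
 * holds only non failure critical (FNC) data. */
struct page_map {
    bool is_fnc[NUM_PAGES];
};

/* Start conservatively: every page is treated as failure critical (FC). */
void page_map_init(struct page_map *map)
{
    assert(map);
    for (uint32_t p = 0; p < NUM_PAGES; p++)
        map->is_fnc[p] = false;
}

/* Mark as FNC only the pages lying entirely inside [addr, addr + size),
 * so a page shared with other (possibly FC) data stays protected. */
void page_map_mark_fnc(struct page_map *map, uint32_t addr, uint32_t size)
{
    assert(map);
    uint64_t first = ((uint64_t)addr + PAGE_SIZE - 1u) / PAGE_SIZE; /* round up */
    uint64_t end = ((uint64_t)addr + size) / PAGE_SIZE;             /* round down, exclusive */
    for (uint64_t p = first; p < end && p < NUM_PAGES; p++)
        map->is_fnc[p] = true;
}

/* Decide which cache serves an access: FNC pages go to the unprotected
 * main cache, everything else (including unknown pages) to the mini cache. */
enum cache_target route_access(const struct page_map *map, uint32_t addr)
{
    assert(map);
    uint32_t page = addr / PAGE_SIZE;
    if (page < NUM_PAGES && map->is_fnc[page])
        return UNPROTECTED_MAIN_CACHE;
    return PROTECTED_MINI_CACHE;
}
```

In this model a multimedia buffer such as `MM` would be registered with `page_map_mark_fnc`, while loop counters and branch variables stay on pages that default to FC. Defaulting to FC is a deliberately conservative choice: a misclassified page costs performance and energy rather than correctness.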