Mahdi Fazeli, Seyed Ghassem Miremadi, Hossein Asadi, Seyed Nematollah Ahmadian. A Fast and Accurate Multi-Cycle Soft Error Rate Estimation Approach to Resilient Embedded Systems Design. Presenter: Saman Aliari, University of Illinois at Urbana-Champaign.
Mahdi Fazeli, Seyed Ghassem Miremadi, Hossein Asadi, Seyed Nematollah Ahmadian • A Fast and Accurate Multi-Cycle Soft Error Rate Estimation Approach to Resilient Embedded Systems Design • Presenter: Saman Aliari, University of Illinois at Urbana-Champaign • Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
Outline • Soft Errors • SER Modeling in Multi-Cycle Operation • SER Modeling in Single-Cycle Operation • Proposed SER Modeling in Multi-Cycle Operation • Tool Overview • Experimental Results and Discussions • Conclusions
What is a soft error? [Figure: an energetic particle strikes a circuit node and flips a stored bit] • Transient faults • Due to radiation events • Bit flips: 1 → 0 or 0 → 1 • Alpha particles or neutrons • Memory, flip-flops, combinational logic
Evidence of Particle Strikes • 2000 [Forbes Magazine’00] • SUN Enterprise servers crash, due to cache problem • 2001 [ITRS’01] • Soft errors as a major issue in chip design • 2003 [EE Times’04] • Cisco router failures, due to soft errors • 2004 [Xilinx.com] • Xilinx FPGAs highly sensitive to soft errors • 2005 [Selse.org] • Soft error workshop (70% industry attendees) • 2011 [ZeroSoft’06] • Expected 70% of chips to fail in a year
Multi-Cycle Soft Error Propagation First Cycle: The SET does not propagate to the Primary Output (PO) Second Cycle: The error propagates to the Primary Output (PO)
SER Modeling in Single Cycle • SER = Nominal FIT × Logic Derating × Timing Derating × Electrical Derating • Nominal FIT: • Occurrence rate of cosmic rays at error site • Computed once for library characterization • Logical Derating • Timing Derating • Electrical Derating [Figure: example circuit with gates A through E feeding a flip-flop (FF)]
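As an illustration of how the nominal FIT and the three derating factors combine into a single-cycle SER, here is a minimal Python sketch. The function name `soft_error_rate` and all numeric values are hypothetical and chosen only for illustration; they are not taken from the slides.

```python
def soft_error_rate(nominal_fit, logic_derating, timing_derating, electrical_derating):
    """Single-cycle SER as the product of the nominal FIT and three derating factors.

    Each derating factor is a probability in [0, 1]; the result is in FIT.
    """
    factors = {
        "logic_derating": logic_derating,
        "timing_derating": timing_derating,
        "electrical_derating": electrical_derating,
    }
    for name, value in factors.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a probability in [0, 1], got {value}")
    return nominal_fit * logic_derating * timing_derating * electrical_derating


# Hypothetical numbers: 100 FIT at the error site, derated by the three masking effects.
ser = soft_error_rate(100.0, 0.12, 0.5, 0.8)
print(f"Estimated SER: {ser:.2f} FIT")
```

Because every derating factor is at most 1, the estimated SER can never exceed the nominal FIT of the error site.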
Logical Derating Modeling • The Main Idea: • Traversing structural paths from the SEU site to POs and FFs • Using Signal Probabilities (SP) for off-path signals • SP_A: probability of gate “A” having logic value “1” • Effective techniques are available for SP computation • [Figure: error site A drives on-path signals D and E toward an FF; off-path signals B (SP = 0.2) and C (SP = 0.4)] • EPP: Error Propagation Probability • EPP(A→D) = SP_B = 0.2 • EPP(A→E) = EPP(A→D) × (1 − SP_C) = 0.2 × 0.6 = 0.12
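The EPP arithmetic on this slide can be reproduced with a short Python sketch. It assumes that D is an AND gate (the error passes only if off-path input B is 1) and E is an OR gate (the error passes only if off-path input C is 0); the gate types are inferred from the formulas, and the helper `epp_through_gate` is a hypothetical name.

```python
def epp_through_gate(epp_in, off_path_sps, gate):
    """Propagate an error probability through one on-path gate.

    off_path_sps holds the signal probabilities (probability of logic 1)
    of the gate's off-path inputs, assumed independent of each other.
    """
    p = epp_in
    for sp in off_path_sps:
        if gate in ("AND", "NAND"):
            p *= sp            # off-path input must be at the non-controlling value 1
        elif gate in ("OR", "NOR"):
            p *= 1.0 - sp      # off-path input must be at the non-controlling value 0
        else:
            raise ValueError(f"unsupported gate type: {gate}")
    return p


SP_B, SP_C = 0.2, 0.4
epp_a_d = epp_through_gate(1.0, [SP_B], "AND")      # error starts at A with probability 1
epp_a_e = epp_through_gate(epp_a_d, [SP_C], "OR")
print(round(epp_a_d, 4), round(epp_a_e, 4))
```

This simple product only holds while a single input of each gate carries the error; reconvergent paths need the four-valued rules.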
Propagation Rules: On-Path Gates • Reconvergent Paths • Error propagated to two or more inputs of a gate • Polarity of the propagated error matters! • Four logic values are needed to represent the state of each line • 0, 1: no error propagation (error masked) • a: error propagation with the same polarity as the error site • ā: error propagation with the opposite polarity to the error site • Pa(Ui), Pā(Ui), P1(Ui), P0(Ui) • Developed Error Propagation Probability (EPP) rules • For all logic gates
Propagation Rules • On-path gates: Pa(Ui) + Pā(Ui) + P1(Ui) + P0(Ui) = 1 • Off-path gates: P1(Ui) + P0(Ui) = 1
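To make the four-valued representation concrete, the following Python sketch derives one possible propagation rule for an AND gate, assuming the gate inputs are statistically independent. The slides do not list the per-gate rules, so this is a derivation from the four-value definitions rather than the authors' exact formulation; the names `and_gate_epp` and `off_path` are hypothetical.

```python
from math import prod  # available in Python 3.8+


def and_gate_epp(inputs):
    """Four-valued probabilities at the output of an AND gate.

    Each input is a dict with keys "1", "0", "a", "abar" summing to 1.
    Output is 1 only if all inputs are 1; it is a (or abar) if every input
    is in {1, a} (or {1, abar}) and at least one is erroneous; otherwise 0.
    """
    p1 = prod(x["1"] for x in inputs)
    pa = prod(x["1"] + x["a"] for x in inputs) - p1
    pabar = prod(x["1"] + x["abar"] for x in inputs) - p1
    p0 = 1.0 - (p1 + pa + pabar)
    return {"1": p1, "0": p0, "a": pa, "abar": pabar}


def off_path(sp):
    """Off-path signal: only values 0 and 1 are possible."""
    return {"1": sp, "0": 1.0 - sp, "a": 0.0, "abar": 0.0}


error_site = {"1": 0.0, "0": 0.0, "a": 1.0, "abar": 0.0}
inverted_error = {"1": 0.0, "0": 0.0, "a": 0.0, "abar": 1.0}

single_path = and_gate_epp([error_site, off_path(0.2)])
reconvergent = and_gate_epp([error_site, inverted_error])
print(single_path)
print(reconvergent)
```

In the reconvergent case the two polarities cancel (a AND ā is always 0), so the error is fully masked. This is why tracking polarity matters.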
Timing Derating Modeling • Find all possible propagated waveforms • Enhanced static timing analysis • Record all possible transitions at each reachable gate • Due to a glitch at the error site • How? • Create a glitch of width w • Represented by two events: (a,t), (ā,t+w) • For both positive and negative glitches • Inject the two events (a,t), (ā,t+w) at the error site • Find all events at the outputs of all on-path gates • Calculate the error propagation probabilities Pa, Pā for each event • Propagation continues until a PO or FF is reached • Error propagation probabilities for all possible waveforms are computed • For each waveform, the latching probability is computed from the following parameters: • S: setup time, H: hold time, W: glitch width, T: clock period
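The glitch representation and the latching step can be sketched in Python as follows. The latching expression used here, (W − (S + H)) / T clamped to [0, 1], is an assumption: it is a commonly used first-order latching-window model and may differ from the expression used by the authors. The function names are hypothetical.

```python
def glitch_events(t, w):
    """Represent a glitch of width w starting at time t as two events."""
    if w <= 0:
        raise ValueError("glitch width must be positive")
    return [("a", t), ("abar", t + w)]


def latching_probability(w, setup, hold, period):
    """ASSUMED first-order model: the glitch is latched only if it covers
    the whole setup-plus-hold window, and its arrival time is uniformly
    distributed over the clock period."""
    if period <= 0:
        raise ValueError("clock period must be positive")
    window = setup + hold
    return min(1.0, max(0.0, (w - window) / period))


# Hypothetical values in picoseconds.
events = glitch_events(t=100, w=50)
p_latch = latching_probability(w=50, setup=10, hold=10, period=1000)
print(events, p_latch)
```

Under this assumed model, a glitch narrower than S + H is never latched, and wider glitches are latched with a probability that grows linearly with their width.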
Timing Logic Derating • Different Glitches may propagate to the POs or FFs due to re-convergent fan-out
Electrical Derating Modeling • Algorithm: computing electrical masking while propagating events • Vomin(Gj, inputk): minimum voltage of input k of Gj • Vomax(Gj, inputk): maximum voltage of input k of Gj • Vomin(Gj): minimum voltage of Gj output • Vomax(Gj): maximum voltage of Gj output • PWo: output pulse width
For each gate Gj in List(Gi) do
  For each valid waveform (Wl) in Event List(Gj) do
    Vomin(inputs) = Max(Vomin of gate inputs on waveform Wl);
    Vomax(inputs) = Min(Vomax of gate inputs on waveform Wl);
    Compute Vomin(Gj)
    Compute Vomax(Gj)
    Compute PWo using computed Vomin(Gj) and Vomax(Gj)
  end
end
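The loop structure of this algorithm can be sketched in Python. The slide does not say how Vomin(Gj), Vomax(Gj), and PWo are computed, so the sketch takes them as caller-supplied functions; the toy models in the usage example are placeholders with no physical meaning, and all names are hypothetical.

```python
def electrical_masking(event_lists, out_min_model, out_max_model, pulse_width_model):
    """Walk every gate and every valid waveform, as in the slide's loop.

    event_lists maps a gate name to a list of waveforms; each waveform is a
    list of (vomin, vomax) pairs, one pair per gate input on that waveform.
    The three *_model callables are ASSUMED placeholders for the gate-level
    electrical models, which the slide does not specify.
    """
    results = {}
    for gate, waveforms in event_lists.items():
        for index, input_swings in enumerate(waveforms):
            # Worst-case input swing: highest minimum and lowest maximum.
            vmin_in = max(vmin for vmin, _ in input_swings)
            vmax_in = min(vmax for _, vmax in input_swings)
            vmin_out = out_min_model(gate, vmin_in, vmax_in)
            vmax_out = out_max_model(gate, vmin_in, vmax_in)
            pw_out = pulse_width_model(gate, vmin_out, vmax_out)
            results[(gate, index)] = (vmin_out, vmax_out, pw_out)
    return results


# Toy models (placeholders): the gate passes the swing unchanged and the
# pulse width scales linearly with the output swing.
demo = electrical_masking(
    {"G1": [[(0.1, 0.9), (0.2, 0.8)]]},
    out_min_model=lambda gate, vmin, vmax: vmin,
    out_max_model=lambda gate, vmin, vmax: vmax,
    pulse_width_model=lambda gate, vmin, vmax: 100.0 * (vmax - vmin),
)
print(demo)
```

Passing the electrical models in as functions keeps the traversal separate from the technology-dependent characterization, which would normally come from library data.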
A Case Study: Error Propagation for Two Clock Cycles All three deratings may occur Only logical derating may occur
The Tool: MLET Multi-Cycle Logical-Electrical-Timing Derating
Experimental Results: Run Time • On average, MLET is 4 orders of magnitude faster than MC-based simulation • The time required to compute SPs is also 5 orders of magnitude less than that of MC-based simulation • [Table: Execution times for the MC simulation approach, SP computation, and the MLET approach]
Experimental Results: Accuracy • MLET has an accuracy of about 97% compared with the MC fault injection approach • [Figure: Difference of derating factors obtained by MLET using various SP variances compared to MC simulations (for an injected pulse width of 50 ps)]
Multi-Cycle SERs Multi-cycle SER estimation of s820 and s832 ISCAS’89 circuits using MLET
Conclusions & Future Work • SER estimation is very challenging as it requires dynamic analysis of transients. • Existing SER estimation approaches investigate error propagation probabilities for only a single cycle, resulting in an inaccurate system failure rate. • We have proposed a very fast and accurate analytical approach called MLET, which has four main features: • It runs very fast. • All three masking factors are considered. • The effects of error propagation in re-convergent fan-outs are modeled. • The effect of multi-cycle error propagation on overall circuit SER is considered.
Conclusions & Future Work Cont’d • Experimental results for some ISCAS’89 benchmark circuits show that MLET: • Is 4 orders of magnitude faster than the MC simulation-based fault injection method • Has an accuracy of about 97% • Future work: we are going to estimate the SER of a circuit in the presence of Multiple Event Transients (METs), a reliability concern in ultra-deep sub-micron technologies
Related Work: SER Modeling • Circuit/Logic-Level Approach • Fault injection • SERA by Zhang et al. [ICCAD’04] • SEAT-LA by Rajaraman et al. [VLSID’06] • Mohanram et al. [ITC’03] • Maheshwari et al. [DFT’03] • Asadi et al. [DSN’03] [PRDC’04] • Seifert et al. [TDMR’04] • Probabilistic Transfer Matrices (PTM) • Krishnaswamy et al. [DATE’05] • Binary Decision Diagram (BDD) • FASER by Zhang et al. [ISQED’06] [SELSE’05]