ESD protection: how good is it for suppressing EMI? (Task 3.1: Bridgwood)
How simultaneous switching of millions of devices causes EMI and reduces chip operating margin and reliability; how external EM pulses can further aggravate chip margins and reliability (Task 3.2: Mazumder)
Characterization of functional and behavioral failures attributed to EMI (Task 3.3: Dutt)
1. Effects of EMI on Digital Systems
Participants: Prof. P. Mazumder
The University of Michigan
Prof. M. Bridgwood
Clemson University
Prof. S. Dutt
University of Illinois, Chicago
3. Conducted WB and NB Interaction with Digital Devices
Impedances of Interfaces through Switching
- Inputs, Outputs, Control Lines, Power supplies
Vulnerability Thresholds
- Disruption and Damage
- Single Nodes
- Multiple Nodes and Devices
Waveforms
- Pulses
- Damped Rings
4. DRAM Input Circuit Structure
6. Modeling of Simple Capacitive Structure Through Breakdown Events
7. SIA Roadmap - IC Technology
8. Task 3.2.1: EMI generation due to transistor switching
Task 3.2.2: Effects of EMI on chip operations
Task 3.2.3: EMI Simulator Design
9. Noise Distribution Paths
Direct radiation from chip surface
Caused by high-frequency current within the chip
Level of radiation is small in comparison to the following ones
Conducting noise from the signal ports
Off-chip wires act as antennae
Effect of this noise source is significant … but it’s an easy problem
Power-line conducting noise
High-frequency large power/ground current
Most significant source of EMI problem … and is difficult to solve
10. Power-Line Conducting Noise
Modeling core power network
Power-line capacitance modeling
Switching current model
11. Power Network Modeling
12. Switching Current Simulation
Time-varying switching current consumed in circuit blocks is first simulated using SPICE, assuming an ideal power supply voltage
Circuit simulation is performed for the power network with the switching current information added
By iterating this annotation process, one can achieve better simulation accuracy
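The iteration described above can be sketched as a fixed-point loop. This is only a toy illustration, not the SPICE-based flow from the slides: the two functions stand in for the block-level and power-network simulations, and `VDD`, `R_GRID`, the linear current scaling, and the purely resistive grid are all assumptions made for the example.

```python
VDD = 1.8      # assumed nominal supply voltage (V)
R_GRID = 0.5   # assumed lumped power-grid resistance (ohm)


def block_current(i_ideal, v_supply):
    """Toy block model: switching current scales linearly with supply voltage.

    Stands in for the block-level circuit simulation.
    """
    return [i * v / VDD for i, v in zip(i_ideal, v_supply)]


def grid_voltage(current):
    """Toy power-network model: resistive IR drop only.

    Stands in for the power-network circuit simulation.
    """
    return [VDD - R_GRID * i for i in current]


def iterate_annotation(i_ideal, passes=20):
    """Alternate the two simulations, re-annotating each with the other's result."""
    v = [VDD] * len(i_ideal)  # first pass assumes an ideal supply
    i = list(i_ideal)
    for _ in range(passes):
        i = block_current(i_ideal, v)  # current under the latest supply estimate
        v = grid_voltage(i)            # supply under the latest current estimate
    return i, v


# Hypothetical switching-current samples (A) over five time steps.
i_ideal = [0.0, 0.4, 1.0, 0.4, 0.0]
current, voltage = iterate_annotation(i_ideal)
```

In this toy model each pass shrinks the error by the factor `R_GRID * i / VDD`, so the loop converges only while that factor stays below 1; a real flow would need its own convergence check.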
13. Power-Line Conducting Noise
15. Clock Network
Transmission line modeling of clock wires
Differential Quadrature Method (DQM)
Model Reduction by Krylov Subspace Method
Study of clock jitter and synchronization failures due to ringing and deformities
FD-TLM based VEDICS Tool (designed at Univ. of Michigan)
More accurate than lumped model yet more efficient than other field solvers
16. EMI Simulator
17. System Level Studies for Estimating EMI Effects
18. Funded and Past Work
Recently funded work on FT (S. Dutt) [Verma, MS Thesis, UIC,’01], [Verma & Dutt, ICCAD’01 subm.], [Dutt, et al., ICCAD’99], [Mahapatra & Dutt, FTCS’99] -- funded in part by DARPA-ACS, Xilinx Inc.:
On-line test and fault reconfiguration of field-programmable gate arrays using a roving tester
Key is effective incremental re-placement and re-routing to dynamically move the roving tester
EM-induced faults:
High level computer failure detection due to different types of EM signals [Mojert et al., EMC’01]; no cause-effect or classification analysis.
Failure in real-time communication & control systems from communication line errors due to EM signals [Kohlberg & Carter, EMC’01]
19. Assumptions/Scenarios of Past Work
Past Work on general fault detection:
Faults directly affect transistors & on-chip interconnects
Random single (sometimes double) faults
Deterministic faults
Types of faults: permanent, transient, intermittent; intermittent not generally tackled
Past Work on EM-induced faults:
No how/why/what analysis and classification of computer failure due to EM interference
20. Different Scenarios in Proposed Work
Faults directly affect off-chip signal lines (memory address, data and control lines) and power/ground (p/g) lines
p/g line faults => multiple faults (clustered if p/g lines are partitioned, else random)
Signal line faults => incorrect instr./data => multiple clustered faults along control/data path
Window of susceptibility if p/g lines shielded -- probabilistic model (e.g., susceptible on cache misses)
May need to tackle intermittent faults due to periodic EM pulses
Detailed error analysis and classification due to EM-induced faults
21. Proposed Work
Comprehensive VHDL processor and memory model
Will include variable-width variable-period fault injection capability for off-chip signal lines (to simulate different pulse widths and periods).
Similar fault-injection capability for on-chip wires with a probabilistic component
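A variable-width, variable-period injector of the kind described above might look like the following sketch. It is written in Python rather than VHDL purely for illustration; the function names, the XOR bit-flip fault model, and the `prob` and `seed` parameters are assumptions, not details of the proposed model.

```python
import random


def pulse_active(cycle, width, period, start=0):
    """True while a periodic pulse of `width` cycles, repeating every `period`, is high."""
    if cycle < start:
        return False
    return (cycle - start) % period < width


def inject_faults(words, width, period, mask, start=0, prob=1.0, seed=0):
    """Flip the bits in `mask` on every word sampled while the pulse is active.

    `prob` is the chance that an active pulse actually corrupts the word;
    prob=1.0 gives the deterministic off-chip case, prob<1.0 the
    probabilistic on-chip case.
    """
    rng = random.Random(seed)
    faulty = []
    for cycle, word in enumerate(words):
        hit = pulse_active(cycle, width, period, start) and rng.random() < prob
        faulty.append(word ^ mask if hit else word)
    return faulty


# Hypothetical 1-bit signal line held at 0 for six cycles;
# a pulse 2 cycles wide repeats every 3 cycles.
corrupted = inject_faults([0] * 6, width=2, period=3, mask=0b1)
```

Varying `width` and `period` models different EM pulse widths and repetition rates, while `mask` selects which lines of a bus are disturbed.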
22. Proposed Work (contd.)
Will determine and classify the following types of computer system behavioral errors (i.e., program errors) due to different patterns, extent, duration and location of faults:
Control flow errors -- incorrect sequence of instruction execution. Causes: address gen. error, memory faults, bus faults
Data errors. Causes: computation errors, memory & bus faults
Hung processor & crashes. Causes: C.U. transition to dead-end states, invalid instruction, out-of-bound address, divide-by-zero, spurious interrupts (?)
To the best of our knowledge, this is a more comprehensive analysis of fault effects on a computer system than any attempted previously
Comprehensive analysis is needed due to the nature of EM effects -- all-pervasive, periodic, clustered
23. Proposed Work--Methodologies: Control Flow Checking [Mahmood & McCluskey, TC’88]
A node is a block of instructions with a branch at the end
A derived signature of a node is a function (e.g., xor, LFSR) of all its instructions
A program graph is one in which there is an arc from node u to v if the branch at u can lead to node v
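The three definitions above can be combined into a small checker, sketched below. It is a simplified illustration, not the scheme from the cited paper: XOR is used as the signature function, and the node names, instruction words, and `check_trace` helper are hypothetical.

```python
from functools import reduce
from operator import xor


def signature(instructions):
    """Derived signature of a node: XOR of all its instruction words."""
    return reduce(xor, instructions, 0)


# Hypothetical program: each node is a block of instruction words ending in a branch.
PROGRAM = {"A": [0x1A, 0x2B, 0x3C], "B": [0x10, 0x01], "C": [0x7F]}

# Program graph: an arc u -> v exists if the branch at u can lead to v.
GRAPH = {"A": {"B", "C"}, "B": {"C"}, "C": set()}

# Reference signatures computed from the fault-free program.
REFERENCE = {name: signature(instrs) for name, instrs in PROGRAM.items()}


def check_trace(trace):
    """Check a runtime trace of (node, executed instruction words) pairs.

    Returns None if the trace is consistent, else a description of the
    first control-flow error found.
    """
    previous = None
    for name, executed in trace:
        if previous is not None and name not in GRAPH[previous]:
            return f"illegal branch {previous}->{name}"
        if signature(executed) != REFERENCE[name]:
            return f"signature mismatch in {name}"
        previous = name
    return None


good = [("A", PROGRAM["A"]), ("B", PROGRAM["B"]), ("C", PROGRAM["C"])]
bad_branch = [("B", PROGRAM["B"]), ("A", PROGRAM["A"])]
bad_fetch = [("A", [0x1A, 0x2B, 0x3D])]  # one corrupted instruction word
```

An XOR signature misses any pair of errors that cancel each other, which is one reason an LFSR-based signature may be preferred.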
24. Proposed Work--Methodologies: Algorithm-Based Fault Tolerance [Huang & Abraham, TC’84], [Dutt & Assad, TC‘96]
Use properties of the computation to check correctness of computed data
E.g., linearity property: f(v1+v2) = f(v1) + f(v2), of computation f( ) can be used to check it:
Pre-compute v’ = v1 + v2 + … + vk (input checksum)
Compute f(v1), …., f(vk)
Compute u = f(v1) + f(v2) + … + f(vk) (output checksum)
Check if f(v’) = u; inequality indicates computation error(s)
Can be used for linear computations such as matrix multiplication, matrix addition, Gaussian elimination [Huang & Abraham, TC’84], [Dutt & Assad, TC‘96]
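As a minimal sketch of the checksum test above, take f to be multiplication by a fixed matrix, which is linear. The matrix `M`, the input vectors, and the helper names are made up for the example, and integer arithmetic is used so that the equality test is exact; with floating-point data a tolerance would be needed.

```python
def mat_vec(matrix, vector):
    """Linear computation f(v) = M v."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def vec_sum(vectors):
    """Element-wise sum of a list of equal-length vectors."""
    return [sum(column) for column in zip(*vectors)]


def abft_check(f, inputs, outputs):
    """Return True if f(v1 + ... + vk) equals f(v1) + ... + f(vk)."""
    input_checksum = vec_sum(inputs)    # v' = v1 + v2 + ... + vk
    output_checksum = vec_sum(outputs)  # u = f(v1) + ... + f(vk)
    return f(input_checksum) == output_checksum


M = [[1, 2], [3, 4]]                   # hypothetical matrix


def f(v):
    return mat_vec(M, v)


inputs = [[1, 0], [0, 1], [2, 3]]      # hypothetical input vectors
outputs = [f(v) for v in inputs]
ok = abft_check(f, inputs, outputs)    # fault-free run

corrupted = [list(o) for o in outputs]
corrupted[2][0] ^= 0b100               # emulate a single-bit data error
detected = not abft_check(f, inputs, corrupted)
```

The check costs one extra evaluation of f plus two vector sums, but it cannot detect errors that cancel in the output checksum or a fault that corrupts f(v’) in the same way.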
25. Goals, Questions & Future Outlook
Correlate the probability/frequency of different types of computer system errors to [pattern, extent, duration, location] of EM-induced faults
Correlate types of logic faults w/ similar descriptors to functional errors (output error of ALU, Control Unit) -- classification of catastrophic vs. non-catastrophic logic faults
Q: Are there patterns of errors that lead to computer crashes w/ high probability?
Q: If so, can the detection of such patterns be used to shut down the computer in a fail-safe manner (save state & data for later resumption)?
26. Goals, Questions & Future Outlook (contd.)
Q: Are there patterns of errors that are characteristic of EM-induced faults versus random single/double faults?
Q: If so, can these be used as “early detection & warning” of EM interference?
Future: Based on the correlation of system errors to EM faults, determine fault tolerance/error minimization techniques for EM-induced faults