Defect Tolerance for Yield Enhancement of FPGA Interconnect Using Fine-grain and Coarse-grain Redundancy. Anthony J. Yu, Guy G.F. Lemieux. September 15, 2005.
Outline • Introduction and motivation • Previous work • New architectures • Coarse-grain redundancy (CGR) • Fine-grain redundancy (FGR) • Experimental results • Conclusions
Introduction and Motivation • Scaling introduces new types of defects • Smaller feature sizes are susceptible to smaller defects • Expected results • Defects per chip increase • Chip yield declines • FPGAs are mostly interconnect • FPGAs must tolerate multiple interconnect defects to improve yield (and $$$)
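To make the link between defects per chip and yield concrete, here is a minimal sketch using the classic Poisson yield model, Y = exp(-A * D). The choice of model, the function name `poisson_yield`, and all numbers are assumptions for illustration; the slides do not state which yield model the authors used.

```python
import math


def poisson_yield(area_cm2: float, defect_density_per_cm2: float) -> float:
    """Probability that a die has zero defects under a Poisson defect model."""
    expected_defects = area_cm2 * defect_density_per_cm2
    return math.exp(-expected_defects)


# Hypothetical numbers: fixed die area, rising defect density.
for density in (0.1, 0.5, 1.0):
    print(f"D={density:.1f}/cm^2 -> yield={poisson_yield(2.0, density):.3f}")
```

Under this model a chip with no defect tolerance is lost as soon as it has a single defect, which is why tolerating several defects per chip can recover yield.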
General Defect-Tolerant Techniques • Defect-tolerant techniques minimize the impact (cost) of manufacturing defects • FPGA defect tolerance can be loosely categorized into three classes: • Software redundancy – use CAD tools to map around the defects • Hardware redundancy – incorporate spare resources to assist in defect correction (e.g., a spare row/column) • Run-time redundancy – protect against transient faults such as single-event upsets (SEUs) (e.g., triple modular redundancy, TMR)
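As a small illustration of the run-time class, TMR runs three copies of a module and takes a majority vote on their outputs. The sketch below is a generic bitwise majority voter; the function name `tmr_vote` and the sample values are hypothetical and not taken from the slides.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three redundant module outputs."""
    return (a & b) | (a & c) | (b & c)


# Third copy has two upset bits; the vote still returns the correct word.
print(bin(tmr_vote(0b1010, 0b1010, 0b0110)))  # 0b1010
```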
Previous Work – 1 – Xilinx • Xilinx’s Defect-Tolerant Approach • Customer (knowingly) purchases “less than perfect” parts • Customer gives Xilinx the configuration bitstream • Xilinx tests FPGA devices against the bitstream • Xilinx sells FPGA parts that “appear” perfect • Defects fall only in resources the bitstream does not use • Limitation: • Chips work only with the given bitstream – no changes!
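The screening idea on this slide can be sketched as a set check: a part is acceptable for one customer design if none of the resources that design uses are defective. This is only an illustration; the function name `device_passes` and the resource labels are hypothetical, and the real test flow is not described in the slides.

```python
def device_passes(used_resources: set, defective_resources: set) -> bool:
    """A part is sellable for this bitstream if no used resource is defective."""
    return used_resources.isdisjoint(defective_resources)


# Hypothetical resource labels for one customer bitstream and two tested parts.
bitstream_uses = {"wire_12", "wire_40", "switch_7"}
print(device_passes(bitstream_uses, {"wire_99"}))   # True
print(device_passes(bitstream_uses, {"switch_7"}))  # False
```

Any change to the bitstream changes `bitstream_uses`, so the check would have to be repeated, which matches the limitation noted above.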
Previous Work – 2 – Altera • Altera’s Defect-Tolerant Approach • Customer purchases “seemingly perfect” parts • Make defective resources inaccessible to the user • Coarse-grain architecture • Spare row and column in the array (like memories) • Defective row/column must be bypassed • Use the spare row/column instead • Limitation: • Does not scale well to multiple defects
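Spare-row bypass can be pictured as a logical-to-physical row mapping that skips the defective row. The sketch below assumes one spare row and at most one defective row; the function name and array size are hypothetical, and it ignores columns and the bypass circuitry itself.

```python
from typing import List, Optional


def map_logical_to_physical(num_logical_rows: int,
                            defective_row: Optional[int]) -> List[int]:
    """Map each logical row to a physical row, skipping one defective row.

    The physical array is assumed to have num_logical_rows + 1 rows (one spare).
    """
    physical_rows = range(num_logical_rows + 1)
    usable = [r for r in physical_rows if r != defective_row]
    return usable[:num_logical_rows]


print(map_logical_to_physical(4, 2))     # [0, 1, 3, 4]
print(map_logical_to_physical(4, None))  # [0, 1, 2, 3]
```

With a second defective row there would be too few usable rows left, which is the scaling limitation the slide points out.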
Objective • Problem • FPGA yield is declining because of aggressive technology scaling • Proposed Solutions • Defect tolerance through redundancy • Important Objectives • Tolerate interconnect defects (interconnect dominates area) • Tolerate multiple defects (future trend) • Preserve timing (no timing re-verification) • Fast correction time (production use)
Improving Yield for CGR – Adding Multiple Global Spares • Add multiple global spares to traditional CGR • Global spares can be used to repair any defective row/column in the array • Wire extensions are now longer
Increasing Area + Delay Overhead • More spares mean more mux overhead in every switch element • [Figure: switch element with no spares, 1 global spare, 2 global spares, and 4 global spares] • 4 global spares may be impractical
Improving Yield for CGR – Adding Multiple Local Spares • Divide FPGA into subdivisions • Each subdivision has local spare(s) • Distributes spares across the chip • Reduces mux area overhead (of the global scheme) • Limitation: • Spare(s) can only repair defects within their own subdivision
Yield Impact of Multiple Local Spares (not as good as Global with the same number of spares)
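The difference between global and local spares can be sketched as two repairability checks. This is a simplified model that considers defective rows only (no columns, no defects in the spares themselves); the function names and the 16-row example are hypothetical.

```python
from collections import Counter
from typing import Iterable


def repairable_global(defective_rows: Iterable[int], num_spares: int) -> bool:
    """Global spares can replace any defective row, so only the count matters."""
    return len(set(defective_rows)) <= num_spares


def repairable_local(defective_rows: Iterable[int], rows_per_subdivision: int,
                     spares_per_subdivision: int) -> bool:
    """Local spares only repair rows inside their own subdivision."""
    per_subdivision = Counter(r // rows_per_subdivision
                              for r in set(defective_rows))
    return all(n <= spares_per_subdivision for n in per_subdivision.values())


# Hypothetical 16-row array with 2 spares in total: either 2 global spares,
# or 2 subdivisions of 8 rows with 1 local spare each.
clustered = [1, 5]  # both defective rows fall in subdivision 0
spread = [1, 9]     # one defective row per subdivision
print(repairable_global(clustered, 2), repairable_local(clustered, 8, 1))  # True False
print(repairable_global(spread, 2), repairable_local(spread, 8, 1))        # True True
```

With the same total number of spares, the local scheme fails whenever one subdivision receives more defects than it has spares, which is consistent with the lower yield reported on this slide.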
Our Proposed Solution: Fine-grain Redundancy (FGR) – Defect Avoidance by Shifting
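One way to picture defect avoidance by shifting is a routing channel with one spare track, where signals at or beyond a defective track move over by one track. The sketch below is an assumption-laden illustration, not the authors' circuit: the function names, the single spare per channel, and the idea that each wire segment decides its shift independently are all hypothetical.

```python
from typing import List, Optional


def shift_around_defect(num_signals: int,
                        defective_track: Optional[int]) -> List[int]:
    """Assign each signal a physical track in a channel with one spare track.

    Signals below the defect keep their track; signals at or above it
    shift over by one track so the defective track is left unused.
    """
    if defective_track is None:
        return list(range(num_signals))
    return [t if t < defective_track else t + 1 for t in range(num_signals)]


def route_through_segments(num_signals: int,
                           segment_defects: List[Optional[int]]) -> List[List[int]]:
    """Apply the shift independently in each wire segment along a channel."""
    return [shift_around_defect(num_signals, d) for d in segment_defects]


# Hypothetical channel: 4 signals, 3 segments, a defect on track 1 in the middle one.
for assignment in route_through_segments(4, [None, 1, None]):
    print(assignment)
```

In this toy model the signals return to their original tracks after the defective segment, so the shift stays local to the defect.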
Switch Implementation Options • Several detailed implementations are possible • Trade off area / delay / yield (repairability)
Defect Avoidance – Switch Implementation Option 1 • Can avoid contention by pre-shifting the red signal [lower area overhead, lower yield improvement] … OR …
Defect Avoidance – Switch Implementation Option 2 • … OR … can avoid contention by embedding the IMUX [higher area overhead, best yield]
Experimental Results • Area • Delay • Area-Delay Product • Yield • Summary
Estimated Area overhead at equal yield (80%) * CGR-G1 can only tolerate 1-2 defects
Yield – 1: Switch Implementation Affects Yield * Assumes all defects are bridging defects
Comparison between FGR and CGR – FGR Tolerates Tens of Defects
Limitations of Study & Architectures • FGR • Does not tolerate defects in the logic • Cannot tolerate clustered defects • Requires a detailed fault map • CGR • Assumes that all defects can be corrected with a single row/column • Bypass circuitry is approximated
Conclusions • CGR is effective for 1 or 2 defects • FGR meets desired objectives: • Tolerates multiple randomly distributed defects • Defect correction does not perturb timing • Tolerates an increasing number of defects as array size increases • Correction can be applied quickly • FGR potentially capable of correcting crosstalk faults, but has not been explored
Thank you! anthonyy@ece.ubc.ca