Yield and Cost Modeling of a Spare-Enhanced NoC
Saeed Shamshiri and Kwang-Ting (Tim) Cheng, University of California, Santa Barbara

[Figure: two defect maps of six chips each. Stronger clustering (clustering factor near 0) gives higher yield: 4/6 good chips. Weaker clustering gives lower yield: 2/6 good chips.]
Resilient System Design, Task 1.2.3.4

Objectives

• Problems:
  • Low yield of multi-core chips (due to manufacturing defects)
  • High cost of manufacturing testing, which is needed to avoid huge service costs (due to manufacturing defects)
  • Huge service cost (due to in-field defects and manufacturing test escapes)
  • Over-testing: burning or rejecting good chips (false rejects)
• Solution:
  • Make the system self-reconfigurable and repairable by adding distributed redundancy (spare cores and wires)
• Our approach:
  • Provide an analytical model to explore:
    • How the yield and cost of a multi-core chip relate to the parameters of each component, such as yield, defect coverage of manufacturing and in-field testing, defect density, and shape and scale parameters
    • How many spare cores/wires should be included?
    • Can we skip burn-in and repair infant mortality in the field?
  • Provide a simulation model to explore:
    • The connectivity of the cores
  • Apply this model to the Intel 80-tile processor

SoC Structure

[Figures: Intel 80-tile network on chip (ISSCC07); one tile of the chip]

Core Yield Model

• The true yield of a core, yc, is a function of area, defect density, and clustering factor.
• The clustering factor is the degree to which defects are clustered.
• The observed yield of a core, y'c, also depends on the defect coverage of the manufacturing testing.

* de Sousa and Agrawal, DATE 2000
* Kuo and Kim, Proc. of IEEE 1999

Processor Yield Model

• Approach 1: No correlation
• Approach 2: Maximum correlation
  • Shared parameters:
    • Defect coverage
    • Defect density
    • Clustering parameter
• Approach 3: Partial correlation
  • Dedicated parameters

Network Yield Model

• A network of n blocks (tiles) passes the test if:
  • i out of n blocks pass the test (i >= m)
  • At least m out of those i fault-free blocks are connected with fault-free links
• Annotated term: the probability that a faulty block cannot be used as a routing node

Monte Carlo Simulation

• Start with n fault-free blocks connected in a 2-D mesh.
• Inject faults into n-i randomly selected blocks.
• For each of those n-i faulty blocks, disconnect its links with probability f'conn.
• Disconnect every link of the mesh with probability 1 - y'link.
• Traverse the mesh and count the maximum number of fault-free blocks connected together. If this number is at least m, it is a connected mesh.
• Repeat all of the above steps many times and return the probability of having a connected mesh.

Yield & Manufacturing Cost

• Annotated terms: manufacturing cost of a tile, manufacturing cost of a link, manufacturing cost of memory, test cost of a tile, observed system yield
• 95% yield: 11 spare blocks, no spare wire
• 99% yield: 3 spare blocks, 1 spare wire
• The lowest cost: three spare blocks, one spare wire per link

Total Cost

• Annotated terms: the probability that a shipped chip does not fail within the warranty period, service cost per failed chip
• Plot labels: minimum manufacturing cost, minimum service cost, minimum total cost

More Experimental Results

[Panels: Intel 80-core; Burn-in Elimination; Software Demo]

• With one spare wire per link, the plot shows a region where burn-in increases the cost and a region where burn-in reduces the cost.
• Minimum cost: 7 spare blocks, 1 spare wire per link
• Burn-in elimination is beneficial when there are enough spare cores in the chip to recover from failures in the field.
• Without any spare wire, 54% of the chips have at least 74 fault-free cores.
• With one spare wire per link, 65% of the chips have at least 74 fault-free cores.

Conclusion & Future Work

Summary:
• Distributed redundancy
  • Spare tiles
  • Spare wires
• Analytical model
  • Yield
  • Cost
• Monte Carlo simulation for connectivity

Conclusion:
• Improving the overall cost and yield significantly
• Burn-in elimination

Future work:
• Quality of the network in addition to the connectivity
  • Quality metric: Average Minimum Distance
• Non-uniform distributed redundancy
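
The core yield equation itself is not legible in this text, only its inputs: area, defect density, and clustering factor. A widely used model with exactly these inputs is the negative binomial yield model, y = (1 + A·D/α)^(-α). Whether the poster uses this exact form is an assumption; the function name and the sample values below are illustrative only. The sketch shows the qualitative effect stated in the figure: a smaller clustering factor (stronger clustering) gives higher yield.

```python
import math


def negative_binomial_yield(area, defect_density, alpha):
    """Yield under the negative binomial model (assumed form, not taken from the poster).

    area           -- core area (same length units as defect_density)
    defect_density -- average defects per unit area
    alpha          -- clustering factor; smaller means stronger clustering
    """
    if area < 0 or defect_density < 0:
        raise ValueError("area and defect_density must be non-negative")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return (1.0 + area * defect_density / alpha) ** (-alpha)


if __name__ == "__main__":
    # Illustrative values only: one expected defect per core (area * density = 1).
    for alpha in (0.5, 2.0, 100.0):
        y = negative_binomial_yield(1.0, 1.0, alpha)
        print(f"alpha={alpha:6.1f}  yield={y:.3f}")
    # As alpha grows, the result approaches the Poisson yield exp(-A*D).
    print(f"Poisson limit: {math.exp(-1.0):.3f}")
```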
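
The first pass condition of the network yield model, "i out of n blocks pass the test (i >= m)", can be illustrated with a binomial tail. This is a minimal sketch that assumes every block passes independently with the same observed yield; that independence is an assumption of the sketch and may not correspond to any of the poster's three correlation approaches. The function name is hypothetical.

```python
from math import comb


def prob_at_least_m_good(n, m, block_yield):
    """Probability that at least m of n blocks are good.

    Assumes blocks pass independently with probability block_yield
    (an assumption of this sketch, not a claim about the poster's model).
    """
    if not 0 <= m <= n:
        raise ValueError("m must satisfy 0 <= m <= n")
    if not 0.0 <= block_yield <= 1.0:
        raise ValueError("block_yield must be in [0, 1]")
    return sum(
        comb(n, i) * block_yield**i * (1.0 - block_yield) ** (n - i)
        for i in range(m, n + 1)
    )


if __name__ == "__main__":
    # Illustrative only: 80 required blocks, 0 to 4 spares, 95% block yield.
    for spares in range(5):
        p = prob_at_least_m_good(80 + spares, 80, 0.95)
        print(f"spares={spares}  P(at least 80 good)={p:.4f}")
```

Adding spares raises n while m stays fixed, which is why the probability increases with each spare block in this simplified setting.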
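
The Monte Carlo simulation steps can be sketched as follows. This is one possible reading, not the authors' implementation: it assumes that a faulty block loses all of its links together with probability f'conn, that a faulty block which keeps its links still relays traffic, and that "connected" means the largest group of fault-free blocks has at least m members. All function and parameter names are hypothetical.

```python
import random


def mesh_links(rows, cols):
    """Nearest-neighbour links of a rows x cols 2-D mesh as (a, b) index pairs."""
    links = []
    for r in range(rows):
        for c in range(cols):
            node = r * cols + c
            if c + 1 < cols:
                links.append((node, node + 1))
            if r + 1 < rows:
                links.append((node, node + cols))
    return links


def largest_good_cluster(adjacency, faulty):
    """Largest number of fault-free blocks found in one connected component."""
    seen, best = set(), 0
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack, good_count = [start], 0
        while stack:
            node = stack.pop()
            if node not in faulty:
                good_count += 1  # faulty blocks relay traffic but are not counted
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        best = max(best, good_count)
    return best


def connectivity_probability(rows, cols, n_faulty, m, f_conn, y_link,
                             trials=10_000, seed=None):
    """Estimate P(at least m fault-free blocks are connected) for a given fault count."""
    n = rows * cols
    if not 0 <= n_faulty <= n:
        raise ValueError("n_faulty must be between 0 and rows * cols")
    rng = random.Random(seed)
    links = mesh_links(rows, cols)
    connected = 0
    for _ in range(trials):
        # Inject faults into n_faulty randomly selected blocks.
        faulty = set(rng.sample(range(n), n_faulty))
        # Assumed reading: a faulty block drops all its links with probability f_conn.
        non_routing = {b for b in sorted(faulty) if rng.random() < f_conn}
        adjacency = {node: [] for node in range(n)}
        for a, b in links:
            if a in non_routing or b in non_routing:
                continue
            if rng.random() >= y_link:  # each link fails with probability 1 - y_link
                continue
            adjacency[a].append(b)
            adjacency[b].append(a)
        if largest_good_cluster(adjacency, faulty) >= m:
            connected += 1
    return connected / trials


if __name__ == "__main__":
    # Illustrative parameters only: an 8 x 10 mesh with 4 faulty blocks.
    p = connectivity_probability(8, 10, n_faulty=4, m=74,
                                 f_conn=0.5, y_link=0.99, trials=2000, seed=0)
    print(f"Estimated connectivity probability: {p:.3f}")
```

The estimate is conditional on the number of faulty blocks; weighting it by the probability of each fault count would give an overall network yield.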