Design Validation and Debugging
Tim Cheng
Department of Electrical & Computer Engineering, UC Santa Barbara
VLSI Design and Education Center (VDEC), Univ. of Tokyo
Harder to Design Robust and Reliable Chips
• First-silicon success rate has been dropping ~30% for complex ASIC/SoC at 0.13 µm (according to an ASIC vendor)
• Pre-silicon logic bugs have been increasing at 3X-4X per generation for Intel’s processors
• Yield has been dropping for volume production, and it takes longer to ramp up the yield
  - IBM’s 8-core Cell processor chips: ~10-20% yield (July 2006)
• “Better than worst-case” design resulting in failures without defects
  - Increase in variation of process parameters with scaling
  - Worst-case design getting way too conservative
In-Field Failures are Common and Costly
• Xbox: 16.4% failure rate
• Additional warranty and refund will cost Microsoft $1.15B ($86 per $300 item)
• More than financial cost: reputation and market loss
• Non-trivial failure rate
• 15% on average
http://arstechnica.com/news.ars/post/20080214-xbox-360-failure-rates-worse-than-most-consumer-electornics.html
Design for Robustness and Reliability
• Systems must be designed to cope with failures
• Efficient silicon debug is becoming a must
  - Need efficient design validation and debugging methodology
  - Design for debugging would become necessary
• Must have embedded self-test for error detection
  - For both testing in manufacturing line and in-field testing
  - Both on-line and off-line testing
• Re-configurability and adaptability for error recovery make better sense
  - Using spares to replace defective parts
  - Using redundancy to mask errors
  - Using tuning to compensate variations
Outline
• Post-Silicon Validation and Debug
• SMT-Based RTL Error Diagnosis [ITC 2008]
• SAT-Based Diagnostic Test Generation [ATS 2007]
Bugs in Silicon
• Manufacturing defects
  - Discovered during manufacturing test (<<1M DPM)
• Functional bugs (AKA logic bugs)
  - Exist in all components
  - ~98% found before tape-out, ~2% post-silicon*
• Circuit bugs (AKA electrical bugs)
  - Not all components exhibit failures
  - Fail in some operating region (voltage, temperature, or frequency)
  - Usually caused by design margin errors, IR drop, crosstalk coupling, L di/dt noise, process variation, …
  - ~50% found before tape-out, ~50% post-silicon*
* Source: Intel
Validation Domain Characteristics
• Pre-silicon validation
  - Cycle-accurate simulation
  - F_SIM << F_PROD: cycle poor
  - Any signal visible (i.e., white box): debugging is straightforward
  - Limited platform-level interaction
• Post-silicon validation
  - Tests run at F_PROD: cycle rich
  - Component tested in platform configuration
  - Only package pins visible: difficult debug
Post-Si History And Trends
• Functional bugs relatively constant
  - Correlate well to design complexity (amount of new and changed RTL)
  - Late specification changes are contributors
• Circuit and analog bugs growing over time
  - I/O circuit complexity increasing sharply
  - Speedpaths (limiting F_MAX of component) dominate CPU core circuit issues
Post-Si Debug Challenges
• Trend is toward lower observability
  - Integration increasing towards SoC
• Functional and circuit issues require different solutions
  - On average, circuit bugs take 3x as much time to root-cause as functional bugs
• Bugs are found on platforms but debugged on debug-enabled automatic test equipment (ATE)
  - Often need multiple iterations to reproduce on the tester
• Often long latency between a circuit issue and its syndrome
Pre-Si Verification vs. Post-Si Debugging
[Figure: design flow with levels Specification, RTL Description, Logic Netlist, Physical Design. Pre-silicon functional debugging: insert corrections. Silicon debugging & fault diagnosis: insert faults/errors.]
Different Applications But Similar Problems
Automated Debugging/Diagnosis
A failed verification/test step is followed by debugging/diagnosis:
[Figure: Testbench or Test Vectors and Design or Silicon feed Verification or Testing (PASS/FAIL); on FAIL, Automated Debugging/Diagnosis follows, with counterexamples/diagnostic patterns.]
Leveraging Pre-Si Verification & Manufacturing Test Efforts for Post-Si Validation
[Figure: design flow with levels Specification, RTL Description, Logic Netlist, Physical Design. Annotations: pre-silicon verification (white box; lack of error propagation analysis/metrics), post-silicon validation (black box), manufacturing test (black box; models at very low level of abstraction).]
SAT-Based Diagnosis
• Inputs: failing tests and the erroneous design
• Replicate circuit for each test
• Add additional circuitry into circuit model
• Add input/output constraints
• SAT assignment(s) → Fault location(s)!
SAT-Based Diagnosis - Example
• Stuck-at-1 fault on line l1
• Input vector v = (0, 0, 1) detects 1/0 at y
[Figure: example circuit with inputs x1, x2, x3, internal line l1, and output y, annotated with good/faulty signal values.]
Courtesy: A. Veneris
SAT-Based Diagnosis – Example (Cont’d)
1. Insert a MUX at each error candidate location
2. Apply input/output vector constraints
[Figure: the example circuit (inputs x1, x2, x3, line l1, output y) with a MUX inserted at the candidate location; figure labels include select line s and free input w.]
Courtesy: A. Veneris
SAT-Based Diagnosis – Multiple Diagnostic Tests
[Figure: three copies of the MUX-inserted circuit, one per diagnostic test, each with its own inputs, output, and free input w, all controlled by the same select line s1.]
Courtesy: A. Veneris
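The MUX-based formulation on the preceding slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-gate circuit is hypothetical (the slide's circuit cannot be recovered from the text), the names `simulate` and `diagnose` are invented here, and exhaustive enumeration of the free MUX values stands in for a SAT solver. One select choice is shared by all test copies, while each copy gets its own free value.

```python
# Hypothetical circuit (an assumption, not the slide's circuit):
#   l1 = x1 AND x2,  w = x2 OR x3,  y = NAND(l1, w)
# A MUX on a candidate signal replaces its value with a free value.

def simulate(x1, x2, x3, sel=None, free=0):
    """Evaluate the circuit; if sel names a signal, its MUX outputs `free`."""
    l1 = x1 & x2
    if sel == "l1":
        l1 = free
    w = x2 | x3
    if sel == "w":
        w = free
    y = 1 - (l1 & w)
    if sel == "y":
        y = free
    return y

def diagnose(failing_tests, candidates=("l1", "w", "y")):
    """Keep candidates whose MUX can explain every failing test.

    Brute-force stand-in for the SAT call: the select choice is shared by
    all test copies, the free value is chosen independently per copy.
    """
    suspects = []
    for cand in candidates:
        explains_all = all(
            any(simulate(*vec, sel=cand, free=f) == observed for f in (0, 1))
            for vec, observed in failing_tests
        )
        if explains_all:
            suspects.append(cand)
    return suspects

# Responses observed on a device with l1 stuck-at-1 (fault-free y would be 1).
failing_tests = [((0, 0, 1), 0), ((0, 1, 0), 0)]
suspects = diagnose(failing_tests)
print(suspects)
```

In this sketch the candidate `w` is ruled out because no free value on `w` can reproduce the observed output, while `l1` and the output `y` both remain as possible fault locations.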
RTL Design Error Diagnosis
• Using Boolean SAT solvers for RTL design error diagnosis is not efficient
  - The translation to Boolean is expensive
  - High-level information is discarded
• Propose an SMT-based, automated method for RTL design error diagnosis
Satisfiability Modulo Theory (SMT) Solvers
• Targets combined decision procedures (CDP)
• Integrate Boolean-level approach with higher-level decision procedures, such as ILP
• SHIVA-UIF: an SMT solver developed for RTL circuits
  - Boolean theory
  - Bit-vector theory
  - Equality theory
• This combination makes a good candidate as the satisfiability engine for hardware designs
RTL Design Error Diagnosis Utilizing SHIVA-UIF
• Extend the main idea of the Boolean-SAT-based diagnosis approach to the word level
• MUXs are added to word-level signals
[Flowchart: Failing patterns, error candidates → Add MUXs to design → Impose test as constraints → SMT. On SAT: add identified candidate to possible candidate list, add constraints to avoid same solution, and solve again. On UNSAT: remove remaining candidates, giving the reduced candidate list.]
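The solve-and-block loop in the flowchart can be sketched as follows. This is a minimal illustration, not the SHIVA-UIF implementation: the word-level design, the 4-bit width, the second output `Z`, and all function names are assumptions made for this example, and enumeration over free words stands in for the SMT call.

```python
WIDTH = 4                       # assumed word width
WORDS = range(2 ** WIDTH)
CANDIDATES = ("L", "Y")         # signals that received a MUX

def design(x1, x2, x3, sel, free):
    """Hypothetical MUX-instrumented RTL: L = X1 + X2, Y = (L == X3), Z = L[0]."""
    L = (x1 + x2) % (2 ** WIDTH)
    if sel == "L":
        L = free                # word-level MUX on L
    Y = int(L == x3)
    if sel == "Y":
        Y = free & 1            # one-bit MUX on Y
    Z = L & 1                   # second observed output
    return Y, Z

def solve(tests, blocked):
    """Stand-in for one SMT call: an unblocked candidate that explains all tests."""
    for cand in CANDIDATES:
        if cand in blocked:
            continue
        if all(any(design(*inp, cand, f) == expected for f in WORDS)
               for inp, expected in tests):
            return cand
    return None                 # UNSAT

def diagnose(tests):
    blocked, possible = set(), []
    while True:
        cand = solve(tests, blocked)
        if cand is None:        # UNSAT: remaining candidates are removed
            return possible
        possible.append(cand)   # SAT: record the identified candidate
        blocked.add(cand)       # constraint to avoid the same solution

# One failing test: inputs (X1, X2, X3) and expected response (Y, Z).
tests = [((3, 3, 5), (1, 1))]
reduced = diagnose(tests)
print(reduced)
```

Here `Y` drops out because freeing `Y` alone cannot correct the second output, so only `L` is kept in the reduced candidate list.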
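Initialization Steps
• Simple effect-cause analysis used to limit the potential candidates
• A MUX is inserted at each potentially erroneous signal
[Figure: word-level example circuit with inputs X1, X2, X3, an adder (+) producing L, a comparator (=) producing Y, and an inserted MUX with select S and free input W.]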
Could Directly Modify HDL Code (at Potentially Erroneous Statements)

Original design:

    module full_adder_imp (a1, a2, c_in, s, c_out);
      input  a1, a2, c_in;
      output s, c_out;
      wire   temp;
      assign s     = a1 ^ a2 ^ c_in;
      assign temp  = (a1 & a2) | (a1 & c_in);
      assign c_out = temp | (a2 & c_in);
    endmodule

MUX-inserted design:

    module full_adder_muxed (a1, a2, free1, free2, free3, s1, s2, s3, c_in, s, c_out);
      input  a1, a2, c_in;
      input  free1, free2, free3;   // free values driven by the solver
      input  s1, s2, s3;            // selects: 1 keeps the original value, 0 uses freeN
      output s, c_out;
      wire   s_mux, temp_mux, temp, c_out_mux;
      assign s_mux     = a1 ^ a2 ^ c_in;
      assign s         = s1 ? s_mux : free1;
      assign temp_mux  = (a1 & a2) | (a1 & c_in);
      assign temp      = s2 ? temp_mux : free2;
      assign c_out_mux = temp | (a2 & c_in);
      assign c_out     = s3 ? c_out_mux : free3;
    endmodule
Inserting Constraints w.r.t. Failing Test and Expected Response
• Add constraints corresponding to a failing test and its expected response to the MUX-inserted circuit/code
[Figure: the example circuit with the test applied: X1 = 3, X2 = 3, expected value 5 at the MUX output.]
• Constraint: ( ( S ? W : (3 + 3) ) = 5 )
• SAT with S = 1, W = 5
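As a quick check of the constraint on this slide, the satisfying assignments can be enumerated directly. The 4-bit range for W is an assumption made only to keep the search finite; a real flow would hand the constraint to an SMT solver.

```python
# Enumerate (S, W) assignments satisfying ((S ? W : (3 + 3)) == 5).
# W is assumed to be a 4-bit word for this illustration.
solutions = [
    (s, w)
    for s in (0, 1)
    for w in range(16)
    if (w if s else 3 + 3) == 5
]
print(solutions)
```

With S = 0 the MUX passes 3 + 3 = 6, which can never equal 5, so every solution has S = 1; that select value is what marks the signal as a candidate error location.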
Experimental Results
• 11 example circuits (IWLS 2005 benchmarks)
• An error is randomly injected in each circuit
* after applying simple effect-cause analysis

Experimental Results
• 4 sample circuits, each with 1000 random errors
• Average/maximum/minimum number of remaining candidates

Experimental Results – Effect of Applying More Failing Tests
• Average of 4 sample circuits, each with 1000 random errors
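Disadvantage of Model-Free Diagnosis
[Figure: golden model and design side by side, each with inputs X1, X2, X3, signal L, comparator (=), and output Y; the golden model uses "-" where the design uses "+". MUXs with selects s1 to S5 and free inputs W1 to W5 are inserted at the candidate signals of the design.]
• Some errors are indistinguishable from each other
• Example: L is the real error location, but the solver can find satisfying values for all initial error candidates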
Advantages of SMT-Based RTL Design Error Diagnosis
• The learned information can be reused
• Candidates are identified in order from easy to difficult, implicitly by the solver
  - The solver tends to set the MUXs of easy-to-diagnose candidates first
  - By the time difficult candidates are checked, the accumulated learned clauses help reduce complexity
• Running All-SAT for this model results in:
  - Eliminating a group of candidates without explicitly targeting them one at a time
Diagnostic Test Pattern Generation (DTPG)
• Generates tests that distinguish fault types or locations
• One of the most computationally intensive problems
• Most existing methods are based on modified conventional ATPG or sequential ATPG
  - Very complex and tedious implementation
• Propose an efficient SAT-based DTPG approach for combinational and sequential circuits
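Traditional SAT-based DTPG
• Use a miter-like model to transform DTPG into a SAT problem
[Figure: two faulty copies of the circuit, one with fault f1 and one with fault f2, share the primary inputs PI; their primary outputs PO are compared, with objective M = 1.]
• SAT: distinguishable
• UNSAT: indistinguishable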
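A miter for one fault pair can be sketched as below. This is an illustration only: the small circuit, the stuck-at fault encoding, and the function names are hypothetical, and exhaustive input enumeration stands in for the SAT solver.

```python
from itertools import product

def circuit(x1, x2, x3, fault=None):
    """Hypothetical circuit; fault is (signal_name, stuck_value) or None."""
    def inject(name, value):
        # Replace the signal by its stuck value if the fault sits on it.
        return fault[1] if fault and fault[0] == name else value
    l1 = inject("l1", x1 & x2)
    w = inject("w", x2 | x3)
    return inject("y", 1 - (l1 & w))

def distinguishing_test(f1, f2):
    """Miter objective M = 1: an input on which the two faulty copies differ."""
    for vec in product((0, 1), repeat=3):
        if circuit(*vec, fault=f1) != circuit(*vec, fault=f2):
            return vec          # SAT: the pair is distinguishable
    return None                 # UNSAT: the pair is indistinguishable

test_a = distinguishing_test(("l1", 1), ("w", 0))
test_b = distinguishing_test(("l1", 0), ("w", 0))
print(test_a, test_b)
```

In this sketch the second pair has no distinguishing test, since both faults force the output to 1 for every input. Note that each new pair requires a fresh search, which is the cost the shared-miter model later in the deck tries to avoid.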
SAT-based DTPG
• Limitations:
  - Need to build a miter circuit for each fault pair
  - Cannot share learned information for different fault pairs
• Objectives:
  - Reduce the number of miter circuits and the computational cost of each DTPG run by using learned information from previous runs
DTPG Model for Injecting Multiple Fault Pairs
• Inject the same set of N = 2^n to-be-differentiated faults into each of the two circuits in the miter
• Add an n-to-2^n decoder in each circuit to activate exactly one fault at a time
• The decoder inputs, PI1 and PI2, are extra primary inputs
• Solve objective M = 1
• Example: with PI1 = 001 and PI2 = 110, test Vi differentiates f1 and f6
DTPG Procedure Using Proposed Model
• For a SAT solution, the values assigned at PI1 and PI2 are the indices of the activated fault pair; the values assigned at PI are a diagnostic test
• After a diagnostic test for fault pair fi and fj is found, add a blocking clause so that a test for the same pair is not generated again
• After UNSAT, all remaining fault pairs are indistinguishable
[Flowchart: List of fault candidates → Build the DTPG model → Simplify the circuit → M = 1? On SAT: diagnostic pattern found, add SAT constraint, and solve again. On UNSAT: End.]
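The procedure above can be sketched with a single shared model in which the fault indices play the role of the decoder inputs PI1 and PI2. Everything concrete here is an assumption for illustration: the circuit, the four-fault list, and the function names are hypothetical, and enumeration replaces the SAT solver, so no clause learning is modeled.

```python
from itertools import combinations, product

# N = 4 = 2^2 faults; the list index is the value on a decoder input.
FAULTS = [("l1", 0), ("l1", 1), ("w", 0), ("w", 1)]

def faulty(vec, idx):
    """Hypothetical circuit y = NAND(x1 AND x2, x2 OR x3) with fault idx active."""
    x1, x2, x3 = vec
    name, val = FAULTS[idx]
    l1 = val if name == "l1" else x1 & x2
    w = val if name == "w" else x2 | x3
    return 1 - (l1 & w)

def solve(blocked):
    """Stand-in for one SAT call with objective M = 1 and blocking clauses."""
    for vec in product((0, 1), repeat=3):                 # PI
        for pair in combinations(range(len(FAULTS)), 2):  # (PI1, PI2)
            if pair not in blocked and faulty(vec, pair[0]) != faulty(vec, pair[1]):
                return vec, pair
    return None                                           # UNSAT

def dtpg():
    blocked, tests = set(), {}
    while True:
        solution = solve(blocked)
        if solution is None:
            break
        vec, pair = solution
        tests[pair] = vec        # diagnostic pattern found
        blocked.add(pair)        # blocking clause for this pair
    all_pairs = set(combinations(range(len(FAULTS)), 2))
    return tests, sorted(all_pairs - blocked)

tests, indistinguishable = dtpg()
print(tests, indistinguishable)
```

When the loop ends, every pair that was never blocked is reported as indistinguishable without having been targeted individually.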
Main Advantages of the DTPG Model
• The learned information can be reused
• Order of target fault pair selection is automatically determined by SAT solving
  - Easy-to-distinguish fault pairs would be implicitly targeted first
• Running All-SAT for this miter model could:
  - Find diagnostic patterns for all pairs of faults
  - Naturally perform diagnostic pattern compaction
  - Identify a group of indistinguishable fault pairs without explicitly targeting them one at a time
Finding More Compact Diagnostic Tests

    Test   PI1   PI2   Differentiates
    Vi     000   011   f0 and f3
    Vj     0x0   11x   {f0, f2} and {f6, f7}
DTPG with Compaction Heuristic
• Solve objective M = 1 using SAT solver
• Use existing patterns to guide the SAT solving
• Find don't-cares at PI1 and PI2 in the newly generated pattern, so that the pattern differentiates two groups of faults
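To see how don't-cares at the decoder inputs turn one pattern into a test for two groups of faults, the short sketch below expands a decoder input pattern into the fault indices it covers. The helper name `expand` is invented here, and the MSB-first bit order is an assumption chosen to match the deck's examples (001 selects f1, 110 selects f6).

```python
from itertools import product

def expand(pattern):
    """Fault indices selected by a decoder input pattern; 'x' is a don't-care."""
    choices = [("0", "1") if c == "x" else (c,) for c in pattern]
    return sorted(int("".join(bits), 2) for bits in product(*choices))

group1 = expand("0x0")   # decoder input PI1
group2 = expand("11x")   # decoder input PI2
print(group1, group2)
```

Under these assumptions, PI1 = 0x0 covers f0 and f2, and PI2 = 11x covers f6 and f7, so a single pattern separates every fault in the first group from every fault in the second.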
DTPG for Multiple Faults
• Need m n-to-2^n decoders in each faulty circuit (m is the cardinality of multiple faults)
• One output from each decoder is connected to an m-input OR gate
• Can inject m or fewer faults
• Combine existing methods before using the proposed DTPG model
DTPG Results
• Initial fault pairs: generated by a critical-path-tracing tool
• All fault pairs injected into one miter circuit
• #D: distinguishable, #E: equivalent, #A: aborted
Summary
• SMT-based RTL Design Error Diagnosis
  - An enhanced model injecting single/multiple design errors
  - Enable sharing of the learned information
  - Identify false candidates without explicitly targeting them
• SAT-based DTPG
  - Use an enhanced miter model injecting multiple faults
  - Enable sharing of the learned information
  - Identify undifferentiable faults efficiently
  - Support diagnosis between mixed, multiple fault types
  - Combine with diagnostic test pattern compaction