Marly Roncken, Ken Stevens, Shai Rotem - Intel Corporation

CA-BIST for Asynchronous Circuits: A Case Study on the RAPPID Pentium Pro Instruction Length Decoder
Marly Roncken, Ken Stevens, Shai Rotem - Intel Corporation
Rajesh Pendurkar - Sun Microsystems
Parimal Pal Chaudhuri - Bengal Engineering College

RAPPID: Revolving Asynchronous Pentium Processor Instruction-length Decoder (3.1 x 3.5 mm)
[Figure: latency (nsec) and throughput (instr/nsec) of RAPPID compared with the Pentium II processor core]

Baseline for Case Study
• RAPPID = High-Speed Proof-of-Concept
  • synchronous basis
  • 0.25 micron CMOS static + domino library
  • self-timed addition: Relative Timing
    • from handshake causality
    • to relative assumptions
• Testability study started afterwards
  • no DfT in core design, but some debug features
  • considered a major risk in performance + fault coverage
• Wanted: non-invasive test approach
  • outside RAPPID core
  • low performance penalty
  • no re-design

Objectives
• Achieve 95% fault coverage with non-scan BIST
  • look for fast ways to tune BIST to RAPPID
    • beyond pseudo-random testing
    • use HDM coverage metric: min-terms in Length Decode PLA
    • follow design architecture: replication
  • use Cellular Automata BIST
    • wider behavior than LFSR solutions
    • expert in-house
• Analyze testability impact of Relative Timing
  • take BIST solution, analyze undetected stuck-at faults
  • manageable
    • high fault coverage leaves relatively few undetected faults
  • in-focus
    • HDM coverage metric uncovers implementation-specific faults
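
A min-term coverage metric of the kind named above can be counted mechanically. The sketch below is a minimal illustration, not the authors' tool. It assumes each PLA min-term is supplied as a cube string over the PLA inputs ('1', '0', '-' for don't-care) and each test pattern as an integer. The function names and the cube encoding are hypothetical. Listing the uncovered cubes, rather than reporting only a percentage, is what exposes a concrete test gap.

```python
from typing import Iterable, List, Sequence


def cube_matches(cube: str, pattern: int) -> bool:
    """True if `pattern` satisfies the cube; cube[0] corresponds to the pattern's MSB."""
    width = len(cube)
    for pos, literal in enumerate(cube):
        bit = (pattern >> (width - 1 - pos)) & 1
        if literal == "1" and bit != 1:
            return False
        if literal == "0" and bit != 0:
            return False
        # '-' (don't care) accepts either value
    return True


def uncovered(cubes: Sequence[str], patterns: Iterable[int]) -> List[str]:
    """Min-terms (cubes) that no test pattern exercises."""
    pats = list(patterns)
    return [c for c in cubes if not any(cube_matches(c, p) for p in pats)]


def minterm_coverage(cubes: Sequence[str], patterns: Iterable[int]) -> float:
    """Fraction of min-terms exercised by at least one pattern."""
    return 1.0 - len(uncovered(cubes, patterns)) / len(cubes)


# Hypothetical 4-input example: "01--" is never exercised by the two patterns.
cubes = ["1---", "01--", "0000"]
print(uncovered(cubes, [0b1010, 0b0000]))
print(minterm_coverage(cubes, [0b1010, 0b0000]))
```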

Outline
• Part I: CA-BIST solution for RAPPID
  • RAPPID interface and design hooks
  • CA-BIST architecture + algorithm + costs
  • CA test generation engine
  • bootstrapped test expansion
• Part II: Stuck-at fault analysis for Relative Timing
  • fault coverage distribution
  • benign and suspicious escapees
• Conclusion

RAPPID CA-BIST - starting point
[Figure: block diagram of RAPPID with CATPG and CARE]

RAPPID - core Architecture
[Figure: Decode and Steer Unit with 16 byte columns (0-15), each with Byte Latch, Byte CTRL, Byte Unit and Length Decode; Tag Unit rows 0-3 with Crossbar Switches and Output Buffers. Annotations: 16x column replication used in CATPG; Tag Unit outputs shared in CARE; optimal balance (common instr); rates of 720 MHz, 3.6 GHz and 900 MHz]

RAPPID - input FIFO vs. CATPG
• data bytes
  • RAPPID input FIFO (interface to the tester): in circular scan, for at-speed performance testing
  • CATPG (interface to RAPPID): share circular scan to bootstrap test generation
• branch status
  • RAPPID input FIFO (interface to the tester): register / byte from external Branch Target Buffer
  • CATPG (interface to RAPPID): share 3-bit status register; direct access via test circuit BRT; 2-byte-instruction based setting
• 16/32-bit instruction modes
  • RAPPID input FIFO (interface to the tester): global setting; instruction based local setting
  • CATPG (interface to RAPPID): share 16/32 global setting; test patterns cover local setting

CA-BIST - interfacing RAPPID
[Figure: CATPG (CA), RAPPID and CARE]

One for All: CA test engine
• Generate initial fillings for 16 instruction bytes
• 11-bit D1*CA + dual version
  • 11 state bits: take first 8 bits as test instruction byte
  • 48 pairs of state components with cycle length 16
  • no LFSR implementation

[Figure: one D1*CA component pair (normal and dual): 16x S0-S0 normal cycle = 16 FIFO fillings; 16x S1-S1 dual cycle = 16 FIFO fillings]

Traversal algorithm
• Step 1
  • S0 := normal cyclic state := S2
• Step 2
  • S1 := S0 with inverted MSB
  • run S1-S1 dual cycle
  • use state outputs as test bytes
• Step 3
  • S0 := next normal state after S1
  • run S0-S0 normal cycle
  • use state outputs as test bytes
• Step 4
  • STOP if (S0 = S2) else Step 2
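
Steps 1-4 can be sketched as a loop over two next-state functions. This is a minimal sketch under stated assumptions. `normal` and `dual` are hypothetical stand-ins for the D1*CA and its dual. "Next normal state after S1" is read as applying the normal next-state function to S1. The toy 4-bit up/down counters only exercise the control flow. Each returned state would then be cut down to its test byte.

```python
from typing import Callable, List

Step = Callable[[int], int]  # next-state function of one CA (normal or dual)


def run_cycle(start: int, step: Step, limit: int = 1 << 16) -> List[int]:
    """Return the states visited from `start` until `step` brings the CA back to `start`."""
    states = [start]
    state = step(start)
    while state != start:
        states.append(state)
        if len(states) > limit:
            raise ValueError("start state is not on a cycle within `limit` steps")
        state = step(state)
    return states


def traverse(s2: int, normal: Step, dual: Step, msb_mask: int,
             max_rounds: int = 1 << 16) -> List[int]:
    """Steps 1-4 of the Traversal algorithm; returns all states used as test bytes."""
    visited: List[int] = []
    s0 = s2                               # Step 1: S0 := normal cyclic state := S2
    for _ in range(max_rounds):
        s1 = s0 ^ msb_mask                # Step 2: S1 := S0 with inverted MSB
        visited += run_cycle(s1, dual)    #         run S1-S1 dual cycle
        s0 = normal(s1)                   # Step 3: S0 := next normal state after S1
        visited += run_cycle(s0, normal)  #         run S0-S0 normal cycle
        if s0 == s2:                      # Step 4: STOP if S0 = S2, else Step 2
            return visited
    raise RuntimeError("S0 did not return to S2 within max_rounds")


# Toy stand-ins (hypothetical): 4-bit up/down counters, each a single 16-state cycle.
def toy_normal(s: int) -> int:
    return (s + 1) % 16


def toy_dual(s: int) -> int:
    return (s - 1) % 16


states = traverse(0, toy_normal, toy_dual, msb_mask=0b1000)
print(len(states))  # 16 rounds x (16 + 16) states = 512
```

Termination of Step 4 depends on the CA pair. The `max_rounds` guard is an added safety net for arbitrary stand-ins, not part of the slide's algorithm.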

All for One: test expansion
[Figure annotations: design replication used in CATPG; 1024 x 32 tests; 100% coverage]
• Circular scan + Bootstrap Algorithm (256x)
  • Step 1: run Traversal Algorithm for next FIFO filling
  • Step 2: 128x (left-rotate FIFO by 1 bit + test RAPPID)
  • Step 3: repeat Step 2 with right-rotate
  • Step 4: STOP if end-of-Traversal-Algorithm else Step 1
• Add Test circuitry to close test gaps
  • HDM coverage revealed 66-0F gap for long instructions
  • extra Test Circuitry to circulate 66-0F in test set
• Test operation modes (4x)
  • 16/32 bit instruction modes
  • BRT to test branch instructions
  • 66-0F to cover remaining test gap
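
The 128x and 256x counts follow from the FIFO size. 16 instruction bytes x 8 bits form a 128-bit circular scan chain, so 128 single-bit rotations visit every rotation once. A left pass plus a right pass gives 2 x 128 = 256 test applications per filling. The sketch below models the FIFO as a Python integer and yields each rotated pattern where the slide says "test RAPPID". The names `rotl`, `rotr` and `expand` are hypothetical, and the big-endian byte packing is an assumption.

```python
from typing import Iterable, Iterator

FIFO_BYTES = 16
FIFO_BITS = 8 * FIFO_BYTES  # 128-bit circular scan chain
MASK = (1 << FIFO_BITS) - 1


def rotl(x: int) -> int:
    """Rotate the FIFO contents left by 1 bit."""
    return ((x << 1) | (x >> (FIFO_BITS - 1))) & MASK


def rotr(x: int) -> int:
    """Rotate the FIFO contents right by 1 bit."""
    return ((x >> 1) | ((x & 1) << (FIFO_BITS - 1))) & MASK


def expand(fillings: Iterable[bytes]) -> Iterator[int]:
    """Yield every FIFO pattern that would be applied to RAPPID."""
    for filling in fillings:               # Step 1: next FIFO filling from traversal
        assert len(filling) == FIFO_BYTES
        fifo = int.from_bytes(filling, "big")
        for _ in range(FIFO_BITS):         # Step 2: 128x left-rotate + test
            fifo = rotl(fifo)
            yield fifo
        for _ in range(FIFO_BITS):         # Step 3: repeat with right-rotate
            fifo = rotr(fifo)
            yield fifo
    # Step 4: stops when the traversal runs out of fillings


patterns = list(expand([bytes(range(16))]))
print(len(patterns))  # 256 applications for one filling
```

Both passes cover the same set of rotations, but in opposite order, so consecutive patterns differ in a different sequence of bit positions.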

CA-BIST solution
[Figure: CATPG (CA), RAPPID and CARE]

Costs
• performance
  • latency: 5%
    • CATPG: 1 gate delay (shared scan)
    • CARE: negligible output load + delay
  • throughput: 0%
    • CATPG + CARE off critical path
• area
  • from schematics: 5% (including circular scan)
• fault coverage
  • for HDM: 100% PLA min-terms
  • at switch level: 94% testable stuck-at faults

Stuck-at fault Analysis
[Figure: fault distribution over the Decode and Steer Unit (columns 0-15), Tag Unit and Crossbar Switch; labels 5%, 9%, 5%, 81% and "excellent ATPG candidates"]
• 120,000 transistors
  • static + domino gates
  • pass + reset transistors
• switch-level fault analysis
  • COSMOS
  • stuck-at input + output
  • simulated for full RAPPID
  • only injected in 1 column-row

Benign & Suspicious Escapees
[Figure: gate schematics annotated: full keeper for 0 & 1; half keeper for 1 only; footed: protection against set-reset overlap (c, d); no foot for pulsed evaluations]

Domino Escapees
[Figure: waveforms labelled @1, @0, fast, slow]
• floating gate output
  • z floats when neither set nor reset
  • relative timing matters
  • fast reset OK, slow NOT OK
• suspicious
  • test at low speed or frequency
    • within specification range
    • increases test application costs
• noise sensitive
  • test for realistic noise conditions
  • similar for clocked design

Pulse domino Escapees
[Figure: waveform labelled @0]
• pulse narrowing
  • d fights keeper (weak)
    • push-out of z:=0
    • slightly deteriorated 0
  • keeper helps reset c
    • push-forward of z:=1
• benign
  • wide evaluation pulse
  • d stays valid during evaluation
  • redundant n-transistor in keeper
• suspicious
  • small input pulse
  • d0, d1, d2 reset during evaluation
  • required for un-footed gate
  • crucial n-transistor in keeper
  • output pulse shrinks >50%
    • from 7 gate delays (c loop)
    • to 3 gate delays (d2 loop)
• noise sensitive
  • noise can make pulse narrower
  • similar for self-reset clocks in 'synchronous' pulse logic
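
The ">50%" shrink follows directly from the two loop delays quoted above, assuming pulse width is measured in gate delays:

```latex
\frac{7 - 3}{7} = \frac{4}{7} \approx 57\,\% > 50\,\%
```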

Conclusion
Current focus
• Testability is no excuse
  • to avoid asynchronous + clocked high-performance
  • similar fault effects for RAPPID and clocked domino
• CA-BIST without scan works for RAPPID
  • non-invasive with low performance + area penalty
  • covers 94% of testable faults
  • remaining 6% relate to missing data operands
    • suitable minority for tailored ATPG + off-chip testing
    • … and BIST tuning
• RAPPID column replication tuned test expansion
  • not always so obvious
• HDM coverage metric tuned CATPG solution
  • … but missed 5% potentially catastrophic timing related faults
