
Marly Roncken, Ken Stevens, Shai Rotem - Intel Corporation

CA-BIST for Asynchronous Circuits: A Case Study on the RAPPID Pentium Pro Instruction Length Decoder. Marly Roncken, Ken Stevens, Shai Rotem - Intel Corporation; Rajesh Pendurkar - Sun Microsystems


Presentation Transcript


  1. CA-BIST for Asynchronous Circuits: A Case Study on the RAPPID Pentium Pro Instruction Length Decoder • Marly Roncken, Ken Stevens, Shai Rotem - Intel Corporation • Rajesh Pendurkar - Sun Microsystems • Parimal Pal Chaudhuri - Bengal Engineering College

  2. RAPPID: Revolving Asynchronous Pentium Processor Instruction-length Decoder (3.1 x 3.5 mm) [Figure: RAPPID vs. Pentium II (processor core); axes: latency (nsec), throughput (instr/nsec)]

  3. Baseline for Case Study • RAPPID = High-Speed Proof-of-Concept • synchronous basis • 0.25 micron CMOS static + domino library • self-timed addition: Relative Timing • from handshake causality • to relative assumptions • Testability study started afterwards • no DfT in core design - but some debug features • considered a major risk in performance + fault coverage • Wanted: non-invasive test approach • outside RAPPID core • low performance penalty • no re-design

  4. Objectives • Achieve 95% fault coverage with non-scan BIST • look for fast ways to tune BIST to RAPPID • beyond pseudo-random testing • use HDM coverage metric: min-terms in Length Decode PLA • follow design architecture: replication • use Cellular Automata BIST • wider behavior than LFSR solutions • expert in-house • Analyze testability impact of Relative Timing • take BIST solution - analyze undetected stuck-at faults • manageable • high fault coverage leaves relatively few undetected faults • in-focus • HDM coverage metric uncovers implementation-specific faults

  5. Outline • Part I CA-BIST solution for RAPPID • RAPPID interface and design hooks • CA-BIST architecture + algorithm + costs • CA test generation engine • bootstrapped test expansion • Part II Stuck-at fault analysis for Relative Timing • fault coverage distribution • benign and suspicious escapees • Conclusion

  6. RAPPID CA-BIST - starting point

  7. RAPPID CA-BIST - starting point [Figure: RAPPID with CATPG and CARE blocks]

  8. RAPPID - core Architecture [Figure: Decode and Steer Unit, columns 0-15 (16x replication used in CATPG), each column with Byte Latch, Byte CTRL, Byte Unit and Length Decode; Tag Unit rows 0-3 with Crossbar Switches and Output Buffers (outputs shared in CARE); labels: 720 MHz, 3.6 GHz, 900 MHz, optimal balance (common instr)]

  9. RAPPID input FIFO vs. RAPPID CATPG • interface to the tester (RAPPID input FIFO) • data bytes in circular scan • at-speed performance testing • branch status register / byte from external Branch Target Buffer • 16/32-bit instruction modes: global setting, instruction-based local setting • interface to RAPPID (CATPG) • share circular scan to bootstrap test generation • share 3-bit status register, direct access via test circuit BRT, 2-byte-instruction-based setting • share 16/32 global setting; test patterns cover local setting

  10. RAPPID CA-BIST - interfacing RAPPID [Figure: CATPG, CA, CARE and RAPPID blocks]

  11. One for All: CA test engine • Generate initial fillings for 16 instruction bytes • 11-bit D1*CA + dual version • 11 state bits - take first 8 bits as test instruction byte • 48 pairs of state components with cycle length 16 • no LFSR implementation

  12. D1*CA component pair [Figure: normal cycle S0-S0 and dual cycle S1-S1; 16x S0-S0 cycle = 16 FIFO fillings; 16x S1-S1 cycle = 16 FIFO fillings]

  13. Traversal algorithm • Step 1 • S0 := normal cyclic state := S2 • Step 2 • S1 := S0 with inverted MSB • run S1-S1 dual cycle • use state outputs as test bytes • Step 3 • S0 := next normal state after S1 • run S0-S0 normal cycle • use state outputs as test bytes • Step 4 • STOP if (S0 = S2) else Step 2
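The four traversal steps can be sketched as a generator that alternates between a dual cycle and a normal cycle until the normal state returns to S2. The CA update rules are passed in as functions, so the sketch says nothing about the actual D1*CA; `run_cycle`, `traverse`, the `max_*` guards and the 4-bit `toy_normal`/`toy_dual` machines are all hypothetical, chosen only so that the loop visibly terminates.

```python
from typing import Callable, Iterator, List

State = int
Step = Callable[[State], State]


def run_cycle(start: State, step: Step, max_len: int = 1 << 16) -> List[State]:
    """Return the states visited from `start` until the machine returns to `start`."""
    states = [start]
    s = step(start)
    while s != start:
        states.append(s)
        if len(states) > max_len:
            raise RuntimeError("start is not on a short enough cycle")
        s = step(s)
    return states


def traverse(s2: State, normal: Step, dual: Step, msb: int,
             max_rounds: int = 1 << 16) -> Iterator[State]:
    """Yield test states in the order of the four-step traversal (sketch)."""
    s0 = s2                               # Step 1: S0 := S2 (state on a normal cycle)
    for _ in range(max_rounds):
        s1 = s0 ^ msb                     # Step 2: S1 := S0 with inverted MSB,
        yield from run_cycle(s1, dual)    #         run the S1-S1 dual cycle
        s0 = normal(s1)                   # Step 3: S0 := next normal state after S1,
        yield from run_cycle(s0, normal)  #         run the S0-S0 normal cycle
        if s0 == s2:                      # Step 4: STOP if S0 = S2, else Step 2
            return
    raise RuntimeError("traversal did not return to S2 within max_rounds")


# Toy 4-bit machines (hypothetical, NOT D1*CA): both count the 3 low bits
# modulo 8; the normal machine clears the MSB, the dual machine sets it.
def toy_normal(x: State) -> State:
    return (x + 1) & 0b0111


def toy_dual(x: State) -> State:
    return 0b1000 | ((x + 1) & 0b0111)


if __name__ == "__main__":
    tests = list(traverse(0, toy_normal, toy_dual, msb=0b1000))
    print(len(tests), sorted(set(tests)) == list(range(16)))  # prints: 128 True
```

With the toy machines, each round yields one 8-state dual cycle and one 8-state normal cycle, and S0 advances by one per round, so the traversal stops after 8 rounds having visited all 16 states.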

  14. All for One: test expansion • Circular scan + Bootstrap Algorithm (256x) • Step 1: run Traversal Algorithm for next FIFO filling • Step 2: 128x (left-rotate FIFO by 1 bit + test RAPPID) • Step 3: repeat Step 2 with right-rotate • Step 4: STOP if end-of-Traversal-Algorithm else Step 1 • Add test circuitry to close test gaps • HDM coverage revealed 66-0F gap for long instruction • extra test circuitry to circulate 66-0F in test set • Test operation modes (4x) • 16/32 bit instruction modes • BRT to test branch instructions • 660F to cover remaining test gap [Callouts: design replication used in CATPG; 1024 x 32 tests; 100% coverage]
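The bootstrap expansion can be sketched by treating the 16-byte FIFO as one 128-bit ring: 128 one-bit left rotations bring it back to the original filling, and 128 right rotations do the same, giving 256 patterns per filling. Reading the slide's "(256x)" as 128 left + 128 right rotations is an interpretation, as are the big-endian byte order and the hypothetical names `rotl`, `rotr`, `expand` and `bootstrap`; applying each pattern to RAPPID is not modelled.

```python
FIFO_BYTES = 16                 # 16 instruction bytes in the RAPPID input FIFO
FIFO_BITS = 8 * FIFO_BYTES      # 128 bits, so 128 one-bit rotations = full turn
FIFO_MASK = (1 << FIFO_BITS) - 1


def rotl(x: int) -> int:
    """Rotate the 128-bit FIFO image left by one bit."""
    return ((x << 1) | (x >> (FIFO_BITS - 1))) & FIFO_MASK


def rotr(x: int) -> int:
    """Rotate the 128-bit FIFO image right by one bit."""
    return ((x >> 1) | (x << (FIFO_BITS - 1))) & FIFO_MASK


def expand(filling):
    """Yield 256 patterns from one filling: 128 left rotations, then 128 right."""
    fifo = int.from_bytes(bytes(filling), "big")
    for rotate in (rotl, rotr):          # Step 2 (left), Step 3 (right)
        for _ in range(FIFO_BITS):       # 128x
            fifo = rotate(fifo)
            yield fifo.to_bytes(FIFO_BYTES, "big")  # test RAPPID (not modelled)


def bootstrap(fillings):
    """Steps 1 and 4: expand each filling until the traversal is exhausted."""
    for filling in fillings:
        yield from expand(filling)
```

For example, `list(expand(range(16)))` yields 256 byte strings, and both the 128th and the 256th equal the original filling, which is why the right-rotation phase can start from an unchanged FIFO.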

  15. RAPPID CA-BIST solution [Figure: CATPG, CA, CARE and RAPPID blocks]

  16. Costs • performance • latency 5% • CATPG 1 gate delay (shared scan) • CARE negligible output load+delay • throughput 0% • CATPG + CARE off critical path • area • from schematics 5% (including circular scan) • fault coverage • for HDM 100% PLA min-terms • at switch level 94% testable stuck-at faults

  17. Stuck-at Fault Analysis • 120,000 transistors • static + domino gates • pass + reset transistors • switch-level fault analysis • COSMOS • stuck-at input + output • simulated for full RAPPID • only injected in 1 column-row [Figure: fault distribution over Decode and Steer Unit (columns 0-15), Tag Unit and Crossbar Switch; labels: 5%, 9%, 5%, excellent ATPG candidates, 81%]

  18. Benign & Suspicious Escapees [Figure: full keeper for 0 & 1; half keeper for 1 only; footed: protection against set-reset overlap (c, d); no foot for pulse-d evaluations]

  19. Domino Escapees [Figure labels: @1, @0, fast, slow] • floating gate output • z floats when neither set nor reset is active • relative timing matters • fast reset OK, slow NOT OK • suspicious • test at low speed or frequency • within specification range • increases test application costs • noise sensitive • test for realistic noise conditions • similar for clocked design

  20. Pulse Domino Escapees [Figure label: @0] • pulse narrowing • d fights keeper (weak) • push-out of z:=0 • slightly deteriorated 0 • keeper helps reset c • push-forward of z:=1 • benign • wide evaluation pulse • d stays valid during evaluation • redundant n-transistor in keeper • suspicious • small input pulse • d0d1d2 reset during evaluation • required for un-footed gate • crucial n-transistor in keeper • output pulse shrinks >50% • from 7 gate delays (c loop) • to 3 gate delays (d2 loop) • noise sensitive • noise can make pulse narrower • similar for self-reset clocks in 'synchronous' pulse logic
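The ">50%" figure follows from the two loop delays on the slide, assuming the output pulse width scales with the number of gate delays in the loop that terminates it:

```latex
\frac{w_{d2\,\text{loop}}}{w_{c\,\text{loop}}} = \frac{3}{7} \approx 0.43,
\qquad
\text{shrink} = 1 - \frac{3}{7} = \frac{4}{7} \approx 57\% > 50\%
```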

  21. Conclusion [Callout: Current focus] • Testability is no excuse • to avoid asynchronous + clocked high-performance design • similar fault effects for RAPPID and clocked domino • CA-BIST without scan works for RAPPID • non-invasive with low performance + area penalty • covers 94% of testable faults • remaining 6% relate to missing data operands • suitable minority for tailored ATPG + off-chip testing • … and BIST tuning • RAPPID column replication tuned test expansion • not always so obvious • HDM coverage metric tuned CATPG solution • … but missed 5% potentially catastrophic timing-related faults
