
IFRA Instruction Footprint Recording & Analysis for Post-Silicon Bug Localization



Presentation Transcript


  1. IFRA: Instruction Footprint Recording & Analysis for Post-Silicon Bug Localization
     Sung-Boem Park, Subhasish Mitra
     Robust Systems Group, Departments of Electrical Eng. & Computer Sc., Stanford University

  2. Key Message
     • Post-silicon bug localization: major bottleneck
       • Pinpoint from system failure: bug location, exposing stimulus
     • Existing schemes: expensive & not scalable
     • IFRA: new technique for processors
       • Eliminates limitations of existing techniques
       • 96% accuracy
       • 1% area, ~0% performance impact

  3. Outline • Motivation • IFRA Overview • Simulation Results • Conclusion

  4. Microprocessor Development Flow
     • Pre-silicon: Design, Pre-Silicon Verification
     • Post-silicon: Post-Silicon Validation, Manufacturing Test
     • Post-silicon validation costs: 35% of development time, 25% of design resources
     • "Post-silicon cost & complexity is rising faster than design cost" (S. Yerramilli, VP, Intel, ITC06 Invited Address)

  5. Post-Silicon Validation Steps
     • Detect: run test content in system
       • e.g., OS, games, functional tests
     • Localize: pinpoint from system failure (e.g., crash)
       • Bug location, e.g., ALU, decoder, scheduler
       • Exposing stimulus, e.g., instruction sequence
       • Dominates cost [Josephson DAC06]
     • Root cause & fix
       • Optical probing, patch / circuit edit / respin

  6. Post-Silicon Bug Types [Josephson DAC06]
     • Functional bugs: incorrect logic implementation
       • e.g., design errors
       • Short localization time, e.g., hours to days
     • Electrical bugs / circuit marginalities (our focus)
       • e.g., speed-path, noise, races, hold time
       • Some voltage / temp / frequency corners
       • Long localization time, e.g., days to weeks

  7. Existing Post-Silicon Bug Localization Flows
     • System-based: detect in system, then localize failure in system (1 to 4 weeks)
     • Tester-based: detect in system, reproduce failure on tester (2 days, not always possible), then localize on tester (3 days)
     • Major problems: failure reproduction, system-level simulation

  8. IFRA vs. Existing Techniques

  9. Instruction Footprint Recording & Analysis
     • Design phase: insert recorders inside chip design
     • Post-Si validation:
       • Run tests and record special info. in recorders
       • Failure detected? If no, continue running tests
       • If yes, scan out recorder contents
       • Post-analyze offline
     • Result: localized bug (location, stimulus)
     • Properties: non-intrusive; no failure reproduction; single test run sufficient; no system simulation; self-consistency against test program binary

  10. Outline • Motivation • IFRA Overview • Hardware Support • Automated Post-Analysis Techniques • Simulation Results • Conclusion

  11. IFRA Hardware in Superscalar Processor
      [Block diagram: Alpha 21264 pipeline; each stage has pipeline registers with recorders attached]
      • FETCH: Branch Predictor, I-TLB, I-Cache, ID assignment, Fetch Queue
      • DECODE: Decoders
      • DISPATCH: Reg Map, Reg Free, Reg Rename
      • ISSUE: Instruction Window, Phys Regfile
      • EXECUTE: MUL, 2xALU, 2xBr, FPU, 2xLSU, D-Cache, D-TLB
      • COMMIT: Reorder Buffer, Reg Map
      • Annotations: recorders are part of the scan chain; Post-Trigger Generator; slow wire, no at-speed routing

  12. Recording Operation Example
      [Figure: two instructions, INST1 and INST2, passing through the FETCH and DECODE stages, each stage with a pipeline register and a recorder]
      • FETCH (Branch Predictor, I-TLB, I-Cache, Fetch Queue): ID assignment under the special ID assignment rule tags INST1 with ID1 and INST2 with ID2
      • Recorder 1 stores instruction footprints: (ID1, auxiliary info: PC1), (ID2, auxiliary info: PC2)
      • DECODE (Decoder): Recorder 2 stores instruction footprints: (ID1, auxiliary info: decoded bits1), (ID2, auxiliary info: decoded bits2)

  13. Special Rule for Instruction ID Assignment
      • Simplistic ID assignment inadequate
        • Speculation + flushes, out-of-order execution
        • PC does not work for loops
      • Special ID assignment rule: formal proof in paper
        • ID width: log2(4n) bits
        • n = max. instructions in flight
        • e.g., 8 bits for Alpha-like processor (n=64)
      • No timestamp or global synchronization required
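The ID width on this slide can be checked with a few lines of arithmetic. The sketch below reads the width as log2(4n) bits, rounded up when 4n is not a power of two (the rounding is an assumption; the slide only gives the n=64 case). The function name `id_width_bits` is hypothetical and does not come from the presentation.

```python
def id_width_bits(max_in_flight: int) -> int:
    """Bits needed for an instruction ID, reading the slide's rule as log2(4n).

    Rounding up for values of 4n that are not powers of two is an
    assumption made for this sketch.
    """
    if max_in_flight < 1:
        raise ValueError("max_in_flight must be at least 1")
    # (x - 1).bit_length() equals ceil(log2(x)) for any integer x >= 1.
    return (4 * max_in_flight - 1).bit_length()


# The slide's example: an Alpha-like processor with n = 64 instructions in flight.
print(id_width_bits(64))
```

For n = 64 this gives 8 bits, matching the figure quoted on the slide.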

  14. Instruction Footprint Recorder Design
      • Stores instruction ID + auxiliary info.
      • Dominated by memory
      • Simple control logic
      • Idle cycle compaction
      • Circular buffer control
      • Serialization
      • Stop / Start recording
      • No high-speed global routing
      • Contents scanned out after failure detection
      [Figure: circular buffer and control logic, with a post-trigger signal input and an output to the slow scan chain]
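As a rough software analogy for the recorder behavior listed on this slide, the sketch below models a circular buffer that keeps only the most recent footprints, skips idle cycles, and freezes when a post-trigger arrives. It is not the hardware design from the paper: the class name `FootprintRecorder`, its methods, and the reading of "idle cycle compaction" as simply storing nothing on idle cycles are all assumptions made for illustration.

```python
from collections import deque


class FootprintRecorder:
    """Toy model of one recorder: circular buffer plus stop/start control."""

    def __init__(self, capacity: int):
        # A bounded deque drops its oldest entry when full, which mimics
        # a circular buffer overwriting old footprints.
        self.buffer = deque(maxlen=capacity)
        self.recording = True

    def clock(self, footprint=None):
        """Call once per cycle; pass None when no instruction is present."""
        if not self.recording or footprint is None:
            return  # idle cycles consume no storage (assumed compaction)
        self.buffer.append(footprint)

    def post_trigger(self):
        """Freeze the contents once an early failure indication arrives."""
        self.recording = False

    def scan_out(self):
        """Return the retained footprints, oldest first."""
        return list(self.buffer)


recorder = FootprintRecorder(capacity=3)
for entry in [(1, "PC1"), None, (2, "PC2"), (3, "PC3"), (4, "PC4")]:
    recorder.clock(entry)
recorder.post_trigger()
recorder.clock((5, "PC5"))  # ignored: recording has stopped
print(recorder.scan_out())
```

Each footprint here is an (ID, auxiliary info) pair, following the slide's description of what a recorder stores.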

  15. What to Record? Total required storage for all recorders: 60 KBytes

  16. Post-Trigger Generation
      [Timeline of code execution starting at t=0: error after a billion cycles (e.g., speedpath), failure after 2 billion cycles (e.g., crash)]
      • Too much storage overhead to store 1 billion cycles

  17. Post-Trigger Generation
      [Timeline of code execution starting at t=0: error after a billion cycles (e.g., speedpath), failure after 2 billion cycles (e.g., crash)]
      • Early failure detection necessary: need to capture in recorder storage
      • Early failure detection techniques (post-triggers)
        • Classical error detection: residue, parity
        • Deadlock & segfault detection
        • Special early warnings to pause recording
      • Details in paper
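The "classical error detection" entries on this slide, residue and parity, can be illustrated in software. The sketch below is a generic textbook version, not the checker circuits from the paper: the modulus of 3, the function names, and the use of unbounded Python integers (so adder overflow is ignored) are assumptions.

```python
def parity(word: int) -> int:
    """Even/odd parity bit of a non-negative integer (1 if odd number of ones)."""
    return bin(word).count("1") & 1


def residue(value: int, modulus: int = 3) -> int:
    """Residue code of a value; modulus 3 is an assumed, common choice."""
    return value % modulus


def adder_residue_ok(a: int, b: int, observed_sum: int, modulus: int = 3) -> bool:
    """Check an adder result against the residues of its operands.

    Uses the identity (a + b) mod m == ((a mod m) + (b mod m)) mod m.
    Overflow of a fixed-width adder is ignored in this sketch.
    """
    expected = (residue(a, modulus) + residue(b, modulus)) % modulus
    return residue(observed_sum, modulus) == expected


correct = 5 + 7
corrupted = correct ^ 1  # a single bit-flip in the result
print(adder_residue_ok(5, 7, correct), adder_residue_ok(5, 7, corrupted))
print(parity(0b1011), parity(0b1011 ^ 0b0100))
```

A single bit-flip changes a value by plus or minus a power of two, which is never a multiple of 3, so a mod-3 residue check catches it; the same flip also toggles the parity bit.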

  18. IFRA Area Impact • 1% chip-level area impact • Synopsys Design Compiler synthesis • Alpha 21264-like processor: 2MB L2 cache • TSMC 130nm technology • No global at-speed routing • Area dominated by circular buffers in recorders • Total recorder storage: 60 KBytes

  19. Outline • Motivation • IFRA Overview • Hardware Support • Post-Analysis Techniques • Simulation Results • Conclusion

  20. Post-Analysis Overview
      • Inputs: test program binary, footprints from recorders
      • Steps: link footprints, run high-level analysis, run low-level analysis
      • Analyses listed: control-flow analysis, data-dependency analysis, decoding analysis, load/store analysis, residue consistency check
      • (Not covered today: details in paper)
      • Output: list of bug location-stimulus pairs

  21. Linking Footprints from Recorder Contents
      [Figure: footprints from the fetch-stage, execution-stage, and commit-stage recorders (each an ID with a PC or AUX entry, ordered in time) linked to the instructions INST0 to INST7 at PC0 to PC7 in the test program binary]
      • Special ID assignment rule ensures:
        • Uncommitted instructions uniquely identified
        • Relative orders of identical IDs maintained
        • Even under flushes & out-of-order execution

  22. Debug Example
      [Figure: decision diagram with unresolved nodes marked "?"]
      • Link footprints
      • High-level analysis
      • Low-level analysis
      • Bug locations + exposing stimulus

  23. Debug Example: Decision 1
      • Labels: Test Program Binary, Fetch-stage recorder
      • Serial execution trace:
          …
          R0 ← R1 + R2
          R0 ← R3 + R6
          R5 ← R0 + R6
          …

  24. Debug Example: Question 1
      • Residue of values mismatch?
      • Labels: Issue-stage recorder, Execute-stage recorder; recorded entries R0=3 and R0=5
      • Serial execution trace:
          …
          R0 ← R1 + R2
          R0 ← R3 + R6    (producer of R0)
          R5 ← R0 + R6    (consumer of R0, RAW hazard)
          …
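The question on this slide compares what the producer of R0 wrote with what the consumer of R0 read. A minimal sketch of that check follows, assuming the serial execution trace is available as a list of (destination, sources) entries. The `Inst` type and both function names are hypothetical, and the transcript does not make clear which recorder holds which of the two entries (R0=3, R0=5); the comparison is symmetric, so the sketch does not depend on that.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Inst(NamedTuple):
    """One entry of the serial execution trace (hypothetical representation)."""
    dest: str
    srcs: Tuple[str, ...]


def find_producer(trace: Sequence[Inst], consumer_index: int, reg: str) -> Optional[int]:
    """Index of the latest earlier instruction that writes `reg`, or None."""
    for i in range(consumer_index - 1, -1, -1):
        if trace[i].dest == reg:
            return i
    return None


def residues_mismatch(written: int, read: int) -> bool:
    """True when the recorded entries for the producer and consumer disagree."""
    return written != read


# The three instructions shown on the slide.
trace = [
    Inst("R0", ("R1", "R2")),
    Inst("R0", ("R3", "R6")),
    Inst("R5", ("R0", "R6")),
]

producer = find_producer(trace, consumer_index=2, reg="R0")
print(producer, residues_mismatch(3, 5))
```

The producer found for the consumer `R5 ← R0 + R6` is the second instruction, `R0 ← R3 + R6`, and the two entries from the slide disagree, so the answer to this question would be "yes".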

  25. Debug Example: Question 2
      • Residue of phys. reg. names mismatch?
      • Dispatch-stage recorder entries: R0=P5, R0=P2
      • Serial execution trace:
          …
          R0 ← R1 + R2
          R0 ← R3 + R6    (producer of R0)
          R5 ← R0 + R6    (consumer of R0, RAW hazard)
          …

  26. Debug Example: Question 3
      • Residue of phys. reg. name match with previous producer?
      • Dispatch-stage recorder entries: R0=P5, R0=P5
      • Serial execution trace:
          …
          R0 ← R1 + R2    (previous producer)
          R0 ← R3 + R6    (producer of R0)
          R5 ← R0 + R6    (consumer of R0, RAW hazard)
          …

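  27. Debug Example: Result
      [Figure: the trace R0 ← R1 + R2, R0 ← R3 + R6, R5 ← R0 + R6 mapped onto the decoder, the pipeline register, and the dispatch stage]
      • Diagram labels: Decoder; Pipeline Register (Arch. Dest. Reg with write circuit and read circuit, rest of pipeline reg.); Reg. Mapping; rest of modules in dispatch stage
      • Annotations: stimulates bug; bug location; propagates to failure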

  28. Outline • Motivation • IFRA Overview • Simulation Results • Conclusion

  29. Experimental Setup • Simplescalar architectural simulator • Alpha 21264 configuration • Augmented with ~1K error injection points • Error model – single bit-flips • Hard-to-repeat electrical bugs • Both flip-flops & combinational logic • Stimulus • SpecInt 2000 benchmarks
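The single bit-flip error model named on this slide can be sketched in a few lines. This is only an illustration of the model, not the injection mechanism used in the authors' Simplescalar setup: the 64-bit default width, the function names, and the uniform choice of bit position are assumptions.

```python
import random


def flip_bit(value: int, bit: int, width: int = 64) -> int:
    """Return `value` with one bit inverted, kept within `width` bits."""
    if not 0 <= bit < width:
        raise ValueError("bit index out of range")
    return (value ^ (1 << bit)) & ((1 << width) - 1)


def inject_random_flip(value: int, width: int = 64, rng=None):
    """Flip one uniformly chosen bit; returns (corrupted value, bit index)."""
    rng = rng or random.Random()
    bit = rng.randrange(width)
    return flip_bit(value, bit, width), bit


# Seeded generator so the run is repeatable.
corrupted, position = inject_random_flip(0xAB, width=8, rng=random.Random(0))
print(hex(corrupted), position)
```

The corrupted value always differs from the original in exactly one bit position, which is what the error model requires.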

  30. Experimental Flow
      [Flowchart]
      • Steps: warm up for a million cycles, inject error, post-analyze
      • Decision points: Any failure detected? Short error latency?
      • Outcomes: masked/silent error, complete miss, exact localization, localization with candidates
      • 100K simulation runs, 800 post-analysis runs

  31. IFRA Bug Localization Results
      • Correct localization (96%)
      • Exact localization (78%)
      • Localization with avg. 6 candidates (22%)
      • Complete miss (4%)
      • Localization resolution
        • Bug exposing stimulus
        • One of 200 erroneous design blocks
        • Avg. block size: 10K 2-input NAND gates

  32. Outline • Motivation • IFRA Overview • Simulation Results • Conclusion

  33. Conclusion
      • IFRA
        • Inexpensive: 1% area, no expensive logic analyzers; no failure reproduction or system simulation
        • Effective: 96% accuracy
        • Practical: Alpha processor demonstration

  34. Acknowledgement • Bob Gottlieb, Intel • Nagib Hakim, Intel • Ted Hong, Stanford University • Doug Josephson, Intel • Onur Mutlu, Microsoft Research • Priyadarshan Patra, Intel • Eric Rentschler, AMD • Jason Stinson, Intel
