
On-Chip Hardware Checking for an Out-of-Order Core

On-Chip Hardware Checking for an Out-of-Order Core. Adam Bauserman Andrew DeOrio EECS Department University of Michigan. EECS 578 Fall 2006. Outline. Problem Background: Out-of-order execution Proposed Solution Examples Lessons Learned Related Work Experimental Results Conclusion.


Presentation Transcript


  1. On-Chip Hardware Checking for an Out-of-Order Core
     Adam Bauserman, Andrew DeOrio
     EECS Department, University of Michigan
     EECS 578, Fall 2006

  2. Outline
     • Problem
     • Background: Out-of-order execution
     • Proposed Solution
     • Examples
     • Lessons Learned
     • Related Work
     • Experimental Results
     • Conclusion

  3. The Problem
     • Widening gap between CPU complexity and verification capability
     • Complete pre-silicon verification is impossible
     • Most errors involve control and forwarding logic in the out-of-order core

  4. Where are Most of the Problems?
     • Let’s check the things that really matter
     • Control logic: PC control and data forwarding
     [Figure: image courtesy of Valeria Bertacco]

  5. Out-of-Order Execution Refresher

  6. The Solution
     • On-chip runtime verification of out-of-order core

  7. Checker Stages

  8. Faults Detected
     • Control and program order
        • Improperly resolved branch
        • PC jump after non-branch instruction
     • Data forwarding / renaming
        • Incorrect operands (RegA or RegB values)
        • Example: RegB=10 (passed from EX); should have been 20
     [Figure: example instruction sequences, labeled add PC=64, mult PC=48, add PC=96, br (NT) PC=20, load PC=44, load PC=20]
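The two fault classes above can be illustrated with a small software model. This is only a sketch under stated assumptions: the slides do not describe the checker's internals, so the `Committed` record, `check_operands`, and `check_pc` are hypothetical names, and the model assumes 4-byte instructions and a committed register file that the checker can trust. It covers operand mismatches and PC sequencing only; verifying that a branch was resolved correctly would additionally require recomputing the branch condition.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Committed:
    """Hypothetical record of one instruction as it reaches the checker."""
    pc: int
    op: str
    src_a: Optional[int] = None   # source register numbers (None if unused)
    src_b: Optional[int] = None
    val_a: Optional[int] = None   # operand values the core actually used
    val_b: Optional[int] = None
    is_branch: bool = False
    taken: bool = False
    target: Optional[int] = None

def check_operands(inst, regfile):
    """Compare the operands the core used against the committed register file."""
    faults = []
    for name, reg, val in (("regA", inst.src_a, inst.val_a),
                           ("regB", inst.src_b, inst.val_b)):
        if reg is not None and regfile[reg] != val:
            faults.append(f"{name} (reg {reg}, value={val}, should be {regfile[reg]})")
    return faults

def check_pc(prev, inst):
    """Check that inst follows prev in program order (assumes 4-byte instructions)."""
    if prev is None:
        return []
    if prev.is_branch and prev.taken:
        expected = prev.target
    else:
        expected = prev.pc + 4
    if inst.pc != expected:
        return [f"PC {inst.pc} follows PC {prev.pc}, expected {expected}"]
    return []
```

For instance, an `addq` that received 7 instead of 2 for register 2 would be reported by `check_operands`, and a non-branch instruction at PC 64 followed by one at PC 96 would be reported by `check_pc`.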

  9. Simplifying Assumptions
     • ALU and other execution units always output correct results, given correct inputs
     • Processor datapath is not faulty, i.e. instructions can at least pass through in the absence of data and control hazards
     • No errors in instruction fetch or decode logic
     • Exception handling works properly; instructions can be squashed on a mispredicted branch or checker error
     • No problems with memory subsystem; loads and stores commit properly given the correct effective address and/or data

  10. Example – Checker Disabled

      The Program: 2+2
        $r2 <- 2          /* load 2 into $r2 */
        $r4 <- $r2+$r2    /* add 2+2 into $r4 */
        halt

      Cycle: IF     | ID     | ISSUE  | CK1    | CK2    | CM      Reg Result
      ...
       7:    4:addq | 0:-    | 0:lda  | 0:-    | 0:-    | 0:-
       8:    8:halt | 4:addq | 0:-    | 0:-    | 0:-    | 0:-
       9:    0:-    | 8:halt | 4:addq | 0:-    | 0:-    | 0:-
      10:    0:-    | 0:-    | 8:halt | 0:-    | 0:-    | 0:-
      11:    0:-    | 0:-    | 0:-    | 0:lda  | 0:-    | 0:-
      12:    0:-    | 0:-    | 0:-    | 4:addq | 0:lda  | 0:-
      13:    0:-    | 0:-    | 0:-    | 8:halt | 4:addq | 0:lda   r2=2
      14:    0:-    | 0:-    | 0:-    | 0:-    | 8:halt | 4:addq  r4=9
      15:    0:-    | 0:-    | 0:-    | 0:-    | 0:-    | 8:halt

      Wrong Answer!

  11. Example – Checker Enabled

      Cycle: IF     | ID     | ISSUE  | CK1    | CK2    | CM      Reg Result
      ...
       7:    4:addq | 0:-    | 0:lda  | 0:-    | 0:-    | 0:-
       8:    8:halt | 4:addq | 0:-    | 0:-    | 0:-    | 0:-
       9:    0:-    | 8:halt | 4:addq | 0:-    | 0:-    | 0:-
      10:    0:-    | 0:-    | 8:halt | 0:-    | 0:-    | 0:-
      11:    0:-    | 0:-    | 12:nop | 0:lda  | 0:-    | 0:-
      12:    0:-    | 0:-    | 0:-    | 4:addq | 0:lda  | 0:-
      13:    0:-    | 0:-    | 0:-    | 8:halt | 4:addq | 0:lda   r2=2
      ++++++++++++++++++++++++++++++ CONFLICT ++++++++++++++++++++++++++++++
      regA (reg 2, value=7, should be 2)
      Re-executing <PC 4:addq> alone
      ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
      14:    0:-    | 0:-    | 0:-    | 0:-    | 0:-    | 0:-
      15:    0:-    | 0:-    | 0:-    | 0:-    | 0:-    | 0:-
      16:    0:-    | 0:-    | 0:-    | 0:-    | 0:-    | 0:-
      17:    4:addq | 0:-    | 0:-    | 0:-    | 0:-    | 0:-
      18:    0:-    | 4:addq | 0:-    | 0:-    | 0:-    | 0:-
      19:    0:-    | 0:-    | 4:addq | 0:-    | 0:-    | 0:-
      20:    0:-    | 0:-    | 0:-    | 0:-    | 0:-    | 0:-
      21:    0:-    | 0:-    | 0:-    | 0:-    | 0:-    | 0:-
      22:    0:-    | 0:-    | 0:-    | 0:-    | 0:-    | 0:-
      23:    8:halt | 0:-    | 0:-    | 4:addq | 0:-    | 0:-
      24:    0:-    | 8:halt | 0:-    | 0:-    | 4:addq | 0:-
      25:    0:-    | 0:-    | 8:halt | 0:-    | 4:addq | 0:-
      26:    0:-    | 0:-    | 0:-    | 0:-    | 4:addq | 0:-
      27:    0:-    | 0:-    | 0:-    | 8:halt | 4:addq | 0:-
      14:    0:-    | 0:-    | 0:-    | 0:-    | 8:halt | 4:addq  r4=4

      Correct Answer!
      Degraded Mode
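The two traces can be mirrored at a functional level with a short sketch. It is not the authors' simulator: `PROGRAM`, `run`, and the `bad_forward` map are hypothetical, timing and pipeline stages are not modeled, and the fault is assumed to be a forwarding path that hands the `addq` a value of 7 for regA, as in the CONFLICT message above.

```python
# Functional model of the 2+2 program: (pc, opcode, destination, source).
PROGRAM = [
    (0, "lda", 2, 2),         # $r2 <- 2 (immediate)
    (4, "addq", 4, (2, 2)),   # $r4 <- $r2 + $r2
    (8, "halt", None, None),
]

def run(program, bad_forward=None, checker_enabled=True):
    """Execute the program in order.

    bad_forward maps (pc, "regA" | "regB") to the wrong value that a faulty
    forwarding path delivers. With the checker enabled, a mismatch against the
    committed register file is recorded and the instruction is re-executed
    using the committed values.
    """
    bad_forward = bad_forward or {}
    regs = {}
    conflicts = []
    for pc, op, dst, src in program:
        if op == "halt":
            break
        if op == "lda":
            regs[dst] = src
            continue
        ra, rb = src
        val_a = bad_forward.get((pc, "regA"), regs[ra])
        val_b = bad_forward.get((pc, "regB"), regs[rb])
        if checker_enabled and (val_a, val_b) != (regs[ra], regs[rb]):
            conflicts.append(pc)
            val_a, val_b = regs[ra], regs[rb]   # re-execute with correct operands
        regs[dst] = val_a + val_b
    return regs, conflicts

fault = {(4, "regA"): 7}
print(run(PROGRAM, fault, checker_enabled=False))  # r4 ends up as 9
print(run(PROGRAM, fault, checker_enabled=True))   # r4 ends up as 4, conflict at PC 4
```

The sketch reproduces only the register results (r4=9 without the checker, r4=4 with it); the extra cycles spent re-executing in the second trace are outside its scope.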

  12. Lessons Learned
     • Verification of a verification tool is a step-by-step process
        • Individual module-level testbenches
        • Used a simple in-order pipeline for testing
     • A deadlocked core is a bug that the checker doesn’t catch
        • Didn’t realize this until the end

  13. Why is Our Method Better?
     • Assertion Processing [Nac03]
        • Does not verify all instructions, only assertions
        • “Where do I put my assertions?” conundrum
     • DIVA: Full Processor Checking [Wea01]
        • Recomputes result of each instruction (higher HW overhead)
        • Sometimes DIVA checker cannot keep up with OoO core, causing it to stall even when no error is detected
     • FRCL: Field Repairable Control Logic [Wag06]
        • Bug detection not fully automated: bugs must be found, reported and incorporated into new bug patterns

  14. Experimental Results
     • Added checker to in-order processor
     • Randomly injected errors using force/release in testbench
     • Compared simulator output to ensure all faults are caught
     • Gathered performance data
     • Checker also caught bugs in OoO during design
        • Data forwarding problems
        • Incorrect implementation (JSR instruction)
        • Timing issues resulting in dropped instructions
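The inject-and-compare methodology can be sketched in software. The authors used force/release in an HDL testbench; the version below is only an analogy under stated assumptions: `golden`, `checked_run`, and `campaign` are hypothetical names, the "program" is a random sequence of 16-bit adds, and a fault is modeled as a corrupted regA operand.

```python
import random

def golden(program, regs):
    """Reference run with no faults."""
    regs = dict(regs)
    for dst, ra, rb in program:
        regs[dst] = (regs[ra] + regs[rb]) & 0xFFFF
    return regs

def checked_run(program, regs, rng, fault_prob):
    """Run with randomly corrupted operands and an operand checker."""
    regs = dict(regs)
    injected = caught = 0
    for dst, ra, rb in program:
        val_a, val_b = regs[ra], regs[rb]
        if rng.random() < fault_prob:
            val_a ^= rng.randrange(1, 1 << 16)   # nonzero mask, so the value really changes
            injected += 1
        if (val_a, val_b) != (regs[ra], regs[rb]):
            caught += 1
            val_a, val_b = regs[ra], regs[rb]    # re-execute with committed values
        regs[dst] = (val_a + val_b) & 0xFFFF
    return regs, injected, caught

def campaign(num_instructions=200, fault_prob=0.1, seed=0):
    """Return (outputs match golden run, faults injected, faults caught)."""
    rng = random.Random(seed)
    init = {r: rng.randrange(1 << 16) for r in range(8)}
    program = [(rng.randrange(8), rng.randrange(8), rng.randrange(8))
               for _ in range(num_instructions)]
    final, injected, caught = checked_run(program, init, rng, fault_prob)
    return final == golden(program, init), injected, caught
```

A campaign passes when the final register state matches the fault-free run and the number of caught faults equals the number injected.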

  15. Performance Evaluation
     • Checker does not decrease performance
        • Never causes a stall unless faults are detected
        • Branch mispredict penalty not increased, if using early branch resolution
     • Penalty on checker fault depends on pipeline depth
        • Number of in-flight instructions can be large (32 in our ROB)
        • Experimental results for out-of-order still pending
     • With the in-order pipeline, penalty is approximately 10 cycles per fault detected
        • Remains fairly constant, regardless of program characteristics
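As a back-of-the-envelope illustration of these points, the cost of the checker can be modeled as a fixed penalty per detected fault added to the fault-free CPI. The additive model and the name `effective_cpi` are assumptions made for this sketch, not something stated in the slides; the default of 10 cycles is the approximate in-order figure quoted above.

```python
def effective_cpi(base_cpi, faults_per_instruction, penalty_cycles=10):
    """Estimated CPI when each detected fault costs a fixed number of cycles.

    Simple additive model (an assumption): with no faults, the result equals
    base_cpi, matching the claim that the checker adds no CPI penalty.
    """
    return base_cpi + faults_per_instruction * penalty_cycles

print(effective_cpi(1.0, 0.0))     # no faults: CPI unchanged
print(effective_cpi(1.0, 0.001))   # one fault per 1000 instructions
```

Under this model, one fault per 1000 instructions adds about 0.01 to the CPI of the in-order pipeline.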

  16. Performance Evaluation

  17. Conclusion
     • We have developed a simple, elegant method for on-chip runtime verification of processor cores
     • Key advantages
        • Detects and corrects bugs automatically
        • Low hardware overhead
        • No CPI penalty if no faults occur
     • Future work
        • More extensive testing with out-of-order core
        • Add watchdog timer to recover from deadlock
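The watchdog timer listed under future work is not specified in the slides. One common form, sketched here as an assumption, counts cycles since the last committed instruction and raises a flag when that count reaches a threshold; the `Watchdog` class and its `timeout_cycles` parameter are hypothetical.

```python
class Watchdog:
    """Flags a suspected deadlock after timeout_cycles cycles without a commit."""

    def __init__(self, timeout_cycles):
        self.timeout_cycles = timeout_cycles
        self.idle_cycles = 0

    def tick(self, committed):
        """Call once per cycle; returns True when a deadlock is suspected."""
        if committed:
            self.idle_cycles = 0
            return False
        self.idle_cycles += 1
        return self.idle_cycles >= self.timeout_cycles
```

How the core would recover once the flag is raised (for example, by squashing in-flight instructions and re-executing, as on a checker fault) is left open here. The threshold would need to exceed the longest legitimate gap between commits, such as a long-latency memory access.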

  18. Questions?
