On-Chip Hardware Checking for an Out-of-Order Core. Adam Bauserman Andrew DeOrio EECS Department University of Michigan. EECS 578 Fall 2006. Outline. Problem Background: Out-of-order execution Proposed Solution Examples Lessons Learned Related Work Experimental Results Conclusion.
On-Chip Hardware Checking for an Out-of-Order Core
Adam Bauserman, Andrew DeOrio
EECS Department, University of Michigan
EECS 578, Fall 2006
Outline
• Problem
• Background: Out-of-order execution
• Proposed Solution
• Examples
• Lessons Learned
• Related Work
• Experimental Results
• Conclusion
The Problem
• Widening gap between CPU complexity and verification capability
• Complete pre-silicon verification is impossible
• Most errors involve control and forwarding logic in the out-of-order core
Where are Most of the Problems?
• Let’s check the things that really matter
• Control logic: PC control and data forwarding
Image courtesy of Valeria Bertacco
Out of Order Execution Refresher
The Solution
• On-chip runtime verification of the out-of-order core
Checker Stages
Faults Detected
• Control and program order
  • Improperly resolved branch
  • PC jump after non-branch instruction
• Data forwarding / renaming
  • Incorrect operands (RegA or RegB values)
[Figure labels: add PC=64, mult PC=48, add PC=96, br (NT) PC=20, load PC=44, load PC=20; RegB=10 (passed from EX), should have been 20]
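The two fault classes above can be sketched as a pair of commit-time comparisons. This is a minimal Python model, not the authors' hardware: the `Retiring` record, the `check` function, the fixed 4-byte instruction size, and the idea of comparing against a committed register file `arch_regs` are all assumptions made for illustration. The branch outcome is taken as given, in line with the assumption that execution units produce correct results from correct inputs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

INSTR_BYTES = 4  # assumption: fixed-size 4-byte instructions


@dataclass
class Retiring:
    """Hypothetical record of one instruction as it reaches the checker."""
    pc: int
    next_pc: int                      # PC the core actually went to next
    is_branch: bool = False
    taken: bool = False               # branch outcome, assumed correct
    target: Optional[int] = None      # branch target, if any
    # operand name -> (register index, value the core actually used)
    operands: Dict[str, Tuple[int, int]] = field(default_factory=dict)


def check(instr: Retiring, arch_regs: Dict[int, int]) -> List[str]:
    """Return a list of detected faults (empty if the instruction is clean)."""
    faults = []

    # Control / program order: the next PC must be the branch target for a
    # taken branch, and the sequential PC otherwise.
    if instr.is_branch and instr.taken:
        expected = instr.target
    else:
        expected = instr.pc + INSTR_BYTES
    if instr.next_pc != expected:
        faults.append(
            "improperly resolved branch" if instr.is_branch
            else "PC jump after non-branch instruction"
        )

    # Data forwarding / renaming: each operand value the core used must
    # match the committed (in-order) register state.
    for name, (reg, used) in instr.operands.items():
        if used != arch_regs[reg]:
            faults.append(
                f"incorrect operand {name}: got {used}, expected {arch_regs[reg]}"
            )
    return faults
```

For instance, `check(Retiring(pc=48, next_pc=52, operands={"RegB": (3, 10)}), {3: 20})` reports the RegB mismatch; the register index 3 is made up, while the values 10 and 20 echo the figure labels.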
Simplifying Assumptions
• ALU and other execution units always output correct results, given correct inputs
• Processor datapath is not faulty, i.e. instructions can at least pass through in the absence of data and control hazards
• No errors in instruction fetch or decode logic
• Exception handling works properly; instructions can be squashed on a mispredicted branch or checker error
• No problems with memory subsystem; loads and stores commit properly given the correct effective address and/or data
Example – Checker Disabled
The Program: 2+2
$r2 <- 2          /* load 2 into $r2 */
$r4 <- $r2+$r2    /* add 2+2 into $r4 */
halt

Cycle: IF     | ID     | ISSUE  | CK1    | CK2    | CM      Reg Result
...
    7: 4:addq | 0:-    | 0:lda  | 0:-    | 0:-    | 0:-
    8: 8:halt | 4:addq | 0:-    | 0:-    | 0:-    | 0:-
    9: 0:-    | 8:halt | 4:addq | 0:-    | 0:-    | 0:-
   10: 0:-    | 0:-    | 8:halt | 0:-    | 0:-    | 0:-
   11: 0:-    | 0:-    | 0:-    | 0:lda  | 0:-    | 0:-
   12: 0:-    | 0:-    | 0:-    | 4:addq | 0:lda  | 0:-
   13: 0:-    | 0:-    | 0:-    | 8:halt | 4:addq | 0:lda   r2=2
   14: 0:-    | 0:-    | 0:-    | 0:-    | 8:halt | 4:addq  r4=9
   15: 0:-    | 0:-    | 0:-    | 0:-    | 0:-    | 8:halt
Wrong Answer!
Example – Checker Enabled
Cycle: IF     | ID     | ISSUE  | CK1    | CK2    | CM      Reg Result
...
    7: 4:addq | 0:-    | 0:lda  | 0:-    | 0:-    | 0:-
    8: 8:halt | 4:addq | 0:-    | 0:-    | 0:-    | 0:-
    9: 0:-    | 8:halt | 4:addq | 0:-    | 0:-    | 0:-
   10: 0:-    | 0:-    | 8:halt | 0:-    | 0:-    | 0:-
   11: 0:-    | 0:-    | 12:nop | 0:lda  | 0:-    | 0:-
   12: 0:-    | 0:-    | 0:-    | 4:addq | 0:lda  | 0:-
   13: 0:-    | 0:-    | 0:-    | 8:halt | 4:addq | 0:lda   r2=2
+++++++++++++++++++++++++++++++ CONFLICT +++++++++++++++++++++++++++++++
regA (reg 2, value=7, should be 2)
Re-executing <PC 4:addq> alone
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
   14: 0:-    | 0:-    | 0:-    | 0:-    | 0:-    | 0:-
   15: 0:-    | 0:-    | 0:-    | 0:-    | 0:-    | 0:-
   16: 0:-    | 0:-    | 0:-    | 0:-    | 0:-    | 0:-
   17: 4:addq | 0:-    | 0:-    | 0:-    | 0:-    | 0:-
   18: 0:-    | 4:addq | 0:-    | 0:-    | 0:-    | 0:-
   19: 0:-    | 0:-    | 4:addq | 0:-    | 0:-    | 0:-
   20: 0:-    | 0:-    | 0:-    | 0:-    | 0:-    | 0:-
   21: 0:-    | 0:-    | 0:-    | 0:-    | 0:-    | 0:-
   22: 0:-    | 0:-    | 0:-    | 0:-    | 0:-    | 0:-
   23: 8:halt | 0:-    | 0:-    | 4:addq | 0:-    | 0:-
   24: 0:-    | 8:halt | 0:-    | 0:-    | 4:addq | 0:-
   25: 0:-    | 0:-    | 8:halt | 0:-    | 4:addq | 0:-
   26: 0:-    | 0:-    | 0:-    | 0:-    | 4:addq | 0:-
   27: 0:-    | 0:-    | 0:-    | 8:halt | 4:addq | 0:-
   28: 0:-    | 0:-    | 0:-    | 0:-    | 8:halt | 4:addq  r4=4
Correct Answer!
Degraded Mode
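The behavior in this example can be mimicked with a small functional model. The sketch below is only an illustration under stated assumptions: `PROGRAM`, `run`, and the `forwarded_regA` bug knob are hypothetical names, the model ignores pipeline timing entirely, and "re-executing alone" is approximated by recomputing the instruction with operands read from committed register state.

```python
# The 2+2 program from the example, as (pc, opcode, destination, source) tuples.
PROGRAM = [
    (0, "lda", 2, 2),         # $r2 <- 2
    (4, "addq", 4, (2, 2)),   # $r4 <- $r2 + $r2
    (8, "halt", None, None),
]


def run(checker_enabled, forwarded_regA=None):
    """Execute PROGRAM; forwarded_regA (hypothetical) injects a bad regA value.

    Returns the final register state and a list of detected conflicts as
    (pc, register index, value used, value expected) tuples.
    """
    regs = {}
    conflicts = []
    for pc, op, dst, src in PROGRAM:
        if op == "lda":
            regs[dst] = src
        elif op == "addq":
            a_reg, b_reg = src
            # A forwarding bug delivers the wrong value for regA.
            a = regs[a_reg] if forwarded_regA is None else forwarded_regA
            b = regs[b_reg]
            if checker_enabled and a != regs[a_reg]:
                conflicts.append((pc, a_reg, a, regs[a_reg]))
                # Squash and re-execute alone, reading committed state.
                a = regs[a_reg]
            regs[dst] = a + b
        elif op == "halt":
            break
    return regs, conflicts


regs_off, _ = run(checker_enabled=False, forwarded_regA=7)
regs_on, found = run(checker_enabled=True, forwarded_regA=7)
print(regs_off[4], regs_on[4], found)
```

With the checker off, the bad regA value of 7 flows into the add and `$r4` ends up as 9; with it on, the mismatch on register 2 is recorded and `$r4` ends up as 4, matching the two traces.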
Lessons Learned
• Verifying a verification tool is a step-by-step process
  • Individual module-level testbenches
  • Used a simple in-order pipeline for testing
• A deadlocked core is a bug that the checker doesn’t catch
  • Didn’t realize this until the end
Why is Our Method Better?
• Assertion Processing [Nac03]
  • Does not verify all instructions, only assertions
  • “Where do I put my assertions?” conundrum
• DIVA – Full Processor Checking [Wea01]
  • Recomputes result of each instruction (higher HW overhead)
  • Sometimes DIVA checker cannot keep up with OoO core, causing it to stall even when no error is detected
• FRCL – Field Repairable Control Logic [Wag06]
  • Bug detection not fully automated: bugs must be found, reported and incorporated into new bug patterns
Experimental Results
• Added checker to in-order processor
  • Randomly injected errors using force/release in testbench
  • Compared simulator output to ensure all faults are caught
  • Gathered performance data
• Checker also caught bugs in OoO during design
  • Data forwarding problems
  • Incorrect implementation (JSR instruction)
  • Timing issues resulting in dropped instructions
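The inject-and-compare methodology can be illustrated in software. The sketch below is a loose Python analogue, not the authors' testbench (which used force/release in an HDL simulator): `golden`, `checked_core`, and `campaign` are hypothetical names, programs are random add-only sequences, and an injected error is modeled as a corrupted first operand.

```python
import random


def golden(program, init):
    """Reference model: execute every add with correct operands."""
    regs = dict(init)
    for dst, a, b in program:
        regs[dst] = regs[a] + regs[b]
    return regs


def checked_core(program, init, inject_at, rng):
    """Model core with operand corruption at the given instruction indices.

    The checker compares the operands used against committed state and
    re-executes with correct values on a mismatch.
    """
    regs = dict(init)
    detected = 0
    for i, (dst, a, b) in enumerate(program):
        va, vb = regs[a], regs[b]
        if i in inject_at:
            va += rng.randint(1, 100)  # injected error: always changes va
        if (va, vb) != (regs[a], regs[b]):
            detected += 1
            va, vb = regs[a], regs[b]  # recover with committed values
        regs[dst] = va + vb
    return regs, detected


def campaign(trials=100, seed=0):
    """Return True if every injected fault was caught and results match golden."""
    rng = random.Random(seed)
    init = {r: r for r in range(8)}
    for _ in range(trials):
        program = [
            (rng.randrange(8), rng.randrange(8), rng.randrange(8))
            for _ in range(20)
        ]
        inject_at = set(rng.sample(range(20), k=rng.randint(0, 5)))
        regs, detected = checked_core(program, init, inject_at, rng)
        if regs != golden(program, init) or detected != len(inject_at):
            return False
    return True
```

Comparing the final state with the golden model mirrors the "compared simulator output" step; counting detections against injections mirrors the check that all faults are caught.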
Performance Evaluation
• Checker does not decrease performance
  • Never causes a stall unless faults are detected
  • Branch mispredict penalty not increased, if using early branch resolution
• Penalty on checker fault depends on pipeline depth
  • Number of in-flight instructions can be large (32 in our ROB)
  • Experimental results for out-of-order still pending
• With the in-order pipeline, penalty is approximately 10 cycles per fault detected
  • Remains fairly constant, regardless of program characteristics
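As a back-of-the-envelope reading of these points: if the only cost is a fixed penalty per detected fault, the effective CPI grows linearly with the fault rate. The helper below is an assumed simple model, not a result from the slides; `effective_cpi` and its parameters are hypothetical names, and the default of 10 cycles is the approximate in-order penalty quoted above.

```python
def effective_cpi(base_cpi, faults_per_instruction, penalty_cycles=10.0):
    """Estimate CPI assuming a fixed penalty per detected fault.

    Assumed model: each detected fault adds penalty_cycles cycles, and the
    checker adds nothing when no fault occurs.
    """
    return base_cpi + faults_per_instruction * penalty_cycles


# With no faults the CPI is unchanged; one fault per 1000 instructions
# adds 0.001 * 10 = 0.01 cycles per instruction.
print(effective_cpi(1.0, 0.0))
print(effective_cpi(1.0, 0.001))
```

For an out-of-order core the penalty would likely differ, since up to 32 in-flight instructions may need to be squashed; those results are still pending, so no number is assumed here.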
Performance Evaluation
Conclusion
• We have developed a simple, elegant method for on-chip runtime verification of processor cores
• Key advantages
  • Detects and corrects bugs automatically
  • Low hardware overhead
  • No CPI penalty if no faults occur
• Future work
  • More extensive testing with out-of-order core
  • Add watchdog timer to recover from deadlock
Questions?