Logic Verification Industry Perspective Bruce Wile IBM Server Group Verification Lead 4/2/01
What a great time to be an engineer! • Exciting work • Major effect on culture • Compensation • Industry's word for "money, morale, and benefits"
Why all the big bucks? • Basic business principle: Company that gets a product to the market first gets an inordinate share of the market revenue
Triple Constraints • Schedule • Costs • Quality
Why is verification so important to the chip industry? • Verification is the single biggest lever to positively affect the triple constraints • Fewer revs through the fabrication process means lower costs and faster time-to-market • Re-spinning a chip costs: • Hundreds of thousands of dollars • 6-8 weeks • So if you can get it right in fewer "passes", you WIN!!!
Cost of Bugs Over Time • The longer a bug goes undetected, the more expensive the fix • A bug found early (designer sim) has little cost • Finding a bug at Chip or System Sim has moderate cost • Requires more debug time and problem isolation • Could require a new algorithm, which could affect the schedule and cause rework of the physical design • Finding a bug in System Test (testfloor) requires new hardware RIT • Finding a bug in the customer's environment can cost hundreds of millions in hardware and brand image [Chart: cost ($) versus time]
EE career choices • Verification • Design • Circuits
Biggest challenges are in Verification • Circuit design process has been "fixed" • Industry-wide shortage of "good" verification engineers
The Art of Verification • Two simple questions; One huge task • Am I driving all possible input scenarios? • How will I know when it fails?
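The two questions on this slide map onto the two halves of a testbench: a stimulus generator and a checker. The following is a minimal sketch, assuming a hypothetical 4-bit adder as the design under test (`dut_add4` and `reference_add4` are illustrative names, not taken from the talk). The input space is small enough to drive exhaustively, which is rarely true of real designs.

```python
import itertools

def dut_add4(a, b):
    """Hypothetical design under test: 4-bit adder returning (sum, carry)."""
    total = a + b
    return total & 0xF, total >> 4

def reference_add4(a, b):
    """Independent reference model that the checker compares against."""
    return (a + b) % 16, int(a + b >= 16)

def run_exhaustive():
    """Drive every input pair (question 1) and check every result (question 2).

    Returns the list of failing input pairs; an empty list means no mismatch.
    """
    failures = []
    for a, b in itertools.product(range(16), repeat=2):
        if dut_add4(a, b) != reference_add4(a, b):
            failures.append((a, b))
    return failures

if __name__ == "__main__":
    print("failing input pairs:", run_exhaustive())
```

The checker is only as good as its independence from the design: here the reference model computes the carry differently from the adder on purpose.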
Three Simulation Commandments • Thou shalt not move onto a higher platform until the bug rate has dropped off • Thou shalt place checking upon all things • Thou shalt stress thine logic harder than it will ever be stressed again
The Line Delete Escape • Escape: A problem that is found on the test floor (after fabrication) and therefore has escaped the verification process • The Line Delete escape was a problem on the ES/9000 machine • S/390 Bipolar, 1991 • This escape is an example of how a verification engineer needs to think
The Line Delete Escape (pg 2) • Line Delete is a method of circumventing bad cells of a large memory array or cache array • An array mapping allows for removal of defective cells within the usable space • In highly reliable servers, Error Correction Code (ECC) fixes single-bit errors within an array, and detects double-bit errors
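The "correct single-bit errors, detect double-bit errors" behavior can be illustrated with a toy code. The sketch below assumes an extended Hamming(8,4) code protecting 4 data bits; the ES/9000 arrays certainly used a wider code, and the function names `encode` and `decode` are hypothetical.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit codeword (list of 0/1 values).

    Index 0 holds the overall parity bit; indices 1..7 form a Hamming(7,4)
    code with parity bits at positions 1, 2 and 4.
    """
    code = [0] * 8
    code[3] = nibble & 1
    code[5] = (nibble >> 1) & 1
    code[6] = (nibble >> 2) & 1
    code[7] = (nibble >> 3) & 1
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2  # make the total number of 1s even
    return code

def decode(code):
    """Return (nibble, status); status is 'ok', 'corrected' or 'ue'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of the positions of all set bits
    odd_parity = sum(code) % 2
    if syndrome == 0 and odd_parity == 0:
        status = "ok"
    elif odd_parity == 1:
        # Single-bit error: the syndrome is the failing position
        # (0 means the overall parity bit itself flipped).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even parity: two bits flipped.
        return None, "ue"
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status
```

Flipping any one bit of `encode(n)` decodes back to `n` with status `corrected`; flipping any two bits yields status `ue`, the uncorrectable error that the later slides count.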
The Line Delete Escape (pg 3) If a line in an array has multiple bad bits (a single bad bit usually goes unnoticed because ECC, the error correction code, repairs it), the line can be taken "out of service". In the array pictured, row 05 has a bad congruence class entry. [Figure: array with row 05 marked]
The Line Delete Escape (pg 4) Data enters ECC creation logic prior to storage into the array. When read out, the ECC logic corrects single-bit errors, tags Uncorrectable Errors (UEs), and increments a counter corresponding to the row and congruence class. [Figure labels: Data in, ECC Logic, array (row 05), ECC Logic, Counters, Data out]
The Line Delete Escape (pg 5) When a preset threshold of UEs is detected from an array cell, the service controller is informed that a line delete operation is needed. [Figure labels: Data in, ECC Logic, array (row 05), ECC Logic, Counters, Threshold, Service Controller, Data out]
The Line Delete Escape (pg 6) The Service controller can update the configuration registers, ordering a line delete to occur. When the configuration registers are written, the line delete controls are engaged and writes to row 5, congruence class 'C' cease. However, because three other cells remain good in this congruence class, the sole repercussion of the line delete is a slight decline in performance. [Figure labels: Data in, ECC Logic, Line delete control, Storage Controller configuration registers, array (row 05), ECC Logic, Counters, Threshold, Service Controller, Data out]
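The flow on pages 4 through 6 (count UEs per cell, notify the service controller at a threshold, write the configuration registers, stop writing to the deleted cell) can be sketched as a small behavioral model. This is an illustration under stated assumptions, not the ES/9000 logic: the class name, method names, the threshold value of 3, and the four entries labelled 'A' to 'D' are all hypothetical.

```python
from collections import defaultdict

class LineDeleteModel:
    """Toy behavioral model of UE counting and line delete."""

    def __init__(self, threshold=3, classes=("A", "B", "C", "D")):
        self.threshold = threshold            # assumed value, not from the talk
        self.classes = classes                # entries available in each row
        self.ue_counts = defaultdict(int)     # counters keyed by (row, class)
        self.service_requests = []            # pending notifications
        self.config_deleted = set()           # stands in for the config registers

    def report_ue(self, row, cls):
        """ECC read logic calls this when it tags an uncorrectable error."""
        self.ue_counts[(row, cls)] += 1
        if self.ue_counts[(row, cls)] == self.threshold:
            self.service_requests.append((row, cls))

    def service_controller_step(self):
        """Service controller writes the configuration registers."""
        while self.service_requests:
            self.config_deleted.add(self.service_requests.pop(0))

    def writable_classes(self, row):
        """Line delete control: entries of this row still open for writes."""
        return [c for c in self.classes if (row, c) not in self.config_deleted]

if __name__ == "__main__":
    model = LineDeleteModel()
    for _ in range(3):
        model.report_ue(5, "C")
    model.service_controller_step()
    print(model.writable_classes(5))
```

Note that the delete takes effect only after `service_controller_step` runs, so the model has a window between the threshold being reached and writes ceasing. Windows like that are the kind of timing a testcase has to exercise.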
The Line Delete Escape (pg 7) How would we test this logic? What must occur in the testcase? What checking must we implement? [Figure labels: Data in, ECC Logic, Line delete control, Storage Controller configuration registers, array (row 05), ECC Logic, Counters, Threshold, Service Controller, Data out]