Server Reliability Categories and Criteria: Doug Bossen, IBM at IRPS Tutorial 2002
Clarification – Pia Sanda, 2/15/05 HPCA (sanda@us.ibm.com)
• Undetected errors corrupting customer data
  • Typical system target: 1000 yr MTBF (114 FITS)
• Detected errors causing system termination
  • Corrupt data/control state in a global resource
  • Target: 25 yr MTBF (4500 FITS unrecovered)
• Detected errors causing application or partition termination
  • Corrupt data/control state in a local resource
  • Target: 10 yr MTBF (11400 FITS unrecovered)

MTBF numbers above are for a HE system scenario and are not intended to be targets for Power4 or any other IBM product.

Above is for the WHOLE SYSTEM: all SYSTEM HARDWARE and operational SOFTWARE (not including customer applications).
HW: microprocessors, memory, all other active components, interconnects, HDD, power, etc.

Above is for both HARD and SOFT ERRORS.

Apportionment to a single processor chip cannot be inferred from the above scenario.
The transient, particle-induced contribution is only a small part of the scenario targets.
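The FIT figures quoted next to each MTBF target follow from the usual definition of FIT as failures per 10^9 hours of operation. Below is a minimal sketch of that conversion. It assumes 8760 hours per year, and the function names are illustrative rather than taken from the slide. With that assumption the 25-year and 10-year targets come out near 4566 and 11416 FIT, so the slide's 4500 and 11400 appear to be rounded values.

```python
# Convert between MTBF (in years) and FIT (failures per 1e9 hours).
# Assumption: one year = 24 * 365 = 8760 hours (leap days ignored).
HOURS_PER_YEAR = 24 * 365


def mtbf_years_to_fit(mtbf_years: float) -> float:
    """Return the failure rate in FIT for a given MTBF in years."""
    return 1e9 / (mtbf_years * HOURS_PER_YEAR)


def fit_to_mtbf_years(fit: float) -> float:
    """Return the MTBF in years for a given failure rate in FIT."""
    return 1e9 / (fit * HOURS_PER_YEAR)


if __name__ == "__main__":
    # The three scenario targets listed on the slide.
    targets = [
        ("Undetected data corruption", 1000),
        ("System termination", 25),
        ("Application/partition termination", 10),
    ]
    for label, years in targets:
        print(f"{label}: {years} yr MTBF is about {mtbf_years_to_fit(years):.0f} FIT")
```

Because FIT is a rate, the budgets of independent contributors add. This is one way to read the slide's point that these whole-system numbers cover every hardware and software component together, not just one processor chip.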