Digital System Reliability
What is Reliability? • A digital system is reliable if it always performs according to its specification, for example its: • Truth table • State diagram • Integrated circuits are becoming increasingly unreliable • Transistors are so small that they are affected by quantum effects • Cosmic and electromagnetic radiation can cause errors
Errors and faults • Fault, error and failure are often used synonymously in the literature, but they are related by the following cause-and-effect relationship: • faults are the cause of errors • errors are the cause of failures
Faults • Faults happen at the physical level • There are many kinds of faults • They can be classified as permanent (manufacturing defects or damage) or transient (caused by radiation or quantum effects)
Errors • Similarly, errors can be classified as: • Hard errors (due to permanent faults) • Soft errors (due to transient faults) • Errors can be modeled simply as bits that have an incorrect value, either permanently (hard errors) or temporarily (soft errors) • A reliable design should be able to recover from soft errors
Design for Reliability • Triple Modular Redundancy (TMR) • Error detecting and/or correcting codes • Parity • Hamming codes • Block codes • Cyclic Redundancy Check (CRC)
TMR • Simply triplicate the module that must be designed reliably, and use a majority voter to determine the correct output • Assumes that errors can occur in only one module at a time
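A minimal Python sketch of the voter, assuming each module output is an integer bit vector. `majority` computes the bitwise 2-out-of-3 vote; `tmr` runs three copies of a module. The module `increment8` and the `flip_mask` parameter (a soft error injected into one copy only) are hypothetical, chosen just for illustration.

```python
from typing import Callable


def majority(a: int, b: int, c: int) -> int:
    # Each output bit is 1 when at least two of the three inputs have a 1 there.
    return (a & b) | (a & c) | (b & c)


def tmr(module: Callable[[int], int], x: int, flip_mask: int = 0) -> int:
    # Run three identical copies of the module on the same input.
    outs = [module(x) for _ in range(3)]
    # Hypothetical soft error: corrupt copy 0 only (single-faulty-module assumption).
    outs[0] ^= flip_mask
    return majority(*outs)


def increment8(x: int) -> int:
    # Hypothetical module under protection: an 8-bit incrementer.
    return (x + 1) & 0xFF


print(tmr(increment8, 41))                   # 42
print(tmr(increment8, 41, flip_mask=0b1010)) # 42, the faulty copy is outvoted
```

If two copies were corrupted in the same bit, the vote would follow them, which is why TMR relies on errors hitting only one module at a time.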
Hamming distance • Hamming distance between two n-bit vectors is the number of bits in which they differ. • Example: The Hamming distance between “1010” and “0011” is 2
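The definition translates directly into code. This is a small sketch with hypothetical function names: one version compares bit strings position by position, the other uses XOR on integers so that only the differing bits remain set.

```python
def hamming_distance(u: str, v: str) -> int:
    """Number of positions in which two equal-length bit strings differ."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a != b for a, b in zip(u, v))


def hamming_distance_int(x: int, y: int) -> int:
    """Same measure on integers: XOR marks the differing bits, then count the 1s."""
    return bin(x ^ y).count("1")


print(hamming_distance("1010", "0011"))      # 2
print(hamming_distance_int(0b1010, 0b0011))  # 2
```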
Error-detecting codes • In order to detect errors, we need to add error-detection bits to the data • Error-detecting codes are based on the idea that the data, together with the error-detection bits, form a code that has legal and illegal codewords • An error should change a legal codeword into an illegal one
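The legal/illegal idea can be checked mechanically: flip each bit of each legal codeword and see whether the result is still legal. The sketch below assumes a code is simply a set of bit strings; the two example codes (all 3-bit words with no added bits, and 3 data bits plus one even parity bit) are illustrative choices, and the function names are hypothetical.

```python
from itertools import product


def flip(word: str, i: int) -> str:
    # Return a copy of word with bit i inverted (a single-bit error).
    return word[:i] + ("1" if word[i] == "0" else "0") + word[i + 1:]


def detects_all_single_errors(legal: set[str]) -> bool:
    # True if every single-bit error turns a legal codeword into an illegal one.
    return all(flip(w, i) not in legal for w in legal for i in range(len(w)))


# No error-detection bits: every 3-bit word is legal, so errors go unnoticed.
all_words = {"".join(b) for b in product("01", repeat=3)}

# One added bit: only 4-bit words with an even number of 1s are legal.
even_parity = {"".join(b) for b in product("01", repeat=4) if b.count("1") % 2 == 0}

print(detects_all_single_errors(all_words))    # False
print(detects_all_single_errors(even_parity))  # True
```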
Parity bit • The parity bit is a single bit added to the data so that the total number of 1s in the codeword is even (even parity) or odd (odd parity)
Data bits | Even parity code | Odd parity code
000 | 000 0 | 000 1
001 | 001 1 | 001 0
010 | 010 1 | 010 0
011 | 011 0 | 011 1
100 | 100 1 | 100 0
101 | 101 0 | 101 1
110 | 110 0 | 110 1
111 | 111 1 | 111 0
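A minimal sketch of parity encoding and checking, computed directly from the definition (the function names are hypothetical). The loop prints every 3-bit data word with its even and odd parity codewords.

```python
def parity_bit(data: str, even: bool = True) -> str:
    """Parity bit that makes the total number of 1s even (or odd)."""
    bit = data.count("1") % 2  # 1 when the data has an odd number of 1s
    return str(bit if even else bit ^ 1)


def encode(data: str, even: bool = True) -> str:
    return data + parity_bit(data, even)


def is_legal(codeword: str, even: bool = True) -> bool:
    """Check a received word; False means an error was detected."""
    return codeword.count("1") % 2 == (0 if even else 1)


for i in range(8):
    data = format(i, "03b")
    print(data, encode(data), encode(data, even=False))
```

A single flipped bit is caught (`is_legal("0111")` is False for the even parity codeword "0011"), but flipping two bits, for example "0011" to "0101", keeps the number of 1s even, so one parity bit cannot detect double errors.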
Error-correcting and multiple-error-detecting codes • By inserting more than one check bit according to some rules, multiple errors can be detected or even corrected • Hamming codes • Bit positions are numbered starting from 1, and codewords are written from bit 7 (left) down to bit 1 (right) • Bits in positions that are a power of 2 (1, 2 and 4) are even parity check bits; positions 3, 5, 6 and 7 hold the information bits • Each check bit covers the positions whose binary index has a 1 in the same place as the check bit's own position: • Check bit 1 (position 1): bits 1, 3, 5, 7 (001, 011, 101, 111) • Check bit 2 (position 2): bits 2, 3, 6, 7 (010, 011, 110, 111) • Check bit 3 (position 4): bits 4, 5, 6, 7 (100, 101, 110, 111) • Example: • Information bits 0000 become 0000000 • Information bits 0101 become 0101101 • Single error correction example: • Instead of 0101101 we receive 0111101 • Check bit 1 yields an error, check bit 2 is correct, check bit 3 yields an error • The bits covered by both check bit 1 and check bit 3 are 5 and 7, but bit 7 cannot be wrong because check bit 2, which also covers bit 7, is correct, so the error is in bit 5 • Double error detection example: • Instead of 0101101 we receive 0011101 (bits 5 and 6 are flipped) • Check bit 1 yields an error, check bit 2 yields an error, check bit 3 is correct, which points to bit 3 • The failing checks show that an error occurred, but bit 3 was not flipped: the double error is detected but cannot be corrected, and "correcting" bit 3 would introduce a third error
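A minimal Python sketch of this 7-bit Hamming code. It assumes codewords are written from bit 7 on the left to bit 1 on the right, with the information bits at positions 7, 6, 5 and 3 in that order; this ordering is inferred from the example above (0101 becomes 0101101). The function names are hypothetical. `hamming_syndrome` adds up the positions of the failing check bits, which is the same reasoning as the single-error example: failing checks 1 and 4 give position 5.

```python
# Bit positions are numbered 1..7; words are written from bit 7 (left) to bit 1 (right).
DATA_POSITIONS = (7, 6, 5, 3)  # information bits, in the order they are written
CHECK_POSITIONS = (1, 2, 4)    # even parity check bits at powers of 2


def to_word(bits: dict[int, int]) -> str:
    return "".join(str(bits[p]) for p in range(7, 0, -1))


def from_word(word: str) -> dict[int, int]:
    return {7 - i: int(ch) for i, ch in enumerate(word)}


def hamming_encode(info: str) -> str:
    bits = {p: int(ch) for p, ch in zip(DATA_POSITIONS, info)}
    for c in CHECK_POSITIONS:
        # Check bit c covers every position whose binary index shares the bit set in c.
        bits[c] = sum(bits[p] for p in DATA_POSITIONS if p & c) % 2
    return to_word(bits)


def hamming_syndrome(word: str) -> int:
    bits = from_word(word)
    s = 0
    for c in CHECK_POSITIONS:
        # A failing even parity check contributes its position to the syndrome.
        if sum(bits[p] for p in range(1, 8) if p & c) % 2:
            s |= c
    return s


def hamming_correct(word: str) -> str:
    s = hamming_syndrome(word)
    if s == 0:
        return word   # all checks pass
    bits = from_word(word)
    bits[s] ^= 1      # assume a single error at position s
    return to_word(bits)


print(hamming_encode("0101"))       # 0101101
print(hamming_syndrome("0111101"))  # 5
print(hamming_correct("0111101"))   # 0101101
print(hamming_syndrome("0011101"))  # 3
```

For the double error 0011101 (bits 5 and 6 flipped), the syndrome is nonzero, so the error is detected, but it points to bit 3, and `hamming_correct` would flip a bit that was already correct. With this code a double error can be detected but not corrected.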