Fault-Tolerant Computing Systems #1: Introduction
Pattara Leelaprute, Computer Engineering Department, Kasetsart University, pattara.l@ku.ac.th

Introduction
• Dependability
  • Trustworthiness (trust, confidence, reliance) of a computer system
  • Reliance can justifiably be placed on the service it delivers
• Why is dependability necessary for computers?
  • Life-critical tasks (loss of human life)
    • Patient monitoring
    • Missile guidance control
    • Air traffic systems (e.g., Die Hard 2)
  • Tasks that critically depend on computers (financial loss)
    • Banking systems
    • Stock markets
    • Online shopping
Group exercise: form a group and give as many examples as you can (10 min).

Reliability and Availability
• Attributes important for dependability
  • Reliability, availability, safety, security
• Attributes important for fault tolerance
  • Reliability
    • Deals with continuity of service
  • Availability
    • Deals with readiness for use
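
The two attributes can be made concrete with a small numeric sketch. The formulas below are not on the slides; they are the usual textbook models, assuming a constant failure rate (exponential lifetime) for reliability and a steady-state repair cycle for availability. The names `reliability`, `availability`, `mttf_hours`, and `mttr_hours` are illustrative only.

```python
import math


def reliability(t_hours, mttf_hours):
    """Probability of uninterrupted service up to time t.

    Assumes a constant failure rate, so R(t) = exp(-t / MTTF).
    """
    return math.exp(-t_hours / mttf_hours)


def availability(mttf_hours, mttr_hours):
    """Long-run fraction of time the system is ready for use.

    Steady-state model: MTTF / (MTTF + MTTR).
    """
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical figures: mean time to failure 1000 h, mean time to repair 10 h.
r = reliability(1000, 1000)
a = availability(1000, 10)
print(f"R(1000 h) = {r:.4f}, availability = {a:.4f}")
```

Under these assumed figures the system is ready about 99% of the time, yet it has only about a 37% chance of running 1000 hours without interruption, which is why the two attributes are treated separately.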

Fault Avoidance & Fault Tolerance
• Fault Avoidance
  • Approach that prevents faults from occurring or being introduced into the system (direct approach)
• Fault Tolerance
  • Approach that provides service despite the presence of faults in the system
Fault = abnormality of a component of the system

Fault Avoidance
• Eliminate as many faults as possible before the system is put into use
• Has no redundancy
• Focuses on methodologies for design, testing, and validation
• All components must work correctly, without failing, at all times: IMPOSSIBLE
Manual maintenance is needed to repair the system when a failure occurs.

Failure, Fault, Error
• Fault
  • Abnormality of a component of the system
  • Cause of an error and a failure
• Error
  • Abnormal state of a component of the system
  • Manifestation of a fault in the system
  • Cause of a failure
• Failure
  • The system cannot provide the desired service (the behavior of the system deviates from the required specification)
[Figure: circuit example with signal values 0/1; labels: NG, OK, "Not 100%", "100%"]
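
The chain fault → error → failure can be illustrated with a tiny gate-level sketch. The circuit z = (a AND b) OR c and the stuck-at-0 fault on the AND gate are hypothetical, chosen only for illustration; they are not the circuit drawn on the slide.

```python
def circuit(a, b, c, and_stuck_at_0=False):
    """Evaluate z = (a AND b) OR c.

    Returns (and_out, z) so the internal signal can be inspected.
    If and_stuck_at_0 is True, the AND gate output is forced to 0 (the fault).
    """
    and_out = a & b
    if and_stuck_at_0:
        and_out = 0
    z = and_out | c
    return and_out, z


def classify(a, b, c):
    """Compare the faulty circuit with the fault-free one for one input."""
    good_and, good_z = circuit(a, b, c)
    bad_and, bad_z = circuit(a, b, c, and_stuck_at_0=True)
    if bad_and == good_and:
        return "fault present, no error"   # internal state is still correct
    if bad_z == good_z:
        return "error, but no failure"     # wrong internal state, correct output
    return "failure"                       # output deviates from specification


for inputs in [(0, 1, 0), (1, 1, 1), (1, 1, 0)]:
    print(inputs, "->", classify(*inputs))
```

The same fault gives three different outcomes depending on the input: it stays dormant, it produces an error that the OR gate hides, or it propagates to the output as a failure.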

Type of Faults (by Duration)
• Duration
  • Transient fault (temporary)
    • Fault of limited duration (exists only for a short time)
    • Caused by a temporary malfunction of the system
    • Hard to detect
  • Intermittent fault (recurring): a transient fault that occurs repeatedly, each time for a short duration
  • Permanent fault
    • Exists until the faulty component is repaired
    • Most fault-tolerance techniques assume that components fail permanently
Callout on slide: "Should be detected"
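
The three duration classes can be sketched as simple timelines. This is a toy model, not something from the slides: the function names and the parameters `start`, `length`, and `period` are hypothetical, and time is counted in discrete steps.

```python
def is_faulty(kind, t, start=3, length=2, period=5):
    """Return True if a hypothetical component is faulty at time step t."""
    if kind == "transient":
        # Active once, for a limited duration.
        return start <= t < start + length
    if kind == "intermittent":
        # A short fault that keeps coming back every `period` steps.
        return t >= start and (t - start) % period < length
    if kind == "permanent":
        # Stays active until the component is repaired (no repair modeled here).
        return t >= start
    raise ValueError(f"unknown fault kind: {kind}")


def timeline(kind, steps=12):
    """Render the fault activity as a string: 'X' = faulty, '.' = healthy."""
    return "".join("X" if is_faulty(kind, t) else "." for t in range(steps))


for kind in ("transient", "intermittent", "permanent"):
    print(f"{kind:12s} {timeline(kind)}")
```

A check that samples the component at a single moment can easily miss the transient and intermittent cases, which is one way to read "hard to detect".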

Type of Faults (by Phase)
• Phase in which faults are introduced
  • Design fault
    • Introduced during system design
    • Introduced during modification of the system
  • Operational fault
    • Appears during the system lifetime and is caused by physical reasons

Fault Tolerance and Redundancy
Goal = avoid system failure even if faults are present
• Fault-tolerant system
  • A system that can mask (hide) the effect of a fault by using redundancy
• Redundancy (duplication, surplus)
  • Some form of redundancy is needed in a fault-tolerant system
  • Defined as those parts of the system that are not needed for the correct functioning of the system (not needed while the system is fault-free)
  • Space redundancy
    • Hardware, software
  • Time redundancy
    • Extra time for performing tasks for fault tolerance
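
Masking by redundancy can be sketched in a few lines. The slides do not name a specific technique; the example below assumes majority voting over three results, using three replicas for space redundancy and three repeated executions for time redundancy. All function and class names are hypothetical.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value produced by a strict majority, else raise."""
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise RuntimeError("no majority: too many results disagree")
    return value


def space_redundant(replicas, x):
    """Space redundancy: run extra replicas on the same input and vote."""
    return majority_vote([replica(x) for replica in replicas])


def time_redundant(task, x, attempts=3):
    """Time redundancy: spend extra time repeating the task, then vote."""
    return majority_vote([task(x) for _ in range(attempts)])


def square(x):
    return x * x


def faulty_square(x):
    # Hypothetical replica with a permanent fault: always off by one.
    return x * x + 1


class GlitchOnce:
    """Hypothetical unit whose first call is hit by a transient fault."""

    def __init__(self):
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return -1 if self.calls == 1 else x * x


tmr_result = space_redundant([square, faulty_square, square], 4)
retry_result = time_redundant(GlitchOnce(), 4)
print(tmr_result, retry_result)
```

In both cases the wrong value is outvoted, so the caller never sees it. Repeating the same unit helps only with transient faults; a permanently faulty unit would return the same wrong value on every attempt, which is where space redundancy is needed.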

Digital Circuit Review
[Figure: digital circuit diagram with signals labeled x0 to x8, z1, and z2]