A survey of dependability patterns
Ingrid Buckley and Eduardo B. Fernandez
Dept. of Computer Science and Engineering
Florida Atlantic University, Boca Raton, FL, USA
January 18, 2007
Introduction
Dependability is the property of a system that allows one to rely on its service. Dependability is of the utmost importance in critical systems, both in business and in critical infrastructures such as hospitals, airports, and a country's electricity grid. Dependability comprises several aspects:
• Fault Tolerance
• Safety
• Availability
• Reliability
Introduction cont’d
• Fault tolerance, as it relates to systems, software, and hardware, is the ability to remain operable in the presence of faults.
• Safety is the prevention of catastrophic effects on the environment or on the users of the system.
• Availability is the ability of a system to perform its functions when needed.
• Reliability measures how successfully the system conforms to its specification.
• We use the Unified Modeling Language (UML) to represent fault tolerance patterns.
Objectives
• Classify software and hardware fault tolerance patterns according to their objectives.
• Analyze and evaluate the classified fault tolerance patterns.
• Determine how to improve upon existing patterns.
• Design new fault tolerance patterns for unsupported areas within critical systems.
Background
• A pattern is an encapsulated solution to a recurring problem in a given context; it can be tailored to fit different situations.
• A fault is a defect in a component or in the design of a system. An error is the manifestation of a fault: a defective value in the state of the system.
• A system failure occurs when the system deviates from its specification. A failure is the manifestation of an error.
• The System Development Life Cycle (SDLC) is the entire process of formal, logical steps taken to develop software.
Fault Tolerance
• A system that can mask the effects of a fault and continue operating correctly is said to be fault tolerant.
• Fault tolerance requires redundancy and diversity, which directly improve reliability and support the availability of a system.
• Diversity, in this sense, means having different versions of a function or system that all provide the same functionality.
• Integrating hardware and software fault tolerance to cope with the various kinds of faults that can appear in a software system is a good foundation for achieving a fault-tolerant system.
• Several fault tolerance patterns have already been written, and they support different levels of the system architecture. We focus on hardware and software fault tolerance patterns.
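To make masking through redundancy and diversity concrete, the following Python sketch runs three independently written versions of an integer square root and takes a majority vote, in the style of N-version programming. The version functions and `majority_vote` are hypothetical illustrations, not code from the surveyed patterns; one version is deliberately faulty so that the vote has something to mask.

```python
import math
from collections import Counter
from typing import Callable, Sequence

def isqrt_library(n: int) -> int:
    """Version 1: delegate to the standard library."""
    return math.isqrt(n)

def isqrt_newton(n: int) -> int:
    """Version 2: an independent implementation using integer Newton iteration."""
    if n < 2:
        return n
    x, y = n, (n + 1) // 2
    while y < x:
        x, y = y, (y + n // y) // 2
    return x

def isqrt_buggy(n: int) -> int:
    """Version 3: deliberately faulty (one too small on perfect squares above 1)."""
    r = int(n ** 0.5)
    return r - 1 if r * r == n and n > 1 else r

def majority_vote(versions: Sequence[Callable[[int], int]], n: int) -> int:
    """Run all versions on the same input and return the strict-majority result."""
    results = []
    for version in versions:
        try:
            results.append(version(n))
        except Exception:
            pass  # a crashed version casts no vote
    if not results:
        raise RuntimeError("all versions failed")
    value, votes = Counter(results).most_common(1)[0]
    if 2 * votes <= len(versions):
        raise RuntimeError("no majority: the fault cannot be masked")
    return value

VERSIONS = [isqrt_library, isqrt_newton, isqrt_buggy]
```

For example, `majority_vote(VERSIONS, 16)` returns 4 even though `isqrt_buggy(16)` returns 3. With only two versions that disagree, no strict majority exists, so the error is detected but cannot be masked.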
Fault Tolerance Cont’d
• Fault tolerance patterns are a fairly new area for critical systems. The need for them has grown with the need to protect systems against failures, whether caused accidentally or intentionally by attackers.
• Because attacks on different types of systems are so diverse, effective fault tolerance techniques are essential to mitigate faults that may lead to failure of a critical system.
• To prevent failures, the following are required:
  • Detection: detect the occurrence of errors.
  • Diagnosis: locate the unit or component where the error occurred.
  • Masking: mask errors so that the system does not malfunction when a fault occurs.
  • Containment: confine or delimit the effects of the error.
  • Recovery: reconfigure the system to remove the faulty unit and erase the effects of the error.
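A minimal Python sketch of how detection, diagnosis, masking, containment, and recovery might fit together in one component. It assumes a hypothetical `FaultTolerantService` that holds replaceable units and an acceptance test for their results (here, checking that a sort is correct); the class names, the acceptance-test approach, and the spare factory are illustrative assumptions rather than part of the surveyed patterns.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Unit:
    """A hypothetical replaceable unit (board, process, or module)."""
    name: str
    compute: Callable[[List[int]], List[int]]
    healthy: bool = True

@dataclass
class FaultTolerantService:
    units: List[Unit]
    acceptance_test: Callable[[List[int], List[int]], bool]
    make_spare: Callable[[str], Unit]
    log: List[str] = field(default_factory=list)

    def request(self, xs: List[int]) -> List[int]:
        for unit in self.units:
            if not unit.healthy:
                continue  # containment: isolated units receive no more work
            try:
                ys = unit.compute(xs)
                ok = self.acceptance_test(xs, ys)  # detection
            except Exception:
                ok = False  # a crash is also a detected error
            if ok:
                return ys  # masking: the caller never sees the faulty result
            self.log.append(f"error detected in {unit.name}")  # diagnosis
            unit.healthy = False  # containment: isolate the faulty unit
        raise RuntimeError("no healthy unit produced an acceptable result")

    def recover(self) -> None:
        """Recovery: reconfigure by replacing every isolated unit with a fresh spare."""
        self.units = [u if u.healthy else self.make_spare(u.name) for u in self.units]

def is_sorted_permutation(xs: List[int], ys: List[int]) -> bool:
    # Cheaper than re-sorting: same multiset of elements, in non-decreasing order.
    return Counter(xs) == Counter(ys) and all(a <= b for a, b in zip(ys, ys[1:]))

service = FaultTolerantService(
    units=[Unit("A", lambda xs: list(xs)),  # faulty: returns its input unsorted
           Unit("B", sorted)],
    acceptance_test=is_sorted_permutation,
    make_spare=lambda name: Unit(name + "-spare", sorted),
)
```

Calling `service.request([3, 1, 2])` returns `[1, 2, 3]` from unit B while unit A is logged and isolated; a later `service.recover()` swaps A for a spare. An acceptance test is only one way to detect errors; comparison between replicas, as in several of the patterns below, is another.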
Hardware Fault Tolerance Patterns
Hardware fault tolerance uses hardware replication to improve system availability and reliability in the presence of hardware faults.
Hardware fault tolerance patterns:
• Watchdog: The Watchdog pattern primarily protects against time-based faults by raising an alarm whenever liveness messages are not received within a given time frame.
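A minimal Python sketch of the watchdog idea, assuming the monitored component calls a hypothetical `heartbeat()` method as its liveness message and a supervisor calls `check()` periodically. The clock is injectable so the timing can be tested deterministically; the class, method, and parameter names are assumptions for illustration.

```python
import time
from typing import Callable

class Watchdog:
    def __init__(self, timeout: float, on_alarm: Callable[[float], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout          # allowed silence, in seconds
        self.on_alarm = on_alarm        # called with the observed silence
        self.clock = clock
        self.last_heartbeat = clock()
        self.alarmed = False

    def heartbeat(self) -> None:
        """Liveness message from the monitored component; re-arms the watchdog."""
        self.last_heartbeat = self.clock()
        self.alarmed = False

    def check(self) -> bool:
        """Return True if the component is alive; raise the alarm once per missed deadline."""
        silent_for = self.clock() - self.last_heartbeat
        if silent_for > self.timeout:
            if not self.alarmed:
                self.alarmed = True
                self.on_alarm(silent_for)
            return False
        return True

class FakeClock:
    """Test clock whose time is set manually."""
    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now
```

In production the default `time.monotonic` clock would be used, since it is unaffected by wall-clock adjustments. Raising the alarm only once per missed deadline keeps a silent component from flooding the alarm handler.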
Hardware Fault Tolerance Patterns Cont’d
• Fail-Stop Processor: The Fail-Stop Processor pattern aims to transform errors that would otherwise lead to Byzantine (arbitrary) failures into fail-stop (crash) failures. It is based on redundancy: the outputs of all replicas are compared to reach agreement.
• Acknowledgement: The Acknowledgement pattern detects crash failures by requiring that the reception of input be acknowledged within a given time interval.
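Both patterns can be sketched in Python under simplifying assumptions: replicas are plain functions run on the same input, and a receiver acknowledges by putting a token on a `queue.Queue`. The hypothetical `fail_stop_execute` halts (raises `FailStop`) as soon as replica outputs differ, instead of emitting a possibly arbitrary value; the hypothetical `send_with_ack` presumes a crash when no acknowledgement arrives within the timeout. All function and receiver names are illustrative.

```python
import queue
import threading
from typing import Callable, Sequence

class FailStop(Exception):
    """Raised when replicas disagree: the processor stops instead of emitting a wrong value."""

def fail_stop_execute(replicas: Sequence[Callable[[int], int]], x: int) -> int:
    outputs = [replica(x) for replica in replicas]
    # Any disagreement turns a potentially Byzantine result into a clean stop.
    if any(out != outputs[0] for out in outputs[1:]):
        raise FailStop(f"replica outputs disagree: {outputs}")
    return outputs[0]

def send_with_ack(deliver: Callable[[object, queue.Queue], None],
                  message: object, timeout: float) -> bool:
    acks: queue.Queue = queue.Queue()
    threading.Thread(target=deliver, args=(message, acks), daemon=True).start()
    try:
        acks.get(timeout=timeout)  # wait for the acknowledgement
        return True
    except queue.Empty:
        return False  # no acknowledgement in time: presume the receiver crashed

def double(x: int) -> int:
    return 2 * x

def faulty_double(x: int) -> int:
    return 2 * x + (1 if x == 7 else 0)  # hypothetical fault triggered by one input

def healthy_receiver(message: object, acks: queue.Queue) -> None:
    acks.put("ack")

def crashed_receiver(message: object, acks: queue.Queue) -> None:
    pass  # simulates a crashed receiver that never acknowledges
```

Choosing the timeout is the main design decision in the Acknowledgement pattern: too short and a slow but healthy receiver is reported as crashed, too long and crash detection is delayed.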
Software Fault Tolerance Patterns
• Software fault tolerance applies software redundancy, through design diversity, to tolerate software faults that can be introduced during the design, programming, or maintenance phases of the software development life cycle.
Software fault tolerance patterns:
• Roll-Forward: The Roll-Forward pattern is a failure recovery pattern that detects and recovers from a fault by monitoring two replicas for errors.
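The slide states only that two replicas are monitored. The Python sketch below fills in one common form of roll-forward recovery as an assumption: both replicas execute each step from the last checkpoint, and on disagreement a spare re-executes the same step to decide which replica is correct. Processing then continues forward from the agreed state instead of rolling back. State is reduced to a single integer for brevity, and all names are hypothetical.

```python
from typing import Callable, List, Tuple

Step = Callable[[int, int], int]  # (state, input) -> new state

def roll_forward(inputs: List[int], replica_a: Step, replica_b: Step,
                 spare: Step, initial: int = 0) -> Tuple[int, List[str]]:
    checkpoint = initial
    events: List[str] = []
    for i, x in enumerate(inputs):
        a = replica_a(checkpoint, x)
        b = replica_b(checkpoint, x)
        if a == b:
            checkpoint = a  # agreement: commit the new checkpoint
            continue
        # Disagreement detected: the spare re-executes the same step from the checkpoint.
        reference = spare(checkpoint, x)
        if reference == a:
            events.append(f"step {i}: replica B faulty")
        elif reference == b:
            events.append(f"step {i}: replica A faulty")
        else:
            raise RuntimeError(f"step {i}: no two executions agree")
        checkpoint = reference  # roll forward from the agreed state; no rollback
    return checkpoint, events

def add(state: int, x: int) -> int:
    return state + x

def flaky_add(state: int, x: int) -> int:
    return state + x + (100 if x == 3 else 0)  # hypothetical transient fault on input 3
```

With these helpers, `roll_forward([1, 2, 3, 4], add, flaky_add, add)` returns `(10, ["step 2: replica B faulty"])`: the fault at step 2 is diagnosed and the computation continues without redoing earlier steps.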
Software Fault Tolerance Patterns Cont’d
• Input Guard: The Input Guard pattern prevents erroneous input from propagating errors into a component. A guard is placed at every access point of the component to check the validity of the input.
• Fault Container: The Fault Container pattern provides the same benefits as combining the Input Guard and Output Guard patterns, because it prevents errors from propagating into or out of a given component.
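A minimal Python sketch of guards as a decorator, assuming each component's input and output validity can be expressed as predicates. Supplying only the input check yields an Input Guard; supplying both approximates a Fault Container, turning a bad value at either boundary into an explicit `GuardViolation` instead of letting it propagate. The decorator, exception, and example component are hypothetical.

```python
from functools import wraps
from typing import Any, Callable

class GuardViolation(ValueError):
    """Raised at a component boundary when a value fails its validity check."""

def guard(input_ok: Callable[..., bool],
          output_ok: Callable[[Any], bool] = lambda _result: True):
    """Wrap a component with an input guard and, optionally, an output guard."""
    def decorate(component: Callable[..., Any]) -> Callable[..., Any]:
        @wraps(component)
        def guarded(*args: Any, **kwargs: Any) -> Any:
            # Input Guard: reject erroneous input before it enters the component.
            if not input_ok(*args, **kwargs):
                raise GuardViolation(f"{component.__name__}: invalid input {args!r}")
            result = component(*args, **kwargs)
            # Output Guard: stop an internal error from leaving the component.
            if not output_ok(result):
                raise GuardViolation(f"{component.__name__}: invalid output {result!r}")
            return result
        return guarded
    return decorate

# Fault Container: both guards around one component.
@guard(input_ok=lambda celsius: -273.15 <= celsius <= 1000.0,
       output_ok=lambda kelvin: kelvin >= 0.0)
def to_kelvin(celsius: float) -> float:
    return celsius + 273.15
```

Here `to_kelvin(-300.0)` is rejected at the input boundary, and a faulty component that returned a negative temperature would be stopped at the output boundary instead of passing the error to its callers.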
Hardware/Software Fault Tolerance Pattern
• The Software Redundancy pattern deals with hardware, software, and environmental faults at the same time.
Conclusion
• Our analysis shows a need to improve upon current fault tolerance patterns.
• New fault tolerance patterns are necessary to provide dependability in distributed systems, because many existing fault tolerance patterns are very similar to one another and do not provide comprehensive support against errors that can lead to failure.
Future Work
• Safety, availability, and reliability patterns are being researched.
• Defining areas of need where current fault tolerance patterns are lacking or require improvement.
• Designing new fault tolerance patterns.
Recommendations and Questions
Feedback: