
ABCSG


jerom




Presentation Transcript


  1. ABCSG Dependable Systems ABCSG - Dependable Systems - 01/06/2006

  2. Agenda • Dependable Computing • Basic concepts • Definitions • Attributes • Threats • Means to attain dependability • Fault prevention • Fault removal • Fault forecasting • Fault tolerance -> Branch into techniques -> Branch into Coordinated Atomic Actions

  3. Dependable Computing - Definition • Ability to deliver service that can justifiably be trusted or • Ability of a system to avoid service failures that are more frequent or more severe than is acceptable

  4. Dependable Computing - Attributes

  5. Dependable Computing - Threats • Anything that can influence the system in such a way that it falls outside the definition of dependable • Development phase • Physical world • Human developers • Development tools • Production and test facilities • Use phase • Physical world • Administrators • Users of services • Providers of services • Infrastructure • Intruders

  6. Means - Fault prevention • A failure is the result of an error • An error is the result of a fault => Preventing faults prevents failures • Basically we all know how (right?) • Information hiding • Modularization • Strongly typed languages • ...

  7. Means - Fault removal • During development (fault injection can also be used to test the fault tolerance mechanisms) • During use • Corrective maintenance • Preventive maintenance
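Testing fault tolerance by fault injection can be sketched as a small campaign: corrupt a component's input on purpose and count how many injected faults escape the tolerance mechanism. Everything here is hypothetical and chosen for illustration: `parse_reading` is the component, `tolerant_read` is the mechanism under test (fall back to the last good value), and `inject_fault` corrupts inputs at a configurable rate.

```python
import random

def parse_reading(raw: str) -> float:
    """Component under test: converts a raw sensor string to a float."""
    return float(raw)

def tolerant_read(raw: str, last_good: float) -> float:
    """Fault-tolerance mechanism under test: fall back to the last good value."""
    try:
        return parse_reading(raw)
    except ValueError:
        return last_good

def inject_fault(raw: str, rng: random.Random, rate: float) -> str:
    """Hypothetical injector: corrupts the input with probability `rate`."""
    if rng.random() < rate:
        return raw + "#"  # trailing garbage makes float() raise ValueError
    return raw

def run_campaign(n: int, rate: float, seed: int = 0) -> tuple[int, int]:
    """Returns (faults injected, faults that escaped the tolerance mechanism)."""
    rng = random.Random(seed)
    injected = escaped = 0
    last_good = 0.0
    for i in range(n):
        raw = str(float(i))
        corrupted = inject_fault(raw, rng, rate)
        if corrupted != raw:
            injected += 1
        try:
            value = tolerant_read(corrupted, last_good)
        except Exception:
            escaped += 1  # the fault propagated: the mechanism did not tolerate it
            continue
        if corrupted == raw:
            last_good = value
    return injected, escaped

injected, escaped = run_campaign(n=1000, rate=0.2)
print(f"{injected} faults injected, {escaped} escaped")
```

Pointing the same campaign at `parse_reading` directly instead of `tolerant_read` would let every injected fault escape, which is exactly what the campaign is meant to reveal.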

  8. Means - Fault forecasting • Evaluating the system behavior with respect to fault occurrence or activation • Qualitative evaluation • Identify the failure modes or the event combinations that would lead to system failure • Quantitative evaluation • Determine, in terms of probabilities, the extent to which some of the attributes of dependability are satisfied
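A quantitative evaluation typically reduces to a few probability formulas. The sketch below assumes a constant failure rate (exponentially distributed lifetimes), statistically independent replicas and a perfect voter; these are textbook simplifying assumptions, not figures from the talk.

```python
import math

def reliability(failure_rate: float, t: float) -> float:
    """R(t) = exp(-lambda * t): probability of surviving to time t,
    assuming a constant failure rate lambda (exponential lifetimes)."""
    return math.exp(-failure_rate * t)

def availability(mttf: float, mttr: float) -> float:
    """Steady-state availability: fraction of time the service is up."""
    return mttf / (mttf + mttr)

def tmr_reliability(r: float) -> float:
    """Triple modular redundancy with a perfect voter: at least 2 of 3
    independent replicas (each with reliability r) must work."""
    return r**3 + 3 * r**2 * (1 - r)  # equals 3r^2 - 2r^3

print(reliability(1e-3, 1000))   # after one MTTF: e^-1
print(availability(1000, 10))    # MTTF = 1000 h, MTTR = 10 h
print(tmr_reliability(0.9))
print(tmr_reliability(0.4))
```

The last two lines show why such numbers matter: triplication raises a replica reliability of 0.9 to about 0.972, but lowers 0.4 to about 0.352, so redundancy only pays off when each replica is reliable enough to begin with.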

  9. Means - Fault tolerance • Fault prevention involves human activities and is thus imperfect => We need fault removal • Fault removal involves human activities and is thus imperfect => We need fault forecasting • Fault forecasting involves human activities and is thus imperfect => We need fault tolerance • Fault tolerance involves human activities and is thus imperfect => Systems will fail ... but a combination of all the aforementioned techniques is the best route to dependable computing ... so let's have a look at fault tolerance

  10. Fault tolerance • Recall that fault tolerance is one of the means to attain dependable systems • Terminology and key concept • Fault -> Error -> Failure • Failure semantics • Redundancy • Techniques • Sequential • Independent concurrent systems • Competitive concurrent systems • Cooperative concurrent systems • Hybrid systems

  11. Fault tolerance - Terminology and key concept • A failure is the observation of an erroneous system state • An error is an erroneous system state, which might lead to a failure • A fault is a system defect, which might lead to an error

  12. Fault tolerance - Terminology and key concept English • A failure is a consequence of an error that is the consequence of a fault • Fault => Error => Failure Danish • "En fejl er konsekvensen af en fejl som er konsekvensen af en fejl" (Danish uses the single word "fejl" for fault, error and failure, so this literally reads "a fejl is the consequence of a fejl that is the consequence of a fejl") • Fejl => Fejl => Fejl (Think about that one for a moment)

  13. Fault tolerance - Terminology and key concept • There is a window of opportunity between an error and a failure: an error can be detected and handled before it becomes a failure • Redundancy is the key concept
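The fault => error => failure chain, and the role of redundancy in the window between error and failure, can be illustrated with a deliberately buggy function. The off-by-one defect and the function names are hypothetical; the independent recomputation in `checked_average` is one simple form of redundancy, chosen here for illustration.

```python
import math

def buggy_sum(values: list[float]) -> float:
    total = 0.0
    # FAULT: an off-by-one defect, the last element is never added
    for i in range(len(values) - 1):
        total += values[i]
    return total

def average(values: list[float]) -> float:
    # When the fault is activated, buggy_sum returns an ERROR (wrong internal state)
    return buggy_sum(values) / len(values)

def checked_average(values: list[float]) -> float:
    result = average(values)
    # Redundancy: an independent computation of the same quantity
    reference = math.fsum(values) / len(values)
    if not math.isclose(result, reference):
        # The error is detected in the window before it is delivered as a FAILURE
        raise ArithmeticError(f"erroneous result {result}, expected {reference}")
    return result

print(average([1.0, 2.0, 0.0]))  # last element is 0: the fault causes no error
print(average([1.0, 2.0, 3.0]))  # wrong answer delivered to the user: a failure
```

The first call executes the defective loop, yet the state stays correct; the second turns the same fault into an error and, because nothing checks it, into a failure. `checked_average` converts that failure into a detected error that a caller can handle.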

  14. Fault tolerance - Sequential systems • Recovery blocks - redundant algorithms • Retry blocks - redundant data • An acceptance test examines the system state to verify that the behavior is acceptable
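Both sequential techniques share an acceptance test but put the redundancy in different places. The sketch below is a minimal illustration assuming a square-root service: `recovery_block` tries redundant algorithms in turn, while `retry_block` reruns one algorithm on re-expressed data, given as hypothetical (encode, decode) pairs. The faulty components `wrong_sqrt` and `fragile_sqrt` are invented for the example.

```python
import math

class RecoveryFailure(Exception):
    """No alternative produced a result that passed the acceptance test."""

def acceptable(x: float, r: float) -> bool:
    """Acceptance test for a square root: r must be non-negative and r*r ~ x."""
    return r >= 0 and math.isclose(r * r, x, rel_tol=1e-9, abs_tol=1e-12)

def recovery_block(x, alternates, acceptance_test):
    """Recovery block: redundant *algorithms* tried in turn on the same input.
    The input is immutable here, so every alternate starts from the same
    recovery point and no explicit state restoration is needed."""
    for alternate in alternates:
        try:
            result = alternate(x)
        except Exception:
            continue  # a crash counts as a failed alternate
        if acceptance_test(x, result):
            return result
    raise RecoveryFailure("all alternates failed")

def retry_block(x, algorithm, reexpressions, acceptance_test):
    """Retry block: the same algorithm retried on re-expressed (redundant) *data*.
    Each re-expression is an (encode, decode) pair; decode maps the result
    computed on encoded data back to a result for x."""
    attempts = [(lambda v: v, lambda r: r)] + list(reexpressions)
    for encode, decode in attempts:
        try:
            result = decode(algorithm(encode(x)))
        except Exception:
            continue
        if acceptance_test(x, result):
            return result
    raise RecoveryFailure("all re-expressions failed")

# Hypothetical faulty components, used only for illustration
def wrong_sqrt(x: float) -> float:
    return x / 2  # design fault: only correct for x == 0 and x == 4

def fragile_sqrt(x: float) -> float:
    if x < 1:
        raise ValueError("simulated fault for inputs below 1")
    return math.sqrt(x)

print(recovery_block(16.0, [wrong_sqrt, math.sqrt], acceptable))
print(retry_block(0.25, fragile_sqrt, [(lambda v: 4 * v, lambda r: r / 2)], acceptable))
```

The re-expression relies on sqrt(4x) = 2 sqrt(x), so the retried computation runs on different data but still answers the original question.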

  15. Fault tolerance - Independent concurrent systems • N-version programming - the parallel version of recovery blocks • N-copy programming - the parallel version of retry blocks • The decision mechanism must decide whether one of the results can be considered correct ... and this is not an easy task! - Multiple correct results, floating-point precision ... - Exact majority voter, mean voter, consensus voter, etc.
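N-version programming with the three voters named on the slide can be sketched as follows. The voter definitions are common textbook variants and are assumptions here: the exact majority voter needs more than half of all versions to agree exactly, the mean voter averages the results, and the consensus voter accepts the largest agreeing group (within a tolerance `eps`) even without a majority. The three versions are hypothetical, and one of them contains a design fault.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from statistics import fmean

class NoDecision(Exception):
    """The voter could not decide on a result."""

def _vote(version, x):
    try:
        return version(x)
    except Exception:
        return None  # a crashed version casts no vote

def run_versions(versions, x):
    """N-version programming: run all versions concurrently on the same input."""
    with ThreadPoolExecutor(max_workers=len(versions)) as pool:
        return list(pool.map(lambda v: _vote(v, x), versions))

def exact_majority_voter(results):
    """Accept a value only if more than half of all versions produced exactly it."""
    votes = Counter(r for r in results if r is not None)
    if votes:
        value, count = votes.most_common(1)[0]
        if count > len(results) // 2:
            return value
    raise NoDecision("no exact majority")

def mean_voter(results):
    """Average all results; simple, but a single wild result shifts the output."""
    valid = [r for r in results if r is not None]
    if not valid:
        raise NoDecision("no results")
    return fmean(valid)

def consensus_voter(results, eps=1e-9):
    """Accept the value agreed on (within eps) by the largest group of versions,
    even without an absolute majority; a tie between groups gives no decision."""
    valid = [r for r in results if r is not None]
    best, best_size, tied = None, 0, False
    for candidate in valid:
        size = sum(1 for r in valid if abs(r - candidate) <= eps)
        if size > best_size:
            best, best_size, tied = candidate, size, False
        elif size == best_size and abs(candidate - best) > eps:
            tied = True
    if best is None or tied:
        raise NoDecision("no unique consensus")
    return best

# Hypothetical versions; the third contains a design fault
versions = [math.sqrt, lambda x: x ** 0.5, lambda x: x / 2]
results = run_versions(versions, 16.0)
```

With these versions the majority and consensus voters both settle on 4.0, while the mean voter is dragged to about 5.33 by the faulty version. For five results such as 1, 2, 3, 3, 4 the exact majority voter gives up, whereas the consensus voter still picks 3, which illustrates why the choice of decision mechanism is "not an easy task".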

  16. Fault tolerance - Competitive concurrent systems • Two or more processes that are not aware of each other but share some resources • They want to live in their own environment, and a fault in one process should not affect the other processes • Transactions • Atomicity / Consistency / Isolation / Durability • Provide backward error recovery • Together with exception handling, transactions can be used to provide forward error recovery • In self-checking transactional objects, methods are decorated with a pre- and a postcondition
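The transaction bullets can be sketched with an in-memory store that snapshots its state and restores the snapshot when a transaction raises (backward recovery), plus a self-checking `Bank` whose `transfer` checks a pre- and a postcondition inside the transaction. All names are hypothetical, a single lock stands in for a real isolation mechanism, and durability is not modelled. `transfer_at_most` shows one possible forward-recovery policy built on exception handling.

```python
import copy
import threading
from contextlib import contextmanager

class ConditionViolation(Exception):
    """A pre- or postcondition of a transactional method did not hold."""

class TransactionalStore:
    """Hypothetical in-memory store. Isolation: one transaction at a time
    (a single lock). Atomicity: a snapshot is restored if the transaction
    raises. Durability is not modelled."""
    def __init__(self, data: dict):
        self._data = dict(data)
        self._lock = threading.Lock()

    @contextmanager
    def transaction(self):
        with self._lock:
            snapshot = copy.deepcopy(self._data)  # recovery point
            try:
                yield self._data
            except Exception:
                self._data.clear()
                self._data.update(snapshot)  # backward error recovery (abort)
                raise

    def snapshot(self) -> dict:
        with self._lock:
            return dict(self._data)

class Bank:
    """Self-checking transactional object: transfer() checks a precondition
    and a postcondition inside the transaction, so a violation aborts it."""
    def __init__(self, balances: dict):
        self.store = TransactionalStore(balances)
        self.total = sum(balances.values())  # money is neither created nor destroyed

    def transfer(self, src: str, dst: str, amount: int) -> None:
        with self.store.transaction() as data:
            if amount <= 0 or src not in data or dst not in data:
                raise ConditionViolation("precondition failed")
            data[src] -= amount
            data[dst] += amount
            if sum(data.values()) != self.total or min(data.values()) < 0:
                raise ConditionViolation("postcondition failed")

def transfer_at_most(bank: Bank, src: str, dst: str, amount: int) -> int:
    """Forward error recovery via exception handling (an illustrative policy):
    if the full transfer is rejected, move whatever src can cover instead."""
    try:
        bank.transfer(src, dst, amount)
        return amount
    except ConditionViolation:
        # Not atomic with the failed attempt; acceptable for a sketch
        available = min(amount, bank.store.snapshot().get(src, 0))
        if available > 0:
            bank.transfer(src, dst, available)
            return available
        return 0
```

A rejected transfer leaves the balances exactly as they were before it started, while the handler in `transfer_at_most` moves the system forward to a different, still acceptable state instead of simply undoing the attempt.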

  17. Fault tolerance - Cooperative concurrent systems • Several processes cooperate in executing a common job and are aware of each other • Conversation • Works like a transaction involving several processes • It is an isolated environment for the participating processes; they are not allowed to communicate outside the conversation (doing so is called information smuggling) • Ultimately, either all participants commit or all roll back to the state at the beginning of the conversation - backward error recovery • Atomic actions • An atomic action is a conversation, but with the ability to do forward error recovery
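A conversation, and the atomic-action extension with forward recovery, can be sketched with threads that share only the conversation's state. This is a simplified illustration under stated assumptions: the ban on outside communication is only a convention (Python cannot enforce it), joining all threads stands in for the synchronized exit line, and a single coordinated `handler` replaces the per-participant exception handlers of a real atomic action.

```python
import threading

class Conversation:
    """Hypothetical sketch of a conversation: participants run concurrently and
    share only the conversation's state. At the exit line either all commit
    or all roll back to the recovery point."""
    def __init__(self, state: dict):
        self.state = state

    def run(self, participants, acceptance_test, handler=None) -> bool:
        recovery_point = dict(self.state)
        errors, lock = [], threading.Lock()

        def participate(p):
            try:
                p(self.state, lock)
            except Exception as exc:
                with lock:
                    errors.append(exc)

        threads = [threading.Thread(target=participate, args=(p,)) for p in participants]
        for t in threads:
            t.start()
        for t in threads:
            t.join()  # nobody leaves before everybody has finished

        if errors and handler is not None:
            # Atomic action: forward error recovery. One coordinated handler
            # (a simplification of per-participant handlers) repairs the state.
            try:
                handler(self.state, errors)
                errors = []
            except Exception as exc:
                errors = [exc]

        if errors or not acceptance_test(self.state):
            self.state.clear()
            self.state.update(recovery_point)  # backward error recovery for all
            return False
        return True

def set_x(state, lock):
    with lock:
        state["x"] = 1

def set_y(state, lock):
    with lock:
        state["y"] = 2

def faulty_y(state, lock):
    with lock:
        state["y"] = 99
    raise ValueError("simulated participant fault")

def sums_to_3(state):
    return state["x"] + state["y"] == 3
```

Without a handler, a fault in `faulty_y` rolls every participant back to the starting state (a plain conversation); with a handler such as `lambda s, errs: s.update(y=3 - s["x"])`, the same fault is repaired and the action still commits (an atomic action).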

  18. Fault tolerance - Hybrid systems • Models that support both competitive and cooperative concurrency • Coordinated atomic actions • An atomic action in which the participants can also access external objects • Atomic actions control cooperative concurrency and coordinated error recovery • Transactions control competitive concurrency and maintain the consistency of the shared resources in case of failures
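The hybrid model can be sketched by combining cooperating participants with transactional access to an external object. The class `ExternalAccount` and the function `ca_action` are hypothetical: the account's own lock and undo value play the role of the transaction that protects the shared resource from other actions (competitive concurrency), while the participants cooperate inside the action and a coordinated handler may attempt forward recovery before the action decides to commit or abort.

```python
import threading

class ExternalAccount:
    """Hypothetical external object shared with other actions (competitive
    concurrency). Access happens inside a transaction: begin() locks it and
    records an undo value, commit()/abort() release it."""
    def __init__(self, balance: int):
        self.balance = balance
        self._lock = threading.Lock()
        self._undo = None

    def begin(self) -> None:
        self._lock.acquire()       # isolation from other CA actions
        self._undo = self.balance  # recovery point for this external object

    def commit(self) -> None:
        self._undo = None
        self._lock.release()

    def abort(self) -> None:
        self.balance = self._undo  # backward recovery of the external object
        self._undo = None
        self._lock.release()

def ca_action(participants, account, acceptance_test, handler=None) -> str:
    """Coordinated atomic action: cooperating participants (as in an atomic
    action) plus transactional access to an external object."""
    errors, mutex = [], threading.Lock()

    def participate(p):
        try:
            p(account, mutex)
        except Exception as exc:
            with mutex:
                errors.append(exc)

    account.begin()
    threads = [threading.Thread(target=participate, args=(p,)) for p in participants]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    if errors and handler is not None:
        try:
            handler(account, errors)  # coordinated forward error recovery
            errors = []
        except Exception as exc:
            errors = [exc]
    if errors or not acceptance_test(account):
        account.abort()
        return "aborted"
    account.commit()
    return "committed"

def withdraw(amount: int):
    """Builds a hypothetical participant that withdraws `amount`."""
    def participant(account, mutex):
        with mutex:
            account.balance -= amount
    return participant

def non_negative(account) -> bool:
    return account.balance >= 0
```

For example, two participants withdrawing 30 and 40 from a balance of 100 commit and leave 30; a second action withdrawing 20 and 50 would overdraw the account, so it aborts and the external balance stays at 30.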

  19. Coordinated Atomic Actions ... will have to wait for another day; I think time is up!
