
Seminarie Informatica

Seminarie Informatica. Fault-tolerant Systems: The Software Viewpoint. A series of seminars coordinated by Vincenzo De Florio, http://www.pats.ua.ac.be. This lecture: introduction to the exam, the topics, and application-level fault tolerance provisions.


Presentation Transcript


  1. Seminarie Informatica. Fault-tolerant Systems: The Software Viewpoint. A series of seminars coordinated by Vincenzo De Florio, http://www.pats.ua.ac.be

  2. The matter • The exam • The topics • This lecture • Application-level fault tolerance provisions Seminarie Informatica - Lecture 1

  3. Introduction to the exam • Seminarie informatica • 10 seminars on hot topics of computer science • Topic of this cycle: software fault-tolerant systems • Next 3 seminars: 15, 22 November; 6 December • Next year's seminars: to be announced on http://www.win.ua.ac.be/~vincenz/si/0607.html

  4. Introduction to the exam • Oral discussion of 2 papers • A 5–6 page paper based on one or more of the topics of the seminars • A paper with the analysis of a case study • See later for examples • Evaluation criteria: • Do the papers contain original ideas? Do they follow the seminar «too strictly»? • Does the author understand the subject? Is (s)he able to reason independently about the subject? • Papers must be submitted by May 15, 2007 • E-mail to vincenzo.deflorio@ua.ac.be

  5. The Topics • Dependability = the property of a system such that reliance can justifiably be placed on the service it delivers • Fault tolerance = one of the means of dependability

  6. The Dependability Tree (figure)

  7. Fault tolerance (FT) • A fault-tolerant system is a system that continues to function in spite of faults • Sources of faults, with examples: hardware (defective IC), software (bug in program), operator (operation fault), I/O (sensor drift)

  8. Attributes of dependability • Availability • Readiness for usage • A(t) = probability that the system conforms to its specification at time t • Reliability • Continuity of service • R(t) = probability that the system conforms to its specification throughout [t0, t], provided that it does so at t0

  9. Attributes of dependability (2) • Safety • Non-occurrence of catastrophic consequences on the environment • S(t) = probability that the system either conforms to its specification or reaches a safe halt at time t • Fail-safe systems

  10. Attributes of dependability (3) • Maintainability • Aptitude to undergo repairs and evolution • M(t) = probability that the system is back to its specification at t, given that it failed at t0
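
The four probabilistic definitions on slides 8 to 10 can be written side by side. The notation below is an assumption introduced here for compactness and does not come from the slides: C(t) stands for the event "the system conforms to its specification at time t", and H(t) for "the system has reached a safe halt at time t". The `align*` environment assumes the `amsmath` package.

```latex
\begin{align*}
A(t) &= \Pr\{\, C(t) \,\}
      && \text{availability} \\
R(t) &= \Pr\{\, C(\tau)\ \text{for all}\ \tau \in [t_0, t] \mid C(t_0) \,\}
      && \text{reliability} \\
S(t) &= \Pr\{\, C(t) \lor H(t) \,\}
      && \text{safety} \\
M(t) &= \Pr\{\, C(t) \mid \lnot C(t_0) \,\}
      && \text{maintainability}
\end{align*}
```

Read this way, availability is a statement about a single instant, reliability about a whole interval, and safety relaxes availability by also accepting a safe halt.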

  11. Attributes of dependability (4) • Confidentiality • Non-occurrence of unauthorised disclosure of information • Integrity • Non-occurrence of improper alterations of information

  12. Related attributes • Testability • Ability to test features of a system • Related to maintainability

  13. Related attributes • Security • Integrity + availability + confidentiality

  14. References • Jean-Claude Laprie, “Dependable Computing and Fault Tolerance: Concepts and Terminology”, in Proc. of the 15th Int. Symposium on Fault-Tolerant Computing (FTCS-15), Ann Arbor, Mich., June 1985, pp. 2-11. • Jean-Claude Laprie, “Dependability - Its Attributes, Impairments and Means”, in Predictably Dependable Computing Systems, ESPRIT Basic Research Series, B. Randell, J.-C. Laprie, H. Kopetz and B. Littlewood (eds.), Springer Verlag, 1995, pp. 3-18.

  15. The lecture • We now focus on application-level fault tolerance (ALFT) • Why do we need ALFT? Why do we need software FT in the first place? • We explain why • We survey the existing methods and assess their pros and cons against a set of properties • Surprising conclusion: still an open problem

  16. Software Fault Tolerance • Human society increasingly expects and relies on good quality of complex services supplied by computers

  17. Software Fault Tolerance • Performance & ease of use • Consequences of a failure in the ‘40s (computers as fast solvers of numerical problems): • Errors in computations, long downtimes • Consequences of a failure nowadays (computers controlling nuclear plants, airborne equipment, healthcare…): • Incalculable penalty (catastrophes)

  18. Software Fault Tolerance • Layers (top to bottom): application software, middleware (MW), OS, HW • Traditional answer: hardware fault tolerance • This is an important ingredient, but not the only one needed today! • Complexity is also in the SW layers • Hierarchies of complex abstract machines

  19. Software Fault Tolerance • Complexity is also in the SW layers (cont.’d) • Software is often networked and distributed • Relationships among software components are often complex • Object model ⇒ easier SW reuse ⇒ hidden + explicit complexity

  20. Software Fault Tolerance • In conclusion: “No amount of verification, validation and testing can eliminate all faults in an application and give complete confidence in the availability and data consistency of applications” • Fault tolerance in SW is key • SW failures can have consequences as severe as those of HW failures: Ariane 5!

  21. Problems of SW FT • Layers in the figure: APPLICATION, HL RUN-TIME, OS, HW • The lighter the color, the more general purpose the (virtual) machine • The lighter the color, the more complex the problem of expressing fault tolerance

  22. Problems of Application-level Fault Tolerance • “The only alternative and effective means for increasing software reliability is that of incorporating in the application software provisions for SFT” • The application software has to manage • Functional aspects • Fault tolerance (FT) aspects • Both at the same time and in the same space

  23. Problems and properties of Application-level Fault Tolerance • Hazard: code intrusion • FT provisions are specified side by side with the service • Conflicting design concerns • Overall design complexity increases • Higher development and maintenance costs and times • Higher probability of introducing software bugs

  24. Problems and properties of Application-level Fault Tolerance • Separation of design concerns (SDC) • In what follows we call an “ALFT” a means to express fault tolerance in the application software • One criterion to compare ALFTs is their degree of SDC

  25. Problems and properties of Application-level Fault Tolerance • Hazard: porting code ≠ porting service • FT code assumes a fault model that depends on the environment e: fault model = f(e) • If e changes, or if the code is moved to another environment e’, the QoS may degrade

  26. Problems and properties of Application-level Fault Tolerance • Hazard: porting code ≠ porting service • An interesting case: Ariane 5 flight 501 • Ariane 4 mission software was re-used in Ariane 5 • The early part of the trajectory of Ariane 5 differed from that of Ariane 4 and resulted in considerably higher horizontal velocity values • …370 million euros down the drain • This could be a case study for the exam
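
The "porting code ≠ porting service" hazard can be illustrated with a narrowing conversion. The flight 501 inquiry traced the failure to an unprotected conversion of a 64-bit floating-point value into a 16-bit signed integer; the slides do not show that code, so the C fragment below is only a sketch of the general pattern, and the helper name `to_int16_checked` is hypothetical. It makes the environment assumption (the value fits in 16 bits) explicit, so that a changed environment yields a detectable error instead of an unhandled one.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert a 64-bit float to a 16-bit signed integer
 * only if the value is representable. In C, converting an out-of-range
 * floating-point value to an integer type is undefined behavior, so the
 * range check must come before the cast.
 */
bool to_int16_checked(double value, int16_t *out)
{
    if (isnan(value) || value < (double)INT16_MIN || value > (double)INT16_MAX) {
        return false;            /* the assumption about the environment no longer holds */
    }
    *out = (int16_t)value;       /* safe: value is within [INT16_MIN, INT16_MAX] */
    return true;
}
```

A caller then has to decide what to do on `false` (saturate, switch to a degraded mode, report), which is exactly the part of the fault model that depends on the environment.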

  27. Problems and properties of Application-level Fault Tolerance • Problem: service portability • Porting FT does not come for free • “Hardwired” fault model = static environment • More difficult to adapt / test / maintain • More prone to Ariane 5 effects • “What is the most often overlooked risk in software engineering? That the environment will do something the designer never anticipated” [J. Horning]

  28. Problems and properties of Application-level Fault Tolerance • Adaptability (AD) • Does the ALFT provide means to adapt, dynamically, to new environmental conditions? • One criterion to compare two ALFTs is their degree of AD

  29. Problems and properties of Application-level Fault Tolerance • Problem: adding complexity can decrease dependability • The ALFT (the means to express FT) must be based on a simple strategy • It must be syntactically adequate to host several mechanisms

  30. Problems and properties of Application-level Fault Tolerance • Hazard: • “Languages shape the way we think …” [Whorf] • “If all you have is a hammer, everything looks like a nail” [/usr/share/fortune] • …but is it really a nail? • Syntactical Adequacy (SA) • Does the ALFT provide simple means to host many FT solutions? • One criterion to compare two ALFTs is their degree of SA

  31. Summary • Separation of design concerns (SDC) • Adaptability (AD) • Syntactical Adequacy (SA) • A “base” of attributes we can use to compare ALFTs with one another

  32. System structures for SFT • Single-version FT • Multiple-version FT • Object model • Linda model • FT languages • Recovery metaprogram • Each of these could be a case study for the exam

  33. Single-version Fault Tolerance • Single-version SFT = embedding a set of error detection / recovery features in the user application of a simplex system • Explicit code intrusion (bad SDC) • Increases size and complexity (bad SA) • Bad for transparency, maintainability, portability • Increases development times and costs • No support for dynamic adaptability (bad AD) • Libraries: SwIFT, HATS, EFTOS …
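
To make "explicit code intrusion" concrete, here is a minimal sketch in plain C. It does not use SwIFT, HATS or EFTOS; every name in it (`read_temperature`, `sensor_fn`, `flaky_sensor`, the 0..4095 plausibility range) is an assumption made up for illustration. Note how the single functional line (the scaling) is surrounded by fault tolerance code.

```c
#include <stdbool.h>

/* Hypothetical sensor interface: returns false when the read itself fails. */
typedef bool (*sensor_fn)(int attempt, int *raw);

/* Returns a scaled reading, or `fallback` if no attempt yields plausible data. */
int read_temperature(sensor_fn read_raw, int max_attempts, int fallback)
{
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        int raw = 0;
        if (!read_raw(attempt, &raw)) {
            continue;                 /* FT: error detection, then retry */
        }
        if (raw < 0 || raw > 4095) {
            continue;                 /* FT: plausibility check, then retry */
        }
        return raw * 10 / 4;          /* functional concern: scaling */
    }
    return fallback;                  /* FT: recovery by default value */
}

/* Hypothetical test double: fails once, returns garbage once, then works. */
bool flaky_sensor(int attempt, int *raw)
{
    if (attempt == 0) {
        return false;
    }
    *raw = (attempt == 1) ? 5000 : 400;
    return true;
}
```

Changing the retry policy or the plausibility range means editing the same function that implements the service, which is the bad-SDC and bad-AD situation the slide describes.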

  34. Multiple-version Fault Tolerance • Multiple-version SFT: N-version programming (NVP) and recovery blocks (RB) • Idea: redundancy of software, i.e. independently designed versions of the software • Randell (1975): “All fault tolerance must be based on the provision of useful redundancy, both for error detection and error recovery. In software the redundancy required is not simple replication of programs but redundancy of design” • Assumption: random component failures. Correlated failures ⇒ sudden exhaustion of available redundancy • Again, Ariane 5 flight 501: two crucial components were operating in parallel with identical hardware and software…

  35. Multiple-version Fault Tolerance
      #include <ftmacros.h>
      ...
      ENSURE(acceptance-test) {
          Alternate 1;
      } ELSEBY {
          Alternate 2;
      }
      ...
      ENSURE;
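
The ENSURE / ELSEBY construct on this slide expresses a recovery block. Since `ftmacros.h` is not shown in the slides, the following is only a sketch of the same control flow in plain C, with hypothetical names throughout (`recovery_block`, `alternate_fn`, `acceptance_fn`, and the integer square root alternates). Each alternate writes into a fresh local variable, which stands in for the checkpoint and rollback a real recovery block needs.

```c
#include <stdbool.h>
#include <stddef.h>

typedef bool (*alternate_fn)(int input, int *result);
typedef bool (*acceptance_fn)(int input, int result);

/* Tries the alternates in order; returns the index of the first one whose
 * result passes the acceptance test, or -1 if all alternates fail. */
int recovery_block(const alternate_fn *alternates, size_t count,
                   acceptance_fn accept, int input, int *result)
{
    for (size_t i = 0; i < count; i++) {
        int candidate = 0;   /* discarded on failure: a trivial "rollback" */
        if (alternates[i](input, &candidate) && accept(input, candidate)) {
            *result = candidate;
            return (int)i;
        }
    }
    return -1;
}

/* Alternate 1 (deliberately faulty): only right for a few inputs. */
bool isqrt_halving(int x, int *r)
{
    if (x < 0) return false;
    *r = x / 2;
    return true;
}

/* Alternate 2: slow but correct linear search. */
bool isqrt_linear(int x, int *r)
{
    if (x < 0) return false;
    int root = 0;
    while ((long long)(root + 1) * (root + 1) <= x) root++;
    *r = root;
    return true;
}

/* Acceptance test: r is the integer square root of x. */
bool isqrt_accept(int x, int r)
{
    return r >= 0 && (long long)r * r <= x && (long long)(r + 1) * (r + 1) > x;
}
```

With the alternates `{ isqrt_halving, isqrt_linear }`, input 4 is served by the first alternate, while input 49 fails the acceptance test on the first alternate and is served by the second.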

  36. Multiple-version Fault Tolerance
      #include <ftmacros.h>
      ...
      NVP
      VERSION {
          block 1;
          SENDVOTE(v-pointer, v-size);
      }
      VERSION {
          block 2;
          SENDVOTE(v-pointer, v-size);
      }
      …
      ENDVERSION(timeout, v-size);
      if (!agreeon(v-pointer))
          error_handler();
      ENDNVP;
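
In N-version programming the versions run independently and a voter decides on the result. The semantics of `SENDVOTE` and `agreeon` are not given in the slides, so the sketch below does not reproduce them; it only shows one common decision rule, strict majority voting over integer results, under the hypothetical name `majority_vote`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true and stores the agreed value in *winner if some value is
 * reported by more than half of the versions; returns false otherwise. */
bool majority_vote(const int *votes, size_t count, int *winner)
{
    for (size_t i = 0; i < count; i++) {
        size_t agreeing = 0;
        for (size_t j = 0; j < count; j++) {
            if (votes[j] == votes[i]) {
                agreeing++;
            }
        }
        if (agreeing * 2 > count) {   /* strict majority */
            *winner = votes[i];
            return true;
        }
    }
    return false;                     /* no agreement: the caller handles the error */
}
```

With three versions, one faulty result is outvoted; if all three disagree, the function reports failure, which corresponds to the `error_handler()` branch on the slide. The rule offers no protection when a majority of versions fail in the same way, which is the correlated-failure risk mentioned on the neighbouring slides.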

  37. Multiple-version Fault Tolerance • Multiple-version SFT • Implies N-fold design costs, N-fold maintenance costs • The risk of correlated failures is not negligible • Code intrusion is limited (Acceptable SDC) • System structure is fixed (Bad SA) • No support for dynamic adaptability (bad AD) • Can be combined with other means

  38. Object-centred Strategies • Strategies based on the object model • Metaobject protocols and reflection • Open implementation of the run-time executive of an OO-language • Reflection, reification • Composition filters • Each object has a set of “filters”. Messages sent to any object are trapped by its filters. These filters may manipulate the message before passing it to the object.
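
The composition-filter idea can be sketched even without an object-oriented language. The fragment below is not the API of SINA or of any other system named in this lecture; all types and functions (`message`, `filtered_object`, `deliver`, and the two example filters) are assumptions for illustration. Filters see each message first and may reject or modify it, while the handler contains functional code only.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    int payload;
} message;

typedef bool (*filter_fn)(message *msg);        /* false means: reject the message */
typedef int (*handler_fn)(const message *msg);  /* the object's functional code */

typedef struct {
    const filter_fn *filters;
    size_t filter_count;
    handler_fn handle;
} filtered_object;

/* Example filter: rejects messages the object must never see. */
bool reject_negative(message *msg)
{
    return msg->payload >= 0;
}

/* Example filter: manipulates the message before it reaches the object. */
bool clamp_to_100(message *msg)
{
    if (msg->payload > 100) {
        msg->payload = 100;
    }
    return true;
}

/* Example handler: no fault tolerance code in here. */
int double_payload(const message *msg)
{
    return msg->payload * 2;
}

/* Traps the message, runs it through every filter, then calls the handler.
 * The message is passed by value, so filters work on a private copy. */
bool deliver(const filtered_object *obj, message msg, int *reply)
{
    for (size_t i = 0; i < obj->filter_count; i++) {
        if (!obj->filters[i](&msg)) {
            return false;
        }
    }
    *reply = obj->handle(&msg);
    return true;
}
```

The filter list can be changed without touching `double_payload`, which is the kind of separation between functional and fault tolerance code that this family of approaches aims at.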

  39. Object-centred Strategies • Active objects • Objects that have control over the synchronisation of incoming requests from other objects. Objects can autonomously decide, e.g., to delay a request until it is acceptable, i.e., until a guard is met • FRIENDS, SINA, Correlate • Full separation of design concerns (Good SDC) • No code intrusion • Syntactically adequate - at least for a subset of FT strategies (Acceptable SA)

  40. Object-centred Strategies • Assumption: application written in extended OO-language • Adaptability? (Questionable AD)

  41. FT Linda Systems • Generative communication - messages are not “sent”, they are stored in a public, distributed shared memory • A shared relational database for storing and withdrawing “tuples” • Tuples: lists of objects identified by their contents, cardinality and type • A Linda process inserts, reads, and withdraws tuples via blocking or non-blocking primitives • Synchronisation: presence / absence of a matching tuple
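
The tuple-space operations described on this slide can be sketched as follows. This is a single-process, fixed-capacity toy and not the implementation of any Linda system: tuples are reduced to a (tag, integer) pair, only the non-blocking variants are shown, and the names `ts_out`, `ts_rdp` and `ts_inp` merely echo the classic Linda primitives `out`, `rdp` and `inp`. Passing `NULL` as the value acts as a wildcard in the matching template.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TS_CAPACITY 16

typedef struct {
    const char *tag;   /* must outlive the tuple, e.g. a string literal */
    int value;
    bool used;
} tuple;

typedef struct {
    tuple slots[TS_CAPACITY];
} tuple_space;

void ts_init(tuple_space *ts)
{
    memset(ts, 0, sizeof *ts);
}

/* out: insert a tuple; fails only when the space is full. */
bool ts_out(tuple_space *ts, const char *tag, int value)
{
    for (size_t i = 0; i < TS_CAPACITY; i++) {
        if (!ts->slots[i].used) {
            ts->slots[i].tag = tag;
            ts->slots[i].value = value;
            ts->slots[i].used = true;
            return true;
        }
    }
    return false;
}

/* Finds the first tuple matching the template (tag, value-or-wildcard). */
static tuple *ts_match(tuple_space *ts, const char *tag, const int *value)
{
    for (size_t i = 0; i < TS_CAPACITY; i++) {
        tuple *t = &ts->slots[i];
        if (t->used && strcmp(t->tag, tag) == 0 &&
            (value == NULL || *value == t->value)) {
            return t;
        }
    }
    return NULL;
}

/* rdp: non-blocking read; the tuple stays in the space. */
bool ts_rdp(tuple_space *ts, const char *tag, const int *value, int *found)
{
    tuple *t = ts_match(ts, tag, value);
    if (t == NULL) return false;
    *found = t->value;
    return true;
}

/* inp: non-blocking withdraw; the tuple is removed from the space. */
bool ts_inp(tuple_space *ts, const char *tag, const int *value, int *found)
{
    tuple *t = ts_match(ts, tag, value);
    if (t == NULL) return false;
    *found = t->value;
    t->used = false;
    return true;
}
```

Processes never address each other here: a producer calls `ts_out`, any consumer may later withdraw the tuple with `ts_inp`, and the absence of a matching tuple is itself the synchronisation signal.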

  42. Linda • In master-worker applications • Dynamic load balancing, also in heterogeneous clusters • Inherently tolerates crash failures of workers • Limitation: single-op atomicity (only individual tuple space operations are atomic) • Solutions: • Atomic transactions with multiple tuple space (TS) ops • Stable tuple space • Tuple space checkpointing, etc. • Possible case study for the exam

  43. Linda • FT-Linda, Persistent Linda... • Full separation of design concerns (Good SDC) • No code intrusion • Syntactically adequate - at least for a subset of FT strategies (Acceptable SA) • Assumption: application written in Linda • Adaptability? (Questionable AD)

  44. FT Languages • FT Languages • Enhanced pre-existing languages • Examples: • FT-SR • Fail-stop modules - “abstract unit of encapsulation” • Atomic execution • Composability • x-Linda (x = C, Fortran, C++, …)

  45. FT Languages • FT Languages • Novel languages • Examples: • Argus: distributed OO programming language and operating system • “Guardians”: objects performing user-definable actions in response to remote requests • Atomic transactions • FTAG: functional language based on attribute grammars

  46. FT Languages • FTAG • Computation = collection of pure mathematical functions, the modules. • Each module has a set of input values, called inherited attributes, and a set of output values, called synthesized attributes.

  47. FTAG (cont.’d) • Primitive modules can be executed directly • Non-primitive modules require other modules to be performed first • FTAG program = decomposing a “root” module into its basic sub-modules and then recursively applying this decomposition process to each of the sub-modules (computation tree)

  48. FTAG (cont.’d) • Natural support for redoing (replacing a portion of the computation tree with a new computation) • Natural support for replication (replicated decomposition: a module is decomposed into N identical sub-modules implementing the function to replicate)

  49. FT Languages • Conclusions for FT languages • Adequate separation of design concerns, transparency (good SDC) • Special-purpose syntax (potentially good SA) • Application must be written in a non-standard language • Bad portability • Adaptability (AD): unknown

  50. RMP • Recovery Metaprogram • Two cooperating processing contexts • User-placed breakpoints in the user context trigger the execution of a meta-program • When the meta-program ends, control is returned to the user program • The meta-program is to be written in CSP
