
Design Patterns for Fault-Tolerant Systems

Supplementary slides for the course Autonomous and Fault-Tolerant Information Systems. Kocsis Imre (ikocsis@mit.bme.hu), 2010.09.20. Review: Singleton. Review: Facade. Review: Observer. Architectural pattern language. Units of Mitigation.





Presentation Transcript

  1. Design Patterns for Fault-Tolerant Systems. Supplementary slides for the course Autonomous and Fault-Tolerant Information Systems. Kocsis Imre (ikocsis@mit.bme.hu), 2010.09.20.

  2. Review: Singleton

  3. Review: Facade

  4. Review: Observer

  5. Architectural pattern language

  6. Units of Mitigation • How can you keep the whole system from being unavailable when an error occurs? • „Design the system into parts that will contain both any errors and the error recovery. Choose the divisions that make sense for your system. Design the rest of the system around these parts that represent the basic units of error mitigation.”
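
To make the idea concrete, here is a minimal Python sketch, assuming each unit of mitigation can be modeled as an object that owns both its work and a local recovery step. The names `MitigationUnit`, `run_system`, and the counter-based `recover` are hypothetical illustrations, not part of the slides.

```python
from typing import Callable, Optional


class MitigationUnit:
    """One part of the system that contains both its errors and their recovery."""

    def __init__(self, name: str, work: Callable[[int], int]) -> None:
        self.name = name
        self._work = work
        self.recoveries = 0  # how often this unit had to recover

    def handle(self, request: int) -> Optional[int]:
        try:
            return self._work(request)
        except Exception:
            # The error is handled inside the unit: it recovers locally and
            # answers "no result" instead of letting the exception escape.
            self.recover()
            return None

    def recover(self) -> None:
        # Hypothetical recovery step; a real unit might restart a process,
        # reload its configuration, or roll back to a checkpoint.
        self.recoveries += 1


def run_system(units: list[MitigationUnit], request: int) -> dict[str, Optional[int]]:
    # The rest of the system is designed around the units: a failing unit
    # does not stop the others from serving the request.
    return {unit.name: unit.handle(request) for unit in units}


def divide_100(x: int) -> int:
    return 100 // x  # raises ZeroDivisionError for x == 0


units = [MitigationUnit("doubler", lambda x: 2 * x), MitigationUnit("divider", divide_100)]
print(run_system(units, 0))  # {'doubler': 0, 'divider': None}
```

Here each unit is a single function; in practice a unit is more often a process, a node, or a replicated service, and choosing those boundaries is the design decision the pattern asks for.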

  7. Correcting Audits • Faulty data causes errors. • „Detect and correct data errors as soon as possible. Check related data for errors, correct and record the occurrence of the error.”
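
A minimal sketch of a correcting audit in Python, assuming a hypothetical `Order` record whose `total` field is redundant with its `items`, and assuming the items are the trustworthy source. The audit checks the related fields against each other, corrects any mismatch, and records every correction.

```python
from dataclasses import dataclass, field


@dataclass
class Order:
    order_id: int
    items: list[int]  # item prices; treated as the trustworthy source
    total: int        # redundant, derived field that may become corrupted


@dataclass
class AuditLog:
    entries: list[str] = field(default_factory=list)

    def record(self, message: str) -> None:
        self.entries.append(message)


def audit_orders(orders: list[Order], log: AuditLog) -> int:
    """Check related data (total vs. items), correct any mismatch, and record
    each correction. Returns the number of corrected records."""
    corrected = 0
    for order in orders:
        expected = sum(order.items)
        if order.total != expected:
            log.record(f"order {order.order_id}: total {order.total} -> {expected}")
            order.total = expected
            corrected += 1
    return corrected


orders = [Order(1, [10, 20], 30), Order(2, [5, 5], 99)]
log = AuditLog()
print(audit_orders(orders, log))  # 1
print(log.entries)                # ['order 2: total 99 -> 10']
```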

  8. Redundancy • How can we reduce the amount of time between error detection and the resumption of normal operation after error recovery? • „Provide redundant capabilities that support quick activation to enable error processing to continue in parallel with normal execution.”
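
A minimal Python sketch of one form of redundancy, a hot standby, assuming the standby receives every update while the active replica is serving. Because the spare is already up to date, failover reduces to swapping roles. `Replica` and `RedundantPair` are hypothetical names; a real system would also need failure detection and repair of the failed replica.

```python
class Replica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.state: dict[str, int] = {}
        self.healthy = True


class RedundantPair:
    """Active/hot-standby pair. The standby mirrors every update, so a
    failover only swaps roles instead of rebuilding state first."""

    def __init__(self) -> None:
        self.active = Replica("A")
        self.standby = Replica("B")

    def _ensure_active(self) -> None:
        if self.active.healthy:
            return
        if not self.standby.healthy:
            raise RuntimeError("no healthy replica left")
        # Quick activation: the standby already holds the current state.
        self.active, self.standby = self.standby, self.active

    def write(self, key: str, value: int) -> None:
        self._ensure_active()
        self.active.state[key] = value
        if self.standby.healthy:
            self.standby.state[key] = value  # keep the spare ready to take over

    def read(self, key: str) -> int:
        self._ensure_active()
        return self.active.state[key]


pair = RedundantPair()
pair.write("x", 1)
pair.active.healthy = False              # simulate a failure of replica A
print(pair.read("x"), pair.active.name)  # 1 B
```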

  9. Minimize Human Intervention • How can we prevent people from doing the wrong things and causing errors? • „Design the system in a way that it is able to process and resolve errors automatically, before they become failures. This speeds error recovery and reduces the risk of procedural errors.”

  10. Maximize Human Participation • Should the system ignore people totally? That will reduce procedural errors. • „Know the user and their availability. Design the system to enable knowledgeable operating personnel to participate. […] Provide appropriate Maintenance Interfaces and Fault Observer capabilities […]”

  11. Maintenance Interface • Should maintenance and application requests be intermingled on the application input and output channels? • „Provide a separate interface to the system for the (almost) exclusive use of maintenance interactions.”
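
A minimal Python sketch, assuming the application and maintenance channels can be modeled as two separate handler objects over a shared core. The class names and the command strings (`drain`, `resume`, `status`) are hypothetical; in a deployed system the two interfaces would more likely be separate ports, sockets, or consoles than separate classes.

```python
class Core:
    """State shared by both interfaces (hypothetical key-value service)."""

    def __init__(self) -> None:
        self.store: dict[str, str] = {}
        self.accepting = True


class ApplicationInterface:
    """Channel for ordinary application requests only."""

    def __init__(self, core: Core) -> None:
        self._core = core

    def handle(self, op: str, key: str, value: str = "") -> str:
        if not self._core.accepting:
            return "UNAVAILABLE"
        if op == "put":
            self._core.store[key] = value
            return "OK"
        if op == "get":
            return self._core.store.get(key, "NOT_FOUND")
        return "BAD_REQUEST"  # maintenance commands are not accepted here


class MaintenanceInterface:
    """Separate channel reserved for operators and maintenance tools."""

    def __init__(self, core: Core) -> None:
        self._core = core

    def handle(self, command: str) -> str:
        if command == "drain":
            self._core.accepting = False
            return "DRAINING"
        if command == "resume":
            self._core.accepting = True
            return "RESUMED"
        if command == "status":
            return f"accepting={self._core.accepting} keys={len(self._core.store)}"
        return "UNKNOWN_COMMAND"


core = Core()
app, maint = ApplicationInterface(core), MaintenanceInterface(core)
print(app.handle("put", "k", "v"))  # OK
print(app.handle("drain", "k"))     # BAD_REQUEST
print(maint.handle("drain"))        # DRAINING
print(app.handle("get", "k"))       # UNAVAILABLE
print(maint.handle("status"))       # accepting=False keys=1
```

Keeping the channels apart means an operator can still reach the system (for example, to check status or resume service) while the application channel is refusing work.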

  12. Someone in Charge • Anything can go wrong, even during error processing. When this happens the system might stop doing the error processing in addition to not doing the normal processing. • „All fault tolerance related activities have some component of the system that is clearly in charge and has the ability to determine correct completion and the responsibility to take action if it does not complete correctly.”

  13. Escalation • What does the system do when its attempt to process an error in a component is not achieving the correct effect? • „When recovery or mitigation is failing, escalate the action to the next more drastic action.”
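
A minimal Python sketch of escalation, assuming recovery actions can be listed from least to most drastic and that a health check can tell whether the last action had the desired effect. The step names and the `Component` class are hypothetical.

```python
from typing import Callable, Optional

# A recovery step: (name, action). Steps are ordered from least to most drastic.
Step = tuple[str, Callable[[], None]]


def escalate(ladder: list[Step], healthy: Callable[[], bool]) -> Optional[str]:
    """Apply recovery steps in order until the health check passes.

    Returns the name of the step that restored service, or None if even the
    most drastic step failed (a natural point to involve an operator).
    """
    for name, action in ladder:
        action()
        if healthy():
            return name
    return None


class Component:
    """Hypothetical component that only recovers once a step of at least
    `severity_needed` has been applied."""

    def __init__(self, severity_needed: int) -> None:
        self.severity_needed = severity_needed
        self.applied: list[int] = []

    def step(self, severity: int) -> Callable[[], None]:
        return lambda: self.applied.append(severity)

    def healthy(self) -> bool:
        return any(s >= self.severity_needed for s in self.applied)


comp = Component(severity_needed=2)
ladder = [("retry", comp.step(0)), ("restart process", comp.step(1)),
          ("reboot node", comp.step(2))]
print(escalate(ladder, comp.healthy))  # reboot node
```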

  14. Detection patterns

  15. Fault Correlation • What fault is activating? • „Look at the unique signature of the error to sort it into the fault category for which error processing steps are known.”
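
A minimal Python sketch, assuming error signatures can be recognized by regular expressions over error messages. The signature table, categories, and procedures below are hypothetical examples; a real system would correlate richer signatures such as error codes, the reporting component, and timing.

```python
import re
from typing import Optional

# Hypothetical signature table: (pattern, fault category, known procedure).
SIGNATURES = [
    (re.compile(r"ECONNREFUSED|connection refused", re.IGNORECASE),
     "peer down", "fail over to the backup peer"),
    (re.compile(r"disk (full|quota)", re.IGNORECASE),
     "storage exhausted", "purge temporary files"),
    (re.compile(r"checksum mismatch", re.IGNORECASE),
     "data corruption", "run a correcting audit"),
]


def correlate(error_message: str) -> tuple[str, Optional[str]]:
    """Sort an error into a known fault category by its signature."""
    for pattern, category, procedure in SIGNATURES:
        if pattern.search(error_message):
            return category, procedure
    # No known signature: no automatic procedure, so report it for analysis.
    return "unknown", None


print(correlate("write failed: Disk quota exceeded"))
# ('storage exhausted', 'purge temporary files')
```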

  16. Error Containment Barrier • What is the first thing that the system must do when it detects an error? • „Isolate the error to a unit of mitigation. Stop the error flow with a barrier, quarantine and initiate either error recovery or error mitigation.”
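
A minimal Python sketch of an error containment barrier at the boundary of one unit of mitigation, assuming the unit's errors surface as exceptions and its recovery routine reports success as a boolean. The barrier stops the exception, quarantines the unit, records a report, and starts recovery; the unit stays isolated if recovery fails. `Barrier` and its fields are hypothetical names.

```python
from typing import Any, Callable


class Barrier:
    """Boundary around one unit of mitigation."""

    def __init__(self, unit_name: str, recover: Callable[[], bool]) -> None:
        self.unit_name = unit_name
        self.recover = recover
        self.quarantined = False
        self.reports: list[str] = []  # stand-in for reports to a Fault Observer

    def call(self, fn: Callable[..., Any], *args: Any) -> Any:
        if self.quarantined:
            return None  # isolated unit: no work in, no (possibly bad) output out
        try:
            return fn(*args)
        except Exception as exc:
            self.quarantined = True                            # 1. isolate the unit
            self.reports.append(f"{self.unit_name}: {exc!r}")  # 2. report the error
            if self.recover():                                 # 3. start recovery
                self.quarantined = False
            return None  # the error does not flow past the barrier


barrier = Barrier("parser", recover=lambda: False)  # hypothetical failing recovery
print(barrier.call(int, "42"))    # 42
print(barrier.call(int, "oops"))  # None (ValueError stopped at the barrier)
print(barrier.call(int, "7"))     # None (unit is still quarantined)
```

Using `None` as the "no output" value is a simplification of this sketch; a real barrier would return an explicit error result to its caller.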

  17. System Monitor • How does one part of a system keep track that another part is alive and functioning? • „Create a Monitor to study system behavior, or the behavior of specific parts of the system to make sure that they continue operating correctly. When the watched components stop, the monitor should report the occurrence to the Fault Observer and initiate corrective actions.”
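
A minimal Python sketch of a heartbeat-based System Monitor, assuming each watched component calls `heartbeat` periodically. Passing `now` explicitly keeps the example deterministic; otherwise `time.monotonic()` is used. The two callbacks standing in for the Fault Observer and the corrective action are hypothetical.

```python
import time
from typing import Callable, Optional


class SystemMonitor:
    """A watched component is considered stopped if it has not sent a
    heartbeat within `timeout_s` seconds."""

    def __init__(self, timeout_s: float,
                 fault_observer: Callable[[str], None],
                 corrective_action: Callable[[str], None]) -> None:
        self.timeout_s = timeout_s
        self.fault_observer = fault_observer
        self.corrective_action = corrective_action
        self.last_seen: dict[str, float] = {}

    def heartbeat(self, component: str, now: Optional[float] = None) -> None:
        self.last_seen[component] = time.monotonic() if now is None else now

    def check(self, now: Optional[float] = None) -> list[str]:
        now = time.monotonic() if now is None else now
        stopped = [c for c, t in self.last_seen.items() if now - t > self.timeout_s]
        for component in stopped:
            self.fault_observer(f"{component} missed its heartbeat")
            self.corrective_action(component)
            del self.last_seen[component]  # report once; re-armed by the next heartbeat
        return stopped


observed: list[str] = []
restarted: list[str] = []
monitor = SystemMonitor(5.0, observed.append, restarted.append)
monitor.heartbeat("db", now=0.0)
monitor.heartbeat("web", now=3.0)
print(monitor.check(now=7.0))  # ['db']
print(restarted)               # ['db']
```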

  18. Detection patterns

  19. Existing Metrics • How to measure the severity of an overload without contributing to the overload? • „Use pre-existing indicators already tied to the resource as an indicator of the system’s overload condition.” • Note: this holds not only for performance!
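
A minimal Python sketch, assuming the system already feeds its work through a `queue.Queue`: the queue's length, which the queue maintains anyway, serves as the overload indicator, so reading it adds almost no extra work. The 70% and 90% thresholds are arbitrary assumptions. `qsize()` is only approximate while other threads use the queue, which is acceptable for an overload estimate.

```python
import queue


def overload_level(work_queue: queue.Queue, capacity: int) -> str:
    """Classify overload from the length of the existing work queue.

    The thresholds (70% and 90% of capacity) are assumptions of this sketch.
    """
    fill = work_queue.qsize() / capacity
    if fill >= 0.9:
        return "severe"
    if fill >= 0.7:
        return "moderate"
    return "normal"


jobs: queue.Queue = queue.Queue()
for job_id in range(75):
    jobs.put(job_id)
print(overload_level(jobs, capacity=100))  # moderate
```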

  20. Detection patterns

  21. Routine Maintenance • How can we keep preventable errors from occurring? • „Perform routine, preventive maintenance on the system.”
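
A minimal Python sketch of routine, preventive maintenance, assuming a hypothetical session table with a hard capacity limit: expired sessions are purged on a schedule, before they accumulate and make `open` fail. `SessionTable`, the capacity, and the TTL are assumptions; in practice `routine_maintenance` would be triggered by a timer or scheduler.

```python
class SessionTable:
    """Hypothetical session table with a hard capacity limit."""

    def __init__(self, capacity: int, ttl_s: float) -> None:
        self.capacity = capacity
        self.ttl_s = ttl_s
        self.sessions: dict[str, float] = {}  # session id -> time of last activity

    def open(self, sid: str, now: float) -> None:
        if len(self.sessions) >= self.capacity:
            raise RuntimeError("session table full")  # the preventable error
        self.sessions[sid] = now

    def routine_maintenance(self, now: float) -> int:
        """Preventive step run on a schedule: purge sessions idle for longer
        than the TTL. Returns the number of sessions removed."""
        expired = [sid for sid, last in self.sessions.items() if now - last > self.ttl_s]
        for sid in expired:
            del self.sessions[sid]
        return len(expired)


table = SessionTable(capacity=2, ttl_s=10.0)
table.open("a", now=0.0)
table.open("b", now=1.0)
print(table.routine_maintenance(now=20.0))  # 2
table.open("c", now=20.0)  # succeeds because maintenance freed the space
```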

  22. Detection patterns

  23. Routine Exercises • How do you know that Redundant elements that will be called into service by a Failover in case of an error or failure will actually work? • „Routinely exercise, or execute the system components that will be required in an error situation. This will identify latent faults.”
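
A minimal Python sketch, assuming each redundant element can be exercised with a synthetic probe request whose correct answer is known in advance. Elements that answer wrongly or crash have latent faults that would otherwise surface only during a real failover. `StandbyUnit` and the probe values are hypothetical.

```python
from typing import Callable


class StandbyUnit:
    """Hypothetical redundant element that stays idle until a failover."""

    def __init__(self, name: str, handler: Callable[[int], int]) -> None:
        self.name = name
        self.handler = handler


def exercise(standbys: list[StandbyUnit], probe: int, expected: int) -> list[str]:
    """Run a synthetic request through every standby element and return the
    names of those that do not answer correctly (latent faults)."""
    latent = []
    for unit in standbys:
        try:
            ok = unit.handler(probe) == expected
        except Exception:
            ok = False
        if not ok:
            latent.append(unit.name)
    return latent


spares = [StandbyUnit("spare-1", lambda x: x * 2),
          StandbyUnit("spare-2", lambda x: x + 2),   # wrong logic: latent fault
          StandbyUnit("spare-3", lambda x: 1 // 0)]  # would crash on takeover
print(exercise(spares, probe=3, expected=6))  # ['spare-2', 'spare-3']
```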

  24. Detection patterns

  25. Recovery patterns

  26. Quarantine • How can the system prevent errors from spreading? • „Establish a barrier around the element that prevents it from both contributing to the useful work and also prevents it from propagating its error into other parts of the system.”
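
A minimal Python sketch of quarantine in a worker pool, assuming work is dispatched round-robin. A quarantined worker receives no new jobs, and any result it still produces is discarded, so it neither contributes to useful work nor propagates its error. `WorkerPool` and its methods are hypothetical names.

```python
from typing import Callable, Optional


class WorkerPool:
    """Round-robin pool in which quarantined workers neither receive work
    nor have their results accepted."""

    def __init__(self, workers: dict[str, Callable[[int], int]]) -> None:
        self.workers = workers
        self.quarantined: set[str] = set()
        self._next = 0

    def quarantine(self, name: str) -> None:
        self.quarantined.add(name)

    def release(self, name: str) -> None:
        self.quarantined.discard(name)

    def submit(self, job: int) -> tuple[str, int]:
        eligible = [n for n in self.workers if n not in self.quarantined]
        if not eligible:
            raise RuntimeError("no healthy workers")
        name = eligible[self._next % len(eligible)]
        self._next += 1
        return name, self.workers[name](job)

    def accept_result(self, worker: str, result: int) -> Optional[int]:
        # Output that still arrives from a quarantined worker (e.g. a late
        # reply) is dropped, so its error cannot propagate downstream.
        return None if worker in self.quarantined else result


pool = WorkerPool({"w1": lambda x: x + 1, "w2": lambda x: -1})
pool.quarantine("w2")                      # w2 produces garbage
print([pool.submit(j) for j in range(3)])  # [('w1', 1), ('w1', 2), ('w1', 3)]
print(pool.accept_result("w2", -1))        # None
```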

  27. Recovery patterns
