Design of Self-Managing Dependable Systems with UML and Fault Tolerance Patterns
Matthias Tichy, Daniela Schilling, Holger Giese
Application Example - RailCab
• Vision: Combine
  • individual transport
  • cost-effectiveness of public transport
• Autonomously operating shuttles using linear drive (maglev train)
• Build contact-free convoys for energy savings
• Information about the exact position necessary
• Computation based on the stator waves
Motivation
• Example service structure: :StatorWaveSensor, :PositionCalculation, :ConvoyController
• Problem: position calculation service must be highly reliable
Contents
• Problem: position calculation service must be highly reliable
• Solution: self-healing, fault-tolerant software
• Application of software fault tolerance patterns, which capture:
  • Abstract service structure of fault tolerance techniques
  • Deployment restrictions (explicitly taking fault tolerance into account)
• Automatic deployment
  • Initial deployment
  • Self-healing by deployment reconfiguration during runtime
Fault Tolerance Patterns
• Capture the abstract structure of a well-known fault tolerance technique
• Example: Triple Modular Redundancy (TMR)
[Diagram "Triple Modular Redundancy": :Provider, :Multiplier, :Service1, :Service2, :Service3, :Voter, :User]
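The voter's role in TMR can be illustrated with a minimal Python sketch. The function name `majority_vote` and the exact-match comparison are assumptions made for illustration; the slides do not specify the voter's behavior, and a voter for position values would probably compare within a tolerance.

```python
from collections import Counter

def majority_vote(results):
    """Return the value reported by a strict majority of replicas, else None.

    `results` holds one entry per redundant service; None marks a crashed replica.
    """
    answers = [r for r in results if r is not None]
    if not answers:
        return None
    value, count = Counter(answers).most_common(1)[0]
    # A strict majority of all replicas (2 of 3 for TMR) must agree.
    return value if count > len(results) / 2 else None

print(majority_vote([12.5, 12.5, 12.5]))  # all three replicas agree
print(majority_vote([12.5, None, 12.5]))  # one crashed replica is masked
print(majority_vote([12.5, None, None]))  # a second crash leaves no majority
```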
Fault Tolerance Patterns - Deployment
• Naïve, unrestricted deployment may yield unwanted results
• Deployment must respect restrictions imposed by the fault tolerance technique (redundancy, heterogeneity)
• Instead of manual deployment, enrich fault tolerance patterns with deployment restrictions
[Diagram: the three :PositionCalculation instances (<<Service1>>, <<Service2>>, <<Service3>>), each with a <<deploy>> relation to the node Arthur]
Deployment restrictions for the TMR pattern
• Avoid single point of failure of voter / multiplier -> deploy voter and user to the same node (if the user fails, the failure of the voter is no problem)
  Node1: :Provider, :Multiplier; Node2: :Voter, :User
• Avoid crash failures -> deploy redundant services to distinct nodes
  Node3: :Service1; Node4: :Service2; Node5: :Service3
• Heterogeneous hardware platform -> require different CPUs
  { Node3.CPU ≠ Node4.CPU, Node4.CPU ≠ Node5.CPU, Node3.CPU ≠ Node5.CPU }
Fault Tolerance Patterns - Application of TMR pattern
[Diagram: service structure :StatorWaveSensor, :PositionCalculation, :ConvoyController]
Role bindings after applying the pattern:
• <<Provider>> sen:StatorWaveSensor
• <<Multiplier>> mult:Multiplier
• <<Service1>> pc1:PositionCalculation
• <<Service2>> pc2:PositionCalculation
• <<Service3>> pc3:PositionCalculation
• <<Voter>> vot:Voter
• <<User>> cc:ConvoyController
Deployment - Compute a correct / reliable deployment
• Use a standard constraint solver
[Diagram: the services sen:Sensor, mult:Multiplier, pc1 to pc3:PositionCalculation, vot:Voter, and cc:Convoy, to be deployed to the nodes Avalon, Uther, Gorlois, Taliesin, Gareth, and Arthur]
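Compute a correct / reliable deployment
• Mapping to a standard constraint solver
• Matrix with one row per service (i) and one column per node (j); an entry of 1 means the service is deployed to that node, e.g. service pc1 is deployed to node Gorlois
• Variables: $x_{i,j} \in \{0,1\}$
• Constraint (each service is deployed to exactly one node): $\forall i \in \mathit{Services}: \sum_{j \in \mathit{Nodes}} x_{i,j} = 1$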
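The binary encoding can be sketched in a few lines of Python. The helper names `to_matrix` and `each_service_on_one_node` are hypothetical, and only three services and three nodes from the slides are used to keep the matrix small.

```python
services = ["pc1", "pc2", "pc3"]
nodes = ["Gorlois", "Uther", "Arthur"]

def to_matrix(deployment):
    """Encode a service -> node mapping as binary variables x[i][j]."""
    return [[1 if deployment[s] == n else 0 for n in nodes] for s in services]

def each_service_on_one_node(x):
    """Check that the entries of every service row i sum to exactly 1."""
    return all(sum(row) == 1 for row in x)

x = to_matrix({"pc1": "Gorlois", "pc2": "Uther", "pc3": "Arthur"})
for service, row in zip(services, x):
    print(service, row)
print(each_service_on_one_node(x))
```

A row with two 1-entries (a service deployed twice) or none (a service not deployed) fails the check.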
Compute a correct / reliable deployment
• Restriction: services must be executed on the same node
• Graphically: [Diagram: <<Voter>> vot:Voter and <<User>> cc:ConvoyController, both with a <<deploys>> relation from the same node]
• Constraint: $\forall j \in \mathit{Nodes}: (x_{vot,j} + x_{cc,j} = 2) \lor (x_{vot,j} + x_{cc,j} = 0)$
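As a small illustration, the co-location constraint can be checked directly on two rows of the variable matrix. The function name `colocated` and the placement on the third node are assumptions for this sketch.

```python
def colocated(x_a, x_b):
    """True if, on every node j, x_a[j] + x_b[j] is either 2 or 0."""
    return all(a + b in (0, 2) for a, b in zip(x_a, x_b))

# One column per node; a 1 marks the node that hosts the service.
x_vot = [0, 0, 1]
x_cc = [0, 0, 1]
print(colocated(x_vot, x_cc))       # voter and user share a node
print(colocated(x_vot, [1, 0, 0]))  # user placed on a different node
```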
Compute a correct / reliable deployment
• Initial deployment
• Graphically: [Diagram: the services pc1, pc2, pc3, sen, mul, vot, and cc placed on the nodes Avalon, Gareth, Taliesin, Gorlois, Uther, and Arthur]
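A rough sketch of how an initial deployment could be computed is given below. It uses exhaustive enumeration in place of a real constraint solver, which only works for tiny instances like this one. The CPU types assigned to the nodes are hypothetical (the slides name the nodes but not their hardware), and `is_valid` and `initial_deployment` are illustrative names.

```python
from itertools import product

services = ["sen", "mult", "pc1", "pc2", "pc3", "vot", "cc"]
# Hypothetical CPU types per node.
cpu = {"Avalon": "A", "Uther": "B", "Gorlois": "A",
       "Taliesin": "C", "Gareth": "B", "Arthur": "C"}
nodes = list(cpu)
replicas = ["pc1", "pc2", "pc3"]

def is_valid(d):
    """Check the TMR deployment restrictions for a service -> node mapping."""
    hosts = [d[r] for r in replicas]
    return (d["vot"] == d["cc"]                     # voter and user co-located
            and len(set(hosts)) == 3                # replicas on distinct nodes
            and len({cpu[h] for h in hosts}) == 3)  # replicas on different CPUs

def initial_deployment():
    """Return the first valid mapping found by exhaustive enumeration."""
    for choice in product(nodes, repeat=len(services)):
        d = dict(zip(services, choice))  # each service gets exactly one node
        if is_valid(d):
            return d
    return None

deployment = initial_deployment()
print(deployment)
```

Representing the deployment as a dictionary makes the "exactly one node per service" constraint hold by construction, so only the pattern-specific restrictions need an explicit check.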
Repair
• Due to TMR, one crashed redundant service is tolerable
• A second crash is not tolerable
• Therefore: self-heal by restarting the crashed service on a working node
• But on which node? Compute it in a timely manner!
[Diagram: Gorlois <<deploys>> <<Service1>> pc1:PositionCalculation; Uther <<deploys>> <<Service2>> pc2:PositionCalculation; Arthur <<deploys>> <<Service3>> pc3:PositionCalculation]
Repair
• Solve restricted deployment problem
• Only consider changed values
  • Fix variables $x_{i,j}$ of unaffected service/node combinations
  • Free variables $x_{i,j}$ for affected service/node combinations
• Service migration should be avoided!
[Diagram: crash failure of node Gorlois, which <<deploys>> <<Service1>> pc1:PositionCalculation]
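The restricted repair problem can be sketched as follows: placements on working nodes stay fixed, and only the services of the crashed node are free to move. This is a simplified illustration that checks just the replica restrictions; the CPU types and the names `is_valid` and `repair` are assumptions, not taken from the slides.

```python
from itertools import product

# Hypothetical CPU types per node.
cpu = {"Avalon": "A", "Uther": "B", "Gorlois": "A",
       "Taliesin": "C", "Gareth": "B", "Arthur": "C"}
replicas = ["pc1", "pc2", "pc3"]

def is_valid(d):
    """Replicas must run on distinct nodes with pairwise different CPUs."""
    hosts = [d[r] for r in replicas]
    return len(set(hosts)) == 3 and len({cpu[h] for h in hosts}) == 3

def repair(deployment, crashed_node):
    """Re-place only the services of the crashed node; all others stay fixed."""
    fixed = {s: n for s, n in deployment.items() if n != crashed_node}
    free = [s for s, n in deployment.items() if n == crashed_node]
    working = [n for n in cpu if n != crashed_node]
    for choice in product(working, repeat=len(free)):
        candidate = {**fixed, **dict(zip(free, choice))}
        if is_valid(candidate):
            return candidate
    return None  # no solution without migrating unaffected services

before = {"pc1": "Gorlois", "pc2": "Uther", "pc3": "Arthur"}
after = repair(before, "Gorlois")
print(after)
```

Because the fixed variables are never enumerated, the search space shrinks from all service/node combinations to those of the affected services, which is what makes a timely answer plausible.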
Repair
• No solution found -> relax the problem
[Diagram: first round, second round, and third round, with the note "Service migration necessary"; legend: fixed variables, free variables]
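One possible reading of the relaxation rounds is sketched below: each round frees the variables of one more previously fixed service until a valid deployment exists. The order in which services are freed, the one-service-per-node capacity limit, the extra service `sen`, and the CPU types are all assumptions chosen so that the first round fails; the slides do not define the relaxation strategy.

```python
from itertools import combinations, product

# Hypothetical CPU types per node.
cpu = {"Gorlois": "A", "Uther": "B", "Arthur": "C", "Avalon": "A", "Gareth": "B"}
replicas = ["pc1", "pc2", "pc3"]

def is_valid(d):
    """Assumed restrictions: one service per node, replicas on different CPUs."""
    one_per_node = len(set(d.values())) == len(d)
    return one_per_node and len({cpu[d[r]] for r in replicas}) == 3

def repair_with_relaxation(deployment, crashed_node):
    """Free more services round by round until a valid deployment exists."""
    working = [n for n in cpu if n != crashed_node]
    affected = [s for s, n in deployment.items() if n == crashed_node]
    others = [s for s in deployment if s not in affected]
    for extra in range(len(others) + 1):  # round 1 frees no extra service
        for migrated in combinations(others, extra):
            free = affected + list(migrated)
            fixed = {s: n for s, n in deployment.items() if s not in free}
            for choice in product(working, repeat=len(free)):
                candidate = {**fixed, **dict(zip(free, choice))}
                if is_valid(candidate):
                    return candidate, extra + 1  # deployment and round number
    return None, None

before = {"pc1": "Gorlois", "pc2": "Uther", "pc3": "Arthur", "sen": "Avalon"}
after, rounds = repair_with_relaxation(before, "Gorlois")
print(rounds, after)
```

In this scenario the only remaining node with a suitable CPU is occupied by `sen`, so the first round finds nothing and the second round migrates `sen` to make room for `pc1`.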
Tool Support
• Current work
Conclusions / Future Work
• Fault tolerance patterns capture fault tolerance techniques
  • Contain: structure + deployment restrictions
  • Graphical specification
  • Easy application
• Automatic deployment
  • Initial deployment
  • Self-healing by deployment reconfiguration during runtime
• Future work
  • Heterogeneous service restrictions in pattern application
  • Synthesize behavior of voter and multiplier services
  • Use fault tolerance application knowledge for automatic fault tree analysis
Questions?
Compute a correct / reliable deployment
• Restriction: services must be deployed on distinct nodes
• Graphically: [Diagram: <<Service1>> pc1:PositionCalculation and <<Service2>> pc2:PositionCalculation, each with a <<deploys>> relation from a different node]
• Constraint: $\forall j \in \mathit{Nodes}: x_{pc1,j} + x_{pc2,j} \le 1$
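The distinct-node constraint generalizes naturally to all pairs of redundant services. The following sketch checks it on rows of the variable matrix; the function name `on_distinct_nodes` and the three-node column layout are assumptions for illustration.

```python
from itertools import combinations

def on_distinct_nodes(rows):
    """True if x_a[j] + x_b[j] <= 1 for every pair of services and every node j."""
    return all(a + b <= 1
               for x_a, x_b in combinations(rows, 2)
               for a, b in zip(x_a, x_b))

# Columns: Gorlois, Uther, Arthur (one row of x per redundant service).
x_pc1, x_pc2, x_pc3 = [1, 0, 0], [0, 1, 0], [0, 0, 1]
print(on_distinct_nodes([x_pc1, x_pc2, x_pc3]))  # replicas spread over three nodes
print(on_distinct_nodes([x_pc1, x_pc1, x_pc3]))  # two replicas share Gorlois
```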
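Compute a correct / reliable deployment
• Restriction: CPUs of the nodes hosting the redundant services must be distinct
• Constraint: $x_{s_1,j_1} = 1 \land x_{s_2,j_2} = 1 \land x_{s_3,j_3} = 1 \Rightarrow j_1.\mathrm{cpu} \neq j_2.\mathrm{cpu} \land j_2.\mathrm{cpu} \neq j_3.\mathrm{cpu} \land j_1.\mathrm{cpu} \neq j_3.\mathrm{cpu}$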
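A minimal check of the heterogeneity restriction might look like this. The CPU types are hypothetical, since the slides do not state the hardware of the nodes, and `cpus_distinct` is an illustrative name.

```python
from itertools import combinations

# Hypothetical CPU types per node.
cpu = {"Gorlois": "A", "Uther": "B", "Arthur": "C", "Avalon": "A"}

def cpus_distinct(placement):
    """placement maps each redundant service to its node.

    True if the hosting nodes have pairwise different CPU types.
    """
    return all(cpu[j1] != cpu[j2]
               for j1, j2 in combinations(placement.values(), 2))

print(cpus_distinct({"pc1": "Gorlois", "pc2": "Uther", "pc3": "Arthur"}))
print(cpus_distinct({"pc1": "Gorlois", "pc2": "Uther", "pc3": "Avalon"}))
```

The second placement uses three distinct nodes but still fails, because Gorlois and Avalon share a CPU type in this assumed setup.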
Multistage
• Multiple applications of TMR
• Transform to a multiple-stage arrangement with tripled voter and multiplier
• Deployment restrictions are transformed too
[Diagram: multistage arrangement of sen:Sensor, mult:Multiplier, pc1 to pc3:PositionCalculation, three :PCMultiplier instances, three :Voter instances, and three cc:Convoy instances; pc1 to pc3 carry both a <<Service>> and a <<Provider>> role, and each cc carries both a <<User>> and a <<Service>> role]
Tool Support
• Already working:
  • Graphical specification of fault tolerance patterns and deployment restrictions
  • Automatic mapping to ILOG solver software
• Next: runtime support
Motivation
• System without fault tolerance, node crash -> (1) fault tolerance pattern
• Introduce redundancy (TMR) but deploy two redundant services on one node, crash of that node; better to deploy redundant services to distinct nodes -> (2) deployment restrictions for fault tolerance pattern
• Example with distinct nodes, crash failure, everything OK -> (3) which nodes?
• Now self-heal the system by restarting a crashed service on a new node -> (4) which node?
Motivation
• Goals:
  • Easy application of fault tolerance techniques
  • Deployment issues taken into account
  • Easy deployment specification
  • Reusable deployment specification