100 likes | 402 Views
Replicated State Machines. ITV Model-based Analysis and Design of Embedded Software Techniques and methods for Critical Software Anders P. Ravn Aalborg University September 2011. A simple State machine. Object-oriented. Message-oriented. process SM{ (m,args)= getMessage();
E N D
Replicated State Machines ITV Model-based Analysis and Design of Embedded Software Techniques and methods for Critical Software Anders P. Ravn Aalborg University September 2011
A simple State machine Object-oriented Message-oriented process SM{ (m,args)= getMessage(); switch m { case m_1: ... sendMessage(OB,m,arg) ... ... } class SM { void method m_1(par_1) { ... OB.m(arg); ... } ... } Note: Asynchronous communication, cf. Module 1
Constraints • Asynchronous message passing (unbounded buffering). Thus it must be proved no buffer-overflow for an implementation. • No timing (delays, timeouts) in state machines. State machines are scheduled as a set of periodic or sporadic processes
Fault Tolerance • Byzantine failures: SMs may fail in any way. Requires 2t+1 replicas to tolerate t failures. • Fail-stop failures: Failing processors stop and the stop state is detectable. Only t+1 replicas needed.
Agreement and Order • Every request message is received by every non-faulty processor. This requires reliable message passing – a fault in a particular link translates to a byzantine failure for the receiving state machine • Requests are processed in order. Requests sent from same destination cannot overtake each other. Cf. TCP and UDP in Internet
Agreement IC1: Select a non-faulty transmitter IC2: Ensure that the value sent by the transmitter is recieved by all other non-faulty processors The difficult part is implementing a move of the transmitter, cf. Token rings. Alternative. Broadcasts
Watch-dogs for Fail-stop Logical clock stability test
Dynamic Configurations C – clients S – state machines O – output devices This state machine could be the watch dog.
Integration after repair • Resynchronization with getting a check-pointed state from a replica. • Alignment with received messages.
Perspective • A general paradigm suitable for highly critical distributed processing. • Fail-stop may be feasible for medium level criticality. • Both may become cost-efficient in a multi-core setting. Requires highly dependable hardware and kernel support.