Pontus Boström and Marina Waldén, Åbo Akademi University / TUCS: Development of Fault Tolerant Grid Applications Using Distributed B
Motivation • Grids have become widespread in organizations • They handle large amounts of information • They manage computational resources • It is difficult to implement “correct” Grid applications • Formal methods are useful for ensuring the correctness of specifications • However, the resulting specifications can be difficult to implement • The specification language should take into account the features of the underlying platform • Fault tolerance is also important for correctness, due to the nature of the grid environment
Grids • Used for large-scale distributed systems • Scientific computing, e.g., in physics and engineering • Business applications • Share information and computational resources across organizational boundaries • Loosely coupled systems • Service-oriented architecture (SOA) • Client-server architecture
Grid services • Services in a grid environment that can be accessed by clients • Similar to remote objects in CORBA and RMI • Remote procedures are used for communication • [Figure: clients accessing grid services on a host]
Grid services • Based on Web Services • XML • SOAP • WSDL • Extend Web Services with • Potentially transient services containing state • Service data • Notifications • Globus Toolkit middleware • [Figure: a client makes remote procedure calls to a grid service, and the grid service sends notifications to the client]
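The architecture of a grid application • [Figure: two hosts, Host 1 and Host 2, one running a client and the other a grid service, each layered on the Globus Toolkit and the OS; the client and the grid service are connected by RPC and notification arrows]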
Faults • Remote procedure calls can fail in four different ways • The server grid service instance has crashed before the call • The network connection fails when calling a remote procedure • The server instance fails during the call • The network connection fails when returning the result • A notification can fail to arrive for two reasons • The sending grid service crashed before sending the notification • The network connection fails during sending • The client crashes when using a server grid service • The server grid service becomes an orphan
Fault tolerance using the Globus Toolkit (GT) • No support for advanced fault tolerance mechanisms such as replication or checkpointing • An exception is raised in the caller when a call to a remote procedure fails • It is not easy to know what caused the exception to be raised • A lost notification can be discovered with timers in the client • The most difficult error to handle is the removal of orphan grid service instances
Orphan control • Need to remove orphan grid service instances, since they waste resources • New remote procedure isAlive • Timer in both client and server grid service • An exception is raised in the client when a call to isAlive fails • The server grid service is deleted when a timeout occurs in it
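The orphan control scheme above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Globus Toolkit API: the class names `GridServiceInstance` and `Client`, the methods `is_alive`, `tick` and `ping`, and the use of explicit time values instead of real timers are all hypothetical.

```python
class GridServiceInstance:
    """Hypothetical server-side instance guarded by an orphan timer."""

    def __init__(self, timeout):
        self.timeout = timeout    # how long the client may stay silent
        self.last_seen = 0.0      # time of the last successful isAlive call
        self.deleted = False

    def is_alive(self, now):
        """The isAlive remote procedure, called periodically by the client."""
        if self.deleted:
            # Stands in for the exception raised in the caller on a failed call.
            raise ConnectionError("grid service instance is gone")
        self.last_seen = now

    def tick(self, now):
        """Server-side timer: delete the instance when the timeout expires."""
        if not self.deleted and now - self.last_seen > self.timeout:
            self.deleted = True


class Client:
    """Hypothetical client that stops using an instance after a failed isAlive."""

    def __init__(self, instance):
        self.instance = instance
        self.in_use = True

    def ping(self, now):
        """Client-side timer: call isAlive and react to the exception."""
        try:
            self.instance.is_alive(now)
        except ConnectionError:
            self.in_use = False
```

A client that stops calling `ping` (for instance because it crashed) leaves `last_seen` unchanged, so a later `tick` deletes the orphaned instance; a client that calls `ping` on a deleted instance marks it as no longer in use.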
Event B • Extension of the B Method • Developed by J. R. Abrial • Based on Action Systems by Back and Kurki-Suonio • Related to B Action Systems

  SYSTEM C
  VARIABLES x
  INVARIANT Inv_C
  INITIALISATION x := x0
  EVENTS
    C_Evt1 = ANY u WHERE G1(u,x) THEN S1 END;
    C_Evt2 = SELECT G2 THEN S2 END;
  END
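The execution model behind such an event system can be illustrated with a small Python sketch: as long as some guard holds, one enabled event is chosen nondeterministically and executed. The function `run_system`, the dictionary-based state and the countdown example are assumptions made for illustration; they are not part of Event B or of any B tool.

```python
import random


def run_system(state, events, max_steps=100, rng=None):
    """Execute enabled events, one at a time, until no guard holds.

    `events` is a list of (guard, action) pairs over a mutable state dict.
    Returns the number of events executed.
    """
    rng = rng or random.Random()
    steps = 0
    while steps < max_steps:
        enabled = [action for guard, action in events if guard(state)]
        if not enabled:
            break                      # all events disabled: the system stops
        rng.choice(enabled)(state)     # nondeterministic choice among enabled events
        steps += 1
    return steps


# Illustrative instance: x counts down to 0 in steps of 1 or 2.
# The guards keep the invariant x >= 0 true after every event.
events = [
    (lambda s: s["x"] >= 1, lambda s: s.update(x=s["x"] - 1)),
    (lambda s: s["x"] >= 2, lambda s: s.update(x=s["x"] - 2)),
]
```

Starting from `{"x": 7}`, every run ends with `x == 0`, but the number of steps depends on which enabled event is picked each time.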
Formal development of Grid applications • We would like a formal method suitable for developing fault tolerant grid applications • It is difficult to create implementable specifications of grid applications in Event B • No grid communication mechanisms such as remote procedures and notifications • No fault tolerance mechanisms • Difficult to implement due to synchronization issues and atomicity of events • We need to extend Event B with constructs for • Specifying grid services • Remote procedure calls and notifications • Fault tolerance • Extensions should be introduced in a manner that simplifies implementation
Distributed B • Provides two new types of B machines • GRIDSERVICE • GRID_REFINEMENT • The machines take into account grid-specific features • Remote procedures • Notifications • Timeouts due to lost notifications • Exceptions due to failed calls to isAlive • Enables us to prove properties about the entire system • The machines are translated to ordinary B for verification • New constructs get their semantics from the translation • Automatic generation of proof obligations • Enables automatic or semi-automatic translation of the specification to a programming language
Grid service machine • Abstract specification of a grid service • A grid service machine is a template that clients obtain instances of • Compare to classes in OO • Remote procedures • Ordinary B procedures called from a client • Events • Executed independently of a client • Notifications • Sent when all events have become disabled • [Figure: a grid service with remote procedures Proc(p), events J1 → T1 and J2 → T2, and notifications ¬(J1 ∨ J2) → Q]
Grid refinement machine (1) • A client that uses grid service machine instances • Refines GRIDSERVICE, ordinary SYSTEM or REFINEMENT • Clause for enabling dynamic management of grid service machine instances • Instances are used as variables • When a failed instance is discovered it is marked as no longer in use and deleted from the application • Clause for refining remote procedures • Clause for refining events
Grid refinement machine (2) • Special substitution used in events for making remote procedure calls • Enables the exceptions for failed calls to be handled • Special events that consist of two parts for handling notifications • First part enabled when a notification has been sent from a grid service • Second part enabled when a timeout occurs • Executed once for each notification/timeout • Special event for handling failed calls to isAlive • Enabled for each grid service instance in use • Non-deterministically models failures of instances
The behaviour of grid components • [Figure, two components] • Grid refinement: events G1 → S1, G2 → S2, G3 → S3; notification handlers: NotifHandler • Grid service: remote procedures Proc(p); events J1 → T1, J2 → T2; notifications ¬(J1 ∨ J2) → Q
Grid service machine

  GRIDSERVICE A
  VARIABLES y
  INVARIANT Inv_A
  INITIALISATION y := y0
  REMOTE_PROCEDURES
    Proc(p) = PRE P(p) THEN T END
  EVENTS
    A_Evt1 = ANY u WHERE J1(u,y) THEN T1 END;
    A_Evt2 = SELECT J2 THEN T2 END;
  NOTIFICATIONS
    Notif = GUARANTEES Q END
  END
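One possible reading of the GRIDSERVICE template, sketched in Python under stated assumptions: a remote procedure updates the state, the events then run independently of the client, and a notification is sent once every event is disabled. The concrete choices here (`P(p)` is `p > 0`, a single event that decrements `y`, `Q` is `y == 0`, and a callback `on_notify` standing in for notification delivery) are hypothetical and chosen only to make the template executable.

```python
class GridService:
    """Hypothetical instance of a grid service machine with state y."""

    def __init__(self, on_notify):
        self.y = 0                   # INITIALISATION y := 0
        self.on_notify = on_notify   # stands in for notification delivery

    def proc(self, p):
        """Remote procedure: PRE p > 0 THEN y := p END."""
        if not p > 0:
            raise ValueError("precondition P(p) violated")
        self.y = p
        self.run_events()

    def run_events(self):
        """Run the events until all of them are disabled, then notify."""
        # Single event: SELECT y > 0 THEN y := y - 1 END
        while self.y > 0:
            self.y -= 1
        # All guards are false here, so the guarantee Q (y == 0) holds.
        self.on_notify(self.y)
```

Because the notification is only sent after the loop exits, its guarantee follows directly from the negated guard, which mirrors the rule that notifications are sent when all events have become disabled.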
Grid refinement machine (1)

  GRID_REFINEMENT C2
  REFINES C1
  REFERENCES A
  VARIABLES z, x, a_inst
  INVARIANT a_inst:A & Inv_C'
  INITIALISATION x := x0 || z := z0 || a_inst::A
  EVENTS
    C_Evt1 = SELECT G1' THEN
               CALL a_inst.Proc(x) EXCEPTION SE END || S1'
             END;
    C_Evt2 = SELECT G2' THEN S2' END
Grid refinement machine (2)

  NOTIFICATION_HANDLERS
    NotifHandler = NOTIFICATION Notif SOURCE v:A THEN S3 TIMEOUT ST END
  IS_ALIVE_HANDLERS
    IAHandler = SOURCE v:A THEN SIA END
  END
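The client-side constructs can be approximated in Python as follows. This is a rough sketch, not the translation defined by Distributed B: `call_remote` imitates `CALL ... EXCEPTION SE END` by running a fallback when the call fails, and `handle_notification` imitates a notification handler whose first part runs on arrival and whose second part runs on timeout. Both function names, the use of `ConnectionError` for failed calls and the `queue.Queue` inbox are assumptions.

```python
import queue


def call_remote(procedure, arg, on_exception):
    """Call a remote procedure; run on_exception (SE) if the call fails."""
    try:
        return procedure(arg)
    except ConnectionError:
        on_exception()
        return None


def handle_notification(inbox, on_notification, on_timeout, timeout):
    """Run on_notification (S3) if a message arrives in time, else on_timeout (ST)."""
    try:
        message = inbox.get(timeout=timeout)
    except queue.Empty:
        on_timeout()
    else:
        on_notification(message)


def failing_proc(x):
    """Hypothetical remote procedure of a crashed instance."""
    raise ConnectionError("instance unreachable")
```

Exactly one of the two handler parts runs per call, which matches the slide's statement that the handler is executed once for each notification or timeout.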
Conclusions • Distributed B enables construction of correct fault tolerant grid applications • Automatic generation of proof obligations • Implementable architecture by construction • These Event B extensions can also be used with other middleware for distributed systems