
A Hierarchical Checkpointing Protocol for Parallel Applications in Cluster Federations

Sébastien Monnet, Christine Morin, Ramamurthy Badrinath. PARIS Research group / IRISA Rennes. IPDPS Workshop: Fault-Tolerant Parallel, Distributed and Network-Centric Systems, Santa Fe. Friday, April 30th.


Presentation Transcript


  1. A Hierarchical Checkpointing Protocol for Parallel Applications in Cluster Federations. Sébastien Monnet, Christine Morin, Ramamurthy Badrinath. PARIS Research group / IRISA Rennes. IPDPS Workshop: Fault-Tolerant Parallel, Distributed and Network-Centric Systems, Santa Fe. Friday, April 30th

  2. Outline • The context: fault tolerance through checkpoint / restart • The problem: going large scale • Checkpoint / restart principles • Our contribution: a hierarchical protocol

  3. Context (1): Target applications [Figure labels: Simulation (x3), Processing, Display]

  4. Context (2): Cluster federation • Inside clusters: high-performance networks (SAN), efficient synchronization • Between clusters: LAN or WAN (high delays and low bandwidth) • Large number of nodes • Short MTBF • Heterogeneous architecture

  5. Fault tolerance • Fault: fail-stop (crash) • Two approaches • Replicate computation • Extra nodes • Checkpoint / restart protocol • Extra memory • Regular low-cost PC clusters with large memory • Checkpoint / restart

  6. Basic principles • For a single process • Saving the process state (checkpoint) • In case of a fault, the process is restarted from its last stored checkpoint • For a parallel application • Communications create dependencies [Figure: timeline of a process P with checkpoints and a fault]

  7. Dependence due to a message • Lamport's happened-before relation • For a single process: events are totally ordered • Emission(M) -> reception(M) • Transitivity [Figure: timeline of P0 and P1 with states S0 and S1 and a message m]
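
The three rules on this slide (local order, emission before reception, transitivity) can be turned into a small reachability computation. The following is only a sketch: the function name `happened_before` and the event names are hypothetical and do not come from the presentation.

```python
def happened_before(process_events, messages):
    """Return the set of pairs (a, b) such that a happened before b.

    process_events: dict mapping a process name to its events in local order.
    messages: list of (send_event, receive_event) pairs.
    """
    succ = {e: set() for events in process_events.values() for e in events}
    # Rule 1: events of a single process are totally ordered.
    for events in process_events.values():
        for earlier, later in zip(events, events[1:]):
            succ[earlier].add(later)
    # Rule 2: the emission of a message precedes its reception.
    for send, receive in messages:
        succ[send].add(receive)

    # Rule 3: transitivity, computed as graph reachability.
    def reachable(start):
        seen, stack = set(), list(succ[start])
        while stack:
            event = stack.pop()
            if event not in seen:
                seen.add(event)
                stack.extend(succ[event])
        return seen

    return {(a, b) for a in succ for b in reachable(a)}


# Illustrative trace: P0 sends m to P1.
events = {"P0": ["a1", "send_m", "a3"], "P1": ["b1", "recv_m", "b3"]}
hb = happened_before(events, [("send_m", "recv_m")])
```

Two events that are related in neither direction, such as `a3` and `b3` here, are concurrent.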

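  8. Recovery line: validity • Restart: finding a recovery line [Figure: timeline of P0 and P1 with checkpoints A, B, C, D and a message m]

  9. Recovery line: in-transit messages • Restart: finding a recovery line • In-transit message (logging) [Figure: timeline of P0 and P1 with checkpoints A, B, C, D and a message m]

  10. Recovery line: ghost messages • Restart: finding a recovery line • Ghost message [Figure: timeline of P0 and P1 with checkpoints A, B, C, D and a message m]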
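
The two kinds of problematic messages on these slides can be told apart by comparing each message with a candidate recovery line. The sketch below assumes that a ghost message is one whose reception is saved in the line while its emission is not, and that an in-transit message is the opposite case. `Message`, `classify` and `is_valid` are hypothetical names, and events are identified by a local index on each process.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: str
    send_index: int   # local event index of the emission on the sender
    receiver: str
    recv_index: int   # local event index of the reception on the receiver


def classify(message, line):
    """Classify a message against a recovery line.

    line maps each process to the local event index of its checkpoint;
    events with an index <= that value are part of the saved state.
    """
    sent = message.send_index <= line[message.sender]
    received = message.recv_index <= line[message.receiver]
    if sent and not received:
        return "in-transit"   # must be logged so that it can be replayed
    if received and not sent:
        return "ghost"        # reception saved but emission undone
    return "consistent"


def is_valid(line, messages):
    """A recovery line is valid if it contains no ghost message."""
    return all(classify(m, line) != "ghost" for m in messages)
```

For example, with `line = {"P0": 5, "P1": 5}`, a message sent at index 3 and received at index 7 is in transit, while one sent at index 7 and received at index 3 is a ghost message and invalidates the line.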

  11. Coordinated checkpointing • Recovery line valid by construction at checkpointing time • Simple 2-phase commit protocol • Relatively easy to implement • Drawback: synchronization • does not scale
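
As a rough illustration of the two-phase commit mentioned on this slide, the sketch below simulates it inside a single Python program. It is not the authors' implementation: `Process`, `coordinated_checkpoint` and the `healthy` flag are hypothetical, and a real protocol would also have to suspend application messages between the two phases.

```python
import copy


class Process:
    def __init__(self, name, state, healthy=True):
        self.name = name
        self.state = state
        self.healthy = healthy   # hypothetical flag used to simulate a failed vote
        self.tentative = None    # checkpoint taken in phase 1
        self.stable = None       # last committed checkpoint

    def prepare(self):
        """Phase 1: take a tentative checkpoint and vote."""
        if not self.healthy:
            return False
        self.tentative = copy.deepcopy(self.state)
        return True

    def commit(self):
        """Phase 2: the tentative checkpoint becomes the stable one."""
        self.stable = self.tentative
        self.tentative = None

    def abort(self):
        """Phase 2 (failure case): drop the tentative checkpoint."""
        self.tentative = None


def coordinated_checkpoint(processes):
    """Commit a new checkpoint on every process, or on none of them."""
    votes = [p.prepare() for p in processes]
    if all(votes):
        for p in processes:
            p.commit()
        return True
    for p in processes:
        p.abort()
    return False
```

Because every process commits or none does, the set of stable checkpoints always forms a recovery line, at the cost of a global synchronization at each checkpoint.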

  12. Independent checkpointing • Reconstruct a valid recovery line at rollback time • Local states stored independently • Suited to large scale • Drawbacks • Need to store multiple checkpoints (garbage collection) • Need to maintain up-to-date antecedence graphs • Domino effect: long rollbacks needed in practice

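  13. Domino Effect [Figure: timeline of P0, P1 and P2 with checkpoints A to F and messages M1 to M8]

  14. Domino Effect [Figure: same timeline; a "?" marks the rollback under consideration]

  15. Domino Effect [Figure: M8 is undone; messages M1 to M7 and checkpoints A to F remain; a "?" marks the next rollback under consideration]

  16. Domino Effect [Figure: M7 is undone; messages M1 to M6 and checkpoints A to F remain; a "?" marks the next rollback under consideration]

  17. Domino Effect [Figure: messages M1 to M4 and checkpoints A, B, C, D remain; a "?" marks the next rollback under consideration]

  18. Domino Effect [Figure: messages M1 to M3 and checkpoints A, B, C remain; a "?" marks the next rollback under consideration]

  19. Domino Effect [Figure: messages M1 and M2 and checkpoint A remain; a "?" marks the next rollback under consideration]

  20. Domino Effect [Figure: no message or checkpoint remains; P0, P1 and P2 are back to their initial states]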
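
The cascade shown across the Domino Effect slides can be reproduced with a small fixed-point computation. This is a sketch under simplifying assumptions: events carry global timestamps, time 0 stands for the initial state, and the names `recovery_line` and `latest_checkpoint_before` are hypothetical. The example trace is made up and is not the one in the figures.

```python
import math


def latest_checkpoint_before(checkpoints, time):
    """Latest checkpoint strictly before `time`; 0 is the initial state."""
    return max((c for c in checkpoints if c < time), default=0)


def recovery_line(checkpoints, messages, failed):
    """Roll processes back until no received message has an undone emission.

    checkpoints: dict process -> list of checkpoint times.
    messages: list of (sender, send_time, receiver, recv_time).
    Returns dict process -> restored time (math.inf means no rollback).
    """
    line = {p: math.inf for p in checkpoints}
    line[failed] = max(checkpoints[failed], default=0)
    changed = True
    while changed:
        changed = False
        for sender, send_time, receiver, recv_time in messages:
            # The emission is undone but the reception is still kept:
            # the receiver has to roll back to before the reception.
            if send_time > line[sender] and recv_time <= line[receiver]:
                line[receiver] = latest_checkpoint_before(
                    checkpoints[receiver], recv_time)
                changed = True
    return line


# Made-up trace in which checkpoints and messages alternate.
checkpoints = {"P0": [20, 60], "P1": [40, 80]}
messages = [
    ("P1", 10, "P0", 15),
    ("P0", 30, "P1", 35),
    ("P1", 50, "P0", 55),
    ("P0", 70, "P1", 75),
    ("P1", 85, "P0", 87),
]
line = recovery_line(checkpoints, messages, failed="P1")
```

In this trace a fault on P1 after time 85 sends both processes back to time 0, which is the domino effect: every stored checkpoint turns out to be useless.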

  21. Addressing the domino effect (1) • Idea: log communications • Drawback: requires assumptions about determinism (piecewise deterministic assumption)

  22. Addressing the domino effect (2) • Idea: communication-induced checkpointing • Independent checkpointing • Still need to store multiple checkpoints (garbage collection) • Force checkpoints at communication time • Additional information is piggy-backed • If the communication generates a new dependence => forced checkpoint • Updates the current recovery line

  23. A hierarchical protocol for a hierarchical architecture • The protocol needs to reflect the architecture • Relaxed inter-cluster synchronism • Principle • Intra-cluster: coordinated checkpointing • Inter-cluster: communication-induced checkpointing

  24. Limit the number of forced checkpoints • It is not necessary to save a checkpoint at each receive • Force a checkpoint only if the sender has saved a checkpoint since its last send • Sequence number • Direct Dependencies Vector (DDV) [Figure: timeline of clusters c1 and c2 with states s1, s2, s3 and messages m1, m2]
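
The rule on this slide can be sketched with one sequence number per cluster and a DDV that stores, for every other cluster, the highest sequence number received from it. The `Cluster` class and its methods are hypothetical. The update order (record the dependency, then save the forced checkpoint, then deliver the message) is an assumption chosen so that the stored vectors match the DDV values that appear in the Example slides.

```python
class Cluster:
    def __init__(self, index, cluster_count):
        self.index = index
        self.ddv = [0] * cluster_count
        self.ddv[index] = 1                   # initial checkpoint has sequence number 1
        self.checkpoints = [tuple(self.ddv)]  # DDV stored with each checkpoint

    @property
    def sequence_number(self):
        return self.ddv[self.index]

    def checkpoint(self):
        """Take a checkpoint (forced or not) and bump the sequence number."""
        self.ddv[self.index] += 1
        self.checkpoints.append(tuple(self.ddv))

    def send(self):
        """Inter-cluster message: the sequence number is piggybacked."""
        return (self.index, self.sequence_number)

    def receive(self, message):
        """Return True if the message forced a checkpoint."""
        sender, sender_sn = message
        # The sender has saved a checkpoint since its last message to us
        # exactly when its sequence number is newer than the one we know.
        forced = sender_sn > self.ddv[sender]
        if forced:
            self.ddv[sender] = sender_sn
            self.checkpoint()
        return forced


c1, c2, c3 = (Cluster(i, 3) for i in range(3))
first = c2.receive(c1.send())    # new dependency: forced checkpoint
second = c2.receive(c1.send())   # c1 has not checkpointed since: no checkpoint
```

Only the first message from `c1` forces a checkpoint on `c2`; the second one carries the same sequence number and is delivered directly.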

  25. Rollback algorithm • A node crashes • Its cluster rolls back • Its cluster sends a Rollback Alert to all clusters in the federation • When receiving a Rollback Alert: check the need to roll back • If a rollback is needed, send a Rollback Alert with the new sequence number • Else send a "no rollback needed" message • Loop • Wait for n messages • If one of the n messages is a Rollback Alert • Check the need to roll back (using all the piggybacked sequence numbers) • If a rollback is needed, send a Rollback Alert with the new sequence number • Else send a "no rollback needed" message • Else leave the loop (break)
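
The slide does not spell out the "check the need to roll back" step. The sketch below assumes one plausible rule: a cluster must roll back if one of its checkpoints has a DDV entry for the alerting cluster that is greater than or equal to the sequence number in the alert, and it then restores the oldest such checkpoint. A sequential work queue replaces the distributed "wait for n messages" loop. All names are hypothetical, and the assignment of DDVs to clusters in `example_federation` is a reading of the Example slides, not something stated in the text.

```python
from collections import deque


def rollback_target(checkpoints, sender, alert_sn):
    """Index of the oldest checkpoint that depends on an undone state, or None."""
    for index, ddv in enumerate(checkpoints):
        if ddv[sender] >= alert_sn:
            return index
    return None


def propagate_rollback(federation, failed):
    """Propagate Rollback Alerts until no cluster needs to roll back.

    federation: list indexed by cluster, each entry being the list of DDVs
    of that cluster's checkpoints (oldest first). The lists are modified.
    Returns dict cluster -> DDV of the checkpoint it restored.
    """
    restored = {failed: federation[failed][-1]}
    alerts = deque([(failed, federation[failed][-1][failed])])
    while alerts:
        sender, alert_sn = alerts.popleft()
        for cluster, checkpoints in enumerate(federation):
            if cluster == sender:
                continue
            target = rollback_target(checkpoints, sender, alert_sn)
            if target is None:
                continue   # corresponds to the "no rollback needed" reply
            if cluster in restored and target == len(checkpoints) - 1:
                continue   # already restored to that checkpoint
            del checkpoints[target + 1:]
            restored[cluster] = checkpoints[target]
            alerts.append((cluster, checkpoints[target][cluster]))
    return restored


def example_federation():
    """Checkpoint DDVs of c1, c2, c3 (indices 0, 1, 2), assumed from the example."""
    return [
        [(1, 0, 0), (2, 0, 0), (3, 0, 3)],
        [(0, 1, 0), (1, 2, 0), (1, 3, 0)],
        [(0, 0, 1), (0, 0, 2), (0, 3, 3)],
    ]
```

With this data, a fault in `c3` (index 2) makes it restore `<0,3,3>` and send an alert carrying 3, which forces `c1` back to `<3,0,3>`, while `c2` does not roll back.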

  26. Example [Figure: timelines of clusters c1, c2 and c3 with initial checkpoints carrying DDVs <1,0,0>, <0,1,0> and <0,0,1>. Legend: unforced checkpoint with DDV <x,y,z>]

  27. Example [Figure: message m1 and a checkpoint with DDV <1,2,0> are added. Legend: unforced checkpoint with DDV <x,y,z>; forced checkpoint, saved before the message is taken into account]

  28. Example [Figure: a checkpoint with DDV <0,0,2> is added]

  29. Example [Figure: message m2 is added]

  30. Example [Figure: a checkpoint with DDV <1,3,0> is added]

  31. Example [Figure: message m3 and a checkpoint with DDV <0,3,3> are added]

  32. Example [Figure: message m4 and checkpoints with DDVs <2,0,0> and <3,0,3> are added]

  33. Example [Figure: same elements as on the previous slide]

  34. Example [Figure: an Alert (3) message is added]

  35. Example [Figure: a second Alert (3) message is shown; m3 no longer appears]

  36. Example [Figure: final state, with messages m1 and m2 and all the checkpoints from the previous slides]

  37. Simulator • Implementation of a discrete-event simulator • Configurable • Topology • Application • Timers • Based on the C++SIM library, University of Newcastle upon Tyne (http://cxxsim.ncl.ac.uk) • Threads • Scheduler • Random flows

  38. Experiments: configuration • Application running for 10 hours • Cluster 0: 100 nodes (Myrinet) • Cluster 1: 100 nodes (Myrinet) • ~3 000 intra-cluster messages (x2) • Ethernet 100 between clusters • One message every ~4 minutes • One control message every 50 minutes

  39. Experiments: number of forced checkpoints • Impact of Cluster 0 unforced checkpoints on Cluster 1 • No unforced checkpoints in Cluster 1

  40. Experiments: number of forced checkpoints • Increasing number of unforced checkpoints in Cluster 1

  41. Experiments: communication patterns • Increasing the number of messages from Cluster 1 to Cluster 0 • Unforced checkpoints initiated every 30 minutes in each cluster

  42. Conclusion • Quasi-synchronous, hierarchical, hybrid protocol • Works well if • Few inter-cluster communications • Quasi-unidirectional inter-cluster communications • Improvements • Support for more communication patterns • Simultaneous faults in different clusters • Dynamic architecture modification • Implementation

  43. Optimization • The sender doesn't need to roll back if messages are logged • Optimistic logging on the sender • Which messages to replay? • Inter-cluster messages are acknowledged with the sequence number [Figure: timeline of p1 and p2 with checkpoints A and B and a message m acknowledged with (B)]
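
One way to read this optimization is sketched below. The sender keeps a log of its inter-cluster messages together with the sequence number returned in each acknowledgement. When the receiver rolls back to sequence number s, the sender replays the messages acknowledged with a number greater than or equal to s, plus those not yet acknowledged. The replay rule and the `SenderLog` class are assumptions made for illustration; the slide only states that acknowledgements carry the sequence number.

```python
class SenderLog:
    """Log of the messages sent to one remote cluster."""

    def __init__(self):
        self.entries = []   # each entry is [payload, acknowledged sequence number]

    def log(self, payload):
        """Record a message before sending it and return its identifier."""
        self.entries.append([payload, None])
        return len(self.entries) - 1

    def acknowledge(self, message_id, receiver_sn):
        """Store the receiver's sequence number carried by the acknowledgement."""
        self.entries[message_id][1] = receiver_sn

    def to_replay(self, restored_sn):
        """Messages the receiver has lost after restoring checkpoint restored_sn."""
        return [payload for payload, ack_sn in self.entries
                if ack_sn is None or ack_sn >= restored_sn]


log = SenderLog()
first = log.log("m1")
second = log.log("m2")
third = log.log("m3")        # never acknowledged
log.acknowledge(first, 2)    # delivered while the receiver was at sequence number 2
log.acknowledge(second, 3)   # delivered while the receiver was at sequence number 3
```

If the receiver restores its checkpoint with sequence number 3, `log.to_replay(3)` selects `m2` and `m3`, and the sender itself keeps running.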
