Explore how Civilian Worms maintain reliability in unstable systems, including leader election, forward progress, and application correctness in parallel environments. Learn about monitoring, replication, and the control of worm states. Discover our LE algorithm for maintaining system stability.
Civilian Worms: Ensuring Reliability in an Unreliable Environment Sanjeev R. Kulkarni University of Wisconsin-Madison sanjeevk@cs.wisc.edu Joint Work with Sambavi Muthukrishnan
Outline • Motivation and Goals • Civilian Worms • Master-Worker Model • Leader Election • Forward Progress • Correctness • Parallel Applications
What’s happening today • Move towards clusters • Resource Managers • e.g., Condor • Dynamic environment
Motivation • Large Parallel/Standalone Applications • Non-Dedicated Resources • e.g., a Condor environment • Machines can disappear at any time • Unreliable commodity clusters • Hardware failures • Network failures • Security attacks!
What’s available • Parallel Platforms • MPI • MPI-1: machines can’t go away! • MPI-2: any takers? • PVM • Shoot the master! • Condor • Shoot the Central Manager!
Goal • Bottleneck-free infrastructure in an unreliable environment • Ensure “normal termination” of applications • Users submit their jobs • Get e-mail upon completion!
Focus of this talk • Approaches for Reliability • Standalone Applications • Monitor framework (worms!) • Replication • Parallel Applications • Future work!
Worms are here again! • Usual Worms • Self replicating • Hard to detect and kill • Civilian Worms • Controlled replication • Spread legally! • Monitor applications
Desired Monitoring System • Diagram legend: W = worm, C = computation
Issues • Management of worms • Distributed State detection • Very hard • Forward Progress • Checkpointing • Correctness
Management Models • Master-Worker • Simple • Effective • Our Choice! • Symmetric • Difficult to manage the model itself!
Our Implementation Model • Diagram: one master worm coordinating several worker worms • W = worm, C = computation
Worm States • Master • Maintains the state of all the worm segments • Listens on a particular socket • Respawns failed worm segments • Worker • Periodically pings the master • Starts the encapsulated process if instructed • Leader Election • Invokes the LE algorithm to elect a new master • Note: independent of application state
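A minimal sketch of the two roles in this master-worker worm, assuming a simple UDP ping protocol; the message names, port, timeouts, and the print placeholders for respawn and job start are illustrative, not the actual implementation:

```python
import socket
import time

PING_INTERVAL = 1.0   # how often a worker pings (illustrative)
PING_TIMEOUT = 3.0    # silence after which the other side is presumed dead

def worker_loop(my_id, master_addr):
    """Worker segment: ping the master; fall back to leader election on timeout."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(PING_TIMEOUT)
    while True:
        sock.sendto(f"PING {my_id}".encode(), master_addr)
        try:
            msg, _ = sock.recvfrom(1024)
        except socket.timeout:
            return "LEADER_ELECTION"                        # master presumed dead
        if msg.startswith(b"START"):
            print("starting encapsulated computation")      # stand-in for launching the job
        time.sleep(PING_INTERVAL)

def master_loop(port):
    """Master segment: track pings from every worm segment, respawn silent ones."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", port))
    sock.settimeout(1.0)
    last_seen = {}                                          # segment address -> last ping time
    while True:
        try:
            msg, addr = sock.recvfrom(1024)
            last_seen[addr] = time.time()
            sock.sendto(b"ACK", addr)
        except socket.timeout:
            pass
        for addr, t in list(last_seen.items()):
            if time.time() - t > PING_TIMEOUT:
                print("respawning worm segment that was at", addr)  # stand-in for respawn
                del last_seen[addr]
```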
Leader Election • The woes begin! • Master goes down • Detection • Worker ping times out • Timeout value • Worker gets an LE message • Action • Worker goes into LE state
LE algorithm • Each worm segment is given an id • Only the master assigns ids • Workers broadcast their ids • The worker with the lowest id wins
Brief Skeleton • While in LE • Broadcast an LE message with your id • Set min = your id • On getting an LE message with id i • If i >= min: ignore • Else: min = i • min is the new master
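A compact sketch of the skeleton above. The send/recv transport helpers, the message tuples, and the COORD announcement are assumptions (the real system uses UDP broadcast):

```python
def leader_election(my_id, peers, send, recv):
    """Lowest-id-wins election, following the skeleton on the slide above.

    send(addr, msg) and recv(timeout) -> msg-or-None are assumed transport
    helpers; peers is the list of the other worm segments' addresses.
    """
    lowest = my_id
    for addr in peers:
        send(addr, ("LE", my_id))            # broadcast an LE message with your id
    while True:
        msg = recv(timeout=1.0)
        if msg is None:                      # no more competing LE messages
            break
        kind, other_id = msg
        if kind == "LE" and other_id < lowest:
            lowest = other_id                # else ignore (i >= min)
    if lowest == my_id:                      # min is the new master
        for addr in peers:
            send(addr, ("COORD", my_id))
        return "MASTER"
    return "WORKER"                          # wait for the COORD from the new master
```

The full protocol also involves the COORD_ACK and KILL handshakes shown on the next slides; this sketch stops at picking the lowest id.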
LE in action (1) • Diagram: master M0 with workers W1 and W2 • Master goes down!
LE in action (2) • Diagram: L1 and L2 exchanging (LE, 1) and (LE, 2) • L1 and L2 send out LE messages
LE in action (3) • Diagram: L2 replying to L1 with COORD_ACK • L1 gets (LE, 2) and ignores it • L2 gets (LE, 1) and sends COORD_ACK
LE in action (4) • Diagram: new master M1, worker W2, and a freshly spawned segment • M1 sends COORD to W2 and spawns a replacement worker
Implementation Problems • Too many cases • Many unclear cases • Time to Converge • Timeout values • Network Partition
What happens if? • Master still up? • Incoming id < self id => goes to LE mode • Else => sends back COORD message • Next master in line goes down? • Timeout on COORD message receipt • Late COORD_ACK? • Sends KILL message
More Bizarre Cases • Multiple masters? • The master broadcasts its id periodically • Conflict is resolved using the lowest-id method • No master? • Workers will time out soon!
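A sketch of how the master-side edge cases from the last two slides could be handled. The message kinds, the MASTER_ID broadcast name, and the enter_le_mode hook are illustrative assumptions:

```python
def handle_message_as_master(my_id, msg, send, enter_le_mode):
    """Master-side handling of the edge cases described above.

    msg is assumed to be (kind, other_id, reply_addr); enter_le_mode() drops
    this segment back into the leader-election state.
    """
    kind, other_id, reply_addr = msg
    if kind == "LE":
        if other_id < my_id:
            enter_le_mode()                       # incoming id < self id: go to LE mode
        else:
            send(reply_addr, ("COORD", my_id))    # else: tell the worker who the master is
    elif kind == "MASTER_ID" and other_id < my_id:
        enter_le_mode()                           # two masters: lowest id wins, this one yields
```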
Test-Bed • 64 dual-processor 550 MHz P-III nodes • Linux 2.2.12 • 2 GB RAM • Fast interconnect (100 Mbps) • Master-worker communication via UDP
A Stress Test for LE • Test • Workers ping every second • Kill n/4 workers • After 1 sec, kill the master • After 0.5 sec, kill the next master in line • Kill n/4 workers again
Forward Progress • Why? • MTTF < application time • Solutions • Checkpointing • Application Level • Process level • Start from checkpoint image!
Checkpoint • Address space • Condor checkpoint library • Rewrites object files • Writes the checkpoint to a file on SIGUSR2 • Files • Assumption: common file system
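The real system checkpoints the whole address space via the Condor checkpoint library; purely as an illustration of the SIGUSR2 trigger described above, here is a sketch that only pickles explicit application state (the path and state layout are made up):

```python
import os
import pickle
import signal

CKPT_PATH = "/shared/ckpt/%d.ckpt" % os.getpid()     # assumes the common file system

app_state = {"iteration": 0, "partial_result": []}   # stand-in for real application state

def write_checkpoint(signum, frame):
    """On SIGUSR2, write the current state to a file the worm can restart from."""
    tmp = CKPT_PATH + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(app_state, f)
    os.rename(tmp, CKPT_PATH)    # atomic rename so a crash never leaves a torn image

signal.signal(signal.SIGUSR2, write_checkpoint)
```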
Correctness • File Access • Read Only, no problems • Writes • Possible inconsistency if multiple processes access • Inconsistency across checkpoints? • Need a new File Access Algorithm
Solution: Individual Versions • File Access Algorithm • On open • If first open • Read: nothing • Write: create a local copy and set a mapping • Else • If mapped: access the mapped file • If write: create a local copy and set a mapping • On close • Preserve the mapping
File Access cont. • Commit Point • On completion of the computation • Checkpoint • Includes mapped files
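A sketch of the individual-versions idea from the two slides above, with the interposed open() replaced by an explicit helper class; the paths, mapping table, and commit step are illustrative, not the actual implementation:

```python
import os
import shutil

class VersionedFiles:
    """Per-segment file versions: writes go to a private copy, reads of unmapped
    files go to the shared file; the mapping survives close and would be carried
    along in the checkpoint."""

    def __init__(self, private_dir):
        self.private_dir = private_dir
        self.mapping = {}                             # shared path -> private copy

    def open_file(self, path, mode="r"):
        if path in self.mapping:                      # already mapped: use the private copy
            return open(self.mapping[path], mode)
        if any(flag in mode for flag in ("w", "a", "+")):
            private = os.path.join(self.private_dir, os.path.basename(path))
            if os.path.exists(path):
                shutil.copy(path, private)            # first write: snapshot the shared file
            self.mapping[path] = private
            return open(private, mode)
        return open(path, mode)                       # plain read, no mapping needed

    def commit(self):
        """Commit point: publish the private copies back to the shared files."""
        for shared, private in self.mapping.items():
            shutil.copy(private, shared)
```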
Being more Fancy • Security attacks • Civilian-to-military transition • Hide yourself from ps • Re-fork periodically to avoid detection
Conclusion • LE is VERY HARD • Don’t take it for a course project! • Does our system work? • 16 nodes: YES • 32 nodes: NO • Quite Reliable
Future Direction • Robustness • Extension to parallel programs • Re-write send/recv calls • Routing issues • Scalability issues? • A hierarchical design?
References • Cohen, F. B., “A Case for Benevolent Viruses”, http://www.all.net/books/integ/goodvcase.html • M. Litzkow and M. Solomon, “Supporting Checkpointing and Process Migration Outside the UNIX Kernel”, Usenix Conference Proceedings, San Francisco, CA, January 1992 • Gurdip Singh, “Leader Election in Complete Networks”, PODC ’92
Implementation Arch. • Diagram: worm components (Communicator, Dispatcher, Dequeuer, Checkpointer) with Remove Checkpoint, Prepend, and Append operations around the Computation
Parallel Programs • Communication • Connectivity across failures • Re-write send/recv socket calls • Limitations of Master-Worker Model? • Not really!
Communication • Checkpoint markers • Buffer all data between checkpoint markers • Help of master in rerouting
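One way to picture the buffering between checkpoint markers; the channel class, the marker call, and the master-assisted replay are illustrative assumptions rather than the actual rewritten send/recv calls:

```python
class BufferedChannel:
    """Keep every message sent since the peer's last checkpoint marker so it can
    be replayed if that peer rolls back to its checkpoint and is respawned."""

    def __init__(self, raw_send):
        self.raw_send = raw_send        # the underlying (rewritten) socket send
        self.unacked = []               # (peer, data) sent since the last marker

    def send(self, peer, data):
        self.unacked.append((peer, data))
        self.raw_send(peer, data)

    def checkpoint_marker(self, peer):
        """Peer has taken a checkpoint: data sent before the marker can be dropped."""
        self.unacked = [(p, d) for p, d in self.unacked if p != peer]

    def replay(self, peer, new_addr):
        """After the master reroutes us to a restarted peer, resend its buffered data."""
        for p, data in self.unacked:
            if p == peer:
                self.raw_send(new_addr, data)
```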