Coordination, Synchronization and Locking With Isis2

Coordination, Synchronization and Locking With Isis2 Cornell University Ken Birman

Isis2 has many options for coordination • Within a group of processes there are many ways in which you might want coordinated or synchronized behavior • Isis2 can support all of them, but because there are many patterns, the topic isn’t trivial!

Examples • Primary/Backup fault-tolerance • Group g receives a request • A and B are assigned to handle it • If A succeeds, it sends a reply and we’re done • If A fails, B takes over

Examples • Coordinator/Cohort fault-tolerance • Group g receives a request • A and B are assigned to handle it, but in such a way that every request has its own primary (“coordinator”) and every request has one or more backups (“cohort”) • If A succeeds, it sends a reply and we’re done • If A fails, B takes over • Updates issued to the group state by A prior to failing are visible to B when it takes over

Examples • Periodic action based on a timer • A clock is running, and every X ms, the group members jointly perform some action • They could each take some “part” of a shared task • Or the action could be performed in primary/backup style with a primary member initiating it but others standing ready to help if the primary fails

Examples • Locking • The group manages some form of data. The items have an associated key • Members can obtain a lock on the key • Mutex locks: while a member holds the key, no other member can access the key • Read/Write locks: Read locks allow other readers but Write locks exclude both readers and writers

Barrier synchronization • Group has some form of task to do • An initiator sends a request to start the members working on that task, then wishes to wait until they are all finished • The members function as “workers”. As each finishes it signals that it has finished its part

Summary of Options

Primary - Backup

… Details • Primary backup • Easiest: Form a group with 2 members • Rule: The member with rank 0 is the primary, the member with rank 1 is a backup • To update data, use the Isis2 Send primitive, but call g.Flush() before talking to an external user of the service • If the group updates databases or file storage may need to use SafeSend. This topic is covered in a different module. • If a failure occurs, a new view will signal that the backup is now the new primary • It picks up in the same state that the old primary was in

Issues with primary/backup • Notice that Isis2 lacks a way to send a multicast to the group plus one external member • So suppose an outside person asks the group to do something, like in our old picture: • Backup will knowwhat the primaryintended to do • But did the primaryactually send the response? • Did it reach the user? Time Update the monitoring and alarmscriteria for Mrs. Marsh as follows… Service instance Response delay seen by end-user would include Internet latencies Service response delay Confirmed

This problem can’t be solved! • In the Isis2 system, we can’t atomically send a message to the external user in such a way that the backup will be certain it was sent. • So… the backup must re-send the last message(s) to the user! • … but how? Time Update the monitoring and alarmscriteria for Mrs. Marsh as follows… Service instance Response delay seen by end-user would include Internet latencies Service response delay Confirmed Confirmed

UDP, TCP-R… • With UDP the backup might be able to just send the identical reply. • With TCP, the user sees the connection break and if the confirmation wasn’t received, may need to re-issue the request. The backup (the new primary) would sense that this is a repeated request and resend the old reply • Cornell has a technology called TCP-R. With it a single TCP connection can be “taken over” by the backup. With TCP-R the backup can finish the sending the primary was in the midst of doing even if it crashed

Coordinator Cohort

… More details • Coordination-Cohort • Really the identical idea, but now we relay the request into the group, using OrderedSend. • If the group updates databases or file storage may need to use SafeSend. A topic covered in a different module. • Some simple rule should be used to map requests to the members: not just “rank 0 is the primary” but “rank K will be the primary for request R” • For example, the request could contain some sort of identification data, or we could compute a hashcode • Any rule that all members can apply will do the trick • In other ways, just like primary-backup

Which multicast should we use? • If external users connect in a load-balanced way, each member can • Handle read-only work locally • Use OrderedSend to relay and work that updates the group state.. If the group updates databases or file storage may need to use SafeSend. • With OrderedSend, always call Flush if important updates were done and we are able to respond to the external user • If external users all connect to the rank-0 member then we can just use Send, but we risk overloading that member if everyone connects to the same one

Periodic Actions with Timer

Periodic Actions • Easiest solution: Have the member with rank 0 launch a thread • This thread loops • Wait for K ms (use Thread.Sleep() or a timed call to WaitOne on a semaphore that is always 0) • Then issue a g.Send to “ping everyone” • Latencies of g.Send in an otherwise idle group will be very low and the jitter even smaller • Action should be taken by everyone within a millisecond or two

Periodic Actions • Fancier solutions are also possible • There is a literature on real-time actions in fault-tolerant systems • If you are facing a “mission critical” need you might consider using such a solution • For example, every member could run its own timer, and every member could send a “ping” • On receiving “ping at time T” from a majority of members of the current view, take the action • Such a solution will be far more robust, but more costly

Read and Write Locks

Locking • Isis2 supports group-wide locks • If you want local locking, don’t use this tool • Basic API: • g.WriteLock(“name” [, timeout]). g.Lock() for short. • g.ReadLock(“name” [, timeout]) • g.Unlock(“name”) • You can also control the persistency of the lock state of the group and the way that failures are handled

Rules… • Lock requests are handled one by one in order • Locks on different “names” don’t conflict • Locks on the same name: • Write locks exclude all other locks • Read locks: allow further read locks, until a write lock is waiting. But then read locks wait behind the write lock • Timeout: Causes a “cancel” request to be sent

Handling of failures • If the lock holder fails, the default action is to “break” the lock (release it). Read locks always act this way. • For write locks, you can specify that instead that a broken lock be passed to the rank-0 group member • A lock-transfer upcall event notifies you when this occurs • You would need to code whatever handling you desire. It will run in the rank-0 member as necessary.

Lock-State Persistence • The lock manager state is stored in a data structure that can live in memory or be retained on disk • The default configuration keeps the structure in memory • If you override this and request persistent locking, we use SafeSend instead of OrderedSend, and the locking state will be saved on disk. • In the default case, if the group terminates the lock state is discarded. In the persistent case lock state is retained even across group shutdowns

When is lock persistence important? • If your database will be used across periods when the whole group shuts down, we would say that the database itself is • External (not in-memory) • Persistent • In such cases you’ll use SafeSend to update the database. And for this case may want to make the lock service state persistent too. DB1 DB2

Barrier Synchronization

Barrier Synchronization • This is easy achieved using g.Query/OrderedQuery • Query can initiate the computation, or could simply specify “which” computation you have in mind, if you have many running in parallel. • Group members use g.Reply() when they reach the barrier point. Sender waits for all to reply

Barrier Synchronization: Failures • We recommend that in the Reply you send • The size of the group view when computation started • Which rank this particular member had in the group • … just record these values when the Query arrives • Then do g.Reply(myRank, N, other data…) • Caller can thus verify that it received all N replies. If not, it knows that some member crashed

Summary of Options

Coordination, Synchronization and Locking With Isis2