CSS434 Distributed Transactions and Replication Textbook Ch 14 - 15

Professor: Munehiro Fukuda
CSS434 Replication

Outline
• Distributed transaction
• Two-phase commitment protocol
• File replication
• Group communication revisited
• Primary copy replication
• Active replication
• Read-any-write-all protocol
• Available copy protocol
• Quorum-based protocol

Distributed Transaction Example: Banking Transaction

Client:
    T = openTransaction
        a.withdraw(4);
        c.deposit(4);
        b.withdraw(3);
        d.deposit(3);
    closeTransaction

[Figure: the client opens and closes transaction T at the coordinator. Each server holding an account used by T joins T as a participant: BranchX (account A, a.withdraw(4)), BranchY (account B, b.withdraw(T, 3)), and BranchZ (accounts C and D, c.deposit(4) and d.deposit(3)).]

Note: the coordinator is in one of the servers, e.g. BranchX.

Transaction Commitment
• How can all participant servers either commit a transaction or abort it?
• One-phase atomic commit protocol
  • The coordinator keeps requesting all participants to commit until they return an acknowledgment.
  • A participant has no chance to initiate an abort.
• Two-phase commit protocol
  • Phase 1: The coordinator calls for the participants’ votes.
  • Phase 2: The coordinator completes a commit or an abort according to the outcome of the vote.

Two-Phase Commit Protocol: Operations
• canCommit?(trans) -> Yes / No
  • Call from coordinator to participant to ask whether it can commit a transaction. Participant replies with its vote.
• doCommit(trans)
  • Call from coordinator to participant to tell participant to commit its part of a transaction.
• doAbort(trans)
  • Call from coordinator to participant to tell participant to abort its part of a transaction.
• haveCommitted(trans, participant)
  • Call from participant to coordinator to confirm that it has committed the transaction.
• getDecision(trans) -> Yes / No
  • Call from participant to coordinator to ask for the decision on a transaction after it has voted Yes but has still had no reply after some delay. Used to recover from server crash or delayed messages.
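The operations above can be sketched as a small in-process simulation. This is a minimal sketch, not the textbook's implementation: the class names `Participant` and `Coordinator`, the `can_commit` flag, and the state strings are assumptions made for illustration, and `canCommit` is spelled without the "?" because Python identifiers cannot contain it. Crashes, timeouts, and real messaging are left out.

```python
from enum import Enum


class Vote(Enum):
    YES = "Yes"
    NO = "No"


class Participant:
    """One server holding part of the transaction (hypothetical class)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self._can_commit = can_commit  # assumed flag: can the local part commit?
        self.state = "INIT"

    def canCommit(self, trans):
        # Phase 1: reply with a vote. A Yes vote leaves the participant
        # prepared to commit (uncertain) until the decision arrives.
        if self._can_commit:
            self.state = "READY"
            return Vote.YES
        self.state = "ABORT"
        return Vote.NO

    def doCommit(self, trans):
        self.state = "COMMIT"

    def doAbort(self, trans):
        self.state = "ABORT"


class Coordinator:
    """Collects votes, decides, and records the decision (hypothetical class)."""

    def __init__(self, participants):
        self.participants = participants
        self.decisions = {}      # trans -> True (commit) / False (abort)
        self.committed_by = {}   # trans -> names that confirmed the commit

    def closeTransaction(self, trans):
        # Phase 1: call for every participant's vote.
        votes = [p.canCommit(trans) for p in self.participants]
        decision = all(v is Vote.YES for v in votes)
        self.decisions[trans] = decision
        # Phase 2: complete the commit or the abort according to the votes.
        for p in self.participants:
            if decision:
                p.doCommit(trans)
                self.haveCommitted(trans, p)
            else:
                p.doAbort(trans)
        return decision

    def haveCommitted(self, trans, participant):
        self.committed_by.setdefault(trans, set()).add(participant.name)

    def getDecision(self, trans):
        # A participant that voted Yes but heard nothing would call this.
        return self.decisions.get(trans)
```

For example, `Coordinator([Participant("BranchX"), Participant("BranchZ", can_commit=False)]).closeTransaction("T")` aborts at every participant, because a single No vote is enough to abort.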

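Two-Phase Commit Protocol: Communication
• Step 1 (coordinator): sends canCommit? to the participant. Status: prepared to commit (waiting for votes).
• Step 2 (participant): replies Yes. Status: prepared to commit (uncertain).
• Step 3 (coordinator): sends doCommit. Status: committed.
• Step 4 (participant): replies haveCommitted. Status: committed.
• The coordinator's final status: done.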

Two-Phase Commit Protocol: State Transition

[Figure: state transition diagram for the coordinator, worker 1, and worker 2. Transitions are labeled "event / action".
Coordinator: INIT → WAIT on Client_wants_to_commit / CanCommit?; WAIT → COMMIT on Vote-Yes / doCommit; WAIT → ABORT on Vote-No / doAbort.
Each worker: INIT → READY on CanCommit? / Vote-Yes; INIT → ABORT on CanCommit? / Vote-No; READY → COMMIT on doCommit / Ack; READY → ABORT on doAbort / Ack.]

Other possible cases:
• The coordinator did not receive Vote-Yes from every worker. → It times out and sends a doAbort.
• A worker did not receive a CanCommit?. → All workers eventually receive a doAbort.
• A worker did not receive a doCommit. → It times out and checks the other worker’s status.

File Replication: Concepts
• Difference between replication and caching
  • A replica is associated with a server, whereas a cache is associated with a client.
  • A replica focuses on availability, while a cache focuses on locality.
  • A replica is more persistent than a cache.
  • A cache is contingent upon a replica.
• Advantages
  • Increased availability/reliability
  • Performance enhancement (response time and network traffic)
  • Scalability and autonomous operation
• Requirements
  • Naming: clients need not be aware of multiple replicas.
  • Consistency: data must stay consistent among replicated files.
  • Replication control: explicit vs. implicit/lazy replication
  • ACID: Atomicity, Consistency, Isolation, and Durability

File Replication: Basic Architectural Model
• Request: send a client request to a server.
• Coordination: deliver the request to each replica manager in some order.
• Execution: process the client request but do not permanently commit it.
• Agreement: agree on whether the execution will be committed (e.g., two-phase commit protocol).
• Response: respond to the front end.
[Figure: clients reach a set of replica managers through front ends.]
Ex: DNS, Web server

Review: Group Communication
• Group membership service
  • Create and destroy a group.
  • Add or withdraw a replica manager to/from a group.
  • Detect a failure.
  • Notify members of group membership changes.
  • Provide clients with a group address.
• Message delivery
  • Absolute ordering
  • Consistent ordering
[Figure: a client sends a message to a group of four replica managers.]

Review: Group Communication Example: ISIS
• Group view multicast
[Figure: timeline of processes p1 to p4 showing a process joining the group, multicasts, p3 crashing during a multicast and later rejoining, and a multicast to the available processes. The question posed for the partially multicast message: deleted or delivered?]
In ISIS, if p4 receives this partially multicast message at the same time as it learns that p3 has crashed, it forwards the message to all the others and immediately sends a flush message. In other words, p1, p2, and p4 receive this multicast message as if p3 were still alive.

Review: Group Communication Absolute Ordering - Linearizability
• Rule:
  • mi must be delivered before mj if Ti < Tj.
• Implementation:
  • A clock synchronized among machines
  • A sliding time window used to commit the delivery of messages whose timestamps fall within the window
• Example:
  • Distributed simulation
• Drawbacks:
  • Too strict a constraint
  • No absolutely synchronized clock
  • No guarantee of catching all tardy messages
[Figure: with Ti < Tj, every receiver gets mi before mj.]

Review: Group Communication Total Ordering - Sequential Consistency
• Rule:
  • All receivers get the messages in the same order (regardless of their timestamps).
• Implementation:
  • A message is sent to a sequencer, assigned a sequence number, and finally multicast to the receivers.
  • A receiver retrieves messages in increasing sequence-number order.
• Example:
  • Replicated database update
• Drawback:
  • A centralized algorithm
[Figure: although Ti < Tj, every receiver gets mj and mi in the same order, here mj before mi.]
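The sequencer-based implementation can be sketched as follows. This is a minimal sketch under stated assumptions: `Sequencer`, `Receiver`, and the hold-back heap are hypothetical names chosen for illustration, the network is replaced by direct method calls, and message loss is not handled.

```python
import heapq
import itertools


class Sequencer:
    """Central process that assigns consecutive sequence numbers."""

    def __init__(self):
        self._next = itertools.count(1)

    def stamp(self, message):
        # Returns (sequence number, message), ready to be multicast.
        return (next(self._next), message)


class Receiver:
    """Delivers messages strictly in sequence-number order."""

    def __init__(self):
        self._expected = 1
        self._holdback = []   # min-heap of (seq, message) that arrived early
        self.delivered = []

    def receive(self, stamped):
        heapq.heappush(self._holdback, stamped)
        # Deliver every message that is next in line; keep the rest waiting.
        while self._holdback and self._holdback[0][0] == self._expected:
            _, message = heapq.heappop(self._holdback)
            self.delivered.append(message)
            self._expected += 1
```

Two receivers that see the stamped messages in different network orders still end up with identical `delivered` lists, which is the property the rule asks for. The single `Sequencer` object is also where the centralized drawback shows up.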

Multi-copy Update Problem
• Keeping in mind the basic architecture and group communication models, how can we update multiple copies over replica servers?
• Read-only replication
  • Allow the replication of only immutable files.
• Primary backup replication
  • Designate one copy as the primary copy and all the others as secondary copies.
• Active backup replication
  • Access any or all of the replicas.
  • Read-any-write-all protocol
  • Available-copies protocol
  • Quorum-based consensus

Primary-Copy Replication
• Request: The front end sends a request to the primary replica.
• Coordination: The primary takes each request atomically.
• Execution: The primary executes the request and stores the results.
• Agreement: The primary sends the updates to all the backups and receives an acknowledgment from them.
• Response: The primary replies to the front end.
• Advantages: easy to implement, linearizable, copes with n-1 crashes.
• Disadvantage: large overhead, especially if a failed primary must be replaced with a backup.
[Figure: clients reach the primary replica manager through front ends; the primary propagates updates to the backup replica managers.]
Ex: Sun NIS (Yellow Pages)
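The execution, agreement, and response steps can be sketched as below. This is only an illustrative sketch: `Primary`, `Backup`, `apply_update`, and the `"ack"` string are hypothetical names, the store is a plain dictionary, and failover to a backup is not modeled.

```python
class Backup:
    """Secondary copy that applies updates pushed by the primary."""

    def __init__(self):
        self.store = {}

    def apply_update(self, key, value):
        self.store[key] = value
        return "ack"   # acknowledgment sent back to the primary


class Primary:
    """Primary copy: executes requests, then updates every backup."""

    def __init__(self, backups):
        self.store = {}
        self.backups = backups

    def handle_write(self, key, value):
        # Execution: the primary executes the request and stores the result.
        self.store[key] = value
        # Agreement: push the update to all backups and collect their acks.
        acks = [b.apply_update(key, value) for b in self.backups]
        if any(a != "ack" for a in acks):
            raise RuntimeError("a backup failed to acknowledge the update")
        # Response: reply to the front end.
        return "ok"

    def handle_read(self, key):
        return self.store.get(key)
```

Because every request goes through the single primary, the backups hold the same state after each acknowledged write, which is what lets a backup take over after a crash.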

Active Replication
• Request: The front end multicasts the request to all replicas.
• Coordination: All replicas take the requests in the same sequential order.
• Execution: Every replica executes the request.
• Agreement: No agreement is needed.
• Response: Each replica replies to the front end.
• Advantages: achieves sequential consistency, copes with (n/2 - 1) Byzantine failures.
• Disadvantage: no longer linearizable.
[Figure: each front end multicasts its client's request to all replica managers.]

Read-Any-Write-All Protocol
• Read
  • Lock any one of the replicas for a read.
• Write
  • Lock all of the replicas for a write.
• Sequential consistency
• Cannot tolerate even one failed replica on a write.
[Figure: one client reads from any one of the replica managers through its front end; another client writes to all of them.]

Available-Copies Protocol
• Read
  • Lock any one of the replicas for a read.
• Write
  • Lock all available replicas for a write.
• Recovering replica
  • Brings itself up to date by copying from other servers before accepting any user request.
• Better availability
• Cannot cope with network partitions (inconsistency between the two sub-divided network groups).
[Figure: one client reads from any one of the replica managers; another client writes to all available replicas while one replica manager is down.]

Available-Copies Protocol Example 1: Gossip
• Categorized as a lazy available-copies protocol
• Tardy messages are ignored.
[Figure: front ends FE (timestamp Tf) issue queries and updates on behalf of clients to replica managers RMi (Ti) and RMj (Tj); RMj sends gossip messages to RMk.]
• Query (FE sends Query, Tf to RMi; RMi replies Value, Ti):
  If (Tf < Ti) return value else { wait for RMi to be updated or query RMj/RMk }
• Update (FE sends Update, Tf to RMj; RMj replies Update id):
  If (Tf > Tj) update RMj else { update Client or ignore and update RMj }
• Gossip (RMj sends its updates to RMk):
  If (Tj > Tk) update RMk else discard the gossip message
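The timestamp comparisons on this slide can be sketched with scalar timestamps. This is a simplified sketch, not the gossip architecture itself, which uses vector timestamps: `ReplicaManager`, `update`, `gossip_to`, and `query` are hypothetical names. One labeled assumption departs from the slide: `query` also answers when Tf equals Ti, so that a client can read back its own latest write.

```python
class ReplicaManager:
    """Replica that holds one value and a scalar timestamp (simplified)."""

    def __init__(self, name):
        self.name = name
        self.timestamp = 0
        self.value = None

    def update(self, value, timestamp):
        # Apply the update only if it is newer than the local state;
        # a tardy (older or equal) update is ignored.
        if timestamp > self.timestamp:
            self.value, self.timestamp = value, timestamp
            return True
        return False

    def gossip_to(self, other):
        # Gossip rule: if (Tj > Tk) update RMk, else discard the message.
        return other.update(self.value, self.timestamp)

    def query(self, fe_timestamp):
        # Answer only if this replica is at least as recent as the front end.
        # Assumption: "<=" instead of the slide's strict "<".
        if fe_timestamp <= self.timestamp:
            return self.value
        return None   # caller should wait or ask another replica manager
```

A front end that gets `None` back would either wait for the replica to catch up through gossip or retry against another replica manager, as the query rule on the slide describes.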

Available-Copies Protocol Example 2: Bayou
• Categorized as a lazy available-copies protocol
• Tardy messages are reordered or merged.
• To make a tentative update committed:
  • Perform a dependency check
    • Check conflicts
    • Check priority
  • Merge procedure
    • Cancel tentative updates
    • Change tentative updates
[Figure: front ends send clients' updates to replica managers, one of which is the primary RM; some updates are sent first and others later. Example: an executive books 3pm, and a secretary and other employees also book 3pm. Each log holds committed updates C0, C1, C2, ..., CN followed by tentative updates T0, T1, T2, T3, ..., Tn, Tn+1.]

Network Partitions Well-known Solution: Quorum-Based Protocols
• Read
  • Retrieve the read quorum.
  • Select the replica with the latest version.
  • Perform a read on it.
• Write
  • Retrieve the write quorum.
  • Find the latest version and increment it.
  • Perform a write on the entire write quorum.
• If a sufficient number of replicas cannot be gathered for the read/write quorum, the operation must be aborted.
• Constraint: #replicas in read quorum + #replicas in write quorum > n
• Read-any-write-all: r = 1, w = n
[Figure: front ends access overlapping read and write quorums drawn from eight replica managers.]
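The read and write steps can be sketched as below. This is a minimal sketch under stated assumptions: `Replica`, `QuorumStore`, and the `available` flag are hypothetical names, locking is omitted, and a quorum is simply the first r or w reachable replicas. Besides the slide's r + w > n, the constructor also checks 2w > n, a condition not on the slide but needed here so that two write quorums always overlap and cannot hand out the same version number.

```python
class Replica:
    """One copy of the file with a version number (hypothetical class)."""

    def __init__(self):
        self.version = 0
        self.value = None
        self.available = True   # False simulates a crash or a partition


class QuorumStore:
    def __init__(self, n, r, w):
        if r + w <= n:
            raise ValueError("need r + w > n so read and write quorums overlap")
        if 2 * w <= n:
            # Assumption beyond the slide: write quorums must overlap too.
            raise ValueError("need 2w > n so two write quorums overlap")
        self.replicas = [Replica() for _ in range(n)]
        self.r, self.w = r, w

    def _quorum(self, size):
        up = [rep for rep in self.replicas if rep.available]
        if len(up) < size:
            raise RuntimeError("quorum unavailable; operation aborted")
        return up[:size]

    def read(self):
        # Retrieve the read quorum and read from the latest version in it.
        latest = max(self._quorum(self.r), key=lambda rep: rep.version)
        return latest.value

    def write(self, value):
        # Retrieve the write quorum, find the latest version, increment it,
        # and write to the entire quorum.
        quorum = self._quorum(self.w)
        new_version = max(rep.version for rep in quorum) + 1
        for rep in quorum:
            rep.version, rep.value = new_version, value
```

With `QuorumStore(n=5, r=2, w=4)`, a read still returns the latest value when one replica missed a write, because any two replicas include at least one member of the last write quorum. Setting r = 1 and w = n gives the read-any-write-all protocol.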

Network Partitions System Example: Coda
• Normal case:
  • Read-any, write-all protocol
  • Whenever a client writes back its file, it increments the file version at each server.
• Network disconnection:
  • A client writes back its file to only the available servers.
  • Version conflicts are detected and resolved automatically when the network is reconnected.
• Client disconnection:
  • A client caches as many files as possible (in hoard walking).
  • A client works locally if disconnected (in emulation mode).
  • A client writes back updated files to the servers (in reintegration mode).
[Figure: version vectors held at Server 1, Server 2, and Server 3 across the hoard, emulation, and reintegration phases. All three servers start at Version[1,1,1] and reach Version[2,2,2] after a write (W) that arrives at every server. Later writes that reach only some servers leave Version[3,3,2] on two servers and Version[2,2,3] on the third.]
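Detecting a version conflict at reconnection amounts to comparing version vectors element by element. The function below is a minimal sketch of that comparison; the name `compare_versions` and the returned strings are assumptions made for illustration, and it does not attempt Coda's actual resolution step.

```python
def compare_versions(v1, v2):
    """Compare two version vectors of equal length.

    Returns "equal", "dominates" (v1 is newer everywhere it differs),
    "dominated" (v2 is newer everywhere it differs), or "conflict"
    (each vector is ahead of the other in at least one position).
    """
    if len(v1) != len(v2):
        raise ValueError("version vectors must have the same length")
    if list(v1) == list(v2):
        return "equal"
    if all(a >= b for a, b in zip(v1, v2)):
        return "dominates"
    if all(a <= b for a, b in zip(v1, v2)):
        return "dominated"
    return "conflict"
```

For the vectors in the figure, `compare_versions([3, 3, 2], [2, 2, 3])` reports a conflict, whereas `[2, 2, 2]` simply dominates `[1, 1, 1]` and the older copy can be brought up to date without any resolution.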

Paper Review by Students
• ISIS System
• Gossip Architecture
• Bayou System
• Coda
• Discussions
  • What if a message is lost in ISIS group communication? What if another crash occurs when unstable/flush messages are exchanged?
  • What performance drawbacks does Gossip have?
  • What problems remain for users in Bayou?
  • Why doesn’t Coda use read/write quorums?

Non-Turn-In Exercises
• The following state transition diagram describes the two-phase commitment protocol. Let’s assume that worker 1 crashed when the coordinator sent a commit message. Trace this diagram. To be specific, turn the appropriate dashed arrows into thick, solid arrows with your pen or pencil.
[Figure: the two-phase commit state transition diagram for the coordinator, worker 1, and worker 2, with states INIT, WAIT, READY, COMMIT, and ABORT.]

Non-Turn-In Exercises
• Textbook p762, Q17.1: In a decentralized variant of the two-phase commit protocol, the participants communicate directly with one another instead of indirectly via the coordinator. In phase 1, the coordinator sends its vote to all the participants. In phase 2, if the coordinator’s vote is No, the participants just abort the transaction; if it is Yes, each participant sends its vote to the coordinator and the other participants, each of which decides on the outcome according to the vote and carries it out. Calculate the number of messages and the number of rounds it takes. What are its advantages or disadvantages in comparison with the centralized variant?
• Textbook p816, Q18.10: Explain why allowing backups to process read operations directly (i.e., without contacting a primary) leads to sequentially consistent rather than linearizable executions in a primary-copy replication.
• Textbook p816, Q18.11: Could the gossip architecture be used for a distributed computer game as described below?
  • The players move figures around a common scene. The state of the game is replicated at the players’ workstations and at a server, which contains services controlling the game overall, such as collision detection. Updates are multicast to all replicas.
• The quorum-based replication protocol can address network partition problems. Why didn’t Coda use this protocol? Explain the reason.
• What if a message is lost in ISIS group communication? Describe a solution.
