CSS490 Replication & Fault Tolerance Textbook Ch9 (p440 – 484). Instructor: Munehiro Fukuda These slides were compiled from the course textbook and the reference books. File Replication Concepts. Difference between replication and caching
CSS490 Replication & Fault Tolerance Textbook Ch9 (p440 – 484) Instructor: Munehiro Fukuda These slides were compiled from the course textbook and the reference books. CSS490 Fault Tolerance
File Replication: Concepts
• Difference between replication and caching
  • A replica is associated with a server, whereas a cache is associated with a client.
  • A replica focuses on availability, while a cache focuses on locality.
  • A replica is more persistent than a cache.
  • A cache is contingent upon a replica.
• Advantages
  • Increased availability/reliability
  • Performance enhancement (response time and network traffic)
  • Scalability and autonomous operation
• Requirements
  • Naming: clients need not be aware of multiple replicas.
  • Consistency: data consistency among replicated files.
  • Replication control: explicit vs. implicit/lazy replication
  • ACID: Atomicity, Consistency, Isolation, and Durability
File Replication: Basic Architectural Model
• Request: send a client request to a server.
• Coordination: deliver the request to each replica manager in some order.
• Execution: process the client request but do not permanently commit it.
• Agreement: agree on whether the execution will be committed.
• Response: respond to the front end.
• Examples: DNS, Web server
[Figure: clients connect through front ends to a set of replica managers.]
Group Communication
• Group membership service
  • Create and destroy a group.
  • Add or withdraw a replica manager to/from a group.
  • Detect a failure.
  • Notify members of group membership changes.
  • Provide clients with a group address.
• Message delivery
  • Absolute ordering
  • Consistent ordering
[Figure: a client sends to a group of four replica managers.]
Absolute Ordering: Linearizability
• Rule:
  • mi must be delivered before mj if Ti < Tj.
• Implementation:
  • A clock synchronized among machines
  • A sliding time window: a message is committed for delivery when its timestamp falls within this window.
• Example:
  • Distributed simulation
• Drawbacks:
  • Too strict a constraint
  • No absolutely synchronized clock
  • No guarantee of catching all tardy messages
[Figure: mi is sent at Ti and mj at Tj, with Ti < Tj; every receiver gets mi before mj.]
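The sliding-window idea above can be sketched as a hold-back queue. This is only an illustrative sketch, assuming timestamps come from an already synchronized clock; the class name `TimestampOrderedReceiver`, the `window` parameter, and the policy of dropping tardy messages are assumptions, not part of the slides.

```python
import heapq


class TimestampOrderedReceiver:
    """Hold messages back, then deliver them in timestamp order (hypothetical sketch)."""

    def __init__(self, window):
        self.window = window                 # how long a message is held back
        self.pending = []                    # min-heap of (timestamp, message)
        self.delivered = []                  # messages delivered so far, in order
        self.committed_up_to = float("-inf")

    def receive(self, timestamp, message):
        # A message older than the committed cutoff is tardy: it can no longer
        # be placed in timestamp order, so this sketch rejects it.
        if timestamp <= self.committed_up_to:
            return False
        heapq.heappush(self.pending, (timestamp, message))
        return True

    def advance(self, now):
        # Commit every pending message whose timestamp is at least `window` old.
        cutoff = now - self.window
        while self.pending and self.pending[0][0] <= cutoff:
            _, message = heapq.heappop(self.pending)
            self.delivered.append(message)
        self.committed_up_to = max(self.committed_up_to, cutoff)
        return list(self.delivered)
```

A larger window catches more late arrivals but delays every delivery, which is one way to read the "too strict" and "tardy messages" drawbacks listed on the slide.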
Consistent (Total) Ordering: Sequential Consistency
• Rule:
  • Messages are received in the same order at every receiver (regardless of their timestamps).
• Implementation:
  • A message is sent to a sequencer, assigned a sequence number, and finally multicast to receivers.
  • A receiver retrieves messages in increasing sequence-number order.
• Example:
  • Replicated database update
• Drawback:
  • A centralized algorithm
[Figure: mi is sent at Ti and mj at Tj, with Ti < Tj; every receiver nevertheless gets mj before mi.]
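A minimal sketch of the sequencer scheme described above, with the network replaced by direct method calls. The class names `Sequencer` and `Receiver` and their methods are hypothetical and chosen only for illustration.

```python
class Sequencer:
    """Central process that stamps each message with the next sequence number."""

    def __init__(self):
        self.next_seq = 0

    def assign(self, message):
        seq = self.next_seq
        self.next_seq += 1
        return seq, message


class Receiver:
    """Delivers messages strictly in sequence-number order."""

    def __init__(self):
        self.expected = 0    # next sequence number to deliver
        self.held = {}       # out-of-order messages waiting for their turn
        self.delivered = []

    def receive(self, seq, message):
        self.held[seq] = message
        # Deliver as many consecutive messages as are now available.
        while self.expected in self.held:
            self.delivered.append(self.held.pop(self.expected))
            self.expected += 1


sequencer = Sequencer()
stamped = [sequencer.assign(m) for m in ["mj", "mi", "mk"]]

a, b = Receiver(), Receiver()
for seq, msg in stamped:            # receiver a sees the multicasts in order
    a.receive(seq, msg)
for seq, msg in reversed(stamped):  # receiver b sees them in reverse order
    b.receive(seq, msg)
```

Both receivers end with the same delivery order even though the messages arrived differently, and the single `Sequencer` object is the centralized component named as the drawback.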
Two-Phase Commit Protocol
[Figure: state diagrams for the coordinator, Worker 1, and Worker 2. Each transition is labeled "event received / message sent".]
• Coordinator
  • INIT → WAIT: Commit / Vote-request
  • WAIT → ABORT: Vote-abort / Global-abort
  • WAIT → COMMIT: Vote-commit / Global-commit
• Worker
  • INIT → READY: Vote-request / Vote-commit
  • INIT → ABORT: Vote-request / Vote-abort
  • READY → ABORT: Global-abort / Ack
  • READY → COMMIT: Global-commit / Ack
Other possible cases:
• The coordinator did not receive all vote-commits. → Time out and send a global-abort.
• A worker did not receive a vote-request. → All workers eventually receive a global-abort.
• A worker did not receive a global-commit. → Time out and check the other workers' status.
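The two phases can be sketched in a few lines. This is a single-process simulation under stated assumptions: the `Worker` class, its `can_commit` and `reachable` flags, and the function `two_phase_commit` are hypothetical names, and a missing vote stands in for the coordinator's timeout.

```python
from enum import Enum


class Vote(Enum):
    COMMIT = "vote-commit"
    ABORT = "vote-abort"


class Worker:
    def __init__(self, can_commit=True, reachable=True):
        self.can_commit = can_commit
        self.reachable = reachable  # False models a lost vote-request
        self.state = "INIT"

    def on_vote_request(self):
        if not self.reachable:
            return None             # no vote arrives; the coordinator times out
        if self.can_commit:
            self.state = "READY"
            return Vote.COMMIT
        self.state = "ABORT"
        return Vote.ABORT

    def on_decision(self, decision):
        self.state = "COMMIT" if decision == "global-commit" else "ABORT"
        return "ack"


def two_phase_commit(workers):
    # Phase 1: the coordinator sends vote-requests and gathers the votes.
    votes = [w.on_vote_request() for w in workers]
    # Phase 2: commit only if every worker voted to commit; a vote-abort or
    # a missing vote (timeout) leads to a global-abort.
    if all(v is Vote.COMMIT for v in votes):
        decision = "global-commit"
    else:
        decision = "global-abort"
    for w in workers:
        w.on_decision(decision)
    return decision
```

For example, `two_phase_commit([Worker(), Worker(reachable=False)])` returns `"global-abort"` and leaves both workers in the ABORT state, matching the case where a worker never received the vote-request.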
Multi-copy Update Problem
• Read-only replication
  • Allow the replication of only immutable files.
• Primary backup replication
  • Designate one copy as the primary copy and all the others as secondary copies.
• Active backup replication
  • Access any or all of the replicas
  • Read-any-write-all protocol
  • Available-copies protocol
  • Quorum-based consensus
Primary-Copy Replication
• Request: The front end sends a request to the primary replica.
• Coordination: The primary takes each request atomically.
• Execution: The primary executes the request and stores the results.
• Agreement: The primary sends the updates to all the backups and receives an ack from them.
• Response: The primary replies to the front end.
• Advantage: easy to implement, linearizable, copes with n-1 crashes.
• Disadvantage: large overhead, especially if a failed primary must be replaced with a backup.
[Figure: clients connect through front ends to the primary replica manager, which forwards updates to the backup replica managers.]
Active Replication
• Request: The front end multicasts the request to all replicas.
• Coordination: All replicas take the request in the same sequential order.
• Execution: Every replica executes the request.
• Agreement: No agreement is needed.
• Response: Each replica replies to the front end.
• Advantage: achieves sequential consistency, copes with (n/2 – 1) Byzantine failures.
• Disadvantage: no longer linearizable.
[Figure: each front end multicasts its client's request to all replica managers.]
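One common way for a front end to mask faulty replies is to accept only a value returned by a strict majority of the replicas. The slide does not spell out how replies are combined, so the sketch below is an assumption; the function name `majority_reply` is hypothetical.

```python
from collections import Counter


def majority_reply(replies):
    """Return the value sent by a strict majority of replicas, or None if there is none."""
    if not replies:
        return None
    value, count = Counter(replies).most_common(1)[0]
    # A strict majority outvotes up to (n/2 - 1) faulty replies when n is even.
    return value if count > len(replies) / 2 else None
```

With four replicas, one arbitrary reply is masked (`majority_reply(["ok", "ok", "ok", "bad"])` yields `"ok"`), while two faulty replies leave no strict majority and the function returns `None`.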
Read-Any-Write-All Protocol
• Read
  • Lock any one of the replicas for a read.
• Write
  • Lock all of the replicas for a write.
• Sequential consistency
• Cannot tolerate even one failed replica upon a write.
[Figure: one front end reads from any one replica manager; another front end writes to all of them.]
Available-Copies Protocol
• Read
  • Lock any one of the replicas for a read.
• Write
  • Lock all available replicas for a write.
• Recovering replica
  • Bring itself up to date by copying from other servers before accepting any user request.
• Better availability
• Cannot cope with a network partition (inconsistency between the two sub-divided network groups).
[Figure: one front end reads from any one replica manager; another front end writes to all available replicas; one replica manager is marked X as failed.]
Quorum-Based Protocols
• Read
  • Retrieve the read quorum.
  • Select the replica with the latest version.
  • Perform a read on it.
• Write
  • Retrieve the write quorum.
  • Find the latest version and increment it.
  • Perform a write on the entire write quorum.
• If a sufficient number of replicas cannot be obtained for the read/write quorum, the operation must be aborted.
• #replicas in read quorum + #replicas in write quorum > n
• Read-any-write-all: r = 1, w = n
[Figure: one front end contacts a read quorum of replica managers; another front end contacts a write quorum.]
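The read and write steps above can be sketched with in-memory replicas. This is an illustrative sketch only: the names `Replica`, `QuorumStore`, and `_collect` are hypothetical, and quorum members are picked at random to show that any read quorum overlaps the latest write quorum when r + w > n.

```python
import random


class Replica:
    def __init__(self):
        self.version = 0
        self.value = None
        self.up = True       # set to False to simulate a crashed replica


class QuorumStore:
    def __init__(self, n, r, w):
        if r + w <= n:
            raise ValueError("need r + w > n so read and write quorums overlap")
        self.replicas = [Replica() for _ in range(n)]
        self.r, self.w = r, w

    def _collect(self, size):
        available = [rep for rep in self.replicas if rep.up]
        if len(available) < size:
            # Not enough replicas for a quorum: the operation is aborted.
            raise RuntimeError("quorum unavailable; operation aborted")
        return random.sample(available, size)

    def read(self):
        quorum = self._collect(self.r)
        latest = max(quorum, key=lambda rep: rep.version)
        return latest.value

    def write(self, value):
        quorum = self._collect(self.w)
        new_version = max(rep.version for rep in quorum) + 1
        for rep in quorum:
            rep.version, rep.value = new_version, value
```

Setting `r = 1` and `w = n` reproduces read-any-write-all, where a single crashed replica is enough to abort every write.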
ISIS System
• Process group: see page 4 of this ppt file
• Group view
• Reliable multicast
• Causal multicast: see pages 5 & 6 of MPI ppt file
• Atomic broadcast: see page 7 of this ppt file
[Figure: timeline of processes p1 to p4. A process joins the group; a process crashes and later rejoins. Messages are multicast to the available processes, and partially multicast messages must be discarded.]
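Gossip Architecture
[Figure: clients issue queries and updates through front ends (FE, timestamp Tf) to replica managers RMi (Ti), RMj (Tj), and RMk, which exchange gossip messages.]
• Query: the FE sends (Query, Tf) to RMi, which returns (Value, Ti).
  • If (Tf < Ti) return value else { waits for RMi to be updated or query RMj/RMk }
• Update: the FE sends (Update, Tf) to RMj, which returns an update id.
  • If (Tf > Tj) update RMj else { update Client or ignore and update RMj }
• Gossip: RMj sends a gossip message to RMk.
  • If (Tj > Tk) update RMk else discard the gossip message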
Bayou System
• To make a tentative update committed:
  • Perform a dependency check
  • Check conflicts
  • Check priority
  • Merge procedure
  • Cancel tentative updates
  • Change tentative updates
[Figure: clients send updates through front ends to replica managers, one of which is the primary. The update log holds committed updates C0, C1, C2, ..., CN followed by tentative updates T0, T1, T2, T3, ..., Tn, Tn+1; some updates are sent first and others later. Example: an executive books 3pm; a secretary and other employees also book 3pm.]
Coda File System
• Normal case:
  • Read-any, write-all protocol
  • Whenever a client writes back its file, it increments the file version at each server.
• Network disconnection:
  • A client writes back its file to only the available servers.
  • Version conflicts are detected and resolved automatically when the network is reconnected.
• Client disconnection:
  • A client caches as many files as possible (hoard walking).
  • A client works locally if disconnected (emulation mode).
  • A client writes back updated files to servers (reintegration mode).
[Figure: version vectors at Servers 1 to 3. All start at [1,1,1]; after a write (W) that reaches every server, all hold [2,2,2]; after later writes during a disconnection, two servers hold [3,3,2] and one holds [2,2,3]. Client states: hoard, emulation, reintegration.]
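The version vectors in the figure can be compared element by element to tell a stale copy from a genuine conflict. The slide only states that conflicts are detected on reconnection; the element-wise rule and the function name `compare_versions` below are assumptions, following the usual version-vector technique.

```python
def compare_versions(v1, v2):
    """Classify two version vectors of equal length (hypothetical helper)."""
    first_ge = all(a >= b for a, b in zip(v1, v2))
    second_ge = all(a <= b for a, b in zip(v1, v2))
    if first_ge and second_ge:
        return "equal"
    if first_ge:
        return "first dominates"   # the second copy is merely stale
    if second_ge:
        return "second dominates"  # the first copy is merely stale
    return "conflict"              # each side saw a write the other missed
```

Applied to the figure, `[2,2,2]` against `[1,1,1]` is a plain update, while `[3,3,2]` against `[2,2,3]` is a conflict, because each side recorded a write the other did not see.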
Paper Review by Students
• ISIS System
• Gossip Architecture
• Bayou System
• Coda