Flexible Update Propagation for Weakly Consistent Replication Karin Petersen, Mike K. Spreitzer, Douglas B. Terry, Marvin M. Theimer and Alan J. Demers CSL Xerox PARC, ACM SIGOPS 1997 26 APR 2004 Presented by Yuseung Jeong Dept. of Computer Science, KAIST
Outline
• Introduction
• Basic Anti-entropy
• Effective Write-log Management
• Anti-entropy Protocol Extensions
• Performance Evaluation
• Discussion Issues
• Conclusion
Introduction
• Entropy
  • A process of degradation, running down, or a trend toward disorder
• Anti-entropy
  • Brings two replicas up to date with each other
• Major anti-entropy design decisions
  • Pair-wise communication
  • Exchange of operations
  • Ordered propagation of operations
• Weakly consistent replicated systems
  • By relaxing data consistency, they can accommodate policy choices for “when, with whom, and what data” to reconcile
Features
• The paper’s contribution is to demonstrate how the anti-entropy design enables the following features:
  • Support for arbitrary communication topologies
  • Operation over low-bandwidth networks
  • Incremental progress
  • Eventual consistency
  • Efficient storage management
  • Light-weight management of dynamic replica sets
  • Arbitrary policy choices
• Focus on Bayou’s anti-entropy protocol
  • Supports different application or user requirements for data reconciliation
  • Supports variable networking and computing environments
Data Structures for Anti-entropy
• Replica
  • Database
  • Write-log
• Server
  • Logical clock
  • Version vector (V)
  • Omitted version vector (O)
  • Commit sequence number (CSN)
  • Omitted sequence number (OSN)
[Figure: database and write-log, with the log’s committed part (< CSN) and truncated prefix (< OSN), and vectors V and O with entries for servers A, B, C. V(A): highest A.Clock for server A that is in the log; O(A): highest A.Clock for server A that has been truncated.]
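A minimal sketch of the per-replica state listed above, assuming writes are identified by (server_id, accept_stamp) and that a tentative write carries CSN = ∞. The class and method names (Write, Replica, receive) are hypothetical, and executing writes against the database is left out.

```python
from dataclasses import dataclass, field
from math import inf


@dataclass
class Write:
    server_id: str       # server that accepted the write
    accept_stamp: int    # that server's clock value when it accepted the write
    csn: float = inf     # commit sequence number; inf while the write is tentative


@dataclass
class Replica:
    write_log: list = field(default_factory=list)  # writes still held in the log
    V: dict = field(default_factory=dict)  # server_id -> highest accept_stamp seen
    O: dict = field(default_factory=dict)  # server_id -> highest accept_stamp truncated
    CSN: float = 0   # highest CSN this replica knows to be committed
    OSN: float = 0   # CSN of the last write truncated from the log

    def receive(self, w: Write) -> None:
        """Record a newly received write and advance V (and CSN if it is committed)."""
        # Simplification: the write is just appended; keeping committed writes ahead
        # of tentative ones, and executing the write on the database, are omitted.
        self.write_log.append(w)
        self.V[w.server_id] = max(self.V.get(w.server_id, 0), w.accept_stamp)
        if w.csn != inf:
            self.CSN = max(self.CSN, w.csn)
```

V and O only ever grow, so truncating the log later does not make the replica forget which writes it has already seen.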
Orderings
• Prefix property
  • If R has a write Wi accepted by server X, it also has all writes that X accepted before Wi
• Stable order (committed order)
  • Decided by the primary replica
  • The primary assigns the final CSN, which is < ∞ (tentative writes have CSN = ∞)
  • New CSNs are propagated to the other nodes
• Accept order
  • The order in which a particular server accepted its writes; over all writes, this is only a partial order
  • Accept-stamp: a timestamp or simple generation counter
• Causal-accept order
  • The accept-stamp is a logical clock
  • The clock advances when a write with a higher accept-stamp is received
  • Improves the chances of a node seeing the same database from different servers
  • Servers that hold the same writes, even uncommitted ones, order them identically
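The causal-accept rule (advance the clock when a higher accept-stamp arrives) is a Lamport-style logical clock. A small sketch follows; the class and method names are hypothetical.

```python
class LogicalClock:
    """Per-server clock used to generate causal accept-stamps."""

    def __init__(self) -> None:
        self.time = 0

    def stamp_local_write(self) -> int:
        # Each write accepted here gets a stamp larger than anything seen so far.
        self.time += 1
        return self.time

    def observe(self, accept_stamp: int) -> None:
        # Receiving a write with a higher accept-stamp pulls the clock forward,
        # so any write accepted here afterwards is stamped after it.
        self.time = max(self.time, accept_stamp)
```

For example, a server whose clock is at 1 that receives a write stamped 10 will stamp its next local write 11, so that write sorts after the one it may depend on.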
Basic Anti-entropy Protocol
• Run at sending server S to update receiving server R

Anti-entropy(S, R) {
  Get R.V from receiving server R              # S obtains R's version vector
  # now send all the writes unknown to R
  w = first write in S.write_log
  WHILE (w) DO                                 # traverse S's write-log
    IF R.V(w.server_id) < w.accept_stamp THEN  # w is new for R
      SendWrite(R, w)                          # send R a write its vector does not cover
    END
    w = next write in S.write_log
  END
}
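The basic protocol above can be sketched in Python as a pure function over S’s log and R’s version vector. This sketch assumes a server missing from R.V counts as 0 and ignores server creation and retirement. The names anti_entropy and apply_at_receiver are hypothetical, and “sending” is modeled as returning a list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    server_id: str
    accept_stamp: int


def anti_entropy(sender_log, receiver_v):
    """Return, in S's log order, the writes that R's version vector does not cover.

    sender_log: S's write-log (writes from each server appear in accept order).
    receiver_v: R's version vector, server_id -> highest accept_stamp R has.
    """
    to_send = []
    for w in sender_log:                                     # traverse S's write-log
        if receiver_v.get(w.server_id, 0) < w.accept_stamp:  # w is new for R
            to_send.append(w)                                # stands in for SendWrite(R, w)
    return to_send


def apply_at_receiver(receiver_log, receiver_v, writes):
    """R appends each received write and advances its version vector entry."""
    for w in writes:
        receiver_log.append(w)
        receiver_v[w.server_id] = max(receiver_v.get(w.server_id, 0), w.accept_stamp)
```

Because S sends writes in log order, R always receives each server’s writes in accept order, which preserves the prefix property.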
Effective Write-log Management
• Write stability by primary commit
  • A stable write never changes position and is never re-executed at that server
  • One database replica is designated as the primary, which stabilizes (commits) writes
  • The primary assigns each committed write its CSN (commit sequence number) in the log
  • CSNs and accept-stamps make committed writes totally ordered
• Propagation of committed writes
  • Commit information for writes is propagated using CSNs
  • The sending server checks whether the receiver may be missing committed writes
  • If the receiver already holds a write as tentative (uncommitted) without knowing its CSN, only a commit notification is required instead of the entire write
  • A commit notification carries the write’s accept-stamp, server-id, and new CSN
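A sketch of primary commit and of the commit-notification payload, under one stated assumption: the primary commits its tentative writes in the order they appear in its own log. The slides do not fix that policy. Primary, commit_tentative and commit_notification are hypothetical names.

```python
from dataclasses import dataclass
from math import inf


@dataclass
class Write:
    server_id: str
    accept_stamp: int
    csn: float = inf   # inf marks a tentative write


class Primary:
    """The one replica allowed to stabilize writes by assigning CSNs."""

    def __init__(self, next_csn: int = 1) -> None:
        self.next_csn = next_csn

    def commit_tentative(self, write_log):
        """Assign CSNs to the primary's tentative writes, in log order; return them."""
        committed = []
        for w in write_log:
            if w.csn == inf:
                w.csn = self.next_csn
                self.next_csn += 1
                committed.append(w)
        return committed


def commit_notification(w):
    # What travels instead of the whole write when the receiver already holds it.
    return (w.accept_stamp, w.server_id, w.csn)
```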
Policy Tradeoffs
• Write-log truncation
  • A replica may truncate any prefix of the stable part of its write-log
  • It maintains the omitted version vector (O) and omitted sequence number (OSN)
  • A complete database transfer may be needed when S.OSN > R.CSN
• Storage vs. networking resource tradeoff
  • Storage requirements vs. network resources
  • Keeping more of the log avoids full database transfers
  • Truncation can be driven by running estimates or a threshold
• Write-log rollback tradeoff
  • A replica can roll its write-log forward, that is, redo the rolled-back writes
  • Either after a time threshold or immediately
  • Latency of the next read vs. cost of closely spaced consecutive sessions
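Write-log truncation as described above can be sketched as dropping a committed prefix while folding it into O and OSN. This assumes the log holds committed writes in CSN order followed by tentative ones, so “csn ≤ upto_csn” selects a prefix. The names Replica and truncate are hypothetical.

```python
from dataclasses import dataclass, field
from math import inf


@dataclass
class Write:
    server_id: str
    accept_stamp: int
    csn: float = inf   # inf marks a tentative write


@dataclass
class Replica:
    write_log: list                        # committed writes in CSN order, then tentative
    O: dict = field(default_factory=dict)  # server_id -> highest accept_stamp truncated
    OSN: float = 0                         # CSN of the last truncated write

    def truncate(self, upto_csn):
        """Drop committed writes with csn <= upto_csn, recording them in O and OSN."""
        kept = []
        for w in self.write_log:
            if w.csn != inf and w.csn <= upto_csn:  # tentative writes are never truncated
                self.O[w.server_id] = max(self.O.get(w.server_id, 0), w.accept_stamp)
                self.OSN = max(self.OSN, w.csn)
            else:
                kept.append(w)
        self.write_log = kept
```

After truncate(2) on a log whose first two writes carry CSNs 1 and 2, OSN becomes 2, so any receiver with R.CSN < 2 would now need a full database transfer.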
Bayou’s Anti-entropy Protocol (1/3)
• Step 1: Decide whether a full transfer is needed
  • S can detect that it has truncated writes R is missing if S.OSN is larger than R.CSN
  • This check is what makes write-log truncation possible

Request R.V and R.CSN from receiving server R
IF S.OSN > R.CSN THEN   # S truncated writes R still needs: execute a full database transfer
  Roll back S's database to the state corresponding to S.O
  SendDatabase(R, S.DB)
  SendVector(R, S.O)    # this will be R's new R.O vector
  SendCSN(R, S.OSN)     # R's new R.OSN will now be S.OSN
END
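A sender-side sketch of Step 1. The sketch assumes S keeps a separate “checkpoint” holding the database state that reflects exactly its truncated writes. That checkpoint is a hypothetical stand-in for “roll back S’s database to the state corresponding to S.O”, and Sender and step1_full_transfer are likewise hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Sender:
    checkpoint: dict   # hypothetical: database state reflecting exactly the truncated writes
    O: dict            # omitted version vector
    OSN: int           # CSN of the last truncated write


def step1_full_transfer(s, r_csn):
    """Return (database, O, OSN) for R if S truncated writes R still needs, else None."""
    if s.OSN > r_csn:
        # Rolling S back to the state for S.O undoes every write still in S's log;
        # this sketch assumes that state is kept as a separate checkpoint.
        return dict(s.checkpoint), dict(s.O), s.OSN
    return None
```

Per the comments in the pseudocode, R would install the database and adopt the returned O and OSN as its R.O and R.OSN.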
Bayou’s Anti-entropy Protocol (2/3)
• Step 2: Bring R up to date with the remaining committed writes
  • Propagates commit information for committed writes

# send all the committed writes that R does not know about
IF R.CSN < S.CSN THEN   # R is missing committed writes
  w = first committed write that R does not yet know about
  WHILE (w) DO
    # check R's version vector to decide whether to send the write or only a commit notification
    IF w.accept_stamp <= R.V(w.server_id) THEN
      SendCommitNotification(R, w.accept_stamp, w.server_id, w.CSN)
    ELSE
      SendWrite(R, w)
    END
    w = next committed write in S.write_log
  END
END
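Step 2 can be sketched as one pass over S’s committed prefix, assuming that prefix is stored in CSN order and that a server missing from R.V counts as 0. The function name step2_committed and the ("commit", ...) / ("write", ...) tuples are hypothetical stand-ins for the two send calls.

```python
from dataclasses import dataclass
from math import inf


@dataclass(frozen=True)
class Write:
    server_id: str
    accept_stamp: int
    csn: float = inf   # inf marks a tentative write


def step2_committed(sender_log, r_v, r_csn):
    """List what S sends R for committed writes R has not yet seen as committed."""
    out = []
    for w in sender_log:
        if w.csn == inf:       # reached the tentative suffix: Step 2 is done
            break
        if w.csn <= r_csn:     # R already knows this write is committed
            continue
        if w.accept_stamp <= r_v.get(w.server_id, 0):
            # R holds w as a tentative write, so only the new CSN has to travel
            out.append(("commit", w.accept_stamp, w.server_id, w.csn))
        else:
            out.append(("write", w))
    return out
```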
Bayou’s Anti-entropy Protocol (3/3)
• Step 3: Bring R up to date with the remaining tentative (uncommitted) writes
  • Basic anti-entropy applied to tentative writes

w = first tentative write in S.write_log
WHILE (w) DO
  # check R's version vector to see whether R already has the write
  IF R.V(w.server_id) < w.accept_stamp THEN
    SendWrite(R, w)
  END
  w = next write in S.write_log
END
Anti-entropy Protocol Extensions (1/2)
• Anti-entropy through transportable media
  • Off-line reconciliation via file-based anti-entropy
  • Uses a CSN and a full version vector to determine what is new
• Session guarantees and eventual consistency
  • Causal order provides session guarantees to applications
    • Each server maintains a logical clock (LC) used for accept-stamps (AS)
    • The LC advances when a new write is accepted or a higher AS is received
  • Total order ensures eventual consistency
    • Writes are propagated and stored according to the total order
    • Total order: <CSN, accept-stamp, server-id>
    • The server-id breaks ties between writes with equal ASs
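The total order <CSN, accept-stamp, server-id> maps directly onto a tuple sort key if tentative writes carry CSN = ∞, so they sort after every committed write. In this sketch server-ids are plain strings, a simplification of Bayou’s structured ids. total_order_key is a hypothetical name.

```python
from math import inf
from typing import NamedTuple


class Write(NamedTuple):
    server_id: str
    accept_stamp: int
    csn: float = inf   # inf marks a tentative write


def total_order_key(w):
    # Committed writes (finite CSN) come first in CSN order; tentative writes
    # (CSN = inf) follow in accept-stamp order; the server-id breaks ties.
    return (w.csn, w.accept_stamp, w.server_id)


writes = [Write("B", 4), Write("A", 4), Write("C", 9, 2), Write("A", 2, 1)]
ordered = sorted(writes, key=total_order_key)
```

Here the two committed writes come first in CSN order even though one has a larger accept-stamp, and the tie between the two tentative writes stamped 4 is broken by server-id.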
Anti-entropy Protocol Extensions (2/2)
• Light-weight server creation and retirement
  • Creation
    • A new server Si sends a creation write to an existing server Sk to inform other servers of its existence
    • Si thereby gets the globally unique server-id <Tk,i, Sk>, where Tk,i is the AS that Sk assigned to the creation write
  • Retirement
    • A retiring server sends a retirement write to itself and to another server
    • All of its writes are processed before its entry is removed from version vectors
  • Server Si can be absent from R’s version vector for two reasons:
    • If R.V(Sk) ≥ Tk,i, R has seen Si’s creation write, so Si must have retired and R has seen its retirement
    • If R.V(Sk) < Tk,i, R has not yet seen Si’s creation write (and therefore not its retirement either)
• Logically complete version vectors
  • R.V does not need an explicit entry for every server
  • For Si = <Tk,i, Sk>, with Tk,i the timestamp Sk assigned to Si’s creation write:
      CompleteV(Si) = V(Si),  if Si has an explicit entry in V
                    = +∞,     if Si = 0 (the first server)
                    = +∞,     if CompleteV(Sk) ≥ Tk,i
                    = −∞,     if CompleteV(Sk) < Tk,i
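CompleteV can be written as a short recursive lookup that walks up the chain of creating servers. The id encoding here is an assumption: the first server’s id is 0, and a server created by Sk with creation accept-stamp Tk,i has the id tuple (Tk,i, Sk). The name complete_v is hypothetical.

```python
from math import inf


def complete_v(v, s):
    """Logically complete version-vector lookup for server id s.

    Assumed id encoding: the first server is 0; a server created by server sk,
    whose creation write sk stamped t, has the id tuple (t, sk).
    """
    if s in v:
        return v[s]       # explicit entry
    if s == 0:
        return inf        # first server absent: treated as retired
    t, creator = s
    if complete_v(v, creator) >= t:
        return inf        # creation write seen but entry gone: s has retired
    return -inf           # creation write not seen yet: s is unknown
```

+∞ tells the sender that R needs nothing more from a retired server, while −∞ means every write from that server is still new for R.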
Discussion Issues
• Most of the properties are not special in themselves, but their combination is novel
• The ideas can be applied to systems other than Bayou
• Disadvantages of the anti-entropy design
  • Potentially large size of two data structures: version vectors and the write-log
  • Many policy decisions to be made
    • When to reconcile, with whom, and when to truncate the log
    • Periodic, manually triggered, or system-triggered reconciliation
    • Depends on server-id length, up-to-dateness, bandwidth, and write-log completeness
• Security
  • Significantly affects performance
  • Uses security metadata (certificates) to ensure that a user is allowed to make an update
  • The Bayou system relies on digital certificates and a hierarchy of trust delegations
Performance (Execution Time)
• Major performance factors
  • Network transfer is the most significant cost
  • Anti-entropy setup
  • Applying the newly received writes at the receiver
Performance (Network Independent)
• Major factors
  • Inserting all newly received writes into the receiver’s write-log
  • Applying the newly received writes to the receiver’s database
• Determining which writes are new takes negligible time
Conclusion
• The protocol is practical, implemented, and simple
• Presents the rationale for lazily propagating updates between weakly consistent replicas
• Identifies basic design decisions that support diverse networking environments and incremental progress
• Light-weight mechanism for server creation and retirement
• Policies are separated from the protocol
  • Choosing pairs of replicas to reconcile
  • When and with which replica to reconcile
Appendix - Terminology
• Bayou server
  • Storage system at each replica
• Writes
  • Form an ordered log of updates
  • Components: a set of updates, a dependency check, and a merge procedure
• Database
  • Results from the in-order execution of writes
• Write-log
  • Contains all writes the server has received from other servers or applications
• Accept-stamp
  • Timestamp or simple generation counter; together with the server-id, it defines a total order over all writes
• Accept-order
  • Defines a partial order over all writes
• Prefix property
  • Propagating writes in the partial accept-order during anti-entropy maintains a closure constraint on the set of writes known to a server
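To make the three components of a write concrete, here is a sketch of executing one write against a dictionary standing in for the database: apply the updates if the dependency check passes, otherwise apply whatever the merge procedure returns. BayouWrite, execute, and the meeting-room data are hypothetical illustrations, not Bayou’s API.

```python
from dataclasses import dataclass
from typing import Callable, Dict

DB = Dict[str, str]


@dataclass
class BayouWrite:
    updates: Dict[str, str]                 # the intended updates
    dependency_check: Callable[[DB], bool]  # precondition evaluated on the current database
    merge: Callable[[DB], Dict[str, str]]   # alternative updates if the check fails

def execute(db: DB, w: BayouWrite) -> DB:
    """Apply w to db: its updates if the dependency check passes, else the merge result."""
    changes = w.updates if w.dependency_check(db) else w.merge(db)
    db.update(changes)
    return db


# Hypothetical example: book a room at 10:00, falling back to 11:00 if 10:00 is taken.
book = BayouWrite(
    updates={"10:00": "alice"},
    dependency_check=lambda db: "10:00" not in db,
    merge=lambda db: {"11:00": "alice"} if "11:00" not in db else {},
)
```

Because execution depends on the database state, re-executing tentative writes in a different order can produce different results. This is why the stable order matters.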