Fault Tolerance and Replication This power point presentation has been adapted from: (1) web.njit.edu/~gblank/cis633/Lectures/Replication.ppt
Content • Introduction • System model and the role of group communication • Fault tolerant services • Case study: Bayou and Coda • Transaction with replicated data
Introduction • Replication • Duplicates limited or heavily loaded resources to provide access and to preserve access after failures • Replication is important for performance enhancement, increased availability, and fault tolerance.
Introduction • Replication • Performance enhancement • Data are replicated between several originating servers in the same domain • The workload is shared between the servers by binding all the server IP addresses to the site’s DNS name • It increases performance with little cost to the system
Introduction • Replication • Increased availability • Replication is a technique for automatically maintaining the availability of data despite server failures • If data are replicated at two or more failure-independent servers, then client software may be able to access data at an alternative server should the default server fail or become unreachable
Introduction • Replication • Fault tolerance • Highly available data are not necessarily correct data (they may be out of date) • A fault-tolerant service always guarantees correct behaviour despite a certain number and type of faults; correctness concerns the freshness of the data supplied to the client and the effects of the client’s operations upon the data
Introduction • Replication • Replication requirements: • Transparency • Users should not need to be aware that data are replicated, and the performance and utility of information retrieval should not differ noticeably from those of unreplicated data • Consistency • Different copies of replicated data should be the same. When data are changed, the change is propagated to all replica servers
System Model & The Role of Group Communication • Introduction • The data in the system are composed of objects (e.g., files, components, Java objects, etc.) • Each logical object is implemented by a collection of physical objects called replicas, each stored on a computer. • The replicas of a given object are not necessarily identical, at least not at any particular point in time. Some replicas may have received updates that others have not received.
System Model & The Role of Group Communication • System Model • Replica managers (RM) • Components that contain the replicas on a particular computer and perform operations on them. • Front ends (FE) • Components that handle clients’ requests • They communicate with one or more of the replica managers by message passing • A front end may be implemented in the client’s address space, or it may be a separate process
System Model & The Role of Group Communication • System Model • Five phases in performing a request upon replicated objects [Wiesmann et al. 2000] • Request: the front end issues the request to one or more RMs, either by communicating with a single RM, which in turn communicates with the other RMs, or by multicasting to all of them. • Coordination: the RMs coordinate in preparation for executing the request consistently. This may require ordering the operations. • Execution: the RMs execute the request, possibly tentatively so that its effects can be undone later. • Agreement: the RMs reach agreement on the effect of the request. • Response: one or more RMs pass a response back to the front end.
System Model & The Role of Group Communication • The role of group communication • Managing RMs through group communication is complex, especially in the case of dynamic groups, whose membership changes as RMs join, leave, or fail. • A group membership service may be used to manage the addition and removal of replica managers and to detect and recover from crashes and faults.
System Model & The Role of Group Communication • The role of group communication • Tasks of a Group Membership Service • Provide an interface for group membership changes • Implement a failure detector • Notify members of group membership changes • Perform group address expansion for multicast delivery of messages.
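The four tasks listed above can be pictured with a small in-process sketch. It is only an illustration: the class name `GroupMembershipService` and its methods are hypothetical, failure detection is reduced to an external `report_failure` call, and real services must handle message loss and concurrent view changes.

```python
from typing import Callable, Dict, List, Set


class GroupMembershipService:
    """Toy, single-process stand-in for a group membership service (hypothetical API)."""

    def __init__(self) -> None:
        # member id -> callback used to deliver a message to that member
        self._members: Dict[str, Callable[[str], None]] = {}
        self._listeners: List[Callable[[Set[str]], None]] = []

    def on_view_change(self, listener: Callable[[Set[str]], None]) -> None:
        """Register a listener that is told about every membership change."""
        self._listeners.append(listener)

    def _notify(self) -> None:
        view = set(self._members)
        for listener in self._listeners:
            listener(view)

    def join(self, member_id: str, deliver: Callable[[str], None]) -> None:
        """Interface for membership changes: add a member."""
        self._members[member_id] = deliver
        self._notify()

    def leave(self, member_id: str) -> None:
        """Interface for membership changes: remove a member."""
        if self._members.pop(member_id, None) is not None:
            self._notify()

    def report_failure(self, member_id: str) -> None:
        """Stand-in for a failure detector: a suspected member is excluded."""
        self.leave(member_id)

    def multicast(self, message: str) -> List[str]:
        """Group address expansion: deliver to every member of the current view."""
        recipients = sorted(self._members)
        for member_id in recipients:
            self._members[member_id](message)
        return recipients
```

A sender addresses the group as a whole; the service expands that address into the current membership, so a member reported as failed simply stops receiving messages.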
System Model & The Role of Group Communication • The role of group communication • [Figure: services provided for process groups. Labels: group address expansion, multicast communication, group membership management, group send, join, leave, fail, process group.]
Fault Tolerant Services • Introduction • Replicating data and functionality at replica managers can be used to provide a service that is correct despite process failures • A replicated service is correct if it keeps responding despite faults and if clients cannot tell the difference between it and a service that holds a single copy of the data.
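Fault Tolerant Services • Introduction • One correctness criterion for replicated objects is linearizability • Operations are assumed to be synchronous: a client must wait for one operation to complete before starting another. • A replicated shared object is linearizable if, for any execution, some interleaving of the clients’ operations meets the specification of a single correct copy of the object and is consistent with the real times at which the operations occurred • A replicated shared object is sequentially consistent if, for any execution, some interleaving of the operations meets the specification of a single correct copy and the order of the operations is consistent with the program order in which each client performed them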
Fault Tolerant Services • Update process • Read-only requests have no impact on the replicated object • Updates must be managed carefully to avoid inconsistency between replicas. • A strategy to avoid inconsistency • Make all updates to a primary copy of the data and copy them to the other replicas (passive replication). • If the primary fails, one of the backups is promoted to act as primary.
Fault Tolerant Services • Passive (primary-backup) replication • The sequence of events when a client requests an operation • Request: the front end issues the request, containing a unique identifier, to the primary replica manager. • Coordination: the primary takes each request atomically, in the order received, and checks the identifier; if it has already executed the request, it re-sends the stored response. • Execution: the primary executes the request and stores the response. • Agreement: if the request is an update, the primary sends the update to the backups, which apply it and acknowledge. • Response: the primary responds to the front end, which passes the response to the client.
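The five steps above can be sketched as follows. This is a minimal single-process illustration, not a fault-tolerant implementation: the class names `Primary` and `Backup`, the key/value state, and the `"read"`/`"write"` operations are assumptions made for the example, and promotion of a backup after a primary crash is left out.

```python
from typing import Any, Dict, List


class Backup:
    """Backup RM: applies updates pushed by the primary and acknowledges them."""

    def __init__(self) -> None:
        self.state: Dict[str, Any] = {}
        # Responses are kept so that a promoted backup could answer retried requests.
        self.responses: Dict[str, Any] = {}

    def apply_update(self, request_id: str, key: str, value: Any, response: Any) -> bool:
        self.state[key] = value
        self.responses[request_id] = response
        return True  # acknowledgement


class Primary:
    """Primary RM: the only replica that executes client requests."""

    def __init__(self, backups: List[Backup]) -> None:
        self.state: Dict[str, Any] = {}
        self.responses: Dict[str, Any] = {}
        self.backups = backups

    def handle(self, request_id: str, op: str, key: str, value: Any = None) -> Any:
        # Coordination: a duplicate request is answered from the stored response.
        if request_id in self.responses:
            return self.responses[request_id]
        # Execution.
        if op == "write":
            self.state[key] = value
            response: Any = "ok"
            # Agreement: push the update to every backup and wait for acknowledgements.
            acks = [b.apply_update(request_id, key, value, response) for b in self.backups]
            if not all(acks):
                raise RuntimeError("a backup failed to acknowledge the update")
        elif op == "read":
            response = self.state.get(key)
        else:
            raise ValueError(f"unknown operation: {op}")
        # Response: remember the reply so that a retry is not executed twice.
        self.responses[request_id] = response
        return response
```

Because the primary records each response under its request identifier, a front end that retries after a lost reply gets the original answer and the update is not applied a second time.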
Fault Tolerant Services • Passive (primary-backup) replication • It provides fault tolerance at a cost in performance. • Updating the replicas carries a high overhead, so performance is lower than with non-replicated objects. • To reduce this cost: • Allow read-only requests to be made to backup RMs, but send all updates to the primary. • This is of limited value for transaction processing systems but very effective for decision support systems (mostly read-only requests).
Fault Tolerant Services • Active Replication • Active replication steps: • Request: the front end attaches a unique identifier to the request and multicasts it to the RMs using a totally ordered, reliable multicast. The front end is assumed to fail, at worst, by crashing. • Coordination: every correct RM receives the request in the same total order. • Execution: every RM executes the request. • Agreement: no agreement phase is needed, because of the multicast delivery semantics. • Response: each RM sends its response to the front end, which handles the responses according to the failure assumptions and the multicast algorithm.
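A rough sketch of these steps, under strong simplifying assumptions: `OrderedMulticast` is a hypothetical in-process sequencer standing in for a totally ordered, reliable multicast, every `ReplicaManager` is a deterministic state machine over one integer, and only crash failures are considered, so the front end returns the first response it receives.

```python
from typing import Dict, List


class ReplicaManager:
    """Deterministic state machine holding a single integer."""

    def __init__(self) -> None:
        self.value = 1
        self.replies: Dict[str, int] = {}  # request id -> response already produced

    def deliver(self, request_id: str, op: str, arg: int) -> int:
        # A request that was already executed is answered from the stored reply.
        if request_id not in self.replies:
            if op == "add":
                self.value += arg
            elif op == "mul":
                self.value *= arg
            else:
                raise ValueError(f"unknown operation: {op}")
            self.replies[request_id] = self.value
        return self.replies[request_id]


class OrderedMulticast:
    """Hypothetical sequencer: every RM sees requests in the same order."""

    def __init__(self, replicas: List[ReplicaManager]) -> None:
        self.replicas = replicas
        self.sequence: List[str] = []  # the agreed total order of request ids

    def multicast(self, request_id: str, op: str, arg: int) -> List[int]:
        self.sequence.append(request_id)
        return [rm.deliver(request_id, op, arg) for rm in self.replicas]


class FrontEnd:
    def __init__(self, name: str, group: OrderedMulticast) -> None:
        self.name = name
        self.group = group
        self.counter = 0

    def request(self, op: str, arg: int) -> int:
        # Request: attach a unique identifier and multicast to the whole group.
        self.counter += 1
        request_id = f"{self.name}-{self.counter}"
        responses = self.group.multicast(request_id, op, arg)
        # Response: with crash failures only, the first reply is enough.
        return responses[0]
```

The two operations do not commute (adding then multiplying differs from multiplying then adding), which is why the replicas only stay identical if they all process requests in the same order.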
Fault Tolerant Services • Active Replication • The model assumes totally ordered and reliable multicast. • Implementing such a multicast is equivalent to solving consensus, which requires either a synchronous system or a technique such as failure detectors in an asynchronous system. • The model can be simplified if updates are assumed to be commutative, so that the effect of two operations is the same in either order. • E.g. a bank account: daily deposits and withdrawals can be applied in any order unless the balance would go below zero. If overdrafts are avoided, the operations commute.
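The bank account example can be checked by brute force. The sketch below applies the same set of deposits and withdrawals in every possible order; the function names and amounts are made up for illustration, and the overdraft rule (reject a withdrawal that would make the balance negative) is an assumption about how the account behaves.

```python
from itertools import permutations
from typing import Iterable, Set


def apply_all(start: int, amounts: Iterable[int]) -> int:
    """Apply deposits (positive) and withdrawals (negative) unconditionally."""
    balance = start
    for amount in amounts:
        balance += amount
    return balance


def apply_with_overdraft_check(start: int, amounts: Iterable[int]) -> int:
    """Reject any withdrawal that would take the balance below zero."""
    balance = start
    for amount in amounts:
        if balance + amount >= 0:
            balance += amount
    return balance


def final_balances(start: int, amounts: Iterable[int], apply) -> Set[int]:
    """Final balances reachable over all orderings of the same operations."""
    return {apply(start, order) for order in permutations(list(amounts))}


# No ordering can overdraw the account: every order gives the same result.
safe = final_balances(100, [50, -30, 20], apply_with_overdraft_check)      # {140}

# Some orderings would overdraw: the result now depends on the order.
unsafe = final_balances(50, [-80, 50], apply_with_overdraft_check)         # {20, 100}
```

When no ordering can trigger the overdraft rule, replicas may apply the updates in different orders and still converge; once a rejection is possible, ordering matters again and total ordering is needed.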
Case study: Bayou and Coda • Introduction • Replication techniques can be applied to make services highly available • The aim is to give clients access to the service with reasonable response times • Fault-tolerant systems propagate updates eagerly: all correct RMs receive each update as soon as possible. • The resulting delay may be unacceptable for highly available systems. • It may be preferable to improve response times by contacting only a minimal set of RMs and propagating updates to the others more slowly (but still acceptably). • Weaker consistency tends to require less agreement and provides more availability.
Case study: Bayou and Coda • Bayou • Bayou is an approach to high availability. • Users working in a disconnected fashion can make updates in any partition at any time, with the updates recorded at any replica manager. • The replica managers are required to detect and manage conflicts when two partitions are rejoined and their updates are merged. • Domain-specific policies, called operational transformations, are used to resolve conflicts by giving priority to some partitions.
Case study: Bayou and Coda • Bayou • Bayou holds state values in a database to support queries and updates. • Updates are a special case of a transaction, using the equivalent of a stored procedure to guarantee the ACID properties. • Eventually every RM gets the same set of updates and applies them so that their databases are identical. • However, since propagation is delayed, in an active system with a steady stream of updates the databases may never actually be identical.
Case study: Bayou and Coda • Bayou • Bayou update resolution • Updates are marked as tentative when they are first applied to a database. • Once coordination with the other RMs makes it possible to resolve conflicts and place the updates in a canonical order, they are committed. The canonical order is usually established by designating a primary RM. • Once committed, updates remain applied in their allotted order. • Every update includes a dependency check, which detects whether applying the update would cause a conflict, and a merge procedure, which adapts the update to resolve the conflict.
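The tentative/committed distinction and the two parts of an update can be sketched as below. Everything here is illustrative rather than Bayou's actual interface: the room-booking scenario, the `Update` and `BayouReplica` names, and the choice to recompute the database from scratch (which models undoing and reapplying tentative updates when the committed order changes) are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Database = Dict[str, str]  # time slot -> person who holds the booking


@dataclass
class Update:
    update_id: str
    apply: Callable[[Database], None]             # the intended operation
    dependency_check: Callable[[Database], bool]  # True if there is no conflict
    merge: Callable[[Database], None]             # alternative used on conflict


def book(update_id: str, who: str, slot: str, fallback: str) -> Update:
    """Build a booking update that falls back to another slot on conflict."""

    def apply(db: Database) -> None:
        db[slot] = who

    def check(db: Database) -> bool:
        return slot not in db

    def merge(db: Database) -> None:
        if fallback not in db:
            db[fallback] = who

    return Update(update_id, apply, check, merge)


class BayouReplica:
    def __init__(self) -> None:
        self.committed: List[Update] = []  # canonical order, never reordered
        self.tentative: List[Update] = []  # local arrival order, may be reordered

    def accept(self, update: Update) -> None:
        """Record a new update as tentative."""
        self.tentative.append(update)

    def commit(self, update_id: str) -> None:
        """The primary has fixed this update's place in the canonical order."""
        for update in self.tentative:
            if update.update_id == update_id:
                self.tentative.remove(update)
                self.committed.append(update)
                return
        raise KeyError(update_id)

    def database(self) -> Database:
        """Replay committed updates first, then the tentative ones."""
        db: Database = {}
        for update in self.committed + self.tentative:
            if update.dependency_check(db):
                update.apply(db)
            else:
                update.merge(db)
        return db
```

If two users tentatively book the same slot, the one that the primary commits first keeps it and the other falls back to its alternative slot, even if the replica first saw the updates in the opposite order.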
Case study: Bayou and Coda • Bayou • In Bayou, replication is not transparent to the application. • Knowledge of the application semantics is required to increase data availability while maintaining a replication state that can be called eventually sequentially consistent. • Disadvantages include increased complexity for the application programmers and the users. • The operational transformation approach is particularly suited for groupware, where workers access documents remotely.
Case study: Bayou and Coda • Coda • The Coda file system is a descendant of the Andrew File System (AFS). • It addresses several requirements that AFS does not meet, particularly the requirement to provide high availability despite disconnected operation. • It was developed in a research project at Carnegie Mellon University. • A growing number of AFS users worked on laptops, which created a need to support disconnected use of replicated data and to increase performance and availability.
Case study: Bayou and Coda • Coda • The Coda architecture: • Coda has Venus processes at the client computers and Vice processes at the file servers. • The Vice processes are replica managers. • A set of servers holding replicas of a file volume is a volume storage group (VSG). • Clients access a subset known as the available volume storage group (AVSG), which varies as servers are connected or disconnected. • Updates are distributed by broadcasting them to the AVSG when a modified file is closed. • If the AVSG is empty (disconnected operation), the client works from its cached copies of files until it is reconnected.
Case study: Bayou and Coda • Coda • Coda uses an optimistic replication strategy: • Files can be updated when the network is partitioned or during disconnected operation. • A Coda version vector (CVV) is a vector timestamp, with one element per server in the VSG, that is kept with each replica of a file and used to determine whether there are conflicts among updates at the time of reconnection. • If there is no conflict, the updates are propagated. • Coda does not attempt to resolve conflicts itself. • If there is a conflict, the file is marked inoperable and the owner of the file is notified. This is done at the AVSG level, so conflicts may recur at the VSG level.
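The conflict test on version vectors can be sketched as a component-wise comparison. The function name `compare_cvv`, the dictionary representation (server name to update count), and the returned strings are choices made for this example, not Coda's actual data structures.

```python
from typing import Dict


def compare_cvv(a: Dict[str, int], b: Dict[str, int]) -> str:
    """Compare two version vectors; a missing server counts as zero updates."""
    servers = set(a) | set(b)
    a_ge = all(a.get(s, 0) >= b.get(s, 0) for s in servers)
    b_ge = all(b.get(s, 0) >= a.get(s, 0) for s in servers)
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "a dominates"  # replica a has seen every update that b has seen
    if b_ge:
        return "b dominates"
    return "conflict"         # each side has updates the other has not seen


# Servers s1 and s2 were updated in one partition, s3 in the other.
print(compare_cvv({"s1": 2, "s2": 2, "s3": 1}, {"s1": 1, "s2": 1, "s3": 2}))  # conflict
print(compare_cvv({"s1": 2, "s2": 2, "s3": 1}, {"s1": 1, "s2": 1, "s3": 1}))  # a dominates
```

When one vector dominates, the newer replica can simply replace the older one; only the incomparable case is a conflict, which Coda reports rather than resolves.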
Transaction with Replicated Data • Introduction • From the client’s viewpoint, transactions on replicated objects should appear the same as transactions on non-replicated objects • Client transactions are interleaved in a serially equivalent manner. • One-copy serializability: • The effect of transactions performed on replicated objects is the same as if they had been performed one at a time on a single set of objects
Transaction with Replicated Data • Introduction • Three replication schemes that cope with network partitions: • Available copies with validation: • Available copies replication is applied in each partition. When a partition is repaired, a validation procedure is applied and any inconsistencies are dealt with. • Quorum consensus: • A subgroup must have a quorum (sufficient members) in order to be allowed to continue providing a service in the presence of a partition. When a partition is repaired (and when a replica manager restarts after a failure), replica managers bring their objects up to date by means of recovery procedures. • Virtual partition: • A combination of quorum consensus and available copies. If a virtual partition has a quorum, it can use available copies replication.
Transaction with Replicated Data • Available copies • Allows for some RMs to be unavailable. • A read may be performed at any available RM; an update must be made to all available replicas of the data, with provisions to restore and update an RM that has crashed.
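The read-one, write-all-available rule can be sketched as follows. This is a deliberately reduced illustration: the class names are hypothetical, availability is a simple flag, and the transactional machinery (locking and the validation needed to keep one-copy serializability when RMs fail during a transaction) is omitted.

```python
from typing import Any, Dict, List


class ReplicaManager:
    def __init__(self, name: str) -> None:
        self.name = name
        self.available = True
        self.data: Dict[str, Any] = {}


class AvailableCopies:
    def __init__(self, rms: List[ReplicaManager]) -> None:
        self.rms = rms

    def _available(self) -> List[ReplicaManager]:
        live = [rm for rm in self.rms if rm.available]
        if not live:
            raise RuntimeError("no replica manager is available")
        return live

    def read(self, key: str) -> Any:
        """A read is served by any single available RM."""
        return self._available()[0].data.get(key)

    def write(self, key: str, value: Any) -> List[str]:
        """An update goes to every available RM; returns the RMs updated."""
        live = self._available()
        for rm in live:
            rm.data[key] = value
        return [rm.name for rm in live]

    def recover(self, rm: ReplicaManager) -> None:
        """Bring a restarted RM up to date from an available RM before it serves requests."""
        source = self._available()[0]
        rm.data = dict(source.data)
        rm.available = True
```

An RM that was down during an update holds stale data, which is why it must be refreshed from a live RM before it is marked available again.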
Transaction with Replicated Data • Available copies with validation • An optimistic approach that allows updates in different partitions of a network. • When the partition is corrected, conflicts must be detected and compensating actions must be taken. • This approach is limited to situations in which such compensation is possible.
Transaction with Replicated Data • Quorum consensus • A pessimistic approach to replicated transactions. • A quorum is a subgroup of RMs whose size gives it the right to carry out operations even if some RMs are unavailable. • Quorums are chosen so that at most one subgroup on either side of a partition can perform updates; the other RMs are brought up to date after the partition is repaired. • Gifford’s file replication: • A quorum scheme in which a number of votes is assigned to each copy of a replicated file. • A read quorum of R votes must be collected before a read and a write quorum of W votes before a write, where W is more than half the total votes and R + W exceeds the total votes. • The remaining RMs are updated as a background task when they are available. • Copies that hold no votes are weak copies: they may be read locally, subject to limits on how current they can be assumed to be.
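The vote arithmetic can be made concrete with a short sketch. The function names, the vote assignment, and the `(version, value)` representation of a copy are assumptions for illustration; the two conditions checked are the standard ones for weighted voting, namely that a write quorum exceeds half the total votes and that a read quorum and a write quorum together exceed the total.

```python
from typing import Dict, Iterable, Tuple

Copy = Tuple[int, str]  # (version number, value)


def quorums_are_valid(votes: Dict[str, int], read_quorum: int, write_quorum: int) -> bool:
    """W > total/2 prevents two disjoint write quorums; R + W > total makes
    every read quorum overlap every write quorum."""
    total = sum(votes.values())
    return 2 * write_quorum > total and read_quorum + write_quorum > total


def votes_held(votes: Dict[str, int], reachable: Iterable[str]) -> int:
    """Total votes held by the copies that can currently be contacted."""
    return sum(votes[name] for name in reachable)


def read_latest(copies: Dict[str, Copy], votes: Dict[str, int],
                reachable: Iterable[str], read_quorum: int) -> Copy:
    """Read from a quorum and return the copy with the highest version."""
    names = list(reachable)
    if votes_held(votes, names) < read_quorum:
        raise RuntimeError("read quorum not reachable")
    return max((copies[name] for name in names), key=lambda copy: copy[0])


votes = {"a": 2, "b": 1, "c": 1}           # 4 votes in total
print(quorums_are_valid(votes, 2, 3))      # True
print(quorums_are_valid(votes, 1, 3))      # False: a read could miss the last write
print(quorums_are_valid(votes, 2, 2))      # False: two partitions could both write
```

With R = 2 and W = 3, a write that reached copies a and b (3 votes) is always visible to a later read of b and c (2 votes), because the two quorums must share at least one copy.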
Transaction with Replicated Data • Virtual Partition Algorithm • This approach combines Quorum Consensus to handle partitions and Available Copies for faster read operations. • A virtual partition is an abstraction of a real partition and contains a set of replica managers.
Transaction with Replicated Data • Virtual Partition Algorithm • Issues: • If network partitions are intermittent, different virtual partitions can form at the same time. • Overlapping virtual partitions violate one-copy serializability. • Logical timestamps are used to choose among competing attempts to create a virtual partition: the attempt with the higher timestamp prevails, which yields consistent virtual partitions. This works well where partitions are uncommon.