Highly Available Services and Transactions with Replicated Data Jason Lenthe
Highly Available Services (1 of 17) • What is availability? • The percentage of time that a service is “up” • What is highly available? • Availability close to 100% • With reasonable response times • May not conform to sequential consistency
Highly Available Services (2 of 17) • The Gossip Architecture • Is a framework for implementing highly available services • Replica Managers periodically “gossip” with each other to convey the updates they have received [Diagram: clients send requests to front ends, which forward them to replica managers (RMs); the RMs exchange gossip messages with one another]
Highly Available Services (3 of 17) • The Gossip Architecture (cont'd) • Outline for Processing Queries and Updates: • 1) Request – Front end sends the request to a replica manager • 2) Update Response – If the request is an update, the replica manager replies as soon as it has received the request • 3) Coordination – Replica managers “gossip” (send gossip messages) • 4) Execution – The replica manager executes the request • 5) Query Response – If the request is a query, the replica manager responds • 6) Agreement – More gossip messages may be sent out; generally, a lazy approach is taken
Highly Available Services (4 of 17) • The Gossip Architecture (cont'd) • Each front end maintains a vector timestamp for each value that it has accessed • Records the latest update the front end has seen from each replica manager • Is sent as part of every query/update • Each replica manager uses the received vector timestamp to determine whether it is up to date (a sketch follows below) • If it is not, it can wait for the missing updates or request them explicitly
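To make the timestamp check concrete, here is a minimal Python sketch, assuming timestamps are plain dicts mapping replica-manager names to update counts; the class and function names are illustrative, not the textbook's exact algorithm.

```python
# Minimal sketch (illustrative, not the textbook's algorithm) of the
# timestamp check a replica manager performs on an incoming query.
# A vector timestamp is a dict: replica-manager name -> update count.

def dominates(ts_a, ts_b):
    """True if ts_a records at least every update that ts_b records."""
    return all(ts_a.get(rm, 0) >= n for rm, n in ts_b.items())

def merge(ts_a, ts_b):
    """Componentwise maximum: the combined knowledge of both timestamps."""
    return {rm: max(ts_a.get(rm, 0), ts_b.get(rm, 0))
            for rm in set(ts_a) | set(ts_b)}

class ReplicaManager:
    def __init__(self):
        self.state = {}      # the replicated application data
        self.value_ts = {}   # updates already reflected in self.state
        self.pending = []    # queries held back until gossip catches us up

    def handle_query(self, key, frontend_ts):
        if dominates(self.value_ts, frontend_ts):
            # Already as current as the front end: answer now, returning
            # value_ts so the front end can merge it into its own.
            return self.state.get(key), dict(self.value_ts)
        self.pending.append((key, frontend_ts))  # wait for missing updates
        return None
```

On the front-end side, the returned timestamp would be combined with the front end's own via merge (componentwise maximum), so that later requests carry its latest knowledge.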
Highly Available Services (5 of 17) • The Gossip Architecture (cont'd) • Examples of the Gossip architecture? • The textbook is skimpy on examples in this section, but... • Suggests a bulletin board service • Clients may have different views of the bulletin board at any given time if the network is partitioned • All messages will eventually be propagated to every replica manager
Highly Available Services (6 of 17) • The Gossip Architecture – Conclusions • Clients can operate even when the network is partitioned (as long as at least one replica manager is accessible) • The lazy approach makes it inappropriate for near-real-time collaboration • Not particularly scalable • Messages transmitted per update = 2 + (R – 1)/G, where R is the number of replica managers and G is the number of updates packed into each gossip message (e.g., with R = 5 and G = 4, each update costs 2 + 4/4 = 3 messages)
Highly Available Services (7 of 17) • The Bayou System • Another framework for providing highly available services • Uses Operational Transformation • Allows domain-specific conflict detection and conflict resolution
Highly Available Services (8 of 17) • The Bayou System (con't) • Updates have two states: • Tentative – may be undone or reapplied as the system becomes consistent • Committed – cannot be undone
Highly Available Services (9 of 17) • The Bayou System (cont'd) • Uses application-specific dependency checks and merge procedures • Dependency checks determine whether a new update conflicts with an update that has already been applied • The merge procedure produces an alternative update that does not conflict with what has already been applied (a sketch follows below)
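A minimal sketch of how such an update might be applied, assuming each update bundles its dependency check and merge procedure as callables; the room-booking scenario and all names are illustrative, not Bayou's actual API.

```python
# Each Bayou-style update carries application-specific conflict logic.
# The booking example and field names are illustrative.

def apply_update(db, update):
    if update["dep_check"](db):   # does the expected state still hold?
        update["apply"](db)       # no conflict: apply as written
    else:
        update["merge"](db)       # conflict: apply the merge result instead

bookings = {}                     # (room, hour) -> person

# Tentatively book room A at 10:00; if it is taken, fall back to room B.
update = {
    "dep_check": lambda db: ("A", 10) not in db,
    "apply":     lambda db: db.update({("A", 10): "alice"}),
    "merge":     lambda db: db.update({("B", 10): "alice"}),
}

apply_update(bookings, update)
print(bookings)   # {('A', 10): 'alice'} here; room B if A had been taken
```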
Highly Available Services (10 of 17) • The Bayou System – Conclusions • Uses application-specific logic to produce an eventually sequentially consistent state • Complicated for the application programmer and user • Programmer needs to provide dependency check and merge procedures • User needs to deal with tentative data • Generally limited to applications where • Conflicts are rare • Data semantics are simple
Highly Available Services (11 of 17) • The Coda File System • Coda is basically a highly available version of AFS • Aims to provide constant data availability • Good for mobile environments • Follows an optimistic strategy – conflicts are not likely
Highly Available Services (12 of 17) • The Coda File System (con't) • Architecture • Venus – client process • Vice – server process • Volume Storage Group (VSG) – the set of servers that have a copy of a particular file volume • Available Volume Storage Group (AVSG) – the subset of the VSG for a file volume that is accessible
Highly Available Services (13 of 17) • The Coda File System (cont'd) • Basic Operation • On open: • Venus gets the file from its local cache, or • Determines which server in the AVSG has the most recent version (the preferred server) and gets the file (and callback promises) from there • On close (after modification): • Venus sends the updated file to every server in the AVSG using multicast RPC • But some servers in the VSG might not be in this client's AVSG, so they miss the update... (a sketch follows below)
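A rough sketch of the open/close path described above, with servers modeled as plain dicts and callback promises/multicast RPC abstracted away; all names are hypothetical.

```python
# Servers are modeled as dicts: {'files': {path: (version, data)}}.
# Assumes the file exists somewhere in the AVSG; callbacks and
# multicast RPC are abstracted away.

def open_file(path, cache, avsg):
    if path in cache:
        return cache[path]   # serve from the local cache
    # Pick the server holding the most recent version (the preferred server).
    preferred = max(avsg, key=lambda s: s["files"].get(path, (0, None))[0])
    _, data = preferred["files"][path]
    cache[path] = data
    return data

def close_file(path, data, cache, avsg):
    cache[path] = data
    for server in avsg:      # "multicast" the new version to the AVSG
        version = server["files"].get(path, (0, None))[0]
        server["files"][path] = (version + 1, data)
    # Servers in the VSG but outside this AVSG miss the update; the CVV
    # probes on the next slide exist to detect exactly that.
```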
Highly Available Services (14 of 17) • The Coda File System (cont'd) • Venus periodically sends out a probe for each file in its cache • This determines the AVSG for each file • Each server responds with its CVV (Coda version vector) • Summarizes the updates applied to the files in the volume • Mismatches reveal servers that have missed updates (a sketch follows below)
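A sketch of the probe-time check, with each CVV reduced to a dict of per-server update counts (illustrative; real CVVs carry more structure).

```python
# CVVs reduced to dicts of per-server update counts (illustrative).

def cvvs_match(cvvs):
    """All servers agree iff every returned CVV is identical."""
    return all(cvv == cvvs[0] for cvv in cvvs)

# Responses to one probe: server s3 missed the fourth update.
responses = [
    {"s1": 4, "s2": 4, "s3": 4},
    {"s1": 4, "s2": 4, "s3": 4},
    {"s1": 3, "s2": 3, "s3": 3},
]
if not cvvs_match(responses):
    print("mismatch detected: schedule resolution for this volume")
```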
Highly Available Services (15 of 17) • The Coda File System (cont'd) • Disconnected operation is supported (when the AVSG is empty) • The user specifies which files Venus should make available during periods of disconnection • When connectivity is restored, the reintegration process begins • Conflicts are detected and files are flagged for manual resolution
Highly Available Services (16 of 17) • The Coda File System (cont'd) • Performance: Coda vs. AFS • With no replication: about the same • With three-fold replication: • For 5 users, Coda increases benchmark time by 5% • Going to 50 users, Coda increases benchmark time by 70% while AFS increases it by 16%
Highly Available Services (17 of 17) • The Coda File System – Summary • Coda provides a highly available filesystem that keeps working during periods of disconnection • Requires some user interaction • Identifying files to be available while disconnected • Manually resolving occasional update conflicts • Does not perform as well as AFS
Transactions with Replicated Data (1 of 6) • The goal of normal distributed transactions is serial equivalence • When replicated data is involved, one-copy serializability is needed • Which means the effect of the transactions is the same as if they were • Performed one at a time • On a single set of objects
Transactions with Replicated Data (2 of 6) • Architectural Issues • Eager vs. Lazy Update Propagation • Eager – propagate updates to all replica managers during the transaction (before commit) • Lazy – commit the transaction and propagate the updates later • A two-phase commit protocol is needed • Primary Copy Replication • Only one replica manager at a time interacts with front ends • All other replica managers are backups (one can become the primary if the current one fails) • (A sketch of eager, primary-copy propagation follows below)
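A sketch of eager propagation through a primary copy, with two-phase commit collapsed to a prepare/commit pair; all class and method names are illustrative.

```python
# Eager, primary-copy propagation with a toy two-phase commit.
# All class and method names are illustrative.

class Backup:
    def __init__(self):
        self.state = {}
    def prepare(self, key, value):
        return True              # vote yes if the update can be applied
    def commit(self, key, value):
        self.state[key] = value

class Primary:
    def __init__(self, backups):
        self.state, self.backups = {}, backups

    def update(self, key, value):
        # Phase 1 (eager): every backup must vote yes before commit.
        if not all(b.prepare(key, value) for b in self.backups):
            return False         # abort the transaction
        # Phase 2: commit at the backups, then locally.
        for b in self.backups:
            b.commit(key, value)
        self.state[key] = value
        return True

primary = Primary([Backup(), Backup()])
print(primary.update("x", 42))   # True: all replicas now hold x = 42
```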
Transactions with Replicated Data (3 of 6) • Schemes for Dealing with Network Partitions • Available copies with validation • Quorum consensus • Virtual partition
Transactions with Replicated Data (4 of 6) • Available Copies with Validation Method • Reads are serviced by any available replica manager • Updates must be performed by all available replica managers (some replica managers may be unavailable) • When the network is partitioned, each partition can carry out transactions • When the network is repaired, conflicts may have arisen • Conflicts are eliminated by aborting one of the conflicting transactions
Transactions with Replicated Data (5 of 6) • Quorum Consensus Method • Only one of the network partitions has the right to carry on with transactions • When the network is repaired, replica managers are brought up to date with those in the quorum • The quorum is determined by a voting algorithm applied to each operation request (a sketch follows below)
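A sketch of the vote check behind quorum consensus, in the style of Gifford's weighted voting: choosing R + W > N and 2W > N guarantees that conflicting quorums overlap, so at most one partition can assemble a write quorum. The numbers are illustrative.

```python
# Weighted voting with N total votes: a read needs R votes, a write needs
# W votes; R + W > N and 2W > N ensure any two conflicting quorums share
# at least one replica.

def quorums_valid(n, r, w):
    return r + w > n and 2 * w > n

def can_proceed(votes_reachable, quorum):
    """A partition may perform an operation only if it can gather a quorum."""
    return votes_reachable >= quorum

N, R, W = 5, 3, 3
assert quorums_valid(N, R, W)
print(can_proceed(3, W))   # True: the majority partition can accept writes
print(can_proceed(2, W))   # False: the minority partition must wait
```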
Transactions with Replicated Data (6 of 6) • Virtual Partition Method • Combines the available copies method with the quorum consensus method • A new virtual partition is created on write failure • If a virtual partition has a quorum, transactions can proceed