500 likes | 642 Views
Consistency and Replication. Chapter 6. Part II Distribution Models & Consistency Protocols. Distribution Models. How do we propagate updates between replicas? Replica Placement: Where is a replica placed? When is a replica created? Who creates the replica?. Types of Replicas.
E N D
Consistency and Replication Chapter 6 Part II Distribution Models & Consistency Protocols
Distribution Models • How do we propagate updates between replicas? • Replica Placement: • Where is a replica placed? • When is a replica created? • Who creates the replica?
Types of Replicas • The logical organization of different kinds of copies of a data store into three concentric rings.
Permanent Replicas • Initial set of replicas • Other replicas can be created from them • Small and static set • Example: Web site horizontal distribution • Replicate Web site on a limited number of machines on a LAN • Distribute request in round-robin • Replicate Web site on a limited number of machines on a WAN (mirroring) • Clients choose which sites to talk to
Server-Initiated Replicas (1) • Dynamically created at the request of the owner of the DS • Example: push-caches • Web hosting servers can dynamically create replicas close to the demanding client • Need dynamic policy to create and delete replicas • One is to keep track of Web page hits • Keep a counter and access-origin list for each page
Server-Initiated Replicas (2) • Counting access requests from different clients.
Server-Initiated Replicas (3) • Each server can determine which is the closest server to a client • Hits from clients who have the same closest server will go to the same counter • cntQ(P,F) = count of hits to replica F hosted at server Q, initiated by P’s clients • If cntQ(P,F) drops below a threshold D, delete replica F from Q
Server-Initiated Replicas (4) • If cntQ(P,F) drops below a threshold R, replicate or migrate F to P • Choose D and R with care • Do not allow deletion on permanent-replicas • If for some server P, cntQ(P,F) exceeds more than half the total requests for F, migrate F from Q to P • What if it cannot be migrated to P?
Client-Initiated Replicas • These are caches • Temporary storage (expire fast) • Manage by clients • Cache hit: return data from cache • Cache miss: load copy from original server • Kept on the client machine, or at the same LAN • Multi-level caches
Design Issues for Update Propagation • A client sends an update to a replica. The update is then propagated to all other replicas • Propagate state or operation • Pull or Push protocols • Unicast or multicast propagation
State versus Operation Propagation (1) • Propagate a notification of update • Invalidate protocols • When data item x is changed at a replica, it is invalidated at other replicas • An attempt to read the item causes an “item-fault” triggering updating the local copy before the read can complete • the update is done based on the consistency model supported • Uses little network bandwidth • Good when read-to-write ratio is low
State versus Operation Propagation (2) • Transfer modified data • Uses high network bandwidth • Good when read-to-write ratio is high • Propagate the update operation • Each replica must have a process capable of performing the update • Uses very low network bandwidth • Good when read-to-write ratio is high
Pull versus Push • Push: updates are propagated to other replicas without solicitation • Typically, from permanent to server-initiated replicas • Used to achieve a high degree of consistency • Efficient when read-to-write ratio is high • Pull: A replica asks another for an update • Typically, from client-initiated replica • Efficient when read-to-write ratio is low • Cache misses result in more response time
Pull versus Push Protocols • A comparison between push-based and pull-based protocols in the case of multiple client, single server systems.
Unicast versus Multicast • Unicast: a replica sends separate n-1 updates to every other replica • Multicast: a replica sends one message to multiple receivers • Network takes care of multicasting • Can be cheaper • Suites push-based protocols
Consistency Protocols • Is an implementation of a specific consistency model • Consider an implementation where each server (site) has a complete copy of the data store • The data store is fully replicated • Will present simple protocols that implement P-RAM, CC, and SC • Exercise: suggest protocols for WC and Coherence
Consistency Protocols – Design Issues • Primary replica protocols • Each data item has a primary replica, an owner, denoted x.prime • x is not replicated, it lives at site x.prime • Primary replica, remote write protocol • All writes to x are sent to x.prime • Primary replica, local write protocol • Each write is performed locally • x migrates (x.prime changes) • A process gets exclusive permission to write from x.prime
Design Issues – Distribution p q s t X, Y, Z A, B, C E, F, G U, V, W Only p has X. All operations on X must be forwarded to p X can migrate
Design Issues – Primary Fixed Replica p q s t A, B, C, D A, B, C, D A, B, C, D A, B, C, D Only p can write to A. All writes to A must be forwarded to p Others can read A locally
Design Issues – Primary Variable Replica p q s t A, B, C, D A, B, C, D A, B, C, D A, B, C, D p is the current Owner of A. All writes to A must be forwarded to p Ownership can change
Design Issues – Free Replication p q s t A, B, C, D A, B, C, D A, B, C, D A, B, C, D Any process can Write to A Writes can be propagated
System Model (1) p q s t Replica DSp Replica DSq Replica DSs Replica DSt
System Model (2) • Each process (n processes) maintains: • DSp, a complete replica of DS • A data item x in replica DSp is denoted by DSp.x • OutQp, FIFO queue for outgoing messages • InQp, queue for incoming messages (need not be FIFO) • Sending/receiving (removing from OutQ/adding to InQ) messages is handled by independent daemons
Sending and Receiving Messages • sendp(): /* executed infinitely often */ • If not OutQp.empty() then • remove message m from OutQp • send m to each q p • receivep(): • upon receiving a message m from q • add m to InQp • Assume each m contains exactly one write • Better: send a bunch of writes in one message
P-RAM Protocol - (P-RAM) Messages • Messages have the format: • <write,p,x,v>, p = process id, x data item, v is a value: • Representing writep(x)v • InQp is a FIFO queue, for each process p • Let Tp be a sequence of operations for process p
P-RAM Protocol - (P-RAM) Read and Write Implementation • readp(x)?: • return DSp.x • /* add rp(x)* to Tp */ • writep(x)v: • DSp.x = v • add <write,p,x,v> to OutQp • /* add wp(x)v to Tp */
P-RAM Protocol - (P-RAM) Apply Implementation • applyp(): /* to apply remote writes locally */ • /* executed infinitely often */ • If not InQp.empty() then • let <write,q,x,v> be at the head of InQp • remove <write,q,x,v> from InQp • DSp.x = v • /* Add wq(x)v to Tp */
(P-RAM) – Correctness • Claim: any computation of (P-RAM) satisfies P-RAM • Proof: • Let (O|p O|w, <p) be Tp for each process p • Clearly, program order is maintained, Tp is valid, and contains all of O|p O|w
CC Protocol – (CC) • Need an additional variable for each process p: • vtp[n], a vector timestamp for write operations • If vtp[q] = k, then p thinks q has performed k writes • Messages now have the structure: • <write,q,x,v,t> representing writeq(x)v with vector timestamp t • InQp is a priority queue based on vt values • Add a new message, • in front of all the messages with higher vt • behind the messages that have incomparable vt
(CC) – Read and Write Implementation • readp(x)?: • return DSp.x • (read operation has timestamp of vtp) • /* Add rp(x)* to Tp */ • writep(x)v: • vtp[p]++ /* One more write by p*/ • DSp.x = v • add <write,p,x,v,vtp> to OutQp • (write operation has timestamp of vtp) • /* Add wp(x)v to Tp */
(CC) – Apply Implementation • applyp(): /* to apply remote writes locally */ • If not InQp.empty() then • let <write,q,x,v,t> be at the head of InQp • if t[k] vtp[k] for each k p • and t[q] == vtp[q] + 1 then • remove <write,q,x,v,t> from InQp • vtp[q] = t[q] • DSp.x = v • (write operation has timestamp of t) • /* Add wq(x)v to Tp */
if t[k] vtp[k] for each k p and t[q] == vtp[q] + 1 then • Not (t[k] vtp[k]) for each k p • There is a j such that t[j] > vtp[j] • There is at least one write by j that q knows of, but p does not • If so, p must wait • t[q] == vtp[q] + 1 • To apply this write of q with vt = t[q] = x, it must be the case that the last write by q that p applied has a vt = x – 1
(CC) – Timestamps Reflect Causality • Observation 1: if o1 <co o2 then o1.vt o2.vt • If o1 <po o2 , processes never decrement vt • If o1 <wbr o2 , after o1 is applied by p then vtp = o1.vt. Any subsequent operation by p will have a timestamp of at least vtp • Transitivity, there is some operation o’ such that o1 <co o’ <co o2. By induction o1.vt o’.vt o2.vt is transitive, thus o1.vt o2.vt • Observation 2: if o1 <co o2 and o2 is a write then o1.vt < o2.vt • Proof similar to Observation 1
(CC) – Liveness (1) • Observation 3: If w is a write by p then each q eventually applies w • If q = p, trivial • If q p, then w is either: • In OutQp: will be eventually sent (send is executed infinitely often) • In transit: will be delivered to q (channels are reliable) • In InQq: need to argue that it will eventually be applied • Applied by q: we’re done
(CC) – Liveness (2) • Any w will eventually be applied by q • When q adds w to InQq, InQq has a finite number of writes • After w is added to InQq, more writes with lower vt can be added • These is a finite number of such writes • w eventually reaches the head of InQq
(CC) – Liveness (3) • A write w by p at the head of InQq is eventually applied by q • Let vt be vtq when m is at the head of InQq • We must show that vt[k] w.vt[k] for all k p and vt[p] == w.vt[p] + 1
(CC) – Liveness (4) vt[k] w.vt[k] for all k p • Let k be a process other than p • Let w’ by the w.vt[k]th write by k • p applies w’ before it performs w • This implies that w’.vt < w.vt • Thus, w’ is ahead of w in InQq • When w is at the head of InQq, w’ is already applied • Once q applies w’, vt[k] w.vt[k]
(CC) – Liveness (5) vt[p] + 1 = w.vt[p] • Let w’’ be the (w.vt[p] – 1)st write by p • By observation 1, w’’.vt w.vt • Thus, w’’ is ahead of w in InQp • w’’ must have been already applied • Therefore, vt[p] = w.vt[p] – 1
(CC) – Correctness (1) • Claim: any computation of (CC) satisfies CC • Proof: • Let (O|p O|w, <p) be Tp for each process p • By Observation 3, Tp contains all of O|p O|w • By memory property, Tp is valid • Tp is acyclic (exercise) • Need to show that (O|p O|w, <co) Tp
(CC) – Correctness (2)(O|p O|w, <co) Tp • Let o1 and o2 be two operations such that o1 <co o2; we show that o1 precedes o2 in Tp [let q p] • By Observations 1, o1.vt o2.vt • o1,o2 O|p: trivial from (CC) • o1 O|q|w & o2 O|p: p sets its vt[q] = o1.ts[q] after applying o1. Since o2.vt[q] o1.vt[q], o2 can only occur after o1 • o1 O|p|w & o2 O|q|w: o1.vt o2.vt => o1.vt[p] o2.vt[p] => q applies o1 before o2. p applies o2 after q and o1 before q => p applies o1 before o2 • o1 O|p|r & o2 O|q|w: it must be o1 <co o’ <co o2, where o’ is a write. Cases 1 and 3 apply • o1,o2 O|q|w: exercise (Hint: Observation 2 => o1.vt < o2.vt)
SC Protocol – (SC) • Arrange process on a logical ring • Circulate a special message (token) <token> • Introduce another message type <ack,p,x,v,t> • Ack to p that w(x)v of ts t has been applied • Protocol uses the token-ring to apply writes in a mutual exclusive manner • Another way is to enforce a total order on all read and write operations using timestamps (exercise)
(SC) – Read and Write Implementation • readp(x)?: • return DSp.x • writep(x)v: • wait for <token> • DSp.x = v • add <write,p,x,v> to OutQp • wait for <ack,p,x,v> from each q • add <token> OutQp
(SC) – Apply Implementation • applyp(): /* to apply remote writes locally */ • If not InQp.empty() then • let <write,q,x,v> be at the head of InQp • remove <write,q,x,v> from InQp • DSp.x = v • add <ack,q,x,v> to OutQp
(SC) – Correctness • Claim: any computation of (SC) satisfies SC • Note that any computation of (SC) satisfies P-RAM
(SC) Provides SC w3 w5 w1 (O|q O|w ,<p) (O,<) w3 w3 w5 w5 w1 w1 (O|w,<w) w3 w5 w1 (O|p O|w ,<p) • All processes agree on one order of writes (O|w, <w) • (O|w,<p) = (O|w,<q) = (O|w,<w), for each pair p,q • To get (O,<) s.t. (O,<po) (O,<):