Consistency and Replication

  1. Consistency and Replication Chapter 6 Part II Distribution Models & Consistency Protocols

  2. Distribution Models • How do we propagate updates between replicas? • Replica Placement: • Where is a replica placed? • When is a replica created? • Who creates the replica?

  3. Types of Replicas • The logical organization of different kinds of copies of a data store into three concentric rings: permanent replicas at the core, surrounded by server-initiated replicas, with client-initiated replicas in the outermost ring.

  4. Permanent Replicas • Initial set of replicas • Other replicas can be created from them • Small and static set • Example: Web site horizontal distribution • Replicate the Web site on a limited number of machines on a LAN • Distribute requests round-robin • Or replicate the Web site on a limited number of machines on a WAN (mirroring) • Clients choose which site to talk to

  5. Server-Initiated Replicas (1) • Dynamically created at the request of the owner of the data store • Example: push caches • Web hosting servers can dynamically create replicas close to demanding clients • Need a dynamic policy to create and delete replicas • One option is to track Web page hits: keep a counter and an access-origin list for each page

  6. Server-Initiated Replicas (2) • Counting access requests from different clients.

  7. Server-Initiated Replicas (3) • Each server can determine which is the closest server to a client • Hits from clients who have the same closest server will go to the same counter • cntQ(P,F) = count of hits to replica F hosted at server Q, initiated by P’s clients • If cntQ(P,F) drops below a threshold D, delete replica F from Q

  8. Server-Initiated Replicas (4) • If cntQ(P,F) exceeds a threshold R, replicate or migrate F to P • Choose D and R with care • Do not allow deletion of permanent replicas • If for some server P, cntQ(P,F) exceeds half the total requests for F, migrate F from Q to P • What if it cannot be migrated to P?
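
The policy above fits in a few lines of Python. The following is an illustrative sketch only: the class name, the print-based actions, and the bookkeeping details are assumptions, and the deletion test is applied to the total hit count, following the simplified rule on these slides.

    # Sketch of the counter-based replica-placement policy (illustrative).
    class ReplicaServer:
        def __init__(self, D, R):
            self.D = D            # deletion threshold (choose D < R with care)
            self.R = R            # replication threshold
            self.cnt = {}         # cnt[(P, F)] = hits on F routed via P's region

        def on_request(self, P, F):
            # P is the server closest to the requesting client (assumed known).
            self.cnt[(P, F)] = self.cnt.get((P, F), 0) + 1

        def evaluate(self, F, permanent=False):
            total = sum(c for (P, f), c in self.cnt.items() if f == F)
            if total < self.D and not permanent:
                print(f"delete replica of {F} here")   # too few hits overall
                return
            for (P, f), c in self.cnt.items():
                if f != F:
                    continue
                if 2 * c > total:
                    print(f"migrate {F} to {P}")       # majority of hits via P
                elif c > self.R:
                    print(f"replicate {F} on {P}")     # hot spot near P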

  9. Client-Initiated Replicas • These are caches • Temporary storage (entries expire quickly) • Managed by clients • Cache hit: return data from the cache • Cache miss: load a copy from the original server • Kept on the client machine, or on a machine in the same LAN • Multi-level caches
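
A minimal client-side cache sketch matching this description: hits are served locally, misses load a copy from the origin server, and entries expire quickly. The names (ClientCache, fetch, ttl) are assumptions, not an established API.

    import time

    class ClientCache:
        def __init__(self, fetch, ttl=30.0):
            self.fetch = fetch    # callable that loads a copy from the origin
            self.ttl = ttl        # short lifetime: client caches expire fast
            self.entries = {}     # key -> (value, time when cached)

        def read(self, key):
            entry = self.entries.get(key)
            if entry is not None:
                value, cached_at = entry
                if time.time() - cached_at < self.ttl:
                    return value                      # cache hit
                del self.entries[key]                 # expired entry
            value = self.fetch(key)                   # cache miss: go to origin
            self.entries[key] = (value, time.time())
            return value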

  10. Design Issues for Update Propagation • A client sends an update to a replica. The update is then propagated to all other replicas • Propagate state or operation • Pull or Push protocols • Unicast or multicast propagation

  11. State versus Operation Propagation (1) • Propagate a notification of update • Invalidation protocols • When data item x is changed at a replica, it is invalidated at the other replicas • An attempt to read the item causes an “item fault”, triggering an update of the local copy before the read can complete • The update is done according to the consistency model supported • Uses little network bandwidth • Good when the read-to-write ratio is low

  12. State versus Operation Propagation (2) • Transfer modified data • Uses high network bandwidth • Good when read-to-write ratio is high • Propagate the update operation • Each replica must have a process capable of performing the update • Uses very low network bandwidth • Good when read-to-write ratio is high
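
The three options differ mainly in what is put on the wire when x is written, which the following Python sketch makes concrete; the message formats are assumptions for illustration.

    # What each propagation option sends when data item x is written:
    def invalidation_message(x):
        return ("invalidate", x)             # notification only: tiny message

    def state_message(x, new_value):
        return ("state", x, new_value)       # ships the modified data itself

    def operation_message(op_name, args):
        return ("op", op_name, args)         # each replica re-executes the op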

  13. Pull versus Push • Push: updates are propagated to other replicas without solicitation • Typically from permanent to server-initiated replicas • Used to achieve a high degree of consistency • Efficient when the read-to-write ratio is high • Pull: a replica asks another for an update • Typically used by client-initiated replicas • Efficient when the read-to-write ratio is low • Cache misses increase response time

  14. Pull versus Push Protocols • A comparison between push-based and pull-based protocols in the case of multiple client, single server systems.
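
For reference, the comparison referred to here (as it appears in Tanenbaum and van Steen's textbook figure) is roughly:

    Issue                      Push-based                                Pull-based
    State at server            List of client replicas and caches       None
    Messages sent              Update (and possibly fetch update later)  Poll and update
    Response time at client    Immediate (or fetch-update time)          Fetch-update time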

  15. Unicast versus Multicast • Unicast: a replica sends n−1 separate updates, one to every other replica • Multicast: a replica sends one message to multiple receivers • The network takes care of multicasting • Can be cheaper • Suits push-based protocols

  16. Consistency Protocols • A consistency protocol is an implementation of a specific consistency model • Consider an implementation where each server (site) has a complete copy of the data store • The data store is fully replicated • We will present simple protocols that implement P-RAM, CC, and SC • Exercise: suggest protocols for WC and Coherence

  17. Consistency Protocols – Design Issues • Primary replica protocols • Each data item x has a primary replica, an owner, denoted x.prime • Simplest case: x is not replicated; it lives only at site x.prime • Primary replica, remote-write protocol • All writes to x are sent to x.prime • Primary replica, local-write protocol • Each write is performed locally • x migrates (x.prime changes) • A process gets exclusive write permission from x.prime

  18. Design Issues – Distribution • [Diagram: four sites p, q, s, t, holding disjoint data items X,Y,Z / A,B,C / E,F,G / U,V,W.] • Only p has X: all operations on X must be forwarded to p • X can migrate

  19. Design Issues – Primary Fixed Replica • [Diagram: four sites p, q, s, t, each holding a replica of A, B, C, D.] • Only p can write to A: all writes to A must be forwarded to p • Others can read A locally

  20. Design Issues – Primary Variable Replica • [Diagram: four sites p, q, s, t, each holding a replica of A, B, C, D.] • p is the current owner of A: all writes to A must be forwarded to p • Ownership can change

  21. Design Issues – Free Replication • [Diagram: four sites p, q, s, t, each holding a replica of A, B, C, D.] • Any process can write to A • Writes are propagated to the other replicas

  22. System Model (1) • [Diagram: processes p, q, s, t, each holding a complete replica DSp, DSq, DSs, DSt of the data store.]

  23. System Model (2) • Each process (n processes) maintains: • DSp, a complete replica of DS • A data item x in replica DSp is denoted by DSp.x • OutQp, FIFO queue for outgoing messages • InQp, queue for incoming messages (need not be FIFO) • Sending/receiving (removing from OutQ/adding to InQ) messages is handled by independent daemons

  24. Sending and Receiving Messages • sendp(): /* executed infinitely often */ • If not OutQp.empty() then • remove message m from OutQp • send m to each q ≠ p • receivep(): • upon receiving a message m from q • add m to InQp • Assume each m contains exactly one write • Better: send a batch of writes in one message
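
A Python sketch of this messaging skeleton, with threads standing in for the independent daemons; the class and field names are assumptions.

    import queue, threading

    class Process:
        def __init__(self, pid):
            self.pid = pid
            self.peers = []               # every process q != p
            self.out_q = queue.Queue()    # FIFO OutQp of outgoing messages
            self.in_q = queue.Queue()     # InQp of incoming messages

        def send_daemon(self):            # "executed infinitely often"
            while True:
                m = self.out_q.get()      # remove message m from OutQp
                for q in self.peers:      # send m to each q != p
                    q.receive(m)

        def receive(self, m):
            self.in_q.put(m)              # add m to InQp

        def start(self):
            threading.Thread(target=self.send_daemon, daemon=True).start()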

  25. P-RAM Protocol (P-RAM)

  26. P-RAM Protocol (P-RAM) – Messages • Messages have the format <write,p,x,v>, where p = process id, x = data item, v = value, representing writep(x)v • InQp is a FIFO queue, for each process p • Let Tp be a sequence of operations for process p

  27. P-RAM Protocol - (P-RAM) Read and Write Implementation • readp(x)?: • return DSp.x • /* add rp(x)* to Tp */ • writep(x)v: • DSp.x = v • add <write,p,x,v> to OutQp • /* add wp(x)v to Tp */

  28. P-RAM Protocol - (P-RAM) Apply Implementation • applyp(): /* to apply remote writes locally */ • /* executed infinitely often */ • If not InQp.empty() then • let <write,q,x,v> be at the head of InQp • remove <write,q,x,v> from InQp • DSp.x = v • /* Add wq(x)v to Tp */
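
Putting the three routines together, (P-RAM) can be sketched as below. This is a simulation, not the slides' exact notation: per-process deques stand in for reliable FIFO channels, and all names are illustrative assumptions.

    from collections import deque

    class PRAMProcess:
        def __init__(self, pid):
            self.pid = pid
            self.ds = {}              # DSp: complete local replica
            self.in_q = deque()       # FIFO InQp of remote writes
            self.peers = []           # every other process

        def read(self, x):
            return self.ds.get(x)     # readp(x): purely local, never blocks

        def write(self, x, v):
            self.ds[x] = v            # apply locally first (keeps program order)
            for q in self.peers:      # FIFO append stands in for OutQp + send
                q.in_q.append(("write", self.pid, x, v))

        def apply(self):              # apply one remote write, if any
            if self.in_q:
                _, q, x, v = self.in_q.popleft()
                self.ds[x] = v

    # Example: after p.write("x", 1) and then q.apply(), q.read("x") == 1.
    # Writes by one process reach every other process in program order (P-RAM).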

  29. (P-RAM) – Correctness • Claim: any computation of (P-RAM) satisfies P-RAM • Proof: • Let (O|p ∪ O|w, <p) be Tp for each process p • Clearly, program order is maintained, Tp is valid, and it contains all of O|p ∪ O|w

  30. CC Protocol (CC)

  31. CC Protocol – (CC) • Need an additional variable for each process p: • vtp[n], a vector timestamp for write operations • If vtp[q] = k, then p thinks q has performed k writes • Messages now have the structure: • <write,q,x,v,t> representing writeq(x)v with vector timestamp t • InQp is a priority queue based on vt values • Add a new message, • in front of all the messages with higher vt • behind the messages that have incomparable vt

  32. (CC) – Read and Write Implementation • readp(x)?: • return DSp.x • (read operation has timestamp of vtp) • /* Add rp(x)* to Tp */ • writep(x)v: • vtp[p]++ /* One more write by p*/ • DSp.x = v • add <write,p,x,v,vtp> to OutQp • (write operation has timestamp of vtp) • /* Add wp(x)v to Tp */

  33. (CC) – Apply Implementation • applyp(): /* to apply remote writes locally */ • If not InQp.empty() then • let <write,q,x,v,t> be at the head of InQp • if t[k] ≤ vtp[k] for each k ≠ q • and t[q] == vtp[q] + 1 then • remove <write,q,x,v,t> from InQp • vtp[q] = t[q] • DSp.x = v • (the write operation has timestamp t) • /* Add wq(x)v to Tp */

  34. if t[k] ≤ vtp[k] for each k ≠ q and t[q] == vtp[q] + 1 then • If not (t[k] ≤ vtp[k] for each k ≠ q), there is a j such that t[j] > vtp[j]: there is at least one write by j that q knows of, but p does not • If so, p must wait • t[q] == vtp[q] + 1 • To apply this write of q, with t[q] = x, it must be the case that the last write by q that p applied had q-component x – 1
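
The rule can be captured compactly in Python. A sketch under the system model above: the priority queue is replaced by a scan over pending messages (which admits the same applicable writes), delivery is abstracted into in_q, and all names are assumptions.

    class CCProcess:
        def __init__(self, pid, n):
            self.pid, self.n = pid, n
            self.vt = [0] * n     # vt[q] = number of q's writes applied here
            self.ds = {}          # DSp: complete local replica
            self.out_q = []       # outgoing <write, p, x, v, t> messages
            self.in_q = []        # pending remote writes

        def read(self, x):
            return self.ds.get(x)

        def write(self, x, v):
            self.vt[self.pid] += 1                   # one more write by p
            self.ds[x] = v
            self.out_q.append(("write", self.pid, x, v, list(self.vt)))

        def apply(self):                             # apply ready remote writes
            for m in list(self.in_q):
                _, q, x, v, t = m
                deps_met = all(t[k] <= self.vt[k]
                               for k in range(self.n) if k != q)
                if deps_met and t[q] == self.vt[q] + 1:
                    self.in_q.remove(m)
                    self.vt[q] = t[q]
                    self.ds[x] = v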

  35. (CC) – Timestamps Reflect Causality • Observation 1: if o1 <co o2 then o1.vt ≤ o2.vt • If o1 <po o2: processes never decrement vt • If o1 <wbr o2: after o1 is applied by p, vtp ≥ o1.vt, and any subsequent operation by p will have a timestamp of at least vtp • Transitivity: there is some operation o’ such that o1 <co o’ <co o2; by induction o1.vt ≤ o’.vt ≤ o2.vt, and ≤ is transitive, thus o1.vt ≤ o2.vt • Observation 2: if o1 <co o2 and o2 is a write then o1.vt < o2.vt • Proof similar to Observation 1
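
The componentwise order used by these observations is easy to state precisely; a small helper sketch (function names are assumptions):

    # vt_leq is the componentwise order on vector timestamps; vt_lt is strict.
    def vt_leq(a, b):
        return all(x <= y for x, y in zip(a, b))

    def vt_lt(a, b):
        return vt_leq(a, b) and a != b

    # Example: vt_leq([1, 0, 2], [1, 1, 2]) is True,
    # while [1, 0] and [0, 1] are incomparable (neither vt_leq holds).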

  36. (CC) – Liveness (1) • Observation 3: If w is a write by p then each q eventually applies w • If q = p, trivial • If q ≠ p, then w is either: • In OutQp: will eventually be sent (send is executed infinitely often) • In transit: will be delivered to q (channels are reliable) • In InQq: need to argue that it will eventually be applied • Applied by q: we’re done

  37. (CC) – Liveness (2) • Any w will eventually be applied by q • When q adds w to InQq, InQq has a finite number of writes • After w is added to InQq, more writes with lower vt can be added • There is only a finite number of such writes • w eventually reaches the head of InQq

  38. (CC) – Liveness (3) • A write w by p at the head of InQq is eventually applied by q • Let vt be vtq when w is at the head of InQq • We must show that vt[k] ≥ w.vt[k] for all k ≠ p and w.vt[p] == vt[p] + 1

  39. (CC) – Liveness (4): vt[k] ≥ w.vt[k] for all k ≠ p • Let k be a process other than p • Let w’ be the w.vt[k]th write by k • p applies w’ before it performs w • This implies that w’.vt < w.vt • Thus, w’ is ahead of w in InQq • When w is at the head of InQq, w’ has already been applied • Once q applies w’, vt[k] ≥ w.vt[k]

  40. (CC) – Liveness (5): vt[p] + 1 = w.vt[p] • Let w’’ be the (w.vt[p] – 1)st write by p • By Observation 1, w’’.vt ≤ w.vt • Thus, w’’ is ahead of w in InQq • w’’ must have already been applied • Therefore, vt[p] = w.vt[p] – 1

  41. (CC) – Correctness (1) • Claim: any computation of (CC) satisfies CC • Proof: • Let (O|p ∪ O|w, <p) be Tp for each process p • By Observation 3, Tp contains all of O|p ∪ O|w • By the memory property, Tp is valid • Tp is acyclic (exercise) • Need to show that (O|p ∪ O|w, <co) ⊆ Tp

  42. (CC) – Correctness (2): (O|p ∪ O|w, <co) ⊆ Tp • Let o1 and o2 be two operations such that o1 <co o2; we show that o1 precedes o2 in Tp [let q ≠ p] • By Observation 1, o1.vt ≤ o2.vt • o1, o2 ∈ O|p: trivial from (CC) • o1 ∈ O|q|w & o2 ∈ O|p: p sets its vt[q] = o1.vt[q] after applying o1; since o2.vt[q] ≥ o1.vt[q], o2 can only occur after o1 • o1 ∈ O|p|w & o2 ∈ O|q|w: o1.vt ≤ o2.vt ⇒ o1.vt[p] ≤ o2.vt[p] ⇒ q applies o1 before performing o2; p performs o1 before q applies it, and q performs o2 before p applies it, so p applies o1 before o2 • o1 ∈ O|p|r & o2 ∈ O|q|w: there must be a write o’ with o1 <co o’ <co o2; cases 1 and 3 apply • o1, o2 ∈ O|q|w: exercise (Hint: Observation 2 ⇒ o1.vt < o2.vt)

  43. SC Protocol (SC)

  44. SC Protocol – (SC) • Arrange the processes on a logical ring • Circulate a special message (token) <token> • Introduce another message type <ack,p,x,v>: an ack to p that w(x)v has been applied • The protocol uses the token ring to apply writes in a mutually exclusive manner • Another way is to enforce a total order on all read and write operations using timestamps (exercise)

  45. (SC) – Read and Write Implementation • readp(x)?: • return DSp.x • writep(x)v: • wait for <token> • DSp.x = v • add <write,p,x,v> to OutQp • wait for <ack,p,x,v> from each q ≠ p • add <token> to OutQp

  46. (SC) – Apply Implementation • applyp(): /* to apply remote writes locally */ • If not InQp.empty() then • let <write,q,x,v> be at the head of InQp • remove <write,q,x,v> from InQp • DSp.x = v • add <ack,q,x,v> to OutQp
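
A deliberately collapsed Python sketch of the (SC) idea: the circulating token is modeled by a single mutex, so writes are mutually exclusive system-wide, and the write-then-wait-for-acks exchange is collapsed into a synchronous update of every replica. All names are assumptions; a real implementation would pass the token and acks as messages, as above.

    import threading

    token = threading.Lock()        # stands in for the circulating <token>

    class SCProcess:
        def __init__(self, pid):
            self.pid = pid
            self.ds = {}            # complete local replica
            self.peers = []         # every other process

        def read(self, x):
            return self.ds.get(x)   # local read, never blocks

        def write(self, x, v):
            with token:             # wait for <token>: one writer at a time
                self.ds[x] = v
                for q in self.peers:
                    q.ds[x] = v     # synchronous apply stands in for write+ack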

  47. (SC) – Correctness • Claim: any computation of (SC) satisfies SC • Note that any computation of (SC) satisfies P-RAM

  48. (SC) Provides SC • [Diagram: the per-process serializations (O|p ∪ O|w, <p) and (O|q ∪ O|w, <q) order the writes w3, w5, w1 identically, matching the common write order (O|w, <w), and are merged into one total order (O, <).] • All processes agree on one order of writes (O|w, <w) • (O|w, <p) = (O|w, <q) = (O|w, <w), for each pair p, q • To get (O, <) s.t. (O, <po) ⊆ (O, <), merge the per-process sequences around the common write order (O|w, <w)
