Topics in Database Systems: Data Management in Peer-to-Peer Systems
Agenda for today 1. Brief recap of replication in unstructured p2p systems 2. Epidemic Algorithms for Replica Updates (Demers et al paper + one application) 3. Three examples of unstructured systems: a. GIA b. KAZAA c. BitTorrent Next Thursday: Freenet, Pastry, eDonkey + an example of a p2p database (PIER)
Reasons for Replication • Performance: load balancing; locality: place copies close to the requestor; geographic locality (more choices for the next step in search); reduce the number of hops • Availability: in case of failures or peer departures
Replication Theory: Replica Allocation Policies in Unstructured P2P Systems E. Cohen and S. Shenker, “Replication Strategies in Unstructured Peer-to-Peer Networks”, SIGCOMM 2002 Q. Lv et al, “Search and Replication in Unstructured Peer-to-Peer Networks”, ICS’02 – Replication Part Both address “performance”
Replication: Allocation Scheme Question: how can replication be used to improve search efficiency in unstructured networks? How many copies of each object should exist so that the search overhead per object is minimized, assuming that the total amount of storage for objects in the network is fixed?
Replication Theory - Model Assume m objects and n nodes. Each node has capacity ρ, so the total capacity is R = nρ. How to allocate R among the m objects? Determine r_i, the number of copies (distinct nodes) that hold a copy of object i, with Σ_{i=1..m} r_i = R (R the total capacity). Also, p_i = r_i/R – the fraction of total capacity allocated to i. An allocation is represented by the vector (p_1, p_2, …, p_m) = (r_1/R, r_2/R, …, r_m/R)
Replication Theory - Model Assume that object i is requested with relative rate q_i, normalized so that Σ_{i=1..m} q_i = 1 For convenience, assume 1 << r_i ≤ n and that q_1 ≥ q_2 ≥ … ≥ q_m Goal: map the query distribution q to an allocation vector p Bounds for p_i: at least one copy, r_i ≥ 1, lower value l = 1/R; at most n copies, r_i ≤ n, upper value u = n/R
Replication Theory Assume that searches go on until a copy is found We want to determine the r_i that minimize the average search size (number of nodes probed) to locate an item i Need to compute the average search size per item Searches consist of randomly probing sites until the desired object is found: each step draws a node uniformly at random and asks whether it has a copy
Replication Theory A_i: the expected (average) search size for object i is the inverse of the fraction of sites that have replicas of the object: A_i = n/r_i The average search size A over all objects (average number of nodes probed per query): A = Σ_i q_i A_i = n Σ_i q_i/r_i Minimize: A = n Σ_i q_i/r_i
Replication Theory Minimize: Σ_i q_i/p_i subject to Σ_i p_i = 1 and l ≤ p_i ≤ u Monotonicity: since q_1 ≥ q_2 ≥ … ≥ q_m, we must have p_1 ≥ p_2 ≥ … ≥ p_m More copies for more popular objects, but how many more?
Uniform Replication Create the same number of replicas for each object: r_i = R/m Average search size under uniform replication: A_i = n/r_i = m/ρ A_uniform = Σ_i q_i · m/ρ = m/ρ (= mn/R) which is independent of the query distribution
Proportional Replication Create a number of replicas for each object proportional to its query rate: r_i = R q_i Average search size under proportional replication: A_i = n/r_i = n/(R q_i) A_proportional = Σ_i q_i · n/(R q_i) = m/ρ = A_uniform again independent of the query distribution Why? Objects whose query rate is greater than average (>1/m) do better with proportional, and the others do better with uniform; the weighted average balances out to be the same
Uniform and Proportional Replication Example: 3 items with q_1 = 1/2, q_2 = 1/3, q_3 = 1/6 (figure compares the uniform and proportional allocations) Summary: • Uniform Allocation: p_i = 1/m – simple, resources are divided equally • Proportional Allocation: p_i = q_i – “fair”, resources per item proportional to demand; reflects current P2P practices
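The equality of A_uniform and A_proportional can be checked numerically on the example above. The following is a minimal Python sketch; n and ρ are arbitrary assumed values, not from the slides:

```python
# Minimal sketch (assumed n and rho): uniform and proportional allocation give the
# same average search size A = n * sum_i q_i / r_i.
n, rho = 90, 1          # 90 nodes, capacity 1 object per node (assumption)
R = n * rho             # total capacity
q = [1/2, 1/3, 1/6]     # query rates of the 3 items from the example
m = len(q)

def avg_search_size(r):
    return n * sum(qi / ri for qi, ri in zip(q, r))

r_uniform      = [R / m] * m           # same number of copies per item
r_proportional = [R * qi for qi in q]  # copies proportional to query rate

print(avg_search_size(r_uniform))       # = m / rho = 3.0
print(avg_search_size(r_proportional))  # also 3.0 -- identical to uniform
```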
Space of Possible Allocations Definition: an allocation p_1, p_2, …, p_m is “in-between” Uniform and Proportional if for 1 ≤ i < m, q_{i+1}/q_i < p_{i+1}/p_i < 1 (the ratio is = 1 for uniform and = q_{i+1}/q_i for proportional; we want to favor popular objects, but not too much) Theorem 1: all (strictly) in-between strategies are (strictly) better than Uniform and Proportional Theorem 2: p is worse than Uniform/Proportional if for all i, p_{i+1}/p_i > 1 (popular gets less) OR for all i, q_{i+1}/q_i > p_{i+1}/p_i (less popular gets less than its “fair share”) Proportional and Uniform are the worst “reasonable” strategies
Square-Root Replication Find the r_i that minimize A = Σ_i q_i A_i = n Σ_i q_i/r_i This is achieved for r_i = λ √q_i, where λ = R / Σ_i √q_i Then the average search size is A_optimal = (1/ρ) (Σ_i √q_i)²
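A similar sketch (same assumed n and ρ as above) computes the square-root allocation for the 3-item example and checks that its average search size matches the closed form and beats uniform/proportional:

```python
# Sketch (assumed n, rho) of square-root allocation: r_i = lambda * sqrt(q_i).
from math import sqrt

n, rho = 90, 1
R = n * rho
q = [1/2, 1/3, 1/6]

lam = R / sum(sqrt(qi) for qi in q)          # lambda = R / sum_i sqrt(q_i)
r_sqrt = [lam * sqrt(qi) for qi in q]

A_sqrt = n * sum(qi / ri for qi, ri in zip(q, r_sqrt))
A_closed = (1 / rho) * sum(sqrt(qi) for qi in q) ** 2   # (1/rho) * (sum_i sqrt(q_i))^2

print(A_sqrt, A_closed)   # both ~2.87, vs 3.0 for uniform and proportional
```

For only three items with fairly similar rates the gain is small; it grows with more items and more skewed (e.g. Zipf-like) query rates, which is the setting of the figure referenced next.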
How much can we gain by using SR? (Figure: ratio A_uniform/A_SR for Zipf-like query rates)
Other Metrics: Discussion Utilization rate: the rate of requests that a replica of object i receives, U_i = R q_i/r_i • Under uniform replication, all objects have the same average search size, but replicas have utilization rates proportional to their query rates • Proportional replication achieves perfect load balancing, with all replicas having the same utilization rate, but average search sizes vary: more popular objects have smaller average search sizes than less popular ones
Assumption so far: there is at least one copy per object • A query is soluble if there are sufficiently many copies of the item. • A query is insoluble if the item is rare or non-existent. What is the search size of a query? • Soluble queries: number of probes until the answer is found. • Insoluble queries: maximum search size
• SR is best for soluble queries • Uniform minimizes the cost of insoluble queries What is the optimal strategy? OPT is a hybrid of Uniform and SR, tuned to balance the cost of soluble and insoluble queries: uniformly allocate a minimum number of copies per item, and use SR for the rest
We now know what we need. How do we get there?
Replication Algorithms Uniform and Proportional are “easy”: • Uniform: when an item is created, replicate its key at a fixed number of hosts. • Proportional: for each query, replicate the key at a fixed number of hosts (need to know or estimate the query rate) Desired properties of the algorithm: • Fully distributed, where peers communicate through random probes; minimal bookkeeping; no more communication than what is needed for search. • Converges to/obtains the SR allocation when query rates remain steady.
Achieving Square-Root Replication How can we achieve square-root replication in practice? • Assume that each query keeps track of its search size • Each time a query finishes, the object is copied to a number of sites proportional to the number of probes On average, object i is replicated at c·n/r_i sites each time a query for it is issued (for some constant c) It can be shown that this yields square-root allocation
Achieving Square-Root Replication What about replica deletion? Steady state: the creation rate equals the deletion rate The lifetime of replicas must be independent of object identity and query rate: FIFO or random deletion is OK; LRU or LFU are not (a toy simulation sketch of this mechanism follows below)
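As a rough illustration of the last two slides, here is a toy Python simulation; all parameters (n, m, node capacity, the constant c) are assumptions and this is not the papers' experimental setup. Each query probes random nodes until it finds a copy, the object is then copied to a number of random nodes proportional to the search size, and full nodes evict a random replica. The printed replica counts should grow roughly like √q_i rather than like q_i:

```python
# Toy simulation (assumed parameters) of query-driven square-root replication:
# copy an object to ~c*probes random nodes after each query; full nodes evict
# a random replica (never the last copy of an object, to keep searches soluble).
import random
from math import sqrt

random.seed(1)
n, m, capacity, c = 200, 20, 4, 0.5          # nodes, objects, replicas/node, copy constant
q = [1.0 / (i + 1) for i in range(m)]        # Zipf-like query rates
q = [x / sum(q) for x in q]

replicas = {i: {random.randrange(n)} for i in range(m)}   # start with one copy each

def store(node, obj):
    replicas[obj].add(node)
    stored = [i for i in replicas if node in replicas[i]]
    if len(stored) > capacity:                             # node over capacity: evict
        victims = [i for i in stored if len(replicas[i]) > 1]
        if victims:
            replicas[random.choice(victims)].discard(node)

for _ in range(20000):
    obj = random.choices(range(m), weights=q)[0]           # pick an object by query rate
    probes = 0
    while True:                                            # random probing until found
        probes += 1
        if random.randrange(n) in replicas[obj]:
            break
    for _ in range(max(1, int(c * probes))):               # replicate proportionally to probes
        store(random.randrange(n), obj)

for i in range(m):
    print(i, len(replicas[i]), round(sqrt(q[i]), 3))       # copies vs sqrt(q_i)
```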
Replication Thus, for Square-root replication an object should be replicated at a number of nodes that is proportional to the number of probes that the search required
Replication - Implementation Two strategies are popular: • Owner Replication: when a search is successful, the object is stored at the requestor node only (used in Gnutella) • Path Replication: when a search succeeds, the object is stored at all nodes along the path from the requestor node to the provider node, following the reverse path back to the requestor (used in Freenet)
Replication - Implementation If a p2p system uses k walkers, the number of nodes between the requestor and the provider node is 1/k of the total number of nodes visited (number of probes) Then, path replication should result in square-root replication Problem: it tends to place replicas at nodes that lie along the same topological path
Replication - Implementation • Random Replication: when a search succeeds, count the number of nodes on the path between the requestor and the provider, say p; then randomly pick p of the nodes that the k walkers visited and replicate the object there Harder to implement
Experimental Evaluation Both path and random replication generate replication ratios quite close to the square root of the query rates Path replication and random replication reduce the overall message traffic by a factor of 3 and 4, respectively Much of the traffic reduction comes from reducing the number of hops Path and random replication are better than owner replication: for example, the fraction of queries that finish within 4 hops is 71% for owner, 86% for path, and 91% for random replication
Reasons for Replication Besides storage, there is a cost associated with replication: consistency maintenance
Methods for spreading updates: • Push: updates originate from the site where the update appeared and must reach the sites that hold copies • Pull: the sites holding copies contact the master site Epidemics for spreading updates
A. Demers et al, Epidemic Algorithms for Replicated Database Maintenance, SOSP 87 An update occurs at a single site Randomized algorithms for distributing updates and driving replicas towards consistency Ensure that the effect of every update is eventually reflected in all replicas: sites become fully consistent only when all updating activity has stopped and the system has become quiescent Analogous to epidemics
Methods for spreading updates: • Direct mail (server-initiated): each new update is immediately mailed from its originating site to all other sites (+) timely and reasonably efficient (−) not all sites know all other sites (stateless); mails may be lost • Anti-entropy: every site regularly chooses another site at random and resolves any differences between them by exchanging content (+) extremely reliable, but requires exchanging content and resolving updates (−) propagates updates much more slowly than direct mail
Methods for spreading updates: • Rumor mongering: – Sites are initially “ignorant”; when a site receives a new update it becomes a “hot rumor” – While a site holds a hot rumor, it periodically chooses another site at random and ensures that the other site has seen the update – When a site has tried to share a hot rumor with too many sites that have already seen it, the site stops treating the rumor as hot and retains the update without propagating it further Rumor cycles can be more frequent than anti-entropy cycles, because they require fewer resources at each site, but there is a chance that an update will not reach all sites
Anti-entropy and rumor spreading are examples of epidemic algorithms Three types of sites: • Infective: a site that holds an update it is willing to share • Susceptible: a site that has not yet received the update • Removed: a site that has received the update but is no longer willing to share it Anti-entropy is a simple epidemic, where all sites are always either infective or susceptible
The paper considers exchanging the entire content of the nodes A set S of n sites, each storing a copy of a database The database copy at site s ∈ S is a time-varying partial function s.ValueOf: K → (V × T), where K is the set of keys, V the set of values, and T the set of timestamps (totally ordered by <); V contains the element NIL s.ValueOf[k] = (NIL, t): the item with key k has been deleted from the database Assume just one item, so s.ValueOf ∈ (V × T): an ordered pair consisting of a value and a timestamp The first component may be NIL, indicating that the item was deleted by the time indicated by the second component
The goal of the update distribution process is to drive the system towards ∀ s, s' ∈ S: s.ValueOf = s'.ValueOf Operation invoked to update the database: Update[u:V] ≡ s.ValueOf ← (u, Now[])
Direct Mail At the site s where an update occurs: for each s' ∈ S: PostMail[to: s', msg: (“Update”, s.ValueOf)] (s is the originator of the update, s' a receiver) Each site s' receiving the update message (“Update”, (u, t)): if s'.ValueOf.t < t then s'.ValueOf ← (u, t) • The complete set S must be known to s (stateful server) • PostMail messages are queued so that the sender is not delayed (asynchronous), but messages may fail when queues overflow or their destinations are inaccessible for a long time • n (number of sites) messages per update • Traffic proportional to n and to the average distance between sites
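A minimal Python sketch of direct mail (hypothetical class and method names; PostMail is modeled as a direct call rather than an asynchronous queued message):

```python
# Minimal direct-mail sketch (hypothetical names; PostMail modeled as a direct call).
import time

class Site:
    def __init__(self):
        self.value = (None, 0.0)          # s.ValueOf = (u, t); None plays the role of NIL
        self.all_sites = []               # direct mail requires knowing the complete set S

    def update(self, u):                  # Update[u]: s.ValueOf <- (u, Now[])
        self.value = (u, time.time())
        for s2 in self.all_sites:         # mail ("Update", s.ValueOf) to every other site
            if s2 is not self:
                s2.receive(self.value)

    def receive(self, msg):
        u, t = msg
        if self.value[1] < t:             # if s'.ValueOf.t < t then s'.ValueOf <- (u, t)
            self.value = (u, t)

sites = [Site() for _ in range(5)]
for s in sites:
    s.all_sites = sites
sites[0].update("new value")              # every site now holds ("new value", t)
```

A real implementation would queue the mail and cope with lost messages, which is exactly where direct mail falls short.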
Anti-Entropy At each site s, periodically execute: for some s' ∈ S, ResolveDifference[s, s'] Three ways to execute ResolveDifference: • Push (sender (server) driven): if s.ValueOf.t > s'.ValueOf.t then s'.ValueOf ← s.ValueOf (s pushes its value to s') • Pull (receiver (client) driven): if s.ValueOf.t < s'.ValueOf.t then s.ValueOf ← s'.ValueOf (s pulls s'’s value) • Push-Pull: if s.ValueOf.t > s'.ValueOf.t then s'.ValueOf ← s.ValueOf; if s.ValueOf.t < s'.ValueOf.t then s.ValueOf ← s'.ValueOf
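The three ResolveDifference variants as a sketch, reusing the hypothetical Site objects from the previous block (any object exposing a .value = (u, t) pair works):

```python
# Sketch of ResolveDifference (push, pull, push-pull) over the hypothetical Site class.
import random

def resolve_push(s, s2):
    if s.value[1] > s2.value[1]:          # s is newer: push s's value to s2
        s2.value = s.value

def resolve_pull(s, s2):
    if s.value[1] < s2.value[1]:          # s2 is newer: s pulls s2's value
        s.value = s2.value

def resolve_push_pull(s, s2):
    resolve_push(s, s2)
    resolve_pull(s, s2)

def anti_entropy_cycle(sites, resolve=resolve_push_pull):
    for s in sites:                       # once per period, each site picks a random partner
        s2 = random.choice([x for x in sites if x is not s])
        resolve(s, s2)
```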
Anti-Entropy Assume that • site s' is chosen uniformly at random from the set S • each site executes the anti-entropy algorithm once per period It can be shown that • an update will eventually infect the entire population • starting from a single infected node, this is achieved in time proportional to the log of the population size; the constant of proportionality depends on whether push or pull is used
Anti-Entropy Let p_i be the probability that a site remains susceptible (has not received the update) after the i-th cycle of anti-entropy (we want it to go to 0 as fast as possible) For pull: a site remains susceptible after cycle i+1 if (a) it was susceptible after cycle i and (b) it contacted a susceptible site in cycle i+1: p_{i+1} = (p_i)² For push: a site remains susceptible after cycle i+1 if (a) it was susceptible after cycle i and (b) no infectious site chose to contact it in cycle i+1: p_{i+1} = p_i (1 − 1/n)^{n(1−p_i)} ≈ p_i e^{−1} for small p_i (1 − 1/n: probability that the site is not contacted by a given node; n(1−p_i): number of infectious nodes at cycle i) Pull is preferable to push
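Iterating the two recurrences numerically (n is an assumed value) makes the difference concrete:

```python
# Iterate p_{i+1} = p_i^2 (pull) and p_{i+1} = p_i(1 - 1/n)^{n(1 - p_i)} (push) for an assumed n.
n = 10_000
p_pull = p_push = 1 - 1 / n                     # one infected site initially
for cycle in range(1, 21):
    p_pull = p_pull ** 2
    p_push = p_push * (1 - 1 / n) ** (n * (1 - p_push))
    print(cycle, f"pull={p_pull:.2e}", f"push={p_push:.2e}")
# In the endgame pull squares the susceptible fraction each cycle,
# while push only shrinks it by a factor of about e per cycle.
```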
Anti-Entropy The paper assumes that nodes exchange their entire content, which raises the issue of what to send over the network and how to compare the database copies • Use checksums: OK if the checksums usually agree • + a list of recent updates for which (now − timestamp) < threshold τ: first exchange the recent updates, apply them to the databases and the checksums, and then compare the updated checksums; the choice of τ matters • Maintain an inverted list of updates ordered by timestamp: perform anti-entropy by exchanging updates in reverse timestamp order until the checksums agree
Complex Epidemics: Rumor Spreading • Initial state: n individuals, initially inactive (susceptible) Rumor planting & spreading: • We plant a rumor with one person, who becomes active (infective), phoning other people at random and sharing the rumor • Every person bearing the rumor also becomes active and likewise shares the rumor • When an active individual makes an unnecessary phone call (the recipient already knows the rumor), then with probability 1/k the active individual loses interest in sharing the rumor (becomes removed) We would like to know: • how fast the system converges to an inactive state (no one is infective) • the percentage of people that know the rumor when the inactive state is reached
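A toy Python simulation of this process (population size and termination details are assumptions) that estimates the residue, i.e. the fraction of people who never hear the rumor:

```python
# Toy rumor-mongering simulation (feedback + coin variant, assumed population size):
# an active site phones random targets; on an unnecessary call it loses interest
# with probability 1/k.  Returns the residue s and the traffic m (calls per site).
import random

def rumor_spread(n=10_000, k=1):
    knows = [False] * n
    knows[0] = True
    active = {0}                       # infective sites still spreading the rumor
    calls = 0
    while active:                      # one cycle: every active site makes one call
        for s in list(active):
            target = random.randrange(n)
            calls += 1
            if knows[target]:
                if random.random() < 1 / k:
                    active.discard(s)  # unnecessary call: lose interest w.p. 1/k
            else:
                knows[target] = True
                active.add(target)
    return 1 - sum(knows) / n, calls / n

print(rumor_spread(k=1))   # residue around 0.20 (about 20% miss the rumor)
print(rumor_spread(k=2))   # residue around 0.06
```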
Complex Epidemics: Rumor Spreading Let s, i, r be the fractions of individuals that are susceptible, infective and removed, with s + i + r = 1 ds/dt = −si di/dt = si − (1/k)(1−s)i (the (1/k)(1−s)i term corresponds to unnecessary phone calls) Solving gives s = e^{−(k+1)(1−s)}: an exponential decrease of s with k For k = 1, 20% miss the rumor; for k = 2, only 6% miss it
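The quoted percentages can be checked by solving s = e^{−(k+1)(1−s)} numerically, for instance by fixed-point iteration:

```python
# Fixed-point iteration for s = exp(-(k+1)(1-s)), starting from s = 0.
from math import exp

def residue(k, iters=200):
    s = 0.0
    for _ in range(iters):
        s = exp(-(k + 1) * (1 - s))
    return s

print(residue(1))   # ~0.203 -> about 20% miss the rumor for k = 1
print(residue(2))   # ~0.060 -> about 6% for k = 2
```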
Criteria to characterize epidemics • Residue: the value of s when i is zero, i.e. the remaining susceptibles when the epidemic finishes • Traffic: m = total update traffic / number of sites • Delay: – average delay (t_avg): the difference between the time of the initial injection of an update and its arrival at a given site, averaged over all sites – delay until the last site (t_last): the delay until the reception of the update by the last site that will receive it during the epidemic
Simple variations of rumor spreading Blind vs. Feedback • Feedback variation: a sender loses interest only if the recipient knows the rumor • Blind variation: a sender loses interest with probability 1/k regardless of the recipient Counter vs. Coin • Instead of losing interest with probability 1/k, use a counter so that we lose interest only after k unnecessary contacts s = e^{−m}: there are nm updates sent, and the probability that a single site misses all of them is (1 − 1/n)^{nm} ≈ e^{−m}, where m is the traffic All variations obey the same relation between traffic and residue Counters and feedback improve the delay, with counters playing the more significant role