Epidemics

Epidemics by Charles Yang & Ted Pongthawornkamol 9/16/20 Epidemics, CS 598 IG, Fall 2004

Prelude: Multicasting • Many protocols • MBONE, 6BONE, XTP, etc. • Principally designed for scalability • Fault tolerance really isn’t addressed

Multicasting (cont…) • Scalable Reliable Multicast (SRM) • But as the graph shows, not that scalable

So now what? Epidemics! • Recap from Indy’s 1st lecture: • Definitions: • Infective – node with update it wants to share • Susceptible – node which has not yet received the update • Removed – previously infective node which is no longer sharing

Recap (cont…) • Infective node n receives a msg and forwards it with probability p to a susceptible node • It can be shown that the msg spreads quickly with high probability • Lightweight • Highly fault-tolerant

Outline of Presentation • Epidemic Algorithms for Replicated Database Maintenance • Bimodal Multicast • Gossip-Based Ad Hoc Routing

Epidemic Algorithms for Replicated Database Maintenance • Xerox’s Corporate Internet (CIN), Clearinghouse Servers, about 1986-1987 • Name resolution service • several hundred Ethernets, connected by gateways and phone lines • DB replication was filling up the bandwidth

The Problem • Inject an update at one server, and have it propagate to all other servers • how to make it robust and scale well? • important factors: • convergence time – time req’d for update to propagate to all sites • network traffic – traffic req’d to propagate a single update (want to minimize!)

3 Methods for Spreading Updates • direct mail (basically multicast or flooding) • anti-entropy (epidemic) • rumor mongering/gossiping (epidemic)

CIN’s Initial Configuration • Direct mail to send updates • Anti-entropy to bring DBs into sync • Re-mailing if the previous anti-entropy run found a disagreement • Anti-entropy run once a day, between 12am and 6am • Eventually, anti-entropy couldn’t complete in the allowed time due to traffic • For instance, for a domain stored at 300 sites, 90,000 messages might be introduced in one night

Direct Mail (figure: site s)

Direct Mail Issues • a lot of b/w: n messages per update • not quite reliable: messages can be lost (crashes, buffer overflows) • s may also not have current knowledge of S (set of all sites)

Anti Entropy • Run in bg to recover from errors • initially from direct mail, later from rumor mongering • Executed periodically: FOR SOME s’ ∈ S DO ResolveDifference[s, s’] ENDLOOP
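
The periodic loop above can be sketched in Python. This is only an illustration under assumptions that are not on the slide: each database is a dict mapping a key to a (timestamp, value) pair, ResolveDifference is done push-pull style by keeping the newer entry on both sides, and the names resolve_difference and anti_entropy_cycle are made up for the example.

```python
import random

def resolve_difference(db_a, db_b):
    """Push-pull exchange (assumed): both replicas keep the newest entry per key."""
    for key in set(db_a) | set(db_b):
        newest = max(db_a.get(key, (-1, None)),
                     db_b.get(key, (-1, None)),
                     key=lambda entry: entry[0])  # compare by timestamp only
        db_a[key] = newest
        db_b[key] = newest

def anti_entropy_cycle(replicas, rng):
    """Every site s picks some other site s' uniformly at random and resolves differences."""
    for s in range(len(replicas)):
        partner = rng.choice([j for j in range(len(replicas)) if j != s])
        resolve_difference(replicas[s], replicas[partner])

rng = random.Random(0)
replicas = [dict() for _ in range(50)]
replicas[0]["printer"] = (1, "building-35")  # update injected at one site

cycles = 0
# The cap of 1000 cycles is only a safety net for the sketch.
while any("printer" not in db for db in replicas) and cycles < 1000:
    anti_entropy_cycle(replicas, rng)
    cycles += 1
print("cycles until every replica has the update:", cycles)
```

Comparing whole databases on every exchange is what makes the real protocol expensive; the sketch ignores that cost.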

Anti-Entropy (after direct mail) (figure: site s)

Anti-Entropy (Cycle 1, start) (figure: site s)

Anti-Entropy (Cycle 1, end) (figure: site s)

Anti Entropy (cont…) • Assume s’ is chosen uniformly (spatial distributions come later) • slow and expensive, but reliable • since usually used as backup, the # of susceptible sites is small • Three variants: pull, push-pull, push

Pull • $p_i$ is the prob that a site remains susceptible after the $i$th cycle • A site remains susceptible after the $(i+1)$st cycle if: • it was susceptible after the $i$th cycle • and it contacted a susceptible site in the $(i+1)$st cycle • So $p_{i+1} = (p_i)^2$ • converges rapidly to 0 when $p_i$ is small • In other words: very unlikely that susceptible sites will remain after a while

Push • A site remains susceptible after the $(i+1)$st cycle if: • it was susceptible after the $i$th cycle • and no infectious site contacted it in the $(i+1)$st cycle • So $p_{i+1} = p_i (1 - 1/n)^{n(1 - p_i)}$ • Approximately, for small $p_i$ and large $n$: $p_{i+1} = p_i e^{-1}$ • Converges too, but not nearly as quickly as pull • Hence: pull or push-pull is preferred to just push
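
To see the gap between the two recurrences, one can simply iterate them. The sketch below is illustrative only; the starting value 0.1, the threshold 1e-9, and n = 1000 are arbitrary choices, and the function names are made up.

```python
def pull_step(p):
    """Pull recurrence: p_{i+1} = p_i^2."""
    return p * p

def push_step(p, n):
    """Push recurrence: p_{i+1} = p_i * (1 - 1/n)^(n * (1 - p_i))."""
    return p * (1.0 - 1.0 / n) ** (n * (1.0 - p))

def cycles_until(step, p0, threshold):
    """Count cycles until the susceptible probability drops to the threshold or below."""
    p, cycles = p0, 0
    while p > threshold:
        p = step(p)
        cycles += 1
    return cycles

n = 1000
pull_cycles = cycles_until(pull_step, 0.1, 1e-9)
push_cycles = cycles_until(lambda p: push_step(p, n), 0.1, 1e-9)
print("pull:", pull_cycles, "cycles; push:", push_cycles, "cycles")
```

Pull squares the probability each cycle, while push only shrinks it by roughly a factor of e, so push needs several times as many cycles to reach the same threshold.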

Some Anti-Entropy Optimizations • Comparing DBs is expensive, but since most DBs are pretty similar… • Could maintain a checksum of the db • compare checksums • If they don’t match, then start comparing DBs • Naïve: any update still in transit makes the checksums differ

Optimizations (cont…) • Define time window τ (time by which updates should be spread) • Keep checksums of database AND a recent update list w/ age < τ • 2 sites first exchange checksums and recent update lists • compute new checksums, and then compare • τ must be chosen well • If n grows too much • expected time for msg spread > τ • recent update lists likely to differ • Another variation: inverted index of db by timestamp • sites can exchange updates in reverse timestamp order until the checksums match
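
A minimal sketch of the recent-update-list idea follows. It assumes (not from the slides) that a database is a dict of key to (timestamp, value), that the checksum is a SHA-256 over the sorted entries, and that the function names are hypothetical. The exchange reports whether the cheap path was enough or a full database comparison is still needed.

```python
import hashlib

def checksum(db):
    """Digest over all (key, timestamp, value) entries in sorted key order."""
    h = hashlib.sha256()
    for key in sorted(db):
        ts, value = db[key]
        h.update(repr((key, ts, value)).encode())
    return h.hexdigest()

def recent_updates(db, now, tau):
    """Entries whose age is less than the window tau."""
    return {k: e for k, e in db.items() if now - e[0] < tau}

def apply_updates(db, updates):
    """Adopt an entry if it is missing locally or newer than the local one."""
    for k, e in updates.items():
        if k not in db or db[k][0] < e[0]:
            db[k] = e

def exchange(db_a, db_b, now, tau):
    """Swap recent update lists, then compare checksums.
    Returns True if the databases now agree, False if a full comparison is needed."""
    recent_a = recent_updates(db_a, now, tau)
    recent_b = recent_updates(db_b, now, tau)
    apply_updates(db_a, recent_b)
    apply_updates(db_b, recent_a)
    return checksum(db_a) == checksum(db_b)

a = {"x": (95, "new"), "y": (10, "old")}
b = {"y": (10, "old")}
print(exchange(a, b, now=100, tau=10))  # the missing update is recent, so the cheap path suffices
```

If the missing update is older than τ, the checksums still disagree after the swap and the sites fall back to the expensive comparison, which is why τ has to exceed the expected spreading time.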

Complex Epidemics / Rumor Mongering / Gossip • Replace multicasting • At the expense of slightly larger convergence time • And a distinct, though very small probability of failure • Called complex just to distinguish from simple epidemics like anti-entropy

Basic (Complex) Epidemic • Susceptible site receives a hot rumor and becomes infective • Randomly shares with another site • chosen “Uniform at Random” • When it contacts a site that knows the rumor already • with probability 1/k it loses interest in sharing the rumor (and becomes removed) • After a while, high probability that everyone knows
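
The basic epidemic can be simulated in a few lines. This sketch assumes synchronous rounds in which every infective site pushes to one other site per round; the round structure and the function name rumor_mongering are assumptions made for illustration.

```python
import random

def rumor_mongering(n, k, rng):
    """Push rumor mongering with feedback and a coin of probability 1/k.
    Returns the fraction of sites that never hear the rumor."""
    knows = [False] * n
    knows[0] = True
    infective = {0}
    while infective:
        for site in list(infective):       # snapshot: sites infected this round act next round
            target = rng.randrange(n - 1)
            if target >= site:
                target += 1                # uniform over the other n - 1 sites
            if knows[target]:
                # Unnecessary contact: lose interest with probability 1/k.
                if rng.random() < 1.0 / k:
                    infective.discard(site)
            else:
                knows[target] = True
                infective.add(target)
    return knows.count(False) / n

rng = random.Random(1)
for k in (1, 2, 3):
    print("k =", k, "fraction that missed the rumor:", rumor_mongering(2000, k, rng))
```

Larger k keeps sites gossiping longer, trading extra traffic for fewer sites that miss the rumor.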

Can model with differential equations (fun!) • s+i+r=1 • $ds/dt = -si$ • $di/dt = si - \frac{1}{k}(1-s)i$ • Differentiate…

Integrating gives $i(s) = \frac{k+1}{k}(1-s) + \frac{1}{k}\ln s + c$ • c is determined by $i(1-\epsilon) = \epsilon$ • For large n, $\epsilon$ goes to zero… • Giving a solution: $i(s) = \frac{k+1}{k}(1-s) + \frac{1}{k}\ln s$ • i(s) is zero when: $s = e^{-(k+1)(1-s)}$ • Yeah, yeah… so what does it mean? • implicit equation for s • s decreases exponentially with k (1/k = prob site becomes removed) • k=1, 20% will miss • k=2, 6% will miss • So with each consecutive round, high probability there will be no susceptibles left
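
The implicit equation for s has no closed form, but it is easy to solve numerically. The sketch below uses plain fixed-point iteration starting from s = 0, which is one simple choice among several (Newton's method would also work); the function name residue is made up.

```python
import math

def residue(k, iterations=200):
    """Solve s = exp(-(k + 1) * (1 - s)) for the root below 1 by fixed-point iteration."""
    s = 0.0
    for _ in range(iterations):
        s = math.exp(-(k + 1) * (1.0 - s))
    return s

for k in (1, 2, 3, 4):
    print("k =", k, "residue =", round(residue(k), 4))
```

For k = 1 and k = 2 the iteration settles near 0.20 and 0.06, matching the 20% and 6% quoted on the slide. The trivial root s = 1 also satisfies the equation, but the iteration moves away from it.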

Can vary complex epidemics • Concerned with: • Residue – when i is zero, what’s s? (people who never heard the rumor) • Traffic • Delay • $t_{avg}$ - time for a random node to receive the msg • $t_{last}$ - time for the last node who will receive the msg, to receive it

Variations (cont…) • Blind vs Feedback • blind loses interest with probability 1/k no matter if the contacted node knew the msg or not • Counter vs Coin • With counter, can lose interest after k unnecessary contacts • Push vs Pull • Basic used push, but can use pull • will work if there is a high number of independent updates • but when db is quiescent, more useless overhead than push

Variations (cont…) • Minimization • Use push and pull together, and if both sides know the update, then only the site with the smaller counter is incremented (on equality, both are incremented) • Connection limit • If there are a lot of updates, need a connection limit • Pull gets worse but push gets better! • Hunting • If one connection is rejected, try another

So instead of mailing & anti-entropy • Use rumor mongering • And back up with anti-entropy

Death Certificates • With anti-entropy, deletion doesn’t really work • absence of an entry will be replaced by an old version • Death Certificates • carry timestamps • when compared with an older entry, the older entry is deleted • they take up space • but if you delete them, you risk seeing old data resurrected • Enter: Dormant Death Certificates

Dormant Death Certificates • Two thresholds $\tau_1$ and $\tau_2$ • Each server retains DC within $\tau_1$ • After $\tau_1$, most sites delete DC, while a few keep it • If old data meets dormant DC, propagate the DC again • After $\tau_1 + \tau_2$, delete the dormant DC

Dormant DCs (cont…) • Does not scale indefinitely • n grows so much, time to propagate DCs exceeds $\tau_1$ • More likely to activate dormant DCs, which are propagated, adding to overhead… • “The ultimate result is catastrophic failure.”

Dormant DCs (cont…) • Don’t spread dormant DC • And if reactivated, can reset timestamp • But this is wrong (might cancel a legitimate update) • So use a second timestamp, called the activation timestamp, which is set when the DC is reactivated
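
The two-timestamp rule can be made concrete with a small sketch. Everything named here (DeathCertificate, certificate_state, receive_entry, the returned strings, the is_retention_site flag for the few sites that keep dormant copies) is hypothetical and only illustrates the bullets above: aging uses the activation timestamp, while comparisons against incoming entries use the original deletion timestamp.

```python
from dataclasses import dataclass

@dataclass
class DeathCertificate:
    key: str
    timestamp: float             # when the item was deleted; decides which entries it cancels
    activation_timestamp: float  # reset on reactivation; decides how long the DC is kept

def certificate_state(cert, now, tau1, tau2, is_retention_site):
    """Active within tau1; dormant until tau1 + tau2 at retention sites; otherwise discarded."""
    age = now - cert.activation_timestamp
    if age < tau1:
        return "active"
    if is_retention_site and age < tau1 + tau2:
        return "dormant"
    return "discarded"

def receive_entry(cert, entry_timestamp, now, tau1, tau2, is_retention_site):
    """Decide what to do with an incoming entry for cert.key."""
    state = certificate_state(cert, now, tau1, tau2, is_retention_site)
    if state == "discarded" or entry_timestamp > cert.timestamp:
        return "accept-entry"             # DC is gone, or the entry is a legitimate later update
    if state == "dormant":
        cert.activation_timestamp = now   # wake the DC up; its deletion timestamp is untouched
        return "cancel-and-reactivate"
    return "cancel-entry"

dc = DeathCertificate("alice", timestamp=100.0, activation_timestamp=100.0)
print(receive_entry(dc, 90.0, now=150.0, tau1=30.0, tau2=300.0, is_retention_site=True))
print(receive_entry(dc, 120.0, now=150.0, tau1=30.0, tau2=300.0, is_retention_site=True))
```

In the usage example the stale entry (timestamp 90) reactivates the dormant certificate, while the entry written at 120, after the deletion, is still accepted because the original timestamp of 100 was not overwritten.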

Spatial Distributions • networks aren’t homogeneous • some links are slower than others • can be broken up into different types of zones • we want to favor locality as we spread updates to minimize traffic

Spatial Distributions (cont…) • probability of connecting to a site at distance d is proportional to $1/d^a$, where a is to be determined • intuitively, a indicates the amount of locality you’re going to be connecting at • So: increase in a -> increase in locality • w/ increased locality, need to compensate in order to “break out of” locality • more connections • more rounds • Also generalizes to more dimensions: $d^{-2D}$ for D dimensions
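
A partner-selection rule with probability proportional to $1/d^a$ can be sketched as follows. The example assumes sites placed at distinct integer positions on a line and uses the hypothetical helper name choose_partner; real deployments would measure distance over the network topology instead.

```python
import random

def choose_partner(site, positions, a, rng):
    """Pick another site with probability proportional to 1 / d^a,
    where d is the distance from `site` (positions must be distinct)."""
    others = [s for s in positions if s != site]
    weights = [abs(positions[s] - positions[site]) ** -a for s in others]
    return rng.choices(others, weights=weights, k=1)[0]

positions = {i: i for i in range(100)}   # 100 sites on a line, one unit apart
rng = random.Random(0)
for a in (0, 1, 2):
    picks = [choose_partner(0, positions, a, rng) for _ in range(2000)]
    nearby = sum(p <= 2 for p in picks) / len(picks)
    print("a =", a, "fraction of contacts within distance 2:", nearby)
```

With a = 0 the choice is uniform; as a grows, most contacts stay close to the chooser, which is the locality the slide describes.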

Spatial Distribution • Anti-Entropy • notice Bushey (trans-Atlantic) traffic • uniform (75.74) vs a=2 (2.38) • For gossiping: • since rumors eventually become inactive, it needs to spread a lot in the beginning • hence, pump up k

Summary for Demers et al • Direct Mailing • Rumor Mongering • Anti-Entropy • Issues • Research into effect of and optimizing for topology • Need to know S • Scalability with n • churn • Bimodal Multicast will address: • What about throughput stability • What about higher rate of msgs?

Bimodal multicast • A technique that applies the epidemic concept to achieve scalable and reliable multicast • Uses epidemics in the form of anti-entropy • Randomly choose members in the group • Synchronize state

Two classes of multicast • strong reliability • atomicity • delivery ordering • virtual synchrony • security • real-time • more overhead, unpredictable behavior under some situations • best-effort reliability • scalable • provides no end-to-end delivery guarantee • No strong membership view • Certain level of failure discovery • SRM, MUSE, RMTP, etc.

Multicast: Examples • Virtual synchrony • Strongly reliable • significant degradation with even just a few node failures • suitable for small groups, limited to short bursts of multicasts • SRM • Best-effort reliable • Vulnerable to stochastic failures • Meltdown can occur in a large network • Neither addresses the stability problem under failures

Fault-tolerance problem • Virtual synchrony performs badly under failures

Bimodal multicast • Also called probabilistic broadcast (pbcast) • fills the gap between the two approaches • scalable • predictably reliable even under bad conditions • Complements existing mechanisms, such as Virtual Synchrony • Atomic • Provides stability • Throughput stability • Multicast stability

Pbcast protocol • consists of two concurrent subprotocols • Optimistic dissemination protocol, such as IP multicast • Two-phase anti-entropy protocol to deal with the synchronization problem • first phase detects message loss • second phase corrects losses
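
The two anti-entropy phases can be pictured with a toy example. The slides do not say how losses are detected, so this sketch assumes a digest of message ids is gossiped to a randomly chosen member, who then asks for whatever it is missing; the function names and the dict-based message log are hypothetical.

```python
def detect_losses(my_log, digest):
    """Phase 1 (assumed form): compare a gossiped digest of message ids with the local log."""
    return sorted(set(digest) - set(my_log))

def correct_losses(receiver_log, sender_log, missing):
    """Phase 2 (assumed form): fetch missing messages from the gossip sender, if it still has them."""
    for msg_id in missing:
        if msg_id in sender_log:   # the sender may already have discarded old messages
            receiver_log[msg_id] = sender_log[msg_id]

sender = {1: "a", 2: "b", 3: "c"}
receiver = {1: "a", 3: "c"}        # message 2 was lost by the optimistic multicast
missing = detect_losses(receiver, sender.keys())
correct_losses(receiver, sender, missing)
print(missing, receiver)
```

Because the repair runs concurrently with the optimistic multicast, a loss on the multicast path costs one gossip exchange rather than a sender-driven retransmission.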

Optimistic dissemination protocol • each node must possess the list of all members • generate a set of spanning trees • Simple algorithm • Randomly choose a spanning tree • every node uses the same spanning tree to forward the message • The set of spanning trees must be recalculated each time nodes join or leave
