Epidemics Spring 2008: CS525 Farhana Ashraf and Fariba Khan
Problem: Multicast • Simultaneously deliver information to a group of destinations. • Examples: • News groups • Updates for replicated database systems • Real-time media (e.g., sports) • Conference bridges • Software updates in ad-hoc and sensor networks
Challenges • Reliable: • Strong • Best-effort • Probabilistic (bimodal) • Delay • Bandwidth • Fault-tolerance
Epidemic Algorithms for Replicated Database Maintenance Alan Demers, Dan Greene, Carl Hauser, Wes Irish, John Larson, Scott Shenker, Howard Sturgis, Dan Swinehart, Doug Terry PODC 1987
Motivation: Replicated databases • Xerox Clearinghouse: data is replicated at 300 sites worldwide; an update at any site has to be forwarded to the other 299 • Financial distributed systems: data and code replicated worldwide; there is a 7-hour window between the stock market closing in Tokyo and opening in NYC
Approach • Naive • Direct Mail • Epidemic • Anti-entropy • Rumor-mongering
Direct Mail • Timely: immediately mailed from the entry site to all other sites • Not entirely reliable: incomplete information about other sites; mail may be lost [Figure: the entry site sends PostMail to each other site; infectious and susceptible nodes shown]
Anti-Entropy • Extremely reliable: resolves differences with a randomly chosen site periodically • Slow & expensive: examines the contents of the entire databases, and database contents are sent over the network [Figure: sites pairwise exchange ResolveDiff; infectious and susceptible nodes shown]
Anti-Entropy: Push and Pull [Figure: in push, the initiating site sends its updates; in pull, it fetches them; either alone can leave the non-updated side unchanged, while push-pull updates both directions]
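The push-pull exchange above can be sketched as a resolve-difference step between two replicas (a minimal sketch; the data layout and function names are illustrative, not from the paper):

```python
import random

def resolve_diff(db_a, db_b):
    """Push-pull exchange: afterwards both replicas hold, for every key,
    the (value, timestamp) pair with the newest timestamp."""
    for key in set(db_a) | set(db_b):
        entries = [e for e in (db_a.get(key), db_b.get(key)) if e is not None]
        newest = max(entries, key=lambda e: e[1])
        db_a[key] = db_b[key] = newest

def anti_entropy_round(sites):
    # Each site resolves differences with one randomly chosen peer.
    for db in sites:
        peer = random.choice(sites)
        if peer is not db:
            resolve_diff(db, peer)
```

A push-only variant would copy entries only from the initiator to the peer; pull-only would copy the other way.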
Pull > Push • p_i: probability that a node is still susceptible after the i-th round • If anti-entropy is a back-up for, e.g., direct mail (so most nodes already have the update), pull converges faster than push, thus giving better delay [Figure: p_i vs. round for push and pull]
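Using the recurrences from the paper's analysis (pull: p_{i+1} = p_i², since a site stays susceptible only if it pulls from another susceptible site; push: p_{i+1} = p_i(1 - 1/n)^{n(1-p_i)}), a quick numeric check shows pull's quadratic convergence:

```python
def push_round(p, n=1000):
    # A susceptible site stays susceptible if none of the ~n*(1-p)
    # infected sites happens to push to it this round.
    return p * (1 - 1.0 / n) ** (n * (1 - p))

def pull_round(p, n=1000):
    # Susceptible iff it pulls from a susceptible site.
    return p * p

p_push = p_pull = 0.5
for i in range(1, 6):
    p_push, p_pull = push_round(p_push), pull_round(p_pull)
    print(f"round {i}: push={p_push:.2e}  pull={p_pull:.2e}")
```

After a handful of rounds the pull residue is many orders of magnitude below push, which is why pull gives better delay when few nodes remain susceptible.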
Anti-Entropy: Optimizations • Checksums • Exchange checksums first; compare databases only if the checksums disagree [saves network traffic] • As network size increases, the time to distribute an update to all sites increases [more chance of a checksum mismatch] • Recent update list • Exchange the list of updates more recent than time T, apply them, then compare new checksums • Compare databases only if the new checksums disagree • Choice of T is critical • Inverted index of the database by timestamp • Exchange updates in reverse timestamp order, recomputing checksums until they match • Costs: an additional inverted index at each site, and time synchronization
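The checksum optimization can be sketched as follows (a hedged sketch: the serialization and merge policy here are illustrative placeholders; the paper resolves conflicts by timestamp):

```python
import hashlib
import json

def checksum(db):
    # Canonical serialization so equal contents yield equal checksums.
    return hashlib.sha256(json.dumps(db, sort_keys=True).encode()).hexdigest()

def sync(db_a, db_b):
    """Compare cheap checksums first; ship full contents only on mismatch."""
    if checksum(db_a) == checksum(db_b):
        return "checksums match: no database contents sent"
    merged = {**db_a, **db_b}  # placeholder merge; the paper merges by timestamp
    db_a.clear(); db_a.update(merged)
    db_b.clear(); db_b.update(merged)
    return "checksums differ: full databases exchanged"
```

In the common case where replicas already agree, only a fixed-size digest crosses the network.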
Complex Epidemics ("not anti-entropy"): Rumor Spreading • Less expensive: requires fewer resources and can be run more frequently than anti-entropy • Less reliable: some chance that updates will not reach all sites [Figure: susceptible, infectious, and removed nodes]
Designing a Good Epidemic • Residue: number of sites not having received the update when the epidemic ends • Traffic: average number of messages sent from a typical site • Delay • t_avg: difference between the initial injection and the average arrival of the update at a given site • t_last: delay until reception by the last site that receives the update during the epidemic
Variants of Rumor Spreading • Blind vs. Feedback • Blind: sender loses interest with prob. 1/k regardless of the recipient • Feedback: sender loses interest with prob. 1/k only if the recipient already knows the rumor • Counter vs. Coin • Counter: lose interest only after k unnecessary contacts • Push vs. Pull
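A small simulation of the feedback + coin variant illustrates residue (all parameters here are assumptions for illustration, not the paper's experimental setup):

```python
import random

def rumor_spread(n=1000, k=2, seed=0):
    """Push rumor mongering, feedback + coin variant: after contacting a
    site that already knows the rumor, the sender loses interest with
    probability 1/k. Returns the residue (fraction never informed)."""
    rng = random.Random(seed)
    SUSCEPTIBLE, INFECTIVE, REMOVED = 0, 1, 2
    state = [SUSCEPTIBLE] * n
    state[0] = INFECTIVE
    while INFECTIVE in state:
        # Each infective site contacts one uniformly random site per round.
        for site in [i for i, s in enumerate(state) if s == INFECTIVE]:
            target = rng.randrange(n)
            if state[target] == SUSCEPTIBLE:
                state[target] = INFECTIVE
            elif rng.random() < 1.0 / k:
                state[site] = REMOVED  # unnecessary contact: lose interest
    return state.count(SUSCEPTIBLE) / n
```

Raising k trades more traffic for lower residue; a counter variant would track k unnecessary contacts per site instead of flipping a coin.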
Problem with Deletion • Problem • The absence of an item does not spread • Propagation of old copies of a deleted item re-inserts the item at sites that have already deleted it • Solution • Replace the deleted item with a Death Certificate (DC)
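Death certificates can be sketched as timestamped tombstones (the structure below is illustrative, not the paper's data format):

```python
DEATH_CERT = object()  # sentinel marking a deleted item (a tombstone)

def apply_update(db, key, value, ts):
    # Last-writer-wins by timestamp, as in timestamped replicated updates.
    current = db.get(key)
    if current is None or ts > current[1]:
        db[key] = (value, ts)

def delete(db, key, ts):
    apply_update(db, key, DEATH_CERT, ts)  # replace the item with a DC

def lookup(db, key):
    entry = db.get(key)
    return None if entry is None or entry[0] is DEATH_CERT else entry[0]
```

Because the DC carries a newer timestamp than the item it replaces, a stale copy arriving later loses the timestamp comparison and cannot resurrect the item.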
Discussion • Direct Mail or Epidemics? • Economics and Industry? • Anti-entropy or Rumor?
Bimodal Multicast Ken Birman, Mark Hayden, Oznur Ozkasap, Zhen Xiao, Mihai Budiu, Yaron Minsky ACM TOCS 1999
Dilemma • The application is extremely critical: stock market, air traffic control, medical systems • Hence it needs a strong model with guarantees • But these applications often have a soft real-time subsystem • Steady data generation • May need to deliver over a large scale
Probabilistic Broadcast: pbcast • Atomicity: bimodal delivery guarantee (almost all or almost none) • Throughput stability: variation can be characterized • Ordering: FIFO per sender • Multicast stability: messages can be safely garbage collected (no dormant death certificates) • Detection of lost messages • Scalability: cost is a function of network size • Soft-failure recovery: bounded number of recoveries from buffer overflow and transient network failures
Pbcast 2-stage Protocol • Stage 1: Best effort dissemination • Hierarchical broadcast • Unreliable best-effort approach • Stage 2: Anti-entropy • Exchange digest and correct loss • Probabilistic end-to-end
Pbcast: Best-Effort Dissemination • IP multicast or "virtual" multicast spanning trees • The sender randomly generates a spanning tree • Neighbors forward based on the tree identifier • The number of random trees can be tuned
Pbcast: Random Spanning Tree [Animation: four frames showing a random spanning tree being built over nodes P, Q, R, S]
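A random spanning tree over P, Q, R, S, and forwarding along it, might be sketched as follows (a random-walk construction chosen for illustration; not necessarily the paper's tree-generation scheme):

```python
import random

def random_spanning_tree(nodes, seed=None):
    """Random-walk construction of a spanning tree over a complete graph.
    Returns (root, parent), where parent maps each non-root node to its parent."""
    rng = random.Random(seed)
    root = nodes[0]
    parent = {}
    visited = {root}
    current = root
    while len(visited) < len(nodes):
        nxt = rng.choice(nodes)
        if nxt not in visited:
            parent[nxt] = current  # first-visit edge joins the tree
            visited.add(nxt)
        current = nxt
    return root, parent

def multicast(parent, node, msg, received=None):
    # Forward msg down the tree; every node records what it received.
    received = {} if received is None else received
    received[node] = msg
    for child in [c for c, p in parent.items() if p == node]:
        multicast(parent, child, msg, received)
    return received
```

Each tree identifier would select one such tree, so different messages can take different random trees.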
Pbcast: Hierarchical Multicast [Animation over nodes P, Q, R, S: after m1, P={m1} Q={m1} R={m1} S={m1}; after m2 and m3, P={m1,m2} Q={m1,m2} R={m1,m2,m3} S={m1,m2,m3} (m3 lost to P and Q); after m4, P={m1,m2} Q={m1,m2} R={m1,m2,m3,m4} S={m1,m2,m3} (m4 also lost to P, Q, and S)]
Pbcast: Two-Phase Anti-entropy • Progresses in rounds • In each round: • Gossip a summary (digest) to randomly chosen nodes • Solicit any messages found lacking • Resend the solicited messages
Pbcast: Anti-entropy [Figure: P and Q compare digests, discover they are missing m3, solicit it, and receive the retransmission; before the exchange P={m1,m2} Q={m1,m2} R={m1,m2,m3} S={m1,m2,m3}]
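One gossip round of the anti-entropy stage can be sketched as below (digest contents, buffer layout, and fanout are illustrative assumptions):

```python
import random

def gossip_round(buffers, fanout=2, seed=None):
    """buffers: node -> {message_id: payload}. Each node gossips a digest
    (the set of ids it holds) to `fanout` random peers; a peer that is
    missing messages solicits them and receives the retransmission."""
    rng = random.Random(seed)
    nodes = list(buffers)
    for node in nodes:
        digest = set(buffers[node])                  # summary of held messages
        peers = rng.sample([n for n in nodes if n != node], fanout)
        for peer in peers:
            missing = digest - set(buffers[peer])    # what the peer solicits
            buffers[peer].update({m: buffers[node][m] for m in missing})
```

With a small constant fanout, repeated rounds spread a missing message to all nodes with high probability, which is the source of the bimodal delivery guarantee.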
Optimizations (1) • Soft-Failure Detection: retransmissions serviced in the same round • Round Retransmission Limit: maximum amount of data per node per round • Cyclic Retransmissions: avoid resending a message that might be in transit • Most-Recent-First Retransmissions: no starvation
Optimizations (2) • Independent Numbering of Rounds: round numbers are used only for local decisions (solicitation, garbage collection) • Random Graphs for Scalability: spanning trees built with network (LAN, WAN) knowledge • Multicast for Some Retransmissions
Analytic Results • Assume the initial unreliable multicast fails • Run the gossip rounds for some time • Assumed probability of message loss: 5% • Assumed probability of crash failure: 0.1%
Experimental Setup • The SP2 is a large parallel machine whose nodes are UNIX workstations • Interconnect is an ATM network • Software is the standard Internet stack (TCP, UDP) • pbcast ran on 128 nodes of the SP2 at the Cornell Theory Center
Source-to-destination latency distributions [Figure: latency distributions for groups of 8 members, with one member forced to sleep]
Throughput variation as a function of scale (25% of nodes perturbed) [Figure: mean and standard deviation of pbcast throughput (msgs/sec) vs. perturb rate for a 128-member group, and standard deviation of throughput vs. process group size] Very small and slow growth
Discussion • Good enough for VoIP (low variance in delay)? • Random spanning trees: WAN, LAN, subgroup sizes, trans-oceanic delays • Asymmetric network conditions (cell phone vs. server)
Exploring the Energy-Latency Trade-off for Broadcasts in Energy-Saving Sensor Networks Matthew J. Miller, Cigdem Sengul, Indranil Gupta ICDCS 2005
WSN Applications • Code update: energy is the primary constraint • Attribute-based search: latency is the primary constraint
Background: IEEE 802.11 PSM [Figure: nodes A, B, C wake at each beacon interval (BI); an ATIM message advertises a pending DATA message, which is then sent within that interval; each hop (A to B to C) can add a full beacon interval of latency] Problem: High Latency
Reducing Latency: Immediate Broadcast [Figure: A forwards M1 immediately instead of waiting for the next ATIM window; C is asleep and misses it] Problem: C does not get the update; reliability decreases
Probability-Based Broadcast Forwarding (PBBF) [Figure: Solution 1: forward immediately with probability p, otherwise (probability 1-p) wait for the next BI; Solution 2: a node stays awake after the ATIM window with probability q]
Effect of p and q on Reliability, Energy, Latency • Per-hop reliability = pq + (1 - p), where the pq term is an immediate broadcast heard by an awake neighbor and the (1 - p) term is a broadcast that waits for the next BI • p=0, q=0: IEEE 802.11 PSM • p=1, q=1: "always on" • Still have the ATIM window overhead
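The reliability expression can be checked directly; the per-hop delay model below is an added illustrative assumption, not a formula from the paper:

```python
def reliability(p, q):
    # Per-hop: sent immediately with prob. p and heard only if the neighbor
    # stayed awake (prob. q); otherwise deferred to the next BI and always heard.
    return p * q + (1 - p)

def expected_hop_delay(p, bi=1.0):
    # Illustrative assumption: an immediate broadcast adds ~no delay, while a
    # deferred one waits on average half a beacon interval.
    return (1 - p) * bi / 2
```

Both extremes from the slide (p=q=0 and p=q=1) give per-hop reliability 1; the interesting region is high p with low q, where immediate broadcasts are frequently missed.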
Experimental Setup • Application: a base station periodically sends patches for sensors to apply • Simulation in ns-2: 50 nodes, average one-hop neighborhood size of 10, uniformly random node placement in a square area, connected topology, full MAC layer
Energy, E • E does not depend on p • For fixed p, E increases with q [Figure: energy (Joules/broadcast) vs. q for PBBF, bounded below by PSM and above by no-PSM]
Latency, L • For fixed p, L decreases with q • For fixed q, increasing p lowers L [Figure: average 5-hop latency vs. q, one curve per increasing value of p]
Reliability, R • R: average fraction of broadcasts received per node • For high p, R is small when q is small [Figure: average fraction of broadcasts received vs. q, shown for p=0.5]
Energy-Latency Tradeoff [Figure: Joules/broadcast vs. average per-hop broadcast latency (s); the achievable region for reliability ≥ 99% traces the energy-latency trade-off]