Epidemics Spring 2008: CS525 Farhana Ashraf and Fariba Khan
Problem: Multicast • Simultaneously deliver information to a group of destinations. • Examples: • News groups • Updates for replicated database systems • Real-time media (e.g., sports) • Conference bridges • Software updates in ad-hoc and sensor networks
Challenges • Reliable: • Strong • Best-effort • Probabilistic (bimodal) • Delay • Bandwidth • Fault-tolerance
Epidemic Algorithms for Replicated Database Maintenance Alan Demers, Dan Greene, Carl Hauser, Wes Irish, John Larson, Scott Shenker, Howard Sturgis, Dan Swinehart, Doug Terry PODC 1987
Motivation: Replicated databases • Xerox Clearinghouse: data is replicated at 300 sites worldwide; an update at any site has to be forwarded to the other 299 • Financial distributed systems: data and code replicated worldwide; there is a 7-hour window between the stock market closing in Tokyo and opening in NYC
Approach • Naive • Direct Mail • Epidemic • Anti-entropy • Rumor-mongering
Direct Mail • Timely: immediately mailed from the entry site to all other sites • Not entirely reliable: incomplete information about other sites; mail may be lost [Figure: the entry site sends PostMail to each other site; infectious and susceptible nodes shown]
Anti-Entropy • Extremely reliable: resolves differences with a randomly chosen site periodically • Slow & expensive: examines the contents of the entire databases, and database contents are sent over the network [Figure: sites pairwise exchange ResolveDiff; infectious and susceptible nodes shown]
Anti-Entropy: Push and Pull [Figure: in push, the initiating site sends its updates; in pull, it fetches them; either alone can leave the non-updated side unchanged, while push-pull updates both directions]
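The push-pull exchange above can be sketched as a resolve-difference step between two replicas (a minimal sketch; the data layout and function names are illustrative, not from the paper):

```python
import random

def resolve_diff(db_a, db_b):
    """Push-pull exchange: afterwards both replicas hold, for every key,
    the (value, timestamp) pair with the newest timestamp."""
    for key in set(db_a) | set(db_b):
        entries = [e for e in (db_a.get(key), db_b.get(key)) if e is not None]
        newest = max(entries, key=lambda e: e[1])
        db_a[key] = db_b[key] = newest

def anti_entropy_round(sites):
    # Each site resolves differences with one randomly chosen peer.
    for db in sites:
        peer = random.choice(sites)
        if peer is not db:
            resolve_diff(db, peer)
```

A push-only variant would copy entries only from the initiator to the peer; pull-only would copy the other way.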
Pull > Push • p_i: probability that a node is still susceptible after the i-th round • If anti-entropy is a back-up for, e.g., direct mail (so most nodes already have the update), pull converges faster than push, thus giving better delay [Figure: p_i vs. round for push and pull]
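Using the recurrences from the paper's analysis (pull: p_{i+1} = p_i², since a site stays susceptible only if it pulls from another susceptible site; push: p_{i+1} = p_i(1 - 1/n)^{n(1-p_i)}), a quick numeric check shows pull's quadratic convergence:

```python
def push_round(p, n=1000):
    # A susceptible site stays susceptible if none of the ~n*(1-p)
    # infected sites happens to push to it this round.
    return p * (1 - 1.0 / n) ** (n * (1 - p))

def pull_round(p, n=1000):
    # Susceptible iff it pulls from a susceptible site.
    return p * p

p_push = p_pull = 0.5
for i in range(1, 6):
    p_push, p_pull = push_round(p_push), pull_round(p_pull)
    print(f"round {i}: push={p_push:.2e}  pull={p_pull:.2e}")
```

After a handful of rounds the pull residue is many orders of magnitude below push, which is why pull gives better delay when few nodes remain susceptible.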
Anti-Entropy: Optimizations • Checksums • Exchange checksums first; compare databases only if the checksums disagree [saves network traffic] • As network size increases, the time to distribute an update to all sites increases [more chance of a checksum mismatch] • Recent update list • Exchange the list of updates more recent than time T, apply them, then compare new checksums • Compare databases only if the new checksums disagree • Choice of T is critical • Inverted index of the database by timestamp • Exchange updates in reverse timestamp order, recomputing checksums until they match • Costs: an additional inverted index at each site, and time synchronization
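The checksum optimization can be sketched as follows (a hedged sketch: the serialization and merge policy here are illustrative placeholders; the paper resolves conflicts by timestamp):

```python
import hashlib
import json

def checksum(db):
    # Canonical serialization so equal contents yield equal checksums.
    return hashlib.sha256(json.dumps(db, sort_keys=True).encode()).hexdigest()

def sync(db_a, db_b):
    """Compare cheap checksums first; ship full contents only on mismatch."""
    if checksum(db_a) == checksum(db_b):
        return "checksums match: no database contents sent"
    merged = {**db_a, **db_b}  # placeholder merge; the paper merges by timestamp
    db_a.clear(); db_a.update(merged)
    db_b.clear(); db_b.update(merged)
    return "checksums differ: full databases exchanged"
```

In the common case where replicas already agree, only a fixed-size digest crosses the network.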
Complex Epidemics ("not anti-entropy"): Rumor Spreading • Less expensive: requires fewer resources and can be run more frequently than anti-entropy • Less reliable: some chance that updates will not reach all sites [Figure: susceptible, infectious, and removed nodes]
Designing a Good Epidemic • Residue: number of sites not having received the update when the epidemic ends • Traffic: average number of messages sent from a typical site • Delay • t_avg: difference between the initial injection and the average arrival of the update at a given site • t_last: delay until reception by the last site that receives the update during the epidemic
Variants of Rumor Spreading • Blind vs. Feedback • Blind: sender loses interest with prob. 1/k regardless of the recipient • Feedback: sender loses interest with prob. 1/k only if the recipient already knows the rumor • Counter vs. Coin • Counter: lose interest only after k unnecessary contacts • Push vs. Pull
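A small simulation of the feedback + coin variant illustrates residue (all parameters here are assumptions for illustration, not the paper's experimental setup):

```python
import random

def rumor_spread(n=1000, k=2, seed=0):
    """Push rumor mongering, feedback + coin variant: after contacting a
    site that already knows the rumor, the sender loses interest with
    probability 1/k. Returns the residue (fraction never informed)."""
    rng = random.Random(seed)
    SUSCEPTIBLE, INFECTIVE, REMOVED = 0, 1, 2
    state = [SUSCEPTIBLE] * n
    state[0] = INFECTIVE
    while INFECTIVE in state:
        # Each infective site contacts one uniformly random site per round.
        for site in [i for i, s in enumerate(state) if s == INFECTIVE]:
            target = rng.randrange(n)
            if state[target] == SUSCEPTIBLE:
                state[target] = INFECTIVE
            elif rng.random() < 1.0 / k:
                state[site] = REMOVED  # unnecessary contact: lose interest
    return state.count(SUSCEPTIBLE) / n
```

Raising k trades more traffic for lower residue; a counter variant would track k unnecessary contacts per site instead of flipping a coin.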
Problem with Deletion • Problem • The absence of an item does not spread • Propagation of old copies of a deleted item re-inserts the item at sites that have already deleted it • Solution • Replace the deleted item with a Death Certificate (DC)
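Death certificates can be sketched as timestamped tombstones (the structure below is illustrative, not the paper's data format):

```python
DEATH_CERT = object()  # sentinel marking a deleted item (a tombstone)

def apply_update(db, key, value, ts):
    # Last-writer-wins by timestamp, as in timestamped replicated updates.
    current = db.get(key)
    if current is None or ts > current[1]:
        db[key] = (value, ts)

def delete(db, key, ts):
    apply_update(db, key, DEATH_CERT, ts)  # replace the item with a DC

def lookup(db, key):
    entry = db.get(key)
    return None if entry is None or entry[0] is DEATH_CERT else entry[0]
```

Because the DC carries a newer timestamp than the item it replaces, a stale copy arriving later loses the timestamp comparison and cannot resurrect the item.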
Discussion • Direct Mail or Epidemics? • Economics and Industry? • Anti-entropy or Rumor?
Bimodal Multicast Ken Birman, Mark Hayden, Oznur Ozkasap, Zhen Xiao, Mihai Budiu, Yaron Minsky ACM TOCS 1999
Dilemma • The application is extremely critical: stock market, air traffic control, medical systems • Hence it needs a strong model with guarantees • But these applications often have a soft real-time subsystem • Steady data generation • May need to deliver over a large scale
Probabilistic Broadcast: pbcast • Atomicity: bimodal delivery guarantee (almost all or almost none) • Throughput stability: variation can be characterized • Ordering: FIFO per sender • Multicast stability: messages can be safely garbage collected (no dormant death certificates) • Detection of lost messages • Scalability: cost is a function of network size • Soft-failure recovery: bounded number of recoveries from buffer overflow and transient network failures
Pbcast 2-stage Protocol • Stage 1: Best effort dissemination • Hierarchical broadcast • Unreliable best-effort approach • Stage 2: Anti-entropy • Exchange digest and correct loss • Probabilistic end-to-end
Pbcast: Best-Effort Dissemination • IP multicast or "virtual" multicast spanning trees • The sender randomly generates a spanning tree • Neighbors forward based on the tree identifier • The number of random trees can be tuned
Pbcast: Random Spanning Tree [Animation: four frames showing a random spanning tree being built over nodes P, Q, R, S]
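A random spanning tree over P, Q, R, S, and forwarding along it, might be sketched as follows (a random-walk construction chosen for illustration; not necessarily the paper's tree-generation scheme):

```python
import random

def random_spanning_tree(nodes, seed=None):
    """Random-walk construction of a spanning tree over a complete graph.
    Returns (root, parent), where parent maps each non-root node to its parent."""
    rng = random.Random(seed)
    root = nodes[0]
    parent = {}
    visited = {root}
    current = root
    while len(visited) < len(nodes):
        nxt = rng.choice(nodes)
        if nxt not in visited:
            parent[nxt] = current  # first-visit edge joins the tree
            visited.add(nxt)
        current = nxt
    return root, parent

def multicast(parent, node, msg, received=None):
    # Forward msg down the tree; every node records what it received.
    received = {} if received is None else received
    received[node] = msg
    for child in [c for c, p in parent.items() if p == node]:
        multicast(parent, child, msg, received)
    return received
```

Each tree identifier would select one such tree, so different messages can take different random trees.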
Pbcast: Hierarchical Multicast [Animation over nodes P, Q, R, S: after m1, P={m1} Q={m1} R={m1} S={m1}; after m2 and m3, P={m1,m2} Q={m1,m2} R={m1,m2,m3} S={m1,m2,m3} (m3 lost to P and Q); after m4, P={m1,m2} Q={m1,m2} R={m1,m2,m3,m4} S={m1,m2,m3} (m4 also lost to P, Q, and S)]
Pbcast: Two-Phase Anti-entropy • Progresses in rounds • In each round: • Gossip a summary (digest) to randomly chosen nodes • Solicit any messages found lacking • Resend the solicited messages
Pbcast: Anti-entropy [Figure: P and Q compare digests, discover they are missing m3, solicit it, and receive the retransmission; before the exchange P={m1,m2} Q={m1,m2} R={m1,m2,m3} S={m1,m2,m3}]
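One gossip round of the anti-entropy stage can be sketched as below (digest contents, buffer layout, and fanout are illustrative assumptions):

```python
import random

def gossip_round(buffers, fanout=2, seed=None):
    """buffers: node -> {message_id: payload}. Each node gossips a digest
    (the set of ids it holds) to `fanout` random peers; a peer that is
    missing messages solicits them and receives the retransmission."""
    rng = random.Random(seed)
    nodes = list(buffers)
    for node in nodes:
        digest = set(buffers[node])                  # summary of held messages
        peers = rng.sample([n for n in nodes if n != node], fanout)
        for peer in peers:
            missing = digest - set(buffers[peer])    # what the peer solicits
            buffers[peer].update({m: buffers[node][m] for m in missing})
```

With a small constant fanout, repeated rounds spread a missing message to all nodes with high probability, which is the source of the bimodal delivery guarantee.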
Optimizations (1) • Soft-Failure Detection: retransmissions serviced in the same round • Round Retransmission Limit: maximum amount of data per node per round • Cyclic Retransmissions: avoid resending a message that might be in transit • Most-Recent-First Retransmissions: no starvation
Optimizations (2) • Independent Numbering of Rounds: round numbers are used only for local decisions (solicitation, garbage collection) • Random Graphs for Scalability: spanning trees built with network (LAN, WAN) knowledge • Multicast for Some Retransmissions
Analytic Results • Assume the initial unreliable multicast fails • Run the gossip rounds for some time • Assumed probability of message loss: 5% • Assumed probability of crash failure: 0.1%
Experimental Setup • The SP2 is a large parallel machine whose nodes are UNIX workstations • Interconnect is an ATM network • Software is the standard Internet stack (TCP, UDP) • pbcast ran on 128 nodes of the SP2 at the Cornell Theory Center
Source-to-destination latency distributions [Figure: latency distributions for groups of 8 members, with one member forced to sleep]
Throughput variation as a function of scale (25% of nodes perturbed) [Figure: mean and standard deviation of pbcast throughput (msgs/sec) vs. perturb rate for a 128-member group, and standard deviation of throughput vs. process group size] Very small and slow growth
Discussion • Good enough for VoIP (low variance in delay)? • Random spanning trees: WAN, LAN, subgroup sizes, trans-oceanic delays • Asymmetric network conditions (cell phone vs. server)
Exploring the Energy-Latency Trade-off for Broadcasts in Energy-Saving Sensor Networks Matthew J. Miller, Cigdem Sengul, Indranil Gupta ICDCS 2005
WSN Applications • Code update: energy is the primary constraint • Attribute-based search: latency is the primary constraint
Background: IEEE 802.11 PSM [Figure: nodes A, B, C wake at each beacon interval (BI); an ATIM message advertises a pending DATA message, which is then sent within that interval; each hop (A to B to C) can add a full beacon interval of latency] Problem: High Latency
Reducing Latency: Immediate Broadcast [Figure: A forwards M1 immediately instead of waiting for the next ATIM window; C is asleep and misses it] Problem: C does not get the update; reliability decreases
Probability-Based Broadcast Forwarding (PBBF) [Figure: Solution 1: forward immediately with probability p, otherwise (probability 1-p) wait for the next BI; Solution 2: a node stays awake after the ATIM window with probability q]
Effect of p and q on Reliability, Energy, Latency • Per-hop reliability = pq + (1 - p), where the pq term is an immediate broadcast heard by an awake neighbor and the (1 - p) term is a broadcast that waits for the next BI • p=0, q=0: IEEE 802.11 PSM • p=1, q=1: "always on" • Still have the ATIM window overhead
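The reliability expression can be checked directly; the per-hop delay model below is an added illustrative assumption, not a formula from the paper:

```python
def reliability(p, q):
    # Per-hop: sent immediately with prob. p and heard only if the neighbor
    # stayed awake (prob. q); otherwise deferred to the next BI and always heard.
    return p * q + (1 - p)

def expected_hop_delay(p, bi=1.0):
    # Illustrative assumption: an immediate broadcast adds ~no delay, while a
    # deferred one waits on average half a beacon interval.
    return (1 - p) * bi / 2
```

Both extremes from the slide (p=q=0 and p=q=1) give per-hop reliability 1; the interesting region is high p with low q, where immediate broadcasts are frequently missed.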
Experimental Setup • Application: a base station periodically sends patches for sensors to apply • Simulation in ns-2: 50 nodes, average one-hop neighborhood size of 10, uniformly random node placement in a square area, connected topology, full MAC layer
Energy, E • E does not depend on p • For fixed p, E increases with q [Figure: energy (Joules/broadcast) vs. q for PBBF, bounded below by PSM and above by no-PSM]
Latency, L • For fixed p, L decreases with q • For fixed q, increasing p lowers L [Figure: average 5-hop latency vs. q, one curve per increasing value of p]
Reliability, R • R: average fraction of broadcasts received per node • For high p, R is small when q is small [Figure: average fraction of broadcasts received vs. q, shown for p=0.5]
Energy-Latency Tradeoff [Figure: Joules/broadcast vs. average per-hop broadcast latency (s); the achievable region for reliability ≥ 99% traces the energy-latency trade-off]