Gossip Algorithms Presented by George Frederick
Introduction • Designing scalable P2P application-level protocols isn’t straightforward and is still being actively researched • Gossip algorithms are effective solutions for information dissemination in large-scale systems, especially for P2P networks • Gossip algorithms are inherently easy to deploy, robust, and resilient to failures
Gossip Algorithm Properties • Mimic the spread of contagious diseases • Difficult to stop once set loose in a population • Much research has gone into stopping epidemics, but little into aiding them • Nodes randomly choose other nodes to pass information to, cascading it throughout the system along many channels at once
Gossip Algorithm Properties • Buffer Capacity • The maximum number of messages (b) a node holds in its buffer • Relay Count • Number of times (t) a node relays a given message • Pool of Potential Recipients/Fanout • The number of processes (f), known as the fanout, that a process picks from those visible to it each time it relays a message
Gossip Algorithm Properties • Most gossip algorithms are distinguished by varying values for b, t, and f • These parameters may be independent of the number (n) of processes in the system but work best when they scale with it • If b, t, and f are properly tuned, the same guarantees that deterministic algorithms make can also apply to gossip algorithms
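As a rough illustration of how the relay count t and the fanout f shape dissemination, here is a minimal Python sketch of push gossip for a single message. The function name `simulate_gossip`, the round structure, and the uniform random choice of targets are assumptions made for illustration and are not taken from the presentation. Buffer capacity b is left out because only one message is in flight.

```python
import random


def simulate_gossip(n, fanout, relay_count, seed=0):
    """Push-gossip one message among n processes; return how many were reached.

    Hypothetical model: in every round each active process sends the message
    to `fanout` other processes chosen uniformly at random, and it stays
    active for `relay_count` rounds after first receiving the message.
    """
    rng = random.Random(seed)
    informed = {0}                 # process 0 is the source
    remaining = {0: relay_count}   # process -> rounds it will still relay
    while remaining:
        next_remaining = {}
        for p, left in remaining.items():
            others = [q for q in range(n) if q != p]
            for q in rng.sample(others, min(fanout, len(others))):
                if q not in informed:
                    informed.add(q)
                    next_remaining[q] = relay_count  # starts relaying next round
            if left > 1:
                next_remaining[p] = left - 1
        remaining = next_remaining
    return len(informed)


if __name__ == "__main__":
    for f in (1, 2, 4):
        print("fanout", f, "reached", simulate_gossip(200, f, relay_count=2))
```

With a fanout of 0 only the source ever holds the message, while a fanout of n - 1 reaches every process in a single round. Values in between trade message load against coverage.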
Practical Issues • Membership Maintenance • How do processes become aware of other processes? • Network Awareness • How do processes maintain a consistent view of overall network topology?
Practical Issues • Buffer Management • How do processes handle messages when the message buffer becomes full? • Message Filtering • How do processes know which messages are relevant to send to which nodes?
Membership Maintenance • Important because the view each process has of other processes affects the behavior of the entire algorithm • Maintaining a list of all processes at every process costs too much storage and network load • Therefore each process keeps only a partial view: a subset of the processes in the system
Membership Maintenance • Tradeoff must be made between reliability and scalability • The more knowledge a process has about the system, the more storage and network traffic it requires • Knowing too little about the system increases the odds that processes become isolated and cannot effectively spread their messages throughout the system
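One way to picture a bounded partial view is a periodic exchange in which two processes merge what they know and each keeps a random sample of the result. The sketch below assumes this simple scheme. The name `exchange_views` and the `view_size` parameter are hypothetical, and real membership protocols differ in how they select and age entries.

```python
import random


def exchange_views(views, p, q, view_size, rng):
    """Processes p and q merge their partial views; each keeps a bounded sample.

    `views` maps a process id to the set of process ids it knows about.
    A larger `view_size` costs more storage and traffic, while a smaller one
    raises the risk that a process ends up poorly connected.
    """
    merged = set(views[p]) | set(views[q]) | {p, q}
    for proc in (p, q):
        candidates = sorted(merged - {proc})  # a process never lists itself
        keep = min(view_size, len(candidates))
        views[proc] = set(rng.sample(candidates, keep))


if __name__ == "__main__":
    rng = random.Random(42)
    # Start from a ring in which every process knows only its successor.
    views = {i: {(i + 1) % 8} for i in range(8)}
    for _ in range(50):
        a, b = rng.sample(range(8), 2)
        exchange_views(views, a, b, view_size=3, rng=rng)
    print(views)
```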
Network Awareness • Membership alone doesn’t take into account the fact that not all processes are created equal • Some processes may be geographically remote, and routing traffic through distant processes just to reach nearby ones is wasteful • A possible solution is to organize processes into hierarchies based on geographical distance
Buffer Management • How to deal with new messages when the message buffer is full • If new messages are rejected, new information never gets disseminated • If the oldest message is discarded, it may not have finished propagating • Can be addressed with a couple of methods • Prioritization • Time stamping
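The time-stamping idea can be sketched as a bounded buffer that tracks an age for each message and, when full, evicts the message that has been gossiped longest. This is only one possible policy. The class `MessageBuffer` and its methods are hypothetical names, and the presentation does not prescribe a specific eviction rule.

```python
class MessageBuffer:
    """Bounded message buffer that evicts the oldest message when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.ages = {}  # message id -> rounds the message has been buffered

    def tick(self):
        """Advance one gossip round: every buffered message gets older."""
        for msg_id in self.ages:
            self.ages[msg_id] += 1

    def add(self, msg_id, age=0):
        """Insert a message; return the id of the evicted message, if any."""
        if msg_id in self.ages:
            # Duplicate: keep the larger age estimate and evict nothing.
            self.ages[msg_id] = max(self.ages[msg_id], age)
            return None
        self.ages[msg_id] = age
        if len(self.ages) > self.capacity:
            oldest = max(self.ages, key=self.ages.get)
            del self.ages[oldest]
            return oldest
        return None


if __name__ == "__main__":
    buf = MessageBuffer(capacity=2)
    buf.add("a")
    buf.tick()
    buf.add("b")
    buf.tick()
    print(buf.add("c"))  # the oldest message is evicted to make room
```

Evicting by age rests on the assumption that a message relayed for many rounds has probably reached most of the system already. A priority-based policy would replace the `max` over ages with a comparison on an application-defined priority.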
Message Filtering • It seems obvious that messages unnecessary to a node should be filtered • It is difficult to know which processes are interested in what messages • If a process isn’t interested, other processes in its view still might be • Can be addressed through the aforementioned hierarchy by also grouping by interest type
Modeling • Gossip algorithms are predominantly evaluated empirically rather than theoretically • Theoretical models do not adequately encompass all factors present in practical situations • Real world networks can change dynamically, but most models assume a static network structure
Key Elements • Problem • A distributed computing problem for the gossip algorithm to solve • System • The communication network and operating environment • Complexity • Time Complexity • Connectivity Complexity • Space Complexity
Time Complexity • Total number of delivery rounds, measured from start state to termination • Due to the nondeterministic nature of gossip algorithms, it is useful to measure time using the probability that a given number of nodes will have been reached by a given round
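The probabilistic measure described here can be estimated by simulation. The following sketch assumes a simple push model in which every informed node contacts `fanout` uniformly random nodes per round (with replacement, possibly including itself). The function names and the Monte Carlo approach are illustrative assumptions, not the analysis used in the cited work.

```python
import random


def informed_after(n, fanout, rounds, rng):
    """Return how many of n nodes hold the message after `rounds` push rounds."""
    informed = {0}
    for _round in range(rounds):
        newly = set()
        for _node in informed:
            for _ in range(fanout):
                newly.add(rng.randrange(n))
        informed |= newly
    return len(informed)


def prob_reached(n, fanout, rounds, k, trials=200, seed=0):
    """Estimate P(at least k nodes are informed by round `rounds`)."""
    rng = random.Random(seed)
    hits = sum(informed_after(n, fanout, rounds, rng) >= k for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for r in (3, 6, 9, 12):
        print("rounds", r, "P(>= 60 of 64 reached) ~", prob_reached(64, 1, r, 60))
```

With a fanout of 1 the informed set can at most double per round, so reaching k nodes takes at least log2(k) rounds in this model.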
Connectivity Complexity • Total communication channels established throughout the course of execution • Connections can be gained and lost dynamically throughout execution
Space Complexity • The total amount of memory devoted to the algorithm throughout its execution • Space is used for views, message buffering, and history management • View measurements should be a function of network size • Message and history data should be a function of the total amount of data transferred over network channels throughout execution
Problem Families • Three typical groupings of gossip algorithms • Information Spread • How to effectively relay a message to the rest of the network • Aggregate Computation • How to gather data from each node to perform a computation • Overlay Management • How to organize the network in such a way that it exhibits some gestalt property
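For the aggregate computation family, a commonly used illustration is pairwise averaging: two nodes exchange their current values and both adopt the mean, so every local value drifts toward the global average. The sketch below assumes this scheme and a fully connected network of at least two nodes. The name `gossip_average` is hypothetical.

```python
import random


def gossip_average(values, rounds, seed=0):
    """Estimate the global mean by repeated pairwise averaging.

    In each round every node i picks a random partner j != i and both
    replace their value with the mean of the two. The sum of all values
    is preserved, so the values converge toward the true average.
    """
    rng = random.Random(seed)
    x = list(values)
    n = len(x)  # assumes n >= 2
    for _ in range(rounds):
        for i in range(n):
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1  # skip i so a node never pairs with itself
            x[i] = x[j] = (x[i] + x[j]) / 2
    return x


if __name__ == "__main__":
    print(gossip_average([0, 10, 20, 30], rounds=20))
```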
Internal Process • Communication Phase • Choose subset of communication partners from local view and exchange information • Processing Phase • Perform state transition from current state to new state, determined by internal structure and new message information • Determines what message will be sent in the next round
Node Structure and Behavior • Transition Model • Two basic modes: push or pull • Push mode spreads information to other nodes • Pull mode gathers information from other nodes • Communication Strategy • Determines how many nodes to communicate with and which to choose in each round • Can be deterministic or random
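The difference between the two modes can be made concrete with a small round-based sketch. It assumes a random communication strategy in which every node contacts one uniformly chosen partner per round. The function names are hypothetical, and combined push-pull variants are not shown.

```python
import random


def run_round(informed, n, mode, rng):
    """Run one gossip round and return the new set of informed nodes."""
    new_informed = set(informed)
    for node in range(n):
        partner = rng.randrange(n)
        if mode == "push" and node in informed:
            new_informed.add(partner)   # informed node sends to its partner
        elif mode == "pull" and partner in informed:
            new_informed.add(node)      # node fetches from an informed partner
    return new_informed


def rounds_to_inform_all(n, mode, seed=0):
    """Count rounds until all n nodes are informed, starting from node 0."""
    rng = random.Random(seed)
    informed, rounds = {0}, 0
    while len(informed) < n:
        informed = run_round(informed, n, mode, rng)
        rounds += 1
    return rounds


if __name__ == "__main__":
    for mode in ("push", "pull"):
        print(mode, rounds_to_inform_all(100, mode))
```

In this model push can at most double the informed set per round, whereas pull can let many uninformed nodes learn from the same informed partner in a single round.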
Node Structure and Behavior • Buffer Management and Message Size • Determines how long to keep sharing a message • What messages to send when • What to do with duplicates • History Buffer Size • Related to buffer management • Determines if message has already been received and what to do if it has or hasn’t
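Duplicate detection with a bounded history can be sketched as follows. The class name `History` and the first-in, first-out forgetting rule are assumptions for illustration; the presentation leaves the policy open.

```python
from collections import OrderedDict


class History:
    """Bounded record of message ids that have already been received."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.seen = OrderedDict()  # insertion order doubles as age order

    def is_new(self, msg_id):
        """Return True the first time an id is seen, False for a duplicate."""
        if msg_id in self.seen:
            return False
        self.seen[msg_id] = True
        if len(self.seen) > self.max_size:
            self.seen.popitem(last=False)  # forget the oldest id
        return True


if __name__ == "__main__":
    history = History(max_size=2)
    for msg in ["a", "a", "b", "c", "a"]:
        print(msg, "new" if history.is_new(msg) else "duplicate")
```

A history that is too small forgets ids early, so an old message that arrives again is treated as new and relayed once more. That is the cost of bounding the space used.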
Network Topology • Degree Distribution • More edges leaving a node means more candidates to receive a message • More edges entering a node means a higher likelihood of receiving messages • Scale-free networks may spread very efficiently or not efficiently at all, depending on the tuning of the gossip algorithm
Network Topology • Closeness • Average distance between nodes • Each edge traversed has associated with it a probability that the message will be dropped • The closer a node is to a desired destination node, the more likely it is that the destination node will receive the message
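The relationship between distance and delivery can be worked through numerically. The sketch below assumes that every edge drops a message independently with the same probability, so a message crossing d edges along one path survives with probability (1 - p)^d. That independence assumption and the function names are illustrative and not from the presentation.

```python
from collections import deque


def hop_distances(adj, source):
    """Breadth-first search: hop count from `source` to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def delivery_probability(hops, drop_prob):
    """Chance a message survives `hops` edges that drop independently."""
    return (1 - drop_prob) ** hops


if __name__ == "__main__":
    # A path graph 0 - 1 - 2 - 3, stored as an adjacency list.
    adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    for node, hops in hop_distances(adj, 0).items():
        print(node, hops, delivery_probability(hops, drop_prob=0.1))
```

Gossip sends a message along many paths at once, so the single-path figure is a pessimistic per-route estimate rather than the overall delivery probability.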
Network Topology • Betweenness • How many shortest paths a node lies on • A high-betweenness node can act as a choke point • Gossip algorithms perform best when many routes are available
Network Topology • Eigenvector Centrality • Reflects how many “popular” nodes are connected to a specified node • “Popular” nodes are generally more likely to receive messages due to their high degree • The more connections a node has to popular nodes, the higher the chance that it will receive messages
Network Topology • Assortativity/Disassortativity • Describes tendency for nodes to form connections to similar or dissimilar nodes • High assortativity could help facilitate faster spread of specialized information among groups of nodes that need it • High disassortativity could hinder the spread of information, as the nodes desiring the information may be spread far apart
Network Topology • Connectivity and density • Nodes in disconnected subgraphs cannot be reached • Higher edge density can increase the rate of spread, as more routes are available for traversal • Conversely, high density could hinder spread by generating a large volume of redundant messages • Depends on gossip algorithm tuning and implementation
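Both properties are straightforward to compute for a small undirected graph. The sketch below assumes a symmetric adjacency list with no self-loops. The function names are hypothetical, and density is taken as the fraction of possible edges that are present.

```python
def components(adj):
    """Return the connected components of an undirected graph as sets."""
    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in comp:
                    comp.add(v)
                    stack.append(v)
        seen |= comp
        result.append(comp)
    return result


def density(adj):
    """Return edges present divided by edges possible."""
    n = len(adj)
    if n < 2:
        return 0.0
    edges = sum(len(neighbors) for neighbors in adj.values()) // 2
    return edges / (n * (n - 1) / 2)


if __name__ == "__main__":
    adj = {0: [1], 1: [0], 2: [3], 3: [2], 4: []}
    print(components(adj))  # a message from node 0 can only ever reach node 1
    print(density(adj))
```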