Gossip Algorithms and Implementing a Cluster/Grid Information service MsSys Course. Amar Lior and Barak Amnon. Agenda. A short introduction to gossip algorithms Cluster/Grid Information services requirements How good is old information The distributed bulletin board model Implementation.
Gossip Algorithms and Implementing a Cluster/Grid Information Service • MsSys Course • Amar Lior and Barak Amnon
Agenda • A short introduction to gossip algorithms • Cluster/Grid Information services requirements • How good is old information • The distributed bulletin board model • Implementation
A Problem • In an n-node system, assume that every pair of nodes can communicate directly • Node i wishes to send a message (rumor, color) to all other nodes • Possible deterministic solutions • BROADCAST (only in a broadcast medium) • Defining a static tree between the nodes and sending the message along the edges of this tree
A Gossip Style solution • Starting with the round in which a rumor is generated • Each node that holds the rumor selects another node independently and uniformly at random • and sends the rumor to this node • The distribution of the rumor is terminated after a fixed number of rounds, O(ln n) • At this point all players are informed with high probability
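The push rule above can be tried out with a small simulation. This is a minimal sketch, not the authors' code: the function name `push_gossip_rounds`, the choice of node 0 as the rumor source, and the synchronous round structure are assumptions made for illustration. It runs until every node is informed and reports how many rounds that took, which can be compared with ln n.

```python
import math
import random


def push_gossip_rounds(n, seed=None):
    """Simulate push gossip among n nodes; return rounds until all are informed."""
    rng = random.Random(seed)
    informed = {0}  # assumption: node 0 generates the rumor
    rounds = 0
    while len(informed) < n:
        targets = set()
        for node in informed:
            # Pick one *other* node uniformly at random.
            target = rng.randrange(n - 1)
            if target >= node:
                target += 1  # shift past our own index so we never pick ourselves
            targets.add(target)
        informed |= targets
        rounds += 1
    return rounds


if __name__ == "__main__":
    n = 1024
    trials = [push_gossip_rounds(n, seed=s) for s in range(20)]
    print("n =", n, " ln n =", round(math.log(n), 2))
    print("rounds needed: between", min(trials), "and", max(trials))
```

Since the informed set can at most double per round, no run can finish in fewer than log2(n) rounds; the simulation shows how close random selection gets to that.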
Gossip benefits • Robustness to the presence of node failures • Messages will continue to propagate due to the random selection of destination • Failure of F nodes results in only O(F) uninformed players • Simplicity • All nodes run the same algorithm • Scalability • The number of messages each node sends (and possibly receives) each round is fixed
Gossip taxonomy • Other names are • Epidemic algorithms (Demers et al.) • Randomized communication (Karp et al.) • Propagation can be done by • Push – sending the information from the node to the selected node • Pull – the other way around • Push&Pull – both ways • We distinguish between 2 conceptual layers • A basic gossip algorithm • by which nodes choose other nodes for communication • A gossip-based protocol • Built on top of a gossip algorithm • Determines the content of the messages that are sent • and the way received messages cause nodes to update their internal state
Rumor spreading bounds • From a single node to all • Time complexity: O(ln n) rounds • Message complexity (Karp et al.), lower bound on the number of messages: Ω(n ln ln n) for address-oblivious algorithms
Spatial Gossip (Kempe et al.) • New information is most interesting to nodes that are nearby • Combines the benefits of • Uniform gossip • Deterministic flooding • The gossip algorithm chooses the nodes with a probability that decreases with their distance • New information is spread to nodes at distance d, with high probability, in time polylogarithmic in d
Aggregating values • Gossip can also be used to aggregate a value over all nodes • Average, maximum, minimum … • In this case the question is how fast the local value in each node converges to the desired value
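As an illustration of aggregation, the sketch below averages a value over all nodes by repeated pairwise exchanges. It is one simple variant (pairwise averaging in the Push&Pull style), chosen here for brevity and not necessarily the scheme the slide has in mind; the name `gossip_average` and its parameters are hypothetical.

```python
import random


def gossip_average(values, rounds, seed=None):
    """Return the local values after `rounds` rounds of pairwise averaging."""
    rng = random.Random(seed)
    x = [float(v) for v in values]
    n = len(x)
    for _ in range(rounds):
        for i in range(n):
            # Node i picks another node j uniformly at random.
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1
            # Both nodes adopt the mean of their two values.
            # The sum over all nodes is unchanged, so the global average is preserved.
            mean = (x[i] + x[j]) / 2.0
            x[i] = x[j] = mean
    return x


if __name__ == "__main__":
    start = list(range(16))
    for r in (1, 5, 20):
        result = gossip_average(start, r, seed=0)
        spread = max(result) - min(result)
        print("rounds:", r, " spread of local values:", spread)
```

The printed spread (largest minus smallest local value) gives a direct view of how fast the local values converge to the true average.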
Cluster/Grid Information services • Basic properties of a Grid environment • Information sources are distributed • Individual sources are subject to failure • Total number of information providers is large • Both the types of information sources and the ways they are used can vary • We cannot in general provide users with accurate information: any information delivered to a user is “old” • How useful is old information? (Mitzenmacher) • How to build an information service with guaranteed age properties?
Distributed Bulletin board • The system • Consists of N nodes (or clusters) • Distributed • Nodes are subject to failure • Each node maintains a data structure that holds an entry for selected (or all) nodes in the system • We refer to this data structure as “the vector” • Each vector entry holds: • state of the resources (static and dynamic) of the corresponding node • age of the information (tuned to the local clock) • The vector is a distributed bulletin board that serves information requests locally
Algorithm 1 - Information dissemination • Each time unit • Update local information • Find all vector entries which are up to age t • Choose a random node • Send the above entries to that node • Upon receiving a message • Compute the age of the received entries • Update the entries for which the newly received information is fresher • [Figure: example vectors and a sent window, entries written as node:age]
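The steps of Algorithm 1 can be sketched as a small synchronous simulation. This is an illustrative sketch only: the `Node` class, the `run` driver, and the simplification that messages arrive instantly (so a received age needs no adjustment for the local clock) are assumptions, and the vector stores only ages, not resource state.

```python
import random


class Node:
    def __init__(self, name):
        self.name = name
        self.vector = {name: 0}  # node name -> age of the information held

    def tick(self):
        """One time unit passes: every entry ages, the local entry is refreshed."""
        for peer in self.vector:
            self.vector[peer] += 1
        self.vector[self.name] = 0

    def window(self, max_age):
        """Entries whose age is at most max_age (the 't' of Algorithm 1)."""
        return {p: a for p, a in self.vector.items() if a <= max_age}

    def receive(self, entries):
        """Adopt a received entry only if it is fresher than the local one."""
        for peer, age in entries.items():
            if peer == self.name:
                continue  # a node is always the freshest source about itself
            if peer not in self.vector or age < self.vector[peer]:
                self.vector[peer] = age


def run(names, steps, max_age, seed=None):
    rng = random.Random(seed)
    nodes = {name: Node(name) for name in names}
    for _ in range(steps):
        for node in nodes.values():
            node.tick()
        # Build all messages first so every send in a time unit uses the same state.
        sends = []
        for node in nodes.values():
            target = rng.choice([n for n in names if n != node.name])
            sends.append((target, node.window(max_age)))
        for target, entries in sends:
            nodes[target].receive(entries)
    return nodes


if __name__ == "__main__":
    result = run(list("ABCDE"), steps=30, max_age=10, seed=1)
    for name, node in result.items():
        print(name, dict(sorted(node.vector.items())))
```

Each printed line is one node's vector in node:age form; no two nodes are expected to hold exactly the same view.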
Bounds and Approximations • We want to know “how old” the information in the vector is • First we find E(X_t) (for the asynchronous case) • The expected number of nodes that have information about node i which is up to t time units old • Synchronous case
Bounds and Approximations • An approximation for the expected age of the vector
Approximating the age distribution • A_k is a random variable describing the number of nodes whose information is up to age k
Handling inactive nodes • The presence of inactive nodes causes problems • Age quality of the information deteriorates • Number of ARP broadcasts increases linearly • Using a fixed-size window improves the age quality, but the number of ARP broadcasts stays the same
Algorithm 2 • Algorithm 2 solves the above 2 issues • Works basically the same as Algorithm 1, with the following difference when sending a message • Calculate l, the number of active nodes (from the local vector) • Generate a random number k in 0…l • If k = 0, choose the destination from all nodes • Else, choose the destination only from the active nodes • Using Algorithm 2, the maximal expected number of messages to inactive nodes, from all nodes at each round, is ≤ 1
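The destination choice of Algorithm 2 can be sketched as follows. The sketch reads the rule as "with probability 1/(l+1) pick the random destination among all nodes, otherwise only among the active ones", which is an interpretation of the slide; the function names and the assumption that every active node sees the same, accurate count l are hypothetical. The second function evaluates the expected number of messages that reach inactive nodes per round under those assumptions.

```python
import random


def choose_destination(self_name, all_nodes, active_nodes, rng=random):
    """Pick the node to send the window to (None if there is no candidate)."""
    l = len(active_nodes)          # active nodes according to the local vector
    k = rng.randint(0, l)          # uniform over 0..l inclusive
    pool = all_nodes if k == 0 else active_nodes
    candidates = [n for n in pool if n != self_name]
    return rng.choice(candidates) if candidates else None


def expected_messages_to_inactive(n_total, n_active):
    """Expected messages to inactive nodes per round, summed over all active senders.

    Assumes every active node knows the true number of active nodes.
    """
    if n_total <= 1:
        return 0.0
    p_all = 1.0 / (n_active + 1)                        # probability that k == 0
    p_inactive = (n_total - n_active) / (n_total - 1)   # inactive share among the other nodes
    return n_active * p_all * p_inactive


if __name__ == "__main__":
    for active in (1, 10, 50, 99):
        print("active:", active, " expected messages to inactive:",
              round(expected_messages_to_inactive(100, active), 3))
```

Under these assumptions the expected value is n_active/(n_active+1) times a fraction no larger than 1, so it stays below 1 regardless of how many nodes are inactive.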
Supporting Urgent information • In the previous algorithms, information is propagated from all nodes constantly • In some cases we wish to send an important message urgently to all • such as the detection of a newly dead node • In this case the source node gives the message a high priority, 2*log(n) • When a node assembles the window it is about to send, it first takes the entries with the highest priority and only then the younger entries • The priority of an entry is decremented every time unit • The result is that urgent messages are disseminated in O(log(n)) steps • And regular information is disseminated a bit slower
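The priority rule for assembling a window can be sketched as below. This is a minimal sketch under stated assumptions: entries are plain dicts with hypothetical keys `node`, `age` and `priority`, regular entries carry priority 0, the logarithm is taken base 2 and rounded up (the slide does not state the base), and priorities stop decreasing at 0.

```python
import math


def urgent_priority(n):
    """Initial priority of an urgent entry: 2*log(n) (base 2 assumed, rounded up)."""
    return math.ceil(2 * math.log2(n))


def assemble_window(entries, size):
    """Pick `size` entries: highest priority first, then the youngest entries."""
    ordered = sorted(entries, key=lambda e: (-e["priority"], e["age"]))
    return ordered[:size]


def decay_priorities(entries):
    """Called once per time unit: every positive priority drops by one."""
    for e in entries:
        if e["priority"] > 0:
            e["priority"] -= 1


if __name__ == "__main__":
    entries = [
        {"node": "A", "age": 1, "priority": 0},
        {"node": "B", "age": 5, "priority": urgent_priority(8)},  # urgent news about B
        {"node": "C", "age": 0, "priority": 0},
    ]
    print([e["node"] for e in assemble_window(entries, 2)])
    for _ in range(urgent_priority(8)):
        decay_priorities(entries)
    print([e["node"] for e in assemble_window(entries, 2)])
```

While the entry about B is urgent it is sent ahead of younger entries; once its priority has decayed to 0 it competes on age alone, like regular information.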
Information service clients • MOSIX • load balancing • Fresh information is used by the load balancing algorithm to consider migrating processes • mmon, the MOSIX monitoring tool • Presents the vector of a specific node • mmon -h xil-10 • MPICH • Improved assignment of processes to nodes • No assignment to “dead” nodes • Assignment to the least loaded ones • Nagios • Collecting information about clusters over time (history) • Periodically retrieving a vector from a machine and keeping it • Decision algorithms at the cluster level • Leader election (queue fault tolerance) • Node reservation
Conclusions • Constructed a distributed bulletin board • Age properties are guaranteed • The administrator can configure it to the desired properties • No two nodes have the same view of the system • Information requests are served locally • Noise level (messages to inactive nodes) is constant • Urgent messages are propagated quickly
Future Work • Investigating other gossip models • Push and Pull-Push • Using only a partial view of the system