This study presents AVMON, a system for selecting and discovering consistent availability monitoring overlays in large-scale distributed systems. The system aims to maintain long-term availability information while addressing challenges such as churn, selfish and colluding nodes, and scalability. AVMON achieves consistency, verifiability, load balancing, randomness, and discoverability through its hash-based implementation and coarse view maintenance protocols.
AVMON: Optimal and Scalable Discovery of Consistent Availability Monitoring Overlays for Distributed Systems
Ramsés Morales and Indranil Gupta
Dept. of Computer Science, University of Illinois at Urbana-Champaign
Outline • Introduction • Design Goals • Existing Solutions and Deficiencies • AVMON • Final Observation
Introduction
• Research problem
  • Selecting and discovering consistent availability monitoring overlays
• Why?
  • Large-scale distributed applications: replication, multicast, ...
  • Churn: the rapid and continuous arrival, departure, failure, birth, and death of nodes
  • Availability-aware strategies
  • Availability monitoring service: maintains long-term availability information
Introduction (cont.)
• Challenges
  • Selfish nodes
  • Colluding nodes
  • Distributed application challenges
• The availability monitoring problem consists of two sub-problems:
  • Selection and discovery of the availability monitoring overlay
  • Availability history maintenance
Design Goals
• Consistency
  • Monitoring relationships should be consistent
  • Maintain long-term availability history
  • Avoid transferring histories due to churn
  • Avoid monitoring set pollution
• Verifiability
  • Correctly verify the monitoring relationship between any two nodes
• Load balancing
• Scalability
Design Goals (cont.)
• Randomness
  • A node's monitoring set should be picked uniformly at random:
    • In an identically distributed fashion: reduces pollution and helps load balancing
    • Independently of one another: avoids the same groups of nodes appearing together in several monitoring sets
• Discoverability
  • Any node should be able to quickly find its own monitoring and target sets
  • Any node should be able to locate at least a constant number of the members of any other node's monitoring set
Existing Solutions (and Deficiencies)
• Self-reporting
  • Selfish nodes can lie about their own availability
• Central availability monitor
  • Scalability
  • Load balancing
• DHT-based
  • Consistency
  • Verifiability
  • Randomness
AVMON
• System model
  • Distributed system
  • Nodes may leave, fail, rejoin, be born, and die
  • Death is silent, not explicit
  • Communication is reliable and timely
  • Nodes have persistent storage
  • System size is stable
AVMON
• Overview
  • Hash-based implementation of the monitoring relationship
    • Any node can evaluate the hash-based function to determine whether two nodes have a monitoring relationship
    • Consistent, random, and verifiable
    • Guarantees that each node has O(K) nodes in its monitoring set
  • The system maintains a coarse overlay to discover monitoring relationships between nodes and inform the relevant nodes
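The hash-based relationship can be sketched in Python. The sketch assumes, as a hypothetical rule for illustration, that U belongs to V's monitoring set exactly when a consistent hash H(U, V), normalized to [0, 1), is at most K/N; the names `pair_hash` and `monitors` and the use of SHA-256 are illustrative choices, not AVMON's specified function. Because the check depends only on the two node IDs, any third node can recompute it (verifiability), and each node expects about (N - 1)·K/N ≈ K monitors.

```python
import hashlib


def pair_hash(u: str, v: str) -> float:
    """Hypothetical consistent hash H(u, v) of the ordered pair (u, v), mapped to [0, 1)."""
    digest = hashlib.sha256(f"{u}|{v}".encode("utf-8")).digest()
    # Keep 53 bits so the division by 2**53 is exact and the result stays below 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def monitors(u: str, v: str, k: float, n: int) -> bool:
    """Assumed condition: u is in v's monitoring set iff H(u, v) <= K/N."""
    return u != v and pair_hash(u, v) <= k / n


def monitoring_set(v: str, nodes: list[str], k: float, n: int) -> set[str]:
    """Nodes that monitor v (v's monitoring set)."""
    return {u for u in nodes if monitors(u, v, k, n)}


def target_set(u: str, nodes: list[str], k: float, n: int) -> set[str]:
    """Nodes that u monitors (u's target set)."""
    return {v for v in nodes if monitors(u, v, k, n)}


if __name__ == "__main__":
    nodes = [f"node{i}" for i in range(500)]
    K, N = 10, len(nodes)
    sizes = [len(monitoring_set(v, nodes, K, N)) for v in nodes]
    # Each of the N - 1 candidates passes with probability K/N, so the mean is close to K.
    print(sum(sizes) / N)
    # Verifiability: any node can re-evaluate the same predicate for any pair.
    print(monitors("node1", "node2", K, N))
```

Since the relationship is a pure function of the two IDs, a node that leaves and rejoins keeps the same monitors, which is what lets availability histories survive churn without being transferred.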
AVMON
• Monitor discovery via the coarse view
  • Monitors are discovered by each node maintaining a coarse view
  • The coarse view holds at most CVS entries
  • The goal is for the coarse view to be a random sample of the system
  • Two sub-protocols achieve this: joining, and coarse view maintenance and discovery
AVMON
• Joining sub-protocol
  • Used by freshly joining and rejoining nodes
  • Goal: tell CVS other nodes about X
  • Node X sends a JOIN message to a random node Y
  • The JOIN message carries ID(X) and an integer weight C
  • When Y receives a JOIN message with C ≠ 0, it:
    • Adds X to its coarse view, if X is not already there and there is room for it
    • Decrements C
    • Forwards two JOIN messages on behalf of X to random nodes in Y's coarse view, each carrying half of the remaining weight (C/2)
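The weighted JOIN propagation can be simulated with a message queue. This is a sketch under stated assumptions: `CVS = 8` is a hypothetical value, the remaining weight is split into floor and ceiling halves so that no weight is lost to integer division, and forwarding never targets X itself. Because every delivery with C > 0 consumes one unit of weight, at most C nodes ever add X, matching the goal of telling CVS nodes about X when the initial weight is CVS.

```python
import random
from collections import deque

CVS = 8  # hypothetical maximum coarse-view size


def propagate_join(views: dict[str, list[str]], x: str, first_contact: str,
                   weight: int, rng: random.Random) -> set[str]:
    """Simulate the JOIN messages for node x; return the nodes that added x to their view."""
    added: set[str] = set()
    queue = deque([(first_contact, weight)])  # pending JOIN(ID(x), C) deliveries
    while queue:
        y, c = queue.popleft()
        if c == 0:
            continue  # a JOIN with zero weight is dropped
        view = views[y]
        # Add x only if it is not already present and there is room for it.
        if y != x and x not in view and len(view) < CVS:
            view.append(x)
            added.add(y)
        c -= 1
        # Assumption: split the remaining weight as floor/ceil halves so none is lost.
        candidates = [n for n in view if n != x]
        if c > 0 and candidates:
            for share in (c // 2, c - c // 2):
                queue.append((rng.choice(candidates), share))
    return added


if __name__ == "__main__":
    rng = random.Random(1)
    nodes = [f"n{i}" for i in range(50)]
    # Start every existing node with a half-full random coarse view.
    views = {n: rng.sample([m for m in nodes if m != n], CVS // 2) for n in nodes}
    views["x"] = []
    print(sorted(propagate_join(views, "x", "n0", CVS, rng)))
```

Splitting the weight in halves spreads the JOIN over a tree of random nodes rather than a single random walk, so X lands in several coarse views after only a logarithmic number of forwarding hops.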
AVMON
• Coarse view maintenance and discovery
  • Executed once every protocol period
  • The protocol has three tasks:
    • Eliminate nodes that have left the system
    • Shuffle new random entries into the coarse view
    • Discover monitoring relationships along the way
  • Node X picks a random node Z from its coarse view and pings it
    • Unresponsive nodes are removed
  • X checks the monitoring relationship for pairs of nodes (U, V) drawn from X's and Z's coarse views
    • The nodes of every pair that satisfies the relationship are informed with a NOTIFY message
  • X selects a new coarse view at random from the union of X's and Z's coarse views
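One protocol period at node X can be sketched as follows. Assumptions: a ping is simulated by membership in an `alive` set, the monitoring check is passed in as a callback (for example the hash condition H(U, V) ≤ K/N), X and Z themselves are included in the pairs that are checked, and `protocol_period` is a hypothetical name. The function returns the (U, V) pairs to NOTIFY, meaning "U should monitor V", and leaves message delivery to the caller.

```python
import random
from itertools import product
from typing import Callable


def protocol_period(x: str, views: dict[str, list[str]], alive: set[str], cvs: int,
                    is_monitor: Callable[[str, str], bool],
                    rng: random.Random) -> list[tuple[str, str]]:
    """One coarse-view maintenance round at node x; returns (u, v) pairs to NOTIFY."""
    view = views[x]
    if not view:
        return []
    z = rng.choice(view)
    if z not in alive:  # simulated ping timeout: drop the unresponsive entry
        view.remove(z)
        return []
    # Assumption: x and z are checked too, so x can discover its own relationships.
    mine = set(view) | {x}
    theirs = set(views[z]) | {z}
    notify: set[tuple[str, str]] = set()
    for u, v in product(mine, theirs):
        for a, b in ((u, v), (v, u)):  # the monitoring relationship is directional
            if a != b and is_monitor(a, b):
                notify.add((a, b))
    # Shuffle: sample the new view uniformly from the union of both views, excluding x.
    union = sorted((set(view) | set(views[z])) - {x})
    views[x] = rng.sample(union, min(cvs, len(union)))
    return sorted(notify)


if __name__ == "__main__":
    rng = random.Random(0)
    views = {"x": ["a"], "a": ["c", "d", "x"]}
    alive = {"x", "a", "c", "d"}
    toy_rule = lambda u, v: (u, v) == ("c", "x")  # toy stand-in for the hash check
    print(protocol_period("x", views, alive, 2, toy_rule, rng), views["x"])
```

Checking every cross pair costs O(CVS²) hash evaluations per period, but it lets a single exchange discover relationships between nodes that never contact each other directly.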
AVMON
• Using the monitoring overlay
  • When node X receives NOTIFY(U, X):
    • X checks whether U is already in its monitoring set
    • If not, X rechecks the hash-consistent condition
    • If the condition holds, X adds U to its monitoring set
  • When node X receives NOTIFY(X, U):
    • X checks whether U should be in its target set
  • Node X monitors the availability of its target set by sending monitoring pings every monitoring period, and it maintains the availability history of those nodes
  • Node X is responsible for reporting a subset of its monitoring set to any node Y that requests it
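The NOTIFY handling can be sketched as a small node class. The class and method names are hypothetical, the monitoring check is an injected callback, and recording availability as the fraction of answered pings is an assumption made for illustration. The key point is that X never trusts the sender of a NOTIFY: it re-evaluates the hash-consistent condition itself before changing either set.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class AvmonNode:
    node_id: str
    is_monitor: Callable[[str, str], bool]  # hash-consistent check, e.g. H(u, v) <= K/N
    monitoring_set: set[str] = field(default_factory=set)  # nodes that monitor us
    target_set: set[str] = field(default_factory=set)      # nodes we monitor
    history: dict[str, list[bool]] = field(default_factory=dict)

    def on_notify(self, u: str, v: str) -> bool:
        """Handle NOTIFY(u, v), meaning "u monitors v"; return True if a set changed."""
        me = self.node_id
        if v == me and u != me and u not in self.monitoring_set:
            # Recheck the condition locally instead of trusting the sender.
            if self.is_monitor(u, me):
                self.monitoring_set.add(u)
                return True
        elif u == me and v != me and v not in self.target_set:
            if self.is_monitor(me, v):
                self.target_set.add(v)
                self.history.setdefault(v, [])
                return True
        return False

    def record_ping(self, target: str, responded: bool) -> None:
        """Append one monitoring-ping outcome to a target's availability history."""
        if target in self.target_set:
            self.history[target].append(responded)

    def availability(self, target: str) -> Optional[float]:
        """Fraction of pings the target answered, or None if there is no history yet."""
        samples = self.history.get(target)
        if not samples:
            return None
        return sum(samples) / len(samples)

    def report_monitors(self, limit: int) -> list[str]:
        """Return up to `limit` members of our monitoring set to a requesting node."""
        return sorted(self.monitoring_set)[:limit]
```

Because a NOTIFY only triggers a local recheck, a malicious node that sends forged NOTIFY messages cannot insert itself into X's monitoring set.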
AVMON
• Forgetful pinging
  • An optimization for cleaning monitoring and target sets of nodes that may never rejoin the system
  • These nodes cannot be deleted, since death is silent and a long-absent node cannot be distinguished from a dead one
  • Instead, if a node in X's target set is unresponsive for longer than a threshold, X reduces the frequency of its monitoring pings to that node
• Collusion resilience
  • If X has O(N/log(N)) colluding nodes, it is probabilistically impossible for them to pollute X's monitoring set
  • This holds even for up to O(N/log(N)) colluding relationships
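A per-target ping schedule illustrates forgetful pinging. The slide only says the ping frequency is reduced; doubling the interval after each missed ping past the threshold, capping it at `max_period`, and resetting it on the first answered ping are assumptions chosen for this sketch, and `TargetPingState` is a hypothetical name. All times are in the same arbitrary unit.

```python
from dataclasses import dataclass, field


@dataclass
class TargetPingState:
    """Ping schedule for one node in X's target set (times in one arbitrary unit)."""
    base_period: float      # the normal monitoring period
    threshold: float        # unresponsiveness beyond which pinging slows down
    max_period: float       # assumed cap on the slowed-down ping interval
    last_seen: float = 0.0  # time of the last answered ping
    period: float = field(default=0.0, init=False)
    next_ping: float = field(default=0.0, init=False)

    def __post_init__(self) -> None:
        self.period = self.base_period
        self.next_ping = self.last_seen + self.base_period

    def due(self, now: float) -> bool:
        """True if the next monitoring ping to this target should be sent."""
        return now >= self.next_ping

    def on_ping_result(self, now: float, responded: bool) -> None:
        if responded:
            # The target is back: restore the normal monitoring frequency.
            self.last_seen = now
            self.period = self.base_period
        elif now - self.last_seen > self.threshold:
            # Forgetful pinging: keep the target, but double the interval (capped).
            self.period = min(self.period * 2, self.max_period)
        self.next_ping = now + self.period
```

The target is never removed, so the monitoring relationship stays consistent if the node eventually rejoins, while a node that has silently died costs only one ping per `max_period`.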
Final Observation
• In the joining sub-protocol, how does X know about Y?
• What is the effect of coarse view maintenance and discovery on history transfers?