Design Patterns from Biology for Distributed Computing. Andres J. Ramirez. Paper Information. Authors Babaoglu, Canright, Di Caro, et al. (about 11 different dudes.) Published ACM Transactions on Autonomous and Adaptive Systems, Vol 1, No 1, September 2006. Presentation Outline.
Design Patterns from Biology for Distributed Computing Andres J. Ramirez
Paper Information • Authors • Babaoglu, Canright, Di Caro, et al. (about 11 different dudes.) • Published • ACM Transactions on Autonomous and Adaptive Systems, Vol 1, No 1, September 2006.
Presentation Outline • What is a design pattern? • Current challenges in designing software systems. • Parallels with biological systems. • Presentation of design patterns extracted from biological systems. • Experimentation and validation. • Conclusion.
What is a design pattern? • Various definitions proposed: • “Each pattern describes a problem which occurs over and over again in our environment, and then describes the core of the solution to that problem, in such a way that you can use this solution a million times over, without ever doing it the same way twice.” [Christopher Alexander] • “Each design pattern systematically names, explains and evaluates an important and recurrent design in object-oriented systems.” [Gamma et al.] • “A recurring solution to a standard problem.” [Schmidt] • Overall, most are rather similar.
Design Pattern Presentation • Bare minimum format: • Pattern name. • Problem description. • Solution to the problem. • Consequences of applying the design pattern.
Current Challenges • Distributed environments are commonplace now • Extremely dynamic. • Unreliable. • Large scale. • Traditional approaches for designing distributed systems often do not apply under these conditions.
Biological Systems • Effectively organize large numbers of unreliable and dynamically changing components (cells, molecules, individuals) into structures that implement a wide range of functions. • These structures exhibit: • Robustness to failure. • Adaptability to changing conditions. • Lack of reliance on an explicit central coordinator.
Why patterns from biology? • Biological entities evolve to solve a particular problem, usually related to survival issues. • This solution, by the notion of evolution, must be well tested and reliable to be in existence today. • Similarities exist between distributed computing systems and biological systems. • Solutions from one domain can transfer onto the other.
Key Idea • Abstract design patterns from biological systems and apply them in distributed systems. • Serve as a bridge between biological systems and computer systems. • How do they accomplish this? • Formulate the patterns as local communication strategies over arbitrary communication topologies.
Design Pattern Presentation in Paper • Name • Handle for the pattern. Key. • Context • Defined by the system model (more in a bit.) • Problem • Possible functionality we are trying to achieve.
Design Pattern Presentation in Paper • Solution • An algorithm which produces the desired output based on the problem. • Example • Sort of a case study. • Design Rationale • The inspiration from biology.
System Model • Basic Abstraction • Network where nodes communicate via message passing. • Additional Assumptions • Basic components are nodes. • Computing devices which maintain states and perform computations. • Neighbors • Only “visible” neighbors can send messages to each other. • Asynchronous message passing • No message delivery time bound.
System Model • Nodes are unreliable • Nodes may fail. • Can leave and join at any time. • Communication mediums are unreliable • Messages can be lost. • Side note: No mention of corrupted message passing? • The Byzantine Generals Problem does not seem to be addressed. • Maybe animals are more trustworthy than humans?
Topology Issues • The topology here is given by the graph defined by the neighbor relation. … typical topology definition from graph theory. • Two particular networks seen in this work: • Overlay Networks • Mobile Ad Hoc Networks (MANETs)
Overlay Networks • Promising paradigm for building applications over large-scale wide-area networks. • Service Clouds is an example. • Logical structures built on top of a physical network with a routing service. • Any node can send to any other node, provided it knows the target node's network address.
Mobile Ad Hoc Networks • Set of wireless mobile devices which self-organize into a network without relying on a fixed infrastructure. • All nodes are treated equally. • Neighbor relations are dependent on the wireless connections between nodes. • Defined by transmission power and physical proximity.
The actual Design Patterns • Plain Diffusion • Replication • Stigmergy • Chemotaxis (composite) • Reaction Diffusion (composite)
Plain Diffusion • Problem: • Bring the system to a state where each node contains the average value of all the values in the system. • Assign a gradient to each link that is proportional to the change in values when following the link. • Solution: • Rely on message passing. • For each link, each node periodically subtracts a fixed proportion from its current value and sends it along the given link. On the receiving side, add the message to current value.
Plain Diffusion • Solution presented maintains the sum of all the values in the system constant. • All the node values will quickly approach the average value. • Gradients are generated in this process.
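The diffusion rule from the previous slide can be sketched in a few lines. This is a minimal illustration, not the paper's code: it assumes synchronous rounds, no message loss, an undirected topology, and a hypothetical `rate` small enough (below 1 / maximum degree) for the values to settle. The names `diffusion_step`, `run_diffusion` and `gradient` are made up for this sketch.

```python
def diffusion_step(values, neighbors, rate):
    """One synchronous round: each node sends rate * value along every link."""
    incoming = {node: 0.0 for node in values}
    outgoing = {node: 0.0 for node in values}
    for node, value in values.items():
        for peer in neighbors[node]:
            share = rate * value          # fixed proportion of the current value
            outgoing[node] += share       # subtracted on the sending side
            incoming[peer] += share       # added on the receiving side
    return {n: values[n] - outgoing[n] + incoming[n] for n in values}


def run_diffusion(values, neighbors, rate=0.2, rounds=200):
    """Repeat the diffusion step; the total of all values stays constant."""
    for _ in range(rounds):
        values = diffusion_step(values, neighbors, rate)
    return values


def gradient(values, node, peer):
    """Change in value when following the link node -> peer."""
    return values[peer] - values[node]


# Four nodes in a ring; all of the "substance" starts at node 0.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
start = {0: 8.0, 1: 0.0, 2: 0.0, 3: 0.0}
final = run_diffusion(start, ring)
```

In this example every node ends up close to the average of the starting values, and `gradient(start, 1, 0)` is positive because node 0 initially holds more than node 1.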
Plain Diffusion • Design Rationale • A form of diffusion. • Equalizing the concentration of some substance or some abstract quantity like heat. • Present in many biological and physical systems. • Known to be efficient at convergence. This will be important when testing in a distributed environment.
Replication • Problem: • Propagate novel information to all other nodes. • Assign the maximal value present in the network to all the nodes. • Find a node which contains a document matching a given query.
Replication • Solution: • Nodes receive messages from neighbors and forward them according to application specific rules. • Flooding is an easy but expensive example. • Messages can stand for the maximum value (thus solving problem 2) • Messages can stand for the query until a match is found (thus solving problem 3)
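As a rough illustration of the flooding case applied to problem 2, the sketch below spreads the maximum value through a small network. It assumes reliable delivery and uses a single queue to stand in for the network; `flood_max` and the message counter are hypothetical names, not taken from the paper.

```python
from collections import deque


def flood_max(values, neighbors):
    """Flood values through the network; each node keeps the largest it has seen."""
    known = dict(values)
    pending = deque()
    sent = 0
    # Every node starts by replicating its own value to all neighbors.
    for node, value in values.items():
        for peer in neighbors[node]:
            pending.append((peer, value))
            sent += 1
    while pending:
        node, value = pending.popleft()
        # Forwarding rule: only replicate a message that carries news.
        if value > known[node]:
            known[node] = value
            for peer in neighbors[node]:
                pending.append((peer, value))
                sent += 1
    return known, sent


# A line of four nodes: 0 - 1 - 2 - 3.
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
result, message_count = flood_max({0: 3, 1: 7, 2: 1, 3: 5}, line)
```

The message counter makes the cost of flooding visible: even this tiny network needs more messages than it has nodes.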
Replication • Design Rationale: • Replication is commonplace in nature • Growth processes, signal propagation in certain neural networks, epidemic spreading. • Messages can be seen as infective agents which propagate through the system invading hosts (nodes.)
Stigmergy • Problem: • Assuming that the links between nodes have weights attached, find the shortest path between two given nodes. • Nodes need not be directly connected. • Redistribute items found in one node over a small number of nodes where similar items are held at the same node. • Does not really address when all the items are the same? Does it even matter?
Stigmergy • Solution: • Let every node contain a set of variables called stigmergic variables. • Nodes generate messages and send and receive these based on application dependent policies. • Reception of a message will trigger an action. • Defined by the message itself and the stigmergic variables of the node. • Stigmergic variables are updated and then the message (also updated) is forwarded. • Essentially, distributed reinforcement learning.
Stigmergy • In the first problem, the estimated cost for a particular path is represented by the stigmergic variables. • As it progresses, the variables are updated with more exact costs. • In the second problem, clusters form by assigning items to the messages and determining whether the message is forwarded or not based on the stigmergic variables.
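For the shortest-path problem, the update-then-forward loop might look like the sketch below. This is a deliberately simplified, deterministic stand-in: each node's stigmergic variable is a single cost estimate to the destination, and a message is forwarded only when it improves that estimate. Ant-inspired schemes are typically probabilistic, and that part is not modelled here. The function name, the message layout and the assumption of positive link weights are all choices made for this illustration.

```python
from collections import deque


def stigmergic_costs(links, destination):
    """links: {node: {peer: weight}}; returns cost estimates and next hops."""
    estimate = {node: float("inf") for node in links}   # stigmergic variables
    estimate[destination] = 0.0
    next_hop = {}
    # A message is (recipient, sender, cost of reaching the destination via sender).
    messages = deque(
        (peer, destination, weight) for peer, weight in links[destination].items()
    )
    while messages:
        node, sender, cost = messages.popleft()
        if cost < estimate[node]:
            # Update the local variable, then forward an updated message.
            estimate[node] = cost
            next_hop[node] = sender
            for peer, weight in links[node].items():
                if peer != sender:
                    messages.append((peer, node, cost + weight))
    return estimate, next_hop


# The direct link A-C costs 5; the detour through B costs 1 + 1.
links = {
    "A": {"B": 1.0, "C": 5.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"A": 5.0, "B": 1.0},
}
costs, hops = stigmergic_costs(links, "C")
```

Node A first records the direct cost of 5 and later replaces it with the cheaper route through B, which mirrors the slide's point that the variables are refined as more messages arrive.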
Stigmergy • Design Rationale: • Typically seen in distributed self-organizing behaviors in diverse social systems. • Nest building, labor division, path finding. • Classic example, ants.
Chemotaxis • Note: Composite pattern based on plain diffusion. • Problem: • Finding a short path from a given node to regions of the network where the concentration of a diffusive substance is maximal. • Does not seem to incorporate finding the shortest path?
Chemotaxis • Solution: • Just follow the maximal gradient. • Start at any given node • Select link with highest gradient • Repeat until local maximum concentration is found. • Greedy Algorithm! Not necessarily the shortest path, and not necessarily where the highest diffusive substance is found.
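The greedy walk is short enough to write out. In this sketch the concentrations are given as a plain dictionary (in the pattern they would come from a running diffusion), and `follow_gradient` is a hypothetical name. The example topology is chosen so that the walk stops at a local maximum and misses the global one.

```python
def follow_gradient(concentration, neighbors, start):
    """Walk uphill along the steepest link until no neighbor is higher."""
    path = [start]
    current = start
    while True:
        # Pick the neighbor with the highest concentration (largest gradient).
        best = max(
            neighbors[current], key=lambda peer: concentration[peer], default=current
        )
        if concentration[best] <= concentration[current]:
            return path               # local maximum reached
        current = best
        path.append(current)


# Two branches leave node 0; the branch that looks worse at first hides the peak.
neighbors = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1], 4: [2]}
concentration = {0: 1.0, 1: 3.0, 2: 2.0, 3: 4.0, 4: 9.0}
path = follow_gradient(concentration, neighbors, start=0)
```

Starting at node 0, the walk goes to node 1 and then node 3 and stops there, even though node 4 holds the highest concentration.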
Chemotaxis • Design Rationale: • Cells or organisms might direct their movements according to the concentration gradients of one or more chemicals in the environment. • Responsible for the development of certain multicellular organisms and pattern formations.
Reaction-Diffusion • Not a pattern, a framework covering a large set of patterns. • A strong generalization of the plain diffusion pattern • Simultaneous diffusion of one or more materials. Also removal. • The presentation gives no further detail on this framework.
Evaluating Design Patterns • Insensitivity: • Self-repairing • Self-organizing • Adaptive • Intelligent • Quantifying the notion of good and bad in a sense of merit. • Dependent on too many things, domain specific, not perfectly defined. • Insensitive systems show little variation in the figure of merit as the environment varies.
Evaluating Plain Diffusion • Distributed Aggregation Problem • Calculating global functions over the set of locally known quantities. • We saw these problems earlier. • Simplify the task of controlling, monitoring and optimizing distributed applications, among other things. • Building block for other patterns. • In the paper, the average is found.
Evaluating Plain Diffusion • Algorithm: • Each node p has two threads, active and passive. • Active thread: periodically initiates an information exchange with peer node q selected at random. Message contains state of p. • Passive thread: waits for a message and replies with the local state. • Symmetric information exchange, constant update of values sent and received. • The update is defined by what the problem is trying to solve. In this example, take the average of the two messages. • Could also do a maximum, etc.
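A small simulation of the exchange described above, written as a sketch rather than the PeerSim setup used in the paper. It assumes a fully connected overlay, one exchange initiated per node per cycle, and no failures; the active and passive threads are collapsed into a single symmetric update. `gossip_cycle` and `gossip_average` are hypothetical names.

```python
import random


def gossip_cycle(state, rng):
    """Each node p initiates one exchange with a random peer q."""
    nodes = list(state)
    for p in nodes:
        q = rng.choice([n for n in nodes if n != p])
        # Symmetric update: both sides adopt the average of the two states.
        mean = (state[p] + state[q]) / 2.0
        state[p] = state[q] = mean


def gossip_average(initial, cycles=30, seed=0):
    """Run a fixed number of cycles and return every node's local estimate."""
    rng = random.Random(seed)
    state = dict(initial)
    for _ in range(cycles):
        gossip_cycle(state, rng)
    return state


initial = {i: float(i) for i in range(50)}      # true average is 24.5
estimates = gossip_average(initial)
```

Replacing the averaging line with `max(state[p], state[q])` would turn the same loop into a maximum computation, as the last bullet suggests.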
Evaluating Plain Diffusion • How good is this solution? • Value at each node will converge to the true global average. • IF the underlying overlay network remains connected. • Just how fast does it converge? • Exponential. • Very high precision estimates are achieved in a few cycles regardless of network size. • It is scalable!
Evaluating Plain Diffusion • Simulation done on PeerSim. • Count protocol -> number of nodes in the network. • Average calculation over a starting set of numbers. • One node has value 1, rest 0. Obtain? • 1/N. • Why do this? • Very sensitive to failures. • Tests scalability and robustness.
Evaluating Plain Diffusion • Converged to a specific value exponentially, as predicted. • What about failures? • If a crashed node has a smaller value than the actual global average, the estimated average will increase. • The estimate of N will decrease. • Opposite case? Opposite results. • Crashes have the most impact in the first few iterations. • Churn? Adding and removing nodes (N remains constant though.) • Estimates still reliable.
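The direction of the error after a crash follows from a short calculation. The derivation below is a sketch under one assumption not stated on the slide: the node crashes between exchanges and takes its current value $v$ with it, while the remaining $N-1$ nodes keep averaging among themselves. The symbols $a$, $a'$, $v$ and $\hat{N}'$ are introduced here for illustration.

```latex
\begin{align*}
a  &= \frac{1}{N}\sum_{i=1}^{N} x_i
   && \text{(global average before the crash)} \\
a' &= \frac{N a - v}{N - 1}
   && \text{(average the survivors converge to)} \\
a' - a &= \frac{a - v}{N - 1}
   && \text{(positive exactly when } v < a\text{)} \\
\hat{N}' &= \frac{1}{a'} = \frac{N - 1}{1 - v}
   && \text{(count protocol, where } a = 1/N\text{)}
\end{align*}
```

So a crashed node holding less than the average ($v < 1/N$ in the count protocol) pushes the estimated average up and the size estimate $\hat{N}'$ below $N$. In the special case $v = 0$ the estimate is exactly $N - 1$, the true number of surviving nodes. Early crashes matter most because that is when individual values still differ widely from $a$.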
Evaluating Replication • Distributed Search. • Idea is to spread queries throughout nodes. • Typical, simple, stupid solution? • Flood the network. • Clone the queries received at a node and propagate to all neighbors. • Huge overhead. • Opposing objectives. Higher efficiency vs lower overhead. • Can we do better?
Evaluating Replication • Design the algorithm for an unstructured overlay network. • No relation between the information stored at a node and its position in the overlay network. • Learn from proliferation • Replication strategy inspired by the immune system. • Basically acts as a rate limit on propagated messages. • B cells, after being stimulated by an antigen, proliferate generating antibodies. • After this, basically a gang of antibodies do several drive-bys on the antigens and you are no longer sick!
Evaluating Replication • Treat the query as the antibody and the searched items as the antigens. • Search can be started at any node. • Send query messages to k neighbors. • Receive a message? • Calculate the similarity between query and local contents. • Higher the similarity, more messages sent out. • Only new neighbors.
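The steps above can be sketched as follows. Several details are assumptions made for the example rather than the paper's choices: similarity is measured as Jaccard overlap between term sets, the number of copies grows linearly from `k_min` to `k_max`, neighbors are picked in list order instead of at random, a hop limit `ttl` bounds the search, and a shared `visited` set stands in for the "only new neighbors" rule.

```python
from collections import deque


def similarity(query, contents):
    """Jaccard overlap between the query terms and a node's terms."""
    union = query | contents
    return len(query & contents) / len(union) if union else 0.0


def proliferation_search(query, contents, neighbors, start, k_min=1, k_max=3, ttl=5):
    """Forward more copies of the query from nodes whose contents resemble it."""
    visited = {start}
    hits = []
    messages = deque([(start, ttl)])
    while messages:
        node, hops = messages.popleft()
        score = similarity(query, contents[node])
        if query <= contents[node]:
            hits.append(node)                     # node holds a matching item
        if hops == 0:
            continue
        # Proliferation: higher similarity means more copies are sent out.
        copies = k_min + round(score * (k_max - k_min))
        fresh = [peer for peer in neighbors[node] if peer not in visited]
        for peer in fresh[:copies]:
            visited.add(peer)
            messages.append((peer, hops - 1))
    return hits, visited


neighbors = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "e"], "d": ["b"], "e": ["c"]}
contents = {
    "a": {"x"},
    "b": {"ant"},
    "c": {"bee"},
    "d": {"ant", "colony"},
    "e": {"wasp"},
}
hits, visited = proliferation_search({"ant", "colony"}, contents, neighbors, "a")
```

In this run the query leaves node a as a single copy, reaches the partially matching node b, and is forwarded on to the match at d; the unpromising branch through c is never explored.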
Evaluating Replication • Restricted proliferation shown to be more effective than random walks. • Even though some fluctuations were present in the results, restricted proliferation performed roughly 50% better than restricted random walk. • Key notion? • Guiding message replication to areas of more promise yields better results.
No more! • I am sure I have bored you by now. • General experiment results of the remaining patterns exhibit better performance and insensitivity than traditional approaches seen in distributed computing. • Want some more specifics? Look at the paper. • You did do that already, right? • Good.
Conclusions • Biological systems have evolved through millions of years to reach their current point. • Evolution happens for a reason; it is a search for a solution to survival. • We can extract some of this behavior and apply it with success to distributed computing systems. • There are strong parallels between the two.
Conclusions • Solutions are not perfect, but they are good. • Few patterns extracted, certainly more are possible. • Translate ideas from large, varied and seemingly unrelated systems into one language • Applicable to our domain.