The Organic Grid: Self-Organizing Computation on a Peer-to-Peer Network Presented by: Xuan Lin
Outline • Introduction • Motivation • Organic Scheduling Scheme • Experimental Evaluation • Conclusion
Outline • Introduction • Motivation • Organic Scheduling Scheme • Experimental Evaluation • Conclusion
Introduction • Scientific computations require large-scale distributed computing. • Traditional Grid vs. Desktop Grid • Centralized vs. Decentralized • Mobile agents (weak mobility, strong mobility, forced mobility)
Outline • Introduction • Motivation • Organic Scheduling Scheme • Experimental Evaluation • Conclusion
Motivation • Many previous schemes assume a reliable network. • Centralized schemes suffer from poor scalability. • Traditional scheduling schemes assume that sufficient system information is available. • Inspired by the Local Activation, Long-range Inhibition (LALI) principle
Outline • Introduction • Motivation • Organic Scheduling Scheme • Experimental Evaluation • Conclusion
Assumptions • Independent-task application (ITA); the data initially resides at one location. • Each node initially has a “friend list”.
A. General Approach • A tree-structured overlay network is selected as the desired pattern of execution. • The simplest behavior that would organize communication and task distribution among mobile agents was determined empirically. • The basic behavior was then augmented with other desirable properties.
B. Basic Agent Behavior • A computational task is encapsulated in an agent. • A user starts the computation agent on his/her machine, which becomes the root of the tree. • The agent starts one thread for computation. • At the same time, the agent is prepared to receive requests.
B. Basic Agent Behavior (con’t)-when get a request • The agent dispatches a clone when get requests. (The requester will be a child). • The clone will ask for its parent for subtasks.
B. Basic Agent Behavior (con’t)-requester • A thread begins to compute. • Other threads are created-when required- to communicate with parents or other machines. • If a requests is received, this ‘child’ sends its own clone to the requester. It will become the parent of the requester. The requester will be a child of this node. • …… Thus, the computation spreads.
B. Basic Agent Behavior (con’t) • An agent requests its parent for more subtasks if it completes its own subtasks. • Every time a node obtain r results, it sends them to its parent.
C. Maintenance of Child-lists • Up to c active children and up to p potential children (balances the depth and width of the tree). • Active children are ranked by their performance (the rate at which they send results). • Potential children are those the current node has not yet been able to evaluate. • A potential child is added to the active child-list once it has sent enough results to the current node.
C. Maintenance of Child-lists (con’t) • When the node has more than cactive children, the slowest node (sc) will be kicked out. • The sc is then given a list of other nodes, which it can contact to try and get back to the tree. • The sc will also be put into a list which records o former children. (Avoid thrashing )
D. Restructuring of the Overlay Network • Philosophy: having the best nodes close to the top enhances the extraction of subtasks from the root and minimizes communication delay. • The overlay network is constantly restructured so that the nodes with the highest throughput migrate toward the root.
D. Restructuring of the Overlay Network (How to achieve that?) • A node periodically informs its parent about its best-performing child.
D. Restructuring of the Overlay Network (con’t) • A sc is not simply discarded. • The parent sends a list of its children in descending order of performance. • The sc attempts to contact these nodes in turn.
E. Size of Result Burst • Results are sent to the parent in bursts of r. • Performance is measured over R result-burst intervals, i.e., over (R+1) * r results. • If r and R are too large, performance information arrives infrequently and it takes too long for the network to update.
F. Fault Tolerance • What can we do when a node loses its connection? • Every node keeps track of the unfinished subtasks that were sent to its children, so they can be reassigned. • Each node also keeps a list of a ancestors, which it can fall back on if its parent is lost.
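A sketch of this bookkeeping, under the assumption that the ancestor list is consulted in order when the parent becomes unreachable (the slides only state that a ancestors are recorded; the names below are illustrative):

```python
# Fault-tolerance bookkeeping kept by each node (illustrative sketch).
A_ANCESTORS = 3


class FaultState:
    def __init__(self, ancestors):
        self.ancestors = ancestors[:A_ANCESTORS]  # list of a ancestors
        self.outstanding = {}                     # child -> subtasks not yet returned

    def assign(self, child, subtasks):
        self.outstanding.setdefault(child, []).extend(subtasks)

    def completed(self, child, subtask):
        self.outstanding[child].remove(subtask)

    def child_failed(self, child):
        """Reclaim the unfinished subtasks of an unreachable child."""
        return self.outstanding.pop(child, [])

    def parent_failed(self):
        """Fall back to the next recorded ancestor as the new parent."""
        return self.ancestors.pop(0) if self.ancestors else None


state = FaultState(ancestors=["p1", "p2", "p3"])
state.assign("childA", [10, 11, 12])
state.completed("childA", 10)
print(state.child_failed("childA"))   # -> [11, 12] can be reassigned
print(state.parent_failed())          # -> 'p1'
```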
G. Cycles • Failures could introduce cycles into the overlay. • (How to detect a cycle?) On receiving the ancestor list from its parent, every node checks whether it appears in that list. • (How to break a cycle?) The node tries to obtain the address of some other agent on its data-distribution or communication overlay and reattaches there.
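The cycle check itself is just a membership test on the ancestor list received from the parent, e.g.:

```python
# If a node appears in its own parent's ancestor list, the overlay has a cycle
# and the node must reattach elsewhere.

def forms_cycle(my_id, parent_ancestor_list):
    return my_id in parent_ancestor_list


print(forms_cycle("n4", ["n2", "n7", "n4"]))   # True: n4 is its own ancestor
print(forms_cycle("n4", ["n2", "n7", "n1"]))   # False: no cycle
```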
G. Cycles (starvation) • This may cause starvation. • If an agent is starved of work for more than a specified time, it self-destructs.
H. Termination • The root sends out termination messages. • The messages propagate down to the leaves. • Two scenarios: 1. If a node does not get the message, the situation is the same as in F: it is effectively disconnected and will eventually be starved out. 2. n2 does not get the termination message, but it is in n1's friend list; n1 terminates when it is informed, and n2 learns that the computation has ended the next time the two nodes interact.
I. Self-Adjustment of Task-List Size • In an ITA-type application, the utilization of a high-performance machine may be poor if it only requests a fixed number of subtasks at a time. • Agents therefore request more or fewer subtasks according to their performance, compared with the previous run. • The adjustment is governed by increase and decrease functions i(t) and d(t).
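The slides only name the functions i(t) and d(t); the linear forms below are assumptions made purely to illustrate how the request size could grow and shrink with observed performance:

```python
# Self-adjustment of the request (task-list) size: an illustrative sketch,
# not the paper's actual i(t) and d(t).

def i(t):
    return t + 2           # assumed increase function: request a few more subtasks


def d(t):
    return max(1, t - 2)   # assumed decrease function: never fall below one subtask


def next_request_size(t, current_throughput, previous_throughput):
    """Grow the request if this run was faster than the last one, shrink otherwise."""
    if current_throughput > previous_throughput:
        return i(t)
    return d(t)


size = 4
history = [(10.0, 8.0), (12.0, 10.0), (9.0, 12.0)]  # (current, previous) throughput
for cur, prev in history:
    size = next_request_size(size, cur, prev)
print("next request size:", size)   # 4 -> 6 -> 8 -> 6
```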
J. Prefetching • Motivation: a potential cause of slowdown in the basic scheduling scheme described earlier is the delay at each node while it waits for new subtasks. • The self-adjustment function i(t) is used to decide how much to prefetch. • However, excessive prefetching degrades performance, since it increases the amount of data that must be transferred at a time.
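Prefetching amounts to overlapping the request for the next batch of subtasks with the computation of the current one. A generic sketch (the fetch and compute functions are stand-ins, not the paper's API):

```python
# While the current batch is being computed, the next batch is requested in
# the background so the node does not sit idle between batches.
import threading
import time


def fetch_batch(n):
    time.sleep(0.1)               # stand-in for requesting n subtasks from the parent
    return list(range(n))


def compute(batch):
    time.sleep(0.1)               # stand-in for running the subtasks
    return [x * x for x in batch]


def run(num_batches=3, batch_size=4):
    next_batch = fetch_batch(batch_size)
    for _ in range(num_batches):
        current = next_batch
        holder = {}
        # Prefetch the next batch concurrently with computing the current one.
        t = threading.Thread(target=lambda: holder.update(batch=fetch_batch(batch_size)))
        t.start()
        results = compute(current)
        t.join()
        next_batch = holder["batch"]
        print("computed", len(results), "results while prefetching", len(next_batch))


run()
```

The trade-off noted on the slide applies here: larger prefetched batches hide more waiting time but increase the amount of data moved per transfer.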
Outline • Introduction • Motivation • Organic Scheduling Scheme • Experimental Evaluation • Conclusion
Metrics • Total computation time • Ramp-up time: the time required for subtasks to reach every single node • Topology: fast nodes should migrate as close to the root as possible
Experiment Configuration • Application: NCBI's nucleotide-nucleotide BLAST, the gene-sequence similarity search tool (matching a 256 KB sequence against 320 data chunks). • A cluster of eighteen heterogeneous machines. • Delays were introduced in the application code. • The machines ran the Aglets weak-mobility agent environment on top of either Linux or Solaris.
B. Effects of Child Propagation (cont'd) • 32% improvement in the running time
C. Result-Burst Size • There is a qualitative improvement in the child-lists as the result-burst size increases. • However, with very large result-bursts, it takes longer for the tree overlay to form and adapt, thus slowing down the experiment.
D. Effects of Prefetching • Ramp-up time is affected by prefetching and by the minimum number of subtasks that each node requests.
D. Effects of prefetching (con’t) • Prefecthing degrades the throughput when the No. of subtasks increases.
F. Number of Children • Two experiments: a good initial configuration and a star topology. • The total times are approximately the same. • Children have to wait longer for their requests to be satisfied.
Outline • Introduction • Motivation • Organic Scheduling Scheme • Experimental Evaluation • Conclusion