Self-Organizing Agents for Grid Load Balancing
Junwei Cao, Ph.D.
Research Scientist
Center for Space Research
Massachusetts Institute of Technology
Cambridge, MA 02139
Phone (617)253-8160
Fax (617)253-7014
Email caoj@mit.edu
http://www.mit.edu/~caoj
Acknowledgements
• This work was carried out while the author was with C&C Research Laboratories, NEC Europe Ltd., Sankt Augustin, Germany. The COSY development was partly funded by the European GEMSS project (EC/IST FP5 Project No. IST-2001-37153).
• This work originated from the author's PhD research at the High Performance Systems Group, Department of Computer Science, University of Warwick, Coventry, UK. The author would like to express his gratitude to group members, including Prof. Graham R. Nudd, Dr. Darren J. Kerbyson, Dr. Stephen A. Jarvis, and Mr. Daniel P. Spooner, for their previous contributions.
• The author is grateful for MIT/LIGO support to attend the SC2004/GRID2004 conference.
GRID2004, Pittsburgh, USA
Two-tier Resource Management in Grid Computing Environments
[Figure: diagram showing Users, ARMS Agents, and COSY instances]
COSY: Resource Scheduling and Load Balancing for Clusters
[Figure: diagram with Processors 1 through 8; label "2n-1"]
ARMS: Agent-Based Resource Management for Grid Computing
Grid Load Balancing: User-Driven vs. Self-Organization
• An ARMS system can achieve grid load balancing as a result of trying to meet QoS requirements that users specify explicitly.
• This work investigates self-organization, in which agents perform load balancing automatically for batch queuing jobs that are not explicitly associated with execution deadlines.
Ant-like Self-Organization: Local Rules Lead to Global Changes
1. An ant wanders randomly from one agent to another and tries to remember the identity of the most overloaded agent it has visited.
2. After a certain number of steps (m), the ant switches mode to search for the most underloaded agent, still wandering randomly.
3. After another m steps, the ant stops for one step and suggests that the two remembered agents (considered the most overloaded and the most underloaded, respectively) balance their workload.
4. After load balancing is performed, the ant is reinitialized and starts a new loop from step 1.
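The ant loop described on this slide can be sketched in a few lines of Python. This is only an illustrative sketch, not the ARMS implementation: the 4-neighbour grid topology, the integer workloads, the helper names (`grid_neighbors`, `ant_cycle`), and the choice to "balance" by splitting the combined workload evenly between the two remembered agents are all assumptions made for the example.

```python
import random

def grid_neighbors(rows, cols):
    """Hypothetical topology: agents on a rows x cols grid, 4-neighbourhood."""
    nbrs = {}
    for r in range(rows):
        for c in range(cols):
            cand = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            nbrs[r * cols + c] = [rr * cols + cc for rr, cc in cand
                                  if 0 <= rr < rows and 0 <= cc < cols]
    return nbrs

def ant_cycle(loads, neighbors, start, m, rng):
    """Run one ant loop: m + m wandering steps, then one balancing step.

    loads is a list of integer workloads indexed by agent id and is
    modified in place. Returns the agent where the ant ends up.
    """
    pos = start
    most_over = pos
    for _ in range(m):                  # mode 1: remember the most overloaded agent
        pos = rng.choice(neighbors[pos])
        if loads[pos] > loads[most_over]:
            most_over = pos
    most_under = pos
    for _ in range(m):                  # mode 2: remember the most underloaded agent
        pos = rng.choice(neighbors[pos])
        if loads[pos] < loads[most_under]:
            most_under = pos
    # Balancing step (assumed rule): split the combined workload evenly.
    total = loads[most_over] + loads[most_under]
    half = total // 2
    loads[most_over] = total - half
    loads[most_under] = half
    return pos                          # the ant's memory is reset on the next call
```

Calling `ant_cycle` repeatedly, with one position per ant, gives a simple simulation of many ants at work. Under the assumed balancing rule the total workload is conserved and the gap between the most and least loaded agents never grows.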
Ant-like Self-Organization for Agent-Based Grid Load Balancing
Performance Evaluation: a Modeling and Simulation Approach
Performance Metrics: Balancing Speed (s) vs. System Efficiency (e)
• Average Workload
• Load Balancing Level
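The formulas behind these two metrics are not preserved in this text. As a hedged illustration only, the sketch below uses one common definition: the average workload is the arithmetic mean over agents, and the load balancing level is one minus the mean square deviation divided by the average workload, so that 1.0 means perfectly balanced. Both function names and the exact form of the level are assumptions, not taken from the slides.

```python
import math

def average_workload(loads):
    """Mean workload over all agents."""
    return sum(loads) / len(loads)

def load_balancing_level(loads):
    """Assumed definition: 1 - (mean square deviation / average workload)."""
    avg = average_workload(loads)
    if avg == 0:
        return 1.0                      # no work anywhere: treat as balanced
    dev = math.sqrt(sum((avg - w) ** 2 for w in loads) / len(loads))
    return 1.0 - dev / avg
```

With this definition, a system where every agent carries the same workload scores 1.0, and the score drops as the workloads spread out.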
Performance Impact of the Number of Ants on Load Balancing
• As the number of ants increases, load balancing speed improves.
• With 200 ants, high system efficiency is achieved together with a reasonable load balancing level and speed.
Making a Tradeoff between Balancing Speed and System Efficiency
[Figure panels: n=20, step=1; and n=20, 50, 100, 200, 500, 1000, 2000 with step=300]
Performance Impact of the Number of Ant Wandering Steps
• When workload is seriously unbalanced among agents, performing more load balancing processes improves performance more than wandering longer does.
• Once the system has reached a reasonable load balancing level, wandering more, to look for more overloaded or underloaded agents over a larger scope, appears to become more important.
Performance Impact of the Ant Wandering Style
• The optimized wandering style guides all of the ants in one direction, so very good load balancing is achieved only in a local area of the agent grid.
• An additional mechanism is introduced that lets ants "jump" randomly once in a while before they get stuck in a local area.
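One way to picture the jumping mechanism is as a small probability of teleporting to a random agent instead of taking a normal wandering step. The sketch below is an assumption-laden illustration: the slides do not say how often ants jump or how the optimized step is chosen, so `jump_prob` is a hypothetical parameter and a plain random-neighbour move stands in for the normal step.

```python
import random

def next_position(pos, neighbors, n_agents, jump_prob, rng):
    """Pick the ant's next agent: usually a neighbour, occasionally a jump.

    jump_prob is a hypothetical tuning knob, not a value from the slides.
    """
    if rng.random() < jump_prob:
        return rng.randrange(n_agents)  # jump anywhere in the agent grid
    return rng.choice(neighbors[pos])   # stand-in for the normal wandering step
```

Setting `jump_prob` to 0 disables jumping entirely, while a small positive value lets ants occasionally escape a local area of the grid.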
Ant Jumping for Global Effectiveness of Load Balancing
[Figure panels: style=optimal, step=300; style=jumping, step=300]
References and Related Links on Agent-Based Grid Computing
• Grid Load Balancing Using Intelligent Agents. Future Generation Computer Systems, Special Issue on Intelligent Grid Environment: Principles and Applications, 2005. (to appear)
• ARMSim: a Modeling and Simulation Environment for Agent-based Grid Computing. SIMULATION, Special Issue on Modeling and Simulation Applications in Cluster and Grid Computing, 80(4-5), 221-229, 2004.
• Queue Scheduling and Advance Reservations with COSY. IPDPS 2004, Santa Fe, New Mexico, USA.
• GridFlow: Workflow Management for Grid Computing. CCGrid 2003, Tokyo, Japan, 198-205.
• ARMS: an Agent-based Resource Management System for Grid Computing. Scientific Programming, Special Issue on Grid Computing, 10(2), 135-148, 2002.
• Performance Evaluation of an Agent-Based Resource Management Infrastructure for Grid Computing. CCGrid 2001, Brisbane, Australia, 311-318.
• http://www.mit.edu/~caoj
• http://www.ccrl-nece.de/gemss
• http://www.dcs.warwick.ac.uk/~hpsg
• http://www.agilecomputing.org