PARMON: A Comprehensive Cluster Monitoring System • A Single System Image Case Study • Developer: PARMON Team, Centre for Development of Advanced Computing, Bangalore, India • http://www.cdacindia.com • Project Leader: Rajkumar Buyya (buyya@computer.org)
Topics of Discussion • PARMON System Model & Architecture • PARMON Server • PARMON Client • PARMON Features and Services • PARMON Installation and its Usage • Monitoring with PARMON • PARMON Integration with other products • Conclusions and Future Directions
Motivations • Workstation clusters have of late become a cost-effective solution for high-performance computing (HPC). • C-DAC's PARAM 10000 is a large cluster of more than 40 Ultra-4 workstations interconnected through low-latency, high-bandwidth communication networks. • Monitoring such large systems is tedious and challenging because a typical workstation is designed to work as a standalone system rather than as part of a cluster. • System administrators need tools to monitor such systems effectively; PARMON addresses this problem.
C-DAC HPCC Software Architecture [Figure: layered diagram. Labels shown: Applications; System Management Tools; Development Tools (F90 IDE, DIVIA); Parallel File System (C-PFS); Languages (C, F77, F90); Message Passing Interfaces (C-MPI, PVM); Light Weight Protocols; Solaris; Cluster Hardware]
PARMON Capabilities • PARMON allows the user to monitor system activities and resource utilization of the various components of a workstation cluster. • It monitors the machine at the component, node, and entire-system levels, presenting a single system image. • It allows the system administrator to monitor: • Aggregate utilization of system resources • Process activities • System log activities • Kernel activities • Multiple instances of the same resource
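The multi-level view described above (node, group, entire cluster) can be illustrated with a small sketch. This is not PARMON's code: the `NodeSample` fields, the use of a plain average, and the sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class NodeSample:
    """One utilization reading for a node (all field names are illustrative)."""
    node: str
    group: str
    cpu_percent: float
    mem_percent: float


def aggregate(samples):
    """Average CPU and memory utilization over a collection of node samples."""
    samples = list(samples)
    if not samples:
        raise ValueError("no samples to aggregate")
    return {
        "nodes": len(samples),
        "cpu_percent": mean(s.cpu_percent for s in samples),
        "mem_percent": mean(s.mem_percent for s in samples),
    }


def aggregate_by_group(samples):
    """Group-level view: one aggregate per node group."""
    groups = {}
    for s in samples:
        groups.setdefault(s.group, []).append(s)
    return {name: aggregate(members) for name, members in groups.items()}


# Made-up readings for three nodes in two groups.
samples = [
    NodeSample("node1", "compute", 80.0, 50.0),
    NodeSample("node2", "compute", 40.0, 70.0),
    NodeSample("node3", "io", 10.0, 20.0),
]
cluster_view = aggregate(samples)           # entire-cluster level
group_view = aggregate_by_group(samples)    # group level
```

The same `aggregate` function serves both levels; only the set of samples passed in changes, which is one simple way to present many nodes as a single system.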
PARMON - Salient Features • Online creation of the node and group database • Monitoring of system activities at the component, node, group, or entire-cluster level • Designed using state-of-the-art Java technology • Monitoring of system components: • CPU, memory, disk, and network • Monitoring of multiple instances of the same component • Facility for defining events and for automatic notification • Miscellaneous facilities: message broadcast, invocation of system management commands (halt, reboot, etc.), system information and configuration • PARMON provides a GUI for initiating activities/requests and presents results graphically.
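The slides do not say how events are defined or delivered, so the following is only a minimal sketch of the general idea of "event definition and automatic notification": a threshold on one metric plus a callback. The `Event` class, its fields, and the message format are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Event:
    """A user-defined condition on one metric (hypothetical structure)."""
    name: str
    metric: str
    threshold: float
    notify: Callable[[str], None]

    def check(self, node: str, reading: dict) -> bool:
        """Call notify and return True if the metric exceeds the threshold."""
        value = reading.get(self.metric)
        if value is not None and value > self.threshold:
            self.notify(f"{self.name}: {node} {self.metric}={value} > {self.threshold}")
            return True
        return False


alerts = []  # stands in for mail, a pop-up, or any other notification channel
high_cpu = Event("high-cpu", "cpu_percent", 90.0, alerts.append)
high_cpu.check("node1", {"cpu_percent": 95.0})  # exceeds the threshold, so it notifies
high_cpu.check("node2", {"cpu_percent": 30.0})  # below the threshold, no notification
```

Passing the notifier as a callable keeps the event definition independent of how the administrator is actually told about it.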
PARMON System Model [Figure labels: PARMON client (parmon) on JVM; PARMON server (parmond) on Solaris node; high-speed switch]
PARMON Implementation • Server • Multithreaded using POSIX and Solaris threads • Developed in C because it needs to access system internals • It is a stateless server • Client • Developed in Java • Java features are used extensively • A new window is created for each client request, and that window interacts with the server • Threads are used extensively when creating online resource utilization meters • Reconfigures dynamically when the node database changes
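The "stateless, multithreaded server" design can be sketched as follows. This is not parmond (which is written in C) and does not reproduce its wire protocol, which the slides do not describe: the request names `CPU` and `MEM`, the one-line text protocol, and the canned readings are all assumptions.

```python
import socket
import socketserver
import threading

# Hypothetical request names and canned readings; a real server would query the OS.
READINGS = {"CPU": "cpu_percent=12.5", "MEM": "mem_percent=48.0"}


class MonitorHandler(socketserver.StreamRequestHandler):
    """Answers one request line per connection and keeps no per-client state."""

    def handle(self):
        request = self.rfile.readline().decode("ascii").strip().upper()
        reply = READINGS.get(request, "ERROR unknown request")
        self.wfile.write((reply + "\n").encode("ascii"))


class MonitorServer(socketserver.ThreadingTCPServer):
    """One thread per connection, so a slow client does not block the others."""
    allow_reuse_address = True
    daemon_threads = True  # handler threads do not block interpreter exit


def query(host, port, request):
    """Client side: open a connection, send one request, read one reply."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall((request + "\n").encode("ascii"))
        return conn.makefile("r", encoding="ascii").readline().strip()


server = MonitorServer(("127.0.0.1", 0), MonitorHandler)  # port 0: OS picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address
cpu_reply = query(host, port, "CPU")
bad_reply = query(host, port, "DISK")
server.shutdown()
server.server_close()
```

Because every request carries everything the server needs to answer it, any number of client windows can query the same node independently, and a restarted server has nothing to recover.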
Setting up PARMON • Server installation and invocation • Binding to a port • Rights (requires root permission for full functionality) • parmond or parmond <port-no> (either at boot time or online) • Must be running on every node to be monitored • Client installation and invocation • Java-based client (the client machine can be any PC/workstation supporting a JVM) • CLASSPATH (pointing to classes.zip and parmon.jar) • jar file (parmon.jar) • java parmon or java parmon <port-no>
PARMON Integration with other Products • PARMON can send resource utilization information to any other product, provided that product's protocols are made available. [Figure labels: Node 1 ... Node N, each running parmond; PARAM online bulletin board]
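One way to picture this kind of integration is an exporter that formats each node's reading and hands it to a transport supplied by the consuming product. The record format (`node=<name> key=value ...`), the function names, and the sample values are hypothetical; the slides do not specify how PARMON feeds the bulletin board.

```python
from typing import Callable, Dict, List


def format_record(node: str, reading: Dict[str, float]) -> str:
    """Encode one reading as 'node=<name> key=value ...' with keys sorted."""
    fields = " ".join(f"{key}={reading[key]:.1f}" for key in sorted(reading))
    return f"node={node} {fields}"


def export(readings: Dict[str, Dict[str, float]], send: Callable[[str], None]) -> int:
    """Send one record per node through the consumer-supplied transport."""
    for node in sorted(readings):
        send(format_record(node, readings[node]))
    return len(readings)


board: List[str] = []  # stands in for the external product, e.g. a bulletin board
sent = export(
    {"node1": {"cpu": 12.5, "mem": 48.0}, "nodeN": {"cpu": 90.0, "mem": 33.0}},
    board.append,
)
```

Keeping the transport behind a single `send` callable mirrors the condition stated on the slide: the monitoring side only needs to know the other product's protocol, not its internals.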
Summary and Recent Works • PARMON has been used successfully to monitor the PARAM OpenFrame Supercomputer, a cluster of 48 Ultra-4 workstations running the Sun Solaris operating system. • Portable across platforms supporting Java • Comprehensive monitoring support and GUI • PARMON supports Solaris and Linux clusters; support for NT clusters is planned (one such implementation was carried out at UPC, Barcelona). • It has been extended to support web-based monitoring of clusters by creating an interface server (running on a web server) between the client and the PARMON servers running on the cluster nodes.
References • Project Team: • Rajkumar Buyya • Krishna Mohan • Bindu Gopal • R. Buyya, PARMON: A Portable and Scalable Monitoring System for Clusters, International Journal on Software: Practice & Experience (SPE), John Wiley & Sons, Inc, USA, June 2000. • Further Info: http://www.buyya.com/parmon • C-DAC: http://www.cdacindia.com