
Topics of Discussion

PARMON: A Comprehensive Cluster Monitoring System. A Single System Image Case Study. Developer: PARMON Team, Centre for Development of Advanced Computing, Bangalore, India (http://www.cdacindia.com). Project Leader: Rajkumar Buyya (buyya@computer.org).


Presentation Transcript


  1. PARMON: A Comprehensive Cluster Monitoring System • A Single System Image Case Study • Developer: PARMON Team, Centre for Development of Advanced Computing, Bangalore, India • http://www.cdacindia.com • Project Leader: Rajkumar Buyya (buyya@computer.org)

  2. Topics of Discussion • PARMON System Model & Architecture • PARMON Server • PARMON Client • PARMON Features and Services • PARMON Installation and its Usage • Monitoring with PARMON • PARMON Integration with other products • Conclusions and Future Directions

  3. Motivations • Workstation clusters have of late become a cost-effective solution for HPC. • C-DAC's PARAM 10000 is a large cluster of more than 40 Ultra-4 workstations interconnected through low-latency, high-bandwidth communication networks. • Monitoring such large systems is tedious and challenging because a typical workstation is designed to work as a standalone system rather than as part of a cluster. • System administrators need tools to monitor such systems effectively. PARMON addresses this problem.

  4. C-DAC HPCC Software Architecture (layered diagram) • Applications • System Management Tools • Development Tools: F90 IDE, DIVIA • Parallel File System: C-PFS • Languages: C, F77, F90 • Message Passing Interfaces: C-MPI, PVM • Light Weight Protocols • Solaris • Cluster Hardware

  5. PARMON Capabilities • PARMON allows the user to monitor system activities and resource utilization of the various components of workstation clusters. • It monitors the machine at three levels (component, node, and entire system), presenting a single system image. • It allows the system administrator to monitor the following: • Aggregate utilization of system resources. • Process activities. • System log activities. • Kernel activities. • Multiple instances of the same resource.
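The multi-level view on this slide can be illustrated with a minimal sketch. It is written in Python for brevity and is not PARMON code; the sample data, the group definition, and the function names are all hypothetical, and plain averaging is only one possible way to aggregate.

```python
from statistics import mean

# Hypothetical sample data: utilization (percent) of each CPU instance per node.
cpu_samples = {
    "node1": [20.0, 40.0],
    "node2": [60.0, 80.0],
    "node3": [10.0, 30.0],
}

# Hypothetical group definition: a named subset of the cluster's nodes.
groups = {"groupA": ["node1", "node2"]}


def node_utilization(samples, node):
    """Node level: average over all instances of the component on one node."""
    return mean(samples[node])


def group_utilization(samples, nodes):
    """Group level: average of the node-level values for the given nodes."""
    return mean(node_utilization(samples, n) for n in nodes)


def cluster_utilization(samples):
    """Cluster level: treat every known node as one large group."""
    return group_utilization(samples, list(samples))
```

Each level is computed from the level below it, so the same readings can be shown per component instance, per node, per group, or for the whole cluster.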

  6. PARMON - Salient Features • Online creation of the Node and Group database • Monitors system activities at the Component, Node, Group, or entire Cluster level • Designed using state-of-the-art Java technology • Monitoring of System Components: • CPU, Memory, Disk and Network • Monitors multiple instances of the same component • Facility for definition of events and automatic notification • Miscellaneous facilities: Message broadcast, Invocation of system management commands (halt, reboot, etc.), System Information & Configuration • PARMON provides a GUI for initiating activities/requests and presents results graphically.
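The slide does not say how events are defined or delivered. As a rough illustration only, the sketch below (in Python, not PARMON code) assumes an event is a threshold on a named metric and that notification is a callback; the `Event` class, its fields, and `check_events` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Event:
    """Hypothetical event definition: fire when a metric exceeds a threshold."""
    name: str
    metric: str
    threshold: float


def check_events(events: List[Event],
                 reading: Dict[str, float],
                 notify: Callable[[str], None]) -> List[str]:
    """Compare one reading against every event and notify for each that fires."""
    fired = []
    for ev in events:
        value = reading.get(ev.metric)
        # Metrics missing from the reading are skipped rather than treated as zero.
        if value is not None and value > ev.threshold:
            notify(f"{ev.name}: {ev.metric}={value} exceeds {ev.threshold}")
            fired.append(ev.name)
    return fired
```

A caller might pass `print` as `notify` during testing and swap in a mail or GUI alert function later; keeping notification behind a callback separates the rule check from the delivery mechanism.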

  7. PARMON System Model (diagram) • PARMON Client (parmon) on JVM • PARMON Server (parmond) on each Solaris Node • Nodes interconnected through a High-Speed Switch

  8. PARMON Implementation • Server • Multithreaded using POSIX and Solaris threads • Developed in C, as it needs to access system internals • It is a stateless server • Client • Developed in Java • Java features are used extensively • A new window is created for each client request and interacts with the server • Threads are used extensively when creating online resource utilization meters • Reconfigures dynamically when the node database changes
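To make "stateless" and "multithreaded" concrete, here is a minimal sketch in Python using the standard `socketserver` module. It is not the parmond implementation (which the slide says is written in C), and the one-line text protocol, the `METRICS` table, and its fixed return values are assumptions made purely for illustration.

```python
import socketserver

# Hypothetical metric readers; a real server would query the operating system.
METRICS = {
    "cpu": lambda: "cpu 12.5",
    "memory": lambda: "memory 48.0",
}


def handle_request(line: str) -> str:
    """Answer one request using only the request text (no per-client state)."""
    reader = METRICS.get(line.strip().lower())
    return reader() if reader else "error unknown-metric"


class MonitorHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # One request line in, one reply line out, then the connection ends.
        request = self.rfile.readline().decode("ascii", errors="replace")
        self.wfile.write((handle_request(request) + "\n").encode("ascii"))


class MonitorServer(socketserver.ThreadingTCPServer):
    # Each incoming connection is served on its own thread.
    allow_reuse_address = True
    daemon_threads = True


def serve(port: int) -> None:
    """Listen on the given port until interrupted."""
    with MonitorServer(("", port), MonitorHandler) as server:
        server.serve_forever()
```

Because every reply depends only on the request that triggered it, any thread can serve any client, and a client can reconnect after a server restart without re-establishing a session.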

  9. Setting up PARMON • Server installation & invocation • Binding to port • Rights (requires root permission for full functionality) • parmond or parmond <port-no> (either at boot time or online) • Needs to be loaded on all nodes to be monitored • Client installation & invocation • Java-based client (client machine can be a PC/workstation supporting a JVM) • CLASSPATH (pointing to classes.zip, parmon.jar) • jar file (parmon.jar) • java parmon or java parmon <port-no>

  10. Monitoring System Activities and Resource Utilization

  11. PARMON Launcher

  12. Creation of Node Database

  13. Node Deletion

  14. Group Creation

  15. Group Modification/Deletion

  16. Resource Utilization at a Glance

  17. Selection of Nodes/Group

  18. CPU Usage Monitoring

  19. Memory Usage monitoring

  20. Disk/Network Usage Monitoring

  21. Message Viewer (System logs)

  22. Process activities

  23. Kernel Data Catalog - CPU

  24. Kernel Data Catalog - Memory

  25. Kernel Data Catalog - Disk

  26. Kernel Data Catalog - Network

  27. Catalog of CPU Parameters

  28. Component View - Physical

  29. Component View - Logical

  30. Message Broadcast

  31. System Configuration

  32. System Information

  33. Issuing Commands : halt, shutdown, etc.

  34. Node Diagnostics - Online (SunVTS)

  35. Online Help

  36. PARMON Integration with other Products • PARMON can send resource utilization information to any other product if protocols are made available. • Diagram: parmond on Node 1 through Node N feeding the PARAM online bulletin board
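The slide leaves the exchange protocol open. One way such an exchange could look is sketched below in Python; the JSON record layout and the field names (`node`, `metric`, `value`, `timestamp`) are assumptions for illustration, not a format defined by PARMON.

```python
import json


def export_record(node: str, metric: str, value: float, timestamp: int) -> str:
    """Serialize one utilization reading as a single JSON text record."""
    record = {"node": node, "metric": metric, "value": value, "timestamp": timestamp}
    # sort_keys gives a stable field order, which simplifies diffing and testing.
    return json.dumps(record, sort_keys=True)


def import_record(text: str) -> dict:
    """Parse a record on the consuming side (for example, a bulletin board)."""
    record = json.loads(text)
    missing = {"node", "metric", "value", "timestamp"} - record.keys()
    if missing:
        raise ValueError(f"record is missing fields: {sorted(missing)}")
    return record
```

A self-describing text record keeps the producer and the consumer loosely coupled: the receiving product only needs to agree on the field names, not on PARMON's internals.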

  37. Summary and Recent Work • PARMON has been used successfully to monitor the PARAM OpenFrame Supercomputer, a cluster of 48 Ultra-4 workstations running the Sun Solaris operating system. • Portable across platforms supporting Java • Comprehensive monitoring support and GUI • PARMON supports Solaris and Linux clusters, and support for NT clusters is planned (one such implementation was carried out at UPC, Barcelona). • It has been extended to support web-based monitoring of clusters by adding an interface server (running on a web server) between the client and the PARMON servers running on the cluster nodes.

  38. References • Project Team: • Rajkumar Buyya • Krishna Mohan • Bindu Gopal • R. Buyya, PARMON: A Portable and Scalable Monitoring System for Clusters, International Journal on Software: Practice & Experience (SPE), John Wiley & Sons, Inc, USA, June 2000. • Further Info: http://www.buyya.com/parmon • C-DAC: http://www.cdacindia.com

  39. Thank You
