Putchong Uthayopas Director High Performance Computing and Networking Center Faculty of Engineering, Kasetsart University Thailand Grid Monitoring Tools
Introduction • The ability to monitor and manage large-scale distributed systems is very important • Determine the source of performance problems • System tuning for both application and system software • Fault detection and recovery • Support for advanced services such as prediction systems (NWS), Grid schedulers, and accounting services
Some of the Monitoring Entities • Software (application and system software) • Behavior (CPU, memory, and disk usage; messages and events generated) • Platform • Resource usage • Processor, I/O, memory • Network status • Bandwidth, latency • Availability
Challenges for Grid Monitoring • Scalability across wide area networks • Large number of entities to monitor • Large number of properties to monitor • Large data storage requirement for monitoring data • Timely delivery of data across wide area networks • Heterogeneity • Platform, protocol • Creates interoperability problems • Integration with Grid middleware in terms of security and naming
Grid Monitoring Architecture • Defined by the Grid Performance Working Group of the Grid Forum in document GWD-Perf-16-1 • Consists of 3 types of components • Directory for resource discovery • Producer: makes performance data available • Consumer: uses the performance data • These components communicate using events [Diagram: Consumer and Producer each connect to Directory Services via Publish/Query; Events pass between Producer and Consumer]
Grid Monitoring Architecture • Directory Service • Used by producers and consumers to "discover" each other and share some characteristics • Service Models • Consumer initiated • Query/subscribe (stream) • Producer initiated • Push event/push stream
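The three component roles and the consumer-initiated service models can be illustrated with a small in-process sketch. This is not taken from GWD-Perf-16-1 or any real GMA implementation: the class names, method names, and the event dictionary shape are all assumptions made for illustration, and a real deployment would talk over the network rather than through direct method calls.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Event = Dict[str, object]  # hypothetical event shape, not a GMA schema


class Producer:
    """Makes performance data available to consumers."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._latest: Dict[str, Event] = {}
        self._subscribers: Dict[str, List[Callable[[Event], None]]] = defaultdict(list)

    def query(self, event_name: str) -> Event:
        # Consumer-initiated query: one-shot request for the latest event.
        return self._latest[event_name]

    def subscribe(self, event_name: str, callback: Callable[[Event], None]) -> None:
        # Consumer-initiated subscribe: callback receives a stream of events.
        self._subscribers[event_name].append(callback)

    def emit(self, event_name: str, value: float) -> None:
        # Record a new measurement and deliver it to every subscriber.
        event: Event = {"producer": self.name, "name": event_name, "value": value}
        self._latest[event_name] = event
        for callback in self._subscribers[event_name]:
            callback(event)


class Directory:
    """Lets producers publish what they offer and consumers discover them."""

    def __init__(self) -> None:
        self._entries: Dict[str, List[Producer]] = {}

    def publish(self, event_name: str, producer: Producer) -> None:
        self._entries.setdefault(event_name, []).append(producer)

    def lookup(self, event_name: str) -> List[Producer]:
        return list(self._entries.get(event_name, []))


# A producer registers with the directory; a consumer discovers it there,
# then talks to the producer directly.
directory = Directory()
node = Producer("node01")
directory.publish("cpu_load", node)

received: List[Event] = []
for producer in directory.lookup("cpu_load"):
    producer.subscribe("cpu_load", received.append)

node.emit("cpu_load", 0.75)
latest = node.query("cpu_load")
```

The directory is used only for discovery; the events themselves travel between producer and consumer, which matches the split described on the slide.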
Some tools • NWS (Network Weather Service) • Monitors systems and networks and predicts traffic • NetLogger Toolkit • Toolkit for integrated application and system monitoring • Host/network monitoring tools, client library, and simple visualization tools • Ganglia • System monitoring of clusters, with some grid extensions • Iperf – measures Internet bandwidth from point to point • MRTG – popular network data graphing tool • FlowScan – network flow measurement based on the NetFlow architecture • Many more tools
Grid Observer Project (HPCNC/KU) • Objective • Build technology and tools for grid and cluster monitoring and performance analysis • Set up a "Grid Observatory" that • Monitors grid status and sends reports/alerts • Collects and distributes monitoring data for further analysis • Explore the deployment of existing tools • The software being developed evolves from our monitoring service in OpenSCE
Features • Cluster and grid monitoring package • Supports system monitoring such as processor utilization, I/O, network, memory, temperature • Graph-based presenter with Web interface and RRD (Round-Robin Database) tools • Simple grid support • Simple notification service (a simple analyser)
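The round-robin storage idea and a simple notification check can be sketched as follows. This is not the RRDtool API and not the project's code: `RoundRobinSeries`, `check_threshold`, the slot count, and the temperature limit are hypothetical names and values chosen for illustration. The point is only that a fixed number of slots is kept and the oldest sample is overwritten.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


class RoundRobinSeries:
    """Fixed-size store: once full, each new sample evicts the oldest one."""

    def __init__(self, slots: int) -> None:
        self._samples: Deque[Tuple[float, float]] = deque(maxlen=slots)

    def update(self, timestamp: float, value: float) -> None:
        self._samples.append((timestamp, value))

    def values(self) -> List[float]:
        return [value for _, value in self._samples]

    def average(self) -> Optional[float]:
        vals = self.values()
        return sum(vals) / len(vals) if vals else None


def check_threshold(series: RoundRobinSeries, limit: float) -> Optional[str]:
    """Hypothetical notification rule: alert when the stored average exceeds limit."""
    avg = series.average()
    if avg is not None and avg > limit:
        return f"ALERT: average {avg:.1f} exceeds limit {limit:.1f}"
    return None


# Five temperature samples go into three slots, so only the last three remain.
temperature = RoundRobinSeries(slots=3)
for t, celsius in enumerate([40.0, 42.0, 70.0, 72.0, 74.0]):
    temperature.update(float(t), celsius)

alert = check_threshold(temperature, limit=65.0)
```

Because storage never grows beyond the slot count, the data file size stays constant no matter how long monitoring runs, which is the property that makes round-robin storage attractive for long-lived monitors.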
Observer Recursive Monitoring Structure [Diagram labels: two groups, each with Analyser, Collector, Presenter, and Data; Sensors; Other Monitoring Systems (SNMP, NWS, Ganglia, etc.)]
Components • Sensor: a producer that generates the performance information • Supports multicast and query modes • Separate hardware access layer • Collector: hybrid producer and consumer • Consumes sensor data and also acts as a producer for the next level • Presenter: consumer that visualizes the information • Analyser: analyses the information collected
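The collector's double role (consumer of sensor data, producer for the next level) is what makes the structure recursive. A minimal sketch, assuming a pull-style `read()` interface and slash-separated names, neither of which comes from the slides:

```python
from typing import Callable, Dict, List, Protocol


class Source(Protocol):
    """Anything a collector can consume: a sensor or another collector."""

    name: str

    def read(self) -> Dict[str, float]: ...


class Sensor:
    """Producer: generates one piece of performance information."""

    def __init__(self, name: str, probe: Callable[[], float]) -> None:
        self.name = name
        self._probe = probe  # hypothetical stand-in for real hardware access

    def read(self) -> Dict[str, float]:
        return {self.name: self._probe()}


class Collector:
    """Consumes its sources and acts as a producer for the next level up."""

    def __init__(self, name: str, sources: List[Source]) -> None:
        self.name = name
        self._sources = sources

    def read(self) -> Dict[str, float]:
        merged: Dict[str, float] = {}
        for source in self._sources:
            for key, value in source.read().items():
                # Prefix with this collector's name to keep the hierarchy visible.
                merged[f"{self.name}/{key}"] = value
        return merged


# Two cluster-level collectors feed one grid-level collector.
cluster_a = Collector("clusterA", [Sensor("cpu", lambda: 0.5), Sensor("mem", lambda: 0.25)])
cluster_b = Collector("clusterB", [Sensor("cpu", lambda: 0.9)])
grid = Collector("grid", [cluster_a, cluster_b])

snapshot = grid.read()
```

A presenter or analyser would then be an ordinary consumer of `grid.read()`; since a collector exposes the same interface as a sensor, levels can be stacked to any depth.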
Sensors [Diagram labels: App, API; multiple Sensors on a Multicast Channel; Sensor Core (dynamically loadable plugins, multithreaded); Plugins; HAL]
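One way to read the sensor slide's labels (a multithreaded sensor core, loadable plugins, and a HAL) is sketched below. Everything here is an assumption for illustration: `FakeHAL`, `SensorCore`, `load_plugin`, and the reading keys are hypothetical, plugins are registered at run time rather than loaded from shared objects, and the multicast channel is left out.

```python
import threading
from typing import Callable, Dict


class FakeHAL:
    """Hypothetical hardware access layer that returns canned raw readings."""

    def __init__(self, readings: Dict[str, float]) -> None:
        self._readings = readings

    def read(self, key: str) -> float:
        return self._readings[key]


Plugin = Callable[[FakeHAL], float]


class SensorCore:
    """Runs each registered plugin in its own thread and gathers the results."""

    def __init__(self, hal: FakeHAL) -> None:
        self._hal = hal
        self._plugins: Dict[str, Plugin] = {}
        self._lock = threading.Lock()

    def load_plugin(self, name: str, plugin: Plugin) -> None:
        # Plugins can be added while the core exists, without changing the core.
        self._plugins[name] = plugin

    def sample(self) -> Dict[str, float]:
        results: Dict[str, float] = {}

        def run(name: str, plugin: Plugin) -> None:
            value = plugin(self._hal)
            with self._lock:  # guard the shared results dictionary
                results[name] = value

        threads = [
            threading.Thread(target=run, args=(name, plugin))
            for name, plugin in self._plugins.items()
        ]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
        return results


core = SensorCore(FakeHAL({"temp_millic": 45500.0, "mem_used_kb": 2048.0}))
core.load_plugin("temperature_c", lambda hal: hal.read("temp_millic") / 1000.0)
core.load_plugin("memory_mb", lambda hal: hal.read("mem_used_kb") / 1024.0)

readings = core.sample()
```

Keeping hardware access behind the HAL means the plugins only convert raw readings into metrics, so porting to a new platform would touch the HAL alone.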
Work in Progress • Reliability analyser • Interfaces to other systems (SNMP, Ganglia, NWS) • Event schema (supposed to be done by GGF) • Better presenter • Redesign of many ad hoc parts • More sensors. A C-language client sensor library is available in another project but is not yet integrated into this one • Bandwidth measurement • Better Observatory site (currently at observer.cpe.ku.ac.th) • Add grid security support
Todo list • Interoperability • More detailed framework/architecture • Event schema • Data storage format • Common monitoring protocol • Monitoring capabilities definition • Collaborate with various GGF working groups
Thank you. Questions and answers. End of Presentation
Issues • Monitoring data is currently stored in files • Avoids the overhead of data movement • Copes with much larger data sets than could be stored in the directory • Remote access is still difficult, which locks the Presenter to the same node as the data • Uniform way to handle • System and application events (dynamic) • Structural data such as system configuration (static)