PARYAVEKSHANAM: STATUS MONITORING TOOL for the INDIAN National Grid GARUDA
Karuna (Karunap@cdacb.ernet.in)
Co-authors: Deepika H.V., Mangala N., Prahlada Rao BB, MohanRam N.
System Software Development Group, Center for Development of Advanced Computing (C-DAC), Bangalore, INDIA
24th NORDUnet conference
Presentation Plan
• GARUDA Overview
• GARUDA Architecture
• Monitoring Requirements
• Paryavekshanam Objectives
• Paryavekshanam Architecture
• Paryavekshanam Features
• Alert and Notification System
• Conclusion
Indian National Grid: GARUDA
• GARUDA was initiated by C-DAC and is funded by the Department of Information Technology, Government of India.
• GARUDA brings together a range of advanced capabilities to support the increasingly interdisciplinary scientific environments needed to solve complex problems.
• GARUDA connects 45 national research and academic institutions across 17 cities in India.
• GARUDA is used by application communities such as weather/climate modeling, disaster management, and bioinformatics.
GARUDA Grid: Key Features
• Geographically distributed resources across 17 cities and 45 research and academic institutions
• Resources are dynamic and heterogeneous (Linux, Solaris, AIX)
• Resources belong to different administrative domains
• Network: 2.43 Gbps backbone, with 10/100 Mbps point-to-point links
• GARUDA middleware: Globus 2.x
• Multi-institutional virtual organization
GARUDA Grid Architecture
Figure: GARUDA grid architecture diagram. Labelled elements: submit node (gridfs), GARUDA head node (Bangalore), and cluster head nodes with their compute nodes at C-DAC Bangalore (AIX), C-DAC Hyderabad (Linux), RRI Bangalore (Linux), Chennai (Linux), Pune (Linux), and IGIB (Linux).
GARUDA Components
• Applications (proof of concept): Disaster Management, Bioinformatics, Climate Modeling
• Management & Monitoring: Paryavekshanam
• Access Methods: Access Portal, Problem Solving Environments
• Data Management: Storage Resource Broker
• Development Environment: DIViA for Grid, GridIDE
• Resource Management & Scheduling: Moab from Cluster Resources, LoadLeveler, Torque, Globus 2.x
• Resources: Compute, Data Storage, Scientific Instruments, Software
GARUDA Network Fabric Features
• Ethernet-based, high-bandwidth Layer 2/3 MPLS VPN
• Scalable over the entire geographic area
• High reliability
• Fault tolerance and redundancy
• High security
• Effective network management
GARUDA Resources
• C-DAC centers at Bangalore, Pune, Chennai, and Hyderabad contribute computing resources
• HPC systems from partner sites
• Total processors: more than 600
• Aggregate compute power: 3.5 TFlops
• Satellite terminals from SAC Ahmedabad
• Grid labs at Bangalore, Pune, and Hyderabad
GARUDA Partners
• Institute of Plasma Research, Ahmedabad
• Physical Research Laboratory, Ahmedabad
• Space Applications Centre, Ahmedabad
• Harish Chandra Research Institute, Allahabad
• Motilal Nehru National Institute of Technology, Allahabad
• Raman Research Institute, Bangalore
• National Center for Biological Sciences, Bangalore
• Indian Institute of Astrophysics, Bangalore
• Indian Institute of Science, Bangalore
• Institute of Microbial Technology, Chandigarh
• Punjab Engineering College, Chandigarh
• Madras Institute of Technology, Chennai
• Indian Institute of Technology, Chennai
• Institute of Mathematical Sciences, Chennai
• ERNET, Delhi
• Indian Institute of Technology, Delhi
• Jawaharlal Nehru University, Delhi
• Institute for Genomics and Integrative Biology, Delhi
• Indian Institute of Technology, Guwahati
• Gauhati University, Guwahati
GARUDA Partners (continued)
• University of Hyderabad, Hyderabad
• Centre for DNA Fingerprinting and Diagnostics, Hyderabad
• Jawaharlal Nehru Technological University, Hyderabad
• Indian Institute of Technology, Kanpur
• Indian Institute of Technology, Kharagpur
• Saha Institute of Nuclear Physics, Kolkata
• Central Drug Research Institute, Lucknow
• Sanjay Gandhi Post Graduate Institute of Medical Sciences, Lucknow
• Bhabha Atomic Research Centre, Mumbai
• Indian Institute of Technology, Mumbai
• Tata Institute of Fundamental Research, Mumbai
• IUCAA, Pune
• National Centre for Radio Astrophysics, Pune
• National Chemical Laboratory, Pune
• Pune University, Pune
• Indian Institute of Technology, Roorkee
• Regional Cancer Centre, Thiruvananthapuram
• Vikram Sarabhai Space Centre, Thiruvananthapuram
• Institute of Technology, Banaras Hindu University, Varanasi
GARUDA Grid Monitoring: Purpose
• Detect, record, and report faults and service degradations
• Ensure that GARUDA operates optimally
• Check the status, availability, and usage of grid resources
• Provide a repository of monitoring data that developers and administrators can use for troubleshooting, scheduling, performance tuning, and analysis
Monitoring Requirements: GARUDA
• A simple, easy-to-use tool
• Able to serve the perspectives of different users
• Information should be readily available
• More graphical views
• Relevant, accurate, and timely data
• Ability to diagnose problems in the GARUDA environment
Paryavekshanam: Monitoring Tool
• GARUDA is monitored by PARYAVEKSHANAM, which means "supervision" in Sanskrit.
• PARYAVEKSHANAM is a web-based, user-friendly grid monitoring tool that monitors the health of the GARUDA grid to enhance its reliability, usability, and manageability.
• PARYAVEKSHANAM is scalable and can be deployed on platforms such as AIX, Linux, and Solaris.
• It assists users in resource allocation and selection through GARUDA tools such as G-IDE.
Components Monitored by Paryavekshanam
• Computing nodes
• Network
• Grid middleware
• Submitted jobs
• Software
• Storage and Storage Resource Broker
• Scientific instruments
Paryavekshanam Architecture
• Client-server architecture using a pull model with a centralized server
• Resource: everything connected to the grid
• Head node: the contact node of a cluster
• Four components: Information Generator, Information Receiver, Information Repository, and Paryavekshanam Visualizer
Figure: Paryavekshanam architecture diagram.
Paryavekshanam Architecture (continued)
• Information Generator
  - Daemon that resides on cluster head nodes
  - Collects the cluster details and creates the data collection
  - The data collection is processed using the MDS schema and published into Globus MDS
• Information Receiver
  - Daemon that resides on the monitoring server
  - Requests the Information Generator to produce the data collection and fetches it from Globus MDS
• Information Repository
  - Resides on the monitoring server
  - Stores the data collection obtained from Globus MDS after processing it
  - Has a mirror repository for fault tolerance
• Paryavekshanam Visualizer
  - User-friendly graphical user interface
  - Retrieves data from the Information Repository and displays it in well-structured graphs and tables
  - Helps in diagnosing problem areas
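The pull model described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the Paryavekshanam implementation: it assumes each head node's Information Generator hands back its data collection as a plain dictionary, whereas in GARUDA the data travels through Globus MDS, which the sketch does not model. All class and method names (`InformationGenerator.produce`, `InformationReceiver.poll_once`, `latest_status`) are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class InformationGenerator:
    """Stand-in for the daemon on a cluster head node (hypothetical API)."""
    cluster: str
    collect: Callable[[], dict]  # gathers the cluster's current details

    def produce(self) -> dict:
        # Build the data collection only when asked: this is the pull model.
        return {"cluster": self.cluster, **self.collect()}


@dataclass
class InformationRepository:
    """Primary store plus a mirror copy for fault tolerance."""
    primary: List[dict] = field(default_factory=list)
    mirror: List[dict] = field(default_factory=list)

    def store(self, record: dict) -> None:
        self.primary.append(record)
        self.mirror.append(dict(record))  # separate copy kept in the mirror


@dataclass
class InformationReceiver:
    """Stand-in for the daemon on the central monitoring server."""
    generators: List[InformationGenerator]
    repository: InformationRepository

    def poll_once(self) -> int:
        """Pull one data collection from every head node; return how many were polled."""
        for gen in self.generators:
            try:
                record = gen.produce()
                record["status"] = "up"
            except Exception:
                # An unreachable head node is recorded as down instead of aborting the poll.
                record = {"cluster": gen.cluster, "status": "down"}
            self.repository.store(record)
        return len(self.generators)


def latest_status(repo: InformationRepository) -> Dict[str, str]:
    """What a visualizer might read: the most recent status per cluster."""
    status: Dict[str, str] = {}
    for record in repo.primary:
        status[record["cluster"]] = record["status"]
    return status


if __name__ == "__main__":
    def unreachable() -> dict:
        raise ConnectionError("head node did not respond")

    repo = InformationRepository()
    receiver = InformationReceiver(
        [InformationGenerator("pune", lambda: {"free_cpus": 12}),
         InformationGenerator("chennai", unreachable)],
        repo,
    )
    receiver.poll_once()
    print(latest_status(repo))  # {'pune': 'up', 'chennai': 'down'}
```

Keeping the central server as the only party that initiates requests is what makes this a pull model: a head node that stops answering shows up as "down" in the repository on the next poll, without the node having to report its own failure.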
Paryavekshanam Features
• Hierarchical drill-down of information
• Bird's-eye view of grid health through a radar graph
• Dashboard providing a top-level view
• Status bar for quick, action-oriented insights
• Alert generation through email
• Easy interface for adding new sites
• Multiple views: Grid, Nodes, GOC, and Network
• Visualization of data in tabular and graphical formats
• 'Data Gallery' for analysis of historical data
• Search facility for resources, software stack, and jobs
• Separate resolution for GOC monitoring
Screenshot: Dashboard of Paryavekshanam, showing GARUDA-connected cities on a map of India, the status bar, grid strength, and a bird's-eye view of grid health through the radar graph.
Dashboard of Paryavekshanam (continued)
• Radar Graph
  - Compares the performance of different entities on axes that start from the same point
  - Allows easy inference of the utilization of quantitative parameters
  - Shows whether the various parameters are utilized uniformly
  - Gives a glimpse of the deviation from the ideal scenario
• Grid Strength
  - Defines the health of the grid and is derived mathematically from the radar graph parameters
  - Shown as a percentage on the dashboard
  - Colored bullets represent different values of grid strength
• Globus Strength: monitored using an empirical formula
• Status Bar: gives the instantaneous up/down status and can be drilled down further
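The slides say grid strength is derived from the radar-graph parameters but do not give the formula. Purely as an illustrative assumption, the sketch below takes grid strength to be the equally weighted mean of the radar parameters (each already expressed as a percentage) and maps it to a colored bullet using hypothetical bands (green at 75 % and above, amber at 50 % and above, red otherwise). The parameter names in the test are also hypothetical.

```python
from statistics import mean
from typing import Dict

# Hypothetical colour bands for the dashboard bullet; the slides do not give them.
BANDS = [(75.0, "green"), (50.0, "amber"), (0.0, "red")]


def grid_strength(radar: Dict[str, float]) -> float:
    """ASSUMED formula: mean of the radar-graph parameters, each given in percent."""
    if not radar:
        raise ValueError("no radar parameters supplied")
    for name, value in radar.items():
        if not 0.0 <= value <= 100.0:
            raise ValueError(f"{name} out of range: {value}")
    # Equal weighting: a uniformly, fully healthy grid scores 100 %.
    return round(mean(float(v) for v in radar.values()), 1)


def bullet_colour(strength: float) -> str:
    """Map a grid-strength percentage to a (hypothetical) coloured bullet."""
    for lower, colour in BANDS:
        if strength >= lower:
            return colour
    return "red"  # only reached for negative inputs
```

A plain mean hides unevenness (one collapsed parameter can be masked by others at 100 %), which is exactly what the radar shape itself makes visible; a real formula might weight parameters or penalize deviation from uniformity instead.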
Alert & Notification System: AlNotis
• Through AlNotis, Paryavekshanam captures errors generated in the grid, such as failures of links, clusters, nodes, grid middleware, and jobs
• Provides more visibility into the health of the system
• Any failure or breakdown of resources must be captured and notified, since this is necessary for corrective action
• Generates error emails whenever an error occurs
• Sends warning emails when utilization crosses a threshold level
• Well-defined escalation procedure
• Errors left unattended for 48 hours are escalated to the grid administrators
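The notification rules above (an error email on every error, a warning email above a utilization threshold, escalation of errors still unattended after 48 hours) can be sketched as follows. Only the 48-hour window comes from the slides; the 90 % threshold, the `Ticket` fields, and all function names are hypothetical, and emails are represented simply as subject strings rather than actually being sent.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

WARN_THRESHOLD = 0.90                  # hypothetical utilization threshold (90 %)
ESCALATE_AFTER = timedelta(hours=48)   # per the slide: unattended errors after 48 h


@dataclass
class Ticket:
    error_id: int
    resource: str
    raised_at: datetime
    closed_at: Optional[datetime] = None
    escalated: bool = False


def error_email(ticket: Ticket) -> str:
    """Subject line of the error email sent as soon as an error is captured."""
    return f"ERROR #{ticket.error_id}: failure on {ticket.resource}"


def warning_email(resource: str, utilization: float) -> Optional[str]:
    """Subject of a warning email, or None if utilization is within the threshold."""
    if utilization > WARN_THRESHOLD:
        return f"WARNING: {resource} utilization at {utilization:.0%}"
    return None


def escalate(tickets: List[Ticket], now: datetime) -> List[int]:
    """Flag tickets still open after 48 h for the grid admins; return their IDs."""
    escalated = []
    for t in tickets:
        overdue = now - t.raised_at >= ESCALATE_AFTER
        if t.closed_at is None and not t.escalated and overdue:
            t.escalated = True  # a real system would email the grid admins here
            escalated.append(t.error_id)
    return escalated


def time_to_close(ticket: Ticket) -> Optional[timedelta]:
    """Time taken to close the ticket (None while it is still open)."""
    return None if ticket.closed_at is None else ticket.closed_at - ticket.raised_at
```

The `escalated` flag keeps a periodic `escalate` run idempotent, so administrators receive one escalation per overdue ticket rather than one per run.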
Alert & Notification System (continued)
Screenshots: error message description and warning message description.
Alert & Notification System (continued)
• AlNotis table showing the error ID, the date and time the error was generated, the affected resources, and the time taken to close the ticket
• Alert error messages generated during the last 6 months
GOC Desk: Paryavekshanam
• A Grid Operation Center (GOC) help desk built for GARUDA monitoring, with a state-of-the-art wall display
• The GOC is responsible for monitoring the grid infrastructure as a whole
• The GOC operates in four regional areas, which report centrally to the GOC at Bangalore
• Apart from monitoring through Paryavekshanam, the GOC coordinates its activities through video conferencing
GOC Desk Page
• Used mainly for daily monitoring
• Shows the overall performance of parameters such as bandwidth utilization over the last 24 hours
• Each graph is hyperlinked to the details of that parameter for the respective grid center
• An additional table gives accurate readings of the values on the graphs
Grid Overview Page: Paryavekshanam
• Summarizes the performance of the entire grid for users
• Provides information on all parameters for all centers in tabular format
• Can be drilled down to fetch a center's resource details as a node-level summary
• Monitors the middleware components and provides a detailed status summary for resolving errors
• Shows which Globus components are up
• Lists all the software available on the clusters
Screenshot: Nodes view and Globus component status (here, the GSIFTP service is not available).
Screenshot: Software packages installed at head nodes.
Network Info Page: Paryavekshanam
• Routers and switches are monitored
• Displays available bandwidth, used bandwidth, packet loss, round-trip time (RTT), and link status
• The report generation facility helps maintain the SLA on RTT, packet loss, and circuit uptime on a monthly basis
• Monitors the operation of the network 24x7x365
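The monthly SLA report mentioned above boils down to grouping network measurements by month and comparing averages with targets. The sketch below is an assumption-laden illustration: the SLA target values are hypothetical, each `Sample` is one poll of a circuit, and uptime is computed as the fraction of polls in which the link was up, which is only valid if polls are evenly spaced.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Tuple

# Hypothetical SLA targets; the slides do not state the actual values.
SLA_MAX_RTT_MS = 100.0
SLA_MAX_LOSS_PCT = 1.0
SLA_MIN_UPTIME_PCT = 99.0


@dataclass
class Sample:
    """One poll of a circuit (assumed to be taken at evenly spaced intervals)."""
    taken_at: datetime
    rtt_ms: float
    loss_pct: float
    link_up: bool


def monthly_report(samples: List[Sample]) -> Dict[Tuple[int, int], dict]:
    """Summarize RTT, packet loss and circuit uptime per (year, month)."""
    by_month: Dict[Tuple[int, int], List[Sample]] = defaultdict(list)
    for s in samples:
        by_month[(s.taken_at.year, s.taken_at.month)].append(s)

    report: Dict[Tuple[int, int], dict] = {}
    for month in sorted(by_month):
        group = by_month[month]
        up = [s for s in group if s.link_up]
        # RTT and loss are averaged only over polls where the circuit was up.
        avg_rtt = sum(s.rtt_ms for s in up) / len(up) if up else float("inf")
        avg_loss = sum(s.loss_pct for s in up) / len(up) if up else 100.0
        uptime = 100.0 * len(up) / len(group)
        report[month] = {
            "avg_rtt_ms": avg_rtt,
            "avg_loss_pct": avg_loss,
            "uptime_pct": uptime,
            "sla_met": (avg_rtt <= SLA_MAX_RTT_MS
                        and avg_loss <= SLA_MAX_LOSS_PCT
                        and uptime >= SLA_MIN_UPTIME_PCT),
        }
    return report
```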
SRB Server Status Check
• Checks the status of the Storage Resource Broker
• Shows space availability on the storage servers
• Generates reports in Word and Excel formats
Data Gallery Page: Paryavekshanam
• Archives data for reviewing the past performance of the grid
• Previous data can be viewed in both tabular and graphical formats
• Generates reports for the selected duration
Search Page: Paryavekshanam
• Resource and software search is provided for users
• Resources can be searched by OS, memory, CPU speed, etc.
• Software can be searched by category, such as debuggers, libraries, etc.
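The search behaviour above is essentially attribute filtering. The following is a hypothetical sketch, not the tool's actual query code: it assumes each resource record carries an OS name, memory in GB, and CPU speed in GHz, and each software record a category and the cluster it is installed on. All field and function names are illustrative.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Resource:
    name: str
    os_name: str
    memory_gb: float
    cpu_ghz: float


@dataclass
class Software:
    name: str
    category: str      # e.g. "debugger", "library"
    cluster: str


def search_resources(resources: Iterable[Resource], os_name: Optional[str] = None,
                     min_memory_gb: float = 0.0, min_cpu_ghz: float = 0.0) -> List[str]:
    """Names of resources matching every criterion that was supplied."""
    return [r.name for r in resources
            if (os_name is None or r.os_name.lower() == os_name.lower())
            and r.memory_gb >= min_memory_gb
            and r.cpu_ghz >= min_cpu_ghz]


def search_software(packages: Iterable[Software], category: str) -> List[str]:
    """'package@cluster' entries in the requested category, case-insensitively."""
    wanted = category.lower()
    return sorted({f"{p.name}@{p.cluster}" for p in packages
                   if p.category.lower() == wanted})
```

Criteria left at their defaults match everything, so a user can narrow a search one attribute at a time.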
Job Search: Paryavekshanam
• Paryavekshanam tracks the progress of submitted jobs
• Shows the current status of a job based on its job ID
• Reports on jobs are available by user, status, job ID, duration, and the cluster on which they run
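A minimal sketch of the job lookups listed above: status by job ID, counts grouped by user, status or cluster, and a duration filter. The `Job` record and its status strings are assumptions for illustration (the strings mirror common Globus GRAM job states, but the slides do not say which states Paryavekshanam reports).

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Job:
    job_id: str
    user: str
    status: str        # assumed states, e.g. "PENDING", "ACTIVE", "DONE", "FAILED"
    cluster: str
    duration_min: int


def job_status(jobs: List[Job], job_id: str) -> Optional[str]:
    """Current status of one job, or None if the ID is unknown."""
    for job in jobs:
        if job.job_id == job_id:
            return job.status
    return None


def job_report(jobs: List[Job], key: str) -> Dict[str, int]:
    """Count jobs grouped by 'user', 'status' or 'cluster'."""
    if key not in {"user", "status", "cluster"}:
        raise ValueError(f"unsupported grouping key: {key}")
    return dict(Counter(getattr(job, key) for job in jobs))


def long_running(jobs: List[Job], min_minutes: int) -> List[str]:
    """IDs of jobs that have run for at least min_minutes."""
    return [j.job_id for j in jobs if j.duration_min >= min_minutes]
```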
GARUDA Resource Usage
• Resources are used extensively
• More than 100 registered users
• More than 600 CPUs across 14 sites
• 65 TB of data transferred over the 2.43 Gbps backbone
Admin Page: Paryavekshanam
• New sites and resources are added through a simple interface
• Access is restricted through access control
• Modification and deletion of sites are supported
Conclusion
• Paryavekshanam has been successfully monitoring GARUDA for the last 2 years
• The dashboard has been a very useful feature, aggregating a great deal of information
• The AlNotis system speeds up problem rectification
• Overall, Paryavekshanam improves the usability of GARUDA
Thank you
Globus Strength
• Each distinct value indicates a particular Globus status. The maximum value, 29, is the sum of the distinct weights of the four major pillars of Globus:
  - Security: 10
  - Job Submission: 8
  - Data Management: 7
  - Information Services: 4
  - Total: 29
• Example: a Globus strength of 21 (10 + 7 + 4) means that Security, Data Management, and Information Services are up, and job submission is not possible.
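Because the four weights are chosen so that every combination of pillars gives a different total (the 16 possible sums are 0, 4, 7, 8, 10, 11, 12, 14, 15, 17, 18, 19, 21, 22, 25, 29), a single strength value can be decoded back into exactly which services are up. The weights and the examples 21 and 22 come from the slides; the function names are hypothetical, and this is a sketch of the arithmetic rather than the tool's own code.

```python
from itertools import combinations
from typing import FrozenSet, Iterable

# Weights of the four Globus "pillars", as given on the slide.
WEIGHTS = {
    "Security": 10,
    "Job Submission": 8,
    "Data Management": 7,
    "Information Services": 4,
}
MAX_STRENGTH = sum(WEIGHTS.values())  # 29


def globus_strength(services_up: Iterable[str]) -> int:
    """Sum the weights of the services that are currently up."""
    return sum(WEIGHTS[name] for name in set(services_up))


def decode_strength(strength: int) -> FrozenSet[str]:
    """Recover which services are up from a strength value.

    Unambiguous because every subset of {10, 8, 7, 4} has a distinct sum.
    """
    names = list(WEIGHTS)
    for r in range(len(names) + 1):
        for subset in combinations(names, r):
            if sum(WEIGHTS[n] for n in subset) == strength:
                return frozenset(subset)
    raise ValueError(f"{strength} is not a valid Globus strength")
```

For example, `decode_strength(21)` yields Security, Data Management, and Information Services, so Job Submission is the pillar that is down.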
A Globus strength of 22 (10 + 8 + 4) shows that the Data Management service is down.
Screenshot: GSIFTP service is not available.