This paper discusses the design and architecture of a configuration monitoring tool for the CMS computing system, which is used in large-scale distributed computing environments. The tool tracks and queries site configuration information, allowing users to generate and submit jobs, access available services, and monitor service status. The tool utilizes existing tools such as the Globus Toolkit, Tomcat servlet container, and a relational database server for storage.
Configuration Monitoring Tool for Large Scale Distributed Computing Y. Wu1, G. Graham1, X Lu2, A. Afaq1, B.J. Kim3 and I. Fisk1 1. Fermi National Accelerator Laboratory 2. University of Iowa 3. University of Florida Yujun Wu, ACAT03
Outline • Introduction to CMS computing • Why a configuration monitoring tool • Design consideration and approach • Configuration monitoring tool architecture and components • Current status of the configuration monitoring tool • Future development plan and summary
Introduction to CMS computing • The CMS (Compact Muon Solenoid) experiment, which will run at the Large Hadron Collider (LHC), is expected to have the following computing characteristics: - It will produce petabytes of data; - It will need very large-scale distributed computing systems to analyze the data; - Grid computing will likely be used to meet much of its offline data analysis needs; - The computing systems used in CMS data analysis will be heterogeneous and dynamic.
[Figure: CMS Data Grid Hierarchy. Levels shown: Online System; Tier 0+1 (CERN Computer Center, MSS); Tier 1 (FNAL, France, UK and Italy Regional Centers); Tier 2 (Tier2 Centers); Tier 3 (institutes, each with a physics data cache); Tier 4. Link rates quoted in the figure: ~PBytes/sec, ~100 MBytes/sec, 0.6-2.5 Gbits/sec or air freight, ~2.4 Gbits/sec, ~622 Mbits/sec, 100-1000 Mbits/sec. Annotations: one bunch crossing per 25 nsecs; 100 triggers per second; each event is ~1 MByte in size; 1 TIPS = 25,000 SpecInt95; PC (2000) = 20 SpecInt95. Physicists work on analysis "channels"; each institute has ~10 physicists working on one or more channels, and data for these channels should be cached by the institute server.]
Why A Configuration Monitoring Tool? • To meet the CMS distributed computing challenges, we need a monitoring system that tracks and queries site configuration information for large-scale distributed CMS applications; • A few selected use cases: - Job generators, e.g. MOP, need to know the configuration of a computing resource (e.g., CMS software location, scratch area, etc.) in order to generate and submit jobs; - A general user needs to know what kinds of services are available within an organization (e.g., USCMS) and their corresponding configurations, e.g., gatekeeper port number and available job managers; - Users also want to know the status of services: critical services need to be available even before job submission.
Design Consideration and Approach • The goal of a configuration monitoring system is to fit the needs of CMS production and user analysis across the US CMS resources; • The following features are desirable (based on a user survey): - The information in the configuration monitoring system should be highly available; - Historical configuration information should be archived and retrievable; - The configuration information should be available only to authorized users and/or groups; • Use as many existing tools as possible.
Design Consideration and Approach (2) • The Globus Toolkit and the Tomcat servlet container are chosen as the building blocks for the configuration monitoring tool; • A relational database server is used to store the configuration information. This has the advantage of logging the information for future queries; • The Grid Security Infrastructure (GSI), together with the EDG Java Security package, is used for secure authentication and transparent access to the configuration information across the USCMS grid.
Design Consideration and Approach (3) • A layered structure is used to develop the whole system. Its advantage is that one layer can be replaced without interfering with the other layers. Tentatively, the system is divided into the following layers: * Site info provider layer - The module in this layer is deployed at each computing resource. It collects and publishes resource configuration info. * Configuration Database Server layer - It tracks the hosts and services to be monitored, and stores all the collected configuration info. * Tomcat service layer - Through Tomcat, a user can view the info in a web browser and/or query the info in the database through a web service; * User interface layer - Provided for the convenience of users.
A Prototype Architecture [Figure: components shown are the Tomcat Server, VOMS, the Configuration Database Server (MySQL), and three Site Info Providers, connected by "query" links.]
Site Information Provider Layer • This layer is responsible for collecting and publishing site configuration information at each resource. It accomplishes the task through Globus MDS with our own information provider and the standard GLUE (Grid Laboratory Uniform Environment) schema; • The information provider can publish information from the following sources: - Configuration information in a text file; - Output from a user command; - Special scripts, written as plug-ins, that generate other configuration information; • The resource configuration info published in MDS can be queried directly using standard Globus commands or through a set of client scripts provided by the configuration monitoring tool.
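As an illustration of the first two sources listed above, here is a minimal Python sketch that collects configuration from a text file and from a command's output and merges the results. It is not the tool's actual provider: the `KEY=VALUE` file format, the function names, the file name `site.conf`, and the `SCRATCH_AREA` key are all assumptions made for the example, and the step that publishes the result into Globus MDS is left out.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def parse_key_values(text):
    """Parse 'KEY=VALUE' lines, skipping blank lines and '#' comments."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        info[key.strip()] = value.strip()
    return info


def from_text_file(path):
    """Source 1: configuration kept in a plain text file."""
    return parse_key_values(Path(path).read_text())


def from_command(argv):
    """Source 2: output of a user command that prints KEY=VALUE lines."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return parse_key_values(result.stdout)


def collect(sources):
    """Call each source in order and merge; later sources override earlier ones."""
    merged = {}
    for source in sources:
        merged.update(source())
    return merged


def demo():
    """Build a throwaway file and command, then collect from both."""
    with tempfile.TemporaryDirectory() as tmp:
        conf = Path(tmp) / "site.conf"  # hypothetical file name
        conf.write_text("# hypothetical site file\nMOP_MAX_JOBS=100\n")
        # Hypothetical command: a tiny Python one-liner standing in for a site script.
        command = [sys.executable, "-c", "print('SCRATCH_AREA=/tmp/scratch')"]
        return collect([
            lambda: from_text_file(conf),
            lambda: from_command(command),
        ])
```

Treating each source as a zero-argument callable is one way a plug-in script could be added later without changing `collect`.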
Configuration Database Server Layer • The database server layer consists of a relational database server and cron job scripts that track and update the information in the database. It is the core component of the whole configuration monitoring architecture: - Provides a mechanism for controlling which hosts and services are monitored; - Tracks the availability of the services within a Virtual Organization (VO), since some services are supposed to be available all the time; - Archives the collected configuration information for later use. • Currently, we are using MySQL, an open source product, as the relational database server. It fits our current needs while the number of hosts and services to be monitored is relatively small.
Configuration Database Server Layer (2) • The configuration information in the database is collected through the site information providers and updated at a scheduled interval by the cron job scripts; • Old configuration information is archived, and the database is updated only when a resource configuration changes. In other words: no change in information, no update!
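The "no change, no update" rule can be sketched as follows. This is an illustrative Python example only: it uses an in-memory SQLite database as a stand-in for the MySQL server, and the table name `config_history`, its columns, the fingerprinting approach, and the timestamps are assumptions rather than the tool's actual schema.

```python
import hashlib
import json
import sqlite3


def open_archive():
    """Create an in-memory archive table (stand-in for the MySQL database)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE config_history ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " host TEXT NOT NULL,"
        " fingerprint TEXT NOT NULL,"
        " config_json TEXT NOT NULL,"
        " recorded_at TEXT NOT NULL)"
    )
    return conn


def fingerprint(config):
    """Return (sha256 digest, canonical JSON) for a configuration dict."""
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest(), canonical


def record_if_changed(conn, host, config, recorded_at):
    """Insert a new timestamped row only if the host's configuration changed.

    Returns True when a row was written, False when nothing changed.
    Older rows are never modified, so they serve as the archive.
    """
    digest, canonical = fingerprint(config)
    row = conn.execute(
        "SELECT fingerprint FROM config_history"
        " WHERE host = ? ORDER BY id DESC LIMIT 1",
        (host,),
    ).fetchone()
    if row is not None and row[0] == digest:
        return False  # no change in information, no update
    conn.execute(
        "INSERT INTO config_history (host, fingerprint, config_json, recorded_at)"
        " VALUES (?, ?, ?, ?)",
        (host, digest, canonical, recorded_at),
    )
    conn.commit()
    return True
```

A cron-driven script could call `record_if_changed` once per host on each run; repeated runs with an unchanged configuration then leave the table untouched.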
Tomcat Service Layer • Tomcat plays an important role in our configuration monitoring system. • Tomcat servlet technology is used to provide a web interface through which users can: - Browse the available hosts; - Browse the available services and their configurations; - Make a specific query on a host and/or service; • The same technology is used to authorize a person to perform administration tasks securely: - Update the resources/services to be monitored; - Reset the availability of services.
Tomcat Service Layer (2) • In the future, we plan to provide a web service through Tomcat for both users and administrators: - Users may query the information in the configuration database through command-line scripts. This will include the available resources, services, and their configuration info in the central database (or databases). However, a user who wants the newest information still has to retrieve it directly from a local info provider. - Administrators can update their site information through the web service mechanism, e.g., when a service must be shut down immediately.
Web Interface Screenshot (1)
Web Interface Screenshot (2)
Web Interface Screenshot (3)
Security features of the Configuration Monitoring System • Keeping the configuration information accessible only to authorized users is one of our top priorities; • As the site info provider is part of the Globus MDS, it has the same security mechanism as the standard Globus Toolkit; • The web interface enforces strong authentication and authorization using digital certificates. This requires the client web browser to be able to: - manage client certificates; - perform SSL mutual authentication.
Security features of the Configuration Monitoring System (2) • On the server side, all the web pages and servlets are put behind an authorization servlet filter. Currently, we are using a filter package developed by EDG. • The authorization filter examines every incoming request and tries to extract the client certificate from the request. It then passes the extracted client DN to an authorization manager for verification. If the authorization manager can verify the client DN, it gives the user permission to view the web info; otherwise, it terminates the request and informs the user: “authorisation failed”. • Currently, the authorization manager is configured to check whether the requesting user's DN can be found in a standard grid-mapfile. • Furthermore, the user DN entries in the grid-mapfile are extracted from a VOMS (Virtual Organization Membership Service) server.
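The grid-mapfile check described above can be illustrated with a short Python sketch. It is not the EDG filter or its authorization manager (those are Java components behind Tomcat); it only mimics the decision, assuming the usual grid-mapfile layout of a quoted DN followed by a local account name. The function names and the sample DNs are hypothetical.

```python
import re

# One entry per line: a double-quoted certificate DN, then a local account.
_ENTRY = re.compile(r'^"([^"]+)"\s+(\S+)')


def load_grid_mapfile(text):
    """Return {DN: local account} parsed from grid-mapfile text."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _ENTRY.match(line)
        if match:
            mapping[match.group(1)] = match.group(2)
    return mapping


def authorize(client_dn, mapping):
    """Mimic the filter's decision for one request.

    client_dn is None when no client certificate could be extracted.
    Returns (allowed, message).
    """
    if client_dn is not None and client_dn in mapping:
        return True, "ok"
    return False, "authorisation failed"


# Hypothetical grid-mapfile content, for illustration only.
SAMPLE_MAPFILE = '''
# entries as they might be generated from a VOMS server
"/DC=org/DC=example/OU=People/CN=Jane Doe" uscms01
"/DC=org/DC=example/OU=People/CN=John Roe" uscms02
'''
```

For example, `authorize("/DC=org/DC=example/OU=People/CN=Jane Doe", load_grid_mapfile(SAMPLE_MAPFILE))` allows the request, while an unknown or missing DN yields the "authorisation failed" message.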
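Security features of the Configuration Monitoring System (3) [Figure: authorization flow. A user request (DN, etc.) arrives at Tomcat, where the servlet filter consults the authorization manager ("Authorized?"). The authorization manager checks the grid-mapfile, whose entries come from VOMS. Authorized requests reach the configuration info (.html, .jsp, servlets).]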
The Current Status of Configuration Monitoring Tool • We have finished the initial development of the major components of the configuration monitoring tool and tested it using USCMS grid resources; • The information provided by the configuration monitoring tool has been used in the USCMS distributed Monte Carlo production, its first customer (details on the next slide); • Other applications, such as GridServ, under development at the University of Florida, have also shown interest in using the info published by the configuration monitoring tool. - More info on this can be found at: https://gdsuf.phys.ufl.edu:8443/gridmon/admin/gridserv/dpeclient
The Current Status of Configuration Monitoring Tool (2) • MOP is a system for distributing CMS production jobs over the distributed grid environment. Currently, it is the main production system used in the USCMS grid testbeds. • In order to generate and submit MOP jobs, the MOP job submitter needs to know a set of parameters at each remote site intended to run jobs:
MOP_MAX_JOBS=100
MOP_REMOTE_JOB_MANAGER_FOR_RUN=garlic.hep.wisc.edu:/jobmanager
MOP_REMOTE_JOB_MANAGER_FOR_STAGE_IN=garlic.hep.wisc.edu:/jobmanager
MOP_REMOTE_JOB_MANAGER_FOR_STAGE_OUT=garlic.hep.wisc.edu:/jobmanager
MOP_REMOTE_JOB_MANAGER_FOR_PUBLISH=garlic.hep.wisc.edu:/jobmanager
MOP_REMOTE_JOB_MANAGER_FOR_CLEANUP=garlic.hep.wisc.edu:/jobmanager
MOP_REMOTE_RUNTIME_AREA=/afs/hep.wisc.edu/grid3/shared-tmp
MOP_EXPORT_DIR=/afs/hep.wisc.edu/grid3/shared-tmp
MOP_REMOTE_VDT_LOCATION=/data/grid/GRID3/
MOP_REMOTE_DAR_ROOT=/afs/hep.wisc.edu/grid3/app/uscms01
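To make the role of these parameters concrete, the following Python sketch shows how a job submitter might check that a site has published every parameter it needs and look up the job manager for a given stage. It is an assumption-laden illustration, not MOP code: the helper names, the idea that all ten parameters are mandatory, and the stage names derived from the parameter suffixes are choices made for this example. The values are copied from the slide.

```python
# Parameter values as listed on the slide for one remote site.
SITE_PARAMETERS = {
    "MOP_MAX_JOBS": "100",
    "MOP_REMOTE_JOB_MANAGER_FOR_RUN": "garlic.hep.wisc.edu:/jobmanager",
    "MOP_REMOTE_JOB_MANAGER_FOR_STAGE_IN": "garlic.hep.wisc.edu:/jobmanager",
    "MOP_REMOTE_JOB_MANAGER_FOR_STAGE_OUT": "garlic.hep.wisc.edu:/jobmanager",
    "MOP_REMOTE_JOB_MANAGER_FOR_PUBLISH": "garlic.hep.wisc.edu:/jobmanager",
    "MOP_REMOTE_JOB_MANAGER_FOR_CLEANUP": "garlic.hep.wisc.edu:/jobmanager",
    "MOP_REMOTE_RUNTIME_AREA": "/afs/hep.wisc.edu/grid3/shared-tmp",
    "MOP_EXPORT_DIR": "/afs/hep.wisc.edu/grid3/shared-tmp",
    "MOP_REMOTE_VDT_LOCATION": "/data/grid/GRID3/",
    "MOP_REMOTE_DAR_ROOT": "/afs/hep.wisc.edu/grid3/app/uscms01",
}

# Assumption for this sketch: every parameter shown on the slide is required.
REQUIRED = tuple(SITE_PARAMETERS)


def missing_parameters(params, required=REQUIRED):
    """Return the required parameter names that are absent or empty."""
    return [name for name in required if not params.get(name)]


def job_manager_for(params, stage):
    """Look up the job manager contact for a stage such as 'run' or 'stage_in'."""
    key = "MOP_REMOTE_JOB_MANAGER_FOR_" + stage.upper()
    if key not in params:
        raise KeyError("no job manager published for stage %r" % stage)
    return params[key]


def max_jobs(params):
    """Return the site's job limit as an integer."""
    return int(params["MOP_MAX_JOBS"])
```

Checking for missing parameters before submission is one way to surface a stale or incomplete site configuration early, rather than through a failed job.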
The Current Status of Configuration Monitoring Tool (3) • Before using the configuration monitoring tool: - Remote system administrators had to mail this information to the person who generated MOP jobs, who would then put it into a configuration file. - This model was very prone to failure: if a site configuration changed, jobs could be doomed to fail even before they were submitted, because sometimes a system administrator forgot to mail the new information, or a MOP user forgot to check the e-mail and update the submitter-side file. • After using the tool: - The site system administrator only needs to modify a local copy of the configuration file. The configuration monitoring tool takes care of the rest.
Future development We think further development is needed in the following areas: • Provide web services through Tomcat to query the info in the database and/or to update it; • Collect more resource configuration information from other monitoring tools, such as MonALISA, Ganglia, etc.; • Provide a web interface to view history data: - The data are now archived in the database with timestamps, but we still need an interface to view them.
Summary • A configuration monitoring tool has been developed on top of Globus technology and web services to allow users/sites to publish site configuration info, archive the collected info, and query it; • The Grid Security Infrastructure, together with the EDG Java Security packages, is used for secure authentication and transparent access to the configuration information across the USCMS grid; • The configuration monitoring tool has been installed on the USCMS Grid testbeds and tested with USCMS grid production jobs; • Further improvements have been identified and will be available in the near future.