Quattor in CMS (a CMS for CMS) J.A. Coarasa CERN, Geneva, Switzerland for the CMS TriDAS group. 11th Quattor Workshop, 16-18 March 2011, CERN, Geneva
Outline
• The environment:
  • CMS
  • the CMS Online Cluster.
• The Quattor installation
  • Infrastructure
  • Anatomy of a profile
• The tools around Quattor (some examples)
  • The template summarizer
  • The software updater
  • The tools for the cluster
• Summary
CMS design parameters
Detector     Channels    Ev. Data (kB)
Pixel        60000000    50
Tracker      10000000    650
Preshower    145000      50
ECAL         85000       100
HCAL         14000       50
Muon DT      200000      10
Muon RPC     200000      5
Muon CSC     400000      90
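As a quick cross-check of the table, the per-detector figures can be added up. This is only an arithmetic sketch over the numbers quoted on the slide; the dictionary name is chosen here for illustration.

```python
# Per-detector figures copied from the design-parameter table:
# detector -> (channels, event data in kB)
DETECTORS = {
    "Pixel": (60_000_000, 50),
    "Tracker": (10_000_000, 650),
    "Preshower": (145_000, 50),
    "ECAL": (85_000, 100),
    "HCAL": (14_000, 50),
    "Muon DT": (200_000, 10),
    "Muon RPC": (200_000, 5),
    "Muon CSC": (400_000, 90),
}

total_channels = sum(channels for channels, _ in DETECTORS.values())
total_event_kb = sum(kb for _, kb in DETECTORS.values())

print(f"channels:   {total_channels:,}")
print(f"event data: {total_event_kb} kB")
```

The event data adds up to roughly 1 MByte per event.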
Requirements and implications: General
The IT infrastructure (computing and networking) of the CMS Online Cluster is responsible for the CMS data acquisition and experiment control. The requirements were:
• Autonomous (i.e. independent of all other networks, CERN campus network included), uninterrupted 24/7 operation at two physical locations ~200 m apart, with one Control Room;
  ⇒ All IT infrastructure and services must be local and redundant.
  ⇒ Strict security is implied.
• Remote control and monitoring of computers is necessary.
• Fast configuration turnaround, required by the evolving nature of the applications during the commissioning phase;
• Scalable service design to accommodate future expansions;
• Serving the needs of a community of more than 900 users.
  ⇒ Some level of user configuration is required.
CMS DAQ challenge
[Diagram: DAQ data flow from the Detector Front-end and Level 1 Trigger through the Readout Systems, Builder Networks and Filter Systems to the Computing Services, with Run Control and Event Manager; rates along the chain: 40 MHz, 100 kHz, 100 Hz]
• Unprecedented Data Volumes
• Reduction 1 in 100 000 online
• Level 1 maximum trigger rate 100 kHz; event size 1 MByte
• Event Building 1 Terabit/s, ~700 sources, 2000 destinations
• High Level Trigger (~offline SW): selectivity 1 in 1000
• Strategy: invest in commercial processing and network technologies (TP 1994)
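The figures on this slide can be tied together with a little arithmetic. The sketch below assumes decimal units (1 MByte = 10^6 bytes) and uses only the rates quoted above; the variable names are illustrative.

```python
L1_RATE_HZ = 100_000          # Level 1 maximum trigger rate (from the slide)
EVENT_SIZE_BYTES = 1_000_000  # 1 MByte event size (decimal units assumed)
HLT_SELECTIVITY = 1_000       # High Level Trigger keeps 1 event in 1000

# Data volume entering event building
readout_bytes_per_s = L1_RATE_HZ * EVENT_SIZE_BYTES
readout_gbytes_per_s = readout_bytes_per_s / 1e9
readout_tbit_per_s = readout_bytes_per_s * 8 / 1e12

# Event rate after the High Level Trigger
hlt_output_hz = L1_RATE_HZ / HLT_SELECTIVITY

print(f"readout: {readout_gbytes_per_s:.0f} GBytes/s = {readout_tbit_per_s:.1f} Tbit/s")
print(f"HLT output rate: {hlt_output_hz:.0f} Hz")
```

Under these assumptions the readout carries about 100 GBytes/s (0.8 Tbit/s), the same order as the 1 Terabit/s quoted for event building, and the HLT output is the 100 Hz shown at the end of the chain.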
Requirements and implications: DAQ
Readout, Event building, HLT, Storage
• Data Network capable of reading from the electronics at 100 kHz (~100 GBytes/s);
  ⇒ Computers reading from the electronics need Myrinet networking.
• Data Network and sufficient computing power to run the second (high) level trigger software to select a maximum of 2 GBytes/s from the 100 GBytes/s;
  ⇒ At least 2500 cores (2.5 GHz).
  ⇒ High bandwidth networking.
• Enough local storage to operate for 2 days without connection to Tier 0;
  ⇒ At least 300 TBytes of local storage.
• Capacity to transfer at most 2 GBytes/s to the “Central Data Center” (Tier 0);
Constraints and implications
• Limited manpower (~5 FTE):
  ⇒ Automatic procedures where possible.
• Harsh operational conditions:
  • Unexpected cooling failures and power cuts;
  • UPS protects only the central servers;
  ⇒ Automatic shutdown with rising temperature.
• Computers connected to the electronics have to be swiftly replaced in case of failure (other computers, like the HLT ones, run fault tolerant software…);
  ⇒ Need for spares on site.
  ⇒ Fast turnaround in reinstallation and/or reconfiguration.
The Solution: IT infrastructure. Computing
[Diagram label: High bandwidth networking]
• More than 2500 computers, mostly running Scientific Linux CERN 4/5:
  • 640 to read out from the electronics, equipped with 2 Myrinet and 3 independent 1 Gbit Ethernet lines for data networking;
  • 720 with 8 CPU cores (5760) + 288 (being commissioned) with 12 CPU cores (3456) as high level trigger computers, with 2 Gbit Ethernet lines for data networking;
  • 16 with access to 300 TBytes of FC storage, 4 Gbit Ethernet lines for data networking and 2 additional ones for networking to Tier 0;
  • More than 350 used by subdetectors and to control electronics (includes 90 Windows and miscellaneous hardware);
  • 12 as an ORACLE RAC;
  • 15 as CMS control computers;
  • 50 as desktop computers in the control rooms;
  • 200 for commissioning and testing in a partly replicated setup;
  • 20 as infrastructure and access servers;
  • More than 200 active spare computers;
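The core counts in parentheses follow directly from the node counts. A small sketch of that arithmetic, using only the numbers on the slide (the list name is illustrative):

```python
# (number of HLT computers, CPU cores per computer), from the slide
HLT_NODES = [
    (720, 8),   # in production
    (288, 12),  # being commissioned
]

cores_per_batch = [nodes * cores for nodes, cores in HLT_NODES]
total_hlt_cores = sum(cores_per_batch)

print(cores_per_batch)   # cores contributed by each batch of computers
print(total_hlt_cores)   # all HLT cores once commissioning is complete
```

Both batches together give 9216 cores, well above the "at least 2500 cores" requirement stated for the DAQ.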
The Solution: IT infrastructure. Networking
[Diagram: CMS Networks. Internet and CERN Network behind a firewall; computer gateways; Service Network (Control…); Data Network (Readout, HLT, Storage Manager); CDR Network]
• 14 Myrinet switches
• 9 Force 10 E1200 high concentration (up to 1260 ports) 1 Gbit Ethernet switches
• ~100 1 Gbit Ethernet switches
• CMS Networks:
  • Public CERN Network (GPN);
  • Private Networks:
    • Service Networks;
    • Data Network
      • Source routing on computers
      • VLANs on switches
    • Central Data Recording (CDR). Network to Tier 0.
• NetApp Network Attached Storage filer
Redundancy: The Network Attached Storage.
[Diagram labels: NAS, 2x10Gbit, redundant, 4Gbit]
• Our important data is hosted in a Network Attached Storage:
  • User home directories;
  • CMS data: calibration, data quality monitoring…;
  • Repositories and configuration management data;
  • Admin data.
• 2 NetApp filer heads in failover configuration (in two racks);
• With 3 mirrored storage drawers (6 in total) with internal Dual Parity RAID 6;
• And the snapshot feature active (saves us from going to backup).
• Tested throughput > 380 MBytes/s.
Redundancy and Load balancing for the CMS IT Structural Services. The Concept
[Diagram labels: 1 master, N slaves/replicas; primary servers; secondary servers; explicit client configuration segregating in groups of computers]
• Pattern to provide redundancy:
  • 1 master + N slaves/replicas (now N=3 for most of the services) hosted in different racks;
    ⇒ Easy scalability.
    ⇒ Needs replication for all services.
  • Services working under a DNS alias where possible.
    ⇒ Allows the service to be moved.
    ⇒ No service outage.
• Load balancing of the primary server for the clients:
  • DNS Round Robin;
  • explicit client configuration segregating in groups of computers.
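One way to picture the "explicit client configuration segregating in groups of computers" is a deterministic mapping from a client group to an ordered server list. The sketch below is purely illustrative and not the CMS implementation: the grouping key (hostname prefix before the first "-"), the hashing and the server names are all assumptions.

```python
import zlib

def server_order(hostname, servers):
    """Return the server list rotated so that each client group
    gets a stable primary server, with the others as fallbacks."""
    # Assumption: the hostname prefix (e.g. "bufu" in "bufu-c2a12-01")
    # identifies the group of computers.
    group = hostname.split("-", 1)[0]
    start = zlib.crc32(group.encode()) % len(servers)
    return servers[start:] + servers[:start]

# Hypothetical names for 1 master + 3 replicas
servers = ["master", "replica-1", "replica-2", "replica-3"]
print(server_order("bufu-c2a12-01", servers))
print(server_order("ru-c2a05-14", servers))
```

All computers of one group share the same primary, so the load is spread per group, while every client still knows all servers for failover.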
CMS Management and Configuration Infrastructure: The tools
• IPMI (Intelligent Platform Management Interface) is used to manage the computers remotely:
  • reboot, console access,…;
• PXE and anaconda kickstart over http are used as the bootstrap installation method;
• Quattor (QUattor is an Administration ToolkiT for Optimizing Resources) is used as the configuration management system;
  ⇒ All Linux computers are configured through it or through rpms distributed with it (even the Quattor servers themselves): BIOS, all networking parameters…
CMS Quattor infrastructure I
• Based on Quattor 1.3
• CDB (cdb 2.0.4, PANC 6.0.8) (16 Gbyte, 8 core)
  • cmscdb-01 + 2 standby: cmscdb-02, cmscdb-03 (active selected through a DNS alias)
  • CDB data hosted on filer
    ⇒ Nothing special on the active server. Within a few minutes a different server can become the active one.
• Repositories (swrep 2.1.38)
  • 6 computers, DNS Round Robin load balanced
  • swrep.cms: CERN IT’s offline copy of SLC4/SLC5
  • cmsswrep, with “zones”:
    • /cms_system
    • /cms_cdaq
    • /cms_rcms
    • /cms_cmssw
    • /cms_subdet
    • /cms_ecal
  • Repositories hosted on filer
  • At installation time the repository computers act as cache (the filer is not stressed)
CMS Quattor infrastructure II
- Allows people without Quattor knowledge to control their software in an automatic and reproducible way
- Everything is logged.
• It uses in-house:
  • a restricted format in templates: “hierarchical” + other conventions;
  • areas for cdb and swrep to define subdetector software and versioning in them;
  ⇒ This made in-house developments easy:
  • Template summarizer/“inventory maker” http://cmsdaq0.cern.ch/cmscdb
  • Dropbox for rpms
  • Template updater
CMS Quattor performance
• ~2500 computers managed in ~80 types
• PANC compilation takes less than 4 min for 1050 computers, 8 min for 2100… thanks to cdb.conf tuning:
  • 150 computers per process
  • maximum of 7 processes
  • 2.2 Gbytes maximum memory taken per process
• Notification active, spreading time less than 10 min depending on _type_
• Limitation comes from the available bandwidth to the repository servers (currently 6 Gbit) and the inability of spma to retry after an http timeout
  ⇒ Cluster-wide reconfiguration takes more than 10 min
• Reinstallation takes 6-22 minutes per computer
• Reinstallation of 1000 computers in ≲ 1 ¼ hour
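The compilation figures fit a simple picture: 7 processes of 150 computers each make one "wave" of 1050 computers, and each wave takes about 4 minutes. The sketch below encodes that reading; the wave model itself is an inference from the numbers on the slide, not a documented property of cdb or PANC, and the 2500-computer figure is an extrapolation rather than a measurement.

```python
COMPUTERS_PER_PROCESS = 150   # from the slide (cdb.conf tuning)
MAX_PROCESSES = 7             # from the slide
MEMORY_PER_PROCESS_GB = 2.2   # from the slide
MINUTES_PER_WAVE = 4          # "less than 4 min for 1050" taken as an upper bound

def compile_estimate(n_computers):
    """Return (number of waves, estimated upper bound in minutes)."""
    wave_size = COMPUTERS_PER_PROCESS * MAX_PROCESSES  # 1050 computers
    waves = -(-n_computers // wave_size)               # ceiling division
    return waves, waves * MINUTES_PER_WAVE

peak_memory_gb = round(MAX_PROCESSES * MEMORY_PER_PROCESS_GB, 1)

for n in (1050, 2100, 2500):
    print(n, compile_estimate(n))
print("peak memory:", peak_memory_gb, "GB")
```

The peak memory of 15.4 Gbytes for 7 processes stays within the 16 Gbyte of the CDB server, which would explain the choice of 7 as the maximum.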
profile anatomy
    object template profile_bufu-c2a12-01;
    include pro_declaration_profile_base;
    include pro_hardware_DELLPowerEdge1950_4x2x2666_16;
    include netinfo_bufu-c2a12-01;
    include cms_DELLPowerEdge1950_pro_software_slc5_x86_64;
    include pro_type_cdaq_bufu_slc5_x86_64;
• Flat structure
• Hardware definition
• Network definition (not used)
• Hardware specific software definitions
• Computers belong to one type. No individual distinction!
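Because every profile follows the same flat pattern, it can be generated from four values. The following is a hypothetical generator written only to make the pattern explicit; the function and its parameter names are not part of Quattor or of the CMS tools.

```python
def render_profile(host, hardware, hw_software, machine_type):
    """Build the text of a flat object profile from its four ingredients.

    Hypothetical helper: the naming pattern is copied from the
    profile shown on the slide.
    """
    lines = [
        f"object template profile_{host};",
        "include pro_declaration_profile_base;",
        f"include pro_hardware_{hardware};",   # hardware definition
        f"include netinfo_{host};",            # network definition (not used)
        f"include {hw_software};",             # hardware specific software
        f"include pro_type_{machine_type};",   # the single _type_ of the computer
    ]
    return "\n".join(lines) + "\n"

profile = render_profile(
    host="bufu-c2a12-01",
    hardware="DELLPowerEdge1950_4x2x2666_16",
    hw_software="cms_DELLPowerEdge1950_pro_software_slc5_x86_64",
    machine_type="cdaq_bufu_slc5_x86_64",
)
print(profile)
```

Only the host name, the hardware and the _type_ vary between computers, which is what "no individual distinction" amounts to.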
_type_ anatomy
    template pro_type_cdaq_bufu_slc5_x86_64;
    "/system/cluster/name" = default( "cdaq_ruhardware_slc4_32" );
    include cms_pro_software_slc5_x86_64;
    variable loadpath = list("cms_cdaq_slc5_64","cms_cmssw_slc5_64");
    include cms_cdaq_boost_pro_software_slc5_x86_64;
    […]
    include cms_cdaq_filter_pro_software_slc4_32;
    include cms_cmssw_general_pro_software_slc4_32;
    include cms_pro_system;
    include cms_pro_kernel_version_slc5_x86_64;
    include cms_cmsnet_pro_system_slc5_x86_64;
    include cms_cdaq_pro_system;
    include cms_pro_system_acl;
    include cms_cdaq_ruhardware_pro_system_acl;
    include cms_autofs_hilton_pro_system;
• Flat structure
• Software configuration
• System configuration:
  • krb, ldap, ntpd, sshd, autofs, grub, iptables, chkconfig…
  • acl, sudoers,
Workarounds for Quattor
Limitations in our setup
• Sometimes the components are buggy (especially in removal of configuration)
• Sometimes they do not exist
Examples of workarounds
• Copy configuration files to individual computers (copyd)
• Data networks configuration done through in-house development
• rpms are used
  • To do some configuration (SLP…)
  • To create local users
The Template Summarizer (Functionality)
• Gives us an up-to-date customized overview of the cluster:
  • Organized by _type_
  • With details of the DAQ software version
  • With information about the sudoers (clusters)
  • Fast access to the template viewer
  • And to the nagios status of the computers
The Software Updater Tools (Functionality)
• They can update (upgrade or downgrade) an existing RPM on a group of computers.
• They can roll back the changes you made.
• They let you check the status of the update on your computers.
The Software Updater Tools (Limitations)
• Only one person can run it at a time. You will be told who is running it.
• You cannot add an rpm that was never in your templates. For this, you still need to create a Savannah request.
• The configuration of Quattor has to be set to non-permissive.
• You can still install rpms manually.
The Software Updater Tools (Howto)
• Log in to the cmsdropbox computer
• Create a directory in your working directory with the name of your area
• Copy your rpms into this directory
• Run the software updater and follow the instructions
• Wait a bit (currently ~10 min) and run the command to check the status
What the Updater Does: First Step
• Uploads the RPMs in this directory to the Quattor software repository
• First it performs checks and
  • Aborts if:
    • RPMs are not named properly;
    • RPMs are already in the repository and different;
    • 2 versions of the same RPM are given.
  • Warns if:
    • RPMs are already in the repository but equal.
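The checks of this first step can be sketched as follows. This is not the updater's code: the naming rule (name-version-release.arch.rpm), the use of checksums to decide whether two files are "equal", and all function and variable names are assumptions made for the illustration.

```python
import re

# Assumed naming convention: name-version-release.arch.rpm
RPM_NAME = re.compile(
    r"^(?P<name>.+)-(?P<version>[^-]+)-(?P<release>[^-]+)\.(?P<arch>[^.]+)\.rpm$"
)

def check_upload(incoming, repository):
    """incoming / repository: dict mapping file name -> content checksum.

    Returns (errors, warnings); any error means the upload is aborted.
    """
    errors, warnings, seen = [], [], {}
    for filename, checksum in incoming.items():
        match = RPM_NAME.match(filename)
        if match is None:
            errors.append(f"{filename}: not named properly")
            continue
        key = (match["name"], match["arch"])
        if key in seen:
            errors.append(f"{filename}: second version of {seen[key]}")
        seen.setdefault(key, filename)
        if filename in repository:
            if repository[filename] == checksum:
                warnings.append(f"{filename}: already in repository (identical)")
            else:
                errors.append(f"{filename}: already in repository but different")
    return errors, warnings

errors, warnings = check_upload(
    incoming={"RemoteDropboxAndUpdate-1.0.0-1.noarch.rpm": "abc"},
    repository={"RemoteDropboxAndUpdate-1.0.0-1.noarch.rpm": "abc"},
)
print(errors, warnings)
```

In this usage example the RPM is already in the repository with the same content, so the sketch only warns and the upload could proceed.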
What the Updater Does: Second Step
• Edits the templates to substitute the RPMs you gave
  • It will warn if an RPM is not found in the templates
• If allowed, starts the update on the computers and tells you how to roll back the change
What the Updater Does: Third Step
• Tells you how to get the status of the computers affected by the update.
  • You will have to wait for the “distribution time of updates” (currently 10 minutes)
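The waiting rule can be written as a one-line calculation. The sketch assumes the check is keyed on the Unix time of the last update run, as the TimeOfLastRunInUnixTime variable in the example session suggests; the function name and the 600 s default (the 10 minutes quoted above) are illustrative, and the session log itself mentions a wait of around 1200 s.

```python
DISTRIBUTION_TIME_S = 600  # "currently 10 minutes"; assumed to be configurable

def seconds_to_wait(time_of_last_run, now, distribution_time=DISTRIBUTION_TIME_S):
    """Seconds still to wait before the status check is meaningful (0 if none)."""
    return max(0, time_of_last_run + distribution_time - now)

# Checking 64 s after the update was started
print(seconds_to_wait(1260202105, 1260202105 + 64))
```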
An Example session (I): 1st Step, 2nd Step
[coarasa@srv-C2C04-15 ~]$ cp ~coarasa/TriDAS/sysadmin/RemoteDropboxAndUpdate/RemoteDropboxAndUpdate-1.0.0-1.noarch.rpm cms_general/
[coarasa@srv-C2C04-15 ~]$ /usr/local/bin/UpdateTemplatesWithRPMs.sh cms_general
25043 INFO: Creating /tmp/UpdateTemplatesWithRPMs.sh-2009-12-07_17:06:00-coarasa-Backup to backup profiles. Use these ones to go back on your change
25043 INFO: I found the following RPMs to update: cms_general/RemoteDropboxAndUpdate-1.0.0-1.noarch.rpm
25043 INFO: Is it correct? Are you sure you want to continue? [no] y
25043 INFO: 1 Files that will be uploaded: cms_general/RemoteDropboxAndUpdate-1.0.0-1.noarch.rpm
25043 INFO: 1 Successfuly uploaded files (deleted from incoming area): cms_general/RemoteDropboxAndUpdate-1.0.0-1.noarch.rpm
25043 INFO: Continuing to install the following RPMs in computers.
25043 INFO: List of RPMS: cms_general/RemoteDropboxAndUpdate-1.0.0-1.noarch.rpm
25043 INFO: Do you want to install them now? Are you sure you want to continue? [no] y
25043 INFO: Found the following files on area cms_general: cms_general/RemoteDropboxAndUpdate-1.0.0-1.noarch.rpm
25043 INFO: Modified the following RPMs versions: cms_general/RemoteDropboxAndUpdate-1.0.0-1.noarch.rpm
25043 INFO: I will modify the following templates: cms_general/cms_dropbox_pro_software_slc4_32.tpl
25043 INFO: Are you sure you want to continue? [no] y
25043 INFO: To rollback to the previous version use the command (one single line):
cd /tmp/UpdateTemplatesWithRPMs.sh-2009-12-07_17:06:00-coarasa-Backup ; sudo /usr/local/bin/cdbop_batch.sh cms_general "update cms_general/cms_dropbox_pro_software_slc4_32.tpl ;commit "
25043 INFO: Modified the following templates: cms_general/cms_dropbox_pro_software_slc4_32.tpl
25043 INFO: Issue the following command to check the computers afected by the update.
25043 INFO: The command will have to wait around 1200s to Check the computers.
TimeOfLastRunInUnixTime=1260202105 sudo /usr/local/bin/Check_spma_ForComputersOnTemplate.sh cms_general/cms_dropbox_pro_software_slc4_32.tpl
An Example session (II): 3rd Step
[coarasa@srv-C2C04-15 ~]$ TimeOfLastRunInUnixTime=1260202105 sudo /usr/local/bin/Check_spma_ForComputersOnTemplate.sh cms_general/cms_dropbox_pro_software_slc4_32.tpl
25904 INFO: Writing output in /tmp/Check_spma_ForComputersOnTemplate.sh2009-12-07_17:11:26/output.log
25904 INFO: Update did not finished yet everywhere. Sleeping 1136 s
25904 INFO: The update affected the following computers: srv-c2c03-30 srv-c2c04-15 srv-c2c04-30 srv-c2c05-30 srv-s2d16-30
25904 INFO: All computers seem to be ok.
Checking in Detail: 3rd Step
[root@srv-C2C04-15 ~]# TimeOfLastRunInUnixTime=1259864892 /usr/local/bin/Check_spma_ForComputersOnTemplate.sh cms_cmssw/cms_cmssw_general_pro_software_slc4_32.tpl
926 INFO: Writing output in /tmp/Check_spma_ForComputersOnTemplate.sh2009-12-07_18:15:45/output.log
926 INFO: The update affected the following computers: bufu-c2a12-01 […] vmepcs2g19-16
926 INFO: The following computers could not be pinged to know the status: bufu-c2d16-27 bufu-c2e12-23 bufu-c2f13-20 fuval-c2a11-21 fuval-c2a11-08 fuval-c2a11-23 fuval-c2a11-11 fuval-c2a11-18 fuval-c2a11-10 fuval-c2a11-03 fuval-c2a11-05 fuval-c2a11-09 fuval-c2a11-19 fuval-c2a11-07 fuval-c2a11-16 fuval-c2a11-12 fuval-c2a11-24 fuval-c2a11-25 fuval-c2a11-17 fuval-c2a11-04 fuval-c2a11-06 fuval-c2a11-26 fuval-c2a11-22 fuval-c2a11-14 fuval-c2a11-27 fuval-c2a11-15 fuval-c2a11-30 fuval-c2a11-20 fuval-c2a11-29 fuval-c2a11-28 fuval-c2a11-13 ru-c2a05-14 ru-c2a08-12 ru-c2f03-10 srv-c2d05-18 vmepcs2b17-01
926 INFO: The following computers have an undefine state (Was the update applied?): dvbufu-c2f36-21 fuval-c2f12-19 ru-c2e06-08
926 INFO: The following computers did not catch the update now: dcspcs2g19-36 fuval-c2f12-02 ecal-laser-room-04 fuval-c2f12-05 fuval-c2f12-03 fuval-c2f12-01 fuval-c2f12-04 vmepcs2b17-33 vmepcs1d12-18 vmepcs2f17-14 srv-c2d17-04 vmepcs2b17-30 vmepcs2b18-05 vmepcs2b17-11 srv-s2f19-26 vmepcs2b17-08 srv-c2d17-17 srv-c2d17-16 vmepcs2b17-24 vmepcs2b17-06 vmepcs2f17-11 vmepcs2b17-05 vmepcs2b19-01 vmepcs1d12-17 srv-c2d17-15 srv-c2d05-01 […] vmepcs2g17-01
926 INFO: The following computers did not return from the ssh in time: ru-c2f02-14 ru-c2e03-13 ru-c2e05-01 ru-c2e02-08 ru-c2f01-05 ru-c2f07-16 ru-c2f06-06 srv-c2d04-24 vmepcs2b18-23
926 INFO: Check the file /tmp/Check_spma_ForComputersOnTemplate.sh2009-12-07_18:15:45/output.log for details or /var/log/Check_spma_ForComputersOnTemplate.sh.log for even more details
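With this many computers per category, a per-category count is easier to read than the raw lists. The sketch below parses output of the shape shown in this session ("<pid> INFO: The …: host host …"); the line format is assumed from the session only, and the function is not part of the CMS tools.

```python
import re

# Assumed line shape: "<pid> INFO: The <description>: <host> <host> ..."
STATUS_LINE = re.compile(r"^\d+ INFO: (?P<label>The [^:]+): (?P<hosts>.+)$")

def summarize(log_text):
    """Map each status description to the list of hosts reported under it."""
    summary = {}
    for line in log_text.splitlines():
        match = STATUS_LINE.match(line.strip())
        if match:
            summary[match["label"]] = match["hosts"].split()
    return summary

sample = """\
926 INFO: Writing output in /tmp/output.log
926 INFO: The following computers have an undefine state (Was the update applied?): dvbufu-c2f36-21 fuval-c2f12-19 ru-c2e06-08
926 INFO: The following computers did not return from the ssh in time: ru-c2f02-14 ru-c2e03-13
"""
for label, hosts in summarize(sample).items():
    print(len(hosts), label)
```

Lines that do not list computers, such as the "Writing output" line, are simply ignored.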
The Tools for the cluster (Functionality)
• Allow you to run commands simultaneously on the computers that include a template and get a summary
  • Shutdown
  • Reboot
  • Power on
  • General commands:
    CommandToExecuteRemotely="ls /tmp/finish_patch.log" \
    MatchingRegExp='^(ru|bufu)' \
    QuattorTemplatesToCheck="pro_type_tracker_hardware_slc4_32" \
    ExecuteCommandRemotelyInComputers.sh --all
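The effect of the MatchingRegExp setting can be illustrated with a small host filter. How the real script applies the expression is not shown in the slides, so the matching semantics below (a regular-expression search on the hostname) are an assumption, and the function name is illustrative.

```python
import re

def select_hosts(hosts, matching_regexp):
    """Keep only the hostnames that match the regular expression."""
    pattern = re.compile(matching_regexp)
    return [host for host in hosts if pattern.search(host)]

# Hostnames taken from the example sessions
hosts = ["ru-c2a05-14", "bufu-c2d16-27", "srv-c2d05-18", "vmepcs2b17-01"]
selected = select_hosts(hosts, r"^(ru|bufu)")
print(selected)
```

With '^(ru|bufu)' only the readout (ru) and builder/filter (bufu) computers remain, so the command would be run on those among the computers that include the given template.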
Conclusions
• Quattor has been and is a scalable CMS for the CMS Online Cluster (now 2500 computers)
• The quality and robustness of the configuration components is not always what you would like
• You can always resort to rpms to do some of the configuration
• The use of tools may greatly improve the Quattor experience
Thank you.