GDB, 13/02/2013. Operation Coordination and Commissioning. Andrea Sciabà On behalf of the Operations Coordination Team. Outline. News Other task force updates perfSONAR deployment SL6 migration Conclusions. News.
GDB, 13/02/2013 Operation Coordination and Commissioning Andrea Sciabà On behalf of the Operations Coordination Team
Outline • News • Other task force updates • perfSONAR deployment • SL6 migration • Conclusions
News • Pepe Flix (PIC) and Alessandra Forti (Manchester) will join the team as co-chairs of the fortnightly Operations Coordination meetings • Agenda modified to make participation from Tier-1 sites easier • Improving communication with Tier-2 sites is a priority • Identify volunteers to act as contact people for regions • Get more site people to coordinate task forces • During LS1, the daily meetings will be reduced to twice a week (Mondays and Thursdays) as of 4 March 2013 • Weekly activities reviewed on Thursday • No upgrades on Friday! • Friday and weekend reviewed on Monday • New topics for future task forces identified • Data placement • Optimisation of storage resources based on data popularity information • Data access • Transparent access to remote and shared Tier-1 storage, WNs on the OPN • Operations of Common Analysis Framework • Consolidation of pilot technologies • Cloud infrastructure testing with experiment workflows • Leveraging IT expertise on Agile infrastructure
Other task force updates (1/3) • Middleware deployment • EMI-2 tarball (GridPP) available with the latest lcg_util updates, fixing a timeout issue critical for ATLAS • Residual problems in Taiwan still under investigation • EMI-2 UI now recommended (with the caveat of a bug in gLite WMS submission affecting about 2% of jobs) • WLCG VOBOX basically ready and tested by ALICE • Just waiting for a WLCG repository, courtesy of EGI • CVMFS (see status) • 47 sites (of the 96 targeted by the TF) have deployed it; 23 more ATLAS and LHCb sites expected by April 30 • Only 6 sites never replied • egee.irb.hr, egee.srce.hr, INFN-TORINO, INSU01-PARIS, NCP-LCG2, ru-Moscow-SINP-LCG2
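The CVMFS figures on this slide translate into deployment coverage as follows. This is a minimal sketch, not part of the original slides: it assumes the 23 sites expected by April 30 are all among the 96 targeted sites and do not overlap with the 47 that have already deployed. The function and variable names are illustrative.

```python
def coverage(deployed: int, targeted: int) -> float:
    """Percentage of targeted sites that have deployed, rounded to one decimal."""
    return round(100.0 * deployed / targeted, 1)


# Figures quoted on the slide.
TARGETED = 96              # sites targeted by the task force
DEPLOYED = 47              # sites that have already deployed CVMFS
EXPECTED_BY_APRIL_30 = 23  # additional ATLAS and LHCb sites (assumed disjoint from DEPLOYED)

current = coverage(DEPLOYED, TARGETED)
projected = coverage(DEPLOYED + EXPECTED_BY_APRIL_30, TARGETED)
# Sites with no deployment planned by April 30, under the assumption above.
remaining = TARGETED - DEPLOYED - EXPECTED_BY_APRIL_30

print(f"Current coverage:   {current}%")
print(f"Projected coverage: {projected}%")
print(f"Sites still to plan: {remaining}")
```

Under these assumptions, coverage goes from roughly half of the targeted sites now to a little under three quarters by April 30.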
Other task force updates (2/3) • Squid monitoring • Task force concluded its activity with some agreements • Register all Squid servers in GOCDB/OIM • Propose specific SAM tests based on MRTG monitoring information and possibly on hits to Frontier/CVMFS • gLExec • Already used in production at several CMS EGI sites (1 T1, 7 T2s) • More to come by early March • LHCb needs to test its gLExec support (already in the code) • Aim to have some milestones defined in two weeks • SHA-2 • New CERN CA available soon
Other task force updates (3/3) • FTS 3 • Several new features demonstrated • MySQL backend, new monitoring page, user and SE blacklists, etc. • xrootd third-party transfers, explicit file staging (for LHCb) • Stress tests ongoing and ramping up after the winter conferences; a proposal for a deployment schedule will then be written • Tracking tools • A meeting with EGI, OSG and WLCG evaluated new authentication methods not based on certificates for GGUS, GOCDB and other Grid services • The conclusion was to develop a registration method based on vetting by trusted Identity Providers (IdPs) • Agenda and minutes available
perfSONAR deployment TF • 60% of the sites have installed it • Very good participation from US, IT, UK • Now also ES, DE, FR on board • Still missing a contact person for part of the Asian sites • In many cases, perfSONAR is not yet treated as a production service • E.g. “degradation” of the OPN perfSONAR infrastructure • For sites and experiments to use perfSONAR as the main network monitoring framework, we need to improve its reliability • SAM tests, GGUS tickets, proactivity • Manpower in the TF is very limited and on a best-effort basis • Operations will require a team (either central or distributed) • Fixing network issues detected through perfSONAR is not in the mandate of the TF
SL6 migration discussion • Discussed at the WLCG Operations Coordination meeting (7 February 2013) • Experiments validated production and analysis workflows on the grid on SL(C)6 WNs (on selected T1s/T2s; still wishing for a more complete validation setup in WLCG which includes CERN) • Sites that would like to move to SL6 have the green light • T1s would be the exception, to be discussed (ATLAS still validating group production on SL6) • There is no push from experiments for sites to move to SL6 • LXPLUS should move only part of its resources to SL6, in proportion to the amount of resources on SL6 on the grid and on LXBATCH • Many experiments use LXPLUS to compile code to be run on the grid • The alias is irrelevant, as long as any change is communicated well in advance • Plan proposed by PES: https://twiki.cern.ch/twiki/bin/view/PESgroup/LxplusMigrationToSLC6 • Experiments assume they will be able to obtain SL5 VOBOXes (VMs and real hardware) for many months to come • Operations Coordination proposes that a task force be created, with participation from sites and experiments, to coordinate the SL6 migration • Alessandra will coordinate it • Target date for migration: September 2013
Conclusions • Significant progress reported by several task forces • One has concluded • SL6 migration will be a hot topic in the coming months • A dedicated TF has been proposed • Manpower issues to be addressed in tracking and troubleshooting network problems seen by perfSONAR and in maintaining the pS infrastructure • Push for stronger active participation of Tier-2 people in the WG activities • A number of new activities to follow up during LS1 were discussed recently (pre-GDB, fortnightly meetings) • Need help and involvement from sites
Links • WLCG Operations coordination twiki • https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsCoordination • Mailing list • wlcg-ops-coord@cern.ch