WLCG perfSONAR-PS Deployment Task-Force: Final Report Shawn McKee / Simone Campana (on behalf of WLCG perfSONAR deployment TF) 17/04/2014
Summary of perfSONAR Deployment
• All WLCG sites should have installed or upgraded perfSONAR following the instructions at https://twiki.cern.ch/twiki/bin/view/LCG/PerfsonarDeployment
• Baseline release is 3.3.2.
• Task-force deadline was April 1, 2014
• We have 205 hosts running and in the mesh
• There are 8 sites that have not installed perfSONAR
• We have 64 sites not at the current version
• Versions prior to 3.3 are unable to use the Mesh-config
WLCG perfSONAR-PS TF
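The version rule above (3.3.2 baseline; nothing older than 3.3 can use the Mesh-config) can be applied mechanically to a list of reported host versions. A minimal Python sketch, assuming versions arrive as plain dotted strings such as "3.2.2"; the function names and the example host names are hypothetical and not part of any perfSONAR tool:

```python
# Hypothetical helper: classify perfSONAR-PS hosts by their reported version.
# Assumes plain dotted version strings ("3.3.2"); suffixes like "rc1" are not handled.

BASELINE = (3, 3, 2)   # baseline release named in this report
MESH_MIN = (3, 3)      # versions prior to 3.3 cannot use the Mesh-config


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '3.3.2' into (3, 3, 2) so versions compare as tuples."""
    return tuple(int(part) for part in text.strip().split("."))


def classify(version: str) -> str:
    """Return 'no-mesh', 'upgrade' or 'ok' for one host version."""
    v = parse_version(version)
    if v < MESH_MIN:
        return "no-mesh"   # too old to consume the mesh configuration
    if v < BASELINE:
        return "upgrade"   # mesh-capable, but below the baseline release
    return "ok"


if __name__ == "__main__":
    # Made-up host names, for illustration only.
    hosts = {
        "ps-a.example.org": "3.2.2",
        "ps-b.example.org": "3.3.1",
        "ps-c.example.org": "3.3.2",
    }
    for host, ver in sorted(hosts.items()):
        print(host, ver, classify(ver))
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as "3.10" sorting before "3.9".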
Still Missing 8 Sites…
• There are 8 sites not installed yet:
  • BelGrid-UCL: asked for an SLC6 installation; pointed to https://code.google.com/p/perfsonar-ps/wiki/Level1and2Install
  • GR-07-UOI-HEPLAB: no hardware, on hold.
  • GoeGrid: no reply (4 reminders)
  • ICM: "We do not have free resources to deploy perfSonar", ticket closed.
  • MPPMU: procuring hardware
  • RO-11-NIPNE: site under upgrade on 09/01/2014, no news since then (2 reminders)
  • T2_Estonia: under installation/configuration
  • TECHNION-HEP: first reply yesterday (3 reminders)
  • USCMS-FNAL-WC1: installed and configured long ago, but not publishing in OIM
• Reported at the last WLCG Ops Coordination meeting: https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes140403#perfSONAR_deployment_TF
• It is good that so few sites are missing, and most should eventually be deployed
Monitoring Status
[Figures: MaDDash snapshots from March 6 and April 16]
• We have OMD monitoring the perfSONAR-PS instances: https://maddash.aglt2.org/WLCGperfSONAR/omd/
  • These services should migrate to OSG over the next month.
  • This monitoring should be useful for any future work to find and fix problems.
• MaDDash instance at http://maddash.aglt2.org/maddash-webui
  • It has shown that we still have some issues: too much “orange”, meaning data is either not being taken (configuration or firewall) or access to the results is blocked
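One way to start triaging “orange” cells is to look at how they cluster: if nearly every cell involving one site is orange, the problem is probably on that site's side (configuration or firewall) rather than on its peers. The sketch below assumes a hypothetical in-memory grid mapping (source, destination) to a colour string; it does not read from the MaDDash API, and the 0.75 threshold is an arbitrary illustrative choice:

```python
from collections import defaultdict

# Hypothetical grid representation: (source site, destination site) -> cell colour.
Grid = dict[tuple[str, str], str]


def orange_fraction(grid: Grid) -> dict[str, float]:
    """For each site, the fraction of cells it appears in (as source or destination) that are orange."""
    total: dict[str, int] = defaultdict(int)
    orange: dict[str, int] = defaultdict(int)
    for (src, dst), colour in grid.items():
        for site in (src, dst):
            total[site] += 1
            if colour == "orange":
                orange[site] += 1
    return {site: orange[site] / total[site] for site in total}


def likely_local_problem(grid: Grid, threshold: float = 0.75) -> list[str]:
    """Sites whose cells are mostly orange; the threshold is an illustrative guess."""
    return sorted(site for site, frac in orange_fraction(grid).items() if frac >= threshold)


if __name__ == "__main__":
    example: Grid = {
        ("A", "B"): "green", ("B", "A"): "green",
        ("A", "C"): "orange", ("C", "A"): "orange",
        ("B", "C"): "orange", ("C", "B"): "orange",
    }
    print(likely_local_problem(example))  # ['C']
```

In the example, sites A and B are each orange in half of their cells (only where they meet C), while C is orange everywhere, so C is flagged as the first place to check.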
Task-Force Lessons Learned
• Installing a service at every site is one thing, but commissioning an N×N system of links squares the effort.
  • This is why we have perfSONAR-PS installed but not all links are monitored.
• perfSONAR is a “special” service
  • It tests a multi-domain network path, involving a service at both the source and the destination.
  • It requires dedicated hardware and comes bundled with the OS.
  • We understand this creates complications for some sites' fabric infrastructure. An RPM bundle was provided to help those sites, and we also encouraged sites to share their configuration experience.
• We had many releases of perfSONAR during the deployment process, each bringing new features or bug fixes we requested.
  • Some sites did install perfSONAR but are at old releases with many missing features.
  • The change of OS version (v3.2 -> v3.3) was a major reason for the inertia of some sites.
• The modular dashboard effort was dropped suddenly.
  • This was a major disruption, and considerable effort had to be diverted to cope with it.
  • At the same time, the MaDDash monitoring we have now is much better suited to monitoring the status of the installations.
• We still have issues with firewalls. There are 2 kinds of firewall rules to consider:
  • For the hosts to be able to run the tests among themselves
  • For the hosts to be able to expose information to the monitoring tools
  • Many sites get the first one right but not the second.
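The two kinds of firewall rules can be checked separately from outside a site. A minimal Python sketch that only tests TCP reachability; the port numbers (861 for OWAMP control, 4823 for BWCTL, 80/443 for the web interface and results) are assumptions to verify against the perfSONAR-PS documentation for the release in use, and a successful TCP connect does not prove that the UDP port ranges used by the measurements themselves are open:

```python
import socket

# ASSUMED port lists, for illustration only; verify against the
# perfSONAR-PS firewall documentation for your release.
TEST_PORTS = {"owamp-control": 861, "bwctl": 4823}    # host-to-host tests
MONITORING_PORTS = {"http": 80, "https": 443}         # results exposed to monitoring


def tcp_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or name resolution failed
        return False


def check_host(host: str, timeout: float = 3.0) -> dict[str, dict[str, bool]]:
    """Report the two firewall categories separately for one perfSONAR host."""
    return {
        "tests": {name: tcp_open(host, p, timeout) for name, p in TEST_PORTS.items()},
        "monitoring": {name: tcp_open(host, p, timeout) for name, p in MONITORING_PORTS.items()},
    }
```

For example, check_host("ps-bw.example.org") (a hypothetical host name) run from another site returns separate results for the test ports and the monitoring ports, so a host that passes "tests" but fails "monitoring" matches the common case described above.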
Important Remaining Issues
• Get sites running older versions to upgrade
• Verify we consistently get the needed metrics
• Involve cloud/VO leads in debugging/fixing issues
• Fix firewalls: still a problem for many sites
• Test coverage and parameters
  • Should we have more VO-specific meshes/tests? e.g., WLCG -> WLCG-ATLAS, WLCG-CMS?
  • What testing frequency should we use for traceroute and bandwidth (BW)?
• Better docs: how-tos, debugging “orange”
• Some future task-force or working group should address these points
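The test-frequency question interacts with the N×N scaling: a full mesh of N hosts has N(N-1) directed links, and each host takes part in 2(N-1) tests per cycle (once as source and once as destination for every peer). A back-of-the-envelope sketch, assuming bandwidth tests run one at a time on each host and every pair is tested once per interval; the parameter values in the usage are illustrative, not WLCG settings:

```python
def directed_pairs(n_hosts: int) -> int:
    """Source->destination links in a full N x N mesh, excluding self-tests."""
    return n_hosts * (n_hosts - 1)


def busy_fraction(n_hosts: int, test_seconds: float, interval_seconds: float) -> float:
    """Fraction of time one host spends in bandwidth tests.

    Assumes one test to and one test from every other host per interval,
    and that tests on a host never overlap.
    """
    tests_per_host = 2 * (n_hosts - 1)
    return tests_per_host * test_seconds / interval_seconds


def min_interval_seconds(n_hosts: int, test_seconds: float, max_fraction: float) -> float:
    """Shortest interval that keeps busy_fraction at or below max_fraction."""
    return 2 * (n_hosts - 1) * test_seconds / max_fraction


if __name__ == "__main__":
    print(directed_pairs(100))                         # 9900 links to commission
    print(round(busy_fraction(50, 20, 6 * 3600), 3))   # 0.091
    print(min_interval_seconds(50, 20, 0.05) / 3600)   # hours between test cycles
```

For instance, 50 hosts with 20-second tests every 6 hours keeps each host busy roughly 9% of the time; halving the interval doubles that fraction, which is why bandwidth frequency has to be chosen together with mesh size.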