
WLCG perfSONAR-PS Deployment Task-Force: Final Report

Shawn McKee / Simone Campana (on behalf of the WLCG perfSONAR deployment TF), 17/04/2014.


Presentation Transcript


  1. WLCG perfSONAR-PS Deployment Task-Force: Final Report Shawn McKee / Simone Campana (on behalf of WLCG perfSONAR deployment TF) 17/04/2014

  2. Summary of perfSONAR Deployment
     • All WLCG sites should have installed/upgraded perfSONAR following the instructions at https://twiki.cern.ch/twiki/bin/view/LCG/PerfsonarDeployment
     • Baseline release is 3.3.2.
     • Task-force deadline was April 1, 2014
     • We have 205 hosts running and in the mesh
     • There are 8 sites not installed
     • We have 64 sites not at the current version
     • Versions prior to 3.3 are unable to use the Mesh-config
     WLCG perfSONAR-PS TF
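As a rough illustration of how the version bullets on this slide could be checked against a host inventory, the Python sketch below sorts hosts by their reported release. Only the baseline (3.3.2) and the 3.3 Mesh-config threshold come from the slide; the host names, the version strings, and the category labels are hypothetical.

```python
BASELINE = (3, 3, 2)    # baseline release named on the slide
MESH_MINIMUM = (3, 3)   # releases before 3.3 cannot use the Mesh-config


def parse_version(text):
    """Turn a dotted version string such as '3.3.2' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def classify(version_text):
    """Return 'current', 'outdated' or 'no-mesh' for one version string."""
    version = parse_version(version_text)
    if version[:2] < MESH_MINIMUM:
        return "no-mesh"      # too old to read the Mesh-config at all
    if version < BASELINE:
        return "outdated"     # can use the mesh but is below the baseline
    return "current"


# Hypothetical inventory, for illustration only.
hosts = {
    "ps-latency.example.org": "3.3.2",
    "ps-bandwidth.example.org": "3.3.1",
    "ps-old.example.org": "3.2.2",
}

for name, version in sorted(hosts.items()):
    print(name, version, classify(version))
```

Separating "no-mesh" from "outdated" mirrors the slide's distinction: hosts below 3.3 cannot be configured centrally, while hosts between 3.3 and the baseline only need an upgrade.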

  3. Still Missing 8 Sites…
     • There are 8 sites not installed yet:
       • BelGrid-UCL: asked for SLC6 installation, pointed to https://code.google.com/p/perfsonar-ps/wiki/Level1and2Install
       • GR-07-UOI-HEPLAB: no hardware, on hold.
       • GoeGrid: no reply, 4 reminders
       • ICM: "We do not have free resources to deploy perfSonar", ticket closed.
       • MPPMU: procuring hardware
       • RO-11-NIPNE: site under upgrade on 09/01/2014, no news since then (2 reminders)
       • T2_Estonia: under installation/configuration
       • TECHNION-HEP: first reply yesterday (3 reminders).
     • USCMS-FNAL-WC1: installed and configured (for a long time), not publishing in OIM
     • Reported at the last WLCG Ops Coordination meeting: https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes140403#perfSONAR_deployment_TF
     • Good that we have so few missing, and most should eventually be deployed

  4. Monitoring Status
     [Figures: status snapshots labelled March 6 and April 16]
     • We have OMD monitoring the perfSONAR-PS instances: https://maddash.aglt2.org/WLCGperfSONAR/omd/
       • These services should migrate to OSG over the next month.
       • This monitoring should be useful for any future work to find/fix problems.
     • MaDDash instance at http://maddash.aglt2.org/maddash-webui
       • It has shown we still have some issues: too much “orange”, meaning data is either not being taken (configuration or firewall) or access to the results is blocked

  5. Task-Force Lessons Learned
     • Installing a service at every site is one thing, but commissioning an NxN system of links squares the effort.
       • This is why we have perfSONAR-PS installed but not all links are monitored.
     • perfSONAR is a “special” service
       • It tests a multi-domain network path, involving a service at the source and at the destination
       • It requires dedicated hardware and comes in a bundle with the OS.
       • We understand this creates complications for some fabric infrastructures. An RPM bundle was provided to help those sites, and we also encouraged sites to share configuration experience
     • We had many releases of perfSONAR during the deployment process, each coming with new features or bug-fixes we requested.
       • Some sites did install perfSONAR but are at old releases that lack many functionalities.
       • The change of OS version (v3.2 -> v3.3) was a major reason for the inertia of some sites.
     • The modular dashboard effort was dropped suddenly
       • This was a major disruption, and quite some effort had to be diverted to cope with it.
       • At the same time, the MaDDash monitoring we have now looks much better for monitoring the status of the installations.
     • We still have issues with firewalls. There are 2 kinds of firewall openings to be considered:
       • For the hosts to be able to run the tests among themselves
       • For the hosts to be able to expose information to the monitoring tools.
       • Many sites get the first one right but not the second.
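The "squares the effort" point can be made concrete with a little arithmetic. The sketch below counts the directed source-to-destination links in a full mesh of N hosts, which is N × (N − 1). Treating all 205 hosts from the summary slide as one full mesh is a simplifying assumption made here for illustration; the actual meshes may well be smaller subsets.

```python
def directed_links(n_hosts):
    """Number of ordered (source, destination) pairs among n_hosts hosts."""
    return n_hosts * (n_hosts - 1)


# 205 is the host count from the summary slide; 10 and 50 are arbitrary
# comparison points chosen for this illustration.
for n in (10, 50, 205):
    print(n, "hosts ->", directed_links(n), "directed links")
```

Under that assumption, 10 hosts give 90 links, 50 hosts give 2450, and 205 hosts give 41820, so the number of links to commission grows far faster than the number of installations.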

  6. Important Remaining Issues
     • Get sites running older versions to upgrade
     • Verify we consistently get the needed metrics
     • Involve cloud/VO leads in debugging/fixing issues
     • Fix firewalls: still a problem for many sites
     • Test coverage and parameters
       • Should we have more VO-specific meshes/tests? e.g., WLCG -> WLCG-ATLAS, WLCG-CMS?
       • What frequency of testing for traceroute, BW?
     • Better docs: how-tos, debugging “orange”
     • Some future task-force or working group should address these points
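For the firewall item, a first debugging step could be a plain TCP reachability check run from the machine that is supposed to reach the perfSONAR host (another test host, or the monitoring server). The Python sketch below is a generic helper of that kind, not part of perfSONAR; the function name is hypothetical, and the ports to probe should be taken from the deployment instructions rather than from this example.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

A call such as `tcp_port_open("ps.example.org", 443)` (placeholder host and port) only shows whether the connection is allowed from where the script runs. A host can therefore pass when checked from a peer test host and still fail from the monitoring server, which matches the two kinds of firewall problem described in the lessons learned.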
