SA2: Networking Support Status Report
Xavier Jeannin, Activity Manager, CNRS
EGEE-III First Review, 24-25 June 2009

SA2 Overview
SA2 provides an interface with the network:
• Operational interface that handles the daily relations with the network infrastructures: ENOC, advanced network services tasks
• Relational interface that handles the “higher level” interactions with the network providers
Networking Support – Xavier Jeannin - EGEE-III First Review 24-25 June 2009

SA2 Global view
[Diagram: SA2 task structure in EGEE-III]
Tasks:
• TSA2.1 Running the ENOC
• TSA2.2 Support for the ENOC
• TSA2.3 Overall Networking coordination
• TSA2.4 Management and general project tasks
Subtasks (partners):
• Operational procedures (CNRS)
• WLCG Support (CNRS)
• Operational tools and maintenance (RRC-KI, CNRS)
• IPv6 (GARR, CNRS)
• TT exchange standardization (GRNET)
• Monitoring (DFN)
• Troubleshooting (DFN)
• Advanced network services (GRNET)
• TNLC
• Site networking needs (RedIRIS)

Role of the ENOC
EGEE Network Operation Centre (ENOC), CNRS: a single point of contact between EGEE and the NRENs
[Diagram: the ENOC sits between EGEE (sites, users, GGUS, support units) and the network (NRENs, GÉANT2)]
[Diagram: end-to-end path Grid site 1, RC 1 (operated by NOC of RC1), NREN A (operated by NOC of NREN A), GÉANT2 (operated by DANTE), NREN B (operated by NOC of NREN B), RC 2 (operated by NOC of RC2), Grid site 2]
The ENOC ensures end-to-end (E2E) connectivity for Grid sites along the whole path

Network connectivity assessment
Tool: DownCollector https://ccenoc.in2p3.fr/DownCollector
Assessment for year 2008 on EGEE certified Grid sites (~300):
• 55% of sites have less than a day of unscheduled network downtime per year
• 85% of sites have < 4 days of downtime/year (4 days of downtime = 98.90% reachability/year)
• 80% of off-site network troubles are solved within 30 minutes
[Chart label: 46 sites]

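The 98.90% reachability figure quoted for 4 days of downtime can be reproduced with simple arithmetic. The sketch below assumes a 365-day year (an assumption, since the slide does not state how the figure was computed); the function name is illustrative only.

```python
def reachability_percent(downtime_days: float, days_in_year: int = 365) -> float:
    """Share of the year, in percent, during which a site was reachable.

    Assumes a 365-day year by default; this is a guess at how the
    slide's figure was derived, not a documented DownCollector formula.
    """
    if not 0 <= downtime_days <= days_in_year:
        raise ValueError("downtime must be between 0 and the length of the year")
    return 100.0 * (days_in_year - downtime_days) / days_in_year


print(f"{reachability_percent(4):.2f}%")  # 4 days of downtime -> 98.90%
print(f"{reachability_percent(1):.2f}%")  # 1 day of downtime  -> 99.73%
```
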
ENOC metrics
Issue: very few Grid user notifications about network problems
• 19 NRENs
• 11 different languages
• 75% of European certified sites covered
[Chart: ENOC webserver statistics]

WLCG Support (CNRS)
• SA2 has also taken the lead in designing and implementing a pioneering federated operational model for the LHCOPN
• https://twiki.cern.ch/twiki/bin/view/LHCOPN/OperationalModel

WLCG Support
• Processes were documented and disseminated
• Several meetings and training sessions helped the dissemination
• Related tools were released, including a GGUS helpdesk tailored for the LHCOPN
• Implementation is ongoing and will be ready for LHC start-up

Operational tools and maintenance (RRC-KI)
• Trouble matching and correlation for the ENOC
• Correlate tickets with monitoring data
• Better assessment of the impact of trouble tickets on the Grid

Operational tools and maintenance
• First stage of our study
• The results are experimental and should improve
• Future work plan includes:
• setting up a production version
• automatic ticket ranking based on matching results
• tuning of the matching algorithm, possibly through more extensive use of the topology knowledge

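The idea of correlating trouble tickets with monitoring data, and then ranking tickets from the matching results, can be illustrated with a minimal sketch. Everything here is hypothetical: the `Event` record, the rule "same site and overlapping time window", and the ranking by overlapping outage time are assumptions made for illustration, not the matching algorithm developed in this task (which may also use topology knowledge).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    """A time window affecting one site (hypothetical, simplified record)."""
    site: str
    start: datetime
    end: datetime


def overlap(a: Event, b: Event) -> timedelta:
    """Length of time during which both events are active (zero if disjoint)."""
    return max(min(a.end, b.end) - max(a.start, b.start), timedelta(0))


def match_tickets(tickets, outages):
    """Map each ticket to the monitored outages on the same site that overlap it."""
    return {
        t: [o for o in outages if o.site == t.site and overlap(t, o) > timedelta(0)]
        for t in tickets
    }


def rank_by_impact(matches):
    """Order tickets by total overlapping outage time, largest first."""
    def total(item):
        ticket, matched = item
        return sum((overlap(ticket, o) for o in matched), timedelta(0))
    return [t for t, _ in sorted(matches.items(), key=total, reverse=True)]


# Illustrative data only.
t1 = Event("site-A", datetime(2009, 6, 1, 10, 0), datetime(2009, 6, 1, 12, 0))
t2 = Event("site-B", datetime(2009, 6, 1, 10, 0), datetime(2009, 6, 1, 11, 0))
outages = [
    Event("site-A", datetime(2009, 6, 1, 10, 30), datetime(2009, 6, 1, 11, 30)),
    Event("site-B", datetime(2009, 6, 2, 9, 0), datetime(2009, 6, 2, 9, 30)),
]
matches = match_tickets([t1, t2], outages)
ranking = rank_by_impact(matches)
print([t.site for t in ranking])
```

In this toy data the site-A ticket overlaps a monitored outage for one hour, while the site-B ticket has no matching monitoring evidence, so the site-A ticket is ranked first.
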
Network monitoring tools (DFN)
• Network monitoring tools for efficient troubleshooting
• PerfSONAR-Lite TroubleShooting System
• Launch tests on demand from a Grid site under central server control: ping, traceroute, DNS lookup, nmap and bandwidth measurements
• Based on PerfSONAR-PS (Perl version)

Network monitoring tools
• First beta release is foreseen for June
• Beta testers: CNRS, NORDUnet, GARR
• First version: Autumn 2009

Sites networking needs (RedIRIS)
• Assess network requirements (bandwidth, delay, jitter, etc.) for a site within the Grid, according to the kind of site and the VOs supported
• Empirical approach
• Deployment of perfSONAR at country scale
• RedIRIS contributes considerably more effort to this task than EGEE funds
• First deployment in Europe of such a solution over several domains (4 domains, 9 sites); no appliance box is used
• perfSONAR is deployed in EGEE sites and in the networks used
• Interoperability issue between perfSONAR versions
• First deployment: end of September

Sites networking needs
[Figure: topology of the network monitored by this task]

Technical Network Liaison Committee (TNLC)
• Set up during EGEE to ease the technical discussions between EGEE, the NRENs and the GÉANT2 project
• Participants: EGEE SA2, GÉANT2 (represented by DANTE as coordinator of GÉANT2), some of the NRENs involved in the EGEE activities, the NREN PC and CERN
• 2 meetings
Work mainly focused on:
• Improvement of trouble ticket contents
• Improving the assessment of the impact of problems on the Grid
• Monitoring
• Designing a solution for the Grid infrastructure

Advanced network services (GRNET)
Collaboration with the AMPS (Advanced Multi-domain Provisioning System) team in order to automate SLA establishment
Development of a web interface to manage the EGEE SLA requests:
• Store and manage the EGEE users’ SLA requests
• The ENOC will act on behalf of the user
• The user request is stored in the ENOC
• The ENOC validates it and will then use the AMPS system to make the reservation
AutoBAHN has also been studied but does not seem mature at the moment

Trouble ticket exchange standardization (GRNET)
• GRNET and the ENOC team provide the ENOC with a central server that translates the NRENs’ tickets into a standard ticket format
• Designed and implemented with open source software
• Ticket normalization is very important to improve the efficiency of project-wide network operations
• Dissemination was also made through the submission of an Internet Draft on the normalization of trouble tickets (“The Network Trouble Ticket Data Model”) http://tools.ietf.org/html/draft-dzis-nwg-nttdm-00

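Translating heterogeneous NREN tickets into one standard form can be pictured as a per-source field mapping plus value normalization. The sketch below is purely illustrative: the source names, native field names, common field names and status vocabulary are all hypothetical and are not taken from the Network Trouble Ticket Data Model draft or from the server described here.

```python
# Hypothetical per-NREN mappings from native field names to common ones.
FIELD_MAPS = {
    "nren_a": {"id": "ticket_id", "titre": "summary", "debut": "start_time", "etat": "status"},
    "nren_b": {"TicketNo": "ticket_id", "Subject": "summary", "Begin": "start_time", "State": "status"},
}

# Hypothetical normalization of status values written in different languages.
STATUS_MAP = {
    "ouvert": "open",
    "ferme": "closed",
    "open": "open",
    "closed": "closed",
    "solved": "closed",
}


def normalize(source: str, raw: dict) -> dict:
    """Translate one native ticket into the common (illustrative) format."""
    if source not in FIELD_MAPS:
        raise KeyError(f"no field mapping defined for source {source!r}")
    ticket = {"source": source}
    for native, common in FIELD_MAPS[source].items():
        if native in raw:
            ticket[common] = raw[native]
    # Unknown or missing status values are kept visible rather than guessed.
    status = str(ticket.get("status", "")).strip().lower()
    ticket["status"] = STATUS_MAP.get(status, "unknown")
    return ticket


a = normalize("nren_a", {"id": "42", "titre": "Coupure fibre", "etat": "Ouvert"})
b = normalize("nren_b", {"TicketNo": "TT-7", "Subject": "Link down", "State": "SOLVED"})
print(a)
print(b)
```

Keeping the mappings as data rather than code means a new NREN could be supported by adding one dictionary entry, which is one plausible reason to centralize the translation in a single server.
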
IPv6 (GARR/CNRS)
• Analysis of the gLite source code
• Using the IPv6 metric (IPv6 code checker) in ETICS
• Around 110 bugs reported for non-compliant function calls in the code
• This analysis effectively prompted developers to work on IPv6
• IPv6 compliance of external gLite dependencies
• A new IPv6 code checker developed by SA2: IPv6 CARE http://sourceforge.net/projects/ipv6-care
• It monitors the execution of any program, even without the source code, detects networking function calls and provides a diagnosis
• Many informative studies https://twiki.cern.ch/twiki/bin/view/EGEE/IPv6FollowUp
• IPv6 programming methods for C/C++, Java, Python and Perl; IPv6 testing method
• gSOAP / Axis / Axis2 / Boost.Asio / gridFTP / PythonZSI / PerlSOAPLite
• Assessment of the IPv6 compliance of gLite components: DPM & LFC
• Dissemination: meetings, training session, demonstration, video

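To illustrate what "non-compliant function calls" means in a source-code analysis, here is a minimal sketch of a static scan for a few C calls that handle IPv4 addresses only. It is an assumption-laden toy: the function list is a small, commonly cited subset, the suggested replacements are the usual protocol-independent alternatives, and the scan does not skip comments or strings. It is not the ETICS IPv6 metric, and it is not how IPv6 CARE works, since IPv6 CARE monitors running programs rather than reading source code.

```python
import re

# IPv4-only C calls and the protocol-independent calls usually suggested instead.
SUGGESTIONS = {
    "gethostbyname": "getaddrinfo",
    "inet_addr": "inet_pton",
    "inet_aton": "inet_pton",
    "inet_ntoa": "inet_ntop",
}

# Match the function name as a whole word followed by an opening parenthesis.
CALL_RE = re.compile(r"\b(" + "|".join(SUGGESTIONS) + r")\s*\(")


def scan(source: str):
    """Return (line number, function, suggested replacement) for each finding."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for match in CALL_RE.finditer(line):
            name = match.group(1)
            findings.append((line_no, name, SUGGESTIONS[name]))
    return findings


# Illustrative C fragment.
SAMPLE = """\
struct hostent *h = gethostbyname(host);
addr.sin_addr.s_addr = inet_addr("192.0.2.1");
int rc = getaddrinfo(host, port, &hints, &res);
"""

for line_no, name, alt in scan(SAMPLE):
    print(f"line {line_no}: {name}() is IPv4-only, consider {alt}()")
```
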
Current status of gLite and IPv6
IPv6 compliance:
• Full IPv6 compliance, production version: LFC, DPM, globus-url-copy/gridFTP
• Full IPv6 compliance, prototype version: BDII (perl)
• IPv6 compliance to be tested/verified by SA2 (gLite part of the deployment module claimed to be IPv6 compliant): WMproxy/Job submission, CREAM, BDII (python), blah
• IPv6 porting currently ongoing: gfal, lcgutils, VOMS, WMS-server
• IPv6 porting plan exists: FTS
• No porting plan yet (that we are aware of): PX, VObox, MON, dCache, Torque C/S, MPIutils, Condorutils, AMGA

Middleware work plan and evolution
[Chart: IPv6 work plan timeline, Feb 09 to Apr 10, covering VOMS Server, BLAH, RGMA, GFAL, LCG util, VOMS Client and APIs, WMS / WMproxy Job Management, FTS]
• Work plan of JRA1 around IPv6
• Evolution obtained on the gLite repository of ETICS

Integration of IPv6 within EGEE
A distributed testbed
• Demonstration of the first 2 dual-stack IPv4/IPv6 sites of EGEE at User Forum 09
• Grid job over IPv6
[Diagram: two testbed sites, UREC/Paris and GARR/Rome, each with an IPv6 gateway, connected over the IPv4/IPv6 Internet (Renater/GEANT/GARR). Node labels include: VOMS server, LB server, LCG Computing Element, CREAM Computing Element, Worker Nodes (Torque/PBS), Workload management server (WMS), User Interface, MyProxy server (PX), DPM Storage Element, LFC File Catalog, UREC site BD-II, GARR site BD-II, SA2 top level BD-II, RGMA-BDII, Grid Job monitoring DB, BDII DEV. IPv6 addresses shown: 2001:760::159:242/64 and 2001:660:3302:7006::1]

Issues & future plans
ENOC
• Issue: lack of monitoring data and troubleshooting tools deployed in the end sites and available to the ENOC
• Deployment of the PerfSONAR-Lite TroubleShooting System
• SA2 is providing an extra effort to design network monitoring with NREN support
• Spanish EGEE sites network monitoring
• Issue: impact assessment of trouble tickets
• Collaboration with NRENs
• Transition toward EGI-NGI
Second step of the trouble matching and correlation work
Improvement of the trouble ticket standardization software
Advanced network services
• Make the SLA installation procedure more automatic
IPv6
• Integration into the EGEE validation process
• Testing new gLite IPv6 modules
Issue: network activity understaffed within the EGI-NGI project

Main achievements
ENOC running https://ccenoc.in2p3.fr/
• First version of the trouble ticket model has been implemented
WLCG / LHCOPN
• Design of the OPN operational model https://twiki.cern.ch/twiki/bin/view/LHCOPN/OperationalModel
Monitoring
• Beta version of PerfSONAR-Lite TSS
IPv6 https://twiki.cern.ch/twiki/bin/view/EGEE/IPv6FollowUp
• IPv6 CARE, informative reports
• Status and work plan of the middleware
• Demonstration of the first 2 dual-stack IPv4/IPv6 sites of EGEE
Trouble ticket exchange standardization
• Submission of an Internet Draft on the normalization of trouble tickets (“The Network Trouble Ticket Data Model”)
TNLC
• EGEE 09 - Terena NRENs & Grid joint meeting, Barcelona Sept. 2009