
LHCC Comprehensive Review November 2007

LHCC Comprehensive Review November 2007. LHCOPN Networking Status. David Foster Head, Network and Communications Systems Group CERN IT-CS. Information. All technical content is on the LHCOPN Twiki : http://lhcopn.cern.ch Coordination Process LHCOPN Meetings (every 3 months)





Presentation Transcript


  1. LHCC Comprehensive Review, November 2007 • LHCOPN Networking Status • David Foster, Head, Network and Communications Systems Group, CERN IT-CS

  2. Information • All technical content is on the LHCOPN Twiki: http://lhcopn.cern.ch • Coordination Process • LHCOPN Meetings (every 3 months) • Active Working Groups • Routing • Monitoring • Operations • Active Interfaces to External Networking Activities • European Network Policy Groups • US Research Networking • Grid Deployment Board • LCG Management Board • EGEE

  3. Overview • LHC Wide Area Networking • LHCOPN Mission • Current Status • Production • Issues and Risks • Not Covered • CERN General Purpose Networking • Accelerator and Experiment Networks • Other Communications Systems

  4. Mission • To assure the T0-T1 transfer capability. • Essential for the Grid to distribute data out to the T1s. • Capacity must be large enough to deal with most situations, including “catch up”. • The excess capacity can be used for T1-T1 transfers. • Lower priority than T0-T1 • May not be sufficient for all T1-T1 requirements • Resiliency Objective • No single failure should cause a T1 to be isolated.
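The “catch up” requirement on the Mission slide can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name and all figures (outage length, nominal rate, link capacity) are assumptions chosen for the example, not values from the presentation. It estimates how long a T0-T1 link needs to clear the backlog built up during an outage while new data keeps arriving at the nominal rate.

```python
def catch_up_hours(outage_hours, nominal_gbps, capacity_gbps):
    """Hours needed to clear an outage backlog while new data keeps arriving.

    All inputs are hypothetical planning figures, not measured LHCOPN values.
    """
    if capacity_gbps <= nominal_gbps:
        raise ValueError("no headroom: the backlog would never clear")
    backlog = outage_hours * nominal_gbps    # data queued during the outage (Gbit/s x hours)
    headroom = capacity_gbps - nominal_gbps  # spare rate left over for catching up
    return backlog / headroom


# Assumed example: 12-hour outage, 4 Gbit/s nominal rate, 10 Gbit/s link.
print(catch_up_hours(12, 4, 10))  # 8.0
```

Under these assumed numbers the link needs 8 hours at full rate to recover. Any T1-T1 traffic sharing the link draws on the same headroom, which is why the slide gives it lower priority than T0-T1.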

  5. GÉANT2: Consortium of 34 NRENs • 22 PoPs, ~200 Sites • 38k km Leased Services, 12k km Dark Fiber • Supporting Light Paths for LHC, eVLBI, et al. • Dark Fiber Core Among 16 Countries: • Austria • Belgium • Bosnia-Herzegovina • Czech Republic • Denmark • France • Germany • Hungary • Ireland • Italy • Netherlands • Slovakia • Slovenia • Spain • Switzerland • United Kingdom • H. Doebbeling • Multi-Wavelength Core (to 40) + 0.6-10G Loops

  6. OPN Status Summary, November 2007

  7. USLHCNet, November 2007

  8. USLHCNet • A number of links providing alternate routing for primary traffic. • Relationship with ESnet (and DOE approval) to provide capacity (O(5G)) on the MAN LAN – AMS link for additional ESnet-GEANT peering • This helps for US Tier-1 to EU Tier-2 connectivity. • US Tier-2 to EU Tier-1 will require additional I2-GEANT peering. Discussions are ongoing.

  9. CBF Status Summary, November 2007

  10. T0-T1 Lambda routing (schematic) [map figure; from Michael Enrico, DANTE] • T1 sites shown: ASGC (via SMW-3 or 4 (?)), TRIUMF, NDGF, BNL, RAL, SARA, GRIDKA, FNAL, IN2P3, CNAF, PIC • T0-T1 lambdas: • CERN-RAL • CERN-PIC • CERN-IN2P3 • CERN-CNAF • CERN-GRIDKA • CERN-NDGF • CERN-SARA • CERN-TRIUMF • CERN-ASGC • USLHCNET NY (AC-2) • USLHCNET NY (VSNL N) • USLHCNET Chicago (VSNL S)

  11. T1-T1 Lambda routing (schematic) [map figure; from Michael Enrico, DANTE] • T1-T1 lambdas: • GRIDKA-CNAF • GRIDKA-IN2P3 • GRIDKA-SARA • SARA-NDGF

  12. Some Initial Observations [map figure; from Michael Enrico, DANTE] • Key: GEANT2, NREN, USLHCNET, Via SURFnet, T1-T1 (CBF) • Between CERN and Basel • Following lambdas run in same fibre pair: CERN-GRIDKA, CERN-NDGF, CERN-SARA, CERN-SURFnet-TRIUMF/ASGC (x2), USLHCNET NY (AC-2) • Following lambdas run in same (sub-)duct/trench: (all above +) CERN-CNAF, USLHCNET NY (VSNL N) [supplier is COLT] • Following lambda MAY run in same (sub-)duct/trench as all above: USLHCNET Chicago (VSNL S) [awaiting info from Qwest…] • Between Basel and Zurich • Following lambdas run in same trench: CERN-CNAF, GRIDKA-CNAF (T1-T1) • Following lambda MAY run in same trench as all above: USLHCNET Chicago (VSNL S) [awaiting info from Qwest…]
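The observations on this slide amount to a shared-risk analysis: lambdas that share a fibre pair or trench fail together if that segment is cut. A minimal sketch of that bookkeeping follows. The lambda names are transcribed from the slide; the group labels, the function names, and the choice to model the “(x2)” CERN-SURFnet-TRIUMF/ASGC lambdas as two entries are assumptions made here for illustration. The unconfirmed USLHCNET Chicago (VSNL S) lambda is left out because the slide only says it MAY share these segments.

```python
# Lambdas confirmed on the slide to share the CERN-Basel fibre pair.
CERN_BASEL_FIBRE_PAIR = {
    "CERN-GRIDKA",
    "CERN-NDGF",
    "CERN-SARA",
    "CERN-SURFnet-TRIUMF/ASGC (1)",  # the slide lists this lambda as "(x2)"
    "CERN-SURFnet-TRIUMF/ASGC (2)",
    "USLHCNET NY (AC-2)",
}

# Hypothetical group labels mapped to the lambdas that share that segment.
RISK_GROUPS = {
    "CERN-Basel fibre pair": CERN_BASEL_FIBRE_PAIR,
    "CERN-Basel trench": CERN_BASEL_FIBRE_PAIR | {"CERN-CNAF", "USLHCNET NY (VSNL N)"},
    "Basel-Zurich trench": {"CERN-CNAF", "GRIDKA-CNAF"},
}


def lambdas_lost(group):
    """Lambdas that go down together if the named segment is cut."""
    return sorted(RISK_GROUPS[group])


def groups_sharing(lambda_name):
    """Shared-risk groups that a given lambda passes through."""
    return sorted(g for g, members in RISK_GROUPS.items() if lambda_name in members)


for group in RISK_GROUPS:
    print(f"{group}: {len(lambdas_lost(group))} lambdas at risk")
print(groups_sharing("CERN-CNAF"))
```

Listing the groups for a single lambda, as in the last line, shows quickly whether a proposed backup path is genuinely diverse or merely shares a trench with the primary.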

  13. Result • SARA-CERN lambda has been rerouted • 4th diverse USLHCNET lambda will be added • RAL & PIC still need backups • CNAF needs a 3rd route into CERN • Long route around “eastern ring” OR • New CBF solution(s)… • Further investigations required in particular concerning: • Physical routing of GRIDKA-IN2P3 in Paris area • Leased lambdas passing through UK • Further analysis is on-going • May be some layer-1 switching solutions (LCAS) that could help on the GEANT footprint. • Can do “LCAS protected 10GE” for ASGC • Tests are on-going on the USLHCNet footprint

  14. Link Layer Monitoring • perfSONAR deployment is well advanced (but not yet complete). Monitors the “up/down” status of the links. • Integrated into the “End to End Coordination Unit” (E2ECU) run by DANTE • Provides simple indications of “hard” faults. • Insufficient to understand the quality of the connectivity

  15. Initial Active Measurements • One Way Latency • To measure network Reliability & detect Congestion • Between • Tier0 to Tier1 • Tier1 to Tier1 • Bandwidth • To detect & quantify service degradation • Between • Tier0 and Tier1 • Tier1 to Tier1 • ICMP based Latency • To measure Reliability & Congestion • Between • LHCOPN Edge into Tier1 facility
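As an illustration of how one-way latency samples could indicate reliability and congestion, the sketch below summarises a series of probes between two sites. It is an assumption-laden example, not the measurement method used on the LHCOPN: the function name, the use of `None` for a lost probe, and the 1.5x-over-baseline congestion threshold are all hypothetical choices.

```python
from statistics import median


def summarise_one_way_delay(samples_ms, congestion_factor=1.5):
    """Summarise one-way delay probes; None marks a lost probe.

    The congestion threshold (a multiple of the minimum delay) is an
    assumed heuristic, not a value taken from the presentation.
    """
    if not samples_ms:
        raise ValueError("no samples")
    received = [s for s in samples_ms if s is not None]
    loss_fraction = 1 - len(received) / len(samples_ms)  # reliability indicator
    if not received:
        return {"loss_fraction": loss_fraction, "baseline_ms": None,
                "median_ms": None, "congested_samples": 0}
    baseline = min(received)  # best-case delay approximates the uncongested path
    congested = sum(1 for s in received if s > baseline * congestion_factor)
    return {"loss_fraction": loss_fraction, "baseline_ms": baseline,
            "median_ms": median(received), "congested_samples": congested}


# Made-up probe results in milliseconds, with two lost probes.
probes = [10.0, 10.2, None, 10.1, 18.0, 25.0, 10.0, None]
print(summarise_one_way_delay(probes))
```

Queueing delay only ever adds to the propagation delay, so samples well above the minimum are a reasonable first hint of congestion, while the loss fraction speaks to reliability.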

  16. Active Monitoring Deployment • Consists of a small number of servers at each Tier-1 • DANTE proposes to deploy this as a “service”, managed and maintained by them. • Mainly funded by the GEANT project as part of the “transition to service” activity. • Major advantages in terms of measurement quality and consistency. • Will be presented at the next OB • Documents in preparation to cover requirements from the T1s and a “security plan”.

  17. Operational Procedures • Have to be finalised but need to deal with change and incident management. • Many parties involved. • Have to agree on the real processes involved (activity being led by Mathieu Goutelle) • Recent Operations workshop made some progress • Try to avoid, wherever possible, too many “coordination units”. • All parties agreed we need some centralised information to have a global view of the network and incidents. • Further workshop planned to quantify this. • We also need to understand existing processes used by T1s.

  18. Resiliency Issues • The physical fiber path considerations continue • Some lambdas have been re-routed. Others still may be. • Layer3 backup paths for RAL and PIC are still an issue. • In the case of RAL, excessive costs seem to be a problem. • For PIC, still some hope of a CBF between RedIRIS and RENATER • Overall the situation is quite good with the CBF links, but can still be improved. • Most major “single” failures are protected against.

  19. Bigger Issues • Will be important to get some agreements from the T1s • Active Monitoring • Operational Management – in progress • GEANT-2 will end (March 2009), GEANT-3 is being planned. GN-4 and beyond? • Assumption is that GEANT will continue ad infinitum • What will follow from EGEE-III in terms of network management resources? • DANTE may be able to take over most of the responsibility • Funding for USLHCNet assumed to continue.
