ESnet New Architecture: Customer Empowered Fibre Networks (CEF), Prague, May 17, 2005
William E. Johnston, ESnet Manager and Senior Scientist, Lawrence Berkeley National Laboratory, wej@es.net
ESnet Serves DOE Office of Science Sites • Office of Science (OSC) has 10 National Labs (blue) • 7 other DOE Labs also have major OSC programs
ESnet • ESnet's mission to support the large-scale science of the U.S. DOE Office of Science results in a unique network • ESnet currently transports about 400-450 Terabytes/month • The top 100 data flows each month account for about 25-40% of the total monthly network traffic • These top 100 flows represent massive data flows from science experiments to analysis sites and back • At the same time, ESnet supports all of the other DOE collaborative science and the Lab operations • The other 60-75% of the ESnet monthly traffic is in 6,000,000,000 flows
ESnet Provides a High-Speed, Full Internet Services Network for DOE Facilities and Collaborators (Summer 2005 status)
[Network map: the ESnet IP core (packet over SONET optical ring and hubs, 2.5 and 10 Gb/s) and the ESnet Science Data Network (SDN) core (10 Gb/s, ~2,600 miles / 4,200 km), with MAN rings (≥10 Gb/s) and site tail circuits ranging from 45 Mb/s and less through OC3 (155 Mb/s), OC12 (622 Mb/s), and GigEthernet. 42 end user sites: Office of Science sponsored (22), NNSA sponsored (12), Laboratory sponsored (6), Joint sponsored (3), and other sponsored (NSF LIGO, NOAA). Peering and high-speed peering points connect to Abilene, GEANT (Germany, France, Italy, UK, etc.), CERN (LHCnet, part DOE funded), and international networks including SINet (Japan), CA*net4, GLORIAD, Kreonet2, TANet2 (Taiwan), ASCC, Singaren, MREN, StarTap/Starlight, and the Netherlands.]
ESnet Logical Connectivity: Peering and Routing Infrastructure
[Peering map: direct (core-core) and exchange-point peerings, including the NY NAP, Chicago NAP, Starlight (Distributed 6TAP, 18 peers), MAE-E, MAE-W, PAIX-W, Equinix (EQX-SJ, EQX-ASH), FIX-W/NGIX, PNW-GPOP, MAX GPOP, CENIC/CalREN2, and Abilene, plus commercial, university, and international peers (CA*net4, GEANT, SINet/KEK, GLORIAD, Kreonet2, TANet2/ASCC, Singaren, etc.), with per-hub peer counts.]
• ESnet supports science collaboration by providing full Internet access
• manages the full complement of Global Internet routes (about 160,000 IPv4 routes from 180 peers) at 40 general peering points in order to provide DOE scientists access to all Internet sites
• high-speed peerings w/ Abilene and the international R&E networks
• This is a lot of work and is very visible.
Observed Drivers for the Evolution of ESnet
ESnet is currently transporting about 430 Terabytes/month (= 430,000 Gigabytes/month = 430,000,000 Megabytes/month), and this volume is increasing exponentially.
[Chart: ESnet monthly accepted traffic, Feb. 1990 - Feb. 2005, in TBytes/month.]
Observed Drivers for the Evolution of ESnet
ESnet traffic has increased by 10X every 46 months, on average, since 1990.
[Chart: the monthly traffic curve (TBytes/month) crosses successive factor-of-10 levels at Aug. 1990, Oct. 1993 (39 months), Jul. 1998 (57 months), and Dec. 2001 (42 months).]
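The 10X-per-46-months trend, combined with the ~430 TB/month figure above, is enough for a quick back-of-the-envelope projection. The sketch below (Python) converts the trend into an equivalent doubling time, the average bit rate that 430 TB/month represents, and the volume the trend would predict three years out; the 36-month horizon and 30-day month are illustrative assumptions, not slide figures.

```python
import math

# Figures taken from the slides: ~430 TB/month accepted traffic (Feb. 2005)
# and a 10x increase roughly every 46 months since 1990.
current_tb_per_month = 430.0
months_per_10x = 46.0

# Doubling time implied by the 10x-per-46-months trend.
doubling_months = months_per_10x * math.log10(2)

# Average bit rate implied by 430 TB/month (30-day month assumed).
avg_gbps = current_tb_per_month * 1e12 * 8 / (30 * 86400) / 1e9

# Projection 36 months ahead (roughly to 2008), an illustrative horizon.
horizon_months = 36
projected_tb_per_month = current_tb_per_month * 10 ** (horizon_months / months_per_10x)

print(f"Implied doubling time: {doubling_months:.1f} months")            # ~13.9 months
print(f"Average rate at 430 TB/month: {avg_gbps:.2f} Gb/s")              # ~1.3 Gb/s
print(f"Projected volume in {horizon_months} months: {projected_tb_per_month:.0f} TB/month")  # ~2600
```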
Source and Destination of the Top 30 Flows, Feb. 2005
[Chart: the top 30 flows, in Terabytes/month, categorized as DOE Lab-International R&E, Lab-U.S. R&E (domestic), Lab-Lab (domestic), and Lab-Commercial (domestic). The largest flows (roughly 4-12 TB/month each) are dominated by high energy physics traffic: SLAC (US) to RAL (UK), IN2P3 (FR), Karlsruhe (DE), and INFN CNAF (IT); Fermilab (US) to and from WestGrid (CA), U. Toronto (CA), Karlsruhe (DE), IN2P3 (FR), CERN (CH), and U.S. universities (U. Texas Austin, UC Davis, Johns Hopkins, MIT, SDSC); plus LIGO-Caltech, LLNL-NCAR, LBNL-U. Wisc., NERSC-LBNL, BNL-LLNL, and CERN-BNL flows.]
Science Requirements for Networking
August 2002 workshop organized by the Office of Science: Mary Anne Scott (chair), Dave Bader, Steve Eckstrand, Marvin Frazier, Dale Koelling, Vicky White. Workshop panel chairs: Ray Bair, Deb Agarwal, Bill Johnston, Mike Wilde, Rick Stevens, Ian Foster, Dennis Gannon, Linda Winkler, Brian Tierney, Sandy Merola, and Charlie Catlett
• The network and middleware requirements to support DOE science were developed by the OSC science community representing major DOE science disciplines:
• Climate simulation
• Spallation Neutron Source facility
• Macromolecular Crystallography
• High Energy Physics experiments
• Magnetic Fusion Energy Sciences
• Chemical Sciences
• Bioinformatics
• The major supercomputing facilities and Nuclear Physics were considered separately
• Conclusions: the network is essential for
• long term (final stage) data analysis and collaboration
• "control loop" data analysis (influence an experiment in progress)
• distributed, multidisciplinary simulation
• Available at www.es.net/#research
CERN / LHC High Energy Physics Data Provides One of Science's Most Challenging Data Management Problems (CMS is one of several experiments at LHC)
[Data-flow diagram, courtesy Harvey Newman, Caltech: the CMS detector (15m x 15m x 22m, 12,500 tons, $700M) feeds the online system at ~PByte/sec; event simulation runs at ~100 MBytes/sec; the Tier 0+1 center at CERN (event reconstruction, HPSS) feeds the Tier 1 regional centers (Fermilab in the USA, plus French, German, and Italian regional centers, each with HPSS) at 2.5-40 Gbits/sec; Tier 2 analysis centers connect at ~0.6-2.5 Gbps, Tier 3 institutes (~0.25 TIPS) at ~0.6-2.5 Gbps, and Tier 4 workstations and physics data caches at 100-1000 Mbits/sec.]
• 2000 physicists in 31 countries are involved in this 20-year experiment in which DOE is a major player.
• Grid infrastructure spread over the US and Europe coordinates the data analysis.
The DOE Participation in the LHC is the Immediate Source of Requirements for Changes to ESnet
• Both LHC Tier 1 data centers in the U.S. are at DOE Office of Science Labs: Fermilab (Chicago) and Brookhaven Lab (Long Island, New York)
• Data from the two major LHC experiments, CMS and Atlas, will be stored at these centers for analysis by groups at US universities
• As LHC (the CERN high energy physics accelerator) data starts to move, the large science flows in ESnet will increase dramatically (200-2000 times)
The DOE Participation in the LHC is the Immediate Source of Requirements for Changes to ESnet
• CERN and DOE will bring 10G circuits from CERN to Chicago/Starlight and MAN LAN (New York) for moving LHC data to these centers
• Each path will grow to 20+ Gb/s by 2008
• Full bandwidth backup must be provided
• Similar aggregate bandwidth will be required out of the Tier 1 centers to the 15 U.S. Tier 2 (analysis) sites (universities)
• Progress in the network configuration is driven by progressively more realistic experiments, called "service challenges" (formerly "mock data challenges")
SC2 met its throughput targets
• Service Challenge 2: throughput test from Tier-0 to Tier-1 sites, started 14th March
• Set up infrastructure to 7 sites: BNL (Upton, NY), CCIN2P3 (Lyon), CNAF (Bologna), FNAL (Chicago), GridKa (Karlsruhe), RAL (Didcot, UK), SARA (Amsterdam)
• ~100 MB/s to each site
• At least 500 MB/s combined out of CERN at the same time
• 500 MB/s to a few sites individually
• Two weeks sustained at 500 MB/s out of CERN
Kors Bos, NIKHEF, Amsterdam
SC2 Tier0/1 Network Topology
[Topology diagram, Kors Bos, NIKHEF, Amsterdam: the CERN Tier-0 connects to the seven Tier-1 sites (BNL, CNAF, FNAL, GridKa, IN2P3, RAL, SARA) over a mix of 10G, shared 1G, and 2-3 x 1G paths through GEANT, NetherLight, UKLight, StarLight, and ESnet.]
SC2 met its throughput targets
• A daily average of >600 MB/s was achieved for 10 days, midday 23rd March to midday 2nd April
• Not without outages, but the system showed it could recover its rate after outages
• Load was reasonably evenly divided over the sites (given the network bandwidth constraints of the Tier-1 sites)
Kors Bos, NIKHEF, Amsterdam
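A quick sanity check of what those numbers mean in volume terms: the sketch below converts the 10-day, >600 MB/s average into total bytes moved and compares it with ESnet's total monthly traffic quoted earlier. All inputs come from the slides; the 10-day period is treated as exactly 10 x 24 hours.

```python
# Figures from the SC2 slides: >600 MB/s daily average sustained for 10 days.
rate_mb_per_s = 600
days = 10

total_bytes = rate_mb_per_s * 1e6 * 86400 * days
total_tb = total_bytes / 1e12
print(f"~{total_tb:.0f} TB moved in {days} days")   # ~518 TB

# For scale: ESnet's entire accepted traffic was ~430 TB/month in early 2005,
# so a single service challenge moves more than a month's worth of total ESnet traffic.
```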
LHC high-level network architecture
Erik-Jan Bos, Director of Network Services, SURFnet, The Netherlands; T0/T1 network meeting, NIKHEF/SARA, Amsterdam, The Netherlands, April 8, 2005
• Optical Private Network, consisting of dedicated 10G paths between T0 and each T1, in two flavors: "Light path T1" and "Routed T1"
• Special measures for back-up of the T0-T1 paths, to be filled in later
• The T0 preferred interface is 10 Gbps Ethernet LAN-PHY
ESnet Approach for LHC Requirements
[Diagram: CERN circuits land at Starlight (via Qwest Chicago) and at MAN LAN (32 AoA, New York); FNAL connects through Starlight and BNL through MAN LAN (60 Hudson or 32 AoA) to the ESnet SDN and IP cores (Sunnyvale, Chicago, New York), with GEANT and CANARIE/TRIUMF also present at the exchange points. All paths are 10 Gb/s.]
ESnet Evolution
[Diagram: the existing ESnet core ring with hubs at Chicago (CHI), New York (AOA), Washington, DC (DC), Atlanta (ATL), El Paso (ELP), and Sunnyvale (SNV), with tail circuits to the DOE sites.]
• With the old architecture (to 2004) ESnet cannot meet the new requirements
• The current core ring cannot handle the anticipated large science data flows at affordable cost
• The current point-to-point tail circuits to sites are neither reliable nor scalable to the required bandwidth
ESnet's Evolution – The Requirements
• In order to accommodate the growth, and the change in the types of traffic, the architecture of the network must change to support two general requirements:
1) High-speed, scalable, and reliable production IP networking
• University and international collaborator and general science connectivity
• Highly reliable site connectivity to support Lab operations
• Global Internet connectivity
2) Support for the high bandwidth data flows of large-scale science
• Very high-speed network connectivity to specific sites
• Scalable, reliable, and very high bandwidth site connectivity
• Also, provisioned circuits with guaranteed quality of service (e.g. dedicated bandwidth) and with traffic isolation
ESnet’s Evolution – The Requirements • The general requirements, then, are • Fully redundant connectivity for every site • High-speed access to the core for every site • at least 20 Gb/s, generally, and 40-100 Gb/s for some sites • 100 Gbps national core/backbone bandwidth by 2008 in two independent backbones
ESnet Strategy For A New Architecture Three part strategy 1) Metropolitan Area Network (MAN) rings to provide • dual site connectivity for reliability • much higher site-to-core bandwidth • support for both production IP and circuit-based traffic 2) A Science Data Network (SDN) core for • provisioned, guaranteed bandwidth circuits to support large, high-speed science data flows • very high total bandwidth • multiply connecting MAN rings for protection against hub failure • alternate path for production IP traffic 3) A High-reliability IP core (e.g. the current ESnet core) to address • general science requirements • Lab operational requirements • Backup for the SDN core • vehicle for science services
ESnet Target Architecture: IP Core + Science Data Network + MANs
[Diagram: the production ESnet IP core and a second ESnet Science Data Network core (on NLR) spanning Seattle, Sunnyvale, LA, San Diego, Albuquerque (ALB), El Paso (ELP), Atlanta (ATL), Washington DC, New York, and Chicago, interconnected by metropolitan area rings; lab-supplied links and international connections reach CERN, GEANT (Europe), Asia-Pacific, and Australia. The legend distinguishes IP core hubs, SDN/NLR hubs, possible new hubs, and primary DOE Labs.]
ESnet New Architecture - Tactics
• How does ESnet get to the 100 Gbps backbone and the 20-40 Gbps redundant site connectivity that is needed by the OSC community in the 3-5 yr time frame?
• Only a hybrid approach is affordable
• The core IP network that carries the general science and Lab enterprise traffic should be provided by a commercial telecom carrier in the wide area in order to get the >99.9% reliability that certain types of science use and the Lab CIOs demand
• Part, or even most, of the wide area bandwidth for the high impact science networking will be provided by National Lambda Rail (NLR), an R&E network that is much less expensive than commercial telecoms (about 98% reliable)
• The Metropolitan Area Networks that get the Labs to the ESnet cores are a mixed bag and somewhat opportunistic: a combination of R&E networks, dark fiber networks, and commercial managed lambda circuits will be used
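The reliability numbers above drive the hybrid design, and a short availability calculation makes the trade-off concrete. The sketch below (Python) assumes the two cores fail independently, which is an idealization; shared fiber, hub, or site-attachment failures would reduce the gain.

```python
# Back-of-the-envelope availability comparison for the hybrid approach.
HOURS_PER_YEAR = 8760

a_commercial = 0.999   # ">99.9%" commercial carrier IP core
a_nlr = 0.98           # "about 98%" NLR-based SDN core

def downtime_hours(availability):
    """Expected hours per year that a path with this availability is down."""
    return (1 - availability) * HOURS_PER_YEAR

print(f"Commercial IP core alone: {downtime_hours(a_commercial):5.1f} h/yr of downtime")
print(f"NLR-based SDN core alone: {downtime_hours(a_nlr):5.1f} h/yr of downtime")

# A site reachable over both cores is cut off only when both are down at once
# (assuming independent failures).
a_both = 1 - (1 - a_commercial) * (1 - a_nlr)
print(f"Either core available:    {downtime_hours(a_both) * 60:5.1f} min/yr of downtime "
      f"({a_both:.5f} availability)")
```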
ESnet New Architecture – Risk Mitigation
• NLR today is about 98% reliable*, which is not sufficient for some applications; however, risk mitigation is provided by the new architecture
• For bulk data transfer the requirement is typically that enough data reach the analysis systems to keep them operating at full speed, because if they fall behind they cannot catch up
• The risk mitigation strategy is
• enough buffering at the analysis site to tolerate short outages
• over-provisioning the network so that the data transfer bandwidth can be increased after a network failure in order to refill the buffers
*Estimate based on observation by Steve Corbato, Internet2/Abilene
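For a sense of scale, the sketch below sizes the buffer and the catch-up time for this strategy. The specific numbers (a 10 Gb/s steady analysis rate, a 4-hour outage, 20 Gb/s of over-provisioned transfer capacity) are illustrative assumptions, not figures from the slides.

```python
# Buffer-and-overprovision sizing sketch (illustrative numbers, not ESnet figures).
analysis_rate_gbps = 10.0   # rate at which the analysis systems consume data
outage_hours = 4.0          # length of outage the site should ride out
transfer_cap_gbps = 20.0    # over-provisioned network capacity once service is restored

# Buffer needed at the analysis site to keep the analysis fed during the outage.
buffer_tb = analysis_rate_gbps * 1e9 * outage_hours * 3600 / 8 / 1e12

# Time to refill the buffer after the outage while the analysis keeps draining it:
# the backlog (rate x outage) is repaid at the surplus rate (capacity - analysis rate).
catchup_hours = (analysis_rate_gbps * outage_hours) / (transfer_cap_gbps - analysis_rate_gbps)

print(f"Buffer required: ~{buffer_tb:.0f} TB")         # ~18 TB
print(f"Catch-up time:   ~{catchup_hours:.1f} hours")  # 4.0 hours
```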
ESnet New Architecture – Risk Mitigation
• For experiments requiring "real time" guarantees, e.g. Magnetic Fusion experiment data analysis during an experiment (as described in ref. 1), the requirement is typically for high reliability
• The risk mitigation strategy is a "hot" backup path via the production IP network
• The backup path would be configured so that it did not consume bandwidth unless it was brought into use by failover from the primary path
• This general strategy will work for any application whose network connectivity does not require a significant fraction of the production IP network as backup
• this is true for all of the real-time applications examined in the workshop
• it might not be true for a large-scale, Grid-based workflow system
ESnet Strategy: MANs
• The MAN (Metropolitan Area Network) architecture is designed to provide
• At least 2 x 10 Gb/s access to every site
• 10 Gb/s for production IP traffic and backup for large science data
• 10 Gb/s for circuit-based transport services for large-scale science
• At least one redundant path from sites to the ESnet core
• Scalable bandwidth options from sites to the ESnet core
• The first step toward point-to-point provisioned circuits
• Tactics
• Build MAN rings from managed lambda services
• The 10 Gb/s Ethernet ring for virtual circuits and the 10 Gb/s IP ring are not commercially available services
ESnet MAN Architecture (e.g. Chicago)
[Diagram: ESnet core routers (T320) on the production IP core and the SDN core, with R&E and international peerings, connect via the Qwest and Starlight hubs to a metro ring of switches managing multiple lambdas (2-4 x 10 Gbps channels). The ring delivers the ESnet production IP service and ESnet-managed lambda/circuit services (tunneled through the IP backbone where needed) to the ANL and FNAL site gateway routers and site LANs, with ESnet management and monitoring attached.]
San Francisco Bay Area – the First ESnet MAN
[Map: the Bay Area ring, ~46 miles (74 km) in diameter.]
The First ESnet MAN: SF Bay Area (Sept. 2005)
• 2 λs (2 x 10 Gb/s channels) in a ring configuration, delivered as 10 GigEther circuits
• Dual site connection (independent "east" and "west" connections) to each site
• Will be used as a 10 Gb/s production IP ring and 2 x 10 Gb/s paths (for circuit services) to each site
• Qwest contract signed for two lambdas 2/2005, with options on two more
• One link added every month; completion date is 9/2005
[Ring map: the ~46 mile (74 km) diameter ring connects LBNL, the Joint Genome Institute, NERSC, SLAC, LLNL, SNLL, and NASA Ames to the Qwest/ESnet hub and the Level 3 hub, with λ1 carrying production IP, λ2 SDN/circuits, and λ3 and λ4 reserved for the future; the ring ties into the Qwest-ESnet national core ring (Chicago) and National Lambda Rail circuits toward Seattle and Chicago, LA and San Diego, and El Paso.]
ESnet New Architecture - Tactics • Science Data Network (SDN) • Most of the bandwidth is needed along the West and East coasts and across the northern part of the country • Use multiple National Lambda Rail* (NLR) lambdas to provide 30-50 Gbps by 2008 • Close the SDN ring in the south to provide resilience at 10 Gbps • Funding has been requested for this upgrade * NLR is a consortium of US R&E institutions that operate a national, optical fiber network
ESnet Goal – 2007/2008
• 10 Gbps enterprise IP traffic
• 40-60 Gbps circuit-based transport
[Diagram: major DOE Office of Science sites on metropolitan area rings, connected to the ESnet IP core (≥10 Gbps) and the ESnet Science Data Network second core (30-50 Gbps, National Lambda Rail), with existing and new ESnet hubs at Seattle, Sunnyvale, Denver, Albuquerque, El Paso, San Diego, Chicago, New York, Washington DC, and Atlanta; lab-supplied links, major international connections (CERN, Europe, Japan, Asia-Pacific, Australia), and high-speed cross connects with Internet2/Abilene. Core link capacities range from 10 Gb/s to 30-40 Gb/s.]
Proposed ESnet Lambda Infrastructure Based on National Lambda Rail – FY08
[Map: the NLR footprint, with regeneration/OADM and wavegear sites at Seattle, Boise, Sunnyvale, LA, San Diego, Phoenix, Albuquerque, El Paso - Las Cruces, Denver, KC, Tulsa, Dallas, San Antonio, Houston, Baton Rouge, Pensacola, Jacksonville, Atlanta, Raleigh, Wash DC, Pittsburgh, Cleveland, Chicago, and New York.]
New Network Services
• New network services are also critical for ESnet to meet the needs of large-scale science
• The most important new network service is dynamically provisioned virtual circuits, which provide
• Traffic isolation: enables the use of non-standard transport mechanisms that cannot co-exist with TCP-based transport
• Guaranteed bandwidth: currently the only way to address deadline scheduling, e.g. where fixed amounts of data have to reach sites on a fixed schedule so that the processing does not fall so far behind that it could never catch up; very important for experiment data analysis
• The control plane is being jointly developed with Internet2/HOPI
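The deadline-scheduling requirement reduces to simple arithmetic: a fixed data volume with a fixed deadline implies a minimum rate that must be guaranteed end to end. The sketch below uses a hypothetical 30 TB data set and a 24-hour deadline; best-effort IP cannot promise the resulting rate, which is the case for a guaranteed-bandwidth circuit.

```python
# Deadline scheduling as arithmetic: volume / time gives the minimum guaranteed rate.
# The 30 TB / 24 h example is hypothetical, chosen only for illustration.
volume_tb = 30.0
deadline_hours = 24.0

required_gbps = volume_tb * 1e12 * 8 / (deadline_hours * 3600) / 1e9
print(f"Guaranteed bandwidth needed: {required_gbps:.1f} Gb/s")  # ~2.8 Gb/s
```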
OSCARS: Guaranteed Bandwidth Service
[Diagram: user systems at site A and site B request circuits through a bandwidth broker, which works with authorization and an allocation manager; resource managers along the path then apply policing and shaping to the reserved traffic.]
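To show how the pieces in the diagram could fit together, here is a deliberately simplified sketch of a reservation flow. The class names, method names, and admission logic are invented for illustration and do not reflect the actual OSCARS implementation or API; authorization and allocation are collapsed into a single in-memory broker.

```python
from dataclasses import dataclass

@dataclass
class Reservation:
    user: str
    src_site: str
    dst_site: str
    bandwidth_gbps: float
    start: str
    end: str

class BandwidthBroker:
    """Toy stand-in for the bandwidth broker / authorization / allocation manager roles."""

    def __init__(self, authorized_users, site_headroom_gbps):
        self.authorized_users = set(authorized_users)
        self.site_headroom = dict(site_headroom_gbps)  # unreserved capacity per site

    def request(self, r: Reservation) -> bool:
        if r.user not in self.authorized_users:
            return False  # authorization step failed
        for site in (r.src_site, r.dst_site):
            if self.site_headroom.get(site, 0.0) < r.bandwidth_gbps:
                return False  # allocation manager: not enough headroom
        for site in (r.src_site, r.dst_site):
            self.site_headroom[site] -= r.bandwidth_gbps
            # A real system would now instruct the site's resource manager to
            # install policer/shaper state for the circuit; here we just report it.
            print(f"{site}: police/shape reserved circuit at {r.bandwidth_gbps} Gb/s")
        return True

# Hypothetical usage: reserve 5 Gb/s between two sites for a day-long transfer.
broker = BandwidthBroker({"alice"}, {"FNAL": 10.0, "NERSC": 10.0})
ok = broker.request(Reservation("alice", "FNAL", "NERSC", 5.0,
                                "2005-06-01T00:00", "2005-06-02T00:00"))
print("reservation accepted" if ok else "reservation rejected")
```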
References – DOE Network Related Planning Workshops
1) High Performance Network Planning Workshop, August 2002. http://www.doecollaboratory.org/meetings/hpnpw
2) DOE Science Networking Roadmap Meeting, June 2003. http://www.es.net/hypertext/welcome/pr/Roadmap/index.html
3) DOE Workshop on Ultra High-Speed Transport Protocols and Network Provisioning for Large-Scale Science Applications, April 2003. http://www.csm.ornl.gov/ghpn/wk2003
4) Science Case for Large Scale Simulation, June 2003. http://www.pnl.gov/scales/
5) Workshop on the Road Map for the Revitalization of High End Computing, June 2003. http://www.cra.org/Activities/workshops/nitrd ; public report: http://www.sc.doe.gov/ascr/20040510_hecrtf.pdf
6) ASCR Strategic Planning Workshop, July 2003. http://www.fp-mcs.anl.gov/ascr-july03spw
7) Planning Workshops – Office of Science Data-Management Strategy, March & May 2004. http://www-conf.slac.stanford.edu/dmw2004