430 likes | 565 Views
The Energy Sciences Network BESAC August 2004. Mary Anne Scott Program Manager Advanced Scientific Computing Research Office of Science Department of Energy. William E. Johnston, ESnet Dept. Head and Senior Scientist R. P. Singh, Federal Project Manager
E N D
The Energy Sciences NetworkBESAC August 2004 Mary Anne Scott Program Manager Advanced Scientific Computing Research Office of Science Department of Energy William E. Johnston, ESnet Dept. Head and Senior Scientist R. P. Singh, Federal Project Manager Michael S. Collins, Stan Kluz,Joseph Burrescia, and James V. Gagliardi, ESnet Leads Gizella Kapus, Resource Manager and the ESnet Team Lawrence Berkeley National Laboratory
What is ESnet? • Mission: • Provide, interoperable, effective and reliable communications infrastructure and leading-edge network services that support missions of the Department of Energy, especially the Office of Science • Vision: • Provide seamless and ubiquitous access, via shared collaborative information and computational environments, to the facilities, data, and colleagues needed to accomplish their goals. • Role: • A component of the Office of Science infrastructure critical to the success of its research programs (program funded through ASCR/MICS; managed and operated by ESnet staff at LBNL).
Why is ESnet important? • Enables thousands of DOE, university and industry scientists and collaborators worldwide to make effective use of unique DOE research facilities and computing resources independent of time and geographic location • Direct connections to all major DOE sites • Access to the global Internet (managing 150,000 routes at 10 commercial peering points) • User demand has grown by a factor of more than 10,000 since its inception in the mid 1990’s—a 100 percent increase every year since 1990 • Capabilities not available through commercial networks • Architected to move huge amounts of data between a small number of sites • High bandwidth peering to provide access to US, European, Asia-Pacific, and other research and education networks. Objective: Support scientific research by providing seamless and ubiquitous access to the facilities, data, and colleagues
How is ESnet Managed? • A community endeavor • Strategic guidance from the OSC programs • Energy Science Network Steering Committee (ESSC) • BES represented by Nestor Zaluzec, ANL and Jeff Nichols, ORNL • Network operation is a shared activity with the community • ESnet Site Coordinators Committee • Ensures the right operational “sociology” for success • Complex and specialized – both in the network engineering and the network management – in order to provide its services to the laboratories in an integrated support environment • Extremely reliable in several dimensions • Taken together these points make ESnet a unique facility supporting DOE science that is quite different from a commercial ISP or University network
…what now??? VISION - A scalable, secure, integrated network environment for ultra-scale distributed science is being developed to make it possible to combine resources and expertise to address complex questions that no single institution could manage alone. • Network Strategy Production network • Base TCP/IP services; +99.9% reliable High-impact network • Increments of 10 Gbps; switched lambdas (other solutions); 99% reliable Research network • Interfaces with production, high-impact and other research networks; start electronic and advance towards optical switching; very flexible [UltraScience Net] • Revisit governance model • SC-wide coordination • Advisory Committee involvement
Where do you come in? • Early identification of requirements • Evolving programs • New facilities • Participation in management activities • Interaction with BES representatives on ESSC • Next ESSC meeting on Oct 13-15 in DC area
What Does ESnet Provide? • A network connecting DOE Labs and their collaborators that is critical to the future process of science • An architecture tailored to accommodate DOE’s large-scale science • move huge amounts of data between a small number of sites • High bandwidth access to DOE’s primary science collaborators: Research and Education institutions in the US, Europe, Asia Pacific, and elsewhere • Full access to the global Internet for DOE Labs • Comprehensive user support, including “owning” all trouble tickets involving ESnet users (including problems at the far end of an ESnet connection) until they are resolved – 24x7 coverage • Grid middleware and collaboration services supporting collaborative science • trust, persistence, and science oriented policy
What is ESnet Today? • ESnet builds a comprehensive IP network infrastructure (routing, IPv6, and IP multicast) on commercial circuits • ESnet purchases telecommunications services ranging from T1 (1 Mb/s) to OC192 SONET (10 Gb/s) and uses these to connect core routers and sites to form the ESnet IP network • ESnet purchases peering access to commercial networks to provide full Internet connectivity • Essentially all of the national data traffic supporting US science is carried by two networks –ESnet and Internet-2 / Abilene (which plays a similar role for the university community)
How Do Networks Work? • Accessing a service, Grid or otherwise, such as a Web server, FTP server, etc., from a client computer and client application (e.g. a Web browser_ involves • Target host names • Host addresses • Service identification • Routing
How Do Networks Work? • core routers • focus on high-speed packet forwarding LBNL ESnet (Core network) router core router router • peering routers • Exchange reachability information (“routes”) • implement/enforce routing policy for each provider • provide cyberdefense border router core router gateway router peering router DNS • border/gateway routers • implement separate site and network provider policy (including site firewall policy) peeringrouter router router Big ISP(e.g. SprintLink) router router router Google, Inc. router
ESnet Core is a High-Speed Optical Network RTR RTR RTR RTR RTR RTR ESnet site site LAN Site – ESnet network policy demarcation (“DMZ”) Site IP router ESnet IP router ESnet hub • Wave division multiplexing • today typically 64 x 10 Gb/s optical channels per fiber • channels (referred to as “lambdas”) are usually used in bi-directional pairs • Lambda channels are converted to electrical channels • usually SONET data framing or Ethernet data framing • can be clear digital channels (no framing – e.g. for digital HDTV) 10GE 10GE ESnet core optical fiber ring A ring topology network is inherently reliable – all single point failures are mitigated by routing traffic in the other direction around the ring.
ESnet Provides Full Internet Serviceto DOE Facilities and Collaboratorswith High-Speed Access to all Major Science Collaborators GEANT - Germany - France - Italy - UK - etc Sinet (Japan) Japan – Russia(BINP) CA*net4 KDDI (Japan) France Switzerland Taiwan (TANet2) Australia CA*net4 Taiwan (TANet2) Singaren CERN PNNL NERSC SLAC PNWG ANL BNL MIT INEEL LIGO LLNL LBNL SNLL TWC JGI Starlight 4xLAB-DC GTN&NNSA ANL-DC INEEL-DC ORAU-DC LLNL/LANL-DC JLAB PPPL FNAL AMES ORNL SRS SNLA LANL DOE-ALB PANTEX SDSC ORAU NOAA OSTI ARM ALB HUB YUCCA MT BECHTEL GA Abilene Abilene Abilene Abilene MAN LANAbilene KCP Allied Signal ELP HUB CHI HUB ATL HUB DC HUB NYC HUB NREL CA*net4 MREN Netherlands Russia StarTap Taiwan (ASCC) SEA HUB ESnet IP Japan Chi NAP NY-NAP QWEST ATM MAE-E SNV HUB PAIX-E MAE-W Fix-W PAIX-W Euqinix 42 end user sites Office Of Science Sponsored (22) International (high speed) OC192 (10G/s optical) OC48 (2.5 Gb/s optical) Gigabit Ethernet (1 Gb/s) OC12 ATM (622 Mb/s) OC12 OC3 (155 Mb/s) T3 (45 Mb/s) T1-T3 T1 (1 Mb/s) NNSA Sponsored (12) Joint Sponsored (3) Other Sponsored (NSF LIGO, NOAA) Laboratory Sponsored (6) peering points ESnet core: Packet over SONET Optical Ring and Hubs hubs SNV HUB high-speed peering points
ESnet’s Peering InfrastructureConnects the DOE Community With its Collaborators NY-NAP STARLIGHT CHI NAP EQX-ASH EQX-SJ MAE-E PAIX-E GA CA*net4 CERN MREN Netherlands Russia StarTap Taiwan (ASCC) Australia CA*net4 Taiwan (TANet2) Singaren GEANT - Germany - France - Italy - UK - etc SInet (Japan) KEK Japan – Russia (BINP) KDDI (Japan) France PNW-GPOP SEA HUB 2 PEERS Distributed 6TAP 19 Peers Abilene Japan 1 PEER CalREN2 NYC HUBS 1 PEER LBNL Abilene + 7 Universities SNV HUB 5 PEERS Abilene 2 PEERS PAIX-W 26 PEERS MAX GPOP MAE-W 22 PEERS 39 PEERS 20 PEERS FIX-W 6 PEERS 3 PEERS LANL CENIC SDSC Abilene ATL HUB TECHnet ESnet provides access to all of the Internet by managing the full complement of Global Internet routes (about 150,000) at 10 general/commercial peering points + high-speed peerings w/ Abilene and the international R&E networks. This is a lot of work, and is very visible, but provides full access for DOE. ESnet Peering (connections to other networks) University International Commercial
What is Peering? • Peering points exchange routing information that says “which packets I can get closer to their destination” • ESnet daily peeringreport(top 20 of about 100) • This is a lot of work peering with this outfitis not random, it carriesroutes that ESnet needs(e.g. to the Russian Backbone Net)
What is Peering? • Why so many routes? So that when I want to get to someplace out of the ordinary, I can get there. For example:http://www-sbras.nsc.ru/eng/sbras/copan/microel_main.html (Technological Design Institute of Applied Microelectronics, Novosibirsk, Russia)
Predictive Drivers for the Evolution of ESnet August 13-15, 2002 Organized by Office of Science Mary Anne Scott, Chair Dave Bader Steve Eckstrand Marvin Frazier Dale Koelling Vicky White Workshop Panel Chairs Ray Bair and Deb Agarwal Bill Johnston and Mike Wilde Rick Stevens Ian Foster and Dennis Gannon Linda Winkler and Brian Tierney Sandy Merola and Charlie Catlett • The network is needed for: • long term (final stage) data analysis • “control loop” data analysis (influence an experiment in progress) • distributed, multidisciplinary simulation • The network and middleware requirements to support DOE science were developed by the OSC science community representing major DOE science disciplines • Climate • Spallation Neutron Source • Macromolecular Crystallography • High Energy Physics • Magnetic Fusion Energy Sciences • Chemical Sciences • Bioinformatics • Available at www.es.net/#research
The Analysis was Driven by the Evolving Process of Science analysis was driven by
Observed Drivers for ESnet Evolution • Are we seeing the predictions of two years ago come true? • Yes!
OSC Traffic Increases by 1.9-2.0 X Annually ESnet is currently transporting about 250 terabytes/mo.(250,000,000 MBy/mo.) ESnet Monthly Accepted Traffic TBytes/Month Annual growth in the past five years has increased from 1.7x annually to just over 2.0x annually.
ESnet is Engineered to Move a Lot of Data ESnet Top 20 Data Flows, 24 hr. avg., 2004-04-20 A small number of science users account for a significant fraction of all ESnet traffic SLAC (US) IN2P3 (FR) 1 Terabyte/day Fermilab (US) CERN SLAC (US) INFN Padva (IT) Fermilab (US) U. Chicago (US) U. Toronto (CA) Fermilab (US) Helmholtz-Karlsruhe (DE) SLAC (US) CEBAF (US) IN2P3 (FR) INFN Padva (IT) SLAC (US) Fermilab (US) JANET (UK) SLAC (US) JANET (UK) DOE Lab DOE Lab Argonne (US) Level3 (US) DOE Lab DOE Lab Fermilab (US) INFN Padva (IT) Argonne SURFnet (NL) IN2P3 (FR) SLAC (US) • Since BaBar data analysis started, the top 20 ESnet flows have consistently accounted for > 50% of ESnet’s monthly total traffic (~130 of 250 TBy/mo)
ESnet Top 10 Data Flows, 1 week avg., 2004-07-01 • The traffic is not transient: Daily and weekly averages are about the same. • SLAC is a prototype for what will happen when Climate, Fusion, SNS, Astrophysics, etc., start to ramp up the next generation science SLAC (US) INFN Padua (IT)5.9 Terabytes SLAC (US) IN2P3 (FR) 5.3 Terabytes FNAL (US) IN2P3 (FR)2.2 Terabytes FNAL (US) U. Nijmegen (NL)1.0 Terabytes SLAC (US) Helmholtz-Karlsruhe (DE) 0.9 Terabytes CERN FNAL (US)1.3 Terabytes U. Toronto (CA) Fermilab (US)0.9 Terabytes FNAL (US) Helmholtz-Karlsruhe (DE) 0.6 Terabytes U. Wisc. (US) FNAL (US) 0.6 Terabytes FNAL (US) SDSC (US) 0.6 Terabytes
ESnet is a Critical Element of Large-Scale Science • ESnet is a critical part of the large-scale science infrastructure of high energy physics experiments, climate modeling, magnetic fusion experiments, astrophysics data analysis, etc. • As other large-scale facilities – such as SNS – turn on, this will be true across DOE
Science Mission Critical Infrastructure • ESnet is a visible and critical piece of general DOE science infrastructure • if ESnet fails, tens of thousands of DOE and University users know it within minutes if not seconds • Requires high reliability and high operational security in the • network operations, and • ESnet infrastructure support – the systems that support the operation and management of the network and services • Secure and redundant mail and Web systems are central to the operation and security of ESnet • trouble tickets are by email • engineering communication by email • engineering database interface is via Web • Secure network access to Hub equipment • Backup secure telephony access to all routers • 24x7 help desk (joint w/ NERSC) and 24x7 on-call network engineers
Automated, real-time monitoring of traffic levels and operating state of some 4400 network entities is the primary network operational and diagnosis tool Network Configuration Performance OSPF Metrics(routing and connectivity) SecureNet IBGP Mesh(routing and connectivity) Hardware Configuration
ESnet’s Physical Infrastructure Equipment rack detail at NYC Hub, 32 Avenue of the Americas (one of ESnet’s core optical ring sites) Picture detail
Typical Equipment of an ESnet Core Network Hub Qwest DS3 DCX Sentry power 48v 30/60 amp panel ($3900 list) AOA Performance Tester ($4800 list) Sentry power 48v 10/25 amp panel ($3350 list) DC / AC Converter ($2200 list) Cisco 7206 AOA-AR1 (low speed links to MIT & PPPL) ($38,150 list) Lightwave Secure Terminal Server ($4800 list) ESnet core equipment @ Qwest 32 AofA HUB NYC, NY (~$1.8M, list) Juniper T320 AOA-CR1 (Core router) ($1,133,000 list) Juniper OC192 Optical Ring Interface (the AOA end of the OC192 to CHI ($195,000 list) Juniper OC48 Optical Ring Interface (the AOA end of the OC48 to DC-HUB ($65,000 list) Juniper M20 AOA-PR1 (peering RTR) ($353,000 list)
Disaster Recovery and Stability BNL LBNL TWC PPPL AMES ELP HUB SNV HUB CHI HUB NYC HUBS DC HUB ATL HUB SEA HUB ALB HUB • Engineers, 24x7 Network Operations Center, generator backed power • Spectrum (net mgmt system) • DNS (name – IP address translation) • Eng database • Load database • Config database • Public and private Web • E-mail (server and archive) • PKI cert. repository and revocation lists • collaboratory authorization service • Remote Engineer • partial duplicate infrastructure DNS Remote Engineer Duplicate Infrastructure Currently deploying full replication of the NOC databases and servers and Science Services databases in the NYC Qwest carrier hub • Remote Engineer • partial duplicate infrastructure • The network must be kept available even if, e.g., the West Coast is disabled by a massive earthquake, etc. • Reliable operation of the network involves • remote NOCs • replicated support infrastructure • generator backed UPS power at all critical network and infrastructure locations • non-interruptible core - ESnet core operated without interruption through • N. Calif. Power blackout of 2000 • the 9/11/2001 attacks, and • the Sept., 2003 NE States power blackout
ESnet WAN Security and Cyberattack Defense • Cyber defense is a new dimension of ESnet security • Security is now inherently a global problem • As the entity with a global view of the network, ESnet has an important role in overall security 30 minutes after the Sapphire/Slammer worm was released, 75,000 hosts running Microsoft's SQL Server (port 1434) were infected. (“The Spread of the Sapphire/Slammer Worm,” David Moore (CAIDA & UCSD CSE), Vern Paxson (ICIR & LBNL), Stefan Savage (UCSD CSE), Colleen Shannon (CAIDA), Stuart Staniford (Silicon Defense), Nicholas Weaver (Silicon Defense & UC Berkeley EECS) http://www.cs.berkeley.edu/~nweaver/sapphire ) Jan., 2003
ESnet and Cyberattack Defense Sapphire/Slammer worm infection hits creating almost a full Gb/s (1000 megabit/sec.) traffic spike on the ESnet backbone
Cyberattack Defense ESnet third response – shut down the main peering paths and provide only limited bandwidth paths for specific “lifeline” services ESnet first response – filters to assist a site ESnet second response – filter traffic from outside of ESnet peeringrouter X X router ESnet router LBNL attack traffic router X borderrouter Lab first response – filter incoming traffic at their ESnet gateway router gatewayrouter peeringrouter border router Lab gatewayrouter Lab • Sapphire/Slammer worm infection created a Gb/s of traffic on the ESnet core until filters were put in place (both into and out of sites) to damp it out.
Science Services: Support for Shared, Collaborative Science Environments • X.509 identity certificates and Public Key Infrastructure provides the basis of secure, cross-site authentication of people and systems (www.doegrids.org) • ESnet negotiates the cross-site, cross-organization, and international trust relationships to provide policies that are tailored to collaborative science in order to permit sharing computing and data resources, and other Grid services • Certification Authority (CA) issues certificates after validating request against policy • This service was the basis of the first routine sharing of HEP computing resources between US and Europe
Science Services: Public Key Infrastructure * Report as of July 15,2004
Voice, Video, and Data Tele-Collaboration Service • Another highly successful ESnet Science Service is the audio, video, and data teleconferencing service to support human collaboration • Seamless voice, video, and data teleconferencing is important for geographically dispersed scientific collaborators • ESnet currently provides to more than a thousand DOE researchers and collaborators worldwide • H.323 (IP) videoconferences (4000 port hours per month and rising) • audio conferencing (2500 port hours per month) (constant) • data conferencing (150 port hours per month) • Web-based, automated registration and scheduling for all of these services • Huge cost savings for the Labs
ESnet’s Evolution over the Next 10-20 Years • Upgrading ESnet to accommodate the anticipated increase from the current 100%/yr traffic growth to 300%/yr over the next 5-10 years is priority number 7 out of 20 in DOE’s “Facilities for the Future of Science – A Twenty Year Outlook” • Based on the requirements of the OSC Network Workshops, ESnet must address • Capable, scalable, and reliable production IP networking • University and international collaborator connectivity • Scalable, reliable, and high bandwidth site connectivity • Network support of high-impact science • provisioned circuits with guaranteed quality of service(e.g. dedicated bandwidth) • Science Services to support Grids, collaboratories, etc
New ESnet Architecture to Accommodate OSC • The future requirements cannot be met with the current, telecom provided, hub and spoke architecture of ESnet Chicago (CHI) New York (AOA) ESnetCore DOE sites Washington, DC (DC) Sunnyvale (SNV) Atlanta (ATL) El Paso (ELP) • The core ring has good capacity and resiliency against single point failures, but the point-to-point tail circuits are neither reliable nor scalable to the required bandwidth
Evolving Requirements for DOE Science Network Infrastructure S C C&C C&C I C&C C&C C&C C C&C C S S C S C guaranteedbandwidthpaths I 1-40 Gb/s,end-to-end I 2-4 yr Requirements 1-3 yr Requirements C C C C storage S S S compute C instrument I cache &compute C&C S C C&C C&C I C&C C&C C&C C 3-5 yr Requirements 4-7 yr Requirements C&C 100-200 Gb/s,end-to-end C S
A New Architecture • With the current architecture ESnet cannot address • the increasing reliability requirements • the long-term bandwidth needs(incrementally increasing tail circuit bandwidth is too expensive – it will not scale to what OSC needs) • LHC will need dedicated 10 Gb/s into and out of FNAL and BNL • ESnet can benefit from • Engaging the research and education networking community for advanced technology • Leveraging the R&E community investment in fiber and networks
A New Architecture • ESnet new architecture goals: full redundant connectivity for every site and high-speed access for every site (at least 10 Gb/s) • Three part strategy 1) MAN rings provide dual site connectivity and much higher site-to-core bandwidth 2) A second core will provide • multiply connected MAN rings for protection against hub failure • extra core capacity • a platform for provisioned, guaranteed bandwidth circuits • alternate path for production IP traffic • carrier neutral hubs 3) a high-reliability IP core (like the current ESnet core)
A New ESnet Architecture Asia-Pacific Europe 2nd Core (e.g. NLR) Chicago (CHI) New York(AOA) ESnetExistingCore MetropolitanAreaRings Washington, DC (DC) Sunnyvale(SNV) Atlanta (ATL) Existing hubs El Paso (ELP) New hubs DOE/OSC Labs Possible new hubs
ESnet Beyond FY07 AsiaPac SEA CERN Europe Europe Japan Japan CHI SNV NYC DEN DC Japan ALB ATL SDG MANs Qwest – ESnet hubs ELP NLR – ESnet hubs High-speed cross connects with Internet2/Abilene Major DOE Office of Science Sites Production IP ESnet core 10Gb/s 30Bg/s40Gb/s High-impact science core 2.5 Gbs10 Gbs Lab supplied Future phases Major international
Conclusions • ESnet is an infrastructure that is critical to DOE’s science mission • Focused on the Office of Science Labs, but serves many other parts of DOE • ESnet is working hard to meet the current and future networking need of DOE mission science in several ways: • Evolving a new high speed, high reliability, leveraged architecture • Championing several new initiatives which will keep ESnet’s contributions relevant to the needs of our community
Reference -- Planning Workshops • High Performance Network Planning Workshop, August 2002 http://www.doecollaboratory.org/meetings/hpnpw • DOE Workshop on Ultra High-Speed Transport Protocols and Network Provisioning for Large-Scale Science Applications, April 2003 http://www.csm.ornl.gov/ghpn/wk2003 • Science Case for Large Scale Simulation, June 2003 http://www.pnl.gov/scales/ • DOE Science Networking Roadmap Meeting, June 2003 http://www.es.net/hypertext/welcome/pr/Roadmap/index.html • Workshop on the Road Map for the Revitalization of High End Computing, June 2003 http://www.cra.org/Activities/workshops/nitrd http://www.sc.doe.gov/ascr/20040510_hecrtf.pdf (public report) • ASCR Strategic Planning Workshop, July 2003 http://www.fp-mcs.anl.gov/ascr-july03spw • Planning Workshops-Office of Science Data-Management Strategy, March & May 2004 • http://www-conf.slac.stanford.edu/dmw2004 (report coming soon)