Connecting USA DOE Labs to Science World: ESnet

Learn about ESnet, the high-performance network that enables large-scale science and supports international collaborations for the DOE Office of Science.


Presentation Transcript


  1. ESnet - Connecting the USA DOE Labs to the World of Science. Eli Dart, Network Engineer, Network Engineering Group, Energy Sciences Network, Lawrence Berkeley National Laboratory. Chinese American Network Symposium, Indianapolis, Indiana, October 20, 2008. Networking for the Future of Science.

  2. Overview • ESnet and the DOE Office of Science need for high performance networks • ESnet4 architecture • The network as a tool for science – performance • Enabling Chinese-American science collaborations

  3. DOE Office of Science and ESnet – the ESnet Mission • ESnet’s primary mission is to enable the large-scale science that is the mission of the Office of Science (SC) and that depends on: • Sharing of massive amounts of data • Supporting thousands of collaborators world-wide • Distributed data processing • Distributed data management • Distributed simulation, visualization, and computational steering • Collaboration with the US and International Research and Education community • ESnet provides network and collaboration services to Office of Science laboratories and many other DOE programs in order to accomplish its mission • ESnet is the sole provider of high-speed connectivity to most DOE national laboratories

  4. The “New Era” of Scientific Data • Modern science is completely dependent on high-speed networking • As the instruments of science get larger and more sophisticated, the cost goes up to the point where only a very few are built (e.g. one LHC, one ITER, one James Webb Space Telescope, etc.) • The volume of data generated by these instruments is going up exponentially • These instruments are mostly based on solid state sensors and so follow the same Moore’s Law as do computer CPUs, though the technology refresh cycle for instruments is 10-20 years rather than 1.5 years for CPUs • The data volume is at the point where modern computing and storage technology are at their very limits trying to manage the data • It takes world-wide collaborations of large numbers of scientists to conduct the science and analyze the data from a single instrument, and so the data from the instrument must be distributed all over the world • The volume of data generated by such instruments has reached the level of many petabytes/year – the point where dedicated 10 – 100 Gb/s networks that span the country and internationally are required to distribute the data

  5. Networks for The “New Era” of Scientific Data • Designing and building networks and providing suitable network services to support science data movement has pushed R&E networks to the forefront of network technology: there are currently no commercial networks that handle the size of the individual data flows generated by modern science • The aggregate of small flows in commercial networks is, of course, much larger – but not by as much as one might think – the Google networks only transport about 1000x the amount of data that ESnet transports • What do the modern systems of science look like? • They are highly distributed and bandwidth intensive

  6. The LHC will be the largest scientific experiment and will generate more data than the scientific community has ever tried to manage. The data management model involves a world-wide collection of data centers that store, manage, and analyze the data and that are integrated through network connections with typical speeds in the 10+ Gbps range. CMS is one of two major experiments – each generates comparable amounts of data. These closely coordinated and interdependent distributed systems must have predictable intercommunication to function effectively.

  7. The “new era” of science data will likely tax network technology • Individual Labs now fill 10G links – Fermilab (an LHC Tier 1 Data Center) has 5 x 10 Gb/s links to ESnet hubs in Chicago and can easily fill one or more of them for sustained periods of time • The “casual” increases in overall network capacity are less likely to easily meet future needs • [Chart: experiment-generated data, in bytes, historical and estimated, growing from the petabyte toward the exabyte scale. Data courtesy of Harvey Newman, Caltech, and Richard Mount, SLAC]

  8. Planning the Future Network 1) Data characteristics of instruments and facilities • What data will be generated by instruments coming on-line over the next 5-10 years (including supercomputers)? 2) Examining the future process of science • How and where will the new data be analyzed and used – that is, how will the process of doing science change over 5-10 years? 3) Observing traffic patterns • What do the trends in network patterns predict for future network needs?

  9. Motivation for Overall Capacity: ESnet Traffic has Increased by 10X Every 47 Months, on Average, Since 1990 • [Log plot of ESnet monthly accepted traffic in terabytes/month, January 1990 – January 2008. 10X milestones: 100 GBy/mo in Aug. 1990; 1 TBy/mo in Oct. 1993 (38 months later); 10 TBy/mo in Jul. 1998 (57 months); 100 TBy/mo in Nov. 2001 (40 months); 1 PBy/mo in Apr. 2006 (53 months)]
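The arithmetic behind that growth figure is worth making explicit. The sketch below is a minimal illustration using only the 10X-per-47-months rate and the 1 PBy/month (April 2006) data point from the slide; the January 2010 projection date is an arbitrary example, not a figure from the presentation.

```python
# Growth arithmetic for the "10X every 47 months" trend on this slide.
# The rate and the 1 PBy/month (April 2006) point come from the slide;
# the January 2010 projection date is only an illustrative example.

GROWTH_PER_MONTH = 10 ** (1 / 47)        # 10X spread evenly over 47 months

annual_factor = GROWTH_PER_MONTH ** 12   # equivalent growth over one year
print(f"Equivalent annual growth: {annual_factor:.2f}x per year")  # ~1.8x

# Project forward from ~1 PBy/month in April 2006, assuming the trend holds.
months_to_jan_2010 = 45                  # April 2006 -> January 2010
projection = 1.0 * GROWTH_PER_MONTH ** months_to_jan_2010
print(f"Projected traffic for January 2010: ~{projection:.0f} PBy/month")  # ~9
```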

  10. The International Collaborators of DOE’s Office of Science Drive ESnet Design for International Connectivity • [World map; each marker = the R&E source or destination of ESnet’s top 100 sites (all R&E); the DOE Lab destination or source of each flow is not shown] • Currently most of ESnet’s traffic (>85%) goes to and comes from outside of ESnet. This reflects the highly collaborative nature of large-scale science (which is one of the main focuses of DOE’s Office of Science).

  11. A small number of large data flows now dominate the network traffic – this motivates virtual circuits as a key network service • [Charts: ESnet total traffic, TBy/mo, January 1990 – April 2008; outbound traffic from FNAL, an LHC Tier 1 site (courtesy Phil DeMar, Fermilab)]

  12. Requirements from Scientific Instruments and Facilities • Bandwidth • Adequate network capacity to ensure timely movement of data produced by the facilities • Connectivity • Geographic reach sufficient to connect users and analysis systems to SC facilities • Services • Guaranteed bandwidth, traffic isolation, end-to-end monitoring • Network service delivery architecture • Service Oriented Architecture / Grid / “Systems of Systems”

  13. ESnet Architecture - ESnet4 • ESnet4 was built to address specific Office of Science program requirements. The result is a much more complex and much higher capacity network. • ESnet3 (2000 to 2005): • A routed IP network with sites singly attached to a national core ring • Very little peering redundancy • ESnet4 (in 2008): • The new Science Data Network (SDN) is a switched network providing guaranteed bandwidth for large data movement • All large science sites are dually connected on metro area rings or dually connected directly to the core ring for reliability • Rich topology increases the reliability of the network

  14. ESnet4 – IP and SDN • ESnet4 is one network with two “sides” • The IP network is a high capacity (10G) best-effort routed infrastructure • Rich commodity peering infrastructure ensures global connectivity • Diverse R&E peering infrastructure provides full global high-bandwidth connectivity for scientific collaboration • High performance – 10G of bandwidth is adequate for many scientific collaborations • Services such as native IPv6 and multicast • Science Data Network (SDN) is a virtual circuit infrastructure with bandwidth guarantees and traffic engineering capabilities • Highly scalable – just add more physical circuits as demand increases • Interoperable – compatible with virtual circuit infrastructures deployed by Internet2, CANARIE, GEANT and others • Guaranteed bandwidth • The interdomain demarcation is a VLAN tag – virtual circuits can be delivered to sites or other networks even when end-to-end reservations are not possible

  15. ESnet4 Backbone Projected for December 2008 • [Network map of the planned ESnet4 backbone, December 2008: the ESnet IP core and Science Data Network (SDN) core, SDN links on NLR (existing), lab-supplied links, MAN links, and international IP connections (including LHC/CERN), with IP and SDN switch/router hubs, ESnet Points of Presence, planned Layer 1 optical nodes, and Lab sites (some with independent dual connections). Core links at this stage run at 20G]

  16. ESnet4 As Planned for 2010 • [Network map of ESnet4 as planned for 2010, with the same topology and legend as the previous slide; core link capacities grow to 30G–50G]

  17. Traffic Engineering on SDN – OSCARS • ESnet On-demand Secure Circuits and Advance Reservation System (OSCARS) http://www.es.net/oscars/ • Provides edge to edge layer 2 or layer 3 virtual circuits across ESnet • Guaranteed bandwidth • Advance reservation • Interoperates with many other virtual circuit infrastructures to provide end-to-end guaranteed bandwidth service for geographically dispersed scientific collaborations (see next slide) • Interoperability is critical, since science traffic flows cross many administrative domains in the general case
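The slides do not show the OSCARS request format itself, so the sketch below is a hypothetical illustration only: made-up field names and endpoint identifiers that show the kind of information an advance bandwidth reservation has to carry (endpoints, a guaranteed rate, a start/end window, and the circuit layer). It is not the real OSCARS API.

```python
# Hypothetical sketch only -- NOT the real OSCARS API. It illustrates what an
# advance reservation for a guaranteed-bandwidth virtual circuit must specify.
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class CircuitReservation:
    src_endpoint: str      # ingress edge port/VLAN (hypothetical identifier)
    dst_endpoint: str      # egress edge port/VLAN at a site or peer network
    bandwidth_mbps: int    # guaranteed bandwidth for the circuit
    start_time: datetime   # advance-reservation window
    end_time: datetime
    layer: int = 2         # layer 2 (VLAN) or layer 3 circuit

# Example: reserve 5 Gb/s for 12 hours between two made-up edge points.
start = datetime(2008, 10, 21, 0, 0)
request = CircuitReservation(
    src_endpoint="fnal-edge:xe-1/0/0.3001",   # hypothetical
    dst_endpoint="aofa-edge:xe-7/0/0.3001",   # hypothetical
    bandwidth_mbps=5000,
    start_time=start,
    end_time=start + timedelta(hours=12),
)
print(request)
```

Because the interdomain demarcation is a VLAN tag (previous slide), a request like this can terminate at a site switch or be handed off to another network's circuit system even when an end-to-end reservation is not possible.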

  18. OSCARS Interdomain Collaborative Efforts • Terapaths • Inter-domain interoperability for layer 3 virtual circuits demonstrated (3Q06) • Inter-domain interoperability for layer 2 virtual circuits demonstrated at SC07 (4Q07) • LambdaStation • Inter-domain interoperability for layer 2 virtual circuits demonstrated at SC07 (4Q07) • Internet2 DCN/DRAGON • Inter-domain exchange of control messages demonstrated (1Q07) • Integration of OSCARS and DRAGON has been successful (1Q07) • GEANT2 AutoBAHN • Inter-domain reservation demonstrated at SC07 (4Q07) • DICE • First draft of topology exchange schema has been formalized (in collaboration with NMWG) (2Q07), interoperability test demonstrated 3Q07 • Initial implementation of reservation and signaling messages demonstrated at SC07 (4Q07) • Nortel • Topology exchange demonstrated successfully 3Q07 • Inter-domain interoperability for layer 2 virtual circuits demonstrated at SC07 (4Q07) • UVA • Demonstrated token based authorization concept with OSCARS at SC07 (4Q07) • OGF NML-WG • Actively working to combine work from NMWG and NDL • Documents and UML diagram for base concepts have been drafted (2Q08) • GLIF GNI-API WG • In process of designing common API and reference middleware implementation

  19. The Network As A Tool For Science • Science is becoming much more data intensive • Data movement is one of the great challenges facing many scientific collaborations • Getting the data to the right place is important • Scientific productivity follows data locality • Therefore, a high-performance network that enables high-speed data movement as a functional service is a tool for enhancing scientific productivity and enabling new scientific paradigms • Users often do not know how to use the network effectively without help – in order to be successful, networks must provide usable services to scientists

  20. Some user groups need more help than others • Collaborations with a small number of scientists typically do not have network tuning expertise • They rely on their local system and network admins (or grad students) • They often don’t have much data to move (typically <1TB) • Therefore, they avoid using the network for data transfer if possible • Mid-sized collaborations have a lot more data, but similar expertise limitations • More scientists per collaboration, much larger data sets (10s to 100s of terabytes) • Most mid-sized collaborations still rely on local system and networking staff, or supercomputer center system and networking staff • Large collaborations (HEP, NP) are big enough to have their own internal software shops • Dedicated people for networking, performance tuning, etc • Typically need much less help • Often held up (erroneously) as an example to smaller collaborations • These groupings are arbitrary and approximate, but this taxonomy illustrates some points of leverage (e.g. data sources, supercomputer centers)

  21. Rough user grouping by collaboration data set size • [Diagram: collaborations arranged by approximate data set size (low to high), also indicating scientists per collaboration and number of collaborations. Categories shown: small data instrument science (Light Source users, Nanoscience Centers, Microscopy), supercomputer simulations (Climate, Fusion, Bioinformatics), and large data instrument science (HEP, NP). A few large collaborations have internal software and networking groups]

  22. Bandwidth necessary to transfer Y bytes in X time
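The table that accompanied this slide is not reproduced in the transcript, but the underlying arithmetic is simple: the sustained rate needed to move Y bytes in X time is 8Y/X bits per second. The sketch below works a few examples; the data set sizes and deadlines are illustrative choices, not values from the slide.

```python
# Bandwidth needed to transfer Y bytes in X time: rate = 8 * Y / X bits/sec.
# The example sizes and deadlines below are illustrative, not from the slide.

def required_gbps(data_bytes: float, seconds: float) -> float:
    """Sustained rate (Gb/s) needed to move data_bytes within seconds."""
    return data_bytes * 8 / seconds / 1e9

DAY = 24 * 3600
examples = [
    ("1 TB in 1 day",   1e12, DAY),
    ("100 TB in 1 day", 1e14, DAY),
    ("1 PB in 1 week",  1e15, 7 * DAY),
    ("1 PB in 1 day",   1e15, DAY),
]
for label, nbytes, secs in examples:
    print(f"{label:>15}: {required_gbps(nbytes, secs):7.2f} Gb/s sustained")
```

Note how petabyte-scale data sets with day-to-week deadlines land in the 10–100 Gb/s range cited earlier in the talk.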

  23. How Can Networks Enable Science? • Build the network infrastructure with throughput in mind • Cheap switches often have tiny internal buffers and cannot reliably carry high-speed flows over long distances • Fan-in is a significant problem that must be accounted for (a rough illustration follows this slide) • Every device in the path matters – routers, switches, firewalls, whatever • Firewalls often cause problems that are hard to diagnose (in many cases, routers can provide equivalent security without degrading performance) • Provide visibility into the network • Test and measurement hosts are critical • Many test points in the network mean better problem isolation • If possible, buy routers that can count packets reliably, because sometimes this is the only way to find the problem • perfSONAR is being widely deployed for end-to-end network monitoring • Work with the science community • Don’t wait for users to figure it out on their own • Work with major resources to help tune data movement services between dedicated hosts • Remember that data transfer infrastructures are systems of systems – success usually requires collaboration between LAN, WAN, storage, and security • Provide information to help users – e.g. http://fasterdata.es.net/
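As a rough illustration of the fan-in and buffering points above, the sketch below estimates how much buffering an output port needs to absorb a short burst when several senders converge on one link. The link rates and burst duration are assumptions chosen only to show the shape of the calculation.

```python
# Fan-in illustration: when several senders briefly transmit into one output
# port at full rate, the switch must buffer the excess or drop packets (and
# TCP recovers slowly from drops on long, high-RTT paths). The rates and
# burst duration here are illustrative assumptions.

def buffer_needed_mb(input_rates_gbps, output_rate_gbps, burst_seconds):
    """MB of buffering needed to absorb the burst without loss."""
    excess_gbps = max(sum(input_rates_gbps) - output_rate_gbps, 0)
    return excess_gbps * 1e9 / 8 * burst_seconds / 1e6

# Four 10G senders converging on a single 10G uplink for a 5 ms burst:
print(f"{buffer_needed_mb([10, 10, 10, 10], 10, 0.005):.1f} MB of buffer needed")
# A switch with ~1 MB of shared packet buffer drops most of that burst, which
# is one reason cheap switches struggle with high-speed flows over long paths.
```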

  24. Enabling Chinese-American Science Collaborations • There are several current collaborations between US DOE laboratories and Chinese institutions • LHC/CMS requires data movement between IHEP and Fermilab • Daya Bay Neutrino Experiment requires data movement between detectors at Daya Bay and NERSC at Lawrence Berkeley National Laboratory and Brookhaven National Laboratory • EAST Tokamak – collaboration with US Fusion Energy Sciences sites such as General Atomics • Others to come, I’m sure • Getting data across the Pacific can be difficult (250 millisecond round trip times are common) • However, we know this can be done because others have succeeded • 1 Gbps host-to-host network throughput between Brookhaven and KISTI in South Korea – this is expected to be 3-5 hosts wide in production • 60 MB/sec per data mover from Brookhaven to CCJ in Japan (typically 6 hosts wide, for a total of 360 MB/sec or 2.8 Gbps) • We look forward to working together to enable the scientific collaborations of our constituents!
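The round-trip time is the hard part of those transpacific numbers. A quick sketch using only figures from this slide (250 ms RTT, 1 Gbps host to host, 60 MB/sec per data mover across 6 hosts) shows the TCP window sizes those rates imply; the bandwidth-delay product is why long paths need tuned hosts or parallel streams.

```python
# Bandwidth-delay product sketch using the numbers on this slide: sustaining a
# rate over a 250 ms round trip requires keeping rate * RTT bytes in flight.

def bdp_mbytes(rate_gbps: float, rtt_seconds: float) -> float:
    """TCP window (MB) needed to keep rate_gbps in flight over rtt_seconds."""
    return rate_gbps * 1e9 / 8 * rtt_seconds / 1e6

RTT = 0.250  # 250 ms round trip, typical across the Pacific (from the slide)

# 1 Gbps host to host (the Brookhaven <-> KISTI example):
print(f"1 Gbps at 250 ms RTT needs ~{bdp_mbytes(1.0, RTT):.0f} MB of TCP window")

# 60 MB/sec per data mover, 6 hosts wide (the Brookhaven -> CCJ example):
per_host_gbps = 60e6 * 8 / 1e9
total_gbps = 6 * per_host_gbps
print(f"6 movers x 60 MB/sec = {total_gbps:.2f} Gbps aggregate, "
      f"~{bdp_mbytes(per_host_gbps, RTT):.0f} MB of window per host")
```

Default TCP buffer limits on untuned hosts are far smaller than these windows, which is one reason the http://fasterdata.es.net/ pointer earlier in the talk matters for paths like these.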

  25. Questions? • http://www.es.net/ • http://fasterdata.es.net/
