The Design and Demonstration of the UltraLight Network Testbed
http://ultralight.caltech.edu
Presented by Xun Su (xsu@hep.caltech.edu)
GridNets 2006, Oct 2nd, 2006
Long Term Trends in Network Traffic Volumes: 300-1000X/10Yrs
[Chart: ESnet accepted traffic 1990-2005, in Terabytes per month. Exponential growth, averaging +82%/year over the last 15 years, with progress in steps. Sources: L. Cottrell, W. Johnston]
• SLAC traffic ~400 Mbps; growth in steps (ESnet limit): ~10X/4 years
• Summer '05: 2x10 Gbps links, one for production, one for R&D
• Projected: ~2 Terabits/s by ~2014
Motivation
• Provide the network advances required to enable petabyte-scale analysis of globally distributed data.
• Current Grid-based infrastructures provide massive computing and storage resources, but are limited by their treatment of the network as an external, passive, and largely unmanaged resource.
• The mission of UltraLight is to:
  • Develop and deploy prototype global services which broaden existing Grid computing systems by promoting the network as an actively managed component.
  • Integrate and test UltraLight in the Grid-based physics production and analysis systems currently under development in ATLAS and CMS.
  • Engineer and operate a trans- and intercontinental optical network testbed for the broader community.
UltraLight Backbone
• The UltraLight testbed is a non-standard core network with dynamic links and varying bandwidth interconnecting our nodes.
• The core of UltraLight evolves dynamically as a function of available resources on other backbones such as NLR, HOPI, Abilene and ESnet.
• The main resources for UltraLight:
  • US LHCnet (IP, L2VPN, CCC)
  • Abilene (IP, L2VPN)
  • ESnet (IP, L2VPN)
  • UltraScienceNet (L2)
  • Cisco Research Wave (10 Gb Ethernet over NLR)
  • NLR Layer 3 service
  • HOPI NLR waves (Ethernet; provisioned on demand)
• UltraLight nodes: Caltech, SLAC, FNAL, UF, UM, StarLight, CENIC PoP at LA, CERN, Seattle
UltraLight Network Engineering
• GOAL: Determine an effective mix of bandwidth-management techniques for this application space, particularly:
  • Best-effort and "scavenger" service using "effective" protocols
  • MPLS with QoS-enabled packet switching
  • Dedicated paths provisioned with TL1 commands or GMPLS
• PLAN: Develop and test the most cost-effective integrated combination of network technologies on our unique testbed:
  • Exercise UltraLight applications on NLR, Abilene and campus networks, as well as LHCNet and our international partners
  • Deploy and systematically study ultrascale protocol stacks (such as FAST), addressing issues of performance and fairness
  • Use MPLS/QoS and other forms of bandwidth management to optimize end-to-end performance among a set of virtualized disk servers
  • Address "end-to-end" issues, including monitoring and end-hosts
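To make the idea of mixing these techniques concrete, the following minimal sketch (hypothetical class names and thresholds, not part of the UltraLight software) shows one way a scheduler could map a transfer onto best-effort/scavenger service, an MPLS/QoS tunnel, or a dedicated circuit, based on the rate the transfer needs:

```python
# Hypothetical sketch: choosing a bandwidth-management class for a transfer.
# The thresholds and class names are illustrative only, not UltraLight policy.
from dataclasses import dataclass

@dataclass
class TransferRequest:
    volume_gb: float      # expected data volume in gigabytes
    deadline_s: float     # time by which the transfer must complete, in seconds

def required_gbps(req: TransferRequest) -> float:
    """Average rate needed to meet the deadline, in Gbps."""
    return (req.volume_gb * 8.0) / req.deadline_s

def choose_path_class(req: TransferRequest) -> str:
    rate = required_gbps(req)
    if rate >= 5.0:
        # Large, urgent flows: a dedicated circuit (e.g. set up via TL1/GMPLS).
        return "dedicated-path"
    if rate >= 0.5:
        # Medium flows: an MPLS tunnel with QoS guarantees.
        return "mpls-qos"
    # Everything else rides best-effort / scavenger service.
    return "best-effort"

if __name__ == "__main__":
    req = TransferRequest(volume_gb=10_000, deadline_s=3600)  # 10 TB in one hour
    print(choose_path_class(req), f"({required_gbps(req):.1f} Gbps needed)")
```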
UltraLight: Effective Protocols
• The protocols used to reliably move data are a critical component of the end-to-end use of the network for physics.
• TCP is the most widely used protocol for reliable data transport, but it becomes increasingly ineffective as the bandwidth-delay product of the network grows.
• UltraLight is exploring extensions to TCP (HSTCP, Westwood+, HTCP, FAST, MaxNet) designed to maintain fair sharing of networks while allowing efficient, effective use of these networks.
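On Linux, alternative congestion-control algorithms can be selected per socket, which is the usual way such TCP variants are exercised side by side. A minimal sketch, assuming a kernel with the chosen module available (FAST itself required a patched kernel and is not selectable this way):

```python
# Minimal sketch: selecting a TCP congestion-control variant per socket on Linux.
# Assumes the chosen algorithm (here H-TCP) is compiled into or loaded in the kernel.
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, b"htcp")

# Verify which algorithm the kernel actually applied to this socket.
algo = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, 16)
print(algo.rstrip(b"\x00").decode())
```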
FAST Protocol Comparisons
[Figure: FAST vs. other protocols. Gigabit WAN: 5x higher utilization, small delay. Random packet loss: 10x higher throughput, resilient to random loss. FAST: 95%, Reno: 19%]
Optical Path Developments
• Emerging "light path" technologies are arriving:
  • They can extend and augment existing Grid computing infrastructures, currently focused on CPU and storage, to include the network as an integral Grid component.
  • These technologies appear to be the most effective way to offer on-demand network resource provisioning between end systems.
• We are developing a multi-agent system for secure light path provisioning based on dynamic discovery of the topology in distributed networks (VINCI).
• We are working to further develop this distributed agent system and to provide integrated network services capable of efficiently using and coordinating shared, hybrid networks, improving the performance and throughput of data-intensive Grid applications.
• This includes services able to dynamically configure routers and to aggregate local traffic onto dynamically created optical connections.
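As a rough illustration of the provisioning idea (not the actual VINCI implementation; the graph library, node names and attribute names below are assumptions), an agent could select and reserve an end-to-end light path from a discovered topology like this:

```python
# Hypothetical sketch of an agent-style light-path reservation over a discovered
# topology. Illustrative only; not the VINCI API.
import networkx as nx   # assumption: a graph library stands in for topology discovery

def provision_light_path(topology: nx.Graph, src: str, dst: str, gbps: float):
    """Pick a path with enough free capacity on every segment and reserve it."""
    for path in nx.shortest_simple_paths(topology, src, dst):
        segments = list(zip(path, path[1:]))
        if all(topology[u][v]["free_gbps"] >= gbps for u, v in segments):
            for u, v in segments:
                topology[u][v]["free_gbps"] -= gbps   # reserve capacity on each segment
            return path
    raise RuntimeError("no light path with sufficient free capacity")

# Toy topology: three sites connected by 10 Gbps waves.
g = nx.Graph()
g.add_edge("Caltech", "StarLight", free_gbps=10.0)
g.add_edge("StarLight", "CERN", free_gbps=10.0)
print(provision_light_path(g, "Caltech", "CERN", gbps=5.0))
```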
GMPLS Optical Path Provisioning
• Collaboration between UltraLight and EnLIGHTened Computing.
• Interconnecting Calient switches across the US to form a unified GMPLS control plane.
• Control plane: IPv4 connectivity between sites for control messages.
• Data plane:
  • Cisco Research wave: between LA and StarLight
  • EnLIGHTened wave: between StarLight and MCNC Raleigh
  • LONI wave: between StarLight and LSU Baton Rouge over LONI DWDM
Monitoring for UltraLight
• Real-time end-to-end network monitoring is essential for UltraLight: we need to understand our network infrastructure and track its performance both historically and in real time to make the network a managed, robust component of our infrastructure.
  • Caltech's MonALISA: http://monalisa.cern.ch
  • SLAC's IEPM: http://www-iepm.slac.stanford.edu/bw/
• We have a new effort to push monitoring to the "ends" of the network: the hosts involved in providing services or user workstations.
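The kind of end-host metric such an effort would gather can be illustrated with a small sketch that samples the standard Linux per-interface counters; the step that would publish the values to a repository such as MonALISA is omitted, and the function and field choices here are illustrative assumptions:

```python
# Hedged sketch: per-host network counters an end-host monitoring agent might
# collect. Reads the standard Linux /proc/net/dev counters; publishing to a
# monitoring repository is left out.
import time

def read_interface_counters():
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev (Linux)."""
    counters = {}
    with open("/proc/net/dev") as f:
        for line in f.readlines()[2:]:          # skip the two header lines
            iface, data = line.split(":", 1)
            fields = data.split()
            counters[iface.strip()] = (int(fields[0]), int(fields[8]))
    return counters

if __name__ == "__main__":
    before = read_interface_counters()
    time.sleep(1)
    after = read_interface_counters()
    for iface in after:
        rx = after[iface][0] - before[iface][0]
        tx = after[iface][1] - before[iface][1]
        print(f"{iface}: {rx * 8 / 1e6:.2f} Mbps in, {tx * 8 / 1e6:.2f} Mbps out")
```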
MonALISA UltraLight Repository
The UL repository: http://monalisa-ul.caltech.edu:8080/
The Functionality of the VINCI System
[Diagram: MonALISA proxy services coordinate ML agents at three layers between Sites A, B and C: Layer 3 (routers), Layer 2 (Ethernet, LAN-PHY or WAN-PHY), and Layer 1 (DWDM fiber).]
SC|05: Global Lambdas for Particle Physics
• We previewed the global-scale data analysis of the LHC era using a realistic mixture of streams: organized transfers of multi-TB event datasets, plus numerous smaller flows of physics data that absorb the remaining capacity.
• We used twenty-two [*] 10 Gbps waves to carry bidirectional traffic between Fermilab, Caltech, SLAC, BNL, CERN and other partner Grid sites including Michigan, Florida, Manchester, Rio de Janeiro (UERJ) and Sao Paulo (UNESP) in Brazil, Korea (KNU), and Japan (KEK).
• The analysis software suites are based on the Grid-enabled UltraLight Analysis Environment (UAE) developed at Caltech and Florida, as well as the bbcp and Xrootd applications from SLAC, and dCache/SRM from FNAL.
• Monitored by Caltech's MonALISA global monitoring and control system.
[*] 15 at the Caltech/CACR booth and 7 at the FNAL/SLAC booth
Switch and Server Interconnections at the Caltech Booth • 15 10G Waves • 64 10G Switch Ports: 2 Fully Populated Cisco 6509Es • 43 Neterion 10 GbE NICs • 70 nodes with 280 Cores • 200 SATA Disks • 40 Gbps (20 HBAs) to StorCloud • Thursday - Sunday
HEP at SC2005: Global Lambdas for Particle Physics
Monitoring NLR, Abilene/HOPI, LHCNet, USNet, TeraGrid, PWave, SCInet, Gloriad, JGN2, WHREN, other international R&E networks, and 14,000+ Grid nodes at 250 sites (250k parameters) simultaneously. (I. Legrand)
Global Lambdas for Particle Physics: Caltech/CACR and FNAL/SLAC Booths
RESULTS
• 151 Gbps peak; 100+ Gbps of throughput sustained for hours: 475 Terabytes of physics data transported in < 24 hours
• 131 Gbps measured by the SCInet BWC team on 17 of our waves
• A sustained rate of 100+ Gbps translates to > 1 Petabyte per day
• Linux kernel optimized for TCP-based protocols, including Caltech's FAST
• Surpassing our previous SC2004 BWC record of 101 Gbps
475 TBytes Transported in < 24 Hours; Sustained Peak Projects to > 1 Petabyte Per Day
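The conversion behind that projection is straightforward; a quick check:

```python
# Quick check of the "100+ Gbps sustained -> more than 1 Petabyte/day" projection.
gbps = 100                                   # sustained rate in gigabits per second
bytes_per_day = gbps * 1e9 / 8 * 86_400      # bits/s -> bytes/s -> bytes per day
print(f"{bytes_per_day / 1e15:.2f} PB/day")  # prints 1.08 PB/day
```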
This was the first time: a struggle for the equipment and the team. We will stabilize, package and more widely deploy these methods and tools in 2006.
SC05 BWC Lessons Learned
Take-aways from this marathon exercise:
• An optimized Linux kernel (2.6.12 + FAST-TCP + NFSv4) for data transport, after 7 full kernel-build cycles in 4 days
• Scaling up SRM/GridFTP to near 10 Gbps per wave, using Fermilab's production clusters
• A newly optimized application-level copy program, bbcp, that matches the performance of iperf under some conditions
• Extensions of SLAC's Xrootd, an optimized low-latency file access application for clusters, across the wide area
• Understanding of the limits of 10 Gbps-capable computer systems, network switches and interfaces under stress