PRISM: High-Capacity Networks that Augment Campus' General Utility Production Infrastructure
Philip Papadopoulos, PhD, Calit2 and SDSC
Some Perspective on 100Gbps
• DDR3 1600MHz memory DIMM = 12.8 GB/s (102.4 Gbps)
• Triton compute nodes (24 GB/node): enough memory capacity to source 100 Gbps for ~2 seconds
• High-performance flash drive @ 500 MB/s: about 24 flash drives to fill 100 Gbps
  • @ 250 GB each (6 TB total): ~8 minutes @ 100 Gbps
• Data Oasis high-performance parallel file system @ SDSC (all 10GbE)
  • 64 servers @ 72 TB each, 2 GB/s disk-to-network
  • 4.6 PB (102 hours / 4.25 days @ 100 Gbps)
• 100 Gbps is really big from some perspectives, not so big from others (see the arithmetic sketch below)
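To make these comparisons concrete, here is a minimal arithmetic sketch in Python. The capacities and rates are the ones quoted above; the helper name `drain_time_seconds` is ours, not part of any Prism tooling.

```python
# A minimal sketch of the arithmetic behind the slide's comparisons.
# All capacities and rates come from the bullets above.

LINK_GBPS = 100  # target line rate in gigabits per second

def drain_time_seconds(capacity_gigabytes: float, link_gbps: float = LINK_GBPS) -> float:
    """Seconds needed to push `capacity_gigabytes` through a link of `link_gbps`."""
    return capacity_gigabytes * 8 / link_gbps

# Triton compute node: 24 GB of memory
print(f"Triton node (24 GB RAM):       {drain_time_seconds(24):.1f} s")

# 24 flash drives @ 250 GB each = 6 TB
print(f"24 flash drives (6 TB):        {drain_time_seconds(6_000) / 60:.0f} min")

# Data Oasis: 64 servers @ 72 TB each, ~4.6 PB usable
print(f"Data Oasis (4.6 PB):           {drain_time_seconds(4_600_000) / 3600:.0f} h")

# 500 MB/s flash drives needed to fill the link (0.5 GB/s -> 4 Gb/s each)
print(f"Flash drives to fill the link: {LINK_GBPS / (0.5 * 8):.0f}")
```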
Terminating 100Gbps
• You land 100 Gbps at your campus: where does it go from there?
• What kinds of devices need to be connected?
Some history at UCSD: A Decade of Leading-Edge Research Networks
• 2002, ITR: The OptIPuter, $15M
  • Smarr, PI; Papadopoulos, Ellisman, UCSD Co-PIs; DeFanti, Leigh, UIC Co-PIs
  • "If the network ceases to be a bottleneck, how does that change the design of distributed programs?"
• 2004, Quartzite: MRI: Development of Quartzite, a Campus-wide, Terabit-Class, Field-Programmable, Hybrid Switching Instrument for Comparative Studies, $1.48M
  • Papadopoulos, PI; Smarr, Fainman, Ford, Co-PIs
  • "Make the network real for OptIPuter experiments"
[Figure: OptIPuter Network (2005). Dedicated fibers between sites (about half a mile across campus) link Linux clusters at SDSC, the SDSC Annex, Preuss High School, JSOE Engineering, CRCA, SOM Medicine, 6th College, Phys. Sci.-Keck, the Collocation node, Node M, and SIO Earth Sciences, with uplinks to CENIC and NLR. Core switching: Chiaro Estara (6.4 Tbps backplane) and Juniper T320 (0.320 Tbps backplane), a 20x difference. Source: Phil Papadopoulos, SDSC; Greg Hidley, Cal-(IT)2]
Technology Motion
• Chiaro (out of business)
  • Replaced that capability with a Force10 E1200
  • Moved the physical center of the network to Atkinson Hall (Calit2)
• Juniper T320 (retired): upgraded by Campus/SDSC with a pair of MX960s
• Endpoints replaced/upgraded over time at all sites
• Quartzite introduced DWDM, all-optical, and wavelength switching
• What was constant? The fiber plant (how we utilized it changed over time)
• What was growing? Bigger data at an increasing number of labs, and instrument capacity
PRISM@UCSD: Next Generation (NSF Award #OCI-1246396)
• NSF Campus Cyberinfrastructure Program (CC-NIE), $500K, 1/1/2013 start date; Papadopoulos, PI; Smarr, Co-PI
• Replace the Quartzite core
  • Packet switch only (hybrid not required)
  • 10GbE, 40GbE, 100GbE capability
  • "Small" switch: 11.5 Tbit/s full bisection, 1+ Tbit/s terminated in phase 0
• Expansion to more sites on/off campus
• Widen the freeway between SDSC and Calit2
  • Access to SDSC/XSEDE resources
• Campus has committed to a 100 Gb/s Internet2 connection; Prism is its natural termination network
Prism@UCSD: Expanding Network Reach for Big Data Users
Phil Papadopoulos, SDSC, Calit2, PI
Prism Core Switch: Arista Networks next-generation 7504, or what 11.5 Tb/s looks like (< 3 kW)
• This is the Prism core switch (delivery in March 2013)
• Will have 48 ports of 10GbE, 36 ports of 40GbE, and 2 ports of short-reach 100GbE
• 2 slots left empty for expansion
Physical Connections
• A variety of transceiver technologies:
  • Copper 10Gbit and 40Gbit inside the machine room
  • SR and LR SFP+ 10GbE for in-building and cross-campus links
  • 10GbE DWDM (40 km) + passive multiplexers
    • Fiber conservation (see the sketch after this list)
    • Re-use of Quartzite optics; requires media conversion (DWDM XFPs)
    • VERY reliable: no multiplexer failures in 5+ years, 1 transceiver failure
  • 10GbE CWDM + passive multiplexers, SFP+ form factor (direct plug into the 7504)
  • 40GbE LR4, QSFP+ (internally CWDM)
• Choice of transceiver depends on where we are going, how much bandwidth is needed, and the connection point
  • E.g., Calit2 – SDSC: 12 x 10GbE (2 x LR + 10 DWDM) over 2 fiber pairs
  • The SDSC landing is 10GbE only (today)
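As a rough illustration of the fiber-conservation point, here is a small sketch of how passive multiplexers cut the number of fiber pairs a site needs. The link and channel counts are illustrative assumptions, not the actual Calit2-SDSC wavelength plan quoted above.

```python
import math

def fiber_pairs_needed(num_links: int, channels_per_mux: int = 1) -> int:
    """Fiber pairs required when each passive mux carries `channels_per_mux` wavelengths."""
    return math.ceil(num_links / channels_per_mux)

links = 12  # illustrative number of 10GbE links to a remote building
print(fiber_pairs_needed(links))                       # 12 pairs: one per LR link, no muxing
print(fiber_pairs_needed(links, channels_per_mux=8))   # 2 pairs with 8-channel passive muxes
```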
What Is Our Rationale in Prism?
• Big data labs have particular burst bandwidth needs
  • At UCSD, the number of such labs today is roughly 20-25
• The campus backbone is 10GbE/20GbE and serves 50,000 users on a daily basis with ~80K IP addresses
  • A single big-data burst transfer on Prism would saturate the campus backbone (see the sketch after this list)
  • Protect the campus network from big data freeway users
• Provide massive network capability in a cost-effective manner
• Software-defined networking (SDN) is an emerging technology to better handle configuration
  • SDN via OpenFlow will be supported on Prism
  • Combines the ability to experiment with a reduced risk of complete network disruption
• Easily bridge to identified networks
  • Prism to the UCSD production network (20GbE bridge, matching the campus backbone)
  • Prism to XSEDE resources (direct connect in the SDSC 7508s)
  • Prism to off-campus, high-capacity networks (e.g. ESnet, 100GbE Internet2, NLR)
  • Prism to the biotech mesa surrounding UCSD
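To illustrate the saturation argument, here is a small hypothetical calculation: moving an assumed 10 TB instrument run (our number, chosen only for illustration) over the shared 20 Gb/s campus backbone versus dedicated Prism paths.

```python
# Hypothetical example: a 10 TB dataset moved over the shared campus backbone
# vs. dedicated Prism paths. The dataset size is an assumption for illustration.
DATASET_TB = 10

for label, gbps in [("Campus backbone, 20 Gb/s (shared by ~50,000 users)", 20),
                    ("Prism 40 GbE path", 40),
                    ("Prism 100 GbE path", 100)]:
    seconds = DATASET_TB * 8_000 / gbps   # 1 TB = 8,000 Gb
    print(f"{label:52s} {seconds / 3600:4.1f} h")
```

Even one such burst would consume the entire shared backbone for roughly an hour, which is exactly the traffic Prism is meant to keep off the general-purpose campus network.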
OptIPuter/Quartzite Enabled SDSC to Build Low-Cost, High-Performance Storage
[Figure: storage architecture diagram showing a 120 Gbps connection to the Prism core]
Really Pushing Data from Storage (what 800+ Gb/s looks like)
[Figure: June 2012 traffic plot, two MLAG groups at 485 Gb/s + 350 Gb/s]
• Saturation test: IOR testing through Lustre reached 835 Gb/s = 104 GB/s
• OASIS was designed NOT to be an island; this is why we chose 10GbE instead of InfiniBand
• Papadopoulos set a performance target of 100+ GB/s for the Gordon Track 2 proposal (submitted in 2010); most people at SDSC thought it was "crazy"
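For reference, a trivial sketch of the unit conversion behind the headline number:

```python
# The two MLAG groups from the June 2012 plot, summed and converted to bytes/s.
aggregate_gbps = 485 + 350      # 835 Gb/s through the 10GbE fabric
print(aggregate_gbps / 8)       # ~104 GB/s, matching the IOR-through-Lustre result
```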
Summary
• Big data + high-capacity, inexpensive switching + high-throughput instruments + significant computing and data analysis capacity form a "perfect storm"
• The OptIPuter predicted this in 2002; Quartzite amplified that prediction in 2004. We are now here.
• You have to work on multiple ends of the problem: devices, networks, and cost
• Key insight: recognize the fundamental differences between scaling challenges (e.g. the campus's 50K users vs. Prism's ~500 users, "the 1%")
• Build for burst capacity