UltraLight: Network & Applications Research at UF Dimitri Bourilkov University of Florida CISCO - UF Collaborative Team Meeting Gainesville, FL, September 12, 2006
Overview: UltraLight, an NSF Project
The UltraLight Team
• Steering Group: H. Newman (Caltech, PI), P. Avery (U. Florida), J. Ibarra (FIU), S. McKee (U. Michigan)
• Project Management: Richard Cavanaugh (Project Coordinator), plus the PI and Working Group Coordinators:
• Network Engineering: Shawn McKee (Michigan); + S. Ravot (LHCNet), R. Summerhill (Abilene/HOPI), D. Pokorney (FLR), J. Ibarra (WHREN, AW), C. Guok (ESnet), L. Cottrell (SLAC), D. Petravick, M. Crawford (FNAL), S. Bradley, J. Bigrow (BNL), et al.
• Applications Integration: Frank Van Lingen (Caltech); + I. Legrand (MonALISA), J. Bunn (GAE + TG), C. Steenberg, M. Thomas (GAE), Sanjay Ranka (Sphinx), et al.
• Physics Analysis User Group: Dimitri Bourilkov (UF; CAVES, Codesh)
• Network Research, WAN in Lab Liaison: Steven Low (Caltech)
• Education and Outreach: Laird Kramer (FIU), + H. Alvarez, J. Ibarra, H. Newman
Large Hadron Collider, CERN, Geneva: 2007 Start
• pp collisions at √s = 14 TeV, L = 10^34 cm^-2 s^-1
• 27 km tunnel in Switzerland & France
• Experiments: CMS & TOTEM (pp, general purpose; HI), ATLAS, ALICE (HI), LHCb (B-physics)
• 5000+ physicists, 250+ institutes, 60+ countries
• Physics: Higgs, SUSY, Extra Dimensions, CP Violation, QG Plasma, … the Unexpected
• Challenges: analyze petabytes of complex data cooperatively; harness global computing, data & NETWORK resources
DISUN: LHC Data Grid Hierarchy
• CERN/Outside resource ratio ~1:4; Tier0/(ΣTier1)/(ΣTier2) ~1:2:2
• Network links in the hierarchy: 10-40+ Gbps and 2.5-30 Gbps
• 4 of 7 US CMS Tier2s shown, with ~8 MSi2k and 1.5 PB disk by 2007; >100 Tier2s at LHC
• CERN/Outside ratio smaller; expanded role of Tier1s & Tier2s: greater reliance on networks
Tier-2s: ~100 identified – number still growing
HENP Bandwidth Roadmap for Major Links (in Gbps)
• Continuing trend: ~1000 times bandwidth growth per decade
• HEP: co-developer as well as application driver of global nets
Data Samples and Transport Scenarios
• 10^7 events is a typical data sample for analysis or reconstruction development [Ref.: MONARC]; equivalent to just ~1 day's running
• Transporting datasets with quantifiable high performance is needed for efficient workflow, and thus efficient use of CPU and storage resources
• One can only transmit ~2 RAW + REC or MC samples per day on a 10G path
• Movement of 10^8-event samples (e.g. after re-reconstruction) will take ~1 day (RECO) to ~1 week (RAW, MC) on a 10G link at high occupancy
• Transport of significant data samples will require one, or multiple, 10G links (see the rough estimate sketched below)
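To make the scale of these transfers concrete, here is a minimal back-of-the-envelope sketch in Python; the sample sizes (10, 50, 150 TB) and the assumed ~5 Gb/s of usable throughput on a 10G path are illustrative assumptions, not figures from the slides.

def transfer_days(sample_tb, effective_gbps):
    """Days needed to move sample_tb terabytes at effective_gbps Gb/s."""
    bits = sample_tb * 1e12 * 8                 # sample size in bits
    seconds = bits / (effective_gbps * 1e9)     # transfer time in seconds
    return seconds / 86400.0

# A 10G path rarely delivers the full 10 Gb/s end to end; assume ~5 Gb/s usable.
for sample_tb in (10, 50, 150):                 # assumed sample sizes in TB
    print(f"{sample_tb:4d} TB at 5 Gb/s effective: ~{transfer_days(sample_tb, 5):.1f} days")

The exact numbers depend entirely on the assumed event sizes and link occupancy; the point is simply that realistic samples take hours to days per 10G path.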
UltraLight Goals
• Goal: enable the network as an integrated, managed resource
• Meta-goal: enable physics analysis & discoveries which otherwise could not be achieved
• Next-generation information system, with the network as an integrated, actively managed subsystem in a global Grid
• Hybrid network infrastructure: packet-switched + dynamic optical paths
• End-to-end monitoring; real-time tracking and optimization
• Dynamic bandwidth provisioning; agent-based services spanning all layers
• Partner institutions: Caltech, Florida, Michigan, FNAL, SLAC, CERN, BNL, Internet2/HOPI; UERJ (Rio), USP (Sao Paulo), FIU, KNU (Korea), KEK (Japan), TIFR (India), PERN (Pakistan)
• Partner networks: NLR, ESnet, CENIC, FLR, MiLR, US Net, Abilene, JGN2, GLORIAD, RNP, CA*net4; UKLight, Netherlight, Taiwan
• Industry partners: Cisco, Neterion, Sun …
Large Scale Data Transfers
• Network aspect: the Bandwidth*Delay Product (BDP); we have to use TCP windows matching it in the kernel AND in the application
• On a local connection with 1 GbE and RTT = 0.19 ms, to fill the pipe we need around 2*BDP: 2*BDP = 2 * 1 Gb/s * 0.00019 s ≈ 48 KBytes; for a 10 Gb/s LAN: 2*BDP ≈ 480 KBytes
• On the WAN: from Florida to Caltech the RTT is 115 ms, so to fill the pipe at 1 Gb/s we need 2*BDP = 2 * 1 Gb/s * 0.115 s ≈ 28.8 MBytes, etc.
• User aspect: are the servers on both ends capable of matching these rates for useful disk-to-disk transfers? Tune kernels, get the highest possible disk read/write speed, etc.
• Tables turned: the WAN now outperforms disk speeds! (A small sketch of the BDP arithmetic follows below.)
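The window arithmetic above is easy to reproduce; the short Python sketch below simply recomputes the three 2*BDP values quoted on this slide (the helper name two_bdp_bytes is just for illustration).

def two_bdp_bytes(rate_gbps, rtt_s):
    """Return 2 * bandwidth*delay product in bytes."""
    bits = 2 * rate_gbps * 1e9 * rtt_s   # 2 * BDP in bits
    return bits / 8                      # convert to bytes

cases = [
    ("1 GbE LAN, RTT 0.19 ms",                   1.0,  0.00019),
    ("10 GbE LAN, RTT 0.19 ms",                  10.0, 0.00019),
    ("1 Gb/s WAN, RTT 115 ms (Florida-Caltech)", 1.0,  0.115),
]

for label, gbps, rtt in cases:
    print(f"{label}: 2*BDP ~ {two_bdp_bytes(gbps, rtt) / 1e3:,.0f} KB")

This prints roughly 48 KB, 475 KB and 28,750 KB (~28.8 MB), matching the values above.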
bbcp Tests
bbcp was selected as the starting tool for data transfers on the WAN:
• Supports multiple streams, highly tunable (window size etc.), peer-to-peer style
• Well supported by Andy Hanushevsky from SLAC
• Used successfully in BaBar
• I used it in 2002 for CMS production: massive data transfers from Florida to CERN; the only limits observed at the time were disk writing speed (LAN) and the network (WAN)
• Starting point Florida → Caltech: < 0.5 MB/s on the WAN – very poor performance
Evolution of Tests Leading to SC|05
• End points in Florida (uflight1) and Caltech (nw1): AMD Opterons over the UltraLight network
• Tuning of Linux kernels (2.6.x) and bbcp window sizes – a coordinated, iterative procedure (see the sketch below)
• Current status (for file sizes ~2 GB):
• 6-6.5 Gb/s with iperf
• up to 6 Gb/s memory to memory
• 2.2 Gb/s ramdisk → remote disk write
• the speed was the same whether writing to a SCSI disk (nominally less than 80 MB/s) or to a RAID array, so de facto the data always goes first to the memory cache (the Caltech node has 16 GB of RAM)
• Used successfully with up to 8 bbcp processes in parallel from Florida to the show floor in Seattle; CPU load still OK
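The "coordinated iterative procedure" amounts to keeping the kernel's socket-buffer limits and the window requested by bbcp (-w) consistent with the 2*BDP estimate from the previous slide. The sketch below is purely illustrative: the sysctl keys are the standard Linux 2.6 TCP buffer settings, but the target rate, the derived values and the default min/default buffer fields are assumptions, not the settings actually used on uflight1/nw1.

def tcp_tuning(rate_gbps, rtt_s):
    """Derive candidate TCP buffer limits and a bbcp window from rate and RTT."""
    window = int(2 * rate_gbps * 1e9 * rtt_s / 8)    # ~2*BDP in bytes
    return {
        "net.core.rmem_max": window,
        "net.core.wmem_max": window,
        "net.ipv4.tcp_rmem": f"4096 87380 {window}",
        "net.ipv4.tcp_wmem": f"4096 65536 {window}",
        "bbcp -w": f"{window // (1024 * 1024)}m",    # window handed to bbcp
    }

# Florida -> Caltech: aim for ~5 Gb/s at RTT 115 ms (illustrative target).
for key, value in tcp_tuning(5.0, 0.115).items():
    print(f"{key} = {value}")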
bbcp Examples: Florida → Caltech

[bourilkov@uflight1 data]$ iperf -i 5 -c 192.84.86.66 -t 60
------------------------------------------------------------
Client connecting to 192.84.86.66, TCP port 5001
TCP window size: 256 MByte (default)
------------------------------------------------------------
[ 3] local 192.84.86.179 port 33221 connected with 192.84.86.66 port 5001
[ 3]  0.0- 5.0 sec  2.73 GBytes  4.68 Gbits/sec
[ 3]  5.0-10.0 sec  3.73 GBytes  6.41 Gbits/sec
[ 3] 10.0-15.0 sec  3.73 GBytes  6.40 Gbits/sec
[ 3] 15.0-20.0 sec  3.73 GBytes  6.40 Gbits/sec

bbcp: uflight1.ultralight.org kernel using a send window size of 20971584 not 10485792

bbcp -s 8 -f -V -P 10 -w 10m big2.root uldemo@192.84.86.66:/dev/null
bbcp: Sink I/O buffers (245760K) > 25% of available free memory (231836K); copy may be slow
bbcp: Creating /dev/null/big2.root
Source cpu=5.654 mem=0K pflt=0 swap=0
File /dev/null/big2.root created; 1826311140 bytes at 432995.1 KB/s
24 buffers used with 0 reorders; peaking at 0.
Target cpu=3.768 mem=0K pflt=0 swap=0
1 file copied at effectively 260594.2 KB/s

bbcp -s 8 -f -V -P 10 -w 10m big2.root uldemo@192.84.86.66:dimitri
bbcp: uflight1.ultralight.org kernel using a send window size of 20971584 not 10485792
bbcp: Creating ./dimitri/big2.root
Source cpu=5.455 mem=0K pflt=0 swap=0
File ./dimitri/big2.root created; 1826311140 bytes at 279678.1 KB/s
24 buffers used with 0 reorders; peaking at 0.
Target cpu=10.065 mem=0K pflt=0 swap=0
1 file copied at effectively 150063.7 KB/s
bbcp Examples: Caltech → Florida

[uldemo@nw1 dimitri]$ iperf -s -w 256m -i 5 -p 5001 -l 8960
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 512 MByte (WARNING: requested 256 MByte)
------------------------------------------------------------
[ 4] local 192.84.86.66 port 5001 connected with 192.84.86.179 port 33221
[ 4]  0.0- 5.0 sec  2.72 GBytes  4.68 Gbits/sec
[ 4]  5.0-10.0 sec  3.73 GBytes  6.41 Gbits/sec
[ 4] 10.0-15.0 sec  3.73 GBytes  6.40 Gbits/sec
[ 4] 15.0-20.0 sec  3.73 GBytes  6.40 Gbits/sec
[ 4] 20.0-25.0 sec  3.73 GBytes  6.40 Gbits/sec

bbcp -s 8 -f -V -P 10 -w 10m big2.root uldemo@192.84.86.179:/dev/null
bbcp: Sink I/O buffers (245760K) > 25% of available free memory (853312K); copy may be slow
bbcp: Source I/O buffers (245760K) > 25% of available free memory (839628K); copy may be slow
bbcp: nw1.caltech.edu kernel using a send window size of 20971584 not 10485792
bbcp: Creating /dev/null/big2.root
Source cpu=5.962 mem=0K pflt=0 swap=0
File /dev/null/big2.root created; 1826311140 bytes at 470086.2 KB/s
24 buffers used with 0 reorders; peaking at 0.
Target cpu=4.053 mem=0K pflt=0 swap=0
1 file copied at effectively 263793.4 KB/s
SuperComputing 05 Bandwidth Challenge
• Above 100 Gbps sustained for hours
• 475 TBytes transported in < 24 h
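As a quick sanity check on the scale (my own arithmetic, not a figure from the slide), 475 TB in 24 hours corresponds to an average of roughly 44 Gb/s:

# Average rate implied by moving 475 TB in 24 hours.
avg_gbps = 475e12 * 8 / (24 * 3600) / 1e9
print(f"475 TB in 24 h ~ {avg_gbps:.0f} Gb/s average")   # ~44 Gb/s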
Outlook
• The UltraLight network already performs very well
• SC|05 was a big success
• The hard problem now, from the user perspective, is to match the network with servers capable of sustained rates for large files > 20 GB (once the memory caches are exhausted); fast disk writes are key (RAID arrays)
• To fill a 10 Gb/s pipe we need several (3-4) pairs of servers
• Next steps: disk-to-disk transfers between Florida, Caltech, Michigan, FNAL, BNL and CERN; preparations for SC|06 (next talk)
• More info: http://ultralight.caltech.edu