
UltraLight: Network & Applications Research at UF



  1. UltraLight: Network & Applications Research at UF
     Dimitri Bourilkov, University of Florida
     CISCO - UF Collaborative Team Meeting, Gainesville, FL, September 12, 2006

  2. Overview: UltraLight, an NSF Project

  3. The UltraLight Team
     • Steering Group: H. Newman (Caltech, PI), P. Avery (U. Florida), J. Ibarra (FIU), S. McKee (U. Michigan)
     • Project Management: Richard Cavanaugh (Project Coordinator), PI and Working Group Coordinators
     • Network Engineering: Shawn McKee (Michigan); + S. Ravot (LHCNet), R. Summerhill (Abilene/HOPI), D. Pokorney (FLR), J. Ibarra (WHREN, AW), C. Guok (ESnet), L. Cottrell (SLAC), D. Petravick, M. Crawford (FNAL), S. Bradley, J. Bigrow (BNL), et al.
     • Applications Integration: Frank Van Lingen (Caltech); + I. Legrand (MonALISA), J. Bunn (GAE + TG), C. Steenberg, M. Thomas (GAE), Sanjay Ranka (Sphinx), et al.
     • Physics Analysis User Group: Dimitri Bourilkov (UF; CAVES, CODESH)
     • Network Research, WAN in Lab Liaison: Steven Low (Caltech)
     • Education and Outreach: Laird Kramer (FIU); + H. Alvarez, J. Ibarra, H. Newman

  4. Large Hadron Collider, CERN, Geneva: 2007 Start
     • pp, √s = 14 TeV, L = 10^34 cm^-2 s^-1
     • 27 km tunnel in Switzerland & France
     • Experiments: CMS & TOTEM (pp, general purpose; HI), ATLAS, ALICE (HI), LHCb (B-physics)
     • 5000+ physicists, 250+ institutes, 60+ countries
     • Physics: Higgs, SUSY, Extra Dimensions, CP Violation, QG Plasma, … the Unexpected
     • Challenges: analyze petabytes of complex data cooperatively; harness global computing, data & NETWORK resources

  5. DISUN: LHC Data Grid Hierarchy
     • CERN/Outside resource ratio ~1:4; Tier0 / (Σ Tier1) / (Σ Tier2) ~1:2:2
     • Links of 10-40+ Gbps and 2.5-30 Gbps across the hierarchy
     • 4 of 7 US CMS Tier2s shown, with ~8 MSi2k and 1.5 PB disk by 2007; >100 Tier2s at LHC
     • CERN/Outside ratio smaller; expanded role of Tier1s & Tier2s: greater reliance on networks

  6. Tier-2s: ~100 identified; number still growing

  7. HENP Bandwidth Roadmap for Major Links (in Gbps)
     • Continuing trend: ~1000 times bandwidth growth per decade
     • HEP: co-developer as well as application driver of global networks

  8. Data Samples and Transport Scenarios
     • 10^7 events is a typical data sample for analysis or reconstruction development [Ref.: MONARC]; equivalent to just ~1 day's running
     • Transporting datasets with quantifiable high performance is needed for efficient workflow, and thus efficient use of CPU and storage resources
     • One can only transmit ~2 RAW + RECO or MC samples per day on a 10G path
     • Movement of 10^8-event samples (e.g. after re-reconstruction) will take ~1 day (RECO) to ~1 week (RAW, MC) with a 10G link at high occupancy
     • Transport of significant data samples will require one, or multiple, 10G links (a back-of-the-envelope sketch follows below)
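As a rough illustration of these transfer times, the Python sketch below computes how long a dataset of a given size takes over a share of a 10G path. The example dataset size and the usable-bandwidth fraction are assumptions for illustration, not MONARC figures.

# Back-of-the-envelope transfer-time estimate for the scenarios above.
# The dataset size and usable-bandwidth fraction are illustrative assumptions.

def transfer_hours(dataset_tb: float, link_gbps: float = 10.0, usable_fraction: float = 0.8) -> float:
    """Hours needed to move dataset_tb terabytes at the given usable link rate."""
    bits = dataset_tb * 1e12 * 8                       # dataset size in bits
    seconds = bits / (link_gbps * 1e9 * usable_fraction)
    return seconds / 3600

# Example: an assumed ~20 TB sample on a 10G path used at 80% occupancy
print(f"{transfer_hours(20):.1f} hours")               # ~5.6 hours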

  9. UltraLight Goals
     • Goal: enable the network as an integrated managed resource
     • Meta-goal: enable physics analysis & discoveries which otherwise could not be achieved
     • Next-generation information system, with the network as an integrated, actively managed subsystem in a global Grid
     • Hybrid network infrastructure: packet-switched + dynamic optical paths
     • End-to-end monitoring; real-time tracking and optimization
     • Dynamic bandwidth provisioning; agent-based services spanning all layers
     • Partners: Caltech, Florida, Michigan, FNAL, SLAC, CERN, BNL, Internet2/HOPI; UERJ (Rio), USP (Sao Paulo), FIU, KNU (Korea), KEK (Japan), TIFR (India), PERN (Pakistan)
     • Networks: NLR, ESnet, CENIC, FLR, MiLR, US Net, Abilene, JGN2, GLORIAD, RNP, CA*net4; UKLight, Netherlight, Taiwan
     • Industry: Cisco, Neterion, Sun …

  10. Large Scale Data Transfers
     • Network aspect: Bandwidth*Delay Product (BDP); we have to use TCP windows matching it in the kernel AND the application
     • On a local connection with 1 GbE and RTT 0.19 ms, to fill the pipe we need around 2*BDP: 2*BDP = 2 * 1 Gb/s * 0.00019 s ≈ 48 KBytes. Or, for a 10 Gb/s LAN: 2*BDP ≈ 480 KBytes
     • Now on the WAN: from Florida to Caltech the RTT is 115 ms, so to fill the pipe at 1 Gb/s we need 2*BDP = 2 * 1 Gb/s * 0.115 s ≈ 28.8 MBytes, etc.
     • User aspect: are the servers on both ends capable of matching these rates for useful disk-to-disk transfers? Tune kernels, get the highest possible disk read/write speed, etc. Tables turned: the WAN outperforms disk speeds! (A small sketch of this arithmetic follows below.)
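The window-size arithmetic on this slide can be written out as a short Python sketch; the rates and RTTs are the values quoted above, everything else is illustrative.

# Sketch of the 2*BDP window sizing used above; rates and RTTs are the slide's values.

def double_bdp_bytes(rate_gbps: float, rtt_s: float) -> float:
    """Return 2 * bandwidth-delay product in bytes."""
    return 2 * rate_gbps * 1e9 * rtt_s / 8

print(double_bdp_bytes(1, 0.00019) / 1e3)    # LAN, 1 GbE:           ~47.5 KB (~48 KBytes)
print(double_bdp_bytes(10, 0.00019) / 1e3)   # LAN, 10 Gb/s:         ~475 KB (~480 KBytes)
print(double_bdp_bytes(1, 0.115) / 1e6)      # WAN Florida-Caltech:  ~28.8 MB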

  11. bbcp Tests
     bbcp was selected as the starting tool for data transfers on the WAN:
     • Supports multiple streams, highly tunable (window size etc.), peer-to-peer type
     • Well supported by Andy Hanushevsky from SLAC
     • Used successfully in BaBar
     • I used it in 2002 for CMS production: massive data transfers from Florida to CERN; the only limits observed at the time were disk writing speed (LAN) and the network (WAN)
     • Starting point Florida → Caltech: < 0.5 MB/s on the WAN, very poor performance

  12. Evolution of Tests Leading to SC|05
     • End points in Florida (uflight1) and Caltech (nw1): AMD Opterons over the UL network
     • Tuning of Linux kernels (2.6.x) and bbcp window sizes: a coordinated, iterative procedure (see the sketch below)
     • Current status (for file sizes ~2 GB):
       • 6-6.5 Gb/s with iperf
       • up to 6 Gb/s memory to memory
       • 2.2 Gb/s ramdisk → remote disk write
     • The speed was the same whether writing to a SCSI disk (supposedly less than 80 MB/s) or to a RAID array, so de facto the data always goes first to the memory cache (the Caltech node has 16 GB RAM)
     • Used successfully with up to 8 bbcp processes in parallel from Florida to the show floor in Seattle; CPU load still OK
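For the kernel-tuning step, one way the 2*BDP rule might translate into Linux 2.6 TCP buffer settings is sketched below. The sysctl keys are the standard Linux ones, but the target rate, RTT, and resulting values are illustrative, not the exact settings used on uflight1 or nw1.

# Sketch: candidate Linux TCP buffer sysctls derived from the 2*BDP rule.
# The keys are standard Linux 2.6 sysctls; the numbers are illustrative only.

def suggested_sysctls(rate_gbps: float, rtt_s: float) -> dict:
    window = int(2 * rate_gbps * 1e9 * rtt_s / 8)      # ~2*BDP in bytes
    return {
        "net.core.rmem_max": window,
        "net.core.wmem_max": window,
        "net.ipv4.tcp_rmem": f"4096 87380 {window}",   # min, default, max receive buffer
        "net.ipv4.tcp_wmem": f"4096 65536 {window}",   # min, default, max send buffer
    }

# Example: a 10 Gb/s path with the 115 ms Florida-Caltech RTT
for key, value in suggested_sysctls(10, 0.115).items():
    print(f"{key} = {value}")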

  13. bbcp Examples, Florida → Caltech

[bourilkov@uflight1 data]$ iperf -i 5 -c 192.84.86.66 -t 60
------------------------------------------------------------
Client connecting to 192.84.86.66, TCP port 5001
TCP window size: 256 MByte (default)
------------------------------------------------------------
[ 3] local 192.84.86.179 port 33221 connected with 192.84.86.66 port 5001
[ 3]  0.0- 5.0 sec  2.73 GBytes  4.68 Gbits/sec
[ 3]  5.0-10.0 sec  3.73 GBytes  6.41 Gbits/sec
[ 3] 10.0-15.0 sec  3.73 GBytes  6.40 Gbits/sec
[ 3] 15.0-20.0 sec  3.73 GBytes  6.40 Gbits/sec

bbcp -s 8 -f -V -P 10 -w 10m big2.root uldemo@192.84.86.66:/dev/null
bbcp: uflight1.ultralight.org kernel using a send window size of 20971584 not 10485792
bbcp: Sink I/O buffers (245760K) > 25% of available free memory (231836K); copy may be slow
bbcp: Creating /dev/null/big2.root
Source cpu=5.654 mem=0K pflt=0 swap=0
File /dev/null/big2.root created; 1826311140 bytes at 432995.1 KB/s
24 buffers used with 0 reorders; peaking at 0.
Target cpu=3.768 mem=0K pflt=0 swap=0
1 file copied at effectively 260594.2 KB/s

bbcp -s 8 -f -V -P 10 -w 10m big2.root uldemo@192.84.86.66:dimitri
bbcp: uflight1.ultralight.org kernel using a send window size of 20971584 not 10485792
bbcp: Creating ./dimitri/big2.root
Source cpu=5.455 mem=0K pflt=0 swap=0
File ./dimitri/big2.root created; 1826311140 bytes at 279678.1 KB/s
24 buffers used with 0 reorders; peaking at 0.
Target cpu=10.065 mem=0K pflt=0 swap=0
1 file copied at effectively 150063.7 KB/s
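To compare the bbcp rates above (reported in KB/s) with the iperf figures in Gb/s, a quick conversion helps; this sketch assumes 1 KB = 1024 bytes, which may differ slightly from bbcp's own convention.

# Convert the bbcp rates quoted above from KB/s to Gb/s (assuming 1 KB = 1024 bytes).

def kb_per_s_to_gbps(rate_kb_s: float) -> float:
    return rate_kb_s * 1024 * 8 / 1e9

print(f"{kb_per_s_to_gbps(432995.1):.2f} Gb/s")   # copy to /dev/null:        ~3.5 Gb/s
print(f"{kb_per_s_to_gbps(279678.1):.2f} Gb/s")   # copy landing on disk:     ~2.3 Gb/s

The ~2.3 Gb/s figure for the copy that lands on disk is roughly consistent with the 2.2 Gb/s ramdisk → remote disk rate quoted on slide 12.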

  14. bbcp Examples, Caltech → Florida

[uldemo@nw1 dimitri]$ iperf -s -w 256m -i 5 -p 5001 -l 8960
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 512 MByte (WARNING: requested 256 MByte)
------------------------------------------------------------
[ 4] local 192.84.86.66 port 5001 connected with 192.84.86.179 port 33221
[ 4]  0.0- 5.0 sec  2.72 GBytes  4.68 Gbits/sec
[ 4]  5.0-10.0 sec  3.73 GBytes  6.41 Gbits/sec
[ 4] 10.0-15.0 sec  3.73 GBytes  6.40 Gbits/sec
[ 4] 15.0-20.0 sec  3.73 GBytes  6.40 Gbits/sec
[ 4] 20.0-25.0 sec  3.73 GBytes  6.40 Gbits/sec

bbcp -s 8 -f -V -P 10 -w 10m big2.root uldemo@192.84.86.179:/dev/null
bbcp: Sink I/O buffers (245760K) > 25% of available free memory (853312K); copy may be slow
bbcp: Source I/O buffers (245760K) > 25% of available free memory (839628K); copy may be slow
bbcp: nw1.caltech.edu kernel using a send window size of 20971584 not 10485792
bbcp: Creating /dev/null/big2.root
Source cpu=5.962 mem=0K pflt=0 swap=0
File /dev/null/big2.root created; 1826311140 bytes at 470086.2 KB/s
24 buffers used with 0 reorders; peaking at 0.
Target cpu=4.053 mem=0K pflt=0 swap=0
1 file copied at effectively 263793.4 KB/s

  15. SuperComputing 05 Bandwidth Challenge: above 100 Gbps for hours; 475 TBytes transported in < 24 h

  16. Outlook
     • The UltraLight network is already very performant
     • SC|05 was a big success
     • The hard problem from the user perspective now is to match it with servers capable of sustained rates for large files > 20 GB (when the memory caches are exhausted); fast disk writes are key (RAID arrays)
     • To fill 10 Gb/s pipes we need several pairs (3-4) of servers (see the sizing sketch below)
     • Next step: disk-to-disk transfers between Florida, Caltech, Michigan, FNAL, BNL, CERN; preparations for SC|06 (next talk)
     • More info: http://ultralight.caltech.edu
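A sketch of the sizing estimate behind the several-pairs figure; the per-pair sustained disk-to-disk rates are assumptions for illustration, not measured values.

# Sketch: server pairs needed to fill a 10 Gb/s pipe for an assumed per-pair rate.
import math

def pairs_needed(pipe_gbps: float, per_pair_gbps: float) -> int:
    return math.ceil(pipe_gbps / per_pair_gbps)

for rate in (2.5, 3.5):                              # assumed per-pair rates in Gb/s
    print(f"{rate} Gb/s per pair -> {pairs_needed(10, rate)} pairs")   # 4 and 3 pairs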
