Agenda
NSF SDCI Project Review, Oct. 29, 2012
• 9:00-9:20: Overview, MV, UVA
• 9:20-9:50: Details, Zhenyang Liu, UVA
• 9:50-10:00: GUI, Tyler Clinch, UVA
• 10:00-10:10: Traffic isolation, Zhenzhen Yan, UVA
• 10:10-10:30: Bob Russell, UNH
• 10:30-10:40: Break
• 10:40-11:00: Tim Carlin, UNH
• 11:00-11:20: John Dennis, NCAR
• 11:20-12:00: Discussion
• 12:00-1:30: Lunch/Break
• 4:00-4:30: Diversity activities, Carolyn Vallas, CDE, UVA
Supported by NSF grants: OCI-1127340, OCI-1127228, OCI-1127341
Questions on this slide set: Malathi Veeraraghavan, mv5g@virginia.edu
Year 1 Accomplishments
• Wide-area data movement traffic characterized (UVA and NCAR)
  • GridFTP logs obtained and analyzed
  • Logins obtained on NERSC and SLAC data-transfer nodes, and experiments conducted for throughput-variance studies
  • Published an SC paper; developing a GUI for broader impact
• Experiments on DOE ANI WAN 100GE and LIMAN testbeds
  • Developed tools for controlled data collection for variance studies
  • TCP behavior on 100 Gbps paths (impact of bit errors)
  • Compared RoCE over an L2 circuit vs. TCP over an IP-routed path (UNH and UVA)
• Engineering solutions:
  • GridFTP integrated with RoCE and IDC client (UNH and UVA)
• Datacenter networking: background acquired and problems identified (all three institutions)
• Established a wide network of collaborators
Acknowledgment
• NSF OCI & co-PIs: Kevin Thompson, Bob Russell, and John Dennis
• ESnet: Chris Tracy, Brian Tierney, Joe Burrescia, Jon Dugan, Andy Lake, Tareq Saif, and Eric Pouyoul
• ANL: Ian Foster, Raj Kettimuthu, and Linda Winkler
• NERSC: Brent Draney, Jason Hick, Jason Lee
• SLAC: Yee-Ting Li and Wei Yang
• Internet2: Jason Zurawski and Eric Boyd
• V. Tech: Jeff Crowder, John Nichols, John Lawson
• UCAR: Pete Siemsen, Steve Emmerson, Marla Meehl
• Boston U: Chuck Von Lichtenberg & David Starobinski
• GridFTP data: BNL (Scott Bradley and John Bigrow), NICS (Victor Hazelwood), ORNL (Galen Shipman and Scott Atchley)
• RoCE: Ezra Kissel (IU), D. K. Panda (OSU)
• LIGO FDT: Ashish Mahabal, Caltech
Wide-area data movement
• Questions we asked ("science" phase):
  • Are science data transfer rates high, or still low?
  • Is there significant variance in throughput?
  • In spite of increasing rates (which mean shorter transfer durations), are transfer sizes large enough to justify VC setup overhead?
• Method used to answer:
  • Obtained GridFTP logs from four sources
  • Wrote R statistical programs to analyze logs (supporting: shell, awk, JavaScript, SQL); a sketch of the throughput extraction follows below
"Scientists discover that which is; engineers create that which never was."
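A minimal sketch of the per-transfer throughput extraction described above, assuming the common GridFTP transfer-log key=value fields (START, DATE, NBYTES, STREAMS, CODE); the field names and log path here are illustrative assumptions, and the project's actual analysis programs were written in R, not Python.

```python
# Hypothetical sketch: extract per-transfer throughput from a GridFTP transfer log.
# Field names and the log path are assumptions for illustration, not the project's code.
import re
from datetime import datetime

FIELD = re.compile(r'(\w+)=(\S+)')
TS = "%Y%m%d%H%M%S.%f"   # GridFTP-style timestamp, e.g. 20121029063455.120000

def throughputs(log_path):
    """Yield (gbps, nbytes, streams) for each successfully completed transfer."""
    for line in open(log_path):
        kv = dict(FIELD.findall(line))
        if kv.get("CODE") != "226":          # keep only successful transfers
            continue
        start = datetime.strptime(kv["START"], TS)
        end = datetime.strptime(kv["DATE"], TS)
        secs = (end - start).total_seconds()
        if secs > 0:
            yield int(kv["NBYTES"]) * 8 / secs / 1e9, int(kv["NBYTES"]), int(kv.get("STREAMS", 1))

if __name__ == "__main__":
    rates = [r for r, _, _ in throughputs("gridftp-transfer.log")]   # assumed file name
    print(f"{len(rates)} transfers, max {max(rates):.2f} Gbps")
```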
Wide-area data movement
• Answers:
  • Transfer rates:
    • Across the 4 analyzed paths, the maximum observed rate was 4.3 Gbps, and on each of the 4 paths the maximum rate reached at least 2.5 Gbps
    • This is a significant fraction of link capacity (10 Gbps)
  • Throughput variance:
    • Significant: coefficient of variation of 30%-75% across the 4 paths (computation sketched below)
    • Causes?
      • Transfer parameters (e.g., parallel streams, closing and opening TCP connections between files, striping)
      • Competition for server resources by concurrent transfers
      • Not the network (link utilization is low, and packet losses are rare)
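The variance figures quoted above are coefficients of variation (standard deviation divided by mean) of per-transfer throughput on each path. A minimal sketch of that computation, with made-up sample values purely for illustration (the project's analysis used R):

```python
# Coefficient of variation (CoV) of per-transfer throughput, per path.
# The per-path sample values below are invented placeholders, not measured data.
import statistics

def cov_percent(samples):
    """CoV = stdev / mean, reported as a percentage."""
    return 100 * statistics.stdev(samples) / statistics.mean(samples)

per_path = {
    "siteA->siteB": [1.2, 0.8, 2.1, 0.4, 1.6],   # Gbps, example values only
    "siteC->siteD": [3.9, 4.3, 2.5, 3.1],
}
for path, samples in per_path.items():
    print(f"{path}: CoV = {cov_percent(samples):.0f}%")
```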
Answers
• In spite of increasing rates (which mean shorter transfer durations), are transfer sizes large enough to justify VC setup overhead?
  • Yes: a significant fraction of transfers occur in sessions whose durations are longer than 10 times the VC setup delay
  • A hypothetical third-quartile transfer rate was used to compute duration, rather than the actual durations, in making this determination (sketch below)
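A minimal sketch of the session-level test described above: compute a hypothetical duration from the session's bytes and the path's third-quartile (Q3) rate, and flag sessions whose duration exceeds 10x the VC setup delay. The factor 10 comes from the slide; the setup delay, Q3 rate, and session size in the example are assumed placeholders, not measured values.

```python
# Does a session justify dynamic virtual-circuit (VC) setup?
# Hypothetical duration = bytes sent at the path's third-quartile rate;
# the session "qualifies" if that duration exceeds factor * VC setup delay.
def justifies_vc(session_bytes, q3_rate_gbps, vc_setup_s, factor=10):
    duration_s = session_bytes * 8 / (q3_rate_gbps * 1e9)
    return duration_s > factor * vc_setup_s

# Example (assumed numbers): a 500 GB session on a path with a 2 Gbps Q3 rate
# and a 60 s setup delay -> 2000 s hypothetical duration vs. 600 s threshold.
print(justifies_vc(500e9, 2.0, 60))   # True
```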
Experiments on DOE ANI WAN 100GE and LIMAN testbeds
• Leveraged our ESnet DOE project relationship to gain login access to the ANI 100 GbE WAN and Long Island MAN (LIMAN) testbeds
• Experiments run:
  • Developed tools on the LIMAN testbed for controlled data collection for variance studies
    • Impact of competition for CPU and of packet losses
  • Deployed these tools on production NERSC and SLAC GridFTP servers and collected data for controlled transfers; analysis ongoing
  • Reserved the whole 100 GbE testbed (NERSC and ANL) for a weekend and ran continuous GridFTP/TCP transfers
    • No losses: TCP throughput stayed close to 96 Gbps
    • Why? Bit errors are corrected by FEC (thanks to Chris Tracy); a model sketch follows below
  • RoCE over an L2 circuit vs. TCP over an IP-routed path
    • Results in UNH presentations
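One way to see why uncorrected bit errors would matter at 100 Gbps is the standard Mathis et al. TCP model, throughput ≈ MSS / (RTT · sqrt(p)). The sketch below uses assumed round-number MSS and RTT values, not testbed measurements; the point is only that even very small loss rates cap a single TCP stream far below line rate unless bit errors are removed (e.g., by FEC).

```python
# Simplified Mathis-model estimate of single-stream TCP throughput vs. loss rate:
# rate ~= MSS / (RTT * sqrt(p)), constant factor of order one omitted.
# MSS (9000-byte jumbo frames) and RTT (50 ms) are assumed values, not measurements.
import math

def mathis_gbps(mss_bytes, rtt_s, loss_prob):
    return (mss_bytes * 8) / (rtt_s * math.sqrt(loss_prob)) / 1e9

for p in (1e-4, 1e-6, 1e-8):
    print(f"loss {p:.0e}: ~{mathis_gbps(9000, 0.050, p):.2f} Gbps per stream")
```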
ANI 100G Testbed (diagram credit: Brian Tierney, DOE PI meeting, March 1-2, 2012)
Engineering solutions
• Moving forward with GridFTP + RoCE + IDC client integration
  • Testing on DYNES and Internet2 ION planned
  • UVA proposal for DYNES approved: awaiting delivery
• Remote logins obtained:
  • FRGP (regional, so no FDT server, but an IDC controller)
  • Boston University (CNS project collaboration)
  • Requests made: Julio Ibarra, Martin Swany
  • Plan is to give each other logins for wide-area tests
• Leveraging collaboration with ESnet in DOE project
  • Testing IDC Java client with ANI testbed OSCARS IDC [needs further discussion]; a rough scripting sketch follows below
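A rough sketch of how IDC-client testing might be scripted for repeated circuit-setup runs. The jar name, subcommand, flags, and endpoint names below are hypothetical placeholders, not the actual OSCARS IDC client interface; the real client's arguments would need to be substituted.

```python
# Hypothetical wrapper around an OSCARS IDC command-line client for circuit-setup tests.
# "idc-client.jar", "createReservation", the flags, and the endpoint names are
# placeholders for illustration only; they are not the real client's interface.
import subprocess

def request_circuit(src, dst, bandwidth_mbps, duration_min):
    cmd = [
        "java", "-jar", "idc-client.jar",     # placeholder jar name
        "createReservation",                  # placeholder subcommand
        "--source", src, "--destination", dst,
        "--bandwidth", str(bandwidth_mbps),
        "--duration", str(duration_min),
    ]
    return subprocess.run(cmd, capture_output=True, text=True)

result = request_circuit("uva-dtn", "anl-dtn", 1000, 30)   # hypothetical endpoints
print(result.returncode, result.stdout)
```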
Intra-datacenter networking/apps
• John Dennis has identified interesting topology/routing problems
  • With fewer than 1000 cores, the application is CPU-limited; with more than 1000 cores, it is network-limited (see the toy model below)
• UNH and NCAR will run MPI apps and collect data
• UVA will analyze and design datacenter networking solutions
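A toy scaling model illustrating the CPU-limited vs. network-limited crossover described above: per-core compute time shrinks with core count while communication cost grows. All constants are invented solely to place the crossover near 1000 cores; they are not NCAR measurements of the actual application.

```python
# Toy strong-scaling model: per-step time = compute (work / p) + communication (beta * p).
# The constants are assumptions chosen so the crossover lands near ~1000 cores;
# they do not come from measurements of the NCAR MPI application.
def step_time(p, work=1000.0, beta=1e-3):
    compute = work / p    # compute work divides across cores
    comm = beta * p       # communication/contention cost grows with core count
    return compute, comm

for p in (256, 1024, 4096):
    c, m = step_time(p)
    regime = "CPU-limited" if c > m else "network-limited"
    print(f"{p:5d} cores: compute {c:.3f}, comm {m:.3f} (arbitrary units) -> {regime}")
```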