
Presentation Transcript


1. Agenda
   • NSF SDCI Project Review, Oct. 29, 2012
   • 9:00-9:20: Overview, MV, UVA
   • 9:20-9:50: Details, Zhenyang Liu, UVA
   • 9:50-10:00: GUI, Tyler Clinch, UVA
   • 10:00-10:10: Traffic isolation, Zhenzhen Yan, UVA
   • 10:10-10:30: Bob Russell, UNH
   • 10:30-10:40: Break
   • 10:40-11:00: Tim Carlin, UNH
   • 11:00-11:20: John Dennis, NCAR
   • 11:20-12:00: Discussion
   • 12:00-1:30: Lunch/Break
   • 4:00-4:30: Diversity activities, Carolyn Vallas, CDE, UVA
   Supported by NSF grants: OCI-1127340, OCI-1127228, OCI-1127341
   Questions on this slide set: Malathi Veeraraghavan, mv5g@virginia.edu

2. Year 1 Accomplishments
   • Wide-area data movement traffic characterized (UVA and NCAR)
     • GridFTP logs obtained and analyzed
     • Logins obtained on NERSC and SLAC data-transfer nodes, and experiments conducted for throughput variance studies
     • Published an SC paper; developing GUI for broader impact
   • Experiments on DOE ANI WAN 100GE and LIMAN testbeds
     • Developed tools for controlled data collection for variance studies
     • TCP behavior on 100 Gbps paths (impact of bit errors)
     • Compared RoCE over L2 circuit vs TCP over IP-routed path (UNH and UVA)
   • Engineering solutions:
     • GridFTP integrated with RoCE and IDC client (UNH and UVA)
   • Datacenter networking: background acquired and problem identified (all 3 institutions)
   • Established a wide network of collaborators

3. Acknowledgment
   • NSF OCI & co-PIs: Kevin Thompson, Bob Russell and John Dennis
   • ESnet: Chris Tracy, Brian Tierney, Joe Burrescia, Jon Dugan, Andy Lake, Tareq Saif, and Eric Pouyoul
   • ANL: Ian Foster, Raj Kettimuthu, and Linda Winkler
   • NERSC: Brent Draney, Jason Hick, Jason Lee
   • SLAC: Yee-Ting Li and Wei Yang
   • Internet2: Jason Zurawski and Eric Boyd
   • V. Tech: Jeff Crowder, John Nichols, John Lawson
   • UCAR: Pete Siemsen, Steve Emmerson, Marla Meehl
   • Boston U: Chuck Von Lichtenberg & David Starobinski
   • GridFTP data: BNL (Scott Bradley and John Bigrow), NICS (Victor Hazelwood), ORNL (Galen Shipman and Scott Atchley)
   • RoCE: Ezra Kissel, IU, D. K. Panda, OSU
   • LIGO FDT: Ashish Mahabal, Caltech

4. Wide-area data movement
   • Questions we asked (“science” phase):
     • Are science data transfer rates high or still low?
     • Is there significant variance in throughput?
     • In spite of increasing rates (which means shorter transfer durations), are transfer sizes large enough to justify VC setup overhead?
   • Method used to answer:
     • Obtained GridFTP logs from four sources
     • Wrote R statistical programs to analyze the logs (supporting: shell, awk, JavaScript, SQL); a simplified sketch of this kind of analysis follows below
   “Scientists discover that which is; engineers create that which never was.”
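
The log analysis itself was done in R; as a rough, hypothetical illustration of the kind of per-transfer computation involved, the Python sketch below turns per-transfer records (assumed to have already been extracted from GridFTP logs into a CSV with src_dst, nbytes, and duration_s columns, all names invented here) into throughput samples grouped by path:

```python
# Hypothetical sketch (Python, not the project's R code): group per-transfer
# throughput samples by source-destination path.  The CSV file name and the
# src_dst / nbytes / duration_s column names are assumptions.
import csv

def load_rates(csv_path):
    """Return {src_dst: [rate_gbps, ...]} computed from bytes moved and duration."""
    rates = {}
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            nbytes = float(row["nbytes"])
            duration = float(row["duration_s"])
            if duration > 0:
                rates.setdefault(row["src_dst"], []).append(8.0 * nbytes / duration / 1e9)
    return rates

if __name__ == "__main__":
    for path, samples in load_rates("gridftp_transfers.csv").items():  # hypothetical input file
        print(f"{path}: {len(samples)} transfers, max rate {max(samples):.2f} Gbps")
```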

5. Wide-area data movement
   • Answers:
     • Transfer rates:
       • On the 4 analyzed paths, the highest observed rate was 4.3 Gbps, and on all 4 paths the maximum rate was at least 2.5 Gbps
       • This is a significant fraction of link capacity (10 Gbps)
     • Throughput variance:
       • Significant: coefficient of variation of 30%-75% across the 4 paths (see the sketch below)
       • Causes?
         • Transfer parameters (e.g., parallel streams, closing and opening TCP connections between files, striping)
         • Competition for server resources by concurrent transfers
         • Not the network (link utilization is low, and packet losses are rare)
   “Scientists discover that which is; engineers create that which never was.”
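
The coefficient of variation cited above is the ratio of the standard deviation to the mean of the per-transfer throughput samples on a path. A minimal sketch with made-up sample values (Python, rather than the R actually used by the project):

```python
# Minimal sketch: coefficient of variation (sample stdev / mean) of
# per-transfer throughput on one path.  The sample rates are made up.
import statistics

def coefficient_of_variation(rates_gbps):
    return statistics.stdev(rates_gbps) / statistics.mean(rates_gbps)

if __name__ == "__main__":
    samples = [0.8, 2.1, 1.4, 3.9, 0.6, 2.5]   # hypothetical per-transfer rates (Gbps)
    print(f"CoV = {coefficient_of_variation(samples):.0%}")   # about 65% for these values
```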

6. Answers
   • In spite of increasing rates (which means shorter transfer durations), are transfer sizes large enough to justify VC setup overhead?
     • Yes: a significant fraction of transfers occur in sessions whose durations are longer than 10 times the VC setup delay
     • A hypothetical third-quartile transfer rate was used to compute durations (rather than actual measured durations) when making the above determination; see the sketch below
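
A minimal sketch of that determination, with made-up numbers: compute a hypothetical duration for each session from its size and an assumed third-quartile rate, then count the sessions whose duration exceeds 10 times an assumed VC setup delay.

```python
# Sketch of the VC-justification check: durations are computed from a
# hypothetical third-quartile rate, not measured; all numbers below are made up.
def hypothetical_duration_s(session_bytes, q3_rate_gbps):
    return 8.0 * session_bytes / (q3_rate_gbps * 1e9)

def fraction_justifying_vc(session_sizes_bytes, q3_rate_gbps, vc_setup_s, factor=10.0):
    long_enough = [s for s in session_sizes_bytes
                   if hypothetical_duration_s(s, q3_rate_gbps) > factor * vc_setup_s]
    return len(long_enough) / len(session_sizes_bytes)

if __name__ == "__main__":
    sizes = [5e9, 120e9, 2e12, 40e9, 800e9]   # example session sizes in bytes (made up)
    # assumed Q3 rate of 2.5 Gbps and assumed VC setup delay of 60 s
    print(fraction_justifying_vc(sizes, q3_rate_gbps=2.5, vc_setup_s=60.0))
```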

7. Experiments on DOE ANI WAN 100GE and LIMAN testbeds
   • Leveraged our ESnet DOE project relationship to gain login access to the ANI 100 GbE WAN and Long Island MAN (LIMAN) testbeds
   • Experiments run:
     • Developed tools on the LIMAN testbed for controlled data collection for variance studies (see the sketch below)
       • Impact of competition for CPU and packet losses
     • Deployed these tools on production NERSC and SLAC GridFTP servers and collected data for controlled transfers; analysis ongoing
     • Reserved the whole 100 GE testbed (NERSC and ANL) for a weekend and ran continuous GridFTP/TCP transfers
       • No losses! TCP throughput stayed close to 96 Gbps
       • Why? Bit errors corrected by FEC (thanks to Chris Tracy)
     • RoCE over L2 circuit vs TCP over IP-routed path
       • Results in UNH presentations
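
A hypothetical sketch of what such a controlled-collection tool might look like: repeat the same transfer while varying the number of parallel TCP streams and log the achieved throughput. The endpoints, file size, and the use of globus-url-copy's -p (parallelism) option are illustrative assumptions, not the project's actual tooling.

```python
# Illustrative controlled-transfer driver for variance studies: repeat the
# same GridFTP transfer with varying parallel-stream counts and record the
# wall-clock throughput of each run.  Endpoints and file size are made up.
import subprocess
import time

SRC = "gsiftp://dtn1.example.org/data/testfile"       # hypothetical source
DST = "gsiftp://dtn2.example.org/scratch/testfile"    # hypothetical destination
FILE_BYTES = 10 * 1024**3                             # assumed 10 GiB test file

def timed_transfer(parallel_streams):
    start = time.time()
    subprocess.run(["globus-url-copy", "-p", str(parallel_streams), SRC, DST],
                   check=True)
    elapsed = time.time() - start
    return 8.0 * FILE_BYTES / elapsed / 1e9           # achieved rate in Gbps

if __name__ == "__main__":
    with open("controlled_runs.csv", "w") as log:
        log.write("streams,run,gbps\n")
        for streams in (1, 2, 4, 8):
            for run in range(5):        # repeated runs expose throughput variance
                log.write(f"{streams},{run},{timed_transfer(streams):.3f}\n")
```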

8. ANI 100G Testbed (source: Brian Tierney, DOE PI meeting, March 1-2, 2012)

9. Engineering solutions
   • Moving forward with GridFTP + RoCE + IDC client integration (see the workflow sketch below)
     • Testing on DYNES / Internet2 ION planned
     • UVA proposal for DYNES approved; awaiting delivery
   • Remote logins obtained:
     • FRGP (regional, so no FDT server, but IDC controller)
     • Boston University (CNS project collaboration)
   • Requests made: Julio Ibarra, Martin Swany
     • Plan is to give each other logins for wide-area tests
   • Leveraging collaboration with ESnet in DOE project
     • Testing IDC Java client with ANI testbed OSCARS IDC [needs further discussion]
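
The integration target is a circuit-then-transfer workflow: reserve a layer-2 virtual circuit through an IDC such as OSCARS, run the GridFTP transfer over it, and release the circuit. The sketch below outlines that flow; the idc_client command, its arguments, and the endpoint names are hypothetical placeholders, not the actual OSCARS/ION client interface.

```python
# Hypothetical circuit-then-transfer workflow sketch.  "idc_client" and its
# arguments are invented placeholders standing in for an IDC client.
import subprocess

def reserve_circuit(src_endpoint, dst_endpoint, bandwidth_mbps, duration_s):
    """Hypothetical wrapper around an IDC client; returns a reservation id."""
    out = subprocess.run(
        ["idc_client", "reserve", src_endpoint, dst_endpoint,
         str(bandwidth_mbps), str(duration_s)],
        check=True, capture_output=True, text=True)
    return out.stdout.strip()

def release_circuit(reservation_id):
    subprocess.run(["idc_client", "cancel", reservation_id], check=True)

def transfer_over_circuit(src_url, dst_url):
    subprocess.run(["globus-url-copy", src_url, dst_url], check=True)

if __name__ == "__main__":
    rid = reserve_circuit("uva-dtn", "unh-dtn", bandwidth_mbps=1000, duration_s=3600)
    try:
        transfer_over_circuit("gsiftp://uva-dtn/data/file",
                              "gsiftp://unh-dtn/scratch/file")
    finally:
        release_circuit(rid)   # always tear down the reserved circuit
```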

10. Intra-datacenter networking/apps
   • John Dennis has identified interesting topology/routing problems
     • With fewer than 1000 cores the application is CPU-limited, but with more than 1000 cores it becomes network-limited (a toy model illustrating this crossover is sketched below)
   • UNH and NCAR will run MPI apps and collect data
   • UVA will analyze and design datacenter networking solutions
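
A toy model (made-up constants, not NCAR measurements) of why such a crossover can occur: per-core compute time shrinks roughly as 1/p under strong scaling, while communication cost grows with the core count, so beyond some p the network term dominates.

```python
# Toy scaling model (all constants made up): compute time per step scales
# as 1/p, while communication cost is assumed to grow linearly with the
# core count p (e.g., more messages / larger collectives), so the network
# term eventually dominates.
def compute_time_s(total_work_s, cores):
    return total_work_s / cores            # ideal strong scaling

def comm_time_s(per_core_comm_s, cores):
    return per_core_comm_s * cores         # assumed linear growth with cores

if __name__ == "__main__":
    TOTAL_WORK_S = 3.0e5     # made-up total serial work
    PER_CORE_COMM_S = 0.3    # made-up communication cost coefficient
    for p in (256, 512, 1024, 2048, 4096):
        c = compute_time_s(TOTAL_WORK_S, p)
        n = comm_time_s(PER_CORE_COMM_S, p)
        regime = "CPU-limited" if c > n else "network-limited"
        print(f"{p:5d} cores: compute {c:7.1f} s, comm {n:7.1f} s -> {regime}")
```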
