ORNL Net100 status July 31, 2002
ORNL Net100
• Focus areas (first year)
  • TCP optimizations (WAD, AIMD, VMSS, ns, atou)
  • Network tool evaluation (iperf, webd, traced, java/web100)
  • Bulk transfer optimization, GridFTP (LBL/NERSC/ORNL)
• Today’s agenda
  • Activities since Denver meeting
  • Current activities / WAD status
  • Future work/needs
ORNL activities since Denver meeting
• Web100 tuning (manual) of HSI transfer (NERSC/ORNL), SC2001
• NISTNet testbed
• SCTP study/report
• Autotuning WAD (version 0)
• TCP tuning (AIMD, VMSS) with ns and atou
• Web100 event notification
• Web100 tools (ttcp100/iperf100, traced, tracer.py, webd, Java bandwidth tester)
• Evaluation of Linux 2.4 (tuning, caching, sendstall, delayed ACK, reorder)
• DRS testing (ESnet, NISTNet), integration with Web100 (2.4.16)
• TCP Vegas testing/porting (ns, Linux, atou)
• WAD tuning (AIMD, slow-start, buffers, NTAF data) (SC02 paper)
• GridFTP and parallel stream tuning
• Analysis of parallel TCP and dynamic right-sizing (ICN02 paper)
• Implementation of linear least-squares and nearest-neighbor estimators with linear fuser
• Outreach: Net100 talks, web pages, atou tech report, SC02 Net100 paper, ICN02 paper, interactions with Claffy, Feng, Cottrell, Floyd, SCNM, Internet2 e2epi
Web100 tools
• Post-transfer statistics
  • Java bandwidth tester (53% have packet loss)
  • ttcp100/iperf100
• Web100 daemon
  • avoid modifying applications
  • log designated paths/ports/variables
• Tracer daemon
  • collect Web100 variables at 0.1 second intervals
  • config file specifies
    • source/port, dest/port
    • Web100 variables (current/delta)
  • log to disk with timestamp and CID
  • plot/analyze flows/aggregates
  • C and Python (LBL-based)

    # traced config file
    #local lport remote rport
    0.0.0.0 0 124.55.182.7 0
    0.0.0.0 0 134.67.45.9 0
    #v=value d=delta
    d PktsOut
    d PktsRetrans
    v CurrentCwnd
    v SampledRTT
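To make the traced config format concrete, here is a minimal parsing sketch. It assumes one record per line, `#` starting a comment, four-field lines naming a path (local address, local port, remote address, remote port), and two-field lines naming a Web100 variable with a `v` (value) or `d` (delta) mode. The names `TracedConfig` and `parse_traced_config` are hypothetical and are not part of the actual tracer daemon.

```python
from dataclasses import dataclass, field


@dataclass
class TracedConfig:
    # Each path is (local_addr, local_port, remote_addr, remote_port).
    paths: list = field(default_factory=list)
    # Each variable is (mode, name) where mode is "v" (value) or "d" (delta).
    variables: list = field(default_factory=list)


def parse_traced_config(text):
    """Parse a traced-style config; the layout is assumed from the slide."""
    cfg = TracedConfig()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        parts = line.split()
        if len(parts) == 4:
            local, lport, remote, rport = parts
            cfg.paths.append((local, int(lport), remote, int(rport)))
        elif len(parts) == 2 and parts[0] in ("v", "d"):
            cfg.variables.append((parts[0], parts[1]))
        else:
            raise ValueError(f"unrecognized config line: {raw!r}")
    return cfg


SAMPLE_TRACED_CONFIG = """\
# traced config file
#local lport remote rport
0.0.0.0 0 124.55.182.7 0
0.0.0.0 0 134.67.45.9 0
#v=value d=delta
d PktsOut
d PktsRetrans
v CurrentCwnd
v SampledRTT
"""

if __name__ == "__main__":
    parsed = parse_traced_config(SAMPLE_TRACED_CONFIG)
    print(parsed.paths)
    print(parsed.variables)
```

An address of 0.0.0.0 and a port of 0 appear to act as wildcards in the slide's example; the sketch stores them as-is and leaves matching to the caller.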
WAD
• Version 1
  • event-based (socket open/close)
  • config file with “tuning info”
    • buffer sizes, AIMD, slow-start
  • periodic poll of NTAF (flaky)
• static tuning -- value in config file
• dynamic tuning
  • use buffer sizes from NTAF
  • divide buffer size among concurrent flows
  • tune AIMD with Floyd table based on buffer size, or periodically during flow
• Python WAD (based on LBL work)
  • polling

WAD config file

    [bob]
    src_addr: 0.0.0.0
    src_port: 0
    dst_addr: 10.5.128.74
    dst_port: 0
    mode: 1
    sndbuf: 2000000
    rcvbuf: 100000
    wadai: 6
    wadmd: 0.3
    maxssth: 100
    divide: 1
    reorder: 9
    delack: 0
    floyd: 1
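The WAD config file has an INI-like shape, so a short sketch can show how such an entry might be read and how a buffer could be split among concurrent flows. This is an illustration only: it assumes one `key: value` pair per line, and it assumes that `divide: 1` means "split `sndbuf` evenly across concurrent flows to the same destination". The helper names `load_wad_config` and `per_flow_sndbuf` are hypothetical, not the WAD's own code.

```python
import configparser

SAMPLE_WAD_CONFIG = """\
[bob]
src_addr: 0.0.0.0
src_port: 0
dst_addr: 10.5.128.74
dst_port: 0
mode: 1
sndbuf: 2000000
rcvbuf: 100000
wadai: 6
wadmd: 0.3
maxssth: 100
divide: 1
reorder: 9
delack: 0
floyd: 1
"""


def load_wad_config(text):
    """Return {section_name: {key: value_string}} for a WAD-style config."""
    parser = configparser.ConfigParser()  # accepts "key: value" by default
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


def per_flow_sndbuf(entry, concurrent_flows):
    """Send buffer per flow; even division is an assumed reading of 'divide'."""
    sndbuf = int(entry["sndbuf"])
    if int(entry.get("divide", "0")) and concurrent_flows > 1:
        return sndbuf // concurrent_flows
    return sndbuf


if __name__ == "__main__":
    config = load_wad_config(SAMPLE_WAD_CONFIG)
    bob = config["bob"]
    print("AI:", float(bob["wadai"]), "MD:", float(bob["wadmd"]))
    for flows in (1, 2, 4):
        print(flows, "flows ->", per_flow_sndbuf(bob, flows), "bytes each")
```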
WAD tuning results (your mileage may vary …)
• Classic buffer tuning: ORNL to PSC, OC12, 80 ms RTT
  • network-challenged app. gets 10 Mb/s
  • same app. with WAD/NTAF-tuned buffer gets 143 Mb/s
• Is there a buffer size where you don’t get loss? … NOT
• Virtual MSS
  • tune TCP’s additive increase (WAD_AI)
  • add k segments per RTT during recovery
  • k=6 is like a GigE jumbo frame
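The arithmetic behind classic buffer tuning is the bandwidth-delay product: a TCP flow can have at most one window of data in flight per round trip, so the buffer caps throughput at window/RTT. The sketch below works this out for the OC12, 80 ms path. The 622.08 Mb/s line rate for OC12 and the 64 KB example window are assumptions added for illustration and are not stated on the slide.

```python
def bdp_bytes(bandwidth_bps, rtt_s):
    """Bandwidth-delay product: bytes in flight needed to fill the path."""
    return bandwidth_bps * rtt_s / 8.0


def window_limited_throughput_bps(window_bytes, rtt_s):
    """Upper bound on throughput when at most one window is sent per RTT."""
    return window_bytes * 8.0 / rtt_s


if __name__ == "__main__":
    oc12_bps = 622.08e6  # assumed nominal OC12 line rate
    rtt_s = 0.080        # 80 ms RTT from the slide

    print("BDP (MB):", bdp_bytes(oc12_bps, rtt_s) / 1e6)
    # A 64 KB window is a common default and is used here only as an example.
    print("64 KB window cap (Mb/s):",
          window_limited_throughput_bps(64 * 1024, rtt_s) / 1e6)
    # Buffer that would be needed to sustain the 143 Mb/s reported on the slide.
    print("Buffer for 143 Mb/s (MB):", bdp_bytes(143e6, rtt_s) / 1e6)
```

Under these assumptions the path needs a buffer of roughly 6 MB to run at line rate, while a small default window limits the flow to a few Mb/s regardless of link speed.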
WAD tuning
• Modified slow-start and VMSS: ORNL to NERSC, OC12, 80 ms RTT
  • often losses in slow start
  • WAD tunes Floyd slow-start (WAD_MaxThresh) and AI (6)
  • Floyd slow-start: little improvement under heavy congestion…
• WAD-tuned AIMD and slow start: ORNL to CERN, OC?, 150 ms RTT
  • parallel streams behave like AIMD (1/(2k), k)
  • WAD tunes a single stream to (0.125, 4)
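The AIMD (1/(2k), k) notation can be read as the single-stream parameters that mimic k standard parallel streams: the aggregate gains k segments per RTT, and a single loss halves only one of the k streams, cutting the aggregate by 1/(2k). A minimal sketch of that mapping follows; the function names are hypothetical, and the tuple order (multiplicative decrease, additive increase) follows the slide's (0.125, 4) example.

```python
def parallel_equivalent_aimd(k):
    """Return (md, ai) for one stream that mimics k standard TCP streams.

    md is the fraction of cwnd removed on loss, ai is segments added per RTT.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return 1.0 / (2 * k), float(k)


def on_loss(cwnd, md):
    """Multiplicative decrease: remove the fraction md of the window."""
    return cwnd * (1.0 - md)


def on_rtt(cwnd, ai):
    """Additive increase: add ai segments once per round trip."""
    return cwnd + ai


if __name__ == "__main__":
    md, ai = parallel_equivalent_aimd(4)
    print("k=4 ->", (md, ai))  # matches the (0.125, 4) setting on the slide
    cwnd = on_loss(1000.0, md)
    for _ in range(10):
        cwnd = on_rtt(cwnd, ai)
    print("cwnd after one loss and 10 RTTs:", cwnd)
```

With k=1 the mapping reduces to standard TCP, (0.5, 1).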
WAD tuning: Floyd AIMD
• Floyd AIMD
  • adjust AIMD as a function of cwnd (loss assumption)
  • bigger cwnd: bigger increment, smaller reduction
  • tested with ns and atou (continuous)
• WAD implementation
  • pre-tune based on target buffer size (aggressive)
  • continuous tuning (0.1 second)
  • discrete rather than continuous
  • add to Linux 2.4 (soon)
• How to select AIMD?
  • Jumbo, parallel equivalent, Floyd, others?
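As an illustration of "AIMD as a function of cwnd", the sketch below uses the response function from Floyd's HighSpeed TCP proposal as later published in RFC 3649. That this is the exact table the WAD used is an assumption; the constants (Low_Window 38, High_Window 83000, High_Decrease 0.1, and p(w) = 0.078 / w^1.2) are the RFC's defaults, and the function names are hypothetical.

```python
import math

LOW_WINDOW = 38       # below this, behave like standard TCP
HIGH_WINDOW = 83000   # window at which the decrease factor reaches 0.1
HIGH_DECREASE = 0.1


def hstcp_decrease(w):
    """Decrease factor b(w): 0.5 at small windows, shrinking as w grows."""
    if w <= LOW_WINDOW:
        return 0.5
    frac = (math.log(w) - math.log(LOW_WINDOW)) / (
        math.log(HIGH_WINDOW) - math.log(LOW_WINDOW))
    return 0.5 + (HIGH_DECREASE - 0.5) * frac


def hstcp_loss_rate(w):
    """Loss rate p(w) assumed by the response function at window w."""
    return 0.078 / w ** 1.2


def hstcp_increase(w):
    """Increase a(w) in segments per RTT: 1 at small windows, larger later."""
    if w <= LOW_WINDOW:
        return 1.0
    b = hstcp_decrease(w)
    return w * w * hstcp_loss_rate(w) * 2.0 * b / (2.0 - b)


def aimd_table(windows):
    """Discrete (cwnd, ai, md) table, in the spirit of a pre-computed lookup."""
    return [(w, hstcp_increase(w), hstcp_decrease(w)) for w in windows]


if __name__ == "__main__":
    for w, ai, md in aimd_table([38, 1000, 10000, 83000]):
        print(f"cwnd={w:6d}  ai={ai:6.2f}  md={md:.3f}")
```

A pre-tuned WAD entry would correspond to picking one row of such a table from the target buffer size, while periodic tuning would re-read the row as cwnd changes.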
GridFTP tuning
• Can a tuned single stream compete with parallel streams?
• Mostly not with “equivalence” tuning, but sometimes…
• Testing on the real net is problematic.
• WAD can divide buffer among concurrent flows; tests inconclusive so far…
• Is there a “congestion metric”? Per unit of time?

    Flow      Mb/s  congestion  re-xmits
    untuned     28           4        30
    tuned       74           5       295
    parallel    52          30       401

    untuned     25           7        25
    tuned       67           2       420
    parallel    88          17       440

• Buffers: 64K I/O, 4 MB TCP (untuned 64K TCP: 8 Mb/s, 200 s)
• Data/plots from Web100 tracer
Net100 TCP tuning
• Reorder threshold
  • seeing more out-of-order packets
  • WAD tunes a bigger reorder threshold
  • Linux 2.4 does a good job already (caches reorder)
  • LBL to ORNL (using our TCP-over-UDP): dup3 case had 289 retransmits, but all were unneeded!
• Delayed ACKs
  • WAD could turn off delayed ACKs -- 2x improvement in recovery rate and slow start
  • Linux 2.4 already turns off delayed ACKs for initial slow start
• ns simulation: 500 Mb/s link, 80 ms RTT
  • packet loss early in slow start
  • standard TCP with delayed ACKs takes 10 minutes to recover!
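A back-of-the-envelope estimate shows why recovery takes minutes on this path. After an early slow-start loss, TCP grows cwnd linearly in congestion avoidance: about one segment per RTT, or about half a segment per RTT when the receiver acknowledges every other segment. The sketch below assumes a 1500-byte segment, a restart from a 2-segment window, and ACK-counted window growth; none of these are stated on the slide, and `recovery_time_s` is a hypothetical helper.

```python
def recovery_time_s(link_bps, rtt_s, mss_bytes=1500, start_segments=2,
                    delayed_ack=True):
    """Estimate seconds of linear growth needed to refill the pipe.

    Assumes congestion avoidance from start_segments up to the
    bandwidth-delay product, with growth halved under delayed ACKs.
    """
    target_segments = link_bps * rtt_s / (8.0 * mss_bytes)
    growth_per_rtt = 0.5 if delayed_ack else 1.0
    rtts = max(0.0, (target_segments - start_segments) / growth_per_rtt)
    return rtts * rtt_s


if __name__ == "__main__":
    with_delack = recovery_time_s(500e6, 0.080, delayed_ack=True)
    without = recovery_time_s(500e6, 0.080, delayed_ack=False)
    print("delayed ACK on  (min):", with_delack / 60.0)
    print("delayed ACK off (min):", without / 60.0)
```

Under these assumptions the estimate comes out on the order of nine minutes with delayed ACKs, consistent with the roughly 10 minutes seen in the ns simulation, and half that with delayed ACKs off, consistent with the 2x recovery improvement noted above.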
In progress ...
• WAD enhancement and testing
  • delayed ACK
  • Floyd AIMD (WAD and/or kernel)
  • tuning with NTAF data
  • distribution to other Net100 sites
  • GridFTP tuning (ORNL/PSC/LBL)
  • Python WAD with netlink
  • parallel stream tuning
• TCP optimization studies (AIMD, Vegas)
• Addition of postdoc
ORNL Yr 1 Milestones
1. Deploy Web100 at ORNL and NERSC nodes to develop Net100 expertise
2. Develop and demonstrate Web100-aware data transfer application for Probe/HPSS
   • testing between NERSC and ORNL
3. Contribute to test and evaluation of existing end-to-end tools
4. Get access to ESnet ORNL and NERSC routers and investigate possible real-time feedback to application (e.g. using SNMP)
5. Explore transport optimizations for single TCP flows
6. Develop file transfer application/protocol to support out-of-order packet arrivals
7. Deploy a small emulator testbed to test transport protocol modifications and out-of-order resilient protocols/applications
8. Explore tuning the IBM/AIX 5.1 TCP stack and investigate extending it with Net100 mods
9. Test Net100 tools on ESnet's OC48 testbed
10. Publish tools and tips on web page and in formal publications and presentations