NET100 … as seen from ORNL

Tom Dunigan, thd@ornl.gov, November 8, 2001

Net100 at ORNL
• Motivation
• ORNL objectives
• Accomplishments to date
• Ongoing work
• ORNL team
  • Tom Dunigan
  • Florence Fowler
  • Nagi Rao

ORNL’s motivation
• ORNL/NERSC Probe project
  • wide-area distributed storage testbed (HPSS)
  • investigate protocols, software, devices
  • climate model data transfers were slow
  • OC3/OC12 with 80 ms RTT
  • classic TCP tuning problem
  • also broken TCP stacks
• How to tune TCP?
• Web100 a potential solution
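The "classic TCP tuning problem" on a path like this comes down to the bandwidth-delay product: the window must cover one RTT worth of data, or the sender idles. The sketch below is illustrative only. It assumes nominal SONET line rates for OC3/OC12 (usable payload rates are somewhat lower) and a 64 KB window as the untuned default, neither of which is stated on the slide; the function names are hypothetical.

```python
def bdp_bytes(rate_bits_per_s: float, rtt_s: float) -> int:
    """Bandwidth-delay product in bytes: data in flight needed to fill the path."""
    return round(rate_bits_per_s * rtt_s / 8)


def window_limited_rate(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on throughput (bits/s) when the window is the bottleneck."""
    return window_bytes * 8 / rtt_s


# Assumed nominal line rates in bits/s (not taken from the slides).
LINKS = {"OC3": 155.52e6, "OC12": 622.08e6}
RTT_S = 0.080  # 80 ms RTT, from the slide

if __name__ == "__main__":
    for name, rate in LINKS.items():
        print(f"{name}: BDP is about {bdp_bytes(rate, RTT_S) / 1e6:.2f} MB")
    # An assumed 64 KB window caps a single stream far below either link rate.
    print(f"64 KB window: {window_limited_rate(65535, RTT_S) / 1e6:.2f} Mb/s")
```

Under these assumptions an OC12 path needs roughly 6 MB in flight, while a 64 KB window tops out near 6.5 Mb/s.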

TCP losses
[Figure: throughput trace with instantaneous and average curves and packet loss marked; annotations: early packet drops, packet losses during startup, linear recovery, 0.5 Mb/s]

ORNL Net100 objectives (yr 1)
• Optimize wide-area bulk transfer
  • understand HPSS WAN transfers
  • characterize ESnet OC12 links
• Optimize/tune TCP
  • TCP parameters that affect performance
  • avoid loss and speed recovery
• Develop network tools
  • develop/deploy/evaluate probes/sensors
  • use data to tune applications
  • archive data for broader analysis

Progress: bulk transfer study
• Characterizing ESnet links
  • tcpdump/tcptrace/xplot, iperf/netperf, pipechar
  • router stats
  • probes at ORNL/NERSC/LANL/LBL/ANL
• Understanding HPSS transfers
  • HSI, pftp (and ftp/bbftp)
  • used Web100 GUI to tune HSI transfers
  • OS TCP tuning/debugging, I/O limits
• To do:
  • netlogger
  • jumbo frames, ECN
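On the application side, OS-level TCP tuning usually starts with the socket buffer sizes. The following is a minimal sketch using the standard sockets API from Python; the 4 MB request and the helper names are assumptions for illustration, and the kernel may clamp or adjust the value, so the effective size is read back rather than trusted.

```python
import socket


def open_tuned_socket(buffer_bytes: int) -> socket.socket:
    """Create a TCP socket and request larger send/receive buffers.

    Buffers should be set before connect() or listen() so that TCP window
    scaling can be negotiated for the larger window.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buffer_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, buffer_bytes)
    return sock


def effective_buffers(sock: socket.socket) -> tuple:
    """Return the (send, receive) buffer sizes the kernel actually granted."""
    snd = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    rcv = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return snd, rcv


if __name__ == "__main__":
    s = open_tuned_socket(4 * 1024 * 1024)  # assumed 4 MB request
    print("granted send/recv buffers:", effective_buffers(s))
    s.close()
```

If the granted value is much smaller than the request, the system-wide buffer limits are the next thing to check.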

Progress: TCP optimization
• What can be tuned in TCP?
  • window size, delayed ACK, AIMD values, idle-restart, burst limit, ssthresh, dup limit
• What data to retain for tuning decisions?
  • RTT vars, cwnd/ssthresh, retransmit/timeout, D-SACKs
• Experiments using ns
• Experiments with web100
• Experiments using almost-TCP-over-UDP
• Experiments with SCTP
  • out-of-order delivery
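To make the AIMD and ssthresh knobs concrete, here is a toy per-RTT window model. It is a simplified sketch, not the behavior of any particular stack: it advances one step per round trip, reacts to a loss by cutting straight to the new ssthresh, and ignores timeouts and fast-recovery details. The function and parameter names are hypothetical.

```python
def simulate_aimd(rounds, loss_rounds, incr=1.0, decr=0.5,
                  ssthresh=64.0, max_window=1000.0):
    """Return the congestion window (in segments) at the start of each RTT.

    incr       -- additive increase per RTT in congestion avoidance
    decr       -- multiplicative decrease factor applied on loss
    ssthresh   -- initial slow-start threshold
    max_window -- cap standing in for the receiver/buffer limit
    """
    cwnd = 1.0
    losses = set(loss_rounds)
    trace = []
    for r in range(rounds):
        trace.append(cwnd)
        if r in losses:
            # Loss: shrink the threshold and restart from it.
            ssthresh = max(cwnd * decr, 2.0)
            cwnd = ssthresh
        elif cwnd < ssthresh:
            # Slow start: double each RTT, but do not overshoot ssthresh.
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Congestion avoidance: linear growth.
            cwnd += incr
        cwnd = min(cwnd, max_window)
    return trace


if __name__ == "__main__":
    print(simulate_aimd(12, loss_rounds=[5], ssthresh=8.0))
```

Changing `incr` and `decr` shows how gentler backoff or faster increase shortens the linear recovery phase after a loss.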

Tuning TCP
• Avoid losses
  • retain/probe for “optimal” buffer sizes
  • autotuning (Web100/Net100)
  • ECN-capable routers/hosts
  • reduce bursts
• Faster recovery
  • shorter RTT (“fix” routes)
  • no delayed ACK
  • bigger MSS (jumbo frames)
  • speculative recovery, D-SACK
  • modified congestion avoidance
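The "faster recovery" items all follow from one back-of-the-envelope estimate: after a single loss at full window W (in segments), standard congestion avoidance halves the window and regains about one segment per RTT, or about one per two RTTs with delayed ACKs, so recovery takes roughly W/2 round trips. The sketch below works that out. The MSS values (1460 and 8960 bytes) and the nominal OC12 rate are assumptions, and the model ignores byte counting and other refinements.

```python
def recovery_time_s(rate_bps, rtt_s, mss_bytes, delayed_ack=False):
    """Approximate time to regrow the window after one loss at full rate."""
    window_segments = rate_bps * rtt_s / (8 * mss_bytes)  # full window W
    rtts_per_segment = 2 if delayed_ack else 1            # delayed ACK halves growth
    return (window_segments / 2) * rtts_per_segment * rtt_s


if __name__ == "__main__":
    oc12 = 622.08e6  # assumed nominal OC12 rate, bits/s
    rtt = 0.080      # 80 ms RTT
    print(f"1460 B MSS:              {recovery_time_s(oc12, rtt, 1460):.1f} s")
    print(f"1460 B MSS, delayed ACK: {recovery_time_s(oc12, rtt, 1460, True):.1f} s")
    print(f"8960 B MSS (jumbo):      {recovery_time_s(oc12, rtt, 8960):.1f} s")
    print(f"1460 B MSS, 40 ms RTT:   {recovery_time_s(oc12, rtt / 2, 1460):.1f} s")
```

Under these assumptions a single loss costs about 170 s of recovery with a 1460-byte MSS and about 28 s with jumbo frames, and halving the RTT cuts the time by a factor of four because both the window and the per-step delay shrink.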

Almost TCP over UDP (atou)
• Test harness to modify TCP-like parameters over real net
  • no kernel mods or root access
  • uses UDP (simple client/server)
  • instrumented and tunable
    • window size, segment size
    • delayed ACK
    • AIMD parameters (backoff/recovery)
    • Reno, NewReno, SACK/FACK (w/rampdown)
    • dup/timeout threshold
    • burst limit
    • drop list
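As an illustration of what a user-space harness of this kind has to carry around, the tunables listed above can be grouped into one configuration object. This is a hypothetical sketch, not atou's actual interface: every field name, default value, and helper below is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class AtouLikeConfig:
    """Hypothetical parameter set for a TCP-like sender running over UDP."""
    window_segments: int = 64             # maximum window, in segments
    segment_bytes: int = 1460             # payload bytes per datagram
    delayed_ack: bool = False             # receiver ACKs every other segment
    additive_increase: float = 1.0        # AIMD recovery, segments per RTT
    multiplicative_decrease: float = 0.5  # AIMD backoff factor
    recovery: str = "newreno"             # e.g. "reno", "newreno", "sack"
    dup_threshold: int = 3                # duplicate ACKs before retransmit
    burst_limit: int = 4                  # max segments sent back to back
    drop_list: frozenset = field(default_factory=frozenset)  # forced drops


def should_drop(cfg: AtouLikeConfig, seq: int) -> bool:
    """Deliberately drop listed sequence numbers to inject repeatable loss."""
    return seq in cfg.drop_list


def segments_to_send(cfg: AtouLikeConfig, in_flight: int, cwnd: float) -> int:
    """Segments the sender may emit now, honoring window and burst limit."""
    allowed = min(cwnd, cfg.window_segments) - in_flight
    return max(0, min(int(allowed), cfg.burst_limit))


if __name__ == "__main__":
    cfg = AtouLikeConfig(drop_list=frozenset({5, 100}))
    print(should_drop(cfg, 5), segments_to_send(cfg, in_flight=10, cwnd=20.0))
```

A drop list makes loss experiments repeatable on a real path, which is hard to get from a kernel stack without root access.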

Future work (yr 2 and 3)
• Parallel streams (psockets)
  • how to choose number of streams, buffer sizes?
  • testing with iperf and bbftp
  • Web100 autotune?
• Application routing daemons
  • indirect TCP
  • alternate path (Wolski, UCSB)
  • multipath (Rao, ORNL)
• Non-TCP solutions
  • rate-based datagrams, TCP-like, DCP
  • SCTP, out-of-order delivery
  • Are these fair?
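One rough way to approach "how many streams, what buffer sizes" is the well-known Mathis approximation for loss-limited TCP throughput, rate ≈ (MSS/RTT) · C/√p with C ≈ √(3/2). The sketch below uses it as a first-order estimate only. It assumes each stream sees the same independent loss probability, which parallel streams sharing a bottleneck may well violate, and the loss rate and MSS in the example are illustrative values, not measurements. The function names are hypothetical.

```python
import math


def mathis_rate_bps(mss_bytes, rtt_s, loss_prob, c=math.sqrt(1.5)):
    """Approximate steady-state throughput of one loss-limited TCP stream."""
    return (mss_bytes * 8 / rtt_s) * c / math.sqrt(loss_prob)


def streams_needed(target_bps, mss_bytes, rtt_s, loss_prob):
    """Smallest stream count whose combined estimate reaches the target."""
    return math.ceil(target_bps / mathis_rate_bps(mss_bytes, rtt_s, loss_prob))


def per_stream_buffer_bytes(target_bps, rtt_s, n_streams):
    """Split the path's bandwidth-delay product evenly across the streams."""
    return math.ceil(target_bps * rtt_s / 8 / n_streams)


if __name__ == "__main__":
    target = 622.08e6                    # assumed nominal OC12 rate, bits/s
    rtt, mss, p = 0.080, 1460, 1e-4      # p is an illustrative loss rate
    n = streams_needed(target, mss, rtt, p)
    print(f"one stream: {mathis_rate_bps(mss, rtt, p) / 1e6:.1f} Mb/s")
    print(f"streams needed: {n}")
    print(f"buffer per stream: {per_stream_buffer_bytes(target, rtt, n)} bytes")
```

The fairness question on the slide applies here too: n streams back off like one stream with a much gentler decrease, so they take a larger share than a single competing flow.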

Progress: network tools
• Web100 test and evaluation
  • GigE Web100 nodes at ORNL/NERSC (+UT/LBL/NCAR/NCSA)
  • Java/Web100 bandwidth/config applet
  • ttcp100
  • web100d
• Deploy Net100 tools
• Enhance/Netlogger
• NWS?

Net100 questions
• What/how do we auto-tune?
• How do we tune both ends?
• What/how do we measure?
  • Active probes (what/when/where)
  • Passive (web100, router/snmp?)
• How do we save/access our measurement data?
• How do we measure “success”?

http://www.csm.ornl.gov/~dunigan/net100
