Explore the motivation, accomplishments, and ongoing work of ORNL's Net100 project. Objectives include optimizing bulk transfers, tuning TCP parameters, and developing network tools to enhance performance. Learn about progress made, future work planned, and innovative experiments with protocol tuning and network optimization, all aimed at improving data transfer efficiency. Follow the ORNL team's efforts to address challenges in wide-area distributed storage and enhance network performance through advanced techniques.
NET100… as seen from ORNL
Tom Dunigan, thd@ornl.gov
November 8, 2001
Net100 at ORNL
• Motivation
• ORNL objectives
• Accomplishments to date
• Ongoing work
• ORNL team
  • Tom Dunigan
  • Florence Fowler
  • Nagi Rao
ORNL’s motivation
• ORNL/NERSC Probe project
  • wide-area distributed storage testbed (HPSS)
  • investigate protocols, software, devices
• Climate model data transfers were slow
  • OC3/OC12 links with 80 ms RTT
  • classic TCP tuning problem (see the back-of-the-envelope calculation below)
  • also broken TCP stacks
• How to tune TCP?
  • Web100 a potential solution
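The "classic TCP tuning problem" above is the bandwidth-delay product: at 80 ms RTT, an OC3 or OC12 path needs a far larger window than a typical default. A minimal C sketch of that arithmetic follows; the line rates (~155 Mb/s and ~622 Mb/s) and the 64 KB default window are assumptions for illustration, not figures from the slides.

```c
/* Bandwidth-delay product for the paths above.
 * Assumed line rates: OC3 ~155 Mb/s, OC12 ~622 Mb/s; 80 ms RTT from the slide.
 * A TCP window smaller than the BDP caps throughput at window/RTT. */
#include <stdio.h>

int main(void)
{
    const double rtt_s = 0.080;                   /* 80 ms round-trip time   */
    const double rate_bps[] = { 155e6, 622e6 };   /* OC3, OC12 (approximate) */
    const char  *name[]     = { "OC3", "OC12" };

    for (int i = 0; i < 2; i++) {
        double bdp_bytes = rate_bps[i] * rtt_s / 8.0;
        printf("%-4s: window of ~%.1f MB needed to fill the pipe\n",
               name[i], bdp_bytes / 1e6);
    }
    /* With a typical 64 KB default window, throughput is capped at window/RTT. */
    printf("64 KB default window -> at most ~%.1f Mb/s on this path\n",
           64 * 1024 * 8 / rtt_s / 1e6);
    return 0;
}
```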
TCP losses
[plot: packet losses during startup and linear recovery — annotations: 0.5 Mb/s instantaneous throughput, packet-loss average, early packet drops]
ORNL Net100 objectives (yr 1)
• Optimize wide-area bulk transfer
  • understand HPSS WAN transfers
  • characterize ESnet OC12 links
• Optimize/tune TCP
  • TCP parameters that affect performance
  • avoid loss and speed recovery
• Develop network tools
  • develop/deploy/evaluate probes/sensors
  • use data to tune applications
  • archive data for broader analysis
Progress: bulk transfer study
• Characterizing ESnet links
  • tcpdump/tcptrace/xplot, iperf/netperf, pipechar
  • router stats
  • probes at ORNL/NERSC/LANL/LBL/ANL
• Understanding HPSS transfers
  • HSI, pftp (and ftp/bbftp)
  • used the Web100 GUI to tune HSI transfers
  • OS TCP tuning/debugging, I/O limits
• To do:
  • NetLogger
  • jumbo frames, ECN
Progress: TCP optimization
• What can be tuned in TCP?
  • window size, delayed ACK, AIMD values (see the sketch below), idle restart, burst limit, ssthresh, dup limit
• What data to retain for tuning decisions?
  • RTT variables, cwnd/ssthresh, retransmits/timeouts, D-SACKs
• Experiments using ns
• Experiments with Web100
• Experiments using almost-TCP-over-UDP
• Experiments with SCTP
  • out-of-order delivery
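To make the "AIMD values" knob concrete, here is a toy C sketch of the additive-increase/multiplicative-decrease update that congestion avoidance performs. The parameters a and b and the single simulated loss are illustrative; this is not the ns or atou code.

```c
/* Toy AIMD illustration for the "AIMD values" knob above.
 * a = segments added per RTT (additive increase), b = fraction of cwnd kept
 * after a loss (multiplicative decrease). Standard TCP uses a = 1, b = 0.5;
 * tuning experiments vary both. Sketch only -- not the ns or atou code. */
#include <stdio.h>

int main(void)
{
    const double a = 1.0, b = 0.5;   /* tunable AIMD parameters             */
    double cwnd = 2.0;               /* congestion window, in segments      */

    for (int rtt = 1; rtt <= 20; rtt++) {
        int acks = (int)cwnd;        /* roughly one ACK per segment per RTT */
        for (int k = 0; k < acks; k++)
            cwnd += a / cwnd;        /* each ACK adds a/cwnd -> +a per RTT  */
        if (rtt == 10)
            cwnd *= b;               /* pretend a single loss happens here  */
        printf("rtt %2d  cwnd %5.1f segments\n", rtt, cwnd);
    }
    return 0;
}
```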
Tuning TCP
• Avoid losses
  • retain/probe for “optimal” buffer sizes (see the socket-buffer sketch below)
  • autotuning (Web100/Net100)
  • ECN-capable routers/hosts
  • reduce bursts
• Faster recovery
  • shorter RTT (“fix” routes)
  • no delayed ACK
  • bigger MSS (jumbo frames)
  • speculative recovery, D-SACK
  • modified congestion avoidance
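As a sketch of the "probe for optimal buffer sizes" item, the snippet below shows the standard way an application hands a chosen buffer size to the kernel with setsockopt before connecting. The 6 MB value, port, and address are placeholders, not Net100 settings.

```c
/* Applying a probed "optimal" buffer size by hand (no kernel autotuning):
 * set SO_SNDBUF/SO_RCVBUF before connect() so the TCP window can grow
 * toward the bandwidth-delay product. The 6 MB figure is illustrative
 * (roughly an OC12 x 80 ms path); address and port are placeholders. */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

int main(void)
{
    int bufsize = 6 * 1024 * 1024;   /* probed/estimated BDP, in bytes */
    int s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0) { perror("socket"); return 1; }

    /* Set before connect() so a large window scale can be negotiated. */
    if (setsockopt(s, SOL_SOCKET, SO_SNDBUF, &bufsize, sizeof(bufsize)) < 0)
        perror("SO_SNDBUF");
    if (setsockopt(s, SOL_SOCKET, SO_RCVBUF, &bufsize, sizeof(bufsize)) < 0)
        perror("SO_RCVBUF");

    struct sockaddr_in peer;
    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port   = htons(5001);                    /* e.g. an iperf server */
    inet_pton(AF_INET, "192.0.2.1", &peer.sin_addr);  /* placeholder address  */

    if (connect(s, (struct sockaddr *)&peer, sizeof(peer)) < 0)
        perror("connect");
    close(s);
    return 0;
}
```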
Almost TCP over UDP (atou)
• Test harness to modify TCP-like parameters over a real network (a minimal user-space window sketch follows this list)
  • no kernel mods or root access
  • uses UDP (simple client/server)
• Instrumented and tunable
  • window size, segment size
  • delayed ACK
  • AIMD parameters (backoff/recovery)
  • Reno, NewReno, SACK/FACK (with rampdown)
  • dup/timeout threshold
  • burst limit
  • drop list
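Below is a minimal sketch of the atou idea: enforce a TCP-like sliding window over plain UDP entirely in user space, with no kernel changes. The header layout, window of 32 segments, port, and address are invented for illustration; the real harness exposes many more knobs (delayed ACK, AIMD, SACK/FACK, drop lists, ...).

```c
/* Minimal sketch of a user-space sliding window over UDP (the atou idea).
 * Header layout, constants, and the cumulative-ACK format are hypothetical. */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

struct seg_hdr { unsigned int seq; };          /* hypothetical per-datagram header */

int main(void)
{
    const int window   = 32;                   /* tunable: segments in flight       */
    const int seg_size = 1400;                 /* tunable: payload bytes             */
    unsigned int snd_next = 0, snd_una = 0;    /* next seq to send / oldest unacked  */

    int s = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in peer = { .sin_family = AF_INET, .sin_port = htons(7890) };
    inet_pton(AF_INET, "192.0.2.1", &peer.sin_addr);   /* placeholder receiver */

    char buf[sizeof(struct seg_hdr) + 1400];
    while (snd_una < 1000) {
        /* Send while the user-space window allows it. */
        while (snd_next < snd_una + window) {
            struct seg_hdr h = { .seq = snd_next };
            memcpy(buf, &h, sizeof(h));
            sendto(s, buf, sizeof(h) + seg_size, 0,
                   (struct sockaddr *)&peer, sizeof(peer));
            snd_next++;
        }
        /* Wait for a cumulative ACK (just a sequence number) from the receiver. */
        unsigned int ack;
        if (recv(s, &ack, sizeof(ack), 0) == sizeof(ack) && ack > snd_una)
            snd_una = ack;
    }
    close(s);
    return 0;
}
```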
Future work (yr 2 and 3)
• Parallel streams (psockets)
  • how to choose number of streams, buffer sizes? (one heuristic sketched below)
  • testing with iperf and bbftp
  • Web100 autotune?
• Application routing daemons
  • indirect TCP
  • alternate path (Wolski, UCSB)
  • multipath (Rao, ORNL)
• Non-TCP solutions
  • rate-based datagrams, TCP-like, DCP
  • SCTP, out-of-order delivery
  • Are these fair?
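One simple heuristic for the stream-count question (an assumption for illustration, not the psockets algorithm): use enough streams that the sum of per-stream buffers covers the path's bandwidth-delay product. A small C sketch, with the 256 KB per-stream cap as an assumed limit:

```c
/* Illustrative stream-count heuristic: streams = ceil(BDP / per-stream buffer).
 * The OC12/80 ms path and the 256 KB per-stream cap are assumptions. */
#include <stdio.h>
#include <math.h>

int main(void)
{
    const double bdp_bytes        = 622e6 * 0.080 / 8.0;  /* ~OC12, 80 ms RTT   */
    const double per_stream_bytes = 256.0 * 1024;          /* assumed 256 KB cap */

    int streams = (int)ceil(bdp_bytes / per_stream_bytes);
    printf("BDP ~%.1f MB, per-stream buffer %.0f KB -> ~%d parallel streams\n",
           bdp_bytes / 1e6, per_stream_bytes / 1024, streams);
    return 0;
}
```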
Progress: network tools
• Web100 test and evaluation
  • GigE Web100 nodes at ORNL/NERSC (plus UT/LBL/NCAR/NCSA)
  • Java Web100 bandwidth/config applet
  • ttcp100
  • web100d
• Deploy Net100 tools
  • enhance NetLogger
  • NWS?
Net100 questions
• What/how do we auto-tune?
• How do we tune both ends?
• What/how do we measure?
  • active probes (what/when/where)
  • passive (Web100, router/SNMP?)
• How do we save/access our measurement data?
• How do we measure “success”?

http://www.csm.ornl.gov/~dunigan/net100