Network-aware OS
DOE/MICS ORNL site visit, January 8, 2004
ORNL team: Tom Dunigan, Nagi Rao, Florence Fowler, Steven Carter
Matt Mathis mathis@psc.edu
Brian Tierney bltierney@lbl.gov
Roadmap
• Net100 overview
• ORNL contributions
• Ongoing ORNL work
• Future research

www.net100.org (more details at www.csm.ornl.gov/~dunigan/net100)
• DOE-funded project (Office of Science)
• $2.6M, 3 years beginning 9/01
• LBNL, ORNL, PSC, NCAR
• Net100 project objectives (network-aware operating systems):
  • measure, understand, and improve end-to-end network/application performance
  • tune network protocols and applications (grid and bulk transfer)
  • emphasis: TCP bulk transfer over high delay/bandwidth nets
Net100 objective: speed up network applications
• "enable" high speed
  • need buffer = bandwidth × RTT: autotune
  • ORNL/NERSC (80 ms, OC12) needs 6 MB
  • faster slow-start
• avoid losses
  • modified slow-start
  • reduce bursts
  • anticipate loss (ECN, Vegas?)
  • reorder threshold
• speed recovery
  • bigger MTU or "virtual MSS"
  • modified AIMD (0.5, 1) (Floyd, Kelly)
  • delayed ACKs, initial window, slow-start increment
• avoid congestion collapse, be fair (?) … intranets, QoS

Figure (ns simulation, 500 Mb/s link, 80 ms RTT): packet loss early in slow start. Standard TCP with delayed ACKs takes 10 minutes to recover!
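The 6 MB ORNL/NERSC buffer is the bandwidth-delay product, and the slow recovery follows from linear congestion-avoidance growth. A rough sketch of both calculations, assuming an OC12 line rate of 622.08 Mb/s, a 1460-byte MSS, and that delayed ACKs cut the additive increase to about half a segment per RTT; the function names are illustrative, not Net100 code.

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product in bytes: the window/buffer needed to keep the path full."""
    return bandwidth_bps * rtt_s / 8


def linear_recovery_time_s(bandwidth_bps: float, rtt_s: float,
                           mss_bytes: int = 1460,
                           start_segments: float = 0.0,
                           segments_per_rtt: float = 1.0) -> float:
    """Seconds for additive increase to grow cwnd from start_segments to a full BDP window.

    Assumes Reno-style congestion avoidance (a fixed number of segments per RTT)
    and ignores further losses, queueing delay, and slow start.
    """
    target_segments = bdp_bytes(bandwidth_bps, rtt_s) / mss_bytes
    rtts_needed = max(target_segments - start_segments, 0.0) / segments_per_rtt
    return rtts_needed * rtt_s


if __name__ == "__main__":
    # ORNL/NERSC: OC12 (622.08 Mb/s line rate) with an 80 ms RTT, about 6.2 MB
    print(f"buffer needed: {bdp_bytes(622.08e6, 0.080) / 1e6:.1f} MB")
    # Loss early in slow start leaves cwnd near zero; delayed ACKs give ~0.5 segment/RTT
    t = linear_recovery_time_s(500e6, 0.080, segments_per_rtt=0.5)
    print(f"recovery at 500 Mb/s, 80 ms: {t / 60:.1f} minutes")
```

Under these assumptions the estimate comes to roughly nine minutes, the same order as the ten minutes in the ns run; the simulation also includes slow-start and timeout effects that this arithmetic ignores.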
Net100 methodology
• Web100 Linux kernel (NSF)
  • instrumented TCP stack (IETF MIB draft)
• Path characterization
  • Network Tuning and Analysis Framework (NTAF)
  • both active and passive measurement tools
  • database of measurements
• TCP protocol analysis and tuning (primarily ORNL)
  • simulation/emulation
    • ns
    • TCP-over-UDP (atou)
    • NISTNet
  • kernel tuning extensions
  • tuning daemon
  • evaluation tests
Net100 results
• Novel approaches
  • non-invasive dynamic tuning of legacy applications
  • out-of-kernel tuning
  • using TCP to tune TCP
  • tuning on a per-flow/per-destination basis, driven by recent path metrics or policy (QoS)
• Effective evaluation framework
  • protocol analysis and tuning
  • network/application/OS debugging
  • path characterization tools, archive, and visualization tools
• Performance improvements (WAD-tuned)
  • buffers: 10x
  • AIMD: 2x to 10x
  • delayed ACK: 2x
  • slow-start: 3x
  • reorder: 40x
• Papers and software available
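The reorder gain comes from fast retransmit: with the standard threshold of three duplicate ACKs, a segment that is merely delayed looks lost. A small model of that effect, assuming one cumulative ACK per arriving segment and no real losses, so every fast retransmit it counts is spurious; `count_fast_retransmits` is a hypothetical helper, not part of the WAD.

```python
def count_fast_retransmits(arrival_order, dupthresh=3):
    """Count fast retransmits a sender would trigger for a given segment arrival order.

    The receiver ACKs every segment cumulatively (next in-order segment expected);
    the sender fires one fast retransmit when the duplicate-ACK count reaches dupthresh.
    """
    received = set()
    next_expected = 0      # cumulative ACK point at the receiver
    last_ack = None        # last ACK value seen by the sender
    dup_acks = 0
    retransmits = 0
    for seg in arrival_order:
        received.add(seg)
        while next_expected in received:
            next_expected += 1
        if next_expected == last_ack:
            dup_acks += 1
            if dup_acks == dupthresh:
                retransmits += 1
        else:
            last_ack = next_expected
            dup_acks = 0
    return retransmits


# Segment 2 is delayed behind five later segments but never lost.
order = [0, 1, 3, 4, 5, 6, 7, 2, 8, 9]
print(count_fast_retransmits(order, dupthresh=3))  # 1 spurious retransmit
print(count_fast_retransmits(order, dupthresh=6))  # 0
```

Raising the threshold trades spurious retransmits against slower reaction to genuine loss, which is one reason to tune it per path rather than globally.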
Net100: ORNL contributions
• User tools
  • Iperf100/ttcp100
  • Applet bandwidth tester
  • atou (TCP over UDP)
  • Net100 daemon (WAD), traced
• Kernel extensions
  • Event notification
  • AIMD and virtual MSS knobs
  • HS TCP (Floyd)
  • Scalable TCP (Kelly)
  • TCP Vegas
  • Cray X1, SGI Altix
• Evaluations
  • Emulation: NISTNet testbed
  • Simulation (ns)
  • Parallel streams, bbftp, pftp, gridFTP
  • FAST
  • Non-TCP: SCTP, UDP (SABUL, Tsunami, FOBS)
  • Nets: ESnet, Internet2, Europe, GigE cable, ATM, wireless, dialup
• Interactions
  • HPSS/Probe (NERSC)
  • Climate
  • SLAC
  • Vendors: Cray, SGI, IBM
  • Talks/papers, software distribution
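The AIMD and virtual MSS knobs, like Scalable TCP, change how quickly cwnd climbs back after a loss. A sketch comparing recovery time in RTTs at a 3425-segment window (about 500 Mb/s × 80 ms with 1460-byte segments), assuming a single loss, one ACK per segment, and Scalable TCP's published constants (+0.01 segment per ACK, decrease by 1/8). Applying the increase once per RTT is a simplification, and HS TCP's table-driven response is not modeled.

```python
from typing import Callable


def rtts_to_recover(window: float, increase: Callable[[float], float],
                    decrease_fraction: float) -> int:
    """RTTs for cwnd to regain `window` after one multiplicative decrease.

    increase(cwnd) gives the cwnd gain (in segments) over one RTT.
    """
    cwnd = window * (1.0 - decrease_fraction)
    rtts = 0
    while cwnd < window:
        cwnd += increase(cwnd)
        rtts += 1
    return rtts


def reno(cwnd: float) -> float:
    return 1.0  # standard AIMD: +1 segment per RTT


def virtual_mss(k: float) -> Callable[[float], float]:
    # "virtual MSS" of k segments: grow as if each segment were k times larger
    return lambda cwnd: k


def scalable(cwnd: float) -> float:
    # +0.01 segment per ACK, roughly cwnd ACKs per RTT (applied once per RTT)
    return 0.01 * cwnd


W = 3425  # segments in flight at ~500 Mb/s x 80 ms with a 1460-byte MSS
print("standard AIMD (+1, x0.5):", rtts_to_recover(W, reno, 0.5))
print("virtual MSS k=8:", rtts_to_recover(W, virtual_mss(8), 0.5))
print("Scalable TCP (+0.01/ACK, -1/8):", rtts_to_recover(W, scalable, 0.125))
```

Under these assumptions standard AIMD needs 1713 RTTs (over two minutes at 80 ms), a virtual MSS of 8 needs 215, and Scalable TCP needs 14; because its growth is multiplicative, the Scalable TCP figure stays roughly the same at any window size.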
Ongoing Net100 ORNL research
• more user-friendly WAD: WAD-lite
  • daemons interact and measure to select transport parameters
  • plus optional system manager configuration file (policy)
  • no NTAF required
• Working with NERSC/ORNL HPSS on Net100 support
• Vendor collaboration for Net100 on Cray X1 and SGI Altix
• TCP Vegas testing
  • delay-based congestion avoidance
  • can be configured to compete with standard TCP (Feng)
• Caltech's FAST
• comparison with other "workarounds"
  • parallel streams
  • non-TCP (SABUL, FOBS, TSUNAMI, RBUDP, UDT, SCTP)
• Dedicated optical path transports
• User-mode Linux and Net100
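WAD-lite combines measured path properties with an optional policy file. A hypothetical sketch of that selection step: the policy format, prefixes (documentation ranges), caps, and function names are illustrative assumptions, not the WAD-lite configuration syntax. It sizes the buffer from the measured bandwidth-delay product and clamps it to the most specific matching policy entry.

```python
import ipaddress

# Hypothetical system-manager policy: destination prefix -> maximum buffer (bytes).
# Prefixes are documentation ranges; values are illustrative only.
POLICY = {
    "192.0.2.0/24": 8 * 2**20,     # e.g. a trusted high-speed peer
    "0.0.0.0/0": 2 * 2**20,        # default cap for everything else
}


def policy_cap(dest: str, policy: dict = POLICY):
    """Return the cap from the longest (most specific) matching prefix, or None."""
    addr = ipaddress.ip_address(dest)
    best_len, best_cap = -1, None
    for prefix, cap in policy.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and net.prefixlen > best_len:
            best_len, best_cap = net.prefixlen, cap
    return best_cap


def choose_buffer(dest: str, bw_bps: float, rtt_s: float, policy: dict = POLICY) -> int:
    """Buffer = measured bandwidth-delay product, clamped by policy if an entry matches."""
    bdp = int(round(bw_bps * rtt_s / 8))
    cap = policy_cap(dest, policy)
    return bdp if cap is None else min(bdp, cap)


print(choose_buffer("192.0.2.10", 622.08e6, 0.080))   # BDP (6220800) is under the 8 MiB cap
print(choose_buffer("203.0.113.5", 622.08e6, 0.080))  # clamped to the 2 MiB default: 2097152
```

Longest-prefix matching keeps the policy file short: one default entry plus overrides for the few destinations that need special treatment.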
TCP Vegas
• Added Vegas to Linux/Net100 kernel, with a high-resolution timer
• Tunable with WAD (alpha/beta)
• Delay-based congestion avoidance
  • limits buffer growth
  • reduces loss?
• Compare with FAST
• More tests …
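Vegas compares the throughput expected at the minimum observed RTT with the throughput actually achieved, which is why fine-grained RTT timing matters. A sketch of one common formulation of the per-RTT decision, with alpha and beta in segments; the defaults of 2 and 4 are assumptions here (the Net100 kernel exposes them as WAD-tunable parameters), and real implementations also handle slow start and loss.

```python
def vegas_update(cwnd: float, base_rtt: float, rtt: float,
                 alpha: float = 2.0, beta: float = 4.0) -> float:
    """One congestion-avoidance step of TCP Vegas (per RTT).

    alpha/beta bound the estimated number of this flow's segments queued in the path.
    """
    expected = cwnd / base_rtt               # rate if there were no queueing
    actual = cwnd / rtt                      # rate actually achieved this RTT
    queued = (expected - actual) * base_rtt  # estimated extra segments in queues
    if queued < alpha:
        return cwnd + 1                      # path underused: grow linearly
    if queued > beta:
        return cwnd - 1                      # queues building: back off linearly
    return cwnd                              # within the target band: hold


print(vegas_update(100, 0.080, 0.080))   # 101: no queueing observed
print(vegas_update(100, 0.080, 0.0825))  # 100: about 3 segments queued
print(vegas_update(100, 0.080, 0.084))   # 99: about 4.8 segments queued
```

Holding the queued estimate between alpha and beta is what limits buffer growth; since the decision hinges on RTT differences of a few milliseconds, a coarse timer would make it noisy.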
Planned Net100 research
• improve ease of use (WAD-lite)
• analyze effectiveness/fairness of current tuning options
  • simulation
  • emulation
  • on the net (systematic tests)
• additional tuning algorithms
  • slow-start accelerants
  • identify non-congestive loss, ECN?
• Tuning for dedicated path (lambda/10GigE)
• parallel/multipath selection/tuning
• 10GigE tests
• FreeBSD ports ???
• jumbo frame experiments … the quest for bigger and bigger MTUs
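Bigger MTUs help because, in the steady-state model of Mathis et al., TCP throughput scales linearly with segment size for a given RTT and loss rate: BW ≈ (MSS/RTT)·C/√p, with C = √(3/2) for periodic loss and one ACK per segment. A sketch assuming a 1460-byte MSS for a 1500-byte MTU and 8960 bytes for a 9000-byte jumbo frame (40 bytes of IPv4 and TCP headers, no options); the loss rate is illustrative.

```python
import math

C = math.sqrt(1.5)  # model constant for periodic loss, one ACK per segment


def mathis_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate steady-state TCP throughput (bits/s) from the Mathis et al. model."""
    return (mss_bytes * 8 / rtt_s) * C / math.sqrt(loss_rate)


rtt, p = 0.080, 1e-6  # 80 ms path, one loss per million segments (illustrative)
for mtu, mss in [(1500, 1460), (9000, 8960)]:
    print(f"MTU {mtu}: ~{mathis_bps(mss, rtt, p) / 1e6:.0f} Mb/s")
```

The ratio between the two cases is simply 8960/1460, about 6x; when the path MTU cannot change, the "virtual MSS" knob targets the same additive-increase term instead.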