Net100: developing network-aware operating systems
• New (9/01) DOE-funded (Office of Science) project ($1M/yr, 3 yrs)
• Principal investigators
  • Matt Mathis, PSC (mathis@psc.edu)
  • Brian Tierney, LBNL (bltierney@lbl.gov)
  • Tom Dunigan, ORNL (thd@ornl.gov)
• Objective
  • measure and understand end-to-end network and application performance
  • tune network applications (grid and bulk transfer)
• Components
  • active network probes and passive sensors (leverage Web100)
  • network metrics database
  • tuning daemon (WAD) to tune network flows based on network metrics
www.net100.org

Net100: applied Web100
• Web100
  • Linux 2.4 kernel mods
  • 100+ TCP variables per flow
• Net100
  • Add Web100 to iperf/ttcp
  • Monitoring/tuning daemon
• Java applet bandwidth/client tester
  • fake WWW server provides HTML and applet
  • applet connects to bwserver
  • 3 sockets (control, bwin, bwout)
  • server reports Web100 variables to applet (window sizes, losses, RTT)
• Try it: http://firebird.ccs.ornl.gov:7123

Net100 network measurement
• Active measurement
  • Net100 probes at LBL, ORNL, NCAR, PSC, NERSC
  • scheduled set of path probes (iperf with Web100 mods, traceroute, pipechar)
  • local and centralized database (NetLogger)
  • interface to other probers (NIMI, Surveyor, PingER, ?)
• Passive measurement
  • Web100 daemon records TCP info on designated flows
  • Web100 data collected when flow terminates
  • Web100 TCP info: losses, timeouts, reordering, cwnd, ssthresh, RTT, …
  • use NetLogger to report to central database
  • other passive sensors (SNMP data, LBL's tcpdump monitor, ?)
• Query tools
  • for dynamic application tuning
  • for network engineering and statistical studies

Net100: tuning
• Work-around Daemon (WAD) Version 0
  • use network performance data to tune flows
  • tune unknowing sender/receiver
  • config file with “tuning info” ?
  • based on Web100/Linux 2.4
• To be done
  • “applying” measurement info
  • adding more knobs to kernel
  • tune on non-Linux OS
• Related work
  • Feng's Dynamic Right Sizing
  • Linux 2.4 auto-tuning/caching
  • Mathis TCP buffer tuning
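
The slide leaves the shape of the “tuning info” config open. As a purely hypothetical illustration (not WAD's actual format or code), a daemon could map destination networks to buffer sizes and pick the most specific match for each flow. The table contents, `DEFAULT_BUFFER`, and the function name `buffer_for` are all assumptions made for this sketch.

```python
import ipaddress

# Hypothetical tuning table: destination network -> socket buffer size (bytes).
# The prefixes are documentation-only address ranges, not real Net100 sites.
TUNING_TABLE = {
    "192.0.2.0/24": 4 * 1024 * 1024,
    "192.0.2.128/25": 8 * 1024 * 1024,
    "198.51.100.0/24": 1 * 1024 * 1024,
}
DEFAULT_BUFFER = 64 * 1024  # assumed fallback when no entry matches


def buffer_for(remote_ip, table=TUNING_TABLE, default=DEFAULT_BUFFER):
    """Return the buffer size for a flow to remote_ip (longest-prefix match)."""
    addr = ipaddress.ip_address(remote_ip)
    best = None  # (prefix length, buffer size) of the most specific match so far
    for cidr, size in table.items():
        net = ipaddress.ip_network(cidr)
        if addr in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, size)
    return best[1] if best else default
```

A daemon built this way would tune flows without the application's knowledge, which is the “unknowing sender/receiver” case; how the chosen value is actually applied to a live flow (for example through Web100) is outside this sketch.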

TCP losses
• TCP is lossy by design
• Changing: bandwidths
  • 9.6 kb/s … 1.5 Mb/s … 45 … 100 … 1000 … ? Mb/s
• Unchanging:
  • speed of light (RTT)
  • MTU (still 1500 bytes)
• TCP congestion avoidance
  • recovery after a loss can be very slow on today's high delay/bandwidth links
  • recovery rate is proportional to MSS/RTT²
[Figure: bandwidth plot. Labels: Early startup losses; Instantaneous bandwidth; Average bandwidth; Linear recovery at 0.5 Mb/s!]
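
To make the slow-recovery point concrete, here is a back-of-the-envelope sketch. It assumes textbook congestion avoidance (the window is halved on a loss, then grows by one MSS per RTT) and a single loss on an otherwise full pipe; the link speed, RTT, and MSS values in the usage note are illustrative assumptions, not measurements from the slides.

```python
def recovery_ramp_bps_per_s(mss_bytes, rtt_s):
    """Throughput growth rate in congestion avoidance, in bits/s per second.

    cwnd grows by one MSS per RTT and throughput is cwnd / RTT,
    so the slope is MSS * 8 / RTT^2.
    """
    return mss_bytes * 8 / rtt_s ** 2


def recovery_time_s(link_bps, rtt_s, mss_bytes):
    """Seconds to grow from half the full window back to the full window."""
    # Full window = bandwidth-delay product, expressed in segments.
    window_segments = link_bps * rtt_s / (8 * mss_bytes)
    # After halving, W/2 segments must be regained at one segment per RTT.
    return (window_segments / 2) * rtt_s
```

Under these assumptions, a 1 Gb/s path with a 100 ms RTT and a 1460-byte MSS ramps at about 1.17 Mb/s per second and needs roughly 428 s (about 7 minutes) to recover from one loss. With an 8960-byte MSS (jumbo frames) the same recovery takes about 70 s, which is why a bigger MSS helps.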

Net100 tuning
• Avoid losses
  • use “optimal” buffer sizes determined from network measurements
  • ECN capable routers/hosts
  • reduce bursts (TCP Vegas, ?)
• Faster recovery
  • bigger MSS (jumbo frames)
  • speculative recovery (D-SACK)
  • modified congestion avoidance?
• Autotune (WAD variables)
  • Buffer sizes
  • Dupthresh (reordering resilience)
  • Del ACK, Nagle
  • AIMD
  • Virtual MSS
  • initial window, ssthresh
• non-TCP solutions (rate-based, ?)
  • (tests with TCP-over-UDP, atou, NERSC to ORNL)
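
One common way to turn measurements into an “optimal” buffer size is the bandwidth-delay product (BDP). The sketch below assumes that rule and applies it through the standard socket options; it illustrates the idea and is not the WAD implementation. The function names and the example bandwidth and RTT are assumptions.

```python
import socket


def bdp_bytes(bandwidth_bps, rtt_s):
    """Bandwidth-delay product in bytes: the data in flight on a full pipe."""
    return round(bandwidth_bps * rtt_s / 8)


def tuned_socket(bandwidth_bps, rtt_s):
    """Create a TCP socket whose buffers are requested at the path's BDP.

    Buffers are set before connect() so the window scale negotiated at
    connection setup can reflect them. The kernel may clamp the request
    to its configured maximum.
    """
    size = bdp_bytes(bandwidth_bps, rtt_s)
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    return s
```

For example, a measured 100 Mb/s path with an 80 ms RTT gives a BDP of 1,000,000 bytes. Reading the value back with `getsockopt` shows what the kernel actually granted, which can differ from the request.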

Net100 status
• Completed
  • network probes at ORNL, PSC, NCAR, LBL, NERSC
  • preliminary schema for network data
  • initial Web100 sensor daemon and tuning daemon
• In progress
  • TCP tuning extensions to Linux/Web100 kernel
  • analysis of TCP tuning options
  • deriving tuning info from network measurements
• Future
  • interactions with other network measurement sources
  • multipath/parallel path selection/tuning
www.net100.org