Summer 2002 at SLAC – Ajay Tirumala
Main Projects
• Measuring disk throughputs on remote hosts, considering parameters like:
  • File system
  • Read[write]-block size
  • Sequential/random reads[writes]
  • Committing sequence for writes
  • File sizes
• Iperf QUICK mode
  • A new algorithm that reduces the time needed to measure end-to-end bandwidth
  • And thus also the network traffic generated
Disk Throughputs
• File Systems
  • NFS uses the client’s main memory as a cache.
    • Data can be lost during reads/writes, so we need to perform small-sized reads and commit often.
  • AFS uses session semantics.
    • The local disk is the cache.
  • UFS – the default file system for Solaris.
    • fwrites go to the disk buffer and are committed to disk on fsync, when the buffer is full, or when disk caching is disabled.
  • EXT – the most popular file system for Linux.
    • The layer below the VFS.
    • Has the concept of pre-allocation (allotting up to 8 adjacent file blocks when a block is requested).
    • A mount option is available for greater write speeds (with weaker consistency).
Disk Reads
• The first read will necessitate a disk read in most cases.
  • A first read served from memory would indicate either minimal memory activity or a very large memory, since the tests are performed days apart.
• The second read (performed immediately after the first)
  • will generally be served from memory,
  • unless disk caching is disabled.
• Since there is a good probability that even the first read can come from memory, we use disk writes as the primary metric for disk speeds.
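The two-read effect above can be demonstrated directly: time the same sequential read twice, back to back, and the second pass is usually served from the operating system's page cache. A minimal sketch (the file name and sizes are illustrative, not from the original tests):

```python
import os
import time

BLOCK = 64 * 1024  # 64 KB read blocks

def timed_read(path):
    """Read the whole file in BLOCK-sized chunks; return elapsed seconds."""
    t0 = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(BLOCK):
            pass  # drain the file
    return time.perf_counter() - t0

# Create a scratch file, then read it twice in a row.
path = "testfile.dat"  # hypothetical scratch file
with open(path, "wb") as f:
    f.write(os.urandom(4 * 1024 * 1024))  # 4 MB of incompressible data

first = timed_read(path)    # may require a real disk read
second = timed_read(path)   # usually served from the page cache
print(f"first: {first:.4f}s  second: {second:.4f}s")
os.remove(path)
```

Note that the file was just written, so even the "first" read here may already be cached; this is exactly the ambiguity that motivates using writes rather than reads as the primary metric.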
Disk writes
• Commit modes – used fsync to commit files to disk
  • Plain (no commit)
  • Commit after each write
  • Commit at end – most indicative of the achievable disk bandwidth
• Block sizes
  • For local disks, use large block sizes (1–2 MB).
  • For remote writes, 64 KB/128 KB will suffice.
• File sizes
  • Using a large file size (2 GB) increased the throughput in some cases. The default was 64 MB.
• Caution: NFS may not return an error during fwrites; it may report the error only on an fsync.
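The three commit modes above can be sketched as one parameterized timing loop; "commit each" pays an fsync per block, while "commit at end" pays a single fsync and so best reflects sustained disk bandwidth. The sizes and file name below are illustrative assumptions, not the original benchmark's parameters:

```python
import os
import time

def timed_write(path, total, block, commit):
    """Write `total` bytes in `block`-sized chunks; return elapsed seconds.
    commit = "none" -> plain writes, never committed (disk buffer only)
    commit = "each" -> fsync after every write
    commit = "end"  -> a single fsync after the last write
    """
    data = b"\0" * block
    t0 = time.perf_counter()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    for _ in range(total // block):
        os.write(fd, data)
        if commit == "each":
            os.fsync(fd)  # force this block to disk before continuing
    if commit == "end":
        os.fsync(fd)      # one commit for the whole file
    os.close(fd)
    return time.perf_counter() - t0

for mode in ("none", "each", "end"):
    secs = timed_write("scratch.dat", 8 * 1024 * 1024, 64 * 1024, mode)
    print(f"{mode:>4}: {secs:.3f}s")
os.remove("scratch.dat")
```

Without the fsync calls, the "none" timing mostly measures the speed of the file-system buffer cache rather than the disk, which is why the slides treat commit-at-end as the meaningful number.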
Possible areas to investigate
• Could consider different disk subsystems, such as RAID.
• Analysis of parallel disk transfers using BBCP.
  • Initial tests have indicated that in cases where the disk is the limiting factor, a single thread is the best option.
• An algorithm to estimate disk speeds without using large writes.
  • Manufacturers’ specs lose meaning with network file systems, and even for local file systems with multiple disks.
Iperf QUICK Mode
• Problem
  • Current TCP applications cannot detect when they are out of slow-start.
  • Bandwidth-measurement applications therefore have to run for a considerable time to counter the effects of slow-start.
• Solution
  • Use Web100 to detect the end of slow-start.
  • Measure bandwidth for a short period after slow-start (say 1 s).
  • This should save about 90% of the estimation time and of the traffic generated.
Detecting end of Slow-start
• Outline
  • Determine a sampling period for the congestion window.
  • Detect the absence of exponential increase every RTT.
  • Handle pathological cases:
    • The connection may never get out of slow-start.
    • Multiple slow-starts.
    • The connection may have a very small bandwidth-delay product,
      • e.g. localhost transfers, with latency in nanoseconds.
• At present it handles Reno and Vegas.
  • It should handle Net100/Floyd stacks with minor modifications.
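The core of the detection step above — "the absence of exponential increase every RTT" — can be sketched as a test over RTT-spaced congestion-window samples (which in the real tool come from Web100 instrumentation). During slow-start the window roughly doubles each RTT; once growth falls below a threshold, slow-start is over. The growth threshold here is an illustrative assumption, not Iperf's actual value, and none of the pathological-case handling is shown:

```python
def out_of_slow_start(cwnd_samples, growth_threshold=1.5):
    """Return the index of the first RTT-spaced congestion-window sample
    at which growth stopped being roughly exponential (i.e. the window no
    longer grew by at least `growth_threshold`x per sample), or None if
    the connection never left slow-start within the trace."""
    for i in range(1, len(cwnd_samples)):
        if cwnd_samples[i] < growth_threshold * cwnd_samples[i - 1]:
            return i  # growth collapsed: slow-start has ended
    return None

# Hypothetical trace: the window doubles per RTT, then growth turns linear.
trace = [10, 20, 40, 80, 82, 84, 86]
print(out_of_slow_start(trace))  # → 4, the first post-slow-start sample
```

Returning None maps onto the deck's rule for the pathological case: if the window never stabilizes, QUICK mode results are not reported.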
The QUICK mode Algorithm
• Initialize the Iperf sockets and initialize a Web100 connection for the Iperf socket.
• Start the Web100 data-collection thread.
  • This will indicate when the connection is definitely out of slow-start.
• Detect the end of slow-start in the data-transfer thread.
  • If the congestion window does not stabilize, do NOT report QUICK mode results.
• Measure bandwidth for 1 s (or a user-specified time) after slow-start.
Salient results
• Slow-start can last
  • from 0.2 s on low-latency networks
  • up to 5 s on long-haul, high-bandwidth networks.
    • QUICK mode yields its maximum gains here.
    • Unless we use QUICK mode, we can never be sure the connection is out of slow-start.
• Throughputs differ from those of a 20 s Iperf run by less than 10%.
• Even performed some tests on dialup links (as receiver), with good results.
Web100 experiences
• A must-use tool (I’m a fan).
• The user APIs can be improved.
• Behaves well for a sampling time of 20 ms.
Possible areas to investigate
• Integrate with bandwidth tests.
• Perform tests with slow senders.
• Empirical estimates immediately after slow-start:
  • using the RTT and the rate of increase of the congestion window.
Links
• Disk: http://www-iepm.slac.stanford.edu/bw/disk_res.html
• Iperf QUICK mode: http://www-iepm.slac.stanford.edu/bw/iperf_res.html
• Documentation and results of tests with all IEPM-BW managed nodes are available from these links.
Other stuff…
• Miniperf is a small Iperf-like program written to
  • monitor user-specified Web100 variable(s),
  • allow setting window sizes and test times,
  • include parallel-thread functionality,
  • generate graphs (rate-based, sum-based),
  • and generate HTML.
• Created a single Iperf version that runs on IPv4/IPv6, with or without Web100.