Summer 2002 at SLAC – Ajay Tirumala
Main Projects
• Measuring disk throughputs on remote hosts, considering parameters like:
  • File system
  • Read[write]-block size
  • Sequential/random reads[writes]
  • Committing sequence for writes
  • File sizes
• Iperf QUICK mode
  • A new algorithm that reduces the time needed to measure end-to-end bandwidth
  • And thus also the network traffic generated
Disk Throughputs
• File Systems
  • NFS uses the client’s main memory as a cache.
    • Data can be lost during reads/writes, so we need to perform small-sized reads and commit often.
  • AFS uses session semantics.
    • The local disk is the cache.
  • UFS – the default file system for Solaris.
    • fwrites go to the disk buffer and are committed to disk on fsync, when the buffer is full, or when disk caching is disabled.
  • EXT – the most popular file system for Linux.
    • The layer below the VFS.
    • Has the concept of pre-allocation (allotting up to 8 adjacent file blocks when a block is requested).
    • A mount option is available for greater write speeds (with weaker consistency).
Disk Reads
• The first read will necessitate a disk read in most cases.
  • A first read served from memory would indicate either minimal memory activity or a very large memory, since the tests are performed days apart.
• The second read (performed immediately after the first)
  • will generally be served from memory,
  • unless disk caching is disabled.
• Since there is a good probability that even the first read can come from memory, we use disk writes as the primary metric for disk speeds.
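The two-read effect above can be demonstrated directly: time the same sequential read twice, back to back, and the second pass is usually served from the operating system's page cache. A minimal sketch (the file name and sizes are illustrative, not from the original tests):

```python
import os
import time

BLOCK = 64 * 1024  # 64 KB read blocks

def timed_read(path):
    """Read the whole file in BLOCK-sized chunks; return elapsed seconds."""
    t0 = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(BLOCK):
            pass  # drain the file
    return time.perf_counter() - t0

# Create a scratch file, then read it twice in a row.
path = "testfile.dat"  # hypothetical scratch file
with open(path, "wb") as f:
    f.write(os.urandom(4 * 1024 * 1024))  # 4 MB of incompressible data

first = timed_read(path)    # may require a real disk read
second = timed_read(path)   # usually served from the page cache
print(f"first: {first:.4f}s  second: {second:.4f}s")
os.remove(path)
```

Note that the file was just written, so even the "first" read here may already be cached; this is exactly the ambiguity that motivates using writes rather than reads as the primary metric.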
Disk writes
• Commit modes – used fsync to commit files to disk
  • Plain (no commit)
  • Commit after each write
  • Commit at end – most indicative of the achievable disk bandwidth
• Block sizes
  • For local disks, use large block sizes (1–2 MB).
  • For remote writes, 64 KB/128 KB will suffice.
• File sizes
  • Using a large file size (2 GB) increased the throughput in some cases. The default was 64 MB.
• Caution: NFS may not return an error during fwrites; it may report the error only on an fsync.
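The three commit modes above can be sketched as one parameterized timing loop; "commit each" pays an fsync per block, while "commit at end" pays a single fsync and so best reflects sustained disk bandwidth. The sizes and file name below are illustrative assumptions, not the original benchmark's parameters:

```python
import os
import time

def timed_write(path, total, block, commit):
    """Write `total` bytes in `block`-sized chunks; return elapsed seconds.
    commit = "none" -> plain writes, never committed (disk buffer only)
    commit = "each" -> fsync after every write
    commit = "end"  -> a single fsync after the last write
    """
    data = b"\0" * block
    t0 = time.perf_counter()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    for _ in range(total // block):
        os.write(fd, data)
        if commit == "each":
            os.fsync(fd)  # force this block to disk before continuing
    if commit == "end":
        os.fsync(fd)      # one commit for the whole file
    os.close(fd)
    return time.perf_counter() - t0

for mode in ("none", "each", "end"):
    secs = timed_write("scratch.dat", 8 * 1024 * 1024, 64 * 1024, mode)
    print(f"{mode:>4}: {secs:.3f}s")
os.remove("scratch.dat")
```

Without the fsync calls, the "none" timing mostly measures the speed of the file-system buffer cache rather than the disk, which is why the slides treat commit-at-end as the meaningful number.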
Possible areas to investigate
• Could consider different disk subsystems, such as RAID.
• Analysis of parallel disk transfers using BBCP.
  • Initial tests have indicated that in cases where the disk is the limiting factor, a single thread is the best option.
• An algorithm to estimate disk speeds without using large writes.
  • Manufacturers’ specs lose meaning with network file systems, and even for local file systems with multiple disks.
Iperf QUICK Mode
• Problem
  • Current TCP applications cannot detect when they are out of slow-start.
  • Bandwidth-measurement applications therefore have to run for a considerable time to counter the effects of slow-start.
• Solution
  • Use Web100 to detect the end of slow-start.
  • Measure bandwidth for a short period after slow-start (say 1 s).
  • This should save about 90% of the estimation time and of the traffic generated.
Detecting end of Slow-start
• Outline
  • Determine a sampling period for the congestion window.
  • Detect the absence of exponential increase every RTT.
  • Handle pathological cases:
    • The connection may never get out of slow-start.
    • Multiple slow-starts.
    • The connection may have a very small bandwidth-delay product,
      • e.g. localhost transfers, with latency in nanoseconds.
• At present it handles Reno and Vegas.
  • It should handle Net100/Floyd stacks with minor modifications.
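The core of the detection step above — "the absence of exponential increase every RTT" — can be sketched as a test over RTT-spaced congestion-window samples (which in the real tool come from Web100 instrumentation). During slow-start the window roughly doubles each RTT; once growth falls below a threshold, slow-start is over. The growth threshold here is an illustrative assumption, not Iperf's actual value, and none of the pathological-case handling is shown:

```python
def out_of_slow_start(cwnd_samples, growth_threshold=1.5):
    """Return the index of the first RTT-spaced congestion-window sample
    at which growth stopped being roughly exponential (i.e. the window no
    longer grew by at least `growth_threshold`x per sample), or None if
    the connection never left slow-start within the trace."""
    for i in range(1, len(cwnd_samples)):
        if cwnd_samples[i] < growth_threshold * cwnd_samples[i - 1]:
            return i  # growth collapsed: slow-start has ended
    return None

# Hypothetical trace: the window doubles per RTT, then growth turns linear.
trace = [10, 20, 40, 80, 82, 84, 86]
print(out_of_slow_start(trace))  # → 4, the first post-slow-start sample
```

Returning None maps onto the deck's rule for the pathological case: if the window never stabilizes, QUICK mode results are not reported.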
The QUICK mode Algorithm
• Initialize the Iperf sockets and initialize a Web100 connection for the Iperf socket.
• Start the Web100 data-collection thread.
  • This will indicate when the connection is definitely out of slow-start.
• Detect the end of slow-start in the data-transfer thread.
  • If the congestion window does not stabilize, do NOT report QUICK mode results.
• Measure bandwidth for 1 s (or a user-specified time) after slow-start.
Salient results
• Slow-start can last
  • from 0.2 s on low-latency networks
  • up to 5 s on long-haul, high-bandwidth networks.
    • QUICK mode yields its maximum gains here.
    • Unless we use QUICK mode, we can never be sure the connection is out of slow-start.
• Throughputs differ from those of a 20 s Iperf run by less than 10%.
• Even performed some tests on dialup links (as receiver), with good results.
Web100 experiences
• A must-use tool (I’m a fan).
• The user APIs can be improved.
• Behaves well for a sampling time of 20 ms.
Possible areas to investigate
• Integrate with bandwidth tests.
• Perform tests with slow senders.
• Empirical estimates immediately after slow-start:
  • using the RTT and the rate of increase of the congestion window.
Links
• Disk: http://www-iepm.slac.stanford.edu/bw/disk_res.html
• Iperf QUICK mode: http://www-iepm.slac.stanford.edu/bw/iperf_res.html
• Documentation and results of tests with all IEPM-BW managed nodes are available from these links.
Other stuff…
• Miniperf is a small Iperf-like program written to
  • monitor user-specified Web100 variable(s),
  • allow setting window sizes and test times,
  • include parallel-thread functionality,
  • generate graphs (rate-based, sum-based),
  • and generate HTML.
• Created a single Iperf version that runs on IPv4/IPv6, with or without Web100.