File System Benchmarking Advanced Research Computing
Outline • IO benchmarks • What is benchmarked • Micro-benchmarks • Synthetic benchmarks • Benchmark results for • Shelter NFS server, clients on hokiespeed • NetApp FAS 3240 server, clients on hokiespeed and blueridge
IO Benchmarks • Micro-benchmarks • Measure one basic operation in isolation • Read and write throughput: dd, IOzone, IOR • Metadata operations (file create, stat, remove): mdtest • Good for: tuning an operation, system acceptance • Synthetic benchmarks: • Mix of operations that model real applications • Useful if they are good models of real applications • Examples: • Kernel build, kernel tar and untar • NAS BT-IO
IO Benchmark pitfalls • Not measuring what you want to measure • results can be masked by various caching and buffering mechanisms • Examples of different behaviors • Sequential bandwidth vs random IO bandwidth • Direct IO bandwidth vs bandwidth in the presence of the page cache (in the latter case an fsync is needed so the data actually reaches the server) • Caching of file attributes: stat-ing a file on the same node on which the file was written
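As a minimal illustration of the page-cache pitfall, the dd commands below contrast direct IO with buffered IO; the mount point, file name, and sizes are placeholders rather than the values used in these runs:

    # Direct IO write: bypasses the client page cache, so the result reflects the storage path
    dd if=/dev/zero of=/mnt/nfs/testfile bs=1M count=1024 oflag=direct

    # Buffered write: without conv=fsync the measured rate mostly reflects the page cache
    dd if=/dev/zero of=/mnt/nfs/testfile bs=1M count=1024 conv=fsync

    # Before a read test, drop cached pages (requires root) so data is not served from memory
    sync && echo 3 > /proc/sys/vm/drop_caches
    dd if=/mnt/nfs/testfile of=/dev/null bs=1M iflag=direct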
What is benchmarked • What we measure is the combined effect of: • the native file system on the NFS server (shelter) • NFS server performance, which depends on factors such as enabling/disabling write delay and the number of server threads • Too few threads: clients retry several times • Too many threads: server thrashing • the network between the compute cluster and the NFS server • NFS client and mount options • Synchronous or asynchronous • Enable/disable attribute caching
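The client-side choices above map onto standard NFS mount options; a sketch for a Linux client (server name, export path, and thread count are illustrative, not the actual configuration):

    # sync forces synchronous writes; noac disables client-side attribute caching
    mount -t nfs -o vers=3,sync,noac shelter:/export/home /mnt/shelter

    # asynchronous variant with default attribute caching
    mount -t nfs -o vers=3,async shelter:/export/home /mnt/shelter

    # on a typical Linux NFS server, the thread count is set via RPCNFSDCOUNT (e.g. in /etc/sysconfig/nfs)
    # RPCNFSDCOUNT=64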
Micro-benchmarks • IOzone – measures read/write bandwidth • Long-standing benchmark with the ability to test multiple readers/writers • dd – measures read/write bandwidth • Tests file write/read • mdtest – metadata operations per second • file/directory create/stat/remove
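For reference, an IOzone invocation along these lines exercises sequential write and read from multiple clients (record size, file size, thread count, and paths are placeholders):

    # -i 0 = write/rewrite, -i 1 = read/reread; 1 MB records, 1 GB per file, 2 parallel threads
    iozone -i 0 -i 1 -r 1m -s 1g -t 2 -F /mnt/nfs/ioz.0 /mnt/nfs/ioz.1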
Mdtest – metadata test • Measures the rate of file/directory operations • create, stat, remove • Mdtest creates a tree of files and directories • Parameters used • tree depth: z = 1 • branching factor: b = 3 • number of files/directories per tree node: I = 256 • stat run by a different node than the one that created the files: N = 1 • number of repetitions of the run: i = 5
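These parameters translate into an mdtest command line roughly as follows; the MPI launcher, task count, and target directory are assumptions:

    # two MPI tasks, one per node, so that -N 1 makes each task stat/remove files created by a different node
    mpirun -np 2 mdtest -z 1 -b 3 -I 256 -i 5 -N 1 -d /mnt/nfs/mdtest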
Synthetic benchmarks • tar-untar-rm – measures time • Tests creation/deletion of a large number of small files • Tests file system metadata creation/deletion • NAS BT-IO – bandwidth and time spent doing IO • Solves a block-tridiagonal linear system arising from the discretization of the Navier-Stokes equations
Kernel source tar-untar-rm • Run on 1 to 32 nodes • Tarball size: 890 MB • Total directories: 4732 • Max directory depth: 10 • Total files: 75984 • Max file size: 919 kB • Files by size: <= 1 kB: 14490 • <= 10 kB: 40190 • <= 100 kB: 20518 • <= 1 MB: 786
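A sketch of the per-node sequence, assuming an uncompressed kernel source tarball named linux.tar in a per-node working directory on the NFS mount:

    cd /mnt/nfs/$HOSTNAME            # each node works in its own directory
    time tar xf linux.tar            # extraction: creates ~76k small files and ~4.7k directories
    time tar cf copy.tar linux       # tar creation: many small-file reads
    time rm -rf linux                # metadata-heavy removal of the whole tree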
NAS BT-I/O • Test mechanism • BT is a simulated CFD application that uses an implicit algorithm to solve the 3-dimensional compressible Navier-Stokes equations. The finite-difference solution to the problem is based on an Alternating Direction Implicit (ADI) approximate factorization that decouples the x, y and z dimensions. The resulting systems are block-tridiagonal with 5x5 blocks and are solved sequentially along each dimension. • BT-I/O is a test of different parallel I/O techniques in BT • Reference - http://www.nas.nasa.gov/assets/pdf/techreports/1999/nas-99-011.pdf • What it measures • Multiple cores doing I/O to a single large file (collective blocking MPI calls MPI_File_write_at_all and MPI_File_read_at_all) • I/O timing percentage, total data written, I/O data rate
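For orientation, the collective MPI-IO variant of BT is built and launched roughly as follows, assuming the standard NPB-MPI build conventions (the class, process count, and binary name are assumptions and may differ in a given NPB release):

    # SUBTYPE=full selects the collective MPI-IO version (MPI_File_write_at_all / MPI_File_read_at_all)
    make bt CLASS=C NPROCS=64 SUBTYPE=full
    mpirun -np 64 bin/bt.C.64.mpi_io_full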
dd throughput (MB/sec) • Run on 1 to 32 nodes • Two block sizes – 1 MB and 4 MB • Three file sizes – 1 GB, 5 GB, 15 GB
Server and Clients • NAS server: NetApp FAS 3240 • Clients running on two clusters • Hokiespeed • Blueridge • Hokiespeed: Linux kernel compile, tar-untar and rm tests have been run with: • nodes spread uniformly over racks, and • consecutive nodes (rack-packed) • Blueridge: Linux kernel compile, tar-untar, and rm tests have been run on consecutive nodes
IOzone read and write throughput (KB/s) [chart: Hokiespeed]
dd bandwidth (MB/sec) • Two node placement policies • packed on a rack • spread across racks • Direct IO was used • Two operations: read and write • Two block sizes – 1MB and 4MB • Three file sizes – 1GB, 5GB, 15 GB • Results show throughput in MB/s
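One point of the sweep looks roughly like this (the mount point and per-node file name are placeholders); each node writes and then reads its own file with direct IO:

    # 1 MB blocks, 5 GB file
    dd if=/dev/zero of=/mnt/nfs/dd.$HOSTNAME bs=1M count=5120 oflag=direct
    dd if=/mnt/nfs/dd.$HOSTNAME of=/dev/null bs=1M iflag=direct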
dd read throughput (MB/sec), 1 MB blocks [charts: Hokiespeed nodes packed, Hokiespeed nodes spread, BlueRidge nodes packed]
dd read throughput (MB/sec), 4 MB blocks [charts: Hokiespeed nodes packed, Hokiespeed nodes spread, BlueRidge nodes packed]
dd write throughput (MB/sec), 1 MB blocks [charts: Hokiespeed nodes packed, Hokiespeed nodes spread, BlueRidge nodes packed]
dd write throughput (MB/sec), 4 MB blocks [charts: Hokiespeed nodes packed, Hokiespeed nodes spread, BlueRidge nodes packed]
Linux Kernel tests • Two node placement policies • packed on a rack • spread across racks • Operations • Compile: make -j 12 • Tar creation and extraction • Remove directory tree • Results show execution time in seconds
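On each node the compile test is, in outline, the following (the source path and configuration step are assumptions; the slides only specify the -j value):

    cd /mnt/nfs/$HOSTNAME/linux
    make defconfig                   # assumed default configuration step
    time make -j 12                  # 12-way parallel build over NFS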
Linux Kernel compile time (sec) [charts: Hokiespeed nodes packed, Hokiespeed nodes spread, BlueRidge nodes packed]
Tar extraction time (sec) [charts: Hokiespeed nodes packed, Hokiespeed nodes spread, BlueRidge nodes packed]
Rm execution time (sec) [charts: Hokiespeed nodes packed, Hokiespeed nodes spread, BlueRidge nodes packed]
Uplink switch traffic, runs on hokiespeed [charts: nodes packed, nodes spread]
Mdtest file/directory create rate • IO ops/sec for mdtest -z 1 -b 3 -I 256 -i 10 -N 1 [charts: BlueRidge, Hokiespeed]
Mdtest file/directory remove rate • IO ops/sec for mdtest -z 1 -b 3 -I 256 -i 10 -N 1 [charts: BlueRidge, Hokiespeed]
Mdtest file/directory stat rate • IO ops/sec for mdtest -z 1 -b 3 -I 256 -i 10 -N 1 [charts: BlueRidge, Hokiespeed]
Uplink switch traffic for BT-IO on hokiespeed • The numbered boxes (1, 2, 3) mark the three NAS BT-IO runs • Red is write • Green is read
dd bandwidth (MB/sec) • Runs on BlueRidge • no special node placement policy • Direct IO was used • Two operations: read and write • Two block sizes – 1MB and 4MB • Three file sizes – 1GB, 5GB, 15 GB • Results show throughput in MB/s
dd read throughput (MB/sec), 1 MB blocks [charts: EMC Isilon, NetApp]
dd read throughput (MB/sec), 4 MB blocks [charts: Isilon, NetApp]
dd write throughput (MB/sec), 1 MB blocks [charts: Isilon, NetApp]
dd write throughput (MB/sec), 4 MB blocks [charts: Isilon, NetApp]
Linux Kernel tests • Runs on BlueRidge • no special node placement policy • Direct IO was used • Operations • Compile: make -j 12 • Tar creation and extraction • Remove directory tree • Results show execution time in seconds
Linux Kernel compile time (sec) [charts: Isilon, NetApp]
Tar creation time (sec) [charts: Isilon, NetApp]
Tar extraction time (sec) [charts: Isilon, NetApp]
Rm execution time (sec) [charts: Isilon, NetApp]