High performance file I/O
Computer User Training Course 2004
Carsten Maaß, User Support
Topics • introduction to General Parallel File System (GPFS) • GPFS in the ECMWF High Performance Computing Facility (HPCF) • staging data to an HPCF cluster • retrieving results from an HPCF cluster • maximizing file I/O performance in FORTRAN • maximizing file I/O performance in C and C++
The General Parallel File System (GPFS) • each GPFS file system can be shared simultaneously by multiple nodes • can be configured for high availability • performance generally much better than locally attached disks (due to the parallel nature of the underlying file system) • provides Unix file system semantics with very minor exceptions • GPFS provides data coherency • modifications to file content by any node are immediately visible to all nodes
Metadata coherency • GPFS does NOT provide full metadata coherency • the stat system call might return incorrect values for atime, ctime and mtime (which can result in the ls command providing incorrect information) • metadata is coherent if all nodes have sync'd since the last metadata modification • Use gpfs_stat and/or gpfs_fstat if exact atime, ctime and mtime values are required.
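As a rough illustration of the last point, the C sketch below obtains exact timestamps with gpfs_stat. It assumes the gpfs.h prototype int gpfs_stat(const char *pathname, struct stat64 *buffer) documented for GPFS (the exact struct type can differ between platforms and GPFS releases) and linking against the GPFS library (e.g. -lgpfs).

    #define _LARGEFILE64_SOURCE      /* struct stat64 on Linux; not needed on AIX */
    #include <stdio.h>
    #include <sys/stat.h>
    #include <gpfs.h>                /* GPFS programming interface */

    int main(int argc, char *argv[])
    {
        struct stat64 sb;

        if (argc != 2) {
            fprintf(stderr, "usage: %s file\n", argv[0]);
            return 1;
        }
        /* unlike stat(), gpfs_stat() returns exact atime/mtime/ctime */
        if (gpfs_stat(argv[1], &sb) != 0) {
            perror("gpfs_stat");
            return 1;
        }
        printf("mtime = %ld\n", (long)sb.st_mtime);
        return 0;
    }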
File locking • GPFS supports all 'traditional' Unix file locking mechanisms: • lockf • flock • fcntl • Remember: Unix file locking is advisory, not mandatory
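A minimal C sketch of advisory locking with fcntl, one of the mechanisms listed above (lockf and flock offer similar functionality); the file name is illustrative only, and the same caveat applies: the lock only affects processes that also ask for it.

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("shared.dat", O_RDWR | O_CREAT, 0644);
        struct flock fl;

        if (fd < 0) { perror("open"); return 1; }

        fl.l_type   = F_WRLCK;   /* exclusive write lock              */
        fl.l_whence = SEEK_SET;
        fl.l_start  = 0;         /* lock the whole file ...           */
        fl.l_len    = 0;         /* ... (0 means "to end of file")    */

        if (fcntl(fd, F_SETLKW, &fl) != 0) {   /* block until granted */
            perror("fcntl");
            return 1;
        }

        /* ... update the file here; other cooperating processes that
         * also use fcntl locks will wait.  The lock is advisory only. */

        fl.l_type = F_UNLCK;
        fcntl(fd, F_SETLK, &fl);
        close(fd);
        return 0;
    }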
Comparison with NFS

  Concept                                                       GPFS   NFS
  file systems shared between multiple nodes                    yes    yes
  path names the same across different nodes                    yes*   yes*
  data coherent across all nodes                                yes    no
  different parts of a file can be simultaneously updated
    on different nodes                                          yes    no
  high performance                                              yes    no
  traditional Unix file locking semantics                       yes    no

  * if configured appropriately
GPFS in an HPCF cluster
[Diagram: p690 nodes (32 GB and 128 GB) acting as GPFS clients, connected through a dual plane SP Switch2 (~350 MB/sec per switch) to four I/O nodes acting as GPFS servers]
HPCF file system setup (1/2) • all HPCF file systems are of type GPFS • each GPFS file system is global • accessible from any node within a cluster • GPFS file systems are not shared between the two HPCF clusters
HPCF file system setup (2/2) • ECMWF's file system locations are described by environment variables • Do not rely on select/delete! • Clear your disk space as soon as possible!
Transferring data to/from an HPCF cluster • depending on the source and size of the data: • ecrcp from ecgate to hpca for larger transfers • NFS to facilitate commands like ls etc. on remote machines
Maximizing FORTRAN I/O performance • In roughly decreasing order of importance: • Use large record sizes • aim for at least 100K • multi-megabyte records are even better • use FORTRAN unformatted instead of formatted files • use FORTRAN direct files instead of sequential • reuse the kernel's I/O buffers: • if a file which was recently written sequentially is to be read, start at the end and work backwards
What's wrong with short record sizes? • each read call will typically result in: • sending a request message to an I/O node • waiting for the response • receiving the data from the I/O node • the time spent waiting for the response will be at least a few milliseconds regardless of the size of the request • data transfer rates for short requests can be as low as a few hundred thousand bytes per second • random access I/O on short records can be even slower!!! Reminder: At the HPCF's clock rate of 1.3 GHz, one millisecond spent waiting for data wastes over 1,000,000 CPU cycles!
FORTRAN direct I/O example 1

      real*8 a(1000,1000)
      . . .
      open (21, file='input.dat',access='DIRECT',recl=8000000,
     -      status='OLD')
      read(21,rec=1) a
      close(21)
      . . .
      open (22, file='output.dat',access='DIRECT',recl=8000000,
     -      status='NEW')
      write(22,rec=1) a
      close(22)
      . . .
FORTRAN direct I/O example 2

      real*8 a(40000), b(40000)
      open (21, file='input.dat',access='DIRECT',recl=320000,
     -      status='OLD')
      open (22, file='output.dat',access='DIRECT',recl=320000,
     -      status='NEW')
      . . .
      do i = 1, n
         read(21,rec=i) a
         do j = 1, 40000
            b(j) = ... a(j) ...
         enddo
         write(22,rec=i) b
      enddo
      close(21)
      close(22)
      . . .
Maximizing C and C++ I/O performance • In roughly decreasing order of importance: • use large length parameters in read and write calls • aim for at least 100K • bigger is almost always better • use binary data formats (i.e. avoid printf, scanf etc.) • use open, close, read, write, etc. (avoid stdio routines like fopen, fclose, fread and fwrite) • reuse the kernel's I/O buffers: • if a file which was recently written sequentially is to be read, start at the end and work backwards
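A minimal C sketch of the first points above: a single large write of binary data through the low-level I/O calls instead of many small formatted stdio requests. The file name and buffer size are illustrative only.

    #include <stdio.h>
    #include <stdlib.h>
    #include <fcntl.h>
    #include <unistd.h>

    #define NVALS (4 * 1024 * 1024)            /* 32 MB of doubles */

    int main(void)
    {
        double *a = malloc(NVALS * sizeof(double));
        int fd;
        ssize_t n;

        if (a == NULL) { perror("malloc"); return 1; }
        /* ... fill a[] ... */

        fd = open("output.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }

        /* one multi-megabyte request instead of many small ones */
        n = write(fd, a, NVALS * sizeof(double));
        if (n != (ssize_t)(NVALS * sizeof(double))) { perror("write"); return 1; }

        close(fd);
        free(a);
        return 0;
    }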
What's wrong with the stdio routines? • underlying block size is quite small • use setbuf or setvbuf to increase buffer sizes • writing in parallel using fwrite risks data corruption • Although . . . • stdio is still much better than making short read or write system calls • e.g. fgetc and fputc are buffered, so they beat one-byte read and write calls
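If stdio has to be used, setvbuf can at least enlarge the underlying buffer, as in the C sketch below; it must be called before the first read or write on the stream, and the 4 MB size is an illustrative choice.

    #include <stdio.h>
    #include <stdlib.h>

    #define BUFSIZE (4 * 1024 * 1024)

    int main(void)
    {
        FILE *fp  = fopen("output.dat", "wb");
        char *buf = malloc(BUFSIZE);

        if (fp == NULL || buf == NULL) { perror("setup"); return 1; }

        /* install a large, fully buffered stream buffer */
        if (setvbuf(fp, buf, _IOFBF, BUFSIZE) != 0) {
            perror("setvbuf");
            return 1;
        }

        /* ... fwrite large binary records here ... */

        fclose(fp);   /* flushes the buffer; free buf only after fclose */
        free(buf);
        return 0;
    }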
Unit summary • introduction to General Parallel File System (GPFS) • GPFS in the ECMWF High Performance Computing Facility (HPCF) • staging data to an HPCF cluster • retrieving results from an HPCF cluster • maximizing file I/O performance in FORTRAN • maximizing file I/O performance in C and C++