File and Data Conversion
Jonathan Carter
NERSC User Services
jcarter@nersc.gov
510-486-7514
Introduction
• Converting files and data for use on the IBM SP
• IBM uses IEEE data representation
• Industry-standard Fortran unformatted file structure
• Tools available on the Cray systems
• Tools available on the IBM SP
Demand for File Conversion
• Currently: CTSS text files
  • ctou and rlib will be available on the IBM SP
• After the Cray systems are decommissioned in October 2002:
  • Cray Fortran unformatted files
  • Cray C binary files
Tools on the Cray Systems - FFIO
• Flexible File I/O (FFIO) is a general system for specifying how data is written or read
• Can be used without recompiling or relinking (Fortran)
• Can be changed at runtime
• Various layers are available to convert both file structure and data
• Controlled via the assign command
assign Command
• Specifies how I/O is done
  • On a Fortran unit basis: assign -F f77 u:10
  • On a filename basis: assign -F f77 f:filename
• Common options
  • Clear assigns: assign -R
  • See current assigns in effect: assign -V
Fortran Unformatted Sequential-access Files
• Cray uses a vendor-specific format called COS blocked, or simply blocked
• IBM (and most Unix vendors) use f77 blocking
• Use the -F f77 option so that the FFIO f77 blocking layer is used instead of the default COS blocking:
    assign -F f77 u:10
• The T3E already uses IEEE arithmetic, so -F f77 is sufficient
  • Note that the default real and integer data types on the T3E are 64 bit
• SV1 data needs to be converted, so an IEEE conversion layer is also needed
  • -N ieee performs basic conversion:
    assign -F f77 -N ieee f:filename
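To make the f77 blocking mentioned above concrete, here is a minimal Python sketch of the layout as it is commonly implemented on Unix systems: each record is framed by a leading and a trailing 4-byte length marker. The big-endian byte order and 4-byte marker size are assumptions that match typical IBM POWER conventions, and the helper names write_f77_record and read_f77_records are hypothetical, not part of FFIO or any NERSC tool.

```python
import io
import struct

def write_f77_record(stream, payload: bytes) -> None:
    """Frame one record with leading and trailing 4-byte length markers."""
    marker = struct.pack(">i", len(payload))  # assumed: big-endian, 4-byte marker
    stream.write(marker + payload + marker)

def read_f77_records(stream):
    """Return the payload of every f77-blocked record in the stream."""
    records = []
    while True:
        head = stream.read(4)
        if not head:
            break  # clean end of file
        if len(head) < 4:
            raise ValueError("truncated leading record marker")
        (length,) = struct.unpack(">i", head)
        payload = stream.read(length)
        tail = stream.read(4)
        if len(payload) != length or len(tail) < 4:
            raise ValueError("truncated record")
        if struct.unpack(">i", tail)[0] != length:
            raise ValueError("leading and trailing markers disagree")
        records.append(payload)
    return records

# Build a small two-record file in memory and read it back.
buf = io.BytesIO()
write_f77_record(buf, struct.pack(">3d", 1.0, 2.0, 3.0))  # three 8-byte reals
write_f77_record(buf, struct.pack(">2i", 7, 8))           # two 4-byte integers
buf.seek(0)
records = read_f77_records(buf)
print([len(r) for r in records])  # [24, 8]
```

A file that fails the marker check in a reader like this is a hint that it is still COS blocked and was not written through the f77 layer.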
Fortran Unformatted Direct-access Files
• Files are not blocked on Cray or IBM
• Data conversion layers can be used as for sequential-access files on the SV1 machines:
    assign -N ieee u:20
• T3E files don't need any conversion
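Because direct-access files carry no blocking, a record can be located by arithmetic alone. The following Python sketch assumes fixed-length records stored back to back, so that record number r (counting from 1, as in Fortran) starts at byte offset (r - 1) * reclen. The function name read_direct_record, the 16-byte record length, and the big-endian IEEE doubles are illustrative assumptions.

```python
import io
import struct

def read_direct_record(stream, recnum: int, reclen: int) -> bytes:
    """Read record `recnum` (1-based) from an unblocked fixed-length file."""
    stream.seek((recnum - 1) * reclen)
    data = stream.read(reclen)
    if len(data) != reclen:
        raise ValueError("record %d is missing or truncated" % recnum)
    return data

# Three records of two 8-byte reals each, with no markers between them.
reclen = 16
image = io.BytesIO(
    b"".join(struct.pack(">2d", float(i), float(10 * i)) for i in range(1, 4))
)
third = struct.unpack(">2d", read_direct_record(image, 3, reclen))
print(third)  # (3.0, 30.0)
```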
C Binary Files
• Files are not blocked on Cray or IBM
• The FFIO conversion layer is not easy to use
• Use library routines such as cry2cri
Using FFIO to Convert a File
• Isolate the I/O statements for the file from the program to make a simple conversion program
• Pair each read with a write
• Use assign to have all written data converted, or use data conversion routines
Tools on the IBM SP - NCARU Library
• Library developed by the SCD at NCAR
• Reads COS blocked files
• Converts Cray data to IEEE data
• Does not use the Fortran I/O API, so program modification is required
• Basic calls are crayopen, crayread, crayrew, crayback, crayclose
• Calls to crayread can convert data if the record is composed of one data type only; otherwise the user must handle conversion explicitly
• Conversion routines are ctodpf, ctospf, ctospi
• Cray Fortran I/O sometimes inserts padding, which the user must handle explicitly
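As a rough illustration of what converting Cray floating-point data to IEEE involves, the Python sketch below decodes 64-bit words in the classic Cray (SV1-style) format. It assumes the usual description of that format: 1 sign bit, a 15-bit exponent biased by 40000 octal, and a 48-bit mantissa with no hidden bit, stored big-endian. The function names are hypothetical, and this is not the NCARU implementation.

```python
import math
import struct

CRAY_EXPONENT_BIAS = 0o40000  # 16384, assumed bias of the 15-bit exponent
MANTISSA_BITS = 48            # assumed mantissa width, no hidden bit

def cray_word_to_float(word: int) -> float:
    """Decode one 64-bit Cray floating-point word into a Python float."""
    sign = -1.0 if (word >> 63) & 1 else 1.0
    exponent = (word >> MANTISSA_BITS) & 0x7FFF
    mantissa = word & ((1 << MANTISSA_BITS) - 1)
    if mantissa == 0:
        return 0.0
    # value = 0.mantissa * 2**(exponent - bias); the 48-bit mantissa fits
    # exactly in an IEEE double. ldexp raises OverflowError if the Cray
    # exponent is beyond the IEEE double range.
    return sign * math.ldexp(
        float(mantissa), exponent - CRAY_EXPONENT_BIAS - MANTISSA_BITS
    )

def cray_bytes_to_floats(raw: bytes) -> list:
    """Decode a buffer of big-endian 64-bit Cray words."""
    count = len(raw) // 8
    words = struct.unpack(">%dQ" % count, raw[: count * 8])
    return [cray_word_to_float(w) for w in words]

# 1.0 is 0.5 * 2**1 and -2.5 is -0.625 * 2**2 in this representation.
raw = bytes.fromhex("4001800000000000" "c002a00000000000" "0000000000000000")
print(cray_bytes_to_floats(raw))  # [1.0, -2.5, 0.0]
```

The Cray exponent range is much wider than that of IEEE double precision, so a production converter has to decide how to treat values that overflow or underflow; this sketch simply lets ldexp raise or flush to zero.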
Using the NCARU Library
• To use:
    module load ncaru
    xlf -o a.out b.f $NCARU
• Limitations
  • 2GB limit for unblocked files
  • Currently no 64 bit address space support
  • Not thread-safe
  • No support for 128 bit data
Dealing with Different Files
• Open with the blocked option to crayopen for Fortran unformatted sequential-access files; open with the unblocked option for Fortran unformatted direct-access files
• If the file was written on the SV1, use the conversion option on read, or call the conversion routines directly
• C binary files can be read by the unblocked I/O calls, or by the usual C I/O followed by data conversion routines
Records with Mixed Data Types
• Read into a buffer and convert items one by one
    real x(50)
    integer n(50)
    real*8 buffer(100)
    ! open in blocked mode
    ifc = crayopen('filename',10,0)
    ! read record without converting
    nwds = crayread(ifc,buffer,100,0)
    ! convert data
    call ctospf(buffer,x,50)
    call ctospi(buffer(51),n,50)
Data Padding
• With Cray Fortran I/O, extra bytes are sometimes inserted into the user data.
• Where padding occurs, bytes are inserted so that every datum of length 8 bytes starts at a byte offset, measured from the beginning of the record, that is a multiple of 8. The end of the record is then padded so that the whole record length is a multiple of 8.
• Padding only occurs if the record contains character variables whose lengths are not a multiple of 8, or real*4 or integer*4 data on the T3E (on the SV1 systems, 8 bytes are used for these types).
Example
A Fortran record is written on an SV1:
    real a(50)
    integer n(50)
    character*17 label
    write(50) n, a, label
Each element of n and a is 8 bytes long, and label is 17 bytes long. Within the Fortran record, n starts at offset 0, a at offset 400, and label at offset 800. The only padding is at the end of the record, where 7 bytes are added to bring the total record length from 817 to 824 bytes, which is a multiple of 8.
Example
A Fortran record is written on an SV1:
    real a(50)
    integer n(50)
    character*17 label
    write(50) label, n, a
Without padding, the alignments would be label at offset 0, n at offset 17, and a at offset 417. Since n has elements of length 8 bytes, it must be written at an offset that is a multiple of 8 bytes; therefore a pad of 7 bytes is inserted between the end of label and the beginning of n. In the record that is written to the file, the alignments are label at offset 0, n at offset 24, and a at offset 424.
Example
A Fortran record is written on the T3E:
    real a(40), b(40)
    integer*4 n(13), m(13)
    character*12 label
    write(50) label, n, a, m, b
The data has lengths: label 12 bytes, n and m 52 bytes each, and a and b 320 bytes each. Without padding, the alignments are label at offset 0, n at offset 12, a at offset 64, m at offset 384, and b at offset 436. a and b need to be at offsets that are a multiple of 8 bytes; the offset of a is already correct, but 4 bytes must be inserted before b, so that it starts at offset 440.
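The padding rule can be checked mechanically. The Python sketch below applies the rule as stated on the Data Padding slide (align every 8-byte datum to a multiple of 8, then pad the end of the record to a multiple of 8) and reproduces the offsets of the T3E record above. The function name padded_layout and the (name, element size, count) tuple format are illustrative choices, not part of any Cray or NERSC tool.

```python
def padded_layout(items):
    """Compute padded offsets for a record.

    items is a list of (name, element_size_in_bytes, element_count).
    Returns a dict of starting offsets and the padded record length.
    """
    offset = 0
    offsets = {}
    for name, elem_size, count in items:
        if elem_size == 8:
            offset += -offset % 8  # pad so the 8-byte data starts aligned
        offsets[name] = offset
        offset += elem_size * count
    offset += -offset % 8          # pad the end of the record
    return offsets, offset

# The T3E record: write(50) label, n, a, m, b
t3e_record = [
    ("label", 1, 12),  # character*12
    ("n", 4, 13),      # integer*4 n(13)
    ("a", 8, 40),      # default 64-bit real a(40)
    ("m", 4, 13),      # integer*4 m(13)
    ("b", 8, 40),      # default 64-bit real b(40)
]
offsets, record_length = padded_layout(t3e_record)
print(offsets)        # {'label': 0, 'n': 12, 'a': 64, 'm': 384, 'b': 440}
print(record_length)  # 760
```

When reading such a record through an unconverted buffer, offsets computed this way indicate where each variable actually begins.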
crayconv Utility
• crayconv automatically converts files written on the SV1 to an IBM-compatible format
• Basic Fortran data types only
• Sequential-access unformatted files only
• Possible problem if the compiler option -Onofastint was used, or integer*8 was explicitly declared and written: integers over 2^46 are not correctly interpreted
• Pad data is not removed
• Extension to T3E data and direct-access unformatted files is planned
More Information
• http://hpcf.nersc.gov/computers/SP/ffio.html (by Mike Stewart)
• http://hpcf.nersc.gov/computers/crayretire.html
• man ncaru