
File and Data Conversion



Presentation Transcript


  1. File and Data Conversion Jonathan Carter NERSC User Services jcarter@nersc.gov 510-486-7514

  2. Introduction
     • Converting files and data for use on the IBM SP
         • IBM uses IEEE data representation
         • Industry-standard Fortran unformatted file structure
     • Tools available on the Cray systems
     • Tools available on the IBM SP

  3. Demand for File Conversion
     • Currently: CTSS text files
         • ctou and rlib will be available on the IBM SP
     • After the Cray systems are decommissioned in October 2002:
         • Cray Fortran unformatted files
         • Cray C binary files

  4. Tools on the Cray Systems - FFIO
     • Flexible File I/O: a general system for specifying how data should be written or read
     • Can be used without recompiling or relinking (Fortran)
     • Can be changed at runtime
     • Various layers are available to convert both file structure and data
     • Controlled via the assign command

  5. assign Command
     • Can specify how I/O is done
         • On a Fortran unit basis: assign -F f77 u:10
         • On a filename basis: assign -F f77 f:filename
     • Common options
         • Clear assigns: assign -R
         • See current assigns in effect: assign -V

  6. Fortran Unformatted Sequential-access Files
     • Cray uses a vendor-specific format called COS blocked, or simply blocked
     • IBM (and most Unix vendors) use f77 blocking
     • Use the -F f77 option to have the FFIO f77 blocking layer used instead of the default COS blocking:
         assign -F f77 u:10
     • The T3E already uses IEEE arithmetic, so -F f77 is sufficient
         • Note that the default real and integer data types on the T3E are 64-bit
     • SV1 data needs to be converted, so an IEEE conversion layer is also needed
         • -N ieee performs basic conversion:
             assign -F f77 -N ieee f:filename
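To make f77 blocking concrete, here is a minimal sketch in modern Fortran that walks a sequential unformatted file and lists the data length of each record. The module and routine names are hypothetical (not part of FFIO or any NERSC tool). It assumes each record is framed by a 4-byte byte count before and after the data, stored in the reading machine's native byte order, and it does not handle records larger than 2 GiB. It relies on Fortran 2003 stream access, which the compilers of the time may not provide.

```fortran
module f77_records
  use iso_fortran_env, only: int32, iostat_end
  implicit none
contains
  ! Hypothetical helper: list the data length (in bytes) of every record
  ! in an f77-blocked sequential unformatted file.
  ! Assumptions: 4-byte length markers before and after each record,
  ! in native byte order; no record exceeds 2 GiB.
  subroutine list_records(path, lengths, nrec)
    character(len=*), intent(in) :: path
    integer, intent(out) :: lengths(:)   ! data length of each record
    integer, intent(out) :: nrec         ! number of records found
    integer :: u, ios, p
    integer(int32) :: head, tail
    open(newunit=u, file=path, access='stream', form='unformatted', &
         status='old', action='read')
    nrec = 0
    p = 1                                ! stream positions start at 1
    do
      read(u, pos=p, iostat=ios) head    ! leading length marker
      if (ios == iostat_end) exit
      if (ios /= 0) error stop 'read error'
      read(u, pos=p + 4 + head, iostat=ios) tail   ! trailing marker
      if (ios /= 0) error stop 'truncated record'
      if (tail /= head) error stop 'record markers do not match'
      nrec = nrec + 1
      if (nrec > size(lengths)) error stop 'lengths array too small'
      lengths(nrec) = head
      p = p + 4 + head + 4               ! skip marker, data, marker
    end do
    close(u)
  end subroutine list_records
end module f77_records
```

For example, after write(u) 1, 2, 3 with 4-byte default integers, the first reported length would be 12 bytes; the file itself holds 20 bytes for that record because of the two markers.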

  7. Fortran Unformatted Direct-access Files
     • Files are not blocked on Cray or IBM
     • Data conversion layers can be used as for sequential-access files on the SV1 machines:
         assign -N ieee u:20
     • T3E files don't need any conversion

  8. C Binary Files
     • Files are not blocked on Cray or IBM
     • FFIO conversion layer not easy to use
     • Use library routines such as cry2cri

  9. Using FFIO to Convert a File
     • Isolate the I/O statements for the file from the program to make a simple conversion program
     • Pair each read with a write
     • Use assign to have all written data converted, or use data conversion routines
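A minimal sketch of such a conversion program follows, written as a subroutine so it can be reused. The record layout (n, a, label) is a hypothetical example and must be changed to match the original WRITE statements; the module and routine names are also made up. The routine does no conversion itself: on the Cray, the output unit would be given an FFIO layer with assign before the program runs (for example assign -F f77 -N ieee u:20 on the SV1). The syntax is modern Fortran and may need adjusting for older compilers.

```fortran
module ffio_copy
  use iso_fortran_env, only: iostat_end
  implicit none
contains
  ! Hypothetical helper: copy every record from in_unit to out_unit,
  ! pairing each READ with a WRITE. The variable list (n, a, label) is an
  ! assumed layout and must match the statements that wrote the file.
  ! Any file-structure or data conversion is done by the FFIO layers
  ! attached to out_unit with assign, not by this code.
  subroutine copy_records(in_unit, out_unit, nrec)
    integer, intent(in)  :: in_unit, out_unit
    integer, intent(out) :: nrec          ! number of records copied
    integer :: n(50)
    real :: a(50)
    character(len=17) :: label
    integer :: ios
    nrec = 0
    do
      read(in_unit, iostat=ios) n, a, label
      if (ios == iostat_end) exit
      if (ios /= 0) error stop 'read error on input unit'
      write(out_unit) n, a, label
      nrec = nrec + 1
    end do
  end subroutine copy_records
end module ffio_copy
```

Keeping the READ and WRITE lists identical is what lets the FFIO layer see exactly the same data items, so it can convert each one.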

  10. Tools on the IBM SP - NCARU Library
     • Library developed by SCD at NCAR
     • Reads COS blocked files
     • Converts Cray data to IEEE data
     • Does not use the Fortran I/O API, so programs must be modified
     • Basic calls are crayopen, crayread, crayrew, crayback, crayclose
     • crayread can convert data if a record is composed of one data type only; otherwise the user must handle conversion explicitly
     • Conversion routines are ctodpf, ctospf, ctospi
     • Cray Fortran I/O sometimes inserts padding, which the user must handle explicitly
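To show what a routine such as ctodpf has to do for each word, here is a minimal sketch that converts one 64-bit Cray floating-point word to an IEEE double. It is not NCARU's implementation, and the names are hypothetical. It assumes the standard Cray vector-machine layout (sign bit, 15-bit exponent biased by 16384, 48-bit mantissa with an explicit leading bit) and that the word has already been read into a 64-bit integer. Because the 48-bit Cray mantissa fits in the 53-bit IEEE mantissa, in-range values convert exactly; exponents outside the IEEE double range are not checked.

```fortran
module cray_float
  use iso_fortran_env, only: int64, real64
  implicit none
contains
  ! Hypothetical converter for one 64-bit Cray floating-point word.
  ! Assumed Cray layout: bit 63 sign, bits 62-48 exponent biased by
  ! 16384 (octal 40000), bits 47-0 mantissa with explicit leading bit,
  ! value = 0.mantissa * 2**(exponent - 16384).
  ! No range check: values outside the IEEE double range are not handled.
  elemental function cray_to_ieee_double(w) result(x)
    integer(int64), intent(in) :: w
    real(real64) :: x
    integer(int64) :: expo, mant
    mant = ibits(w, 0, 48)             ! 48-bit mantissa
    expo = ibits(w, 48, 15)            ! 15-bit biased exponent
    if (mant == 0_int64) then
      x = 0.0_real64
      return
    end if
    ! Treat the mantissa as an integer, so scale by an extra 2**(-48)
    x = scale(real(mant, real64), int(expo - 16384_int64 - 48_int64))
    if (btest(w, 63)) x = -x           ! apply the sign bit
  end function cray_to_ieee_double
end module cray_float
```

For example, the Cray representation of 1.0 has exponent field 16385 and mantissa 2**47, which the function maps to 1.0.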

  11. Using the NCARU Library
     • To use:
         module load ncaru
         xlf -o a.out b.f $NCARU
     • Limitations
         • 2 GB limit for unblocked files
         • Currently no 64-bit address space support
         • Not thread-safe
         • No support for 128-bit data

  12. Dealing with Different Files
     • For Fortran unformatted sequential access, open with the blocked option to crayopen; for Fortran unformatted direct access, open with the unblocked option
     • If the file was written on the SV1, use the conversion option on read, or call the conversion routines directly
     • C binary files can be read by the unblocked I/O calls, or by the usual C I/O followed by the data conversion routines

  13. Records with Mixed Data Types
     • Read into a buffer and convert items one by one:
         real x(50)
         integer n(50)
         real*8 buffer(100)
         ! declare the NCARU functions as integer so implicit typing does not make them real
         integer crayopen, crayread
         ! open in blocked mode
         ifc = crayopen('filename',10,0)
         ! read record without converting
         nwds = crayread(ifc,buffer,100,0)
         ! convert data
         call ctospf(buffer,x,50)
         call ctospi(buffer(51),n,50)
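For the integer half of such a record, a ctospi-style conversion narrows 64-bit Cray integers to the 32-bit default integers of the IBM SP. The sketch below is hypothetical (not the NCARU routine) and assumes the words are already in a 64-bit integer array in host byte order. Values that do not fit in 32 bits are flagged and stored as 0, a policy chosen only for illustration.

```fortran
module cray_int
  use iso_fortran_env, only: int32, int64
  implicit none
contains
  ! Hypothetical converter: 64-bit two's-complement Cray integers to
  ! 32-bit integers. Out-of-range values set ok = .false. and are
  ! stored as 0 (an illustrative policy, not NCARU behavior).
  subroutine cray_to_int32(words, out, ok)
    integer(int64), intent(in)  :: words(:)
    integer(int32), intent(out) :: out(:)
    logical, intent(out) :: ok
    integer(int64), parameter :: hi = int(huge(1_int32), int64)
    integer(int64), parameter :: lo = -hi - 1_int64
    integer :: i
    ok = .true.
    do i = 1, size(words)
      if (words(i) < lo .or. words(i) > hi) then
        out(i) = 0
        ok = .false.
      else
        out(i) = int(words(i), int32)
      end if
    end do
  end subroutine cray_to_int32
end module cray_int
```

An unconverted real*8 buffer like the one above could be reinterpreted as 64-bit integers with transfer(buffer(51:100), 0_int64, 50) before calling it.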

  14. Data Padding
     • With Cray Fortran I/O, extra bytes are sometimes inserted into the user data.
     • Where padding occurs, bytes are inserted so that every datum 8 bytes long starts at a byte offset (measured from the beginning of the record) that is a multiple of 8. The end of the record is then padded so that the whole record length is a multiple of 8.
     • Padding occurs only if you have used character variables whose lengths are not a multiple of 8, or real*4 or integer*4 data on the T3E (on the SV1 systems, these types occupy 8 bytes).
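The padding rule can be written as a short routine. This sketch (hypothetical names, not a NERSC tool) computes the byte offset of each item in a record and the padded record length, given each item's element size and total size in bytes. It assumes, as the rule above states, that only items whose elements are 8 bytes long are aligned; a character item should be passed with its declared length as its element size.

```fortran
module cray_padding
  implicit none
contains
  ! Hypothetical layout calculator for one Cray Fortran record:
  !   - an item whose elements are 8 bytes long starts at a multiple of 8;
  !   - the whole record is padded to a multiple of 8 bytes.
  subroutine record_layout(elem_size, nbytes, offsets, reclen)
    integer, intent(in)  :: elem_size(:)  ! bytes per element of each item
    integer, intent(in)  :: nbytes(:)     ! total bytes of each item
    integer, intent(out) :: offsets(:)    ! start offset of each item
    integer, intent(out) :: reclen        ! padded record length in bytes
    integer :: i, pos
    pos = 0
    do i = 1, size(nbytes)
      if (elem_size(i) == 8) pos = round_up8(pos)   ! insert pad bytes
      offsets(i) = pos
      pos = pos + nbytes(i)
    end do
    reclen = round_up8(pos)                         ! pad end of record
  end subroutine record_layout

  ! Smallest multiple of 8 that is >= n (n >= 0)
  pure integer function round_up8(n)
    integer, intent(in) :: n
    round_up8 = ((n + 7) / 8) * 8
  end function round_up8
end module cray_padding
```

For a record holding label (character*17) followed by a and n (50 elements of 8 bytes each), it gives offsets 0, 24, and 424 and a record length of 824 bytes.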

  15. Example
     A Fortran record is written on an SV1:
         real a(50)
         integer n(50)
         character*17 label
         write(50) n, a, label
     The element lengths of n, a, and label are 8 bytes, 8 bytes, and 17 bytes respectively. Within the Fortran record, n starts at offset 0, a at offset 400, and label at offset 800. The only padding occurs at the end of the record, where 7 bytes are added to the 817 bytes of data to make the total record length 824 bytes, a multiple of 8.

  16. Example
     A Fortran record is written on an SV1:
         real a(50)
         integer n(50)
         character*17 label
         write(50) label, a, n
     Without padding, the offsets would be label at 0, a at 17, and n at 417. Since a has elements of length 8 bytes, it must start at an offset that is a multiple of 8 bytes; therefore a pad of 7 bytes is inserted between the end of label and the beginning of a. In the record that is written to the file, label is at offset 0, a at offset 24, and n at offset 424. The record then ends at byte 824, a multiple of 8, so no padding is needed at the end.

  17. Example
     A Fortran record is written on the T3E:
         real a(40), b(40)
         integer*4 n(13), m(13)
         character*12 label
         write(50) label, n, a, m, b
     The items have lengths: label 12 bytes, n and m 52 bytes each, and a and b 320 bytes each. Without padding, the offsets would be label at 0, n at 12, a at 64, m at 384, and b at 436. a and b must start at offsets that are multiples of 8 bytes; the offset of a is already correct, but 4 bytes must be inserted before b so that it starts at offset 440. b then ends at offset 760, a multiple of 8, so no padding is needed at the end of the record.

  18. crayconv Utility
     • crayconv automatically converts files written on the SV1 to an IBM-compatible format
     • Basic Fortran data types only
     • Sequential-access unformatted files only
     • Possible problem if the compiler option -Onofastint was used, or integer*8 was explicitly declared and written: integers over 2^46 are not interpreted correctly
     • Pad data is not removed
     • Extension to T3E data and to direct-access unformatted files is planned

  19. More Information
     • http://hpcf.nersc.gov/computers/SP/ffio.html (by Mike Stewart)
     • http://hpcf.nersc.gov/computers/crayretire.html
     • man ncaru
