  1. Prof. Thomas Sterling, Prof. Hartmut Kaiser, Department of Computer Science, Louisiana State University, April 12th, 2011. HIGH PERFORMANCE COMPUTING: MODELS, METHODS, & MEANS. PARALLEL FILE I/O 3 / LIBRARIES 2

  2. Puzzle of the Day: I thought the following C program is perfectly valid (after reading about the comma operator in C). But there is a mistake in it; can you identify it?
#include <stdio.h>
int main()
{
    int a = 1, 2;
    printf("a : %d\n", a);
    return 0;
}

  3. Puzzle of the Day (solution): the corrected program reads
#include <stdio.h>
int main()
{
    int a = (1, 2);   /* a gets 2 */
    printf("a : %d\n", a);
    return 0;
}
The comma operator has the lowest precedence of all C operators, even lower than assignment. Without the parentheses, the comma in "int a = 1, 2;" is parsed as a declarator separator rather than as the comma operator, so the original program does not compile.

  4. Topics • Introduction • Scientific I/O Interface: netCDF • Scientific Data Package: HDF5 • Application domain specific libraries • SMP programming support: CILK, TBB

  5. NetCDF: Introduction • Stands for Network Common Data Form • Portable format to represent scientific data • Developed at the Unidata Program Center in Boulder, Colorado, with many contributions from the user community • Project page hosted by the Unidata program at the University Corporation for Atmospheric Research (UCAR): http://www.unidata.ucar.edu/software/netcdf/ • Provides a set of interfaces for array-oriented data access and a collection of data access libraries for C, Fortran (77 and 90), C++, Java, Perl, Python, and other languages • Available on UNIX and Windows platforms • Features a simple programming interface • Supports large data files (and 64-bit offsets) • Open source, freely available • Commonly used file extension is “.nc” (changed from “.cdf” to avoid confusion with other formats) • Current stable release is version 4.1.2 (released on March 29, 2011) • Used extensively by climate modeling, land and atmosphere, marine, naval data storage, satellite data processing, theoretical physics, and geological institutions, as well as commercial analysis firms, universities, and other research organizations in over 30 countries

  6. NetCDF Rationale • Facilitate the use of common datasets by distinct applications • Permit datasets to be transported between or shared by dissimilar computers transparently, i.e., without translation (automatic handling of different data types, endian-ness, etc.) • Reduce the programming effort usually spent interpreting formats • Reduce errors arising from misinterpreting data and ancillary data • Facilitate using output from one application as input to another • Establish an interface standard that simplifies the inclusion of new software into an already existing application set (originally: the Unidata system) • However: not another DBMS!

  7. Key Properties of NetCDF Format • Self-describing: a netCDF file includes information about the data it contains • Portable: files are accessible by computers that use different ways of representing and storing integers, floating-point numbers, and characters • Direct-access: enables efficient access to small subsets of a large dataset without the need to read through all preceding data • Appendable: additional data may be appended to a properly structured netCDF file without copying the dataset or redefining its structure • Sharable: one writer and multiple readers may simultaneously access the same netCDF file • Archivable: access to all earlier forms of netCDF data will be supported by current and future versions of the software

  8. NetCDF Dataset Building Blocks • Data in netCDF are represented as n-dimensional arrays, with n being 0, 1, 2, … (scalars are 0-dimensional arrays) • Array elements are of the same data type • Three basic entities: • Dimension: has name and length; one dimension per array may be UNLIMITED for unbounded arrays • Variable: identifies array of values of the same type (byte, character, short, int, float, or double) • In addition, coordinate variables may be named identically to dimensions, and by convention define physical coordinate set corresponding to that dimension • Attribute: provides additional information about a variable, or global properties of a dataset • There are established conventions for attribute names, e.g., unit, long_name, valid_range, etc. • Multiple attributes per dataset are allowed • The only kind of data structures supported by netCDF classic are collections of named arrays with attached vector attributes

  9. Common Data Form Language (CDL) • NetCDF uses CDL to describe its data model • CDL represents the information stored in binary netCDF files in a human-readable form, e.g.:
netcdf example_1 {  // example of CDL notation for a netCDF dataset
dimensions:         // dimension names and lengths are declared first
    lat = 5, lon = 10, level = 4, time = unlimited;
variables:          // variable types, names, shapes, attributes
    float temp(time,level,lat,lon);
        temp:long_name = "temperature";
        temp:units = "celsius";
    int lat(lat), lon(lon), level(level);
        lat:units = "degrees_north";
        lon:units = "degrees_east";
        level:units = "millibars";
    short time(time);
        time:units = "hours since 1996-1-1";
// global attributes
    :source = "Fictional Model Output";
data:               // optional data assignments
    level = 1000, 850, 700, 500;
    lat = 20, 30, 40, 50, 60;
    lon = -160,-140,-118,-96,-84,-52,-45,-35,-25,-15;
    time = 12;
}

  10. NetCDF Utilities • ncgen: takes input in CDL format and creates a netCDF file, or a C or Fortran program that creates a netCDF dataset
    ncgen [-b] [-o netcdf-file] [-c] [-f] [-k kind] [-x] [input-file]
• ncdump: generates the CDL text representation of a netCDF dataset on standard output, optionally including some or all variable data; output from ncdump is acceptable input to ncgen
    ncdump [-c|-h] [-v var1,…] [-b lang] [-f lang] [-l len] [-p fdig[,ddig]] [-n name] [-k] [input-file]

  11. NetCDF API: Create a Dataset
#include <netcdf.h>
...
int status;
int ncid;
...
status = nc_create("foo.nc", NC_NOCLOBBER, &ncid);
if (status != NC_NOERR) handle_error(status);

  12. NetCDF API: Open a Dataset
#include <netcdf.h>
...
int status;
int ncid;
...
status = nc_open("foo.nc", 0, &ncid);   /* 0 (== NC_NOWRITE) opens the dataset read-only */
if (status != NC_NOERR) handle_error(status);

  13. NetCDF API: Create a Dimension
#include <netcdf.h>
...
int status, ncid, latid, recid;
...
status = nc_create("foo.nc", NC_NOCLOBBER, &ncid);
if (status != NC_NOERR) handle_error(status);
...
status = nc_def_dim(ncid, "lat", 18L, &latid);
if (status != NC_NOERR) handle_error(status);
status = nc_def_dim(ncid, "rec", NC_UNLIMITED, &recid);
if (status != NC_NOERR) handle_error(status);

  14. NetCDF API: Create a Variable
#include <netcdf.h>
int status, ncid;                  /* error status and dataset ID */
int lat_dim, lon_dim, time_dim;    /* dimension IDs */
int rh_id, rh_dimids[3];           /* variable ID and shape */
...
status = nc_create("foo.nc", NC_NOCLOBBER, &ncid);
if (status != NC_NOERR) handle_error(status);
/* define dimensions */
status = nc_def_dim(ncid, "lat", 5L, &lat_dim);
if (status != NC_NOERR) handle_error(status);
status = nc_def_dim(ncid, "lon", 10L, &lon_dim);
if (status != NC_NOERR) handle_error(status);
status = nc_def_dim(ncid, "time", NC_UNLIMITED, &time_dim);
if (status != NC_NOERR) handle_error(status);
/* define variable */
rh_dimids[0] = time_dim;
rh_dimids[1] = lat_dim;
rh_dimids[2] = lon_dim;
status = nc_def_var(ncid, "rh", NC_DOUBLE, 3, rh_dimids, &rh_id);
if (status != NC_NOERR) handle_error(status);
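
The define-mode examples cover dimensions and variables; attributes are written with the nc_put_att_* family, which these slides do not show explicitly. A minimal sketch continuing the example above (the attribute names and values here are illustrative):
#include <string.h>   /* for strlen */
...
/* still in define mode: attach a "units" attribute to variable rh */
status = nc_put_att_text(ncid, rh_id, "units", strlen("percent"), "percent");
if (status != NC_NOERR) handle_error(status);
/* global attributes are attached to NC_GLOBAL instead of a variable ID */
status = nc_put_att_text(ncid, NC_GLOBAL, "title", strlen("relative humidity example"), "relative humidity example");
if (status != NC_NOERR) handle_error(status);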

  15. NetCDF API: Leave Define Mode
#include <netcdf.h>
...
int status;
int ncid;
...
status = nc_create("foo.nc", NC_NOCLOBBER, &ncid);
if (status != NC_NOERR) handle_error(status);
...
/* create dimensions, variables, attributes */
...
status = nc_enddef(ncid);   /* leave define mode */
if (status != NC_NOERR) handle_error(status);

  16. NetCDF API: Querying Variable Information
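
The table of inquiry functions on this slide did not survive the transcript. For reference, the netCDF C interface provides, among others:
int nc_inq_varid   (int ncid, const char *name, int *varidp);      /* look up a variable ID by name */
int nc_inq_var     (int ncid, int varid, char *name, nc_type *xtypep,
                    int *ndimsp, int *dimidsp, int *nattsp);        /* all metadata in one call */
int nc_inq_varname (int ncid, int varid, char *name);
int nc_inq_vartype (int ncid, int varid, nc_type *xtypep);
int nc_inq_varndims(int ncid, int varid, int *ndimsp);
int nc_inq_vardimid(int ncid, int varid, int *dimidsp);
int nc_inq_varnatts(int ncid, int varid, int *nattsp);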

  17. NetCDF API: Variable Information (Example)
#include <netcdf.h>
...
int status;                        /* error status */
int ncid;                          /* netCDF ID */
int rh_id;                         /* variable ID */
nc_type rh_type;                   /* variable type */
int rh_ndims;                      /* number of dims */
int rh_dimids[NC_MAX_VAR_DIMS];    /* dimension ids */
int rh_natts;                      /* number of attributes */
...
status = nc_open("foo.nc", NC_NOWRITE, &ncid);
if (status != NC_NOERR) handle_error(status);
...
status = nc_inq_varid(ncid, "rh", &rh_id);
if (status != NC_NOERR) handle_error(status);
/* we don't need the name, since we already know it */
status = nc_inq_var(ncid, rh_id, 0, &rh_type, &rh_ndims, rh_dimids, &rh_natts);
if (status != NC_NOERR) handle_error(status);

  18. NetCDF API: Read a Variable
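
The table of read functions on this slide was also lost in the transcript. For each external type the C interface offers several access granularities; for double-precision data these include, for example:
int nc_get_var1_double(int ncid, int varid, const size_t *indexp, double *ip);      /* single element */
int nc_get_var_double (int ncid, int varid, double *ip);                             /* entire variable */
int nc_get_vara_double(int ncid, int varid, const size_t *startp,
                       const size_t *countp, double *ip);                            /* contiguous subarray */
int nc_get_vars_double(int ncid, int varid, const size_t *startp,
                       const size_t *countp, const ptrdiff_t *stridep, double *ip);  /* strided subarray */
The nc_put_* family used for writing (slide 20) mirrors these names.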

  19. NetCDF API: Read a Variable (Example)
#include <netcdf.h>
...
#define TIMES 3
#define LATS 5
#define LONS 10
int status;                        /* error status */
int ncid;                          /* netCDF ID */
int rh_id;                         /* variable ID */
double rh_vals[TIMES*LATS*LONS];   /* array to hold values */
...
status = nc_open("foo.nc", NC_NOWRITE, &ncid);
if (status != NC_NOERR) handle_error(status);
...
status = nc_inq_varid(ncid, "rh", &rh_id);
if (status != NC_NOERR) handle_error(status);
...
/* read values from netCDF variable */
status = nc_get_var_double(ncid, rh_id, rh_vals);
if (status != NC_NOERR) handle_error(status);

  20. NetCDF API: Write a Variable

  21. NetCDF API: Write a Variable (Example)
#include <netcdf.h>
...
#define TIMES 3
#define LATS 5
#define LONS 10
int status;                        /* error status */
int ncid;                          /* netCDF ID */
int rh_id;                         /* variable ID */
double rh_vals[TIMES*LATS*LONS];   /* array to hold values */
int i;
...
status = nc_open("foo.nc", NC_WRITE, &ncid);
if (status != NC_NOERR) handle_error(status);
...
status = nc_inq_varid(ncid, "rh", &rh_id);
if (status != NC_NOERR) handle_error(status);
...
for (i = 0; i < TIMES*LATS*LONS; i++)
    rh_vals[i] = 0.5;
/* write values into netCDF variable */
status = nc_put_var_double(ncid, rh_id, rh_vals);
if (status != NC_NOERR) handle_error(status);

  22. NetCDF API: Close a Dataset
#include <netcdf.h>
...
int status;
int ncid;
...
status = nc_create("foo.nc", NC_NOCLOBBER, &ncid);
if (status != NC_NOERR) handle_error(status);
...
/* create dimensions, variables, attributes */
...
status = nc_close(ncid);   /* close netCDF dataset */
if (status != NC_NOERR) handle_error(status);

  23. Parallel NetCDF Source: http://www-unix.mcs.anl.gov/parallel-netcdf/pnetcdf-sc2003.pdf Possible usage scenarios on parallel computers • Serial netCDF to access single files from a single process • Multiple files accessed concurrently and independently through serial netCDF API • Parallel netCDF API to access single files cooperatively or collectively

  24. PnetCDF Implementation Source: http://www-unix.mcs.anl.gov/parallel-netcdf/pnetcdf-sc2003.pdf • Available from: http://trac.mcs.anl.gov/projects/parallel-netcdf • Library layer between user space and file system space • Processes parallel I/O requests from compute nodes, optimizes them, and passes them down to the MPI-IO library • Advantages: • Optimized for the netCDF file format • Regular and predictable data patterns in netCDF compatible with MPI-IO interface • Low overhead of header I/O (local header copies viable) • Well defined metadata creation phase • no need for collective I/O when accessing individual objects • Disadvantages: • No hierarchical data layout • Additions of data and header extensions are costly after file creation due to linear layout order • No support for combining of multiple files in memory (like HDF5 software mounting) • NetCDF source required for installation

  25. PnetCDF Sample Calling Sequence Source: http://www-unix.mcs.anl.gov/parallel-netcdf/sc03_present.pdf
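
The calling-sequence figure above is available only through its source link. A minimal collective-write sketch with the PnetCDF C API conveys the same flow; the decomposition (4 processes, one block of rows each), file and variable names are illustrative, and error handling (checking return codes against NC_NOERR, as in the serial examples) is omitted:
#include <mpi.h>
#include <pnetcdf.h>
#define NPROCS     4    /* illustrative: run with 4 processes */
#define LOCAL_ROWS 2
#define COLS       10

int main(int argc, char **argv) {
    int rank, ncid, dimids[2], varid, i, j;
    MPI_Offset start[2], count[2];
    double local[LOCAL_ROWS][COLS];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    for (i = 0; i < LOCAL_ROWS; i++)
        for (j = 0; j < COLS; j++)
            local[i][j] = rank;                 /* fill the local block */

    /* collective create: every process passes the same path and flags */
    ncmpi_create(MPI_COMM_WORLD, "foo.nc", NC_CLOBBER, MPI_INFO_NULL, &ncid);
    /* define mode: identical definitions on all processes */
    ncmpi_def_dim(ncid, "rows", NPROCS * LOCAL_ROWS, &dimids[0]);
    ncmpi_def_dim(ncid, "cols", COLS, &dimids[1]);
    ncmpi_def_var(ncid, "data", NC_DOUBLE, 2, dimids, &varid);
    ncmpi_enddef(ncid);
    /* data mode: each process writes its own block with a collective call */
    start[0] = rank * LOCAL_ROWS;  start[1] = 0;
    count[0] = LOCAL_ROWS;         count[1] = COLS;
    ncmpi_put_vara_double_all(ncid, varid, start, count, &local[0][0]);
    ncmpi_close(ncid);
    MPI_Finalize();
    return 0;
}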

  26. Topics • Scientific I/O Interface: netCDF • Scientific Data Package: HDF5 • Application domain specific libraries • SMP programming support: CILK, TBB

  27. Introduction to HDF5 • Acronym for Hierarchical Data Format, a portable, freely distributable, and well supported library, file format, and set of utilities to manipulate it • Explicitly designed for use with scientific data and applications • Initial HDF version was created at NCSA/University of Illinois at Urbana-Champaign in 1988 • First revision in widespread use was HDF4 • Main HDF features include: • Versatility: supports different data models and associated metadata • Self-describing: allows an application to interpret the structure and contents of a file without any extraneous information • Flexibility: permits mixing and grouping various objects together in one file in a user-defined hierarchy • Extensibility: accommodates new data models, added both by the users and developers • Portability: can be shared across different platforms without preprocessing or modifications • HDF5 is the most recent incarnation of the format, adding support for new type and data models, parallel I/O and streaming, and removing a number of existing restrictions (maximal file size, number of objects per file, flexibility of type use, storage management configurability, etc.), as well as improving the performance

  28. HDF5 File Layout (figures contrasting the low-level organization with the user’s view) • Major object classes: groups and datasets • Namespace resembles a file system directory hierarchy (groups ≡ directories, datasets ≡ files) • Alias creation supported through links (both soft and hard) • Mounting of sub-hierarchies is possible

  29. HDF5 API & Tools • Command-line utilities: • h5cc, h5c++, h5fc: C, C++ and Fortran compiler wrappers • h5redeploy: updates compiler tools after installation in a new location • h5ls, h5dump: list hierarchy and contents of an HDF5 file • h5diff: compares two HDF5 files • h5repack, h5repart: rearranges or repartitions a file • h5toh4, h4toh5: convert between HDF5 and HDF4 formats • h5import: imports data into an HDF5 file • gif2h5, h52gif: convert image data between GIF and HDF5 formats • Library functionality is grouped by function name prefix: H5 (general purpose functions), H5A (attribute interface), H5D (dataset manipulation), H5E (error handling), H5F (file interface), H5G (group creation and access), H5I (object identifiers), H5P (property lists), H5R (references), H5S (dataspace definition), H5T (datatype manipulation), H5Z (inline data filters and compression)

  30. Basic HDF5 Concepts • Group • Structure containing zero or more HDF5 objects (possibly other groups) • Provides a mechanism for mapping a name (path) to an object • “Root” group is a logical container of all other objects in a file • Dataset • A named array of data elements (possibly multi-dimensional) • Its representation in the HDF5 file is specified through the associated datatype and dataspace parameters • Dataspace • Defines the dimensionality of a dataset (rank and dimension sizes) • Determines the effective subset of data to be stored or retrieved in subsequent file operations (aka selection) • Datatype • Describes an atomically accessed element of a dataset • Permits construction of derived (compound) types, such as arrays, records, enumerations • Influences conversion of numeric values between different platforms or implementations • Attribute • A small, user-defined structure attached to a group, dataset or named datatype, providing additional information
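
The sample program on slide 35 opens an existing file and dataset; the creation side of the same concepts (file, group, dataspace, dataset) looks roughly like the sketch below. It uses the same 1.6-style C API as that sample; the group name /results is illustrative and error checking is omitted:
#include "hdf5.h"
#define FILE "dset.h5"

int main() {
    hid_t file_id, group_id, dataspace_id, dataset_id;
    hsize_t dims[2] = {4, 6};               /* rank-2 dataspace: 4 x 6 elements */

    /* create a new file, overwriting any existing one */
    file_id = H5Fcreate(FILE, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    /* create a group to illustrate the directory-like namespace
       (1.6-style signature: the third argument is a size hint, 0 = default) */
    group_id = H5Gcreate(file_id, "/results", 0);
    /* dataspace: dimensionality and sizes of the dataset */
    dataspace_id = H5Screate_simple(2, dims, NULL);
    /* dataset: name + datatype + dataspace (stored as big-endian 32-bit integers) */
    dataset_id = H5Dcreate(file_id, "/dset", H5T_STD_I32BE, dataspace_id, H5P_DEFAULT);

    H5Dclose(dataset_id);
    H5Sclose(dataspace_id);
    H5Gclose(group_id);
    H5Fclose(file_id);
    return 0;
}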

  31. HDF5 Spatial Subset Examples Source: http://hdf.ncsa.uiuc.edu/HDF5/RD100-2002/All_About_HDF5.pdf
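
The subsetting figures are available only through the source link above. In code, a spatial subset is expressed as a hyperslab selection on the file dataspace plus a matching memory dataspace. A rough sketch, reusing the dset.h5 file from the other examples (offsets and sizes are illustrative, error checking omitted); reading the integer dataset into native doubles also illustrates the type conversion of slide 34:
#include "hdf5.h"

int main() {
    hid_t file_id, dataset_id, filespace, memspace;
    hsize_t offset[2] = {1, 2};     /* start of the 2 x 3 subset in the file */
    hsize_t count[2]  = {2, 3};
    double subset[2][3];

    file_id    = H5Fopen("dset.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
    dataset_id = H5Dopen(file_id, "/dset");                /* 1.6-style H5Dopen */
    /* select a 2 x 3 hyperslab in the file dataspace */
    filespace = H5Dget_space(dataset_id);
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, offset, NULL, count, NULL);
    /* memory dataspace describing the in-memory buffer */
    memspace = H5Screate_simple(2, count, NULL);
    /* read only the selected elements; conversion to native double happens here */
    H5Dread(dataset_id, H5T_NATIVE_DOUBLE, memspace, filespace, H5P_DEFAULT, subset);

    H5Sclose(memspace);
    H5Sclose(filespace);
    H5Dclose(dataset_id);
    H5Fclose(file_id);
    return 0;
}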

  32. HDF5 Virtual File Layer Source: http://hdf.ncsa.uiuc.edu/HDF5/RD100-2002/All_About_HDF5.pdf Developed to cope with large number of available storage subsystem variations Permits custom file driver implementations and related optimizations

  33. Overview of Data Storage Options Source: http://hdf.ncsa.uiuc.edu/HDF5/RD100-2002/All_About_HDF5.pdf
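
The storage-options figure is likewise available only through its source. The layout choice (contiguous, chunked, compact) is made through a dataset creation property list; a hedged sketch of a chunked, gzip-compressed dataset follows (file name, dataset name and sizes are illustrative; 1.6-style H5Dcreate; assumes the deflate filter is available in the HDF5 build):
#include "hdf5.h"

int main() {
    hid_t file_id, space_id, dcpl_id, dset_id;
    hsize_t dims[2]  = {1024, 1024};
    hsize_t chunk[2] = {64, 64};            /* 64 x 64 chunks */

    file_id  = H5Fcreate("chunked.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    space_id = H5Screate_simple(2, dims, NULL);
    /* dataset creation property list selects the storage layout */
    dcpl_id = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl_id, 2, chunk);        /* chunked layout */
    H5Pset_deflate(dcpl_id, 6);             /* gzip filter, compression level 6 */
    dset_id = H5Dcreate(file_id, "/big", H5T_NATIVE_FLOAT, space_id, dcpl_id);

    H5Dclose(dset_id);
    H5Pclose(dcpl_id);
    H5Sclose(space_id);
    H5Fclose(file_id);
    return 0;
}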

  34. Simultaneous Spatial and Type Transformation Example Source: http://hdf.ncsa.uiuc.edu/HDF5/RD100-2002/All_About_HDF5.pdf

  35. Simple HDF5 Code Example
/* Writing and reading an existing dataset. */
#include "hdf5.h"
#define FILE "dset.h5"

int main() {
    hid_t file_id, dataset_id;   /* identifiers */
    herr_t status;
    int i, j, dset_data[4][6];

    /* Initialize the dataset. */
    for (i = 0; i < 4; i++)
        for (j = 0; j < 6; j++)
            dset_data[i][j] = i * 6 + j + 1;

    /* Open an existing file. */
    file_id = H5Fopen(FILE, H5F_ACC_RDWR, H5P_DEFAULT);
    /* Open an existing dataset. */
    dataset_id = H5Dopen(file_id, "/dset");
    /* Write the dataset. */
    status = H5Dwrite(dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_data);
    /* Read it back. */
    status = H5Dread(dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_data);
    /* Close the dataset. */
    status = H5Dclose(dataset_id);
    /* Close the file. */
    status = H5Fclose(file_id);
}

  36. Parallel HDF5 • Relies on MPI-IO as the file layer driver • Uses MPI for internal communications • Most of the functionality is controlled through property lists (requires minimal HDF5 interface changes) • Supports both individual and collective file access • Three raw data storage layouts: contiguous, chunked and compact • Enables additional optimizations through derived MPI datatypes (esp. for regular collective accesses) • Limitations: • chunked storage with overlapping chunks (results are non-deterministic) • read-only compression • writes with variable-length datatypes not supported
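
Because the parallel behavior is driven almost entirely by property lists, the typical setup adds only a few calls to an otherwise ordinary HDF5 program. A minimal sketch, assuming an HDF5 build with parallel (MPI-IO) support; the file name is illustrative and error checking is omitted:
#include <mpi.h>
#include "hdf5.h"

int main(int argc, char **argv) {
    hid_t fapl_id, file_id, dxpl_id;

    MPI_Init(&argc, &argv);
    /* file access property list: route HDF5 through the MPI-IO driver */
    fapl_id = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl_id, MPI_COMM_WORLD, MPI_INFO_NULL);
    file_id = H5Fcreate("parallel.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl_id);

    /* dataset transfer property list: request collective rather than independent I/O */
    dxpl_id = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl_id, H5FD_MPIO_COLLECTIVE);
    /* ... create dataspace/dataset as usual and pass dxpl_id to H5Dwrite/H5Dread ... */

    H5Pclose(dxpl_id);
    H5Pclose(fapl_id);
    H5Fclose(file_id);
    MPI_Finalize();
    return 0;
}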

  37. Topics • Scientific I/O Interface: netCDF • Scientific Data Package: HDF5 • Application domain specific libraries • SMP programming support: CILK, TBB

  38. Application domains • Linear algebra • BLAS, ATLAS, LAPACK, ScaLAPACK, SLATEC, PIM • Ordinary and Partial Differential Equations • PETSc • Mesh Manipulation and Load Balancing • METIS, ParMETIS, CHACO, JOSTLE, PARTY • Graph Manipulation • Boost.Graph library • Vector/Signal/Image Processing • VSIPL, PESSL (IBM's Parallel Engineering and Scientific Subroutine Library) • General Parallelization • MPI, pthreads • Other Domain Specific Libraries • NAMD, NWChem, Fluent, Gaussian, LS-DYNA

  39. Application Domain Overview • Linear Algebra Libraries • Provide optimized methods for constructing sets of linear equations, performing operations on them (matrix-matrix products, matrix-vector products) and solving them (factoring, forward & backward substitution) • Commonly used libraries include BLAS, ATLAS, LAPACK, ScaLAPACK, PLAPACK • PDE Solvers • General-purpose, parallel numerical PDE libraries • Usual toolsets include manipulation of sparse data structures, iterative linear system solvers, preconditioners, nonlinear solvers and time-stepping methods • Commonly used libraries for solving PDEs include SAMRAI, PETSc, PARASOL, Overture, among others

  40. Application Domain Overview • Mesh Manipulation and Load Balancing • These libraries help partition meshes into roughly equal parts across processors, thereby balancing the workload while minimizing the size of separators and communication costs • Commonly used libraries for this purpose include METIS, ParMETIS, Chaco and JOSTLE, among others • Other packages: • FFTW: a highly optimized Fourier transform package including both real and complex multidimensional transforms in sequential, multithreaded, and parallel versions • NAMD: molecular dynamics library available for Unix/Linux, Windows, OS X • Fluent: computational fluid dynamics package, used for such applications as environment control systems, propulsion, reactor modeling, etc.

  41. CBLAS Matrix Multiplication
#include <stdio.h>
#include <cblas.h>

#define N (sizeof(x)/sizeof(x[0]))
#define M ((sizeof(A)/sizeof(A[0]))/(N))

int main(int argc, char **argv) {
    unsigned int i;
    double A[] = {  1.0,  0.1, 0.01,  0.0,  0.0,
                   10.0,  1.0,  0.1, 0.01,  0.0,
                    0.0, 10.0,  1.0,  0.1, 0.01,
                    0.0,  0.0, 10.0,  1.0,  0.1 };
    double x[] = { 1, 2, 3, 4, 5 }, y[M];

    /* y = 2.0*A*x + 0.0*y */
    cblas_dgemv(CblasRowMajor, CblasNoTrans, M, N, 2.0, A, N, x, 1, 0.0, y, 1);

    /* print out the result */
    for (i = 0; i < M; i++)
        printf("y[%d] = %6.2lf\n", i, y[i]);
}
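
Slide 39 also mentions matrix-matrix products; the matrix-matrix counterpart of the dgemv call above is cblas_dgemm. A small illustrative sketch, not from the original slides (matrix sizes and values chosen only for demonstration):
#include <stdio.h>
#include <cblas.h>

int main(void) {
    /* C = 1.0 * A(2x3) * B(3x2) + 0.0 * C, row-major storage */
    double A[2*3] = { 1,  2,  3,
                      4,  5,  6 };
    double B[3*2] = { 7,  8,
                      9, 10,
                     11, 12 };
    double C[2*2];
    int i, j;

    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                2, 2, 3, 1.0, A, 3, B, 2, 0.0, C, 2);

    for (i = 0; i < 2; i++)
        for (j = 0; j < 2; j++)
            printf("C[%d][%d] = %6.2lf\n", i, j, C[i*2 + j]);
    return 0;
}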

  42. uBLAS Matrix Multiplication 1
#include <iostream>
#include <boost/numeric/ublas/matrix.hpp>
#include <boost/numeric/ublas/vector.hpp>

using namespace boost::numeric;

int main(int argc, char **argv) {
    ublas::matrix<double> A(4, 5);
    A(0, 0) = 1.0;  A(0, 1) = 0.1;  A(0, 2) = 0.01; A(0, 3) = 0.0;  A(0, 4) = 0.0;
    A(1, 0) = 10.0; A(1, 1) = 1.0;  A(1, 2) = 0.1;  A(1, 3) = 0.01; A(1, 4) = 0.0;
    A(2, 0) = 0.0;  A(2, 1) = 10.0; A(2, 2) = 1.0;  A(2, 3) = 0.1;  A(2, 4) = 0.01;
    A(3, 0) = 0.0;  A(3, 1) = 0.0;  A(3, 2) = 10.0; A(3, 3) = 1.0;  A(3, 4) = 0.1;

    ublas::vector<double> x(5);
    x[0] = 1; x[1] = 2; x[2] = 3; x[3] = 4; x[4] = 5;

    // y = 2.0*A*x
    ublas::vector<double> y = prod(2.0*A, x);

    for (unsigned int i = 0; i < y.size(); ++i)
        std::cout << "y[" << i << "] = " << y[i] << std::endl;
    return 0;
}

  43. uBLAS Matrix Multiplication 2
#include <iostream>
#include <boost/numeric/ublas/banded.hpp>
#include <boost/numeric/ublas/vector.hpp>

using namespace boost::numeric;

int main(int argc, char **argv) {
    ublas::banded_matrix<double> A(4, 5, 1, 2);   // 1 sub-diagonal, 2 super-diagonals
    A(0, 0) = 1.0;  A(0, 1) = 0.1;  A(0, 2) = 0.01;
    A(1, 0) = 10.0; A(1, 1) = 1.0;  A(1, 2) = 0.1;  A(1, 3) = 0.01;
    A(2, 1) = 10.0; A(2, 2) = 1.0;  A(2, 3) = 0.1;  A(2, 4) = 0.01;
    A(3, 2) = 10.0; A(3, 3) = 1.0;  A(3, 4) = 0.1;

    ublas::vector<double> x(5);
    x[0] = 1; x[1] = 2; x[2] = 3; x[3] = 4; x[4] = 5;

    // y = 2.0*A*x
    ublas::vector<double> y = prod(2.0*A, x);

    for (unsigned int i = 0; i < y.size(); ++i)
        std::cout << "y[" << i << "] = " << y[i] << std::endl;
    return 0;
}

  44. Newmat Matrix Multiplication
#include <iostream>
#include <newmatap.h>   // need matrix applications

using namespace NEWMAT;

int main(int argc, char **argv) {
    Matrix A(4, 5);
    double d[] = {  1.0,  0.1, 0.01,  0.0,  0.0,
                   10.0,  1.0,  0.1, 0.01,  0.0,
                    0.0, 10.0,  1.0,  0.1, 0.01,
                    0.0,  0.0, 10.0,  1.0,  0.1 };
    A << d;

    ColumnVector x(5);
    double xv[] = { 1.0, 2.0, 3.0, 4.0, 5.0 };
    x << xv;

    // y = 2.0*A*x
    ColumnVector y = 2 * A * x;

    for (int i = 1; i <= y.Nrows(); ++i)
        std::cout << "y[" << i << "] = " << y(i) << std::endl;
    return 0;
}

  45. Blitz++ • Blitz++ is a C++ class library for scientific computing which provides performance on par with Fortran 77/90 • It uses template meta-programming techniques to achieve high performance • The current versions provide dense arrays and vectors, random number generators, and small vectors and matrices • Blitz++ is distributed freely under an open source license

  46. Blitz++ Example • Transforms a vector y using a principal rotation about the third axis: x = C_3(alpha) * y, where C_3(alpha) is the 3x3 rotation matrix with rows (cos alpha, -sin alpha, 0), (sin alpha, cos alpha, 0), (0, 0, 1)

  47. Blitz++ Example
// Blitz++ code
#include <cmath>
#include <blitz/tinyvec.h>
#include <blitz/tinymat.h>

using namespace blitz;

void transform(double alpha, TinyVector<double, 3>& x,
               const TinyVector<double, 3>& y)
{
    double cosa = cos(alpha), sina = sin(alpha);

    // Create the principal rotation matrix C_3(alpha)
    TinyMatrix<double, 3, 3> C;
    C = cosa, -sina, 0.0,
        sina,  cosa, 0.0,
        0.0,   0.0,  1.0;

    x = C * y;
}

  48. Blitz++ Example (cont.)
// low-level implementation
#include <cmath>

void transform2(double alpha, double* x, double* y)
{
    double c = cos(alpha), s = sin(alpha);
    x[0] = c * y[0] - s * y[1];
    x[1] = s * y[0] + c * y[1];
    x[2] = y[2];
}

  49. Blitz++ Example: Results

  50. Outlook • Elliptic PDE discretized by Finite Volume • Functional specification with a Domain Specific Embedded Language (DSEL):
equation = sum<vertex_edge>
           [
               sumf<edge_vertex>(0.0, _e)
               [
                   pot * orient(_e, _1)
               ] * A / d * eps
           ] - V * rho
References: [1]
