TAU’s MPI Wrapper Interposition Library
• Uses standard MPI Profiling Interface
• Provides name-shifted interface
  • MPI_Send = PMPI_Send
  • Weak bindings
• Interpose TAU’s MPI wrapper library between MPI and TAU
  • -lmpi replaced by -lTauMpi -lpmpi -lmpi
• No change to the source code!
  • Just re-link the application to generate performance data
  • setenv TAU_MAKEFILE <dir>/<arch>/lib/Makefile.tau-mpi-[options]
  • Use tau_cxx.sh, tau_f90.sh and tau_cc.sh as compilers
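
The name-shifted pattern above can be sketched in a few lines. This is an illustration of the idea only, written in Python for brevity, and is not TAU's code: a real wrapper is a C function that defines MPI_Send and forwards to PMPI_Send. Here `PMPI_Send` is a hypothetical stand-in that does nothing, its three-argument signature is simplified, and the `stats` table is an invented name for the collected measurements.

```python
import time

# Hypothetical stand-in for the MPI library's name-shifted entry point.
# A real PMPI_Send takes more arguments and actually sends data.
def PMPI_Send(buf, dest, tag):
    return 0  # pretend the send succeeded

# Illustrative measurement table: routine name -> [call count, total seconds].
stats = {}

def MPI_Send(buf, dest, tag):
    """Wrapper carrying the public name: time the call, then forward it."""
    start = time.perf_counter()
    try:
        return PMPI_Send(buf, dest, tag)
    finally:
        entry = stats.setdefault("MPI_Send", [0, 0.0])
        entry[0] += 1
        entry[1] += time.perf_counter() - start

if __name__ == "__main__":
    # The application keeps calling MPI_Send; it never sees the wrapper.
    for rank in range(3):
        MPI_Send(b"payload", dest=rank, tag=0)
    calls, seconds = stats["MPI_Send"]
    print(f"MPI_Send: {calls} calls, {seconds:.6f} s")
```

Because the application only ever refers to the public name, the measurement layer can be added or removed at link time, which is why no source change is needed.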
Runtime MPI Shared Library Instrumentation
• We can now interpose the MPI wrapper library for applications that have already been compiled
  • No re-compilation or re-linking necessary!
• Uses LD_PRELOAD for Linux
• On AIX, TAU uses MPI_EUILIB / MPI_EUILIBPATH
• Simply compile TAU with MPI support and prefix your MPI program with tauex
    % mpirun -np 4 tauex a.out
• Requires shared library MPI - does not work on XT3
• Approach will work with other shared libraries
Automatic Instrumentation
• We now provide compiler wrapper scripts
• Simply replace mpxlf90 with tau_f90.sh
• Automatically instruments Fortran source code, links with TAU MPI Wrapper libraries.
• Use tau_cc.sh and tau_cxx.sh for C/C++

Before
    CXX = mpCC
    F90 = mpxlf90_r
    CFLAGS =
    LIBS = -lm
    OBJS = f1.o f2.o f3.o … fn.o

    app: $(OBJS)
            $(CXX) $(LDFLAGS) $(OBJS) -o $@ $(LIBS)
    .cpp.o:
            $(CXX) $(CFLAGS) -c $<

After
    CXX = tau_cxx.sh
    F90 = tau_f90.sh
    CFLAGS =
    LIBS = -lm
    OBJS = f1.o f2.o f3.o … fn.o

    app: $(OBJS)
            $(CXX) $(LDFLAGS) $(OBJS) -o $@ $(LIBS)
    .cpp.o:
            $(CXX) $(CFLAGS) -c $<
I/O notes
• Application file I/O performance often highly variable
  • depends on load on shared filesystem/network resources and application/system configuration at time of measurement
  • tuning requires very careful extensive benchmarking
  • worst case performance very different from typical case
  • current tools don't deal well with this
• Optimal I/O is no I/O!
  • preferable to eliminate non-essential I/O during measurement
  • configure tools to avoid intermediate measurement I/O (e.g., trace buffer flushes) where appropriate
  • configure measurement or analysis to exclude I/O phases
  • typically part of one-off application initialization/finalization cost which would be amortized in long production execution
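
The gap between typical and worst-case I/O can be made visible by repeating a measurement and reporting more than one statistic. The following is a minimal sketch, assuming a small synchronous file write is a reasonable probe; the function names, the run count and the 1 MiB payload are illustrative choices, not part of any tool mentioned here.

```python
import os
import statistics
import tempfile
import time

def time_writes(n_runs=20, n_bytes=1 << 20):
    """Time n_runs synchronous writes of n_bytes each; return seconds per run."""
    payload = b"\0" * n_bytes
    samples = []
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "probe.dat")
        for _ in range(n_runs):
            start = time.perf_counter()
            with open(path, "wb") as f:
                f.write(payload)
                f.flush()
                os.fsync(f.fileno())  # force the data out of the OS cache
            samples.append(time.perf_counter() - start)
    return samples

def summarize(samples):
    """Report the typical (median) and worst case, and how far apart they are."""
    median = statistics.median(samples)
    worst = max(samples)
    ratio = worst / median if median > 0 else float("inf")
    return {"median": median, "worst": worst, "ratio": ratio}

if __name__ == "__main__":
    summary = summarize(time_writes())
    print(f"median {summary['median']:.4f} s, worst {summary['worst']:.4f} s, "
          f"worst/median {summary['ratio']:.1f}x")
```

On a shared filesystem the worst/median ratio will generally change from run to run, so a single measurement says little about either case.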