Initial Design of a Test Suite for (Automatic) Performance Analysis Tools • Bernd Mohr, Forschungszentrum Jülich, John von Neumann - Institut für Computing, Germany, b.mohr@fz-juelich.de • Jesper Larsson Träff, NEC Europe Ltd., C&C Research Labs, Germany, traff@ccrl-nece.de
IST Working Group APART (since 1999) Automatic Performance Analysis: Resources and Tools • Forum for scientists and vendors • About 20 partners in Europe and the U.S. • http://www.fz-juelich.de/apart/ • Current Automatic Performance Tools Projects • Askalon http://www.par.univie.ac.at/project/askalon/ • Kappa-Pi http://www.caos.uab.es/kpi.html • KOJAK http://www.fz-juelich.de/zam/kojak • Paradyn http://www.cs.wisc.edu/~paradyn/ • Peridot http://wwwbode.cs.tum.edu/~gerndt/peridot/
(Full, Associated, and Former) Members • European Research Centers and Universities • U.S. Research Centers and Universities • Vendors
APART Terminology • Performance Property • Aspect of performance behavior of an application • E.g., communication dominated by waiting time • Specified as a condition referring to performance data • Quantified and normalized in terms of a behavior-independent metric (severity) • Performance Problem • Performance property with “negative” implications • Performance Bottleneck • Performance problem with the highest severity
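As a rough illustration of these three terms, the sketch below models a property as a condition plus a severity and picks the bottleneck as the property with the highest severity. The struct fields, the function names, and the use of total execution time as the normalization basis are assumptions made for this example; the slides do not define them.

```c
#include <stddef.h>

/* Hypothetical summary of the performance data measured for one run. */
typedef struct {
    double total_time;  /* total execution time in seconds */
    double wait_time;   /* time spent waiting in communication, in seconds */
} perf_data_t;

/* Condition: does the (assumed) "waiting time" property hold at all? */
int waiting_holds(const perf_data_t *d) {
    return d->total_time > 0.0 && d->wait_time > 0.0;
}

/* Severity: waiting time as a fraction of total time (assumed normalization). */
double waiting_severity(const perf_data_t *d) {
    return waiting_holds(d) ? d->wait_time / d->total_time : 0.0;
}

/* Bottleneck: index of the property with the highest positive severity,
   or -1 if no property has a positive severity. */
int find_bottleneck(const double *severity, size_t n) {
    int best = -1;
    double best_sev = 0.0;
    for (size_t i = 0; i < n; ++i) {
        if (severity[i] > best_sev) {
            best_sev = severity[i];
            best = (int)i;
        }
    }
    return best;
}
```

Because every severity is expressed on the same normalized scale, severities of different properties can be compared directly, which is what makes the bottleneck selection a simple maximum.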
Example: Performance Property “Message in Wrong Order” [Figure: time-line diagram of locations A, B, and C (axes: Location, Time) showing SEND and RECV events and a wait interval]
The APART Test Suite (ATS) • Users rely on tools working correctly • Tools need to be especially well tested • Systematic approach needed • APART Test Suite • Common project inside the APART group • Every member needs this, so sharing it minimizes resources • Ensures re-usability • Will also allow evaluation / comparison of the different member projects • Main focus: automatic performance analysis tools • But also useful for “regular” performance tools • http://www.fz-juelich.de/apart/ats/
Desired Functionality • Tests to determine whether the semantics of the original program were not altered • Tests to see whether the recorded performance data is correct • Synthetic positive test cases for each known and defined performance property and combinations of them • Negative test cases which have no known performance problem • “Real world” size parallel applications and benchmarks • Can be partially based on existing validation suites • Probably needs to be tool specific • Collect available benchmarks and applications • Design and implementation of an ATS framework
Validation Suites and Kernel Benchmarks (I) • Validation • MPI test / validation suites from Intel, IBM, ANL • http://www-unix.mcs.anl.gov/mpi/mpi-test/tsuite.html • MPI Benchmarks • PARKBENCH (PARallel Kernels and BENCHmarks) • http://www.netlib.org/parkbench/ • PMB - Pallas MPI Benchmarks • http://www.pallas.com/e/products/pmb/ • SKaMPI (Special Karlsruher MPI – Benchmark) • http://liinwww.ira.uka.de/~skampi/
Kernel Benchmarks (II) • OpenMP Benchmarks • EPCC OpenMP Microbenchmarks • http://www.epcc.ed.ac.uk/… research/openmpbench/openmp_index.html • Hybrid Benchmarks • The Los Alamos MicroBenchmarks Suite (LAMB) • MPI and multithreading (Pthreads and OpenMP) programming models, based on SKaMPI and EPCC
“Real World” Applications and Benchmarks • The NAS Parallel Benchmarks (NPB) • http://www.nas.nasa.gov/Software/NPB/ • The ASCI Purple and Blue Benchmark Codes • http://www.llnl.gov/… asci/purple/benchmarks/limited/code_list.html… asci_benchmarks/asci/asci_code_list.html • NCAR Benchmarks • http://www.scd.ucar.edu/css/software/bench/
Current Design of ATS Framework [Figure: module diagram] • WORK: do_work() • DISTRIBUTION: df_same(), df_cyclic2(), df_block2(), df_linear(), df_peak(), df_cyclic3(), df_block3()
The Distribution Module • Distribution specified by • Distribution function • Distribution parameters • All distribution functions have the same signature • double distr_func(int me, int size, double sf, distr_t* dd) • me, size: member me of a group of size size • sf: scaling factor • dd: distribution parameter descriptor • Returns the value for me, calculated from me, size, and dd, and scaled by sf • ATS provides a set of predefined distribution functions • Can easily be extended if needed
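A minimal sketch of what functions with this signature could look like follows. The names distr_t, distr_func_t, val2_distr_t (with fields low and high), df_same, df_cyclic2, df_block2, and df_linear appear in the slides; everything else is an assumption for illustration: the typedef of distr_t as an opaque type, the val1_distr_t descriptor, and the exact formulas (which members get low versus high, and how the linear ramp is computed).

```c
/* Assumption: the descriptor is opaque and each function casts it. */
typedef void distr_t;
typedef double (*distr_func_t)(int me, int size, double sf, distr_t *dd);

typedef struct { double val; } val1_distr_t;        /* hypothetical */
typedef struct { double low, high; } val2_distr_t;  /* fields as used in the slides */

/* Every member gets the same value. */
double df_same(int me, int size, double sf, distr_t *dd) {
    (void)me; (void)size;
    return ((val1_distr_t *)dd)->val * sf;
}

/* Members alternate: even members get low, odd members get high (assumed). */
double df_cyclic2(int me, int size, double sf, distr_t *dd) {
    val2_distr_t *d = (val2_distr_t *)dd;
    (void)size;
    return (me % 2 == 0 ? d->low : d->high) * sf;
}

/* First half of the group gets low, second half gets high (assumed). */
double df_block2(int me, int size, double sf, distr_t *dd) {
    val2_distr_t *d = (val2_distr_t *)dd;
    return (me < size / 2 ? d->low : d->high) * sf;
}

/* Linear ramp from low (member 0) to high (member size-1) (assumed). */
double df_linear(int me, int size, double sf, distr_t *dd) {
    val2_distr_t *d = (val2_distr_t *)dd;
    if (size < 2) return d->low * sf;
    return (d->low + (d->high - d->low) * me / (size - 1)) * sf;
}
```

Because all functions share one signature, a caller can take a distr_func_t and a descriptor pointer as a pair and stay independent of the concrete distribution, which is how the property functions in the framework receive them.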
Predefined Distribution Functions [Figure: one panel per predefined distribution (same, linear, peak, block2, cyclic2, block3, cyclic3), each plotting the value assigned to every group member; axis labels: val, low, med, high, n]
Current Design of ATS Framework [Figure: module diagram] • MPI PROPERTIES • OpenMP PROPERTIES • MPI UTILS: par_do_mpi_work(), alloc_mpi_buf(), free_mpi_buf(), alloc_mpi_vbuf(), free_mpi_vbuf(), mpi_commpattern_sendrecv(), mpi_commpattern_shift() • OpenMP UTILS: par_do_omp_work() • DISTRIBUTION: df_same(), df_cyclic2(), df_block2(), df_linear(), df_peak(), df_cyclic3(), df_block3() • WORK: do_work()
Example: MPI Property Function late_sender

    /* Each process does the amount of work that df assigns to its rank. */
    void par_do_mpi_work(distr_func_t df, distr_t* dd, MPI_Comm c) {
      int me, sz;
      MPI_Comm_rank(c, &me);
      MPI_Comm_size(c, &sz);
      do_work(df(me, sz, 1.0, dd));
    }

    void late_sender(double bwork, double ework, int r, MPI_Comm c) {
      val2_distr_t dd;
      int i;
      mpi_buf_t* buf = alloc_mpi_buf(base_type, base_cnt);

      dd.low = bwork + ework;
      dd.high = bwork;
      /* r repetitions of unevenly distributed work followed by send/receive */
      for (i = 0; i < r; ++i) {
        par_do_mpi_work(df_cyclic2, &dd, c);
        mpi_commpattern_sendrecv(buf, DIR_UP, 0, 0, c);
      }
      free_mpi_buf(buf);
    }
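The listing relies on do_work(), whose implementation the slides do not show. One way it might be realized is a busy-wait loop, sketched below under the assumption that the work argument is a duration in seconds of CPU time; the real ATS routine may measure work differently.

```c
#include <time.h>

/* Sketch: keep the CPU busy for roughly `secs` seconds of processor time.
   Assumption: the work amount is given in seconds. */
void do_work(double secs) {
    clock_t start = clock();
    volatile double sink = 0.0;  /* volatile so the loop is not optimized away */
    while ((double)(clock() - start) / CLOCKS_PER_SEC < secs) {
        sink += 1.0;
    }
}
```

A busy loop rather than a sleep is a deliberate choice in this sketch: the process stays visibly active in computation, so a tool under test sees compute time before the communication call instead of idle time.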
Currently Implemented Performance Property Functions • MPI Point-to-Point Communication Performance Properties • late_sender(basework, extrawork, rf, MPI_Comm); • late_receiver(basework, extrawork, rf, MPI_Comm); • MPI Collective Communication Performance Properties • imbalance_at_mpi_barrier(distr_func, distr_param, rf, MPI_Comm); • imbalance_at_mpi_alltoall(distr_func, distr_param, rf, MPI_Comm); • late_broadcast(basework, rootextrawork, root, rf, MPI_Comm); • late_scatter(basework, rootextrawork, root, rf, MPI_Comm); • late_scatterv(basework, rootextrawork, root, rf, MPI_Comm); • early_reduce(rootwork, baseextrawork, root, rf, MPI_Comm); • early_gather(rootwork, baseextrawork, root, rf, MPI_Comm); • early_gatherv(rootwork, baseextrawork, root, rf, MPI_Comm); • OpenMP Performance Properties • imbalance_in_parallel_region(distr_func, distr_param, rf); • imbalance_at_barrier(distr_func, distr_param, rf); • imbalance_in_loop(distr_func, distr_param, rf);
Current Design of ATS Framework [Figure: module diagram] • TEST PROGRAMS • MPI PROPERTIES • OpenMP PROPERTIES • MPI UTILS: par_do_mpi_work(), alloc_mpi_buf(), free_mpi_buf(), alloc_mpi_vbuf(), free_mpi_vbuf(), mpi_commpattern_sendrecv(), mpi_commpattern_shift() • OpenMP UTILS: par_do_omp_work() • DISTRIBUTION: df_same(), df_cyclic2(), df_block2(), df_linear(), df_peak(), df_cyclic3(), df_block3() • WORK: do_work()
Performance Property Test Programs • Single performance property testing • Programs can be generated automatically from the performance property function signature • Generator based on Program Database Toolkit (PDT) • http://www.cs.uoregon.edu/research/paracomp/pdtoolkit/ • Property parameters become test program arguments • More extensive tests through scripting languages or an experiment management system (e.g., Zenturio) • http://www.par.univie.ac.at/project/zenturio/ • Composite performance property testing • Program containing multiple performance property functions • Complexity only limited by imagination • Currently: manually implemented
Example: Single Performance Property Test Program

    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>
    #include "mpi_pattern.h"

    int main(int argc, char *argv[]) {
      distr_func_t df = atodf("b2:0.5:1.0");  /* default distribution */
      distr_t *dd = atodd("b2:0.5:1.0");
      int r = 1;                               /* default repetition factor */

      MPI_Init(&argc, &argv);
      switch ( argc ) {                        /* cases fall through on purpose */
        case 3: r = atoi(argv[2]);
        case 2: df = atodf(argv[1]);
                dd = atodd(argv[1]);
        case 1: break;
        default: fprintf(stderr, "usage: %s <distf> <rfac>\n", argv[0]);
                 MPI_Finalize();
                 return 1;                     /* do not run with bad arguments */
      }
      imbalance_at_mpi_barrier(df, dd, r, MPI_COMM_WORLD);
      MPI_Finalize();
      return 0;
    }
Example: Single Performance Property Test Program • imbalance_at_mpi_barrier <distribution-spec> <repetition-factor> • Example arguments: b2:0.5:1.0 2 and b2:0.1:2.0 5 • Problem: the additional property “MPI Setup/Termination Overhead” also holds!
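The test program turns a specification string such as b2:0.5:1.0 into a distribution function and a parameter descriptor through atodf() and atodd(), which the slides do not show. The sketch below illustrates only the string-splitting step, assuming the format is name:low:high; the distr_spec_t type and parse_distr_spec() are hypothetical names, and the real ATS grammar may accept other forms.

```c
#include <stdio.h>

/* Hypothetical holder for the pieces of a spec such as "b2:0.5:1.0". */
typedef struct {
    char name[16];     /* distribution name, e.g. "b2" */
    double low, high;  /* assumed meaning of the two numeric fields */
} distr_spec_t;

/* Parse "name:low:high". Returns 0 on success, -1 on malformed input.
   The trailing %c catches garbage after the last number. */
int parse_distr_spec(const char *s, distr_spec_t *out) {
    char tail;
    int n = sscanf(s, "%15[^:]:%lf:%lf%c",
                   out->name, &out->low, &out->high, &tail);
    return n == 3 ? 0 : -1;
}
```

Rejecting malformed strings up front, instead of silently falling back to defaults, makes it easier to tell a mistyped experiment parameter from a genuine tool failure.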
ATS: Status and Future Work • Initial prototype available from APART website • List of MPI, OpenMP, and hybrid validation and benchmark suites • 1st version of ATS framework including • C version of code • Single property test program generator • Future Work • More complete collection of validation and benchmark suites • Real “real world” applications • ATS Framework • Fortran version • More complete list of property functions for MPI, OpenMP, hybrid, and sequential performance properties • Documentation