
Intel ® Cluster Tools Introduction and Hands on Sessions

Intel ® Cluster Tools Introduction and Hands on Sessions. MSU Summer School, Intel Cluster Software and Technologies, Software & Services Group, July 8, 2010, MSU Moscow. Agenda: Intel Cluster Tools settings and configuration, Intel MPI fabrics, Message Checker, ITAC Introduction

michon


  1. Intel ® Cluster Tools Introduction and Hands on Sessions • MSU Summer School • Intel Cluster Software and Technologies • Software & Services Group • July 8, 2010, MSU Moscow

  2. Agenda • Intel Cluster Tools settings and configuration • Intel MPI fabrics • Message Checker • ITAC Introduction • ITAC practice

  3. Setup configuration • source /opt/intel/cc/11.0.74/bin/iccvars.sh intel64 • source /opt/intel/fc/11.0.74/bin/ifortvars.sh intel64 • source /opt/intel/impi/4.0.0.25/bin64/mpivars.sh • source /opt/intel/itac/8.0.0.011/bin/itacvars.sh impi4

  4. Check configuration • which icc • which ifort • which mpiexec • which traceanalyzer • echo $LD_LIBRARY_PATH • set | grep I_MPI • set | grep VT_

  5. Compile your first MPI application • Using Intel compilers • mpiicc, mpiicpc, mpiifort, ... • Using GNU compilers • mpicc, mpicxx, mpif77, ... • mpiicc -o hello_c test.c • mpiifort -o hello_f test.f

  6. Create mpd.hosts file • Create an mpd.hosts file in the working directory with the list of available nodes • Create the mpd ring • mpdboot -r ssh -n #nodes • Check the mpd ring • mpdtrace
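The mpd.hosts step can be sketched in Python. This is only an illustration: the node names are hypothetical placeholders, and the sketch assumes the simplest file layout, one host name per line. It writes to a temporary directory so that nothing in the working directory is touched.

```python
import tempfile
from pathlib import Path


def write_mpd_hosts(hosts, path):
    """Write one host name per line and return the text that was written."""
    text = "".join(f"{host}\n" for host in hosts)
    Path(path).write_text(text)
    return text


# Hypothetical node names; replace them with the nodes of your cluster.
nodes = ["node01", "node02", "node03", "node04"]

with tempfile.TemporaryDirectory() as workdir:
    hosts_file = Path(workdir) / "mpd.hosts"
    content = write_mpd_hosts(nodes, hosts_file)
    written = hosts_file.read_text()

print(content, end="")
# The number of entries is the value to pass as #nodes on the slide.
print(f"mpdboot -r ssh -n {len(nodes)}")
```

In a real session the file would be written to the working directory, where the slide says mpdboot expects it.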

  7. Start your first application • mpiexec -n 16 ./hello_c • mpiexec -n 16 ./hello_f • Kill the mpd ring • mpdallexit • mpdcleanup -a • Start your first application with mpirun • mpirun -r ssh -n 16 ./hello_c • mpirun -r ssh -n 16 ./hello_f

  8. Alternative Process Manager • Use mpiexec.hydra for better scalability • All options are the same

  9. OFED & DAPL • OFED - OpenFabrics Enterprise Distribution http://openfabrics.org/ • DAPL - Direct Access Programming Library http://www.openfabrics.org/downloads/dapl/ • Check /etc/dat.conf • Set I_MPI_DAPL_PROVIDER=OpenIB-mlx4_0-2

  10. Fabrics selection

  11. Fabrics selection (cont.) • Use I_MPI_FABRICS to set the desired fabric • export I_MPI_FABRICS=shm:tcp • mpirun -r ssh -n # -env I_MPI_FABRICS shm:tcp ./a.out • DAPL varieties: • export I_MPI_FABRICS=shm:dapl • export I_MPI_DAPL_PROVIDER=ofa-v2-mlx4_0-1 • export I_MPI_DAPL_UD=enable • Connectionless communication • Better scalability • Less memory is required
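As a rough sketch, the fabric selection can be scripted by assembling the mpirun command line from the slide. The helper name and the validation are assumptions made for illustration. The sketch only builds the argument list and does not start a job. The set of fabric names reflects Intel MPI 4.0 as I understand it and may differ in other versions.

```python
# Fabric names accepted by I_MPI_FABRICS (assumed list for Intel MPI 4.0).
KNOWN_FABRICS = {"shm", "dapl", "tcp", "tmi", "ofa"}


def mpirun_command(nranks, fabrics, executable, launcher="ssh"):
    """Build 'mpirun -r <launcher> -n <N> -env I_MPI_FABRICS <fabrics> <exe>'."""
    for name in fabrics.split(":"):
        if name not in KNOWN_FABRICS:
            raise ValueError(f"unknown fabric: {name!r}")
    return [
        "mpirun", "-r", launcher,
        "-n", str(nranks),
        "-env", "I_MPI_FABRICS", fabrics,
        executable,
    ]


cmd = mpirun_command(16, "shm:tcp", "./a.out")
print(" ".join(cmd))
```

The list could be handed to subprocess.run on a machine where mpirun is installed.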

  12. Fabrics selection (cont.) • OFA fabric • export I_MPI_FABRICS=shm:ofa • Multi-rail feature • export I_MPI_OFA_NUM_ADAPTERS=<n> • export I_MPI_OFA_NUM_PORTS=<n> • For OFA devices the Intel® MPI Library recognizes some hardware events: it can stop using a failed line and restore the connection when the line is OK again

  13. How to get information from the Intel MPI Library • Use the I_MPI_DEBUG environment variable • Use a number from 2 to 1001 to select the level of detail • Level 2 shows the data transfer mode • Level 4 shows pinning information

  14. cpuinfo utility • Use this utility to get information about the processors used in your system

      Intel(R) Xeon(R) Processor (Intel64 Harpertown)
      ===== Processor composition =====
      Processors(CPUs)  : 8
      Packages(sockets) : 2
      Cores per package : 4
      Threads per core  : 1
      ===== Processor identification =====
      Processor   Thread Id.   Core Id.   Package Id.
      0           0            0          0
      1           0            0          1
      2           0            1          0
      3           0            1          1
      4           0            2          0
      5           0            2          1
      6           0            3          0
      7           0            3          1
      ===== Placement on packages =====
      Package Id.   Core Id.   Processors
      0             0,1,2,3    0,2,4,6
      1             0,1,2,3    1,3,5,7
      ===== Cache sharing =====
      Cache   Size    Processors
      L1      32 KB   no sharing
      L2      6 MB    (0,2)(1,3)(4,6)(5,7)
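The "Placement on packages" section follows directly from the "Processor identification" rows. The small sketch below regroups those rows by package to make the relationship explicit. The rows are copied from the slide, and the function name is made up for illustration.

```python
from collections import defaultdict

# Rows of the "Processor identification" table:
# (processor, thread id, core id, package id)
rows = [
    (0, 0, 0, 0), (1, 0, 0, 1), (2, 0, 1, 0), (3, 0, 1, 1),
    (4, 0, 2, 0), (5, 0, 2, 1), (6, 0, 3, 0), (7, 0, 3, 1),
]


def processors_by_package(rows):
    """Group logical processor numbers by the package (socket) they sit on."""
    packages = defaultdict(list)
    for processor, _thread, _core, package in rows:
        packages[package].append(processor)
    return {pkg: sorted(procs) for pkg, procs in sorted(packages.items())}


placement = processors_by_package(rows)
for package, processors in placement.items():
    print(package, processors)
```

Even-numbered processors land on package 0 and odd-numbered ones on package 1, so neighbouring processor numbers sit on different sockets. This is worth keeping in mind when choosing pinning settings.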

  15. Pinning • One can change the default pinning settings • export I_MPI_PIN=on (or off) • export I_MPI_PIN_DOMAIN=cache2 (for hybrid applications) • export I_MPI_PIN_PROCESSOR_LIST=allcores • export I_MPI_PIN_PROCESSOR_LIST=shift=socket

  16. OpenMP and Hybrid applications • Check the command line used to build the application • Use the thread-safe version of the Intel® MPI Library (-mt_mpi option) • Use libraries with SMP parallelization (e.g. parallel MKL) • Use the -openmp compiler option to enable OpenMP* directives • Set the execution environment for hybrid applications • Set OMP_NUM_THREADS to the number of threads • Use the -perhost option to control how many processes are placed on each node

      $ mpiicc -openmp -o ./your_app
      $ export OMP_NUM_THREADS=4
      $ export I_MPI_FABRICS=shm:dapl
      $ export KMP_AFFINITY=compact
      $ mpirun -perhost 4 -n <N> ./a.out
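One way to choose consistent values for OMP_NUM_THREADS, -perhost and -n is the rule of thumb of one thread per core with no oversubscription. That rule is an assumption of this sketch, not something the slide states, and the function name and example numbers are made up for illustration.

```python
def hybrid_layout(nodes, cores_per_node, threads_per_rank):
    """Derive -perhost and -n so that ranks x threads fills each node exactly.

    Assumes one OpenMP thread per core (no oversubscription).
    """
    if threads_per_rank <= 0 or cores_per_node % threads_per_rank != 0:
        raise ValueError("threads_per_rank must evenly divide cores_per_node")
    perhost = cores_per_node // threads_per_rank
    return {
        "OMP_NUM_THREADS": threads_per_rank,
        "perhost": perhost,
        "n": perhost * nodes,
    }


# Example: 4 nodes with 8 cores each (as in the cpuinfo output), 4 threads per rank.
layout = hybrid_layout(nodes=4, cores_per_node=8, threads_per_rank=4)
print(f"export OMP_NUM_THREADS={layout['OMP_NUM_THREADS']}")
print(f"mpirun -perhost {layout['perhost']} -n {layout['n']} ./a.out")
```

With these example numbers each node runs 2 ranks of 4 threads, for 8 ranks in total.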

  17. Intel® MPI Library and MKL • MKL creates its own threads (OpenMP, TBB, …) • MKL from version 10.2 understands the settings of the Intel® MPI Library and doesn't create more threads than cores • Use OMP_NUM_THREADS and MKL_NUM_THREADS carefully

  18. How to run a debugger • TotalView • mpirun -r ssh -tv -n # ./a.out • GDB • mpirun -r ssh -gdb -n # ./a.out • Allinea DDT (from GUI) • IDB • mpirun -r ssh -idb -n # ./a.out • idb must be available in your $PATH • Some settings are required

  19. Message Checker • Local checks: isolated to single process • Unexpected process termination • Buffer handling • Request and data type management • Parameter errors found by MPI • Global checks: all processes • Global checks for collectives and p2p ops • Data type mismatches • Corrupted data transmission • Pending messages • Deadlocks (hard & potential) • Global checks for collectives – one report per operation • Operation, size, reduction operation, root mismatch • Parameter error • Mismatched MPI_Comm_free()

  20. Message Checker (cont.) • Levels of severity: • Warnings: application can continue • Error: application can continue but almost certainly not as intended • Fatal error: application must be aborted • Some checks may find both warnings and errors • Example: CALL_FAILED check due to invalid parameter • Invalid parameter in MPI_Send() => msg cannot be sent => error • Invalid parameter in MPI_Request_free() => resource leak => warning

  21. Message Checker (cont.) • Usage model: • Recommended: • -check option when running an MPI job: $ mpiexec -check -n 4 ./a.out • Use the fail-safe version in case of a crash: $ mpiexec -check libVTfs.so -n 4 ./a.out • Alternatively: • -check_mpi option during the link stage: $ mpiicc -check_mpi -g test.c -o a.out • Configuration • Each check can be enabled/disabled individually • Set in the VT_CONFIG file, e.g. to enable local checks only:

      CHECK ** OFF
      CHECK LOCAL:** ON

      • Change the number of warnings and errors printed and/or tolerated before abort • See lab/poisson_ITAC_dbglibs

  22. Trace Collector • Link with the trace library: • mpiicc -trace test.c -o a.out • Run with the -trace option • mpiexec -trace -n # ./a.out • Using the itcpin utility • mpirun -r ssh -n # itcpin --run -- ./a.out • Binary instrumentation • Use the -tcollect link option • mpiicc -tcollect test.c -o a.out

  23. Using Trace Collector for OpenMP applications • ITA can show only those threads which call MPI functions. There is a very simple trick: e.g. before "#pragma omp barrier" add an MPI call: • { int size; MPI_Comm_size(MPI_COMM_WORLD, &size); } • After this modification ITA will show information about the OpenMP threads. • Please remember that to support threads you need to use the thread-safe MPI Library. Don't forget to set the VT_MPI_DLL environment variable. • $ set VT_MPI_DLL=impimt.dll (for Windows) • $ export VT_MPI_DLL=libmpi_mt.so (for Linux)

  24. Light weight statistics • Use the I_MPI_STATS environment variable • export I_MPI_STATS=# (up to 10) • export I_MPI_STATS_SCOPE=p2p:csend

      ~~~~ Process 0 of 256 on node C-21-23
      Data Transfers
      Src --> Dst     Amount(MB)      Transfers
      -----------------------------------------
      000 --> 000     0.000000e+00    0
      000 --> 001     1.548767e-03    60
      000 --> 002     1.625061e-03    60
      000 --> 003     0.000000e+00    0
      000 --> 004     1.777649e-03    60
      …
      =========================================
      Totals          3.918986e+03    1209

      Communication Activity
      Operation       Volume(MB)      Calls
      -----------------------------------------
      P2P
      Csend           9.147644e-02    1160
      Send            3.918895e+03    49
      Collectives
      Barrier         0.000000e+00    12
      Bcast           3.051758e-05    6
      Reduce          3.433228e-05    6
      Allgather       2.288818e-04    30
      Allreduce       4.108429e-03    97
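The "Data Transfers" rows are easy to post-process. The sketch below parses the five rows visible on the slide and sums them. It assumes that every row has exactly the layout shown there (source, arrow, destination, amount, count), and real statistics files may contain more sections and columns. The sums cover only the visible rows, so they do not reproduce the "Totals" line.

```python
import re

# The five "Data Transfers" rows visible on the slide.
sample = """\
000 --> 000 0.000000e+00 0
000 --> 001 1.548767e-03 60
000 --> 002 1.625061e-03 60
000 --> 003 0.000000e+00 0
000 --> 004 1.777649e-03 60
"""

ROW = re.compile(r"^(\d+)\s+-->\s+(\d+)\s+(\S+)\s+(\d+)$")


def parse_transfers(text):
    """Return (src, dst, amount_mb, transfers) tuples for matching rows."""
    records = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            src, dst, amount, count = match.groups()
            records.append((int(src), int(dst), float(amount), int(count)))
    return records


records = parse_transfers(sample)
total_mb = sum(amount for _, _, amount, _ in records)
total_transfers = sum(count for _, _, _, count in records)
print(f"{len(records)} rows, {total_mb:.6e} MB in {total_transfers} transfers")
```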

  25. Intel® Trace Analyzer • Generate a trace file for Game of Life • Investigate blocking Send using ITA • Change code • Look at difference

  26. Ideal Interconnect Simulator (IIS) • Helps to figure out an application's imbalance by simulating its behavior in the "ideal communication environment" • [Figure: real trace vs. ideal trace]

  27. Imbalance diagram • [Diagram: the ITAC trace (alternating Calculation and MPI_Allreduce phases) is processed by traceidealizer into an idealized model; comparing the two splits the MPI_Allreduce time into load imbalance and interconnect parts]
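The arithmetic behind the imbalance diagram can be sketched as follows. This is one reading of the diagram, not a description of how ITAC computes it. It assumes that the MPI time remaining in the idealized trace is waiting caused by load imbalance, and that the difference between the real and idealized MPI time is the cost of the interconnect. The input numbers are invented for illustration.

```python
def imbalance_breakdown(mpi_time_real, mpi_time_ideal):
    """Split measured MPI time into load-imbalance and interconnect parts.

    mpi_time_real:  time spent in an MPI call in the real trace
    mpi_time_ideal: time spent in the same call in the idealized trace
    """
    if not 0 <= mpi_time_ideal <= mpi_time_real:
        raise ValueError("expected 0 <= mpi_time_ideal <= mpi_time_real")
    return {
        "load_imbalance": mpi_time_ideal,
        "interconnect": mpi_time_real - mpi_time_ideal,
    }


# Invented example: 10 s in MPI_Allreduce in the real trace, 4 s in the ideal one.
breakdown = imbalance_breakdown(mpi_time_real=10.0, mpi_time_ideal=4.0)
print(breakdown)
```

A large load-imbalance share suggests rebalancing the work, while a large interconnect share points toward fabric selection and tuning.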

  28. Trace Analyzer - Filtering

  29. mpitune utility • Cluster-specific tuning • Run it once after installation and again after each cluster configuration change • The best configuration is recorded for each combination of communication device, number of nodes, MPI ranks and process distribution model

      # Collect configuration values:
      $ mpitune
      # Reuse recorded values:
      $ mpiexec -tune -n 32 ./your_app

      • Application-specific tuning • Tune any kind of MPI application by specifying its command line • By default performance is measured as the inverse of the execution time • To reduce the overall tuning time use the shortest representative application workload (if applicable)

      # Collect configuration settings:
      $ mpitune --application \"mpiexec -n 32 ./my_app\" -of ./my_app.conf
      # Note: the backslashes and quotes are mandatory
      # Reuse recorded values:
      $ mpiexec -tune ./my_app.conf -n 32 ./my_app
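To see what the backslash-escaped quotes do, the sketch below tokenizes the mpitune line with Python's shlex module, which follows POSIX shell quoting rules. It shows only how the shell splits the line. How mpitune reassembles the pieces is not covered here.

```python
import shlex

# The application-specific tuning command from the slide, as typed at the prompt.
line = r'mpitune --application \"mpiexec -n 32 ./my_app\" -of ./my_app.conf'

# shlex.split follows POSIX shell rules: a backslash outside quotes makes the
# next character literal, so the quote characters become part of the arguments.
tokens = shlex.split(line)
for token in tokens:
    print(repr(token))
```

The escaped quotes reach mpitune as literal characters attached to the first and last word of the application command line, rather than being consumed by the shell.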

  30. Stay tuned! • Learn more online • Intel® MPI self-help pages http://www.intel.com/go/mpi • Ask questions and share your knowledge • Intel® MPI Library support page http://software.intel.com/en-us/articles/intel-cluster-toolkit-support-resources/ • Intel® Software Network Forum http://software.intel.com/en-us/forums/intel-clusters-and-hpc-technology/
