
HIGH PERFORMANCE COMPUTING: MODELS, METHODS, & MEANS
PERFORMANCE MEASUREMENT & ANALYSIS

Prof. Thomas Sterling, Department of Computer Science, Louisiana State University, March 1, 2011



  1. Prof. Thomas Sterling, Department of Computer Science, Louisiana State University, March 1, 2011 • HIGH PERFORMANCE COMPUTING: MODELS, METHODS, & MEANS • PERFORMANCE MEASUREMENT & ANALYSIS

  2. Contact Info • Steven R. Brandt • sbrandt@cct.lsu.edu • AIM: RegexGuy

  3. Links • http://cct.lsu.edu/~sbrandt/csc7600l15demos.zip • Xming: • http://www.straightrunning.com/XmingNotes/ • Scroll down, click on Xming public release, and install it • PuTTY: • http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html • Click on putty.exe and save it to the desktop

  4. Topics • Introduction • Measuring System Operation • Gprof • Perfsuite • PAPI • Tau & PAPI • Benchmarks b_eff • MPI Tracing with PMPI • Tau & MPI • Summary – Material for the Test

  5. Topics • Current section: Introduction

  6. Opening Remarks • Up until now, we have used two strategies for measuring performance: • 1) wall-clock time for user applications • 2) benchmarks for comparing • machines of different types • machines of different scales • But we have identified factors that contribute to system operational performance, e.g.: • effective use of parallelism • cache behavior • To make better use of HPC systems, we need to measure operational behavior: • how the system is performing during application execution • what the application's demands and bottlenecks are • This segment focuses on the operation of SMP-class systems • Next segment: measuring MPP & cluster behavior

  7. What You’ll Need to Know • This is a skills-oriented lecture • Understand the kinds and levels of metrics of system and processor operation that you can measure • Know the kinds of tools that can expose valuable parameters of system & application operation: • hardware counters • software instrumentation, data acquisition, and presentation • Learn the basics of using specific tools when running your application code: • Gprof • Perfsuite • PAPI • TAU

  8. Final initial comments (yes, I know that’s an oxymoron) • We are only going to scratch the surface today • Try to get the basic ideas • This will expose you to a range of concepts, strategies, and tools • Many details will be left to future discussions • Over the next weeks, we will extend our abilities in using these tools • But don’t hesitate to read through the documentation • Hey, try some things out for yourself • You’ve got a sandbox to play in (Arete)

  9. Topics • Current section: Measuring System Operation

  10. Hardware Counters • Each processor can monitor events of various kinds • A small set of registers is used to count these events; the counters are very processor specific • [Figure: block diagram of a node showing processors (MP) with L1 and L2 caches, shared L3 caches, memory modules M1 to Mn, and a PCI-e controller connecting JTAG, Ethernet, USB, peripherals, and NICs]

  11–12. Philip J. Mucci, “Performance Analysis Tools and PAPI”, UTK ICL

  13. Hardware Events • Floating-point operations: multiplies, adds, multiply-adds, etc. • L1/L2 cache hits/misses (see http://en.wikipedia.org/wiki/CPU_cache) • Translation lookaside buffer (TLB) hits/misses (the TLB caches virtual-to-physical address translations) • Branch prediction counters (pipelined processors must guess which instruction to fetch next)

  14. A Goal: Optimization • Compile time: • various optimization levels enabled by compiler options • examine the compiler output • Run time (performance analysis): • instrument the code or its execution to produce a trace • use tools to analyze the trace: • the standard/basic tool is gprof, but there are many others • Note: the Java HotSpot environment collects data about execution and uses it to optimize a program as it runs

  15. Performance Analysis Tools • Widely ported low-level interface to hardware counters: PAPI (Performance API), which supports AIX, Linux, Solaris, and even Windows! http://icl.cs.utk.edu/papi/custom/index.html?lid=62&slid=96 • Many tools are built on PAPI: • Perfsuite (NCSA), psrun command • TAU (University of Oregon) • etc. • Useful for: • finding performance bottlenecks • identifying cache problems (e.g. badly sized arrays)

  16. time • A simple Unix command that reports resource usage • Runs a specified program: • time [options] command [arguments …] • Gives timing statistics about the program run: • the elapsed real (wall-clock) time between invocation and termination • user CPU time • system CPU time • See: man time

  17. top • Gives an overview of system process status and resource usage • Provides a dynamic, real-time view of a running system: • system summary information • currently managed tasks • Updates every few (e.g. 5) seconds • top -hv | -bcisS -d delay -n iterations -p pid [, pid …] • See: man top

  18. Basic Tools
  • time
      $ time du -s /usr > /dev/null 2>&1
      real    0m34.274s
      user    0m0.082s
      sys     0m0.957s
  • top/ps
      top - 11:29:40 up 49 min,  2 users,  load average: 0.32, 0.26, 0.25
      Tasks: 125 total,   3 running, 121 sleeping,   0 stopped,   1 zombie
      Cpu(s):  4.5%us,  0.3%sy,  0.0%ni, 94.7%id,  0.2%wa,  0.3%hi,  0.0%si,  0.0%st
      Mem:   1030940k total,  1013376k used,    17564k free,   124616k buffers
      Swap:  2104472k total,       32k used,  2104440k free,   411968k cached
        PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
       4136 sbrandt   15   0 35208  15m  10m S    6  1.5   0:03.35 gnome-terminal
       3761 root      16   0 82676  50m  12m R    3  5.0   1:02.82 X
       5195 sbrandt   16   0  2176 1172  852 R    1  0.1   0:00.03 top
       3487 root      17   0  1820  572  496 S    0  0.1   0:00.25 hald-addon-stor
       3930 sbrandt   16   0 99.8m  40m  14m S    0  4.0   0:36.27 beagled

  19. Topics • Current section: Gprof

  20. gprof: quick overview • gprof is a utility, available on most Unix systems, that profiles the procedures in a program • gprof provides information about: • an index for each procedure • the parent(s) of each procedure • the percentage of CPU time used by a procedure and its calls • a breakdown of the time used by the procedure and its descendants • the number of times a procedure was called • the direct descendants of each procedure • To use gprof: • compile the source code with the -pg option • running the resulting executable generates an output file, gmon.out (for serial programs) • for serial programs: gprof exe gmon.out • for parallel programs, set the environment variable GMON_OUT_PREFIX (e.g. GMON_OUT_PREFIX=gmon.out, so each process writes its own gmon.out.<pid>), then: gprof exe gmon.out.*

  21. gprof: one-minute tutorial • Steps to use gprof: • gcc -pg -g -o prog prog.c • ./prog • gprof prog gmon.out • More reading: http://www.cs.utah.edu/dept/old/texinfo/as/gprof.html • gprof finds the subroutines where the most time is spent • It cannot tell you why some routines are more costly than others; for that you need more information...

  22. Demo of gprof

  23. Topics • Current section: Perfsuite

  24. Philip J. Mucci, “Performance Analysis Tools and PAPI”, UTK ICL

  25. Using psrun • psrun cmd (e.g. psrun du -s /usr) • This measures the performance counters for the du command; no special compilation of du is required for this to work • psprocess cmd.* (e.g. psprocess du.*.xml) • At the bottom of the output, you will see summary statistics for numerous counters

  26. Demo of psrun

  27. Topics • Current section: PAPI

  28–41. Philip J. Mucci, “Performance Analysis Tools and PAPI”, UTK ICL

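  42. By hand: Verifying the PAPI Version
      // When hand-instrumenting, you need to check the PAPI version
      #include <stdio.h>
      #include <stdlib.h>
      #include <papi.h>
      ...
      /* Verify the PAPI version: PAPI_library_init returns PAPI_VER_CURRENT
         on success; any other value is a version mismatch or an init error */
      int v = PAPI_library_init(PAPI_VER_CURRENT);
      if (v != PAPI_VER_CURRENT) {
          fprintf(stderr, "Bad PAPI version\n");
          exit(2);
      }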

  43. By Hand: Measuring PAPI Counters • Use "papi_avail -a" to identify the available counters • Link with -lpapi

  44. Demo: Hand instrumentation with PAPI

  45. Statistical profiling • profil(): a Unix library call that periodically samples the program counter (PC) of the running program, identifying the subroutines where the code spends the most time • Used by gprof • PAPI_profil(): emulates profil(), but samples the PC each time a specified hardware counter overflows; identifies the file/line where the code spends the most time

  46. Using psrun to find hot spots • gcc -g -o cmd cmd.c (-g supplies the debug information needed to map samples to file/line) • psrun -C -c papi_profile_cycles.xml cmd • "-C" instructs psrun to use the XML configuration files in the install path rather than the current directory • "-c papi_profile_cycles.xml" uses the named configuration file rather than the default • "papi_profile_cycles.xml" directs PAPI to collect file/line data • psprocess cmd.*.xml • displays the results

  47. Demo: second demo of psrun

  48. Topics • Current section: Tau & PAPI

  49–50. Philip J. Mucci, “Performance Analysis Tools and PAPI”, UTK ICL
