Prof. Thomas Sterling Department of Computer Science Louisiana State University March 1, 2011. HIGH PERFORMANCE COMPUTING : MODELS, METHODS, & MEANS PERFORMANCE MEASUREMENT & ANALYSIS. Contact Info. Steven R. Brandt sbrandt@cct.lsu.edu AIM: RegexGuy. Links.
Prof. Thomas Sterling
Department of Computer Science
Louisiana State University
March 1, 2011
HIGH PERFORMANCE COMPUTING: MODELS, METHODS, & MEANS
PERFORMANCE MEASUREMENT & ANALYSIS
Contact Info
• Steven R. Brandt
• sbrandt@cct.lsu.edu
• AIM: RegexGuy
Links
• http://cct.lsu.edu/~sbrandt/csc7600l15demos.zip
• X-Ming:
  • http://www.straightrunning.com/XmingNotes/
  • Scroll down, click on Xming public release and install
• Putty:
  • http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html
  • Click on putty.exe and save to the desktop
Topics
• Introduction
• Measuring System Operation
• Gprof
• Perfsuite
• PAPI
• Tau & PAPI
• Benchmarks b_eff
• MPI Tracing with PMPI
• Tau & MPI
• Summary – Material for the Test
Opening Remarks
• Up until now, 2 strategies for measuring performance:
  • 1) wall-clock time for user applications
  • 2) benchmarks for comparing
    • Machines of different type
    • Machines of different scale
• But, we have identified factors that contribute to system operational performance, e.g.:
  • Effective use of parallelism
  • Cache behavior
• To make better use of HPC systems, need to measure operational behavior
  • How the system is performing during application execution
  • What are the application demands and bottlenecks
• Focus on SMP class system operation during this Segment
• Next Segment: measuring MPP & cluster behavior
What you’ll Need to Know
• This is a skills-oriented lecture
• Understand the kinds and levels of metrics of system and processor operation that you can measure
• Know the kinds of tools that can expose valuable parameters of system & application operation
  • Hardware counters
  • Software instrumentation, data acquisition, and presentation
• Learn the basics of how to use specific tools when running your application code
  • Gprof
  • Perfsuite
  • PAPI
  • TAU
Final initial comments (yes, I know that’s an oxymoron)
• We are only going to scratch the surface today
  • Try to get the basic ideas
• This will expose you to a range of concepts, strategies, and tools
• Lots of details will be left to future discussions
  • Over the next weeks, we will extend our abilities in using these tools
• But don’t hesitate to read through the documentation
• Hey, try some things out for yourself
  • You’ve got a sandbox to play in (Arete)
Hardware Counters
• Each processor has the ability to monitor events of various kinds
• Small set of registers used to count events. Very processor specific.
[Figure: SMP node block diagram showing four microprocessors (MP), each with its own L1 and L2 cache, shared L3 caches, memory banks M1 to Mn, and a controller connecting PCI-e, JTAG, Ethernet, USB, NICs, and peripherals]
Philip J. Mucci, “Performance Analysis Tools and PAPI” UTK ICL
Hardware Events
• Floating point operations, Multiplies, Adds, Multiply-Adds, etc.
• L1/L2 cache hits/misses (see http://en.wikipedia.org/wiki/CPU_cache)
• Translation Lookaside Buffer hits/misses (virtual to physical address translation table)
• Branch prediction counters (pipelined systems must guess the next instruction to fetch)
A Goal: Optimization
• Compile Time:
  • Various levels enabled by compiler options
  • Examine Compiler Output
• Run Time (Performance Analysis):
  • Instrument code or execution to produce a trace
  • Tools to analyze trace:
    • Standard/basic tool is gprof, but there are many others
• Note: Java Hot-Spot environment collects data about execution and uses it to optimize a program as it runs
Performance Analysis Tools
• Widely ported low-level interface to hardware counters: PAPI (Performance API)
  • Supports AIX, Linux, Solaris, and even Windows!
  • http://icl.cs.utk.edu/papi/custom/index.html?lid=62&slid=96
• Many tools built on PAPI
  • Perfsuite (NCSA), psrun command
  • TAU (University of Oregon)
  • etc.
• Useful for:
  • Finding performance bottlenecks
  • Identifying cache problems (badly sized arrays)
time
• A simple Unix command to give resource usage.
• Runs a specified program
  • time [options] command [arguments ...]
• Gives timing statistics about program run
  • The elapsed real time between invocation and termination
  • User CPU time
  • System CPU time
• See: man time
top
• Gives an overview of system process status and resource usage
• Provides a dynamic realtime view of a running system
  • System summary information
  • Currently managed tasks
• Updates every few (e.g. 5) seconds
• top -hv | -bcisS -d delay -n iterations -p pid [, pid ...]
• See: man top
Basic Tools
• Time
    $ time du -s /usr > /dev/null 2>&1
    real 0m34.274s
    user 0m0.082s
    sys  0m0.957s
• top/ps
    top - 11:29:40 up 49 min, 2 users, load average: 0.32, 0.26, 0.25
    Tasks: 125 total, 3 running, 121 sleeping, 0 stopped, 1 zombie
    Cpu(s): 4.5%us, 0.3%sy, 0.0%ni, 94.7%id, 0.2%wa, 0.3%hi, 0.0%si, 0.0%st
    Mem: 1030940k total, 1013376k used, 17564k free, 124616k buffers
    Swap: 2104472k total, 32k used, 2104440k free, 411968k cached

     PID USER     PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
    4136 sbrandt  15  0 35208  15m  10m S    6  1.5 0:03.35 gnome-terminal
    3761 root     16  0 82676  50m  12m R    3  5.0 1:02.82 X
    5195 sbrandt  16  0  2176 1172  852 R    1  0.1 0:00.03 top
    3487 root     17  0  1820  572  496 S    0  0.1 0:00.25 hald-addon-stor
    3930 sbrandt  16  0 99.8m  40m  14m S    0  4.0 0:36.27 beagled
gprof: quick overview
• gprof
  • a utility that profiles procedures in programs, available on most Unix systems.
• gprof provides information about:
  • An index for each procedure
  • Parents of each procedure
  • The percentage of CPU time used by a procedure and its calls
  • Breakdown of time used by the procedure and its descendants
  • Number of times a procedure was called
  • Direct descendants of each procedure
• To use gprof:
  • compile the source code with the -pg option
  • running the resulting executable generates an output file (gmon.out for serial programs)
  • For serial programs: gprof exe gmon.out
  • For parallel programs, set the environment variable GMON_OUT_PREFIX, then run: gprof exe gmon.out.*
GPROF: one minute tutorial
• Steps to use gprof:
  • gcc -pg -g -o prog prog.c
  • ./prog
  • gprof prog gmon.out
• More reading: http://www.cs.utah.edu/dept/old/texinfo/as/gprof.html
• Finds subroutines where the most time is spent
• Cannot tell you why some routines are more costly than others. Need more information...
Using psrun
• psrun cmd (e.g. psrun du -s /usr)
  • This test will measure performance counters used by the du command. No special compilation of du is required for this to work.
• psprocess cmd.* (e.g. psprocess du.*.xml)
  • At the bottom of this file, you will see summary events about numerous counters.
By hand: Verifying the PAPI Version
    // When hand-instrumenting, you need to check the library version
    #include <stdio.h>
    #include <stdlib.h>
    #include <papi.h>
    ...
    /* Verifying PAPI Version */
    int v = PAPI_library_init(PAPI_VER_CURRENT);
    if (v != PAPI_VER_CURRENT) {
        fprintf(stderr, "Bad PAPI version\n");
        exit(2);
    }
By Hand: Measuring PAPI Counters
• Use "papi_avail -a" to identify counters
• Link with -lpapi
Statistical profiling
• profil() - Unix library call that periodically samples the program counter. Identifies subroutines where the code spends most of its time.
  • Used by Gprof
• PAPI_profil() - Emulates profil(), but looks at a specific hardware counter. Identifies the file/line where the code spends most of its time.
Using psrun to find hot spots
• gcc -g -o cmd cmd.c
• psrun -C -c papi_profile_cycles.xml cmd
  • "-C": instructs papi to use xml configurations that are in the install path rather than the current directory.
  • "-c papi_profile_cycles.xml": use the named config file rather than the default.
  • "papi_profile_cycles.xml": directs papi to collect file/line data.
• psprocess cmd.*.xml
  • display results