
Memory Access Behavior Analysis with CPTE - Counter-based Profiling and Tracing Environment

Memory Access Behavior Analysis with CPTE - Counter-based Profiling and Tracing Environment. Michael Gerndt, Tianchao Li Technische Universität München {gerndt, lit}@in.tum.de. Performance Analysis for Parallel Systems. Development cycle Assumption: Reproducibility Monitoring


Presentation Transcript


  1. Memory Access Behavior Analysis with CPTE - Counter-based Profiling and Tracing Environment. Michael Gerndt, Tianchao Li, Technische Universität München, {gerndt, lit}@in.tum.de

  2. Performance Analysis for Parallel Systems • Development cycle • Assumption: Reproducibility • Monitoring • Software vs Hardware • Statistical profiles vs Event traces • Analysis • Source-based tools • Visualization tools • Automatic analysis tools [Figure: development cycle with the stages Coding, Performance Monitoring and Analysis, Program Tuning, Production]

  3. Performance Measurement Techniques • Event model of the execution • Events occur at a processor at a specific point in time • Events belong to event types • clock cycles • cache misses • remote references • start of a send operation • ... • Profiling: Recording accumulated performance data for events • Tracing: Recording performance data of individual events
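
The difference between profiling and tracing on this slide can be illustrated with a minimal Python sketch. The event tuples, their values, and the function names `profile` and `trace` are assumptions made for illustration; they are not part of CPTE.

```python
from collections import defaultdict

# Hypothetical event stream: (timestamp, processor, event_type) tuples.
events = [
    (0.10, 0, "cache_miss"),
    (0.25, 1, "cache_miss"),
    (0.40, 0, "send_start"),
    (0.55, 0, "cache_miss"),
]

def profile(events):
    """Profiling: keep only accumulated counts per (processor, event type)."""
    counts = defaultdict(int)
    for _time, proc, etype in events:
        counts[(proc, etype)] += 1
    return dict(counts)

def trace(events):
    """Tracing: keep every individual event, in time order, per processor."""
    per_proc = defaultdict(list)
    for time, proc, etype in sorted(events):
        per_proc[proc].append((time, etype))
    return dict(per_proc)

profile_result = profile(events)
trace_result = trace(events)
```

The profile has one entry per processor and event type regardless of how long the program runs, while the trace grows with every event; that size difference is the usual trade-off between the two techniques.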

  4. Tracing [Figure: Process 0, Process 1, ..., Process n-1 each write their own trace: Trace P0, Trace P1, ..., Trace P n-1]
     Application code:
       Function Obelix (...)
         call monitor("Obelix", "enter")
         ...
         call MPI_send(…)
         call monitor("Obelix", "exit")
       end Obelix
     MPI Library:
       Function MPI_send (...)
         call monitor("MPI_send", "enter")
         ...
         call monitor("MPI_send", "exit")
       end MPI_send

  5. Instrumentation and Monitoring [Figure: the CPU's cache miss counter (values 10 and 200 shown) is read by the monitor routine, which updates a Function Table with entries Main, Asterix, Obelix and values 10, 1490, 1300]
     Instrumented code:
       Function Obelix (...)
         call monitor("Obelix", "enter")
         ...
         call monitor("Obelix", "exit")
       end Obelix
     Monitor routine:
       monitor(routine, location)
         if ("enter") then
         else
         end if
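
The slide leaves both branches of the monitor routine empty. One plausible reading, sketched here in Python, is that the routine saves the counter value on "enter" and adds the difference to the function table on "exit". The class name `Monitor`, the simulated counter, and the use of the readings 10 and 200 are assumptions for illustration, and the sketch does not handle recursive calls.

```python
class Monitor:
    """Accumulates a hardware-counter delta per routine (hypothetical sketch)."""

    def __init__(self, read_counter):
        self.read_counter = read_counter  # callable returning the current counter value
        self.entry_value = {}             # routine -> counter value at "enter"
        self.function_table = {}          # routine -> accumulated counter delta

    def monitor(self, routine, location):
        value = self.read_counter()
        if location == "enter":
            self.entry_value[routine] = value
        else:  # "exit"
            delta = value - self.entry_value.pop(routine)
            self.function_table[routine] = self.function_table.get(routine, 0) + delta


# Simulated cache miss counter: reads 10 on entry and 200 on exit.
readings = iter([10, 200])
mon = Monitor(lambda: next(readings))

def obelix():
    mon.monitor("Obelix", "enter")
    # ... body of the routine would run here ...
    mon.monitor("Obelix", "exit")

obelix()
print(mon.function_table)  # {'Obelix': 190}
```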

  6. CPTE • Counter-based Profiling and Tracing Environment • Selective Instrumentation • Selective Monitoring • Profiling, Tracing and Sampling

  7. Infrastructure Overview

  8. Fortran 95 Selective Instrumentation • Based on NAG Fortran 95 frontend • the only Fortran 95 instrumenter based on a production-quality front-end? • Instruments Fortran 95 programs • Inserts calls to monitoring library • Allows selection of the regions to be instrumented • FileID-FirstLine# as region ID • Generates information about instrumented regions • Selective Instrumentation • Reduces the measurement perturbation of the program

  9. Instrumentable Regions • Command line arguments control instrumentation of sequential and OpenMP parallel regions • Main program, subprograms • Outermost and nested multidimensional loops • Vector statements, Where-, Forall-statements • Subroutine calls • I/O statements • OpenMP parallel regions • OpenMP synchronization • Handles arbitrary control flow • Generates Standard Intermediate Representation (SIR)

  10. Example of Nested Loops Instrumentation
      call start_region(LOOP,17,42)
      do i=1,n                 ! orig line 42
        do j=1,n
          a(i,j)=...
          call start_region(LOOP,17,45)
          do k=1,n             ! orig line 45
            ...
          enddo
          call end_region(LOOP,17,45)
        enddo
      enddo
      call end_region(LOOP,17,42)

  11. Example of Instrumentation Transformations
      call start_region(LOOP,42,17)
      do i=1,n
        ...
        call start_region(LOOP,42,19)
        do j=1,m
          ...
          call end_region(LOOP,42,17,TRUE)
          GOTO 123
        enddo
        call end_region(LOOP,42,19,FALSE)
      enddo
      call end_region(LOOP,42,17,FALSE)
      ...
      123 continue

  12. Instrumentation of OpenMP
      program hello
        integer a(100)
        !$OMP parallel do
        do i=1,100
          a(i)=i
        enddo                  ! implicit barrier synchronization
      end

  13. Instrumented OpenMP
      CALL start_region(PARALLEL,42,17)
      !$OMP PARALLEL
      CALL start_region(PARALLELBODY,42,17)
      CALL start_region(DO,42,17)
      !$OMP DO
      DO i=1,n
        ...
      END DO
      !$OMP END DO NOWAIT
      CALL start_region(IMPLBARRIER,42,17)
      !$OMP BARRIER            ! explicit barrier synchronization
      CALL end_region(IMPLBARRIER,42,17)
      CALL end_region(DO,42,17)
      CALL end_region(PARALLELBODY,42,17)
      !$OMP END PARALLEL
      CALL end_region(PARALLEL,42,17)

  14. [Figure: architecture of the monitoring library. Components shown: Application, Application Interface, Monitoring Library, Request File, Missed Request File, Region Table, Wildcard Table, Measurement Stack, Trace Buffer, Measurement Result, Counter Layer, Counter Library PCL]

  15. Monitor Library Interface • Start_Region (RType, FileID, RFL) • End_Region (RType, FileID, RFL) • Marks the start/end of a region. • RType is the region type • FileID the unique number assigned to a file • RFL the first line number of the region in the file. • End_All_Regions (FileID, RFL) • Inserted before STOP statements. • Enter_Region (RType, FileID, RFL) • Leave_Region (RType, FileID, RFL) • Marks the start/end of a subprogram that is not instrumented • Ignore_Next_Entry_Point (RType, FileID, RFL) • inserted before ENTRY statements
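
How Start_Region, End_Region, and End_All_Regions might cooperate with a measurement stack can be sketched in Python. This is a hedged illustration, not the CPTE implementation: the class `RegionMonitor`, the tick-based clock, and the assumption that End_All_Regions closes every region still open are all introduced here.

```python
class RegionMonitor:
    """Stack-based region measurement (hypothetical sketch)."""

    def __init__(self, clock):
        self.clock = clock   # callable returning the current counter or time value
        self.stack = []      # open regions as (region_id, start_value)
        self.results = {}    # region_id -> accumulated inclusive value

    def _record(self, region_id, start, now):
        self.results[region_id] = self.results.get(region_id, 0) + now - start

    def start_region(self, rtype, file_id, rfl):
        self.stack.append(((rtype, file_id, rfl), self.clock()))

    def end_region(self, rtype, file_id, rfl):
        region_id, start = self.stack.pop()
        if region_id != (rtype, file_id, rfl):
            raise RuntimeError("end_region does not match the innermost open region")
        self._record(region_id, start, self.clock())

    def end_all_regions(self):
        # Assumed behavior before a STOP statement: close everything still open.
        now = self.clock()
        while self.stack:
            region_id, start = self.stack.pop()
            self._record(region_id, start, now)


ticks = iter(range(100))            # simulated clock: 0, 1, 2, ...
rm = RegionMonitor(lambda: next(ticks))

rm.start_region("SUB", 1, 10)       # clock reads 0
rm.start_region("LOOP", 1, 42)      # clock reads 1
rm.end_region("LOOP", 1, 42)        # clock reads 2
rm.start_region("LOOP", 1, 45)      # clock reads 3
rm.end_all_regions()                # clock reads 4, closes LOOP 45 and SUB 10
```

The region identifier is the triple (RType, FileID, RFL) from the slide, so two loops in the same file are distinguished by their first line number.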

  16. Request File • Global and region measurement requests • Both for all or some nodes • Global • Different types of monitoring overheads including file flush • Summary at program termination • Region-based measurement requests • Region specification • Region type, file, region first line number • Region type, file, * • Region type, *, * • Information • Execution_time, cache misses, specific instructions
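
The three forms of region specification can be read as patterns with wildcards. The Python sketch below assumes that every request whose pattern matches a region applies to it; the slides do not say how overlapping requests are combined, so that rule is an assumption. `L2_CACHE_MISS` appears in the MRL example of the deck, while `EXECUTION_TIME` and `L1_CACHE_MISS` are placeholder metric names.

```python
def matches(pattern, region):
    """A pattern field matches if it is the wildcard "*" or equals the region field."""
    return all(p == "*" or p == r for p, r in zip(pattern, region))

def requested_metrics(requests, region):
    """Collect the metrics of all requests whose pattern matches the region."""
    metrics = []
    for pattern, metric in requests:
        if matches(pattern, region) and metric not in metrics:
            metrics.append(metric)
    return metrics

# Patterns are (region type, file ID, region first line number).
requests = [
    (("LOOP", "*", "*"), "L2_CACHE_MISS"),   # region type, *, *
    (("LOOP", 17, "*"), "EXECUTION_TIME"),   # region type, file, *
    (("LOOP", 17, 42), "L1_CACHE_MISS"),     # region type, file, first line
]

loop_17_42 = requested_metrics(requests, ("LOOP", 17, 42))
loop_17_45 = requested_metrics(requests, ("LOOP", 17, 45))
loop_3_7 = requested_metrics(requests, ("LOOP", 3, 7))
sub_17_42 = requested_metrics(requests, ("SUB", 17, 42))
```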

  17. Request File • Measurement Request Language (MRL)
      //Trace file goes to a subdirectory
      CONFIG TDF_PATH=data;
      //Request sampling
      CONFIG REGION_SAMPLING_RATE=100;
      //Request additional summary information
      CONFIG REGION_SUMMARY=ON;
      //Request L2 cache misses for all loops in all processes
      REQUEST (*) LOCAL (*,LOOP,*)=L2_CACHE_MISS;
      //Request overhead information
      REQUEST (*) GLOBAL GLOBAL_OVERHEAD;

  18. [Figure: architecture of the monitoring library. Components shown: Application, Application Interface, Monitoring Library, Request File, Missed Request File, Region Table, Wildcard Table, Measurement Stack, Trace Buffer, Measurement Result, Counter Layer, Counter Library PCL]

  19. Performance Cockpit • A common GUI platform for the integration of several performance measurement tools • Based on Eclipse, CDT, (PTP) • www.eclipse.org • The starting point of a Universal GUI Platform for Performance Tools • programming language neutral • programming paradigm neutral • performance tool neutral

  20. The General Extensible Infrastructure [Figure: layered architecture. Components shown: CPTE Plug-in, EPCM Plug-in, Instrumentation Plug-ins, Example Plug-ins, Performance Platform, Eclipse + CDT + Fortran Plug-in]

  21. EPCM • Based on simulator for shared memory multi-processor • Following a novel hardware monitor design • Runtime instrumentation of application binary, on-the-fly simulation of cache access behavior and performance monitoring • Static mode • Address-range specific measurement per counter • Dynamic mode • Histogram for address range with all counters
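
The two EPCM modes can be contrasted with a small Python sketch. It rests on one interpretation of the slide: in static mode each counter watches its own address range, and in dynamic mode all counters together form a histogram over a single range split into equal bins. The function names, the address values, and the equal-width binning are assumptions for illustration only.

```python
def static_mode(accesses, counter_ranges):
    """Static mode (assumed): counter i counts accesses inside its own [lo, hi) range."""
    counts = [0] * len(counter_ranges)
    for addr in accesses:
        for i, (lo, hi) in enumerate(counter_ranges):
            if lo <= addr < hi:
                counts[i] += 1
    return counts

def dynamic_mode(accesses, lo, hi, n_counters):
    """Dynamic mode (assumed): all counters form a histogram over [lo, hi)."""
    counts = [0] * n_counters
    for addr in accesses:
        if lo <= addr < hi:
            bin_index = (addr - lo) * n_counters // (hi - lo)
            counts[bin_index] += 1
    return counts

# Hypothetical addresses of monitored events, e.g. simulated cache misses.
accesses = [0x1000, 0x1004, 0x1800, 0x2000, 0x2FFC, 0x4000]

static_counts = static_mode(accesses, [(0x1000, 0x2000), (0x2000, 0x3000)])
histogram = dynamic_mode(accesses, 0x1000, 0x3000, 4)
```

In this reading, static mode answers "how many events hit this data structure", while dynamic mode shows where inside a range the events cluster.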

  22. CPTE vs. EPCM

  23. Role of Performance Cockpit for CPTE

  24. Role of Performance Cockpit for EPCM

  25. Performance Cockpit

  26. Performance Cockpit

  27. Performance Cockpit

  28. Performance Cockpit http://wwwbode.in.tum.de/~lit/cockpit/

  29. Summary • CPTE – Counter-based Profiling and Tracing Environment • Selective instrumentation for OpenMP Fortran 95 source codes • Selective monitoring on specified code regions (through MRL) • Performance Cockpit - general extensible GUI for performance tools integration based on Eclipse
