OpenMP Performance Visualization with Paraver • Jesús Labarta, Jordi Caubet, Judit Gimenez, Sergi Girona, Francesc Escale • CEPBA-UPC
PARAVER (1992- ) • Flexible performance visualization tool • Functions of time • Precedence relationships • Quantitative, comparative • Powerful / not trivial • You drive the analysis • MPI + OpenMP, System activity, performance counters,… • Distributed by CEPBA
Process model Multithreaded + message passing + multiprogramming • Objects: • Thread • Task • Ptask (application)
Tracefile • Instrumented codes • MPI + OpenMP • Java • Pthreads, shmem • Monitoring tools • System activity (SCPUs) • InfoPerfex • Simulators • Dimemas • Simplescalar • Filters • par2Paraver • UTE2Paraver • Records • State (Object, time_start, time_end, state) • Events: • Flag(Object, time, type, value) • Precedence(Object_src, Object_dst, time_src, time_dst, tag, size)
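The three record kinds listed above can be pictured as plain data types. This is a minimal sketch, not the actual Paraver trace format: the field names follow the slide, while the `ObjectId` layout (ptask, task, thread), the integer time unit, and the numeric codes in the example are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectId:
    """Position in the Ptask / Task / Thread hierarchy (layout assumed)."""
    ptask: int
    task: int
    thread: int


@dataclass(frozen=True)
class State:
    """An object stays in one state over [time_start, time_end)."""
    obj: ObjectId
    time_start: int
    time_end: int
    state: int


@dataclass(frozen=True)
class Flag:
    """A punctual event with a type and a value."""
    obj: ObjectId
    time: int
    type: int
    value: int


@dataclass(frozen=True)
class Precedence:
    """A relation between two objects, e.g. a message from src to dst."""
    obj_src: ObjectId
    obj_dst: ObjectId
    time_src: int
    time_dst: int
    tag: int
    size: int


# Hypothetical three-record trace: one state, one flag, one communication.
t0 = ObjectId(ptask=1, task=1, thread=1)
t1 = ObjectId(ptask=1, task=2, thread=1)
trace = [
    State(t0, time_start=0, time_end=100, state=1),
    Flag(t0, time=40, type=7, value=3),
    Precedence(t0, t1, time_src=50, time_dst=80, tag=0, size=1024),
]
```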
Structure • [Diagram: boxes labeled Filter, Semantics, Representation, Visualization, Analysis, Textual; data labeled Tracefile, Reduced Tracefile, Function of time (semantic value), Events] • Demand-driven evaluation
Filter module • Events • by type • by value • Communications • by tag • by size • by source / destination • logical / physical
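The selection criteria above can be sketched as two generator functions. This is an illustration under stated assumptions, not Paraver's implementation: the record tuples, the string object identifiers, the event type codes (60, 61), and the `min_size` parameter are all hypothetical, and the logical / physical distinction is not modelled.

```python
from typing import Iterable, Iterator, NamedTuple, Optional, Set


class Flag(NamedTuple):
    obj: str
    time: int
    type: int
    value: int


class Precedence(NamedTuple):
    obj_src: str
    obj_dst: str
    time_src: int
    time_dst: int
    tag: int
    size: int


def filter_events(events: Iterable[Flag],
                  types: Optional[Set[int]] = None,
                  values: Optional[Set[int]] = None) -> Iterator[Flag]:
    """Keep events whose type and value are in the given sets (None keeps all)."""
    for ev in events:
        if types is not None and ev.type not in types:
            continue
        if values is not None and ev.value not in values:
            continue
        yield ev


def filter_comms(comms: Iterable[Precedence],
                 tags: Optional[Set[int]] = None,
                 min_size: int = 0,
                 src: Optional[str] = None,
                 dst: Optional[str] = None) -> Iterator[Precedence]:
    """Keep communications matching tag, minimum size, source and destination."""
    for c in comms:
        if tags is not None and c.tag not in tags:
            continue
        if c.size < min_size:
            continue
        if src is not None and c.obj_src != src:
            continue
        if dst is not None and c.obj_dst != dst:
            continue
        yield c


# Hypothetical records; type 60 stands in for some instrumented event.
events = [Flag("1.1.1", 10, 60, 1), Flag("1.1.1", 20, 61, 0), Flag("1.1.2", 30, 60, 0)]
comms = [Precedence("1.1.1", "1.2.1", 5, 9, 0, 64),
         Precedence("1.2.1", "1.1.1", 12, 30, 1, 4096)]

selected_events = list(filter_events(events, types={60}, values={1}))
large_to_first = list(filter_comms(comms, min_size=1024, dst="1.1.1"))
```

Writing the filters as generators keeps them lazy, so a reduced trace is only produced as later stages ask for records.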
Semantic module • [Diagram: tree of functions with fcomp1 at the top, then fPtask, then several ftask nodes, each over several fthread nodes] • Semantic value: f(t) • f = fcomp2 ∘ fcomp1 ∘ fPtask ∘ ftask ∘ fthread • Semantic functions • fcomp2, fcomp1: sign, mod, div, in range • fPtask: add, average, max, select • ftask: add, average, max, select • fthread: in state, useful, given state, last event value, next event value, average next event value
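The layering of thread, task, Ptask and compose functions can be sketched as ordinary function composition. This is an illustration only: the state code `RUNNING`, the interval representation, and the exact meaning given here to "in state", "add" and "in range" are assumptions, not Paraver's definitions.

```python
from typing import Callable, List, Tuple

Interval = Tuple[int, int, int]      # (time_start, time_end, state)
TimeFn = Callable[[int], float]      # a semantic function of time

RUNNING, IDLE = 1, 2                 # hypothetical state codes


def in_state(intervals: List[Interval], wanted: int) -> TimeFn:
    """Thread level: 1.0 while the thread is in the wanted state, else 0.0."""
    def f(t: int) -> float:
        for start, end, state in intervals:
            if start <= t < end:
                return 1.0 if state == wanted else 0.0
        return 0.0
    return f


def add(children: List[TimeFn]) -> TimeFn:
    """Task or Ptask level: sum the values of the children at time t."""
    return lambda t: sum(child(t) for child in children)


def in_range(lo: float, hi: float) -> Callable[[float], float]:
    """Compose level: 1.0 if the value lies in [lo, hi], else 0.0."""
    return lambda v: 1.0 if lo <= v <= hi else 0.0


def compose(outer: Callable[[float], float], inner: TimeFn) -> TimeFn:
    return lambda t: outer(inner(t))


# One Ptask with two tasks of two threads each (made-up state records).
task0 = [in_state([(0, 10, RUNNING), (10, 20, IDLE)], RUNNING),
         in_state([(0, 20, RUNNING)], RUNNING)]
task1 = [in_state([(0, 5, IDLE), (5, 20, RUNNING)], RUNNING),
         in_state([(0, 20, RUNNING)], RUNNING)]

running_threads = add([add(task0), add(task1)])             # fPtask(ftask(fthread))
all_running = compose(in_range(4, 4), running_threads)      # fcomp applied on top
```

Nothing is computed until a value is requested, for example `all_running(7)`, which mirrors the demand-driven evaluation mentioned for the tool's structure.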
Visualization • Type of window • Ptask / Task / thread: one row per object of selected type • Object selection (scalability) • Representation • Color encoded / Gradient / Function of time • Multiple windows • Synchronised • Forward/backward animation • Precise time measurement • Within/between windows
Textual • Textual detail of area around point within window • Semantic value and duration / flag / communication • Numeric / translated text (.pcf file)
Analysis • Time and object range selected pointing on window • Analysis function applied to output of semantic module • Average semantic value • Average duration/variance/number of bursts (if within range) • Number of events • Number of communications • ...
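A few of these analysis functions can be sketched over the output of the semantic module for one object. The representation is an assumption: the semantic function is taken to be piecewise constant and stored as `(start, end, value)` segments, a "burst" is taken to be one such segment, and "within range" is read as the segment's value lying in `[lo, hi]`.

```python
from statistics import mean, pvariance
from typing import List, Tuple

Segment = Tuple[int, int, float]   # (start, end, semantic value)


def average_value(segments: List[Segment], t0: int, t1: int) -> float:
    """Time-weighted average of the semantic value over [t0, t1)."""
    total = 0.0
    for start, end, value in segments:
        overlap = min(end, t1) - max(start, t0)
        if overlap > 0:
            total += value * overlap
    return total / (t1 - t0)


def burst_stats(segments: List[Segment], t0: int, t1: int,
                lo: float, hi: float) -> Tuple[int, float, float]:
    """Count, mean duration and variance of bursts inside [t0, t1]
    whose value lies in [lo, hi]."""
    durations = [end - start for start, end, value in segments
                 if start >= t0 and end <= t1 and lo <= value <= hi]
    if not durations:
        return 0, 0.0, 0.0
    return len(durations), mean(durations), pvariance(durations)


# Made-up semantic function: 1.0 while "useful", 0.0 otherwise.
segments = [(0, 10, 1.0), (10, 15, 0.0), (15, 35, 1.0), (35, 40, 0.0)]

avg = average_value(segments, 0, 40)
count, avg_duration, variance = burst_stats(segments, 0, 40, 1.0, 1.0)
```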
OpenMP instrumentation • Compiler instrumentation • NANOS compiler • Dynamic Interception • SGI native OpenMP (MP library) • Tracing of thread status • running • idle (busy wait) • scheduling • blocked
OpenMP analysis • Application structure • Stamping code
OpenMP analysis • Loop scheduling • Antenna design
OpenMP analysis How do bees see flowers?
OpenMP analysis • What bees don’t see
Function          A     B     C     D
Av. L2 misses/ms  62    52    163   14
FLOPS/ms          41K   21K   8K    1K
Loads/ms          57K   52K   18K   100K
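One way to read the counter table is to normalize the rates against each other. The sketch below derives two ratios that are not on the slide, L2 misses per 1000 loads and FLOPs per load, assuming "K" means 1000; the ratio names are introduced here for illustration only.

```python
# Per-function rates copied from the table (per millisecond).
l2_misses_per_ms = {"A": 62, "B": 52, "C": 163, "D": 14}
flops_per_ms = {"A": 41_000, "B": 21_000, "C": 8_000, "D": 1_000}    # "K" assumed = 1000
loads_per_ms = {"A": 57_000, "B": 52_000, "C": 18_000, "D": 100_000}

# Derived ratios (not in the original slide).
misses_per_kload = {f: 1000 * l2_misses_per_ms[f] / loads_per_ms[f]
                    for f in l2_misses_per_ms}
flops_per_load = {f: flops_per_ms[f] / loads_per_ms[f] for f in flops_per_ms}

worst_locality = max(misses_per_kload, key=misses_per_kload.get)

for f in sorted(misses_per_kload):
    print(f"{f}: {misses_per_kload[f]:.2f} L2 misses per 1000 loads, "
          f"{flops_per_load[f]:.2f} FLOPs per load")
```

Under these assumptions, function C misses in L2 far more often per load than the others, while function D issues many loads with few misses and very few floating-point operations.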
More on hardware counters • Fewer misses, more time
More on hardware counters • More memory accesses per second • Fewer coherence state changes
MPI + OpenMP • NAS FT • Quantitative data: • % MPI collective comm.: 18% • % OMP fork/join: 5% • % non-parallelized: 32% • Avg. parallel loop: 50 ms • # parallel loops: 38 • # parallel loops < 5 ms: 6
Other uses • System activity • InfoPerfex • Pthreads • Average: 33 MFLOPS; Peak: 60 MFLOPS
Paraver on IBM • DPCL + PAPI : • Sequential programs • OpenMP • UTE • MPI • MPI+OpenMP
UTE Paraver • Filter • Thread states • Executing application code • Executing MPI Receive • Executing MPI Send • Descheduled • Statistics
UTE Analysis • Communication pattern • Exchanges 1 2 ; 3 4 • Load balance • More load on thread 1 • MPI implementation • Busy wait on receives • Scheduling • Thread 2 and 3 time sharing one CPU • Thread 4 time sharing one CPU with other processes • OS quantum: 10 ms.
More information http://www.cepba.upc.es/paraver cepbatools@cepba.upc.es