OpenMP in a Heterogeneous World Ayodunni Aribuki Advisor: Dr. Barbara Chapman HPCTools Group University of Houston
Why OpenMP • Shared memory parallel programming model • Extends C, C++, Fortran • Directives-based • Single code base for sequential and parallel versions • Incremental parallelism • Little code modification • High-level • Leaves multithreading details to the compiler and runtime • Widely supported by major compilers • Open64, Intel, GNU, IBM, Microsoft, … • Portable www.openmp.org
OpenMP Example

    #pragma omp parallel
    {
        int i;
        #pragma omp for
        for (i = 0; i < 100; i++) {
            // do stuff
        }
        // do more stuff
    }

[Fork/join diagram: the parallel region forks four threads that split the loop iterations 0-24, 25-49, 50-74, and 75-99; an implicit barrier follows the loop, then each thread does "more stuff" before the threads join.]
Present/Future Architectures & Challenges They Pose
[Diagram: four nodes, each with its own memory and an attached accelerator.]
• Heterogeneity • Location • Many more CPUs • Scalability
Heterogeneous High-Performance Systems Each node has multiple CPU cores, and some of the nodes are equipped with additional computational accelerators, such as GPUs. www.olcf.ornl.gov/wp-content/uploads/.../Exascale-ASCR-Analysis.pdf
Programming Heterogeneous Multicore: Issues Always hardware-specific! • Must map data/computations to specific devices • Usually involves substantial rewriting of code • Verbose code • Move data to/from device x • Launch kernel on device • Wait until y is ready/done • Portability becomes an issue • Multiple versions of the same code • Hard to maintain
Programming Models? Today's Scenario

    // Run one OpenMP thread per device per MPI node
    #pragma omp parallel num_threads(devCount)
    if (initDevice()) {
        // Block and grid dimensions
        dim3 dimBlock(12, 12);
        kernel<<<1, dimBlock>>>();
        cudaThreadExit();
    } else {
        printf("Device error on %s\n", processor_name);
    }
    MPI_Finalize();
    return 0;
    }

www.cse.buffalo.edu/faculty/miller/Courses/CSE710/heavner.pdf
OpenMP in the Heterogeneous World • All threads are equal • No vocabulary for heterogeneity or separate devices • All threads must have access to the memory • Distributed memories are common in embedded systems • Memories may not be coherent • Implementations rely on the OS and threading libraries for memory allocation, synchronization, etc., e.g. Linux, Pthreads
Extending OpenMP Example

    #pragma omp parallel for target(dsp)
    for (j = 0; j < m; j++)
        for (i = 0; i < n; i++)
            c[i][j] = a[i][j] + b[i][j];

[Diagram: the general-purpose processor cores upload the application data from main memory to the HWA, trigger the loop on the device cores via a remote procedure call, and download the remote data back when it completes.]
Heterogeneous OpenMP Solution Stack • User layer: OpenMP Application • Prog. layer (OpenMP API): Directives/Compiler, OpenMP library, Environment variables; extended with language extensions and efficient code generation • System layer: Runtime library with a Target Portable Runtime Interface; OS/system support for shared memory via MCAPI, MRAPI, MTAPI, … • Hardware: Core 1, Core 2, …, Core n
Summarizing My Research • OpenMP on heterogeneous architectures • Expressing heterogeneity • Generating efficient code for GPUs/DSPs • Managing memories • Distributed • Explicitly managed • Enabling portable implementations
MCA: Generic Multicore Programming (www.multicore-association.org) • Solves the portability issue in embedded multicore programming • Defines and promotes open specifications for • Communication - MCAPI • Resource Management - MRAPI • Task Management - MTAPI