OMP2HMPP: HMPP Source Code Generation from Programs with Pragma Extensions
A. Saà-Garriga, D. Castells-Rufas and J. Carrabina (Albert.saa@uab.cat)
Centre d'Intel·ligència Ambiental i Accessibilitat de Catalunya (CAIAC), Universitat Autònoma de Barcelona (UAB)
21/01/2014
Outline
1. Introduction
2. OMP2HMPP Compiler
3. Results
4. Conclusions
GPGPUs and Embedded Systems
• One of the main integrated blocks on heterogeneous platforms
• Mali GPUs in embedded systems
• NVIDIA GPUs in the first 10 machines of the Green Top 500 (Nov. 2013)
• GPGPUs are potentially useful for speeding up applications
• Both classical HPC and embedded HPC (EHPC)
• Programming them is complex and error-prone due to the programming complexity and language paradigms
Current Programming Workflow
• GPGPU programming can become a hurdle that limits adoption, since the programmer has to learn both the hardware capabilities and the language needed to work with them.
• Workflow: New Proposals → Learning → Source Code Adaptation → Version Evaluation
• The learning step covers: a new language, language extensions, language syntax, programming paradigms
Programming Alternatives
• Directive-based languages
• New languages: OpenACC[2], HMPP[3]
• Hide GPU complexity, but are a new language to learn and offer no automatic transfer optimization
• Language extensions: OpenMPC[4], hiCUDA[5]
• Hide GPU complexity, but introduce a new list of directives
• Direct transformations: Par4All[6]
• Hides GPU complexity with no intermediate language; just a C source code transformation, but no data transfer optimization
Proposed Programming Workflow
• OMP2HMPP
• Hides GPU complexity
• Just one new directive
• Uses an HPC standard (OpenMP) as input
• C/C++ source code
• Replaces the learning/adaptation/evaluation cycle: OpenMP → OMP2HMPP → HMPP
• Built on the Mercurium compilation infrastructure [J. Balart et al., EWOMP 2004]
2. OMP2HMPP Compiler
Generate HMPP Directives
• Callsite
• Codelet
• Group
• Advanced Load
• Delegate Store
• Synchronize
• …
Generate HMPP Directives
• OpenMP block outlining: the body of each OpenMP block is extracted into a function (the codelet), and the block is replaced by a call to it (the callsite):

#pragma hmpp outlined_block codelet
void outlined_block(int i, int A[10], int C[10]) {
    for (i = ...) {
        ...
        C[i] = A[i] * k;
        ...
    }
}

int main() {
    ...
    A[x] = v;
    #pragma hmpp outlined_block callsite
    outlined_block(i, A, C);
    ...
    A[j] = C[j];
}
Contextual Information
• For each variable used inside an OpenMP block to transform, OMP2HMPP analyzes the Abstract Syntax Tree to identify:
• The next/last access (read/write)
• Where this access is computed (CPU/GPU)
• Whether the operation is made inside a loop, and which loop
Contextual Information
• Data transfer optimization
• Advanced Load: upload data to the GPU ahead of the callsite, avoiding redundant transfers
• Delegate Store: defer downloading results until the CPU actually reads them
Use of Contextual Information
• Data transfer optimization (loops)
(The original slides illustrated this case with before/after code figures.)
3. Results
Source Code Example

Experimental Results
• Tested architectures
• B505 (1)
• B505 (2)
• B515
(The original slides showed the corresponding code listing, architecture table, and speedup plots.)
4. Conclusions
Conclusions
• The programmer avoids spending time learning a new GPU language.
• On a tested set of problems from Polybench[8], the generated code obtains an average speedup of 113x compared to sequential execution.
• An average speedup of over 31x compared to OpenMP.
• OMP2HMPP gives a solution that rarely differs from the best hand-coded HMPP version.
• OMP2HMPP establishes a GPU parallel code reference point for expert developers who want to refine the parallelization.
• …thanks for your attention!