High Level OpenCL Implementation By: Matthew Royle Supervisor: Prof. Shaun Bangay
Introduction • Multi-core CPUs • Sequential algorithms to parallel algorithms • GPUs used for more than just graphics • Use of GPGPUs (General-Purpose Graphics Processing Units)
Introduction Cont… • Parallel programming languages for specific architectures, namely NVIDIA’s CUDA • Lack of a multi-platform open language • The OpenCL (Open Computing Language) standard • Heterogeneous Parallel Programming
Problem Statement • Parallel nature of GPUs • No available OpenCL implementation • Implement OpenCL using existing technologies • High level translator • Use Parallel Frameworks
Rationale and Motivation • GPU most likely form of implementation • NVIDIA and AMD plan to include OpenCL • Future Apple iPhones • Lack of implementation on CPU architecture
Project Aims • Select a parallel processing framework • Create a high level translator • Create valid tests • Run created tests
Proposed Implementation Method - OpenCL
__kernel void add_vect();                           // create computation unit (OpenCL kernels must return void)
cl_command_queue cmd_queue = CreateCommandQueue();  // create computation queue (simplified call)
clEnqueueTask(kernel, i);                           // enqueue task and execute (simplified call)
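Proposed Implementation Method – C
/* Simplified sketch: the command queue is a fixed-size array of kernel function pointers. */
typedef void (*kernel_t)(void);
typedef kernel_t *cl_command_queue;
static kernel_t cmd_queue[QUEUE_SIZE];   /* QUEUE_SIZE: illustrative capacity */
cl_command_queue CreateCommandQueue(void) { return cmd_queue; }
void clEnqueueTask(kernel_t kernel, int i) { cmd_queue[i] = kernel; }
void ExecuteQueue(void) {                /* illustrative name */
    #pragma omp parallel for
    for (int k = 0; k < QUEUE_SIZE; k++)
        Execute(cmd_queue[k]);
}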
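The queue idea on the two slides above can be sketched as a small self-contained C unit. This is only an illustration under stated assumptions, not the project's translator: the names cmd_queue_t, task_t, queue_init, queue_enqueue, queue_flush and add_args_t are hypothetical, the capacity is fixed, and the queued tasks are assumed to be independent of each other. Only the `#pragma omp parallel for` directive is real OpenMP; compiled without OpenMP support the pragma is ignored and the loop runs sequentially.

```c
#include <assert.h>

#define MAX_TASKS 64   /* hypothetical fixed capacity */

/* Hypothetical task record: a kernel function plus its argument. */
typedef void (*kernel_fn)(void *arg);
typedef struct { kernel_fn fn; void *arg; } task_t;

/* Hypothetical stand-in for an OpenCL command queue. */
typedef struct { task_t tasks[MAX_TASKS]; int length; } cmd_queue_t;

void queue_init(cmd_queue_t *q) { q->length = 0; }

/* Record the task; nothing runs until queue_flush is called. */
void queue_enqueue(cmd_queue_t *q, kernel_fn fn, void *arg) {
    assert(q->length < MAX_TASKS);
    q->tasks[q->length].fn = fn;
    q->tasks[q->length].arg = arg;
    q->length++;
}

/* Run every queued task, then empty the queue. With OpenMP enabled the
   loop iterations are shared among threads, so tasks must be independent. */
void queue_flush(cmd_queue_t *q) {
    int n = q->length;
    #pragma omp parallel for
    for (int k = 0; k < n; k++)
        q->tasks[k].fn(q->tasks[k].arg);
    q->length = 0;
}

/* Example kernel: each task adds one pair of vector elements. */
typedef struct { int *a; int *b; int *out; int i; } add_args_t;

void add_vect(void *p) {
    add_args_t *t = (add_args_t *)p;
    t->out[t->i] = t->a[t->i] + t->b[t->i];
}
```

A caller would fill one add_args_t per element, enqueue add_vect once for each, and call queue_flush. Each task writes a different output element, so no locking is needed.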
Possible Tests • John Conway’s Game Of Life • Fractal Flame algorithm
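Game of Life suits this kind of test because every cell of the next generation depends only on the current generation, so the rows can be computed independently. A minimal sketch of one generation step, assuming a small fixed 5x5 grid with dead cells beyond the border; the names ROWS, COLS, live_neighbours and life_step are hypothetical and chosen for illustration:

```c
#include <assert.h>

#define ROWS 5
#define COLS 5

/* Count live neighbours of (r, c); cells outside the grid count as dead. */
static int live_neighbours(int grid[ROWS][COLS], int r, int c) {
    int n = 0;
    for (int dr = -1; dr <= 1; dr++)
        for (int dc = -1; dc <= 1; dc++) {
            int rr = r + dr, cc = c + dc;
            if ((dr || dc) && rr >= 0 && rr < ROWS && cc >= 0 && cc < COLS)
                n += grid[rr][cc];
        }
    return n;
}

/* Compute one generation from cur into next. Rows are independent, so the
   outer loop can be shared among OpenMP threads. */
void life_step(int cur[ROWS][COLS], int next[ROWS][COLS]) {
    assert(cur != next);   /* updating in place would corrupt neighbour counts */
    #pragma omp parallel for
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++) {
            int n = live_neighbours(cur, r, c);
            /* live cell survives with 2 or 3 neighbours; dead cell is born with 3 */
            next[r][c] = cur[r][c] ? (n == 2 || n == 3) : (n == 3);
        }
}
```

A horizontal "blinker" (three live cells in a row) turns vertical after one step, which gives a simple correctness check that does not depend on the number of threads.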
Tools • OpenMP (Open Multi-Processing) framework • Parallel Processing Framework • Available with the GNU Compiler Collection • Free! • OpenCL header files
OpenCL Header File Sample
/* scalar types */
typedef int8_t   cl_char;
typedef uint8_t  cl_uchar;
typedef int16_t  cl_short  __attribute__((aligned(2)));
typedef uint16_t cl_ushort __attribute__((aligned(2)));
typedef int32_t  cl_int    __attribute__((aligned(4)));
typedef uint32_t cl_uint   __attribute__((aligned(4)));
typedef int64_t  cl_long   __attribute__((aligned(8)));
typedef uint64_t cl_ulong  __attribute__((aligned(8)));
typedef uint16_t cl_half   __attribute__((aligned(2)));
typedef float    cl_float  __attribute__((aligned(4)));
typedef double   cl_double __attribute__((aligned(8)));
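The typedefs in the header sample pin each OpenCL scalar type to a fixed width and alignment on the host. A small sketch of how a CPU-side implementation might verify that at compile time, assuming a C11 compiler that accepts the GCC-style `__attribute__` syntax (GCC or Clang) and a host where float is 32 bits wide; only a subset of the types is repeated here:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* Subset of the typedefs from the header sample. */
typedef int8_t  cl_char;
typedef int32_t cl_int   __attribute__((aligned(4)));
typedef int64_t cl_long  __attribute__((aligned(8)));
typedef float   cl_float __attribute__((aligned(4)));

/* The build fails if a host type does not have the expected size or alignment. */
static_assert(sizeof(cl_char) == 1, "cl_char must be 1 byte");
static_assert(sizeof(cl_int) == 4, "cl_int must be 4 bytes");
static_assert(sizeof(cl_long) == 8, "cl_long must be 8 bytes");
static_assert(sizeof(cl_float) == 4, "cl_float must be 4 bytes (assumes 32-bit float)");
static_assert(_Alignof(cl_long) == 8, "cl_long must be 8-byte aligned");
```

Checks of this kind cost nothing at run time and catch a mismatched platform before any kernel data is exchanged.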
OpenMP example code
//hello.c
#include <omp.h>
#include <stdio.h>
int main() {
    #pragma omp parallel num_threads(10)
    printf("Hello from thread %d, nthreads %d\n", omp_get_thread_num(), omp_get_num_threads());
}
Possible Extensions • Improve performance • Evaluation of OpenCL on various architectures • Heterogeneous execution
Key Points • Lack of multi-platform open language • OpenCL standard • Most implementations for GPU • Implementation for CPU • High Level Translator • Use OpenMP framework