Peter Holvenstot OpenCL
OpenCL
• Designed as an API and language specification
• Standards maintained by the Khronos Group
• Currently 1.0, 1.1, and 1.2
• Manufacturers release their own SDKs and drivers
• Major backers: Apple, AMD/ATI, Intel

OpenCL
• Alternative to CUDA
• Not limited to ATI GPUs
• Designed for “heterogeneous computing”
• Executable on many devices, including CPUs, GPUs, DSPs, and FPGAs

OpenCL
• Structure similar to CUDA: host programs and kernels
• Set of compute devices is called a 'context'
• Kernels are executed by 'processing elements'
• Kernels can be compiled at run-time or build-time

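The host/kernel split above can be sketched without an OpenCL runtime. The following is a plain-Python emulation under stated assumptions: the names `vec_add`, `run_kernel`, and `KERNEL_SOURCE` are illustrative and do not come from the slides, and the sequential loop only stands in for the device. In a real program, a source string like the one below would typically be passed to `clCreateProgramWithSource` and built with `clBuildProgram`, which is the run-time compilation path.

```python
# Emulation only: no OpenCL runtime is used here. The names vec_add,
# run_kernel and KERNEL_SOURCE are illustrative assumptions.

# What the kernel might look like in OpenCL C. With run-time compilation,
# the host hands a string like this to the OpenCL implementation to build.
KERNEL_SOURCE = """
__kernel void vec_add(__global const float *a,
                      __global const float *b,
                      __global float *out)
{
    size_t i = get_global_id(0);
    out[i] = a[i] + b[i];
}
"""


def vec_add(global_id, a, b, out):
    """Kernel body: one invocation handles exactly one work-item."""
    out[global_id] = a[global_id] + b[global_id]


def run_kernel(kernel, global_size, *args):
    """Host-side stand-in: invoke the kernel once per work-item.

    A real device would run these invocations in parallel on its
    processing elements; this loop only mimics the indexing.
    """
    for global_id in range(global_size):
        kernel(global_id, *args)


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * len(a)
run_kernel(vec_add, len(a), a, b, out)
print(out)
```

The point of the sketch is the division of labor: the host owns the data and decides how many work-items to launch, while the kernel only sees its own index.
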
OpenCL
• Task parallelism: many kernels running at once
• OpenCL 1.2: a device can be partitioned down to a single Compute Unit
• Built-in kernels for device-specific functionality

Advantages
• Same code can be run on different devices
• Can also be run on NVIDIA GPUs!
• AMD/ATI attempting to integrate compute elements into other platforms (Accelerated Processing Units)
• Limited library of portable math routines
• Most common BLAS and FFT routines

Disadvantages
• No “official” implementation
• Vendors may meet specs or add restrictions
• Apple adds restrictions on group size
• Devices need appropriate settings to perform well
• Different capabilities → different performance
• Solution: Tuning/load balancing framework

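Work-group size is one of the settings that usually has to be chosen per device. A minimal sketch of one common pattern follows, assuming a 1-D problem: pad the global size up to a multiple of the chosen work-group size and guard the kernel body so the padding work-items do nothing. The helper names and the candidate sizes (64, 128, 256) are hypothetical, and this is plain Python, not OpenCL API code.

```python
def padded_global_size(n_items, local_size):
    """Round n_items up to the next multiple of local_size."""
    if n_items <= 0 or local_size <= 0:
        raise ValueError("sizes must be positive")
    return ((n_items + local_size - 1) // local_size) * local_size


def scale_kernel(global_id, data, n_items, factor):
    """Kernel body with a bounds guard for the padding work-items."""
    if global_id < n_items:
        data[global_id] *= factor


n_items = 1000
data = [1.0] * n_items

# Candidate work-group sizes are assumptions; a tuning step would time each
# one on the target device and keep the fastest.
for local_size in (64, 128, 256):
    global_size = padded_global_size(n_items, local_size)
    print(local_size, global_size, global_size // local_size)

# Emulated launch with one candidate size: padding IDs hit the guard.
for global_id in range(padded_global_size(n_items, 64)):
    scale_kernel(global_id, data, n_items, 2.0)
```

A tuning or load-balancing framework, as the slide suggests, would automate the choice instead of hard-coding one value for every device.
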
Restrictions
• No recursion, variadics, or function pointers
• Cannot dynamically allocate memory from the device
• No native variable-length arrays or double precision
• Some can be worked around by extensions

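The first two restrictions often shape kernel code together: a recursive traversal has to become a loop, and its stack has to be a fixed-size array because nothing can be allocated on the device. Below is a rough Python sketch of that rewrite, summing a binary tree stored in flat arrays. The function name, the array layout, and `STACK_CAPACITY` are assumptions made for illustration; an OpenCL C kernel would follow the same shape with a private array.

```python
STACK_CAPACITY = 32  # assumed bound, fixed in advance as a kernel would need


def tree_sum(values, left, right, root):
    """Sum all node values without recursion or dynamic allocation.

    The tree is stored in flat arrays: left[i] and right[i] hold child
    indices, with -1 meaning "no child".
    """
    stack = [0] * STACK_CAPACITY  # preallocated once, never resized
    top = 0
    stack[top] = root
    top += 1

    total = 0
    while top > 0:
        top -= 1
        node = stack[top]
        total += values[node]
        for child in (left[node], right[node]):
            if child != -1:
                if top == STACK_CAPACITY:
                    raise OverflowError("tree too deep for fixed stack")
                stack[top] = child
                top += 1
    return total


# Small example tree: node 0 has children 1 and 2; node 1 has children 3 and 4.
values = [1, 2, 3, 4, 5]
left = [1, 3, -1, -1, -1]
right = [2, 4, -1, -1, -1]
print(tree_sum(values, left, right, 0))
```

The trade-off is that the capacity must be chosen ahead of time, so inputs deeper than the bound have to be rejected or handled on the host.
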
Terminology (OpenCL → CUDA)
• Stream Core → Scalar Core
• Compute Unit → Streaming Multiprocessor
• Wavefront → Warp
• Intermediate Language → PTX

Terminology (OpenCL → CUDA)
• Host Memory → Host Memory
• Global Memory → Global/Device Memory
• Global Memory → Local Memory
• Constant Memory → Constant Memory
• Local Memory → Shared Memory
• Private Memory → Registers

Terminology (OpenCL → CUDA)
• NDRange → Grid
• Work group → Block
• Work item → Thread
• Global ID → Thread ID
• Group ID → Block Index
• Local ID → Thread Index

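The three IDs in this mapping are related by simple arithmetic. The sketch below works under a few assumptions: a 1-D NDRange, no global offset, and a global size that divides evenly by the work-group size. It enumerates the IDs in plain Python; `work_item_ids` is a hypothetical helper, not an OpenCL call. Inside a kernel, the same values come from `get_global_id(0)`, `get_group_id(0)`, and `get_local_id(0)`; the CUDA counterpart of the global ID is `blockIdx.x * blockDim.x + threadIdx.x`.

```python
def work_item_ids(global_size, local_size):
    """List (global_id, group_id, local_id) for a 1-D NDRange.

    Assumes no global offset and a global size that is a multiple of the
    work-group size.
    """
    if local_size <= 0 or global_size % local_size != 0:
        raise ValueError("global_size must be a multiple of local_size")

    ids = []
    for group_id in range(global_size // local_size):
        for local_id in range(local_size):
            # Same relation as blockIdx.x * blockDim.x + threadIdx.x in CUDA.
            global_id = group_id * local_size + local_id
            ids.append((global_id, group_id, local_id))
    return ids


for global_id, group_id, local_id in work_item_ids(8, 4):
    print(global_id, group_id, local_id)
```
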
References
• http://blog.accelereyes.com/blog/wp-content/uploads/2012/02/CUDAvsOpenCL.pdf
• https://wiki.aalto.fi/download/attachments/40025977/Cuda+and+OpenCL+API+comparison_presented.pdf
• http://www.hpcwire.com/hpcwire/2012-02-28/opencl_gains_ground_on_cuda.html
• http://www.netlib.org/utk/people/JackDongarra/PAPERS/parcocudaopencl.pdf
• http://www.netlib.org/lapack/lawnspdf/lawn228.pdf