
OpenCL



Presentation Transcript


  1. Peter Holvenstot OpenCL

  2. OpenCL • Designed as an API and language specification • Standards maintained by the Khronos Group • Current versions: 1.0, 1.1, and 1.2 • Manufacturers release their own SDKs and drivers • Major backers: Apple, AMD/ATI, Intel

  3. OpenCL • Alternative to CUDA • Not limited to ATI GPUs • Designed for “heterogeneous computing” • Executable on many devices, including CPUs, GPUs, DSPs, and FPGAs

  4. OpenCL • Structure similar to CUDA: host programs and kernels • Set of compute devices is called a 'context' • Kernels are executed by 'processing elements' • Kernels can be compiled at run-time or build-time
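
As a rough illustration of the host/kernel split and run-time compilation, a host program can keep kernel source as an ordinary string and hand it to the runtime when it starts. The sketch below assumes a hypothetical kernel named `vec_add`; the host function `vec_add_reference` is also illustrative, a plain C loop that does per index what one processing element would do. In a real host program the string would be passed to `clCreateProgramWithSource` and then `clBuildProgram`; those calls are left out so the sketch compiles without an OpenCL SDK.

```c
#include <assert.h>
#include <stddef.h>

/* OpenCL C source for a hypothetical kernel, kept as a string so the
 * host can compile it at run time (clCreateProgramWithSource followed
 * by clBuildProgram in a real program; not called in this sketch). */
const char *vec_add_source =
    "__kernel void vec_add(__global const float *a,\n"
    "                      __global const float *b,\n"
    "                      __global float *out)\n"
    "{\n"
    "    size_t i = get_global_id(0); /* one work item per element */\n"
    "    out[i] = a[i] + b[i];\n"
    "}\n";

/* Host-side reference: each loop iteration does the work of one
 * work item, which is handy for checking device results. */
void vec_add_reference(const float *a, const float *b,
                       float *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

Keeping a host-side reference next to the kernel string is one common way to validate output when the same kernel is built for several devices.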

  5. OpenCL • Task Parallelism – many kernels running at once • OpenCL 1.2 – device can be partitioned down to single Compute Unit • Built-in kernels for device-specific functionality

  6. Advantages • Same code can be run on different devices • Can also be run on NVIDIA GPUs! • AMD/ATI attempting to integrate compute elements into other platforms (Accelerated Processing Units) • Limited library of portable math routines • Most common BLAS and FFT routines

  7. Performance

  8. Performance

  9. Performance

  10. Disadvantages • No “official” implementation • Vendors may meet specs or add restrictions • Apple adds restrictions on group size • Devices need appropriate settings to perform well • Different capabilities → different performance • Solution: Tuning/load balancing framework
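
To make the group-size point concrete, here is a minimal sketch of the per-device adjustment a host program might do before launching a kernel. The helper names `choose_local_size` and `padded_global_size` are hypothetical. The sketch assumes the device limit has already been queried (in a real program, `clGetDeviceInfo` with `CL_DEVICE_MAX_WORK_GROUP_SIZE`) and relies on the OpenCL 1.x rule that, when a local size is given, the global size must be evenly divisible by it.

```c
#include <assert.h>
#include <stddef.h>

/* Clamp the work-group size we would like to use to what the device
 * reports as its maximum (hypothetical helper). */
size_t choose_local_size(size_t preferred, size_t device_max)
{
    assert(preferred > 0 && device_max > 0);
    return preferred < device_max ? preferred : device_max;
}

/* Round the problem size n up to the next multiple of local_size so
 * the launch is valid; the kernel must then skip indices >= n. */
size_t padded_global_size(size_t n, size_t local_size)
{
    assert(local_size > 0);
    return ((n + local_size - 1) / local_size) * local_size;
}
```

For example, 1000 elements with a work-group size of 256 would be launched as 1024 work items, with the last 24 doing nothing. A tuning framework would go further and pick `preferred` per device from measurements.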

  11. Non-Optimized Performance

  12. Non-Optimized Performance

  13. Restrictions • No recursion, variadic functions, or function pointers • Cannot dynamically allocate memory from device code • No native variable-length arrays or double precision • Some restrictions can be worked around with extensions
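
The first two restrictions usually mean rewriting recursive code as a loop over a fixed-size stack. The following is a sketch in plain C, not kernel code: the `node` layout, the `MAX_STACK` capacity, and the function `tree_sum` are all assumptions made for illustration, and a real kernel would additionally need address-space qualifiers such as `__global` on the pointer arguments.

```c
#include <assert.h>

#define MAX_STACK 64  /* fixed capacity: no dynamic allocation in a kernel */

typedef struct {
    int value;
    int left;   /* index of left child, or -1 if none */
    int right;  /* index of right child, or -1 if none */
} node;

/* Sum every value reachable from root without recursion.
 * Returns 0 on success, -1 if the explicit stack would overflow. */
int tree_sum(const node *nodes, int root, long *sum_out)
{
    int stack[MAX_STACK];
    int top = 0;
    long sum = 0;

    assert(nodes != 0 && sum_out != 0);
    if (root >= 0) {
        stack[top++] = root;
    }
    while (top > 0) {
        const node *n = &nodes[stack[--top]];
        sum += n->value;
        if (n->left >= 0) {
            if (top == MAX_STACK) return -1;
            stack[top++] = n->left;
        }
        if (n->right >= 0) {
            if (top == MAX_STACK) return -1;
            stack[top++] = n->right;
        }
    }
    *sum_out = sum;
    return 0;
}
```

Storing children as array indices rather than pointers also sidesteps pointer portability between host and device memory.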

  14. Terminology (OpenCL / CUDA) • Stream Core / Scalar Core • Compute Unit / Streaming Multiprocessor • Wavefront / Warp • Intermediate Language / PTX

  15. Terminology (OpenCL / CUDA) • Host Memory / Host Memory • Global Memory / Global (Device) Memory • Global Memory / Local Memory • Constant Memory / Constant Memory • Local Memory / Shared Memory • Private Memory / Registers

  16. Terminology (OpenCL / CUDA) • NDRange / Grid • Work group / Block • Work item / Thread • Global ID / Thread ID • Block ID (work-group ID) / Block Index • Local ID / Thread Index
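
The relationship between these IDs can be sketched in plain C for a one-dimensional NDRange with no global offset. The struct `work_item_ids` and both functions are hypothetical helpers for illustration; inside a real kernel the same values come from `get_global_id(0)`, `get_group_id(0)`, and `get_local_id(0)`.

```c
#include <assert.h>
#include <stddef.h>

/* IDs of one work item in a 1-D NDRange (illustrative struct). */
typedef struct {
    size_t global_id;  /* CUDA: blockIdx.x * blockDim.x + threadIdx.x */
    size_t group_id;   /* CUDA: blockIdx.x */
    size_t local_id;   /* CUDA: threadIdx.x */
} work_item_ids;

/* Split a global ID into work-group ID and local ID,
 * assuming a global offset of zero. */
work_item_ids decompose_global_id(size_t global_id, size_t local_size)
{
    work_item_ids ids;
    assert(local_size > 0);
    ids.global_id = global_id;
    ids.group_id = global_id / local_size;
    ids.local_id = global_id % local_size;
    return ids;
}

/* Inverse mapping: rebuild the global ID from group and local IDs. */
size_t compose_global_id(size_t group_id, size_t local_id,
                         size_t local_size)
{
    assert(local_id < local_size);
    return group_id * local_size + local_id;
}
```

With a work-group size of 32, the work item with global ID 70 sits in work group 2 at local ID 6.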

