CUDA Workshop, Week 4: NVVP, Existing Libraries, Q/A
Agenda
• Textbook / resources
• Eclipse Nsight, NVIDIA Visual Profiler
• Available libraries
• Questions
• Certificate distribution
• (Optional) Multiple GPUs: Where’s Pixel-Waldo?
Text Book / Resources: Textbook
• Programming Massively Parallel Processors: A Hands-on Approach
• David Kirk, Wen-mei Hwu
Text Book / Resources: NVIDIA Developer Zone
• Early access to updated drivers / updates
• Heavily curated help forum
• Requires registration and approval (nearly automated)
• developer.nvidia.com
Text Book / Resources: Us!
• We’re pretty passionate about this GPU computing stuff.
• Collaboration is cool.
• If you think you’ve got a problem that can benefit from GPU computation, we may have some ideas.
Eclipse Nsight, NVVP
• Nsight: an IDE built on Eclipse
• CUDA-aware syntax highlighting / suggestions / recognition
• Hooked into NVVP
Eclipse Nsight, NVVP
• Deep profiling of every aspect of GPU execution (memory bandwidth, branch divergence, bank conflicts, compute / transfer overlap, and more!)
• Provides suggestions for optimization
• Graphical view of GPU performance
Eclipse Nsight, NVVP
• Nsight and NVVP are available on our cuda# machines
• ssh -X <user>@<cuda machine>
• Nsight demo on Week 3 code
Available Libraries
• Why re-invent the wheel?
• There are many GPU-enabled tools built on CUDA that are already available
• These tools have been extensively tested for efficiency and in most cases will outperform custom solutions
• Some require CUDA-like code structure
Available Libraries: Linear Algebra, cuBLAS
• CUDA-enabled Basic Linear Algebra Subprograms (BLAS)
• GPU-accelerated version of the complete standard BLAS library
• Provided with the CUDA toolkit; code examples are also provided
• Callable from C and Fortran
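To make concrete what a BLAS level-3 routine computes, here is a minimal host-only sketch of single-precision GEMM, C = alpha*A*B + beta*C. It is a plain C++ reference, not cuBLAS code: the name `reference_sgemm` is hypothetical, and the column-major layout is chosen because that is the storage convention BLAS and cuBLAS use.

```cpp
#include <cstddef>
#include <vector>

// Host-only reference for the BLAS level-3 operation C = alpha*A*B + beta*C.
// reference_sgemm is a hypothetical name for this sketch, not a cuBLAS function.
// All matrices are column-major: element (row, col) of a matrix with
// `rows` rows lives at index [row + col * rows].
void reference_sgemm(std::size_t m, std::size_t n, std::size_t k,
                     float alpha,
                     const std::vector<float>& A,  // m x k
                     const std::vector<float>& B,  // k x n
                     float beta,
                     std::vector<float>& C)        // m x n, updated in place
{
    for (std::size_t col = 0; col < n; ++col) {
        for (std::size_t row = 0; row < m; ++row) {
            // Dot product of row `row` of A with column `col` of B.
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                acc += A[row + p * m] * B[p + col * k];
            }
            C[row + col * m] = alpha * acc + beta * C[row + col * m];
        }
    }
}
```

A reference like this is mainly useful for checking the output of a GPU library call on small inputs before trusting it on large ones.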
Available Libraries: Linear Algebra, CULA, MAGMA
• CULA and MAGMA extend BLAS
• CULA (Paid)
  • CULA-dense: LAPACK and BLAS implementations, solvers, decompositions, basic matrix operations
  • CULA-sparse: sparse matrix specialized routines, specialized storage structures, iterative methods
• MAGMA (Free, BSD) (Fortran bindings)
  • LAPACK and BLAS implementations, developed by the same development team as LAPACK
Available Libraries: IMSL Fortran/C Numerical Library
• Large collection of GPU-accelerated mathematical and statistical functions
• Free evaluation, paid extension
• http://www.roguewave.com/products/imsl-numerical-libraries/fortran-library.aspx
Available Libraries: Image/Signal Processing, NVIDIA Performance Primitives
• 1900 image processing and 600 signal processing algorithms
• Free and provided with the CUDA toolkit, code examples included
• Can be used in tandem with visualization libraries like OpenGL and DirectX
Available Libraries: CUDA without the CUDA, Thrust Library
• Thrust is a high-level interface to GPU computing
• Offers a template interface to sort, scan, reduce, etc.
• A production-tested version is provided with the CUDA toolkit
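The three primitives named above can be illustrated on the host with the C++ standard library. This is a sketch, not Thrust code: Thrust offers algorithms in the same iterator style (`thrust::sort`, `thrust::inclusive_scan`, `thrust::reduce`) that operate on containers such as `thrust::device_vector`. The struct and function names here are hypothetical.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Results of the three primitives (hypothetical helper type for this sketch).
struct PrimitiveResults {
    std::vector<int> sorted;       // input in ascending order
    std::vector<int> prefix_sums;  // inclusive scan of the sorted input
    int total;                     // sum reduction of all elements
};

// Runs sort, inclusive scan, and reduce on a copy of the input, on the host.
PrimitiveResults run_primitives(std::vector<int> values)
{
    PrimitiveResults result;

    // Sort: reorder the elements into ascending order.
    std::sort(values.begin(), values.end());
    result.sorted = values;

    // Scan: element i of the output is the sum of inputs 0..i (inclusive).
    result.prefix_sums.resize(values.size());
    std::partial_sum(values.begin(), values.end(), result.prefix_sums.begin());

    // Reduce: collapse the whole range to a single sum.
    result.total = std::accumulate(values.begin(), values.end(), 0);

    return result;
}
```

For example, `run_primitives({3, 1, 2})` yields the sorted sequence 1, 2, 3, the prefix sums 1, 3, 6, and a total of 6.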
Available Libraries: Python and CUDA
• PyCUDA
  • Python interface to CUDA functions
  • Simply a collection of wrappers, but effective
• NumbaPro (Paid)
  • Announced this year at GTC 2013; a native CUDA Python compiler
  • Python = 4th major CUDA language
Available Libraries: R and CUDA
• R+GPU
  • Package with accelerated alternatives for common R statistical functions
• rpud / rpudplus
  • Package with accelerated alternatives for common R statistical functions
• Rcuda
  • … Package with accelerated alternatives for common R statistical functions
Multiple GPUs: Where’s Pixel-Waldo?
Motivation: Given two images that each contain a unique suspect and a number of distinct bystanders, identify the suspect by pairwise comparison.
Multiple GPUs
• This is hard.
• We’ll simplify the problem by reducing the targets to pixel triples.
Multiple GPUs
Step 0: Upload an image and a list to store targets to each GPU.
[Diagram: f.bmp and s.bmp; GPU0 and GPU1, each holding an empty target list: 0 | 0 | 0 | …]
Multiple GPUs
Step 1: Find all positions of potential targets (triples) within each image using both GPUs independently.
[Diagram: f.bmp and s.bmp; GPU0 and GPU1 with filled target lists: 11 | 143 | 243 | … and 3 | 1632 | 54321 | …]
Multiple GPUs
Step 2: Allow GPU0 to access GPU1 memory, and use both images and target lists to compare potential suspects.
[Diagram: f.bmp and s.bmp; GPU0 target list: 11 | 143 | 243 | …; GPU1 target list: 3 | 1632 | 54321 | …; result pair: 0 | 0; PCI bus]
Multiple GPUs
Step 3: Print the positions of the single matching suspect.
[Diagram: f.bmp; CPU; GPU0 target list: 11 | 143 | 243 | …; result pair: 132 | 629; PCI bus]
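The logic of steps 1 through 3 can be sketched on the host in plain C++, leaving out the GPUs entirely. This is not the workshop source. It rests on assumptions the slides do not state: images are packed RGB byte arrays, the background is black (0, 0, 0), every non-background pixel is a potential target, and the suspect is the only triple that appears in both images. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Assumed layout: packed RGB, three bytes per pixel.
using Image = std::vector<std::uint8_t>;

// Step 1: collect the pixel index of every non-background triple.
// Assumption: the background is black (0, 0, 0).
std::vector<std::size_t> find_candidates(const Image& img)
{
    assert(img.size() % 3 == 0);
    std::vector<std::size_t> positions;
    for (std::size_t p = 0; p < img.size() / 3; ++p) {
        if (img[3 * p] != 0 || img[3 * p + 1] != 0 || img[3 * p + 2] != 0) {
            positions.push_back(p);
        }
    }
    return positions;
}

// True when pixel pa of image a and pixel pb of image b hold the same triple.
bool same_triple(const Image& a, std::size_t pa, const Image& b, std::size_t pb)
{
    return a[3 * pa] == b[3 * pb] &&
           a[3 * pa + 1] == b[3 * pb + 1] &&
           a[3 * pa + 2] == b[3 * pb + 2];
}

// Step 2: compare every candidate in the first image with every candidate
// in the second. On success, `match` holds the suspect's pixel index in
// each image and the function returns true.
bool find_suspect(const Image& first, const Image& second,
                  std::pair<std::size_t, std::size_t>& match)
{
    const std::vector<std::size_t> first_pos = find_candidates(first);
    const std::vector<std::size_t> second_pos = find_candidates(second);
    for (std::size_t pa : first_pos) {
        for (std::size_t pb : second_pos) {
            if (same_triple(first, pa, second, pb)) {
                match = std::make_pair(pa, pb);
                return true;
            }
        }
    }
    return false;
}
```

Step 3 then amounts to printing `match.first` and `match.second`. In the multi-GPU version described on the slides, step 1 runs once per image on separate GPUs, and step 2 is where one GPU reads the other's image and target list.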
Multiple GPUs
• Walk through the source code.
• Things to note:
  • The code is unoptimized and known to be inefficient, but it covers asynchronous streams, GPU context switching, unified virtual addressing, and peer-to-peer access.
  • The source code requires the TCLAP library to compile.
  • The source code will be made available in a GitHub repository after the workshop.