CUDA Workshop, Week 4: NVVP, Existing Libraries, Q/A
Agenda
• Text book / resources
• Eclipse Nsight, NVIDIA Visual Profiler
• Available libraries
• Questions
• Certificate dispersal
• (Optional) Multiple GPUs: Where's Pixel-Waldo?
Text Book / Resources
Text book
• Programming Massively Parallel Processors: A Hands-on Approach
• David Kirk, Wen-mei Hwu
Text Book / Resources
NVIDIA Developer Zone
• Early access to driver updates
• Heavily curated help forum
• Requires registration and approval (nearly automated)
• developer.nvidia.com
Text Book / Resources
Us!
• We're pretty passionate about this GPU computing stuff.
• Collaboration is cool.
• If you think you've got a problem that can benefit from GPU computation, we may have some ideas.
Eclipse Nsight, NVVP
• IDE built on an Eclipse foundation
• CUDA-aware syntax highlighting / suggestions / recognition
• Hooked into NVVP
Eclipse Nsight, NVVP
• Deep profiling of every aspect of GPU execution (memory bandwidth, branch divergence, bank conflicts, compute/transfer overlap, and more!)
• Provides suggestions for optimization
• Graphical view of GPU performance
Eclipse Nsight, NVVP
• Nsight and NVVP are available on our cuda# machines: ssh -X <user>@<cuda machine>
• Nsight demo on Week 3 code
Available Libraries
• Why reinvent the wheel?
• There are many GPU-enabled tools built on CUDA that are already available
• These tools have been extensively tested for efficiency and in most cases will outperform custom solutions
• Some require CUDA-like code structure
Available Libraries
Linear Algebra, cuBLAS
• CUDA-enabled basic linear algebra subroutines
• GPU-accelerated version of the complete standard BLAS library
• Ships with the CUDA toolkit, with code examples included
• Callable from C and Fortran
Available Libraries Linear Algebra, cuBLAS [two example slides; the sample code was not preserved in this transcript; a sketch follows below]
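In the spirit of those example slides, here is a minimal sketch of calling cuBLAS from C, assuming the v2 API that ships with the toolkit. The matrix size and fill values are illustrative, not taken from the workshop slides.

```c
// Minimal cuBLAS SGEMM sketch: C = alpha*A*B + beta*C for N x N matrices.
// Build with: nvcc sgemm.c -lcublas  (error checks abbreviated for brevity)
#include <cuda_runtime.h>
#include <cublas_v2.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const int N = 512;
    const float alpha = 1.0f, beta = 0.0f;
    size_t bytes = (size_t)N * N * sizeof(float);

    float *hA = malloc(bytes), *hB = malloc(bytes), *hC = malloc(bytes);
    for (int i = 0; i < N * N; ++i) { hA[i] = 1.0f; hB[i] = 2.0f; }

    float *dA, *dB, *dC;
    cudaMalloc((void**)&dA, bytes);
    cudaMalloc((void**)&dB, bytes);
    cudaMalloc((void**)&dC, bytes);

    cublasHandle_t handle;
    cublasCreate(&handle);

    // cuBLAS uses column-major storage, like the Fortran BLAS it mirrors.
    cublasSetMatrix(N, N, sizeof(float), hA, N, dA, N);
    cublasSetMatrix(N, N, sizeof(float), hB, N, dB, N);

    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                N, N, N, &alpha, dA, N, dB, N, &beta, dC, N);

    cublasGetMatrix(N, N, sizeof(float), dC, N, hC, N);
    printf("C[0] = %f\n", hC[0]);  // expect 2*N for this fill pattern

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(hA); free(hB); free(hC);
    return 0;
}
```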
Available Libraries
Linear Algebra, CULA, MAGMA
• CULA and MAGMA extend BLAS
• CULA (paid)
• CULA Dense: LAPACK and BLAS implementations, solvers, decompositions, basic matrix operations
• CULA Sparse: specialized sparse-matrix routines, specialized storage structures, iterative methods
• MAGMA (free, BSD) (Fortran bindings)
• LAPACK and BLAS implementations, developed by the same team that develops LAPACK
Available Libraries Linear Algebra, CULA, MAGMA [two example slides; the sample code was not preserved in this transcript; a hedged sketch follows below]
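For flavor, a hedged sketch of CULA Dense's LAPACK-style host interface, solving A x = b in single precision. The header name and exact signatures are assumptions based on CULA's published examples, not the workshop slides; the identity-matrix data is purely illustrative.

```c
// Hedged CULA Dense sketch: solve A x = b with the LAPACK-style gesv.
// Header name varies by CULA version; <cula_lapack.h> is assumed here.
#include <cula_lapack.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const int n = 1024, nrhs = 1;
    culaFloat *a = malloc((size_t)n * n * sizeof(culaFloat));
    culaFloat *b = malloc(n * sizeof(culaFloat));
    culaInt *ipiv = malloc(n * sizeof(culaInt));

    // Illustrative data: A = identity, b = 2, so the solution x should be 2.
    for (int i = 0; i < n * n; ++i) a[i] = 0.0f;
    for (int i = 0; i < n; ++i) { a[i * n + i] = 1.0f; b[i] = 2.0f; }

    culaInitialize();                      // bind the library to a GPU
    culaSgesv(n, nrhs, a, n, ipiv, b, n);  // LU-factor and solve on the GPU
    culaShutdown();

    printf("x[0] = %f\n", b[0]);           // solution overwrites b
    free(a); free(b); free(ipiv);
    return 0;
}
```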
Available Libraries
IMSL Fortran/C Numerical Library
• Large collection of GPU-accelerated mathematical and statistical functions
• Free evaluation, paid extension
• http://www.roguewave.com/products/imsl-numerical-libraries/fortran-library.aspx
Available Libraries
Image/Signal Processing: NVIDIA Performance Primitives
• 1,900 image-processing and 600 signal-processing algorithms
• Free and provided with the CUDA toolkit, code examples included
• Can be used in tandem with graphics APIs like OpenGL and DirectX
Available Libraries Image/Signal Processing: NVIDIA Performance Primitives [example slide; the sample code was not preserved in this transcript; a sketch follows below]
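As an illustration (not reconstructed from the missing slide), a sketch of NPP's signal-processing interface: an element-wise add of two device arrays. The array size and zero-fill data are made up for the example.

```c
// NPP signal-processing sketch: dC[i] = dA[i] + dB[i] on the GPU.
// Build with: nvcc npps_add.c -lnpps  (library name varies by toolkit version)
#include <npps.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    const int n = 1 << 20;

    // NPP provides its own device-allocation helpers.
    Npp32f *dA = nppsMalloc_32f(n);
    Npp32f *dB = nppsMalloc_32f(n);
    Npp32f *dC = nppsMalloc_32f(n);

    // Zero the inputs; a real program would copy image/signal data up instead.
    cudaMemset(dA, 0, n * sizeof(Npp32f));
    cudaMemset(dB, 0, n * sizeof(Npp32f));

    nppsAdd_32f(dA, dB, dC, n);  // one library call replaces a custom kernel

    float first;
    cudaMemcpy(&first, dC, sizeof(float), cudaMemcpyDeviceToHost);
    printf("dC[0] = %f\n", first);

    nppsFree(dA); nppsFree(dB); nppsFree(dC);
    return 0;
}
```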
Available Libraries
CUDA without the CUDA: Thrust Library
• Thrust is a high-level interface to GPU computing
• Offers template-interface access to sort, scan, reduce, etc.
• A production-tested version is provided with the CUDA toolkit
Available Libraries CUDA without the CUDA: Thrust Library [three example slides; the sample code was not preserved in this transcript; a sketch follows below]
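To show the "CUDA without the CUDA" point concretely, a minimal Thrust sketch that sorts and reduces a vector on the GPU with no explicit kernels, cudaMalloc, or cudaMemcpy; the data is illustrative.

```cpp
// Thrust sketch: STL-style containers and algorithms that run on the GPU.
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/reduce.h>
#include <cstdlib>
#include <iostream>

int main() {
    thrust::host_vector<int> h(1 << 20);
    for (size_t i = 0; i < h.size(); ++i) h[i] = std::rand() % 1000;

    thrust::device_vector<int> d = h;        // allocation + host-to-device copy
    thrust::sort(d.begin(), d.end());        // parallel sort on the GPU
    int sum = thrust::reduce(d.begin(), d.end(), 0);  // parallel reduction

    std::cout << "min = " << d.front() << ", sum = " << sum << std::endl;
    return 0;  // device memory is freed automatically by the vector's destructor
}
```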
Available Libraries
Python and CUDA
• PyCUDA
• Python interface to CUDA functions
• Simply a collection of wrappers, but effective
• NumbaPro (paid)
• Announced this year at GTC 2013: a native CUDA Python compiler
• Makes Python the fourth major CUDA language
Available Libraries
R and CUDA
• R+GPU
• Package with accelerated alternatives for common R statistical functions
• rpud / rpudplus
• Another package with accelerated alternatives for common R statistical functions
• Rcuda
• … yet another package with accelerated alternatives for common R statistical functions
Available Libraries R and CUDA [example slide; content not preserved in this transcript]
Multiple GPUs • Where's Pixel-Waldo?
Motivation: Given two images that both contain a unique suspect and a number of distinct bystanders, identify the suspect by pairwise comparison.
Multiple GPUs • This is hard.
We'll simplify the problem by reducing the targets to pixel triples.
Multiple GPUs
[diagram: f.bmp → GPU0, s.bmp → GPU1; each target list initialized to 0 | 0 | 0 | …]
Step 0: Upload an image and a list to store targets to each GPU.
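A minimal sketch of step 0; the helper name, buffer names, and MAX_TARGETS size are hypothetical, not from the workshop source. The key call is cudaSetDevice, which selects the GPU that subsequent runtime calls target.

```c
// Step 0 sketch: give each GPU its own image and an empty target list.
#include <cuda_runtime.h>

#define MAX_TARGETS 4096  /* hypothetical capacity for found triples */

void upload(int dev, const unsigned char *hostImage, size_t imageBytes,
            unsigned char **devImage, int **devTargets) {
    cudaSetDevice(dev);  // all runtime calls below now apply to GPU 'dev'
    cudaMalloc((void**)devImage, imageBytes);
    cudaMalloc((void**)devTargets, MAX_TARGETS * sizeof(int));
    cudaMemcpy(*devImage, hostImage, imageBytes, cudaMemcpyHostToDevice);
    cudaMemset(*devTargets, 0, MAX_TARGETS * sizeof(int));  // 0 | 0 | 0 | …
}

// Usage (hypothetical names): upload(0, fPixels, fBytes, &dImg0, &dList0);
//                             upload(1, sPixels, sBytes, &dImg1, &dList1);
```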
Multiple GPUs
[diagram: GPU0 (f.bmp) with list 11 | 143 | 243 | …; GPU1 (s.bmp) with list 3 | 1632 | 54321 | …]
Step 1: Find all positions of potential targets (triples) within each image, using both GPUs independently.
Multiple GPUs
[diagram: GPU0 (f.bmp, 11 | 143 | 243 | …) reads GPU1 (s.bmp, 3 | 1632 | 54321 | …) over the PCI bus; result slot 0 | 0]
Step 2: Allow GPU0 to access GPU1's memory, then use both images and target lists to compare potential suspects.
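A sketch of enabling peer-to-peer access for step 2, using the standard runtime calls; the surrounding structure is assumed rather than taken from the workshop source. Direct dereferencing of GPU1 pointers from a GPU0 kernel relies on unified virtual addressing (64-bit, compute capability 2.0+), which is the "universal addressing" the code walkthrough covers.

```c
// Step 2 sketch: let GPU0 read GPU1's memory directly over the PCI bus.
#include <cuda_runtime.h>
#include <stdio.h>

int enable_p2p(void) {
    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, 0, 1);  // can GPU0 reach GPU1?
    if (!canAccess) {
        fprintf(stderr, "P2P not supported between GPU0 and GPU1\n");
        return -1;
    }
    cudaSetDevice(0);                  // peer access is granted by GPU0…
    cudaDeviceEnablePeerAccess(1, 0);  // …to GPU1's memory (flags must be 0)
    return 0;
}

// After this, a comparison kernel launched on GPU0 may take GPU1's image
// and target-list pointers as arguments and read them directly.
```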
Multiple GPUs
[diagram: GPU0 (f.bmp, 11 | 143 | 243 | …) returns match 132 | 629 to the CPU over the PCI bus]
Step 3: Print the position of the single matching suspect.
Multiple GPUs
• Walk through the source code.
• Things to note:
• This is un-optimized and known to be inefficient, but it covers the concepts of asynchronous streams, GPU context switching, universal addressing, and peer-to-peer access
• The source code requires the tclap library to compile appropriately
• The source code will be made available in a GitHub repository after the workshop