
GPU Programming with CUDA – CUDA 5 and 6 Paul Richmond



Presentation Transcript


  1. GPU Programming with CUDA – CUDA 5 and 6
     Paul Richmond, GPUComputing@Sheffield
     http://gpucomputing.sites.sheffield.ac.uk/

  2. Overview
     • Dynamic Parallelism (CUDA 5+)
     • GPU Object Linking (CUDA 5+)
     • Unified Memory (CUDA 6+)
     • Other Developer Tools

  3. Dynamic Parallelism
     [Diagram: the CPU launches Kernel A on the GPU; Kernels B, C and D are then launched by other kernels on the GPU]
     • Before CUDA 5, kernels could only be launched from the host
     • Limited ability to implement recursive algorithms
     • Dynamic Parallelism allows kernels to be launched from the device
     • Improved load balancing
     • Deep recursion

  4. An Example

     // Host code
     ...
     A<<<...>>>(data);
     B<<<...>>>(data);
     C<<<...>>>(data);

     // Kernel code: with Dynamic Parallelism, kernel X is launched from within another kernel
     __global__ void vectorAdd(float *data)
     {
         do_stuff(data);
         X<<<...>>>(data);
         X<<<...>>>(data);
         X<<<...>>>(data);
         do_more_stuff(data);
     }
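The slide's skeleton can be fleshed out into a minimal compilable sketch of the same pattern. The kernel and variable names below are illustrative (not from the slides); dynamic parallelism requires a device of compute capability 3.5+ and compilation with relocatable device code enabled.

```cuda
#include <cstdio>

// Child kernel, launched from the device rather than the host
__global__ void child(float *data)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    data[i] *= 2.0f;             // double each element
}

// Parent kernel: one thread launches the child grid from the GPU
__global__ void parent(float *data, int n)
{
    if (threadIdx.x == 0) {
        child<<<1, n>>>(data);   // device-side kernel launch
    }
}

int main()
{
    const int n = 8;
    float *data;
    cudaMallocManaged(&data, n * sizeof(float));
    for (int i = 0; i < n; ++i) data[i] = (float)i;

    parent<<<1, 1>>>(data, n);
    cudaDeviceSynchronize();     // wait for parent and child grids to finish

    for (int i = 0; i < n; ++i) printf("%g ", data[i]);
    printf("\n");
    cudaFree(data);
    return 0;
}
```

Build with something like `nvcc -arch=sm_35 -rdc=true example.cu` — the `-rdc=true` flag is what permits the device-side launch.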

  5. GPU Object Linking
     [Diagram: a.cu, b.cu and c.cu are compiled to a.o, b.o and c.o, then linked with Main.cpp into Program.exe]
     • CUDA 4 required all device code for a kernel to be in a single source file
     • No linking of compiled device code
     • CUDA 5.0+ allows separately compiled object files to be linked
     • Kernels and host code can be built independently

  6. GPU Object Linking
     [Diagram: a.o and b.o are archived into a static library ab.culib, which is linked with foo.cu + Main.cpp and with bar.cu + Main2.cpp to build Program.exe and Program2.exe]
     • Objects can also be built into static libraries
     • Shared by different sources
     • Much better code reuse
     • Reduces compilation time
     • Closed-source device libraries
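The workflow on these two slides maps onto nvcc's separate-compilation flags. A rough sketch of the commands (file and library names are illustrative):

```shell
# Compile each .cu file to an object containing relocatable device code (-dc)
nvcc -arch=sm_35 -dc a.cu -o a.o
nvcc -arch=sm_35 -dc b.cu -o b.o

# Link device and host code in one step to build the program
nvcc -arch=sm_35 a.o b.o main.cpp -o program

# Alternatively, archive the objects into a static library for reuse
nvcc -arch=sm_35 -lib a.o b.o -o libab.a
nvcc -arch=sm_35 main.cpp -L. -lab -o program2
```

The key difference from the whole-program compilation of CUDA 4 is the `-dc` step, which defers device linking so kernels in different translation units can call each other.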

  7. Unified Memory
     [Diagram: separate System Memory (CPU) and GPU Memory, versus a single Unified Memory visible to both the CPU and the GPU]
     • Developer view is that the GPU and CPU have separate memories
     • Memory must be explicitly copied
     • Deep copies required for complex data structures
     • Unified Memory changes that view
     • A single pointer to data, accessible anywhere
     • Simpler code porting

  8. Unified Memory Example

     // CPU-only version
     void sortfile(FILE *fp, int N)
     {
         char *data;
         data = (char *)malloc(N);
         fread(data, 1, N, fp);
         qsort(data, N, 1, compare);
         use_data(data);
         free(data);
     }

     // Unified Memory version: the same pointer is used by host and device code
     void sortfile(FILE *fp, int N)
     {
         char *data;
         cudaMallocManaged(&data, N);
         fread(data, 1, N, fp);
         qsort<<<...>>>(data, N, 1, compare);
         cudaDeviceSynchronize();   // wait for the GPU before the host touches data again
         use_data(data);
         cudaFree(data);            // managed memory is released with cudaFree, not free
     }
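The same single-pointer idea can be shown in a complete, compilable sketch. The kernel and names below are illustrative; the point is that one `cudaMallocManaged` allocation is read and written by both the CPU and the GPU with no explicit copies.

```cuda
#include <cstdio>

__global__ void increment(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main()
{
    const int n = 16;
    int *data;

    // One allocation, one pointer, valid on both CPU and GPU
    cudaMallocManaged(&data, n * sizeof(int));

    for (int i = 0; i < n; ++i) data[i] = i;   // written by the CPU

    increment<<<1, n>>>(data, n);              // read and written by the GPU
    cudaDeviceSynchronize();                   // required before the CPU reads again

    for (int i = 0; i < n; ++i) printf("%d ", data[i]);
    printf("\n");

    cudaFree(data);
    return 0;
}
```

Without Unified Memory, the same program would need a separate `cudaMalloc` allocation plus `cudaMemcpy` calls in both directions; for nested structures, every pointer inside the structure would have to be copied and fixed up by hand (the "deep copies" of the previous slide).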

  9. Other Developer Tools
     • XT and drop-in libraries
       • cuFFT and cuBLAS optimised for multi-GPU (on the same node)
     • GPUDirect
       • Direct transfer between GPUs (cutting out the host)
       • Support for direct transfer via InfiniBand (over a network)
     • Developer tools
       • Remote development using Nsight Eclipse
       • Enhanced Visual Profiler
