Learn how to use shared memory in CUDA to compute the sum from 1 to 1024 efficiently with multiple blocks and multiple threads, and how to report the resulting speedup.
CUDA Programming Training 1
Che-Lun Hung
GPU Memory
[Figure: on-chip memory and on-board memory]
Reference
• https://devblogs.nvidia.com/using-shared-memory-cuda-cc/
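The referenced NVIDIA post contrasts static and dynamic shared memory using an array-reversal example. The sketch below is modeled on that example: each thread copies one element from global (on-board) memory into a shared (on-chip) array, waits at `__syncthreads()`, then writes the mirrored element back. The kernel names follow the post as best recalled; the host helper `reverse_on_gpu` and the 64-element limit are assumptions added for illustration.

```cuda
#include <cuda_runtime.h>

// Static shared memory: the array size is fixed at compile time.
__global__ void staticReverse(int *d, int n)
{
    __shared__ int s[64];   // on-chip, shared by all threads of this block
    int t  = threadIdx.x;
    int tr = n - t - 1;
    s[t] = d[t];            // global (on-board) -> shared (on-chip)
    __syncthreads();        // every thread must finish writing s before any reads
    d[t] = s[tr];           // shared -> global, in reversed order
}

// Dynamic shared memory: the size is passed as the third launch parameter.
__global__ void dynamicReverse(int *d, int n)
{
    extern __shared__ int s[];
    int t  = threadIdx.x;
    int tr = n - t - 1;
    s[t] = d[t];
    __syncthreads();
    d[t] = s[tr];
}

// Hypothetical helper: reverses n (1..64) ints with each kernel.
// Returns 0 on success, -1 on a bad size or CUDA error.
int reverse_on_gpu(const int *h_in, int *h_static, int *h_dynamic, int n)
{
    if (n <= 0 || n > 64) return -1;   // staticReverse's array holds 64 ints
    size_t bytes = n * sizeof(int);
    int *d;
    if (cudaMalloc(&d, bytes) != cudaSuccess) return -1;

    cudaMemcpy(d, h_in, bytes, cudaMemcpyHostToDevice);
    staticReverse<<<1, n>>>(d, n);
    cudaMemcpy(h_static, d, bytes, cudaMemcpyDeviceToHost);   // waits for the kernel

    cudaMemcpy(d, h_in, bytes, cudaMemcpyHostToDevice);
    dynamicReverse<<<1, n, bytes>>>(d, n);                    // bytes of dynamic shared memory
    cudaMemcpy(h_dynamic, d, bytes, cudaMemcpyDeviceToHost);

    cudaError_t err = cudaGetLastError();
    cudaFree(d);
    return err == cudaSuccess ? 0 : -1;
}
```

Both kernels produce the same result; the dynamic form is useful when the shared array size is only known at launch time.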
Training
• Please write a CUDA program that calculates the sum from 1 to 1024 using shared memory.
• Please use multiple blocks and multiple threads.
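One common way to approach this task is a per-block tree reduction in shared memory: each thread loads one value, the block repeatedly halves the number of active threads while adding pairs, and thread 0 writes the block's partial sum. This is a minimal sketch, not a reference solution; the block size of 256, the kernel name `partialSum`, and the helper `gpu_sum_1_to_n` are assumptions, and the final combination of partial sums is done on the host (a second kernel or `atomicAdd` would also work).

```cuda
#include <cuda_runtime.h>

#define THREADS_PER_BLOCK 256   // assumption: must be a power of two for the loop below

// Each block sums THREADS_PER_BLOCK consecutive values (i + 1) in shared memory
// and writes one partial sum to block_sums[blockIdx.x].
__global__ void partialSum(int n, int *block_sums)
{
    __shared__ int cache[THREADS_PER_BLOCK];   // on-chip, one slot per thread

    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + threadIdx.x;

    // Thread i contributes i + 1, so the grid covers 1..n; extra threads add 0.
    cache[tid] = (i < n) ? (i + 1) : 0;
    __syncthreads();

    // Tree reduction: halve the number of active threads at each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            cache[tid] += cache[tid + stride];
        __syncthreads();                       // barrier reached by all threads
    }

    if (tid == 0)
        block_sums[blockIdx.x] = cache[0];
}

// Hypothetical host helper: returns 1 + 2 + ... + n, or -1 on a CUDA error.
long long gpu_sum_1_to_n(int n)
{
    int blocks = (n + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK;   // 4 blocks for n = 1024
    int *d_block_sums;
    if (cudaMalloc(&d_block_sums, blocks * sizeof(int)) != cudaSuccess) return -1;

    partialSum<<<blocks, THREADS_PER_BLOCK>>>(n, d_block_sums);

    int *h_block_sums = new int[blocks]();
    cudaMemcpy(h_block_sums, d_block_sums, blocks * sizeof(int),
               cudaMemcpyDeviceToHost);                             // waits for the kernel
    cudaError_t err = cudaGetLastError();
    cudaFree(d_block_sums);

    // Combine the per-block partial sums on the host.
    long long total = 0;
    for (int b = 0; b < blocks; ++b) total += h_block_sums[b];
    delete[] h_block_sums;
    return err == cudaSuccess ? total : -1;
}
```

For n = 1024 the result should equal 1024 × 1025 / 2 = 524800. The `__syncthreads()` call sits outside the `if` so that every thread in the block reaches each barrier.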
Please send the code for this training and a Word file reporting the speedup ratio to my email before 5/24, 23:59.
• Include your name and student ID in the email.
• Name each file with your student ID. If there is more than one file, name them "student ID-1", "student ID-2", and so on.
• Compress the files into a .zip or .rar archive named "student ID-train7".
• For example, a student with ID "A0000001" who submits two files names them A0000001-1.cu and A0000001-2.cu, and names the archive A0000001-train7.zip.
• Submissions that do not follow these rules will be rejected.
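The slides do not say how the speedup ratio should be measured. A common definition is the serial CPU time divided by the GPU time, T_CPU / T_GPU. The sketch below assumes CPU work is timed with `std::chrono` and GPU work with CUDA events; the helper names `cpu_time_ms`, `gpu_time_ms`, and `speedup` are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <chrono>

// Time host-side work in milliseconds using a steady wall clock.
template <typename F>
double cpu_time_ms(F work)
{
    auto t0 = std::chrono::steady_clock::now();
    work();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

// Time GPU work (e.g. a kernel launch) in milliseconds using CUDA events.
template <typename F>
float gpu_time_ms(F launch)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    launch();
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);          // wait until the launched work has finished
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

// Speedup ratio: serial time divided by parallel time.
double speedup(double cpu_ms, double gpu_ms)
{
    return cpu_ms / gpu_ms;
}
```

To use it, pass a lambda that runs the serial loop to `cpu_time_ms`, a lambda that launches your kernel to `gpu_time_ms`, and report `speedup(cpu_ms, gpu_ms)`. The first CUDA call in a program also pays for context creation, so running one warm-up launch before timing is a common precaution.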