Special Course on Computer Architectures ~ GPU Programming Contest~

Presentation Transcript


  1. Special Course on Computer Architectures ~GPU Programming Contest~ • 2012/06/22 • Email: nomura@am.ics.keio.ac.jp

  2. Contents • GPU (Graphics Processing Unit) • CUDA Programming • Target: Clustering with K-means • How to use Toolkit 1.0 • Towards the fastest program

  3. GPU (Graphics Processing Unit) • Multicore processor • Several hundred cores • SP (Streaming Processor): a core in the GPU • SM (Streaming Multiprocessor): composed of SPs • High memory bandwidth • [Figure: a GPU made of many SMs, each containing several SPs, all sharing Global Memory] • [Table: Specification of GeForce GTX 280]
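You can inspect the SM/SP organisation described above on whichever GPU you use. The following is a minimal sketch built on the CUDA runtime call `cudaGetDeviceProperties()`. The helper name `reportDevice` is hypothetical and not part of the toolkit. `cudaDeviceProp` reports the number of SMs but not the number of SPs per SM, because that depends on the compute capability. Compute capability 1.x parts such as the GTX 280 (30 SMs) have 8 SPs per SM.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Prints a few architectural properties of CUDA device `dev` and returns
// its number of streaming multiprocessors (SMs), or -1 if the query fails.
int reportDevice(int dev) {
    assert(dev >= 0 && "device index must be non-negative");
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) {
        return -1;
    }
    printf("Device %d: %s\n", dev, prop.name);
    printf("  Compute capability : %d.%d\n", prop.major, prop.minor);
    printf("  SMs                : %d\n", prop.multiProcessorCount);
    printf("  Global memory      : %zu MiB\n", prop.totalGlobalMem >> 20);
    printf("  Shared mem / block : %zu KiB\n", prop.sharedMemPerBlock >> 10);
    printf("  Warp size          : %d\n", prop.warpSize);
    return prop.multiProcessorCount;
}
```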

  4. Flow of CUDA Program (Host) • Allocate GPU memory: cudaMalloc() • Transfer input data: cudaMemcpy() • Execute kernel • Transfer result data: cudaMemcpy() • Free GPU memory: cudaFree() • [Figure: the host (CPU and main memory) transfers arrays input 1 … input N to the device's global memory; kernels running on the SPs produce output 1 … output N, which are transferred back to main memory]
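The five host-side steps above can be sketched as follows. This is a minimal example, assuming a trivial kernel that doubles each element so that the flow itself stays visible. `processKernel`, `runOnGpu` and `CUDA_CHECK` are hypothetical names and do not come from the toolkit.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with file/line and the CUDA error string if a runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Kernel: each thread turns input[i] into output[i] (here: doubles it).
__global__ void processKernel(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0f * in[i];
}

// Host side: the five steps from the slide, in order.
void runOnGpu(const float *hostIn, float *hostOut, int n) {
    assert(n > 0);
    size_t bytes = n * sizeof(float);
    float *dIn = nullptr, *dOut = nullptr;
    // 1. Allocate GPU (global) memory
    CUDA_CHECK(cudaMalloc(&dIn, bytes));
    CUDA_CHECK(cudaMalloc(&dOut, bytes));
    // 2. Transfer input data host -> device
    CUDA_CHECK(cudaMemcpy(dIn, hostIn, bytes, cudaMemcpyHostToDevice));
    // 3. Execute the kernel with one thread per element
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    processKernel<<<blocks, threads>>>(dIn, dOut, n);
    CUDA_CHECK(cudaGetLastError());
    // 4. Transfer result data device -> host (this cudaMemcpy waits for the kernel)
    CUDA_CHECK(cudaMemcpy(hostOut, dOut, bytes, cudaMemcpyDeviceToHost));
    // 5. Free GPU memory
    CUDA_CHECK(cudaFree(dIn));
    CUDA_CHECK(cudaFree(dOut));
}
```

Because the kernel launch is asynchronous, `cudaGetLastError()` only catches launch-configuration errors. Errors raised while the kernel runs surface at the next synchronizing call, which here is the device-to-host `cudaMemcpy`.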

  5. Target application: clustering with K-means • A well-known clustering method • A K-means program for the host processor (CPU) is provided. Modify it so that it runs on the GPU as fast as possible. • The GeForce GTX 280 (Tesla architecture) in Amano Lab. can be used for this contest.
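The iteration shown on slides 6 to 10 can be written in standard K-means notation. The symbols are our own, not the slides': nodes $x_1, \dots, x_N$, colour $c_i \in \{1, \dots, K\}$ of node $i$, and centre $\mu_k$ of colour $k$. STEP1 recomputes the centres and STEP2 recolours the nodes. Neither step increases the total squared distance $J$, which is the usual reason the loop settles into the terminating state of slide 10.

```latex
\begin{align*}
\mu_k &= \frac{1}{|S_k|} \sum_{i \in S_k} x_i, \quad S_k = \{\, i : c_i = k \,\}
      && \text{(STEP1: centre of gravity)} \\
c_i &\leftarrow \operatorname*{arg\,min}_{1 \le k \le K} \lVert x_i - \mu_k \rVert^2
      && \text{(STEP2: nearest centre)} \\
J &= \sum_{i=1}^{N} \lVert x_i - \mu_{c_i} \rVert^2
      && \text{(not increased by either step)}
\end{align*}
```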

  6. K-means method (1/5) Initial state: nodes are given colors at random (here, 100 nodes with 5 colors are shown). STEP1: The centre of gravity is computed for each set of nodes with the same color (each X in the figure is a centre). Reference URL: http://d.hatena.ne.jp/nitoyon/20090409/kmeans_visualise
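STEP1 can be sketched as a kernel in which thread `c` averages every node whose colour is `c`. This sketch assumes 2-D nodes stored as separate `x[]`/`y[]` arrays with integer colour labels, which may differ from the toolkit's actual data layout. All names here are hypothetical. With only K threads it exposes very little parallelism. A faster STEP1 would reduce over the nodes instead, for example with per-block partial sums in shared memory.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// STEP1 (naive sketch): thread c sums the coordinates of all nodes labelled c
// and writes their average as the new centre of colour c.
__global__ void computeCentroids(const float *x, const float *y,
                                 const int *label, int n, int k,
                                 float *cx, float *cy) {
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (c >= k) return;
    float sx = 0.0f, sy = 0.0f;
    int count = 0;
    for (int i = 0; i < n; ++i) {
        if (label[i] == c) {
            sx += x[i];
            sy += y[i];
            ++count;
        }
    }
    if (count > 0) {   // keep the previous centre if the colour has no nodes
        cx[c] = sx / count;
        cy[c] = sy / count;
    }
}

// Host-side launcher: one thread per cluster (all pointers are device memory).
void launchComputeCentroids(const float *dX, const float *dY, const int *dLabel,
                            int n, int k, float *dCx, float *dCy) {
    assert(n > 0 && k > 0);
    int threads = 128;
    int blocks = (k + threads - 1) / threads;
    computeCentroids<<<blocks, threads>>>(dX, dY, dLabel, n, k, dCx, dCy);
}
```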

  7. K-means method (2/5) STEP2: The color of each node is changed to that of the nearest centre. STEP1: Again, the centre of gravity is computed for each set of nodes with the same color.
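STEP2 is naturally data-parallel, so it maps to one thread per node. Each thread scans all centres, keeps the nearest one, and raises a shared flag if the node's colour changed. This is a sketch that assumes the same hypothetical layout as above (separate `x[]`/`y[]` arrays, integer labels, centres in `cx[]`/`cy[]`). The function names are illustrative, not the toolkit's.

```cuda
#include <cassert>
#include <cfloat>
#include <cuda_runtime.h>

// STEP2 sketch: one thread per node. Each thread finds the nearest centre,
// updates the node's label, and sets *changed if the label differed.
__global__ void assignNearest(const float *x, const float *y, int n,
                              const float *cx, const float *cy, int k,
                              int *label, int *changed) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    int best = 0;
    float bestDist = FLT_MAX;
    for (int c = 0; c < k; ++c) {
        float dx = x[i] - cx[c];
        float dy = y[i] - cy[c];
        float d = dx * dx + dy * dy;   // squared distance; sqrt is unnecessary
        if (d < bestDist) { bestDist = d; best = c; }
    }
    if (label[i] != best) {
        label[i] = best;
        *changed = 1;   // benign race: every writer stores the same value
    }
}

// Host-side launcher (all pointers are device memory).
void launchAssignNearest(const float *dX, const float *dY, int n,
                         const float *dCx, const float *dCy, int k,
                         int *dLabel, int *dChanged) {
    assert(n > 0 && k > 0);
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    assignNearest<<<blocks, threads>>>(dX, dY, n, dCx, dCy, k, dLabel, dChanged);
}
```

The host resets `*changed` to 0 before each launch. It reads the flag back afterwards to decide whether another STEP1/STEP2 round is needed.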

  8. K-means method (3/5) STEP2: Again, the color of each node is changed to that of the nearest centre. STEP1: Again, the centre of gravity is computed for each set of nodes with the same color.

  9. K-means method (4/5) STEP2: Again, the color of each node is changed to that of the nearest centre. STEP1: Again, the centre of gravity is computed for each set of nodes with the same color.

  10. K-means method (5/5) STEP2: Again and again, the color of each node is changed to that of the nearest centre. Termination condition: every node already has the color of its nearest centre, so no color needs to change. → Terminate.
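The full loop, including the termination condition, is easiest to see in plain host code. The following is a sketch of a CPU reference under the same assumptions as the kernels above: 2-D nodes, integer colours in `[0, k)`, and random initial colours as on slide 6. It is not the toolkit's `cpuKMeans`, whose interface is not shown in these slides. The name `kmeansHost` and the `maxIter` safety cap are hypothetical.

```cuda
#include <cassert>
#include <cfloat>
#include <vector>

// Host-only K-means sketch. `label` holds each node's initial colour and is
// updated in place. Returns the number of STEP1+STEP2 rounds performed.
int kmeansHost(const std::vector<float> &x, const std::vector<float> &y,
               int k, std::vector<int> &label, int maxIter = 1000) {
    const int n = static_cast<int>(x.size());
    assert(k > 0 && y.size() == x.size() && label.size() == x.size());
    std::vector<float> cx(k, 0.0f), cy(k, 0.0f);
    for (int iter = 1; iter <= maxIter; ++iter) {
        // STEP1: centre of gravity of each colour
        std::vector<float> sx(k, 0.0f), sy(k, 0.0f);
        std::vector<int> cnt(k, 0);
        for (int i = 0; i < n; ++i) {
            sx[label[i]] += x[i];
            sy[label[i]] += y[i];
            ++cnt[label[i]];
        }
        for (int c = 0; c < k; ++c) {
            if (cnt[c] > 0) { cx[c] = sx[c] / cnt[c]; cy[c] = sy[c] / cnt[c]; }
        }
        // STEP2: recolour every node with its nearest centre
        bool changed = false;
        for (int i = 0; i < n; ++i) {
            int best = 0;
            float bestDist = FLT_MAX;
            for (int c = 0; c < k; ++c) {
                float dx = x[i] - cx[c], dy = y[i] - cy[c];
                float d = dx * dx + dy * dy;
                if (d < bestDist) { bestDist = d; best = c; }
            }
            if (best != label[i]) { label[i] = best; changed = true; }
        }
        // Termination: no node changed colour in this round
        if (!changed) return iter;
    }
    return maxIter;
}
```

On the GPU, the same loop stays on the host. Each round launches a STEP1 kernel and a STEP2 kernel, and the `changed` flag is the only value that has to be copied back per iteration.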

  11. How to start • Log in with ssh to 131.113.69.98. • Your account is already available. If you have not received an email about your account, please send mail to nomura@am.ics.keio.ac.jp. • Download kmeans.tar.gz and extract it. • The kmeans directory contains useful sample code. • Mission 1: Make a GPU version based on the CPU version. • Write gpuKMeans in kmeans.cu; cpuKMeans in main.cu is a CPU version for reference. • Mission 2: Optimize the GPU code so that it runs as fast as possible.

  12. Toolkit 1.0 • kmeans.cu: the K-means program for the GPU; please modify this file • main.cu: reads the input data and contains the CPU program; modification forbidden • check.c: visualizes the output data with OpenCV • gen.c: generates input data • Makefile • data/: input data • result/: output data

  13. How to use Toolkit 1.0 • $ make : compile • $ make gpu : execute the GPU program • $ make cpu : execute the CPU program • $ ./gen SEED (SEED = 0, 1, 2, …) : generate input data

  14. Sample Code • Vector addition program for the GPU • $ make : compile • $ ./main : run the program • Key points: • Memory allocation on the GPU: cudaMalloc(), cudaFree() • Data transfer between CPU and GPU: cudaMemcpy() • Format of a GPU kernel function
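The toolkit's own vector addition sample is not reproduced in these slides. The following is an independent sketch of the same idea that focuses on the kernel format: the `__global__` qualifier, the global thread index, the bounds check, and the `<<<grid, block>>>` launch. `vecAdd` and `vecAddOnGpu` are hypothetical names. This version returns the first CUDA error instead of aborting, so that device memory is always freed.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// __global__ marks a kernel: it runs on the GPU and is launched from the host.
// Each thread handles one element, identified by its global index.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                 // the last block may contain threads past the end
        c[i] = a[i] + b[i];
}

// Host wrapper: c = a + b on the GPU. Returns cudaSuccess or the first error.
cudaError_t vecAddOnGpu(const float *a, const float *b, float *c, int n) {
    assert(n > 0);
    const size_t bytes = n * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaError_t err = cudaMalloc(&dA, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dB, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dC, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(dA, a, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(dB, b, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        const int threads = 256;                          // threads per block
        const int blocks = (n + threads - 1) / threads;   // round up to cover n
        vecAdd<<<blocks, threads>>>(dA, dB, dC, n);       // <<<grid, block>>>
        err = cudaGetLastError();                         // launch errors
    }
    if (err == cudaSuccess) err = cudaMemcpy(c, dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);   // cudaFree on a null pointer does nothing
    cudaFree(dB);
    cudaFree(dC);
    return err;
}
```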

  15. Towards the fastest program • Minimum requirement • Implement the K-means program on the GPU • Parallelize STEP1 or STEP2 of K-means • How to optimize the program • Parallelize both STEP1 and STEP2 • Shared memory, constant memory • Coalesced memory access, etc. • Web sites • NVIDIA GPU Computing Documentation: http://developer.nvidia.com/nvidia-gpu-computing-documentation • Fixstars CUDA Information Site: http://gpu.fixstars.com/index.php/
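Two of the optimizations listed above fit STEP2 directly. First, all threads read the same centre at the same time, which suits constant memory, since the constant cache serves such uniform reads efficiently. Second, storing nodes as separate `x[]` and `y[]` arrays (structure of arrays) lets consecutive threads read consecutive addresses, so global loads can be coalesced. The following is a sketch under those assumptions. `MAX_K`, `c_cx`/`c_cy` and the function names are hypothetical, and the change flag is omitted for brevity.

```cuda
#include <cassert>
#include <cfloat>
#include <cuda_runtime.h>

#define MAX_K 64   // hypothetical upper bound on the number of clusters

// Centres live in constant memory (read-only for kernels, cached, and
// efficient when every thread of a warp reads the same address).
__constant__ float c_cx[MAX_K];
__constant__ float c_cy[MAX_K];

// Nodes in structure-of-arrays form: thread i reads x[i] and y[i], so
// neighbouring threads touch neighbouring addresses (coalesced loads).
__global__ void assignNearestConst(const float *__restrict__ x,
                                   const float *__restrict__ y,
                                   int n, int k, int *label) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float px = x[i], py = y[i];   // load each coordinate once into registers
    int best = 0;
    float bestDist = FLT_MAX;
    for (int c = 0; c < k; ++c) {
        float dx = px - c_cx[c], dy = py - c_cy[c];
        float d = dx * dx + dy * dy;
        if (d < bestDist) { bestDist = d; best = c; }
    }
    label[i] = best;
}

// Host: copy the current centres into constant memory, then launch STEP2.
cudaError_t assignWithConstCentres(const float *hCx, const float *hCy, int k,
                                   const float *dX, const float *dY, int n,
                                   int *dLabel) {
    assert(k > 0 && k <= MAX_K && n > 0);
    cudaError_t err = cudaMemcpyToSymbol(c_cx, hCx, k * sizeof(float));
    if (err == cudaSuccess) err = cudaMemcpyToSymbol(c_cy, hCy, k * sizeof(float));
    if (err != cudaSuccess) return err;
    const int threads = 256;
    assignNearestConst<<<(n + threads - 1) / threads, threads>>>(dX, dY, n, k, dLabel);
    return cudaGetLastError();
}
```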

  16. Announcement: • If you do not have an account, send mail to nomura@am.ics.keio.ac.jp. Your name should be included in the mail. • Deadline: 7/22 (Fri) 24:00 • Copy the following into ~/comparch: source code and a simple report • Please check the web site; additional information will be posted there. • If you have any questions about the contest, please send mail to nomura@am.ics.keio.ac.jp.
