Special Course on Computer Architectures ~ GPU Programming Contest~

2012/06/22 Email: nomura@am.ics.keio.ac.jp

Contents
• GPU (Graphics Processing Unit)
• CUDA Programming
• Target: Clustering with K-means
• How to use Toolkit1.0
• Towards the fastest program

GPU (Graphics Processing Unit)
• Multicore processor
  • Several hundred cores
  • SP (Streaming Processor): a core in the GPU
  • SM (Streaming Multiprocessor): composed of SPs
• High memory bandwidth
[Figure: a GPU containing several SMs, each composed of SPs, connected to Global Memory]
[Table: Specification of GeForce 280]

Flow of CUDA Program
• Allocate GPU memory
  • cudaMalloc()
• Transfer input data
  • cudaMemcpy()
• Execute kernel
• Transfer result data
• Free GPU memory
  • cudaFree()
[Figure: the host (CPU, main memory) transfers the input array to the global memory of the device (GPU); kernels running on the SPs compute output 1 to output N from input 1 to input N; the output array is transferred back to the host]
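The flow above can be sketched with a small vector addition. This is only an illustration under stated assumptions: the names vectorAdd and vectorAddOnGpu are hypothetical and are not taken from the toolkit, and the block size of 256 threads is an arbitrary choice.

```cuda
#include <cuda_runtime.h>

// Kernel: each thread adds one pair of elements.
__global__ void vectorAdd(const float *a, const float *b, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = a[i] + b[i];
    }
}

// Hypothetical host wrapper that follows the five steps of the flow.
// Assumes n > 0. Returns the first CUDA error encountered, or cudaSuccess.
cudaError_t vectorAddOnGpu(const float *hostA, const float *hostB,
                           float *hostOut, int n)
{
    size_t bytes = (size_t)n * sizeof(float);
    float *devA = NULL, *devB = NULL, *devOut = NULL;
    cudaError_t err;

    // 1. Allocate GPU memory.
    err = cudaMalloc((void **)&devA, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&devB, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&devOut, bytes);

    // 2. Transfer input data from the host to the device.
    if (err == cudaSuccess)
        err = cudaMemcpy(devA, hostA, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess)
        err = cudaMemcpy(devB, hostB, bytes, cudaMemcpyHostToDevice);

    // 3. Execute the kernel with enough blocks to cover all n elements.
    if (err == cudaSuccess) {
        int threads = 256;  // arbitrary block size chosen for this sketch
        int blocks = (n + threads - 1) / threads;
        vectorAdd<<<blocks, threads>>>(devA, devB, devOut, n);
        err = cudaGetLastError();
    }

    // 4. Transfer result data back; this copy waits for the kernel to finish.
    if (err == cudaSuccess)
        err = cudaMemcpy(hostOut, devOut, bytes, cudaMemcpyDeviceToHost);

    // 5. Free GPU memory (cudaFree(NULL) is a no-op).
    cudaFree(devA);
    cudaFree(devB);
    cudaFree(devOut);
    return err;
}
```

A caller passes three host arrays of length n and checks that the return value is cudaSuccess before reading hostOut.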

Target application: clustering with K-means
• K-means is a well-known clustering method.
• A K-means program for the host processor (CPU) is provided. Modify it so that it runs on the GPU as fast as possible.
• The GeForce Tesla (GTX280) in Amano Lab. can be used for this contest.

K-means method (1/5)
Initial state: Nodes, each assigned a color, are distributed randomly. (Here, 100 nodes with 5 colors are shown.)
STEP1: The centre of gravity is computed for each set of nodes with the same color. (X in the figure marks each centre.)
Reference URL: http://d.hatena.ne.jp/nitoyon/20090409/kmeans_visualise

K-means method (2/5)
STEP2: The color of each node is changed to that of the nearest centre.
STEP1: Again, the centre of gravity is computed for each set of nodes with the same color.

K-means method (3/5)
STEP2: Again, the color of each node is changed to that of the nearest centre.
STEP1: Again, the centre of gravity is computed for each set of nodes with the same color.

K-means method (4/5)
STEP2: Again, the color of each node is changed to that of the nearest centre.
STEP1: Again, the centre of gravity is computed for each set of nodes with the same color.

K-means method (5/5)
STEP2: Again and again, the color of each node is changed to that of the nearest centre.
Termination condition: Every node already has the color of its nearest centre, so no color needs to change. → Terminate.
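The STEP1/STEP2 loop and its termination condition can be sketched as host-side code, which compiles as part of a .cu file. This is a minimal sketch and not the cpuKMeans provided in the toolkit: the Node layout, the two-dimensional coordinates, the function names, and the maxIterations safety limit are all assumptions made for illustration.

```cuda
#include <cfloat>
#include <vector>

// Hypothetical node layout: 2D position plus a color in [0, k).
struct Node { float x, y; int color; };

// STEP1: compute the centre of gravity of each color.
// A color with no nodes keeps its previous centre.
void computeCentres(const std::vector<Node> &nodes,
                    std::vector<float> &cx, std::vector<float> &cy)
{
    int k = (int)cx.size();
    std::vector<float> sumX(k, 0.0f), sumY(k, 0.0f);
    std::vector<int> count(k, 0);
    for (size_t i = 0; i < nodes.size(); ++i) {
        sumX[nodes[i].color] += nodes[i].x;
        sumY[nodes[i].color] += nodes[i].y;
        count[nodes[i].color]++;
    }
    for (int c = 0; c < k; ++c) {
        if (count[c] > 0) {
            cx[c] = sumX[c] / count[c];
            cy[c] = sumY[c] / count[c];
        }
    }
}

// STEP2: give each node the color of its nearest centre.
// Returns the number of nodes whose color changed.
int recolorNodes(std::vector<Node> &nodes,
                 const std::vector<float> &cx, const std::vector<float> &cy)
{
    int changed = 0;
    for (size_t i = 0; i < nodes.size(); ++i) {
        int best = 0;
        float bestDist = FLT_MAX;
        for (size_t c = 0; c < cx.size(); ++c) {
            float dx = nodes[i].x - cx[c];
            float dy = nodes[i].y - cy[c];
            float d = dx * dx + dy * dy;  // squared distance is enough to compare
            if (d < bestDist) { bestDist = d; best = (int)c; }
        }
        if (best != nodes[i].color) { nodes[i].color = best; ++changed; }
    }
    return changed;
}

// Repeat STEP1 and STEP2 until no color changes (or maxIterations is hit).
// Returns the number of iterations performed.
int kmeansHost(std::vector<Node> &nodes, int k, int maxIterations)
{
    std::vector<float> cx(k, 0.0f), cy(k, 0.0f);
    int iterations = 0;
    while (iterations < maxIterations) {
        computeCentres(nodes, cx, cy);
        ++iterations;
        if (recolorNodes(nodes, cx, cy) == 0) break;  // termination condition
    }
    return iterations;
}
```

In this sketch STEP2 is independent per node, while STEP1 accumulates sums per color.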

How to start
• Log in with ssh to 131.113.69.98.
• Your account has already been created. If you have not received mail about your account, please send mail to nomura@am.ics.keio.ac.jp.
• Download kmeans.tar.gz and extract it.
• There are useful sample codes in kmeans.
• Mission 1: Make a GPU version based on the CPU version.
  • Write gpuKMeans in kmeans.cu. cpuKMeans in main.cu is the CPU version, for reference.
• Mission 2: Optimize the CPU code so that it runs as fast as possible.

Toolkit1.0
• kmeans.cu
  • The K-means program for the GPU is written here
  • Please modify this file
• main.cu
  • Reads the input data and contains the CPU program
  • Modification forbidden
• check.c
  • Visualizes the output data with OpenCV
• gen.c
  • Generates input data
• Makefile
• data/
  • Input data
• result/
  • Output data

How to use Toolkit1.0
• $ make
  • Compile
• $ make gpu
  • Execute GPU program
• $ make cpu
  • Execute CPU program
• $ ./gen SEED (SEED = 0,1,2,…)
  • Generate input data

Sample Code
• Vector addition program for GPU
  • $ make : Compile
  • $ ./main : Run the program
• Key points
  • Memory allocation on GPU
    • cudaMalloc(), cudaFree()
  • Data transfer between CPU and GPU
    • cudaMemcpy()
  • Format of GPU kernel function

Towards the fastest program
• Minimum requirement
  • Implement the K-means program on the GPU
  • Parallelize STEP1 or STEP2 of K-means
• How to optimize the program
  • Parallelize both STEP1 and STEP2
  • Shared memory, constant memory
  • Coalesced memory access, etc.
• Web sites
  • NVIDIA GPU Computing Document: http://developer.nvidia.com/nvidia-gpu-computing-documentation
  • Fixstars CUDA Information Site: http://gpu.fixstars.com/index.php/
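As one possible starting point for parallelizing STEP2, the sketch below assigns one thread per node, keeps the centres in constant memory, and stores coordinates as separate x and y arrays so that neighbouring threads read neighbouring addresses (coalesced access). Everything here is an assumption made for illustration and not the toolkit's interface: the names assignNearestCentre and recolorOnGpu, the MAX_CENTRES limit, the 2D coordinates, and the block size of 256.

```cuda
#include <cfloat>
#include <cuda_runtime.h>

#define MAX_CENTRES 64  // hypothetical upper bound on the number of colors

// Centres live in constant memory: all threads read the same centre at the
// same time, which suits the broadcast behaviour of the constant cache.
__constant__ float cCentreX[MAX_CENTRES];
__constant__ float cCentreY[MAX_CENTRES];

// STEP2 kernel: one thread per node. Thread i reads x[i] and y[i], so
// consecutive threads touch consecutive addresses.
__global__ void assignNearestCentre(const float *x, const float *y, int *color,
                                    int numNodes, int numCentres, int *changed)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numNodes) return;

    float px = x[i], py = y[i];
    int best = 0;
    float bestDist = FLT_MAX;
    for (int c = 0; c < numCentres; ++c) {
        float dx = px - cCentreX[c];
        float dy = py - cCentreY[c];
        float d = dx * dx + dy * dy;  // squared distance is enough to compare
        if (d < bestDist) { bestDist = d; best = c; }
    }
    if (best != color[i]) {
        color[i] = best;
        atomicAdd(changed, 1);  // count recolored nodes for the termination test
    }
}

// Hypothetical host wrapper. Node data stays on the GPU (devX, devY, devColor);
// only the small centre arrays and the change counter cross the bus.
cudaError_t recolorOnGpu(const float *devX, const float *devY, int *devColor,
                         int numNodes, const float *hostCx, const float *hostCy,
                         int numCentres, int *hostChanged)
{
    if (numNodes <= 0 || numCentres <= 0 || numCentres > MAX_CENTRES)
        return cudaErrorInvalidValue;

    int *devChanged = NULL;
    cudaError_t err = cudaMalloc((void **)&devChanged, sizeof(int));
    if (err == cudaSuccess) err = cudaMemset(devChanged, 0, sizeof(int));
    if (err == cudaSuccess)
        err = cudaMemcpyToSymbol(cCentreX, hostCx, numCentres * sizeof(float));
    if (err == cudaSuccess)
        err = cudaMemcpyToSymbol(cCentreY, hostCy, numCentres * sizeof(float));
    if (err == cudaSuccess) {
        int threads = 256;  // arbitrary block size chosen for this sketch
        int blocks = (numNodes + threads - 1) / threads;
        assignNearestCentre<<<blocks, threads>>>(devX, devY, devColor,
                                                 numNodes, numCentres, devChanged);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(hostChanged, devChanged, sizeof(int),
                         cudaMemcpyDeviceToHost);
    cudaFree(devChanged);
    return err;
}
```

A host loop would call recolorOnGpu after each STEP1 and stop once the returned change count is 0. STEP1 is not covered by this sketch; it needs per-color sums, which are harder to parallelize than the per-node work shown here.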

Announcement:
• If you do not have an account, send mail to nomura@am.ics.keio.ac.jp
  • Your name should be included in the mail.
• Deadline: 7/22 (Fri) 24:00
• Copy the following into ~/comparch:
  • Source code and a brief report
• Please check the web site. Additional information will be posted there.
• If you have any questions about the contest, please send mail to: nomura@am.ics.keio.ac.jp
