Lecture 1 (Advanced Object Oriented Programming): GPGPU-Programming
Arne Kutzner, Hanyang University / Seoul, Korea
Contact • Contact data: • E-Mail: kutzner@hanyang.ac.kr • Phone: 2220 2397 • Office: Room 77-714 • In emergency cases: 010 3938 1997
Prof. Dr. Arne Kutzner / Weekly Schedule 2010.2 • Alg. Analysis: 13:00 - 14:30 (H77 - 203) • C++: 14:00 - 16:00 (H77 - 703) • Master Students: 14:00 - 17:00 • C++: 16:00 - 18:00 (H77 - 703) • Alg. Analysis: 16:30 - 18:00 (H77 - 302)
Goals • Study selected problems and their solutions using GPGPU. Examples: • Multimedia content (e.g. video encoding, decoding) • Simulations (cellular automata) • Sorting, merging • Combinatorial problems • Scheduling problems • Brute-force attacks (cryptography) • Starting points: • Examples contained in the NVIDIA or ATI SDK • NVIDIA examples: http://developer.nvidia.com/object/cuda_sdk_samples.html • Papers from conferences with a focus on parallel computing • List of conferences on parallel computing: http://www.google.com.bz/Top/Computers/Parallel_Computing/Conferences/
Structure of Course • Seminar-like style • Each participant selects a topic and prepares a presentation on it • In parallel, we will discuss the ongoing research work in Prof. Kutzner's workgroup (parallel reductions) • Prof. Kutzner will report on his work in the area of merging and sorting using GPGPU • Participants can decide whether they want to report on more technical aspects or on algorithmic aspects in general • Grading: • Presentation, summary, participation, attendance
Why parallel CPU/GPU architectures? • There are limits to the clock frequency of a single CPU • The higher the frequency, the higher the power consumption (heat dissipation) • Practical limit nowadays ≈ 3 GHz • The energy efficiency of IT infrastructures (IT gadgets) becomes more and more important • Cost aspect (companies), battery lifespan (mobile phones etc.) • Increasing demand for computational performance, e.g. by hardcore gamers • Solution: parallel architectures
Development in the field of parallel CPU/GPU architectures • Yesterday: single-core CPUs (general-purpose processing units); GPUs with OpenGL (architectures optimized for many parallel floating-point operations) • Today: multi-core CPUs (2, 4, 8 cores); GPUs with CUDA / OpenCL (general-purpose programming on GPUs) • Tomorrow (?): multi-core CPUs with 64 or more cores, where some cores are "simplified"; many-core GPUs, where some cores behave like a CPU; converging towards flexible parallel architectures
Modern GPU Architecture
[Block diagram: host, input assembler, thread execution manager ("master control unit"), multiprocessors built from scalar processors (SPs), parallel data caches, texture units, load/store units, and global memory]
Programmer's view of GPGPU
[Diagram: several thread blocks, each with its own shared memory and several threads holding private registers; all blocks access the global memory of the GPU on the graphics card]
GPGPU Basics (1) • All threads execute the same code; the code on the GPU side is called a kernel • Each thread has a unique thread id • Each block has a unique block id • From the thread id and the block id, every thread can compute a unique global id (see the sketch below)
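A minimal CUDA sketch of this id computation (the kernel name and parameters are hypothetical, not taken from the lecture):

```cuda
// Each thread combines its block id and its thread id into one global index
// and uses that index to address one element of the array.
__global__ void scaleKernel(float *data, float factor, int n)
{
    // blockIdx.x  = id of this thread's block within the grid
    // blockDim.x  = number of threads per block
    // threadIdx.x = id of this thread within its block
    int gid = blockIdx.x * blockDim.x + threadIdx.x;

    if (gid < n)                          // the last block may be only partially used
        data[gid] = data[gid] * factor;
}
```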
Multiprocessors (Cores) and Thread Blocks • The model allows some form of automatic scalability: if a GPU has more cores, we do not have to change the code (see the launch sketch below)
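A sketch of a matching launch configuration (it reuses the hypothetical scaleKernel from above; d_data is assumed to be a device pointer allocated with cudaMalloc). The grid size is derived from the problem size only, so the runtime can distribute the blocks over however many multiprocessors the card has:

```cuda
int n = 1 << 20;                                                    // problem size
int threadsPerBlock = 256;
int blocksPerGrid = (n + threadsPerBlock - 1) / threadsPerBlock;    // round up

// The same launch works unchanged on GPUs with few or many multiprocessors;
// the hardware scheduler maps the blocks onto the available cores.
scaleKernel<<<blocksPerGrid, threadsPerBlock>>>(d_data, 2.0f, n);
```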
GPGPU Basics (2) • Memory hierarchy • Access (read/write) to global memory is expensive • Access to shared memory is efficient, but can create access conflicts and requires synchronization • Access to registers is most efficient because it is free of conflicts (a sketch follows below)
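A sketch of how shared memory and synchronization interact (hypothetical kernel; it assumes a launch with exactly 256 threads per block and an array length that is a multiple of 256):

```cuda
// Each thread stages one element from slow global memory into fast on-chip
// shared memory; __syncthreads() is the barrier that must separate the writes
// from reads performed by other threads of the same block.
__global__ void reverseInsideBlock(float *data)
{
    __shared__ float tile[256];                // one value per thread, on-chip

    int tid = threadIdx.x;
    int gid = blockIdx.x * blockDim.x + tid;

    tile[tid] = data[gid];                     // global -> shared (expensive read)
    __syncthreads();                           // wait until the whole tile is filled

    data[gid] = tile[blockDim.x - 1 - tid];    // read a value another thread wrote
}
```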
GPGPU Basics (3) – Memory Access • Modern cards try to "bundle" accesses to global memory • This technique is called coalescing • The coalescing capabilities of different cards vary; see the NVIDIA documentation (a sketch of the access patterns follows below)
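A sketch contrasting the two access patterns (hypothetical kernels; src is assumed to be large enough for the strided reads, and the exact coalescing rules depend on the card's compute capability, see the NVIDIA programming guide):

```cuda
// Consecutive threads read consecutive addresses, so the hardware can bundle
// ("coalesce") the loads into a few wide memory transactions.
__global__ void copyCoalesced(float *dst, const float *src, int n)
{
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    if (gid < n)
        dst[gid] = src[gid];              // thread i touches address i
}

// A stride between neighbouring threads breaks the bundling; each load may
// turn into a separate transaction and throughput drops accordingly.
__global__ void copyStrided(float *dst, const float *src, int n, int stride)
{
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    if (gid < n)
        dst[gid] = src[gid * stride];     // thread i touches address i*stride
}
```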
GPGPU and programmers / algorithm engineering • The special architecture of modern GPUs requires a special way of thinking on the programmer's side • Ideas that work well in a single-threaded world can turn into a mess in the context of GPGPU • Example: algorithms that contain some form of implicit serialization • Solving a problem with partial redundancy can become advantageous in the GPGPU world