Programming with CUDA and Parallel Algorithms • Waqar Saleem • Jens Müller
Organization • People • Waqar Saleem, waqar.saleem@uni-jena.de • Jens Mueller, jkm@informatik.uni-jena.de • Room 3335, Ernst-Abbe-Platz 2 • The course will be conducted in English • 6 points • Wahl/Wahlpflicht • Theoretical/Practical
Organization • Meetings, before winter break • Tue 12-14, CZ 129 • Thu 16-18, CZ 129 • Every second week • Starting next week • Exercises: Wed 8-10, CZ 125 • Starting tomorrow in the pool
The course • 2 parts • Before winter break: Lectures and assignments • Need at least 50% in assignments to qualify for ... • After the break: Group projects • Project chosen by or assigned to each group • Regular meetings • Presentation of each project at the end of the semester
Assignments • Build up a minimal ray tracer on GPU • Implement basic ray tracer on CPU • Port to GPU • Make ray tracer more interesting/efficient • Utilize CUDA concepts • Basic framework will be provided • Scene format and scenes • Introduction to ray tracing concepts
Requirements • Strong background in C programming • Familiarity with your OS • Modifying default settings • Writing/understanding Makefiles • Compiler flags and options
Course content • Parallel programming models and platforms • GPGPU • GPGPU on NVIDIA cards: CUDA • Architecture and programming model • OpenCL
Today • Organization • Brief introduction to parallel programming and CUDA • Short introduction to Ray tracing
Growth of Compute Capability • Moore’s law: the number of transistors that can be placed ... on an integrated circuit [doubles] approximately every two years (source: Wikipedia)
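As a worked reading of the doubling rule quoted above (a simplification that assumes an exact two-year doubling period, with $N_0$ the transistor count at a chosen starting year and $t$ measured in years):

```latex
N(t) = N_0 \cdot 2^{t/2}
\qquad\Rightarrow\qquad
\frac{N(10)}{N_0} = 2^{10/2} = 2^{5} = 32
```

Under that assumption, ten years of growth correspond to a 32-fold increase in transistor count.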
Growth of Compute Capability • Moore’s law (source: Wikipedia)
Need for increasing compute capability • Problems are getting more complex • e.g. Text editing to Image editing to Video editing • Current hardware complexity is never enough • Impractical to stop development at current state of the art
Barriers to growth • Natural limit on transistor size: the size of an atom • More transistors per unit area lead to higher power consumption and heat dissipation
Parallel architectures • Multiple Instructions Multiple Data (MIMD) • multi-threaded, multi-core architectures, clusters, grids • Single Instruction Multiple Data (SIMD) • Cell processor, GPUs, clusters, grids • GPU: Graphics Processing Unit • Parallel programming: writing programs that target parallel architectures
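To make the SIMD idea concrete, here is a minimal C sketch of a data-parallel loop. It is an illustration only: the function name `saxpy` and its parameters are chosen for this example and are not part of the course material. Every iteration performs the same operation on different data and does not depend on any other iteration, which is the property a SIMD machine exploits.

```c
#include <assert.h>
#include <stddef.h>

/* Computes y[i] = a * x[i] + y[i] for every i in [0, n).
   The same instruction sequence is applied to each element, and no
   iteration reads a value written by another iteration, so the
   iterations could all run at the same time on SIMD hardware. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

On a CPU this loop runs sequentially; on a GPU each value of `i` would typically be handled by its own thread.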
GPU architecture • Simpler architecture than MIMD • Little overhead for instruction scheduling, branch prediction etc. • Subsequent figures are from the NVIDIA CUDA Programming Guide 2.3.1 unless mentioned otherwise
GPU architecture • Simpler architecture leads to higher performance (compared to CPUs)
General Purpose computing on GPU, GPGPU • Attractive because of raw GPU power • Traditionally hard because GPU programming was closely tied to graphics • Simplicity of GPU architecture limits the kind of problems suitable for GPGPU • or at least requires some problems to be reformulated
GPGPU for the masses* • Freeing the GPU from graphics: Nvidia CUDA, ATI Stream • C-like programming interface to the GPU • * - knowledge of underlying architecture required to achieve peak performance
Freeing Parallel Programming • OpenCL: code once, run anywhere • single core, multi core, GPU, ... • platform details transparent to the user • supported by major vendors: Apple, Intel, AMD, Nvidia, ... • OpenCL drivers made available by ATI and Nvidia for their cards
This course • chiefly CUDA: Nvidia specific, mature, well documented, easily available literature • some OpenCL: open standard, very new, limited documentation available, very similar concepts to CUDA • no ATI Stream
CUDA, Compute Unified Device Architecture • Software: C like programming interface to the GPU • Hardware: the hardware that supports the above programming model
CUDA programming model • CPU=host, GPU=device, work unit=thread
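The host/device/thread split can be emulated in plain C to show how work is divided. This is a sketch, not CUDA code: in CUDA the host launches a kernel and every thread computes its own global index from the built-in variables `blockIdx`, `blockDim` and `threadIdx`, whereas here the hypothetical functions `thread_body` and `launch_emulated` imitate that with ordinary loops on the CPU.

```c
#include <assert.h>

/* Hypothetical per-thread work: each "thread" stores its own global
   index in one output slot. In CUDA this would be the kernel body. */
static void thread_body(int block_idx, int block_dim, int thread_idx,
                        int n, int *out)
{
    int i = block_idx * block_dim + thread_idx;  /* global thread index */
    if (i < n) {  /* the last block may be only partly used */
        out[i] = i;
    }
}

/* Host-side emulation of a launch: run the body once for every
   (block, thread) pair. A GPU would run these concurrently; here
   they run one after another. */
void launch_emulated(int n, int block_dim, int *out)
{
    assert(n >= 0 && block_dim > 0);
    int num_blocks = (n + block_dim - 1) / block_dim;  /* round up */
    for (int b = 0; b < num_blocks; ++b) {
        for (int t = 0; t < block_dim; ++t) {
            thread_body(b, block_dim, t, n, out);
        }
    }
}
```

With `n = 10` and `block_dim = 4`, three blocks are needed and the last two "threads" of the third block do nothing, which is why the bounds check inside the body matters.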
Ray tracing • A method to render a given scene • Cast rays from a camera into the scene • Compute ray intersections with scene geometry • Render pixel (image source: Wikipedia)
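The intersection step can be sketched in C as follows, assuming the scene geometry consists of triangles (an assumption; the course's scene format is not described here). The sketch uses the well-known Möller-Trumbore ray-triangle test; the type `Vec3` and all function names are hypothetical choices for this example.

```c
#include <assert.h>

typedef struct { double x, y, z; } Vec3;

static double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

static Vec3 sub(Vec3 a, Vec3 b)
{
    Vec3 r = { a.x - b.x, a.y - b.y, a.z - b.z };
    return r;
}

static Vec3 cross(Vec3 a, Vec3 b)
{
    Vec3 r = { a.y * b.z - a.z * b.y,
               a.z * b.x - a.x * b.z,
               a.x * b.y - a.y * b.x };
    return r;
}

/* Returns the ray parameter t >= 0 at which origin + t * dir hits the
   triangle (v0, v1, v2), or -1.0 if the ray misses it. */
double intersect_triangle(Vec3 origin, Vec3 dir, Vec3 v0, Vec3 v1, Vec3 v2)
{
    const double eps = 1e-12;
    assert(dir.x != 0.0 || dir.y != 0.0 || dir.z != 0.0);

    Vec3 e1 = sub(v1, v0);
    Vec3 e2 = sub(v2, v0);
    Vec3 p = cross(dir, e2);
    double det = dot(e1, p);
    if (det > -eps && det < eps) return -1.0;   /* ray parallel to triangle */

    double inv = 1.0 / det;
    Vec3 s = sub(origin, v0);
    double u = dot(s, p) * inv;                 /* first barycentric coord */
    if (u < 0.0 || u > 1.0) return -1.0;

    Vec3 q = cross(s, e1);
    double v = dot(dir, q) * inv;               /* second barycentric coord */
    if (v < 0.0 || u + v > 1.0) return -1.0;

    double t = dot(e2, q) * inv;                /* distance along the ray */
    return (t >= 0.0) ? t : -1.0;               /* hits behind the origin are misses */
}
```

A renderer would call such a function for every object along each ray and keep the smallest non-negative `t`, since that is the surface the camera actually sees.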
Ray tracer complexity • A ray tracer can be arbitrarily complex • Recursively compute intersections for reflected, refracted and shadow rays • Account for diffuse lighting • Consider multiple light sources • Consider light sources other than point lights • Account for textures: object materials
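Two of the items above, diffuse lighting and multiple light sources, can be sketched together in C. The sketch assumes the common Lambertian model, in which a light's contribution is proportional to the cosine of the angle between the surface normal and the direction to the light; this choice of model, the type `Vec3` and the function `diffuse` are assumptions of this example. All vectors are assumed to be unit length already.

```c
#include <assert.h>

typedef struct { double x, y, z; } Vec3;

static double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

/* Sums the Lambertian (cosine-weighted) contribution of several point
   lights at one surface point.
   normal:    unit surface normal at the hit point
   to_light:  unit vectors from the hit point towards each light
   intensity: brightness of each light
   Lights behind the surface (negative cosine) contribute nothing. */
double diffuse(Vec3 normal, const Vec3 *to_light, const double *intensity,
               int num_lights)
{
    assert(num_lights >= 0);
    double sum = 0.0;
    for (int i = 0; i < num_lights; ++i) {
        double cos_theta = dot(normal, to_light[i]);
        if (cos_theta > 0.0) {
            sum += intensity[i] * cos_theta;
        }
    }
    return sum;
}
```

Shadow rays would add one more test per light: the contribution is skipped if another object lies between the hit point and the light.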
Coding a ray tracer • Relatively easy to code on the CPU • Call the same intersection function recursively on secondary rays • CPU code is not so complex • Tricky to code on the GPU as recursion is not yet supported in GPGPU models
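One common way around missing recursion is to rewrite the recursive call as a loop. The toy C sketch below compares both forms under a strong simplification: each hit spawns exactly one secondary (reflected) ray, and the brightness and reflectivity found at each bounce are given in arrays rather than computed from a scene. The function names and this toy model are assumptions of the example.

```c
#include <assert.h>

/* Recursive form: colour at this bounce plus the reflected colour,
   scaled by the reflectivity of the surface that was hit. */
double shade_recursive(const double *local, const double *refl,
                       int depth, int max_depth)
{
    assert(depth >= 0);
    if (depth >= max_depth) return 0.0;
    return local[depth]
         + refl[depth] * shade_recursive(local, refl, depth + 1, max_depth);
}

/* Iterative form of the same computation: carry a running weight
   (the product of all reflectivities so far) instead of a call stack. */
double shade_iterative(const double *local, const double *refl, int max_depth)
{
    assert(max_depth >= 0);
    double color = 0.0;
    double weight = 1.0;
    for (int depth = 0; depth < max_depth; ++depth) {
        color += weight * local[depth];
        weight *= refl[depth];
    }
    return color;
}
```

The loop form only works this directly when there is a single secondary ray per hit. If a hit spawns both a reflected and a refracted ray, an explicit stack of pending rays is needed instead.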
This course • Build a trivial ray tracer on the CPU • compute view rays only • part of tomorrow’s exercise • Port to GPU • Add complexity to your GPU ray tracer
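Computing view rays can be sketched in C as below. The camera model is an assumption of this example, not the course framework: a pinhole camera at the origin looking down the negative z axis, with the image plane at distance 1 spanning [-aspect, aspect] horizontally and [-1, 1] vertically (a 90 degree vertical field of view). The type `Vec3` and the function `view_ray_dir` are hypothetical names.

```c
#include <assert.h>

typedef struct { double x, y, z; } Vec3;

/* Returns the (unnormalized) direction of the view ray through the
   centre of pixel (px, py) in a width x height image.
   Pixel (0, 0) is the top-left corner of the image. */
Vec3 view_ray_dir(int px, int py, int width, int height)
{
    assert(width > 0 && height > 0);
    assert(px >= 0 && px < width && py >= 0 && py < height);

    double aspect = (double)width / (double)height;
    /* Map the pixel centre to [-1, 1] in both directions. */
    double u = ((px + 0.5) / width) * 2.0 - 1.0;
    double v = 1.0 - ((py + 0.5) / height) * 2.0;  /* row 0 is the top */

    Vec3 dir = { u * aspect, v, -1.0 };
    return dir;
}
```

Each pixel's ray depends only on its own coordinates, so this step maps naturally onto one GPU thread per pixel when the code is ported.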
Reminders • Exercise session tomorrow • Register on CAJ