The Problem • Add two arrays • A[] + B[] -> C[]
GPU Computing: Step by Step • Setup inputs on the host (CPU-accessible memory) • Allocate memory for inputs on the GPU • Copy inputs from host to GPU • Allocate memory for outputs on the host • Allocate memory for outputs on the GPU • Start GPU kernel • Copy output from GPU to host • (Copying can be asynchronous)
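The steps above can be sketched in host code roughly as follows (illustrative only: error checking is omitted, and the kernel name `addKernel` is a placeholder, not from the slides):

```cuda
#include <cuda_runtime.h>
#include <cstdlib>

__global__ void addKernel(const float *a, const float *b, float *c, int n);

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // 1. Set up inputs on the host (CPU-accessible memory)
    float *hA = (float *)malloc(bytes);
    float *hB = (float *)malloc(bytes);
    float *hC = (float *)malloc(bytes);   // host output buffer
    for (int i = 0; i < n; ++i) { hA[i] = 1.0f; hB[i] = 2.0f; }

    // 2./3. Allocate device memory, copy inputs host -> GPU
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);               // device output buffer
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    // 4. Start the GPU kernel: 512 threads per block, enough blocks to cover n
    addKernel<<<(n + 511) / 512, 512>>>(dA, dB, dC, n);

    // 5. Copy the output from GPU back to host
    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(hA); free(hB); free(hC);
    return 0;
}
```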
The Kernel • Determine a global thread index from the block ID and the thread ID within a block:
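The standard index computation is block index times block size plus thread index. A first (naive) version of the addition kernel:

```cuda
__global__ void addKernel(const float *a, const float *b, float *c) {
    // Which block we are in, times threads per block,
    // plus our position within the block.
    unsigned int idx = blockIdx.x * blockDim.x + threadIdx.x;
    c[idx] = a[idx] + b[idx];   // note: no bounds check yet!
}
```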
Fixing the Kernel • For large arrays, our kernel doesn’t work! • Bounds-checking – be on the lookout! • Also need a way for each thread to handle more than one element when the array is larger than the grid…
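One common fix (a sketch, not necessarily the slides' exact solution) is a grid-stride loop: each thread starts at its global index, checks the bound, and then jumps ahead by the total number of threads in the grid until the array is covered:

```cuda
__global__ void addKernel(const float *a, const float *b, float *c, int n) {
    // Start at this thread's global index...
    for (unsigned int idx = blockIdx.x * blockDim.x + threadIdx.x;
         idx < n;                              // bounds check
         idx += blockDim.x * gridDim.x) {      // ...then stride by the whole grid
        c[idx] = a[idx] + b[idx];
    }
}
```

This works for any array size, even when n is not a multiple of the block size or exceeds the number of launched threads.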
Lab 1! • Sum of polynomials – a fun, parallelizable example! • Suppose we have a polynomial P(r) with coefficients c0, …, cn-1, given by: P(r) = c0 + c1*r + c2*r^2 + … + cn-1*r^(n-1) • We want, for inputs r0, …, rN-1, the sum: S = P(r0) + P(r1) + … + P(rN-1) • Output condenses to one number!
Calculating P(r) once • Pseudocode (one possible method):

Given r, coefficients[]
result <- 0.0
power <- 1.0
for all coefficient indices i from 0 to n-1:
    result += coefficients[i] * power
    power *= r
Accumulation • atomicAdd() function • Important for safe concurrent updates – without it, threads adding to the same accumulator can race and lose contributions!
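Putting the pieces together, a hedged sketch (the kernel name and argument names are illustrative, not the official Lab 1 skeleton): each thread evaluates P at one or more input points using the pseudocode above, then atomically adds its result to a single accumulator:

```cuda
__global__ void polyKernel(const float *r, const float *coeffs,
                           int n, int N, float *result) {
    for (int j = blockIdx.x * blockDim.x + threadIdx.x;
         j < N; j += blockDim.x * gridDim.x) {
        float acc = 0.0f, power = 1.0f;
        for (int i = 0; i < n; ++i) {       // evaluate P(r[j])
            acc += coeffs[i] * power;
            power *= r[j];
        }
        // atomicAdd serializes conflicting updates, so no contribution is lost
        atomicAdd(result, acc);
    }
}
```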
Shared Memory • Faster than global memory • Per-block – visible only to the threads of one block
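A minimal sketch of declaring and using shared memory (names and sizes are illustrative): each block gets its own on-chip buffer, and threads must synchronize before reading each other's entries:

```cuda
__global__ void stageKernel(const float *in, float *out) {
    __shared__ float buffer[512];          // one copy per block, fast on-chip
    unsigned int tid = threadIdx.x;
    unsigned int idx = blockIdx.x * blockDim.x + tid;

    buffer[tid] = in[idx];                 // stage data into shared memory
    __syncthreads();                       // wait for the whole block

    // Threads in this block can now read each other's staged entries.
    out[idx] = buffer[tid];
}
```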
Linear Accumulation • atomicAdd() has a choke point – conflicting updates are serialized, one at a time! • What if we reduced our results in parallel?
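A hedged sketch of a tree reduction in shared memory: each block sums its 512 partial results in log2(512) = 9 steps, then issues ONE atomicAdd, instead of 512 serialized ones. This assumes blockDim.x is a power of 2; the kernel name is illustrative:

```cuda
__global__ void reduceKernel(const float *partials, int N, float *result) {
    __shared__ float buf[512];
    unsigned int tid = threadIdx.x;
    unsigned int idx = blockIdx.x * blockDim.x + tid;

    buf[tid] = (idx < N) ? partials[idx] : 0.0f;
    __syncthreads();

    // Halve the number of active threads each step; each survivor adds in
    // the value held by its partner in the upper half.
    for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s)
            buf[tid] += buf[tid + s];
        __syncthreads();
    }

    if (tid == 0)                          // one atomic per block, not per thread
        atomicAdd(result, buf[0]);
}
```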
Last notes • minuteman.cms.caltech.edu – the easiest option • CMS accounts! • Office hours • Kevin: Monday, 8-10 PM • Connor: Tuesday, 8-10 PM