
CS 179: Lecture 2 Lab Review 1


teagan-rich

Presentation Transcript


  1. CS 179: Lecture 2 Lab Review 1

  2. The Problem • Add two arrays • A[] + B[] -> C[]

  3. GPU Computing: Step by Step • Set up inputs on the host (CPU-accessible memory) • Allocate memory for inputs on the GPU • Copy inputs from host to GPU • Allocate memory for outputs on the host • Allocate memory for outputs on the GPU • Start GPU kernel • Copy output from GPU to host • (Copying can be asynchronous)
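The steps above can be sketched as a single host function. This is a minimal sketch, not the lecture's own code: the names `addKernel` and `addOnGpu`, the block size of 256, and the choice of `float` are assumptions for illustration, error checking is omitted, and the inputs are assumed to be non-empty and of equal length.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel name: each thread adds one pair of elements.
__global__ void addKernel(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}

// Hypothetical host wrapper that walks through the steps in order.
std::vector<float> addOnGpu(const std::vector<float> &a,
                            const std::vector<float> &b) {
    const int n = static_cast<int>(a.size());
    const size_t bytes = n * sizeof(float);

    // Allocate memory for the inputs on the GPU.
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);

    // Copy the inputs from host to GPU.
    cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);

    // Allocate memory for the output on the host and on the GPU.
    std::vector<float> c(n);
    cudaMalloc(&dC, bytes);

    // Start the GPU kernel with enough blocks to cover all n elements.
    const int threads = 256;  // assumed block size
    const int blocks = (n + threads - 1) / threads;
    addKernel<<<blocks, threads>>>(dA, dB, dC, n);

    // Copy the output back; this blocking copy waits for the kernel to finish.
    cudaMemcpy(c.data(), dC, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return c;
}
```

The blocking `cudaMemcpy` keeps the sketch simple; the asynchronous copies mentioned on the slide would use `cudaMemcpyAsync` with streams instead.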

  4. The Kernel • Determine a thread index from the block ID and the thread ID within a block: index = block ID × threads per block + thread ID
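To make the index calculation concrete, here is a small sketch in which every thread writes its own global index into an array. The kernel and helper names (`writeGlobalIndex`, `globalIndices`) are hypothetical, and the launch is assumed to be one-dimensional with exactly `blocks * threadsPerBlock` output slots.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread computes its global index from its block ID and its
// thread ID within the block, then stores that index.
__global__ void writeGlobalIndex(int *out) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    out[idx] = idx;
}

// Hypothetical helper: launches the kernel and returns what the threads wrote.
std::vector<int> globalIndices(int blocks, int threadsPerBlock) {
    const int n = blocks * threadsPerBlock;
    std::vector<int> host(n);
    int *dev = nullptr;
    cudaMalloc(&dev, n * sizeof(int));
    writeGlobalIndex<<<blocks, threadsPerBlock>>>(dev);
    cudaMemcpy(host.data(), dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return host;
}
```

With 4 blocks of 8 threads, thread 3 of block 2 computes 2 * 8 + 3 = 19, so the threads together cover indices 0 through 31 exactly once.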

  5. Calling the Kernel

  6. CUDA implementation (2)

  7. Fixing the Kernel • For large arrays, our kernel doesn’t work! • Bounds-checking – be on the lookout! • Also, need a way for kernel to handle a few more elements…

  8. Fixing the Kernel – Part 1

  9. Fixing the Kernel – Part 2
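One common way to apply both fixes is a loop that checks bounds and lets each thread step through the array by the total number of launched threads. The sketch below assumes that approach; the names `addKernelStrided` and `addWithFixedGrid` are hypothetical and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

// The loop condition is the bounds check; the stride lets a fixed number
// of threads handle arrays longer than the grid.
__global__ void addKernelStrided(const float *a, const float *b, float *c,
                                 int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    const int stride = blockDim.x * gridDim.x;  // total threads in the grid
    while (idx < n) {
        c[idx] = a[idx] + b[idx];
        idx += stride;
    }
}

// Hypothetical helper: deliberately takes the grid size as a parameter so a
// small grid can be used on a larger array.
std::vector<float> addWithFixedGrid(const std::vector<float> &a,
                                    const std::vector<float> &b,
                                    int blocks, int threadsPerBlock) {
    const int n = static_cast<int>(a.size());
    const size_t bytes = n * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);

    addKernelStrided<<<blocks, threadsPerBlock>>>(dA, dB, dC, n);

    std::vector<float> c(n);
    cudaMemcpy(c.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return c;
}
```

For example, 2 blocks of 32 threads give a stride of 64, so thread 0 handles elements 0, 64, 128, and so on until it passes the end of the array.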

  10. Fixing our Call
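On the host side, the fix usually amounts to computing the block count from the array length and capping it at some maximum. The helper below is a sketch under that assumption; `chooseBlockCount` and its parameters are hypothetical names, and the cap of 65535 in the usage comment is only an example value.

```cuda
#include <algorithm>

// Blocks needed to cover n elements (rounded up), capped at maxBlocks and
// never less than 1 so the launch configuration stays valid.
int chooseBlockCount(int n, int threadsPerBlock, int maxBlocks) {
    const int needed = (n + threadsPerBlock - 1) / threadsPerBlock;
    return std::max(1, std::min(needed, maxBlocks));
}

// Assumed usage with a kernel that loops over the array by grid stride:
//   const int threads = 512;
//   const int blocks = chooseBlockCount(n, threads, 65535);
//   kernel<<<blocks, threads>>>(dA, dB, dC, n);
```

When the cap applies, the grid has fewer threads than elements, so this launch is only correct with a kernel that loops over the remaining elements.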

  11. Lab 1! • Sum of polynomials – Fun, parallelizable example! • Suppose we have a polynomial P(r) with coefficients $c_0, \dots, c_{n-1}$, given by: $P(r) = \sum_{i=0}^{n-1} c_i r^i$ • We want, for $r_0, \dots, r_{N-1}$, the sum: $\sum_{j=0}^{N-1} P(r_j)$ • Output condenses to one number!

  12. Calculating P(r) once • Pseudocode (one possible method):
      Given r, coefficients[]
      result <- 0.0
      power <- 1.0
      for all coefficient indices i from 0 to n-1:
          result += (coefficients[i] * power)
          power *= r
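The pseudocode translates almost line for line into a function that can run on either the host or the device. The function name `evalPolynomial` and the use of `float` are assumptions for illustration.

```cuda
#include <cuda_runtime.h>

// Evaluates P(r) = coefficients[0] + coefficients[1] * r + ...
// by keeping a running power of r, as in the pseudocode.
__host__ __device__ float evalPolynomial(float r, const float *coefficients,
                                         int n) {
    float result = 0.0f;
    float power = 1.0f;  // r^0
    for (int i = 0; i < n; ++i) {
        result += coefficients[i] * power;
        power *= r;  // advance to r^(i+1)
    }
    return result;
}
```

For coefficients 1, 2, 3 and r = 2, this computes 1 + 2·2 + 3·4 = 17.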

  13. Accumulation • atomicAdd() function • Important for safe operations!

  14. Accumulation
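A straightforward way to accumulate with `atomicAdd()` is to have every thread add its own P(r) value directly to one output location in global memory. The sketch below assumes that design; `polySumAtomic` and `sumOfPolynomials` are hypothetical names, the launch size is an arbitrary choice, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread evaluates P(r_j) for its elements and adds the value to
// *output atomically, so concurrent updates are not lost.
__global__ void polySumAtomic(const float *coeffs, int n, const float *rs,
                              int numR, float *output) {
    const int stride = blockDim.x * gridDim.x;
    for (int j = blockIdx.x * blockDim.x + threadIdx.x; j < numR;
         j += stride) {
        float result = 0.0f;
        float power = 1.0f;
        for (int i = 0; i < n; ++i) {
            result += coeffs[i] * power;
            power *= rs[j];
        }
        atomicAdd(output, result);
    }
}

// Hypothetical host wrapper returning the sum of P(r_j) over all r_j.
float sumOfPolynomials(const std::vector<float> &coeffs,
                       const std::vector<float> &rs) {
    const int n = static_cast<int>(coeffs.size());
    const int numR = static_cast<int>(rs.size());
    float *dCoeffs = nullptr, *dRs = nullptr, *dOut = nullptr;
    cudaMalloc(&dCoeffs, n * sizeof(float));
    cudaMalloc(&dRs, numR * sizeof(float));
    cudaMalloc(&dOut, sizeof(float));
    cudaMemcpy(dCoeffs, coeffs.data(), n * sizeof(float),
               cudaMemcpyHostToDevice);
    cudaMemcpy(dRs, rs.data(), numR * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(dOut, 0, sizeof(float));  // the accumulator must start at zero

    polySumAtomic<<<4, 128>>>(dCoeffs, n, dRs, numR, dOut);  // assumed sizes

    float sum = 0.0f;
    cudaMemcpy(&sum, dOut, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dCoeffs);
    cudaFree(dRs);
    cudaFree(dOut);
    return sum;
}
```

Without the atomic operation, two threads could read the same old value of `*output` and one of the additions would be overwritten.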

  15. Shared Memory • Faster than global memory • Per-block • One block

  16. Linear Accumulation • atomicAdd() has a choke point! • What if we reduced our results in parallel?

  17. Linear Accumulation

  18. Linear Accumulation (2)
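One way to relieve the `atomicAdd()` choke point is to collect each block's partial results in shared memory, let a single thread add them up linearly, and issue only one `atomicAdd()` per block. The sketch below assumes that scheme; the names `polySumLinear`, `sumOfPolynomialsLinear`, and `kThreadsPerBlock` are hypothetical, and the kernel must be launched with exactly `kThreadsPerBlock` threads per block.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kThreadsPerBlock = 128;  // assumed block size

__global__ void polySumLinear(const float *coeffs, int n, const float *rs,
                              int numR, float *output) {
    __shared__ float partial[kThreadsPerBlock];  // one slot per thread

    // Each thread sums P(r_j) over its own elements in a register.
    float local = 0.0f;
    const int stride = blockDim.x * gridDim.x;
    for (int j = blockIdx.x * blockDim.x + threadIdx.x; j < numR;
         j += stride) {
        float result = 0.0f;
        float power = 1.0f;
        for (int i = 0; i < n; ++i) {
            result += coeffs[i] * power;
            power *= rs[j];
        }
        local += result;
    }
    partial[threadIdx.x] = local;
    __syncthreads();  // all partial sums must be written before reading them

    // Thread 0 adds the block's partial sums one after another (linear),
    // then performs a single atomicAdd for the whole block.
    if (threadIdx.x == 0) {
        float blockSum = 0.0f;
        for (int t = 0; t < blockDim.x; ++t) {
            blockSum += partial[t];
        }
        atomicAdd(output, blockSum);
    }
}

// Hypothetical host wrapper; the block count of 4 is an arbitrary choice.
float sumOfPolynomialsLinear(const std::vector<float> &coeffs,
                             const std::vector<float> &rs) {
    const int n = static_cast<int>(coeffs.size());
    const int numR = static_cast<int>(rs.size());
    float *dCoeffs = nullptr, *dRs = nullptr, *dOut = nullptr;
    cudaMalloc(&dCoeffs, n * sizeof(float));
    cudaMalloc(&dRs, numR * sizeof(float));
    cudaMalloc(&dOut, sizeof(float));
    cudaMemcpy(dCoeffs, coeffs.data(), n * sizeof(float),
               cudaMemcpyHostToDevice);
    cudaMemcpy(dRs, rs.data(), numR * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(dOut, 0, sizeof(float));

    polySumLinear<<<4, kThreadsPerBlock>>>(dCoeffs, n, dRs, numR, dOut);

    float sum = 0.0f;
    cudaMemcpy(&sum, dOut, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dCoeffs);
    cudaFree(dRs);
    cudaFree(dOut);
    return sum;
}
```

The number of atomic operations drops from one per element to one per block, but thread 0 still walks through all of the block's partial sums serially while the other threads idle.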

  19. Can we do better?

  20. Last notes • minuteman.cms.caltech.edu – the easiest option • CMS accounts! • Office hours • Kevin: Monday, 8-10 PM • Connor: Tuesday, 8-10 PM
