Programming With CUDA
Chris Kerkhoff, Matthew Sullivan
12/2/2009
Basic Flow
• The host computer initializes an array with data.
• The array is copied from the host's main memory to the memory on the GPU.
• The GPU performs operations on the array.
• The array is copied back to the main memory on the host.
#include <stdio.h>
#include <stdlib.h>
#include <cuda.h>

// Kernel that executes on the CUDA device:
__global__ void cube_array(float *a, int N)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N) a[idx] = a[idx] * a[idx] * a[idx];
}

// Main routine that executes on the host:
int main(void)
{
    float *a_h, *a_d;                   // Pointers to host & device arrays
    const int N = 10;                   // Number of elements in arrays
    size_t size = N * sizeof(float);
    a_h = (float *)malloc(size);        // Allocate array on host
    cudaMalloc((void **) &a_d, size);   // Allocate array on device

    // Initialize host array and copy it to CUDA device:
    for (int i = 0; i < N; i++) a_h[i] = (float)i;
    cudaMemcpy(a_d, a_h, size, cudaMemcpyHostToDevice);

    // Do calculation on device (block count is rounded up to cover all N elements):
    int block_size = 4;
    int n_blocks = N/block_size + (N%block_size == 0 ? 0 : 1);
    cube_array <<< n_blocks, block_size >>> (a_d, N);

    // Retrieve result from device and store it in host array:
    cudaMemcpy(a_h, a_d, sizeof(float)*N, cudaMemcpyDeviceToHost);
    for (int i = 0; i < N; i++) printf("%d %f\n", i, a_h[i]);   // Print results

    // Cleanup:
    free(a_h);
    cudaFree(a_d);
    return 0;
}
0 0.000000
1 1.000000
2 8.000000
3 27.000000
4 64.000000
5 125.000000
6 216.000000
7 343.000000
8 512.000000
9 729.000000
Press any key to continue . . .