1 / 32

Thrust & Curand

Thrust & Curand. Dr. Bo Yuan E-mail: yuanb@sz.tsinghua.edu.cn. Outline. Thrust Basic Work with Thrust General Types of Thrust Fancy Iterators of Thrust Curand Library. Thrust Basic. What is Thrust? Containers Iterators Interaction with raw pointers Namespace. What is Thrust?.

hovan
Download Presentation

Thrust & Curand

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Thrust & Curand Dr. Bo Yuan E-mail: yuanb@sz.tsinghua.edu.cn

  2. Outline • Thrust Basic • Work with Thrust • General Types of Thrust • Fancy Iterators of Thrust • Curand Library

  3. Thrust Basic • What is Thrust? • Containers • Iterators • Interaction with raw pointers • Namespace

  4. What is Thrust? • C++ template library for CUDA • Mimics Standard Template Library (STL) • Containers • thrust::host_vector<T> • thrust::device_vector<T> • Algorithms • thrust::sort() • thrust::reduce() • thrust::inclusive_scan() • …

  5. Containers • Make common operations concise and readable • Hides cudaMalloc, cudaMemcpy and cudaFree // allocate host vector with two elements thrust::host_vector<int> h_vec(2); // copy host vector to device thrust::device_vector<int> d_vec = h_vec; // manipulate device values from the host d_vec[0] = 13; d_vec[1] = 27; std::cout << "sum: " << d_vec[0] + d_vec[1] << std::endl; // vector memory automatically released w/ free() or cudaFree()

  6. Iterators • Sequences defined by pair of iterators // allocate device vector thrust::device_vector<int> d_vec(4); // returns iterator at first element of d_vec d_vec.begin(); // returns iterator one past the last element of d_vec d_vec.end() // [begin, end) pair defines a sequence of 4 elements d_vec.begin() d_vec.end()

  7. Iterators // initialize random values on host thrust::host_vector<int> h_vec(1000); thrust::generate(h_vec.begin(), h_vec.end(), rand); // copy values to device thrust::device_vector<int> d_vec = h_vec; // compute sum on host inth_sum = thrust::reduce(h_vec.begin(),h_vec.end()); // compute sum on device intd_sum = thrust::reduce(d_vec.begin(),d_vec.end());

  8. Convertible to Raw Pointers // allocate device vector thrust::device_vector<int> d_vec(4); // obtain raw pointer to device vector’s memory int * ptr = thrust::raw_pointer_cast(&d_vec[0]); // use ptr in a CUDA C kernel my_kernel<<<N/256, 256>>>(N, ptr); // Note: ptr cannot be dereferenced on the host!

  9. Wrap Raw Pointers with device_ptr // raw pointer to device memory int * raw_ptr; cudaMalloc((void **) &raw_ptr, N * sizeof(int)); // wrap raw pointer with a device_ptr thrust::device_ptr<int> dev_ptr(raw_ptr); // use device_ptr in thrust algorithms thrust::fill(dev_ptr, dev_ptr + N, (int) 0); // access device memory through device_ptr dev_ptr[0] = 1;

  10. Namespaces • C++ supports namespaces • Thrust uses thrust namespace • thrust::device_vector • thrust::copy • STL uses std namespace • std::vector • std::list • Avoids collisions • thrust::sort() • std::sort() • For brevity • using namespace thrust;

  11. Usage of Thrust: • Thrust provides many standard algorithms • Transformations • Reductions • Prefix Sums • Sorting • Generic definitions • General Types • Built-in types (int, float,…) • User-defined structures • General Operators • reduce with plus operator • scan with maximum operator

  12. Transformations // allocate three device_vectors with 10 elements thrust::device_vector<int> X(10); thrust::device_vector<int> Y(10); thrust::device_vector<int> Z(10); // initialize X to 0,1,2,3, .... thrust::sequence(X.begin(), X.end()); // compute Y = -X thrust::transform(X.begin(), X.end(), Y.begin(), thrust::negate<int>()); // fill Z with twos thrust::fill(Z.begin(), Z.end(), 2); // compute Y = X mod 2 thrust::transform(X.begin(), X.end(), Z.begin(), Y.begin(), thrust::modulus<int>()); // replace all the ones in Y with tens thrust::replace(Y.begin(), Y.end(), 1, 10);

  13. Reductions int sum = thrust::reduce(D.begin(), D.end(), (int) 0, thrust::plus<int>()); int sum = thrust::reduce(D.begin(), D.end(), (int) 0); int sum = thrust::reduce(D.begin(), D.end()) thrust::count_if thrust::min_element thrust::max_element thrust::is_sorted thrust::inner_product

  14. Sorting • thrust::sort • thrust::stable_sort • thrust::sort_by_key • thrust::stable_sort_by_key

  15. General Types and Operators #include <thrust/reduce.h> // declare storage device_vector<int> i_vec = ... device_vector<float> f_vec = ... // sum of integers (equivalent calls) reduce(i_vec.begin(), i_vec.end()); reduce(i_vec.begin(), i_vec.end(), 0, plus<int>()); // sum of floats (equivalent calls) reduce(f_vec.begin(), f_vec.end()); reduce(f_vec.begin(), f_vec.end(), 0.0f, plus<float>()); // maximum of integers reduce(i_vec.begin(), i_vec.end(), 0, maximum<int>());

  16. General Types and Operators struct negate_float2 { __host__ __device__ float2 operator()(float2 a) { return make_float2(-a.x, -a.y); } }; // declare storage device_vector<float2> input = ... device_vector<float2> output = ... // create functor negate_float2 func; // negate vectors transform(input.begin(), input.end(), output.begin(), func);

  17. General Types and Operators // compare x component of two float2 structures struct compare_float2 { __host__ __device__ bool operator()(float2 a, float2 b) { return a.x < b.x; } }; // declare storage device_vector<float2> vec = ... // create comparison functor compare_float2 comp; // sort elements by x component sort(vec.begin(), vec.end(), comp);

  18. Advanced Thrust • Behave like “normal” iterators • Algorithms don't know the difference • constant_iterator • counting_iterator • transform_iterator • zip_iterator

  19. Constant // create iterators constant_iterator<int> begin(10); constant_iterator<int> end = begin + 3; begin[0] // returns 10 begin[1] // returns 10 begin[100] // returns 10 // sum of [begin, end) reduce(begin, end); // returns 30 (i.e. 3 * 10)

  20. Counting // create iterators counting_iterator<int> begin(10); counting_iterator<int> end = begin + 3; begin[0] // returns 10 begin[1] // returns 11 begin[100] // returns 110 // sum of [begin, end) reduce(begin, end); // returns 33 (i.e. 10 + 11 + 12)

  21. Transform • Yields a transformed sequence • Facilitates kernel fusion • Conserves memory capacity and bandwidth F( x ) F( x ) F( y) F( z ) X Y Z

  22. Transform // initialize vector device_vector<int> vec(3); vec[0] = 10; vec[1] = 20; vec[2] = 30; // create iterator (type omitted) begin = make_transform_iterator(vec.begin(), negate<int>()); end = make_transform_iterator(vec.end(), negate<int>()); begin[0] // returns -10 begin[1] // returns -20 begin[2] // returns -30 // sum of [begin, end) reduce(begin, end); // returns -60 (i.e. -10 + -20 + -30)

  23. Zip • Looks like an array of structs (AoS) • Stored in structure of arrays (SoA)

  24. Zip // initialize vectors device_vector<int> A(3); device_vector<char> B(3); A[0] = 10; A[1] = 20; A[2] = 30; B[0] = ‘x’; B[1] = ‘y’; B[2] = ‘z’; // create iterator (type omitted) begin = make_zip_iterator(make_tuple(A.begin(), B.begin())); end = make_zip_iterator(make_tuple(A.end(), B.end())); begin[0] // returns tuple(10, ‘x’) begin[1] // returns tuple(20, ‘y’) begin[2] // returns tuple(30, ‘z’) // maximum of [begin, end) maximum< tuple<int,char> > binary_op; reduce(begin, end, begin[0], binary_op); // returns tuple(30, ‘z’)

  25. Curand Library • # include <curand.h> • curandStatus_tcurandGenerate(curandGenerator_t generator, unsigned int *outputPtr, size_t num) • A given state may not be used by more than one block. • A given block may generate random numbers using multiple states.

  26. Distribution __device__ float curand_uniform (curandState_t *state) __device__ float curand_normal(curandState_t *state) __device__ unsigned intcurand_poisson (curandState_t *state, double lambda) __device__ double curand_uniform_double (curandState_t *state) __device__ float2 curand_normal2 (curandState_t *state)

  27. Estimating π structestimate_pi{ public thrust::unary_function<unsigned int, float>{ __device__ float operator()(unsigned intthread_id) { float sum = 0; unsigned int N = 10000; // samples per thread unsigned int seed = thread_id; curandState s; // seed a random number generator curand_init(seed, 0, 0, &s); // take N samples in a quarter circle for(unsigned int i = 0; i < N; ++i) {

  28. Estimating π // draw a sample from the unit square float x = curand_uniform(&s); float y = curand_uniform(&s); float dist = sqrtf(x*x + y*y); // measure distance from the origin // add 1.0f if (u0,u1) is inside the quarter circle if(dist <= 1.0f) sum += 1.0f; } // multiply by 4 to get the area of the whole circle sum *= 4.0f; // divide by N return sum / N; }};

  29. Estimating π int main(void) { // use 30K independent seeds int M = 30000; float estimate = thrust::transform_reduce ( thrust::counting_iterator<int>(0),thrust::counting_iterator<int>(M), estimate_pi(), 0.0f, thrust::plus<float>()); estimate /= M; std::cout << std::setprecision(3); std::cout << "pi is approximately "; std::cout << estimate << std::endl; return 0; }

  30. Review • What are containers and iterators? • How to transfer information between Host and Device via Thurst? • How to use thrust’s namespace? • In which way can we make containers associated with raw pointers? • How to use Thrust to perform transformation, reduce and sort? • How to use general types in Thrust?

  31. Review • Why constant iterators can save storage? • When do we need to use counting iterators, and how much memory does it require? • How to use transforiterators? • Why do we need zip iterators? • How to generate different distribution by Curand? • How to use Curand with Thrust? • How to use Curand in a kernel function?

  32. References • An Introduction to Thrust • GTC 2010 • (Part 1) - High Productivity Development • (Part 2) - Thrust By Example • Thrust_Quick_Start_Guide • Thrust - A Productivity-Oriented Library for CUDA • http://code.google.com/p/thrust/ • CURAND_Library

More Related