Sparse Matrix-Dense Vector Multiply on G80: Probing the CUDA Parameter Space
Comp 790 GPGP Project
Stephen Olivier
Currently…
• Have a working “naïve” implementation in which each thread computes one dot product, similar to Sashi’s implementation (sketched below)
• 1.26 GFLOP/s, 7.56 GB/s for n = 32k, nz/row = 20
• Implementing a version that stores the input vector in texture memory, which is cached
• Also developing an analytic model that expresses how work and data partitioning should be parameterized to suit the G80
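A minimal sketch of the naïve one-thread-per-row kernel described above, assuming CSR storage; the kernel name and array names (row_ptr, col_idx, val) are illustrative, not taken from the actual implementation:

    // Naïve CSR SpMV: each thread computes one dot product, i.e. one row of y = A*x.
    __global__ void spmv_csr_naive(int n,
                                   const int   *row_ptr,  // n+1 row offsets into col_idx/val
                                   const int   *col_idx,  // column index of each nonzero
                                   const float *val,      // nonzero values
                                   const float *x,        // dense input vector
                                   float       *y)        // dense output vector
    {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row < n) {
            float dot = 0.0f;
            for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
                dot += val[j] * x[col_idx[j]];  // irregular accesses to x; no reuse here
            y[row] = dot;
        }
    }

The texture-memory variant mentioned above would, in G80-era CUDA, bind x to a texture<float, 1> reference and replace the x[col_idx[j]] load with tex1Dfetch(x_tex, col_idx[j]), so the irregular accesses to x can hit the texture cache.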
Pertinent Constraints
• Available parallelism
• Potential reuse
• Capacity constraints of the various memories
• Multithreading constraints
• Thread/block/grid layout
• Data distribution and blocking for the memory hierarchy
• Amount of sequential work done for latency hiding (see the sketch below)
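As one illustration of the last two constraints, a variant of the kernel above can assign several consecutive rows to each thread; rows_per_thread is a hypothetical tuning knob, not a parameter from the original code:

    // Sketch: each thread handles rows_per_thread consecutive rows, increasing
    // the sequential work available to hide memory latency at the cost of
    // launching fewer threads. Kernel and parameter names are hypothetical.
    __global__ void spmv_csr_multirow(int n, int rows_per_thread,
                                      const int *row_ptr, const int *col_idx,
                                      const float *val, const float *x, float *y)
    {
        int first = (blockIdx.x * blockDim.x + threadIdx.x) * rows_per_thread;
        int last  = min(first + rows_per_thread, n);
        for (int row = first; row < last; ++row) {
            float dot = 0.0f;
            for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
                dot += val[j] * x[col_idx[j]];
            y[row] = dot;
        }
    }

The grid is then sized for ceil(n / rows_per_thread) threads rather than n, which is exactly the kind of layout decision the parameterization must capture.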
Resulting Analytic Model
• Model will approximate ideal parameters based on problem size, e.g. number of rows and (average) number of nonzeros per row
• Plan to verify the model by testing it against a wide range of parameter combinations on key sample problems
• Can implement the model as an “autotuner” for G80 SpMV, in the spirit of ATLAS or FFTW (see the sketch below)
• Can integrate directly into code for G80 iterative methods, e.g. conjugate gradient
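A hypothetical shape for such an autotuner interface, mapping problem size to launch parameters; the SpmvPlan struct, function name, and thresholds below are placeholders for what the analytic model would actually compute, not measured or derived values:

    // Placeholder interface: the real model would derive these values
    // analytically from the problem size and the G80's constraints.
    struct SpmvPlan {
        int threads_per_block;
        int rows_per_thread;
    };

    SpmvPlan plan_spmv(int n, double avg_nz_per_row)
    {
        SpmvPlan p;
        p.threads_per_block = 256;  // placeholder, not a tuned value
        // Placeholder heuristic: with many short rows, give each thread more
        // sequential work; otherwise keep one row per thread.
        p.rows_per_thread = (n > (1 << 18) && avg_nz_per_row < 16.0) ? 4 : 1;
        return p;
    }

An iterative solver such as conjugate gradient could then call plan_spmv once per matrix and reuse the resulting plan across all of its SpMV calls.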