Sparse Matrix-Dense Vector Multiply on G80: Probing the CUDA Parameter Space
Comp 790 GPGP Project
Stephen Olivier
Currently…
• Have a working “naïve” implementation in which each thread computes one dot product, similar to Sashi’s implementation (sketched below)
• 1.26 GFLOP/s, 7.56 GB/s for n = 32k, nz/row = 20
• Implementing a version that stores the input vector in texture memory, which is cached
• Also developing an analytic model that expresses how work and data partitioning should be parameterized to suit the G80
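A minimal sketch of the naïve one-thread-per-row kernel described above, assuming CSR storage; the kernel name and array names (row_ptr, col_idx, val) are illustrative, not taken from the actual implementation:

    // Naïve CSR SpMV: each thread computes one dot product, i.e. one row of y = A*x.
    __global__ void spmv_csr_naive(int n,
                                   const int   *row_ptr,  // n+1 row offsets into col_idx/val
                                   const int   *col_idx,  // column index of each nonzero
                                   const float *val,      // nonzero values
                                   const float *x,        // dense input vector
                                   float       *y)        // dense output vector
    {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row < n) {
            float dot = 0.0f;
            for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
                dot += val[j] * x[col_idx[j]];  // irregular accesses to x; no reuse here
            y[row] = dot;
        }
    }

The texture-memory variant mentioned above would, in G80-era CUDA, bind x to a texture<float, 1> reference and replace the x[col_idx[j]] load with tex1Dfetch(x_tex, col_idx[j]), so the irregular accesses to x can hit the texture cache.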
Pertinent Constraints
• Available parallelism
• Potential reuse
• Capacity constraints of the various memories
• Multithreading constraints
• Thread/block/grid layout
• Data distribution and blocking for the memory hierarchy
• Amount of sequential work done for latency hiding (see the sketch below)
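As one illustration of the last two constraints, a variant of the kernel above can assign several consecutive rows to each thread; rows_per_thread is a hypothetical tuning knob, not a parameter from the original code:

    // Sketch: each thread handles rows_per_thread consecutive rows, increasing
    // the sequential work available to hide memory latency at the cost of
    // launching fewer threads. Kernel and parameter names are hypothetical.
    __global__ void spmv_csr_multirow(int n, int rows_per_thread,
                                      const int *row_ptr, const int *col_idx,
                                      const float *val, const float *x, float *y)
    {
        int first = (blockIdx.x * blockDim.x + threadIdx.x) * rows_per_thread;
        int last  = min(first + rows_per_thread, n);
        for (int row = first; row < last; ++row) {
            float dot = 0.0f;
            for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
                dot += val[j] * x[col_idx[j]];
            y[row] = dot;
        }
    }

The grid is then sized for ceil(n / rows_per_thread) threads rather than n, which is exactly the kind of layout decision the parameterization must capture.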
Resulting Analytic Model
• Model will approximate ideal parameters based on problem size, e.g. number of rows and (average) number of nonzeros per row
• Plan to verify the model by testing it against a wide range of parameter combinations on key sample problems
• Can implement the model as an “autotuner” for G80 SpMV, in the spirit of ATLAS or FFTW (see the sketch below)
• Can integrate directly into code for G80 iterative methods, e.g. conjugate gradient
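A hypothetical shape for such an autotuner interface, mapping problem size to launch parameters; the SpmvPlan struct, function name, and thresholds below are placeholders for what the analytic model would actually compute, not measured or derived values:

    // Placeholder interface: the real model would derive these values
    // analytically from the problem size and the G80's constraints.
    struct SpmvPlan {
        int threads_per_block;
        int rows_per_thread;
    };

    SpmvPlan plan_spmv(int n, double avg_nz_per_row)
    {
        SpmvPlan p;
        p.threads_per_block = 256;  // placeholder, not a tuned value
        // Placeholder heuristic: with many short rows, give each thread more
        // sequential work; otherwise keep one row per thread.
        p.rows_per_thread = (n > (1 << 18) && avg_nz_per_row < 16.0) ? 4 : 1;
        return p;
    }

An iterative solver such as conjugate gradient could then call plan_spmv once per matrix and reuse the resulting plan across all of its SpMV calls.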