Optimization of Sparse Matrix Kernels for Data Mining
Eun-Jin Im and Katherine Yelick, U.C. Berkeley
Outline
• SPARSITY: performance optimization of sparse matrix-vector operations
• Sparse matrices in data mining applications
• Performance improvements by SPARSITY for data mining matrices
The Need for Optimized Sparse Matrix Codes
• Sparse matrices are represented with indirect data structures (see the sketch below).
• Sparse matrix routines are slower than their dense counterparts.
• Performance depends on the distribution of nonzero elements in the matrix.
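For reference, here is a minimal sketch of the baseline kernel these slides optimize: sparse matrix-vector multiply in compressed sparse row (CSR) format. The struct layout and names are illustrative, not SPARSITY's actual interface; the indirect access through col_idx is the source of the inefficiency noted above.

```c
/* Minimal sketch of y = A*x in compressed sparse row (CSR) format.
 * Names are illustrative, not SPARSITY's actual interface. */
typedef struct {
    int     nrows;
    int    *row_ptr;  /* size nrows+1: start of each row in col_idx/val */
    int    *col_idx;  /* column index of each stored nonzero */
    double *val;      /* value of each stored nonzero */
} csr_t;

void spmv_csr(const csr_t *A, const double *x, double *y)
{
    for (int i = 0; i < A->nrows; i++) {
        double sum = 0.0;
        /* The indirect load x[col_idx[k]] is what makes sparse code
         * slower than dense code: poor locality plus extra index loads. */
        for (int k = A->row_ptr[i]; k < A->row_ptr[i + 1]; k++)
            sum += A->val[k] * x[A->col_idx[k]];
        y[i] = sum;
    }
}
```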
The Solution: the SPARSITY System
• A system that generates optimized C code for sparse matrix-vector operations
• http://www.cs.berkeley.edu/~ejim/sparsity
• Related work: ATLAS and PHiPAC for dense matrix routines, and FFTW for FFTs
SPARSITY Optimizations (1): Register Blocking
[Figure: a 2x2 register-blocked sparse matrix]
• Identify small dense blocks of nonzeros.
• Use an optimized multiplication code for the particular block size (see the sketch below).
• Improves register reuse, lowers indexing overhead.
• Challenge: choosing the block size
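A minimal sketch of what a generated 2x2 register-blocked kernel can look like, with invented field names; it assumes the matrix dimensions are padded to multiples of 2 and blocks are stored row-major. Storing one column index per 2x2 block is what lowers the indexing overhead, and the accumulators stay in registers across the inner loop.

```c
/* Sketch of y = A*x with a fixed 2x2 register-blocked format; a
 * simplification of SPARSITY-style generated code, not its exact output. */
void spmv_bcsr_2x2(int nbrows,            /* number of 2-row block rows */
                   const int *brow_ptr,   /* size nbrows+1              */
                   const int *bcol_idx,   /* left column of each block  */
                   const double *val,     /* 4 values per 2x2 block     */
                   const double *x, double *y)
{
    for (int ib = 0; ib < nbrows; ib++) {
        double y0 = 0.0, y1 = 0.0;        /* held in registers          */
        for (int k = brow_ptr[ib]; k < brow_ptr[ib + 1]; k++) {
            const double *b = &val[4 * k];
            double x0 = x[bcol_idx[k]];   /* loaded once, used twice    */
            double x1 = x[bcol_idx[k] + 1];
            y0 += b[0] * x0 + b[1] * x1;
            y1 += b[2] * x0 + b[3] * x1;
        }
        y[2 * ib]     = y0;
        y[2 * ib + 1] = y1;
    }
}
```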
SPARSITY Optimizations (2): Cache Blocking
• Keep part of the source vector in cache (see the sketch below).
[Figure: destination vector (y) = sparse matrix (A) times source vector (x)]
• Improves cache reuse of the source vector.
• Challenge: choosing the block size
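A sketch of the cache-blocking idea under an assumed layout: the matrix is split into column strips, each stored as its own CSR submatrix (reusing the csr_t type from the earlier sketch) with column indices local to the strip, so the slice of x that a strip touches stays cache-resident while that strip is processed.

```c
/* Sketch of cache-blocked SpMV over column strips. The strip layout
 * and names are illustrative, not SPARSITY's exact data structure. */
void spmv_cache_blocked(int nstrips,
                        const csr_t *strip,    /* one CSR per column strip */
                        const int *col_start,  /* first column of strip s  */
                        const double *x, double *y, int nrows)
{
    for (int i = 0; i < nrows; i++)
        y[i] = 0.0;
    for (int s = 0; s < nstrips; s++) {
        const double *xs = &x[col_start[s]];   /* cache-resident slice of x */
        const csr_t  *A  = &strip[s];
        for (int i = 0; i < A->nrows; i++) {
            double sum = 0.0;
            for (int k = A->row_ptr[i]; k < A->row_ptr[i + 1]; k++)
                sum += A->val[k] * xs[A->col_idx[k]];  /* strip-local index */
            y[i] += sum;
        }
    }
}
```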
SPARSITY Optimizations (3): Multiple Vectors
• Better potential for reuse
• Loop-unrolled code multiplying across vectors is generated by a code generator (see the sketch below).
[Figure: one matrix element a_ij combined with entries x_j1, ... of several source vectors and y_i1, y_i2, ... of several destination vectors]
• Allows reuse of matrix elements.
• Challenge: choosing the number of vectors for loop unrolling
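A sketch of multiplication against four vectors at once, with the inner loop unrolled across vectors so each matrix element a_ij is loaded once and used four times; SPARSITY's code generator emits code of this shape for a tuned unrolling factor. Passing the vectors as separate arrays is an illustrative choice, not SPARSITY's exact interface.

```c
/* Sketch of CSR multiplication across 4 vectors, unrolled so each
 * matrix element is loaded once and reused for every vector. */
void spmv_4vec(const csr_t *A,
               const double *x0, const double *x1,
               const double *x2, const double *x3,
               double *y0, double *y1, double *y2, double *y3)
{
    for (int i = 0; i < A->nrows; i++) {
        double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
        for (int k = A->row_ptr[i]; k < A->row_ptr[i + 1]; k++) {
            double a = A->val[k];      /* one load of a_ij ... */
            int    j = A->col_idx[k];
            s0 += a * x0[j];           /* ... four uses        */
            s1 += a * x1[j];
            s2 += a * x2[j];
            s3 += a * x3[j];
        }
        y0[i] = s0; y1[i] = s1; y2[i] = s2; y3[i] = s3;
    }
}
```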
SPARSITY: Automatic Performance Tuning
SPARSITY is a system for automatic performance tuning.
• Parameterized code generation
• Search combined with performance modeling selects (see the sketch below):
  • register block size
  • cache block size
  • number of vectors for loop unrolling
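As a rough illustration of the search component, the sketch below times a candidate r x c register-blocked kernel for each block size and keeps the fastest. The benchmark_spmv() helper is hypothetical, invented for illustration; SPARSITY combines such measurements with a performance model rather than relying on exhaustive search alone.

```c
/* Sketch of empirical block-size selection; benchmark_spmv() is a
 * hypothetical helper, not part of SPARSITY's actual interface. */
#include <float.h>

extern double benchmark_spmv(int r, int c);  /* hypothetical: seconds per
                                                multiply with an r x c
                                                register-blocked kernel */

void pick_block_size(int *best_r, int *best_c)
{
    double best_t = DBL_MAX;
    for (int r = 1; r <= 4; r++) {          /* candidate block heights */
        for (int c = 1; c <= 4; c++) {      /* candidate block widths  */
            double t = benchmark_spmv(r, c);
            if (t < best_t) {
                best_t  = t;
                *best_r = r;
                *best_c = c;
            }
        }
    }
}
```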
Sparse Matrices from Data Mining Applications
Data Mining Algorithms
For text retrieval: term-by-document matrix
• Latent Semantic Indexing [Berry et al.]
  • Computation of the singular value decomposition
  • Blocked SVD uses multiple vectors
• Concept Decomposition [Dhillon and Modha]
  • Matrix approximation solving a least-squares problem
  • Also uses multiple vectors
Data Mining Algorithms
For image retrieval:
• Eigenface Approximation [Li]
  • Used for face recognition
  • Pixel-by-image matrix
  • Each image has a multi-resolution hierarchy and is compressed with a wavelet transformation.
Platforms Used in Performance Measurements
Performance on Web Document Data
Performance on NSF Abstract Data
Performance on Face Image Data
Speedup
Performance Summary
• Performance is better when the matrix is denser (face images).
• Cache blocking is effective for matrices with a large number of columns (web documents).
• Optimizing the multiplication with multiple vectors is effective.
Cache Block Size for the Web Document Matrix
• The width of a cache block is limited by the cache size.
• For multiple vectors, the loop unrolling factor is 10, except on the Alpha 21164, where it is 3.
Conclusion
• Most matrices used in data mining are sparse.
• Sparse matrix operations are memory-inefficient and need optimization.
• The optimization depends on the nonzero structure of the matrix.
• The SPARSITY system effectively speeds up these operations.
For Contribution
• Contact ejim@cs.berkeley.edu to donate your matrices!
Thank you.