
Optimization of Sparse Matrix Kernels for Data Mining


Presentation Transcript


  1. Optimization of Sparse Matrix Kernels for Data Mining Eun-Jin Im and Katherine Yelick, U.C. Berkeley

  2. Outline • SPARSITY: performance optimization of sparse matrix-vector operations • Sparse matrices in data mining applications • Performance improvements by SPARSITY for data mining matrices

  3. The need for optimized sparse matrix codes • Sparse matrices are represented with indirect data structures. • Sparse matrix routines are slower than their dense matrix counterparts. • Performance depends on the distribution of the nonzero elements of the sparse matrix.
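To make the indirection concrete, here is a minimal sketch of sparse matrix-vector multiply y = Ax in compressed sparse row (CSR) format. The struct and field names are illustrative, not SPARSITY's actual data structures; the load of x through col_idx is the indirect access that makes the sparse routine slower than its dense counterpart.

    /* A minimal CSR sparse matrix-vector multiply, y = A*x.
     * Illustrative layout, not SPARSITY's actual data structure. */
    typedef struct {
        int n;             /* number of rows                     */
        int *row_start;    /* n+1 offsets into col_idx/values    */
        int *col_idx;      /* column index of each nonzero       */
        double *values;    /* nonzero values, stored row by row  */
    } csr_matrix;

    void spmv_csr(const csr_matrix *A, const double *x, double *y)
    {
        for (int i = 0; i < A->n; i++) {
            double sum = 0.0;
            for (int j = A->row_start[i]; j < A->row_start[i + 1]; j++)
                sum += A->values[j] * x[A->col_idx[j]];  /* indirect load of x */
            y[i] = sum;
        }
    }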

  4. The solution: the SPARSITY system • A system that generates optimized C code for sparse matrix-vector operations • http://www.cs.berkeley.edu/~ejim/sparsity • Related work: ATLAS and PHiPAC for dense matrix routines, and FFTW

  5. SPARSITY optimizations (1): Register Blocking • Identify small dense blocks of nonzeros. • Use an optimized multiplication code for the particular block size. (Figure: a sparse matrix stored as 2x2 register blocks.) • Improves register reuse and lowers indexing overhead. • Challenge: choosing the block size.
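A minimal sketch of the 2x2 register-blocked kernel, assuming a block CSR (BCSR) layout in which every stored block is a dense 2x2 tile; the names are illustrative. The four partial sums stay in registers, and one column index is read per block rather than per nonzero.

    /* 2x2 register-blocked SpMV (BCSR layout, illustrative names).
     * Each stored block holds 4 contiguous values in row-major order. */
    typedef struct {
        int n_brows;      /* number of 2-row block rows             */
        int *brow_start;  /* n_brows+1 offsets into bcol_idx/values */
        int *bcol_idx;    /* first column of each 2x2 block         */
        double *values;   /* 4 values per block, row-major          */
    } bcsr22_matrix;

    void spmv_bcsr22(const bcsr22_matrix *A, const double *x, double *y)
    {
        for (int ib = 0; ib < A->n_brows; ib++) {
            double y0 = 0.0, y1 = 0.0;   /* partial sums held in registers */
            for (int jb = A->brow_start[ib]; jb < A->brow_start[ib + 1]; jb++) {
                const double *v = &A->values[4 * jb];
                double x0 = x[A->bcol_idx[jb]];     /* one index lookup   */
                double x1 = x[A->bcol_idx[jb] + 1]; /* covers four values */
                y0 += v[0] * x0 + v[1] * x1;
                y1 += v[2] * x0 + v[3] * x1;
            }
            y[2 * ib]     = y0;
            y[2 * ib + 1] = y1;
        }
    }

Blocks that are not fully dense must be padded with explicitly stored zeros, which is why choosing the block size is a trade-off between register reuse and wasted work.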

  6. SPARSITY optimizations (2): Cache Blocking • Keep part of the source vector in cache. (Figure: sparse matrix A multiplied by source vector x to give destination vector y.) • Improves cache reuse of the source vector. • Challenge: choosing the block size.
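A sketch of the cache-blocked loop structure, reusing the csr_matrix type from the CSR sketch above and assuming the matrix has been split into column blocks, each stored as its own CSR submatrix; the layout is an assumption for illustration. Each pass over a submatrix touches only a cache-sized slice of x.

    /* Cache-blocked SpMV: one CSR submatrix per column block, so each
     * pass reuses a slice of x that fits in cache.  Illustrative layout;
     * each submatrix's col_idx is relative to its first column. */
    typedef struct {
        int n_cblocks;      /* number of column blocks               */
        int cblock_width;   /* columns per block, sized to the cache */
        csr_matrix *blocks; /* one CSR submatrix per column block    */
    } cblocked_matrix;

    void spmv_cache_blocked(const cblocked_matrix *A, int n_rows,
                            const double *x, double *y)
    {
        for (int i = 0; i < n_rows; i++)
            y[i] = 0.0;
        for (int b = 0; b < A->n_cblocks; b++) {
            const csr_matrix *B = &A->blocks[b];
            const double *xb = x + b * A->cblock_width; /* cache-resident slice */
            for (int i = 0; i < B->n; i++) {
                double sum = 0.0;
                for (int j = B->row_start[i]; j < B->row_start[i + 1]; j++)
                    sum += B->values[j] * xb[B->col_idx[j]];
                y[i] += sum;  /* accumulate partial results across blocks */
            }
        }
    }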

  7. SPARSITY optimizations (3): Multiple Vectors • Multiplying several vectors at once gives better potential for reuse. • A code generator emits loops unrolled across the vectors. (Figure: a single matrix element a_ij updates the i-th element of every destination vector.) • Allows reuse of matrix elements. • Challenge: choosing the number of vectors for loop unrolling.
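A sketch of the unrolled multiple-vector kernel, again reusing the csr_matrix type above and assuming the source vectors are stored interleaved element-wise (X[j*VECS + v] is element j of vector v); the layout and the unrolling factor of 4 are illustrative. Each matrix value is loaded once and multiplied against the corresponding element of all four vectors.

    #define VECS 4  /* illustrative unrolling factor */

    /* Multiply A against VECS source vectors at once.  X and Y store
     * the vectors interleaved: X[j*VECS + v] is element j of vector v. */
    void spmv_multivec(const csr_matrix *A, const double *X, double *Y)
    {
        for (int i = 0; i < A->n; i++) {
            double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
            for (int j = A->row_start[i]; j < A->row_start[i + 1]; j++) {
                double a = A->values[j];  /* a_ij loaded once ...   */
                const double *xj = &X[A->col_idx[j] * VECS];
                s0 += a * xj[0];          /* ... reused VECS times  */
                s1 += a * xj[1];
                s2 += a * xj[2];
                s3 += a * xj[3];
            }
            Y[i * VECS + 0] = s0;
            Y[i * VECS + 1] = s1;
            Y[i * VECS + 2] = s2;
            Y[i * VECS + 3] = s3;
        }
    }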

  8. SPARSITY: automatic performance tuning SPARSITY is a system for automatic performance engineering. • Parameterized code generation • Search combined with performance modeling selects: • the register block size • the cache block size • the number of vectors for loop unrolling
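A hypothetical sketch of the search step: time a generated kernel for each candidate register block size and keep the fastest. The bench callback and the 1..4 candidate range are assumptions for illustration; SPARSITY combines such search with performance modeling rather than exhaustive timing alone.

    #include <float.h>

    /* Pick the register block size with the best measured time.
     * bench(r, c) is a caller-supplied hook (hypothetical) that runs
     * the generated r-by-c kernel and returns its time per multiply. */
    void pick_register_block(double (*bench)(int r, int c),
                             int *best_r, int *best_c)
    {
        double best_time = DBL_MAX;
        for (int r = 1; r <= 4; r++)         /* candidate block rows    */
            for (int c = 1; c <= 4; c++) {   /* candidate block columns */
                double t = bench(r, c);
                if (t < best_time) {
                    best_time = t;
                    *best_r = r;
                    *best_c = c;
                }
            }
    }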

  9. Sparse Matrices from Data Mining Applications

  10. Data Mining Algorithms • For text retrieval: term-by-document matrix • Latent Semantic Indexing [Berry et al.]: computation of a singular value decomposition; blocked SVD uses multiple vectors • Concept Decomposition [Dhillon and Modha]: matrix approximation by solving a least-squares problem; also uses multiple vectors

  11. Data Mining Algorithms • For image retrieval: Eigenface approximation [Li], used for face recognition • Pixel-by-image matrix • Each image has a multi-resolution hierarchy and is compressed with a wavelet transformation.

  12. Platforms Used in Performance Measurements

  13. Performance on Web Document Data

  14. Performance on NSF Abstract Data

  15. Performance on Face Image Data

  16. Speedup

  17. Performance Summary • Performance is better when the matrix is denser (face images). • Cache blocking is effective for matrices with a large number of columns (web documents). • Optimizing the multiplication with multiple vectors is effective.

  18. Cache Block Size for the Web Document Matrix • The width of a cache block is limited by the cache size. • For multiple vectors, the loop-unrolling factor is 10, except on the Alpha 21164, where it is 3.

  19. Conclusion • Most matrices used in data mining are sparse. • Sparse matrix operations are memory-inefficient and need optimization. • The optimization depends on the nonzero structure of the matrix. • The SPARSITY system effectively speeds up these operations.

  20. To Contribute • Contact ejim@cs.berkeley.edu to donate your matrix! Thank you.
