Sparse Matrices and Combinatorial Algorithms in Star-P: Status and Plans
April 8, 2005
Matlab’s sparse matrix design principles
• All operations should give the same results for sparse and full matrices (almost all)
• Sparse matrices are never created automatically, but once created they propagate
• Performance is important, but usability, simplicity, completeness, and robustness are more important
• Storage for a sparse matrix should be O(nonzeros)
• Time for a sparse operation should be O(flops), as nearly as possible
Star-P dsparse matrices follow the same principles, with some different tradeoffs.
Sparse matrix operations
• dsparse layout, same semantics as ddense
• For now, only row distribution
• Matrix operators: +, -, max, etc.
• Matrix indexing and concatenation: A(1:3, [4 5 2]) = [ B(:, 7) C ];
• A \ b by direct methods
Star-P sparse data structure
• Full: 2-dimensional array of real or complex numbers; (nrows*ncols) memory
• Sparse: compressed row storage; about (2*nzs + nrows) memory
Star-P distributed sparse data structure
• Rows are distributed across processors P0, P1, ..., Pn
• Each processor stores: # of local nonzeros, range of local rows, nonzeros in CSR form (see the sketch below)
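As a concrete illustration, here is a minimal Matlab-style sketch of one processor's share of a dsparse matrix. The field names are hypothetical, not Star-P's actual (C++/MPI) internals.

% Hypothetical sketch of one processor's piece of a dsparse matrix,
% held in compressed sparse row (CSR) form; field names are
% illustrative only.
local.rowstart = 1001;            % first global row owned by this processor
local.rowend   = 2000;            % last global row owned by this processor
local.nnz      = 5000;            % number of local nonzeros
local.rowptr   = ones(1, 1001);   % start of each local row in colind/val
local.colind   = zeros(1, 5000);  % global column index of each nonzero
local.val      = zeros(1, 5000);  % numerical value of each nonzero
% Memory is about 2*nnz + nrows words per processor, consistent with
% the "about (2*nzs + nrows) memory" estimate above.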
Sorting in Star-P
• [V, perm] = sort (V)
• Useful primitive for some sparse array algorithms, e.g. the sparse constructor
• Star-P uses a parallel sample sort (sketched below)
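The sample-sort idea, sketched serially in Matlab with buckets standing in for processors. The oversampling factor and splitter choice are illustrative, not Star-P's actual parameters, and the sketch returns only the sorted values, not the permutation.

function S = samplesort(V, p)
% Serial sketch of sample sort: choose splitters from a random sample,
% route every element to a bucket, then sort each bucket. In the
% parallel version each bucket lives on one processor and the local
% sorts run concurrently.
V = V(:);                                        % work with a column
sample = sort(V(randi(numel(V), 1, 4*p)));       % oversample the input
splitters = sample(4 : 4 : 4*(p-1));             % p-1 splitters
bucket = 1 + sum(bsxfun(@gt, V, splitters(:)'), 2);  % destination of each element
S = [];
for b = 1:p
  S = [S; sort(V(bucket == b))];                 %#ok<AGROW> local sort
end
end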
The sparse( ) constructor
• A = sparse (I, J, V, nr, nc)
• Input: ddense vectors I, J, V, and dimensions nr, nc
• A(I(k), J(k)) = V(k) for each k
• Algorithm: sort triples (i, j, v) by (i, j); adjust for desired row distribution; locally convert to compressed row indices; sum values with duplicate indices (see the sketch below)
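A serial Matlab sketch of the constructor's core steps; the row-distribution step is omitted, since in Star-P each processor would run this on its own block of rows.

% Build CSR arrays from triples (I, J, V) with dimensions nr, nc.
T = sortrows([I(:) J(:) V(:)]);              % sort triples by (i, j)
first = [true; any(diff(T(:, 1:2)), 2)];     % first triple of each (i, j)
vals = accumarray(cumsum(first), T(:, 3));   % sum values at duplicate (i, j)
T = T(first, :);
rowptr = cumsum([1; accumarray(T(:, 1), 1, [nr 1])]);  % CSR row pointers
colind = T(:, 2);                            % CSR column indices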
Sparse matrix times dense vector
• y = A * x
• The first call to matvec caches a communication schedule for matrix A; later calls to multiply any vector by A reuse the cached schedule (see the sketch below)
• Communication and computation overlap
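The caching idea in pseudocode-style Matlab. Here buildSchedule, exchange, localMatvec, and the A.id field are all hypothetical stand-ins for Star-P's MPI internals, named only to show the control flow.

function y = matvec(A, x)
% Conceptual sketch only: cache one communication schedule per matrix.
persistent cache                   % survives across calls
if isempty(cache) || cache.id ~= A.id          % first call for this A?
  % The schedule records which entries of x each processor must fetch,
  % read off from the column indices of its local nonzeros.
  cache.sched = buildSchedule(A);  % hypothetical; computed once per A
  cache.id = A.id;
end
xr = exchange(cache.sched, x);     % hypothetical; overlappable with compute
y = localMatvec(A, x, xr);         % hypothetical; multiply the local rows
end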
Sparse linear systems
• x = A \ b
• Matrix division uses MPI-based direct solvers:
  • SuperLU_dist: nonsymmetric, static pivoting
  • MUMPS: nonsymmetric multifrontal
  • PSPASES: Cholesky
• Solver selection: ppsetoption('SparseDirectSolver', 'SUPERLU')
• Iterative solvers implemented in Star-P
• Ongoing work on preconditioners
Sparse eigenvalue solvers
• [V, D] = eigs(A); etc.
• Implemented via MPI-based PARPACK [Lehoucq, Maschhoff, Sorensen, Yang]
• Work in progress
Combinatorial Scientific Computing
• Sparse matrix methods; machine learning; search and information retrieval; adaptive multilevel modeling; geometric meshing; computational biology; ...
• How will combinatorial methods be used by people who don’t understand them in complete detail?
Analogy: Matrix division in Matlab
x = A \ b;
• Works for either full or sparse A
• Is A square? no => use QR to solve the least squares problem
• Is A triangular or permuted triangular? yes => sparse triangular solve
• Is A symmetric with positive diagonal elements? yes => attempt Cholesky after symmetric minimum degree ordering
• Otherwise => use LU on A(:, colamd(A))
A sketch of this decision tree appears below.
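The decision tree above, rendered as a Matlab sketch. This is a simplification for illustration; Matlab's real dispatch has more cases, and the permuted-triangular checks are omitted.

function x = backslash_sketch(A, b)
if size(A, 1) ~= size(A, 2)                 % rectangular: least squares
  [Q, R] = qr(A, 0);
  x = R \ (Q' * b);
elseif istriu(A) || istril(A)
  x = A \ b;                                % dispatches to triangular solve
elseif isequal(A, A') && all(diag(A) > 0)   % symmetric, positive diagonal
  p = symamd(A);                            % symmetric minimum degree
  [R, fail] = chol(A(p, p));
  if ~fail
    x(p, 1) = R \ (R' \ b(p));              % Cholesky succeeded
    return
  end
  x = lu_path(A, b);                        % not positive definite
else
  x = lu_path(A, b);
end
end

function x = lu_path(A, b)
q = colamd(A);                              % column ordering for sparsity
[L, U, P] = lu(A(:, q));
x(q, 1) = U \ (L \ (P * b));
end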
Combinatorics in Star-P
• A sparse matrix language is a good start on primitives for combinatorial computing
• Random-access indexing: A(i,j)
• Neighbor sequencing: find (A(i,:))
• Sparse table construction: sparse (I, J, V)
• Other primitives we are building: sorting, searching, pointer-jumping, connectivity and paths, ...
Matching and depth-first search in Matlab
• dmperm: Dulmage-Mendelsohn decomposition
• Square, full-rank A:
  • [p, q, r] = dmperm(A);
  • A(p,q) is block upper triangular with nonzero diagonal
  • also gives strongly connected components of a directed graph (example below)
  • also gives connected components of an undirected graph
• Arbitrary A:
  • [p, q, r, s] = dmperm(A);
  • maximum-size matching in a bipartite graph
  • minimum-size vertex cover in a bipartite graph
  • decomposition into strong Hall blocks
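For example, the strongly connected components of a directed graph fall out of dmperm's block triangular form. The small graph here is made up for illustration.

n = 6;                                   % a small example digraph:
A = sparse([1 2 3 3 4 5 6], ...          % 1 <-> 2 is one cycle,
           [2 1 2 4 5 6 4], 1, n, n);    % 4 -> 5 -> 6 -> 4 is another
[p, q, r] = dmperm(A + speye(n));        % add I so the diagonal is nonzero
% r marks the block boundaries: vertices p(r(k) : r(k+1)-1) form the
% k-th strongly connected component; here there are 3 of them.
ncomp = length(r) - 1;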
Connected components (work in progress)
• Sequential Matlab uses depth-first search (dmperm), which doesn’t parallelize well
• Shiloach-Vishkin algorithm (serial sketch below):
  repeat
    Link every (super)vertex to a random neighbor
    Shrink each tree to a supervertex by pointer jumping
  until no further change
• Hybrid SV / search method under construction
• Other potential graph kernels:
  • Breadth-first search (after Husbands, LBNL)
  • Bipartite matching (after Riedy, UCB)
  • Strongly connected components (after Pinar, LBNL)
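A serial Matlab sketch of the hook-and-jump idea. It is simplified: each vertex hooks to its minimum-labeled neighbor rather than a random one, and everything runs sequentially.

% A is a symmetric sparse adjacency matrix; on exit D(v) is the
% smallest vertex id in v's component, i.e. a component label.
n = size(A, 1);
D = (1:n)';                              % every vertex is its own root
[u, v] = find(A);                        % edge list (both directions)
changed = true;
while changed
  Dnew = min(D, accumarray(u, D(v), [n 1], @min, inf));  % hook step
  Dnew = Dnew(Dnew);                     % shrink trees by pointer jumping
  changed = ~isequal(Dnew, D);
  D = Dnew;
end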
SSCA #2 -- Overview
• Scalable data generator
• Kernel 1 — Graph Construction
• Kernel 2 — Classify Large Sets (Search)
• Kernel 3 — Subgraph Extraction
• Kernel 4 — Graph Clustering / Partitioning
The generated graph has:
• Randomly sized “cliques”
• Most intra-clique edges present
• Random inter-clique edges, gradually thinning with distance
• Integer and character string edge labels
• Randomized vertex numbers
Scalable Data Generator
• Produce edge tuples containing the start vertex, end vertex, and weight for each directed edge
• Hierarchical structure: random-sized cliques, connected by edges between some of the cliques
• Inter-clique edges are assigned using a random distribution that represents hierarchical thinning as vertices get further apart
• Weights are either random integer values or randomly selected character strings
• Random permutations avoid induced locality:
  • vertex numbers — local processor memory locality
  • parallel — global memory locality
• Data sets must be generated in their entirety before graph construction
• May be parallelized, but the data tuples' vertex naming schemes must be globally consistent
• The data generator will not be timed
• Statistics will be saved for use in later verification stages
A hedged sketch of this recipe appears below.
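In the sketch, the clique count, size bound, thinning law (1/d^2), and weight range are all illustrative parameters, not the SSCA#2 spec's.

nc = 100; maxsz = 8;                       % illustrative parameters
sz = randi(maxsz, nc, 1);                  % random clique sizes
last = cumsum(sz); first = last - sz + 1;  % vertex range of each clique
I = []; J = [];
for c = 1:nc                               % intra-clique edges (all pairs
  vtx = (first(c):last(c))';               %  here; the spec keeps "most")
  [a, b] = find(~eye(sz(c)));              % every ordered pair, no self-loops
  I = [I; vtx(a)]; J = [J; vtx(b)];        %#ok<AGROW>
end
for c1 = 1:nc                              % inter-clique edges, thinning
  for c2 = c1+1:nc                         %  with distance between cliques
    if rand < 1 / (c2 - c1)^2              % hypothetical thinning law
      I = [I; first(c1)]; J = [J; first(c2)];  %#ok<AGROW>
    end
  end
end
W = randi(100, size(I));                   % random integer weights
relabel = randperm(last(end))';            % randomize the vertex numbers
I = relabel(I); J = relabel(J);            % edge triples (I, J, W) are done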
Kernel 1 — Graph Construction
• Construct the multigraph in a format usable by all subsequent kernels
• No subsequent modifications will be permitted to benefit specific kernels
• Graph construction will be timed
Kernel 2 — Sort on Selected Edge Weights
• Sort on edge weights to determine the vertex pairs with
  • the largest integer weights
  • a desired string weight
• The two vertex-pair lists will be saved for use in the subsequent kernel:
  • start set SI for integer weights
  • start set SC for character weights
• Sorting the weights will be timed
Kernel 3 — Extract Subgraphs
• Extract a series of subgraphs formed by following paths of length k from a start set of initial vertices
• The set of initial vertices is determined by the previous kernel
• Produce subgraphs consisting of the vertices and edges on paths of length k from the vertex-pair start sets SI and SC
• A possible computational kernel for graph extraction is breadth-first search
• Extracting the subgraphs will be timed
Kernel 4 — Partition Graph Using a Clustering Algorithm
• Apply a graph clustering algorithm that partitions the vertices into subgraphs no larger than a maximum size while minimizing the number of edges that must be “cut”, e.g.
  • Kernighan and Lin
  • Sangiovanni-Vincentelli, Chen, and Chua
• The output of this kernel is a description of the clusters and of the edges that form the interconnecting “network”
• This step should identify the structure in the graph specified in the data generation phase
• The implementation must not use a priori knowledge of the details of the data generator or the statistics collected during graph generation
• Identifying the clusters and the interconnecting “network” will be timed
Concise SSCA#2
• A serial Matlab implementation of SSCA#2
• Coding style is simple and “data parallel”
• Our starting point for the Star-P implementation
• Edge labels are integers only, not strings yet
Data generation
• Using the Lincoln Labs serial Matlab generator at present
• Data-parallel Star-P generator under construction
• Generates “edge triples” as distributed dense vectors
Kernel 1: Graph construction
• cSSCA#2 K1: ~30 lines of executable serial Matlab
• The multigraph is a cell array of simple directed graphs
• Graphs are dsparse matrices, created by sparse( )
Kernel 2: Search in large sets
• cSSCA#2 K2: ~12 lines of executable serial Matlab
• Uses max( ) and find( ) to locate edges of maximum weight

ng = length(G.edgeWeights);            % number of graphs in the multigraph
maxwt = 0;
for g = 1:ng                           % pass 1: find the maximum weight
  maxwt = max(maxwt, max(max(G.edgeWeights{g})));
end
intedges = zeros(0, 2);                % empty list of (start, end) pairs
for g = 1:ng                           % pass 2: collect its endpoints
  [intstart, intend] = find(G.edgeWeights{g} == maxwt);
  intedges = [intedges; intstart intend];
end
Kernel 3: Subgraph extraction
• cSSCA#2 K3: ~25 lines of executable serial Matlab
• Returns subgraphs consisting of the vertices and edges within a specified distance of specified starting vertices
• Uses sparse matrix-vector product for breadth-first search (sketched below)
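The matvec-based search looks roughly like this. This is a sketch of the technique, not the cSSCA#2 source; A is an n-by-n sparse directed adjacency matrix and start holds the seed vertices.

n = size(A, 1);
reach = sparse(start, 1, 1, n, 1);   % indicator vector of the start set
for level = 1:k                      % expand k levels of the graph
  reach = reach + A' * reach;        % A'*x marks the out-neighbors of x
end
verts = find(reach);                 % all vertices within distance k
subgraph = A(verts, verts);          % induced subgraph on those vertices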
Kernel 4: Partitioning / clustering
• cSSCA#2 K4: a serial prototype exists, not yet parallel
• Uses a different algorithm from the Lincoln Labs executable spec, which grows clusters from seed vertices by breadth-first search
• cSSCA#2 instead uses spectral clustering, based on eigenvectors (Fiedler vectors) of the graph
• One version seeks small balanced edge cuts (graph partitioning); another seeks clusters with low isoperimetric number (clustering); a Fiedler-vector sketch appears below
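The Fiedler-vector step, sketched for one bisection. This is illustrative, not the cSSCA#2 code; A is a symmetric sparse adjacency matrix, and splitting at the median is the balanced-cut variant.

n = size(A, 1);
d = full(sum(A, 2));                 % vertex degrees
L = spdiags(d, 0, n, n) - A;         % graph Laplacian L = D - A
[V, E] = eigs(L, 2, 'SA');           % two smallest eigenpairs (algebraic)
[~, order] = sort(diag(E));
fiedler = V(:, order(2));            % eigenvector of 2nd-smallest eigenvalue
part = fiedler > median(fiedler);    % split at the median for balance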