Parallel Software for SemiDefinite Programming with Sparse Schur Complement Matrix
Makoto Yamashita @ Tokyo-Tech
Katsuki Fujisawa @ Chuo University
Mituhiro Fukuda @ Tokyo-Tech
Yoshiaki Futakata @ University of Virginia
Kazuhiro Kobayashi @ National Maritime Research Institute
Masakazu Kojima @ Tokyo-Tech
Kazuhide Nakata @ Tokyo-Tech
Maho Nakata @ RIKEN
ISMP 2009 @ Chicago [2009/08/26]
Extremely Large SDPs
• Arising from various fields
   • Quantum Chemistry
   • Sensor Network Problems
   • Polynomial Optimization Problems
• Most of the computation time involves the Schur complement matrix (SCM)
• [SDPARA] Parallel computation for the SCM
   • In particular, the sparse SCM
Outline
• SemiDefinite Programming and the Schur complement matrix
• Parallel implementation
• Parallelization for the sparse Schur complement matrix
• Numerical results
• Future work
Exploitation of Sparsity in Computation for the Search Direction
1. ELEMENTS: evaluation of the elements of the Schur complement matrix
2. CHOLESKY: Cholesky factorization of the Schur complement matrix
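For reference, a sketch (in LaTeX) of what the two phases compute, in the notation common to SDPA-family solvers; the exact element formula depends on the chosen search direction (here the HRVW/KSH/M direction), so treat it as illustrative:

    % ELEMENTS: form the m x m Schur complement matrix B from the data
    % matrices F_i and the current primal-dual iterate (X, Z)
    B_{ij} = \mathrm{Tr}\left( F_i \, X \, F_j \, Z^{-1} \right),
             \qquad i, j = 1, \dots, m
    % CHOLESKY: factorize B and solve the linear system for the dual step dy
    B \, dy = r, \qquad B = L L^{\top}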
Bottlenecks on a Single Processor
[timing table, in seconds, on an Opteron 246 (2.0 GHz)]
⇒ Apply parallel computation to the bottlenecks
SDPARA http://sdpa.indsys.chuo-u.ac.jp/sdpa/
• Parallel version of SDPA (a generic SDP solver)
• MPI & ScaLAPACK
• Row-wise distribution for ELEMENTS
• Parallel Cholesky factorization for CHOLESKY
Row-wise distribution for evaluation of the Schur complement matrix
• Example: 4 CPUs are available
• Each CPU computes only its assigned rows
• No communication between CPUs
• Efficient memory management
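A minimal sketch of this idea, assuming a cyclic assignment of rows to MPI ranks; the function evaluate_scm_row and the matrix size m are placeholders, not SDPARA's actual code:

    #include <mpi.h>
    #include <cstddef>
    #include <vector>

    // Placeholder: evaluate row i of the Schur complement matrix B from the
    // data matrices and the current iterate; here it just fills dummy values.
    static void evaluate_scm_row(int i, std::vector<double>& row) {
        for (std::size_t j = 0; j < row.size(); ++j) row[j] = 0.0;
    }

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank = 0, nprocs = 1;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        const int m = 10000;            // size of the Schur complement matrix
        std::vector<double> row(m);

        // Cyclic row-wise distribution: rank p owns rows p, p+nprocs, p+2*nprocs, ...
        // Each rank evaluates and stores only its own rows, so the ELEMENTS
        // phase needs no communication and the memory for B is split across ranks.
        for (int i = rank; i < m; i += nprocs) {
            evaluate_scm_row(i, row);
            // ... copy `row` into this rank's local part of B ...
        }

        MPI_Finalize();
        return 0;
    }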
Parallel Cholesky factorization
• We adopt ScaLAPACK for the Cholesky factorization of the Schur complement matrix
• We redistribute the matrix from the row-wise distribution to a two-dimensional block-cyclic distribution
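The redistribution maps each nb×nb block of the matrix onto a Pr×Pc process grid; ScaLAPACK's parallel Cholesky routine then factorizes the redistributed matrix. A small sketch of the standard 2-D block-cyclic owner computation (nb, Pr, Pc are illustrative values):

    #include <cstdio>

    // Owner of matrix entry (i, j) under a 2-D block-cyclic distribution:
    // the matrix is cut into nb x nb blocks, and block (bi, bj) is mapped
    // to process grid coordinates (bi mod Pr, bj mod Pc).
    struct ProcCoord { int pr, pc; };

    static ProcCoord owner(int i, int j, int nb, int Pr, int Pc) {
        return { (i / nb) % Pr, (j / nb) % Pc };
    }

    int main() {
        const int nb = 64, Pr = 2, Pc = 2;   // illustrative block size and 2x2 grid
        ProcCoord p = owner(1000, 200, nb, Pr, Pc);
        std::printf("entry (1000, 200) lives on process (%d, %d)\n", p.pr, p.pc);
        return 0;
    }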
Computation time on an SDP from Quantum Chemistry [LiOH]
[timing chart: AIST Super Cluster, Opteron 246 (2.0 GHz), 6 GB memory/node]
Scalability on an SDP from Quantum Chemistry [NF]
• Speed-up: ELEMENTS 63x, CHOLESKY 39x, Total 29x
• Parallelization of ELEMENTS is very effective
Sparse Schur complement matrix
• The Schur complement matrix becomes very sparse for some applications
   • from Control Theory: 100% (fully dense)
   • from Sensor Network: 2.12% nonzeros
• ⇒ The simple row-wise distribution loses its efficiency
Sparseness of the Schur complement matrix
• Many applications have a diagonal block structure
Exploitation of Sparsity in SDPA
• We select the evaluation formula (F1, F2, or F3) row by row (see the sketch below)
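The selection can be pictured as comparing per-row cost estimates and taking the cheapest formula. The sketch below uses purely hypothetical cost models; the real F1/F2/F3 estimates in the SDPA literature depend on the nonzero counts of the data matrices:

    #include <cstddef>

    enum Formula { F1, F2, F3 };

    // Hypothetical cost models (placeholders, not SDPA's actual estimates).
    static double cost_F1(std::size_t n)                    { return 1.0 * n * n * n; }
    static double cost_F2(std::size_t n, std::size_t nnz_i) { return 1.0 * n * n * nnz_i; }
    static double cost_F3(std::size_t n, std::size_t nnz_i) { return 1.0 * n * nnz_i * nnz_i; }

    // Pick the cheapest formula for one row of the Schur complement matrix,
    // given the matrix dimension n and the nonzeros nnz_i relevant to that row.
    Formula select_formula(std::size_t n, std::size_t nnz_i) {
        const double c1 = cost_F1(n), c2 = cost_F2(n, nnz_i), c3 = cost_F3(n, nnz_i);
        if (c1 <= c2 && c1 <= c3) return F1;
        return (c2 <= c3) ? F2 : F3;
    }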
CHOLESKY for the sparse Schur complement
• Parallel sparse Cholesky factorization implemented in MUMPS
• MUMPS adopts the multifrontal method
   • Memory storage on each processor should be consecutive
   • The row-wise distribution used for ELEMENTS matches this requirement
Computation time for SDPs from Polynomial Optimization Problems
• Parallel sparse Cholesky achieves mild scalability
• ELEMENTS attains a 24x speed-up on 32 CPUs
tsubasa cluster: Xeon E5440 (2.83 GHz), 8 GB memory/node
ELEMENTS load balance on 32 CPUs
• Only the first processor has slightly heavier computation
Automatic selection of sparse / dense SCM
• Dense parallel Cholesky achieves higher scalability than sparse parallel Cholesky
   • Dense becomes better when many processors are used
• We estimate both computation times from the computational cost and the scalability (see the sketch below)
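A minimal sketch of such a decision rule, with illustrative constants and a purely hypothetical parallel-efficiency model (not SDPARA's actual estimator):

    #include <cstddef>
    #include <vector>

    // Dense Cholesky: roughly n^3 / 3 floating-point operations.
    static double dense_flops(std::size_t n) { return 1.0 * n * n * n / 3.0; }

    // Sparse Cholesky: roughly the sum over columns of (nonzeros below the
    // diagonal in the Cholesky factor)^2, taken from a symbolic factorization.
    static double sparse_flops(const std::vector<std::size_t>& factor_colcounts) {
        double f = 0.0;
        for (std::size_t c : factor_colcounts) f += 1.0 * c * c;
        return f;
    }

    // Choose dense when its estimated parallel time is smaller; dense Cholesky
    // scales better, so it is given a higher assumed parallel efficiency.
    bool use_dense(std::size_t n, const std::vector<std::size_t>& colcounts, int nprocs) {
        const double eff_dense = 0.8, eff_sparse = 0.4;   // hypothetical efficiencies
        const double t_dense  = dense_flops(n)          / (nprocs * eff_dense);
        const double t_sparse = sparse_flops(colcounts) / (nprocs * eff_sparse);
        return t_dense < t_sparse;
    }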
Sparse / Dense CHOLESKY for a small SDP from POP
tsubasa cluster: Xeon E5440 (2.83 GHz), 8 GB memory/node
• Only on 4 CPUs did the automatic selection fail (the scalability of sparse Cholesky is unstable on 4 CPUs)
Numerical Results
• Comparison with PCSDP
   • Sensor Network Problems generated by SFSDP
• Multi-threading
   • Quantum Chemistry
SDPs from Sensor Network (time unit: seconds)
MPI + Multi-threading for Quantum Chemistry
• N.4P.DZ.pqgt11t2p (m = 7230), time in seconds
• 64x speed-up on 16 nodes × 8 threads (see the sketch below)
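A minimal sketch of the hybrid scheme, assuming the row-wise ELEMENTS distribution with OpenMP threads inside each MPI rank (evaluate_scm_row is a placeholder for the per-row kernel):

    #include <mpi.h>
    #include <omp.h>
    #include <vector>

    // Placeholder for the per-row ELEMENTS kernel.
    static void evaluate_scm_row(int /*row*/) { /* ... */ }

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank = 0, nprocs = 1;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        const int m = 7230;                        // SCM size for N.4P.DZ.pqgt11t2p

        // Collect the rows owned by this rank (cyclic row-wise distribution) ...
        std::vector<int> my_rows;
        for (int i = rank; i < m; i += nprocs) my_rows.push_back(i);

        // ... and evaluate them with OpenMP threads inside the rank
        // (e.g. 16 ranks x 8 threads gives 128-way parallelism).
        #pragma omp parallel for schedule(dynamic)
        for (int k = 0; k < (int)my_rows.size(); ++k)
            evaluate_scm_row(my_rows[k]);

        MPI_Finalize();
        return 0;
    }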
Concluding Remarks & Future Work
• New parallel schemes for the sparse Schur complement matrix
• Reasonable scalability
• Extremely large-scale SDPs with a sparse Schur complement matrix
• Improvement of multi-threading for the sparse Schur complement matrix