A Large-Grained Parallel Algorithm for Nonlinear Eigenvalue Problems Using Complex Contour Integration
Takeshi Amako, Yusaku Yamamoto and Shao-Liang Zhang
Dept. of Computational Science & Engineering, Nagoya University, Japan
Outline of the talk
• Introduction
  • The nonlinear eigenvalue problem
  • Existing algorithms
  • Our objective
• The algorithm
  • Formulation as a nonlinear equation
  • Application of Kravanja et al.’s method
  • Detecting and removing spurious eigenvalues
• Numerical results
  • Accuracy of the computed eigenvalues
  • Parallel performance
• Conclusion
Introduction
• The nonlinear eigenvalue problem
  • Given $A(z) \in \mathbb{C}^{n \times n}$, where $z$ is a complex parameter,
  • find $z_1 \in \mathbb{C}$ such that $A(z_1)x = 0$ has a nonzero solution $x = x_1$.
  • $z_1$ and $x_1$ are called an eigenvalue and the corresponding eigenvector, respectively.
• Examples
  • $A(z) = A - zB + z^2 C$: quadratic eigenvalue problem
  • $A(z) = A - zB + e^z C$: general nonlinear eigenvalue problem
• Applications
  • Electronic structure calculation
  • Nonlinear elasticity
  • Theoretical fluid dynamics
Existing algorithms
• Multivariate Newton’s method and its variants
  • Locally quadratic convergence
  • Requires good initial estimates for both $z_1$ and $x_1$, which are difficult to obtain.
• Nonlinear Arnoldi methods
• Nonlinear Jacobi-Davidson methods
  • Efficient for large sparse matrices
  • Not suitable for finding all eigenvalues within a specified region of the complex plane
Our objective
• Let
  • $\Gamma$: a closed Jordan curve in the complex plane,
  • $A(z) \in \mathbb{C}^{n \times n}$: an analytic function of $z$ inside $\Gamma$.
• We propose an algorithm that
  • can find all the eigenvalues inside $\Gamma$, and
  • has large-grain parallelism.
• Assumption: in the following, we mainly consider the case where $\Gamma$ is a circle centered at the origin with radius $r$.
• Related work: Sakurai et al. propose an algorithm for linear generalized eigenvalue problems.
[Figure: the contour $\Gamma$, a circle of radius $r$ centered at the origin O of the complex plane (axes Re z, Im z).]
Our approach
• The basic idea
  • Let $f(z) = \det(A(z))$.
  • Then $f(z)$ is an analytic function of $z$ inside $\Gamma$, and the eigenvalues of $A(z)$ are exactly the zeros of $f(z)$.
  • Use Kravanja’s method (Kravanja et al., 1999) for finding the zeros of an analytic function.
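Finding zeros of f(z)
• Let
  • $z_1, z_2, \ldots, z_m$: the distinct zeros of $f(z)$ inside $\Gamma$, and
  • $n_1, n_2, \ldots, n_m$: their multiplicities.
• Then $f(z)$ can be written as
  $$f(z) = \prod_{k=1}^{m} (z - z_k)^{n_k} \times g(z),$$
  where the product is analytic inside $\Gamma$ and $g(z)$ is analytic and nonzero inside $\Gamma$.
• Define the complex moments by
  $$\mu_p = \frac{1}{2\pi i} \oint_{\Gamma} z^p \, \frac{f'(z)}{f(z)} \, dz \quad (p = 0, 1, 2, \ldots).$$
• Then
  $$\mu_p = \sum_{k=1}^{m} n_k z_k^p.$$
  The contour integral is computable; the $z_k$ and $n_k$ in the sum are the unknowns.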
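The identity between the contour integral and the power sum follows from the residue theorem. The short derivation below is a sketch that assumes the standard logarithmic-derivative definition of the moments, $\mu_p = \frac{1}{2\pi i}\oint_\Gamma z^p f'(z)/f(z)\,dz$, and the factorization $f(z) = \prod_k (z - z_k)^{n_k} g(z)$ with $g$ analytic and nonzero inside $\Gamma$.

```latex
\begin{align*}
\frac{f'(z)}{f(z)}
  &= \frac{d}{dz}\log f(z)
   = \sum_{k=1}^{m} \frac{n_k}{z - z_k} + \frac{g'(z)}{g(z)}, \\
\mu_p
  &= \sum_{k=1}^{m} \frac{1}{2\pi i}\oint_{\Gamma} \frac{n_k z^p}{z - z_k}\,dz
   + \frac{1}{2\pi i}\oint_{\Gamma} z^p\,\frac{g'(z)}{g(z)}\,dz \\
  &= \sum_{k=1}^{m} n_k z_k^p + 0 .
\end{align*}
```

The last term vanishes by Cauchy's theorem because $g'/g$ is analytic inside $\Gamma$, and each remaining integrand has a simple pole at $z_k$ with residue $n_k z_k^p$. In particular, $\mu_0$ counts the zeros inside $\Gamma$ with multiplicity.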
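Finding zeros of f(z) (cont'd)
• To extract information on $\{z_k\}$ from $\{\mu_p\}$, define the following matrices:
  $$H_m = \begin{pmatrix} \mu_0 & \mu_1 & \cdots & \mu_{m-1} \\ \mu_1 & \mu_2 & \cdots & \mu_m \\ \vdots & \vdots & & \vdots \\ \mu_{m-1} & \mu_m & \cdots & \mu_{2m-2} \end{pmatrix}, \quad H_m^{<} = \begin{pmatrix} \mu_1 & \mu_2 & \cdots & \mu_m \\ \mu_2 & \mu_3 & \cdots & \mu_{m+1} \\ \vdots & \vdots & & \vdots \\ \mu_m & \mu_{m+1} & \cdots & \mu_{2m-1} \end{pmatrix},$$
  $$V_m = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ z_1 & z_2 & \cdots & z_m \\ \vdots & \vdots & & \vdots \\ z_1^{m-1} & z_2^{m-1} & \cdots & z_m^{m-1} \end{pmatrix}, \quad D_m = \mathrm{diag}(n_1, \ldots, n_m), \quad \Lambda_m = \mathrm{diag}(z_1, \ldots, z_m).$$
• Then it is easy to see that
  $$H_m = V_m D_m V_m^{T}, \qquad H_m^{<} = V_m D_m \Lambda_m V_m^{T}.$$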
Finding zeros of f(z) (cont'd)
• Noting that $V_m$ and $D_m$ are nonsingular, we have the following equivalence:
  $\lambda$ is an eigenvalue of $H_m^{<} - \lambda H_m$ ⇔ $\lambda$ is an eigenvalue of $\Lambda_m - \lambda I$ ⇔ $\exists k,\ \lambda = z_k$.
• That is, we can find the zeros of $f(z)$ inside $\Gamma$ by
  • computing the complex moments $\mu_0, \mu_1, \ldots, \mu_{2m-1}$,
  • constructing $H_m$ and $H_m^{<}$, and
  • computing the eigenvalues of $H_m^{<} - \lambda H_m$.
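The three steps above can be checked on synthetic data. The following is a minimal sketch, not the authors' code: it assumes the Hankel matrices have entries $(H_m)_{ij} = \mu_{i+j-2}$ and $(H_m^{<})_{ij} = \mu_{i+j-1}$, builds exact moments $\mu_p = \sum_k n_k z_k^p$ from hypothetical zeros and multiplicities, and recovers the zeros from the pencil. The function name `zeros_from_moments` is illustrative.

```python
import numpy as np

def zeros_from_moments(mu, m):
    """Recover z_1..z_m from moments mu[0..2m-1] via the Hankel pencil.

    Assumes H[i, j] = mu[i + j] and H_shift[i, j] = mu[i + j + 1]
    (0-based indices), with m equal to the number of distinct zeros.
    """
    mu = np.asarray(mu, dtype=complex)
    idx = np.arange(m)[:, None] + np.arange(m)[None, :]
    H = mu[idx]            # Hankel matrix H_m
    H_shift = mu[idx + 1]  # shifted Hankel matrix H_m^<
    # Eigenvalues of the pencil H_shift - lambda*H are those of H^{-1} H_shift,
    # which is valid because H is nonsingular when the z_k are distinct.
    return np.linalg.eigvals(np.linalg.solve(H, H_shift))

# Hypothetical zeros and multiplicities, chosen only for illustration.
zeros = np.array([0.3 + 0.2j, -0.5j, 0.1])
mult = np.array([1, 2, 1])
m = len(zeros)

# Exact moments mu_p = sum_k n_k z_k^p for p = 0 .. 2m-1.
mu = [np.sum(mult * zeros**p) for p in range(2 * m)]

recovered = zeros_from_moments(mu, m)
print(np.sort_complex(recovered))
```

Note that the multiplicities enter only through $D_m$, so the pencil returns each distinct zero once.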
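Application to the nonlinear eigenvalue problem
• In our case, $f(z) = \det(A(z))$ and
  $$\frac{f'(z)}{f(z)} = \mathrm{Tr}\!\left( A(z)^{-1} \frac{dA(z)}{dz} \right).$$
• By applying the trapezoidal rule with $K$ points, we have
  $$\mu_p \approx \frac{1}{K} \sum_{j=0}^{K-1} \omega_j^{\,p+1} \, \mathrm{Tr}\!\left( A(\omega_j)^{-1} \frac{dA(\omega_j)}{dz} \right),$$
  where
  $$\omega_j = r \exp\!\left( \frac{2\pi i}{K} j \right) \quad (j = 0, 1, \ldots, K-1).$$
[Figure: the contour $\Gamma$ in the complex plane (axes Re z, Im z; origin O).]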
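A small numerical check of the trace formula and the trapezoidal rule may help here. The sketch below assumes the moments take the form $\mu_p \approx \frac{1}{K}\sum_j \omega_j^{p+1}\,\mathrm{Tr}(A(\omega_j)^{-1} A'(\omega_j))$ with $\omega_j = r e^{2\pi i j/K}$, which follows from substituting $z = r e^{i\theta}$, $dz = iz\,d\theta$ in the contour integral. The matrix `A0` and the function name `contour_moments` are hypothetical; a linear problem $A(z) = A_0 - zI$ is used only because its eigenvalues are known.

```python
import numpy as np

def contour_moments(A, dA, r, K, num_moments):
    """Trapezoidal-rule moments on the circle |z| = r with K sample points."""
    omega = r * np.exp(2j * np.pi * np.arange(K) / K)
    # Tr(A(w)^{-1} A'(w)) at each sample point, via a linear solve
    # rather than an explicit inverse.
    traces = np.array([np.trace(np.linalg.solve(A(w), dA(w))) for w in omega])
    return np.array([np.mean(omega**(p + 1) * traces)
                     for p in range(num_moments)])

# Hypothetical linear test case: eigenvalues 0.2 and -0.4+0.1j lie inside
# the unit circle, 1.5 lies outside.
A0 = np.diag([0.2, -0.4 + 0.1j, 1.5])
I3 = np.eye(3)
A = lambda z: A0 - z * I3
dA = lambda z: -I3

mu = contour_moments(A, dA, r=1.0, K=64, num_moments=2)
print(mu[0])  # close to 2, the number of eigenvalues inside the circle
print(mu[1])  # close to 0.2 + (-0.4+0.1j), their sum
```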
The algorithm
[Algorithm listing: image not recovered. Slide annotations: "The computationally intensive part"; "Large-grain parallelism".]
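The steps listed on the preceding slides (evaluate the trace at $K$ points on the circle, form the moments, build the Hankel matrices, solve the small eigenproblem) can be sketched end to end as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the original program uses C and MPI, whereas this version uses a Python thread pool to show that the $K$ trace evaluations are mutually independent. The function name, the toy matrix `D`, and the choice $M = \mathrm{round}(\mathrm{Re}\,\mu_0)$ (reasonable only when the eigenvalues inside the circle are simple) are all assumptions.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def eigenvalues_in_circle(A, dA, r, K=128, M=None, workers=4):
    """Sketch: eigenvalues of A(z) inside |z| = r via contour moments."""
    omega = r * np.exp(2j * np.pi * np.arange(K) / K)

    # Computationally intensive part: one dense solve per sample point.
    # The K evaluations do not depend on each other, so they can be
    # distributed over workers without communication.
    def trace_at(w):
        return np.trace(np.linalg.solve(A(w), dA(w)))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        traces = np.array(list(pool.map(trace_at, omega)))

    # Assumption: if M is not given, estimate it from mu_0, which counts
    # the eigenvalues inside the circle with multiplicity.
    if M is None:
        M = int(round(np.mean(omega * traces).real))

    # Moments mu_0 .. mu_{2M-1} by the trapezoidal rule.
    mu = np.array([np.mean(omega**(p + 1) * traces) for p in range(2 * M)])

    # Hankel matrices and the small M x M eigenproblem.
    idx = np.arange(M)[:, None] + np.arange(M)[None, :]
    return np.linalg.eigvals(np.linalg.solve(mu[idx], mu[idx + 1]))

# Hypothetical toy problem: diagonal part plus a weak antidiagonal e^z term.
D = np.diag([0.1, -0.3 + 0.2j, 0.5j, 2.0])
I4 = np.eye(4)
J4 = np.fliplr(I4)
eps = 0.01
A = lambda z: D - z * I4 + eps * np.exp(z) * J4
dA = lambda z: -I4 + eps * np.exp(z) * J4

lams = eigenvalues_in_circle(A, dA, r=1.0)
# Smallest singular value of A(lambda) should be near zero at an eigenvalue.
sigma_min = [np.linalg.svd(A(l), compute_uv=False)[-1] for l in lams]
print(lams)
print(sigma_min)
```

Only the loop over sample points scales with the matrix size $n$; the Hankel eigenproblem is $M \times M$ and cheap by comparison.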
Detecting and removing spurious eigenvalues
• Usually, we do not know $m$, the number of eigenvalues of $A(z)$ inside $\Gamma$, in advance, so we use an estimate $M$ instead.
• When $M > m$, the eigenvalues of $H_M^{<} - \lambda H_M$ include spurious solutions that do not correspond to any eigenvalue of $A(z)$.
• To detect them, we compute the corresponding eigenvector by inverse iteration and evaluate the relative residual.
  relative residual = [formula shown as an image on the original slide; not recovered]
• This quantity can also be used to check the accuracy of the computed eigenvalues.
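The residual test can be illustrated with a short sketch. Two caveats apply. First, the slide's residual formula is not available, so the normalization used here, $\|A(\lambda)x\|_2 / (\|A(\lambda)\|_2 \|x\|_2)$, is an assumption. Second, instead of inverse iteration this sketch takes $x$ as the right singular vector for the smallest singular value of $A(\lambda)$, a substitute that stays well defined even when $A(\lambda)$ is exactly singular; with that choice the residual equals $\sigma_{\min}/\sigma_{\max}$. The names `relative_residual` and `filter_spurious` and the tolerance are hypothetical.

```python
import numpy as np

def relative_residual(A, lam):
    """sigma_min / sigma_max of A(lam).

    Equals ||A(lam) x||_2 / (||A(lam)||_2 ||x||_2) when x is the right
    singular vector of the smallest singular value (assumed normalization).
    """
    s = np.linalg.svd(A(lam), compute_uv=False)  # sorted in descending order
    return s[-1] / s[0]

def filter_spurious(A, candidates, r, tol=1e-8):
    """Keep candidates inside |z| < r whose relative residual is below tol."""
    return [lam for lam in candidates
            if abs(lam) < r and relative_residual(A, lam) < tol]

# Hypothetical example: two genuine eigenvalues and one spurious candidate.
A0 = np.diag([0.2, -0.4, 1.5])
A = lambda z: A0 - z * np.eye(3)
candidates = [0.2, -0.4, 0.05 + 0.3j]

kept = filter_spurious(A, candidates, r=1.0)
print(kept)
```

The tolerance only needs to fall in the gap between genuine and spurious residuals, which the numerical results later in the talk report as several orders of magnitude wide.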
Numerical results
• Test problem
  • $A(z) = A - zI + \varepsilon B(z)$, where
    • $A$: real random nonsymmetric matrix
    • $B(z)$: antidiagonal matrix with antidiagonal elements $e^z$
    • $\varepsilon$: parameter specifying the strength of the nonlinearity
• Parameters
  • $n = 500, 1000, 2000$
  • $\varepsilon = 0, 10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}$
• Computational environment
  • Fujitsu HPC2500 (SPARC 64IV), 1-16 processors
  • Program written in C with MPI
  • LAPACK routines were used to compute $(A(z))^{-1}$ and the eigenvalues of $H_M^{<} - \lambda H_M$.
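For readers who want to reproduce a problem of this shape, the test matrix can be set up as below. This is a sketch only: the slide does not state the distribution or scaling of the random matrix, so the uniform entries scaled by $1/\sqrt{n}$ and the seed are assumptions, and `make_test_problem` is a hypothetical name. The derivative $dA/dz = -I + \varepsilon B(z)$ uses $\frac{d}{dz} e^z = e^z$.

```python
import numpy as np

def make_test_problem(n, eps, seed=0):
    """Return callables A(z), dA(z) and the random matrix A0 for
    A(z) = A0 - z*I + eps*B(z), with B(z) antidiagonal with entries e^z."""
    rng = np.random.default_rng(seed)
    # Assumption: entries uniform in [-1, 1], scaled by 1/sqrt(n).
    A0 = rng.uniform(-1.0, 1.0, size=(n, n)) / np.sqrt(n)
    I = np.eye(n)
    J = np.fliplr(I)  # ones on the antidiagonal

    def A(z):
        return A0 - z * I + eps * np.exp(z) * J

    def dA(z):
        # d/dz of (-z*I) is -I; d/dz of eps*e^z*J is eps*e^z*J.
        return -I + eps * np.exp(z) * J

    return A, dA, A0

A, dA, A0 = make_test_problem(n=8, eps=0.1)

# Central-difference check of the derivative at an arbitrary point.
z, h = 0.3 + 0.2j, 1e-6
fd = (A(z + h) - A(z - h)) / (2 * h)
print(np.max(np.abs(fd - dA(z))))
```

With `eps=0` the problem reduces to the standard eigenvalue problem for `A0`, which gives a convenient reference for checking computed eigenvalues.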
Accuracy of the computed eigenvalues
• Parameters
  • $n = 500$ and $\varepsilon = 0.1$
  • $r = 0.85$, $K = 128$ and $M = 11$.
  • There are 7 eigenvalues inside $\Gamma$.
• Results
  • Our algorithm succeeded in locating all the eigenvalues inside $\Gamma$.
  • The relative residuals were all under $10^{-10}$.
  • Similar results were obtained for the other cases.
[Figure: plot in the complex plane (axes Re z, Im z).]
Effect of K and M on the accuracy
• Effect of the number of sample points $K$
  • Usually $K = 128$ gives sufficient accuracy.
• Effect of the Hankel matrix size $M$
  • It is better to take $M$ a few more than the number of eigenvalues inside $\Gamma$ (7 in this case).
  • This is to mitigate the perturbation from eigenvalues outside $\Gamma$.
[Figures: residuals as a function of K; residuals as a function of M.]
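Why the accuracy depends on $K$, and why eigenvalues outside $\Gamma$ perturb the result, can be seen from a short worked calculation. It is a sketch under two assumptions: the moments are approximated by the trapezoidal sums $\hat\mu_p = \frac{1}{K}\sum_{j=0}^{K-1} \omega_j^{p+1}\,T(\omega_j)$ with $\omega_j = r e^{2\pi i j/K}$, and the integrand $T(z)$ contains a single simple-pole term $1/(z - d)$. Expanding in a geometric series and using $\frac{1}{K}\sum_j \omega_j^{q} = r^{q}$ when $K$ divides $q$ and $0$ otherwise gives, for $0 \le p < K$:

```latex
\[
\frac{1}{K}\sum_{j=0}^{K-1} \frac{\omega_j^{\,p+1}}{\omega_j - d}
=
\begin{cases}
\dfrac{d^{\,p}}{1 - (d/r)^{K}}, & |d| < r \quad \text{(exact value: } d^{\,p}\text{)}, \\[2ex]
-\,\dfrac{d^{\,p}\,(r/d)^{K}}{1 - (r/d)^{K}}, & |d| > r \quad \text{(exact value: } 0\text{)}.
\end{cases}
\]
```

The error therefore decays geometrically in $K$, at rate $(|d|/r)^K$ for a pole inside the circle and $(r/|d|)^K$ for a pole outside. As a hypothetical illustration, with $r = 0.85$ and $K = 128$, a pole at distance $|d| = 1$ from the origin leaks into the moments at a relative level of roughly $0.85^{128} \approx 10^{-9}$, so poles close to the contour dominate the error.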
Detecting and removing spurious eigenvalues
• Parameters
  • $n = 1000$ and $\varepsilon = 0.01$
  • $r = 0.7$, $K = 128$ and $M = 10$.
  • There are 9 eigenvalues inside $\Gamma$.
• Eigenvalues of $H_M^{<} - \lambda H_M$
  • 10 eigenvalues were found inside $\Gamma$.
  • For 9 of the eigenvalues, the residual was less than $10^{-11}$.
  • For one eigenvalue, the residual was $10^{-2}$.
[Figure: plot in the complex plane (axes Re z, Im z), with one point labeled "spurious eigenvalue".]
Parallel performance
• Performance on Fujitsu HPC2500
  • Matrix size: $n = 500, 1000, 2000$
  • Number of processors: $P = 1, 2, 4, 8, 16$
• Almost linear speedup was obtained in all cases due to large-grain parallelism.
[Figure: execution time (sec) versus number of processors.]
Parallel performance (cont'd)
• Performance in a Grid environment
  • Matrix size: $n = 1000$
  • Machine: Intel Xeon Cluster
  • Master-worker type parallelization using OmniRPC (GridRPC)
• Good scalability was obtained for up to 14 processors.
[Figure: execution time (axis from 0:00:00 to 2:00:00) and speedup (axis from 0 to 16) versus number of processors (2 to 14).]
Summary of this study • We proposed a new algorithm for the nonlinear eigenvalue problem based on complex contour integration. • Our algorithm can find all the eigenvalues within a closed curve on the complex plane. Moreover, it has large-grain parallelism and is expected to show excellent parallel performance. • These advantages have been confirmed by numerical experiments.
Future work • Performance evaluation on large-scale grid environments. • Application to practical problems. • Computation of scaling exponent in theoretical fluid dynamics • Development of an efficient algorithm for computing