Towards Optimal Solvers for PDE-Constrained Optimization
Volkan Akcelik, Omar Ghattas (Carnegie Mellon University); George Biros (New York University)
TOPS Winter Meeting, Livermore, CA, January 25-26, 2002
Simulation vs. Optimization
• PDE model
• Simulation (forward) problem: given "data" x (e.g., material coefficients, domain or boundary sources, geometry), find the state variables u (velocity, stress, temperature, magnetic or electric field, displacement, etc.)
• Optimization (inverse) problem: given a desired goal involving u, find x
  • Optimal design, optimal control, or parameter estimation
Examples
• Optimal design: shape optimization of a viscous flow system, Nd = O(1)
• Optimal control: boundary flow control, Nd = O(Nu^(2/3))
• Parameter estimation: heterogeneous inverse wave propagation, Nd = O(Nu)
Optimal Design: Artificial Heart Shape Optimization 1
Goal: find a pump geometry that minimizes blood damage (J. Antaki & G. Burgreen, Univ. of Pittsburgh Medical Center)
Optimal Design: Artificial Heart Shape Optimization 2 Burgreen & Antaki, 1996
Optimal Control: Boundary Flow Control 1
MHD flow control using a boundary magnetic field
Optimal Control: Boundary Flow Control 2 Goal: find boundary suction/injection that minimizes energy dissipation in a viscous incompressible flow
Optimal Control: Viscous Boundary Flow Control 2 (G. Biros, 2000)
Optimal Control: Viscous Boundary Flow Control 3
[Figures: no control vs. optimal control]
Parameter Estimation: Seismic Inversion 1
• Forward problem: given soil material and earthquake source parameters, find the earthquake ground motion.
• Inverse problem: given earthquake observations, estimate the material and source parameters.
• Part of the Quake KDI Project at CMU (J. Bielak, D. O'Hallaron), Berkeley (J. Shewchuk), and SDSU (S. Day, H. Magistrale)
Parameter Estimation: Seismic Inversion 3 Simulation of 1994 Northridge Earthquake
Parameter Estimation: Seismic Inversion 4 Surface Visualization
Lagrange-Newton-Krylov-Schur Method
• Consider the discrete form of the problem:
    minimize the objective function f(u, x) over (u, x), subject to the state equations (PDEs) c(u, x) = 0,
  where u are the states and x are the controls (decision variables).
Optimality Conditions
• Lagrangian function
• First-order optimality conditions (both written out below)
• Define the constraint Jacobian and Lagrangian Hessian blocks that appear in the KKT system on the next slide
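For reference, the standard Lagrangian and first-order (KKT) conditions for the discrete problem min f(u, x) subject to c(u, x) = 0, written in the notation used on the surrounding slides:

```latex
\[
\begin{aligned}
\mathcal{L}(u,x,\lambda) &= f(u,x) + \lambda^{T} c(u,x), \\
\nabla_{u}\mathcal{L} &= \nabla_{u} f + (\partial c/\partial u)^{T}\lambda = 0, \\
\nabla_{x}\mathcal{L} &= \nabla_{x} f + (\partial c/\partial x)^{T}\lambda = 0, \\
\nabla_{\lambda}\mathcal{L} &= c(u,x) = 0 .
\end{aligned}
\]
```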
Newton’s Method for Optimality Conditions (SQP, Newton-Lagrange) • A Newton step: (KKT system) • Two possibilities for solution: • Full space method: solve KKT system with a Krylov method—need good preconditioner • Reduced space method: eliminate state variables and Lagrange multipliers
Quasi-Newton Reduced SQP
• Based on an LU factorization of the (permuted) KKT matrix
• Problems:
  • The reduced Hessian Wz needs m (number of decision variables) PDE solves per optimization iteration
  • Second derivatives are needed
• Quasi-Newton RSQP:
  • Replace the reduced Hessian by a quasi-Newton approximation
  • Discard the other second-derivative terms
• Properties (one iteration is sketched below):
  • Just 2 PDE solves required per optimization iteration
  • 2-step superlinear convergence
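A minimal sketch of one QN-RSQP iteration, assuming hypothetical callables for the two PDE solves (with dc/du and its transpose), the control-Jacobian actions, and the quasi-Newton inverse reduced Hessian; this is an illustration, not Veltisto's interface.

```python
# Minimal sketch of one quasi-Newton reduced-space SQP (QN-RSQP) iteration.
# A_solve / AT_solve : the 2 PDE solves per iteration (with dc/du and its transpose)
# B_matvec / BT_matvec : action of dc/dx and its transpose
# lbfgs_apply : action of the quasi-Newton approximation to the inverse reduced Hessian

def qn_rsqp_step(c, g_u, g_x, A_solve, AT_solve, B_matvec, BT_matvec, lbfgs_apply):
    lam = AT_solve(-g_u)                  # adjoint PDE solve for the multipliers
    gz = g_x + BT_matvec(lam)             # reduced gradient  g_z = g_x + B^T lam
    dx = -lbfgs_apply(gz)                 # quasi-Newton step for the controls
    du = A_solve(-(c + B_matvec(dx)))     # forward (linearized) PDE solve for the states
    return dx, du, lam
```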
Approximate QN-RSQP Preconditioner
• Use the Schur-based approximate LU factorization as a KKT preconditioner (one variant is sketched below)
• Since it is only a preconditioner, the PDE solves can be replaced by actions of a PDE preconditioner
• Properties:
  • No PDE solves per iteration (2 preconditioner applications required per inner Krylov iteration)
  • Newton convergence for the outer iteration
• Inspiration: Schur complement domain decomposition preconditioners (Keyes & Gropp, 1987)
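A hedged sketch of applying one common variant of this preconditioner to a KKT residual (r_u, r_x, r_lam): the exact state solves are replaced by applications of a PDE preconditioner, the reduced Hessian by its quasi-Newton approximation, and second-derivative cross terms are dropped. All callables are assumed placeholders.

```python
# One application of a Schur-based QN-RSQP preconditioner (illustrative variant).
# A_pre / AT_pre : PDE-preconditioner applications replacing A^{-1} and A^{-T}
# B_matvec / BT_matvec : action of dc/dx and its transpose
# Bz_inv_apply : quasi-Newton approximation to the inverse reduced Hessian

def qn_rsqp_precond_apply(r_u, r_x, r_lam, A_pre, AT_pre, B_matvec, BT_matvec, Bz_inv_apply):
    p_lam = AT_pre(r_u)                          # approximate adjoint block: A^T p_lam ~ r_u
    p_x = Bz_inv_apply(r_x - BT_matvec(p_lam))   # reduced (decision-variable) block
    p_u = A_pre(r_lam - B_matvec(p_x))           # approximate state block: A p_u ~ r_lam - B p_x
    return p_u, p_x, p_lam                       # 2 preconditioner applications per call
```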
Effectiveness of the Schur Preconditioner
• Preconditioned KKT system (with exact A)
• The Schur preconditioner results in:
  • A reduced condition number
  • Clustering of eigenvalues
• But the preconditioned operator is not a normal matrix: convergence-rate bounds depend on the condition number of the eigenvector matrix
• In practice, it works very well
How to Precondition the Reduced Hessian?
• Requirement: the reduced Hessian cannot be formed; only its (approximate) action on a vector is available
• Possibilities:
  • Quasi-Newton formula (limited-memory for large m)
  • A fixed, small number of stationary iterations
  • Sparse approximate inverse (SPAI) preconditioner
  • Discrete approximation of the underlying continuous Wz
  • Sherman-Morrison-Woodbury, if the reduced Hessian has the form "compact operator + SPD"
2-Step Stationary Iterations as Reduced Hessian Preconditioner (sketched below)
• Properties:
  • Same complexity as CG, but a constant (fixed) preconditioner
  • Needs only matvecs (here with the approximate reduced Hessian)
  • Needs estimates of the extremal eigenvalues
• Used to initialize the L-BFGS preconditioner (effectively providing H0)
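A minimal sketch of the 2-step (second-order Richardson) stationary iteration under the stated assumptions: it needs only matvecs with the approximate reduced Hessian plus estimates of its extremal eigenvalues; the parameter formulas are the classical optimal choice for an SPD spectrum in [lam_min, lam_max], and the first step is a plain damped Richardson step.

```python
# 2-step (second-order Richardson) stationary iteration for an SPD operator whose
# spectrum lies in [lam_min, lam_max]; only matvecs and eigenvalue estimates needed.
import numpy as np

def two_step_stationary(matvec, b, lam_min, lam_max, n_iter=10):
    """Approximately solve Wz * z = b; used as a fixed preconditioner for a Krylov method."""
    sa, sb = np.sqrt(lam_min), np.sqrt(lam_max)
    alpha = (2.0 / (sa + sb)) ** 2                 # classical optimal parameters
    beta = ((sb - sa) / (sb + sa)) ** 2
    z_old = np.zeros_like(b)
    z = (2.0 / (lam_min + lam_max)) * b            # first step: damped Richardson from z0 = 0
    for _ in range(n_iter):
        r = b - matvec(z)                          # one matvec per iteration
        z, z_old = z + beta * (z - z_old) + alpha * r, z
    return z
```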
Inexact Solution of the KKT System
• Solve the KKT system iteratively to a tolerance of η times the norm of the Lagrangian gradient (a forcing-term sketch follows)
• Eisenstat & Walker inexact Newton theory applies to the reduction of the Lagrangian gradient norm
• But will it satisfy the merit-function criteria?
• One can show that, in the vicinity of a local minimum (for the augmented Lagrangian merit function):
  • If η is small enough, the Armijo condition is satisfied with unit step lengths
  • If η is O(norm of the Lagrangian gradient), quadratic convergence is preserved
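A small sketch of an Eisenstat-Walker style forcing term tied to the Lagrangian gradient norm; the particular parameter values (gamma, alpha, eta_max) are illustrative, not the ones used in the talk.

```python
# Eisenstat-Walker style forcing term ("choice 2") for the inexact KKT solve:
# the Krylov tolerance eta_k is tied to the Lagrangian gradient norm so that fast
# local convergence is preserved.

def forcing_term(g_norm, g_norm_prev, eta_prev, gamma=0.9, alpha=2.0, eta_max=0.5):
    eta = gamma * (g_norm / g_norm_prev) ** alpha      # choice 2
    eta = max(eta, gamma * eta_prev ** alpha)          # safeguard against over-tightening
    return min(eta, eta_max)

# The Krylov solver is then asked only for
#     || K * step + grad_L || <= eta_k * || grad_L ||.
```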
Putting It All Together: the Lagrange-Newton-Krylov-Schur Method (a high-level sketch follows)
• Continuation loop
  • Optimization iteration (Lagrange-Newton)
    • Estimate extremal eigenvalues of the (approximate) reduced Hessian using Lanczos (retreat to QN-RSQP if negative)
    • Inexact KKT solution via symmetric QMR (Krylov)
      • Quasi-Newton RSQP preconditioner (Schur)
        • 2-step stationary iteration + L-BFGS approximation of the inverse reduced Hessian
        • PDE solves replaced by the PDE preconditioner
    • Backtracking line search on the augmented Lagrangian or l1 merit function
      • If there is no sufficient descent, use QN-RSQP
    • Compute derivatives, objective function, and residuals
    • Update the solution, tighten tolerances
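A high-level sketch roughly following the bullets above; the Problem object and every callable (estimate_extremal_eigs, solve_kkt_sqmr, qn_rsqp_step, forcing_term_for, line_search, ...) are hypothetical placeholders standing in for the corresponding components, not actual Veltisto routines.

```python
# Heavily simplified outline of the LNKS loop (placeholders only).

def lnks(problem, tolerances):
    z = problem.initial_guess()                       # full variable set (u, x, lam)
    for tol in tolerances:                            # continuation loop (tighten tolerances)
        while problem.kkt_residual_norm(z) > tol:     # Lagrange-Newton iteration
            lo, hi = estimate_extremal_eigs(problem, z)   # Lanczos on approx. reduced Hessian
            if lo <= 0.0:
                step = qn_rsqp_step(problem, z)       # retreat to QN-RSQP
            else:
                eta = forcing_term_for(problem, z)    # inexact-Newton tolerance
                step = solve_kkt_sqmr(problem, z, rtol=eta,
                                      precond=qn_rsqp_preconditioner(problem, lo, hi))
            alpha = line_search(problem.merit, z, step)   # augmented Lagrangian or l1 merit
            if alpha is None:                         # no sufficient descent: fall back
                step = qn_rsqp_step(problem, z)
                alpha = line_search(problem.merit, z, step)
            z = problem.update(z, alpha, step)        # recompute derivatives, objective, residuals
    return z
```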
Application: Optimal Boundary Control of Viscous Flow Goal: find boundary suction/injection that minimizes energy dissipation in a viscous incompressible flow
Fixed-Size Efficiency (n = 120,000)
[Table: algorithmic, parallel, and overall efficiency; Veltisto + PETSc implementation]
Isogranular Algorithmic Efficiency
• Mesh independence of Krylov iterations with exact PDE solves (implies the reduced Hessian preconditioner is effective)
• 4x cost of a PDE solve
• Moderate growth of Krylov iterations with approximate PDE solves (implies the PDE preconditioner is moderately effective)
• "Textbook" Newton mesh independence
Veltisto: A Library for PDE-Constrained Optimization
• PETSc extension, object oriented
• Nonlinearly constrained optimization; targets steady PDE constraints
• Supports domain-based parallelism
• Matrix-free implementation
• In use at Sandia-Albuquerque
Veltisto Features
• Optimizers: LNKS, LNKS-IP, QN-RSQP, N-RSQP
• Preconditioners: LM-BFGS, SR1, Broyden, 2-step stationary
• Merit functions: augmented Lagrangian, l1, l1 + second-order correction
• Symmetric QMR (allows indefinite preconditioning)
Conclusions
• Schur-complementing the optimality system permits capitalizing on the investment in PDE solvers
• Lagrange-Newton-Krylov-Schur methods can exhibit mesh-independent outer iterations for mesh-based control spaces
• The inner iteration is potentially mesh independent, depending on effective PDE and reduced Hessian preconditioners
• Very good scalability
• >10x speedup over quasi-Newton
• Optimization at a small multiple of the simulation cost
Future Work
• Approximate block-elimination preconditioners
• Other reduced Hessian preconditioners
• Direct preconditioning of the KKT system
  • DD preconditioners applied directly to the KKT system
  • Multigrid on the KKT system (Barry)
  • Multigrid preconditioner on a block-diagonal approximation of the KKT system
• Reformulation of the KKT system to make it SPD
  • FOSLS?
  • Augmented Lagrangian?
• Time dependence and inequalities remain a challenge
Summary: first-order optimality conditions (Karush-Kuhn-Tucker)
Schur Complement Method
• The Schur complement is self-adjoint and positive definite near a minimum
• The action of the Schur complement on q requires 1 forward and 1 adjoint wave propagation
Gauss-Newton-Schur Method
• The GN Schur complement is self-adjoint and positive definite everywhere
• Quadratic convergence for good-fit problems, linear otherwise
• The action of the Schur complement on q requires 1 forward and 1 adjoint wave propagation (sketched below)
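A minimal sketch of the matrix-free action of the Gauss-Newton Schur complement (reduced Hessian) on a parameter-space vector: one linearized forward wave propagation, restriction to the receivers, one adjoint propagation, plus a Tikhonov regularization term. The callables are assumed interfaces, not the project's actual code.

```python
# Matrix-free action of the Gauss-Newton Schur complement on a parameter vector q_dir.

def gn_reduced_hessian_matvec(q_dir, forward_linearized, restrict_to_receivers,
                              adjoint, reg_weight, reg_matvec):
    du = forward_linearized(q_dir)                  # 1 forward wave propagation
    d_obs = restrict_to_receivers(du)               # sensitivity of the receiver data
    dq = adjoint(d_obs)                             # 1 adjoint wave propagation
    return dq + reg_weight * reg_matvec(q_dir)      # J^T J q + beta * R q
```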
Spectrum of the GN Schur Complement (least-squares term only; no regularization)
[Plot: 400 eigenvalues, with the first 12 eigenvalues highlighted; smooth and rough eigenvectors indicated]
W-(Gauss)-Newton-Schur-CG Solver (a sketch of the inner solve follows)
• Multiscale continuation over the u, l, q grids and the source frequency
• (Gauss-)Newton nonlinear iteration
• Conjugate gradient solution of the Schur-decomposed (reduced) linear system
• Currently no preconditioner
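A minimal sketch of the inner CG solve, wrapping a matrix-free reduced-Hessian matvec (such as the one sketched above) in a SciPy LinearOperator; the names n_param, reduced_rhs, and gn_matvec are assumed to come from the problem setup.

```python
# Unpreconditioned CG on the (Gauss-)Newton-Schur reduced system, matrix-free.
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def solve_reduced_system(gn_matvec, reduced_rhs, n_param, maxiter=20):
    H = LinearOperator((n_param, n_param), matvec=gn_matvec)   # matrix-free reduced Hessian
    dq, info = cg(H, reduced_rhs, maxiter=maxiter)             # no preconditioner (as on the slide)
    return dq
```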
2D/3D Acoustic Wave Equation Example
• Piecewise bi(tri)linear approximation of the pressure u, adjoint l, and squared wave speed q
• Explicit time integration
• Up to 512x512 and 64x64x64 grids, 2000 time steps
• 50 receivers, 1 harmonic source
• Homogeneous initial guess
• PETSc (www.petsc.anl.gov) implementation
• Performance (Tikhonov regularization):
  • 6 hours on 64 processors of the T3E-900 at PSC
  • 10-20 inner conjugate gradient iterations
  • 10-20 outer Newton iterations
Effect of Different Regularization Weights (Tikhonov regularization)
Multilevel vs. Single-Level Continuation (Tikhonov regularization)
[Figures: single-level solution vs. multilevel solution]