Challenges and Opportunities in Using Automatic Differentiation with Object-Oriented Toolkits for Scientific Computing Paul Hovland (Argonne National Laboratory) Steven Lee (Lawrence Livermore National Laboratory) Lois McInnes (ANL) Boyana Norris (ANL) Barry Smith (ANL) The Computational Differentiation Project at Argonne National Laboratory
Acknowledgments • Jason Abate • Satish Balay • Steve Benson • Peter Brown • Omar Ghattas • Lisa Grignon • William Gropp • Alan Hindmarsh • David Keyes • Jorge Moré • Linda Petzold • Widodo Samyono
Outline • Intro to AD • Survey of Toolkits • SensPVODE • PETSc • TAO • Using AD with Toolkits • Toolkit Level • Parallel Function Level • Subdomain Level • Element/Vertex Function Level • Experimental Results • Conclusions and Expectations
Automatic Differentiation • Technique for augmenting code for computing a function with code for computing derivatives • Analytic differentiation of elementary operations/functions, propagation by chain rule • Can be implemented using source transformation or operator overloading • Two main modes • Forward: propagates derivatives from independent to dependent variables • Reverse (adjoint): propagates derivatives from dependent to independent variables
Comparison of Methods • Finite Differences • Advantages: cheap Jv, easy • Disadvantages: inaccurate (not robust) • Hand Differentiation • Advantages: accurate; cheap Jv, JTv, Hv, … • Disadvantages: hard; difficult to maintain consistency • Automatic Differentiation • Advantages: cheap JTv, Hv; easy? • Disadvantages: Jv costs ~2 function evals; hard?
PVODE: Parallel ODE-IVP solver • Algorithm developers: Hindmarsh, Byrne, Brown and Cohen • ODE Initial-Value Problems • Stiff and non-stiff integrators • Written in C • MPI calls for communication
PVODE for ODE Simulations • ODE initial-value problem (standard form): y' = f(t, y), y(t0) = y0 • Implicit time-stepping using BDF methods for y(tn) • Solve nonlinear system for y(tn) via inexact Newton • Solve update to the Newton iterate using Krylov methods
Objective (diagram) • ODE solver (PVODE): given p and y|t=0, computes y|t=t1, t2, ... • Goal: automatically obtain a sensitivity solver that, given p and y|t=0, computes both y|t=t1, t2, ... and dy/dp|t=t1, t2, ...
Possible Approaches • Apply AD to all of PVODE: ad_PVODE uses ad_F(y,ad_y,p,ad_p); given p, ad_p and y, ad_y|t=0, it computes y, ad_y|t=t1, t2, ... • Solve the sensitivity equations: SensPVODE uses ad_F(y,ad_y,p,ad_p); given p and y|t=0, it computes y and dy/dp|t=t1, t2, ...
Sensitivity Differential Equations • Differentiate y' = f(t, y, p) with respect to pi to obtain si' = (∂f/∂y) si + ∂f/∂pi, where si = ∂y/∂pi • A linear ODE-IVP for the sensitivity vector si(t)
PETSc • Portable, Extensible Toolkit for Scientific computing • Parallel • Object-oriented • Free • Supported (manuals, email) • Interfaces with Fortran 77/90, C/C++ • Available for virtually all UNIX platforms, as well as Windows 95/NT • Flexible: many options for solver algorithms and parameters
Nonlinear PDE Solution (diagram) • Main routine calls the PETSc nonlinear solvers (SNES) to solve F(u) = 0; SNES uses the linear solvers (SLES: PC, KSP) • User code: application initialization, post-processing • AD-generated code: function evaluation, Jacobian evaluation
TAO: Toolkit for Advanced Optimization • Tao: "the process of nature by which all things change and which is to be followed for a life of harmony"; the right way • Object-oriented techniques • Component-based (CCA) interaction • Leverage existing parallel computing infrastructure • Reuse of external toolkits
TAO Goals • Portability • Performance • Scalable parallelism • An interface independent of architecture
TAO Algorithms • Unconstrained optimization • Limited-memory variable-metric method • Trust region/line search Newton method • Conjugate-gradient method • Levenberg-Marquardt method • Bound-constrained optimization • Trust region Newton method • Gradient projection/conjugate gradient method • Linearly-constrained optimization • Interior-point method with iterative solvers
PETSc and TAO (diagram) • Application driver calls the Toolkit for Advanced Optimization (TAO: optimization tools), which uses PETSc linear solvers (PC, KSP), matrices, and vectors • User code: application initialization, function & gradient evaluation, Hessian evaluation, post-processing
Using AD with Toolkits • Apply AD to toolkit to produce derivative-enhanced toolkit • Use AD to provide Jacobian/Hessian/gradient for use by toolkit. Apply AD at • Parallel Function Level • Subdomain Function Level • Element/Vertex Function Level
Differentiated Version of Toolkit • Makes possible sensitivity analysis, black-box optimization of models constructed using toolkit • Can take advantage of high-level structure of algorithms, providing better performance: see Andreas’ and Linda’s talks • Ongoing work with PETSc and PVODE
Levels of Function Evaluation

int FormFunction(SNES snes,Vec X,Vec F,void *ptr)    /* Parallel Function Level */
{
  /* Variable declarations omitted */
  mx = user->mx; my = user->my; lambda = user->param;

  /* Subdomain Function Level */
  hx = one/(double)(mx-1); hy = one/(double)(my-1);
  sc = hx*hy*lambda; hxdhy = hx/hy; hydhx = hy/hx;

  ierr = DAGlobalToLocalBegin(user->da,X,INSERT_VALUES,localX);CHKERRQ(ierr);
  ierr = DAGlobalToLocalEnd(user->da,X,INSERT_VALUES,localX);CHKERRQ(ierr);
  ierr = VecGetArray(localX,&x);CHKERRQ(ierr);
  ierr = VecGetArray(localF,&f);CHKERRQ(ierr);
  ierr = DAGetCorners(user->da,&xs,&ys,PETSC_NULL,&xm,&ym,PETSC_NULL);CHKERRQ(ierr);
  ierr = DAGetGhostCorners(user->da,&gxs,&gys,PETSC_NULL,&gxm,&gym,PETSC_NULL);CHKERRQ(ierr);

  for (j=ys; j<ys+ym; j++) {
    row = (j - gys)*gxm + xs - gxs - 1;
    for (i=xs; i<xs+xm; i++) {
      row++;
      if (i == 0 || j == 0 || i == mx-1 || j == my-1) {f[row] = x[row]; continue;}
      /* Vertex/Element Function Level */
      u = x[row];
      uxx = (two*u - x[row-1] - x[row+1])*hydhx;
      uyy = (two*u - x[row-gxm] - x[row+gxm])*hxdhy;
      f[row] = uxx + uyy - sc*PetscExpScalar(u);
    }
  }
  ierr = VecRestoreArray(localX,&x);CHKERRQ(ierr);
  ierr = VecRestoreArray(localF,&f);CHKERRQ(ierr);
  ierr = DALocalToGlobal(user->da,localF,INSERT_VALUES,F);CHKERRQ(ierr);
  ierr = PLogFlops(11*ym*xm);CHKERRQ(ierr);
  return 0;
}
Parallel Function Level • Advantages • Well-defined interface: int Function(SNES, Vec, Vec, void *); void function(integer, Real, N_Vector, N_Vector, void *) • No changes to function • Disadvantages • Differentiation of toolkit support functions (may result in unnecessary work) • AD of parallel code (MPI, OpenMP) • May need global coloring
Subdomain Function Level • Advantages • No need to differentiate communication functions • Interface may be well defined • Disadvantages • May need local coloring • May need to extract from parallel function
Using AD with PETSc (diagram) • Function pipeline: global-to-local scatter of ghost values → local function computation (per subdomain) → parallel function assembly • ADIFOR or ADIC, driven by a script file, differentiates the local function • Jacobian pipeline: global-to-local scatter of ghost values → seed matrix initialization → local Jacobian computation (AD-generated) → parallel Jacobian assembly (coded manually; can be automated)
Using AD with TAO (diagram) • Function pipeline: global-to-local scatter of ghost values → local function computation (per subdomain) → parallel function assembly • ADIFOR or ADIC, driven by a script file, differentiates the local function • Gradient pipeline: global-to-local scatter of ghost values → seed matrix initialization → local gradient computation (AD-generated) → parallel gradient assembly (coded manually; can be automated)
Element/Vertex Function Level • Advantages • Reduced memory requirements • No need for matrix coloring • Disadvantages • May be difficult to decompose function to this level (boundary conditions, other special cases) • Decomposition to this level may impede efficiency
Example

int localfunction2d(Field **x,Field **f,int xs,int xm,int ys,int ym,int mx,int my,void *ptr)
{
  xints = xs; xinte = xs+xm; yints = ys; yinte = ys+ym;

  if (yints == 0) {                     /* bottom boundary */
    j = 0; yints = yints + 1;
    for (i=xs; i<xs+xm; i++) {
      f[j][i].u     = x[j][i].u;
      f[j][i].v     = x[j][i].v;
      f[j][i].omega = x[j][i].omega + (x[j+1][i].u - x[j][i].u)*dhy;
      f[j][i].temp  = x[j][i].temp - x[j+1][i].temp;
    }
  }
  if (yinte == my) {                    /* top boundary (moving lid) */
    j = my - 1; yinte = yinte - 1;
    for (i=xs; i<xs+xm; i++) {
      f[j][i].u     = x[j][i].u - lid;
      f[j][i].v     = x[j][i].v;
      f[j][i].omega = x[j][i].omega + (x[j][i].u - x[j-1][i].u)*dhy;
      f[j][i].temp  = x[j][i].temp - x[j-1][i].temp;
    }
  }
  if (xints == 0) {                     /* left boundary */
    i = 0; xints = xints + 1;
    for (j=ys; j<ys+ym; j++) {
      f[j][i].u     = x[j][i].u;
      f[j][i].v     = x[j][i].v;
      f[j][i].omega = x[j][i].omega - (x[j][i+1].v - x[j][i].v)*dhx;
      f[j][i].temp  = x[j][i].temp;
    }
  }
  if (xinte == mx) {                    /* right boundary */
    i = mx - 1; xinte = xinte - 1;
    for (j=ys; j<ys+ym; j++) {
      f[j][i].u     = x[j][i].u;
      f[j][i].v     = x[j][i].v;
      f[j][i].omega = x[j][i].omega - (x[j][i].v - x[j][i-1].v)*dhx;
      f[j][i].temp  = x[j][i].temp - (double)(grashof>0);
    }
  }
  for (j=yints; j<yinte; j++) {         /* interior grid points */
    for (i=xints; i<xinte; i++) {
      vx = x[j][i].u; avx = PetscAbsScalar(vx);
      vxp = p5*(vx+avx); vxm = p5*(vx-avx);
      vy = x[j][i].v; avy = PetscAbsScalar(vy);
      vyp = p5*(vy+avy); vym = p5*(vy-avy);

      u   = x[j][i].u;
      uxx = (two*u - x[j][i-1].u - x[j][i+1].u)*hydhx;
      uyy = (two*u - x[j-1][i].u - x[j+1][i].u)*hxdhy;
      f[j][i].u = uxx + uyy - p5*(x[j+1][i].omega - x[j-1][i].omega)*hx;

      u   = x[j][i].v;
      uxx = (two*u - x[j][i-1].v - x[j][i+1].v)*hydhx;
      uyy = (two*u - x[j-1][i].v - x[j+1][i].v)*hxdhy;
      f[j][i].v = uxx + uyy + p5*(x[j][i+1].omega - x[j][i-1].omega)*hy;

      u   = x[j][i].omega;
      uxx = (two*u - x[j][i-1].omega - x[j][i+1].omega)*hydhx;
      uyy = (two*u - x[j-1][i].omega - x[j+1][i].omega)*hxdhy;
      f[j][i].omega = uxx + uyy
                      + (vxp*(u - x[j][i-1].omega) + vxm*(x[j][i+1].omega - u))*hy
                      + (vyp*(u - x[j-1][i].omega) + vym*(x[j+1][i].omega - u))*hx
                      - p5*grashof*(x[j][i+1].temp - x[j][i-1].temp)*hy;

      u   = x[j][i].temp;
      uxx = (two*u - x[j][i-1].temp - x[j][i+1].temp)*hydhx;
      uyy = (two*u - x[j-1][i].temp - x[j+1][i].temp)*hxdhy;
      f[j][i].temp = uxx + uyy
                     + prandtl*((vxp*(u - x[j][i-1].temp) + vxm*(x[j][i+1].temp - u))*hy
                              + (vyp*(u - x[j-1][i].temp) + vym*(x[j+1][i].temp - u))*hx);
    }
  }
  ierr = PetscLogFlops(84*ym*xm);CHKERRQ(ierr);
  return 0;
}
Experimental Results • Toolkit Level – Differentiated PETSc Linear Solver • Parallel Nonlinear Function Level – SensPVODE • Local Subdomain Function Level • PETSc • TAO • Element Function Level – PETSc
SensPVODE: Problem • Diurnal kinetics advection-diffusion equation • 100x100 structured grid • 16 processors of a Linux cluster with 550 MHz processors and Myrinet interconnect
PETSc Applications • Toy problems • Solid fuel ignition: finite difference discretization; Fortran & C variants; differentiated using ADIFOR, ADIC • Driven cavity: finite difference discretization; C implementation; differentiated using ADIC • Euler code • Based on legacy F77 code from D. Whitfield (MSU) • Finite volume discretization • Up to 1,121,320 unknowns • Mapped C-H grid • Fully implicit steady-state • Tools: SNES, DA, ADIFOR
For More Information • PETSc: http://www.mcs.anl.gov/petsc/ • TAO: http://www.mcs.anl.gov/tao/ • Automatic Differentiation at Argonne • http://www.mcs.anl.gov/autodiff/ • ADIFOR: http://www.mcs.anl.gov/adifor/ • ADIC: http://www.mcs.anl.gov/adic/ • http://www.autodiff.org
10 challenges for PDE optimization algorithms • Problem size • Efficiency vs. intrusiveness • “Physics-based” globalizations • Inexact PDE solvers • Approximate Jacobians • Implicitly-defined PDE residuals • Non-smooth solutions • Pointwise inequality constraints • Scalability of adjoint methods to large numbers of inequalities • Time-dependent PDE optimization
7. Non-smoothness • PDE residual may not depend smoothly on state variables (maybe not even continuously) • Solution-adaptivity • Discontinuity-capturing, front-tracking • Subgrid-scale models • Material property evaluation • Contact problems • Elasto(visco)plasticity • PDE residual may not depend smoothly or continuously on decision variables • Solid modeler- and mesh generator-induced non-smoothness