1 / 43

Paul Hovland (Argonne National Laboratory) Steven Lee (Lawrence Livermore National Laboratory)

Challenges and Opportunities in Using Automatic Differentiation with Object-Oriented Toolkits for Scientific Computing. Paul Hovland (Argonne National Laboratory) Steven Lee (Lawrence Livermore National Laboratory) Lois McInnes (ANL) Boyana Norris (ANL) Barry Smith (ANL).

kellison
Download Presentation

Paul Hovland (Argonne National Laboratory) Steven Lee (Lawrence Livermore National Laboratory)

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Challenges and Opportunities in Using Automatic Differentiation with Object-Oriented Toolkits for Scientific Computing Paul Hovland (Argonne National Laboratory) Steven Lee (Lawrence Livermore National Laboratory) Lois McInnes (ANL) Boyana Norris (ANL) Barry Smith (ANL) The Computational Differentiation Project at Argonne National Laboratory

  2. Acknowledgments • Jason Abate • Satish Balay • Steve Benson • Peter Brown • Omar Ghattas • Lisa Grignon • William Gropp • Alan Hindmarsh • David Keyes • Jorge Moré • Linda Petzold • Widodo Samyono

  3. Outline • Intro to AD • Survey of Toolkits • SensPVODE • PETSc • TAO • Using AD with Toolkits • Toolkit Level • Parallel Function Level • Subdomain Level • Element/Vertex Function Level • Experimental Results • Conclusions and Expectations

  4. Automatic Differentiation • Technique for augmenting code for computing a function with code for computing derivatives • Analytic differentiation of elementary operations/functions, propagation by chain rule • Can be implemented using source transformation or operator overloading • Two main modes • Forward: propagates derivatives from independent to dependent variables • Reverse (adjoint): propagates derivatives from dependent to independent variables

  5. Comparison of Methods • Finite Differences • Advantages: cheap Jv, easy • Disadvantages: inaccurate (not robust) • Hand Differentiation • Advantages: accurate; cheap Jv, JTv, Hv, … • Disadvantages: hard; difficult to maintain consistency • Automatic Differentiation • Advantages: cheap JTv, Hv; easy? • Disadvantages: Jv costs ~2 function evals; hard?

  6. PVODE: Parallel ODE-IVP solver • Algorithm developers: Hindmarsh, Byrne, Brown and Cohen • ODE Initial-Value Problems • Stiff and non-stiff integrators • Written in C • MPI calls for communication

  7. ODE Initial-Value Problem (standard form): Implicit time-stepping using BDF methods for y(tn) Solve nonlinear system for y(tn)via Inexact Newton Solve update to Newton iterate using Krylov methods PVODE for ODE simulations

  8. Objective F(y,p) p ODE Solver (PVODE) y|t=t1, t2, ... y|t=0 automatically Sensitivity Solver y|t=t1, t2, ... p dy/dp|t=t1, t2, ... y|t=0

  9. Possible Approaches Apply AD to PVODE: ad_F(y,ad_y,p,ad_p) p,ad_p ad_PVODE y, ad_y|t=t1, t2, ... y, ad_y|t=0 Solve sensitivity eqns: ad_F(y,ad_y,p,ad_p) p SensPVODE y, dy/dp |t=t1, t2, ... y|t=0

  10. Sensitivity Differential Equations • Differentiate y= f(t, y, p) with respect to pi: • A linearODE-IVP for the sensitivity vector si(t) :

  11. PETSc • Portable, Extensible Toolkit for Scientific computing • Parallel • Object-oriented • Free • Supported (manuals, email) • Interfaces with Fortran 77/90, C/C++ • Available for virtually all UNIX platforms, as well as Windows 95/NT • Flexible: many options for solver algorithms and parameters

  12. Nonlinear Solvers (SNES) Solve F(u) = 0 Linear Solvers (SLES) PETSc PC KSP Application Initialization Function Evaluation Jacobian Evaluation Post- Processing User code PETSc code AD-generated code Nonlinear PDE Solution Main Routine

  13. TAO The process of nature by which all things change and which is to be followed for a life of harmony • Object-oriented techniques • Component-based (CCA) interaction • Leverage existing parallel computing infrastructure • Reuse of external toolkits The Right Way Toolkit for advanced optimization

  14. TAO Goals • Portability • Performance • Scalable parallelism • An interface independent of architecture

  15. TAO Algorithms • Unconstrained optimization • Limited-memory variable-metric method • Trust region/line search Newton method • Conjugate-gradient method • Levenberg-Marquardt method • Bound-constrained optimization • Trust region Newton method • Gradient projection/conjugate gradient method • Linearly-constrained optimization • Interior-point method with iterative solvers

  16. PETSc and TAO Application Driver Optimization Tools Toolkit for Advanced Optimization (TAO) Linear Solvers Matrices PC KSP Vectors Application Initialization Function & Gradient Evaluation Hessian Evaluation Post- Processing PETSc code User code TAO code

  17. Using AD with Toolkits • Apply AD to toolkit to produce derivative-enhanced toolkit • Use AD to provide Jacobian/Hessian/gradient for use by toolkit. Apply AD at • Parallel Function Level • Subdomain Function Level • Element/Vertex Function Level

  18. Differentiated Version of Toolkit • Makes possible sensitivity analysis, black-box optimization of models constructed using toolkit • Can take advantage of high-level structure of algorithms, providing better performance: see Andreas’ and Linda’s talks • Ongoing work with PETSc and PVODE

  19. Levels of Function Evaluation int FormFunction(SNES snes,Vec X,Vec F,void *ptr){ Parallel Function Level /* Variable declarations omitted */ mx = user->mx; my = user->my; lambda = user->param; Subdomain Function Level hx = one/(double)(mx-1); hy = one/(double)(my-1); sc = hx*hy*lambda; hxdhy = hx/hy; hydhx = hy/hx; ierr = DAGlobalToLocalBegin(user->da,X,INSERT_VALUES,localX);CHKERRQ(ierr); ierr = DAGlobalToLocalEnd(user->da,X,INSERT_VALUES,localX);CHKERRQ(ierr); ierr = VecGetArray(localX,&x);CHKERRQ(ierr); ierr = VecGetArray(localF,&f);CHKERRQ(ierr); ierr = DAGetCorners(user->da,&xs,&ys,PETSC_NULL,&xm,&ym,PETSC_NULL);CHKERRQ(ierr); ierr = DAGetGhostCorners(user->da,&gxs,&gys,PETSC_NULL,&gxm,&gym,PETSC_NULL);CHKERRQ(ierr); for (j=ys; j<ys+ym; j++) { row = (j - gys)*gxm + xs - gxs - 1; for (i=xs; i<xs+xm; i++) { row++; if (i == 0 || j == 0 || i == mx-1 || j == my-1) {f[row] = x[row]; continue;} Vertex/Element Function Level u = x[row]; uxx = (two*u - x[row-1] - x[row+1])*hydhx; uyy = (two*u - x[row-gxm] - x[row+gxm])*hxdhy; f[row] = uxx + uyy - sc*PetscExpScalar(u); } } ierr = VecRestoreArray(localX,&x);CHKERRQ(ierr); ierr = VecRestoreArray(localF,&f);CHKERRQ(ierr); ierr = DALocalToGlobal(user->da,localF,INSERT_VALUES,F);CHKERRQ(ierr); ierr = PLogFlops(11*ym*xm);CHKERRQ(ierr); return 0; }

  20. Parallel Function Level • Advantages • Well-defined interface:int Function(SNES, Vec, Vec, void *);void function(integer, Real, N_Vector, N_Vector, void *); • No changes to function • Disadvantages • Differentiation of toolkit support functions (may result in unnecessary work) • AD of parallel code (MPI, OpenMP) • May need global coloring

  21. Subdomain Function Level • Advantages • No need to differentiate communication functions • Interface may be well defined • Disadvantages • May need local coloring • May need to extract from parallel function

  22. Using AD with PETSc Global-to-local scatter of ghost values Local Function computation Local Function computation Parallel function assembly Script file Global-to-local scatter of ghost values ADIFOR or ADIC Coded manually; can be automated Seed matrix initialization Local Jacobian computation Local Jacobian computation Parallel Jacobian assembly

  23. Global-to-local scatter of ghost values Local Function computation Parallel function assembly Using AD with TAO Local Function computation Script file Global-to-local scatter of ghost values ADIFOR or ADIC Coded manually; can be automated Seed matrix initialization Local gradient computation Local gradient computation Parallel gradient assembly

  24. Element/Vertex Function Level • Advantages • Reduced memory requirements • No need for matrix coloring • Disadvantages • May be difficult to decompose function to this level (boundary conditions, other special cases) • Decomposition to this level may impede efficiency

  25. Example int localfunction2d(Field **x,Field **f,int xs, int xm, int ys, int ym, int mx,int my, void *ptr) { xints = xs; xinte = xs+xm; yints = ys; yinte = ys+ym; if (yints == 0) { j = 0; yints = yints + 1; for (i=xs; i<xs+xm; i++) { f[j][i].u = x[j][i].u; f[j][i].v = x[j][i].v; f[j][i].omega = x[j][i].omega + (x[j+1][i].u - x[j][i].u)*dhy; f[j][i].temp = x[j][i].temp-x[j+1][i].temp; } } if (yinte == my) { j = my - 1; yinte = yinte - 1; for (i=xs; i<xs+xm; i++) { f[j][i].u = x[j][i].u - lid; f[j][i].v = x[j][i].v; f[j][i].omega = x[j][i].omega + (x[j][i].u - x[j-1][i].u)*dhy; f[j][i].temp = x[j][i].temp-x[j-1][i].temp; } } if (xints == 0) { i = 0; xints = xints + 1; for (j=ys; j<ys+ym; j++) { f[j][i].u = x[j][i].u; f[j][i].v = x[j][i].v; f[j][i].omega = x[j][i].omega - (x[j][i+1].v - x[j][i].v)*dhx; f[j][i].temp = x[j][i].temp; } } if (xinte == mx) { i = mx - 1; xinte = xinte - 1; for (j=ys; j<ys+ym; j++) { f[j][i].u = x[j][i].u; f[j][i].v = x[j][i].v; f[j][i].omega = x[j][i].omega - (x[j][i].v - x[j][i-1].v)*dhx; f[j][i].temp = x[j][i].temp - (double)(grashof>0); } } for (j=yints; j<yinte; j++) { for (i=xints; i<xinte; i++) { vx = x[j][i].u; avx = PetscAbsScalar(vx); vxp = p5*(vx+avx); vxm = p5*(vx-avx); vy = x[j][i].v; avy = PetscAbsScalar(vy); vyp = p5*(vy+avy); vym = p5*(vy-avy); u = x[j][i].u; uxx = (two*u - x[j][i-1].u - x[j][i+1].u)*hydhx; uyy = (two*u - x[j-1][i].u - x[j+1][i].u)*hxdhy; f[j][i].u = uxx + uyy - p5*(x[j+1][i].omega-x[j-1][i].omega)*hx; u = x[j][i].v; uxx = (two*u - x[j][i-1].v - x[j][i+1].v)*hydhx; uyy = (two*u - x[j-1][i].v - x[j+1][i].v)*hxdhy; f[j][i].v = uxx + uyy + p5*(x[j][i+1].omega-x[j][i-1].omega)*hy; u = x[j][i].omega; uxx = (two*u - x[j][i-1].omega - x[j][i+1].omega)*hydhx; uyy = (two*u - x[j-1][i].omega - x[j+1][i].omega)*hxdhy; f[j][i].omega = uxx + uyy + (vxp*(u - x[j][i-1].omega) + vxm*(x[j][i+1].omega - u)) * hy +(vyp*(u - x[j-1][i].omega) + vym*(x[j+1][i].omega - u)) * hx -p5 * grashof * (x[j][i+1].temp - x[j][i-1].temp) * hy; u = x[j][i].temp; uxx = (two*u - x[j][i-1].temp - x[j][i+1].temp)*hydhx; uyy = (two*u - x[j-1][i].temp - x[j+1][i].temp)*hxdhy; f[j][i].temp = uxx + uyy + prandtl * ((vxp*(u - x[j][i-1].temp) + vxm*(x[j][i+1].temp - u)) * hy + (vyp*(u - x[j-1][i].temp) + vym*(x[j+1][i].temp - u)) * hx); } } ierr = PetscLogFlops(84*ym*xm);CHKERRQ(ierr); return 0; }

  26. Experimental Results • Toolkit Level – Differentiated PETSc Linear Solver • Parallel Nonlinear Function Level – SensPVODE • Local Subdomain Function Level • PETSc • TAO • Element Function Level – PETSc

  27. Differentiated Linear Equation Solver

  28. Increased Accuracy

  29. Increased Accuracy

  30. SensPVODE: Problem • Diurnl kinetics advection-diffusion equation • 100x100 structured grid • 16 processors of a Linux cluster with 550 MHz processors and Myrinet interconnect

  31. SensPVODE: Time

  32. SensPVODE: Number of Timesteps

  33. SensPVODE: Time/Timestep

  34. PETSc Applications • Toy problems • Solid fuel ignition: finite difference discretization; Fortran & C variants; differentiated using ADIFOR, ADIC • Driven cavity: finite difference discretization; C implementation; differentiated using ADIC • Euler code • Based on legacy F77code from D. Whitfield (MSU) • Finite volume discretization • Up to 1,121,320 unknowns • Mapped C-H grid • Fully implicit steady-state • Tools: SNES, DA, ADIFOR

  35. C-H Structured Grid

  36. Algorithmic Performance

  37. Real Performance

  38. Hybrid Method

  39. Hybrid Method (cont.)

  40. TAO: Preliminary Results

  41. For More Information • PETSc: http://www.mcs.anl.gov/petsc/ • TAO: http://www.mcs.anl.gov/tao/ • Automatic Differentiation at Argonne • http://www.mcs.anl.gov/autodiff/ • ADIFOR: http://www.mcs.anl.gov/adifor/ • ADIC: http://www.mcs.anl.gov/adic/ • http://www.autodiff.org

  42. 10 challenges for PDE optimization algorithms • Problem size • Efficiency vs. intrusiveness • “Physics-based” globalizations • Inexact PDE solvers • Approximate Jacobians • Implicitly-defined PDE residuals • Non-smooth solutions • Pointwise inequality constraints • Scalability of adjoint methods to large numbers of inequalities • Time-dependent PDE optimization

  43. 7. Non-smoothness • PDE residual may not depend smoothly on state variables (maybe not even continuously) • Solution-adaptivity • Discontinuity-capturing, front-tracking • Subgrid-scale models • Material property evaluation • Contact problems • Elasto(visco)plasticity • PDE residual may not depend smoothly or continuously on decision variables • Solid modeler- and mesh generator-induced

More Related