Automatic Differentiation: Introduction • Automatic differentiation (AD) is a technology for transforming a subprogram that computes some function into a subprogram that computes the derivatives of that function • Derivatives are used in optimization, nonlinear solvers, sensitivity analysis, and uncertainty quantification • Forward mode of AD is efficient for problems with few independent variables or for Jacobian-vector products (Jv) • Reverse mode of AD is efficient for problems with few dependent variables or for transposed-Jacobian-vector products (J^T v) • Efficiency of generated code depends on the sophistication of the underlying compiler analysis and combinatorial algorithms
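As a rough illustration of the forward mode described above, the following sketch propagates a value together with its directional derivative through a small function, which yields one Jacobian-vector product per evaluation. It uses operator overloading on a hypothetical `Dual` class rather than the source transformation performed by the tools in this deck; the function `f`, the helper `jvp`, and all other names are assumptions made for the example.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Dual:
    """A value paired with its directional derivative (tangent)."""
    val: float
    tan: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.tan + other.tan)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.tan * other.val + self.val * other.tan)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants (zero tangent)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def sin(x):
    """Sine with the chain rule applied to the tangent."""
    return Dual(math.sin(x.val), math.cos(x.val) * x.tan)


def f(x, y):
    """Hypothetical example function from R^2 to R^2."""
    return [x * y + sin(x), x * x * y]


def jvp(func, point, direction):
    """Evaluate J(point) @ direction with a single forward sweep."""
    args = [Dual(p, d) for p, d in zip(point, direction)]
    return [out.tan for out in func(*args)]


if __name__ == "__main__":
    # First column of the Jacobian of f at (2, 3).
    print(jvp(f, [2.0, 3.0], [1.0, 0.0]))
```

Each call to `jvp` costs roughly one evaluation of `f`, so a full Jacobian needs one sweep per independent variable, which is why forward mode favors problems with few independents.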
AD: Current Capabilities • Fortran 77: ADIFOR 2.0/3.0 • Robust, mature tool with excellent language coverage • Excellent compiler analysis • Efficient forward mode (small number of independents) • Adequate reverse mode (small number of dependents) • C/C++: ADIC 2.0 • Semi-mature tool with full C language coverage • Sophisticated differentiation algorithms • Efficient forward mode • Fortran 90: OpenAD/F • New tool with partial language coverage • Sophisticated differentiation algorithms • Accurate and novel compiler analysis • Innovative templating mechanism • Efficient forward and reverse modes
AD: Application Highlight Sensitivity of flow through Drake passage to bottom topography, using MIT shallow water model
AD: Future Capabilities • C/C++: ADIC 2.x • Enhanced support for C++ (basic templating, operator overloading) • Fortran 90: OpenAD/F • Improved language coverage (user-defined types, pointers, etc.) • Both tools • New differentiation algorithms • New checkpointing mechanisms • Advanced compiler analysis • Efficient forward and reverse modes • Integration with CSCAPES coloring algorithms • Ease of use through integration with PETSc and Zoltan toolkits
Load Balancing: Introduction Goals: • Provide software and algorithms for load balancing (partitioning) that can easily be used by parallel applications. • Load balancing: distribute work evenly among processors while minimizing communication cost. This reduces parallel run time. • Static load balancing (often called “partitioning”) • Application computation and communication patterns do not change • Partition and distribute data once • Dynamic load balancing • In dynamic or adaptive applications, computation and communication change over time. • Load balancing should be invoked at certain intervals. • Try to reduce data migration (the amount of application data that must move between processors)
Load Balancing: Current Capabilities • Zoltan: Software toolkit for parallel data management and load balancing • Available at http://www.cs.sandia.gov/Zoltan • Collection of many load-balancing methods • Geometric: RCB, space filling curves • Graph and hypergraph partitioning • Data-structure neutral interface • Call-back functions • Single, common interface for many methods • Allows applications to “plug and play” • Portable, parallel code (MPI) • Used in many DOE and Sandia applications • Can run on thousands of processors
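To make the geometric methods listed above more concrete, here is a minimal serial sketch of recursive coordinate bisection (RCB): split the points along the axis of largest extent so that each side receives a proportional share, then recurse. This is not Zoltan's interface or implementation; the function name `rcb` and its arguments are assumptions for illustration, and all points are given equal weight.

```python
def rcb(points, num_parts):
    """Assign each point (a tuple of coordinates) to one of num_parts parts.

    Returns a dict mapping point index -> part number.
    """
    assignment = {}

    def split(subset, first_part, parts):
        if parts == 1 or len(subset) <= 1:
            for i in subset:
                assignment[i] = first_part
            return
        dims = len(points[subset[0]])
        # Cut perpendicular to the axis along which the subset is longest.
        axis = max(
            range(dims),
            key=lambda d: max(points[i][d] for i in subset)
            - min(points[i][d] for i in subset),
        )
        ordered = sorted(subset, key=lambda i: points[i][axis])
        left_parts = parts // 2
        # Give the left side a share of points proportional to its part count.
        cut = len(ordered) * left_parts // parts
        split(ordered[:cut], first_part, left_parts)
        split(ordered[cut:], first_part + left_parts, parts - left_parts)

    split(list(range(len(points))), 0, num_parts)
    return assignment


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 0), (11, 0), (10, 1), (11, 1)]
    print(rcb(pts, 2))
```

Because the method only needs coordinates, it fits a data-structure-neutral interface well: the application supplies points through callbacks and receives part numbers back.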
Load Balancing: Applications • Large variety of applications, requirements, data structures. [Figure panels: Parallel electronics networks • Linear solvers & preconditioners • Cell modeling • Particle methods • Multiphysics simulations • Crash simulations • Adaptive mesh refinement]
Load Balancing: Future Capabilities • Scalable hypergraph partitioning • Hypergraphs accurately model communication volume • We aim to improve scalability to thousands of processors • 2d matrix partitioning • Reduce communication compared to standard 1d distribution • Multiconstraint partitioning • Multi-physics simulation • Complex objectives partitioning • E.g., simultaneously balance computation and memory • Parallel sparse matrix ordering (nested dissection)
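The claim that hypergraphs model communication volume accurately can be illustrated with the usual connectivity-minus-one metric: each net (hyperedge) that touches k parts contributes k - 1 units of communication. The sketch below compares this with the edge cut of a clique expansion of the same nets, which can overcount. The data layout (nets as lists of vertex indices, `part_of` as a list of part numbers) and both function names are assumptions for the example.

```python
from itertools import combinations


def connectivity_minus_one(nets, part_of):
    """Sum over nets of (number of distinct parts the net touches - 1)."""
    return sum(len({part_of[v] for v in pins}) - 1 for pins in nets if pins)


def clique_edge_cut(nets, part_of):
    """Edge cut of the graph obtained by replacing each net with a clique."""
    edges = {tuple(sorted(e)) for pins in nets for e in combinations(pins, 2)}
    return sum(1 for u, v in edges if part_of[u] != part_of[v])


if __name__ == "__main__":
    nets = [[0, 1], [1, 2, 3], [0, 3], [2]]
    part_of = [0, 0, 1, 1]  # vertices 0,1 on part 0; vertices 2,3 on part 1
    print(connectivity_minus_one(nets, part_of), clique_edge_cut(nets, part_of))
```

In this small case the net `[1, 2, 3]` spans two parts and costs one unit under the hypergraph metric, while its clique expansion contributes two cut edges.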
Reordering Transformations: Introduction • Irregular memory access patterns make performance sensitive to data and iteration orders • Run-time reordering transformations schedule data accesses and iterations to maximize performance • Preliminary work on reordering heuristics shows that hypergraph models outperform graph models • Full sparse tiling: new inspector/executor strategy that exploits inter-iteration locality
RT: Current Capabilities • Open source package implementing several data and iteration reordering heuristics: Data_N_Comp_Reorder • Data reordering heuristics • Breadth first search (graph-based) • Consecutive packing • Partitioning (graph-based) • Breadth first search (hypergraph-based) • Consecutive packing (hypergraph-based) • Partitioning (hypergraph-based) • Iteration reordering heuristics • Breadth first search (hypergraph-based) • Lexicographical sorting and various approximations • Consecutive packing (hypergraph-based) • Partitioning (hypergraph-based) • Full sparse tiling implementation for model problems
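As an illustration of one heuristic from the list above, consecutive packing can be sketched as an inspector that walks the iterations in order and gives each data element the next free slot the first time it is touched; the executor then runs with remapped indices. This is a simplified sketch, not the code in Data_N_Comp_Reorder, and the names `cpack` and `remap` are hypothetical.

```python
def cpack(accesses, num_data):
    """Inspector: return new_index, where new_index[old] is the packed slot.

    accesses is a list of iterations, each a list of data indices it touches.
    """
    new_index = [-1] * num_data
    next_slot = 0
    for touched in accesses:
        for d in touched:
            if new_index[d] == -1:  # first touch: pack into the next slot
                new_index[d] = next_slot
                next_slot += 1
    # Elements that are never touched go after all packed ones.
    for d in range(num_data):
        if new_index[d] == -1:
            new_index[d] = next_slot
            next_slot += 1
    return new_index


def remap(accesses, new_index):
    """Rewrite the index arrays so the executor uses the new data layout."""
    return [[new_index[d] for d in touched] for touched in accesses]


if __name__ == "__main__":
    accesses = [[4, 1], [1, 3], [0, 4]]
    perm = cpack(accesses, 6)
    print(perm, remap(accesses, perm))
```

After remapping, data touched by consecutive iterations sits in nearby memory locations, which is the locality effect the reordering aims for.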
RT: Application Highlight • Reordering for a mesh-quality improvement code (FeasNewt – T. Munson) • Hypergraph-BFS data reordering coupled with Cpack iteration reordering offers best performance • Reordering leads to performance within 90% of memory bandwidth limit for sparse matvec
RT: Future Capabilities • New hypergraph-based runtime reordering transformations • Comparison between hypergraph-based and bipartite graph-based runtime reordering transformations • Hypergraph partitioners for load balancing modified to work well for reordering transformations • Hierarchical full sparse tiling for hierarchical parallel systems
Graph Coloring and Matching: Introduction • Graph coloring deals with partitioning a set of binary-related objects into few groups of “independent” objects • Sparsity exploitation in computation of Jacobians and Hessians leads to a variety of graph coloring problems. Sources of problem variations: • Unsymmetric vs symmetric matrix • Direct vs substitution method • Uni- vs bi-directional partitioning • Matching deals with finding a “large” set of independent edges in a graph • Variant matching problems occur in load-balancing, process scheduling, linear solvers, preconditioners, etc. • Orthogonal sources of variation in matching problems: • Bipartite vs general graphs • Cardinality vs weighted problems
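The link between sparsity and coloring can be sketched for the unsymmetric, direct, one-directional case: columns of a Jacobian that share no nonzero row can be grouped and evaluated together, so the number of groups (colors) replaces the number of columns as the cost. The greedy routine below is a minimal sketch under that assumption; `color_columns` and its input format (one list of nonzero column indices per row) are hypothetical.

```python
def color_columns(rows, num_cols):
    """Greedily color columns so that columns sharing a row get distinct colors.

    rows[i] lists the column indices of the nonzeros in row i.
    Returns a list with one color (0, 1, 2, ...) per column.
    """
    rows_of_col = [[] for _ in range(num_cols)]
    for i, cols in enumerate(rows):
        for c in cols:
            rows_of_col[c].append(i)

    color = [-1] * num_cols
    for c in range(num_cols):
        # Colors already used by any column that shares a row with c.
        forbidden = {
            color[n] for i in rows_of_col[c] for n in rows[i] if color[n] != -1
        }
        k = 0
        while k in forbidden:
            k += 1
        color[c] = k
    return color


if __name__ == "__main__":
    # Sparsity pattern of a 5x5 tridiagonal Jacobian.
    tridiag = [[0, 1], [0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4]]
    print(color_columns(tridiag, 5))
```

For the tridiagonal pattern, three colors suffice, so three directional derivatives recover all five columns.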
GCM: Current Capabilities • Coloring • Serial: Developed novel (greedy) algorithms for distance-1, distance-2, star, and acyclic coloring problems. A package implementing these algorithms and the corresponding variant ordering routines is available. • Parallel: Developed a scheme for parallelizing greedy coloring algorithms on distributed-memory computers. MPI implementations of distance-1 and distance-2 coloring are available via Zoltan. • Matching • Algorithms that compute optimal solutions for matching problems run in polynomial time, but are slow and difficult to parallelize. • High-quality approximate solutions can be computed in (near) linear time. Approximation techniques make parallelization easier. • Developed fast approximation algorithms for several matching problems. • Efficient implementations of exact matching algorithms are available.
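One simple example of an approximation technique for weighted matching is the classic greedy heuristic: scan edges from heaviest to lightest and keep an edge whenever both endpoints are still free. Its total weight is at least half of the optimum, and the sort makes it O(m log m). This is offered only as a sketch of the general idea, not as the algorithms developed in this project; `greedy_matching` and the edge format are assumptions.

```python
def greedy_matching(edges):
    """Return a matching as a list of (u, v, weight) tuples.

    edges is an iterable of (u, v, weight); vertices may be any hashable.
    """
    matched = set()
    matching = []
    for u, v, w in sorted(edges, key=lambda e: e[2], reverse=True):
        if u != v and u not in matched and v not in matched:
            matching.append((u, v, w))
            matched.update((u, v))
    return matching


if __name__ == "__main__":
    # Path a-b-c-d: greedy keeps the heavy middle edge (weight 3),
    # while the optimal matching takes the two outer edges (weight 4).
    path = [("a", "b", 2.0), ("b", "c", 3.0), ("c", "d", 2.0)]
    print(greedy_matching(path))
```

The path example shows the trade-off: the greedy result is not optimal, but it stays within the factor-of-two bound and needs only local decisions, which is part of what makes such methods easier to parallelize.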
GCM: Application Highlights • Coloring • Automatic differentiation (sparse Jacobians and Hessians) • Parallel computation (discovery of concurrency, data migration) • Frequency allocation • Register allocation in compilers, etc. • Matching • Numerical preprocessing in sparse linear systems: • permute a matrix so that its diagonal or block diagonal is heavy. • Block triangular decomposition in sparse linear systems: • decompose a system of equations into smaller subsystems. • Graph partitioning: • guide the coarsening phase of multilevel graph partitioning methods.
GCM: Future Capabilities • Develop and implement star and acyclic bicoloring algorithms for Jacobian computation • Develop parallel algorithms that scale to thousands of processors for the various coloring problems (distance-1, distance-2, star, acyclic) • Integrate coloring software with automatic differentiation tools • Develop petascale parallel matching algorithms based on approximation techniques