Engineering Optimization

Concepts and Applications • Fred van Keulen • Matthijs Langelaar • CLA H21.1 • A.vanKeulen@tudelft.nl

Contents • Unconstrained Optimization: Methods for Multiple Variables • 1st order methods: CG • 2nd order methods • Quasi-Newton Methods • Constrained Optimization: Optimality Criteria

Multivariate Unconstrained Optimization Algorithms • Zeroth order (direct search): • Random methods: random jumping / walk / S.A. • Cyclic coordinate search • Powell’s Conjugate Directions method • Nelder and Mead Simplex method • Biologically inspired algorithms (GA, swarms, …) • Conclusions: • Nelder-Mead usually best • Inefficient for N>10

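Fletcher-Reeves conjugate gradient method • Based on building a set of conjugate directions, combined with line searches • Quadratic function: $f(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^T\mathbf{A}\mathbf{x} + \mathbf{b}^T\mathbf{x} + c$ • Conjugate directions: $\mathbf{d}_i^T\mathbf{A}\,\mathbf{d}_j = 0$ for $i \neq j$ • Conjugate directions: guaranteed convergence in N steps for quadratic problems (recall Powell: N cycles of N line searches)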

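CG practical • 1. Start with arbitrary $\mathbf{x}_1$ • 2. Set first search direction: $\mathbf{d}_1 = -\nabla f(\mathbf{x}_1)$ • 3. Line search to find next point: $\mathbf{x}_{i+1} = \mathbf{x}_i + \alpha_i\mathbf{d}_i$ • 4. Next search direction: $\mathbf{d}_{i+1} = -\nabla f_{i+1} + \dfrac{\|\nabla f_{i+1}\|^2}{\|\nabla f_i\|^2}\,\mathbf{d}_i$ • 5. Repeat 3 • 6. Restart every (n+1) steps, using step 2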
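The Fletcher-Reeves steps can be sketched for a quadratic function as below. This is a minimal illustration, assuming the form $f(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^T\mathbf{A}\mathbf{x} + \mathbf{b}^T\mathbf{x}$ with symmetric positive definite $\mathbf{A}$, for which the line search has a closed form. The function names and the test matrix are hypothetical and not taken from the lecture.

```python
def dot(a, b):
    """Inner product of two vectors stored as lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, x) for row in A]


def fletcher_reeves_quadratic(A, b, x, tol=1e-12):
    """Minimise f(x) = 0.5 x^T A x + b^T x (A assumed symmetric positive definite)."""
    n = len(x)
    g = [ax + bi for ax, bi in zip(matvec(A, x), b)]  # gradient A x + b
    d = [-gi for gi in g]                             # first direction: steepest descent
    steps = 0
    for _ in range(n):                                # at most n steps for a quadratic
        if dot(g, g) < tol:
            break
        Ad = matvec(A, d)
        alpha = -dot(g, d) / dot(d, Ad)               # exact line search along d
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g_new = [gi + alpha * adi for gi, adi in zip(g, Ad)]
        beta = dot(g_new, g_new) / dot(g, g)          # Fletcher-Reeves factor
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
        steps += 1
    return x, steps


# Assumed 2-variable test problem; the minimiser is x = -A^{-1} b = (1/11, 7/11).
x_min, n_steps = fletcher_reeves_quadratic(
    [[4.0, 1.0], [1.0, 3.0]], [-1.0, -2.0], [2.0, 1.0])
```

For a non-quadratic function the closed-form `alpha` would be replaced by a numerical line search, and the restart from step 2 becomes relevant.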

CG properties • Theoretically converges in N steps or less for quadratic functions • In practice: • Non-quadratic functions • Finite line search accuracy • Round-off errors • Result: slower convergence; > N steps • After N steps / bad convergence: restart procedure etc.

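Application to mechanics (FE) • Structural mechanics: Quadratic function! $\Pi(\mathbf{u}) = \tfrac{1}{2}\mathbf{u}^T\mathbf{K}\mathbf{u} - \mathbf{f}^T\mathbf{u}$ • Equilibrium: $\nabla\Pi = \mathbf{K}\mathbf{u} - \mathbf{f} = \mathbf{0}$ • CG: Line search: $\alpha_i = -\dfrac{\mathbf{d}_i^T\nabla\Pi_i}{\mathbf{d}_i^T\mathbf{K}\,\mathbf{d}_i}$ • Simple operations on element level. Attractive for large N!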

Multivariate Unconstrained Optimization Algorithms (2) • First order methods (descent methods): • Steepest descent method (with line search) • Fletcher-Reeves Conjugate Gradient method • Quasi-Newton methods • Conclusions (for now): • Scaling important for Steepest descent (zig-zag) • For quadratic problem, CG converges in N steps • [Figure: axes y1, y2]

Unconstrained optimization algorithms • Single-variable methods • Multiple variable methods • 0th order • 1st order • 2nd order

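Newton’s method • Concept: • Construct local quadratic approximation • Minimize approximation • Repeat • Local approximation: 2nd order Taylor series: $f(\mathbf{x} + \Delta\mathbf{x}) \approx f(\mathbf{x}) + \nabla f^T\Delta\mathbf{x} + \tfrac{1}{2}\Delta\mathbf{x}^T\mathbf{H}\,\Delta\mathbf{x}$ • First order necessity condition applied to approximation: $\nabla f + \mathbf{H}\,\Delta\mathbf{x} = \mathbf{0}$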

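Newton’s method (2) • Step: $\Delta\mathbf{x}_k = -\mathbf{H}_k^{-1}\nabla f_k$ (evaluated at $\mathbf{x}_k$) • Update: $\mathbf{x}_{k+1} = \mathbf{x}_k + \Delta\mathbf{x}_k$ • Note: • Finds minimum of quadratic functions in 1 step! • Step includes solving (dense) linear system of equations • If H not positive definite, the step need not be a descent direction and divergence can occur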
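A small sketch of a single Newton step in two variables, to illustrate the one-step property on a quadratic function. The test function $f = 2x_1^2 + x_1x_2 + 1.5x_2^2 - x_1 - 2x_2$ and all names are assumptions chosen for illustration; the 2x2 linear system is solved by Cramer's rule to keep the example free of dependencies.

```python
def solve_2x2(H, r):
    """Solve the 2x2 linear system H p = r by Cramer's rule."""
    (a, b), (c, d) = H
    det = a * d - b * c
    return [(r[0] * d - b * r[1]) / det, (a * r[1] - c * r[0]) / det]


def newton_step(grad, hess, x):
    """One Newton update: solve H dx = -grad f, then x_new = x + dx."""
    g = grad(x)
    dx = solve_2x2(hess(x), [-g[0], -g[1]])
    return [x[0] + dx[0], x[1] + dx[1]]


# Assumed quadratic test function f = 2 x1^2 + x1 x2 + 1.5 x2^2 - x1 - 2 x2.
def grad_q(x):
    return [4.0 * x[0] + x[1] - 1.0, x[0] + 3.0 * x[1] - 2.0]


def hess_q(x):
    return [[4.0, 1.0], [1.0, 3.0]]


x_new = newton_step(grad_q, hess_q, [2.0, 1.0])  # lands on the minimiser (1/11, 7/11)
```

For a non-quadratic function the same step is repeated until the gradient is sufficiently small.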

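Similarity to linear FE • Linear mechanics problem: $\Pi(\mathbf{u}) = \tfrac{1}{2}\mathbf{u}^T\mathbf{K}\mathbf{u} - \mathbf{f}^T\mathbf{u}$ (quadratic in u) • Newton: $\Delta\mathbf{u} = -\mathbf{H}^{-1}\nabla\Pi = \mathbf{K}^{-1}\mathbf{f}$ (evaluated at u = 0, where $\nabla\Pi = -\mathbf{f}$ and $\mathbf{H} = \mathbf{K}$) • Nonlinear mechanics similar, with multiple steps.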

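Newton’s method (3) • To avoid divergence: line search. Search direction: $\mathbf{s}_k = -\mathbf{H}_k^{-1}\nabla f_k$ • Update: $\mathbf{x}_{k+1} = \mathbf{x}_k + \alpha_k\mathbf{s}_k$ • Newton’s method has quadratic convergence (best!) close to optimum: $\|\mathbf{x}_{k+1} - \mathbf{x}^*\| \le c\,\|\mathbf{x}_k - \mathbf{x}^*\|^2$ • [Figure: error vs. iteration]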

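Levenberg-Marquardt method • Problems in Newton method: • Bad convergence / divergence when far from optimum • What to do when H singular / not positive definite? • Remedy: use modified H: $\tilde{\mathbf{H}} = \mathbf{H} + \beta\mathbf{I}$, with $\beta$ such that $\tilde{\mathbf{H}}$ is positive definite • Levenberg-Marquardt method: start with large $\beta_k$, and decrease gradually: $\mathbf{s}_k = -(\mathbf{H}_k + \beta_k\mathbf{I})^{-1}\nabla f_k$ • $\beta$ large: steepest descent • $\beta$ small: Newton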
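A minimal sketch of the Levenberg-Marquardt idea on the two-variable Rosenbrock ("banana") function. The factor-of-10 adjustment of `beta`, the starting value and the accept/reject rule are assumptions made for illustration; the slide only specifies starting with a large value and decreasing it gradually.

```python
def rosenbrock(x):
    return 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2


def rosenbrock_grad(x):
    return [-400.0 * x[0] * (x[1] - x[0] ** 2) - 2.0 * (1.0 - x[0]),
            200.0 * (x[1] - x[0] ** 2)]


def rosenbrock_hess(x):
    return [[1200.0 * x[0] ** 2 - 400.0 * x[1] + 2.0, -400.0 * x[0]],
            [-400.0 * x[0], 200.0]]


def levenberg_marquardt(f, grad, hess, x, beta=1e3, tol=1e-8, max_iter=500):
    """Damped Newton iteration: solve (H + beta I) p = -grad f in two variables."""
    for k in range(max_iter):
        g = grad(x)
        if (g[0] ** 2 + g[1] ** 2) ** 0.5 < tol:
            return x, k
        H = hess(x)
        a, b = H[0][0] + beta, H[0][1]
        c, d = H[1][0], H[1][1] + beta
        det = a * d - b * c
        # Cramer's rule for the damped 2x2 system
        p = [(-g[0] * d + b * g[1]) / det, (-a * g[1] + c * g[0]) / det]
        trial = [x[0] + p[0], x[1] + p[1]]
        if f(trial) < f(x):
            x, beta = trial, beta / 10.0   # success: move towards a Newton step
        else:
            beta *= 10.0                   # failure: move towards steepest descent
    return x, max_iter


x_lm, iters_lm = levenberg_marquardt(
    rosenbrock, rosenbrock_grad, rosenbrock_hess, [-1.2, 1.0])
```

Rejected steps only increase `beta`, so the objective never increases from one accepted iterate to the next.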

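Trust region methods • Different approach to make Newton’s method more robust: local approximation locally valid: $\min_{\Delta\mathbf{x}}\ f + \nabla f^T\Delta\mathbf{x} + \tfrac{1}{2}\Delta\mathbf{x}^T\mathbf{H}\,\Delta\mathbf{x}$ subject to $\|\Delta\mathbf{x}\| \le \Delta$ (defines trust region) • Trust region adjusted based on approximation quality: $\rho = \dfrac{\text{actual reduction}}{\text{predicted reduction}}$ • [Figure: trust region in the x1-x2 plane]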
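The adjustment of the trust region from the ratio of actual to predicted reduction can be sketched as follows. The thresholds 0.25 and 0.75, the shrink and growth factors, and the function name are conventional choices assumed here for illustration; they are not values given in the lecture.

```python
def update_trust_region(actual_reduction, predicted_reduction, radius,
                        step_norm, max_radius=10.0):
    """Return (new_radius, accept_step) based on the reduction ratio rho."""
    rho = actual_reduction / predicted_reduction
    if rho < 0.25:
        # Quadratic model predicted the function poorly: shrink the region.
        new_radius = 0.25 * radius
    elif rho > 0.75 and step_norm >= 0.99 * radius:
        # Model is good and the step hit the boundary: allow longer steps.
        new_radius = min(2.0 * radius, max_radius)
    else:
        new_radius = radius
    accept_step = rho > 0.0  # accept only if the function actually decreased
    return new_radius, accept_step
```

For example, `update_trust_region(0.9, 1.0, 1.0, 1.0)` doubles the radius and accepts the step, while a negative actual reduction shrinks the radius and rejects the step.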

Trust region methods (2) • Performance: robust, can also deal with neg. def. H • Not sensitive to variable scaling • Similar concept in computational mechanics: arc-length methods • Trust region subproblem is quadratic constrained optimization problem: • Rather expensive • Approximate solution methods popular: dogleg methods, Steihaug method, …

Newton summary • Second order method: Newton’s method • Conclusions: • Most efficient: solves quadratic problem in 1 step • Not robust! • Robustness improvements: • Levenberg-Marquardt (blend steepest descent / Newton) • Trust region approaches

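Quasi-Newton methods • Drawbacks remain for Newton method: • Need to evaluate H (often impractical) • Storage of H, solving system of equations • Alternatives: quasi-Newton methods: • Based on approximating H (or $\mathbf{H}^{-1}$) • Use only first-order derivative information • Newton step: $\Delta\mathbf{x}_k = -\mathbf{H}_k^{-1}\nabla f_k$ • Quasi-Newton step: $\Delta\mathbf{x}_k = -\alpha_k\mathbf{B}_k\nabla f_k$, with $\mathbf{B}_k \approx \mathbf{H}_k^{-1}$ • Update $\mathbf{B}_k$ (or the approximation of $\mathbf{H}_k$) every iteration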

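Quasi-Newton fundamentals • First order Taylor approximation of gradient: $\nabla f(\mathbf{x}_{k+1}) \approx \nabla f(\mathbf{x}_k) + \mathbf{H}\,(\mathbf{x}_{k+1} - \mathbf{x}_k)$ • Define: $\Delta\mathbf{x}_k = \mathbf{x}_{k+1} - \mathbf{x}_k$, $\mathbf{y}_k = \nabla f_{k+1} - \nabla f_k$, so that $\mathbf{y}_k \approx \mathbf{H}\,\Delta\mathbf{x}_k$ and $\Delta\mathbf{x}_k \approx \mathbf{H}^{-1}\mathbf{y}_k$ • Update equations: $\mathbf{B}_{k+1} = \mathbf{B}_k + \Delta\mathbf{B}_k$ with $\mathbf{B}_k \approx \mathbf{H}^{-1}$ • Rank one update: $\Delta\mathbf{B}_k = c\,\mathbf{z}\mathbf{z}^T$ • Rank two update: $\Delta\mathbf{B}_k = c_1\mathbf{z}_1\mathbf{z}_1^T + c_2\mathbf{z}_2\mathbf{z}_2^T$ • Operating on $\mathbf{H}^{-1}$ avoids solving a linear system each step.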

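Update functions: rank 1 • Notation: $\Delta\mathbf{x}_k = \mathbf{x}_{k+1} - \mathbf{x}_k$, $\mathbf{y}_k = \nabla f_{k+1} - \nabla f_k$, $\mathbf{B}_k \approx \mathbf{H}^{-1}$ • Broyden’s update: $\mathbf{B}_{k+1} = \mathbf{B}_k + \dfrac{(\Delta\mathbf{x}_k - \mathbf{B}_k\mathbf{y}_k)(\Delta\mathbf{x}_k - \mathbf{B}_k\mathbf{y}_k)^T}{(\Delta\mathbf{x}_k - \mathbf{B}_k\mathbf{y}_k)^T\mathbf{y}_k}$ • Drawback: updates not guaranteed to remain positive definite (possible divergence)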

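Update functions: rank 2 • Notation: $\Delta\mathbf{x}_k = \mathbf{x}_{k+1} - \mathbf{x}_k$, $\mathbf{y}_k = \nabla f_{k+1} - \nabla f_k$, $\mathbf{B}_k \approx \mathbf{H}^{-1}$ • Davidon-Fletcher-Powell (DFP): $\mathbf{B}_{k+1} = \mathbf{B}_k + \dfrac{\Delta\mathbf{x}_k\Delta\mathbf{x}_k^T}{\Delta\mathbf{x}_k^T\mathbf{y}_k} - \dfrac{\mathbf{B}_k\mathbf{y}_k\mathbf{y}_k^T\mathbf{B}_k}{\mathbf{y}_k^T\mathbf{B}_k\mathbf{y}_k}$ • Broyden, Fletcher, Goldfarb, Shanno (BFGS): $\mathbf{B}_{k+1} = \mathbf{B}_k + \left(1 + \dfrac{\mathbf{y}_k^T\mathbf{B}_k\mathbf{y}_k}{\mathbf{y}_k^T\Delta\mathbf{x}_k}\right)\dfrac{\Delta\mathbf{x}_k\Delta\mathbf{x}_k^T}{\mathbf{y}_k^T\Delta\mathbf{x}_k} - \dfrac{\mathbf{B}_k\mathbf{y}_k\Delta\mathbf{x}_k^T + \Delta\mathbf{x}_k\mathbf{y}_k^T\mathbf{B}_k}{\mathbf{y}_k^T\Delta\mathbf{x}_k}$ • Both DFP and BFGS generate conjugate directions on quadratic functions with exact line searches • Rank 2 quasi-Newton methods: best general-purpose unconstrained optimization (hill climbing) methods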
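A sketch of the BFGS update applied to the inverse-Hessian approximation, written in the standard expanded form and checked against the secant condition $\mathbf{B}_{k+1}\mathbf{y}_k = \Delta\mathbf{x}_k$. The symbols `B`, `dx`, `y` and the sample numbers are assumptions for illustration; the notation on the original slide may differ.

```python
def dot(a, b):
    """Inner product of two vectors stored as lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, v) for row in M]


def bfgs_inverse_update(B, dx, y):
    """BFGS update of B ~ H^{-1}; dx = x_{k+1} - x_k, y = grad_{k+1} - grad_k.

    B is assumed symmetric and the curvature y^T dx is assumed positive.
    """
    n = len(dx)
    ydx = dot(y, dx)
    By = matvec(B, y)
    coef = (1.0 + dot(y, By) / ydx) / ydx
    return [[B[i][j] + coef * dx[i] * dx[j]
             - (By[i] * dx[j] + dx[i] * By[j]) / ydx
             for j in range(n)] for i in range(n)]


# Assumed sample data: start from the identity as initial inverse-Hessian guess.
B0 = [[1.0, 0.0], [0.0, 1.0]]
dx0 = [1.0, 0.5]
y0 = [2.0, 0.5]
B1 = bfgs_inverse_update(B0, dx0, y0)
secant_lhs = matvec(B1, y0)  # should reproduce dx0 (secant condition)
```

Because the update acts on the inverse directly, the next search direction is a matrix-vector product `-B1 * grad` rather than a linear solve.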

BFGS example • Minimization of quadratic function: • Starting point: • Line search:

BFGS example (2) • BFGS: • Line search: • Solution found.

BFGS example (3) • BFGS: • Check:

Comparison on Banana function • Nelder-Mead simplex: 210 • Steepest descent: >300 (no convergence) • BFGS: 53 • DFP: 260 • Newton: 18 • Levenberg-Marquardt: 29 • [Figure panels: Steepest descent, BFGS, DFP, Nelder-Mead simplex, Newton, Levenberg-Marquardt]

Summary multi-variable algorithms for unconstrained optimization • Theoretically nothing beats Newton, but: • Expensive computations and storage (Hessian) • Not robust • Quasi-Newton (BFGS) best if gradients available • Zeroth-order methods robust and simple, but inefficient for n > 10 • CG inferior to quasi-Newton methods, but calculations simple: more efficient for n > 1000 • Exact line search important! • Variable scaling can have large impact

Contents • Unconstrained Optimization: Methods for Multiple Variables • 1st order methods: CG • 2nd order methods • Quasi-Newton Methods • Constrained Optimization: Optimality Criteria

Summary optimality conditions • Conditions for local minimum of unconstrained problem: • First Order Necessity Condition: $\nabla f(\mathbf{x}^*) = \mathbf{0}$ • Second Order Sufficiency Condition: H positive definite • For convex f in convex feasible domain: condition for global minimum: • Sufficiency Condition: $\nabla f(\mathbf{x}^*) = \mathbf{0}$
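The two conditions for a local minimum can be checked numerically for a two-variable function, as in this sketch. Positive definiteness is tested with the leading principal minors (Sylvester's criterion); the function names and both test functions are assumptions for illustration.

```python
def satisfies_sufficient_conditions(grad, hess, x, tol=1e-8):
    """True if grad f(x) = 0 (within tol) and the 2x2 Hessian is positive definite."""
    g = grad(x)
    H = hess(x)
    stationary = all(abs(gi) < tol for gi in g)
    # Sylvester's criterion: both leading principal minors must be positive.
    positive_definite = (H[0][0] > 0.0
                         and H[0][0] * H[1][1] - H[0][1] * H[1][0] > 0.0)
    return stationary and positive_definite


# Assumed convex bowl f = x1^2 + x2^2 - x1 x2, with its minimum at the origin.
def bowl_grad(x):
    return [2.0 * x[0] - x[1], 2.0 * x[1] - x[0]]


def bowl_hess(x):
    return [[2.0, -1.0], [-1.0, 2.0]]


# Assumed saddle f = x1^2 - x2^2: stationary at the origin, but not a minimum.
def saddle_grad(x):
    return [2.0 * x[0], -2.0 * x[1]]


def saddle_hess(x):
    return [[2.0, 0.0], [0.0, -2.0]]
```

The saddle passes the first-order test at the origin but fails the second-order test, which is why stationarity alone is only a necessary condition.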

Boundary optima • Today’s topic: optimality conditions for constrained problems • [Figure: interior optimum vs. boundary optima; objective f, constraints g1 and g2, axes x1 and x2]

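Feasible perturbations / directions • Consider feasible space X • Feasible perturbation: $\mathbf{x} + \Delta\mathbf{x} \in X$ • Feasible direction s: line in direction s remains in X for some finite length: $\mathbf{x} + \alpha\mathbf{s} \in X$ for $0 \le \alpha \le \varepsilon$, with $\varepsilon > 0$ • [Figure: feasible space X bounded by g1 and g2; axes x1, x2]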

Boundary optimum • Necessary condition for boundary optimum: f cannot decrease further in any feasible direction (no feasible direction exists for which f decreases) • Approach for numerical algorithms: move along feasible directions until condition holds • [Figure: feasible direction s at a boundary point; constraints g1 and g2, axes x1 and x2]

Equality constrained problem • First, only equality constraints considered: $\min_{\mathbf{x}} f(\mathbf{x})$ subject to $\mathbf{h}(\mathbf{x}) = \mathbf{0}$ • Simplest case! • Active inequality constraints can be treated as equality constraints • Active inequality constraints can be identified by e.g. Monotonicity Analysis

Equality constrained problem (2) • Each (functionally independent) equality constraint reduces the dimension of the problem: n variables, m constraints: problem dimension n - m • Solutions can only exist in the feasible subspace X of dimension n - m (hypersurface) • Examples: • n = 3, m = 2: X = line (1-D) • n = 3, m = 1: X = surface (2-D)

Description of constraint surface • Why? • Local characterization of constraint surface leads to optimality conditions • Basis for numerical algorithms • Assumptions: • Constraints differentiable, functionally independent • All points are regular points: constraint gradients $\nabla h_i$ all linearly independent

Some definitions … • Normal hyperplane: spanned by all gradients $\nabla h_i$, normal to constraint surface • Tangent hyperplane: orthogonal to the normal plane: $\nabla h_i^T(\mathbf{x} - \mathbf{x}^*) = 0$ for $i = 1, \dots, m$

Normal / tangent hyperplane (2) • Example: • [Figure: normal hyperplane and tangent hyperplane (line)]

Optimality conditions • Simplest approach: • Eliminate variables using equality constraints • Result: unconstrained problem of dimension n – m • Apply unconstrained optimality conditions • But often not possible / practical: • Elimination fails when variables cannot be explicitly solved from equality constraints • No closed form of objective function (e.g. simulations)

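Local approximation • Approach: build local approximation for constrained case, using (very small) feasible perturbations: $\partial f = \nabla f^T\partial\mathbf{x}$, $\partial\mathbf{h} = \nabla\mathbf{h}\,\partial\mathbf{x} = \mathbf{0}$ • m + 1 equations, n + 1 unknowns ($\partial f$, $\partial x_i$), where n > m, so n - m = p degrees of freedom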

Local approximation (2) • Divide design variables in two subsets: • p decision/control variables d (independent) • m state/solution variables s (dependent)

Dependent/independent variables • Example:

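Local approximation (3) • Constraint perturbation split over both subsets: $\partial\mathbf{h} = \dfrac{\partial\mathbf{h}}{\partial\mathbf{d}}\partial\mathbf{d} + \dfrac{\partial\mathbf{h}}{\partial\mathbf{s}}\partial\mathbf{s} = \mathbf{0}$ • Eliminating solution variable perturbation (dependent): $\partial\mathbf{s} = -\left(\dfrac{\partial\mathbf{h}}{\partial\mathbf{s}}\right)^{-1}\dfrac{\partial\mathbf{h}}{\partial\mathbf{d}}\,\partial\mathbf{d}$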

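Reduced / constrained gradient • Variation of f expressed in decision variables: $\partial f = \left[\dfrac{\partial f}{\partial\mathbf{d}} - \dfrac{\partial f}{\partial\mathbf{s}}\left(\dfrac{\partial\mathbf{h}}{\partial\mathbf{s}}\right)^{-1}\dfrac{\partial\mathbf{h}}{\partial\mathbf{d}}\right]\partial\mathbf{d}$ • f locally expressed in decision variables only: $z(\mathbf{d}) = f(\mathbf{d}, \mathbf{s}(\mathbf{d}))$ • Reduced gradient: $\dfrac{\partial z}{\partial\mathbf{d}} = \dfrac{\partial f}{\partial\mathbf{d}} - \dfrac{\partial f}{\partial\mathbf{s}}\left(\dfrac{\partial\mathbf{h}}{\partial\mathbf{s}}\right)^{-1}\dfrac{\partial\mathbf{h}}{\partial\mathbf{d}}$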

Optimality condition • z is an unconstrained function of the decision variables d, so the unconstrained optimality condition can be used: $\dfrac{\partial z}{\partial\mathbf{d}} = \mathbf{0}$ • Optimality condition for equality-constrained problem: reduced gradient zero.
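The reduced gradient can be illustrated for one decision variable d and one state variable s. The problem below ($f = d^2 + s^2$ with constraint $h = d\,s - 1 = 0$) and the function names are assumptions chosen so that the result can be verified by direct substitution, $z(d) = d^2 + 1/d^2$.

```python
def reduced_gradient(df_dd, df_ds, dh_dd, dh_ds):
    """dz/dd = df/dd - df/ds * (dh/ds)^-1 * dh/dd, for scalar d and s."""
    return df_dd - df_ds * dh_dd / dh_ds


def example_reduced_gradient(d):
    """Reduced gradient of f = d^2 + s^2 subject to h = d*s - 1 = 0."""
    s = 1.0 / d  # state variable solved from the constraint
    return reduced_gradient(df_dd=2.0 * d, df_ds=2.0 * s, dh_dd=s, dh_ds=d)


rg_at_optimum = example_reduced_gradient(1.0)  # zero: d = s = 1 is a stationary point
rg_elsewhere = example_reduced_gradient(2.0)   # nonzero away from the optimum
```

The value agrees with differentiating $z(d)$ directly, $dz/dd = 2d - 2/d^3$, but the formula only needs partial derivatives of f and h, not a closed form for s(d).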

Example • Equilibrium: • Take • [Figure: mechanical example with labels x1, x2, L, m]

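Lagrange approach • Alternative way to formulate optimality conditions: formulate Lagrangian: $L(\mathbf{x}, \boldsymbol{\lambda}) = f(\mathbf{x}) + \boldsymbol{\lambda}^T\mathbf{h}(\mathbf{x})$, with Lagrange multipliers $\boldsymbol{\lambda}$ • Consider stationary point of L: $\dfrac{\partial L}{\partial\mathbf{x}} = \nabla f + \sum_j\lambda_j\nabla h_j = \mathbf{0}$, $\dfrac{\partial L}{\partial\boldsymbol{\lambda}} = \mathbf{h} = \mathbf{0}$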
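A short worked example of the Lagrange conditions, on an assumed problem (not from the lecture): minimize $f = x_1^2 + x_2^2$ subject to $h = x_1x_2 - 1 = 0$, using the sign convention $L = f + \lambda h$.

```latex
\begin{align*}
L(x_1, x_2, \lambda) &= x_1^2 + x_2^2 + \lambda\,(x_1 x_2 - 1) \\
\frac{\partial L}{\partial x_1} &= 2x_1 + \lambda x_2 = 0 \\
\frac{\partial L}{\partial x_2} &= 2x_2 + \lambda x_1 = 0 \\
\frac{\partial L}{\partial \lambda} &= x_1 x_2 - 1 = 0 \\[4pt]
\text{Multiply the first equation by } x_1 \text{ and the second by } x_2:\quad
2x_1^2 &= -\lambda x_1 x_2 = 2x_2^2 \;\Rightarrow\; x_1^2 = x_2^2 \\
\text{With } x_1 x_2 = 1:\quad
x_1 = x_2 &= \pm 1, \qquad \lambda = -2, \qquad f = 2
\end{align*}
```

The system has n + m = 3 unknowns, and no variable had to be eliminated explicitly from the constraint.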

Example • Equilibrium: • [Figure: mechanical example with labels x1, x2, L, m]

Comparison • Elimination: n - m variables (often not possible) • Reduced gradient: n - m decision variables, m solution variables (total n) • Lagrange approach: n design variables, m multipliers (total m + n)

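Application in mechanics • Lagrangian method widely used to enforce kinematic constraints, e.g. multibody dynamics: • Variational formulation of equations of motion (planar case): $\sum_i\left[\dfrac{d}{dt}\left(\dfrac{\partial T}{\partial\dot{q}_i}\right) - \dfrac{\partial T}{\partial q_i} - Q_i\right]\delta q_i = 0$ • $T$: total kinetic energy • $q_i$: generalized displacement • $Q_i$: generalized forces • 3 equations of motion per body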

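Multibody dynamics (2) • Bodies connected by m joints → 2m constraint equations (1 DOF per joint): $h_j(\mathbf{q}) = 0$, $j = 1, \dots, 2m$ • Constrained equations of motion (based on Lagrangian): $\dfrac{d}{dt}\left(\dfrac{\partial T}{\partial\dot{q}_i}\right) - \dfrac{\partial T}{\partial q_i} - Q_i + \sum_j\lambda_j\dfrac{\partial h_j}{\partial q_i} = 0$ • Multiplier terms: same units as force. Interpretation: force required to satisfy constraint. • System simulation: solve for $q_i$ (3n) and $\lambda_j$ (2m) simultaneously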

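Geometrical interpretation • For single equality constraint: simple geometrical interpretation of Lagrange optimality condition: $\nabla f + \lambda\nabla h = \mathbf{0}$ • Meaning: gradients $\nabla f$ and $\nabla h$ parallel → tangents parallel → h tangent to isolines of f • [Figure: isolines of f and constraint h in the x1-x2 plane, with $\nabla f$ and $\nabla h$ at the optimum]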
