Optimization
Issues • What is optimization? • What real life situations give rise to optimization problems? • When is it easy to optimize? • What are we trying to optimize? • What can cause problems when we try to optimize? • What methods can we use to optimize?
One-Dimensional Minimization • Golden section search • Brent's method
One-Dimensional Minimization • Golden section search: successively narrow a bracket of lower and upper bounds around the minimum • Initial bracketing: start with x1 < x2 < x3 where f2 is smaller than both f1 and f3 • Iteration: choose a probe point x4 inside the larger of the two sub-intervals • Two cases for f4: • f4a > f2: keep the smaller bracket [x1, x2, x4] • f4b < f2: keep the bracket [x2, x4, x3] • Terminating condition: |x3 − x1| < e
From GSL • Lower bound a, upper bound b, initial estimate x • f(a) > f(x) < f(b) • This condition guarantees that a minimum is contained somewhere within the interval • On each iteration a new point x' is selected using one of the available algorithms • If the new point is a better estimate of the minimum, i.e. f(x') < f(x), then the current estimate of the minimum x is updated • The new point also allows the size of the bounded interval to be reduced, by choosing the most compact set of points which satisfies the constraint f(a) > f(x) < f(b) • The interval is reduced until it encloses the true minimum to a desired tolerance • This provides a best estimate of the location of the minimum and a rigorous error estimate
[GSL] Choosing the golden section as the bisection ratio can be shown to provide the fastest convergence for this type of algorithm. Golden Section Search: guaranteed linear convergence, the bracket shrinking by a constant factor each step: [x1,x3]/[x1,x4] = 1.618
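The bracketing step above can be sketched in a few lines of Python. This is a minimal standalone implementation for illustration, not GSL's code; the function name and the test function are assumptions:

```python
# Golden section search: maintain a bracket [x1, x3] with two interior
# probe points; each iteration shrinks the bracket by 1/phi ~ 0.618.
def golden_section_minimize(f, x1, x3, tol=1e-8):
    invphi = (5 ** 0.5 - 1) / 2          # 1/phi ~ 0.618
    x2 = x3 - invphi * (x3 - x1)         # lower interior probe point
    x4 = x1 + invphi * (x3 - x1)         # upper interior probe point
    f2, f4 = f(x2), f(x4)
    while abs(x3 - x1) > tol:            # terminating condition |x3 - x1| < e
        if f2 < f4:                      # minimum lies in [x1, x4]
            x3, x4, f4 = x4, x2, f2
            x2 = x3 - invphi * (x3 - x1)
            f2 = f(x2)
        else:                            # minimum lies in [x2, x3]
            x1, x2, f2 = x2, x4, f4
            x4 = x1 + invphi * (x3 - x1)
            f4 = f(x4)
    return (x1 + x3) / 2

# Example: the minimum of (x - 2)^2 + 1 is at x = 2.
print(golden_section_minimize(lambda x: (x - 2) ** 2 + 1, 0.0, 5.0))
```

Only one new function evaluation is needed per iteration, which is what makes the golden ratio placement efficient.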
Golden Section (reference figure)
Fibonacci Search (related, for reference) • Fibonacci numbers Fi: 0, 1, 1, 2, 3, 5, 8, 13, …
Brent Details (From GSL) • The minimum of the parabola is taken as a guess for the minimum. • If it lies within the bounds of the current interval then the interpolating point is accepted, and used to generate a smaller interval. • If the interpolating point is not accepted then the algorithm falls back to an ordinary golden section step. • The full details of Brent's method include some additional checks to improve convergence.
Brent (details) • The parabolic-interpolation step: the abscissa x that is the minimum of a parabola through three points (a, f(a)), (b, f(b)), (c, f(c))
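The interpolation step above can be sketched as follows. `parabola_min` is a hypothetical helper name, and Brent's safeguards (the fallback to golden section) are omitted:

```python
# Abscissa of the minimum of the parabola through (a, fa), (b, fb), (c, fc):
#   x = b - 1/2 * [(b-a)^2 (fb-fc) - (b-c)^2 (fb-fa)]
#              / [(b-a)   (fb-fc) - (b-c)   (fb-fa)]
def parabola_min(a, fa, b, fb, c, fc):
    p = (b - a) ** 2 * (fb - fc) - (b - c) ** 2 * (fb - fa)
    q = (b - a) * (fb - fc) - (b - c) * (fb - fa)
    return b - 0.5 * p / q       # undefined if the three points are collinear

# For f(x) = (x - 1)^2 the parabola is f itself, so the result is exact: x = 1.
f = lambda x: (x - 1) ** 2
print(parabola_min(0.0, f(0.0), 0.5, f(0.5), 2.0, f(2.0)))  # -> 1.0
```

Near a smooth minimum this converges much faster than pure golden section, which is why Brent's method tries it first.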
Multi-Dimensional Minimization Gradient Descent Conjugate Gradient
Gradient and Hessian • Objective function f: Rn → R, assumed to be of class C2 • Gradient of f: ∇f = (∂f/∂x1, …, ∂f/∂xn)T • Hessian of f: the n×n symmetric matrix of second derivatives, Hij = ∂2f/∂xi∂xj
Optimality • For one-dimensional f(x): f'(x*) = 0 and f''(x*) ≥ 0 at a minimum x* • Taylor's expansion: f(x* + h) ≈ f(x*) + ∇f(x*)Th + ½ hTH(x*)h • Multi-dimensional optimality: ∇f(x*) = 0 and the Hessian H(x*) is positive semi-definite
Multi-Dimensional Optimization • Higher-dimensional root finding is no easier (in fact more difficult) than minimization
Quasi-Newton Method • Taylor series of f(x) around xk: f(xk + Δx) ≈ f(xk) + ∇f(xk)TΔx + ½ ΔxTBΔx • B: an approximation to the Hessian matrix • The gradient of this approximation: ∇f(xk) + BΔx • Setting this gradient to zero gives the Newton step: Δx = −B^-1 ∇f(xk) • The various quasi-Newton methods (DFP, BFGS, Broyden) differ in how they update the approximation B
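The Newton step and the BFGS rank-two update of B can be sketched as follows. This is an assumed minimal example (fixed unit step, no line search), not GSL's `bfgs` implementation; the test function is also an assumption:

```python
import numpy as np

# Quasi-Newton sketch: keep an approximation B of the Hessian, take the
# step dx = -B^{-1} grad, then update B from the pair (s, y) with the
# standard BFGS rank-two formula.
def bfgs(grad, x0, iters=50):
    x = np.asarray(x0, dtype=float)
    B = np.eye(x.size)                   # initial Hessian approximation
    g = grad(x)
    for _ in range(iters):
        s = -np.linalg.solve(B, g)       # quasi-Newton (Newton-like) step
        x_new = x + s
        g_new = grad(x_new)
        y = g_new - g                    # change in gradient
        sy = y @ s
        if sy > 1e-12:                   # curvature guard keeps B positive definite
            Bs = B @ s
            B = B + np.outer(y, y) / sy - np.outer(Bs, Bs) / (s @ Bs)
        x, g = x_new, g_new
        if np.linalg.norm(g) < 1e-10:
            break
    return x

# f(x, y) = (x - 1)^2 + 10 (y + 2)^2 has its minimum at (1, -2).
grad = lambda v: np.array([2 * (v[0] - 1), 20 * (v[1] + 2)])
print(bfgs(grad, [0.0, 0.0]))
```

On a quadratic the secant pairs (s, y) pull B toward the true Hessian, after which the step becomes an exact Newton step.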
Gradient Descent • With an exact line search, are successive descent directions always orthogonal? Yes!
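The orthogonality claim can be checked numerically: an exact line search on a quadratic zeroes d/dα f(x + αd), which forces the new gradient to be orthogonal to the old direction. The matrix and starting point below are assumed examples:

```python
import numpy as np

# Steepest descent on f(x) = 1/2 x^T A x - b^T x with an exact line search.
A = np.array([[3.0, 1.0], [1.0, 2.0]])   # assumed SPD example
b = np.array([1.0, -1.0])
x = np.array([4.0, 4.0])

grad = lambda x: A @ x - b
g = grad(x)
for _ in range(5):
    d = -g                               # steepest-descent direction
    alpha = (d @ d) / (d @ A @ d)        # exact minimizer along d (quadratic case)
    x = x + alpha * d
    g_new = grad(x)
    print(f"g_new . d = {g_new @ d: .2e}")  # ~0: new gradient is orthogonal to d
    g = g_new
```

Since each new direction is the negative of the new gradient, consecutive directions are orthogonal, which produces the zigzag path of steepest descent.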
Example (figure: minimizing a sample function; the path of iterates converges to the minimum)
Gradient is perpendicular to level curves and surfaces (proof)
Weakness of Gradient Descent • A narrow valley forces many short zigzag steps
Any function f(x) can be locally approximated by a quadratic function, f(x) ≈ ½ xTAx − bTx + c, where A is the Hessian at the expansion point and b, c collect the lower-order terms. The conjugate gradient method is a method that works well on this kind of problem.
Conjugate Gradient • An iterative method for solving linear systems Ax=b, where A is symmetric and positive definite • Guaranteed to converge in at most n steps (in exact arithmetic), where n is the system size • Symmetric A is positive definite iff it satisfies any of these equivalent conditions: • All n eigenvalues are positive • All n upper-left determinants (leading principal minors) are positive • All n pivots are positive • xTAx is positive except at x = 0
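Two of the equivalent positive-definiteness tests above can be checked numerically; the matrix here is an assumed example:

```python
import numpy as np

A = np.array([[4.0, 1.0], [1.0, 3.0]])   # assumed SPD example

# 1) all eigenvalues positive
eigs = np.linalg.eigvalsh(A)
print(all(eigs > 0))                     # True

# 2) all upper-left determinants (leading principal minors) positive
minors = [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]
print(all(m > 0 for m in minors))        # True

# In practice, attempting a Cholesky factorization is the cheapest test:
# it succeeds iff the symmetric matrix is positive definite.
np.linalg.cholesky(A)                    # raises LinAlgError if not SPD
```

For large matrices the Cholesky attempt is far cheaper than computing eigenvalues or all minors.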
Details (from Wikipedia) • Two nonzero vectors u and v are conjugate w.r.t. A if uTAv = 0 • {pk}: n mutually conjugate directions; {pk} form a basis of Rn • x*, the solution to Ax=b, can be expressed in this basis: x* = a1p1 + … + anpn • Multiplying Ax* = b by pkT and using conjugacy isolates each coefficient: ak = pkTb / (pkTApk) • Therefore: find the pk's, then solve for the ak's
The Iterative Method • Equivalent problem: find the minimum of the quadratic function f(x) = ½ xTAx − bTx • Take the first basis vector p1 to be the negative gradient of f at x = x0; the remaining basis vectors are built to be conjugate to the previous ones • rk: the residual at the kth step, rk = b − Axk • Note that rk is the negative gradient of f at x = xk
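The iteration above can be sketched as follows; the example system is an assumption:

```python
import numpy as np

# Conjugate gradient for Ax = b, A symmetric positive definite.
# The residual r = b - A x is the negative gradient of
# f(x) = 1/2 x^T A x - b^T x, and each p is conjugate to the previous ones.
def conjugate_gradient(A, b, tol=1e-10):
    x = np.zeros_like(b)
    r = b - A @ x                        # residual = negative gradient
    p = r.copy()                         # first direction: steepest descent
    rs = r @ r
    for _ in range(len(b)):              # at most n steps in exact arithmetic
        Ap = A @ p
        alpha = rs / (p @ Ap)            # exact step length along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if rs_new < tol ** 2:
            break
        p = r + (rs_new / rs) * p        # next direction, conjugate to p
        rs = rs_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])   # assumed SPD example
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b))          # converges in at most n = 2 steps
```

Only one matrix-vector product per iteration is needed, which is why CG scales well to large sparse systems.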
Example (figure): stationary point at [−1/26, −5/26]
Solving Linear Equations • The optimality condition suggests that CG can be used to solve linear equations • CG is only applicable for symmetric positive definite A • For an arbitrary linear system, solve the normal equations ATAx = ATb, since ATA is symmetric and positive semi-definite for any A • But k(ATA) = k(A)^2! Slower convergence, worse accuracy • BiCG (biconjugate gradient) is the approach to use for general A
Multidimensional Minimizers [GSL] • Conjugate gradient: Fletcher-Reeves, Polak-Ribiere • Quasi-Newton: Broyden-Fletcher-Goldfarb-Shanno (BFGS), utilizes a 2nd-order approximation • Steepest descent: inefficient (included for demonstration purposes) • Simplex algorithm (Nelder and Mead): derivative-free
GSL Example • Objective function: a paraboloid • Starting point: (5, 7)
Conjugate gradient: converges in 12 iterations • Steepest descent: converges in 158 iterations
[Solutions in Numerical Recipes] • Sec. 2.7 linbcg (biconjugate gradient): general A; references A implicitly through the routine atimes • Sec. 10.6 frprmn (minimization) • Model test problem: spacetime, …