This lecture covers derivative-based optimization: the structure of optimization problems (objectives, decision variables, constraints); the mathematical background (quadratic forms, positive definiteness, symmetric matrices, derivatives, gradients, the chain rule, and smooth functions); and descent methods such as steepest descent, Newton's method, and the conjugate gradient method.
Computational Intelligence (Computación Inteligente): Derivative-Based Optimization
Contents • Optimization problems • Mathematical background • Descent Methods • The Method of Steepest Descent • Conjugate Gradient
Objective function – the mathematical function that is optimized by changing the values of the design variables. • Design variables – those variables which we, as designers, can change. • Constraints – functions of the design variables that establish limits on individual variables or combinations of design variables.
3 basic ingredients… • an objective function, • a set of decision variables, • a set of equality/inequality constraints. The problem is to search for the values of the decision variables that minimize the objective function while satisfying the constraints…
• Design Variables: the decision vector • Constraints: equality and inequality constraints • Bounds: feasible ranges for the variables • Objective Function: maximization can be converted to minimization due to the duality principle, since max f(x) = −min(−f(x))
1. Identify the quantity or function, f, to be optimized. 2. Identify the design variables: x1, x2, x3, …, xn. 3. Identify the constraints, if any exist: (a) equalities, (b) inequalities. 4. Adjust the design variables (x's) until f is optimized and all of the constraints are satisfied.
Objective functions may be unimodal or multimodal. • Unimodal – only one optimum • Multimodal – more than one optimum • Most search schemes are based on the assumption of a unimodal surface. The optimum determined in such cases is called a local optimum design. • The global optimum is the best of all local optimum designs.
Existence of a global minimum • If f(x) is continuous on a feasible set S that is closed and bounded, then f(x) has a global minimum in S (the Weierstrass extreme value theorem). • A set S is closed if it contains all its boundary points. • A set S is bounded if it is contained in the interior of some circle. Compact = closed and bounded.
[Figure: surface over the (x1, x2) plane illustrating a saddle point and a local maximum]
Derivative-based optimization (gradient based) • Capable of determining "search directions" from an objective function's derivative information • steepest descent method; • Newton's method (Newton–Raphson); • conjugate gradient; etc. • Derivative-free optimization • random search method; • genetic algorithm; • simulated annealing; etc.
The scalar xᵀMx is called a quadratic form. • A square matrix M is positive definite if xᵀMx > 0 for all x ≠ 0. • It is positive semidefinite if xᵀMx ≥ 0 for all x.
A symmetric matrix M = Mᵀ is positive definite if and only if its eigenvalues satisfy λi > 0 (semidefinite ↔ λi ≥ 0). • Proof (⇒): let vi be an eigenvector for the i-th eigenvalue λi. • Then 0 < viᵀMvi = λi viᵀvi = λi‖vi‖², • which implies λi > 0. (Exercise: prove the converse, that positive eigenvalues imply positive definiteness.)
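The equivalence above can be checked numerically. The sketch below (using a sample symmetric matrix chosen for illustration) verifies that all eigenvalues are positive and that the quadratic form xᵀMx is positive on random nonzero vectors:

```python
import numpy as np

# A symmetric matrix with (as it turns out) eigenvalues 1 and 3.
M = np.array([[2.0, -1.0],
              [-1.0, 2.0]])

# eigvalsh computes eigenvalues of a symmetric matrix.
eigenvalues = np.linalg.eigvalsh(M)
is_positive_definite = bool(np.all(eigenvalues > 0))

# Cross-check the definition x^T M x > 0 on random nonzero vectors.
rng = np.random.default_rng(0)
quad_forms = [x @ M @ x for x in rng.normal(size=(1000, 2))]

print(is_positive_definite)   # True
print(min(quad_forms) > 0)    # True
```

Here xᵀMx = x1² + x2² + (x1 − x2)², which is visibly positive for x ≠ 0, matching the eigenvalue test.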
Theorem: if a matrix M = UᵀU with U nonsingular, then M is positive definite. • Proof: let f be defined as f(x) = xᵀMx = xᵀUᵀUx. If we can show that f is positive for all x ≠ 0, then M must be positive definite. Writing b = Ux, we can express this as f(x) = bᵀb = ‖Ux‖² ≥ 0. • Provided that Ux gives a nonzero vector for all values of x except x = 0 (which holds when U is nonsingular), f must be strictly positive for x ≠ 0, so M is positive definite.
f: Rⁿ → R is a quadratic function if f(x) = ½xᵀQx − bᵀx + c, • where Q is symmetric.
It is not necessary for Q to be symmetric: for any square matrix P, xᵀPx = xᵀQx with Q = ½(P + Pᵀ), which is symmetric, since the antisymmetric part ½(P − Pᵀ) contributes zero to the quadratic form. • Example (illustrative): for the non-symmetric matrix P = [[1, 2], [0, 1]], the symmetric Q = ½(P + Pᵀ) = [[1, 1], [1, 1]] gives the same quadratic form xᵀPx = x1² + 2x1x2 + x2² = xᵀQx.
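A quick numerical check of the symmetrization Q = ½(P + Pᵀ). This is a sketch; the non-symmetric matrix P is a hypothetical example, not from the original slides:

```python
import numpy as np

# Hypothetical non-symmetric matrix P, for illustration only.
P = np.array([[1.0, 2.0],
              [0.0, 1.0]])

# Symmetric part: Q = (P + P^T) / 2.
Q = 0.5 * (P + P.T)

# The two quadratic forms agree for every x, because the
# antisymmetric part of P contributes x^T A x = 0.
rng = np.random.default_rng(1)
for x in rng.normal(size=(100, 2)):
    assert np.isclose(x @ P @ x, x @ Q @ x)

print(Q)   # [[1. 1.]
           #  [1. 1.]]
```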
Given the quadratic function f(x) = ½xᵀQx − bᵀx + c: if Q is positive definite, then f is a parabolic "bowl" (it has a unique global minimum).
Two other shapes can result from the quadratic form. • If Q is negative definite, then f is a parabolic "bowl" upside down. • If Q is indefinite, then f describes a saddle.
Quadratics are useful in the study of optimization. • Often, objective functions are “close to” quadratic near the solution. • It is easier to analyze the behavior of algorithms when applied to quadratics. • Analysis of algorithms for quadratics gives insight into their behavior in general.
The derivative of f: R → R is a function f′: R → R given by f′(x) = lim_{h→0} [f(x + h) − f(x)] / h, • if the limit exists.
Definition: A real-valued function f: Rⁿ → R is said to be continuously differentiable if the partial derivatives ∂f/∂x1, …, ∂f/∂xn • exist for each x in Rⁿ and are continuous functions of x. • In this case, we say f ∈ C¹ (f is a smooth function of class C¹).
Definition: The gradient of f: R² → R is a function ∇f: R² → R² given by ∇f(x) = (∂f/∂x1, ∂f/∂x2)ᵀ, a vector in the plane.
Definition: The gradient of f: Rⁿ → R is a function ∇f: Rⁿ → Rⁿ given by ∇f(x) = (∂f/∂x1, …, ∂f/∂xn)ᵀ.
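The definition translates directly into a finite-difference approximation, useful for checking analytic gradients. A minimal sketch (the function f and the helper `grad_fd` are illustrative, not from the slides):

```python
import numpy as np

def grad_fd(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        # i-th partial derivative: (f(x + h e_i) - f(x - h e_i)) / 2h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

# Example: f(x) = x1^2 + 3*x1*x2, gradient (2*x1 + 3*x2, 3*x1).
f = lambda x: x[0]**2 + 3 * x[0] * x[1]
x0 = np.array([1.0, 2.0])

print(grad_fd(f, x0))   # ≈ [8., 3.]
```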
The gradient defines the (hyper)plane that approximates the function infinitesimally: near x0, z = f(x0) + ∇f(x0)ᵀ(x − x0).
Proposition 1: the directional derivative d/dα f(x + αv)|_{α=0} = ∇f(x)ᵀv over unit vectors v is maximal when choosing v = ∇f(x)/‖∇f(x)‖. Intuition: the gradient points in the direction of greatest increase. Prove it!
Proof: • Assign v = ∇f(x)/‖∇f(x)‖. • By the chain rule, d/dα f(x + αv)|_{α=0} = ∇f(x)ᵀv = ∇f(x)ᵀ∇f(x)/‖∇f(x)‖ = ‖∇f(x)‖. • On the other hand, for a general unit vector v, the Cauchy–Schwarz inequality gives ∇f(x)ᵀv ≤ ‖∇f(x)‖‖v‖ = ‖∇f(x)‖, so the gradient direction attains the maximum. ∎
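Proposition 1 is easy to test empirically: sample many random directions and confirm none yields a larger directional derivative than the normalized gradient. A sketch with an illustrative function f(x) = x1² + 2x2² (not from the slides):

```python
import numpy as np

# f(x) = x1^2 + 2*x2^2, gradient (2*x1, 4*x2).
grad = lambda x: np.array([2 * x[0], 4 * x[1]])
x0 = np.array([1.0, 1.0])
g = grad(x0)

def dir_deriv(v):
    """Directional derivative of f at x0 along the unit vector v/||v||."""
    v = v / np.linalg.norm(v)
    return g @ v

# Sample many directions; none beats the gradient direction.
rng = np.random.default_rng(2)
samples = [dir_deriv(rng.normal(size=2)) for _ in range(1000)]
best = dir_deriv(g)   # equals ||grad(x0)|| by Proposition 1

print(np.isclose(best, np.linalg.norm(g)))   # True
print(max(samples) <= best + 1e-9)           # True
```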
Proposition 2: let f: Rⁿ → R be a smooth function (C¹) around p. • If f has a local minimum (maximum) at p, then ∇f(p) = 0. Intuition: a vanishing gradient is necessary for a local min (max).
We found the best infinitesimal direction at each point. • Looking for a minimum is then a "blind man" procedure: feel the slope locally and step accordingly. • How can we derive the way to the minimum using this knowledge?
The derivative of f: Rⁿ → Rᵐ is a function Df: Rⁿ → Rᵐˣⁿ given by the matrix of partial derivatives [Df(x)]ij = ∂fi/∂xj, called the Jacobian. Note that for f: Rⁿ → R, we have ∇f(x) = Df(x)ᵀ.
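The Jacobian can likewise be approximated column by column with finite differences. A sketch, assuming an illustrative map f: R² → R³ (the example function is mine, not from the slides):

```python
import numpy as np

# f: R^2 -> R^3, f(x) = (x1*x2, sin(x1), x2^2).
def f(x):
    return np.array([x[0] * x[1], np.sin(x[0]), x[1] ** 2])

def jacobian_fd(f, x, h=1e-6):
    """Forward-difference Jacobian: column j holds df/dx_j."""
    x = np.asarray(x, dtype=float)
    f0 = f(x)
    J = np.zeros((f0.size, x.size))
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        J[:, j] = (f(x + e) - f0) / h
    return J

x0 = np.array([0.0, 2.0])
# Analytic Jacobian at x0: [[x2, x1], [cos(x1), 0], [0, 2*x2]]
#                        = [[2, 0], [1, 0], [0, 4]].
print(np.round(jacobian_fd(f, x0), 4))
```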
If the derivative of ∇f exists, we say that f is twice differentiable. • Write the second derivative as D²f (or F), and call it the Hessian of f: [D²f(x)]ij = ∂²f/∂xi∂xj.
The level set of a function f: Rⁿ → R at level c is the set of points S = {x : f(x) = c}.
Fact: ∇f(x0) is orthogonal to the level set at x0. Proof: • Imagine a particle traveling along the level set. • Let g(t) be the position of the particle at time t, with g(0) = x0. • Note that f(g(t)) = constant for all t. • The velocity vector g′(t) is tangent to the level set. • Consider F(t) = f(g(t)). We have F′(t) = 0, and by the chain rule, F′(0) = ∇f(g(0))ᵀg′(0) = ∇f(x0)ᵀg′(0) = 0. • Hence, ∇f(x0) and g′(0) are orthogonal. ∎
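The orthogonality fact can be verified on a concrete level set. A sketch using the unit circle, the level set of f(x) = x1² + x2² at c = 1 (this example is mine, chosen for simplicity):

```python
import numpy as np

# Parametrize the level set {f = 1} as g(t) = (cos t, sin t) and
# compare the gradient with the velocity g'(t) at a sample point.
t = 0.7
x0 = np.array([np.cos(t), np.sin(t)])        # point on the level set
grad = 2 * x0                                 # ∇f(x) = (2*x1, 2*x2)
velocity = np.array([-np.sin(t), np.cos(t)])  # g'(t), tangent to circle

# Their inner product vanishes: the gradient is orthogonal
# to the tangent of the level set.
print(np.isclose(grad @ velocity, 0.0))   # True
```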
Suppose f: R → R is in C¹. Then f(x) = f(x0) + f′(x0)(x − x0) + o(x − x0), • where o(h) is a term such that o(h)/h → 0 as h → 0. • At x0, f can be approximated by a linear function, and the approximation gets better the closer we are to x0.
Suppose f: R → R is in C². Then f(x) = f(x0) + f′(x0)(x − x0) + ½f″(x0)(x − x0)² + o((x − x0)²). • At x0, f can be approximated by a quadratic function.
Suppose f: Rⁿ → R. • If f is in C¹, then f(x) = f(x0) + ∇f(x0)ᵀ(x − x0) + o(‖x − x0‖). • If f is in C², then f(x) = f(x0) + ∇f(x0)ᵀ(x − x0) + ½(x − x0)ᵀD²f(x0)(x − x0) + o(‖x − x0‖²).
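The defining property of the o(·) remainder, error/‖x − x0‖ → 0, can be observed numerically. A sketch checking the first-order expansion for an illustrative function (the choice of f and step directions is mine):

```python
import numpy as np

# f(x) = exp(x1) + x1*x2 at x0 = (0, 1); its gradient is
# (exp(x1) + x2, x1), so grad(x0) = (2, 0).
f = lambda x: np.exp(x[0]) + x[0] * x[1]
grad = lambda x: np.array([np.exp(x[0]) + x[1], x[0]])

x0 = np.array([0.0, 1.0])
step_sizes = [1e-1, 1e-2, 1e-3]
errors = []
for h in step_sizes:
    d = h * np.array([1.0, 1.0]) / np.sqrt(2)   # step of norm h
    linear = f(x0) + grad(x0) @ d                # 1st-order Taylor
    errors.append(abs(f(x0 + d) - linear))

# error/h shrinks with h: the remainder is o(||x - x0||).
print([e / h for e, h in zip(errors, step_sizes)])
```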
We already know that ∇f(x0) is orthogonal to the level set at x0. • Suppose ∇f(x0) ≠ 0. • Fact: ∇f points in the direction of increasing f.
Consider xα = x0 + α∇f(x0), α > 0. • By Taylor's formula, f(xα) = f(x0) + α‖∇f(x0)‖² + o(α). • Therefore, for sufficiently small α > 0, f(xα) > f(x0).
This theorem is the link from the previous gradient properties to a constructive algorithm. • The problem: minimize f(x) over x ∈ Rⁿ.
We introduce a model for the algorithm: • Data: an initial point x0. • Step 0: set i = 0. • Step 1: if ∇f(xi) = 0, stop; else, compute a search direction hi. • Step 2: compute the step-size αi (for example, by minimizing f(xi + αhi) over α > 0). • Step 3: set x_{i+1} = xi + αi hi, i ← i + 1, and go to step 1.
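The model above can be sketched as a steepest-descent loop (hi = −∇f(xi)) on a quadratic objective, where the exact line-search step is available in closed form. This is a minimal illustration, assuming f(x) = ½xᵀQx − bᵀx with an example positive definite Q; it is not the slides' own implementation:

```python
import numpy as np

# For f(x) = 1/2 x^T Q x - b^T x, the gradient is g = Q x - b and
# exact line search gives alpha = (g^T g) / (g^T Q g).
Q = np.array([[4.0, 1.0],
              [1.0, 3.0]])       # symmetric positive definite (example)
b = np.array([1.0, 2.0])

x = np.zeros(2)                   # Step 0: initial guess, i = 0
for i in range(100):
    g = Q @ x - b                 # Step 1: gradient; h_i = -g
    if np.linalg.norm(g) < 1e-10: # stopping test (gradient ≈ 0)
        break
    alpha = (g @ g) / (g @ Q @ g) # Step 2: exact step-size
    x = x - alpha * g             # Step 3: update and repeat

print(np.round(x, 6))             # ≈ the solution of Q x = b
print(np.allclose(Q @ x, b))      # True
```

At the minimizer the gradient Qx − b vanishes, so the loop converges to the solution of the linear system Qx = b; this is exactly why quadratics are the standard test case for descent methods.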