Learn about gradient descent, conjugate gradient methods, directional derivatives, and the Wolfe theorem for optimization problems. Understand how to find local minima and implement modern optimization techniques.
Gradient Methods Yaron Lipman May 2003
Preview • Background • Steepest Descent • Conjugate Gradient
Background • Motivation • The gradient notion • The Wolfe Theorem
Motivation • The min(max) problem: $\min_{x\in\mathbb{R}^n} f(x)$ • But we learned in calculus how to solve that kind of problem!
Motivation • Not exactly. • Functions: $f:\mathbb{R}^n\to\mathbb{R}$ • High-order polynomials: setting the derivative to zero gives equations with no closed-form solution • What about functions that don't have an analytic representation: a "black box"
Motivation • A "real world" problem: finding a harmonic mapping • General problem: find the global min(max) of $f$ • This lecture will concentrate on finding a local minimum.
Background • Motivation • The gradient notion • The Wolfe Theorem
Directional Derivatives • First, the one-dimensional derivative: $f'(x)=\lim_{h\to 0}\frac{f(x+h)-f(x)}{h}$
Directional Derivatives • In a general direction: for a unit vector $v\in\mathbb{R}^n$ and $f:\mathbb{R}^n\to\mathbb{R}$, $D_v f(p)=\lim_{h\to 0}\frac{f(p+hv)-f(p)}{h}$
The Gradient • Definition in $\mathbb{R}^n$: $\nabla f(p)=\left(\frac{\partial f}{\partial x_1}(p),\ldots,\frac{\partial f}{\partial x_n}(p)\right)$ • In the plane: $\nabla f(x,y)=\left(\frac{\partial f}{\partial x},\frac{\partial f}{\partial y}\right)$
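As a quick worked example (illustrative, not from the original slides), take $f(x,y)=x^2+3xy$:

```latex
% Worked example (illustrative): the gradient of f(x,y) = x^2 + 3xy
\nabla f(x,y)
  = \left( \frac{\partial f}{\partial x},\ \frac{\partial f}{\partial y} \right)
  = \left( 2x + 3y,\ 3x \right),
\qquad
\nabla f(1,2) = (8,\ 3).
```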
The Gradient Properties • The gradient defines the (hyper)plane approximating the function infinitesimally: $z = f(p) + \langle \nabla f(p),\, x - p \rangle$ (the tangent plane at $p$)
The Gradient Properties • By the chain rule (important for later use): $D_v f(p) = \langle \nabla f(p),\, v \rangle$
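A minimal numerical sanity check of this identity, reusing the worked example above (the point $p$ and direction $v$ are illustrative choices):

```python
import numpy as np

# Check D_v f(p) = <grad f(p), v> for f(x, y) = x^2 + 3*x*y.
def f(p):
    x, y = p
    return x**2 + 3*x*y

def grad_f(p):
    x, y = p
    return np.array([2*x + 3*y, 3*x])

p = np.array([1.0, 2.0])
v = np.array([3.0, 4.0])
v = v / np.linalg.norm(v)             # unit direction

h = 1e-6
numeric = (f(p + h*v) - f(p)) / h     # limit definition with a small h
analytic = grad_f(p) @ v              # chain-rule (gradient) form
print(numeric, analytic)              # both are approximately 7.2
```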
The Gradient Properties • Proposition 1: $D_v f(p)$ is maximal choosing $v=\frac{\nabla f(p)}{\|\nabla f(p)\|}$ and minimal choosing $v=-\frac{\nabla f(p)}{\|\nabla f(p)\|}$ (intuitively: the gradient points in the direction of greatest change)
The Gradient Properties • Proof (only for the minimum case): assign $v=-\frac{\nabla f(p)}{\|\nabla f(p)\|}$; by the chain rule: $D_v f(p)=\left\langle \nabla f(p),\, -\frac{\nabla f(p)}{\|\nabla f(p)\|}\right\rangle=-\|\nabla f(p)\|$
The Gradient Properties • On the other hand, for a general unit vector $v$, the Cauchy-Schwarz inequality gives $D_v f(p)=\langle\nabla f(p),\,v\rangle \ge -\|\nabla f(p)\|\,\|v\| = -\|\nabla f(p)\|$, so the choice above attains the minimum.
The Gradient Properties • Proposition 2: let $f$ be a smooth function around $p$; if $f$ has a local minimum (maximum) at $p$, then $\nabla f(p)=0$ (intuitively: a necessary condition for a local min(max))
The Gradient Properties • Proof, intuitively: at a local minimum no direction can point "downhill", so every directional derivative must be zero.
The Gradient Properties • Formally: for any unit vector $v$, the function $g(t)=f(p+tv)$ has a local minimum at $t=0$, so $0=g'(0)=\langle\nabla f(p),\,v\rangle$. We get: $\nabla f(p)=0$
The Gradient Properties • We found the best INFINITESIMAL DIRECTION at each point • Looking for a minimum: a "blind man" procedure (feel the local slope, step downhill, repeat) • How can we derive the way to the minimum using this knowledge?
Background • Motivation • The gradient notion • The Wolfe Theorem
The Wolfe Theorem • This is the link from the previous gradient properties to a constructive algorithm. • The problem: $\min_{x\in\mathbb{R}^n} f(x)$
The Wolfe Theorem • We introduce a model for the algorithm: Data: $x_0\in\mathbb{R}^n$. Step 0: set $i=0$. Step 1: if $\nabla f(x_i)=0$, stop; else, compute the search direction $h_i$. Step 2: compute the step size $\lambda_i$ such that $f(x_i+\lambda_i h_i)=\min_{\lambda\ge 0} f(x_i+\lambda h_i)$. Step 3: set $x_{i+1}=x_i+\lambda_i h_i$, $i=i+1$; go to Step 1.
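A minimal Python sketch of this model algorithm (all names are illustrative, and a tolerance test stands in for the exact stopping condition $\nabla f(x_i)=0$; the search direction and step-size rules are passed in as functions):

```python
import numpy as np

def model_algorithm(f, grad, direction, step_size, x0, tol=1e-8, max_iter=1000):
    """Generic descent loop following the model algorithm above."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:   # Step 1: stop near a critical point
            break
        h = direction(x, g)           # Step 1: search direction h_i
        lam = step_size(f, x, h)      # Step 2: step size lambda_i
        x = x + lam * h               # Step 3: x_{i+1} = x_i + lambda_i * h_i
    return x
```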
The Wolfe Theorem • The theorem: suppose $f:\mathbb{R}^n\to\mathbb{R}$ is $C^1$ smooth, and there exists a continuous function $k:\mathbb{R}^n\to[0,1]$ with $k(x)>0$ whenever $\nabla f(x)\ne 0$, and the search vectors constructed by the model algorithm satisfy: $\langle \nabla f(x_i),\,h_i\rangle \le -k(x_i)\,\|\nabla f(x_i)\|\,\|h_i\|$
The Wolfe Theorem • And $h_i=0$ only if $\nabla f(x_i)=0$. Then, if $\{x_i\}$ is the sequence constructed by the model algorithm, any accumulation point $y$ of this sequence satisfies: $\nabla f(y)=0$
The Wolfe Theorem • The theorem has a very intuitive interpretation: always go in a descent direction, one whose angle with $-\nabla f(x_i)$ stays bounded away from $90°$.
Preview • Background • Steepest Descent • Conjugate Gradient
Steepest Descent • What does it mean? • We now use what we have learned to implement the most basic minimization technique. • First we introduce the algorithm, which is a version of the model algorithm. • The problem: $\min_{x\in\mathbb{R}^n} f(x)$
Steepest Descent • The steepest descent algorithm: Data: $x_0\in\mathbb{R}^n$. Step 0: set $i=0$. Step 1: if $\nabla f(x_i)=0$, stop; else, compute the search direction $h_i=-\nabla f(x_i)$. Step 2: compute the step size $\lambda_i$ such that $f(x_i+\lambda_i h_i)=\min_{\lambda\ge 0} f(x_i+\lambda h_i)$. Step 3: set $x_{i+1}=x_i+\lambda_i h_i$, $i=i+1$; go to Step 1.
Steepest Descent • Theorem: if $\{x_i\}$ is a sequence constructed by the SD algorithm, then every accumulation point $y$ of the sequence satisfies: $\nabla f(y)=0$ • Proof: from the Wolfe theorem, taking $k(x)\equiv 1$, since $\langle\nabla f(x_i),\,h_i\rangle=-\|\nabla f(x_i)\|\,\|h_i\|$ for $h_i=-\nabla f(x_i)$.
Steepest Descent • From the chain rule, at the optimal step size: $\frac{d}{d\lambda} f(x_i+\lambda h_i)\Big|_{\lambda=\lambda_i} = \langle \nabla f(x_{i+1}),\,h_i\rangle = 0$, so consecutive search directions are orthogonal. • Therefore the method of steepest descent follows a zig-zag path of mutually orthogonal steps toward the minimum.
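Steepest descent is the model algorithm with $h_i=-\nabla f(x_i)$ and an exact line search. A sketch reusing the model_algorithm skeleton above, with scipy's minimize_scalar standing in for the exact one-dimensional minimization and a hypothetical ill-scaled quadratic as the test function:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def sd_direction(x, g):
    return -g                          # h_i = -grad f(x_i)

def exact_line_search(f, x, h):
    # Minimize f(x + lam*h) over lam >= 0, numerically on a bounded interval.
    res = minimize_scalar(lambda lam: f(x + lam * h),
                          bounds=(0.0, 1e3), method='bounded')
    return res.x

# Demo on f(x) = x1^2 + 10*x2^2 (illustrative):
f = lambda x: x[0]**2 + 10.0*x[1]**2
grad = lambda x: np.array([2.0*x[0], 20.0*x[1]])
print(model_algorithm(f, grad, sd_direction, exact_line_search, x0=[5.0, 5.0]))
# approaches (0, 0) along the characteristic zig-zag path
```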
Steepest Descent • Steepest descent finds critical points, hence candidate local minima. • The step-size rule is implicit: each iteration itself requires minimizing the one-dimensional function $\lambda \mapsto f(x_i+\lambda h_i)$. • There are extensions that give the step-size rule in a discrete sense (Armijo); see the sketch below.
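A sketch of the Armijo (backtracking) rule in its standard textbook form (the parameter names sigma and beta are illustrative): shrink the step geometrically until a sufficient-decrease condition holds, avoiding the exact one-dimensional minimization at every step.

```python
def armijo_step(f, g, x, h, sigma=1e-4, beta=0.5, lam=1.0, max_backtracks=50):
    """Shrink lam until f(x + lam*h) <= f(x) + sigma*lam*<g, h>,
    where g = grad f(x) and h is a descent direction (so <g, h> < 0)."""
    fx = f(x)
    slope = g @ h        # directional derivative along h; negative for descent
    for _ in range(max_backtracks):
        if f(x + lam * h) <= fx + sigma * lam * slope:
            break
        lam *= beta
    return lam
```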
Preview • Background • Steepest Descent • Conjugate Gradient
Conjugate Gradient • Modern optimization methods: "conjugate direction" methods. • A method to solve quadratic function minimization: $\min_{x\in\mathbb{R}^n} f(x)=\tfrac{1}{2}\,x^\top H x - b^\top x$ ($H$ is symmetric and positive definite)
Conjugate Gradient • Originally aimed at solving linear problems: $Hx=b$ (the gradient $\nabla f(x)=Hx-b$ of the quadratic above vanishes exactly at the solution). • Later extended to general functions, under the rationale that a quadratic approximation to a function near a minimum is quite accurate.
Conjugate Gradient • The basic idea: decompose the $n$-dimensional quadratic problem into $n$ problems of one dimension. • This is done by exploring the function in "conjugate directions". • Definition: vectors $h_1,\ldots,h_n$ are H-conjugate if $h_i^\top H\, h_j = 0$ for all $i\ne j$.
Conjugate Gradient • If there is an H-conjugate basis, then writing $x=x_0+\sum_{j=1}^{n}\lambda_j h_j$ makes the cross terms $\lambda_i \lambda_j\, h_i^\top H h_j$ ($i\ne j$) vanish, leaving $n$ independent problems in one dimension (each a simple "smiling" quadratic in its $\lambda_j$). • The global minimizer is calculated sequentially starting from $x_0$: $x_{j+1}=x_j+\lambda_j h_j$; a minimal implementation sketch follows.
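A minimal sketch of the resulting conjugate gradient iteration for $f(x)=\tfrac12 x^\top Hx - b^\top x$ (equivalently, solving $Hx=b$), using the standard textbook update formulas; all names are illustrative:

```python
import numpy as np

def conjugate_gradient(H, b, x0, tol=1e-10):
    """Minimize 0.5*x^T H x - b^T x for symmetric positive definite H
    (i.e. solve H x = b) in at most n = len(b) steps in exact arithmetic."""
    x = np.asarray(x0, dtype=float)
    r = b - H @ x                      # residual = -grad f(x)
    d = r.copy()                       # first direction: steepest descent
    for _ in range(len(b)):
        if np.linalg.norm(r) < tol:
            break
        Hd = H @ d
        alpha = (r @ r) / (d @ Hd)     # exact 1-D minimization along d
        x = x + alpha * d
        r_new = r - alpha * Hd
        beta = (r_new @ r_new) / (r @ r)
        d = r_new + beta * d           # new direction is H-conjugate to d
        r = r_new
    return x

# Demo on an illustrative 2x2 system:
H = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(H, b, x0=np.zeros(2)))   # approx [0.0909, 0.6364]
```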