Gradient Methods May 2005
Preview • Background • Steepest Descent • Conjugate Gradient
Background • Motivation • The gradient notion • The Wolfe Theorems
Motivation • The min (max) problem: find the x that minimizes (maximizes) f(x). • But we learned in calculus how to solve that kind of question!
Motivation • Not exactly. • Functions: • High-order polynomials: • What about functions that don't have an analytic representation: a "Black Box"?
Motivation- “real world” problem • Connectivity shapes (Isenburg, Gumhold, Gotsman) • What do we get from the connectivity C alone, without the geometry?
Motivation- “real world” problem • First we introduce error functionals and then try to minimize them:
Motivation- “real world” problem • Then we minimize: • A high-dimensional, non-linear problem. • The authors use the conjugate gradient method, which is perhaps the most popular of the optimization techniques we will see here.
Motivation- “real world” problem • Changing the parameter:
Motivation • General problem: find a global min (max). • This lecture will concentrate on finding a local minimum.
Background • Motivation • The gradient notion • The Wolfe Theorems
Directional Derivatives: first, the one-dimensional derivative:
Directional Derivatives: in a general direction…
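The defining formulas were images on the original slides; a standard reconstruction (for a point p and a unit direction vector v) is:

f'(x) = \lim_{t \to 0} \frac{f(x+t)-f(x)}{t},
\qquad
\frac{\partial f}{\partial v}(p) = \lim_{t \to 0} \frac{f(p+tv)-f(p)}{t}.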
The Gradient • Definition in ℝⁿ: • In the plane:
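The definition itself was an image; the standard formula being referenced is:

\nabla f(p) = \left(\frac{\partial f}{\partial x_1}(p), \ldots, \frac{\partial f}{\partial x_n}(p)\right)
\quad\text{in } \mathbb{R}^n,
\qquad
\nabla f(x,y) = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right)
\quad\text{in the plane.}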
The Gradient Properties • The gradient defines a (hyper)plane approximating the function infinitesimally.
The Gradient properties • By the chain rule (important for later use):
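The identity shown on the slide is presumably the usual composition rule, for a smooth curve x(t):

\frac{d}{dt} f\bigl(x(t)\bigr) = \nabla f\bigl(x(t)\bigr)\cdot x'(t),
\quad\text{and in particular}\quad
\frac{\partial f}{\partial v}(p) = \nabla f(p)\cdot v .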
The Gradient properties • Proposition 1: the directional derivative is maximal when v is chosen along the gradient and minimal when v is chosen along minus the gradient (intuitive: the gradient points in the direction of greatest change).
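With the missing formulas filled in (for unit vectors v), the claim is:

\frac{\partial f}{\partial v}(p)\ \text{is maximal for}\ v=\frac{\nabla f(p)}{\|\nabla f(p)\|},
\qquad
\text{minimal for}\ v=-\frac{\nabla f(p)}{\|\nabla f(p)\|}.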
The Gradient properties Proof (only for the minimum case): assign v = −∇f(p) / ‖∇f(p)‖ and compute the directional derivative by the chain rule.
The Gradient properties On the other hand, for a general unit vector u, the Cauchy–Schwarz inequality gives a lower bound by the same value:
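The computations were images on the slides; a standard reconstruction of the two steps is:

\frac{\partial f}{\partial v}(p) = \nabla f(p)\cdot\left(-\frac{\nabla f(p)}{\|\nabla f(p)\|}\right) = -\|\nabla f(p)\|,
\qquad
\frac{\partial f}{\partial u}(p) = \nabla f(p)\cdot u \ \ge\ -\|\nabla f(p)\|\,\|u\| = -\|\nabla f(p)\|

for any unit vector u, so the chosen v attains the minimum.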
The Gradient Properties • Proposition 2: let f be a smooth function around p; if f has a local minimum (maximum) at p, then ∇f(p) = 0. (Intuitive: this is a necessary condition for a local min (max).)
The Gradient Properties Proof: intuitive: at a local minimum there is no direction in which f decreases.
The Gradient Properties Formally: for any direction v, the function t ↦ f(p + tv) has a local minimum (maximum) at t = 0, so its derivative there vanishes. We get:
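The concluding formula was an image; the standard ending of the argument is:

0 = \frac{d}{dt}\Big|_{t=0} f(p+tv) = \nabla f(p)\cdot v \quad\text{for every } v
\qquad\Longrightarrow\qquad
\nabla f(p) = 0 .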
The Gradient Properties • We found the best INFINITESIMAL DIRECTION at each point. • Looking for a minimum: a "blind man" procedure. • How can we derive the way to the minimum using this knowledge?
Background • Motivation • The gradient notion • The Wolfe Theorems
The Wolfe Theorem • This is the link from the previous gradient properties to a constructive algorithm. • The problem: minimize f(x) for a smooth f: ℝⁿ → ℝ.
The Wolfe Theorem • We introduce a model algorithm: Data: a starting point x₀. Step 0: set i = 0. Step 1: if ∇f(xᵢ) = 0 stop; else, compute a search direction hᵢ. Step 2: compute the step size λᵢ. Step 3: set xᵢ₊₁ = xᵢ + λᵢhᵢ and go to Step 1.
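A minimal sketch of this model algorithm in Python; the callables gradient, search_direction and step_size are illustrative placeholders, not names from the original slides:

import numpy as np

def model_algorithm(x0, gradient, search_direction, step_size, tol=1e-8, max_iter=1000):
    # Generic descent scheme: x_{i+1} = x_i + lambda_i * h_i.
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = gradient(x)
        if np.linalg.norm(g) < tol:       # Step 1: stop at a (numerical) critical point
            break
        h = search_direction(x, g)        # Step 1: compute a search direction
        lam = step_size(x, h)             # Step 2: compute a step size
        x = x + lam * h                    # Step 3: update and repeat
    return x

Steepest descent and conjugate gradient below are obtained by plugging specific direction and step-size rules into this skeleton.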
The Wolfe Theorem The theorem: suppose f is C¹-smooth, the search directions and step sizes are produced by continuous functions, and the search vectors constructed by the model algorithm satisfy a descent condition with respect to the gradient.
The Wolfe Theorem Then, if {xᵢ} is the sequence constructed by the model algorithm, any accumulation point y of this sequence satisfies ∇f(y) = 0.
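A hedged sketch of what the missing formulas likely expressed: the core hypothesis is a descent condition on the search vectors, and the conclusion is stationarity of accumulation points (the full statement also constrains the step-size rule):

\nabla f(x_i)^T h_i < 0 \ \text{ whenever } \nabla f(x_i)\neq 0
\qquad\Longrightarrow\qquad
\nabla f(y) = 0 \ \text{ for every accumulation point } y \text{ of } \{x_i\}.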
The Wolfe Theorem The theorem has a very intuitive interpretation: always move in a descent direction.
Preview • Background • Steepest Descent • Conjugate Gradient
Steepest Descent • What does it mean? • We now use what we have learned to implement the most basic minimization technique. • First we introduce the algorithm, which is a version of the model algorithm. • The problem: minimize f(x).
Steepest Descent • Steepest descent algorithm: Data: a starting point x₀. Step 0: set i = 0. Step 1: if ∇f(xᵢ) = 0 stop; else, compute the search direction hᵢ = −∇f(xᵢ). Step 2: compute the step size λᵢ that minimizes f(xᵢ + λhᵢ). Step 3: set xᵢ₊₁ = xᵢ + λᵢhᵢ and go to Step 1.
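A minimal runnable sketch for the quadratic case f(x) = ½xᵀAx − bᵀx, where exact line search has the closed form λ = gᵀg / (gᵀAg); the function name is illustrative, not from the slides:

import numpy as np

def steepest_descent_quadratic(A, b, x0, tol=1e-10, max_iter=10000):
    # Minimize f(x) = 0.5*x^T A x - b^T x for symmetric positive definite A.
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = A @ x - b                     # gradient of f at x
        if np.linalg.norm(g) < tol:       # stop at a numerical critical point
            break
        lam = (g @ g) / (g @ (A @ g))     # exact line search along h = -g
        x = x - lam * g                    # steepest-descent update
    return x

# Example: minimizing f is the same as solving A x = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
print(steepest_descent_quadratic(A, b, np.zeros(2)))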
Steepest Descent • Theorem: if {xᵢ} is a sequence constructed by the SD algorithm, then every accumulation point y of the sequence satisfies ∇f(y) = 0. • Proof: immediate from the Wolfe theorem. • Remark: the Wolfe theorem also gives us numerical stability if the derivatives aren't given exactly (are calculated numerically).
Steepest Descent • From the chain rule, at the chosen step size the derivative of f along the search direction is zero, so consecutive search directions are orthogonal. • Therefore the path of the method of steepest descent looks like this (a zigzag of orthogonal consecutive steps):
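The chain-rule computation the slide refers to is presumably the standard one: at the optimal step size the derivative along the search direction vanishes,

0 = \frac{d}{d\lambda} f(x_i + \lambda h_i)\Big|_{\lambda=\lambda_i}
  = \nabla f(x_{i+1})^T h_i ,

so with hᵢ = −∇f(xᵢ) each new gradient is orthogonal to the previous one, which produces the zigzag path.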
Steepest Descent • Steepest descent finds a critical point, a local minimum. • The step-size rule is implicit: at each step we actually reduced the problem to finding the minimum of a one-dimensional function of λ. • There are extensions that give the step-size rule in a discrete sense (Armijo), sketched below.
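A sketch of the Armijo (backtracking) rule mentioned above, in its usual form; the constants c and β are illustrative choices, not values from the slides:

\text{accept the first } \lambda \in \{\lambda_0,\ \beta\lambda_0,\ \beta^2\lambda_0,\ \dots\}
\ \text{ such that }\
f(x_i + \lambda h_i) \le f(x_i) + c\,\lambda\,\nabla f(x_i)^T h_i,
\qquad 0 < c < 1,\ 0 < \beta < 1.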
Steepest Descent • Back to our connectivity shapes: the authors solve the one-dimensional problem analytically. • They change the spring energy and get a quartic polynomial in x.
Preview • Background • Steepest Descent • Conjugate Gradient
Conjugate Gradient • From now on we assume we want to minimize the quadratic function f(x) = ½xᵀAx − bᵀx. • This is equivalent to solving the linear system Ax = b. • There are generalizations to general functions.
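With the formulas filled in (assuming the standard quadratic form used with conjugate gradient, A symmetric positive definite):

f(x) = \tfrac{1}{2}x^T A x - b^T x,
\qquad
\nabla f(x) = Ax - b,
\qquad
\nabla f(x) = 0 \iff Ax = b .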
Conjugate Gradient • What is the problem with steepest descent? • We can repeat the same directions over and over… • Conjugate gradient takes at most n steps.
Conjugate Gradient Search directions — they should span ℝⁿ.
Conjugate Gradient Given a search direction, how do we calculate the step size? (as before: by exact line search along that direction)
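For the quadratic f the exact line-search step has a closed form; writing rᵢ = b − Axᵢ for the residual and pᵢ for the search direction (my notation — the slide's symbols were images):

\alpha_i = \frac{r_i^T p_i}{p_i^T A p_i},
\qquad
x_{i+1} = x_i + \alpha_i p_i .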
Conjugate Gradient How do we find the search directions? We want the error to be 0 after n steps:
Conjugate Gradient Here is an idea: if the search directions behaved like an orthogonal basis, the initial error could be expanded in them; so if each step removed exactly one component of that expansion, the error would vanish after n steps.
Conjugate Gradient So we look for search directions such that each step cancels one component of the error. A simple calculation shows that this works if we take the directions to be A-conjugate (A-orthogonal), i.e. pᵢᵀApⱼ = 0 for i ≠ j.
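A minimal sketch of the resulting conjugate gradient iteration for Ax = b, in the standard textbook form; variable names are mine, not from the slides:

import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10):
    # Solve A x = b for symmetric positive definite A;
    # in exact arithmetic this terminates in at most n steps.
    n = len(b)
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float)
    r = b - A @ x                       # residual (= minus the gradient of f)
    p = r.copy()                        # first direction: steepest descent
    for _ in range(n):
        if np.linalg.norm(r) < tol:
            break
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)      # exact line search along p
        x = x + alpha * p
        r_new = r - alpha * Ap          # updated residual
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p            # next direction, A-conjugate to the previous ones
        r = r_new
    return x

# Example usage: agrees with np.linalg.solve(A, b).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
print(conjugate_gradient(A, b))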