
Gradient Methods


Presentation Transcript


  1. Gradient Methods May 2005

  2. Preview • Background • Steepest Descent • Conjugate Gradient

  3. Preview • Background • Steepest Descent • Conjugate Gradient

  4. Background • Motivation • The gradient notion • The Wolfe Theorems

  5. Motivation • The min(max) problem: find the minimum (or maximum) of f(x) over x ∈ R^n • But we learned in calculus how to solve that kind of question!

  6. Motivation • Not exactly: • Functions of several variables: f : R^n → R • High-order polynomials, whose critical points have no closed-form expression • And what about functions that don't have an analytic representation at all: a “black box”?

  7. Motivation - “real world” problem • Connectivity shapes (Isenburg, Gumhold, Gotsman) • What do we get from the connectivity C alone, without the geometry?

  8. Motivation- “real world” problem • First we introduce error functionals and then try to minimize them:

  9. Motivation - “real world” problem • Then we minimize the total error: • A high-dimensional, non-linear problem. • The authors use the conjugate gradient method, which is perhaps the most popular optimization technique based on the ideas we'll see here.

  10. Motivation- “real world” problem • Changing the parameter:

  11. Motivation • General problem: find a global min (max) • This lecture will concentrate on finding a local minimum.

  12. Background • Motivation • The gradient notion • The Wolfe Theorems

  13. Directional Derivatives: first, the one-dimensional derivative: f'(x) = lim_{h→0} (f(x + h) − f(x)) / h

  14. Directional Derivatives: along the axes, these are the partial derivatives ∂f/∂x_i (the rate of change of f along the i-th coordinate direction).

  15. Directional Derivatives: in a general direction, for a unit vector v: D_v f(p) = lim_{t→0} (f(p + t·v) − f(p)) / t

  16. Directional Derivatives

  17. The Gradient: Definition in the plane: for f(x, y), ∇f = (∂f/∂x, ∂f/∂y)

  18. The Gradient: Definition in R^n: for f : R^n → R, ∇f = (∂f/∂x_1, …, ∂f/∂x_n), and the directional derivative becomes D_v f = ∇f · v
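
The gradient of a “black box” function can also be approximated numerically. Below is a minimal Python sketch (my own illustration, not part of the original slides) that estimates ∇f by central finite differences; the function name and the test function are hypothetical examples.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return grad

# Example: f(x, y) = x^2 + 3*y^2, whose gradient is (2x, 6y).
f = lambda x: x[0]**2 + 3.0 * x[1]**2
print(numerical_gradient(f, [1.0, 2.0]))   # approximately [2., 12.]
```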

  19. The Gradient Properties • The gradient defines the (hyper)plane approximating the function infinitesimally: f(x + dx) ≈ f(x) + ∇f(x) · dx

  20. The Gradient properties • By the chain rule (important for later use): d/dt f(x(t)) = ∇f(x(t)) · x'(t)

  21. The Gradient properties • Proposition 1: the directional derivative D_v f is maximal when choosing v = ∇f/‖∇f‖ and minimal when choosing v = −∇f/‖∇f‖ (intuitive: the gradient points in the direction of greatest change)

  22. The Gradient properties Proof (only for the minimum case): assign v = −∇f(p)/‖∇f(p)‖; then by the chain rule, D_v f(p) = ∇f(p) · v = −‖∇f(p)‖

  23. The Gradient properties On the other hand, for a general unit vector v, the Cauchy–Schwarz inequality gives D_v f(p) = ∇f(p) · v ≥ −‖∇f(p)‖·‖v‖ = −‖∇f(p)‖, so the choice above is indeed minimal.
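
As a quick sanity check of Proposition 1, the following sketch (my own, not from the slides) scans unit vectors v in the plane and confirms numerically that ∇f · v is largest along ∇f/‖∇f‖ and smallest along −∇f/‖∇f‖; the gradient value used is that of f(x, y) = x^2 + 3y^2 at (1, 2).

```python
import numpy as np

grad = np.array([2.0, 12.0])                               # gradient of x^2 + 3y^2 at (1, 2)
angles = np.linspace(0.0, 2.0 * np.pi, 3600, endpoint=False)
dirs = np.column_stack([np.cos(angles), np.sin(angles)])   # unit vectors v
dd = dirs @ grad                                           # directional derivatives grad . v
print(dirs[np.argmax(dd)], grad / np.linalg.norm(grad))    # nearly the same vector
print(dirs[np.argmin(dd)], -grad / np.linalg.norm(grad))   # nearly the same vector
```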

  24. The Gradient Properties • Proposition 2: let f be a smooth function around p; if f has a local minimum (maximum) at p, then ∇f(p) = 0 (intuitive: a necessary condition for a local min (max))

  25. The Gradient Properties Proof. Intuitively: at a local minimum no direction can point downhill, so no directional derivative can be negative.

  26. The Gradient Properties Formally: for any direction v, define g(t) = f(p + t·v); since p is a local minimum of f, t = 0 is a local minimum of g, so 0 = g'(0) = ∇f(p) · v. Since this holds for every v, we get ∇f(p) = 0.

  27. The Gradient Properties • We found the best INFINITESIMAL DIRECTION at each point • Looking for a minimum: a “blind man” procedure • How can we derive the way to the minimum using this knowledge?

  28. Background • Motivation • The gradient notion • The Wolfe Theorems

  29. The Wolfe Theorem • This is the link from the previous gradient properties to a constructive algorithm. • The problem: min_{x ∈ R^n} f(x)

  30. The Wolfe Theorem • We introduce a model algorithm: Data: x_0 ∈ R^n. Step 0: set i = 0. Step 1: if ∇f(x_i) = 0, stop; else, compute a search direction h_i. Step 2: compute the step size λ_i. Step 3: set x_{i+1} = x_i + λ_i h_i, set i = i + 1, and go to Step 1.
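
A minimal Python sketch of this model algorithm, assuming f is accessible through its gradient; the callables h and lam stand for the (unspecified) search-direction and step-size rules, and the names and tolerances are mine, not from the slides.

```python
import numpy as np

def model_algorithm(grad_f, x0, h, lam, tol=1e-8, max_iter=10_000):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) <= tol:   # Step 1: stop once the gradient (numerically) vanishes
            break
        d = h(x, g)                    # Step 1: search direction h_i
        t = lam(x, d)                  # Step 2: step size lambda_i
        x = x + t * d                  # Step 3: x_{i+1} = x_i + lambda_i * h_i
    return x

# Example: recover gradient descent with a fixed step on f(x) = x1^2 + 3*x2^2.
grad_f = lambda x: np.array([2.0 * x[0], 6.0 * x[1]])
x_min = model_algorithm(grad_f, [1.0, 2.0], h=lambda x, g: -g, lam=lambda x, d: 0.1)
print(x_min)   # close to the minimizer (0, 0)
```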

  31. The Wolfe Theorem The theorem: suppose f : R^n → R is C^1 smooth, there exists a continuous function k : R^n → [0, 1] with k(x) > 0 whenever ∇f(x) ≠ 0, and the search vectors h_i constructed by the model algorithm satisfy the descent condition ∇f(x_i) · h_i ≤ −k(x_i) · ‖∇f(x_i)‖ · ‖h_i‖,

  32. The Wolfe Theorem and h_i ≠ 0 whenever ∇f(x_i) ≠ 0. Then, if {x_i} is the sequence constructed by the model algorithm, any accumulation point y of this sequence satisfies ∇f(y) = 0.

  33. The Wolfe Theorem The theorem has a very intuitive interpretation: always go in a descent direction.

  34. Preview • Background • Steepest Descent • Conjugate Gradient

  35. Steepest Descent • What does it mean? • We now use what we have learned to implement the most basic minimization technique. • First we introduce the algorithm, which is a version of the model algorithm. • The problem: min_{x ∈ R^n} f(x)

  36. Steepest Descent • Steepest descent algorithm: Data: x_0 ∈ R^n. Step 0: set i = 0. Step 1: if ∇f(x_i) = 0, stop; else, take the search direction h_i = −∇f(x_i). Step 2: compute the step size λ_i that minimizes f(x_i + λ h_i). Step 3: set x_{i+1} = x_i + λ_i h_i, set i = i + 1, and go to Step 1.
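
Here is a sketch of this steepest-descent loop, specialized for concreteness to the quadratic f(x) = ½ x^T A x − b^T x that appears later in the talk, because there the exact 1-D line search of Step 2 has a closed form, λ = (g^T g)/(g^T A g); the function name, test data, and stopping tolerance are my own choices.

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=10_000):
    """Steepest descent for f(x) = 0.5 x^T A x - b^T x, A symmetric positive definite."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = A @ x - b                  # gradient of the quadratic at x
        if np.linalg.norm(g) <= tol:   # Step 1: stop when the gradient vanishes
            break
        lam = (g @ g) / (g @ A @ g)    # Step 2: exact minimizer along -g
        x = x - lam * g                # Step 3: move in the steepest-descent direction
    return x

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
print(steepest_descent(A, b, np.zeros(2)))   # approaches the solution of Ax = b, here (0.2, 0.4)
```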

  37. Steepest Descent • Theorem: if {x_i} is a sequence constructed by the SD algorithm, then every accumulation point y of the sequence satisfies ∇f(y) = 0. • Proof: follows from the Wolfe theorem. • Remark: the Wolfe theorem gives us numerical stability even if the derivatives aren't given analytically (are calculated numerically).

  38. Steepest Descent • From the chain rule: 0 = d/dλ f(x_i + λ h_i) at λ = λ_i, i.e. ∇f(x_{i+1}) · h_i = 0, so consecutive search directions are orthogonal. • Therefore the method of steepest descent looks like this:

  39. Steepest Descent

  40. Steepest Descent • Steepest descent finds a critical point, which in practice is a local minimum. • Implicit step-size rule: actually we reduced the problem to finding the minimum of a one-dimensional function, λ_i = argmin_{λ ≥ 0} f(x_i + λ h_i). • There are extensions that give the step-size rule in a discrete sense (Armijo; a sketch follows below).
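
A sketch of one such discrete (Armijo backtracking) step-size rule, assuming f and ∇f can be evaluated; the constants c and beta below are common default choices, not values taken from the slides.

```python
import numpy as np

def armijo_step(f, grad_f, x, d, t0=1.0, beta=0.5, c=1e-4, max_halvings=50):
    """Shrink the step t until f(x + t*d) <= f(x) + c*t*(grad_f(x) . d)."""
    fx, slope = f(x), grad_f(x) @ d
    t = t0
    for _ in range(max_halvings):
        if f(x + t * d) <= fx + c * t * slope:
            break
        t *= beta
    return t

# Example on f(x) = x1^2 + 3*x2^2, stepping along the negative gradient.
f = lambda x: x[0]**2 + 3.0 * x[1]**2
grad_f = lambda x: np.array([2.0 * x[0], 6.0 * x[1]])
x = np.array([1.0, 2.0])
d = -grad_f(x)
print(armijo_step(f, grad_f, x, d))   # 0.25 for this starting point
```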

  41. Steepest Descent • Back to our connectivity shapes: the authors solve the one-dimensional line-search problem analytically. • They change the spring energy and get a quartic polynomial in x.

  42. Preview • Background • Steepest Descent • Conjugate Gradient

  43. Conjugate Gradient • From now on we assume we want to minimize the quadratic function f(x) = ½ x^T A x − b^T x + c, with A symmetric positive definite. • This is equivalent to solving the linear system A x = b. • There are generalizations to general functions.
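
A small numeric check of this equivalence (my own illustration): for f(x) = ½ x^T A x − b^T x the gradient is A x − b, so the minimizer of f is exactly the solution of A x = b. The matrix and vector below are arbitrary test data.

```python
import numpy as np

A = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric positive definite
b = np.array([1.0, 2.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad_f = lambda x: A @ x - b

x_star = np.linalg.solve(A, b)
print(grad_f(x_star))                                     # ~ [0, 0]: the gradient vanishes at A^{-1} b
print(f(x_star) <= f(x_star + np.array([0.1, -0.2])))     # True: nearby points have larger f
```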

  44. Conjugate Gradient • What is the problem with steepest descent? • We can repeat the same directions over and over… • Conjugate gradient takes at most n steps.

  45. Conjugate Gradient Search directions d_0, …, d_{n−1} – they should span R^n.

  46. Conjugate Gradient Given the directions d_i, how do we calculate the step sizes α_i? (As before: by an exact line search along d_i.)

  47. Conjugate Gradient How do we find the directions d_i? We want that after n steps the error e_n = x_n − x* will be 0:

  48. Conjugate Gradient Here is an idea: if the directions span R^n, then the initial error can be written as e_0 = Σ_j δ_j d_j; so if each step cancels exactly one term of this sum (the step along d_j removes the δ_j d_j component), then after n steps the error is 0.

  49. Conjugate Gradient So we look for directions d_0, …, d_{n−1} such that these components are easy to compute and cancel one by one. A simple calculation shows that this works if we take the directions to be A-conjugate (A-orthogonal): d_i^T A d_j = 0 for i ≠ j.
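
Putting the pieces together, here is a sketch of the (linear) conjugate-gradient iteration for f(x) = ½ x^T A x − b^T x with A symmetric positive definite: each new direction is kept A-conjugate to the previous ones, and the exact solution is reached in at most n steps up to round-off. The function name and test data are my own.

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10):
    """Linear CG for A x = b, i.e. minimizing 0.5 x^T A x - b^T x (A SPD)."""
    n = b.size
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float)
    r = b - A @ x                     # residual = negative gradient
    d = r.copy()                      # first search direction
    for _ in range(n):                # at most n steps
        if np.linalg.norm(r) <= tol:
            break
        Ad = A @ d
        alpha = (r @ r) / (d @ Ad)    # exact step along d
        x = x + alpha * d
        r_new = r - alpha * Ad
        beta = (r_new @ r_new) / (r @ r)
        d = r_new + beta * d          # new direction, A-conjugate to the previous ones
        r = r_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b))       # matches np.linalg.solve(A, b) after 2 steps
```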
