1 / 22

Search and Optimization Methods

Search and Optimization Methods. Based in part on Chapter 8 of Hand, Manilla, & Smyth David Madigan. Introduction. This chapter is about finding the models and parameters that minimize a general score function S Often have to conduct a parameter search for each visited model

Download Presentation

Search and Optimization Methods

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Search and Optimization Methods Based in part on Chapter 8 of Hand, Manilla, & Smyth David Madigan

  2. Introduction • This chapter is about finding the models and parameters that minimize a general score function S • Often have to conduct a parameter search for each visited model • The number of possible structures can be immense. For example, there are 3.6  1013 undirected graphical models with 10 vertices

  3. Greedy Search 1. Initialize. Chose an initial state Mk 2. Iterate. Evaluate the score function at all adjacent states and move to the best one 3. Stopping Criterion. Repeat step 2 until no further improvement can be made. 4. Multiple Restarts. Repeat 1-3 from different starting points and choose the best solution found.

  4. Systematic Search Heuristics Breadth-first, Depth-first, Beam-search, etc.

  5. Parameter Optimization Finding the parameters  that minimize a score function S() is usually equivalent to the problem of minimizing a complicated function in a high-dimensional space Define the gradient function is S as: When closed form solutions to S()=0 exist, no need for numerical methods.

  6. Gradient-Based Methods 1. Initialize. Choose an initial value for  = 0 2. Iterate. Starting with i=0, let i+1 = i +ivi where v is the direction of the next step and lambda is the distance. Generally choose v to be a direction that improves the score 3. Convergence. Repeat step 2 until S appears to have reached a local minimum. 4. Multiple Restarts. Repeat steps 1-3 from different initial starting points and choose the best minimum found.

  7. Univariate Optimization Let g()=S’(). Newton-Raphson proceeds as follows. Suppose g(s)=0. Then:

  8. 1-D Gradient-Descent •  usually chosen to be quite small • Special case of NR where 1/g’(i) is replaced by a constant

  9. Multivariate Case Curse-of-Dimensionality again. For example, suppose S is defined on a d-dimensional unit hypercube. Suppose we know that all components of  are less than 1/2 at the optimum. if d=1, have eliminated half the parameter space if d=2, have eliminated 1/4 of the parameter space if d=20, have eliminated 1/1,000,000 of the parameter space!

  10. Multivariate Gradient Descent • -g(i) points in the direction of steepest descent • Guaranteed to converge if  small enough • Essentially the same as the back-propagation method used in NNs • Can replace  with second-derivative information (“quasi-Newton” uses approx).

  11. Simplex Search Method Evaluates d+1 points arranged in a hyper-tetrahedron For example, with d=2, evaluates S at the vertices of an equilateral triangle Reflect the triangle in the side opposite the vertex with the highest value Repeat until oscillation occurs, then half the sides of the triangle No calculation of derivatives...

  12. EM for two-component Gaussian mixture Tricky to maximize…

  13. EM for two-component Gaussian mixture, cont. This is the “E-step.” Does a soft assignment of observations to mixture components

  14. EM for two-component Gaussian mixture – Algorithm

  15. EM with Missing Data Let Q(H) denote a probability distribution for the missing data This is a lower bound on l()

  16. EM (continued) In the E-Step, Max is achieved when In the M-Step, need to maximize:

  17. EM Normal Mixture Example

  18. EM Normal Mixture Example

More Related