Continuous optimization Problems and successes

Continuous optimization Problems and successes. Tijl De Bie, Intelligent Systems Laboratory, MVSE, University of Bristol, United Kingdom, tijl.debie@bristol.ac.uk. Motivation: back-propagation algorithm for training neural networks (gradient descent); support vector machines.

Presentation Transcript


  1. Continuous optimization: Problems and successes. Tijl De Bie, Intelligent Systems Laboratory, MVSE, University of Bristol, United Kingdom, tijl.debie@bristol.ac.uk

  2. Motivation • Back-propagation algorithm for training neural networks (gradient descent) • Support vector machines • Convex optimization ‘boom’ (NIPS, also ICML, KDD...) What explains this success? (Is it really a success?) (Mainly for CP-ers not familiar with continuous optimization)

  3. (Convex) continuous optimization • Continuous optimization: • Convex optimization:
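The formulas on this slide are not preserved in the transcript. A standard way of writing the two problem classes, assumed here following the notation of the Boyd & Vandenberghe book cited on slide 5, is:

```latex
\textbf{Continuous optimization:}\quad
\min_{x \in \mathbb{R}^n} \; f_0(x)
\quad \text{s.t.} \quad f_i(x) \le 0 \ (i = 1, \dots, m), \qquad h_j(x) = 0 \ (j = 1, \dots, p)

\textbf{Convex optimization:}\quad
\text{the same problem with } f_0, f_1, \dots, f_m \text{ convex and each } h_j(x) = a_j^\top x - b_j \text{ affine}
```

Convexity guarantees that every local optimum is a global one, which is the property the iterative methods on the following slides rely on.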

  4. Convex optimization

  5. Convex optimization • General convex optimization approach: • Start with a guess, iteratively improve until the optimum is found • E.g. gradient descent, conjugate gradient, Newton's method, etc. • For constrained convex optimization: interior point methods • Provably efficient (worst case; the typical case is even better) • Iteration complexity: • Complexity per iteration: polynomial • Out-of-the-box tools exist (SeDuMi, SDPT3, MOSEK...) • Purely declarative • Book: Convex Optimization (Boyd & Vandenberghe)
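A minimal sketch of the "start with a guess, iteratively improve" loop for the unconstrained case, using plain gradient descent with a fixed step size. The function name, step size and test function are illustrative choices, not taken from the slides; interior point methods for the constrained case are considerably more involved.

```python
def gradient_descent(grad, x0, step, tol=1e-10, max_iter=10_000):
    """Minimize a smooth convex function by fixed-step gradient descent."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        if sum(gi * gi for gi in g) ** 0.5 < tol:
            break  # gradient (numerically) zero: for a convex function this is the global optimum
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


# Illustrative convex quadratic f(x, y) = (x - 1)^2 + 2 (y + 3)^2, minimum at (1, -3).
def grad_f(v):
    x, y = v
    return [2 * (x - 1), 4 * (y + 3)]


x_star = gradient_descent(grad_f, [0.0, 0.0], step=0.1)
print([round(c, 6) for c in x_star])  # [1.0, -3.0]
```

The step size must be small enough relative to the curvature (here below 2/4 = 0.5); Newton's method would instead rescale each step by the inverse Hessian.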

  6. Convex optimization [Diagram of problem classes within convex optimization: LP, QP, SOCP, SDP, cone programming, geometric programming, logdet]

  7. Linear Programming (LP) • Linear objective, linear inequality constraints, affine equality constraints • Applications: • Relaxations of integer LPs • Classification: linear support vector machines (SVM), forms of boosting • (Many more outside DM/ML)
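To make the LP definition concrete, here is a toy two-variable solver that relies on the fact that a bounded, feasible LP attains its optimum at a vertex of the feasible polygon. It assumes the feasible region is non-empty and bounded; vertex enumeration is exponential in general and is only an illustration, whereas the tools on slide 5 use simplex or interior point methods. The function name and example data are hypothetical.

```python
from itertools import combinations


def solve_lp_2d(c, A, b, eps=1e-9):
    """Maximize c.x subject to A x <= b for x in R^2 by enumerating vertices.

    Assumes the feasible region is non-empty and bounded, so an optimum is
    attained at a vertex (the intersection of two tight constraints).
    """
    best_x, best_val = None, float("-inf")
    for (a1, b1), (a2, b2) in combinations(zip(A, b), 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if abs(det) < eps:
            continue  # parallel constraint lines: no unique intersection
        # Cramer's rule for the 2x2 system a1.x = b1, a2.x = b2
        x = ((b1 * a2[1] - a1[1] * b2) / det, (a1[0] * b2 - b1 * a2[0]) / det)
        # Keep the vertex only if it satisfies every constraint
        if all(ai[0] * x[0] + ai[1] * x[1] <= bi + eps for ai, bi in zip(A, b)):
            val = c[0] * x[0] + c[1] * x[1]
            if val > best_val:
                best_x, best_val = x, val
    return best_x, best_val


# maximize 3x + 2y  s.t.  x + y <= 4,  x + 3y <= 6,  x >= 0,  y >= 0
x_opt, val = solve_lp_2d(c=(3, 2), A=[(1, 1), (1, 3), (-1, 0), (0, -1)], b=[4, 6, 0, 0])
print(val)  # 12.0
```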

  8. Convex Quadratic Programming (QP) • Convex quadratic constraints • LP is a special case where all quadratic terms are zero • Applications: • Classification/regression: SVM • Novelty detection: minimum volume enclosing hypersphere • Regression + feature selection: lasso • Structured prediction problems
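The lasso listed above can be handed to a generic QP solver, but a common lightweight alternative (swapped in here, not from the slides) is cyclic coordinate descent with soft-thresholding. The sketch below minimizes 0.5·||y − Xw||² + λ·||w||₁ and shows how the L1 penalty drives weak coefficients exactly to zero, which is the feature-selection effect. Function names and data are hypothetical.

```python
def soft_threshold(z, t):
    """Proximal operator of t*|.|: shrink z toward 0, and set it to 0 if |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=200):
    """Minimize 0.5*||y - X w||^2 + lam*||w||_1 by cyclic coordinate descent.

    X is a list of rows (n samples x d features); y is a list of n targets.
    Assumes no feature column is all zeros.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(d)]
    for _ in range(n_sweeps):
        for j in range(d):
            # Correlation of feature j with the residual that ignores feature j's own contribution
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * w[k] for k in range(d) if k != j))
                for i in range(n)
            )
            w[j] = soft_threshold(rho, lam) / col_sq[j]
    return w


X = [[1, 0], [0, 1], [1, 0], [0, 1]]
y = [3.0, 0.5, 3.0, 0.5]
print(lasso_cd(X, y, lam=0.0))  # [3.0, 0.5]  (plain least squares)
print(lasso_cd(X, y, lam=2.0))  # [2.0, 0.0]  (second feature dropped)
```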

  9. Second-Order Cone Programming (SOCP) • Second Order Cone constraints • QCQP is a special case where • Applications: • Metric learning • Fermat-Weber problem: find a point in a plane with minimal sum of distances to a set of points • Robust linear programming
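The Fermat-Weber problem can be written as an SOCP, but it also has a classical special-purpose method, Weiszfeld's iteration, which is used in the sketch below instead of an SOCP solver. Each step replaces the current point by an average of the data points weighted by inverse distance. The function name and the early exit on hitting a data point are simplifying assumptions.

```python
import math


def fermat_weber(points, n_iter=1000, eps=1e-12):
    """Approximate the point in the plane minimizing the sum of Euclidean
    distances to `points`, using Weiszfeld's fixed-point iteration."""
    # Start at the centroid
    x = sum(p[0] for p in points) / len(points)
    y = sum(p[1] for p in points) / len(points)
    for _ in range(n_iter):
        num_x = num_y = denom = 0.0
        for px, py in points:
            d = math.hypot(x - px, y - py)
            if d < eps:
                # Simplification: the update is undefined at a data point, so stop here;
                # the true optimum may lie elsewhere in that case.
                return (px, py)
            num_x += px / d
            num_y += py / d
            denom += 1.0 / d
        x, y = num_x / denom, num_y / denom
    return (x, y)


pts = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
median = fermat_weber(pts)
```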

  10. Semi-Definite Programming (SDP) • Constraints requiring a matrix to be Positive Semi-Definite: • SOCP is a special case: • Applications: • Metric learning • Low rank matrix approximations (dimensionality reduction) • Very tight relaxations of graph labeling problems (e.g. Max-cut) • Semi-supervised learning • Approximate inference in difficult graphical models
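As a worked example of an SDP relaxation of a graph labeling problem, the standard Max-cut relaxation (the one used by Goemans and Williamson) replaces the rank-one matrix z zᵀ of ±1 labels by a general positive semidefinite matrix X with unit diagonal; w_{ij} ≥ 0 are assumed symmetric edge weights:

```latex
\text{Max-cut:}\quad
\max_{z \in \{-1,+1\}^n} \ \tfrac{1}{4} \sum_{i,j} w_{ij} \, (1 - z_i z_j)

\text{SDP relaxation } (X \text{ plays the role of } z z^\top):\quad
\max_{X \in \mathbb{S}^n} \ \tfrac{1}{4} \sum_{i,j} w_{ij} \, (1 - X_{ij})
\quad \text{s.t.} \quad X \succeq 0, \quad X_{ii} = 1 \ (i = 1, \dots, n)
```

Every labeling z gives a feasible X = z zᵀ with the same objective value, so the SDP optimum is an upper bound on the maximum cut; dropping the rank-one requirement is the only relaxation step.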

  11. Geometric programming • Objective and constraints of the form: • Applications: • Maximum entropy modeling with moment constraints • Maximum likelihood fitting of exponential family distributions
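The maximum-entropy application can be illustrated without setting up a geometric program. The sketch below instead uses the known fact that the maximum-entropy distribution on a finite set with a prescribed mean has exponential-family form p_i ∝ exp(θ·x_i), and finds θ by bisection because the mean increases monotonically in θ. The function name and the θ search interval [−50, 50] are assumptions; targets very close to the extreme values may need a wider interval.

```python
import math


def max_entropy_dist(values, target_mean, lo=-50.0, hi=50.0, iters=200):
    """Maximum-entropy distribution on the finite set `values` with a given mean.

    Assumes min(values) < target_mean < max(values) and that the matching
    theta lies in [lo, hi].
    """
    def dist(theta):
        m = max(theta * x for x in values)  # subtract the max exponent for numerical stability
        w = [math.exp(theta * x - m) for x in values]
        z = sum(w)
        return [wi / z for wi in w]

    def mean(theta):
        return sum(p * x for p, x in zip(dist(theta), values))

    # Bisection on theta: mean(theta) is increasing (its derivative is a variance)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return dist(0.5 * (lo + hi))


# Die faces 1..6 constrained to average 4.5: probabilities increase with the face value.
p = max_entropy_dist([1, 2, 3, 4, 5, 6], 4.5)
print([round(pi, 3) for pi in p])
```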

  12. Log Determinant Optimization (Logdet) • Objective is the log determinant of a matrix: • For positive definite X, det(X) = volume of the parallelepiped spanned by the columns of X, so log det(X) is the log of that volume • Applications: • Novelty detection: minimum volume enclosing ellipsoid • Experimental design / active learning (choosing which data points to label so that the labels are most informative)
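log det is concave on positive definite matrices, so maximizing it (equivalently, minimizing −log det) is a convex problem. A numerically stable way to evaluate it, sketched below with hypothetical function names, is a Cholesky factorization A = L Lᵀ, since then det(A) = (∏ L_ii)², so log det(A) = 2 Σ log L_ii; the factorization also fails exactly when A is not positive definite.

```python
import math


def cholesky(A):
    """Return lower-triangular L with A = L L^T; raise ValueError if A is not positive definite."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def logdet(A):
    """log det(A) for a symmetric positive definite A, via det(A) = prod(L_ii)^2."""
    L = cholesky(A)
    return 2.0 * sum(math.log(L[i][i]) for i in range(len(A)))


print(math.exp(logdet([[4.0, 2.0], [2.0, 3.0]])))  # approximately 8 = 4*3 - 2*2
```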

  13. Eigenvalue problems • Eigenvalue problems are not convex optimization problems • Still, they can be solved relatively efficiently with globally convergent methods, and they are a useful primitive for: • Dimensionality reduction (PCA) • Finding relations between datasets (CCA) • Spectral clustering • Metric learning • Relaxations of combinatorial problems
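As an illustration of a globally convergent eigenvalue method, power iteration finds the leading eigenvector of a symmetric PSD matrix from almost any starting vector, provided the largest eigenvalue is strictly separated from the rest. Applied to a sample covariance matrix, that eigenvector is the first principal component used in PCA. The function names below are hypothetical; production code would call a LAPACK-based routine instead.

```python
import math


def leading_eigenpair(C, n_iter=1000, tol=1e-12):
    """Power iteration for the dominant eigenvalue/eigenvector of a symmetric PSD matrix C.

    Assumes C is nonzero and its largest eigenvalue is strictly larger than the others.
    """
    n = len(C)
    v = [1.0 / math.sqrt(n)] * n  # fixed start; must not be orthogonal to the top eigenvector
    lam = 0.0
    for _ in range(n_iter):
        w = [sum(C[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v_new = [x / norm for x in w]
        # Rayleigh quotient v^T C v (v has unit length) estimates the eigenvalue
        lam = sum(v_new[i] * sum(C[i][j] * v_new[j] for j in range(n)) for i in range(n))
        converged = max(abs(a - b) for a, b in zip(v_new, v)) < tol
        v = v_new
        if converged:
            break
    return lam, v


def covariance(data):
    """Sample covariance matrix (dividing by n - 1) of a list of equal-length rows."""
    n, d = len(data), len(data[0])
    mu = [sum(row[j] for row in data) / n for j in range(d)]
    return [[sum((row[a] - mu[a]) * (row[b] - mu[b]) for row in data) / (n - 1)
             for b in range(d)] for a in range(d)]


data = [[2.0, 1.9], [0.5, 0.6], [1.0, 1.2], [3.0, 2.8]]
variance_pc1, pc1 = leading_eigenpair(covariance(data))
```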

  14. The hype • Very popular at conferences like NIPS, ICML, KDD • These model classes are rich enough to do sophisticated things • Sparsity: L1 norm/linear constraints → feature selection • Low rank of matrices: SDP constraint and trace norm (sparse PCA, labeling problems...) • Declarative nature, little expertise needed • Computational complexity is easy to understand

  15. After the hype • But: • Polynomial-time, often with a high exponent. E.g. SDP: and sometimes • Convex constraints can be too limiting • Tendency toward other paradigms: • Convex-concave programming (few guarantees, but works well in practice) • Submodular optimization (approximation guarantees, works well in practice)
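The slide mentions submodular optimization only in general terms; maximum coverage is used below as an assumed, illustrative instance. Coverage (the size of a union of chosen sets) is monotone submodular, and for such functions under a cardinality constraint the simple greedy rule is classically guaranteed to reach at least a (1 − 1/e) fraction of the optimum. The function name and example sets are hypothetical.

```python
def greedy_max_coverage(sets, k):
    """Pick up to k sets greedily to maximize the size of their union.

    `sets` maps a name to a Python set. At each step the set with the
    largest marginal gain (number of newly covered elements) is added.
    """
    chosen, covered = [], set()
    for _ in range(k):
        best, best_gain = None, 0
        for name, s in sets.items():
            if name in chosen:
                continue
            gain = len(s - covered)  # marginal gain of adding this set
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:
            break  # no remaining set covers anything new
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered


chosen, covered = greedy_max_coverage(
    {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5, 6}, "D": {1, 6}}, k=2
)
print(chosen)  # ['A', 'C']
```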

  16. CP vs Convex Optimization • “CP: Choosing the best model is an art” (Helmut); “CP requires skill and ingenuity” (Barry) • I understand that in CP there is a hierarchy of propagation methods, but... • Is there a hierarchy of problem complexities? • How hard is it to see whether a constraint will propagate well? • Does it depend on the implementation? • ...
