Continuous optimization: Problems and successes
Tijl De Bie
Intelligent Systems Laboratory
MVSE, University of Bristol
United Kingdom
tijl.debie@bristol.ac.uk
Motivation
• Back-propagation algorithm for training neural networks (gradient descent)
• Support vector machines
• Convex optimization ‘boom’ (NIPS, also ICML, KDD...)
What explains this success? (Is it really a success?)
(Mainly for CP-ers not familiar with continuous optimization)
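(Convex) continuous optimization
• Continuous optimization: minimize an objective function over real-valued variables, subject to constraints
• Convex optimization: the special case in which the objective and the feasible set are convex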
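A standard way to write the two problem classes is given below. The notation follows Boyd & Vandenberghe; the symbols $f_0$, $f_i$, $h_j$ are an assumption and are not taken from the slide.

```latex
\begin{align*}
\text{Continuous optimization:}\quad
  & \min_{x \in \mathbb{R}^n} f_0(x)
    \quad \text{s.t.} \quad f_i(x) \le 0,\; i = 1,\dots,m,
    \qquad h_j(x) = 0,\; j = 1,\dots,p \\
\text{Convex optimization:}\quad
  & \text{the same problem with } f_0, f_1, \dots, f_m \text{ convex and every } h_j \text{ affine}
\end{align*}
```

Under convexity every local minimum is also a global one, which is why iterative improvement methods can be trusted to reach the optimum.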
Convex optimization
• General convex optimization approach
  • Start with a guess, iteratively improve until the optimum is found
  • E.g. gradient descent, conjugate gradient, Newton's method, etc.
• For constrained convex optimization: interior point methods
  • Provably efficient (worst case; the typical case is even better)
  • Iteration complexity: polynomial (the barrier method needs on the order of $\sqrt{m}\,\log(1/\epsilon)$ Newton steps for $m$ inequality constraints and target accuracy $\epsilon$)
  • Complexity per iteration: polynomial
• Out-of-the-box tools exist (SeDuMi, SDPT3, MOSEK...)
  • Purely declarative
• Book: Convex Optimization (Boyd & Vandenberghe)
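The "start with a guess, iteratively improve" recipe can be sketched with plain gradient descent. This is a minimal illustration under stated assumptions: a fixed step size, a hypothetical two-variable convex test function, and function names that do not come from the slides.

```python
def gradient_descent(grad, x0, step=0.1, tol=1e-8, max_iter=10_000):
    """Minimise a differentiable convex function, given its gradient."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        # For a convex function a vanishing gradient certifies a global
        # minimiser, so the gradient norm is a valid stopping test.
        if sum(gi * gi for gi in g) ** 0.5 < tol:
            break
        # Move against the gradient by a fixed step (an assumption; real
        # solvers usually pick the step with a line search).
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


# Hypothetical test problem: f(x, y) = (x - 1)^2 + 2 * (y + 3)^2,
# whose unique minimiser is (1, -3).
def grad_f(p):
    x, y = p
    return [2.0 * (x - 1.0), 4.0 * (y + 3.0)]


x_star = gradient_descent(grad_f, [0.0, 0.0])
print(x_star)
```

The fixed step of 0.1 works here because the curvature of the test function is small; on a badly scaled problem the same step could diverge, which is one reason conjugate gradient and Newton's method are listed as alternatives.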
Convex optimization
[Diagram of convex optimization problem classes: Logdet, cone programming (LP, QP, SOCP, SDP), geometric programming]
Linear Programming (LP)
• Linear objective
• Linear inequality constraints
• Affine equality constraints
• Applications:
  • Relaxations of integer LPs
  • Classification: linear support vector machines (SVM), forms of boosting
  • (Lots outside DM/ML)
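Written out, the three ingredients of an LP give the form below. The symbols $c$, $A$, $b$, $C$, $d$ are chosen here for illustration and are an assumption, not the slide's own notation.

```latex
\begin{align*}
\min_{x \in \mathbb{R}^n} \quad & c^\top x \\
\text{s.t.} \quad & A x \le b \\
& C x = d
\end{align*}
```

The inequality $Ax \le b$ is meant elementwise, one linear constraint per row of $A$.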
Convex Quadratic Programming (QP)
• Convex quadratic constraints
• LP is a special case (all quadratic terms are zero)
• Applications:
  • Classification/regression: SVM
  • Novelty detection: minimum volume enclosing hypersphere
  • Regression + feature selection: lasso
  • Structured prediction problems
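The lasso application can be sketched in a few lines. The example below is a minimal sketch rather than a general QP solver: it uses coordinate descent with soft-thresholding (a technique swapped in for illustration, not named in the slides) on the objective $\tfrac12\|y - Xw\|^2 + \lambda\|w\|_1$. The function names and the toy data are hypothetical.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values inside [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=200):
    """Coordinate descent for 0.5 * ||y - X w||^2 + lam * ||w||_1.

    Assumes no column of X is identically zero.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_sweeps):
        for j in range(d):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            # Exact minimiser of the objective along coordinate j.
            w[j] = soft_threshold(rho, lam) / z
    return w


# Hypothetical toy data: the target depends on the first feature only.
X = [[1.0, 0.5], [2.0, -0.3], [3.0, 0.1], [4.0, -0.2]]
y = [2.0 * row[0] for row in X]

w = lasso_cd(X, y, lam=1.0)
print(w)  # the weight of the irrelevant second feature is exactly 0.0
```

The L1 penalty sets the irrelevant weight to exactly zero instead of merely making it small, which is what makes the lasso a feature selection method; the price is a slight shrinkage of the relevant weight below its true value of 2.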
Second-Order Cone Programming (SOCP)
• Second-order cone constraints
• Quadratically constrained QP (QCQP) is a special case
• Applications:
  • Metric learning
  • Fermat-Weber problem: find a point in a plane with minimal sum of distances to a set of points
  • Robust linear programming
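For the Fermat-Weber problem, a minimal sketch follows. Note that it does not call an SOCP solver: it uses Weiszfeld's fixed-point iteration, a classical special-purpose method swapped in here because it fits in a few lines of standard-library Python. The function names and the sample points are hypothetical.

```python
import math


def total_distance(points, x, y):
    """Sum of Euclidean distances from (x, y) to every point."""
    return sum(math.hypot(x - px, y - py) for px, py in points)


def fermat_weber(points, n_iter=1000, eps=1e-12):
    """Approximate the point minimising the sum of distances (Weiszfeld)."""
    # Start from the centroid of the points.
    x = sum(p[0] for p in points) / len(points)
    y = sum(p[1] for p in points) / len(points)
    for _ in range(n_iter):
        num_x = num_y = denom = 0.0
        for px, py in points:
            dist = math.hypot(x - px, y - py)
            if dist < eps:
                # The plain update is undefined when the iterate sits on a
                # data point; this sketch simply stops there.
                return x, y
            weight = 1.0 / dist
            num_x += weight * px
            num_y += weight * py
            denom += weight
        # New iterate: average of the points, weighted by inverse distance.
        x, y = num_x / denom, num_y / denom
    return x, y


# Hypothetical sample: four points in convex position. For such a
# quadrilateral the minimiser is the intersection of the two diagonals,
# here (12/7, 12/7).
points = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (10.0, 10.0)]
median = fermat_weber(points)
print(median, total_distance(points, *median))
```

The far-away point (10, 10) pulls the centroid strongly toward it but barely moves the Fermat-Weber point, which illustrates why sum-of-distances objectives are considered robust to outliers.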
Semi-Definite Programming (SDP)
• Constraints requiring a matrix to be positive semi-definite
• SOCP is a special case
• Applications:
  • Metric learning
  • Low-rank matrix approximations (dimensionality reduction)
  • Very tight relaxations of graph labeling problems (e.g. Max-cut)
  • Semi-supervised learning
  • Approximate inference in difficult graphical models
Geometric programming
• Objective and constraints are posynomials (convex after a logarithmic change of variables)
• Applications:
  • Maximum entropy modeling with moment constraints
  • Maximum likelihood fitting of exponential family distributions
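The functional form used in geometric programming can be written as follows. This is the textbook form from Boyd & Vandenberghe, given here as an assumption about what the slide intended; the symbols $c_k$, $a_{ik}$, $b_k$ and $y$ are not from the slide.

```latex
\begin{align*}
f(x) &= \sum_{k=1}^{K} c_k \, x_1^{a_{1k}} x_2^{a_{2k}} \cdots x_n^{a_{nk}},
  \qquad c_k > 0,\; x_i > 0 \\
\log f(e^{y}) &= \log \sum_{k=1}^{K} \exp\!\left(a_k^\top y + b_k\right),
  \qquad y_i = \log x_i,\; b_k = \log c_k
\end{align*}
```

The first line is not convex in $x$, but the second line (log-sum-exp of affine functions) is convex in $y$, so the problem becomes a convex one after the change of variables.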
Log Determinant Optimization (Logdet)
• Objective is the negative log determinant of a matrix $X$: $-\log\det(X)$
• $= -\log$(volume of the parallelepiped spanned by the columns of $X$)
• Applications:
  • Novelty detection: minimum volume enclosing ellipsoid
  • Experimental design / active learning (which labels for which data points are likely to be most informative)
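A small numeric check of the link between determinant and volume, in two dimensions where the parallelepiped is a parallelogram. The matrix and the helper name are hypothetical.

```python
import math


def det2(m):
    """Determinant of a 2x2 matrix given as a list of rows."""
    (a, b), (c, d) = m
    return a * d - b * c


# Columns of X are the vectors (3, 0) and (1, 2): a parallelogram with
# base 3 along the x-axis and height 2, hence area 6.
X = [[3.0, 1.0],
     [0.0, 2.0]]

area = abs(det2(X))
objective = -math.log(det2(X))  # the Logdet objective, defined for det > 0
print(area, objective)
```

Minimising $-\log\det(X)$ therefore maximises the volume spanned by the columns of $X$, which is how volume-based problems such as the enclosing ellipsoid end up in this class.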
Eigenvalue problems
• Eigenvalue problems are not convex optimization problems
• Still, they are relatively efficient to solve, globally convergent, and a useful primitive:
  • Dimensionality reduction (PCA)
  • Finding relations between datasets (CCA)
  • Spectral clustering
  • Metric learning
  • Relaxations of combinatorial problems
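As an illustration of the eigenvalue primitive, here is a minimal power-iteration sketch for the dominant eigenpair of a symmetric matrix. It is one simple method among many, not necessarily the one the author had in mind; the function name, the starting vector and the example matrix are hypothetical.

```python
def power_iteration(A, n_iter=500):
    """Dominant eigenvalue and eigenvector of a symmetric matrix A.

    Assumes the starting vector is not orthogonal to the dominant
    eigenvector and that the dominant eigenvalue is unique in magnitude.
    """
    n = len(A)
    v = [1.0 / (i + 1) for i in range(n)]  # arbitrary non-zero start
    for _ in range(n_iter):
        # Multiply by A, then rescale to unit length.
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(wi * wi for wi in w) ** 0.5
        v = [wi / norm for wi in w]
    # Rayleigh quotient v^T A v (v has unit length) estimates the eigenvalue.
    Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v[i] * Av[i] for i in range(n))
    return eigenvalue, v


# Hypothetical 2x2 covariance-like matrix with eigenvalues 3 and 1.
C = [[2.0, 1.0],
     [1.0, 2.0]]

eigenvalue, v = power_iteration(C)
print(eigenvalue, v)
```

When `C` is a data covariance matrix, the returned vector is the first principal component used in PCA. The iteration reaches the same eigenvector from almost any starting point, which is the sense in which the problem is globally convergent despite not being convex.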
The hype
• Very popular in conferences like NIPS, ICML, KDD
• These model classes are sufficiently rich to do sophisticated things
  • Sparsity: L1 norm/linear constraints for feature selection
  • Low rank of matrices: SDP constraint and trace norm (sparse PCA, labeling problems...)
• Declarative nature, little expertise needed
• Computational complexity is easy to understand
After the hype
• But:
  • Polynomial-time, but often with a high exponent (e.g. SDP)
  • Convex constraints can be too restrictive
• Tendency toward other paradigms:
  • Convex-concave programming (few guarantees, but works well in practice)
  • Submodular optimization (approximation guarantees, works well in practice)
CP vs Convex Optimization
• “CP: Choosing the best model is an art” (Helmut)
• “CP requires skill and ingenuity” (Barry)
• I understand that in CP there is a hierarchy of propagation methods, but...
  • Is there a hierarchy of problem complexities?
  • How hard is it to see if a constraint will propagate well?
  • Does it depend on the implementation?
  • ...