Convex Optimization: Part 1 of Chapter 7 Discussion Presenter: Brian Quanz
About today’s discussion… • Chapter 7 has no separate discussion of convex optimization • It is discussed along with the SVM problems • Instead: • Today: discuss convex optimization itself • Next week: discuss some specific convex optimization problems (from the text), e.g. SVMs
About today’s discussion… • Mostly follow alternate text: • Convex Optimization, Stephen Boyd and Lieven Vandenberghe • Borrowed material from book and related course notes • Some figures and equations shown here • Available online: http://www.stanford.edu/~boyd/cvxbook/ • Nice course lecture videos available from Stephen Boyd online: http://www.stanford.edu/class/ee364a/ • Corresponding convex optimization tool (discuss later) - CVX: http://www.stanford.edu/~boyd/cvx/
Overview • Why convex? What is convex? • Key examples: linear and quadratic programming • Key mathematical ideas to discuss: Lagrange duality -> KKT conditions • Brief concept of interior point methods • CVX – convex optimization made easy
Mathematical Optimization • All learning is some optimization problem -> stick to the canonical form (below) • x = (x1, x2, …, xp) – optimization variables; x* – an optimal point • f0 : Rp -> R – objective function • fi : Rp -> R – constraint functions
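In the notation of the Boyd text, the canonical form reads:

  minimize    f0(x)
  subject to  fi(x) <= 0,  i = 1, …, m
              hj(x) = 0,   j = 1, …, k

An optimal point x* attains the smallest value of f0 over all x satisfying the constraints.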
Optimization Example • Well familiar with: regularized regression • Least squares • Add some constraints/penalties: ridge, lasso (written out below)
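Written out as a sketch in the usual penalized forms (the names A for the data matrix, b for the targets, and t > 0 for the regularization weight are assumed here, not from the slides):

  Least squares:  minimize ||Ax - b||2^2
  Ridge:          minimize ||Ax - b||2^2 + t ||x||2^2
  Lasso:          minimize ||Ax - b||2^2 + t ||x||1

All three are convex; the lasso's L1 penalty is convex but not differentiable at 0.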
Why convex optimization? • Can't solve most OPs efficiently • E.g. NP-hard; even high-degree polynomial time is too slow • Convex OPs: • (Generally) no analytic solution • Efficient algorithms to find the (global) solution • Interior point methods (basically iterated Newton) can be used: • ~[10-100] * max{p^3, p^2*m, F} operations, where F is the cost of evaluating the objective and constraint functions • At worst, solve with general IP methods (CVX); specialized solvers are faster
What is Convex Optimization? • An OP with convex objective and constraint functions • f0, …, fm convex = a convex OP, which can be solved efficiently!
Convex Function • Definition: the weighted mean of the function evaluated at any two points is greater than or equal to the function evaluated at the weighted mean of those two points (in symbols below)
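In symbols: f is convex if, for all x, y in its domain and all θ in [0, 1],

  f(θx + (1-θ)y) <= θ f(x) + (1-θ) f(y)

i.e., the chord between any two points on the graph lies on or above the graph.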
Convex Function • What does the definition mean? • Pick any two points x, y and evaluate the function there: f(x), f(y) • Draw the chord (line segment) connecting the points (x, f(x)) and (y, f(y)) • The function is convex if, at every point between x and y, it lies on or below that chord
Convex Function • [figure: plot of a convex function] • Convex!
Convex Function • [figure: plot of a non-convex function] • Not Convex!!!
Convex Function • Easy to see why convexity allows for an efficient solution • Just “slide” down the objective function as far as possible and you will reach a (global) minimum
Convex vs. Non-convex Ex. • Convex: minimum easy to find • Affine: a border case of convexity
Convex vs. Non-convex Ex. • Non-convex: easy to get stuck in a local minimum • Can't rely on local search techniques alone
Non-convex • Some non-convex problems are highly multi-modal, or NP-hard • Could be forced to search all solutions, or hope a stochastic search is successful • Cannot guarantee the best solution; inefficient • Harder to make performance guarantees with approximate solutions
Determine/Prove Convexity • Can prove directly that the definition holds • A function is convex iff its restriction to any line is convex • If twice differentiable, show the Hessian is positive semidefinite (see below) • Often easier to: • Convert to a known convex OP • E.g. QP, LP, SOCP, SDP, often of a more general form • Combine known convex functions (building blocks) using operations that preserve convexity • Similar idea to building kernels
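The second-order condition, written out: a twice-differentiable f is convex iff

  ∇²f(x) ⪰ 0 (positive semidefinite) for all x in its (convex) domain

For example, f(x) = x'Qx with symmetric Q has ∇²f(x) = 2Q, so it is convex exactly when Q ⪰ 0.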
Some common convex OPs • Of particular interest for this book and chapter: • linear programming (LP) and quadratic programming (QP) • LP: affine objective function, affine constraints (standard form below) • e.g. LP SVM, portfolio management
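LP standard form (from the Boyd text):

  minimize    c'x
  subject to  Gx <= h  (elementwise)
              Ax = b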
LP Visualization • Note: the constraints form the feasible set • For an LP, a polyhedron
Quadratic Program • QP: quadratic objective, affine constraints (form below) • LP is a special case • Many SVM and regression problems result in QPs • If the constraint functions are also quadratic, we get a Quadratically Constrained Quadratic Program (QCQP)
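QP form (from the Boyd text; convex when P is positive semidefinite):

  minimize    (1/2) x'Px + q'x
  subject to  Gx <= h
              Ax = b

Setting P = 0 recovers an LP.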
Second Order Cone Program • Each constraint requires an affine function of x to lie in the second-order cone (form below) • Ai = 0 results in an LP • ci = 0 results in a QCQP
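SOCP form (from the Boyd text):

  minimize    f'x
  subject to  ||Ai x + bi||2 <= ci'x + di,  i = 1, …, m
              Fx = g

With Ai = 0 each constraint is affine (an LP); with ci = 0 each constraint can be squared into a quadratic constraint (a QCQP).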
Semidefinite Programming • Linear matrix inequality (LMI) constraints (form below) • Many problems can be expressed using LMIs • Including LP and SOCP as special cases
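SDP form with an LMI constraint (from the Boyd text; ⪯ 0 meaning negative semidefinite):

  minimize    c'x
  subject to  x1*F1 + … + xn*Fn + G ⪯ 0
              Ax = b

where F1, …, Fn, G are symmetric matrices.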
Building Convex Functions • From simple convex functions to complex ones: some operations that preserve convexity • Nonnegative weighted sum • Composition with affine function • Pointwise maximum and supremum • Composition • Minimization • Perspective ( g(x,t) = t f(x/t) )
Verifying Convexity Remarks • For more detail and expansion, consult the referenced text, Convex Optimization • Geometric Programs are also convex and can be handled with a series of SDPs (details skipped here) • CVX converts the problem to an SOCP or SDP (or a series of these) and uses an efficient solver
Lagrangian • Standard form OP (as on the earlier slide) • The Lagrangian L is formed from it (below) • λ, ν – Lagrange multipliers (dual variables)
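The Lagrangian (from the Boyd text):

  L(x, λ, ν) = f0(x) + Σ_{i=1..m} λi fi(x) + Σ_{j=1..k} νj hj(x)

with one multiplier λi per inequality constraint and one νj per equality constraint.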
Lagrange Dual Function • The Lagrange dual function is found by minimizing L with respect to the primal variables (below) • Often we can take the gradient of L w.r.t. the primal variables and set it equal to 0 (as for SVMs)
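Written out (from the Boyd text):

  g(λ, ν) = inf_x L(x, λ, ν)

(when L is unbounded below in x, g takes the value -∞).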
Lagrange Dual Function • Note: the Lagrange dual function is the pointwise infimum of a family of affine functions of (λ, ν) • Thus g is concave even if the problem is not convex
Lagrange Dual Function • The Lagrange dual provides a lower bound on the objective value at the solution (derivation below)
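For any λ >= 0 and any feasible x̃:

  g(λ, ν) <= L(x̃, λ, ν) = f0(x̃) + Σ λi fi(x̃) + Σ νj hj(x̃) <= f0(x̃)

since each λi fi(x̃) <= 0 and each hj(x̃) = 0. As this holds for every feasible x̃, we get g(λ, ν) <= p*.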
Lagrangian as Linear Approximation, Lower Bound • Simple interpretation of the Lagrangian • Can incorporate the constraints into the objective as indicator functions • Infinity if violated, 0 otherwise (below) • In the Lagrangian we use a “soft” linear approximation to the indicator functions; it is an under-estimator, since a linear term λi*u with λi >= 0 never exceeds the indicator
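The indicator reformulation:

  minimize  f0(x) + Σ I_(fi(x)) + Σ I0(hj(x))

where I_(u) = 0 if u <= 0 and ∞ otherwise, and I0(u) = 0 if u = 0 and ∞ otherwise. Replacing I_(u) by λi*u (with λi >= 0) and I0(u) by νj*u gives exactly the Lagrangian, and λi*u <= I_(u) for every u, so the Lagrangian under-estimates the original objective.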
Lagrange Dual Problem • Why not make the lower bound the best possible? • The dual problem (below): • Always a convex optimization problem (even when the primal is non-convex) • Weak duality: d* <= p* (we have already seen this)
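The dual problem, with optimal value d*:

  maximize    g(λ, ν)
  subject to  λ >= 0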
Strong Duality • If d* = p*, strong duality holds • Does not hold in general • Slater's theorem: if the problem is convex and a strictly feasible point exists, then strong duality holds! (proof too involved; refer to the text) • => For convex problems, we can use the dual problem to find the solution
Complementary Slackness • When strong duality holds, f0(x*) appears at both ends of the chain below, so the last two inequalities must be equalities • The first steps use strong duality and the definition of g; the final inequality holds since the constraints are satisfied at x*
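The chain of (in)equalities:

  f0(x*) = g(λ*, ν*) = inf_x L(x, λ*, ν*)
         <= L(x*, λ*, ν*) = f0(x*) + Σ λi* fi(x*) + Σ νj* hj(x*)
         <= f0(x*)

The first equality is strong duality; the last inequality holds because λi* >= 0, fi(x*) <= 0, and hj(x*) = 0.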
Complementary Slackness • Which means Σ λi* fi(x*) = 0 • Since each term is non-positive, we have complementary slackness: λi* fi(x*) = 0 for i = 1, …, m • Whenever a constraint is non-active, the corresponding multiplier is zero
Complementary Slackness • This can also be described by: λi* > 0 => fi(x*) = 0, and fi(x*) < 0 => λi* = 0 • Since usually only a few constraints are active at the solution (see the geometry), the dual variable λ is often sparse • Note: in general there is no guarantee of sparsity
Complementary Slackness • As we will see, this is why support vector machines result in a solution involving only a few key support vectors • These come from the dual problem: constraints correspond to points, and complementary slackness ensures only the “active” points are kept
Complementary Slackness • However, avoid common misconceptions when it comes to SVMs and complementary slackness! • E.g. if a Lagrange multiplier is 0, the constraint could still be active! (the implication is one-way, not a bijection!) • This means: λi* = 0 does not imply fi(x*) < 0
KKT Conditions • The KKT conditions are just the name for the set of conditions required at the solution (basically a list of what we know) • KKT conditions play an important role • Can sometimes be used to find the solution analytically • Otherwise, many methods can be thought of as ways of solving the KKT conditions
KKT Conditions • Again, given strong duality and assuming differentiability: since x* minimizes L(x, λ*, ν*), the gradient of L must be 0 at x* • Thus, putting it all together, even for non-convex problems we have the following
KKT Conditions – non-convex • Necessary conditions
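For any primal-dual optimal pair (x*, λ*, ν*), given strong duality and differentiable functions (numbered as referenced on the next slide):

  1. fi(x*) <= 0,  i = 1, …, m            (primal feasibility)
  2. hj(x*) = 0,   j = 1, …, k            (primal feasibility)
  3. λi* >= 0,     i = 1, …, m            (dual feasibility)
  4. λi* fi(x*) = 0,  i = 1, …, m         (complementary slackness)
  5. ∇f0(x*) + Σ λi* ∇fi(x*) + Σ νj* ∇hj(x*) = 0   (stationarity)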
KKT Conditions – convex • For convex problems the KKT conditions are also sufficient: if (x̃, λ̃, ν̃) satisfy them, then • 1+2 -> x̃ is feasible • 3 -> L(x, λ̃, ν̃) is convex in x • 5 -> x̃ minimizes L(x, λ̃, ν̃), so g(λ̃, ν̃) = L(x̃, λ̃, ν̃) = f0(x̃) (using 4), i.e. zero duality gap, so x̃ is optimal
Brief description of interior point method • Solve a series of equality-constrained problems with Newton's method • Approximate the inequality constraints with a log-barrier (an approximation of the indicator)
Brief description of interior point method • As t gets larger, the approximation becomes better (see below)
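The barrier subproblem (from the Boyd text):

  minimize    t*f0(x) + φ(x)
  subject to  Ax = b,   where φ(x) = -Σ_{i=1..m} log(-fi(x))

Here -(1/t) log(-u) is a smooth convex approximation of the indicator I_(u) that improves as t grows; solving for a sequence of increasing t traces the central path toward x*.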
CVX: Convex Optimization Made Easy • CVX is a Matlab toolbox • Allows you to flexibly express convex optimization problems • Translates these to a general form (SOCP, SDP, or a series of these) and uses an efficient solver • http://www.stanford.edu/~boyd/cvx/ • All you have to do is design the convex optimization problem • Plug it into CVX and a first version of the algorithm is implemented • A more specialized solver may be necessary for some applications
CVX - Examples • Quadratic program: given H, f, A, and b

% QP in CVX (H should be symmetric positive semidefinite
% so CVX accepts the quadratic form as convex)
cvx_begin
  variable x(n)
  minimize( x'*H*x + f'*x )
  subject to
    A*x >= b
cvx_end
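As one more sketch (not from the slides): the lasso from the earlier regression example, assuming a data matrix A (n-by-p), target vector b, and penalty weight lambda:

% Lasso via CVX: least squares plus an L1 penalty
cvx_begin
  variable x(p)
  minimize( sum_square(A*x - b) + lambda*norm(x, 1) )
cvx_end

sum_square and norm are built-in CVX atoms, so this objective follows CVX's disciplined convex programming rules directly.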