Convex Optimization: Part 1 of Chapter 7 Discussion


Presentation Transcript


  1. Convex Optimization: Part 1 of Chapter 7 Discussion Presenter: Brian Quanz

  2. About today’s discussion… • Chapter 7 has no separate discussion of convex optimization • It is discussed alongside the SVM problems • So instead: • Today: discuss convex optimization • Next week: discuss some specific convex optimization problems from the text, e.g. SVMs

  3. About today’s discussion… • Mostly follow an alternate text: • Convex Optimization, Stephen Boyd and Lieven Vandenberghe • Material borrowed from the book and related course notes • Some figures and equations shown here • Available online: http://www.stanford.edu/~boyd/cvxbook/ • Nice course lecture videos available from Stephen Boyd online: http://www.stanford.edu/class/ee364a/ • Corresponding convex optimization tool (discussed later) - CVX: http://www.stanford.edu/~boyd/cvx/

  4. Overview • Why convex? What is convex? • Key examples of linear and quadratic programming • Key mathematical ideas to discuss: Lagrange duality -> KKT conditions • Brief concept of interior point methods • CVX – convex opt. made easy

  5. Mathematical Optimization • All learning is some optimization problem -> stick to a canonical form • x = (x1, x2, …, xp) – optimization variables; x* – the optimal point • f0 : R^p -> R – objective function • fi : R^p -> R – constraint functions
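In symbols (following Boyd and Vandenberghe), the canonical form is
\[
\begin{array}{ll}
\text{minimize}   & f_0(x) \\
\text{subject to} & f_i(x) \le 0, \quad i = 1, \dots, m
\end{array}
\]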

  6. Optimization Example • We are all well familiar with regularized regression • Least squares • Add some constraints or penalties: ridge, lasso
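As a minimal sketch of how such a problem looks in practice, here is a lasso-style regularized least squares written in CVX (the tool introduced on slide 49); the data matrix A, response vector y, and penalty weight lambda are assumed to be already defined:

    % Lasso: least squares with an L1 penalty (A, y, lambda assumed given)
    p = size(A, 2);        % number of features
    cvx_begin
        variable w(p)
        minimize( sum_square(A*w - y) + lambda*norm(w, 1) )
    cvx_end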

  7. Why convex optimization? • Can’t solve most OPs • E.g. NP-hard; even high-degree polynomial time is too slow • Convex OPs: • (Generally) no analytic solution • But efficient algorithms exist to find the (global) solution • Interior point methods (basically iterated Newton) can be used: • ~[10-100] * max{p^3, p^2 m, F} operations, where F is the cost of evaluating the objective and constraint functions • At worst, solve with general IP methods (CVX); specialized solvers are faster

  8. What is Convex Optimization? • An OP with convex objective and constraint functions • f0, …, fm all convex = a convex OP, which has an efficient solution!

  9. Convex Function • Definition: the weighted mean of the function values at any two points is greater than or equal to the function evaluated at the weighted mean of the two points
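Written out, the definition says that for all x, y in the domain of f and all θ in [0, 1]:
\[
f(\theta x + (1 - \theta) y) \le \theta f(x) + (1 - \theta) f(y)
\]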

  10. Convex Function • What does the definition mean? • Pick any two points x, y and evaluate the function there: f(x), f(y) • Draw the chord passing through the points (x, f(x)) and (y, f(y)) • The function is convex if, at every point on the segment between x and y, it lies on or below this chord

  11. Convex Function

  12. Convex Function Convex!

  13. Convex Function Not Convex!!!

  14. Convex Function • Easy to see why convexity allows for an efficient solution • Just “slide” down the objective function as far as possible and you will reach a minimum

  15. A Local Optimum is Global (simple proof)
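A sketch of the standard argument from the text: suppose x is locally optimal but some feasible y has f_0(y) < f_0(x). For θ in (0, 1], the point z = θy + (1 − θ)x is feasible (the feasible set of a convex problem is convex), and by convexity of f_0
\[
f_0(z) \le \theta f_0(y) + (1 - \theta) f_0(x) < f_0(x).
\]
Since z can be made arbitrarily close to x by shrinking θ, this contradicts local optimality of x.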

  16. Convex vs. Non-convex Ex. • Convex: min. easy to find • Affine: a border case of convexity

  17. Convex vs. Non-convex Ex. • Non-convex: easy to get stuck in a local min. • Can’t rely on local search techniques alone

  18. Non-convex • Some non-convex problems are highly multi-modal or NP-hard • Could be forced to search all solutions, or hope a stochastic search is successful • Cannot guarantee the best solution; inefficient • Harder to make performance guarantees with approximate solutions

  19. Determine/Prove Convexity • Can use the definition directly (prove it holds) • If the function restricted to any line is convex, the function is convex • If twice differentiable, show the Hessian is positive semidefinite • Often easier to: • Convert to a known convex OP • E.g. QP, LP, SOCP, SDP, often of a more general form • Combine known convex functions (building blocks) using operations that preserve convexity • Similar idea to building kernels
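In symbols, the line-restriction and second-order conditions are
\[
g(t) = f(x + tv) \ \text{convex in } t \ \text{for all } x \in \operatorname{dom} f,\ v \in \mathbf{R}^p,
\qquad \text{and} \qquad
\nabla^2 f(x) \succeq 0 \ \text{for all } x \in \operatorname{dom} f.
\]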

  20. Some common convex OPs • Of particular interest for this book and chapter: • linear programming (LP) and quadratic programming (QP) • LP: affine objective function, affine constraints – e.g. LP SVM, portfolio management
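In standard form (as in the Boyd and Vandenberghe text), an LP is
\[
\begin{array}{ll}
\text{minimize}   & c^T x \\
\text{subject to} & Gx \preceq h, \quad Ax = b
\end{array}
\]
where \(\preceq\) denotes componentwise inequality.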

  21. LP Visualization • Note: the constraints form the feasible set – for an LP, a polyhedron

  22. Quadratic Program • QP: quadratic objective, affine constraints • LP is a special case • Many SVM problems result in a QP, as does regression • If the constraint functions are also quadratic, it is a Quadratically Constrained Quadratic Program (QCQP)
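A QP in standard form:
\[
\begin{array}{ll}
\text{minimize}   & \tfrac{1}{2} x^T P x + q^T x + r \\
\text{subject to} & Gx \preceq h, \quad Ax = b
\end{array}
\]
with \(P \succeq 0\) (positive semidefinite), which is exactly what makes the objective convex.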

  23. QP Visualization

  24. Second Order Cone Program • Ai = 0 results in an LP • ci = 0 results in a QCQP • Each constraint requires an affine function of x to lie in a second-order cone
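An SOCP in standard form:
\[
\begin{array}{ll}
\text{minimize}   & f^T x \\
\text{subject to} & \| A_i x + b_i \|_2 \le c_i^T x + d_i, \quad i = 1, \dots, m \\
                  & Fx = g
\end{array}
\]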

  25. Second Order Cone (Boundary) in R^3

  26. Semidefinite Programming • Linear matrix inequality (LMI) constraints • Many problems can be expressed using LMIs • Including LP and SOCP
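An SDP in inequality form:
\[
\begin{array}{ll}
\text{minimize}   & c^T x \\
\text{subject to} & x_1 F_1 + \cdots + x_n F_n + G \preceq 0, \quad Ax = b
\end{array}
\]
where \(F_1, \dots, F_n, G\) are symmetric matrices and the constraint is the LMI.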

  27. Semidefinite Programming

  28. Building Convex Functions • From simple convex functions to complex ones: some operations that preserve convexity • Nonnegative weighted sum • Composition with an affine function • Pointwise maximum and supremum • Composition • Minimization • Perspective ( g(x,t) = t f(x/t) )
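For example, the pointwise-maximum rule shows that any piecewise-linear function
\[
f(x) = \max\{ a_1^T x + b_1, \dots, a_k^T x + b_k \}
\]
is convex, since each piece is affine (hence convex) and the pointwise maximum of convex functions is convex.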

  29. Verifying Convexity Remarks • For more detail and expansion, consult the referenced text, Convex Optimization • Geometric programs are also convex and can be handled with a series of SDPs (details skipped here) • CVX converts the problem to an SOCP or SDP (or a series of these) and uses an efficient solver

  30. Lagrangian • Standard form and Lagrangian L: see below • Lambda (λ) and nu (ν) – the Lagrange multipliers (dual variables)
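In symbols (writing q for the number of equality constraints, since p is already the number of variables), the standard form and Lagrangian are
\[
\begin{array}{ll}
\text{minimize}   & f_0(x) \\
\text{subject to} & f_i(x) \le 0, \quad i = 1, \dots, m \\
                  & h_i(x) = 0, \quad i = 1, \dots, q
\end{array}
\]
\[
L(x, \lambda, \nu) = f_0(x) + \sum_{i=1}^{m} \lambda_i f_i(x) + \sum_{i=1}^{q} \nu_i h_i(x)
\]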

  31. Lagrange Dual Function • The Lagrange dual is found by minimizing L with respect to the primal variables • Often we can take the gradient of L w.r.t. the primal variables and set it = 0 (as with SVMs)
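That is,
\[
g(\lambda, \nu) = \inf_x L(x, \lambda, \nu).
\]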

  32. Lagrange Dual Function • Note: the Lagrange dual function is the pointwise infimum of a family of affine functions of (λ, ν) • Thus, g is concave even if the problem is not convex

  33. Lagrange Dual Function • The Lagrange dual provides a lower bound on the objective value at the solution, p*
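To see the lower bound: for any λ ⪰ 0, any ν, and any feasible x̃,
\[
g(\lambda, \nu) \le L(\tilde{x}, \lambda, \nu)
= f_0(\tilde{x}) + \sum_i \lambda_i f_i(\tilde{x}) + \sum_i \nu_i h_i(\tilde{x})
\le f_0(\tilde{x}),
\]
since each \(\lambda_i f_i(\tilde{x}) \le 0\) and each \(h_i(\tilde{x}) = 0\); minimizing the right side over feasible x̃ gives \(g(\lambda, \nu) \le p^*\).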

  34. Lagrangian as Linear Approximation, Lower Bound • Simple interpretation of the Lagrangian • Can incorporate the constraints into the objective as indicator functions • Infinity if violated, 0 otherwise • In the Lagrangian we use a “soft” linear approximation to the indicator functions; an under-estimator, since λ_i u ≤ I_−(u) for all u whenever λ_i ≥ 0
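Written out, the exact reformulation is
\[
\text{minimize} \quad f_0(x) + \sum_{i=1}^{m} I_-(f_i(x)) + \sum_{i=1}^{q} I_0(h_i(x)),
\]
where \(I_-(u) = 0\) for \(u \le 0\) and \(\infty\) otherwise, and \(I_0(u) = 0\) for \(u = 0\) and \(\infty\) otherwise; the Lagrangian replaces \(I_-(u)\) by the linear under-estimator \(\lambda_i u\) and \(I_0(u)\) by \(\nu_i u\).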

  35. Lagrange Dual Problem • Why not make the lower bound the best possible? • The dual problem: see below • Always a convex optimization problem (even when the primal is non-convex) • Weak duality: d* <= p* (we have already seen this)
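The dual problem:
\[
\begin{array}{ll}
\text{maximize}   & g(\lambda, \nu) \\
\text{subject to} & \lambda \succeq 0
\end{array}
\]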

  36. Strong Duality • If d* = p*, strong duality holds • It does not hold in general • Slater’s theorem: if the problem is convex and a strictly feasible point exists, then strong duality holds! (proof too involved; refer to the text) • => For convex problems, we can use the dual problem to find the solution

  37. Complementary Slackness • When strong duality holds, f0(x*) appears at both ends of the chain below, so the two inequalities in the middle must hold with equality – simple! • The first equality is the definition of g; the last inequality holds since the constraints are satisfied at x*
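The chain of (in)equalities:
\[
f_0(x^*) = g(\lambda^*, \nu^*) = \inf_x L(x, \lambda^*, \nu^*)
\le L(x^*, \lambda^*, \nu^*)
= f_0(x^*) + \sum_i \lambda_i^* f_i(x^*) + \sum_i \nu_i^* h_i(x^*)
\le f_0(x^*)
\]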

  38. Complementary Slackness • Which means the sum of the terms λi* fi(x*) must be zero • Since each term is non-positive, we have complementary slackness: • Whenever a constraint is inactive, the corresponding multiplier is zero
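That is,
\[
\sum_{i=1}^{m} \lambda_i^* f_i(x^*) = 0
\quad \Longrightarrow \quad
\lambda_i^* f_i(x^*) = 0, \quad i = 1, \dots, m.
\]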

  39. Complementary Slackness • This can also be described by: λi* > 0 => fi(x*) = 0, and fi(x*) < 0 => λi* = 0 • Since usually only a few constraints are active at the solution (see the geometry), the dual variable λ is often sparse • Note: in general there is no guarantee of this sparsity

  40. Complementary Slackness • As we will see, this is why support vector machines result in a solution with only key support vectors • These come from the dual problem: constraints correspond to points, and complementary slackness ensures only the “active” points are kept

  41. Complementary Slackness • However, avoid common misconceptions when it comes to SVMs and complementary slackness! • E.g. if a Lagrange multiplier is 0, the constraint could still be active! (not a bijection!) • This means: λi* = 0 does not imply fi(x*) < 0 – we can have λi* = 0 and fi(x*) = 0 simultaneously

  42. KKT Conditions • The KKT conditions are just the name for the set of conditions required at the solution (basically a list of what we know) • KKT conditions play an important role • They can sometimes be used to find the solution analytically • Otherwise, many methods can be thought of as ways of solving the KKT conditions

  43. KKT Conditions • Again, given strong duality and assuming differentiability: since x* minimizes L(x, λ*, ν*), the gradient of L must be 0 at x* (see below) • Thus, putting it all together, even for non-convex problems we have the conditions on the next slide
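In symbols, the gradient condition is
\[
\nabla f_0(x^*) + \sum_{i=1}^{m} \lambda_i^* \nabla f_i(x^*) + \sum_{i=1}^{q} \nu_i^* \nabla h_i(x^*) = 0.
\]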

  44. KKT Conditions – non-convex • Necessary conditions
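Spelled out (this numbering is used on the next slide):
\[
\begin{aligned}
&1. \quad f_i(x^*) \le 0, \quad i = 1, \dots, m \\
&2. \quad h_i(x^*) = 0, \quad i = 1, \dots, q \\
&3. \quad \lambda_i^* \ge 0, \quad i = 1, \dots, m \\
&4. \quad \lambda_i^* f_i(x^*) = 0, \quad i = 1, \dots, m \\
&5. \quad \nabla f_0(x^*) + \textstyle\sum_i \lambda_i^* \nabla f_i(x^*) + \sum_i \nu_i^* \nabla h_i(x^*) = 0
\end{aligned}
\]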

  45. KKT Conditions – convex • Also sufficient conditions: • 1+2 -> x̃ is feasible • 3 -> L(x, λ̃, ν̃) is convex in x • 5 -> x̃ minimizes L(x, λ̃, ν̃), so g(λ̃, ν̃) = L(x̃, λ̃, ν̃)
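Combining these with condition 4 (complementary slackness) closes the duality gap:
\[
g(\tilde{\lambda}, \tilde{\nu}) = L(\tilde{x}, \tilde{\lambda}, \tilde{\nu})
= f_0(\tilde{x}) + \sum_i \tilde{\lambda}_i f_i(\tilde{x}) + \sum_i \tilde{\nu}_i h_i(\tilde{x})
= f_0(\tilde{x}),
\]
so x̃ is primal optimal and (λ̃, ν̃) is dual optimal.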

  46. Brief description of interior point method • Solve a series of equality-constrained problems with Newton’s method • Approximate the inequality constraints with a log-barrier (an approximation of the indicator)
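Concretely, for a parameter t > 0 the barrier subproblem is
\[
\text{minimize} \quad t f_0(x) - \sum_{i=1}^{m} \log(-f_i(x))
\quad \text{subject to} \quad Ax = b,
\]
where \(-(1/t)\log(-u)\) is a smooth convex approximation of the indicator \(I_-(u)\).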

  47. Brief description of interior point method • As t gets larger, the approximation becomes better

  48. Central Path Idea

  49. CVX: Convex Optimization Made Easy • CVX is a Matlab toolbox • Allows you to flexibly express convex optimization problems • Translates these to a general form and uses an efficient solver (SOCP, SDP, or a series of these) • http://www.stanford.edu/~boyd/cvx/ • All you have to do is design the convex optimization problem • Plug it into CVX and you have a first version of the algorithm implemented • A more specialized solver may be necessary for some applications

  50. CVX - Examples • Quadratic program: given H, f, A, and b:

    n = length(f);                          % number of variables
    cvx_begin
        variable x(n)
        minimize( quad_form(x, H) + f'*x )  % x'*H*x + f'*x; quad_form requires H PSD
        subject to
            A*x >= b
    cvx_end
