Review + Announcements 2/22/08
Presentation schedule
Friday 4/25 (5 max)
1. Miguel Jaller 8:03
2. Adrienne Peltz 8:20
3. Olga Grisin 8:37
4. Dan Erceg 8:54
5. Nick Suhr 9:11
6. Christos Boutsidis 9:28
Tuesday 4/29 (5 max)
1. Jayanth 8:03
2. Raghav 8:20
3. Rhyss 8:37
4. Tim 8:54
5-6. Lindsey Garret and Mark Yuhas 9:11
Monday 4/28, 4:00 to 7:00 (pizza included): Lisa Pak, Christos Boutsidis, David Doria, Zhi Zeng, Carlos, Varun, Samrat, Matt, Adarsh Ramsuhramonian
Be on time. Plan your presentation for 15 minutes; the schedule is strict. We suggest putting your presentation in your public_html directory on rcs so you can click and go. Monday night class is in Amos Eaton 214, 4 to 7.
Other Dates • Project papers due Friday (or in class Monday if you have a Friday presentation) • Final: Tuesday 5/6, 3 p.m., Eaton 214. Open book/notes (no computers). Comprehensive; labs are fair game too. • Office hours: Monday 5/5, 10 to 12 (or email)
What did we learn? Theme 1: “There is nothing more practical than a good theory” - Kurt Lewin. Algorithms arise out of the optimality conditions.
What did we learn? Theme 2: To solve a harder problem, reduce it to an easier problem that you already know how to solve.
Fundamental Theoretical Ideas • Convex functions and sets • Convex programs • Differentiability • Taylor Series Approximations • Descent Directions Combining these with the ideas of feasible directions provides the basis for optimality conditions.
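Convex Functions A function f is (strictly) convex on a convex set S if and only if, for any $x, y \in S$ (with $x \ne y$ in the strict case), $f(\lambda x + (1-\lambda) y) \le (<)\ \lambda f(x) + (1-\lambda) f(y)$ for all $0 \le \lambda \le 1$ ($0 < \lambda < 1$ in the strict case). [Figure: graph of f marking x, y, λx+(1−λ)y and the values f(x), f(y), f(λx+(1−λ)y)]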
Convex Sets A set S is convex if the line segment joining any two points in the set is also in the set, i.e., for any $x, y \in S$, $\lambda x + (1-\lambda) y \in S$ for all $0 \le \lambda \le 1$. [Figure: examples of convex and non-convex sets]
Convex Program $\min f(x)$ subject to $x \in S$, where f and S are convex • Makes optimization nice • Many practical problems are convex problems • Convex programs can be used as subproblems for nonconvex programs
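Theorem: Global Solution of Convex Program If x* is a local minimizer of a convex programming problem, then x* is also a global minimizer. Furthermore, if the objective is strictly convex, then x* is the unique global minimizer. Proof: by contradiction. Suppose some $y \in S$ has $f(y) < f(x^*)$. By convexity, for every $0 < \lambda \le 1$ the point $x_\lambda = \lambda y + (1-\lambda) x^*$ lies in S and satisfies $f(x_\lambda) \le \lambda f(y) + (1-\lambda) f(x^*) < f(x^*)$; taking λ small puts $x_\lambda$ arbitrarily close to x*, contradicting local optimality.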
First Order Taylor Series Approximation • Let $x = x^* + p$: $f(x^* + p) \approx f(x^*) + \nabla f(x^*)^T p$ • Says that a linear approximation of a function works well locally [Figure: f(x) near x*]
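Second Order Taylor Series Approximation • Let $x = x^* + p$: $f(x^* + p) \approx f(x^*) + \nabla f(x^*)^T p + \tfrac{1}{2} p^T \nabla^2 f(x^*) p$ • Says that a quadratic approximation of a function works even better locally [Figure: f(x) near x*]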
Descent Directions • If the directional derivative $\nabla f(x)^T d$ is negative, then • a linesearch along d will lead to a decrease in the function [Figure: example with vectors [8, 2] and d = [0, −1]]
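A minimal sketch of the descent test in Python, assuming vectors are plain lists. The gradient g = [8, 2] below is a hypothetical value chosen for illustration, not data from the course.

```python
def directional_derivative(g, d):
    """Return g^T d, the directional derivative of f along d when g = grad f(x)."""
    return sum(gi * di for gi, di in zip(g, d))


def is_descent_direction(g, d):
    """d is a descent direction at x exactly when grad f(x)^T d < 0."""
    return directional_derivative(g, d) < 0


g = [8.0, 2.0]   # hypothetical gradient at the current point
d = [0.0, -1.0]  # candidate search direction
print(directional_derivative(g, d))  # -2.0
print(is_descent_direction(g, d))    # True
```

The negative gradient $d = -g$ always passes this test whenever $g \ne 0$, since $-g^T g < 0$.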
First Order Necessary Conditions Theorem: Let f be continuously differentiable. If x* is a local minimizer of (1), then $\nabla f(x^*) = 0$.
Second Order Sufficient Conditions Theorem: Let f be twice continuously differentiable. If $\nabla f(x^*) = 0$ and $\nabla^2 f(x^*)$ is positive definite, then x* is a strict local minimizer of (1).
Second Order Necessary Conditions Theorem: Let f be twice continuously differentiable. If x* is a local minimizer of (1), then $\nabla f(x^*) = 0$ and $\nabla^2 f(x^*)$ is positive semidefinite.
Optimality Conditions • First Order Necessary • Second Order Necessary • Second Order Sufficient With convexity the necessary conditions become sufficient.
Easiest Problem Line Search = 1-D Optimization • Optimality conditions based on first and second derivatives • Golden section search (1)
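The slide only names golden section search. Below is a minimal sketch of the standard version in Python, assuming phi is unimodal on the starting bracket [a, b]; the function name and the tolerance are illustrative choices.

```python
import math


def golden_section_search(phi, a, b, tol=1e-8):
    """Minimize a unimodal function phi on [a, b] by golden section search.

    Each iteration keeps the sub-bracket that must contain the minimizer,
    shrinking it by the factor 1/golden ratio (about 0.618), and reuses one
    of the two interior function values so only one new evaluation is needed.
    """
    inv_gr = (math.sqrt(5.0) - 1.0) / 2.0   # 1/golden ratio
    c = b - inv_gr * (b - a)
    d = a + inv_gr * (b - a)
    fc, fd = phi(c), phi(d)
    while b - a > tol:
        if fc < fd:                  # minimizer lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_gr * (b - a)
            fc = phi(c)
        else:                        # minimizer lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_gr * (b - a)
            fd = phi(d)
    return (a + b) / 2.0


t_min = golden_section_search(lambda t: (t - 2.0) ** 2, 0.0, 5.0)  # close to 2.0
```

Only function values are used, so this works even when derivatives for the first- and second-order conditions are unavailable.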
Sometimes the linesearch can be solved exactly • The exact stepsize can be found
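One case where the exact stepsize has a closed form is a quadratic objective. As a sketch (the quadratic form $f(x) = \tfrac{1}{2} x^T Q x - b^T x$ and the example data are assumptions, not from the slide), setting $\phi'(\alpha) = \nabla f(x)^T p + \alpha\, p^T Q p$ to zero gives $\alpha^* = -\nabla f(x)^T p / (p^T Q p)$:

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(Q, v):
    return [dot(row, v) for row in Q]


def exact_step_quadratic(Q, b, x, p):
    """Exact minimizer of phi(alpha) = f(x + alpha p) for the quadratic
    f(x) = 0.5 x^T Q x - b^T x, assuming p^T Q p > 0."""
    g = [qx - bi for qx, bi in zip(matvec(Q, x), b)]  # grad f(x) = Qx - b
    return -dot(g, p) / dot(p, matvec(Q, p))


Q, b = [[2.0, 0.0], [0.0, 4.0]], [0.0, 0.0]   # hypothetical example data
x = [1.0, 1.0]
p = [-2.0, -4.0]                               # steepest descent: p = -grad f(x)
alpha = exact_step_quadratic(Q, b, x, p)       # 20/72 = 5/18
```

At the exact step, the new gradient is orthogonal to the search direction, which is a quick way to check the result.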
General Optimization algorithm • Specify some initial guess $x_0$ • For k = 0, 1, … • If $x_k$ is optimal then stop • Determine a descent direction $p_k$ • Determine an improved estimate of the solution: $x_{k+1} = x_k + \alpha_k p_k$ The last step is a one-dimensional search problem called a line search.
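The general loop can be sketched in Python as follows. This is an illustration, not the course's code: the helper names (`descent`, `backtracking`), the Armijo backtracking parameters, the gradient-norm stopping test, and the test function are all assumptions.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def axpy(x, alpha, p):
    """Return x + alpha * p."""
    return [xi + alpha * pi for xi, pi in zip(x, p)]


def backtracking(f, x, g, p, alpha=1.0, rho=0.5, c1=1e-4):
    """Shrink alpha until the sufficient-decrease (Armijo) condition holds."""
    fx, slope = f(x), dot(g, p)  # slope < 0 for a descent direction
    while alpha > 1e-16 and f(axpy(x, alpha, p)) > fx + c1 * alpha * slope:
        alpha *= rho
    return alpha


def descent(f, grad, direction, x0, tol=1e-8, max_iter=1000):
    """Generic loop: stop when approximately optimal, else choose p_k, then alpha_k."""
    x = list(x0)
    for k in range(max_iter):
        g = grad(x)
        if math.sqrt(dot(g, g)) <= tol:     # "if x_k is optimal then stop"
            return x, k
        p = direction(x, g)                 # descent direction p_k
        alpha = backtracking(f, x, g, p)    # line search for alpha_k
        x = axpy(x, alpha, p)               # x_{k+1} = x_k + alpha_k p_k
    return x, max_iter


def f(x):          # hypothetical test function
    return (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 3.0) ** 2


def grad(x):
    return [2.0 * (x[0] - 1.0), 4.0 * (x[1] + 3.0)]


def steepest(x, g):   # steepest-descent choice p_k = -grad f(x_k)
    return [-gi for gi in g]


x_star, iters = descent(f, grad, steepest, [0.0, 0.0])
```

Swapping in a different `direction` function (for example a Newton or quasi-Newton direction) reuses the same loop and linesearch.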
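Newton’s Method • Minimizing a quadratic has a closed form: for $f(x) = \tfrac{1}{2} x^T Q x - b^T x$ with Q positive definite, the FONC $Qx - b = 0$ give $x^* = Q^{-1} b$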
General nonlinear functions • For non-quadratic f (twice continuously differentiable): • Approximate f by the 2nd order TSA about $x_k$ • Solve the FONC for the quadratic approximation: $\nabla^2 f(x_k)\, p = -\nabla f(x_k)$
Basic Newton’s Algorithm • Start with $x_0$ • For k = 0, 1, …, K • If $x_k$ is optimal then stop • Solve $\nabla^2 f(x_k)\, p_k = -\nabla f(x_k)$ • $x_{k+1} = x_k + p_k$
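A minimal sketch of the basic Newton iteration in two variables, solving the 2×2 Newton system by Cramer's rule. There is no linesearch and no Hessian modification, so convergence is only expected near a minimizer with a positive definite Hessian; the test function is hypothetical.

```python
import math


def newton_2d(grad, hess, x0, tol=1e-10, max_iter=50):
    """Basic Newton's method in two variables (no linesearch):
    solve hess(x_k) p_k = -grad(x_k), then set x_{k+1} = x_k + p_k."""
    x, y = x0
    for k in range(max_iter):
        g1, g2 = grad(x, y)
        if math.hypot(g1, g2) <= tol:      # approximately optimal: stop
            return (x, y), k
        (a, b), (c, d) = hess(x, y)
        det = a * d - b * c                # assumes a nonsingular Hessian
        p1 = (-g1 * d + b * g2) / det      # Cramer's rule for [[a, b], [c, d]] p = -g
        p2 = (-a * g2 + c * g1) / det
        x, y = x + p1, y + p2
    return (x, y), max_iter


# Hypothetical smooth, strictly convex test function f(x, y) = exp(x + y) + x^2 + y^2.
def grad(x, y):
    e = math.exp(x + y)
    return e + 2 * x, e + 2 * y


def hess(x, y):
    e = math.exp(x + y)
    return (e + 2, e), (e, e + 2)


(x_star, y_star), iters = newton_2d(grad, hess, (0.0, 0.0))
```

Near the solution the gradient norm roughly squares at each step (quadratic convergence), so only a handful of iterations are needed here.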
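Final Newton’s Algorithm • Start with $x_0$ • For k = 0, 1, …, K • If $x_k$ is optimal then stop • Solve $(\nabla^2 f(x_k) + E_k)\, p_k = -\nabla f(x_k)$ using a modified Cholesky factorization, where $E_k$ is chosen so the matrix is positive definite • Perform a linesearch to determine $x_{k+1} = x_k + \alpha_k p_k$ What are the pros and cons?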
Steepest Descent Algorithm • Start with $x_0$ • For k = 0, 1, …, K • If $x_k$ is optimal then stop • Set $p_k = -\nabla f(x_k)$ • Perform an exact or backtracking linesearch to determine $x_{k+1} = x_k + \alpha_k p_k$
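Inexact linesearch can work quite well too! Wolfe conditions, for $0 < c_1 < c_2 < 1$: $f(x_k + \alpha p_k) \le f(x_k) + c_1 \alpha \nabla f(x_k)^T p_k$ (sufficient decrease) and $\nabla f(x_k + \alpha p_k)^T p_k \ge c_2 \nabla f(x_k)^T p_k$ (curvature). A step satisfying both exists for any descent direction if f is continuously differentiable and bounded below along the ray $\{x_k + \alpha p_k : \alpha > 0\}$ (Lemma 3.1).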
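A small sketch that checks both Wolfe conditions for a given step. The test function f(x) = x² and the defaults c1 = 1e-4, c2 = 0.9 are illustrative assumptions. A very short step passes sufficient decrease but fails the curvature condition, while a step that overshoots fails sufficient decrease:

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def wolfe_conditions(f, grad, x, p, alpha, c1=1e-4, c2=0.9):
    """Return (sufficient_decrease, curvature) for the step x + alpha * p."""
    if not 0 < c1 < c2 < 1:
        raise ValueError("need 0 < c1 < c2 < 1")
    slope0 = dot(grad(x), p)                      # grad f(x_k)^T p_k (< 0)
    x_new = [xi + alpha * pi for xi, pi in zip(x, p)]
    sufficient_decrease = f(x_new) <= f(x) + c1 * alpha * slope0
    curvature = dot(grad(x_new), p) >= c2 * slope0
    return sufficient_decrease, curvature


def f(x):          # hypothetical test function f(x) = x^2
    return x[0] ** 2


def grad(x):
    return [2 * x[0]]


x, p = [1.0], [-2.0]  # steepest-descent direction at x = 1
for alpha in (0.01, 0.5, 1.0):
    print(alpha, wolfe_conditions(f, grad, x, p, alpha))
# 0.01 (True, False)
# 0.5 (True, True)
# 1.0 (False, True)
```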
Conditioning is important for gradient methods! For $f(x, y) = 50(x-10)^2 + y^2$, the condition number is 50/1 = 50. Steepest descent ZIGZAGS!!! Know the pros and cons of each approach.
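To see the zigzagging, here is a sketch of steepest descent with an exact linesearch on the slide's function. The starting point (10.2, 10) is a hypothetical choice that makes the two gradient components equal, which is the worst case for this condition number.

```python
def steepest_descent_exact(x0, tol=1e-6, max_iter=100000):
    """Steepest descent with exact linesearch on f(x, y) = 50(x - 10)^2 + y^2.

    The Hessian is diag(100, 2), so for gradient g the exact step along -g is
    alpha = g^T g / (g^T H g). Returns the final point and the iteration count.
    """
    h1, h2 = 100.0, 2.0
    x, y = x0
    for k in range(max_iter):
        g1, g2 = h1 * (x - 10.0), h2 * y          # gradient of f
        gg = g1 * g1 + g2 * g2
        if gg <= tol * tol:                        # stop when ||g|| <= tol
            return (x, y), k
        alpha = gg / (h1 * g1 * g1 + h2 * g2 * g2)
        x, y = x - alpha * g1, y - alpha * g2
    return (x, y), max_iter


(x_end, y_end), iters = steepest_descent_exact((10.2, 10.0))
```

From this start, successive gradients are orthogonal, so the iterates alternate between two directions and the run takes several hundred iterations. Newton's method would land on the minimizer (10, 0) in a single step, because the function is quadratic.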
Conjugate Gradient (CG) • Method for minimizing a quadratic function • Low-storage method: CG only stores vector information • CG has superlinear convergence for nice problems or when properly scaled • Great for solving QP subproblems
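A sketch of linear CG for $\min \tfrac{1}{2} x^T Q x - b^T x$ with Q symmetric positive definite, using plain lists to highlight that only a handful of vectors are stored. In exact arithmetic it terminates in at most n iterations; the example data are hypothetical.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(Q, v):
    return [dot(row, v) for row in Q]


def conjugate_gradient(Q, b, x0, tol=1e-10):
    """Minimize 0.5 x^T Q x - b^T x for symmetric positive definite Q,
    which is the same as solving Q x = b. Only vectors are stored."""
    x = list(x0)
    r = [bi - qxi for bi, qxi in zip(b, matvec(Q, x))]  # residual = -gradient
    p = list(r)
    rr = dot(r, r)
    for k in range(len(b)):  # at most n steps in exact arithmetic
        if math.sqrt(rr) <= tol:
            return x, k
        Qp = matvec(Q, p)
        alpha = rr / dot(p, Qp)                          # exact linesearch along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qpi for ri, qpi in zip(r, Qp)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]  # next conjugate direction
        rr = rr_new
    return x, len(b)


x_cg, steps = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0], [0.0, 0.0])
# x_cg is approximately [1/11, 7/11] after 2 steps
```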
Quasi Newton Methods: Pros and Cons • Globally converge to a local min (always find a descent direction) • Superlinear convergence • Require only first-order information (approximate the Hessian) • More complicated than steepest descent • Require sophisticated linear algebra • Have to watch out for numerical error
Quasi Newton Methods: Pros and Cons • Globally converge to a local min • Superlinear convergence without computing the Hessian • Work great in practice; widely used • More complicated than steepest descent • Best implementations require sophisticated linear algebra, linesearch, and handling of curvature conditions • Have to watch out for numerical error.
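As a sketch of how quasi-Newton methods approximate the Hessian from first-order information, here is the BFGS update of an inverse-Hessian approximation H, using the step $s = x_{k+1} - x_k$ and the gradient change $y = \nabla f(x_{k+1}) - \nabla f(x_k)$. The slides do not name a specific update; BFGS is used here as the most common example. It needs the curvature condition $y^T s > 0$, and the new matrix satisfies the secant condition $H_{new}\, y = s$.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def bfgs_inverse_update(H, s, y):
    """BFGS update of an inverse-Hessian approximation H:
        H_new = (I - rho s y^T) H (I - rho y s^T) + rho s s^T,  rho = 1 / (y^T s)."""
    ys = dot(y, s)
    if ys <= 0:
        raise ValueError("curvature condition y^T s > 0 is violated")
    rho, n = 1.0 / ys, len(s)
    # V = I - rho y s^T, so V^T = I - rho s y^T.
    V = [[(1.0 if i == j else 0.0) - rho * y[i] * s[j] for j in range(n)]
         for i in range(n)]
    VtHV = matmul(matmul(transpose(V), H), V)
    return [[VtHV[i][j] + rho * s[i] * s[j] for j in range(n)] for i in range(n)]


H = [[1.0, 0.0], [0.0, 1.0]]             # initial approximation H_0 = I
s, y = [1.0, 0.5], [2.0, 0.5]            # hypothetical step and gradient change
H_new = bfgs_inverse_update(H, s, y)
```

Because the method maintains an inverse approximation, the search direction is just $p_k = -H_k \nabla f(x_k)$, with no linear system to solve.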
Trust Region Methods • Alternative to line search methods • Optimize quadratic model of objective within the “trust region”
Easiest Problem • Linear equality constraints: $\min f(x)$ subject to $Ax = b$
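Lemma 14.1 Necessary Conditions (Nash + Sofer) • If x* is a local min of f over $\{x \mid Ax = b\}$ and Z is a null-space matrix for A, then $Z^T \nabla f(x^*) = 0$ and $Z^T \nabla^2 f(x^*) Z$ is positive semidefinite • Or equivalently use the KKT conditions: $\nabla f(x^*) = A^T \lambda^*$, $Ax^* = b$ • Other conditions generalize similarly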
Handy ways to compute the Null Space • Variable reduction method • Orthogonal projection matrix • QR factorization (best numerically) • Z = null(A) in MATLAB
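A sketch of the variable reduction method: write $A = [B \; N]$ with B nonsingular; then $Z = \begin{bmatrix} -B^{-1} N \\ I \end{bmatrix}$ satisfies $AZ = 0$. For simplicity the code assumes that the first m columns of A form B (a real implementation would choose the basis columns). The QR approach, or MATLAB's null(A), would instead return an orthonormal Z.

```python
def variable_reduction_null_space(A, eps=1e-12):
    """Return Z (n x (n-m)) with A Z = 0 by variable reduction.

    Assumes the first m columns of the m x n matrix A form a nonsingular
    basis matrix B, so that A = [B N] and Z = [[-B^{-1} N], [I]].
    """
    m, n = len(A), len(A[0])
    M = [[float(v) for v in row] for row in A]
    # Gauss-Jordan elimination turns [B | N] into [I | B^{-1} N].
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))  # partial pivoting
        if abs(M[piv][col]) < eps:
            raise ValueError("the first m columns of A are (nearly) singular")
        M[col], M[piv] = M[piv], M[col]
        pivot = M[col][col]
        M[col] = [v / pivot for v in M[col]]
        for r in range(m):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    top = [[-M[i][m + j] for j in range(n - m)] for i in range(m)]                  # -B^{-1} N
    bottom = [[1.0 if i == j else 0.0 for j in range(n - m)] for i in range(n - m)]  # I
    return top + bottom


A = [[1.0, 1.0, 1.0],
     [1.0, -1.0, 2.0]]                 # hypothetical 2 x 3 constraint matrix
Z = variable_reduction_null_space(A)   # [[-1.5], [0.5], [1.0]]
```

Any feasible point can then be written as $x = \bar{x} + Zv$ with $A\bar{x} = b$, which turns the constrained problem into an unconstrained one in v.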
Next Easiest Problem • Linear inequality constraints: $\min f(x)$ subject to $Ax \ge b$ • Constraints form a polyhedron
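Inequality Case • Inequality problem: $\min f(x)$ subject to $Ax \ge b$, where the feasible set is a polyhedron • Inequality FONC: $\nabla f(x^*) = A^T \lambda^*$, $\lambda^* \ge 0$, $\lambda^{*T}(Ax^* - b) = 0$ • Nonnegative multipliers imply that the gradient points to the "greater than" side of the constraints. [Figure: polyhedron Ax ≥ b bounded by the lines a_i x = b_i, i = 1, …, 5, with the constraint normals a_1 and a_2 drawn at the vertex x*]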
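Second Order Sufficient Conditions for Linear Inequalities • If $(x^*, \lambda^*)$ satisfies $Ax^* \ge b$, $\nabla f(x^*) = A^T \lambda^*$, $\lambda^* \ge 0$, $\lambda^{*T}(Ax^* - b) = 0$, and $Z_+^T \nabla^2 f(x^*) Z_+$ is positive definite, then x* is a strict local minimizer.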
Sufficient Conditions for Linear Inequalities where $Z_+$ is a basis matrix for $\mathrm{Null}(A_+)$ and $A_+$ corresponds to the nondegenerate active constraints, i.e., the active constraints with strictly positive multipliers $\lambda_i^* > 0$.
General Constraints Careful: sufficient conditions are the same as before, but necessary conditions need an extra constraint qualification to make sure Lagrange multipliers exist!
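Necessary Conditions General • If x* satisfies LICQ and is a local min of f over $\{x \mid g(x) \ge 0,\ h(x) = 0\}$, then there exist multipliers $\lambda^*$, $\nu^*$ such that $\nabla f(x^*) = \nabla g(x^*)^T \lambda^* + \nabla h(x^*)^T \nu^*$, $\lambda^* \ge 0$, $\lambda^{*T} g(x^*) = 0$, $g(x^*) \ge 0$, $h(x^*) = 0$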
Algorithms build on prior Approaches • Linear equality constrained: convert to an unconstrained problem (write $x = \bar{x} + Zv$ with $A\bar{x} = b$) and solve. Different ways to represent the null space produce different algorithms in practice.
Prior Approaches (cont.) • Linear inequality constrained: identify the active constraints, then solve equality-constrained subproblems • Nonlinear inequality constrained: linearize the constraints, then solve subproblems
Active Set Methods (NW 16.5) Change one item of the working set at a time
Interior point algorithms (NW 16.6) Traverse the interior of the set (a little more later)
Gradient Projection (NW 16.7) Change many elements of the working set at once
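A simplified sketch of gradient projection on box constraints $l \le x \le u$, where projection is just componentwise clipping. It uses a fixed step rather than the Cauchy-point search and subspace minimization described in NW 16.7, and the objective and bounds are hypothetical. In the example both bounds become active in the first step, illustrating how several working-set elements can change at once.

```python
def project_box(x, lower, upper):
    """Projection onto the box lower <= x <= upper is componentwise clipping."""
    return [min(max(xi, li), ui) for xi, li, ui in zip(x, lower, upper)]


def projected_gradient(grad, x0, lower, upper, step=0.25, tol=1e-10, max_iter=1000):
    """Fixed-step projected gradient: x <- P(x - step * grad(x))."""
    x = project_box(x0, lower, upper)
    for k in range(max_iter):
        trial = [xi - step * gi for xi, gi in zip(x, grad(x))]
        x_new = project_box(trial, lower, upper)
        if max(abs(a - b) for a, b in zip(x_new, x)) <= tol:
            return x_new, k
        x = x_new
    return x, max_iter


def grad(x):   # hypothetical objective (x - 3)^2 + (y + 1)^2
    return [2.0 * (x[0] - 3.0), 2.0 * (x[1] + 1.0)]


x_star, iters = projected_gradient(grad, [0.5, 0.5], [0.0, 0.0], [1.0, 1.0])
# x_star == [1.0, 0.0]: the bounds x <= 1 and y >= 0 become active together
```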
Generic inexact penalty problem: from a constrained problem to an unconstrained problem with a penalty term. What are penalty problems, and why do we use them? What is the difference between exact and inexact penalties?
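As an illustration, assume the common quadratic penalty, which turns $\min f(x)$ s.t. $h(x) = 0$ into $\min f(x) + \frac{\mu}{2}\|h(x)\|^2$. On the hypothetical problem $\min x^2 + y^2$ s.t. $x + y = 1$ (solution $x = y = 1/2$), the penalty minimizer can be computed exactly for each μ:

```python
def penalty_minimizer(mu):
    """Minimize x^2 + y^2 + (mu/2) * (x + y - 1)^2 exactly.

    Setting the gradient to zero gives the 2x2 linear system
        (2 + mu) x + mu y = mu
        mu x + (2 + mu) y = mu,
    solved here by Cramer's rule.
    """
    a, b = 2.0 + mu, mu            # system matrix [[a, b], [b, a]]
    det = a * a - b * b
    x = (mu * a - b * mu) / det
    y = (a * mu - b * mu) / det
    return x, y


for mu in (1.0, 10.0, 100.0, 1e4):
    x, y = penalty_minimizer(mu)
    print(f"mu={mu:g}  x={x:.6f}  y={y:.6f}  h={x + y - 1:.2e}")
```

Here the constraint violation is $h = -1/(1+\mu)$. It shrinks as μ grows but never reaches zero for finite μ, which is what makes this penalty inexact. The Hessian of the penalized function has eigenvalues 2 and 2 + 2μ, so large μ also makes the subproblem ill-conditioned.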
Augmented Lagrangian • Consider $\min f(x)$ s.t. $h(x) = 0$ • Start with $L(x, \lambda) = f(x) - \lambda^T h(x)$ • Add a penalty: $L(x, \lambda, \mu) = f(x) - \lambda^T h(x) + \frac{\mu}{2}\|h(x)\|^2$ • The penalty helps ensure that the point is feasible. Why do we like these? How do they work in practice?
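A sketch of the augmented Lagrangian method on a hypothetical toy problem, $\min x^2 + y^2$ s.t. $h = x + y - 1 = 0$, with the slide's sign convention. Minimizing $L(x, \lambda, \mu)$ exactly and then updating $\lambda \leftarrow \lambda - \mu h(x)$ drives λ to the true multiplier $\lambda^* = 1$ with μ held fixed at 10, so μ never has to go to infinity:

```python
def augmented_lagrangian(mu=10.0, iters=10):
    """Augmented Lagrangian method for min x^2 + y^2 s.t. h = x + y - 1 = 0,
    using L_A = f - lam * h + (mu/2) * h^2 with mu held fixed."""
    lam = 0.0
    for _ in range(iters):
        # Exact minimizer of L_A in (x, y): setting the gradient to zero gives
        # (2 + mu) x + mu y = lam + mu  and  mu x + (2 + mu) y = lam + mu.
        a, b, r = 2.0 + mu, mu, lam + mu
        det = a * a - b * b
        x = (r * a - b * r) / det
        y = (a * r - b * r) / det
        h = x + y - 1.0
        lam = lam - mu * h   # first-order multiplier update (matches the -lam * h sign)
    return x, y, lam, h


x, y, lam, h = augmented_lagrangian()
```

For this problem the multiplier error shrinks by the factor $1/(1+\mu)$ each iteration, so ten iterations with μ = 10 already give λ and the constraint accurate to about $10^{-10}$.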
Sequential Quadratic Programming (SQP) Basic idea: QPs with constraints are easy. For any guess of the active constraints, we just have to solve a system of equations. So why not solve the general problem as a series of constrained QPs? Which QP should be used?
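Once the active constraints are fixed, an equality-constrained QP $\min \tfrac{1}{2} x^T Q x + c^T x$ s.t. $Ax = b$ reduces to one linear system, the KKT system. Below is a minimal sketch with a hand-written Gaussian elimination solver. A full SQP method would also need to build each QP from the Hessian of the Lagrangian and the linearized constraints, which is not shown; the example data are hypothetical.

```python
def solve_linear(M, r):
    """Solve M z = r by Gaussian elimination with partial pivoting
    (assumes M is nonsingular)."""
    n = len(r)
    T = [[float(v) for v in row] + [float(ri)] for row, ri in zip(M, r)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(T[i][col]))
        T[col], T[piv] = T[piv], T[col]
        for i in range(col + 1, n):
            factor = T[i][col] / T[col][col]
            T[i] = [u - factor * v for u, v in zip(T[i], T[col])]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):   # back substitution
        z[i] = (T[i][n] - sum(T[i][j] * z[j] for j in range(i + 1, n))) / T[i][i]
    return z


def equality_qp(Q, c, A, b):
    """Solve min 0.5 x^T Q x + c^T x  s.t.  A x = b through the KKT system
        [[Q, -A^T], [A, 0]] [x; lam] = [-c; b],
    which comes from the Lagrangian f(x) - lam^T (A x - b)."""
    n, m = len(c), len(b)
    K = [list(Q[i]) + [-A[k][i] for k in range(m)] for i in range(n)]
    K += [list(A[k]) + [0.0] * m for k in range(m)]
    z = solve_linear(K, [-ci for ci in c] + list(b))
    return z[:n], z[n:]


# Hypothetical example: min x^2 + y^2  s.t.  x + y = 1.
x, lam = equality_qp([[2.0, 0.0], [0.0, 2.0]], [0.0, 0.0], [[1.0, 1.0]], [1.0])
# x is approximately [0.5, 0.5] and lam approximately [1.0]
```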
Trust Region Works Great • We only trust the approximation locally, so we limit the step to this region by adding a trust-region constraint $\|p\| \le \Delta$ to the QP • No stepsize needed!
Advanced topics • Duality theory: we can choose to solve the primal or the dual problem. The dual is always nice (the dual function is concave, so the dual is a convex problem), but there may be a “duality gap” if the overall problem is not nice. • Nonsmooth optimization: the whole theory can be developed again on the basis of subgradients instead of gradients.
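A sketch of the simplest subgradient-based algorithm, the subgradient method with diminishing steps 1/k, on the hypothetical one-dimensional function |x − 2|, which is not differentiable at its minimizer. Because a subgradient step need not decrease f, the best point found so far is tracked.

```python
def subgradient_method(f, subgrad, x0, iters=1000):
    """Subgradient method x_{k+1} = x_k - (1/k) s_k with s_k a subgradient.

    f need not decrease at every step, so the best point seen is returned.
    """
    x = best_x = x0
    best_f = f(x0)
    for k in range(1, iters + 1):
        x = x - subgrad(x) / k        # diminishing step size 1/k
        fx = f(x)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f


def f(x):                 # hypothetical nonsmooth objective |x - 2|
    return abs(x - 2.0)


def subgrad(x):           # an element of the subdifferential of |x - 2|
    if x > 2.0:
        return 1.0
    if x < 2.0:
        return -1.0
    return 0.0            # 0 lies in the subdifferential [-1, 1] at the kink


best_x, best_f = subgradient_method(f, subgrad, 0.0)
```

The step sizes sum to infinity but shrink to zero, so the iterates reach the kink and then oscillate around it with ever smaller amplitude.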