This final report project explores the use of preconditioning to enhance the stability of the Interior Point Method (IPM) for linearly constrained optimization. The report outlines the problem, approach, results, and recommendations.
Improving Performance of the Interior Point Method by Preconditioning
Final Report
Project by: Ken Ryals
For: AMSC 663-664, Fall 2007-Spring 2008
6 May 2008
Outline • Problem • Approach • Results • Recommendations
Problem Statement
• Specific Research Problem:
  • Optimization of Distributed Command and Control for the OSD/A&T
• General Research Area:
  • The Interior Point Method (IPM) for Linearly Constrained Optimization
• The research focus is to improve the stability of the IPM through preconditioning with the inverse of the constraint matrix and its transpose, while retaining the improvements available from factorization and iterative solution techniques for linear systems.
Problem: Notation
In constrained optimization…
• The problem is: minimize $c^T x$ over $x$, subject to $Ax = b$ with $x \ge 0$.
• We often make use of the “dual problem”: maximize $b^T y$ over $y$, subject to $A^T y + z = c$ with $z \ge 0$.
• We define the “duality gap” $\mu = x^T z$.
• If $\mu = 0$, both problems yield the same optimal objective value.
• Finally, in iterative solutions, the residual is denoted “r”.
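As a small numerical illustration of the notation above, the sketch below checks that, for a primal-feasible x and a dual-feasible (y, z), the difference between the two objective values equals x^T z. The toy data (A, b, c) and the chosen points are invented for illustration only; they are not taken from the project.

```python
import numpy as np

# Hypothetical toy problem (not from the report): one constraint, three variables.
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, 2.0, 3.0])

# A primal-feasible point: A x = b and x >= 0.
x = np.array([0.5, 0.3, 0.2])

# A dual-feasible point: choose y, then the slack z = c - A^T y must be >= 0.
y = np.array([0.5])
z = c - A.T @ y

primal_obj = c @ x   # c^T x
dual_obj = b @ y     # b^T y
gap = x @ z          # x^T z

print(f"c^T x - b^T y = {primal_obj - dual_obj:.3f}, x^T z = {gap:.3f}")
```

Because Ax = b and A^T y + z = c, the identity c^T x - b^T y = x^T z holds for any such feasible pair, so driving x^T z to zero closes the gap between the two problems.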
Problem: IPM Stability
The IPM solves a sequence of linearly constrained optimization problems in the primal and dual spaces simultaneously, approaching the desired solution from within the feasible region.
• Solve $A D^2 A^T \Delta y = -A r$, where $D^2 \sim \mathrm{diag}(x/z)$.
• From $\Delta y$ we can get $\Delta x$ and $\Delta z$.
• Iterate by gradually reducing the “duality gap” towards zero.
• As $\mu = x^T z \to 0$, the matrix $D$ becomes increasingly ill-conditioned.
• Recall: $\Delta x + D^2 \Delta z = r$, $A \Delta x = 0$, $A^T \Delta y + \Delta z = 0$.
Legend: x is the unknown; y is the “dual” of x; z is the “slack”.
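The link between the three recalled equations and the system for Δy can be worked out in a few lines. The derivation below is a sketch that reads the scaling factor in the first recalled equation as $D^2$ (an assumption, chosen because it reproduces the system stated on this slide).

```latex
\begin{align*}
A^T \Delta y + \Delta z = 0 \quad &\Rightarrow \quad \Delta z = -A^T \Delta y, \\
\Delta x + D^2 \Delta z = r \quad &\Rightarrow \quad \Delta x = r + D^2 A^T \Delta y, \\
A \Delta x = 0 \quad &\Rightarrow \quad A r + A D^2 A^T \Delta y = 0
\quad \Rightarrow \quad A D^2 A^T \Delta y = -A r .
\end{align*}
```

Once Δy is known, the first two lines give Δz and then Δx by back-substitution.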
Problem: Specific Application
Solve a Distributed Command and Control optimization problem for the Office of the Secretary of Defense/Acquisition and Technology (OSD/A&T).
• The OSD/A&T has a need for an optimization tool to use with the Distributed Command and Control System for nuclear assets.
• Presently the problem is in linear format.
• The problem can become more complex in the future.
• The IPM can handle such forms as well.
• Obtaining the OSD/A&T data, even in a sanitized form, has not been possible so far.
• As a risk mitigation, NETLIB test problems of appropriate dimension have been used as testing surrogates.
Since the problem dimensionality is small (~dozens), the key issue is stability.
Approach: Three Approaches
• Examine three major algorithmic components:
  • Factorization (QR)
    • To reduce condition numbers of matrices, e.g., $10^8 \to 10^4$.
  • (A specific) Preconditioner
    • To take advantage of an apparent synergism between the constraint matrix (A) and the behavior of the family of solutions as the duality gap (μ) is reduced.
  • Iterative solution of linear systems of equations
    • Using the (Preconditioned) Conjugate Gradient solver.
Approach: Factorization
• We are solving: $A D^2 A^T \Delta y = -A r$, where $D^2 \sim \mathrm{diag}(x/z)$.
• Rewriting: $A D D^T A^T \Delta y = -A r$.
• Let $D^T A^T = Q R$, with $Q$: n by n and orthonormal, $R$: n by m upper triangular.
• Substituting for $A D D^T A^T$ gives: $(QR)^T (QR) \Delta y = -A r$.
• From which we get: $R^T Q^T Q R \Delta y = -A r$.
• Since $Q^T Q = I$, we have $R^T R \Delta y = -A r$.
• Here $R^T R$ is a Cholesky decomposition of $A D^2 A^T$, obtained without forming $A D^2 A^T$ but using “only half of it” ($D^T A^T$) for much better stability.
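The factorization steps above can be sketched numerically. The following is a minimal illustration, not the project's code: the matrix sizes and the random stand-ins for A, x, z, and r are assumptions, and NumPy's reduced QR returns only the top m-by-m block of the n-by-m factor R described on the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 6                       # hypothetical sizes: m constraints, n variables
A = rng.standard_normal((m, n))   # stand-in constraint matrix (full row rank almost surely)
x = rng.uniform(0.1, 1.0, n)      # stand-in positive primal iterate
z = rng.uniform(0.1, 1.0, n)      # stand-in positive dual slack
r = rng.standard_normal(n)        # stand-in residual

D = np.diag(np.sqrt(x / z))       # D is diagonal, so D^T = D and D @ D = diag(x/z)
rhs = -A @ r

# Reference route: form the normal-equations matrix explicitly and solve.
M = A @ D @ D @ A.T
dy_direct = np.linalg.solve(M, rhs)

# QR route: factor only "half" of M, namely D^T A^T = Q R.
Q, R = np.linalg.qr(D @ A.T)
w = np.linalg.solve(R.T, rhs)     # solve R^T w = rhs (general solver, for brevity)
dy_qr = np.linalg.solve(R, w)     # solve R dy = w

cond_M = np.linalg.cond(M)
cond_R = np.linalg.cond(R)        # equals sqrt(cond_M) in exact arithmetic
print(f"cond(A D^2 A^T) = {cond_M:.3e}, cond(R) = {cond_R:.3e}")
```

Working with R instead of the explicit product means the solver sees a matrix whose 2-norm condition number is the square root of that of $A D^2 A^T$, which is the kind of reduction the factorization is meant to provide.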
Approach: Preconditioning
The specific preconditioning being examined in this research is as follows:
We are solving: $A D^2 A^T \Delta y = -A r$
What if we pre-multiplied by $A^{-1}$? $A^{-1} A D^2 A^T \Delta y = -A^{-1} A r$
This would give us: $D^2 A^T \Delta y = -r$
$A$ is not square, so it is not invertible; but $A A^T$ is (assuming $A$ has full row rank): $(A A^T)^{-1} A D^2 A^T \Delta y = -(A A^T)^{-1} A r$
Conceptually, this gives us: $(A^T)^{-1} D^2 A^T \Delta y = -(A^T)^{-1} r$, which would be a similarity transformation on $D^2$ using $A^T$.
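A minimal numerical sketch of this left preconditioning follows. The random stand-ins for A, the diagonal of D^2, and r are assumptions for illustration, and the snippet makes no claim about how the project implemented the preconditioner. It applies $(A A^T)^{-1}$ through linear solves rather than forming the inverse.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 6                        # hypothetical sizes: m constraints, n variables
A = rng.standard_normal((m, n))    # stand-in constraint matrix (full row rank almost surely)
d2 = rng.uniform(0.1, 10.0, n)     # stand-in diagonal of D^2 ~ x/z
r = rng.standard_normal(n)         # stand-in residual

M = A @ np.diag(d2) @ A.T          # A D^2 A^T
rhs = -A @ r

AAt = A @ A.T                      # square; invertible when A has full row rank
M_pre = np.linalg.solve(AAt, M)    # (A A^T)^{-1} A D^2 A^T
rhs_pre = np.linalg.solve(AAt, rhs)

dy = np.linalg.solve(M, rhs)
dy_pre = np.linalg.solve(M_pre, rhs_pre)

cond_before = np.linalg.cond(M)
cond_after = np.linalg.cond(M_pre)
print(f"cond before = {cond_before:.3e}, cond after = {cond_after:.3e}")
```

The preconditioned system has the same solution Δy as the original one. In the special case where $D^2$ is a multiple of the identity, the preconditioned matrix is exactly that multiple of the identity; for general $D^2$ the effect on conditioning depends on the problem, so the two condition numbers are printed rather than assumed.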
Approach: Iterative Solution
• Iterative solvers produce an approximate solution to a linear system of equations.
• They can use fewer operations than a direct solve and can therefore be faster.
• The IPM solves a sequence of problems that approximate the “real” problem.
• Thus, approximate solutions are fine.
• Both the IPM and iterative solvers use a solution “tolerance”, so the approximate solutions to the approximate problems can have similar tolerances.
Results: Basic Development
• Software Components:
  • Implementations of all algorithm components
  • Parallel structure for “fairness”
• Test Metrics Defined:
  • Speed and stability metrics (4)
• Ancillary Components:
  • Data Loading and Storage
  • Initial Value Validation
    • Not “Too Good”!
  • Plotting and plot storage
  • Statistical Assessment
Results: Analysis Status
• Test-bed architecture created to flexibly implement eight combinations of the four software components:
  • Basic IPM
  • Preconditioned IPM
  • Factorized IPM
  • Preconditioned and Factorized IPM
Each of the four is run with (P)CG and without it (direct “\” solve).
Results: Data for Validation
• NETLIB Test Problems
• The last one (BLEND) is beyond the dimension of the OSD problem, so it serves as a good limiting case.
• All eight versions of the IPM program achieved identical solutions to these benchmarking problems.
Results: Sample for “afiro” …
Results: General Linear Model
A General Linear Model (GLM) of the results fits a multi-dimensional “line” through the data; for example:
Iterations ≈ 101.2 + 44.7*(if_CG) + 24.8*(if_Precond) - 4.7*(if_QR)
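To make the fitted model concrete, the sketch below evaluates it for all eight on/off combinations of the three indicator variables. The coefficients are copied from the expression above; the function and variable names are illustrative, and the outputs are model predictions, not measured iteration counts.

```python
from itertools import product

# Coefficients copied from the fitted model quoted above.
INTERCEPT, B_CG, B_PRECOND, B_QR = 101.2, 44.7, 24.8, -4.7


def predicted_iterations(use_cg, use_precond, use_qr):
    """Evaluate the linear model for one combination of 0/1 indicator flags."""
    return INTERCEPT + B_CG * use_cg + B_PRECOND * use_precond + B_QR * use_qr


# Tabulate all eight combinations, keyed by (if_CG, if_Precond, if_QR).
table = {
    flags: predicted_iterations(*flags)
    for flags in product((0, 1), repeat=3)
}

for (cg, pre, qr), iters in sorted(table.items(), key=lambda kv: kv[1]):
    print(f"CG={cg} Precond={pre} QR={qr}: about {iters:.1f} iterations")
```

Under this model the lowest prediction is for QR alone (about 96.5 iterations) and the highest is for CG with preconditioning and without QR (about 170.7 iterations).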
Results: Rankings
Average ranking of the algorithms over all problems (1 = best, 8 = worst).
Specific Observation:
• The “adlittle” problem encountered a numerical instability with the non-CG, preconditioned, non-QR algorithm.
Results: Conclusions
• Using the (QR) decomposition:
  • Reduced the number of iterations, but increased the execution time slightly (decomposition cost), and
  • Improved stability (decreased the condition number and the y and z variability, with negligible impact on x variability).
• Using the Conjugate Gradient solver:
  • Required more iterations and possibly more time, and
  • Improved stability (condition and variability).
• Using the Preconditioner:
  • On average (à la GLM), did not improve things,
  • On occasion (à la rankings), did make things better.
“Results! Why, man, I have gotten a lot of results. I know several thousand things that won't work.”
Thomas A. Edison
Recommendations: Future
• Algorithm Choice:
  • Use the QR decomposition with the CG solver.
  • Gradually lower the CG tolerance as the IPM optimization algorithm decreases the duality gap.
• Improve the initial value generator.
  • An intentionally non-optimal method was used to generate valid starting points to better stress the algorithms.
  • In re-optimizing the DC&C network, the past solution should provide a better initial point.
• The proposed preconditioner did not uniformly improve performance, but showed promise on some problems, so revisit it with “real” data.
Acknowledgements
Thanks to:
• My Problem Sponsors
  • Chris Wright and Pete Pandolfini (JHU/APL)
• AMSC 663-664 Instructors
  • Radu Balan and Aleksey Zimin
• My Advisor
  • D. P. O’Leary
• Members of the AMSC 663-664 class
For many helpful suggestions and thought-provoking questions.
Questions? Contact: KenRyals “at” aol.com Kenneth.Ryals “at” jhuapl.edu
Iterative Solution – CG
• The (preconditioned) conjugate gradient is the iterative solver being used in this research.
• What is CG?
  • Recall the Steepest Descent Algorithm, where each step direction is determined by the local gradient.
  • The CG Algorithm starts out using the local gradient and then works “smarter” on successive steps by using the local and past information.
• The CG algorithm is frequently used with preconditioning, which makes it an ideal choice for this project.
Ref: An Introduction to the Conjugate Gradient Method Without the Agonizing Pain, Edition 1¼, by Jonathan Richard Shewchuk, August 4, 1994, School of Computer Science, Carnegie Mellon University.
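The description of CG above can be sketched in a few lines. This is a minimal, unpreconditioned textbook variant for a symmetric positive definite system, written for illustration; the function name, the tolerance default, and the 2-by-2 test system are assumptions and not the project's implementation.

```python
import numpy as np


def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive definite A by unpreconditioned CG."""
    n = b.shape[0]
    max_iter = 10 * n if max_iter is None else max_iter
    x = np.zeros(n)
    r = b - A @ x              # residual, i.e. the negative gradient of the quadratic
    p = r.copy()               # first direction is the steepest-descent direction
    rs_old = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs_old) <= tol:
            break              # residual norm is within tolerance
        Ap = A @ p
        alpha = rs_old / (p @ Ap)          # exact line search along p
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p      # combine new gradient with the past direction
        rs_old = rs_new
    return x


# Small symmetric positive definite example (hypothetical data).
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = conjugate_gradient(A, b)
print(x)
```

The update of `p` is where the “past information” enters: each new direction mixes the current residual with the previous direction, whereas steepest descent would use the residual alone. The `tol` argument plays the role of the solver tolerance that could be tightened as the IPM reduces the duality gap.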