
Scalable Stochastic Programming


Presentation Transcript


  1. Scalable Stochastic Programming Cosmin G. Petra Mathematics and Computer Science Division Argonne National Laboratory petra@mcs.anl.gov Joint work with Mihai Anitescu, Miles Lubin and Victor Zavala

  2. Motivation • Sources of uncertainty in complex energy systems • Weather • Consumer Demand • Market prices • Applications @Argonne – Anitescu, Constantinescu, Zavala • Stochastic Unit Commitment with Wind Power Generation • Energy management of Co-generation • Economic Optimization of a Building Energy System

  3. Stochastic Unit Commitment with Wind Power • Thermal unit schedule? Minimize cost; satisfy demand / adopt wind power; keep a reserve; respect technological constraints • Wind forecast – WRF (Weather Research and Forecasting) model • Real-time grid-nested 24h simulation • 30 samples require 1h on 500 CPUs (Jazz@Argonne) [Figure: thermal generators and wind farms] Slide courtesy of V. Zavala & E. Constantinescu

  4. Optimization under Uncertainty • Two-stage stochastic programming with recourse (“here-and-now”): the continuous (expectation) formulation and its discrete sampled counterpart [the formulas appear only as images on the slide; see the sketch below] • Sample average approximation (SAA) • Sampling, statistical inference, M batches
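
The formulations on this slide appear only as images; a standard statement of the two-stage problem and its SAA, in commonly used notation that is assumed here rather than read off the slide, is

    \min_{x} \; c^T x + \mathbb{E}_{\xi}\!\left[ Q(x,\xi) \right]
    \quad \text{subj. to } Ax = b,\; x \ge 0,
    \qquad \text{where} \qquad
    Q(x,\xi) = \min_{y} \; q(\xi)^T y
    \quad \text{subj. to } W(\xi)\, y = h(\xi) - T(\xi)\, x,\; y \ge 0.

The SAA problem replaces the expectation by an average over S sampled scenarios,

    \min_{x,\, y_1, \dots, y_S} \; c^T x + \frac{1}{S} \sum_{i=1}^{S} q_i^T y_i
    \quad \text{subj. to } Ax = b,\quad T_i x + W_i y_i = h_i,\quad x \ge 0,\; y_i \ge 0,

and solving M independent sampled batches supports the statistical inference mentioned on the slide.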

  5. Solving the SAA problem – PIPS solver • Interior-point methods (IPMs) • Polynomial iteration complexity (in theory) • IPMs perform better in practice (infeasible primal-dual path-following) • No more than 30-50 iterations have been observed for n less than 10 million • We can confirm that this is still true for n a hundred times larger • Two linear systems are solved at each iteration • Direct solvers need to be used because the IPM linear systems are ill-conditioned and need to be solved accurately • We solve the SAA problems with a standard IPM (Mehrotra’s predictor-corrector) and specialized linear algebra • PIPS solver

  6. Linear Algebra of Primal-Dual Interior-Point Methods • Convex quadratic problem and the IPM linear system it generates [shown as images on the slide; see the sketch below] • Two-stage SP yields an arrow-shaped linear system (modulo a permutation); multi-stage SP yields a nested arrow-shaped system • S is the number of scenarios
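
The matrices on this slide are images; schematically, with block names assumed for illustration, the per-iteration IPM linear system of the two-stage SAA problem has the bordered block-diagonal (“arrow”) form

    H z = r,
    \qquad
    H =
    \begin{bmatrix}
      H_1 &     &        &     & B_1^T \\
          & H_2 &        &     & B_2^T \\
          &     & \ddots &     & \vdots \\
          &     &        & H_S & B_S^T \\
      B_1 & B_2 & \cdots & B_S & H_0
    \end{bmatrix},

where H_i is the KKT block of scenario i, H_0 is the first-stage block, B_i couples scenario i to the first stage, and S is the number of scenarios.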

  7. The Direct Schur Complement Method (DSC) • Uses the arrow shape of H • 1. Implicit factorization • 2. Solving Hz = r: 2.1 backward substitution, 2.2 diagonal solve, 2.3 forward substitution (see the serial sketch below)
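
In that notation, DSC forms the first-stage Schur complement C = H_0 - \sum_i B_i H_i^{-1} B_i^T and then recovers the scenario unknowns. A serial NumPy sketch of the steps (an illustration only; PIPS factors the sparse scenario blocks with MA57 and the dense C with LAPACK):

    import numpy as np

    def dsc_solve(H_blocks, B_blocks, H0, r_blocks, r0):
        """Serial illustration of DSC on the arrow system: scenario blocks H_i,
        coupling blocks B_i, first-stage block H0."""
        # 1. Implicit factorization: form C = H0 - sum_i B_i H_i^{-1} B_i^T.
        C = H0.copy()
        y = []
        for Hi, Bi, ri in zip(H_blocks, B_blocks, r_blocks):
            C -= Bi @ np.linalg.solve(Hi, Bi.T)
            y.append(np.linalg.solve(Hi, ri))      # scenario backsolves
        # 2.1 Backward substitution: assemble the first-stage right-hand side.
        rhs0 = r0 - sum(Bi @ yi for Bi, yi in zip(B_blocks, y))
        # 2.2 "Diagonal" solve with the dense Schur complement.
        z0 = np.linalg.solve(C, rhs0)
        # 2.3 Forward substitution: recover the scenario unknowns.
        z = [yi - np.linalg.solve(Hi, Bi.T @ z0)
             for Hi, Bi, yi in zip(H_blocks, B_blocks, y)]
        return z, z0

    # Quick consistency check on random data (hypothetical sizes).
    rng = np.random.default_rng(0)
    S, n0, n = 3, 4, 5
    Hs = [(i + 2.0) * np.eye(n) for i in range(S)]
    Bs = [rng.standard_normal((n0, n)) for _ in range(S)]
    rs = [rng.standard_normal(n) for _ in range(S)]
    H0, r0 = 10.0 * np.eye(n0), rng.standard_normal(n0)
    z, z0 = dsc_solve(Hs, Bs, H0, rs, r0)
    assert all(np.allclose(Hi @ zi + Bi.T @ z0, ri)
               for Hi, Bi, zi, ri in zip(Hs, Bs, z, rs))
    assert np.allclose(sum(Bi @ zi for Bi, zi in zip(Bs, z)) + H0 @ z0, r0)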

  8. Parallelizing DSC – 1. Factorization phase • Processes 1..p factor their scenario blocks and perform the backsolves (sparse linear algebra, MA57) • Factorization of the 1st-stage Schur complement matrix (dense linear algebra, LAPACK) = BOTTLENECK

  9. Parallelizing DSC – 2. Triangular solves • Processes 1..p perform the scenario triangular solves (sparse linear algebra) • 1st-stage backsolve (dense linear algebra) = BOTTLENECK

  10. Implementation of DSC • Each process factors and backsolves its scenario blocks, then performs the dense first-stage factorization, dense solve and forward substitution • Two MPI_Allreduce calls: one during the factorization phase, one during the triangular solves • First-stage computations are replicated on each process (see the sketch below)
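
A minimal mpi4py sketch of this pattern, assuming each rank owns a few scenarios; all names and sizes (local_H, local_B, n0, n, ...) are illustrative, not PIPS data structures:

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    # Tiny synthetic data; H0 and r0 are identical (replicated) on every rank.
    rng = np.random.default_rng(rank)
    n0, n, local_scen = 4, 6, 2
    H0 = 10.0 * np.eye(n0)
    r0 = np.ones(n0)
    local_H = [2.0 * n * np.eye(n) + rng.standard_normal((n, n)) for _ in range(local_scen)]
    local_H = [0.5 * (Hi + Hi.T) for Hi in local_H]        # symmetric, well conditioned
    local_B = [rng.standard_normal((n0, n)) for _ in range(local_scen)]
    local_r = [rng.standard_normal(n) for _ in range(local_scen)]

    # Factorization phase: local Schur complement contributions, then MPI_Allreduce.
    C_local = sum(Bi @ np.linalg.solve(Hi, Bi.T) for Hi, Bi in zip(local_H, local_B))
    C_sum = np.empty_like(C_local)
    comm.Allreduce(C_local, C_sum, op=MPI.SUM)
    C = H0 - C_sum                     # replicated dense first-stage Schur complement

    # Triangular-solve phase: local contributions to the first-stage right-hand side.
    rhs_local = sum(Bi @ np.linalg.solve(Hi, ri)
                    for Hi, Bi, ri in zip(local_H, local_B, local_r))
    rhs_sum = np.empty_like(rhs_local)
    comm.Allreduce(rhs_local, rhs_sum, op=MPI.SUM)

    # Replicated dense first-stage solve (the DSC bottleneck), then scenario recovery.
    z0 = np.linalg.solve(C, r0 - rhs_sum)
    local_z = [np.linalg.solve(Hi, ri - Bi.T @ z0)
               for Hi, Bi, ri in zip(local_H, local_B, local_r)]

The two Allreduce calls mirror the two communication steps in the diagram; everything after them is replicated on every rank, which is exactly the first-stage bottleneck discussed next.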

  11. Scalability of DSC • Unit commitment problem: 76.7% efficiency, but not always the case • With a large number of 1st-stage variables: 38.6% efficiency on Fusion @ Argonne

  12. BOTTLENECK SOLUTION 1: STOCHASTIC PRECONDITIONER

  13. The Stochastic Preconditioner • The exact structure of C [given as a formula on the slide] • An IID subset of n scenarios is used to build an inexpensive approximation of C • The stochastic preconditioner (P. & Anitescu, COAP 2011) • For C, use the constraint preconditioner (Keller et al., 2000) (see the sketch below)
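
The formulas on this slide are images in the source deck. As a schematic only, in the (assumed) block notation of the arrow system above, the exact first-stage Schur complement and its subsampled estimate look like

    C = H_0 - \sum_{i=1}^{S} B_i H_i^{-1} B_i^T,
    \qquad
    \widetilde{C} = H_0 - \frac{S}{n} \sum_{i \in \mathcal{I}} B_i H_i^{-1} B_i^T,

where \mathcal{I} is an IID subset of n of the S scenarios (the exact weighting used by P. & Anitescu may differ from this rescaling). C retains a saddle-point structure, with a dense symmetric (1,1) block and the sparse first-stage Jacobian A_0 as the constraint block; the constraint preconditioner keeps A_0 exactly and replaces only the dense block by its subsampled estimate:

    C = \begin{bmatrix} Q_0 & A_0^T \\ A_0 & 0 \end{bmatrix}
    \;\longrightarrow\;
    M = \begin{bmatrix} \widetilde{Q}_0 & A_0^T \\ A_0 & 0 \end{bmatrix}.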

  14. Implementation of PSC • An additional process (p+1) factors the dense preconditioner while processes 1..p run their scenario factorizations and backsolves • Communication: MPI_Reduce (to proc p+1), MPI_Reduce (to proc 1), MPI_Allreduce, MPI_Bcast • During the first-stage solve, proc p+1 applies the preconditioner triangular solves and proc 1 runs the Krylov solve; procs 2..p are idle; all processes then perform the forward substitutions • REMOVES the factorization bottleneck • Slightly larger solve bottleneck

  15. The “Ugly” Unit Commitment Problem • DSC on P processes vs. PSC on P+1 processes • 120 scenarios • Optimal use of PSC – linear scaling • Factorization of the preconditioner cannot be hidden anymore.

  16. Quality of the Stochastic Preconditioner • “Exponentially” better preconditioning (P. & Anitescu, 2011) • Proof: Hoeffding inequality • Assumptions on the problem’s random data: boundedness, uniform full rank of [the random matrices named on the slide] – not restrictive
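
The slide’s bound is not reproduced in the transcript; for reference, the concentration result invoked, Hoeffding’s inequality, states that for independent random variables X_1, \dots, X_n with a_i \le X_i \le b_i and \bar{X} = \tfrac{1}{n}\sum_{i} X_i,

    \Pr\!\left( \left| \bar{X} - \mathbb{E}[\bar{X}] \right| \ge t \right)
    \le 2 \exp\!\left( - \frac{2 n^2 t^2}{\sum_{i=1}^{n} (b_i - a_i)^2} \right)
    \quad \text{for all } t > 0.

Applied to the scenario averages that enter the preconditioner, deviations decay exponentially in the subsample size n, which is the sense of the “exponentially” better preconditioning (the precise statement is in P. & Anitescu, 2011).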

  17. Quality of the Constraint Preconditioner • The preconditioned matrix has an eigenvalue 1 with a known order of multiplicity [given on the slide] • The rest of the eigenvalues satisfy bounds [given on the slide] • Proof: based on Bergamaschi et al., 2004.

  18. The Krylov Methods Used • BiCGStab using the constraint preconditioner M • Preconditioned Projected CG (PPCG) (Gould et al., 2001) • Preconditioned projection onto the null space of the constraints • Does not compute a basis for the null space; instead, the projection is applied via solves with the constraint preconditioner (see the sketch below)
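
As a sketch (block names \widetilde{Q}_0 and A_0 carried over from the constraint-preconditioner note above, not from the slide): in PPCG the preconditioned, projected residual g of a residual r is obtained from one solve with the constraint preconditioner,

    \begin{bmatrix} \widetilde{Q}_0 & A_0^T \\ A_0 & 0 \end{bmatrix}
    \begin{bmatrix} g \\ v \end{bmatrix}
    =
    \begin{bmatrix} r \\ 0 \end{bmatrix},

so A_0 g = 0 by construction and an explicit basis for the null space of A_0 is never formed.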

  19. Performance of the preconditioner • Eigenvalue clustering & Krylov iterations • Affected by the well-known ill-conditioning of IPMs.

  20. SOLUTION 2: PARALLELIZATION OF STAGE 1 LINEAR ALGEBRA

  21. Parallelizing the 1st stage linear algebra • We distribute the 1st-stage Schur complement system. • C is treated as dense (saddle-point structure: dense symm. pos. def. block, sparse full-rank constraint block). • Alternative to PSC for problems with a large number of 1st-stage variables. • Removes the memory bottleneck of PSC and DSC. • We investigated ScaLAPACK and Elemental (successor of PLAPACK) • Neither has a solver for symmetric indefinite matrices (Bunch-Kaufman); LU or Cholesky only. • So we had to think of modifying either.

  22. Cholesky-based LDL^T-like factorization • Can be viewed as an “implicit” normal equations approach. • In-place implementation inside Elemental: no extra memory needed. • Idea: modify the Cholesky factorization by changing the sign after processing p columns (see the sketch below). • It is much easier to do in Elemental, since it distributes elements, not blocks. • Twice as fast as LU • Works for more general saddle-point linear systems, i.e., a pos. semi-def. (2,2) block.
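
A serial sketch of the sign-change idea on a dense saddle-point matrix (unblocked, without pivoting, and not Elemental’s distributed implementation; all names are illustrative):

    import numpy as np

    def signed_cholesky(K, p):
        """Cholesky-like factorization K = L * diag(s) * L.T of a dense
        saddle-point matrix [[Q, A.T], [A, 0]], with the sign s flipped from
        +1 to -1 after the first p columns (unblocked, no pivoting)."""
        n = K.shape[0]
        s = np.ones(n)
        s[p:] = -1.0
        L = np.zeros_like(K, dtype=float)
        for j in range(n):
            t = K[j, j] - np.sum(s[:j] * L[j, :j] ** 2)
            L[j, j] = np.sqrt(s[j] * t)    # s[j]*t > 0 when Q is SPD and A has full row rank
            L[j+1:, j] = (K[j+1:, j] - L[j+1:, :j] @ (s[:j] * L[j, :j])) / (s[j] * L[j, j])
        return L, s

    # Quick check on a random SPD Q and full-rank A (hypothetical sizes).
    rng = np.random.default_rng(0)
    p, m = 5, 3
    Q = rng.standard_normal((p, p))
    Q = Q @ Q.T + p * np.eye(p)
    A = rng.standard_normal((m, p))
    K = np.block([[Q, A.T], [A, np.zeros((m, m))]])
    L, s = signed_cholesky(K, p)
    assert np.allclose(L @ np.diag(s) @ L.T, K)

For the first p columns the recursion is plain Cholesky; after the sign flip, the trailing block being eliminated is the negative of the implicit normal-equations matrix A Q^{-1} A^T, which is why the factorization exists without pivoting when Q is positive definite and A has full row rank.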

  23. Distributing the 1st stage Schur complement matrix • All processors contribute to all of the elements of the (1,1) dense block • A large amount of inter-process communication occurs. • Each term is too big to fit in a node’s memory. • Possibly more costly than the factorization itself. • Solution: collective MPI_Reduce_scatter calls • Reduce (sum) terms, then partition and send to destination (scatter) • Need to reorder (pack) elements to match matrix distribution • Columns of the Schur complement matrix are distributed as they are calculated
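
A toy mpi4py illustration of the Reduce_scatter pattern for one block of Schur-complement columns (buffer layout and sizes are simplified relative to the packed, symmetric scheme described on the next slide; all names are hypothetical):

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, nprocs = comm.Get_rank(), comm.Get_size()

    # Each rank holds its local contribution to a block of Schur-complement
    # columns, stored as a flat buffer (n0 rows x ncols columns).
    n0, ncols = 8, nprocs * 2
    local_contrib = np.full(n0 * ncols, float(rank))   # stand-in for -B_i H_i^{-1} B_i^T columns

    # Column ownership after the reduction: ncols/nprocs columns per rank,
    # i.e. n0 * ncols/nprocs entries each (counts must sum to the buffer length).
    recvcounts = [n0 * (ncols // nprocs)] * nprocs
    my_cols = np.empty(recvcounts[rank])

    # Sum the contributions across ranks and scatter ownership in one collective.
    comm.Reduce_scatter(local_contrib, my_cols, recvcounts, op=MPI.SUM)

    # my_cols now holds the fully summed entries of this rank's columns,
    # ready to be packed into the distributed (Elemental) matrix layout.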

  24. DSC with distributed first stage • For each b = 1..B: processes 1..p factor and backsolve their scenario blocks and contribute a block of Schur-complement columns via MPI_Reduce_scatter; the distributed dense factorization and solves are handled by Elemental, followed by an MPI_Allreduce and the forward substitutions • The Schur complement matrix is computed and reduced block-wise (B blocks of columns)

  25. Reduce operations • Streamlined copying procedure – Lubin and Petra (2010) • Loop over contiguous memory and copy elements into the send buffer • Avoids the division and modulus ops needed to compute the positions • “Symmetric” reduce: only the lower triangle is reduced • Fixed buffer size, with a variable number of columns reduced • Effectively halves the communication (both data & # of MPI calls).

  26. Large-scale performance • First-stage linear algebra: ScaLAPACK (LU), Elemental (LU), and the Cholesky-based LDL^T-like factorization (slide 22) • Strong scaling of PIPS: 90.1% from 64 to 1024 cores, 75.4% from 64 to 2048 cores • > 4,000 scenarios • On Fusion • Lubin, P., Anitescu, in OMS 2011 • SAA problem: 82,000 1st-stage variables, 189 million variables in total, 1,000 thermal units, 1,200 wind farms

  27. Towards real-life models – Economic dispatch with transmission constraints • Current status: ISOs (independent system operators) use deterministic wind profiles, market prices and demand, with network (transmission) constraints • Outer: 24-h horizon simulation with 1-h timestep • Inner: 1-h horizon corrections with 5-min timestep • Stochastic ED with transmission constraints (V. Zavala et al., 2010) • Stochastic wind profiles & transmission constraints • Deterministic market prices and demand • 24-h horizon with 1-h timestep • Kirchhoff’s laws are part of the constraints • The problem is huge: KKT systems are 1.8 billion x 1.8 billion [network figure: load nodes (buses) and generators]

  28. Solving ED with transmission constraints on Intrepid BG/P • 32k wind scenarios (k = 1024) • 32k nodes (131,072 cores) on Intrepid BG/P • Hybrid programming model: SMP inside MPI • Sparse 2nd-stage linear algebra: WSMP (IBM) • Dense 1st-stage linear algebra: Elemental with SMP BLAS + OpenMP for packing/unpacking the buffer • Very good strong scaling for a 4-h horizon problem • Lubin, P., Anitescu, Zavala – in proceedings of SC11.

  29. Stochastic programming – a scalable computation pattern • Scenario parallelization in a hybrid MPI+SMP programming model • DSC, PSC (1st stage < 10,000 variables) • Hybrid MPI/SMP running on Blue Gene/P • 131k cores (96% strong scaling) for the Illinois ED problem with grid constraints: 2B variables, maybe the largest ever solved? • Close to real-time solutions (24-hr horizon in 1 hr wallclock) • Further development needed, since users aim for: more uncertainty, more detail (x 10); faster dynamics, hence a shorter decision window (x 10); longer horizons (California == 72 hours) (x 3)

  30. Thank you for your attention! Questions?
