Stochastic Optimization of Complex Energy Systems on High-Performance Computers Cosmin G. Petra Mathematics and Computer Science Division Argonne National Laboratory petra@mcs.anl.gov SIAM CSE 2013 Joint work with Olaf Schenk (USI Lugano), Miles Lubin (MIT), Klaus Gaertner (WIAS Berlin)
Outline • Application of HPC to power-grid optimization under uncertainty • Parallel interior-point solver (PIPS-IPM) • structure exploiting • Revisiting linear algebra • Experiments on BG/P with the new features
Stochastic unit commitment with wind power • Wind forecast – WRF (Weather Research and Forecasting) model • Real-time grid-nested 24h simulation • 30 samples require 1h on 500 CPUs (Jazz@Argonne) • (Diagram: thermal generators and a wind farm) • Slide courtesy of V. Zavala & E. Constantinescu
Stochastic Formulation • Discrete distribution leads to block-angular LP
Large-scale (dual) block-angular LPs • Extensive form (see the sketch below) • In terminology of stochastic LPs: • First-stage variables (decision now): x0 • Second-stage variables (recourse decision): x1, …, xN • Each diagonal block is a realization of a random variable (scenario)
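The extensive-form equation on the original slide is an image; a standard rendering of the dual block-angular LP it refers to, with the symbol names (p_i, c_i, x_i, T_i, W_i, b_i) assumed for illustration rather than taken from the slide, is:

```latex
% Extensive form of a two-stage stochastic LP with N scenarios (sketch).
% Symbol names p_i, c_i, x_i, T_i, W_i, b_i are illustrative, not from the slide.
\begin{equation*}
\begin{aligned}
\min_{x_0,\,x_1,\dots,x_N} \quad & c_0^T x_0 + \sum_{i=1}^{N} p_i\, c_i^T x_i \\
\text{s.t.} \quad & W_0 x_0 = b_0, \\
                  & T_i x_0 + W_i x_i = b_i, \qquad i = 1,\dots,N, \\
                  & x_i \ge 0, \qquad i = 0,1,\dots,N .
\end{aligned}
\end{equation*}
```

The first-stage variables x0 appear in every scenario's constraint block, which is what gives the constraint matrix its dual block-angular shape.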
Computational challenges and difficulties • May require many scenarios (100s, 1,000s, 10,000s, …) to accurately model uncertainty • “Large” scenarios (Wi up to 250,000 x 250,000) • “Large” 1st stage (1,000s, 10,000s of variables) • Easy to build a practical instance that requires 100+ GB of RAM to solve, which forces the use of distributed memory • Real-time solution needed in our applications
Linear algebra of primal-dual interior-point methods (IPM) • Applying the IPM to the convex quadratic problem yields one large linear system per iteration • 2 solves per IPM iteration: predictor directions and corrector directions • Two-stage SP gives an arrow-shaped linear system (modulo a permutation); multi-stage SP gives nested arrow-shaped systems • N is the number of scenarios
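The linear-system figure on this slide did not survive extraction; as a sketch under assumed block names (K_i for the scenario blocks, K_0 for the first-stage block, B_i for the coupling borders), the two-stage KKT system has the arrow shape:

```latex
% Arrow-shaped (bordered block-diagonal) KKT system; block names are illustrative.
\begin{equation*}
\begin{bmatrix}
K_1    &        &       & B_1    \\
       & \ddots &       & \vdots \\
       &        & K_N   & B_N    \\
B_1^T  & \cdots & B_N^T & K_0
\end{bmatrix}
\begin{bmatrix} \Delta x_1 \\ \vdots \\ \Delta x_N \\ \Delta x_0 \end{bmatrix}
=
\begin{bmatrix} r_1 \\ \vdots \\ r_N \\ r_0 \end{bmatrix}
\end{equation*}
```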
Parallel Solution Procedure for KKT System • Steps 1 and 5 are trivially parallel (“scenario-based decomposition”) • Steps 1, 2, and 3 account for >95% of total execution time • A sketch of the five steps follows
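The numbered steps themselves were figures on the original slide; the later slides (Steps 1–5, the global Schur complement C) point to the usual Schur-complement procedure. A minimal dense NumPy sketch of that five-step split, with all block names (K0, Ki, Bi) assumed for illustration, is:

```python
import numpy as np

# Dense toy sketch of a Schur-complement solve for the arrow-shaped KKT system.
# Block names (K0, Ki, Bi) and the five-step split are assumptions for illustration;
# PIPS-IPM works with sparse, distributed, symmetric indefinite blocks.
rng = np.random.default_rng(0)
N, n0, ni = 4, 3, 5                                    # scenarios, 1st-stage size, scenario size

K0 = np.diag(rng.uniform(1.0, 2.0, n0))                # first-stage block
Ki = [np.diag(rng.uniform(1.0, 2.0, ni)) for _ in range(N)]   # scenario blocks
Bi = [rng.standard_normal((ni, n0)) for _ in range(N)]        # coupling borders
ri = [rng.standard_normal(ni) for _ in range(N)]
r0 = rng.standard_normal(n0)

# Step 1 (per scenario, embarrassingly parallel): factor/solve with Ki to form the
# Schur-complement contribution Bi^T Ki^{-1} Bi and the right-hand-side contribution.
S_contrib = [B.T @ np.linalg.solve(K, B) for K, B in zip(Ki, Bi)]
r_contrib = [B.T @ np.linalg.solve(K, r) for K, B, r in zip(Ki, Bi, ri)]

# Step 2: assemble the global first-stage Schur complement C (a reduction across scenarios).
C = K0 - sum(S_contrib)

# Step 3: factorize C; Step 4: solve for the first-stage direction.
dx0 = np.linalg.solve(C, r0 - sum(r_contrib))

# Step 5 (per scenario, embarrassingly parallel): back-solve for the scenario directions.
dxi = [np.linalg.solve(K, r - B @ dx0) for K, B, r in zip(Ki, Bi, ri)]

# Consistency check: the first-stage block row of the arrow system holds.
assert np.allclose(sum(B.T @ dx for B, dx in zip(Bi, dxi)) + K0 @ dx0, r0)
```

Steps 1 and 5 involve only a process's own scenario data, which is why they parallelize trivially across MPI processes.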
Components of Execution Time • Notice break in y-axis scale
Scenario Calculations – Steps 1 and 5 • Each scenario is assigned to an MPI process, which locally performs Steps 1 and 5. • The scenario matrices are sparse and symmetric indefinite (symmetric with both positive and negative eigenvalues). • Forming a scenario’s Schur-complement contribution is very expensive: it requires solving with the factors of the scenario matrix against the non-zero columns of the coupling border and multiplying the result from the left by the border’s transpose. • 4 hours 10 minutes of wall time to solve a 4h-horizon problem with 8k scenarios on 8k nodes. • Need to run under strict time requirements, e.g., solve the 24h-horizon problem in less than 1h.
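Reading the dropped symbols generically, with K_i = L_i D_i L_i^T the factored scenario matrix and B_i its border (names assumed), the quantity each process forms in Step 1 is the contribution

```latex
% Per-scenario Schur-complement contribution (illustrative notation).
% Forming it column-by-column costs a pair of triangular solves per
% non-zero column of B_i, hence the multiple-sparse-right-hand-side bottleneck.
\begin{equation*}
S_i \;=\; B_i^T K_i^{-1} B_i \;=\; B_i^T \bigl(L_i D_i L_i^T\bigr)^{-1} B_i .
\end{equation*}
```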
Revisiting scenario computations for shared-memory • Multiple sparse right-hand sides • The triangular-solve phase is hard to parallelize in shared memory (multi-core) • The factorization phase speeds up very well and reaches a considerable fraction of peak performance • Our approach: an incomplete factorization of the scenario matrix augmented with its border • Stop the factorization after the elimination of the (1,1) block • The Schur-complement contribution then sits in the (2,2) block
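A sketch of that idea in the same assumed notation: augment the scenario matrix with its border and stop the LDL^T elimination after the (1,1) block, so the Schur-complement contribution appears (with a minus sign) in the not-yet-factored (2,2) block:

```latex
% Incomplete factorization of the augmented per-scenario matrix (illustrative).
% Only the (1,1) block K_i = L_i D_i L_i^T is eliminated; the (2,2) block of the
% middle factor is exactly the (negated) Schur-complement contribution.
\begin{equation*}
\begin{bmatrix} K_i & B_i \\ B_i^T & 0 \end{bmatrix}
=
\begin{bmatrix} L_i & 0 \\ B_i^T L_i^{-T} D_i^{-1} & I \end{bmatrix}
\begin{bmatrix} D_i & 0 \\ 0 & -\,B_i^T K_i^{-1} B_i \end{bmatrix}
\begin{bmatrix} L_i^T & D_i^{-1} L_i^{-1} B_i \\ 0 & I \end{bmatrix}.
\end{equation*}
```

This trades the many sparse triangular solves for factorization-type work, which is the phase that scales well on multi-core nodes.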
Implementation • Requires modification of the linear solver: PARDISO (Schenk) -> PARDISO-SC • Pivot perturbations during factorization are needed to maintain numerical stability • Errors due to the perturbations are normally absorbed by iterative refinement, which would be extremely expensive in our case (many right-hand sides) • Instead, we let the errors propagate into the “global” Schur complement C (Step 2) • Factorize the perturbed C (Step 3) • After Steps 1, 2, and 3, we have the factorization of an approximation of the KKT matrix
Pivot error absorption by preconditioned BiCGStab • We still have to solve with the original (unperturbed) matrix K • “Absorb the errors” by solving Kz = r with preconditioned BiCGStab • Numerical experiments showed it is more robust than iterative refinement • The preconditioner is the approximate factorization obtained in Steps 1–3 • Each BiCGStab iteration requires 2 mat-vecs with K and 2 applications of the preconditioner • One application of the preconditioner amounts to performing “solve” Steps 4 and 5
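A small SciPy sketch of this error-absorption scheme; the matrix, the perturbation, and the preconditioner built from the perturbed factorization are toy stand-ins, not the PIPS-IPM/PARDISO-SC code path:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Toy stand-in: K plays the role of the exact KKT matrix, K_pert the pivot-perturbed
# matrix whose factorization we actually have (Steps 1-3). Solving with the perturbed
# factors serves as the preconditioner for BiCGStab on the exact system K z = r.
rng = np.random.default_rng(0)
n = 200
A = sp.random(n, n, density=0.05, random_state=0)
K = sp.csc_matrix(A + A.T + 10.0 * sp.identity(n))      # "exact" matrix (kept nonsingular)
K_pert = sp.csc_matrix(K + 1e-6 * sp.identity(n))       # perturbed matrix that was factorized

lu = spla.splu(K_pert)                                  # factorization of the approximation
M = spla.LinearOperator((n, n), matvec=lu.solve)        # preconditioner: one "solve" (Steps 4-5)

r = rng.standard_normal(n)
z, info = spla.bicgstab(K, r, M=M)                      # mat-vecs with K, solves with perturbed factors
print("converged:", info == 0,
      "relative residual:", np.linalg.norm(K @ z - r) / np.linalg.norm(r))
```

Because the preconditioner differs from K only by the small pivot perturbations, very few BiCGStab iterations are typically needed, which matches the low iteration counts reported later.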
Test architecture • “Intrepid” Blue Gene/P supercomputer • 40,960 nodes • Custom interconnect • Each node has a quad-core 850 MHz PowerPC processor and 2 GB RAM • DOE INCITE Award 2012-2013 – 24 million core hours
Numerical experiments • 4h (UC4), 12h (UC12), and 24h (UC24) horizon problems • 1 scenario per node (4 cores per scenario) • Large-scale: 12h horizon, up to 32k scenarios and 128k cores (k = 1,024) • 16k scenarios – 2.08 billion variables, 1.81 billion constraints, KKT system size of 3.89 billion • LAPACK + SMP ESSL BLAS for first-stage linear systems • PARDISO-SC for second-stage linear systems
Time per IPM iteration • UC12, 32k scenarios, 32k nodes (128k cores) • The BiCGStab iteration count ranges from 0 to 1.5 • The cost of absorbing factorization perturbation errors is between 10% and 30% of the total iteration cost
Solve to completion – UC12 • Before: 4 hours 10 minutes of wall time to solve the UC4 problem with 8k scenarios on 8k nodes • Now: UC12 (timing results shown on the original slide)
Conclusions and Future Considerations • The multicore-friendly reformulation of the sparse linear algebra computations leads to execution times that are an order of magnitude faster • Fast factorization-based computation of the Schur complement (SC) • Robust and cheap pivot-error absorption via Krylov iterative methods • Parallel efficiency of PIPS remains good • Performance evaluation on today’s supercomputers • IBM BG/Q • Cray XK7, XC30