Trends in AO simulations @ ESO: or 10+ years of Octopus Miska Le Louarn and all the Octopus gal and guys over the years: Clémentine Béchet, Jérémy Braud, Richard Clare, Rodolphe Conan, Visa Korkiakoski, Christophe Vérinaud
Different roles for end-to-end AO sims • Rough concept validation • A PYR is better than a SH! Let’s build it. • TLR / performance definition • What performance can you get with the chosen PYR / DM? • Provide PSFs to astronomers for science simulations / ETC • System design / component and tolerance specification / CDR / PDR • How well do you need to align the PYR with respect to the DM? • System performance validation • Yes, in the lab / on the sky we get what we simulated • If not, why? • System debugging • Why is this ^%$#@ not working? • R&D • FrIM, calibrations, testing of new concepts • Other • RTC simulation, atmospheric simulations, WFS studies,…
General principles of Octopus • Atmosphere simulated by von Karman spectrum phase screens (these are pixel maps of turbulence) • Phase at the telescope is the sum of the phase screens in one particular direction (geometric propagation) • A wavefront sensor model measures that phase • Usually involves Fourier transforms of the phase • From those measurements, commands to the DM(s) are calculated • DM shape is calculated (through WF reconstruction) and subtracted from the incoming phase • Commands are time filtered (simple integrator, POLC, or…) • Phase screens are shifted to reproduce temporal evolution (by wind – frozen flow hypothesis) • Go back to the beginning of this slide and iterate for “some time” • Many options for several of the steps above… (a toy sketch of this loop follows below)
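The loop above can be illustrated with a minimal, self-contained sketch (not Octopus code): one frozen-flow phase screen, an idealized WFS that senses the residual phase directly instead of going through an FFT-based sensor model and a real reconstructor, and a simple integrator on the DM. The grid sizes, gain and wind shift below are illustrative assumptions only.

```c
/* Toy closed-loop AO iteration, illustrating the Octopus loop structure
 * (phase screen -> geometric propagation -> WFS -> reconstruction ->
 *  integrator -> frozen-flow shift).  NOT Octopus code; sizes and gain
 *  are arbitrary.  Compile: gcc -O2 toy_loop.c -lm */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define SCREEN 512          /* phase-screen width (pixels)            */
#define PUPIL  128          /* pupil width (pixels)                   */
#define NITER  500          /* loop iterations                        */
#define GAIN   0.5          /* integrator gain                        */
#define WIND   1            /* frozen-flow shift per step (pixels)    */

int main(void)
{
    static double screen[SCREEN][SCREEN]; /* one turbulent layer       */
    static double dm[PUPIL][PUPIL];       /* DM shape (starts flat)    */
    int it, x, y;

    /* Stand-in for a von Karman screen: random phase values.          */
    for (y = 0; y < SCREEN; y++)
        for (x = 0; x < SCREEN; x++)
            screen[y][x] = (double)rand() / RAND_MAX - 0.5;

    for (it = 0; it < NITER; it++) {
        int off = (it * WIND) % (SCREEN - PUPIL); /* frozen-flow wind  */
        double sum2 = 0.0;

        for (y = 0; y < PUPIL; y++)
            for (x = 0; x < PUPIL; x++) {
                /* Geometric propagation: phase seen in the pupil.     */
                double phase = screen[y][x + off];
                /* Residual after subtracting the current DM shape.    */
                double resid = phase - dm[y][x];
                /* Idealized WFS + reconstructor: measure the residual
                 * directly; the real code goes through WFS models and
                 * an MVM / FrIM / FTR reconstructor.                   */
                double meas  = resid;
                /* Simple integrator on the DM commands.               */
                dm[y][x] += GAIN * meas;
                sum2 += resid * resid;
            }
        if (it % 100 == 0)
            printf("iter %4d  rms residual = %g\n",
                   it, sqrt(sum2 / (PUPIL * PUPIL)));
    }
    return 0;
}
```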
Archeology: why Octopus? • OWL: 100m ancestor of the E-ELT, ~year 2001-2002 • Before Octopus, there were a few single-CPU simulators (FRI’s aosimul.pro (→ yao), CHAOS, ESO Matlab tool…) • Limitations: • 2 GB of RAM (32-bit systems), single threaded • 1st challenge: • Simulate 100m SH-SCAO, on cheap desktop machines, with 2 GB of RAM / machine, in a “reasonable” time ✓ • 2nd challenge: • MAD-like on 100m, or MCAO (6 LGS, 3 DMs) for the 40m class ✓ • 3rd challenge: • EPICS (i.e. XAO, 200x200 subaps) for the 42m ✓ • Open to new concepts • Pyramid, Layer Oriented, MOAO, POLC, new reconstructors,…
Octopus: features • Octopus: software to simulate ELT AO / large AO systems • Has to be reasonably fast on LARGE systems. Not optimized for small systems… Still, it also works on small systems. • End-to-end (Monte Carlo) • Many effects (alignments, actuator geometry, …) included • Open to new reconstructors • MVM + simple integrator (this is the original scheme – the rest is add-ons) • FrIM + internal model control / POLC • FTR + simple integrator • Austrian “Cure3D” and others • Several WFS types • SH (with spatial filter if needed) • PYR (incl. modulation) • SO/LO (star-oriented / layer-oriented) • OL, SCAO, GLAO, LTAO, MCAO, MOAO can be simulated • LGS-specific aspects • including different orders for sensors (e.g. 3x3 NGS sensor) • Image sharpening for TT • Spot elongation with central / side launch / non-Gaussian profiles,… • Different centroiding algorithms (see the centroiding sketch below) • “Complex” SW to handle all those cases.
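As one concrete example of the WFS building blocks listed above, the classical centre-of-gravity centroid of a Shack-Hartmann subaperture spot can be sketched as below (plain C, not Octopus code; the threshold handling and sizes are illustrative assumptions – Octopus offers several centroiding algorithms, which are not reproduced here).

```c
/* Centre-of-gravity centroid of one SH subaperture image (toy sketch). */
#include <stdio.h>

/* img: nx*ny subaperture pixels (row major); cx, cy: centroid in pixels
 * relative to the subaperture centre.  A simple threshold removes the
 * background (value chosen arbitrarily by the caller).                 */
static void cog_centroid(const double *img, int nx, int ny,
                         double thresh, double *cx, double *cy)
{
    double sum = 0.0, sx = 0.0, sy = 0.0;
    for (int y = 0; y < ny; y++)
        for (int x = 0; x < nx; x++) {
            double v = img[y * nx + x] - thresh;
            if (v < 0.0) v = 0.0;        /* clip negative pixels        */
            sum += v;
            sx  += v * x;
            sy  += v * y;
        }
    if (sum > 0.0) {
        *cx = sx / sum - 0.5 * (nx - 1); /* offset from the centre      */
        *cy = sy / sum - 0.5 * (ny - 1);
    } else {
        *cx = *cy = 0.0;                 /* empty subaperture           */
    }
}

int main(void)
{
    /* 4x4 spot shifted towards +x for illustration. */
    double img[16] = { 0,0,0,0,  0,1,2,0,  0,1,2,0,  0,0,0,0 };
    double cx, cy;
    cog_centroid(img, 4, 4, 0.0, &cx, &cy);
    printf("centroid: (%.3f, %.3f)\n", cx, cy);
    return 0;
}
```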
Hardware / Software side • Hardware to simulate ELT AO • Linux + cluster of PCs • AO simulation cluster @ ESO: ~60 nodes, up to 128 GB of RAM / node • Heterogeneous architecture (some machines faster / newer than others) • Gigabit Ethernet switch (quite old now → upgrade to 10G considered) • Software (open source, maximum portability & versatility): • GCC, MPICH2, GSL, FFTW2, ScaLAPACK (all open source) • Parallel debugger (DDT – not open source) • Code is very portable. Also tested on: • Linux / PC clusters at Arcetri and Leiden (LOFAR project), IBM Blue Gene L (PPC architecture) • Single multi-core workstation • Shows the limits of a single machine: a many-core machine has slower cores than machines with fewer cores • Allows tackling extremely large systems without changing the code at all. • To simulate bigger systems, just add machines.
Parallelization • Almost everything in Octopus is “somehow” parallelized • Atmospheric propagation • WFS • Several levels of parallelization • multiple WFSs • the WFS itself • MVM, matrix operations, matrix creation (=calibration), PSF calculations • Parallelization done “explicitly” • Coarse-grain parallelization (i.e. big “functional” blocks are parallelized) • This introduces a level of complexity not necessarily seen in “conventional” AO simulators • Parallelization done with MPI (see the sketch below) • Allows using many machines (“distributed memory”), and adding memory by adding machines • Also allows using a single machine with multiple cores (“shared memory” with some overhead): not optimal but portable. • Although not optimized, the code will run and be useful on different kinds of architectures (shared and distributed memory). BUT not optimal in the shared-memory case!
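A minimal sketch of the kind of coarse-grained MPI parallelization described above (not Octopus code): each rank handles a block of the slope vector – e.g. one WFS or a group of subapertures – computes its partial matrix-vector product with its slice of the reconstruction matrix, and the partial command vectors are summed with MPI_Allreduce. The matrix sizes and dummy values are arbitrary assumptions.

```c
/* Coarse-grained parallel MVM sketch: command = R * slopes, with the
 * slope vector split across MPI ranks (e.g. one WFS per rank).
 * Compile and run: mpicc -O2 mvm_mpi.c && mpirun -np 4 ./a.out         */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define NACT   1000           /* number of DM actuators (arbitrary)    */
#define NSLOPE 4000           /* total number of slopes (arbitrary)    */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank owns a contiguous block of slopes and the matching
     * columns of the reconstruction matrix R (NACT x NSLOPE).          */
    int nloc = NSLOPE / size;                  /* assume it divides     */
    double *Rloc    = malloc((size_t)NACT * nloc * sizeof(double));
    double *sloc    = malloc((size_t)nloc * sizeof(double));
    double *partial = calloc(NACT, sizeof(double));
    double *cmd     = malloc((size_t)NACT * sizeof(double));

    /* Dummy data standing in for the calibrated matrix slice and the
     * measured slopes of this rank's WFS.                              */
    for (int i = 0; i < NACT * nloc; i++) Rloc[i] = 1e-3;
    for (int j = 0; j < nloc; j++)        sloc[j] = 1.0;

    /* Local partial MVM over this rank's slopes.                       */
    for (int i = 0; i < NACT; i++)
        for (int j = 0; j < nloc; j++)
            partial[i] += Rloc[i * nloc + j] * sloc[j];

    /* Sum the partial command vectors from all ranks.                  */
    MPI_Allreduce(partial, cmd, NACT, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("cmd[0] = %g (expected %g)\n", cmd[0], 1e-3 * NSLOPE);

    free(Rloc); free(sloc); free(partial); free(cmd);
    MPI_Finalize();
    return 0;
}
```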
Recent upgrades • Noise-optimal reconstructor for spot elongation (“SCAO”, GLAO, LTAO) with central / side launch • Richard for MVM, Clémentine for FrIM, all Austrian reconstructors • Spot elongation with non-Gaussian Na profiles • New MVM reconstructor with MMSE tomography (ATLAS, MAORY). ONERA algorithm being made Octopus compatible. • Significant acceleration (x5!) of the code with large spot elongation • Skipping of the PSF calculation: just rms WFE + TT fudge → Strehl (acceleration; see the sketch below) • Most accelerations have been obtained through better approximations and improved modeling of the physics • Octopus is a mix of AO physics modeling and computer science optimizations
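The “rms WFE → Strehl” shortcut mentioned above can be illustrated with the extended Maréchal approximation, S ≈ exp(−(2πσ/λ)²), with σ the residual rms wavefront error; the additional tip-tilt “fudge” from the slide is not reproduced here. A small sketch (not the Octopus implementation):

```c
/* Strehl estimate from residual rms wavefront error via the extended
 * Marechal approximation  S = exp(-(2*pi*sigma/lambda)^2).
 * (The tip-tilt correction mentioned on the slide is not shown.)
 * Compile: gcc -O2 strehl.c -lm                                        */
#include <stdio.h>
#include <math.h>

static double strehl_from_wfe(double rms_wfe_nm, double lambda_nm)
{
    double sigma_rad = 2.0 * M_PI * rms_wfe_nm / lambda_nm;
    return exp(-sigma_rad * sigma_rad);
}

int main(void)
{
    /* Example: 100 nm rms residual at K band (2200 nm). */
    printf("S = %.3f\n", strehl_from_wfe(100.0, 2200.0));
    return 0;
}
```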
System-based customizations • Each AO system is somehow unique • At some phase of system analysis, the particularities of the system need to be integrated • Actual DM geometry / influence functions (IFs) • Particular error budget (vibrations, static errors,…) • Particular outputs (TT residuals, EE, PSF…) • The code then “diverges” from the main branch (OR an enormous array of if-this-then-that) • How to deal with *a lot* of configurations, each somehow special?
Octopus validations • Recurrent question: “How is Octopus validated?” • Against other simulators • Several “campaigns” of validation • Yao (→ Gemini MCAO), TMT simulator (NFIRAOS), analytical models (Cibola, ONERA, error-budget-type formulae for fitting / aliasing / temporal delay,…) • NACO simulations compared to Octopus • Against MAD • There are so many variables that you never know for sure (e.g. an integration of X seconds with constantly variable seeing vs. a Y-second simulation with fixed seeing, Cn2,…) • Satisfactory agreement when “reasonable” assumptions are made • Indirectly • For example, FrIM uses an internal AO model. This also allowed testing Octopus methods. Showed the impact of SH non-linearities. • The simulation only simulates what you put in… • If the system is not well maintained, simulations and reality will disagree. • The problem is rather: what did you forget to model in the PARTICULAR system you are investigating? (ex: vibrations, Cn2, telescope…)
Difficulties with Octopus • It is written in C and parallelized • Adding new features is more difficult than with higher-level simulation tools • Price to pay for high speed & portability • One could move some things to a higher-level language (yorick?) to simplify – without much loss of performance • Some Linux and command-line knowledge is needed • It is also complex because many concepts are simulated, in parallel. A single-thread SCAO SH code would be much simpler. • Many things are “quick and dirty” solutions which need cleaning up • Written by physicists. It is a research code. • I think that’s ok – we never do the same system twice, so there are always things to add / change (ERIS is the latest example). • New concepts pop up and need new implementations • For example, spot elongation required breaking the nice paradigm that all sub-apertures are equal (→ impact on parallelization) • LGS with PYR might also introduce some mess • […]
A faster Octopus? • One very efficient way to accelerate is to reduce accuracy • Example: SPHERE model for SPARTA • Reduce pupil sampling (→ FOV of subaps gets smaller) • Reduce the number of turbulent layers (→ ok for SPHERE) • Don’t calculate the PSF (just Strehl → ok for SPARTA use) • No spatial filter (→ ok for SPARTA) • […] • Simulation accelerates by a factor 5-10 (!). SPHERE @ 120 Hz (can be improved) • The Octopus cluster allows running at least 5-10 simulations simultaneously: this recovers some of the time “lost” (wrt GPU codes) by simply launching many simulations in parallel. • Tested Xeon Phi • Managed to run the code • Very slow (unusable) for the moment on Phi • Need to improve vectorization • Improve parallelization to use 100-200 cores efficiently • Is it worth the time??? • Vectorization should be improved for sure (also improves CPU performance) • What’s the future of Xeon Phi?
A faster Octopus? • An option is to use more dedicated hardware • TMT / COMPASS-like approach: port the ~whole code to GPUs • Harder to add new concepts quickly because the GPU code is so specialized • Large porting effort requiring GPU & AO expertise • Lose the possibility to go to a large cluster (supercomputer) if needed • If a huge AO simulation is needed (for example 2nd-gen MCAO for the ELT), we risk being stuck by HW limitations if the HW is too specific • This is clearly a risk since we are very influenced by external ideas (unlike TMT). We cannot have a dedicated simulation tool per project. • Compromise: porting parts of Octopus to GPUs is possible without loss of generality (but also with a loss of maximum achievable performance) • E.g. the SH could be accelerated “easily” by porting the FFT to GPUs (see the sketch below) – but with what gain? • Same for the PSF calculation (maybe – it’s large FFTs…) – but with what gain? • Porting atmospheric propagation would require much more work (cf. TMT). • A huge effort in terms of manpower is needed for this approach… • Use COMPASS for some cases?
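As an illustration of the “port the FFT to GPUs” option (a sketch only, not something Octopus currently does): the host-side cuFFT calls that would replace an FFTW transform of a WFS pupil array look roughly like this. The transform size and input data are arbitrary assumptions.

```c
/* Sketch of offloading one 2-D complex FFT (e.g. a SH / pupil transform)
 * to the GPU with cuFFT, as a stand-in for an FFTW call.
 * Compile: nvcc -O2 fft_gpu.c -lcufft                                   */
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <cufft.h>

#define N 256                        /* transform size (arbitrary)      */

int main(void)
{
    size_t bytes = sizeof(cufftComplex) * N * N;
    cufftComplex *h = malloc(bytes), *d;

    for (int i = 0; i < N * N; i++) { h[i].x = 1.0f; h[i].y = 0.0f; }

    cudaMalloc((void **)&d, bytes);
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);

    cufftHandle plan;
    cufftPlan2d(&plan, N, N, CUFFT_C2C);      /* plan once, reuse        */
    cufftExecC2C(plan, d, d, CUFFT_FORWARD);  /* in-place forward FFT    */

    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);
    printf("DC term = %g (expected %d)\n", h[0].x, N * N);

    cufftDestroy(plan);
    cudaFree(d);
    free(h);
    return 0;
}
```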
Octopus external tools • Set of tools to analyze Octopus data • Plot DM shapes, slopes, commands, IMs,… • Pretty much everybody wants different things • Matlab, yorick, IDL,… • Matlab Engine (using the Matlab compiler to produce libraries) to call Octopus from Matlab and vice versa • External code can also be used with Octopus • Reconstructors (FrIM, Austrians, soon ONERA) • Power spectrum calculators (→ Richard) • Analysis of residual phases, slopes, commands,… through dumps to disk.
Future software directions? • RTC testing platform • Use Octopus to generate slopes to feed to SPARTA, SPARTA generates commands, commands are sent back to Octopus (see the sketch below) • Allows testing SPARTA loops that need true atmospheric data (e.g. r0 estimation, optimizations,…) • “A loop in the computer” • Doesn’t need the highest-accuracy simulation BUT extreme speed • First “proof of concept” demonstration done with Octopus • GPUs / FPGA / … • To get more speed on simulations in some areas (or the complete simulation…) • More… • Calibrations of AOF, AIT of AOF • Algorithms, temporal behavior, […] • PYR with LGS? • We need to carefully weigh what we lose in coding time (optimizing / re-coding, re-engineering) vs. what we gain in simulation time. • Very often not limited by simulation speed but by setting up / checking / thinking / gathering and comparing results… • I prefer a set of small evolutions in steps instead of a complete rewrite
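The “loop in the computer” idea above (the simulator sends slopes, an external RTC such as SPARTA sends back commands) could in its simplest form be a blocking per-frame socket exchange. The sketch below is purely hypothetical: the port number, message layout (raw doubles, fixed counts) and blocking protocol are illustrative assumptions, not the actual Octopus/SPARTA interface.

```c
/* Hypothetical per-frame slope/command exchange between a simulator and
 * an external RTC over TCP.  Port, message layout and vector sizes are
 * illustrative only, not the real Octopus/SPARTA interface.             */
#include <stdio.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

#define NSLOPE 2480                /* slopes per frame (arbitrary)      */
#define NACT   1156                /* commands per frame (arbitrary)    */
#define RTC_PORT 5555              /* hypothetical RTC port             */

int main(void)
{
    double slopes[NSLOPE] = {0}, commands[NACT];
    struct sockaddr_in addr = {0};

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    addr.sin_family = AF_INET;
    addr.sin_port   = htons(RTC_PORT);
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect"); return 1;
    }

    for (int frame = 0; frame < 1000; frame++) {
        /* ... simulator fills 'slopes' for this frame ... */
        /* Send the slope vector, then block on the command vector.     */
        if (write(fd, slopes, sizeof(slopes)) != (ssize_t)sizeof(slopes))
            break;
        size_t got = 0;
        while (got < sizeof(commands)) {
            ssize_t n = read(fd, (char *)commands + got,
                             sizeof(commands) - got);
            if (n <= 0) { perror("read"); close(fd); return 1; }
            got += (size_t)n;
        }
        /* ... commands are applied to the simulated DM ... */
    }
    close(fd);
    return 0;
}
```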
Simulated systems • Along the years, many systems have been simulated • AOF: GRAAL, GALACSI WFM, NFM • OWL (100m): SCAO, GLAO, MCAO • E-ELT (50m, 42m, 39m): SCAO, GLAO, MCAO, LTAO, XAO, [MOAO] • “TMT NFIRAOS” • ERIS • “Gemini-like MCAO” (for ERIS) • MAD (SCAO, GLAO, MCAO) • “NACO” • “SPHERE” • NAOMI • […]
Conclusions • Octopus has shown its ability to deliver simulations for all major AO systems at ESO • It is fast enough on large AO systems and scalable to anything that we can imagine • Many accelerations done recently – so it’s even faster • With the current software & hardware, we can do the study (up to FDR) of any one (maybe 2) complex ELT AO system in addition to ERIS / VLT systems. Today. • More people-limited than CPU-limited • Well tested (which doesn’t mean bug free ;-) ) • Has been demonstrated to be open to new concepts, and able to deliver results on those new concepts in a relatively short time.