Enrico Fedrigo Trends on real time control for adaptive optics
Source of inspiration • Where to detect trends? • The Real Time Control Workshop, Garching, 4-5 December 2012 (ESO Messenger 151, March 2013, pages 55-57) • ESO ELT Telescope RTC: advanced prototype developed • ESO ELT Instrument RTC Development Plan: plan under development, based on Phase-A instrument requirements • My own experience Firenze, 26-31 May 2013; AO4ELT3
The Real Time Control Workshop • Second meeting; the first was in Durham, 13-14 April 2011 • 66 registered participants, 20% commercial • 28 talks, 1 special invited talk, 7 sessions, 1 panel discussion, 2 free-form open discussions • 2 major topics: technology and algorithms • All talks here: http://www.eso.org/sci/meetings/2012/RTCWorkshop.html
RTC Workshop: Technology • CPU: non-deterministic; multi-core/many-core; easiest to program, most difficult to optimize; cheap; very fast evolution • FPGA: deterministic; massively parallel; difficult to program (now); expensive • GPU: high degree of parallelism; simpler to program but needs internal knowledge; internally deterministic; lacks I/O (but there is GPUdirect); non-standard, subject to vendor lock-in; relatively cheap; very fast evolution • DSP
FPGA • TMT concept • Based on a commercial card with 8xVirtex6 from Nutaq • Not the best match for MVM • Powerful
FPGA • Microgate product • Tailored to a specific product, adaptive mirrors, where COTS might not always be the best choice • Delivers the required performance • Custom product: obsolescence managed in-house
FPGA • ESO SPARTA product • Uses FPGA to manage communication and to compress the input stream (WPU) • Design dates from 2005 and is becoming obsolete • Still, 80 µs is respectable • It delivers
CPU • Durham RTC system DARC • CPU-based with support for GPU • Good to test algorithms • Flexible, expandable • Tested on sky • Interfaces to simulator • See talk on Friday
CPU • Kiepenheuer Institute AO system • Stock Linux with a few tweaks to improve real-time behaviour • Correlation Shack-Hartmann • Flexible • Tested on sky • Moving to FPGA?
CPU • ESO’s SPARTA all-CPU • For VxWorks (partially available on Linux) • Runs on Intel • Can be turned to FPGA • Same supervisor, which saves investment
ELT: the Telescope RTC • Biggest CPU-based system so far • Based on tweaked BSD • Designed for a specific application
The Intel Phi • Recent product from Intel • Dedicated to HPC • Modest speed-up promised • Still, you can put 8 of them in one machine • First tests disappointing • Interesting for portability • Roadmap to be verified
High performance on CPUs • Matteo Frigo, creator of FFTW and Cilk
GPU • TNO proposal for an array of GPUs • MVM (cuBLAS or Fujimoto) • 4xGPU good for ELT SCAO • Uses external API or libraries
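The MVM named on these GPU slides is a dense matrix-vector multiply: the control matrix times the vector of wavefront sensor measurements gives the actuator commands. The sketch below is a minimal pure-Python illustration, not the TNO design. The function names, the row-block partitioning and the tiny matrix are assumptions made for the example; a real system would call an optimised BLAS gemv routine on each device instead.

```python
def mvm(control_matrix, slopes):
    """Dense matrix-vector multiply: one actuator command per matrix row."""
    return [sum(c * s for c, s in zip(row, slopes)) for row in control_matrix]


def split_rows(control_matrix, n_devices):
    """Partition the rows into n_devices contiguous blocks (hypothetical layout)."""
    base, extra = divmod(len(control_matrix), n_devices)
    blocks, start = [], 0
    for device in range(n_devices):
        stop = start + base + (1 if device < extra else 0)
        blocks.append(control_matrix[start:stop])
        start = stop
    return blocks


def mvm_partitioned(control_matrix, slopes, n_devices):
    """Each block is an independent MVM; results are concatenated in row order."""
    commands = []
    for block in split_rows(control_matrix, n_devices):
        commands.extend(mvm(block, slopes))  # on real hardware: one device per block
    return commands


# Hypothetical 5-actuator, 3-slope example.
control_matrix = [[1, 0, 2], [0, 1, 0], [1, 1, 1], [2, 0, 0], [0, 0, 3]]
slopes = [1.0, 2.0, 3.0]
print(mvm_partitioned(control_matrix, slopes, 4))
```

Because every row block needs the full slope vector but no data from other blocks, the partitioned product needs no communication between devices beyond the initial broadcast and the final gather.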
GPU • TMT concept • MVM, 2xGPU per WFS • More GPUs on cluster system • Slow update of the control matrix • See talk by Wang on Friday
GPU • LESIA project • Real time and simulator on GPU • Addresses the latency problem with GPUdirect • Relies on Nvidia and CUDA • See talk by Gratadour today
Technological trends • GPUs are the hottest technological component • The latency problem is being addressed by Nvidia with GPUdirect • The roadmap is robust and proceeding at a fast pace • Prototypes show ELT GLAO/SCAO can be targeted • CPU: going the parallel way • Not always easy to manage • FPGA still important for high performance and special tasks • Communication layer, stream processing • Can implement a complete very-high-performance (i.e. low-latency) system; concepts can target EPICS full MVM • High-level tools appearing (OpenCL, C-to-VHDL) • Real challenge: write portable software that can benefit from the advantages of each platform
Transistor density • Figure: transistors, frequency, power, performance, and cores over time (1985-2010) • Computer density: still growing • Clock speed: halted • Performance: growing, but more slowly • Power density: halted • Credits: Committee on Sustaining Growth in Computing Performance
Parallelism • Parallelism and distributed computing are needed. The 5 challenges: • Extract parallelism from the algorithm, find independent execution branches • Amdahl's law: speedup = 1 / ((1 - P) + P/N), with P the parallel fraction and N the number of processing units • Locality • Communication • Synchronisation • Load balancing
Amdahl's law • P = 99%, N = 256: how much ‘speedup’? • 72
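The figure of 72 can be checked with the standard form of Amdahl's law, speedup = 1 / ((1 - P) + P/N). The short sketch below does the arithmetic; the function name is illustrative.

```python
def amdahl_speedup(parallel_fraction, n_units):
    """Amdahl's law: the serial part (1 - P) is not accelerated by adding units."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_units)


speedup = amdahl_speedup(0.99, 256)
print(f"P=99%, N=256: speedup = {speedup:.1f}")  # 72.1

# With P = 99% the speedup can never exceed 1 / (1 - P) = 100,
# however many processing units are added.
print(f"P=99%, N=1e9: speedup = {amdahl_speedup(0.99, 10**9):.1f}")
```

Even with only 1% of serial code, 256 processing units deliver less than a third of their nominal capacity, which is why extracting parallelism from the algorithm comes first among the challenges.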
Usage trends • GPU • Most groups stay away from the internals of the GPU and use them through standard libraries (BLAS) • Some get into the technology for a further optimisation step • FPGA • Still perceived as “difficult” and expensive • Groups looking into ways to simplify the development • CPU • They have always been there, now increasing scope • MACAO and SPARTA Light for small/mid size systems • DARC/KAOS for small/mid size • FORCE prototype for ELT entry level (GLAO)
The real issue • Development costs
RTC Workshop: Algorithms
Smart algorithms performance • CuReD performance • See talk by Shatokina on Friday
Smart algorithms performance • Kaczmarz performance • See talk by Ramlau today
Smart algorithms performance • SABRE overview
FrIM acceleration • Smart arrangement • Split the computation into an on-line part and an off-line part • Applicable to any iterative algorithm • SPARTA does it on the IIR controller (figure labels: On-line, Off-line) • This is how 80 µs latency is achieved • See poster by Bechet
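The on-line/off-line split can be illustrated on a generic IIR controller. Every term that depends only on past errors and past commands is precomputed between frames (off-line), so that when the new error arrives only one multiply-add remains on the latency-critical path (on-line). The sketch below is a textbook illustration under that assumption, not SPARTA's actual implementation; the class, function names and coefficients are hypothetical.

```python
from collections import deque


class SplitIIR:
    """u[k] = b0*e[k] + sum_i b_i*e[k-i] - sum_j a_j*u[k-j], split in two phases."""

    def __init__(self, b, a):
        # b = [b0, b1, ...] feed-forward taps, a = [a1, a2, ...] feedback taps
        self.b0 = b[0]
        self.b_past = list(b[1:])
        self.a_past = list(a)
        self.e_hist = deque([0.0] * len(self.b_past), maxlen=len(self.b_past))
        self.u_hist = deque([0.0] * len(self.a_past), maxlen=len(self.a_past))
        self.partial = 0.0  # precomputed contribution of all past samples

    def online(self, error):
        """Latency-critical step: a single multiply-add on the new error."""
        return self.b0 * error + self.partial

    def offline(self, error, command):
        """Between frames: update the history and precompute the next partial sum."""
        self.e_hist.appendleft(error)
        self.u_hist.appendleft(command)
        self.partial = (sum(b * e for b, e in zip(self.b_past, self.e_hist))
                        - sum(a * u for a, u in zip(self.a_past, self.u_hist)))


def run_split(b, a, errors):
    controller = SplitIIR(b, a)
    commands = []
    for error in errors:
        command = controller.online(error)   # command can be sent right away
        controller.offline(error, command)   # done while waiting for the next frame
        commands.append(command)
    return commands


def direct_iir(b, a, errors):
    """Reference: the same filter evaluated in one go at every step."""
    out = []
    for k in range(len(errors)):
        acc = sum(b[i] * errors[k - i] for i in range(len(b)) if k - i >= 0)
        acc -= sum(a[j - 1] * out[k - j] for j in range(1, len(a) + 1) if k - j >= 0)
        out.append(acc)
    return out


b, a = [0.5, 0.25], [-1.0]          # hypothetical integrator-like coefficients
errors = [1.0, 2.0, -1.0, 0.5]
print(run_split(b, a, errors))
```

Both arrangements produce the same commands; the split only moves most of the arithmetic out of the window between the arrival of the last measurement and the dispatch of the command.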
The latency (and jitter) issue • How crucial are the latency and the jitter? • Specifications on latency and jitter must be carefully checked against Top Level Requirements to avoid over-specifying the real time controller
Trends on algorithms • Smart algorithms are ready • Some tested on sky • Need to characterize them, mapping to different platforms • Biggest question: do we need them? • Brute-force MVM on optimised hardware can be used to implement almost all the foreseen ELT instruments but one • Array of GPUs or FPGAs • Still need them to compute the control matrix (CM) • Anyway, would you throw away a factor 1000 speed-up? • Can make room for more advanced control schemes • It is now a matter of a design decision
Vibration rejection • A trend (and hot topic) in its own right • Several groups at work with different solutions • Two main categories: LQG-based or RLS-based • See talk by Sivo on Friday
ELT: the Instrument RTC • ESO is preparing a development plan • Diagram labels: Requirements, Technological survey, Requirements Analysis, Community survey, LESIA, Plan
ELT: the Instrument RTC • Major drivers: • Compatibility with ELT established standards wherever possible • Obsolescence management, upgradability, maintenance • Scalability both in performance (small to big systems) and in cost (laboratory to instrument systems) • Structure of development, development phases, industrialization • Define need for a common development (a platform) and at which level • Flexibility to accommodate varying requirements/algorithms during the development and AIT phases (maybe with degraded performance) • Strong decoupling between the I/O and computing modes technology choices, allowing separate upgrade paths/roadmaps. • High SW component reusability through loosely-coupled development techniques and standard libraries.
ELT: the Instrument RTC • Technological survey • Operating system • VxWorks, Linux • Parallel programming and architectures • Cilk, OpenMP, OpenCL, NUMA, SSE • Interconnects (PCIe, GbE) • CPU-based implementations • Accelerators • GPU for soft and hard real time, GPUdirect, multi-GPU systems • Role of Phi • FPGA as • protocol offload engine • computing engine
10-40-100 GbE Interconnect • Raw UDP • With FPGA-to-FPGA: <1µs latency • Full bandwidth (10Gb) reached • Optimised switching with multicast • <2.5µs latency, switch only • Full bandwidth on all ports (48) reached March 29th, ESO Garching
Concept for successor of SPARTA • Diagram labels: Cluster, DET, WPU, REC, CTR, CODE, switch (three times) • Communication based on 10-40-100 GbE • Distribution based on UDP or RTPS • Directly managed by FPGAs where latency/jitter is important • Metrology derived from the switch • Switch can deliver low latency (proved by Cisco)
My own experience • Portability of MACAO code • CPUs catch up (2007: NAOS upgrade) • Obsolescence of SPARTA • Modularity to fight obsolescence • Real time performance vs feasibility • The rest of the development is the biggest part • Importance of shared development • Lack of closed loop testing tools • Timeline figure labels (platforms MACAO and SPARTA): MACAO VLTI 4, 2003; MACAO CRIRES 1, 2006; SINFONI 1 - LGS, 2004/2006; NAOS LGS, 2006; MAD, 2007; SPHERE 1, 2013; GRAAL 1, 2015; GALACSI 1, 2015; GRAVITY 4; NAOMI 4; ERIS 1 • Compute-load labels from the figure, in extraction order: 1.5M-MAC, 1.5M-MAC, 1.5M-MAC, 27M-MAC, 15M-MAC, 4.1G-MAC, 12G-MAC, 12G-MAC, 7.5M-MAC, 5M-MAC, 4.1G-MAC
The importance of being a Platform
Conclusions • Heterogeneous computing, with GPUs playing a very important role • MVM coming back • Deterministic transport settling on GbE • Use of embedded systems more and more limited • Emergence and importance of optical bench simulators • Need to find “space” for more complex control schemes • Anti wind-up, saturation management, vibration rejection, modal control, LQG • They add complexity • Use of commodity hardware: upgradeability • Maintainability of commodity hardware imposes continuous upgrades • Importance of software development costs • Minimizing them is key to success: shared developments, collaborations • Portability and modularity recognised but need more development • Need to harness the computing power of the different technologies in a portable/maintainable way: template programming or metaprogramming • Total Cost of Ownership rarely addressed
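As an illustration of the anti wind-up and saturation management mentioned among the more complex control schemes, the sketch below compares a plain integrator with one that uses clamping, one common anti wind-up technique. It is a generic example with hypothetical gain, limit and error values, and is not taken from any of the systems discussed.

```python
def clamp(value, limit):
    """Saturate a command to the actuator range [-limit, +limit]."""
    return max(-limit, min(limit, value))


def run_integrator(errors, gain, limit, anti_windup):
    """Integrator with optional clamping anti wind-up."""
    state = 0.0
    commands = []
    for error in errors:
        state += gain * error
        command = clamp(state, limit)
        if anti_windup:
            # Clamping: the integrator state never exceeds what the actuator can do.
            state = command
        commands.append(command)
    return commands


# Hypothetical scenario: ten frames pushing into saturation, then the error reverses.
errors = [1.0] * 10 + [-1.0] * 3
naive = run_integrator(errors, gain=0.5, limit=1.0, anti_windup=False)
protected = run_integrator(errors, gain=0.5, limit=1.0, anti_windup=True)
print("naive    :", naive[-3:])      # still saturated
print("protected:", protected[-3:])  # reacts on the first reversed frame
```

The plain integrator keeps accumulating while the actuator is saturated and needs several frames to unwind, whereas the clamped version responds as soon as the error changes sign. The extra branch per actuator is a small example of the added complexity that such schemes bring to the real time path.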