Tomography for Multi-guidestar Adaptive Optics: An Architecture for Real-Time Hardware Implementation. Donald Gavel, Marc Reinig, and Carlos Cabrera, UCO/Lick Observatory Laboratory for Adaptive Optics, University of California, Santa Cruz. Presentation at the SPIE Optics and Photonics Conference, paper 5903-15, San Diego, CA, June 3, 2005
Outline of talk • Introduction: the problem of real-time AO tomography for extremely large telescopes (ELTs): real-time calculations grow as D⁴ • An alternative approach using a massively parallel processor (MPP) architecture • Performance study results • Experiment • Simulation • Conclusions
AO systems are growing in complexity, size, ambition • MCAO • 2-3 conjugate DMs • 5-7 LGS • 3 TTS • MOAO • Up to 20 IFUs, each with a DM • 8-9 LGS • 3-5 TTS
Extrapolating the conventional vector-matrix-multiply AO reconstructor method to ELTs is not feasible • Measurement model: s = Ha, where H = actuator-to-sensor influence function matrix • Least-squares solution: â = (HᵀH)⁻¹Hᵀs • Minimum-variance solution: â = ⟨aaᵀ⟩Hᵀ(H⟨aaᵀ⟩Hᵀ + ⟨nnᵀ⟩)⁻¹s • General form: â = Rs, with R a P × M reconstructor matrix • Online calculation requires a P × M matrix multiply • M = 10,000 subaps × 9 LGS • P = 20,000 acts (MCAO) or 100,000 acts (MOAO) • fs = 1 kHz frame rate ⇒ ~10¹¹ calcs × 1 kHz = ~10⁵ Gflops = ~10⁵ Keck AO processors! • Offline calculation requires O(M³) flops to (pre)compute the inverse: ~10¹⁵ calcs, or ~10⁶ sec (12 days) on a 1 Gflop machine • "Moore's Law" of computation technology growth: processor capability doubles every 18 months, so a 10⁵ improvement takes 25 years of growth. Even with 100× more processors, the remaining 10³ improvement takes 15 years.
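A back-of-the-envelope check of this scaling argument (my own sketch, not from the original slides; the 2-flops-per-matrix-entry count and one slope per subaperture per LGS are assumptions, and the result lands within an order of magnitude of the slide's ~10⁵ Gflops figure):

```python
# Rough flop counts for the conventional vector-matrix-multiply reconstructor.
M = 10_000 * 9          # measurements: 10,000 subaps x 9 LGS
P = 100_000             # actuators (MOAO case)
fs = 1_000              # frame rate [Hz]

online_flops = 2 * P * M * fs    # multiply + add per matrix entry, per frame
offline_flops = M**3             # O(M^3) to precompute the inverse
print(f"online: {online_flops:.1e} flop/s")
print(f"offline: {offline_flops:.1e} flops "
      f"= {offline_flops / 1e9 / 86400:.0f} days at 1 Gflop/s")
```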
Alternative: massively parallel processing • Advantages • Many small processors each do a small part of the task – not taxing to any one processor • Modularity: each processor has a stand-alone task – possibly specialized to one piece of hardware (WFS or DM) • Modularity makes the system easier to diagnose – each part has a "recognizable" task • Modularity makes system design easier – each subsection depends only on parameters associated with it, as opposed to global optimization of a monolithic design • Requires • Lots of small processors, with high-speed data paths • Iteration to solution – but what if one iteration took only 1 µs? Then there is time for 1000 iterations per 1 ms data frame cycle!
1. Wavefront sensor processing • Hartmann sensor: s = Gy • s = vector of slopes • y = vector of phases • G = gradient operator • Problem is overdetermined (more measurements than unknowns), assuming no branch points • High-speed algorithms are well known, e.g. the FFT-based algorithm of Poyneer et al., JOSA A 2002, which is O(n₀ log n₀)
Wiener solution of the wavefront sensor slope-to-phase problem in the Fourier domain: ỹ(κ) = G̃ᴴ(κ) s̃(κ) / ( |G̃(κ)|² + C̃nn(κ)/C̃ff(κ) ) • κ = spatial frequency • ~ indicates Fourier transform • r0 = Fried's parameter • σn = measurement noise • da = subaperture diameter • Cff = Kolmogorov spectrum • Cnn = noise spectrum
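A minimal numerical sketch of this Fourier-domain Wiener filter (my own illustration, not the authors' code); the continuous gradient model G̃ = iκ and the white-noise spectrum are simplifying assumptions:

```python
import numpy as np

def wiener_slope_to_phase(sx, sy, d_a=0.5, r0=0.15, sigma_n=0.1):
    """Fourier-domain Wiener slope-to-phase reconstruction (sketch).
    sx, sy: n x n Hartmann slope maps; d_a: subap diameter [m];
    r0: Fried parameter [m]; sigma_n: rms slope measurement noise."""
    n = sx.shape[0]
    k = 2 * np.pi * np.fft.fftfreq(n, d=d_a)        # spatial frequencies [rad/m]
    KX, KY = np.meshgrid(k, k, indexing="xy")
    k2 = KX**2 + KY**2
    k2[0, 0] = 1.0                                  # placeholder; DC bin zeroed below
    Gx, Gy = 1j * KX, 1j * KY                       # gradient operator: G~ = i*kappa
    Cff = 0.023 * r0 ** (-5 / 3) * k2 ** (-11 / 6)  # Kolmogorov phase spectrum
    Cnn = sigma_n**2 * np.ones_like(k2)             # white measurement noise
    Sx, Sy = np.fft.fft2(sx), np.fft.fft2(sy)
    num = np.conj(Gx) * Sx + np.conj(Gy) * Sy       # G~^H s~
    den = np.abs(Gx) ** 2 + np.abs(Gy) ** 2 + Cnn / Cff
    Y = num / den
    Y[0, 0] = 0.0                                   # slopes carry no piston information
    return np.real(np.fft.ifft2(Y))
```

As σn → 0 the filter tends to the plain least-squares (pseudo-inverse) gradient solution; the noise-to-signal term Cnn/Cff regularizes poorly sensed frequencies.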
2. Tomographic reconstruction • Guidestars probe the atmosphere: y = Ax • where • y = vector of all WFS phase measurements (an M-vector) • x = value of δOPD at each voxel in the turbulent volume (an N-vector) • A is a forward propagation operator (M × N, entries = 0 or 1) • The problem is underdetermined – there are more unknowns than measurements
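A toy sketch of the forward operator A, under a simplifying assumption of mine that each layer contributes an integer-pixel-shifted copy of itself to each guidestar's measurement (the real operator encodes the cone-beam ray geometry of laser guidestars, which this ignores):

```python
import numpy as np

def forward_propagate(x_layers, gs_shifts):
    """Apply A (entries 0 or 1): for each guidestar, sum the delta-OPD of
    every layer along that guidestar's line of sight. Here the line of
    sight through layer l is modeled as an integer-pixel shift (dx, dy)."""
    y = []
    for shifts in gs_shifts:                    # one shift list per guidestar
        acc = np.zeros_like(x_layers[0])
        for layer, (dx, dy) in zip(x_layers, shifts):
            acc += np.roll(layer, shift=(dy, dx), axis=(0, 1))
        y.append(acc)
    return np.stack(y)                          # stacked WFS phase maps (the M-vector)
```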
Inverse tomography algorithms: linear feedback, or preconditioned conjugate gradient (PCG) • Aᵀ is the back-propagation operator • C is the "preconditioner" – it affects the convergence rate only • P, N form the "postconditioner" – they determine the type of solution: • P = I, N = 0 ⇒ least squares • P = ⟨xxᵀ⟩, N = ⟨nnᵀ⟩ ⇒ minimum variance • g = constant feedback gain • f(·) = 1st-order regression (and other hidden details of the CG algorithm)
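A minimal sketch of the linear-feedback iteration in its simplest form, x_{k+1} = x_k + g·Aᵀ(y − A·x_k), i.e. with C = I, P = I, N = 0 (the least-squares case); forward_propagate is the toy operator above and back_propagate, its adjoint, is likewise my own illustration:

```python
def back_propagate(y_maps, gs_shifts, n_layers):
    """Apply A^T: spread each guidestar's residual back into the voxels
    (layer pixels) its rays traversed -- the adjoint of forward_propagate."""
    x = [np.zeros_like(y_maps[0]) for _ in range(n_layers)]
    for gs_map, shifts in zip(y_maps, gs_shifts):
        for l, (dx, dy) in enumerate(shifts):
            x[l] += np.roll(gs_map, shift=(-dy, -dx), axis=(0, 1))
    return x

def linear_feedback_solve(y, gs_shifts, n_layers, g=0.1, n_iter=1000):
    """x_{k+1} = x_k + g * A^T (y - A x_k). The constant gain must satisfy
    g < 2 / lambda_max(A^T A) to converge; a preconditioner C would speed
    convergence, and a postconditioner (P, N) would change the solution type."""
    x = [np.zeros_like(y[0]) for _ in range(n_layers)]
    for _ in range(n_iter):
        resid = y - forward_propagate(x, gs_shifts)
        corr = back_propagate(resid, gs_shifts, n_layers)
        x = [xl + g * cl for xl, cl in zip(x, corr)]
    return x
```

Note that each voxel's update depends only on the residuals along the rays through it, which is exactly what maps onto the per-voxel processors of the FPGA prototype described below.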
Compute count for inverse tomography • A and Aᵀ are massively parallelizable over the transverse dimension and over guidestars • Aᵀ is also massively parallelizable over layers (counts taken per iteration) • Optional Fourier-domain preconditioning and postconditioning adds FFT work per iteration
Prototype implementation on an FPGA • [Figure: a single voxel processor] • [Figure: an array of voxel processors]
Preliminary results for MPP timing and resource allocation on an FPGA • Timing • Basic clock speed supported: 50 MHz (Xilinx Virtex-4) • Total number of states per iteration: 36 • Parameters (current value): L = layers (4), NGS = guide stars (3), n0 = subapertures (4) • A single iteration takes T = 4·NGS + 2·L·NGS + 6 clock cycles; currently this is 36 clocks at 50 MHz = 720 ns per iteration • Note: the algorithm parallelizes over guidestars; for simplicity and ease of debugging, this first implementation does not exploit that yet • Chip count • This implementation: the Virtex-4 chip is 20% utilized (2,996 of 15,360 available logic cells employed) • Scaling to a system with 10,000 subapertures (such as for the 30-meter telescope) would require 500 of these chips • At a standard packing density of ~50 chips/board, this equates to 10 circuit boards
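The timing claim is easy to check; a quick sketch (mine) using only the numbers on this slide:

```python
# Back-of-the-envelope check of the FPGA timing figures above.
CLOCK_HZ = 50e6          # Xilinx Virtex-4 clock supported by the design
CYCLES_PER_ITER = 36     # states per tomography iteration (L=4, NGS=3, n0=4)
FRAME_S = 1e-3           # 1 kHz wavefront-sensor frame period

t_iter = CYCLES_PER_ITER / CLOCK_HZ       # 7.2e-7 s = 720 ns per iteration
iters_per_frame = FRAME_S / t_iter        # ~1389 iterations per 1 ms frame
print(f"{t_iter * 1e9:.0f} ns/iteration -> {iters_per_frame:.0f} iterations/frame")
```

This is consistent with the earlier "what if one iteration took only 1 µs" argument: well over 1000 iterations fit in each 1 ms data frame.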
Simulation: extrapolation to the full ELT spatial scale to estimate convergence rates • 7,800 subapertures per guidestar • 5 guidestars • 7-layer atmosphere • Fixed-feedback-gain iteration • A and Aᵀ implemented in the spatial domain • Initial atmospheric realizations were random, with a Kolmogorov spatial power spectrum • Result: convergence to three digits of accuracy within 1 ms
3. Projection and fitting to DMs • MCAO • Requires filtering and a weighted integral over layers for each DM • Filters and weights chosen to minimize "generalized anisoplanatism" (Tokovinin et al., JOSA A 2002) • Massively parallelizable over the Fourier domain and over DMs – L steps to integrate • MOAO • Requires an integral over layers for each science direction (DM) – a sketch follows below • Massively parallelizable over the spatial or Fourier domain and over DMs – L steps to integrate • DM fitting • Deconvolution – massively parallelizable given either a spatially invariant or a spatially localized actuator influence function • PCG suppresses aperture effects in 2-3 iterations
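A toy sketch of the per-DM layer collapse, reusing the shift-based toy geometry from the tomography sketch above; the scalar weights are my simplification standing in for the MCAO filters (the actual Tokovinin-style filters are frequency-dependent and not reproduced here):

```python
def project_to_dm(x_layers, direction_shifts, weights=None):
    """Collapse the reconstructed volume onto one DM command (sketch).
    MOAO: unit weights -- a plain sum along the science direction.
    MCAO: per-layer weights (scalars here, a simplification) chosen to
    minimize generalized anisoplanatism. L steps, one per layer, and
    independent across DMs -- hence massively parallelizable."""
    if weights is None:
        weights = [1.0] * len(x_layers)          # MOAO case
    dm = np.zeros_like(x_layers[0])
    for layer, (dx, dy), w in zip(x_layers, direction_shifts, weights):
        dm += w * np.roll(layer, shift=(dy, dx), axis=(0, 1))
    return dm
```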
Conclusions • The architecture: massively parallel computation • Conceptually simple • Tested with a commercial FPGA and evaluated with simulations – it is feasible with today's technology • Under study: FD-PCG – extra computation per iteration traded off against a faster convergence rate