Explore a novel method using LSTM on FPGA for efficient real-time non-rigid motion correction in calcium imaging, achieving significant speedups and energy efficiency compared to conventional algorithms.
LANMC: LSTM-Assisted Non-Rigid Motion Correction on FPGA for Calcium Image Stabilization
Zhe Chen (1), Hugh T. Blair (2), Jason Cong (1)
(1) Computer Science Department, (2) Department of Psychology, UCLA
zhechen@ucla.edu
Research Background
Miniscope calcium imaging [1]: monitoring neuron activity at large scale in vivo.
Challenges
• Non-uniform motion artifacts
• Costly, inefficient algorithms
Motivation
Real-time non-rigid motion correction for calcium imaging is in demand.
[1] Denise J. Cai, Daniel Aharoni et al., Nature, 2016
Conventional Non-Rigid Motion Correction Method
Processing Steps
• 2D contrast filter
  • Removes the bulk of the background
  • Filter size: cell diameter in the image
• Piecewise rigid motion correction
  • Divide the frame into overlapping patches
  • Cross-correlation based on FFT/IFFT
  • Local maximum -> motion vector
Algorithm inefficiency: the operation must be repeated for every patch, which makes the algorithm too costly and inefficient for real-time applications.
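The piecewise rigid step can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `patch_shift` and `piecewise_shifts`, the patch size, and the stride are assumptions chosen for illustration, and only integer shifts are estimated. It shows why the cost grows with the number of patches: every patch needs its own FFT, IFFT, and peak search.

```python
import numpy as np

def patch_shift(ref, img):
    """Estimate the integer (dy, dx) shift of img relative to ref."""
    # Cross-correlation theorem: corr = IFFT(FFT(img) * conj(FFT(ref)))
    corr = np.fft.ifft2(np.fft.fft2(img) * np.conj(np.fft.fft2(ref))).real
    # The location of the correlation maximum gives the motion vector
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Map wrapped peak indices to signed shifts
    return tuple(int(p) if p <= s // 2 else int(p) - s
                 for p, s in zip(peak, corr.shape))

def piecewise_shifts(ref, img, patch=32, stride=16):
    """Repeat the FFT-based estimate for every overlapping patch (hypothetical sizes)."""
    shifts = {}
    for y in range(0, ref.shape[0] - patch + 1, stride):
        for x in range(0, ref.shape[1] - patch + 1, stride):
            shifts[(y, x)] = patch_shift(ref[y:y + patch, x:x + patch],
                                         img[y:y + patch, x:x + patch])
    return shifts
```

For a 64x64 frame with these assumed sizes, `piecewise_shifts` runs nine separate correlations per frame, which is the repeated work the slide identifies as the bottleneck.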
Proposed Method Based on LSTM Inference
Method: use long short-term memory (LSTM) inference to predict motion at the overlapping patches.
• Offline training
  • NoRMCorre -> training targets
• Online inference
  • Rigid motion correction + LSTM inference
Using a 5-node LSTM saves 95% of the operations.
Accuracy Evaluation: [figure]
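As a rough illustration of the online inference path, the sketch below runs a 5-unit LSTM over a sequence of rigid shifts and reads out one offset per patch. Everything beyond "5-node LSTM" is an assumption: the input encoding (one scalar rigid shift per frame), the linear readout, the number of patches, and the random placeholder weights are hypothetical, and in the described flow the weights would come from offline training against NoRMCorre outputs.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step; gates are stacked as [input, forget, cell, output]."""
    n = h.shape[0]
    z = W @ x + U @ h + b
    i, f, o = sigmoid(z[:n]), sigmoid(z[n:2 * n]), sigmoid(z[3 * n:])
    g = np.tanh(z[2 * n:3 * n])
    c_new = f * c + i * g           # update the cell state
    h_new = o * np.tanh(c_new)      # expose the gated hidden state
    return h_new, c_new

def predict_patch_motion(rigid_shifts, params, n_hidden=5):
    """Run the LSTM over the rigid shifts, then read out per-patch offsets."""
    W, U, b, W_out, b_out = params
    h, c = np.zeros(n_hidden), np.zeros(n_hidden)
    for x in rigid_shifts:
        h, c = lstm_step(np.atleast_1d(x), h, c, W, U, b)
    return W_out @ h + b_out        # assumed linear readout, one value per patch

# Placeholder (untrained) weights, for shape checking only
rng = np.random.default_rng(0)
n_in, n_hidden, n_patches = 1, 5, 9
params = (rng.standard_normal((4 * n_hidden, n_in)),
          rng.standard_normal((4 * n_hidden, n_hidden)),
          np.zeros(4 * n_hidden),
          rng.standard_normal((n_patches, n_hidden)),
          np.zeros(n_patches))
offsets = predict_patch_motion([0.5, 1.0, -0.25], params)
```

The point of the sketch is the cost structure: one small matrix-vector update per frame replaces a separate FFT-based correlation for each patch.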
Implementation: Folding Architecture
Leverage the central symmetry of the filter kernel with folding.
[Figure: folded filter datapath with inputs I0 to I4 and coefficients C0 to C2]
Folding saves >80% of LUTs and FFs and >60% of DSPs compared with the design without folding.
Performance Evaluation
At 300 MHz, the FPGA achieves >40x speedup over the CPU.
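The folding idea can be sketched in one dimension, assuming a symmetric 5-tap kernel (c2, c1, c0, c1, c2). Inputs that share a coefficient are added first, so each distinct coefficient needs only one multiplier. The function names and the 1D simplification are assumptions for illustration; the resource savings quoted above refer to the authors' design, not to this sketch.

```python
import numpy as np

def fir_direct(x, coeffs):
    """Reference filter: one multiply per tap."""
    taps = len(coeffs)
    return np.array([np.dot(x[i:i + taps], coeffs)
                     for i in range(len(x) - taps + 1)])

def fir_folded(x, half):
    """Symmetric odd-length filter with half = [c0, c1, c2, ...] (center first).

    Mirrored inputs are pre-added, so each coefficient is multiplied once.
    """
    r = len(half) - 1
    out = []
    for i in range(r, len(x) - r):
        acc = half[0] * x[i]                           # center tap
        for k in range(1, r + 1):
            acc += half[k] * (x[i - k] + x[i + k])     # folded pair
        out.append(acc)
    return np.array(out)
```

With five taps, the direct form uses five multiplies per output and the folded form uses three; the same pre-add trick applies along both axes of a centrally symmetric 2D kernel.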
Implementation: Reuse FFT/IFFT and LSTM
• Unroll and pipeline the FFT/IFFT operation
• Unroll and pipeline LSTM inference for acceleration
• Reuse the FFT/IFFT IP for the H/V transforms
• Reuse the LSTM for the H/V directions and all patches
Toolchain: Vivado HLS
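Reusing one FFT block for the horizontal and vertical transforms works because a 2D FFT is separable. The sketch below is a software analogy only, with a hypothetical function name `fft2_reuse`; it passes the rows and then the columns through the same 1D routine, much as a single hardware FFT core could be time-shared.

```python
import numpy as np

def fft2_reuse(img, fft1d=np.fft.fft):
    """2D FFT built from a single 1D FFT routine applied twice."""
    rows = np.stack([fft1d(r) for r in img])        # horizontal pass
    cols = np.stack([fft1d(c) for c in rows.T])     # vertical pass, same routine
    return cols.T
```

The result matches `np.fft.fft2` on the same input, so the reuse trades extra passes through one core for the area of a second core.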
Performance Evaluation
Processing latency and energy efficiency compared with a Xeon E5-2620 CPU
• Low-power, high-efficiency Ultra96 board
• Consistent speedup of the acceleration kernels
• Algorithm simplified by LSTM inference
Results: 82x speedup; close to 4 orders of magnitude gain
Conclusion
The FPGA design realizes real-time non-rigid motion correction for calcium imaging. Its low latency and high energy efficiency make it suitable for closed-loop feedback stimulation.
Thank you! Acknowledgments