An FFT/IFFT Accelerator for OCT Application • Zhenhong Liu
What is OCT? • OCT = Optical Coherence Tomography • An optical analog of ultrasound tomography • Provides micrometer resolution • Uses a harmless light source (unlike X-rays) • Figure: optical coherence tomogram of a fingertip (http://en.wikipedia.org/wiki/File:HautFingerspitzeOCT.gif)
Data Processing • The algorithm uses three FFT/IFFT operations • The number of data points is large: 1024 or 2048
Sample Data • Image data: 16-bit integers, converted to floating point during processing • Background and calibration data: single-precision floating point • Output: a 1024x1024-pixel grayscale BMP file
Using Fixed-point • Fixed-point number format (WL, FL), with word length WL and fraction length FL: • Twiddle factors are kept at (32, 30) • The fraction length of the input/output data is varied • To prevent overflow during FFT/IFFT, the output is arithmetically shifted right by 1 bit after every butterfly operation
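A minimal sketch of the (WL, FL) format and the per-butterfly 1-bit right shift described above, assuming a radix-2 DIF butterfly with round-to-nearest quantization and saturation. The function names are hypothetical, and Python integers are unbounded, so this models the arithmetic but not 32-bit register wraparound.

```python
def to_fixed(x, wl, fl):
    """Quantize x to a signed (wl, fl) fixed-point integer: round to nearest,
    then saturate to the wl-bit two's-complement range."""
    q = round(x * (1 << fl))
    lo, hi = -(1 << (wl - 1)), (1 << (wl - 1)) - 1
    return max(lo, min(hi, q))


def to_float(q, fl):
    """Interpret integer q as a fixed-point value with fl fraction bits."""
    return q / (1 << fl)


def scaled_butterfly(a, b, w, tw_fl=30):
    """Radix-2 DIF butterfly on fixed-point complex values given as (re, im)
    integer tuples. a and b share one data format; w uses tw_fl fraction bits.
    Both outputs are arithmetically shifted right by 1 bit to prevent overflow."""
    ar, ai = a
    br, bi = b
    wr, wi = w
    # Upper output: (a + b) >> 1
    upper = ((ar + br) >> 1, (ai + bi) >> 1)
    # Lower output: ((a - b) * w) >> 1; the product is first shifted right
    # by tw_fl bits to bring it back into the data format.
    dr, di = ar - br, ai - bi
    p_re = (dr * wr - di * wi) >> tw_fl
    p_im = (dr * wi + di * wr) >> tw_fl
    lower = (p_re >> 1, p_im >> 1)
    return upper, lower
```

For example, with data in (32, 6) and a twiddle of 1.0 in (32, 30), inputs 1.0 and 0.5 give outputs 0.75 and 0.25, i.e. (a + b)/2 and (a - b)/2. Python's `>>` on negative integers rounds toward minus infinity, matching an arithmetic shift.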
Using Fixed-point • Figure: results for (32, 2) and (32, 4)
Using Fixed-point • Figure: results for (32, 4) and (32, 6)
Using Fixed-point • Figure: floating-point (f-p) vs. (32, 6) results
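Fixed-point + Approx. Twiddle Factor • The output is very sensitive to the twiddle factors • Simply reducing the fraction length is not effective, because the rounding error is fixed in absolute terms: • acceptable for twiddle factors whose magnitude is well above 0 • large relative errors for twiddle factors near 0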
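The sensitivity argument can be checked numerically. This sketch (hypothetical helper names; round-to-nearest quantization assumed) compares the relative quantization error of a twiddle component close to 1 with one close to 0 for several fraction lengths, using N = 1024.

```python
import math


def quantize(x, fl):
    """Round x to the nearest multiple of 2**-fl (fraction length fl)."""
    return round(x * (1 << fl)) / (1 << fl)


def relative_error(x, fl):
    """Relative error introduced by quantizing x with fl fraction bits."""
    return abs(quantize(x, fl) - x) / abs(x)


N = 1024
# Real parts of two twiddle factors W_N^k = exp(-2*pi*j*k/N):
near_one = math.cos(2 * math.pi * 1 / N)     # k = 1, close to 1
near_zero = math.cos(2 * math.pi * 255 / N)  # k = 255, close to 0

for fl in (30, 12, 8):
    print(f"FL={fl:2d}  near 1: {relative_error(near_one, fl):.2e}  "
          f"near 0: {relative_error(near_zero, fl):.2e}")
```

At FL = 8 the component near 1 stays accurate to about 2e-5, while the component near 0 (about 0.0061) rounds to 2/256 and is off by roughly 27%.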
Approx. Twiddle Factor • A suitable approximate multiplier for A × B: • Finishes a multiplication in n iterations • Rounds A to a number that has at most n 1's • Stores the positions of the 1's in SRAM • Requires that A does not change often (twiddle factors are constants for a given N) • The larger n is, the more accurate the product
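A minimal sketch of the approximate multiplier, assuming the simplest "rounding": keep the n most significant 1 bits of A and drop the rest (truncation). The slide does not say which rounding the design uses, and `encode`/`approx_multiply` are hypothetical names. The stored positions drive a shift-and-add loop with one iteration per position.

```python
def encode(a, n):
    """Approximate a non-negative integer a by at most n of its 1 bits.
    This sketch simply keeps the n most significant 1 bits (truncation) and
    returns their positions, i.e. the data that would be stored in SRAM."""
    ones = [i for i in range(a.bit_length() - 1, -1, -1) if (a >> i) & 1]
    return ones[:n]


def approx_multiply(b, positions):
    """Shift-and-add multiplication: one iteration per stored bit position,
    so at most n iterations in total."""
    acc = 0
    for p in positions:
        acc += b << p
    return acc


a = 0b1011_0110  # 182: five 1 bits
for n in (1, 2, 3, 5):
    pos = encode(a, n)
    print(n, pos, approx_multiply(100, pos), 100 * a)
```

For a twiddle factor stored in (32, 30) format, the accumulated product would additionally be shifted right by 30 bits to return to the data format.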
Hardware Implementation • The original design only supports positive A: • each SRAM entry needs an extra sign bit • B is XORed with the sign bit • Support for complex multiplication: • two units share one SRAM • no add/sub operation is needed after multiplying • The design cannot be pipelined, so multiple units are used in the butterfly unit to increase throughput: • n iterations -> n units per butterfly unit • For IFFT, only a 1-bit control signal is needed
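A sketch of the sign-bit extension, with hypothetical names. XORing a 32-bit two's-complement B with an all-ones mask gives ~B = -B - 1; the slide does not say whether that off-by-one is corrected, so this sketch assumes the sign bit is also fed in as a carry-in, which makes the negation exact.

```python
MASK32 = (1 << 32) - 1


def to_signed32(x):
    """Interpret the low 32 bits of x as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x >> 31 else x


def conditional_negate(b, sign):
    """Return -b if sign == 1, else b, for a 32-bit two's-complement b.
    XOR with an all-ones mask yields ~b == -b - 1; adding the sign bit as a
    carry-in supplies the missing +1."""
    mask = MASK32 if sign else 0
    return to_signed32(((b & MASK32) ^ mask) + sign)


def signed_approx_multiply(b, sign, positions):
    """Approximate B * A where A is stored as a sign bit plus the positions
    of (at most n) 1 bits of |A|."""
    nb = conditional_negate(b, sign)
    acc = 0
    for p in positions:
        acc += nb << p
    return acc
```

With A = -10 stored as sign 1 and positions [3, 1], `signed_approx_multiply(7, 1, [3, 1])` yields -70.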
Hardware Implementation • Schematic: one complex multiplication in 2n cycles
Hardware Implementation • Schematic: butterfly unit for DIF (decimation-in-frequency) FFT/IFFT
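One plausible reading of "two units share one SRAM" and "no add/sub after multiplying", sketched under assumptions: for (a + jb)(c + jd), a real-part unit and an imaginary-part unit read the same SRAM word each cycle, first the positions of c, then those of d, and each accumulates its signed shifted operand directly. This takes at most 2n iterations. The encoding (sign bit plus truncated 1-bit positions) and all names are hypothetical, and plain negation stands in for the XOR trick.

```python
def encode_signed(x, n):
    """SRAM entry for one twiddle component: sign bit plus the positions of
    at most n 1 bits of |x| (most significant first, truncating the rest)."""
    sign = 1 if x < 0 else 0
    m = abs(x)
    ones = [i for i in range(m.bit_length() - 1, -1, -1) if (m >> i) & 1]
    return sign, ones[:n]


def complex_approx_multiply(a, b, entry):
    """Compute (a + jb) * (c + jd) with entry = (encode_signed(c, n),
    encode_signed(d, n)). A real-part unit and an imaginary-part unit read the
    same SRAM word each iteration and accumulate directly, so no add/sub is
    needed after the multiplications. Returns (re, im, iterations)."""
    (sc, pc), (sd, pd) = entry
    re = im = 0
    iterations = 0
    # Phase 1: SRAM supplies the positions of c -> re += a*c, im += b*c
    for p in pc:
        re += (-a if sc else a) << p
        im += (-b if sc else b) << p
        iterations += 1
    # Phase 2: SRAM supplies the positions of d -> re -= b*d, im += a*d
    for p in pd:
        re += (b if sd else -b) << p
        im += (-a if sd else a) << p
        iterations += 1
    return re, im, iterations
```

For example, with c = 5, d = -4 and n = 3, multiplying 3 + 2j gives (23, -2) after 3 iterations, matching (3 + 2j)(5 - 4j) = 23 - 2j.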
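A floating-point reference model of a radix-2 DIF FFT/IFFT may help when checking the hardware output. The sketch assumes the 1-bit IFFT control simply selects conjugated twiddle factors, a standard approach that the slides do not spell out, and the name `dif_fft` is hypothetical. Every butterfly output is halved to mirror the 1-bit right shift, so after log2 N stages the forward result is DFT(x)/N and the inverse result is exactly the IDFT.

```python
import cmath


def dif_fft(x, inverse=False):
    """Radix-2 decimation-in-frequency FFT (inverse=True gives the IFFT).
    Each butterfly output is halved, so the result is DFT(x)/N, or exactly
    IDFT(x) when inverse=True. len(x) must be a power of two."""
    a = list(x)
    n = len(a)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    sign = 1 if inverse else -1  # the 1-bit control: conjugate the twiddles
    span = n // 2
    while span >= 1:
        for start in range(0, n, 2 * span):
            for k in range(span):
                w = cmath.exp(sign * 2j * cmath.pi * k / (2 * span))
                u, v = a[start + k], a[start + k + span]
                a[start + k] = (u + v) / 2
                a[start + k + span] = (u - v) * w / 2
        span //= 2
    # DIF leaves the output in bit-reversed order; reorder it.
    bits = n.bit_length() - 1
    out = [0j] * n
    for i in range(n):
        rev = int(format(i, f"0{bits}b")[::-1], 2) if bits else 0
        out[rev] = a[i]
    return out
```

Because both directions halve at every stage, a forward-then-inverse round trip returns x/N rather than x.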
Hardware Implementation • For an N-point FFT/IFFT, each stage takes N/2 cycles • The hardware cost is even smaller than with an accurate fixed-point multiplier • Should be more power-efficient • No visible changes to the output images
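As a worked example of the stage timing, assuming a radix-2 transform with log2 N stages and ignoring pipeline fill and control overhead, the cycle count for one transform at the two sizes used here is:

```latex
T(N) = \underbrace{\log_2 N}_{\text{stages}} \cdot \underbrace{\frac{N}{2}}_{\text{cycles per stage}}
\quad\Rightarrow\quad
T(1024) = 10 \cdot 512 = 5120, \qquad T(2048) = 11 \cdot 1024 = 11264
```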