Achieving Low Latency, Reduced Memory Footprint and Low Power Consumption with Data Streaming • Olivier Bockenbach (1), Ian Wainwright (1), Murtaza Ali (2), Mark Nadeski (2) • (1) ContextVision, Linkoping, Sweden • (2) Texas Instruments, Dallas, TX, USA
Outline Slide • Problem statement • Technology revolution in medical imaging • Real time imaging in Ultrasound • A data streaming processing framework • Example: temporal filter • Object descriptors • Real Time and low latency scheduling • Results and future plans • Conclusion
Healthcare Revolution • Takes advantage of new acquisition technology • CCD cameras and flat panels in X-Ray • 3 Tesla MRI scanners • Up to 640 detector rows in spiral CT • Surfs the processing power wave • Moore’s law • Reduce die size • New leading edge algorithms • Noise reduction, enhancement • Segmentation, registration
Digital Fluoroscopy • From Film to Real Time • 30-60 fps • 1024 x 1024 pixels, 16 bits
Ultrasound Imaging • Real Time • 30-60 fps • 8 bits • Size depending on depth
Ultrasound Imaging pipeline • Varying level of processing complexity • Some stages introduce latency • Inherently: scan conversion • By design: speckle reduction • Algorithm • Framework [Figure: pipeline blocks labeled Acquisition, Beam Forming, Decimation, Log, Scan Conversion, Speckle Reduction, Compounding]
Case study: IIR temporal filter [Figure: block diagram. Inputs: Live, History. Blocks: Gauss filter, Downsample 4x, First deriv, Block sum, Linear coeff., Linear solving, Smoothing, Upsample 16x, Warp, Temporal Filter. Signals: dx, dy, dt, x2, y2, t2, xt, yt, Vx, Vy. Output: Filtered]
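The slide names an IIR temporal filter fed by a live image and a warped history image but does not give the recursion itself. A minimal sketch, assuming a first-order recursive blend applied one line at a time; the function name `temporal_iir_line` and the weight `alpha` are hypothetical and do not come from the slides:

```python
def temporal_iir_line(live, history, alpha=0.25):
    """Blend one live line with the matching (already warped) history line.

    alpha is a hypothetical blending weight: 1.0 keeps only the live line,
    smaller values keep more of the history.
    """
    if len(live) != len(history):
        raise ValueError("live and history lines must have the same length")
    return [alpha * x + (1.0 - alpha) * h for x, h in zip(live, history)]


# A static pixel moves toward the live value over successive frames.
history = [0.0, 0.0]
for _ in range(3):
    history = temporal_iir_line([100.0, 50.0], history, alpha=0.5)
```

Because the output of one frame becomes the history of the next, only one history line per image line has to be kept, which is what makes the filter a natural fit for line-based streaming.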
Image Based Implementation [Figure: block diagram with stages labeled D, S, U, L, W and TF] • Buffers: 800x400 at 8 bits, 200x100 at 16 bits, 50x25 at 32 bits • Total: 1920 + 120 + 1278 ≈ 3.3 MB
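The footprint figures on this slide can be checked with a little arithmetic. The sketch below assumes the three terms of the sum are kilobytes (the slide gives no unit) and uses a hypothetical helper, `image_bytes`:

```python
def image_bytes(width, height, bits_per_pixel):
    """Size of one full image buffer in bytes."""
    return width * height * bits_per_pixel // 8


full = image_bytes(800, 400, 8)      # 320000 bytes per full-resolution image
quarter = image_bytes(200, 100, 16)  # 40000 bytes per 4x-downsampled image
coarse = image_bytes(50, 25, 32)     # 5000 bytes per 16x-downsampled image

# Terms of the slide's sum, read here as kilobytes (assumption).
total_kb = 1920 + 120 + 1278         # 3318 KB, i.e. about 3.3 MB
```

Read this way, 1920 KB matches six full-resolution buffers and 120 KB matches three quarter-resolution buffers, though the slide does not state the buffer counts or break down the 1278 term.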
Line Based Implementation • Buffer pool descriptor • All buffers • In lieu of images • Line pools • Round robin • Adjusted length • Adapted line count • DMA for I/O [Figure labels: DMA, Image in DDR3]
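The line-pool idea can be sketched as a small ring of line buffers that is reused in round-robin order instead of holding a whole image. The class `LinePool` and its methods are illustrative names only, and the 5-line, 800-pixel pool is an assumed example rather than a figure from the slides:

```python
class LinePool:
    """A fixed set of line buffers reused in round-robin order."""

    def __init__(self, line_count, line_length):
        self.lines = [bytearray(line_length) for _ in range(line_count)]
        self.next_slot = 0

    def acquire(self):
        """Return the next line buffer, overwriting the oldest one."""
        line = self.lines[self.next_slot]
        self.next_slot = (self.next_slot + 1) % len(self.lines)
        return line

    def footprint(self):
        """Total bytes held by the pool."""
        return sum(len(line) for line in self.lines)


# Hypothetical pool: 5 lines of 800 8-bit pixels.
pool = LinePool(line_count=5, line_length=800)
first = pool.acquire()
for _ in range(4):
    pool.acquire()
wrapped = pool.acquire()  # the sixth request reuses the first buffer
```

Sizing each pool by the line length and line count its consumer actually needs is what keeps the footprint far below that of full image buffers.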
Scheduling the pipeline • Targeting low latency • A line is the unit of execution • A task triggers when its input request is fulfilled • Task table • I/O dependencies • Module description • Built offline • Several algorithms in separate pipelines [Figure labels: Up, (…), Pools]
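A table-driven, line-triggered scheduler of the kind outlined on this slide might look like the sketch below. The task names, pool names and line counts are hypothetical, and the real framework's task table format is not shown in the slides:

```python
from collections import namedtuple

# One task-table row: a module, its input and output pools, and how many
# input lines it needs before it can emit one output line.
Task = namedtuple("Task", "name source sink lines_needed")

# Hypothetical table; names and line counts are illustrative only.
TASK_TABLE = [
    Task("gauss", "input", "smoothed", 5),
    Task("deriv", "smoothed", "gradient", 3),
]


def run_pipeline(task_table, input_lines):
    """Feed input lines one at a time and fire every task that becomes ready.

    Returns the number of input lines fed before the last stage produced
    its first output line (the pipeline latency in lines), or None.
    """
    available = {"input": 0}          # lines produced so far in each pool
    produced = {t.name: 0 for t in task_table}
    for t in task_table:
        available.setdefault(t.sink, 0)
    last = task_table[-1]
    for fed in range(1, input_lines + 1):
        available["input"] = fed
        for t in task_table:          # table order follows the data flow
            # Output line k needs input lines k .. k + lines_needed - 1.
            while available[t.source] - produced[t.name] >= t.lines_needed:
                produced[t.name] += 1
                available[t.sink] += 1
        if produced[last.name] > 0:
            return fed
    return None


latency_lines = run_pipeline(TASK_TABLE, 400)
```

With these assumed stages, the first output line appears after 5 + 3 - 1 = 7 input lines, which illustrates how the latency of a line-based pipeline grows with the vertical extent of each stage rather than with the image height.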
Wind out phase [Figure: load (%) versus time (TU) for the previous, current and next image. Phases: Wind In, Steady State, Drain. Annotations: total image processing time, apparent image processing time, total latency]
[Figure: processing stages for Image Instance A and Image Instance B. Stage labels: Gauss and Downsample, First Order Derivative, <Other stages>, Warp Image, Temporal Filter]
Implementation • On one core of a C6674 DSP from TI • Latency of 62 lines • 42 cycles per pixel (70% CPU load) • 145 KB for data buffers • 95 KB code and data • ~50% of L2 as SRAM • Input from FPGA • Over Serial RapidIO • Payload of 32 lines [Figure: Xilinx FPGA connected to TI C66x DSP over SRIO]
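The memory figures on this slide can be put side by side with the image-based total from the earlier slide. This is a rough check only: the 512 KB L2 size per core is an assumption that the slides do not state, and the image-based sum is read as kilobytes:

```python
data_kb = 145          # data buffers, from the slide
code_kb = 95           # code and data, from the slide
l2_kb = 512            # assumed L2 size per core; not stated on the slide

sram_kb = data_kb + code_kb              # 240 KB
l2_share = sram_kb / l2_kb               # about 0.47

image_based_kb = 1920 + 120 + 1278       # image-based sum, read as KB
reduction = image_based_kb / data_kb     # roughly 23x
```

Under these assumptions the line-based buffers take roughly one twenty-third of the image-based footprint, and buffers plus code fit in about half of L2, in line with the "~50% of L2 as SRAM" bullet.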
Power Consumption • Power increase • With frequency • 2x in nominal range [Figure annotations: ~30%, ~15%]
Power Dissipation • Ideally we would put two graphs here: one with the power drawn at 70% usage of one core and one
Plans for the Future in Ultrasound • Faster imaging • Synthetic aperture • Lower power • Thousands of fps • Faster processors • Higher frequencies • More integration
Conclusion • This study shows the design of an image processing framework aimed at: • Real time low latency • Low memory footprint • Low power consumption • Successful implementation on a TI DSP for a temporal filter in Ultrasound • Promising properties for future applications and systems.