
Achieving Low Latency, Reduced Memory Footprint and Low Power Consumption with Data Streaming



Presentation Transcript


  1. Achieving Low Latency, Reduced Memory Footprint and Low Power Consumption with Data Streaming • Olivier Bockenbach (1), Ian Wainwright (1), Murtaza Ali (2), Mark Nadeski (2) • (1) ContextVision, Linköping, Sweden • (2) Texas Instruments, Dallas, TX, USA

  2. Outline Slide • Problem statement • Technology revolution in medical imaging • Real time imaging in Ultrasound • A data streaming processing framework • Example: temporal filter • Object descriptors • Real Time and low latency scheduling • Results and future plans • Conclusion

  3. Healthcare Revolution • Takes advantage of new acquisition technology • CCD cameras and flat panels in X-Ray • 3 Tesla MRI scanners • Up to 640 detector rows in spiral CT • Surfs the processing power wave • Moore’s law • Reduce die size • New leading edge algorithms • Noise reduction, enhancement • Segmentation, registration

  4. Digital Fluoroscopy • From Film to Real Time • 30-60 fps • 1024² pixels, 16 bits

  5. Ultrasound Imaging • Real Time • 30-60 fps • 8 bits • Size depending on depth

  6. Ultrasound Imaging pipeline • Varying level of processing complexity • Some introduce latency • Inherently: scan conversion • By design: Speckle reduction • Algorithm • Framework • Diagram labels: Beam Forming, Decimation, Log, Acquisition, Scan Conversion, Speckle Reduction, Compounding

  7. Case study: IIR temporal filter • Block diagram labels: Live, History, Gauss filter, Downsample 4x, First deriv, dx, dy, dt, Block sum, x2, y2, t2, xt, yt, Linear coeff., Linear solving, Vx, Vy, Smoothing, Upsample 16x, Warp, Temporal Filter, Filtered
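
The slide names an IIR temporal filter fed by a live image and a history image but does not give its equation. As a minimal sketch, assuming a first-order recursive blend applied one line at a time, the final stage could look like the following. The function name, the `alpha` weight and the sample values are hypothetical, and the motion estimation and warp stages from the diagram are left out.

```python
def iir_temporal_filter_line(live, history, alpha=0.25):
    """Blend one live line with the matching history line.

    Assumed form: out[i] = alpha * live[i] + (1 - alpha) * history[i].
    The result is also the history line for the next frame, which is
    what makes the filter recursive (IIR).
    """
    if len(live) != len(history):
        raise ValueError("live and history lines must have the same length")
    return [alpha * l + (1.0 - alpha) * h for l, h in zip(live, history)]


# Feed the same constant line three times; the output converges
# toward the live value as the history accumulates.
history = [0.0, 0.0, 0.0, 0.0]
for _ in range(3):
    history = iir_temporal_filter_line([100.0] * 4, history, alpha=0.5)
```

A smaller `alpha` gives stronger temporal smoothing at the cost of more lag, which is why the diagram warps the history toward the live image before blending.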

  8. Image Based Implementation • Stage labels (diagram): D, S, U, L, W, TF • Image formats: 800x400 at 8 b, 200x100 at 16 b, 50x25 at 32 b • Total buffer memory: 1920 + 120 + 1278 KB ≈ 3.3 MB
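
To make the footprint arithmetic concrete, here is a rough sketch of the size of a single buffer in each of the three image formats on the slide, next to a small pool of lines. The slide does not say how many buffers of each format are allocated, so the counts are not reproduced here; the 8-line pool is a purely hypothetical figure for comparison.

```python
def buffer_bytes(width, height, bits_per_pixel, count=1):
    """Size in bytes of `count` buffers of width x height pixels."""
    return width * height * bits_per_pixel // 8 * count


full_res = buffer_bytes(800, 400, 8)      # one full-resolution image
quarter_res = buffer_bytes(200, 100, 16)  # one 4x downsampled image
coarse = buffer_bytes(50, 25, 32)         # one 16x downsampled image

# Hypothetical line pool: 8 full-resolution lines instead of a whole image.
line_pool = buffer_bytes(800, 1, 8, count=8)

print(full_res, quarter_res, coarse, line_pool)
```

Under these assumptions one full-resolution image takes 320000 bytes while the 8-line pool takes 6400 bytes, which illustrates why replacing images with line pools shrinks the footprint.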

  9. Line Based Implementation • Buffer pool descriptor • All buffers • In lieu of images • Line pools • Round robin • Adjusted length • Adapted line count • DMA for I/O • Diagram labels: DMA, Image in DDR3
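
The line pool described above can be sketched as a small ring of fixed-length line buffers that is reused in round-robin order. This is an illustration only: the class name `LinePool`, its methods and the sizes used below are hypothetical, and the real framework's descriptor layout and DMA handling are not shown in the slides.

```python
class LinePool:
    """Round-robin pool of fixed-length line buffers (illustrative only)."""

    def __init__(self, line_count, line_length):
        self.lines = [bytearray(line_length) for _ in range(line_count)]
        self.written = 0  # total number of lines handed out so far

    def acquire(self):
        """Return (absolute line index, buffer) for the next line.

        The slot is reused in round-robin order, so the oldest
        resident line is overwritten once the pool is full.
        """
        index = self.written
        self.written += 1
        return index, self.lines[index % len(self.lines)]

    def get(self, index):
        """Return the buffer for an absolute line index if still resident."""
        oldest = max(0, self.written - len(self.lines))
        if not oldest <= index < self.written:
            raise KeyError(f"line {index} is not resident")
        return self.lines[index % len(self.lines)]


# A 3-line pool receiving 5 lines keeps only lines 2, 3 and 4.
pool = LinePool(line_count=3, line_length=4)
for value in range(5):
    _, buf = pool.acquire()
    buf[:] = bytes([value]) * 4
```

Sizing `line_count` to the vertical support of the consuming stage and `line_length` to the stage's resolution corresponds to the "adapted line count" and "adjusted length" bullets.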

  10. Scheduling the pipeline • Targeting low latency • Line is unit of execution • Trigger on input request fulfilled • Task table • I/O Dependencies • Module description • Built offline • Several algorithms in separate pipelines • Diagram labels: Up, (…), Pools
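
A line-driven scheduler of the kind outlined above can be sketched as a loop over a precomputed task table: each time an input line arrives, every task whose input requirement is fulfilled produces one output line. The table format, the stage names and the window sizes below are hypothetical, and the drain (wind-out) at the end of the image is not modelled.

```python
# Hypothetical task table, built offline: each task reads from one
# source stream and needs `window` more source lines than it has
# produced before it can emit its next output line.
TASK_TABLE = [
    {"name": "gauss", "source": "input", "window": 3},
    {"name": "deriv", "source": "gauss", "window": 2},
]


def run_pipeline(num_input_lines, task_table):
    """Return a trace of (input line, task name, output line) tuples."""
    produced = {"input": 0}
    for task in task_table:
        produced[task["name"]] = 0

    trace = []
    for line in range(num_input_lines):
        produced["input"] += 1  # a new input line has arrived
        progressed = True
        while progressed:  # fire every task whose input request is fulfilled
            progressed = False
            for task in task_table:
                name, source = task["name"], task["source"]
                if produced[source] >= produced[name] + task["window"]:
                    trace.append((line, name, produced[name]))
                    produced[name] += 1
                    progressed = True
    return trace


trace = run_pipeline(5, TASK_TABLE)
first_deriv = next(entry for entry in trace if entry[1] == "deriv")
```

In this toy table the last stage emits its first line while input line 3 is being processed, so the latency is counted in lines rather than in whole images.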

  11. Wind in Phase

  12. Steady State

  13. Wind out phase

  14. Wind out phase • Timing diagram: Load (%) versus Time (TU) • Phases shown: Previous Image Drain, Current Image Wind In, Current Image Steady State, Current Image Drain, Next Image Wind In • Annotations: Total image processing time, Apparent image processing time, Total Latency

  15. Diagram labels: Image Instance A, Image Instance B • Stages shown: Gauss and Downsample, First Order Derivative, <Other stages>, Warp Image, Temporal Filter

  16. Implementation • On one core of a C6674 DSP from TI • Latency of 62 lines • 42 Cycles per pixel (70% CPU load) • 145 KB for data buffers • 95 KB code and data • ~50% of L2 as SRAM • Input from FPGA • Over Serial RapidIO • Payload of 32 lines • Diagram: Xilinx FPGA connected over SRIO to TI C66x DSP
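
The figures above can be related to pixel throughput and to latency in time with some simple arithmetic. The sketch below uses the 42 cycles per pixel, 70% load and 62-line latency from the slide; the core clock, the number of lines per frame and the frame rate are assumptions chosen for illustration, since the slide does not state them for this measurement.

```python
def sustainable_pixel_rate(clock_hz, cycles_per_pixel, load):
    """Pixels per second processed when the core runs at the given load."""
    return clock_hz * load / cycles_per_pixel


def latency_seconds(latency_lines, lines_per_frame, frames_per_second):
    """Convert a latency expressed in lines into seconds."""
    return latency_lines / (lines_per_frame * frames_per_second)


CLOCK_HZ = 1.0e9        # assumption: 1 GHz core clock
LINES_PER_FRAME = 400   # assumption: 400 lines per image
FRAME_RATE = 30         # assumption: low end of the 30-60 fps range

rate = sustainable_pixel_rate(CLOCK_HZ, cycles_per_pixel=42, load=0.70)
delay = latency_seconds(62, LINES_PER_FRAME, FRAME_RATE)

print(f"{rate / 1e6:.1f} Mpixel/s, {delay * 1e3:.2f} ms")
```

With these assumed values the 62-line latency is a small fraction of one frame period, which is the point of scheduling by line instead of by image.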

  17. TI C66x Core

  18. Power Consumption • Power increase • With frequency • 2x in nominal range • Chart annotations: ~30%, ~15%

  19. Power Dissipation • Authors' placeholder note (truncated in the source): Ideally we would put here 2 graphs: one with the power drawn at 70% of usage over of one core and one

  20. Plans for the Future in Ultrasound • Faster imaging • Synthetic aperture • Lower power • Thousands of fps • Faster processors • Higher frequencies • More integration

  21. Conclusion • This study shows the design of an image processing framework aimed at: • Real time low latency • Low memory footprint • Low power consumption • Successful implementation on a TI DSP for a temporal filter in Ultrasound • Promising properties for future applications and systems.
