Sonic Millip3De: Massively Parallel 3D Stacked Accelerator for 3D Ultrasound

Richard Sampson*, Ming Yang†, Siyuan Wei†, Chaitali Chakrabarti†, Thomas F. Wenisch* (*University of Michigan, †Arizona State University)

Presentation Transcript


  1. Sonic Millip3De: Massively Parallel 3D Stacked Accelerator for 3D Ultrasound • Richard Sampson*, Ming Yang†, Siyuan Wei†, Chaitali Chakrabarti†, Thomas F. Wenisch* • *University of Michigan, †Arizona State University

  2. Portable Medical Imaging Devices • Medical imaging moving towards portability • MEDICS (X-Ray CT) [Dasika ‘10] • Handheld 2D Ultrasound [Fuller ‘09] • Not just a matter of convenience • Improved patient health [Gunnarsson ‘00, Weinreb ‘08] • Access in developing countries • Why ultrasound? • Low transmit power [Nelson ‘10] • No dangers or side-effects

  3. Handheld 3D Ultrasound • 3D has numerous benefits over 2D • Easier to interpret images • Greater volumetric accuracy • … as well as many challenges • 12k transducers, 10M image points • 10-20x beyond state of the art • High raw data bandwidth (6Tb/s) • Major bottleneck in state of the art • Tight handheld power budget (5W)

  4. Why a Custom Accelerator? • Software algorithms load/store intensive • von Neumann designs inefficient • Large system would require over 700 DSPs • General purpose CPUs even less efficient

  5. Contributions • Iterative delay calculation algorithm • Reduces storage by over 400x • Enables streaming data flow • Sonic Millip3De design • Leverages 3D die stacking technology • Transform-select-reduce accelerator framework • Power and image analysis of Sonic Millip3De • Negligible change in image quality • Able to meet 5W power budget by 11nm node

  6. Outline • Introduction • Ultrasound background • Algorithm design • System design • Sonic Millip3De • Select Sub-Unit • Results and analysis • Conclusions

  7. Ultrasound: Transmit and Receive [Figure: image space with focal points, transmit transducer, receive transducer, and raw receive channel data]

  20. Ultrasound: Transmit and Receive • Each transducer stores an array of raw receive data

  21. Ultrasound: Image Reconstruction • Image is reconstructed from the raw data based on round-trip delay

  22. Ultrasound: Image Reconstruction • Images from each transducer are combined to produce the full frame

  23. Delay Index Calculation • Iterate through all image points for each transducer and calculate delay index • Often done with lookup tables (LUTs) instead • 50 GB LUT required for target 3D system
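The direct calculation described on this slide can be sketched as follows. This is only an illustration: the speed of sound (1540 m/s), the 40 MHz sampling rate, and the function name `delay_index` are assumptions that do not appear in the slides.

```python
import math

SPEED_OF_SOUND_M_S = 1540.0  # assumed typical soft-tissue value, not from the slides
SAMPLE_RATE_HZ = 40e6        # assumed ADC sampling rate, not from the slides


def delay_index(tx_pos, rx_pos, focal_point):
    """Return the index of the receive sample matching the round-trip delay.

    The pulse travels from the transmit transducer to the focal point and
    back to the receive transducer; positions are (x, y, z) in metres.
    """
    path_m = math.dist(tx_pos, focal_point) + math.dist(focal_point, rx_pos)
    return round(path_m / SPEED_OF_SOUND_M_S * SAMPLE_RATE_HZ)


# Delay indices for focal points along one scanline (straight down the z axis),
# as seen by a receive transducer 5 mm away from the transmit location.
scanline = [(0.0, 0.0, 0.001 * k) for k in range(1, 6)]
indices = [delay_index((0.0, 0.0, 0.0), (0.005, 0.0, 0.0), p) for p in scanline]
```

Repeating this for every transducer and image point is what a lookup table precomputes, which is why the table grows with the product of the two counts.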

  24. Challenges of Handheld 3D Ultrasound • Delay index LUT requires too much storage • New iterative algorithm reduces necessary constant storage by 400x • Peak raw data bandwidth (6Tb/s) infeasible • Sub-aperture multiplexing reduces peak data rate, but requires more transmits • Handheld power budget very tight (5W) • 3D stacked, highly parallel data streaming design reconstructs images efficiently

  25. Iterative Delay Index Calculation • Deltas between adjacent focal points on a scanline form a smooth curve • Fit a piecewise quadratic approximation to the delta function • Two sections are sufficient for negligible error [Figure: delta curve split into Section 1 and Section 2]
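A rough sketch of a two-section quadratic fit to a delta curve is shown below. The slides do not say how the fit is computed; this version simply interpolates three samples per section (first, middle, last) rather than doing a least-squares fit, and all function names are hypothetical.

```python
def quadratic_through(p0, p1, p2):
    """Coefficients (a, b, c) of the parabola through three (x, y) points."""
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    d0 = (x0 - x1) * (x0 - x2)
    d1 = (x1 - x0) * (x1 - x2)
    d2 = (x2 - x0) * (x2 - x1)
    a = y0 / d0 + y1 / d1 + y2 / d2
    b = -(y0 * (x1 + x2) / d0 + y1 * (x0 + x2) / d1 + y2 * (x0 + x1) / d2)
    c = y0 * x1 * x2 / d0 + y1 * x0 * x2 / d1 + y2 * x0 * x1 / d2
    return a, b, c


def fit_two_sections(deltas, split):
    """Fit one quadratic per section; n restarts at 0 inside each section.

    Each section must contain at least three deltas.
    """
    fits = []
    for section in (deltas[:split], deltas[split:]):
        last = len(section) - 1
        mid = last // 2
        fits.append(quadratic_through(
            (0, section[0]), (mid, section[mid]), (last, section[last])))
    return fits


def max_fit_error(deltas, split, fits):
    """Largest absolute difference between the fit and the true deltas."""
    worst = 0.0
    for i, delta in enumerate(deltas):
        (a, b, c), n = (fits[0], i) if i < split else (fits[1], i - split)
        worst = max(worst, abs(a * n * n + b * n + c - delta))
    return worst
```

Calling `max_fit_error` on the true deltas of a scanline gives a quick check of whether two sections are enough for a given geometry.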

  26. Sub-aperture Multiplexing • Peak raw data bandwidth (6Tb/s) infeasible • Solution: sub-aperture multiplexing • Transmit multiple times from same location • Receive with subset of transducers (sub-aperture) • Sum images together • Prior work: reduce data rate • Our design: also reduces HW and power requirements
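The transmit, receive, and sum steps above can be sketched as a simple loop. The `beamform` callback and the function name are hypothetical placeholders; the sketch only shows that each sub-aperture costs one extra transmit and that the partial images are added together.

```python
def reconstruct_with_subapertures(transducer_ids, subaperture_size, beamform):
    """Build one image from several transmits, one per sub-aperture.

    beamform(ids) is assumed to return the partial image (a list of pixel
    values) reconstructed from the receive data of the given transducers.
    Returns the summed image and the number of transmits used.
    """
    image = None
    transmits = 0
    for start in range(0, len(transducer_ids), subaperture_size):
        subaperture = transducer_ids[start:start + subaperture_size]
        partial = beamform(subaperture)  # one transmit, receive on a subset
        transmits += 1
        if image is None:
            image = list(partial)
        else:
            image = [p + q for p, q in zip(image, partial)]
    return image, transmits
```

Only `subaperture_size` channels are active per transmit, so the peak data rate drops by roughly the number of sub-apertures, at the cost of that many transmits per image.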

  27. System Design

  28. System Design • Sonic Millip3De comprises 1,024 parallel pipelines

  29. System Design: Transducers • Interchangeable CMOS transducer layer; can use older process

  30. System Design: ADC/Storage • Separate storage layer to reduce wire lengths

  31. System Design: Transform-Select-Reduce • Accelerator units in fast, low power process

  32. Select Sub-Unit Design • Selects sample closest to each focal point using our algorithm

  33. Select Sub-Unit Design • All delays for a scanline are estimated using 9 constants

  34. Select Sub-Unit Design • $A(n+1)^2 + B(n+1) + C = (An^2 + Bn + C) + 2An + (A+B)$ • Adders calculate the next iteration of the quadratic approximation
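The identity on this slide means each new value of the quadratic needs only two additions and no multiplications. A minimal sketch follows; the function name is hypothetical, and the actual hardware presumably works in fixed point rather than Python integers.

```python
def iterate_quadratic(a, b, c, count):
    """Yield a*n*n + b*n + c for n = 0 .. count-1 using only additions."""
    value = c          # f(0)
    step = a + b       # f(1) - f(0), i.e. 2*a*n + (a + b) at n = 0
    second = 2 * a     # the step itself grows by 2*a per iteration
    for _ in range(count):
        yield value
        value += step    # first adder: f(n+1) = f(n) + step
        step += second   # second adder: advance the step for the next n


values = list(iterate_quadratic(2, 3, 1, 5))
```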

  35. Select Sub-Unit Design • Decrementor selects the sample for the next image focal point

  36. Select Sub-Unit Design • Section decrementor indicates when to change constants
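One way to picture the two decrementors is as a pair of down-counters over a stream of samples. The sketch below is an interpretation of the slides, not the actual circuit: names are hypothetical, each section is given as `(a, b, c, length)`, and the quadratic is evaluated directly for brevity where the hardware would use adders.

```python
def piecewise_deltas(sections):
    """Yield rounded deltas, switching constants when a section runs out."""
    for a, b, c, length in sections:
        for n in range(length):  # stands in for the section down-counter
            yield round(a * n * n + b * n + c)


def select_stream(samples, first_index, deltas):
    """Pick one sample per focal point from a streamed channel.

    A down-counter is loaded with the distance to the next wanted sample;
    when it reaches zero the current sample is emitted and the counter is
    reloaded with the next delta. A delta of zero reuses the same sample.
    """
    selected = []
    deltas = iter(deltas)
    countdown = first_index
    for sample in samples:
        while countdown == 0:
            selected.append(sample)
            next_delta = next(deltas, None)
            if next_delta is None:
                return selected
            countdown = next_delta
        countdown -= 1
    return selected


deltas = list(piecewise_deltas([(0, 0, 1, 2), (0, 1, 2, 2)]))
picked = select_stream(range(20), 2, deltas)
```

Because the samples are consumed strictly in order, the unit never needs random access to the channel data, which fits the streaming data flow the deck describes.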

  37. Outline • Introduction • Ultrasound background • Algorithm design • System design • Sonic Millip3De • Select Sub-Unit • Results and analysis • Conclusions

  38. System Parameters

  39. Image Quality Comparison • Simulations using Field II [Jensen ‘92, ‘95] • Our design has negligible difference from the ideal system [Figure panels: Ideal, Our Design (12 bit), 11 bit]

  40. Power Analysis and Scaling • Can meet 5W by 11nm node

  41. Conclusions • 3D die stacked Sonic Millip3De design is able to meet 5W power budget by 11nm • Algorithm/HW co-design enables order-of-magnitude gains • Power and output quality goals often in conflict • Need guidance from domain experts to balance • Architects have much to offer for application-specific system designs

  42. Questions? Special thanks to: Brian Fowlkes Oliver Kripfgans Ron Dreslinski