Sonic Millip3De : Massively Parallel 3D Stacked Accelerator for 3D Ultrasound. Richard Sampson * Ming Yang † Siyuan Wei † Chaitali Chakrabarti † Thomas F. Wenisch * * University of Michigan † Arizona State University. Portable Medical Imaging Devices.
Sonic Millip3De: Massively Parallel 3D Stacked Accelerator for 3D Ultrasound Richard Sampson* Ming Yang† Siyuan Wei† Chaitali Chakrabarti† Thomas F. Wenisch* *University of Michigan †Arizona State University
Portable Medical Imaging Devices • Medical imaging moving towards portability • MEDICS (X-Ray CT) [Dasika ‘10] • Handheld 2D Ultrasound [Fuller ‘09] • Not just a matter of convenience • Improved patient health [Gunnarsson ‘00, Weinreb ‘08] • Access in developing countries • Why ultrasound? • Low transmit power [Nelson ‘10] • No dangers or side effects
Handheld 3D Ultrasound • 3D has numerous benefits over 2D • Easier to interpret images • Greater volumetric accuracy • … as well as many challenges • 12k transducers, 10M image points • 10-20x beyond state of the art • High raw data bandwidth (6Tb/s) • Major bottleneck in state of the art • Tight handheld power budget (5W)
Why a Custom Accelerator? • Software algorithms load/store intensive • von Neumann designs inefficient • Large system would require over 700 DSPs • General purpose CPUs even less efficient
Contributions • Iterative delay calculation algorithm • Reduces storage by over 400x • Enables streaming data flow • Sonic Millip3De design • Leverages 3D die stacking technology • Transform-select-reduce accelerator framework • Power and image analysis of Sonic Millip3De • Negligible change in image quality • Able to meet 5W power budget by 11nm node
Outline • Introduction • Ultrasound background • Algorithm design • System design • Sonic Millip3De • Select Sub-Unit • Results and analysis • Conclusions
Ultrasound: Transmit and Receive [Figure labels: Image Space, Receive Raw Channel Data, Receive Transducer, Focal Points, Transmit Transducer]
Ultrasound: Transmit and Receive Each transducer stores array of raw receive data
Ultrasound: Image Reconstruction Image reconstructed from data based on round trip delay
Ultrasound: Image Reconstruction Images from each transducer combined to produce full frame
Delay Index Calculation • Iterate through all image points for each transducer and calculate delay index • Often done with lookup tables (LUTs) instead • 50 GB LUT required for target 3D system
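The direct calculation described on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: the speed of sound, the sampling rate, and the function names are assumptions chosen for the example.

```python
import math

SPEED_OF_SOUND = 1540.0  # m/s, a typical soft-tissue value (assumed)
SAMPLE_RATE = 40e6       # Hz, hypothetical ADC sampling rate (assumed)


def delay_index(tx_pos, rx_pos, focal_point):
    """Index of the receive sample holding the echo from focal_point.

    The round trip runs from the transmit position to the focal point
    and back to the receiving transducer.
    """
    round_trip = math.dist(tx_pos, focal_point) + math.dist(focal_point, rx_pos)
    return round(round_trip / SPEED_OF_SOUND * SAMPLE_RATE)


def scanline_delays(tx_pos, rx_pos, direction, spacing, count):
    """Delay indices for `count` evenly spaced focal points on one scanline."""
    points = [
        tuple(tx_pos[k] + direction[k] * spacing * (i + 1) for k in range(3))
        for i in range(count)
    ]
    return [delay_index(tx_pos, rx_pos, p) for p in points]
```

Repeating this for every transducer and every image point is the work that a lookup table would otherwise cache, which is where the storage cost quoted on the slide comes from.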
Challenges of Handheld 3D Ultrasound • Delay index LUT requires too much storage • New iterative algorithm reduces necessary constant storage by 400x • Peak raw data bandwidth (6Tb/s) infeasible • Sub-aperture multiplexing reduces peak data rate, but requires more transmits • Handheld power budget very tight (5W) • 3D stacked, highly parallel data streaming design reconstructs images efficiently
Iterative Delay Index Calculation • Deltas between adjacent focal points on a scanline form smooth curve • Fit piecewise quadratic approx. to delta function • Two sections sufficient for negligible error [Figure: delta curve divided into Section 1 and Section 2]
Sub-aperture Multiplexing • Peak raw data bandwidth (6Tb/s) infeasible • Solution: sub-aperture multiplexing • Transmit multiple times from same location • Receive with subset of transducers (sub-aperture) • Sum images together • Prior work: reduce data rate • Our design: also reduces HW and power requirements
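Summing sub-aperture images works because delay-and-sum reconstruction is a sum over transducers, so it can be split into partial sums. The sketch below assumes, purely for illustration, that every transmit produces identical echoes (a stationary scene). The function names and the toy integer data layout are hypothetical and are not taken from the Sonic Millip3De design.

```python
def beamform(channels, delays):
    """Delay-and-sum: for each focal point, add one selected sample per channel.

    channels[t] is the raw receive data of transducer t;
    delays[t][p] is the delay index of focal point p for transducer t.
    """
    num_points = len(delays[0])
    return [
        sum(channel[delay[p]] for channel, delay in zip(channels, delays))
        for p in range(num_points)
    ]


def beamform_multiplexed(channels, delays, sub_aperture_size):
    """Reconstruct one sub-aperture per transmit and accumulate the partial images."""
    image = [0] * len(delays[0])
    for start in range(0, len(channels), sub_aperture_size):
        stop = start + sub_aperture_size
        partial = beamform(channels[start:stop], delays[start:stop])
        image = [acc + value for acc, value in zip(image, partial)]
    return image
```

Only `sub_aperture_size` channels are live during any one transmit, which is how the peak data rate drops at the cost of more transmits.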
System Design Sonic Millip3De comprises 1,024 parallel pipelines
System Design: Transducers Interchangeable CMOS transducer layer; can use older process
System Design: ADC/Storage Separate storage layer to reduce wire lengths
System Design: Transform-Select-Reduce Accelerator units in fast, low power process
Select Sub-Unit Design Selects sample closest to each focal point using our algorithm
Select Sub-Unit Design All delays for a scanline estimated using 9 constants [Figure labels: Section 1, Section 2]
Select Sub-Unit Design Adders calculate next iteration of quadratic approximation: $A(n+1)^2 + B(n+1) + C = (An^2 + Bn + C) + 2An + (A + B)$ [Figure labels: Section 1, Section 2]
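The identity on this slide means each new value of the quadratic can be produced with additions alone. A minimal sketch follows; the function name and the two-register arrangement are illustrative assumptions, not a description of the actual hardware.

```python
def quadratic_stream(a, b, c, count):
    """Yield a*n*n + b*n + c for n = 0 .. count-1 using only additions."""
    value = c      # f(0)
    step = a + b   # f(1) - f(0)
    for _ in range(count):
        yield value
        value += step   # f(n+1) = f(n) + 2*a*n + (a + b)
        step += 2 * a   # the increment itself grows by the constant 2*a
```

For example, `list(quadratic_stream(2, -3, 5, 5))` walks through the same values as evaluating `2*n*n - 3*n + 5` directly for `n` from 0 to 4.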
Select Sub-Unit Design Decrementor selects sample for next image focal point [Figure labels: Section 1, Section 2]
Select Sub-Unit Design Section decrementor indicates when to change constants [Figure labels: Section 1, Section 2]
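Taken together, the select sub-unit slides describe a streaming loop: a countdown marks the sample for the next focal point, adders produce the next delta, and a section counter swaps in new constants. The software model below is a rough sketch under assumptions. The `(a, b, c, length)` tuple layout, the function names, and the rounding of each delta are hypothetical, and the deltas are assumed to be non-negative.

```python
def delta_stream(sections):
    """Yield approximate deltas between adjacent focal points.

    Each section is (a, b, c, length): quadratic constants plus the number
    of deltas the section covers (the role of the section decrementor).
    """
    for a, b, c, length in sections:
        value, step = c, a + b
        for _ in range(length):
            yield round(value)
            value += step   # additions only, as in the adder stage
            step += 2 * a


def select_samples(samples, first_index, sections):
    """Pick one sample per focal point while the channel data streams past."""
    deltas = delta_stream(sections)
    selected = []
    countdown = first_index  # samples to skip before the first focal point
    for sample in samples:
        while countdown == 0:
            selected.append(sample)
            countdown = next(deltas, None)
            if countdown is None:  # all focal points on the scanline are done
                return selected
        countdown -= 1
    return selected
```

Because every sample is visited once and nothing is looked up by address, the model reflects the streaming data flow that the iterative algorithm is meant to enable.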
Image Quality Comparison Simulations using Field II [Jensen ‘92, ‘95] [Figure panels: Ideal, Our Design (12 bit), 11 bit] Our design has negligible difference from ideal system
Power Analysis and Scaling Can meet 5W by 11nm node
Conclusions • 3D die stacked Sonic Millip3De design is able to meet 5W power budget by 11nm • Algorithm/HW co-design enables order-of-magnitude gains • Power and output quality goals often in conflict • Need guidance from domain experts to balance • Architects have much to offer for application-specific system designs
Questions? Special thanks to: Brian Fowlkes, Oliver Kripfgans, Ron Dreslinski