A systolic array for a 2D-FIR filter for image processing

Presentation Transcript


  1. A systolic array for a 2D-FIR filter for image processing Sebastian Siegel ECE 734 

  2. Outline • Why Systolic Arrays (SA)? • Design Issues • Approach • Solution • Result

  3. Why Systolic Arrays? (1) • 4-level nested do-loop:
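The loop nest shown on this slide did not survive in the transcript. As a rough illustration only, a 2D-FIR filter is commonly written as four nested loops like the sketch below. The function name, the index names, the correlation form (no kernel flip) and the "valid" border handling are assumptions, not taken from the presentation.

```python
def fir2d_nested(image, kernel):
    """Direct 2D FIR filter as a 4-level loop nest (illustrative sketch)."""
    rows, cols = len(image), len(image[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    # Assumption: only positions where the kernel fits entirely are computed.
    out_rows = rows - k_rows + 1
    out_cols = cols - k_cols + 1
    out = [[0] * out_cols for _ in range(out_rows)]
    for i in range(out_rows):              # output row
        for j in range(out_cols):          # output column
            for k in range(k_rows):        # kernel row
                for l in range(k_cols):    # kernel column
                    # One multiply-accumulate (MAC) per innermost iteration.
                    out[i][j] += kernel[k][l] * image[i + k][j + l]
    return out
```

Each innermost iteration is one MAC operation, which is why a single sequential MAC unit needs as many steps as there are (pixel, tap) pairs.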

  4. Why Systolic Arrays? (2) • Sequential execution on one MAC requires too much time. Example: image: 512x512, filter: 3x3, 2.3 million operations @ 10 MHz = 0.23 s • Algorithm in nested do-loop structure → Single Assignment Format possible → parallel execution possible • Systematic approach vs. “rocket science”
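The timing example on this slide can be checked with a short calculation. The sketch below assumes one MAC operation per clock cycle and one output value per input pixel; the function name is illustrative.

```python
def sequential_filter_time(img_rows, img_cols, k_rows, k_cols, clock_hz):
    """Return (MAC count, seconds) for a single MAC unit.

    Assumptions: one MAC per clock cycle, one output per input pixel.
    """
    macs = img_rows * img_cols * k_rows * k_cols
    return macs, macs / clock_hz


macs, seconds = sequential_filter_time(512, 512, 3, 3, 10e6)
print(macs, seconds)  # 2359296 MACs, roughly 0.236 s
```

This agrees with the slide's rounded-down figures of 2.3 million operations and 0.23 s.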

  5. Design Issues • Recall: • Avoid multiple access to the same data by pipelining it • Minimize execution time and registers • Maximize usage of Processing Elements (PEs)

  6. Approach (1) Steps: • Rewrite the algorithm in Single Assignment Format (SAF) • Draw and examine the dependence graph (DG) • Map the DG to an SA by generating suitable solutions and choosing an optimal one. Problem: SA too big → partitioning → data reaccessed or cache needed
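The first step above, rewriting in Single Assignment Format, can be sketched as follows. This is not the presentation's own formulation, which is not reproduced in the transcript. It only illustrates the general idea that every variable instance is written exactly once, here by giving the running sum an extra index `n`. All names are hypothetical, and the sketch does not show how the input and coefficient data would be pipelined between PEs.

```python
def fir2d_single_assignment(image, kernel):
    """2D FIR in a single-assignment style (illustrative sketch)."""
    rows, cols = len(image), len(image[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows, out_cols = rows - k_rows + 1, cols - k_cols + 1
    taps = k_rows * k_cols
    # acc[i][j][n] is the partial sum after n taps.
    # Every entry is assigned exactly once and never overwritten.
    acc = [[[None] * (taps + 1) for _ in range(out_cols)]
           for _ in range(out_rows)]
    for i in range(out_rows):
        for j in range(out_cols):
            acc[i][j][0] = 0
            for k in range(k_rows):
                for l in range(k_cols):
                    n = k * k_cols + l
                    # New value depends only on the previous partial sum,
                    # which makes the dependence explicit for a DG.
                    acc[i][j][n + 1] = (acc[i][j][n]
                                        + kernel[k][l] * image[i + k][j + l])
    return [[acc[i][j][taps] for j in range(out_cols)]
            for i in range(out_rows)]
```

Because each `acc[i][j][n + 1]` depends only on `acc[i][j][n]`, the dependences form a regular pattern that can be drawn as a graph and then mapped to processing elements.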

  7. Approach (2) Partitioning of the DG generates even more (and better) solutions:

  8. Solution

  9. Result • Fully pipelined SA • 100% PE utilization • SA can be partitioned with relatively small cache and 100% data reuse or without cache and high data reuse • PEs and their interconnections (# of registers per pipeline) independent of filter size • Low latency for the results • Constant I/O rate • Fast MATLAB® implementation