
Data Communication Estimation and Reduction for Reconfigurable Systems

Explore strategies to minimize data communication in hardware descriptions, focusing on the efficient mapping of high-level language applications to hardware. Learn about control nodes, SSA algorithms, and optimization techniques for reducing data movement.


Presentation Transcript


  1. Data Communication Estimation and Reduction for Reconfigurable Systems
     Adam Kaplan, Philip Brisk (Computer Science, University of California, Los Angeles)
     Ryan Kastner (Elec. and Computer Engineering, University of California, Santa Barbara)
     June 4, 2003

  2. From Algorithm to HDL
     • We focus our efforts on mapping an application written in a high-level language to a hardware description.
     • We desire this mapping to have optimal characteristics (area, latency, etc.)
     • In this talk, we focus on the problem of minimizing data communication in the final hardware.
     Figure (flow diagram) labels: Application specified in system-level language; Compiler; HDL (behavioral, structural); Synthesis and Physical Design

  3. Similar Compilation Projects
     Hardware compilers
     • Reconfigurable Architecture
        • PRISM project – synthesize subset of C to FPGA
        • Garp compiler (BRASS) – synthesize C to processor + FPGA platform
        • DEFACTO – synthesize SUIF to FPGA (Wildstar)
     • General Architecture
        • DeepC compiler – synthesize C to HDL
        • MATCH compiler – synthesize Matlab to HDL
        • PICO – synthesize nested loops into VLIW-like functional unit

  4. Our Framework
     • From the SUIF IR, we construct a CDFG representation.
     • Each basic block of the CDFG becomes a separate synthesizable module in the hardware description.
     Figure (flow diagram) labels: C Code; SUIF/MachSUIF Compiler; Control Data-Flow Graph (CDFG) with Control Nodes 1 to 4; Hardware Description

  5. Characterizing Data Communication
     • Two examples of data communication schemes
        • Distributed: data communication = wire (Control Nodes 1 to 4 connected directly)
        • Centralized: data communication = storage access (Control Nodes 1 to 4 connected over a Bus to Memory (Register Bank, RAM))

  6. Identifying Data Communication
     • Global Data Communication = 5 variables
     • Determine relationship between place(s) where data is defined and where data is used
     • Naïve method: all use-points of a variable depend on all definitions of that variable
     • Not all use points “use” a variable
     Need analysis to minimize the amount of data communication
     Figure (example code spread across control nodes): a ← …; b ← …; a ← …; b ← …; a ← …; c ← …; ← b; ← c; ← a
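The naïve method on slide 6 can be sketched as a simple count: every control node that defines a variable gets a link to every control node that uses it. This is only an illustration, not the authors' code; `naive_links` and its array arguments are hypothetical names, and skipping pairs inside the same node is an assumption.

```c
#include <stddef.h>

/* Count the links the naive scheme creates for ONE variable:
 * one link from every defining control node to every using control node.
 * ASSUMPTION: a definition and a use inside the same control node need
 * no inter-node transfer, so such pairs are skipped. */
size_t naive_links(const int *def_nodes, size_t n_defs,
                   const int *use_nodes, size_t n_uses)
{
    size_t links = 0;
    for (size_t i = 0; i < n_defs; i++) {
        for (size_t j = 0; j < n_uses; j++) {
            if (def_nodes[i] != use_nodes[j]) {
                links++;
            }
        }
    }
    return links;
}
```

For a variable defined in three control nodes and used in one other node, this count is 3, even if only one of those definitions can actually reach the use. That gap is what the analysis on the following slides tries to close.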

  7. Minimizing Data Communication
     • Must determine relationship between where data is generated and where data is used
     • Problem formulation: minimize the total number of bits communicated between all pairs of control nodes
     • SSA (Static Single Assignment)
        • Changes each variable to have a unique definition point
        • Must add φ-nodes to merge definitions
     Figure (original code): a ← …; b ← …; a ← …; b ← …; a ← …; c ← …; ← b; ← c; ← a
     Figure (same code in SSA form): a1 ← …; b1 ← …; a2 ← …; b2 ← …; a3 ← …; c1 ← …; ← b1; ← c1; a4 ← φ(a2,a3); ← a4

  8. Using SSA to Minimize Data Communication
     • SSA algorithms
        • Find location of φ-nodes
        • Rename variables
     • Three main SSA algorithms
        • Minimal, Pruned – Cytron et al.
        • Semi-pruned – Briggs et al.
     • Differ in number and location of φ-nodes
        • Minimal – insert φ-nodes at iterated dominance frontier (IDF)
        • Semi-pruned – insert φ-node at IDF if variable live outside some basic block
        • Pruned – insert φ-node at IDF if variable live at that time
     Figure (three columns: Minimal, Semi-Pruned, Pruned): each column shows a1 ← …; b1 ← …; a2 ← …; b2 ← …; a3 ← …; c1 ← …; ← b1; ← c1; ← a4. All three insert a4 ← φ(a2,a3); Minimal and Semi-Pruned also insert b3 ← φ(b1,b2); only Minimal inserts c2 ← φ(c1).
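The three insertion rules on slide 8 differ only in the extra condition checked at a block in the iterated dominance frontier. A minimal sketch of that decision follows. The type and field names (`SsaFlavor`, `PhiSite`, `needs_phi`) are hypothetical, the liveness facts are assumed to be computed elsewhere, and "live at that time" is read here as "live on entry to the block", which is an interpretation rather than something the slides state.

```c
#include <stdbool.h>

typedef enum { SSA_MINIMAL, SSA_SEMI_PRUNED, SSA_PRUNED } SsaFlavor;

/* Facts about one variable at one candidate block (hypothetical record;
 * a real compiler would derive these from dominance and liveness analysis). */
typedef struct {
    bool in_idf;              /* block is in the IDF of the variable's definitions */
    bool live_across_blocks;  /* variable is live outside some basic block */
    bool live_on_entry;       /* variable is live on entry to this block (assumed reading) */
} PhiSite;

/* Decide whether a phi-node for the variable is inserted at this block. */
bool needs_phi(SsaFlavor flavor, PhiSite site)
{
    if (!site.in_idf) {
        return false;         /* all three variants only consider IDF blocks */
    }
    switch (flavor) {
    case SSA_MINIMAL:
        return true;
    case SSA_SEMI_PRUNED:
        return site.live_across_blocks;
    case SSA_PRUNED:
        return site.live_on_entry;
    }
    return false;
}
```

Under these predicates a variable that is live across blocks but dead at the merge point gets a φ-node from the minimal and semi-pruned variants and none from the pruned one, which matches the b3 ← φ(b1,b2) pattern in the slide's example.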

  9. Experimental Setup
     Figure (flow diagram) labels: CDFG; SSA Conversion; CDFG in SSA form; HDL Generation; Synopsys Behavioral / Design Compiler

  10. MediaBench Benchmark Suite
     • A benchmark suite of DSP applications [Lee et al]
     • DSP Applications well suited to hardware implementation
     • Tend to:
        • be parallelizable
        • be computationally intensive
        • often have large basic blocks
     Sample code: internal filter of an image convolver

       for (y_pos=ygrid_start-y_fmid-1,res_pos=0; y_pos<0; y_pos+=ygrid_step) {
         for (x_pos=xgrid_start-x_fmid-1; x_pos<0; x_pos+=xgrid_step,res_pos++) {
           (*reflect)(filt,x_fdim,y_fdim,x_pos, y_pos,temp,FILTER);
           sum=0.0;
           for (y_filt_lin=x_fdim,x_filt=y_im_lin=0; y_filt_lin<=filt_size; y_im_lin+=x_dim,y_filt_lin+=x_fdim)
             for (im_pos=y_im_lin; x_filt<y_filt_lin; x_filt++,im_pos++)
               sum+=image[im_pos]*temp[x_filt];
           result[res_pos] = sum;
         }
         first_col = x_pos+1;
         (*reflect)(filt,x_fdim,y_fdim,0,y_pos,temp,FILTER);

  11. Results: SSA for Data Comm. Minimization
     • Edge Weight w(i,j) – number of bits communicated from node i to j
     • Total Edge Weight (TEW) – corresponds to amount of data communication
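The metric on slide 11 can be illustrated with a short sketch that sums the edge weights w(i,j) over all edges between control nodes. Taking TEW as the plain sum is inferred from its name and from the problem formulation on slide 7; the `Edge` record and `total_edge_weight` function are hypothetical names, not part of the authors' tool flow.

```c
#include <stddef.h>

/* Hypothetical edge record: `bits` is w(src, dst), the number of bits
 * communicated from control node `src` to control node `dst`. */
typedef struct {
    int src;
    int dst;
    unsigned bits;
} Edge;

/* Total edge weight (TEW): the sum of w(i, j) over every edge. */
unsigned long total_edge_weight(const Edge *edges, size_t n_edges)
{
    unsigned long tew = 0;
    for (size_t k = 0; k < n_edges; k++) {
        tew += edges[k].bits;
    }
    return tew;
}
```

For example, three edges carrying 32, 16, and 32 bits give a TEW of 80. Removing an unnecessary edge lowers TEW by that edge's bit width.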

  12. Results: SSA for Area Minimization

  13. Relationship Between φ-nodes and Data Communication

  14. Further Minimizing Data Communication
     • Current SSA algorithms place φ-nodes temporally
     • In software compilation, live ranges should be short.
     • Appropriate in hardware?
     Figure (same SSA example: a1 ← …; b1 ← …; a2 ← …; b2 ← …; a3 ← …; c1 ← …; ← b1; ← c1; a4 ← φ(a2,a3); ← a4): Temporal φ-node distribution, TEW = 4; Spatial φ-node distribution, TEW = 3

  15. Effect of -node Distribution Spatial -node placement Temporal -node placement

  16. Spatial -nodes Distribution Algorithm • d – number of uses of -node destination • s – number of -node source values • Number of temporal links • Number of spatial links s = 3 a3(a0,a1,a2)  a3  a3 d = 2

  17. Spatial SSA Results – Num. Spatial φ-nodes

  18. Spatial SSA Results – TEW after spatial SSA

  19.  area After Spatial SSA (from Synopsys)

  20. Conclusion
     • In this work, we demonstrate a mapping from compiler IR (CDFG) to hardware description.
     • SSA binds variables to values, which is useful in reducing data communication between control nodes.
     • Spatial distribution of phi nodes can reduce data communication, modeled as total edge weight (TEW), by as much as 20%.
     • However, circuit area sometimes increases…
     • Future research: refine the model using information from later stages of synthesis.
     • Compiler techniques applied to hardware design can greatly reduce data communication.
