Data Communication Estimation and Reduction for Reconfigurable Systems

Adam Kaplan, Philip Brisk, Ryan Kastner
Computer Science, University of California, Los Angeles
Elec. and Computer Engineering, University of California, Santa Barbara
June 4, 2003

From Algorithm to HDL
• We focus our efforts on mapping an application written in a high-level language to a hardware description.
• We want this mapping to have optimal characteristics (area, latency, etc.).
• In this talk, we focus on the problem of minimizing data communication in the final hardware.
[Figure: Application specified in system-level language → Compiler → HDL (behavioral, structural) → Synthesis and Physical Design]

Similar Compilation Projects
Hardware compilers
• Reconfigurable Architecture
  • PRISM project – synthesize subset of C to FPGA
  • Garp compiler (BRASS) – synthesize C to processor + FPGA platform
  • DEFACTO – synthesize SUIF to FPGA (Wildstar)
• General Architecture
  • DeepC compiler – synthesize C to HDL
  • MATCH compiler – synthesize Matlab to HDL
  • PICO – synthesize nested loops into VLIW-like functional unit

Our Framework
• From the SUIF IR, we construct a CDFG representation.
• Each basic block of the CDFG becomes a separate synthesizable module in the hardware description.
[Figure: C Code → SUIF/MachSUIF Compiler → Control Data-Flow Graph (CDFG) with Control Nodes 1–4 → Hardware Description]

Characterizing Data Communication
• Two examples of data communication schemes
  • Distributed: data communication = wire
  • Centralized: data communication = storage access
[Figure: two diagrams of Control Nodes 1–4, labeled Distributed and Centralized; elements shown include Memory (Register Bank, RAM) and Bus]

Identifying Data Communication
• Global Data Communication = 5 variables
• Determine relationship between place(s) where data is defined and where data is used
• Naïve method: all use-points of a variable depend on all definitions of that variable
• Not all use points “use” a variable
• Need analysis to minimize the amount of data communication
[Figure: CDFG with definitions a ← …, b ← …, a ← …, b ← …, a ← …, c ← … and uses … ← b, … ← c, … ← a]
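The naïve method above can be sketched in a few lines to show how quickly it over-counts. This is only an illustration: the function name `naive_comm_edges`, the variables `x` and `y`, and the node numbers are hypothetical and are not taken from the slide's figure.

```python
from itertools import product

def naive_comm_edges(defs, uses):
    """Return (def_node, use_node, var) triples under the naive assumption
    that every use of a variable depends on every definition of it."""
    edges = set()
    for var, def_nodes in defs.items():
        for d, u in product(def_nodes, uses.get(var, ())):
            if d != u:  # a node does not communicate with itself
                edges.add((d, u, var))
    return edges

# Hypothetical example: x is defined in three control nodes, y in two,
# and both are used only in control node 4.
defs = {"x": {1, 2, 3}, "y": {1, 2}}
uses = {"x": {4}, "y": {4}}
edges = naive_comm_edges(defs, uses)
print(len(edges))  # 5 edges, even if only some definitions can reach node 4
```

A more precise analysis would keep only the definitions that can actually reach each use, which is what the SSA-based approach on the following slides provides.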

Minimizing Data Communication
• Must determine relationship between where data is generated and where data is used
• Problem formulation: minimize the total number of bits communicated between all pairs of control nodes
• SSA (Static Single Assignment)
  • Changes each variable to have a unique definition point
  • Must add φ-nodes to merge definitions
[Figure: the example before SSA (a ← …, b ← …, a ← …, b ← …, a ← …, c ← …, … ← b, … ← c, … ← a) and after SSA (a1 ← …, b1 ← …, a2 ← …, b2 ← …, a3 ← …, c1 ← …, … ← b1, … ← c1, a4 ← φ(a2,a3), … ← a4)]

Using SSA to Minimize Data Communication
• SSA algorithms
  • Find location of φ-nodes
  • Rename variables
• Three main SSA algorithms
  • Minimal, Pruned – Cytron et al.
  • Semi-pruned – Briggs et al.
• Differ in number and location of φ-nodes
  • Minimal – insert φ-nodes at iterated dominance frontier (IDF)
  • Semi-pruned – insert φ-node at IDF if variable live outside some basic block
  • Pruned – insert φ-node at IDF if variable live at that time
[Figure: the same example in three SSA forms. Minimal: a4 ← φ(a2,a3), b3 ← φ(b1,b2), c2 ← φ(c1). Semi-pruned: a4 ← φ(a2,a3), b3 ← φ(b1,b2). Pruned: a4 ← φ(a2,a3).]
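To make the difference between the three forms concrete, here is a minimal sketch of φ-node site selection that follows the one-line descriptions above. It is not the authors' implementation: the function names, the four-block diamond CFG, and the variables `a`, `b`, `t` are hypothetical, and every block is assumed reachable from the entry. "Live outside some basic block" is approximated, in the spirit of Briggs et al., as "has an upward-exposed use in some block".

```python
def dominators(succ, entry):
    """Iterative dominator sets: dom[n] = {n} | intersection of dom[p] over preds."""
    nodes = list(succ)
    preds = {n: [p for p in nodes if n in succ[p]] for n in nodes}
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == entry:
                continue
            ps = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*ps) if ps else set())
            if new != dom[n]:
                dom[n], changed = new, True
    return dom, preds

def dominance_frontiers(succ, entry):
    """DF(x) = blocks y where x dominates a predecessor of y but not strictly y."""
    dom, preds = dominators(succ, entry)
    df = {n: set() for n in succ}
    for y in succ:
        for p in preds[y]:
            for x in dom[p]:
                if x == y or x not in dom[y]:
                    df[x].add(y)
    return df

def iterated_df(df, def_blocks):
    """Worklist closure of the dominance frontier over the definition blocks."""
    result, work = set(), list(def_blocks)
    while work:
        for y in df[work.pop()]:
            if y not in result:
                result.add(y)
                work.append(y)
    return result

def live_in_sets(succ, defs, ue_uses):
    """Backward liveness: live_in = upward-exposed uses | (live_out - defs)."""
    live_in = {n: set(ue_uses[n]) for n in succ}
    changed = True
    while changed:
        changed = False
        for n in succ:
            out = set().union(*(live_in[s] for s in succ[n]))
            new = ue_uses[n] | (out - defs[n])
            if new != live_in[n]:
                live_in[n], changed = new, True
    return live_in

def phi_sites(succ, entry, defs, ue_uses, flavor):
    """Blocks that receive a phi-node per variable; flavor is
    'minimal', 'semi-pruned', or 'pruned'."""
    df = dominance_frontiers(succ, entry)
    live_in = live_in_sets(succ, defs, ue_uses)
    non_local = set().union(*ue_uses.values())
    sites = {}
    for v in sorted(set().union(*defs.values())):
        if flavor == "semi-pruned" and v not in non_local:
            continue  # names never live across a block boundary get no phi
        blocks = iterated_df(df, [b for b in succ if v in defs[b]])
        if flavor == "pruned":
            blocks = {b for b in blocks if v in live_in[b]}
        if blocks:
            sites[v] = blocks
    return sites

# Hypothetical diamond CFG: 0 -> {1, 2} -> 3. Block 2 reads b before redefining it.
succ = {0: [1, 2], 1: [3], 2: [3], 3: []}
defs = {0: {"a", "b"}, 1: {"a", "t"}, 2: {"a", "b"}, 3: set()}
ue_uses = {0: set(), 1: set(), 2: {"b"}, 3: {"a"}}
for flavor in ("minimal", "semi-pruned", "pruned"):
    print(flavor, phi_sites(succ, 0, defs, ue_uses, flavor))
```

In this example the minimal form places φ-nodes for `a`, `b`, and `t` at block 3, the semi-pruned form drops the block-local `t`, and the pruned form keeps only `a`, the one variable that is live at the join.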

Experimental Setup
[Figure: CDFG → SSA Conversion → CDFG in SSA form → HDL Generation → Synopsys Behavioral / Design Compiler]

MediaBench Benchmark Suite
• A benchmark suite of DSP applications [Lee et al]
• DSP applications are well suited to hardware implementation
• Tend to:
  • be parallelizable
  • be computationally intensive
  • often have large basic blocks
Sample code: internal filter of an image convolver

    for (y_pos=ygrid_start-y_fmid-1,res_pos=0;
         y_pos<0;
         y_pos+=ygrid_step)
      {
      for (x_pos=xgrid_start-x_fmid-1;
           x_pos<0;
           x_pos+=xgrid_step,res_pos++)
        {
        (*reflect)(filt,x_fdim,y_fdim,x_pos,y_pos,temp,FILTER);
        sum=0.0;
        for (y_filt_lin=x_fdim,x_filt=y_im_lin=0;
             y_filt_lin<=filt_size;
             y_im_lin+=x_dim,y_filt_lin+=x_fdim)
          for (im_pos=y_im_lin;
               x_filt<y_filt_lin;
               x_filt++,im_pos++)
            sum+=image[im_pos]*temp[x_filt];
        result[res_pos] = sum;
        }
      first_col = x_pos+1;
      (*reflect)(filt,x_fdim,y_fdim,0,y_pos,temp,FILTER);

Results: SSA for Data Comm. Minimization
• Edge Weight w(i,j) – number of bits communicated from node i to node j
• Total Edge Weight (TEW) – corresponds to amount of data communication
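As a small illustration of the two metrics, the sketch below accumulates w(i,j) from a list of value transfers and sums the result into TEW. The function names, the tuple layout, and the bit widths are assumptions made for this example; the slides do not specify how the measurements were implemented.

```python
from collections import defaultdict

def edge_weights(transfers):
    """w(i, j): total bits communicated from control node i to control node j.
    `transfers` is an iterable of (src_node, dst_node, bits) tuples."""
    w = defaultdict(int)
    for src, dst, bits in transfers:
        w[(src, dst)] += bits
    return dict(w)

def total_edge_weight(transfers):
    """TEW: sum of w(i, j) over all pairs of control nodes."""
    return sum(edge_weights(transfers).values())

# Hypothetical transfers: three 32-bit values and one 16-bit value sent to node 4.
transfers = [(1, 4, 32), (2, 4, 32), (3, 4, 32), (3, 4, 16)]
print(edge_weights(transfers))       # {(1, 4): 32, (2, 4): 32, (3, 4): 48}
print(total_edge_weight(transfers))  # 112
```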

Results: SSA for Area Minimization

Relationship Between φ-nodes and Data Communication

Further Minimizing Data Communication
• Current SSA algorithms place φ-nodes temporally
  • In software compilation, live ranges should be short.
  • Appropriate in hardware?
[Figure: two versions of the SSA example (a1 ← …, b1 ← …, a2 ← …, b2 ← …, a3 ← …, c1 ← …, … ← b1, … ← c1, a4 ← φ(a2,a3), … ← a4), one labeled Temporal φ-node distribution and one labeled Spatial φ-node distribution, annotated TEW = 4 and TEW = 3]

Effect of φ-node Distribution
[Figure: two panels, Spatial φ-node placement and Temporal φ-node placement]

Spatial φ-node Distribution Algorithm
• d – number of uses of φ-node destination
• s – number of φ-node source values
• Number of temporal links
• Number of spatial links
[Figure: a3 ← φ(a0,a1,a2) with s = 3, and two uses … ← a3, … ← a3 with d = 2]
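The expressions for the two link counts did not survive in this transcript. The sketch below assumes a natural reading: a temporally placed φ-node receives each of its s sources once and forwards its result to each of d uses (s + d links), while a spatially distributed φ-node is replicated at every use, so each use receives every source (s × d links). These formulas and the function names are assumptions for illustration and should be checked against the original slides.

```python
def temporal_links(s, d):
    """Assumed cost of one shared phi-node: s incoming links plus d outgoing."""
    return s + d

def spatial_links(s, d):
    """Assumed cost of replicating the phi-node at each use: s sources per use."""
    return s * d

def choose_placement(s, d):
    """Pick the placement with fewer links under the assumed cost model."""
    return "spatial" if spatial_links(s, d) < temporal_links(s, d) else "temporal"

# The slide's example a3 <- phi(a0, a1, a2) has s = 3 sources and d = 2 uses.
print(temporal_links(3, 2), spatial_links(3, 2), choose_placement(3, 2))
# A phi-node with two sources and a single use favors spatial placement.
print(temporal_links(2, 1), spatial_links(2, 1), choose_placement(2, 1))
```

Under this model, spatial placement only wins when a φ-node has a single use (or a single source), which would explain why only some φ-nodes are distributed spatially.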

Spatial SSA Results – Num. Spatial φ-nodes

Spatial SSA Results – TEW after spatial SSA

Area After Spatial SSA (from Synopsys)

Conclusion
• In this work, we demonstrate a mapping from compiler IR (CDFG) to hardware description.
• SSA binds variables to values, which is useful in reducing data communication between control nodes.
• Spatial distribution of φ-nodes can reduce data communication, modeled as total edge weight (TEW), by as much as 20%.
• However, circuit area sometimes increases…
• Future research: refine the model using information from later stages of synthesis.
• Compiler techniques applied to hardware design can greatly reduce data communication.
