Communication Architecture Synthesis

Communication Architecture Synthesis Alessandro Pinto, University of California at Berkeley apinto@eecs.berkeley.edu

Introduction: what • Communication synthesis • What is the “best way” of transferring information between entities • Problem Formulation • Who wants to communicate with whom • How he aspects the communication mechanism to behave • What is possible to built today • How to build a communication architecture that is cheap and satisfies the entities constraints while being feasible

Introduction: why • On-Chip application • Lots of components connected together • Distance is becoming important • There isn’t a formal approach to the problem • We build tools which are the glue in the platform-based design approach • More or less everything has been done in the design of “functions”

Outline • Overview • Internet vs. On-chipnet • Standard communication structures • VSIA + Standard comm. Platform • Constraint-Driven Communication Synthesis • A theoretic approach (A.Pinto,L.P.Carloni.ASV) • Current research • Conclusion

Scale Wire Scaling “As designs scale to newer technologies, they get smaller and their wires get shorter, and the relative change in the speed of wires to the speed of gates is modest.” M.A.Horowitz et.al “The future of wires” “The real wire problem arises with increasing chip complexity and global communication costs.” M.A.Horowitz et.al “The future of wires” Local Wire Global Wire

What is going to happen • From few closed connected components to a lot of distributed IP’s • A global net is most likely to be traveled in more that one clock cycle • High bandwidth and low power requirements • Fault tolerance requirement • There is need for a correct by construction methodology

Internet Network • Highly distributed • Fault Tolerant • Every node must reachable

Internet major concern was to guarantee connectivity in case of attack Cost is mostly static Buffers memory is not a problem (512MB is still cheap) Number and complexity of protocol layer is important only for performance issues Full connectivity is not the major concern Now cost is mostly dynamic (power) Registers are expensive The protocol must be very light In the future error correction might become a reality Networks vs. On-Chip Networks

The beginning of internet • Back in 1969: the ARPAnet SRI UTAH UCSB UCLA

The beginning of On-Chipnet • A lot of effort on on-chip communication • One of GSRC main theme: Communication-based design • Special section in conferences • Different approaches • Not always really networks • Buses are still widely used

Effect of standardization 1983: TCP/IP was introduced 1987: Commercialization of internet TCP/IP technology is transferred form Inventors to vendors

Standardization • VSIA: Virtual Socket Interface Alliance • Specifies Virtual Component Interface • Peripheral VCI • Basic VCI • Advanced VCI http://www.vsi.org A B VCI Initiator VCI Target VCI Target VCI Initiator Bus Master Bus Slave Any Bus

Standard “Bus” Architecture • ARM (ARM Ltd.)* • CoreFrame (Palmchip) • CoreConnect (IBM) • WichBone (Silicore) • SiliconBackPlane (Sonics)*

AMBA Bus Hierarchy UART Off-Chip RAM HP ARM Core On Chip RAM Bridge APB AHB DMA Keypad AHB: Advanced High-Performance Bus AB: Advanced Peripheral Bus

SiliconBackplane • It’s possible to specify constraints for each point to point communication • It’s possible to specify different OPC • Synthesis of the network that guarantees performances

CPU SDRAM MMC PCMCIA IrDA Bridge Bluetooth USB Based on Intel XScale A Today’s Platform Example Processor USB, MMC, IrDA, Bluetooth PCMCIA, SDRAM

Where are we headed? • Communication is specified in terms of point to point Constraints • A target implementation technology is abstracted into a Library • A synthesis procedure generates a communication architecture • That satisfies the constraints • That uses only the library elements

To Silicon A Platform Integrator Environment ArchPltBuilder RTOS RTOS μC1 μC1 μC2 μC2 DSP DSP HW HW HW MEM MEM MEM

Communication Topology Design Platform-based-design in communication

The Problem M 2 M 4 M 1 M 5 M 3

S.Dey et al., “System-Level Performance Analysis for Designing On-Chip Communication Architectures”, TCAD, Vol. 20, NO. 6, June 2001 S.Dey et al., “Efficient Exploration of the SOC Communication Architecture Design Space”,Int. Conf. On CAD, August 1999 J.M. Daveau et al., “Protocol Selection and Interface Generation for HW-SW Codesign”, TVLSI, Vol. 5, NO.1, March 1997 J.M. Daveau et al., “Synthesis of system level communication by an allocation based approach”, Int. Symposium on System Synthesis, 1995 Previous Work

Abstract Model • System modules communicate by means of point-to-point unidirectional channels • Each channel is characterized by a set of communication constraints(distance, minimum bandwidth) • The specific functionality of each module is “abstracted away” Our Approach

The Goal – Overview of the Method Constraint Propagation Performance Abstraction

b1 Cs(b) cost b2 Cs(b) Length Cr(b) Relay Station Switches clk Abstraction (Intuition) This is the basic library of Communication Components

M 1 M 5 Communication-Constraint Graph • G=(V,A) •  v  V, p(v) is the vertex position in our coordinate system •  a  A: • The arc distance d(a) • The arc bandwidth b(a) v1 v2 a

Library •  = L  N, communication links and communication nodes •  l  L: • d(l) is the link length • b(l) is the link bandwidth • c(l) link cost •  n  N • c(n) is the cost of the communication node  l1 l2 l3 n r

Communication Implementation G=(V,A)  (1,2) (1,2) (2,2) (1,3) (b(a),d(a))=(4,5) n r

Communication Implementation Arc Matching  (1,2) (2,2) (1,3) (4,5) n r

Communication Implementation Arc Segmentation  (1,2) (2,2) (4,5) n r

Communication Implementation Arc Duplication + Arc Segmentation  (1,2) (2,2) n r

Communication Implementation Arc Merging  (1,2) (2,2) (1,3) n r

Implementation Graph • G’(G, ) = (V’  N’, A’) •  v’  V’, v’  V • n’N’, n’ is an instance of an element of N • a’A’, a’ is an instance of an element of L •  a(u,v)  A,  P(a) set of paths s.t. • P(a) connects u’ with v’ without passing through any other computational vertex • P(a) satisfies the bandwidth constraint of a • The cost of and implementation graph is

TheOptimization Problem

Assumptions • Assumption on the Library • d(a1) > d(a2)  b(a1) > b(a2) c(a1) > c(a2) • K-Way Merging structure q*

Candidate Implementations Generate only k-way mergings that could be part of the final implementation Derive mergeability conditionsbased only on positions and bandwidth of the constraint graph and the library

2-Way Mergeablility u1 v1 v2 u2 u1 v1 u2 v2

d(a1) + d(a2) || p(u1) – p(u2)|| + || p(v1) – p(v2)|| a1 || p(u1) – p(u2)|| || p(v1) – p(v2)|| a2 2-Way Mergeablility  {a1, a2} not mergeable u1 v1 v2 u2

(k-1)d(ak) + d(ai) a1 || p(ui) – p(uk)|| + || p(vi) – p(vk)|| || p(u1) – p(uk)|| || p(v1) – p(vk)|| ak 2-Way Mergeablility  {a1… ak} not mergeable u1 v1 vk uk

n-Way Mergeablility b(ai)b(al) + b(aj) {a1… ak} not mergeable u1 v1 vk uk

Expansion Rule If a  A is not mergeable with anyset of k-1 arcs in A, then a is not mergeable with any set of k arcs in A Akmax A a

1 2 3 Summary of Pruning Conditions • K-way mergeablility condition over distance • K-way mergeablility condition over bandwidth • Kk+1 mergeability condition All thepruning conditions are based on Positions of ports and Properties of arcs in theConstraint Graph and maximum bandwidth in theLibrary

Solving the Optimization Problem

= (k-1)d(aj) + d(ai) Constrained Distance Sum Matrix Dij= d(ai) + d(aj) aj ak am Dkj + Dmj = d(ak) + d(aj)+ d(am) + d(aj) = 2d(aj) + d(ak) + d(am)

= || p(ui) – p(uk)|| + || p(vi) – p(vk)|| Merging Distance Sum Matrix ij= ||p(ui) – p(uj)|| + ||p(vi) - p(vj)|| aj ak am kj + mj = ||p(uk) – p(uj)|| + ||p(vk) - p(vj)|| + ||p(um) – p(uj)|| + ||p(vm) - p(vj)||=

1 2 3 Algorithm K  1; foundcanditatemapping  TRUE; while (foundcanditatemapping) { for allcolumn jcol() kWayMergingNotFound TRUE; for allsubset of row R={i1…ik}  row() if sum(D,R) < sum(,R) if  l  L s.t. b(l) + bmin(R,j) < sum(B,R) + B[j] { S  S  kmergingimplementation(R,j); kWayMergingNotFound FALSE } if (kWayMergingNotFound) col()  col() \ j; if col() =  foundcanditatemapping FALSE; else k  k+1;}

1 1 0.8 2 1 2.8 2 2 2 A Simple Example D (0,1) (0.4,1) (1.4,1) a1 a2 a3 a1 a1 a2 a3 a2 a3 (0,0) (0.4,0) (1.4,0) B  a1 a2 a3 a1 a2 a3 a1 a2 b(l)=2 d(l)=1 c(l)=1 a3 L

1 1 0.8 2 1 2.8 2 2 2  A Simple Example D (0,1) (0.4,1) (1.4,1) a1 a2 a3 a1 a1 a2 a3 a2 a3 (0,0) (0.4,0) (1.4,0) B  a1 a2 a3 a1 a2 a3 a1 a2 b(l)=2 d(l)=1 c(l)=1 a3 L

Communication Architecture Synthesis