Three-Dimensional Layout of On-Chip Tree-Based Networks

• Hiroki Matsutani (Keio Univ, Japan)
• Michihiro Koibuchi (NII, Japan)
• D. Frank Hsu (Fordham Univ, USA)
• Hideharu Amano (Keio Univ, Japan)

Outline
• Introduction
  • Network-on-Chip (NoC)
  • 2-D vs. 3-D
• Fat Tree
  • 2-D layout
  • 3-D layout
• Fat H-Tree [Matsutani, IPDPS’07]
  • 2-D layout
  • 3-D layout
• Evaluations
  • Area, Wire length, Energy

Network-on-Chip (NoC)
• Tile architectures
  • MIT RAW [Taylor, Micro’02]
  • Texas U. TRIPS [Burger, Computer’04]
  • Intel 80-tile NoC [Vangal, ISSCC’07]
• Each tile holds a core & a router; tiles are connected by a packet-switched network on a chip
• Various topologies
  • Mesh, Torus
  • Fat Trees
  • Fat H-Tree (FHT)
• We proposed FHT as an alternative to Fat Trees [Matsutani, IPDPS’07]
• Figure: 16-core tile architecture

2D Topologies: Mesh & Torus
• 2-D Mesh
  • RAW [Taylor, IEEE Micro’02]
• 2-D Torus
  • 2x bandwidth of mesh
• Figure legend: Router, Core

2D Topologies: Fat Tree
• Fat Tree (p, q, c)
  • p: # of upward links
  • q: # of downward links
  • c: # of core ports
• Figures: Fat Tree (2,4,1) and Fat Tree (2,4,2), with Rank-1 and Rank-2 routers
• Figure legend: Router, Core
• In this talk, we focus on the 3-D layout scheme of tree-based topologies
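The (p, q, c) parameters are enough to estimate how many routers each rank needs. The following is a minimal sketch derived only from the definition above, not taken from the slides. It assumes that every router is fully populated with q downward links and that the upward links of the top rank are left unused. The function name and the `num_ranks` argument are hypothetical.

```python
def routers_per_rank(p, q, c, num_cores, num_ranks):
    """Estimate the number of routers at each rank of a Fat Tree (p, q, c).

    Assumption (not from the slides): every router uses all q downward
    links, and the upward links of the top rank are simply left unused.
    """
    counts = []
    links = c * num_cores  # links arriving from below: one per core port
    for _ in range(num_ranks):
        if links % q != 0:
            raise ValueError("links do not divide evenly among routers")
        routers = links // q   # each router terminates q downward links
        counts.append(routers)
        links = routers * p    # each router sends p links up to the next rank
    return counts


if __name__ == "__main__":
    print(routers_per_rank(2, 4, 1, 16, 2))  # Fat Tree (2,4,1), 16 cores
    print(routers_per_rank(2, 4, 2, 16, 2))  # Fat Tree (2,4,2), 16 cores
```

Under these assumptions a 64-core Fat Tree (2,4,1) with three ranks needs 16 + 8 + 4 = 28 routers, fewer than the 64 routers of a 64-core mesh or torus.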

2D NoC vs. 3D NoC
• 2D NoCs
  • Long wires (esp. trees)
  • Wire delay
  • Packets consume power at links according to their wire length
• 3D NoCs
  • Several small wafers or dies are stacked
  • Vertical links: micro bump [Ezaki, ISSCC’04], through-wafer via [Burns, ISSCC’01]
  • Very short (10-50 µm)
• Long horizontal wires in 2D NoCs can be replaced by very short vertical links in 3D NoCs
• The next slides show the 3D layout schemes of Fat Tree and FHT

Fat Tree: 2-D layout
• Fat Tree (p, q, c)
  • p: # of upward links
  • q: # of downward links
  • c: # of core ports
• Figures: Fat Tree (2,4,1) and Fat Tree (2,4,2)
• Figure legend: Router, Core
• We first show the 3D layout scheme of Fat Trees

Fat Tree: 3-D layout (4-split)
• Transformation from 2-D coordinates (original 2-D layout) to 3-D coordinates (4-stacked 3-D layout, Layer-0 to Layer-3)
• The original 2-D layout is divided into 4 layers
• Top-rank routers are distributed to each layer
• Top-rank links are replaced with vertical interconnects (10-50 µm)
• This 3-D layout is evaluated in terms of area, wire length, & energy
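A 4-split coordinate transformation of this kind can be sketched as follows. This is only an illustration under assumptions: the slides do not give the exact mapping, so the quadrant-to-layer numbering and the function name `split_into_layers` are hypothetical. The idea is that each quadrant of the 2-D layout (one subtree below the top rank) becomes one layer, so only the top-rank links have to cross layers.

```python
def split_into_layers(x, y, width, height):
    """Map a 2-D core coordinate to (x, y, layer) in a 4-layer stack.

    Assumption (not from the slides): the chip is cut into four quadrants
    and quadrant k becomes Layer-k, numbered left to right, top to bottom.
    """
    half_w, half_h = width // 2, height // 2
    layer = (1 if x >= half_w else 0) + (2 if y >= half_h else 0)
    # Coordinates inside a layer are the offsets within the quadrant.
    return x % half_w, y % half_h, layer


if __name__ == "__main__":
    # 64-core 2-D layout (8 x 8) becomes 16-core x 4-layer.
    for core in [(0, 0), (5, 2), (2, 6), (7, 7)]:
        print(core, "->", split_into_layers(*core, 8, 8))
```

Each layer then has half the side length of the original chip, which matches the 16-core x 4-layer configuration used in the evaluation.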

Outline
• Introduction
  • Network-on-Chip (NoC)
  • 2-D vs. 3-D
• Fat Tree
  • 2-D layout
  • 3-D layout
• Fat H-Tree [Matsutani, IPDPS’07]
  • 2-D layout
  • 3-D layout
• Evaluations
  • Area, Wire length, Energy

Fat H-Tree: Structure [Matsutani, IPDPS’07]
• Fat H-Tree is formed by combining two H-Trees: a red tree & a black tree
• The black tree is shifted toward the lower right of the red tree, so the connection pattern of the trees differs from that of the original Fat Trees
• Each core is connected to both the red & black trees
• A ring is formed with cores & rank-1 routers
• Torus-level performance by combining only two H-Trees
• Rank-2 or upper routers are omitted in this figure
• Figure legend: Router, Core

Fat H-Tree: 2-D layout on VLSI [Matsutani, IPDPS’07]
• Fat H-Tree has a torus structure (long feedback links across the chip)
• → It is folded in the same way as the folded layout of a 2-D torus
• Fat H-Tree’s folded 2-D layout is topologically equivalent to the original
• Figure legend: Router, Core
• The next slides propose the 3D layout scheme of Fat H-Tree

Fat H-Tree: 3-D layout (overview)
• (Problem) Fat H-Tree, consisting of red & black trees, has a torus structure
• Fold it so as to keep the torus structure
  • (step 1) fold it horizontally
  • (step 2) fold it vertically
  • Repeat until the # of folded pieces equals the # of layers the 3-D IC has
  • E.g., four layers → fold twice
• Here we show the 3D layouts of red & black trees separately
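The two folding steps can be sketched as a coordinate mapping. This is a sketch under assumptions, not the authors' exact transformation: it models the folds like folding a sheet of paper, takes "fold horizontally" to mean mirroring the right half onto the left half, and numbers layers in the resulting stacking order. The function name `fold_twice` is hypothetical.

```python
def fold_twice(x, y, width, height):
    """Fold a 2-D layout twice and return (x, y, layer) in a 4-layer stack.

    Assumptions (not from the slides): paper-style folding, right half
    folded onto the left half first, then bottom half onto the top half.
    """
    half_w, half_h = width // 2, height // 2

    # Step 1: fold horizontally. The right half is mirrored onto the left
    # half and ends up one layer above it.
    if x >= half_w:
        fx, layer = width - 1 - x, 1
    else:
        fx, layer = x, 0

    # Step 2: fold vertically. The bottom half is mirrored onto the top
    # half; flipping the two-layer stack reverses its layer order.
    if y >= half_h:
        fy, layer = height - 1 - y, 3 - layer
    else:
        fy = y

    return fx, fy, layer


if __name__ == "__main__":
    # Neighbors across the first fold line land on the same (x, y).
    print(fold_twice(3, 0, 8, 8), fold_twice(4, 0, 8, 8))
    # So do the two ends of a torus wraparound link.
    print(fold_twice(0, 0, 8, 8), fold_twice(7, 0, 8, 8))
```

In this model, nodes that were joined across a fold line or by a wraparound link sit at the same planar position in different layers, so the link between them needs only a vertical interconnect.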

Fat H-Tree: 3-D (Red tree; 4-split)
• Transformation from 2-D coordinates (original 2-D layout) to 3-D coordinates (4-stacked 3-D layout, Layer-0 to Layer-3)
• Top-rank links are replaced with vertical interconnects (10-50 µm)

Fat H-Tree: 3-D (Black tree; 4-split)
• Transformation from 2-D coordinates (original 2-D layout) to 3-D coordinates (4-stacked 3-D layout, Layer-0 to Layer-3)
• They can be connected via only a vertical link
• The periphery cores are connected to different layers
• Top-rank links are replaced with vertical interconnects (10-50 µm)

Fat H-Tree: 3-D layout (4-split)
• Figures (Layer-0 shown): Red tree (3-D), Black tree (3-D), Fat H-Tree (3-D)
• The 3-D layout of Fat H-Tree can be formed by superimposing the 3-D layouts of the red & black trees

Evaluations: 2-D vs. 3-D
• 2-D layout: 64-core, L mm per side
• 3-D layout: 16-core x 4-layer, L/2 mm per side, with vertical interconnects

Network logic area: # of routers
• Chart legend: FT1 = Fat Tree (2,4,1), FT2 = Fat Tree (2,4,2), FHT = Fat H-Tree
• 3-D mesh/torus: node degree 7
• Fat H-Tree: node degree 5
• Fat Tree (2,4,2): node degree 6
• The # of routers & their ports in trees is smaller than in mesh/torus

Network logic area: 2-D vs. 3-D
• Network logic area
  • Routers, NIs
  • Inter-wafer vias
• Wormhole router [Matsutani, ASPDAC’08]
  • 1-flit = 64-bit
  • 3-stage pipeline
• Network interface
  • FIFO buffer
  • Packet forwarding (Fat H-Tree only)
• Inter-wafer via [Davis, DToC’05]
  • 1-10 µm square
  • 100 µm per layer per 1-bit signal
• Figure: typical wormhole router (arbiter, FIFOs, 5x5 XBAR)
• Inter-wafer via area is calculated according to the # of vertical links
• Synthesized with a 90nm CMOS

Network logic area: Overhead of 3D
• Synthesis result of 64-core (16-core x 4)
• Chart: 2D torus vs. 3D torus, FT1, FT2, FHT, with inter-wafer via area (+7.8%)
• Chart legend: FT1 = Fat Tree (2,4,1), FT2 = Fat Tree (2,4,2), FHT = Fat H-Tree
• 3D layout of trees → area overhead is modest (at most 7.8%)

Total wire length of all links
• Total unit-length of links: how many unit-links are required?
• 1-unit = distance between neighboring cores
• Figure: a core-router link and a router-router link, each a 1-unit link
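Counting wire length in unit-links can be illustrated on the simplest topology, a 2-D mesh. This is a hedged sketch: the slides do not list the link coordinates of the trees, so only a mesh is generated here, and wires are assumed to follow Manhattan routes between integer grid positions. Both function names are hypothetical.

```python
def mesh_links(k):
    """Return the router-to-router links of a k x k 2-D mesh as coordinate pairs."""
    links = []
    for y in range(k):
        for x in range(k):
            if x + 1 < k:
                links.append(((x, y), (x + 1, y)))  # link to the right neighbor
            if y + 1 < k:
                links.append(((x, y), (x, y + 1)))  # link to the lower neighbor
    return links


def total_wire_length(links):
    """Sum the Manhattan length of all links, in unit-links.

    Assumption: 1 unit = distance between neighboring grid positions,
    and every wire follows a Manhattan route.
    """
    return sum(abs(ax - bx) + abs(ay - by) for (ax, ay), (bx, by) in links)


if __name__ == "__main__":
    print(total_wire_length(mesh_links(4)))  # 16-core mesh
    print(total_wire_length(mesh_links(8)))  # 64-core mesh
```

The same `total_wire_length` function applies to any topology once its links are given as coordinate pairs, which is where the long top-rank links of a 2-D tree layout would dominate the sum.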

Total wire length of all links
• Charts: 2-D layout vs. 4-stacked 3-D layout, in unit-links
• Chart legend: FT1 = Fat Tree (2,4,1), FT2 = Fat Tree (2,4,2), FHT = Fat H-Tree
• Wire length of trees is reduced by 25%-50% (close to torus)

Energy: NoC’s energy model
• Ave. flit energy: how much energy [J] is needed to send 1 flit to its destination?
• Parameters
  • 8mm square chip
  • 64-core (16-core x 4)
  • 90nm CMOS
• Switching energy (1-bit switching @ router, gate-level sim): 0.183 [pJ / hop]
• Link energy (1-bit transfer @ link): 0.150 [pJ / mm]
• Via energy: 4.34 [fF / via] [Davis, DToC’05]
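The parameters above can be combined into a simple per-flit estimate. This is a minimal sketch under stated assumptions, not the authors' exact model: it assumes all 64 bits of a flit toggle at every router and link, converts the via capacitance to energy as C x Vdd^2, and uses a supply voltage of 1.0 V, which is not given in the slides. The function and argument names are hypothetical.

```python
FLIT_BITS = 64             # 1-flit = 64-bit (from the slides)
E_SW_PJ_PER_HOP = 0.183    # 1-bit switching energy at a router [pJ / hop]
E_LINK_PJ_PER_MM = 0.150   # 1-bit transfer energy on a link [pJ / mm]
VIA_CAP_FF = 4.34          # capacitance of one inter-wafer via [fF]


def flit_energy_pj(router_hops, wire_mm, vias=0, vdd=1.0):
    """Estimate the energy [pJ] to move one flit along a path.

    Assumptions (not from the slides): every bit toggles on every hop,
    via energy per bit is C * vdd^2, and vdd defaults to 1.0 V.
    """
    e_via_pj = VIA_CAP_FF * vdd ** 2 * 1e-3  # fF * V^2 = fJ, then fJ -> pJ
    per_bit = (router_hops * E_SW_PJ_PER_HOP
               + wire_mm * E_LINK_PJ_PER_MM
               + vias * e_via_pj)
    return FLIT_BITS * per_bit


if __name__ == "__main__":
    # Hypothetical paths: a long 2-D route vs. a shorter route using 2 vias.
    print(flit_energy_pj(router_hops=5, wire_mm=12.0))
    print(flit_energy_pj(router_hops=5, wire_mm=6.0, vias=2))
```

With these numbers a via costs far less than a millimetre of horizontal wire, which is why replacing long links with vertical links lowers the flit energy.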

Energy: Reduction by going 3D
• Chart legend: FT1 = Fat Tree (2,4,1), FT2 = Fat Tree (2,4,2), FHT = Fat H-Tree
• 2-D layout
  • Frequent use of longest links
  • Short hop count → less energy
• 3-D layout
  • Moving distance of packets is reduced
• The 3D layout of trees reduces the energy by 30.8%-42.9%

Summary: 3-D layout of trees
• Drawbacks of on-chip tree-based topologies
  • Long links around the root of tree
  • Wire delay problem
  • Repeater insertion → additional energy consumption
• 3-D layout schemes of Fat Trees & Fat H-Tree
  • Wire length is reduced by 25%-50%
  • Area overhead is at most 7.8%
  • Flit transmission energy is reduced by 30.8%-42.9%
• In addition, energy-hungry repeater buffers can be removed
• Need to consider negative impacts of 3-D (cost, heat, yield, …)

Thank you for your attention

Backup slides

Energy: Reduction by going 3D
• Chart: 2-D layout (w/o repeaters) vs. 2-D layout (with repeaters) (*)
• Chart legend: FT1 = Fat Tree (2,4,1), FT2 = Fat Tree (2,4,2), FHT = Fat H-Tree
• With repeaters, energy is increased
• (*) Repeater insertion model: N. Weste et al., “CMOS VLSI Design (3rd ed)”, 2005.
