Tightly-Coupled Multi-Layer Topologies for 3D NoCs Hiroki Matsutani (Keio Univ, JAPAN) Michihiro Koibuchi (NII, JAPAN) Hideharu Amano (Keio Univ, JAPAN)
Outline • Network-on-Chip (NoC) • Typical 2D topologies • 2D vs. 3D • XNoTs • New class of 3D topologies • Definition, Examples • Deadlock-free routing • Evaluations • Throughput • Area, Energy consumption
Network-on-Chip (NoC)
• Tile architectures: MIT RAW [Taylor, Micro’02], Texas U. TRIPS [Burger, Computer’04], Intel 80-tile NoC [Vangal, ISSCC’07]
• Tile = processing core + on-chip router
• Packet-switched network
• Various topologies: mesh, torus, tree
• The topology has a large impact on energy, cost, and performance
• Figure: an example of tile architecture (ASPLA 90nm CMOS process)
2D Topologies: Mesh & Torus
• 2-D Mesh, e.g., RAW [Taylor, IEEE Micro’02]
• 2-D Torus
  • 2x bandwidth of mesh
• Figure legend: Router, Core
2D Topologies: Fat Tree
• Fat Tree (p, q, c)
  • p: # of upward links
  • q: # of downward links
  • c: # of core ports
• Figures: Fat Tree (2,4,1) and Fat Tree (2,4,2); legend: Router, Core
• The network topology should be carefully selected to meet the requirements of the application
2D NoC vs. 3D NoC
• 2D NoCs
  • Long wires and long distances
  • Wire delay
  • Packets consume power at links according to their wire length
• 3D NoCs
  • Several small wafers or dies are stacked
  • Vertical links: micro bump [Ezaki, ISSCC’04], through-wafer via [Burns, ISSCC’01]
  • Very short (10-50um)
• Long horizontal wires in 2D NoCs can be replaced by very short vertical links in 3D NoCs
3D NoCs that have heterogeneous tiers
• Different circuits on each tier
• Different topologies on each tier
• Example stack
  • Tier-3: custom logic, Fat Tree(2,4,1)
  • Tier-2: cache memory, Ring
  • Tier-1: processor array, 2D-Mesh
• How to connect different planar topologies?
• How to route packets in heterogeneous 3D NoCs?
• We propose a class of topologies for heterogeneous 3D NoCs
(*) A tier refers to a wafer or a die in 3D ICs
XNoTs: multiple network layers are tightly connected by vertical crossbar switches
Existing vertical link designs
• Vertical bus [Li, ISCA’06]
  • Merit: small # of vertical links
  • Demerit: low peak performance
  • Single bus (only a single transfer at a time)
• Vertical crossbar [Kim, ISCA’07]
  • Merit: performance similar to a true crossbar
  • Merit: reasonable # of vertical links
  • Segmented buses (multiple transfers at the same time)
• We assume crossbar-based vertical links for 3D NoCs
XNoTs: Multiple planar topologies connected by crossbars
• Network-on-Tier (NoT)
  • A planar topology
  • Implemented on a tier
  • The bottom NoT provides connectivity to all cores
• XNoTs: Xbar-connected Network-on-Tiers
  • Each core and each router has a port for a vertical connection
  • All routers and cores in the same pillar are connected by a vertical crossbar
• Figures: a mesh-based NoT; a mesh-based XNoTs built from stacked Networks-on-Tier and pillars; legend: Router, Core
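
The pillar structure described above can be sketched as a small data structure. This is only an illustration, not the authors' implementation: the function names are hypothetical, and it assumes one router and one core per tier at every pillar position (inferred from the "64-core (16-core x 4)" configuration used later in the deck).

```python
from itertools import product

def build_pillars(width, height, tiers):
    """Map each pillar position (x, y) to the routers and cores sharing its crossbar.

    Assumption: every tier contributes one router and one core to each pillar.
    """
    pillars = {}
    for x, y in product(range(width), range(height)):
        members = []
        for t in range(tiers):
            members.append(("router", t, x, y))
            members.append(("core", t, x, y))
        pillars[(x, y)] = members
    return pillars

def mesh_links(width, height, tier):
    """Bidirectional router-to-router links of one mesh-based Network-on-Tier."""
    links = []
    for x, y in product(range(width), range(height)):
        if x + 1 < width:
            links.append((("router", tier, x, y), ("router", tier, x + 1, y)))
        if y + 1 < height:
            links.append((("router", tier, x, y), ("router", tier, x, y + 1)))
    return links

# A mesh-based XNoTs with four 4x4 tiers.
pillars = build_pillars(4, 4, 4)
tier0_links = mesh_links(4, 4, 0)
```

Each entry of `pillars` lists the ports of one vertical crossbar, while `mesh_links` describes the planar topology of a single tier; a heterogeneous XNoTs would swap in a different link generator per tier.
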
Examples: all tiers have the same topology
• Mesh-based XNoTs
• Ring-based XNoTs
• Tree-based XNoTs
• Side views: all routers and cores in the same pillar are connected by a crossbar
Examples: Heterogeneous XNoTs (1)
• Different topologies are used in each tier
• Figure (side view): Fat Tree(2,4,1), Ring, 2D-Mesh
Examples: Heterogeneous XNoTs (2)
• Not every tier has to provide connectivity to all cores
  • Except for the bottom tier (i.e., the “escape” tier)
• Top tier: some links are disconnected (no connectivity between some cores)
• Bottom tier: full connectivity to all cores
• Where an upper tier has no connectivity, packets are transferred via the bottom tier (tier-1)
(*) Only the bottom tier must provide full connectivity to all cores
XNoTs: Deadlock-free routing
• Intra-tier comm. (X and Y directions)
  • Existing deadlock-free routing is used within a tier, e.g., dimension-order routing (DOR)
  • Only tier-0 must guarantee connectivity to all cores
• Inter-tier comm. (Z direction)
  • Turns from a lower tier to a higher tier are prohibited
  • Unless the next hop is the final destination
• Figure: mesh-based XNoTs, top view and side view, with permitted (OK) and prohibited (NG) turns
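
The inter-tier rule can be expressed as a check on the sequence of tiers a packet visits. The sketch below is one possible reading of the rule, not code from the paper: the function name is hypothetical, and it assumes a path is given as the tier index of each node visited after injection, with the destination as the last entry.

```python
def violates_upward_turn_rule(tiers):
    """Return True if the tier sequence contains a prohibited upward move.

    tiers: tier index of each node on the path after injection;
    the last entry is the final destination (an assumption of this sketch).
    """
    last = len(tiers) - 1
    for i in range(last):
        goes_up = tiers[i + 1] > tiers[i]
        next_is_destination = (i + 1 == last)
        if goes_up and not next_is_destination:
            return True
    return False

# Moving down through the tiers is always allowed.
descending_ok = not violates_upward_turn_rule([2, 2, 1, 1, 0])
# Going up is allowed only when the next hop is the destination.
final_up_ok = not violates_upward_turn_rule([0, 0, 1])
# Going up and then continuing in the higher tier is prohibited.
mid_up_bad = violates_upward_turn_rule([0, 0, 1, 1])
```

Because packets only ever descend until delivery, a channel in a higher tier never waits on a lower tier that in turn waits on it, which is the intuition behind the deadlock-freedom claim.
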
XNoTs: Path selection (random)
• XNoTs routing
  • Multiple tiers are available
  • Alternative paths are available
• Path selection policy
  • How to select a single path?
  • Random selection: good load balancing
• Figure: mesh-based XNoTs, top view and side view, with three alternative 5-hop paths
• We also proposed some policy-based path selection methods. For more detail, please refer to the paper.
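
Random path selection amounts to choosing one of the available tiers uniformly at random. A minimal sketch follows; the function name and the seeded generator are illustrative assumptions, and a hardware router would use a much simpler pseudo-random source.

```python
import random

def select_tier_random(candidate_tiers, rng=random):
    """Pick one tier uniformly at random among the tiers offering a path."""
    return rng.choice(list(candidate_tiers))

# Draw many selections to see how the load spreads over four tiers.
rng = random.Random(0)
counts = {t: 0 for t in range(4)}
for _ in range(10000):
    counts[select_tier_random(range(4), rng)] += 1
```

With enough packets, each tier receives roughly a quarter of the traffic, which is the load-balancing property the slide refers to.
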
Evaluation: Target topologies (64-core)
• X-Mesh: (4x4 Mesh) x 4 layers
• X-Torus: (4x4 Torus) x 4 layers
• X-FT141: Fat Tree(1,4,1) x 4 layers
• X-FT241: Fat Tree(2,4,1) x 4 layers
• X-FT441: Fat Tree(4,4,1) x 4 layers
• Fat Tree (p, q, c): p = # of upward links, q = # of downward links, c = # of core ports
• These five topologies are compared with 3D Mesh/Torus
• Figure: X-Mesh
Throughput: Simulation environment
• Grid-based topologies: 3D-Mesh, X-Mesh, 3D-Torus, X-Torus
  • Dimension-order routing (two virtual channels for tori)
• Tree-based topologies: X-FT141, X-FT241, X-FT441
  • Up*/down* routing
• Path selection policy: random
• Figure: X-Mesh (4x4x4)
Throughput: Simulation results
• Grid-based XNoTs: X-Mesh and X-Torus plotted against 3D-Mesh and 3D-Torus
  • No degradation (X-Mesh = 3D-Mesh, X-Torus = 3D-Torus)
• Tree-based XNoTs: X-FT141, X-FT241, and X-FT441 plotted against 3D-Mesh and 3D-Torus
Network logic area
• Network area: routers & NIs, inter-tier vias
• Synthesis of NoC: 64-core (16-core x 4), 0.18um CMOS
• Router architecture [Matsutani, IPDPS’07]
  • 1 flit = 32 bits
  • Wormhole switching
  • 4-stage pipeline
• Inter-tier vias [Burns, ISSCC’01] [Li, ISCA’06]
  • 1-10um square
  • 25um² per layer per 1-bit signal
  • Inter-tier via area is calculated according to the # of vertical links
• Figure: typical wormhole router (input ports with buffers, arbiter, crossbar)
Network logic area: Results
• Synthesis of NoC: 64-core (16-core x 4), 0.18um CMOS
• Router architecture: 1 flit = 32 bits, wormhole switching, 4-stage pipeline
• Inter-tier vias [Burns, ISSCC’01] [Li, ISCA’06]: 1-10um square, 25um² per layer per 1-bit signal
  • Inter-tier via area is calculated according to the # of vertical links
• 3D Mesh/Torus require 2 ports for vertical links (i.e., up & down)
• XNoTs require only 1 port for vertical links (but the # of crossbars increases)
• Chart: network logic area [mm²]
Energy: NoC’s energy model
• Ave. flit energy: how much energy [J] is needed to send 1 flit to its destination?
• Parameters: 6mm square chip, 64-core (16-core x 4), 0.18um CMOS
• Switching energy (1-bit switching @ router, gate-level sim): 1.13 [pJ / hop]
• Link energy (1-bit transfer @ link): 0.67 [pJ / mm]
• Via energy: 4.34 [fF / via] [Davis, DToC’05]
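
The parameters above suggest an additive per-flit energy estimate. The sketch below is one way to combine them, not the exact model used in the evaluation. Assumptions: the switching and link figures are per bit (as the "1-bit" labels indicate), the via figure is a capacitance converted to energy as C·V², and the 1.8 V supply voltage is a placeholder typical of 0.18um CMOS rather than a value from the slides. The function name is hypothetical.

```python
SWITCH_PJ_PER_BIT_HOP = 1.13  # from the slide: 1-bit switching at a router
LINK_PJ_PER_BIT_MM = 0.67     # from the slide: 1-bit transfer over 1 mm of wire
VIA_CAP_FF = 4.34             # from the slide: capacitance per inter-tier via

def flit_energy_pj(router_hops, wire_mm, vias, flit_bits=32, vdd_volts=1.8):
    """Estimate the energy in pJ to move one flit along a path.

    Assumption: via energy per bit is C * Vdd^2 (fF * V^2 = fJ, /1000 -> pJ);
    vdd_volts is a placeholder, not a figure from the source.
    """
    via_pj_per_bit = VIA_CAP_FF * vdd_volts ** 2 / 1000.0
    per_bit_pj = (SWITCH_PJ_PER_BIT_HOP * router_hops
                  + LINK_PJ_PER_BIT_MM * wire_mm
                  + via_pj_per_bit * vias)
    return flit_bits * per_bit_pj

# Example: a 32-bit flit crossing 4 routers, 4.5 mm of wire, and 1 via.
example_pj = flit_energy_pj(router_hops=4, wire_mm=4.5, vias=1)
```

Under these assumptions the via term is orders of magnitude smaller than the router and wire terms, so the estimate is dominated by hop count and horizontal wire length.
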
Energy: Simulation results
• Parameters: 6mm square chip, 64-core (16-core x 4), 0.18um CMOS
• Switching energy (1-bit switching @ router, gate-level sim): 1.13 [pJ / hop]
• Link energy (1-bit transfer @ link): 0.67 [pJ / mm]
• Via energy: 4.34 [fF / via] [Davis, DToC’05]
• Hop count is short in XNoTs, hence low power
• Chart: ave. flit energy [pJ]
Summary: 3D topologies - XNoTs
• Requirements
  • Different circuits on each layer
  • Different topologies on each layer (e.g., Fat Tree, Ring, 2D-Mesh)
  • How to connect/route them?
• XNoTs
  • Tiers are connected by crossbars
  • Arbitrary tiers can be stacked
• Current problem / future work
  • We assumed a full crossbar as a baseline
  • A more efficient implementation has been proposed by [Kim, ISCA’07]
  • We must revise the router architecture
XNoTs: Path selection (QoS)
• Control packets
  • In-order delivery is required
  • Deterministic routing
  • Control packets use tier-1
• Data packets
  • In-order delivery is not required
  • Large data streams
  • Adaptive routing
  • Data packets use tier-2 or tier-3
• Figure: XNoTs (side view) with Duato’s Protocol (adaptive) on the two upper tiers and dimension-order (deterministic) routing on the bottom tier
• Various QoS controls are possible through the path selection algorithm
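
The QoS policy reduces to restricting which tiers a packet class may use. A minimal sketch, assuming three tiers numbered from 1 as on this slide; the function name and the string labels for packet classes are hypothetical.

```python
def select_tiers_qos(packet_kind, num_tiers=3):
    """Return the tiers a packet may use under the QoS policy.

    Control packets stay on tier-1 (deterministic routing keeps them in order);
    data packets use the upper tiers (adaptive routing, order not required).
    """
    if packet_kind == "control":
        return [1]
    if packet_kind == "data":
        return list(range(2, num_tiers + 1))
    raise ValueError(f"unknown packet kind: {packet_kind!r}")

control_tiers = select_tiers_qos("control")
data_tiers = select_tiers_qos("data")
```

Separating the classes by tier keeps large data streams from delaying control traffic, and lets each tier run the routing algorithm suited to its class.
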
XNoTs: Path selection (bottom first)
• Heat dissipation is crucial in 3D ICs
• Bottom tier
  • Close to the board, which acts as a heat sink for the 3D IC (good heat dissipation property)
• Bottom tier first
  • Tier-0 is used first if there are alternative paths
• Figure: XNoTs (side view)
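
The bottom-first policy can be sketched as picking the lowest-numbered tier among the alternatives. The function name is hypothetical, and the sketch assumes tiers are numbered upward from the board, with tier 0 at the bottom.

```python
def select_tier_bottom_first(candidate_tiers):
    """Prefer the tier closest to the board among the tiers offering a path."""
    candidates = list(candidate_tiers)
    if not candidates:
        raise ValueError("no tier offers a path")
    return min(candidates)

# Tier 0 is chosen whenever it is among the alternatives.
chosen = select_tier_bottom_first([2, 0, 3])
# Otherwise the lowest available tier is used.
fallback = select_tier_bottom_first([3, 1])
```
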
Ideal throughput: Channel bisection
• Channel bisection: number of unidirectional links that cross the bisection
• No degradation (X-Mesh = 3D-Mesh, X-Torus = 3D-Torus)
3D Topologies: 3D-Mesh
• 2D-Mesh (8x8=64)
  • Average hop count: 5.33
  • Channel bisection: 16
  • Number of routers: 64
  • Node degree: 5
• 3D-Mesh (4x4x4=64)
  • Average hop count: 4.00
  • Channel bisection: 32
  • Number of routers: 64
  • Node degree: 7
• Figure: 3D-Mesh with Tier-0 to Tier-3
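
The figures on this slide can be reproduced with closed-form expressions for a k-ary n-dimensional mesh. The slide does not say how the numbers were derived, so the sketch below is an assumption: it uses the common approximation n·k/3 for the average hop count, counts unidirectional links across the middle cut for the bisection, and includes one core port in the node degree. The function name is hypothetical.

```python
def mesh_metrics(k, n):
    """Metrics of an n-dimensional mesh with k routers per dimension (k even)."""
    return {
        "routers": k ** n,
        # Approximate average distance: about k/3 hops per dimension.
        "avg_hops_approx": n * k / 3,
        # k^(n-1) bidirectional links cross the middle cut, i.e. twice as many
        # unidirectional links.
        "channel_bisection": 2 * k ** (n - 1),
        # Two neighbours per dimension plus one port for the local core.
        "node_degree": 2 * n + 1,
    }

mesh_2d = mesh_metrics(8, 2)  # 8x8 mesh
mesh_3d = mesh_metrics(4, 3)  # 4x4x4 mesh
```

For the same 64 routers, the 3D arrangement shortens the approximate average path and doubles the channel bisection, at the cost of two extra ports per router.
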