
A Novel 3D Layer-Multiplexed On-Chip Network

Rohit Sunkam Ramanujam, Bill Lin. Electrical and Computer Engineering, University of California, San Diego. Presenter: Anjie Cao.



  1. A Novel 3D Layer-Multiplexed On-Chip Network • Rohit Sunkam Ramanujam, Bill Lin • Electrical and Computer Engineering, University of California, San Diego • Presenter: Anjie Cao

  2. Networks-on-Chip • Chip-multiprocessors (CMPs) increasingly popular • 2D-mesh networks often used as on-chip fabric • [Figure: Tilera Tile64 and Intel 80-core chip layouts; labels: 21.72 mm, 12.64 mm, single tile 1.5 mm and 2.0 mm, I/O area]

  3. 3D Integrated Circuits • ≥ 2 active device layers • Short inter-layer distances • Reduced chip footprint • Reduced wire delays • High inter-layer bandwidth • High transistor packing density • [Figure: device layer 1 and device layer 2 connected by a through-silicon via]

  4. Natural Progression: 3D Mesh for 3D Chip-Multiprocessor • What routing algorithms to use for 3D mesh networks? • [Figure: 2D mesh and 3D mesh]

  5. Outline • Oblivious routing on a 3D mesh • Layer-multiplexed 3D architecture • Evaluation

  6. Oblivious Routing Objectives • Maximize throughput • Distribute traffic evenly on network links • Maximize worst-case throughput as traffic is application dependent • Minimize hop count • Minimize routing delay between source and destination • Reduce power

  7. Routing Algorithms for 3D Mesh Networks • Valiant Routing (VAL): optimal worst-case throughput, poor latency • Dimension Ordered Routing (DOR): minimal latency, poor worst-case throughput • O1TURN Routing: minimal latency, poor worst-case throughput • Ideal routing algorithm: minimal latency, maximum worst-case throughput • [Figure: average hop count (normalized to minimal) vs. worst-case throughput (fraction of network capacity) for VAL, DOR, O1TURN, and IDEAL]

  8. Randomized Partially-Minimal Routing (RPM) • Phase-1 Z: source to a random intermediate layer • XY or YX routing on the intermediate layer • Phase-2 Z: intermediate layer to the destination • [Figure: 3D mesh with X, Y, Z axes showing the source, the random intermediate layer, and the destination]

  9. Main Idea • Load-balance uniformly across the vertical layers • 2 phases of vertical routing • Minimal XY/YX routing used on each layer
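The two vertical phases and the single horizontal phase described on slides 8 and 9 can be sketched as a route computation. This is a minimal illustration, not the authors' implementation: the function name `rpm_route`, the helper `_steps`, the uniform choice of intermediate layer, and the equal-probability choice between XY and YX are all assumptions made for the example.

```python
import random

def _steps(start, end):
    """Coordinates visited moving one hop at a time from start (exclusive) to end (inclusive)."""
    step = 1 if end >= start else -1
    return range(start + step, end + step, step)

def rpm_route(src, dst, num_layers, rng=random):
    """Hypothetical RPM sketch: Z to a random layer, XY or YX on it, then Z to the destination."""
    sx, sy, sz = src
    dx, dy, dz = dst
    mid = rng.randrange(num_layers)  # assumption: intermediate layer chosen uniformly
    path = [src]
    # Phase 1: vertical routing from the source layer to the intermediate layer.
    path += [(sx, sy, z) for z in _steps(sz, mid)]
    # Minimal routing on the intermediate layer; XY vs. YX picked with equal probability (assumption).
    if rng.random() < 0.5:
        path += [(x, sy, mid) for x in _steps(sx, dx)]
        path += [(dx, y, mid) for y in _steps(sy, dy)]
    else:
        path += [(sx, y, mid) for y in _steps(sy, dy)]
        path += [(x, dy, mid) for x in _steps(sx, dx)]
    # Phase 2: vertical routing from the intermediate layer to the destination layer.
    path += [(dx, dy, z) for z in _steps(mid, dz)]
    return path
```

For example, `rpm_route((0, 0, 0), (2, 1, 3), num_layers=4)` returns a list of node coordinates that starts at the source and ends at the destination. The horizontal part is always minimal, while the vertical part may overshoot, which is why the routing is only partially minimal.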

  10. Routing Algorithms for 3D Mesh Networks • Randomized Partially Minimal Routing (RPM): near-optimal worst-case throughput, low latency • [Figure: average hop count (normalized to minimal) vs. worst-case throughput (fraction of network capacity) for VAL, RPM, DOR, O1TURN, and IDEAL; hop-count axis includes a tick at 1.1]

  11. RPM has Near-optimal Worst-case Throughput • RPM is optimal for even radix, within 1/k² of optimal for odd radix.

  12. Performance of RPM: Average-case Throughput

  13. Outline • Oblivious routing on a 3D mesh • Layer-multiplexed (LM) 3D architecture • Evaluation

  14. Unique Features of 3D ICs • Inter-layer distances are very small (~50 μm) • Order of magnitude lower than distances between adjacent tiles on a 2D plane (~1500 μm) • Vertical interconnects implemented using Through-Silicon-Vias (TSVs) have very low delay • [Figure: 50 μm TSV between layers compared with 1500 μm between adjacent tiles]

  15. Unique Features of 3D ICs • Inter-layer distances are very small (~50 μm) • Order of magnitude lower than distances between adjacent tiles on a 2D plane (~1500 μm) • Vertical wires using Through-Silicon-Vias (TSVs) have very low delay • Vertical bandwidth abundant as TSVs can be densely packed in 2D with small via pitch (~4 μm) • [Figure: TSV array with 4 μm pitch in both dimensions]

  16. Unique Features of 3D ICs • Inter-layer distances are very small (~50 μm) • Order of magnitude lower than distances between adjacent tiles on a 2D plane (~1500 μm) • Vertical wires using Through-Silicon-Vias (TSVs) have very low delay • Vertical wiring abundant as TSVs can be packed in 2D with small via pitch (~4 μm) • Number of device layers likely to remain small (4-5 layers) due to thermal and manufacturing issues

  17. RPM on a 3D Mesh • Phase-1 Z: source to a random intermediate layer • XY or YX routing on the intermediate layer • Phase-2 Z: intermediate layer to the destination • [Figure: 3D mesh with X, Y, Z axes showing the source, the random intermediate layer, and the destination]

  18. Proposed Layer-Multiplexed Architecture • RPM routing adapted to the LM architecture: RPM-LM • Injection stage (Phase-1 Z): source to the random intermediate layer • XY or YX routing on the intermediate layer • Ejection stage (Phase-2 Z): intermediate layer to the destination • [Figure: stacked processors P1 to P4 with X, Y, Z axes showing the source, the random intermediate layer, and the destination]

  19. Power and Area Savings • Decouple vertical routing from horizontal routing • Restrict vertical routing to packet injection and packet ejection • 5x5 crossbar in LM vs. 7x7 crossbar in 3D mesh • [Figure: conventional 3D mesh vs. layer-multiplexed architecture for processors P1 to P4, with the packet injection demultiplexer and packet ejection multiplexer marked]
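As a rough, first-order way to read the 5x5 vs. 7x7 comparison on slide 19, one can count crossbar crosspoints, which grow with the product of input and output ports. The sketch below assumes that simple model; the helper name `crosspoints` is hypothetical, and the deck's own power and area numbers come from Orion 2.0, not from this calculation.

```python
def crosspoints(ports):
    """Crosspoint count of a square crossbar with the given number of ports (first-order model)."""
    return ports * ports

mesh_3d = crosspoints(7)  # 7-port router in the conventional 3D mesh
lm = crosspoints(5)       # 5-port router in the layer-multiplexed architecture
print(lm, mesh_3d, round(lm / mesh_3d, 2))  # prints: 25 49 0.51
```

This counts crosspoints only. It does not model buffers, the packet injection demultiplexers, or the packet ejection multiplexers that the LM architecture adds.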

  20. Single Hop Vertical Communication • Single hop vertical routing more power efficient than one-layer-per-hop routing • Leverages short inter-layer distances in 3D ICs • Better utilizes available vertical bandwidth

  21. Packet Injection Demultiplexer • Route selection/load balancing: makes sure traffic is uniformly distributed across the k layers • Flit counters: record the total number of flits sent from each input to each output • VC allocation • Switch arbitration • Credits in from the injection ports of the routers on layers 1-4 • Outputs to the injection ports of the Layer 1 through Layer 4 routers • [Figure: demultiplexer block diagram with inputs P1 to P4]

  22. Selection Logic • Assign each input a unique priority • Serve the highest-priority input first • When a new head flit is injected from processor n, select the layer m with the lowest flit count (n, m)
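The flit-counter selection rule from slides 21 and 22 can be sketched as follows. This is an illustrative model only: the class name `InjectionDemuxSketch`, its methods, and the lowest-index tie-break are assumptions, and credits, VC allocation, and input priorities are left out.

```python
class InjectionDemuxSketch:
    """Hypothetical model of the flit counters in the packet injection demultiplexer."""

    def __init__(self, num_inputs, num_layers):
        # flit_count[n][m]: total flits sent so far from input (processor) n to layer m.
        self.flit_count = [[0] * num_layers for _ in range(num_inputs)]

    def select_layer(self, n):
        """Pick the layer with the lowest flit count for input n (ties go to the lowest index)."""
        counts = self.flit_count[n]
        return min(range(len(counts)), key=counts.__getitem__)

    def send_packet(self, n, num_flits):
        """Choose a layer when the head flit arrives and charge the whole packet to it."""
        m = self.select_layer(n)
        self.flit_count[n][m] += num_flits
        return m

demux = InjectionDemuxSketch(num_inputs=4, num_layers=4)
chosen = [demux.send_packet(0, 5) for _ in range(8)]
print(chosen)  # prints: [0, 1, 2, 3, 0, 1, 2, 3]
```

With equal-length packets the rule degenerates to round-robin, as the usage above shows. With mixed packet lengths it keeps the per-layer flit totals for each input close together, which is the load-balancing goal stated on slide 21.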

  23. Packet Ejection Multiplexer • One multiplexer with an arbiter per processor (P1 to P4) • Inputs for P1: L1-P1 (router on Layer 1), L2-P1 (packets from layer 2), L3-P1 (packets from layer 3), L4-P1 (packets from layer 4); likewise L1-P4 through L4-P4 for P4 • Multiplexer decides whether to accept a packet based on its VC ID • Credits out for L1-P1, L2-P1, L3-P1 and L4-P1, through L1-P4, L2-P4, L3-P4 and L4-P4 • [Figure: ejection multiplexer block diagram]

  24. Outline • Oblivious routing on a 3D mesh • Layer-multiplexed 3D architecture • Evaluation • Power and Area • Performance

  25. Power and Area Evaluation • Used Orion 2.0 models for router power and area estimation • 65 nm process at 1 V and 1 GHz • Buffers • 4 VCs/port, 5 flits/VC for routers • 5 flits/port for packet injection demultiplexer • 5 flits/port for each packet ejection multiplexer

  26. Power Comparison Setup • 3D mesh • One 7-port router per tile • LM • One 5-port router per tile • One packet injection demultiplexer for every 4 tiles • One packet ejection multiplexer per tile

  27. Power Evaluation • 27% power reduction

  28. Area Evaluation • 26.5% area reduction

  29. Outline • Oblivious routing on a 3D mesh • Layer-multiplexed 3D architecture • Evaluation • Power and Area • Performance

  30. RPM on a 3D mesh vs. RPM-LM • Worst-case throughput • RPM-LM achieves same (near-optimal) worst-case throughput as RPM • Average-case throughput

  31. Flit-Level Simulation Setup • Ideal throughput evaluation assumes • Ideal single-cycle router • Infinite buffers • No contention in switches, no flow control • Flit-level simulation • PopNet network simulator • 5-stage router pipeline • Credit-based flow control • 8 virtual channels, each 5 flits deep • Multi-flit packets injected into the network (5 flits/packet)

  32. Flit-Level Simulation (cont’d) • Network configurations simulated • 4 x 4 x 4 mesh • 8 x 8 x 4 mesh • Four different traffic traces used • Uniform traffic: each packet is sent to a destination chosen uniformly at random • Transpose traffic: rotate the source coordinates left to get the destination, (x,y,z) → (y,z,x) • Complement traffic: (x,y,z) → (k-x-1, k-y-1, k-z-1) • Worst-case traffic pattern for DOR (DOR-WC): (x,y,z) → (k-z-1, k-y-1, k-x-1)
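The four traffic patterns on slide 32 map a source node to a destination node and can be written down directly. The sketch below assumes a k x k x k mesh so that a single radix k applies to every dimension; the slide does not say how the formulas are adapted to the 8 x 8 x 4 configuration. The function names are hypothetical, and allowing a node to pick itself under uniform traffic is also an assumption.

```python
import random

def uniform_dest(src, k, rng=random):
    """Uniform traffic: destination drawn uniformly at random (may equal src; assumption)."""
    return (rng.randrange(k), rng.randrange(k), rng.randrange(k))

def transpose_dest(src, k):
    """Transpose traffic: rotate the coordinates left, (x, y, z) -> (y, z, x)."""
    x, y, z = src
    return (y, z, x)

def complement_dest(src, k):
    """Complement traffic: (x, y, z) -> (k-x-1, k-y-1, k-z-1)."""
    x, y, z = src
    return (k - x - 1, k - y - 1, k - z - 1)

def dor_wc_dest(src, k):
    """Worst-case pattern for DOR: (x, y, z) -> (k-z-1, k-y-1, k-x-1)."""
    x, y, z = src
    return (k - z - 1, k - y - 1, k - x - 1)

src = (0, 1, 3)
print(transpose_dest(src, 4))   # prints: (1, 3, 0)
print(complement_dest(src, 4))  # prints: (3, 2, 0)
print(dor_wc_dest(src, 4))      # prints: (0, 2, 3)
```

Transpose, complement, and DOR-WC are fixed permutations of the node set, so every node sends to exactly one destination and receives from exactly one source. Uniform traffic is the only randomized pattern of the four.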

  33. Uniform Traffic • 8x8x4 Mesh

  34. Transpose Traffic • 8x8x4 Mesh

  35. Worst-case Traffic for DOR • 8x8x4 Mesh

  36. Summary of Contributions • Proposed a 3D layer-multiplexed architecture, which is an optimization of a 3D mesh • Exploits the optimality of RPM together with the high vertical bandwidth enabled in 3D technology • LM architecture consumes 27% less power and occupies 26% less area than a 3D mesh • RPM-LM has comparable (marginally better) performance to RPM on a 3D mesh

  37. Thank you
