The Alpha 21364 Network Architecture
Mukherjee, Bannon, Lang, Spink, and Webb
Summary Slides by Fred Bower
ECE 259, Spring 2004
It’s A Small Paper…
• …Packed With Detail
• Overview At High Level
• 21364 Chip Features and Built-In MP Constructs
• Network, Routing, and Router Basics
• More Depth
• Routing Policies
• Deadlock Avoidance Via Routing Policies
• What’s In A Router?
• Discussion
21364 Overview
• 21264 Core With MP Additions
• MC = Memory Controller
• Router
• Directory-Based CC
• Runs at Core Clock
• Buffering Capability
• 1.75 MB L2 Cache
Figure 1: The Alpha 21364 Floorplan
The 21364 Network Topology
• 2-d Torus
• Limited Support for Imperfect Tori
• Allows Fault Remapping
• Virtual Cut-Through
• 316* Packet Router Buffer
• Simple, Adaptive Routing
• Constrained Within Minimum Rectangle
Figure 2: A 12-Processor 21364 Network Configuration
*316 Total Packets of Buffer Capacity Divided Unevenly Amongst Classes and Ports
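To make the "all edges wrap" property of a 2-d torus concrete, here is a minimal sketch of neighbor lookup with modular arithmetic. The 4 x 3 layout, the coordinate convention, and the direction names are assumptions chosen for illustration; the slides only state that the configuration has 12 processors.

```python
def torus_neighbors(x, y, width, height):
    """Return the four neighbors of node (x, y) in a width x height 2-d torus.

    Coordinates wrap modulo the torus dimensions, so edge nodes
    still have four neighbors.
    """
    return {
        "east": ((x + 1) % width, y),
        "west": ((x - 1) % width, y),
        "north": (x, (y - 1) % height),
        "south": (x, (y + 1) % height),
    }


# Assumed 4 x 3 arrangement of a 12-processor network (hypothetical layout).
print(torus_neighbors(0, 0, 4, 3))
```

The corner node (0, 0) wraps to (3, 0) on the west side and to (0, 2) on the north side, which is the behavior that distinguishes a torus from a plain mesh.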
Packet Classes
• Seven Packet Classes
• Request (3 Flits)
• Forward (3 Flits)
• Block Response (18 or 19 Flits)
• Non-Block Response (2 or 3 Flits)
• Write I/O (19 Flits)
• Read I/O (3 Flits)
• Special (1 or 3 Flits)
• Flits Are 32 Bits Data Plus 7 Bits ECC
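The flit counts above translate directly into bits on the wire. The sketch below tabulates the seven classes and multiplies by the 39-bit flit (32 data bits plus 7 ECC bits). The table and function names are illustrative, not taken from the paper.

```python
FLIT_DATA_BITS = 32
FLIT_ECC_BITS = 7
FLIT_BITS = FLIT_DATA_BITS + FLIT_ECC_BITS  # 39 bits per flit

# Possible flit counts per packet class, as listed on the slide.
PACKET_FLITS = {
    "Request": (3,),
    "Forward": (3,),
    "Block Response": (18, 19),
    "Non-Block Response": (2, 3),
    "Write I/O": (19,),
    "Read I/O": (3,),
    "Special": (1, 3),
}


def wire_bits(flits):
    """Total bits transferred for a packet of `flits` flits, ECC included."""
    return flits * FLIT_BITS


for name, sizes in PACKET_FLITS.items():
    print(name, [wire_bits(f) for f in sizes])
```

For example, a 3-flit request occupies 117 bits on the link and a 19-flit block response occupies 741 bits, of which 608 are data.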
Routing Policies: Minimum Rectangle
• Four Rectangles With Current and Destination At Diagonals
• Recall 2-d Torus: All Edges Wrap
• Constrain Adaptive Routing To Minimum Rectangle
• Center of Figure 3
Figure 3: Routing Rectangles
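One way to picture the minimum rectangle is to take the shorter way around the torus in each dimension; the directions that reduce those offsets are the only ones adaptive routing may use. The sketch below illustrates that idea under stated assumptions: the function names, direction labels, and tie-breaking rule (prefer the positive direction when both ways are equally long) are hypothetical, not the router's actual logic.

```python
def minimal_offset(src, dst, size):
    """Signed shortest offset from src to dst on a ring of `size` nodes.

    Ties (exactly half way around) resolve to the positive direction,
    which is an arbitrary choice made for this sketch.
    """
    forward = (dst - src) % size
    return forward if forward <= size - forward else forward - size


def productive_directions(cur, dst, width, height):
    """Directions that keep a packet inside the minimum rectangle."""
    dx = minimal_offset(cur[0], dst[0], width)
    dy = minimal_offset(cur[1], dst[1], height)
    dirs = []
    if dx > 0:
        dirs.append("east")
    elif dx < 0:
        dirs.append("west")
    if dy > 0:
        dirs.append("south")
    elif dy < 0:
        dirs.append("north")
    return dirs


# On an assumed 4 x 3 torus, going from (0, 0) to (3, 1) is shortest
# by wrapping west one hop and moving south one hop.
print(productive_directions((0, 0), (3, 1), 4, 3))
```

An adaptive router would pick among the returned directions based on buffer availability; an empty list means the packet has arrived.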
Routing Basics
• Decode Of Packet Determines Routing
• Use Of Lookup Tables For Destination Resolution, Virtual Channel Assignments, and Broadcast Invalidation Clusters
• First Flit Has Routing And Packet Information
• ECC Checked/Corrected At Each Router
• Routers May Rewrite ECC
• Routers Send Feedback About Buffer Availability
Avoiding Coherence Deadlocks
• Virtual Channels Break Cyclic Dependence
• Separate Channel For Each Packet Class
• Guarantees Independence of Class Traffic
• Additional Ordering Constraint Amongst Classes of Packets
• Additional Measures To Preserve I/O Consistency
• Force Same-Class Requests To Arrive In-Order Using Deadlock-Free Virtual Channels
• Allow I/O Writes To Pass I/O Reads Using Separate Virtual Channels For Reads and Writes
• Prevent I/O Reads From Passing I/O Writes To Preserve Ordering Rules
Avoiding Routing Deadlocks
• 19 Virtual Channels
• 3 Networks For Each of 6 Packet Classes Plus 1 Special
• Adaptive, VC0, and VC1
• Adaptive Is First Choice
• VC0 and VC1 Provide Guaranteed Drain If Adaptive Blocked
• Careful Selection of Rules To Break Deadlocks Within Dimensions and Across Dimensions
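The count of 19 follows from the breakdown on the slide: three channels for each of six classes, plus one for the Special class. The sketch below enumerates them and lists the channels for a class in preference order. The label "single" for the Special channel and the function name are placeholders; the slides do not say how a router chooses between VC0 and VC1, so that rule is deliberately left out.

```python
NON_SPECIAL_CLASSES = [
    "Request",
    "Forward",
    "Block Response",
    "Non-Block Response",
    "Write I/O",
    "Read I/O",
]
NETWORKS = ["Adaptive", "VC0", "VC1"]  # Adaptive is tried first

# One (class, network) pair per virtual channel.
VIRTUAL_CHANNELS = [(c, n) for c in NON_SPECIAL_CLASSES for n in NETWORKS]
VIRTUAL_CHANNELS.append(("Special", "single"))  # placeholder label


def channels_in_preference_order(packet_class):
    """Channels a packet of this class may use, adaptive channel first."""
    return [n for c, n in VIRTUAL_CHANNELS if c == packet_class]


print(len(VIRTUAL_CHANNELS))
print(channels_in_preference_order("Request"))
```

Because every class owns its own channels, a blocked class cannot hold buffers that another class needs, which is the independence property the coherence-deadlock slide relies on.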
Internals Of The Router
• Pipelined Design
• 9 Pipeline Types Based Upon Input × Output Mapping
• Input/Output Either Local, Interprocessor, or I/O
• 13-Cycle Input-To-Output Latency
• Key To Performance (Smaller Is Better)
• Recall Chip-Side At 1.2 GHz
• Network-Side Speed At 800 MHz
• Clock Sent With Outgoing Packets
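The cycle count and clock rates above can be turned into wall-clock numbers with simple arithmetic. The sketch assumes the 13 cycles are counted in the 1.2 GHz chip-side clock; the helper name is illustrative.

```python
def cycles_to_ns(cycles, clock_ghz):
    """Convert a cycle count to nanoseconds at the given clock rate in GHz."""
    return cycles / clock_ghz


CHIP_CLOCK_GHZ = 1.2  # chip-side (router) clock
LINK_CLOCK_GHZ = 0.8  # network-side clock

# Assumption: the 13-cycle latency is measured in chip-side cycles.
router_latency_ns = cycles_to_ns(13, CHIP_CLOCK_GHZ)
clock_ratio = CHIP_CLOCK_GHZ / LINK_CLOCK_GHZ

print(f"router latency: {router_latency_ns:.2f} ns")
print(f"chip-to-link clock ratio: {clock_ratio:.1f}")
```

Under that assumption a single router hop costs roughly 10.8 ns, and the chip-side clock runs 1.5 times faster than the links.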
Brief Conclusions
• Even With Moderate Constraints, Jelly-Bean MP Is Challenging
• Correctness, Deadlock-Avoidance, Buffering, Arbitration, and Performance Require Careful Consideration In Design
• This Paper Illustrates Where Network Latency Comes From
• Even A Fast Network Seems Slow Compared To Local Access
Discussion
• Was 2-d Torus the Right Shape For This Design?
• What Are the Limitations Imposed?
• How Is the 1.2 GHz Internal/800 MHz External Clock Discrepancy OK?
• Is MP Capability Better Than More Aggressive Core Optimizations For the Transistor Cost?
• What About SMT, CMP?