Shubhendu S. Mukherjee, Peter Bannon, Steven Lang, Aaron Spink, and David Webb. Alpha Development Group, Compaq. HOT Interconnects 9 (2001). Presented by John Ingalls, ECE 259, March 22, 2010. The Alpha 21364 Network Architecture.
Summary of Features
• Alpha 21364 is a 21264 core plus 1.75MB on-die L2 cache, 2-channel Rambus DRAM, I/O controller, and router, at 1.2GHz on 180nm.
• Up to 128 processors in a system. All can access others’ memory and I/O.
• Directory cache coherence protocol.
• 2-D torus interconnection network with adaptive routing and deadlock-free fallback.
• Request packets are generally 3 flits in size, data packets generally 18 flits. Flits have ECC.
Notable Features: Network Routing
• Network is a 2-D torus.
• Virtual Cut-Through Routing: A blocked packet’s flits accumulate in a buffer.
• Adaptive Routing: Minimum rectangle. The source picks either dimension to send on; the algorithm then prefers to keep packets on that dimension.
Fig. 3 (pg. 2)
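The minimum-rectangle idea can be sketched in a few lines. This is a minimal illustration, not the router's actual logic: the function names, the `preferred_axis` parameter, and the `blocked` set are all hypothetical, and ties on an even-sized ring are arbitrarily resolved in the positive direction.

```python
def minimal_offset(cur, dst, size):
    """Signed shortest offset from cur to dst on a ring of `size` nodes.
    Assumption: when both ways are equally short, go in the + direction."""
    delta = (dst - cur) % size
    if delta > size // 2:
        delta -= size  # shorter to go the other way around the ring
    return delta


def productive_directions(cur, dst, dims):
    """Directions that keep a packet inside the minimum rectangle
    spanned by cur and dst on a 2-D torus of shape dims = (width, height)."""
    dx = minimal_offset(cur[0], dst[0], dims[0])
    dy = minimal_offset(cur[1], dst[1], dims[1])
    dirs = []
    if dx:
        dirs.append("+x" if dx > 0 else "-x")
    if dy:
        dirs.append("+y" if dy > 0 else "-y")
    return dirs


def next_direction(cur, dst, dims, preferred_axis, blocked=()):
    """Pick a productive, unblocked direction, favoring the axis the
    packet is already travelling on. Returns None if nothing is usable."""
    candidates = [d for d in productive_directions(cur, dst, dims)
                  if d not in blocked]
    for d in candidates:
        if d[1] == preferred_axis:
            return d
    return candidates[0] if candidates else None
```

On a 4x4 torus, a packet at (0, 0) headed for (3, 1) has two productive directions: one hop in -x (using the wraparound link) and one hop in +y. If the preferred direction is blocked, the sketch falls back to the other one.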
Notable Features: Deadlock Avoidance
• Avoiding Coherence Deadlock: Separate virtual channels for responses and requests.
• Preserving I/O Consistency: Packets of the same class (i.e. read or write) must be in the same virtual channel, thus take the same route, thus retain their order within that class.
• 3 Virtual Channels per Dimension per Class: Adaptive, VC0, and VC1.
• Adaptive carries the bulk of traffic; VC0 and VC1 are a fixed-route, deadlock-free “drain” for blocked adaptive packets.
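The channel structure above can be pictured as a table keyed by packet class and channel kind. The sketch below is an illustration only: it models just two classes (request and response), and the names `PacketClass`, `ChannelKind`, `build_channels`, and `pick_channel` are hypothetical. Which of VC0 or VC1 a blocked packet drains into is passed in as a parameter rather than computed.

```python
from enum import Enum


class PacketClass(Enum):
    REQUEST = "request"
    RESPONSE = "response"


class ChannelKind(Enum):
    ADAPTIVE = "adaptive"
    VC0 = "vc0"
    VC1 = "vc1"


def build_channels():
    """One buffer per (class, kind) pair, so a request never shares
    a buffer with a response."""
    return {(c, k): [] for c in PacketClass for k in ChannelKind}


def pick_channel(pkt_class, adaptive_free, drain_kind):
    """Use the adaptive channel when it has room; otherwise fall back
    to the given deadlock-free channel of the SAME class."""
    if adaptive_free:
        return (pkt_class, ChannelKind.ADAPTIVE)
    return (pkt_class, drain_kind)
```

Because the class is part of the key, a blocked request can only ever wait behind other requests, which is what keeps a stalled request from holding up the response that would unblock it.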
Notable Features: Deadlock Avoidance
Fig. 5 (pg. 3)
• VC0 and VC1 are mapped at boot time to prohibit cyclic dependencies.
• Packets on VC0/VC1 can only turn if they are at a corner of the minimum rectangle; the adaptive virtual channel has no such restriction.
• Packets can return to the adaptive channel if it is not congested.
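Why two fixed-route channels are enough to break cycles on a wraparound link can be checked with a small experiment. The sketch below uses the textbook "dateline" assignment on a one-directional ring, which is an assumption chosen for illustration and not necessarily the mapping the 21364 programs at boot time. It builds the channel dependency graph and tests it for cycles; all function names are hypothetical.

```python
def ring_dependencies(n, use_dateline):
    """Channel-dependency edges for all-pairs traffic on a one-way ring.
    A channel is (link_source_node, vc). With the dateline scheme, a packet
    switches from vc 0 to vc 1 after crossing the wraparound link."""
    edges = set()
    for src in range(n):
        for dst in range(n):
            if src == dst:
                continue
            node, vc, prev = src, 0, None
            while node != dst:
                chan = (node, vc)
                if prev is not None:
                    edges.add((prev, chan))  # holding prev while waiting on chan
                prev = chan
                if use_dateline and node == n - 1:
                    vc = 1  # just crossed the wraparound link
                node = (node + 1) % n
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in the dependency graph."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, [])
    state = {}  # 1 = on the current DFS path, 2 = fully explored

    def visit(v):
        state[v] = 1
        for w in graph[v]:
            if state.get(w) == 1:
                return True
            if w not in state and visit(w):
                return True
        state[v] = 2
        return False

    return any(v not in state and visit(v) for v in graph)
```

With a single virtual channel, `has_cycle(ring_dependencies(4, False))` reports a cycle, which is the precondition for deadlock. With the second channel, `has_cycle(ring_dependencies(4, True))` reports none.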
Technical Details: Router Architecture
• 13 cycles pin-to-pin, any input to any output.
• Pipeline clocked at 1.2GHz, links at 800MHz.
• Link clock is sent with the outgoing packet.
• ECC is recomputed at every hop. 1-bit errors are recoverable.
• Arbitration: Input “local” arbiters present a packet from the buffer that is ready and not blocked to the “global” arbiters for possible dispatch. Output global arbiters select from the input local arbiters.
• Least-Recently-Selected selection policy. Also, the Rotary Rule prioritizes older packets from the network, and there is a Coherence Dependence Priority rule.
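A least-recently-selected policy can be sketched as a queue of ports ordered by how long ago each was granted. This is a software illustration only, not the hardware arbiter: the class name and its `grant` method are hypothetical, and the Rotary Rule and Coherence Dependence Priority rule are not modeled.

```python
class LeastRecentlySelectedArbiter:
    """Grants the requesting port that was selected longest ago."""

    def __init__(self, num_inputs):
        # Front of the list = least recently selected port.
        self.order = list(range(num_inputs))

    def grant(self, requests):
        """`requests` is a set of port numbers with a ready packet.
        Returns the winning port, or None if nobody is requesting."""
        for port in self.order:
            if port in requests:
                # Winner moves to the back: it is now the most recent.
                self.order.remove(port)
                self.order.append(port)
                return port
        return None
```

With three ports all requesting, successive calls grant ports 0, then 1, then 2. A port that stays idle keeps its place near the front, so it wins as soon as it does request.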
Good
• This was built and shipped (albeit late), which immediately lends it credibility.
• Simple introduction to interconnection networks: the 5-page limit makes the authors explain everything clearly and concisely.
Bad
• No evaluation of performance.
• No comparison against competitors (“ours is better” would help sales).
• Configuration around faulty routers is mentioned but never explained.
• 5 pages isn’t enough to explain the edge cases.
Conclusion / Further Questions
• Keywords: 2-D torus. Adaptive routing with deadlock-free fixed-route virtual channels to prevent network deadlock. Separate virtual channels for requests and responses to prevent coherence deadlock.
• This was the last major iteration of the Alpha architecture. Why? What competing product replaced it? How was that competitor better? How could the 21364 have been improved to stay competitive (features, performance)?