
Architecture and Design of the AlphaServer GS320

Explore the architecture and design of the AlphaServer GS320 by Compaq, focusing on overcoming snooping protocol limitations and directory structure inefficiencies for mid-range multiprocessors, with a detailed overview and solutions for reducing latency and improving memory consistency.



Presentation Transcript


  1. Architecture and Design of the AlphaServer GS320
     Gharachorloo, et al. (Compaq)
     Presented by Curt Harting
     http://h18002.www1.hp.com/alphaserver/gs320/

  2. Motivation
     • Make money: server revenue at the time was in 4 to 64 processor systems
     • Snooping protocols work really well on small systems (<8 processors) but don’t scale well
     • Directory structures at the time were made for large (>64 processors) systems, but are too slow for mid-range multiprocessors

  3. The problems
     • Snooping
       • Limited by bandwidth
       • Too much for each controller to do per cycle
     • Directories
       • Long latency
       • Too much glue (Amdahl’s Law)

  4. Overview
     • 32 or 64 processor directory machine
     • 8 Quad-Processor Building Blocks (QBBs) connected in a crossbar
     • Each QBB has:
       • 4 processors (with external L2)
       • 4 memory modules
       • 1 I/O interface
       • 1 Global Port
       • DTAG (duplicate tag store)
       • DIR (directory, 14 bits per line)
       • TTT (transactions-in-transit table)
     • 4 request types: read, read-exclusive (readX), exclusive (X), and exclusive without data
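
The composition on the Overview slide can be written down as a small data model. This is a minimal sketch, not a description of the hardware's actual organization: the names `QBB`, `Machine`, and `RequestType` are hypothetical, and only the counts and the request-type labels come from the slide.

```python
from dataclasses import dataclass, field
from enum import Enum


class RequestType(Enum):
    # The four request types listed on the slide (member names are illustrative).
    READ = "read"
    READ_X = "readX"
    X = "X"
    X_NO_DATA = "X without data"


@dataclass
class QBB:
    # Per-QBB resource counts taken from the slide.
    processors: int = 4
    memory_modules: int = 4
    io_interfaces: int = 1
    global_ports: int = 1


@dataclass
class Machine:
    # 8 QBBs connected by a crossbar, per the slide.
    qbbs: list = field(default_factory=lambda: [QBB() for _ in range(8)])

    def total_processors(self) -> int:
        return sum(q.processors for q in self.qbbs)


machine = Machine()
print(machine.total_processors())  # 8 QBBs x 4 processors = 32
```

With 8 QBBs of 4 processors each, the model gives the 32-processor configuration named on the slide.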

  5. Reducing Latency
     • No waiting for invalidated copies to ACK on a GETX
     • No NACKing
     • Directory updates state as soon as the request arrives
     • Dirty sharing
     • NUMA
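
The "no NACKing" and "directory updates state as soon as the request arrives" points can be illustrated with a toy home directory. This is a minimal sketch assuming a single owner per line and only read-exclusive requests. The `HomeDirectory` class, its method name, and the message tuples are hypothetical and are not taken from the paper.

```python
class HomeDirectory:
    """Toy home node: every read-exclusive is forwarded, never NACKed."""

    def __init__(self):
        self.owner = {}  # line address -> node the directory believes owns it

    def read_exclusive(self, addr, requester):
        previous = self.owner.get(addr, "memory")
        # The directory state changes immediately, before the previous
        # owner has responded or even received its own data.
        self.owner[addr] = requester
        if previous == "memory":
            return ("reply_from_memory", addr, requester)
        # Forward to the node recorded as owner instead of rejecting
        # the request and making the requester retry.
        return ("forward", addr, previous, requester)


home = HomeDirectory()
first = home.read_exclusive(0x40, "A")   # served from memory
second = home.read_exclusive(0x40, "B")  # forwarded to A
third = home.read_exclusive(0x40, "C")   # forwarded to B
```

Because B's request is forwarded to A rather than retried, A can receive it before its own data has arrived. The Caveats slide calls this the early request race.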

  6. The Three Lane Information Super-Highway
     • Information is passed on three virtual lanes (and an I/O lane).
     • Q0: Carries a message from a processor to the block’s home
       • Point-to-point ordering must occur
     • Q1: Carries messages from the home
       • Point of serialization! Must have total order
     • Q2: Replies/data
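
What total ordering on Q1 means can be sketched with a toy switch. The sketch assumes, purely for illustration, that every Q1 message passes through one serialization point that stamps it with a sequence number before delivery. `ToySwitch`, `send_q1`, and the message strings are hypothetical names, not part of the GS320 design.

```python
from collections import defaultdict


class ToySwitch:
    """Delivers Q1 messages to all destinations in one global order."""

    def __init__(self):
        self.q1_log = []                    # single serialization point
        self.delivered = defaultdict(list)  # node -> [(seq, msg), ...] as seen

    def send_q1(self, msg, destinations):
        seq = len(self.q1_log)  # position in the global order
        self.q1_log.append(msg)
        for node in destinations:
            self.delivered[node].append((seq, msg))


def seen_in_global_order(switch, node):
    # True if this node observed its Q1 messages in increasing sequence order.
    seqs = [seq for seq, _ in switch.delivered[node]]
    return seqs == sorted(seqs)


switch = ToySwitch()
switch.send_q1("inval 0x40", ["A", "B"])
switch.send_q1("forward readX 0x80", ["B", "C"])
switch.send_q1("inval 0xC0", ["A", "C"])
```

Any two nodes that both receive the same pair of Q1 messages see them in the same relative order, which is the property the slide labels "total order". Point-to-point ordering on Q0 is weaker: it only constrains messages between one sender and one receiver.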

  7. An example
     Reproduction of Figure 2d

  8. Caveats
     • Early request race: a request gets to the owner before the data does
       • Solution: Stall the Q1 lane until the data arrives
     • Late request race: a request for data arrives after a writeback operation
       • Solution: Buffer the victim until a writeback ACK is received
     • Intra-node transactions: check TTT, possible loop through global
     • Markers: used to preserve global order
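
The two race solutions above can be sketched as a toy owner node. This is a minimal sketch under simplifying assumptions: it stalls only the individual forwarded request rather than the whole Q1 lane, and it does not model invalidation after a reply. `OwnerNode` and its method names are hypothetical.

```python
class OwnerNode:
    """Toy cache node handling forwarded requests for lines it owns."""

    def __init__(self):
        self.cache = {}            # addr -> data currently held
        self.pending_fill = set()  # addrs requested, data not yet arrived
        self.victim_buffer = {}    # addr -> data written back, awaiting ACK
        self.stalled = []          # (addr, requester) waiting on a fill

    def request_line(self, addr):
        self.pending_fill.add(addr)

    def data_arrives(self, addr, data):
        self.pending_fill.discard(addr)
        self.cache[addr] = data
        # Release forwarded requests that were stalled on this line.
        ready = [r for r in self.stalled if r[0] == addr]
        self.stalled = [r for r in self.stalled if r[0] != addr]
        return [(requester, data) for _, requester in ready]

    def forwarded_request(self, addr, requester):
        if addr in self.pending_fill:
            # Early request race: stall until the data arrives.
            self.stalled.append((addr, requester))
            return None
        if addr in self.cache:
            return (requester, self.cache[addr])
        if addr in self.victim_buffer:
            # Late request race: already written back, copy still buffered.
            return (requester, self.victim_buffer[addr])
        raise KeyError(addr)

    def write_back(self, addr):
        # Keep the victim until the home acknowledges the writeback.
        self.victim_buffer[addr] = self.cache.pop(addr)

    def writeback_ack(self, addr):
        del self.victim_buffer[addr]


node = OwnerNode()
node.request_line(0x40)
early = node.forwarded_request(0x40, "B")  # stalled: data not here yet
replies = node.data_arrives(0x40, "v1")    # releases B's request
node.write_back(0x40)
late = node.forwarded_request(0x40, "C")   # served from the victim buffer
node.writeback_ack(0x40)
```

In both cases the owner can eventually answer the forwarded request itself, so the home never has to NACK it.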

  9. Memory Consistency
     • A quick, very high-level overview:
       • Separation of data and requests
       • The previously atomic response has been split into two parts: the commit and the data
       • Lots of regulations on what can go when (still)

  10. Questions
      • The total ordering of the Q1 lane “comes naturally in a crossbar switch”?
      • The GS320 is said to be expandable to 64 processors, but the system detailed in the paper is tailored to 32 processors. How easily can it be expanded?
      • Addressing has been a major issue in other papers, but it is not discussed in this one. Why?
