1 / 8

Ideas on physical deployment of coprocessor-equipped servers

Ideas on physical deployment of coprocessor-equipped servers. Rainer Schwemmer , Niko Neufeld Many-core architectures for LHCb Wednesday, April 25 th 2012 https://indico.cern.ch/conferenceDisplay.py?confId=184092. The dataflow for the upgraded DAQ. Detector.

meara
Download Presentation

Ideas on physical deployment of coprocessor-equipped servers

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Ideas on physical deployment of coprocessor-equipped servers Rainer Schwemmer, Niko Neufeld Many-core architectures for LHCb Wednesday, April 25th 2012 https://indico.cern.ch/conferenceDisplay.py?confId=184092

  2. The dataflow for the upgraded DAQ Detector • GBT: custom radiation- hard link over MMF, 3.2 Gbit/s (about 10000) • Input into DAQ network (10/40 Gigabit Ethernet or FDR IB) (1000 to 4000) • Output from DAQ network into compute unit clusters (100 Gbit Ethernet / EDR IB) (200 to 400 links) Readout Units 100 m rock DAQ network Compute Units Deployment of co-processor cards in LHCb - 25/4/12 - Rainer Schwemmer

  3. The compute unit • Receives event-fragments and assembles complete events (actually multiple events simultaneously) • No separate event-builder PC forseen • Runs the selection algorithm • Using some rough back-of-the-envelope estimates and Moore’s law about 1600 servers (dual-socket) of the 2017 will be needed for the baseline upgrade event-filter (10 MHz) • Each server needs to absorb about 8-9 Gigabit (1GB) of data per second (depends on event-size) Deployment of co-processor cards in LHCb - 25/4/12 - Rainer Schwemmer

  4. Challenges on the compute unit • The CU must absorb 8 Gbit/s • Feeding all data through a co-processor card will at least double the through-put to the system bus • Unless “snooping” is used and part of the processing can only be done on co-processor card, this will be tricky at these rates Deployment of co-processor cards in LHCb - 25/4/12 - Rainer Schwemmer

  5. The server of 2017 • It will be based on PCIe Gen 3 as the I/O system • Current generation of Intel CPU (and the next two on the road-map) offer 40 PCIe Gen 3 lanes per socket  theoretical I/O of 320 Gbit/s • Need to verify what this means in practice but should get close to 90% • Data can be DMA-ed directly to processor cache (by-passing slow main-memory) • Optimal use of off-load engines and data distribution requires some control over the data-flow (not good for blind push) Deployment of co-processor cards in LHCb - 25/4/12 - Rainer Schwemmer

  6. Two options for co-processor housing • Monolithic server (need to stay compact, because of limited space in upgrade data-center – power-density in racks need to be kept high for efficient use of container solutions) • Two examples available today: (DELL 720d, SuperMicro2027GR-TRFT) • Support between 2 and 4 GPU cards (passively cooled) 2 U  price between 5.5 and 7 kCHF (single piece, today) Deployment of co-processor cards in LHCb - 25/4/12 - Rainer Schwemmer

  7. Option 2: External PCIe chassis • Chassis with 16 PCIe slots • Includes Internal PCIe switch for assigning slots to PCs • Price N/A • Takes up additional spaceand power • Cables can not be too longTypical range: 7m • Available from multiple vendors (here shown DELL) • Big advantage: separate server and GPU housing, can use any server (no vendor-lockin) Deployment of co-processor cards in LHCb - 25/4/12 - Rainer Schwemmer

  8. Conclusion • No show-stoppers for a co-processor enhanced compute unit • BUT: • Additional constraintslike space and power consumption need to be taken into consideration • Operationally and commercially there is a strong preference for the chassis option • Proof of concept will be necessary for both options • Price Deployment of co-processor cards in LHCb - 25/4/12 - Rainer Schwemmer

More Related