
BigBen @ PSC




  1. BigBen @ PSC

  2. BigBen @ PSC

  3. BigBen @ PSC

  4. BigBen Features
  • Compute Nodes
    • 2068 nodes running the Catamount (QK) microkernel
    • Seastar interconnect in a 3-D torus configuration
    • No external connectivity (no TCP)
    • All inter-node communication is over Portals
    • Applications use MPI, which is based on Portals
  • Service & I/O (SIO) Nodes
    • 22 nodes running SuSE Linux
    • Also on the Seastar interconnect
    • SIO nodes can have PCI-X hardware installed, defining unique roles for each
    • 2 SIO nodes are externally connected to ETF with 10GigE cards (currently)
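
The slides only state that applications use MPI layered on Portals; as a point of reference (not taken from the presentation), a minimal MPI program for the compute nodes looks like this generic sketch:

    /* Minimal MPI sketch: illustrative only -- the slides only say that
     * applications use MPI, which is implemented on top of Portals. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);                 /* MPI maps onto Portals on the XT3 */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which process in the job am I    */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many processes in the job    */
        printf("rank %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }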

  5. Portals Direct I/O (PDIO) Details
  • Portals-to-TCP routing
    • PDIO daemons aggregate hundreds of Portals data streams into a configurable number of outgoing TCP streams
    • Heterogeneous Portals (both QK and Linux nodes)
  • Explicit parallelism
    • Configurable # of Portals receivers (on SIO nodes)
    • Distributed across multiple 10GigE-connected Service & I/O (SIO) nodes
    • Corresponding # of TCP streams (to the WAN), one per PDIO daemon
  • A parallel TCP receiver in the Goodhue booth
    • Supports a variable/dynamic number of connections
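
The slides describe the aggregation idea but show no code. The following is a rough, hypothetical sketch of the multiplexing step a pdiod-like daemon performs: many incoming streams are framed with a (stream id, length) header and funneled onto one outgoing stream. Pipes stand in for the Portals receive side and stdout for the single TCP stream; the actual Portals API and PDIO wire format are not reproduced here.

    /* Hypothetical aggregation sketch: multiplex several incoming streams
     * onto one outgoing stream, framing each chunk so the far end can demux. */
    #include <poll.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <arpa/inet.h>

    #define NSTREAMS 4          /* stand-in for "hundreds" of Portals streams */
    #define CHUNK 4096

    /* Frame one chunk with (stream id, length) so the receiver can demux. */
    static void forward(int out_fd, uint32_t id, const char *buf, uint32_t len)
    {
        uint32_t hdr[2] = { htonl(id), htonl(len) };
        write(out_fd, hdr, sizeof hdr);
        write(out_fd, buf, len);
    }

    int main(void)
    {
        struct pollfd pfd[NSTREAMS];
        int wr[NSTREAMS];

        /* Stand-ins for the incoming Portals data streams. */
        for (int i = 0; i < NSTREAMS; i++) {
            int p[2];
            pipe(p);
            pfd[i].fd = p[0];
            pfd[i].events = POLLIN;
            wr[i] = p[1];
        }

        /* Demo input: each "compute process" deposits one message. */
        for (int i = 0; i < NSTREAMS; i++) {
            char msg[64];
            int n = snprintf(msg, sizeof msg, "data from stream %d\n", i);
            write(wr[i], msg, (size_t)n);
            close(wr[i]);
        }

        /* Aggregation loop: whatever is ready goes out on the single stream. */
        int open_streams = NSTREAMS;
        char buf[CHUNK];
        while (open_streams > 0) {
            poll(pfd, NSTREAMS, -1);
            for (int i = 0; i < NSTREAMS; i++) {
                if (!(pfd[i].revents & (POLLIN | POLLHUP)))
                    continue;
                ssize_t n = read(pfd[i].fd, buf, sizeof buf);
                if (n > 0) {
                    forward(STDOUT_FILENO, (uint32_t)i, buf, (uint32_t)n);
                } else {                /* this stream is finished */
                    close(pfd[i].fd);
                    pfd[i].fd = -1;     /* poll() ignores negative fds */
                    open_streams--;
                }
            }
        }
        return 0;
    }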

  6. Portals Direct I/O (PDIO) Details
  • Utilizing the ETF network
    • 10GigE end-to-end
    • Benchmarked >1 Gbps in testing
  • Inherent flow-control feedback to the application
    • The aggregation protocol allows TCP transmission, or even remote file system performance, to throttle the data streams coming out of the application (!)
  • Variable message sizes and file metadata supported
  • Multi-threaded ring buffer in the PDIO daemon
    • Allows the Portals receiver, TCP sender, and computation to proceed asynchronously
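
A bounded ring buffer between the Portals receiver and the TCP sender is also where the flow-control feedback comes from: when the outgoing side slows down, the buffer fills and the producer blocks. The sketch below illustrates that idea with POSIX threads; the names, sizes, and end-of-stream convention are assumptions, not the actual PDIO implementation.

    /* Bounded ring buffer sketch: the Portals-receiver thread deposits chunks,
     * the TCP-sender thread drains them, and a full ring blocks the producer,
     * which is the back-pressure that ultimately throttles the writers. */
    #include <pthread.h>
    #include <stdio.h>
    #include <string.h>

    #define SLOTS 64                    /* ring length bounds daemon memory use */
    #define SLOT_BYTES 65536

    struct slot { size_t len; char data[SLOT_BYTES]; };

    static struct slot ring[SLOTS];
    static int head, tail, count;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  not_full  = PTHREAD_COND_INITIALIZER;
    static pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;

    /* Called by the Portals-receiver thread; blocks when the ring is full. */
    static void ring_put(const char *buf, size_t len)
    {
        pthread_mutex_lock(&lock);
        while (count == SLOTS)          /* flow-control feedback point */
            pthread_cond_wait(&not_full, &lock);
        ring[head].len = len;
        memcpy(ring[head].data, buf, len);
        head = (head + 1) % SLOTS;
        count++;
        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&lock);
    }

    /* Called by the TCP-sender thread; blocks when the ring is empty. */
    static size_t ring_get(char *buf)
    {
        pthread_mutex_lock(&lock);
        while (count == 0)
            pthread_cond_wait(&not_empty, &lock);
        size_t len = ring[tail].len;
        memcpy(buf, ring[tail].data, len);
        tail = (tail + 1) % SLOTS;
        count--;
        pthread_cond_signal(&not_full);
        pthread_mutex_unlock(&lock);
        return len;
    }

    /* Demo producer standing in for the Portals receiver. */
    static void *producer(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 100; i++) {
            char msg[32];
            int n = snprintf(msg, sizeof msg, "chunk %d", i);
            ring_put(msg, (size_t)n + 1);
        }
        ring_put("", 0);                /* zero-length chunk = end marker */
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        pthread_create(&t, NULL, producer, NULL);
        char buf[SLOT_BYTES];
        for (;;) {                      /* stand-in for the TCP sender */
            size_t n = ring_get(buf);
            if (n == 0) break;
            /* write(tcp_fd, buf, n) would go here in the real daemon */
        }
        pthread_join(t, NULL);
        return 0;
    }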

  7. Portals Direct I/O (PDIO) Config
  • User-configurable/tunable parameters:
    • Network targets: can be different for each job
    • Number of streams: can be tuned for optimal host/network utilization
    • TCP network buffer size: can be tuned for maximum throughput over the WAN
    • Ring buffer size/length: controls total memory utilization of the PDIO daemons
    • Number of Portals writers: can be any subset of the running application's processes
    • Remote filename(s): file metadata are propagated through the full chain, per write
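
The presentation lists the tunables but not their concrete syntax. Purely as an illustration, the same parameter set could be collected in a structure like the following; every field name and value here is hypothetical.

    /* Hypothetical parameter block for a PDIO run -- field names and values
     * are illustrative only; the slides list the tunables but not the syntax. */
    #include <stddef.h>

    struct pdio_config {
        const char *targets[4];      /* network targets: can differ per job          */
        int         num_streams;     /* # of TCP streams / PDIO daemons              */
        size_t      tcp_buf_bytes;   /* TCP socket buffer, tuned for WAN throughput  */
        size_t      ring_slots;      /* ring buffer length: caps daemon memory use   */
        int         num_writers;     /* Portals writers: subset of application ranks */
        const char *remote_name;     /* remote filename(s); metadata sent per write  */
    };

    static const struct pdio_config example = {
        .targets       = { "recv1.example.org", "recv2.example.org" },
        .num_streams   = 4,
        .tcp_buf_bytes = 4 * 1024 * 1024,
        .ring_slots    = 64,
        .num_writers   = 256,
        .remote_name   = "ppm_frame_%05d.dat",
    };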

  8. HPC resource and renderer waiting… [Diagram: Compute Nodes, Steering, I/O Nodes, ETF network, PSC, iGRID]

  9. Launch PPM job, PDIO daemons, and iGRID receivers [Same diagram, with pdiod daemons on the I/O nodes and recv processes at iGRID]

  10. Aggregate data via Portals [Same diagram as slide 9]

  11. Route traffic to ETF net [Same diagram as slide 9]

  12. Recv data @ iGRID [Same diagram as slide 9]
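
The receiving side at iGRID is not shown in the slides. As a counterpart to the aggregation sketch above, one connection of a parallel TCP receiver might unpack the same hypothetical (stream id, length) framing and append each chunk to a per-stream file; the real PDIO receiver also handles the propagated file metadata.

    /* Hypothetical receiver loop for one incoming TCP connection at the
     * remote (iGRID) end, using the illustrative framing from the earlier
     * aggregation sketch.  Not the actual PDIO wire format. */
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <arpa/inet.h>

    /* Read exactly len bytes from fd (TCP may deliver short reads). */
    static int read_full(int fd, void *buf, size_t len)
    {
        char *p = buf;
        while (len > 0) {
            ssize_t n = read(fd, p, len);
            if (n <= 0) return -1;
            p += n;
            len -= (size_t)n;
        }
        return 0;
    }

    /* Drain one connection: demux chunks by stream id into per-stream files. */
    int receive_stream(int conn_fd)
    {
        uint32_t hdr[2];
        char buf[65536];
        while (read_full(conn_fd, hdr, sizeof hdr) == 0) {
            uint32_t id  = ntohl(hdr[0]);
            uint32_t len = ntohl(hdr[1]);
            if (len > sizeof buf || read_full(conn_fd, buf, len) != 0)
                return -1;
            char name[64];
            snprintf(name, sizeof name, "stream_%u.dat", id);
            FILE *f = fopen(name, "ab");   /* append chunk to that stream's file */
            if (!f) return -1;
            fwrite(buf, 1, len, f);
            fclose(f);
        }
        return 0;
    }

A driver that accepts a variable, dynamic number of connections (as on slide 5) would call receive_stream once per accepted socket, one thread or process per connection.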

  13. Render real-time data [Same diagram, with a render step at iGRID]

  14. Send steering data back to active job [Same diagram, with render and a steering input path back from iGRID]

  15. Dynamically update rendering [Same diagram, with render and steering input at iGRID]
