
Advancing the “Persistent” Working Group

Advancing the “Persistent” Working Group. MPI-4. Tony Skjellum, tony@cis.uab.edu. June 5, 2013. Persistence goals: cross-cutting use of “planned transfers” (point to point, collective, generalized requests); allow the program to express locality and static use of resources.



Presentation Transcript


  1. Advancing the “Persistent” Working Group. MPI-4. Tony Skjellum, tony@cis.uab.edu. June 5, 2013

  2. Persistence Goals
     • Cross-cutting use of “planned transfers”
         • Point to point
         • Collective
         • Generalized requests
     • Allow program to express
         • Locality, static use of resources
         • Connectivity of sends/receives
     • Purpose: improve speed and scalability for regular, static communication

  3. Concepts review
     • Drew analogies from MPI/RT
     • Point-to-point channels vs. half-channels
     • Slack-1: “bind a send” to “a receive”
     • Goal: cut the number of instructions in the critical path
     • Reuses “Send_init” and “Recv_init”
     • Open issues
         • Multiple slack channels
         • Rebinding when some arguments change

  4. Concepts review
     • For every non-blocking collective, we can define a “persistent non-blocking collective”
     • The approach in MPI/RT was to have collective channels as well as point-to-point channels
     • Again, the goal is to allow for static resource management (memory, network, etc.), algorithm selection in advance, and single-mode performance of operations

  5. Additional: Differentiated Service
     • Each communicator is a separate “MPI fabric”, or independently ordered overlay network, that is all-to-all
     • Allow communicators to offer differentiated service
         • No tag matching
         • No wildcards
         • Guarantee of posting (F mode), S mode …
         • Strong, weak, or no progress
         • Independent buffering rules and even short/long cutoffs
         • Degree of correctness (e.g., OK to make data errors vs. MPI-1 compliant “all correct”)
         • QoS concepts could be added as well
         • Subset of MPI that is usable
     • Goal: allow each communicator fabric to operate at optimal performance and scalability, and even to make error/fault choices to support FT where appropriate
     • Use MPI_Comm_dup and MPI_Comm_create as prototypes for generating such functions

  6. Steps that were for March (doing in part now in June)
     • Revive proposals concerning the pt2pt-specific calls and the general framework for collectives
     • See if anyone else is interested in helping on the committee
     • Get a straw indication on the differentiated service concept
     • Review the status of generalized requests and leftover ideas in MPI-3; see if persistent generalized requests are useful
     • Entertain any other ideas about “making” MPI pt2pt and collective faster by using planned transfers

  7. May 2011 Document
     • We have a 12-page document from 2011 that we held back while MPI-3 was finishing
     • It contains basic proposals
     • We’d like to refine these and update the May 2011 document after this working group meeting today
     • We’d like to flesh out any concerns not yet fully understood
     • So, first a couple of slides, then we go through the document and capture action items, ideas, and next steps!

  8. Getting back to a specific pt2pt proposal (#1)
     • A very small number of instructions to do a persistent send/receive pair
     • MPI_Send_init()
     • MPI_Recv_init()
     • They are not enough, because they don’t lock down all the resources (they are “half” channels)
     • Motivation: the “Darius Buntinas measurement” of 250+ instructions in shared memory; this must be reduced

  9. Requirements
     • The minimum number of instructions in the critical path to launch a send for the “Nth” time after “setup”
     • Arguments frozen after first time
         • All
         • Most (except starting address, length)
     • Want the “Start” to track right to “kick DMA” or “kick transfer offload engine”… 1 OS bypass / application bypass instruction
     • Makes regular programs finer grain in overhead and latency
     • Provides its own communication space/order/matching (outside of communicator)
     • Use all the features you can use in an MPI_Isend; simple datatypes may be faster in practice
     • Easy changes to existing MPI programs that are already static/data parallel (like 2D Life)

  10. Where we got into the weeds before
     • How do we create the channel (1 is easy, what about several)?
     • How many messages in flight? At least 1.
     • How do we avoid extra signaling that makes this like stop-and-wait at the LLC layer? More than 1 to fill the virtual channel.
     • Let’s go through the document to refresh and decide next steps…
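
The stop-and-wait concern on slide 10 can be illustrated with a simple back-of-the-envelope model. This model is an assumption, not something from the slides: each message takes `t_msg` to inject, and a slot becomes reusable `rtt` after its injection finishes. Both function names are hypothetical.

```c
/* Fraction of time the link is busy when at most `window` messages may be
 * outstanding. Assumed model: t_msg (> 0) is the time to inject one message,
 * rtt (>= 0) is the delay from end of injection until the slot is free again. */
double link_utilization(int window, double t_msg, double rtt)
{
    double u = (window * t_msg) / (t_msg + rtt);
    return u > 1.0 ? 1.0 : u;   /* the link cannot be more than fully busy */
}

/* Smallest window that keeps the link fully busy under the same model,
 * i.e. the smallest w with w * t_msg >= t_msg + rtt. Assumes t_msg > 0. */
int min_window_to_fill(double t_msg, double rtt)
{
    int w = 1;
    while (w * t_msg < t_msg + rtt)
        ++w;
    return w;
}
```

With `t_msg = 1` and `rtt = 3`, a single message in flight keeps the link busy only a quarter of the time, and four messages in flight are needed to fill it. This is why the slide answers “more than 1”.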

  11. Q&A
