Revoke / Incarnation #s / Matching
• Discussion around how to reclaim context IDs (resources that are part of message matching) after an MPI_Comm_revoke
• Basic problem: revoke is one-sided and can be called by multiple processes in the communicator
  • There is a race between the call to revoke and the point at which all correct processes have updated their local state to revoked
  • Need to ensure that all processes have revoked the communicator before the context ID can be reused
• Scenario:
  • Communicator with correct processes A, B, and C is revoked
  • A and B free the revoked communicator and create a new communicator using the old context ID
  • C calls revoke on the old communicator -- what happens at A and B?
  • OR -- C sends a message to A or B, which has posted an ANY_SOURCE receive -- does it match?
• Several solutions were discussed:
  • Incarnation number -- An additional number on each context ID that becomes part of the matching
  • Group guards -- Check incoming messages to ensure that the sender is in the group of the communicator
  • Fault tolerant MPI_Comm_free/create -- Enhance the create/free algorithms to quiesce context IDs before they are reused
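The matching question in the scenario above can be illustrated with a small toy model. This is only a sketch, not MPI code or any implementation's data structures: the `Comm` and `Message` classes, the field names, the context ID value 7, and the three `matches_*` functions are all hypothetical. It also assumes that the new communicator created by A and B does not include C, which the scenario does not state explicitly; the group guard only helps under that assumption.

```python
from dataclasses import dataclass

ANY_SOURCE = "ANY"  # stand-in for MPI_ANY_SOURCE in this toy model


@dataclass(frozen=True)
class Comm:
    """Local view of a communicator (hypothetical model, not an MPI structure)."""
    context_id: int
    incarnation: int   # assumed to be bumped each time the context ID is reused
    group: frozenset   # processes named by letter, as in the scenario


@dataclass(frozen=True)
class Message:
    """Envelope fields carried by an incoming message in this model."""
    context_id: int
    incarnation: int
    sender: str


def matches_context_only(comm, msg, source=ANY_SOURCE):
    """Baseline: match on context ID and requested source alone."""
    return msg.context_id == comm.context_id and source in (ANY_SOURCE, msg.sender)


def matches_with_incarnation(comm, msg, source=ANY_SOURCE):
    """Incarnation number: the incarnation becomes part of the match."""
    return matches_context_only(comm, msg, source) and msg.incarnation == comm.incarnation


def matches_with_group_guard(comm, msg, source=ANY_SOURCE):
    """Group guard: reject messages whose sender is not in the communicator's group."""
    return matches_context_only(comm, msg, source) and msg.sender in comm.group


# Scenario: A and B reuse context ID 7 for a new communicator (assumed to exclude C),
# while C still sends on the old, revoked communicator.
old_comm = Comm(context_id=7, incarnation=0, group=frozenset("ABC"))
new_comm = Comm(context_id=7, incarnation=1, group=frozenset("AB"))
stale = Message(context_id=7, incarnation=0, sender="C")   # sent by C on old_comm
fresh = Message(context_id=7, incarnation=1, sender="B")   # sent by B on new_comm
```

In this model, `matches_context_only(new_comm, stale)` is true, which is the erroneous match the slide worries about, while both `matches_with_incarnation` and `matches_with_group_guard` reject `stale` and still accept `fresh`. The third option on the slide, quiescing context IDs before reuse, avoids the stale message altogether and is not modeled here.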
RMA Semantics
• Pavan raised a concern about the definition of RMA window memory in the context of shared memory windows
• It may be impossible to guarantee that only the locations that were updated in the window are invalid
• Suggested weakening the semantics so that the entire window is undefined
• Requires further discussion
Shared Memory
• What happens if a process with shared memory goes down and another process has posted messages using its shared memory?
• Yes, this is an implementation issue, but is it possible to do anything?