URPC for Shared Memory Multiprocessors

Brian Bershad, Thomas E. Anderson, Edward D. Lazowska, and Henry M. Levy. ACM TOCS 9 (2), May 1991

IPC Performance
• Efficient IPC is central to OS design: it encourages decomposing systems across address spaces (AS)
  – Failure isolation
  – Extensibility
  – Modularity
• But performance determines its usability

Kernel-based IPC: problems
• Architectural performance barriers
  – Costs of invoking the kernel and reallocating processors (70% of the overhead in LRPC)
• Interaction between kernel-based communication and high-performance user-level thread management

URPC for shared-memory (SM) multiprocessors
• Solution: eliminate the kernel from the communication path
  – Use shared memory for data transfer
  – Take advantage of processors already in the AS
• Advantages
  – Messages are sent between AS without invoking the kernel
  – Unnecessary processor reallocation is avoided
  – When reallocation is necessary, its cost can be amortized over several calls
• Only processor reallocation requires kernel invocation (contrast with microkernels!)
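To make "use shared memory for data transfer" concrete, here is a minimal sketch of one direction of a message channel. It is a single-process stand-in, not the paper's implementation: the `Channel` class and its method names are hypothetical, and a non-blocking `threading.Lock` acquire is assumed as a rough analogue of a test-and-set lock that never spins.

```python
import threading
from collections import deque


class Channel:
    """One direction of a message channel in memory visible to both sides.

    Hypothetical illustration: a real channel would sit in pages mapped
    into both address spaces.
    """

    def __init__(self):
        self._queue = deque()
        self._lock = threading.Lock()

    def try_send(self, msg):
        """Enqueue msg and return True, or return False if the lock is busy."""
        # Non-blocking acquire stands in for test-and-set: the caller never
        # waits here, so it stays free to run another thread instead.
        if not self._lock.acquire(blocking=False):
            return False
        try:
            self._queue.append(msg)
            return True
        finally:
            self._lock.release()

    def try_receive(self):
        """Return (True, msg) if a message was dequeued, else (False, None)."""
        if not self._lock.acquire(blocking=False):
            return (False, None)
        try:
            if self._queue:
                return (True, self._queue.popleft())
            return (False, None)
        finally:
            self._lock.release()
```

Neither operation traps into a kernel in this model; both sides only read and write the shared queue, which is the point the slide is making.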

RPC idea and definition
• Application/OS-service communication: messages vs. procedure calls → RPC
• RPC: synchronous language-level transfer of control between programs in disjoint AS whose primary communication mechanism is a narrow channel
• The definition says nothing about
  – How the narrow channel operates
  – How the processor scheduling mechanism interacts with data transfer

URPC
• Messages are exchanged between AS using shared memory
• User-level thread management is integrated with user-level message channel management
• When a thread in a client invokes a procedure in a server
  – The thread blocks
  – The processor serves another thread in the same AS
  – … same on the server side
• The user's view is unchanged
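The blocking behaviour above can be sketched with Python generators standing in for user-level threads. Everything here is an assumption made for illustration: the function names, the `"ping"` payload, and the single round-robin loop, which plays the role of the processors in both the client and the server address space.

```python
from collections import deque


def client_thread(name, requests, replies, log):
    """A user-level thread: send a request, then yield until the reply arrives."""
    requests.append((name, "ping"))
    log.append(f"{name}: call sent, blocking")
    while name not in replies:
        yield  # hand the processor to another thread in this address space
    log.append(f"{name}: got {replies.pop(name)}")


def server_thread(requests, replies, expected):
    """Server-side thread: poll the request queue and post replies."""
    served = 0
    while served < expected:
        if requests:
            caller, payload = requests.popleft()
            replies[caller] = payload.upper()
            served += 1
        yield


def run(threads):
    """Round-robin user-level scheduler; no kernel involvement is modeled."""
    ready = deque(threads)
    while ready:
        thread = ready.popleft()
        try:
            next(thread)
            ready.append(thread)
        except StopIteration:
            pass  # thread finished; drop it from the ready queue


def demo():
    requests, replies, log = deque(), {}, []
    run([
        client_thread("T1", requests, replies, log),
        client_thread("T2", requests, replies, log),
        server_thread(requests, replies, expected=2),
    ])
    return log
```

Calling `demo()` shows T2 issuing its call while T1 is still blocked: the caller sees an ordinary synchronous call, but the processor is never left idle waiting for a reply.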

Processor reallocation & context switching
• Context switching: switching a processor between threads in the same AS (15 μsec)
• Processor reallocation: allocating a processor to a thread in another AS (55 μsec, excluding long-term costs)
• Costs of reallocation
  – Changing the mapping registers that define the virtual AS (immediate)
  – Deciding which AS gets the processor
  – Diminished benefit from the cache and TLB (long-term)
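A back-of-the-envelope model shows why avoiding reallocation matters. It takes the slide's two figures as 15 and 55 microseconds and assumes, purely for illustration, that every blocking call costs exactly one switch of one kind or the other; the function name and the `realloc_fraction` parameter are hypothetical, and long-term cache and TLB effects are ignored.

```python
CONTEXT_SWITCH_US = 15  # same-AS context switch, figure from the slide
REALLOCATION_US = 55    # cross-AS processor reallocation, short-term cost only


def switching_cost_us(blocking_calls, realloc_fraction):
    """Total switching time if a given fraction of blocked calls forces a reallocation."""
    if not 0.0 <= realloc_fraction <= 1.0:
        raise ValueError("realloc_fraction must be between 0 and 1")
    reallocs = blocking_calls * realloc_fraction
    switches = blocking_calls - reallocs
    return switches * CONTEXT_SWITCH_US + reallocs * REALLOCATION_US
```

Under these assumptions, 100 blocking calls cost 1500 μsec if none needs a reallocation and 5500 μsec if all of them do, a ratio of 55/15, or roughly 3.7.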

Processor reallocation
• Sometimes necessary
• Underpowered AS: an AS with pending incoming messages
• A processor balances load by reallocating itself
• Detecting incoming messages and scheduling threads is done by low-level thread management in URPC; a processor scans for incoming messages only when idle
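The idle-time scan can be sketched as follows. This follows the slide's definition of an underpowered AS (one with pending incoming messages); the `AddressSpace` record, the function names, and the "most pending messages wins" tie-break are all assumptions for illustration, not the policy of the actual system.

```python
from dataclasses import dataclass, field


@dataclass
class AddressSpace:
    name: str
    processors: int = 0
    pending: list = field(default_factory=list)  # undelivered incoming messages


def find_underpowered(spaces):
    """Return the address space with the most pending messages, or None."""
    candidates = [s for s in spaces if s.pending]
    if not candidates:
        return None
    return max(candidates, key=lambda s: len(s.pending))


def idle_step(current, spaces):
    """Run only when the processor has no runnable thread in `current`."""
    target = find_underpowered([s for s in spaces if s is not current])
    if target is None:
        return current  # nothing is waiting elsewhere; stay put
    # Moving the processor is the one step that would enter the kernel.
    current.processors -= 1
    target.processors += 1
    return target
```

Because the scan runs only when the processor is idle, a busy address space pays nothing for it; the cost of looking for underpowered address spaces is taken from time that would otherwise be wasted.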

Example
[Figure: timeline of Editor threads T1 and T2. T1 calls WinMgr, T2 calls FCMgr, and T1 calls FCMgr (each call is a send/recv pair followed by "recv & process reply"). The Editor context-switches between T1 and T2 while calls are outstanding, processor reallocations occur between address spaces, and final context switches terminate T2 and T1. Columns: Editor, WinMgr, FCMgr. Axis: Time.]

Design rationale
• Data transfer
  – Shared-memory message channels give the same safety guarantees as kernel-based channels
• Processor reallocation: optimistic
  – The client has other work to do (another thread or incoming messages)
  – The server has, or soon will have, a processor to use
  – When the assumption is wrong: processor reallocation

Design rationale: threads
• High-performance thread management is necessary for fine-grained parallel programs
• This can only be done at user level
• Because threads and communication interact closely, communication also has to be done at user level
• Heavyweight / middleweight / lightweight threads
• Lightweight threads → user-level communication

Performance evaluation
