User-Level Interprocess Communication for Shared Memory Multiprocessors

Bershad, B. N., Anderson, T. E., Lazowska, E. D., and Levy, H. M. Presented by Chris Eigner

Review of LRPC • RPC concept can be used within a single machine as IPC • Caller and callee are on the same machine, which leaves room for optimizations • Run client thread in context of server, avoid scheduler • Argument stacks allocated in shared memory, avoid message copying • Domain caching to reduce context-switch overhead

Problems with RPC/LRPC • Kernel mediates every cross-address-space call (70% of total overhead) • Poorly performing cross-address-space communication • Kernel-level communication + user-level thread management • Opportunity for more SMP optimizations

SMP Optimizations • No need to switch processor to another address space • Remove kernel from equation! • Address spaces share memory directly • Processor reallocation can be avoided • Preserves valuable cache/TLB contexts • Cost can be amortized over independent calls • Inexpensive thread management; orders of magnitude less than kernel-level.

URPC Responsibilities • URPC design isolates three components of IPC • Thread management • Data transfer • Processor reallocation

Thread Management • Context switch • Switching processor to another thread in same address space • Processor reallocation • Reallocating processor to a thread in a different address space • via Processor.Donate

An Example

Data Transfer • Bi-directional shared memory queue • Test-and-set locks (non-spinning) on each end • Client/server model • send, receive, start, stop
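The channel described above can be sketched in a few lines. This is a minimal illustration, not the URPC implementation: Python's `threading.Lock.acquire(blocking=False)` stands in for a test-and-set instruction, and the names `TryLock`, `MessageQueue`, `Channel`, `try_put`, and `try_get` are hypothetical. The point it shows is the non-spinning behavior: a caller that finds the lock held gets an immediate failure and can go do other work instead of waiting.

```python
import threading
from collections import deque


class TryLock:
    """Non-spinning lock: a single attempt that reports success or failure.

    Assumption: a non-blocking Lock.acquire stands in for test-and-set.
    """

    def __init__(self):
        self._lock = threading.Lock()

    def try_acquire(self):
        return self._lock.acquire(blocking=False)

    def release(self):
        self._lock.release()


class MessageQueue:
    """One direction of the channel, guarded by a TryLock."""

    def __init__(self):
        self._lock = TryLock()
        self._items = deque()

    def try_put(self, msg):
        # Fail immediately if the lock is held; the caller decides what to do.
        if not self._lock.try_acquire():
            return False
        try:
            self._items.append(msg)
            return True
        finally:
            self._lock.release()

    def try_get(self):
        # Returns (ok, msg); ok is False if the lock is busy or the queue is empty.
        if not self._lock.try_acquire():
            return (False, None)
        try:
            if self._items:
                return (True, self._items.popleft())
            return (False, None)
        finally:
            self._lock.release()


class Channel:
    """Bi-directional channel: one queue per direction."""

    def __init__(self):
        self.requests = MessageQueue()  # client -> server
        self.replies = MessageQueue()   # server -> client


# Usage: the client sends a request, the server answers on the reply queue.
channel = Channel()
sent = channel.requests.try_put(("add", 2, 3))
ok, request = channel.requests.try_get()
op, a, b = request
channel.replies.try_put(a + b)
got_reply, reply = channel.replies.try_get()
```

In a real shared-memory setting the queues would live in memory mapped into both address spaces; here both ends simply share one Python object.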

Processor Reallocation • URPC makes certain assumptions to reduce processor reallocation • Client has other threads to run or incoming messages • Server has or will have a processor to service message • Allows inexpensive context switch during blocking phase of cross-address call • Enables parallel execution of URPC while avoiding processor reallocation
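A simplified sketch of the decision a client's processor faces after a call is sent may help here. It is an assumption-laden model rather than the paper's code: `urpc_call`, its parameters, and the returned action names are hypothetical. It shows the preference order implied above: enqueue the message without a kernel trap, switch to another ready thread if there is one, and fall back to processor reallocation only when the client has nothing to run and the server has no processor.

```python
from collections import deque


def urpc_call(channel, ready_threads, caller, message, server_has_processor):
    """Return the action taken by the client's processor after sending.

    Hypothetical model: `channel` is the shared-memory request queue and
    `ready_threads` holds the client's other runnable threads.
    """
    # Enqueue the request in shared memory; the calling thread now blocks.
    channel.append((caller, message))

    # Cheap case: run another thread in the same address space.
    if ready_threads:
        return ("context_switch", ready_threads.popleft())

    # Expensive case: the server cannot run without a processor.
    if not server_has_processor:
        return ("reallocate", "server")

    # Nothing to run locally and the server is already being served.
    return ("wait_for_reply", None)


channel = deque()
ready = deque(["T2"])
first = urpc_call(channel, ready, "T1", "req-1", server_has_processor=True)
second = urpc_call(channel, ready, "T2", "req-2", server_has_processor=False)
```

The first call finds another ready thread and stays in the client's address space; the second finds none and an unserved server, so only then does the processor move.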

Performance • Firefly workstation • Four C-VAX processors • 32 MB RAM • Taos OS • Provided kernel-level threads • FastThreads • User-level thread library • URPC • Channel management • Message primitives

Performance

Performance worse than LRPC


Deficiencies • Optimistic assumptions won’t always hold • Single-threaded applications • High-latency I/O • Processor reallocation occurs after two optimization checks (approx. 100 μs) • Is there an idle processor? • Is there an underpowered address space to which it can be reallocated? • Voluntary return of processors can’t be guaranteed • Two processors for single computation, only one active at a time
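The two checks that precede reallocation can be sketched as a small predicate. This reads the slide as: (1) the processor is idle, meaning it has no ready threads, and (2) some address space has pending messages but no processor. That reading, the `AddressSpace` fields, and the example names are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class AddressSpace:
    name: str
    pending_messages: int  # messages waiting in its incoming queues
    processors: int        # processors currently assigned to it


def is_underpowered(space):
    # Assumed definition: work is waiting but nothing can run it.
    return space.pending_messages > 0 and space.processors == 0


def pick_reallocation_target(processor_is_idle, spaces):
    """Return the name of the address space to donate to, or None."""
    # Check 1: only an idle processor is a candidate for reallocation.
    if not processor_is_idle:
        return None
    # Check 2: look for an underpowered address space.
    for space in spaces:
        if is_underpowered(space):
            return space.name
    return None


spaces = [
    AddressSpace("editor", pending_messages=0, processors=1),
    AddressSpace("window_manager", pending_messages=3, processors=0),
]
target_when_idle = pick_reallocation_target(True, spaces)
target_when_busy = pick_reallocation_target(False, spaces)
```

Both checks must pass before the roughly 100 μs reallocation path is taken, which is why the common case stays cheap and the deficiency appears only when the optimistic assumptions fail.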

Summary • SMP allows new freedoms in RPC design • No need to switch processor to another address space • Preserves valuable cache/TLB contexts • 1-2 orders of magnitude improvement • But not ideal for all application types • Single-threaded applications • High-latency I/O
