
User-Level Interprocess Communication for Shared Memory Multiprocessors

This review discusses the concept of User-Level Interprocess Communication (IPC) for shared memory multiprocessors, focusing on the benefits and challenges of using RPC and LRPC within a single machine. It also explores the opportunities for SMP optimizations and introduces the URPC design for isolating the components of IPC. The performance, limitations, and considerations for SMP-based RPC design are discussed.

rrosner


Presentation Transcript


  1. User-Level Interprocess Communication for Shared Memory Multiprocessors • Bershad, B. N., Anderson, T. E., Lazowska, E. D., and Levy, H. M. • Presented by Chris Eigner

  2. Review of LRPC • The RPC concept can be used within a single machine as IPC • Caller and callee are on the same machine, which leaves room for optimizations • Run the client thread in the context of the server, avoiding the scheduler • Argument stacks are allocated in shared memory, avoiding message copying • Domain caching reduces context-switch overhead

  3. Problems with RPC/LRPC • Kernel mediates every cross-address-space call, accounting for 70% of total overhead • Poorly performing cross-address-space communication • Kernel-level communication + user-level thread management • Opportunity for more SMP optimizations

  4. SMP Optimizations • No need to switch processor to another address space • Remove kernel from equation! • Address spaces share memory directly • Processor reallocation can be avoided • Preserves valuable cache/TLB contexts • Cost can be amortized over independent calls • Inexpensive thread management; orders of magnitude less than kernel-level.

  5. URPC Responsibilities • URPC design isolates three components of IPC • Thread management • Data transfer • Processor reallocation

  6. Thread Management • Context switch • Switching processor to another thread in same address space • Processor reallocation • Reallocating processor to a thread in a different address space • via Processor.Donate

  7. An Example

  8. Data Transfer • Bi-directional shared memory queue • Test-and-set locks (non-spinning) on each end • Client/server model • send, receive, start, stop
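The data-transfer slide can be illustrated with a minimal sketch, assuming a channel is a pair of one-way queues, each guarded by a lock that is tried once rather than spun on. This is not the URPC implementation: `OneWayQueue`, `Channel`, `try_send`, and `try_receive` are hypothetical names, and Python's `threading.Lock` with `acquire(blocking=False)` stands in for a hardware test-and-set instruction.

```python
import threading
from collections import deque


class OneWayQueue:
    """One direction of a channel: a FIFO guarded by a try-lock (hypothetical)."""

    def __init__(self):
        # Stand-in for a test-and-set lock on this end of the queue.
        self._lock = threading.Lock()
        self._items = deque()

    def try_send(self, msg):
        # Non-spinning: make a single attempt. On failure the caller is free
        # to run another thread instead of waiting for the lock.
        if not self._lock.acquire(blocking=False):
            return False
        try:
            self._items.append(msg)
            return True
        finally:
            self._lock.release()

    def try_receive(self):
        # Returns None when the lock is held or when the queue is empty.
        if not self._lock.acquire(blocking=False):
            return None
        try:
            return self._items.popleft() if self._items else None
        finally:
            self._lock.release()


class Channel:
    """Bi-directional channel: requests flow one way, replies the other."""

    def __init__(self):
        self.requests = OneWayQueue()
        self.replies = OneWayQueue()


channel = Channel()

# Client side: enqueue a request.
sent = channel.requests.try_send(("add", 2, 3))

# Server side: dequeue the request, compute, and enqueue the reply.
op, a, b = channel.requests.try_receive()
channel.replies.try_send(a + b)

# Client side: pick up the reply.
result = channel.replies.try_receive()
```

Because a failed lock attempt returns immediately, the caller decides what to do next, which is what lets a user-level scheduler switch to other work instead of spinning.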

  9. Processor Reallocation • URPC makes certain assumptions to reduce processor reallocation • Client has other threads to run or incoming messages • Server has or will have a processor to service message • Allows inexpensive context switch during blocking phase of cross-address call • Enables parallel execution of URPC while avoiding processor reallocation
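The idea of switching to another client thread during the blocking phase of a call can be sketched with Python generators acting as user-level threads. This is a toy model under stated assumptions, not URPC itself: `client_thread`, `server_step`, and `run` are hypothetical names, and the server, which would run on its own processor, is simulated here by calling `server_step` between client turns.

```python
from collections import deque


def client_thread(name, requests, replies, log):
    """A user-level thread that makes one cross-address-space call."""
    requests.append((name, 2, 3))          # send the request message
    while name not in replies:             # blocking phase of the call
        log.append(f"{name} yields")
        yield                              # cheap switch to another thread
    log.append(f"{name} got {replies.pop(name)}")


def server_step(requests, replies):
    """Drain pending requests; stands in for a server on another processor."""
    while requests:
        name, a, b = requests.popleft()
        replies[name] = a + b


def run(threads, requests, replies):
    """Round-robin scheduler: a blocked caller never holds the processor."""
    ready = deque(threads)
    while ready:
        thread = ready.popleft()
        try:
            next(thread)
            ready.append(thread)           # still waiting; requeue it
        except StopIteration:
            pass                           # thread finished its call
        server_step(requests, replies)


requests, replies, log = deque(), {}, []
run(
    [client_thread("A", requests, replies, log),
     client_thread("B", requests, replies, log)],
    requests,
    replies,
)
```

In this run, thread B issues its call while thread A is still waiting, so the client's processor stays busy without ever being handed to the server's address space.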

  10. Performance • Firefly workstation • Four C-VAX processors • 32 MB RAM • Taos OS • Provided kernel-level threads • FastThreads • User-level thread library • URPC • Channel management • Message primitives

  11. Performance

  12. Performance worse than LRPC

  13. Performance

  14. Deficiencies • Optimistic assumptions won’t always hold • Single-threaded applications • High-latency I/O • Processor reallocation occurs after two optimization checks (approx. 100 μs) • Is there an idle processor? • Is there an underpowered address space to which it can be reallocated? • Voluntary return of processors can’t be guaranteed • Two processors for single computation, only one active at a time
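The two checks that precede a processor reallocation can be written as a small decision function. This is a sketch under assumptions, not the paper's code: `AddressSpace`, `is_underpowered`, and `pick_donation_target` are hypothetical names, and "underpowered" is defined here, as an assumption, to mean that messages are pending while no processor is assigned.

```python
from dataclasses import dataclass


@dataclass
class AddressSpace:
    name: str
    ready_threads: int = 0
    pending_messages: int = 0
    processors: int = 0

    def is_underpowered(self):
        # Assumed definition: work is queued but no processor can run it.
        return self.pending_messages > 0 and self.processors == 0


def pick_donation_target(current, others):
    """Return the address space to donate this processor to, or None."""
    # Check 1: is this processor idle, with no ready threads of its own?
    if current.ready_threads > 0:
        return None
    # Check 2: is there an underpowered address space to reallocate to?
    for space in others:
        if space.is_underpowered():
            return space
    return None


client = AddressSpace("client", ready_threads=0, processors=1)
server = AddressSpace("server", pending_messages=1, processors=0)
target = pick_donation_target(client, [server])

busy_client = AddressSpace("client", ready_threads=2, processors=1)
no_target = pick_donation_target(busy_client, [server])
```

Only when both checks pass would the processor be handed over (via `Processor.Donate` in the slides' terms); a single-threaded client fails the optimistic case every time and always pays for the reallocation.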

  15. Summary • SMP allows new freedoms in RPC design • No need to switch processor to another address space • Preserves valuable cache/TLB contexts • 1-2 orders of magnitude improvement • But not ideal for all application types • Single-threaded applications • High-latency I/O
