URPC for Shared Memory Multiprocessors
Brian Bershad, Thomas E. Anderson, Edward D. Lazowska, and Henry M. Levy
ACM TOCS 9 (2), May 1991
IPC Performance
• Efficient IPC is central to OS design: it encourages system decomposition across address spaces (AS)
  • Failure isolation
  • Extensibility
  • Modularity
• But performance determines its usability
Kernel-based IPC: problems
• Architectural performance barriers: the costs of invoking the kernel and of processor reallocation (70% overhead in LRPC)
• Interaction between kernel-based communication and high-performance user-level thread management
URPC for SM multiprocessors
• Solution: eliminate the kernel from the communication path
  • Use shared memory (SM) for data transfer
  • Take advantage of processors (P) already in the AS
• Advantages
  • Msgs sent between AS w/o invoking the kernel
  • Avoid unnecessary P reallocation
  • When reallocation is necessary, its cost is amortized over several calls
  • Only P reallocation requires kernel invocation; contrast w/ microkernels!
RPC idea and definition
• Apps/OS services communicate through messages vs. procedure calls → RPC
• RPC: synchronous language-level control transfer between programs in disjoint AS whose primary communication mechanism is a narrow channel
• The definition says nothing of
  • narrow-channel operations
  • how the processor scheduling mechanism interacts w/ data transfer
URPC
• Msgs exchanged between AS using SM
• User-level thread (T) mgmt integrated w/ user-level msg channel mgmt
• When a T in a client invokes a procedure in a server
  • T blocks
  • P serves another T in the same AS
  • … same on the server side
• User's view is unchanged
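The call path on this slide can be sketched in a few lines. This is a toy model only, not the URPC implementation: Python generators stand in for user-level threads, a plain in-process object stands in for the shared-memory channel, and the names `Channel`, `client_thread`, `server_thread`, `step`, and `demo` are all hypothetical. Each address space gets its own ready queue, and the loop in `demo` plays the role of two processors, one per address space, so no processor reallocation occurs.

```python
from collections import deque


class Channel:
    """Stand-in for a shared-memory message channel between two address spaces."""

    def __init__(self):
        self.requests = deque()  # client -> server messages
        self.replies = {}        # caller name -> reply value


def client_thread(name, channel, arg, log):
    # Send the call message directly into the channel (no kernel involved).
    channel.requests.append((name, arg))
    log.append(f"{name} sent {arg}")
    # The calling thread blocks; yielding lets the processor run another
    # thread from the same address space.
    while name not in channel.replies:
        yield
    log.append(f"{name} got {channel.replies.pop(name)}")


def server_thread(channel, calls_expected):
    # Server-side thread: poll the channel and reply (here: double the argument).
    served = 0
    while served < calls_expected:
        if channel.requests:
            caller, arg = channel.requests.popleft()
            channel.replies[caller] = arg * 2
            served += 1
        yield


def step(ready):
    """Run one ready thread of an address space until it blocks or finishes."""
    if not ready:
        return
    thread = ready.popleft()
    try:
        next(thread)
        ready.append(thread)  # still alive: goes back on the ready queue
    except StopIteration:
        pass                  # thread terminated


def demo():
    channel, log = Channel(), []
    client_ready = deque([
        client_thread("T1", channel, 3, log),
        client_thread("T2", channel, 4, log),
    ])
    server_ready = deque([server_thread(channel, calls_expected=2)])
    # One processor per address space, stepped in lockstep.
    while client_ready or server_ready:
        step(client_ready)
        step(server_ready)
    return log
```

Calling `demo()` shows T2 issuing its call while T1 is still blocked on its reply, which is the "P serves another T in the same AS" behaviour in miniature.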
Processor reallocation & context switching
• Context switching: switching P between Ts in the same AS (15 µs)
• Processor reallocation: allocating P to a T in another AS (55 µs, w/o long-term costs)
• Costs of processor reallocation
  • Changing the mapping registers that define the virtual AS (immediate)
  • Deciding which AS gets the processor
  • Diminished benefit from cache and TLB (long-term)
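A short worked calculation makes the gap between the two operations concrete. It uses only the two latencies quoted on this slide (15 and 55, in the same time unit) and ignores the long-term cache and TLB costs. The idea of spreading one reallocation over a number of calls is an illustrative assumption, and the function names are hypothetical.

```python
CONTEXT_SWITCH = 15      # slide's figure for a same-AS context switch
PROCESSOR_REALLOC = 55   # slide's figure for a cross-AS processor reallocation


def amortized_realloc(calls_per_realloc):
    """Reallocation cost per call if one reallocation serves this many calls."""
    return PROCESSOR_REALLOC / calls_per_realloc


def break_even_calls():
    """Smallest number of calls per reallocation at which the amortized
    reallocation cost drops below one context switch."""
    calls = 1
    while amortized_realloc(calls) >= CONTEXT_SWITCH:
        calls += 1
    return calls
```

Under these assumptions a reallocation costs roughly 3.7 context switches, and once it is shared by four or more calls its per-call share falls below the cost of a single context switch.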
Processor reallocation
• Sometimes necessary
• Underpowered AS: an AS w/ pending incoming msgs
• P balances load by reallocating itself
• Detecting incoming msgs and scheduling Ts is done by a low-level T in URPC
• P scans for incoming msgs only when idle
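The policy on this slide can be read as a small decision procedure run by each processor. The sketch below is an interpretation, not code from the paper: `next_action` and its arguments are hypothetical, and picking the address space with the most pending messages is an assumption made only to give the example a concrete tie-breaking rule.

```python
def next_action(local_ready, pending_by_space, my_space):
    """Decide what a processor does next.

    local_ready      -- ready threads in the processor's current address space
    pending_by_space -- address space name -> number of pending incoming msgs
    my_space         -- name of the address space the processor is in now
    """
    # A processor with local work keeps running it; it does not scan channels.
    if local_ready:
        return ("run_local", local_ready[0])

    # Idle: scan for incoming messages addressed to the current address space.
    if pending_by_space.get(my_space, 0) > 0:
        return ("handle_incoming", my_space)

    # Still idle: look for an underpowered address space (one with pending
    # incoming messages) and reallocate there. Choosing the most loaded one
    # is an assumption of this sketch.
    candidates = {
        space: count
        for space, count in pending_by_space.items()
        if space != my_space and count > 0
    }
    if candidates:
        return ("reallocate", max(candidates, key=candidates.get))

    return ("idle", None)
```

For example, `next_action([], {"Editor": 0, "WinMgr": 1, "FCMgr": 2}, "Editor")` returns `("reallocate", "FCMgr")`, while any non-empty `local_ready` list keeps the processor in its current address space.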
Example
[Timeline figure. Columns: Editor, WinMgr, FCMgr; axis: Time. Threads T1 and T2. Labeled events: Call (send/recv WinMgr), Call (send/recv FCMgr), Context switch, Processor realloc, Recv & process reply, Context switch – terminate T2, Context switch – terminate T1.]
Design rationale
• Data transfer
  • SM msg channels give the same safety guarantees
• Processor reallocation: optimistic, assuming that
  • Client has other work to do (thread or incoming msgs)
  • Server has or soon will have a P to use
• When the assumption is wrong: P reallocation
Design rationale: threads
• High-performance T mgmt is necessary for fine-grained parallel programs
• This can only be done at user level
• Due to the close interaction between threads and communication, communication also has to be done at user level
• Heavyweight / middleweight / lightweight threads
• Lightweight threads → user-level comm.