Remote Procedure Call An Effective Primitive for Distributed Computing Seth James Nielson
What is RPC? • Procedure calls transfer control within local memory • RPCs transfer control to remote machines [Diagram: a local call stack showing Main, Proc A, Proc B, and unused stack space]
Why RPC? • Clean/simple semantics • Communication efficiency • Generality These properties make RPC an effective primitive for distributed systems.
How it Works (Idealized Example) [Diagram: the CLIENT calls c = encrypt(msg) as if it were a local call; the runtime sends a Request to a SERVER equipped with specialized encryption hardware, the client waits, the server's implementation computes encrypt(msg), and the Response delivers the result back to the caller]
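The call path above can be sketched in a few lines of Python. This is an in-process toy, not the Cedar implementation: a plain function call stands in for the network, and all names (encrypt_stub, server_dispatch) are invented for illustration.

```python
# Minimal sketch of the RPC call path: the client stub marshals the
# arguments, "transmits" them, and the server dispatcher unpacks the
# request, runs the procedure, and packs the result.
import json

def server_dispatch(request_bytes):
    """Server side: unpack the request, run the procedure, pack the result."""
    request = json.loads(request_bytes)
    procedures = {"encrypt": lambda msg: msg[::-1]}  # toy stand-in for encrypt()
    result = procedures[request["proc"]](*request["args"])
    return json.dumps({"result": result}).encode()

def encrypt_stub(msg):
    """Client stub: looks like a local call, but marshals and 'sends'."""
    request = json.dumps({"proc": "encrypt", "args": [msg]}).encode()
    response = server_dispatch(request)  # stands in for the network round trip
    return json.loads(response)["result"]

print(encrypt_stub("attack at dawn"))  # the caller sees an ordinary procedure call
```

The point of the sketch is the transparency claim: the caller writes `encrypt_stub("...")` exactly as it would write a local call, and all marshaling and transport is hidden in the stub.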
Early History of RPC • 1976: early reference in the literature • 1976–1984: few full implementations • Feb 1984: Cedar RPC • A. Birrell, B. Nelson at Xerox • “Implementing Remote Procedure Calls”
Imagine our Surprise… “In practice, … several areas [of RPC] were inadequately understood”
RPC Design Issues • Machine/communication failures • Address-containing arguments • Integration into existing systems • Binding • Suitable protocols • Data integrity/security
Birrell and Nelson Aims • Primary Aim • Easy distributed computation • Secondary Aims • Efficient (with powerful semantics) • Secure
Fundamental Decisions • No shared address space among computers • Semantics of remote procedure calls should be as close as possible to local procedure calls Note that the first decision partially violates the second…
Binding • Binds an importer to exporter • Interface name: type/instance • Uses Grapevine DB to locate appropriate exporter • Bindings (based on unique ID) break if exporter crashes and restarts
Unique ID • At binding, importer learns of exported interface’s Unique ID (UID) • The UID is initialized by a real-time clock on system start-up • If the system crashes and restarts, the UID will be a new unique number • The change in UID breaks existing connections
How Cedar RPC works [Diagram: binding and call flow across the caller machine, Grapevine, and the callee machine] • Export: the server calls export through its server stub; RPCRuntime records the interface and updates Grapevine (update, addmember) • Import: the user calls import through its user stub; RPCRuntime looks up the exporter in Grapevine (lookup, getConnect) and binds (bind(A,B)) • Call: x = F(y) enters the user stub, which maps F to its dispatch index (F => 3) and transmits; the callee's RPCRuntime checks the packet, the server stub maps 3 back to F, and F(y) runs
Packet-Level Transport Protocol • Primary goal: minimize the time between initiating the call and getting results • NOT general – designed specifically for RPC • Why? A possible 10X performance gain • No upper bound on waiting for results • Error semantics: the user cannot distinguish a machine crash from a network failure
Creating RPC-enabled Software [Diagram: the developer writes the interface modules, user code, and server code; Lupine generates the user stub and server stub from the interface modules; the client machine runs the client program (user code, user stub, RPCRuntime) and the server machine runs the server program (RPCRuntime, server stub, server code)]
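A stub generator like Lupine can be mimicked in miniature: given an interface description, it emits client stubs that forward to a dispatcher, while the server stub maps incoming names back to real implementations. Everything here (the dict-based interface, make_client_stub) is an invented illustration, not Lupine's actual output.

```python
# Sketch of stub generation from an interface description.
interface = {"add": 2, "negate": 1}          # procedure name -> arity

def make_client_stub(proc_name, transport):
    """Generate a client stub that forwards the call over a transport."""
    def stub(*args):
        return transport(proc_name, args)    # marshal + send, in spirit
    stub.__name__ = proc_name
    return stub

# Server side: the "server stub" dispatches names to implementations.
implementations = {"add": lambda a, b: a + b, "negate": lambda a: -a}

def server_transport(proc_name, args):
    return implementations[proc_name](*args)

# "Generated" client: one stub per procedure in the interface.
client = {name: make_client_stub(name, server_transport) for name in interface}
print(client["add"](2, 3))     # 5
print(client["negate"](7))     # -7
```

The design point is that neither user code nor server code mentions the network: both are written against the interface, and the generated stubs supply the plumbing.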
Making it Faster • Simple calls (the common case): all of the arguments fit in a single packet • The server's reply, and the client's next RPC, each operate as an implicit ACK • Explicit ACKs are required only if the call runs long or there is a long interval between calls
Simple Calls [Diagram: the CLIENT sends Call to the SERVER; the server's Response/ACK acknowledges the call; the client's next Call/ACK acknowledges the response, and the chain continues with another Response/ACK]
Complex Calls [Diagram: the CLIENT sends Call (pkt 0) and the SERVER replies ACK pkt 0; the client sends Data (pkt 1), acknowledged by ACK pkt 1, then Data (pkt 2); the server's Response/ACK acknowledges the final packet, and is itself acknowledged by an explicit ACK or a new call]
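The implicit-ACK rule for simple calls can be written down as a tiny decision procedure. The function name, parameters, and the one-second threshold are all made up for illustration; the idea from the slides is only that responses and subsequent calls double as ACKs, and explicit ACK packets appear only when the call runs long or the next call is far off.

```python
# Sketch of the implicit-ACK rule for a one-packet call.
def packets_for_exchange(call_duration, interval_to_next_call, threshold=1.0):
    """Return the packet types exchanged for one simple call."""
    packets = ["Call"]
    if call_duration > threshold:
        # Call runs long: the server must explicitly ACK the call packet.
        packets.append("explicit ACK of Call")
    packets.append("Response (implicit ACK of Call)")
    if interval_to_next_call > threshold:
        # No new call coming soon: the client must explicitly ACK.
        packets.append("explicit ACK of Response")
    else:
        packets.append("next Call (implicit ACK of Response)")
    return packets

print(packets_for_exchange(0.1, 0.1))  # fast call, busy client: no explicit ACKs
print(packets_for_exchange(5.0, 5.0))  # slow call, idle client: two explicit ACKs
```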
Keeping it Light • A connection is just shared state • Reduce process creation/swapping • Idle server processes are kept around • Each packet carries a process identifier to reduce swaps • Full scheme: no processes created and only four process swaps per call • RPC runs directly on top of Ethernet
THE NEED FOR SPEED • RPC performance cost is a barrier (Cedar RPC requires about 1.1 ms for a 0-arg call!) • Peregrine RPC (about nine years later) manages a 0-arg call in 573 µs!
A Few Definitions • Hardware latency – sum of the network penalties for the call and result packets • Network penalty – time to deliver a packet; greater than the raw transmission time because of controller and interrupt overhead • Network transmission time – packet size divided by raw network speed • Network RPC – RPC between two machines • Local RPC – RPC between separate threads on one machine
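The relationship between these quantities is simple arithmetic. The numbers below are made up for illustration (a 10 Mbit/s Ethernet and an assumed half-millisecond per-packet overhead), not Peregrine's measurements.

```python
# Toy arithmetic relating transmission time, network penalty, and
# hardware latency, with hypothetical numbers.
bandwidth_bits_per_sec = 10_000_000        # 10 Mbit/s Ethernet
packet_bits = 8 * 1500                     # one 1500-byte packet

transmission_time = packet_bits / bandwidth_bits_per_sec  # raw wire time
overhead = 0.0005                          # assumed controller/interrupt cost (s)
network_penalty = transmission_time + overhead
hardware_latency = 2 * network_penalty     # call packet + result packet

print(f"{transmission_time * 1e6:.0f} us on the wire, "
      f"{network_penalty * 1e6:.0f} us penalty per packet, "
      f"{hardware_latency * 1e6:.0f} us hardware latency")
```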
Peregrine RPC • Supports full functionality of RPC • Network RPC performance close to HW latency • Also supports efficient local RPC
Messing with the Guts • Three General Optimizations • Three RPC-Specific Optimizations
General Optimizations • Arguments are transmitted without intermediate copies • No data conversion when client and server share the same data representation • Packet header templates avoid recomputing headers on every call
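The header-template idea can be sketched concretely: compute the constant header fields once per connection, then patch only the per-call fields. The field layout here (ports, procedure id, sequence number) and the struct formats are invented for the sketch, not Peregrine's wire format.

```python
# Sketch of a precomputed packet-header template.
import struct

# Layout (big-endian): src_port (H), dst_port (H), proc_id (H), seq (I)
HEADER_FMT = struct.Struct("!HHHI")

# Built once per connection: the ports never change.
template = bytearray(HEADER_FMT.pack(5000, 6000, 0, 0))

def make_header(proc_id, seq):
    """Patch only the per-call fields instead of rebuilding the header."""
    struct.pack_into("!HI", template, 4, proc_id, seq)
    return bytes(template)

h = make_header(3, 42)
assert HEADER_FMT.unpack(h) == (5000, 6000, 3, 42)
print(h.hex())
```

The saving is small per call, but for a null RPC measured in hundreds of microseconds, shaving fixed per-packet work out of the critical path is exactly where the time goes.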
RPC Specific Optimizations • No thread-specific state is saved between calls in the server • Server arguments are mapped (not copied) • No copying in the critical path of multi-packet arguments
I think this is COOL • To avoid copying arguments from a single-packet RPC, Peregrine arranges instead to use the packet buffer itself as the server thread’s stack • Any pointers are replaced with server-appropriate pointers (Cedar RPC didn’t support this…)
This is cool too • Multi-packet RPC’s use blast protocol (selective retransmission) • Data is transmitted in parallel with data copy • Last packet is mapped into place
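Selective retransmission in a blast protocol can be sketched as follows. The function names and the loss model (a set of dropped packet indices) are invented; the point from the slides is that the sender blasts all packets without waiting for per-packet ACKs, and the receiver then asks only for the specific packets that were lost.

```python
# Sketch of blast transmission with selective retransmission.
def blast_send(packets, lossy_indices):
    """First pass: send everything at once; the network drops some packets."""
    return {i: p for i, p in enumerate(packets) if i not in lossy_indices}

def receive_with_selective_retransmit(packets, lossy_indices):
    """Receiver identifies the gaps and requests only those packets."""
    received = blast_send(packets, lossy_indices)
    missing = [i for i in range(len(packets)) if i not in received]
    for i in missing:                 # retransmit only the lost packets
        received[i] = packets[i]
    return [received[i] for i in range(len(packets))], missing

data, retransmitted = receive_with_selective_retransmit(["a", "b", "c", "d"], {1, 3})
print(data, retransmitted)  # all data recovered; only the lost packets resent
```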
Fast Multi-Packet Receive [Diagram: headers and data for packets 0–3; packet 0's buffer is sent last and is remapped into place at the server along a page boundary, while the data of packets 1–3 are copied into the buffer at the server]
Cedar RPC Summary • Cedar RPC introduced practical RPC • Demonstrated easy semantics • Identified major design issues • Established RPC as effective primitive
Peregrine RPC Summary • Same RPC semantics (with the addition of pointers) • Significantly faster than Cedar RPC and others • General optimizations (e.g., precomputed headers) • RPC-specific optimizations (e.g., no copying in the multi-packet critical path)
Observations • RPC is a very “transparent” mechanism – it acts like a local call • However, RPC requires a deep understanding of hardware to tune • In short, RPC requires sophistication in its presentation as well as its operation to be viable