Remote Procedure Call An Effective Primitive for Distributed Computing Seth James Nielson
What is RPC? • Procedure calls transfer control within local memory • RPCs transfer control to remote machines [Diagram: a local call stack showing Main, Proc A, Proc B, and unused stack space]
Why RPC? • Clean/simple semantics • Communication efficiency • Generality These properties make RPC an effective primitive for distributed systems.
How it Works (Idealized Example) [Diagram: the CLIENT calls c = encrypt(msg) as if it were a local call; the runtime sends a Request to a SERVER equipped with specialized encryption hardware, the client waits, the server's implementation computes encrypt(msg), and the Response delivers the result back to the caller]
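The call path above can be sketched in a few lines of Python. This is an in-process toy, not the Cedar implementation: a plain function call stands in for the network, and all names (encrypt_stub, server_dispatch) are invented for illustration.

```python
# Minimal sketch of the RPC call path: the client stub marshals the
# arguments, "transmits" them, and the server dispatcher unpacks the
# request, runs the procedure, and packs the result.
import json

def server_dispatch(request_bytes):
    """Server side: unpack the request, run the procedure, pack the result."""
    request = json.loads(request_bytes)
    procedures = {"encrypt": lambda msg: msg[::-1]}  # toy stand-in for encrypt()
    result = procedures[request["proc"]](*request["args"])
    return json.dumps({"result": result}).encode()

def encrypt_stub(msg):
    """Client stub: looks like a local call, but marshals and 'sends'."""
    request = json.dumps({"proc": "encrypt", "args": [msg]}).encode()
    response = server_dispatch(request)  # stands in for the network round trip
    return json.loads(response)["result"]

print(encrypt_stub("attack at dawn"))  # the caller sees an ordinary procedure call
```

The point of the sketch is the transparency claim: the caller writes `encrypt_stub("...")` exactly as it would write a local call, and all marshaling and transport is hidden in the stub.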
Early History of RPC • 1976: early reference in the literature • 1976–1984: few full implementations • Feb 1984: Cedar RPC • A. Birrell, B. Nelson at Xerox • “Implementing Remote Procedure Calls”
Imagine our Surprise… “In practice, … several areas [of RPC] were inadequately understood”
RPC Design Issues • Machine/communication failures • Address-containing arguments • Integration into existing systems • Binding • Suitable protocols • Data integrity/security
Birrell and Nelson Aims • Primary Aim • Easy distributed computation • Secondary Aims • Efficient (with powerful semantics) • Secure
Fundamental Decisions • No shared address space among computers • Semantics of remote procedure calls should be as close as possible to local procedure calls Note that the first decision partially violates the second…
Binding • Binds an importer to exporter • Interface name: type/instance • Uses Grapevine DB to locate appropriate exporter • Bindings (based on unique ID) break if exporter crashes and restarts
Unique ID • At binding, importer learns of exported interface’s Unique ID (UID) • The UID is initialized by a real-time clock on system start-up • If the system crashes and restarts, the UID will be a new unique number • The change in UID breaks existing connections
How Cedar RPC works [Diagram: binding and call flow across the caller machine, Grapevine, and the callee machine] • Export: the server calls export through its server stub; RPCRuntime records the interface and updates Grapevine (update, addmember) • Import: the user calls import through its user stub; RPCRuntime looks up the exporter in Grapevine (lookup, getConnect) and binds (bind(A,B)) • Call: x = F(y) enters the user stub, which maps F to its dispatch index (F => 3) and transmits; the callee's RPCRuntime checks the packet, the server stub maps 3 back to F, and F(y) runs
Packet-Level Transport Protocol • Primary goal: minimize the time between initiating the call and getting results • NOT general – designed specifically for RPC • Why? A possible 10X performance gain • No upper bound on waiting for results • Error semantics: the user cannot distinguish a machine crash from a network failure
Creating RPC-enabled Software [Diagram: the developer writes the interface modules, user code, and server code; Lupine generates the user stub and server stub from the interface modules; the client machine runs the client program (user code, user stub, RPCRuntime) and the server machine runs the server program (RPCRuntime, server stub, server code)]
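A stub generator like Lupine can be mimicked in miniature: given an interface description, it emits client stubs that forward to a dispatcher, while the server stub maps incoming names back to real implementations. Everything here (the dict-based interface, make_client_stub) is an invented illustration, not Lupine's actual output.

```python
# Sketch of stub generation from an interface description.
interface = {"add": 2, "negate": 1}          # procedure name -> arity

def make_client_stub(proc_name, transport):
    """Generate a client stub that forwards the call over a transport."""
    def stub(*args):
        return transport(proc_name, args)    # marshal + send, in spirit
    stub.__name__ = proc_name
    return stub

# Server side: the "server stub" dispatches names to implementations.
implementations = {"add": lambda a, b: a + b, "negate": lambda a: -a}

def server_transport(proc_name, args):
    return implementations[proc_name](*args)

# "Generated" client: one stub per procedure in the interface.
client = {name: make_client_stub(name, server_transport) for name in interface}
print(client["add"](2, 3))     # 5
print(client["negate"](7))     # -7
```

The design point is that neither user code nor server code mentions the network: both are written against the interface, and the generated stubs supply the plumbing.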
Making it Faster • Simple calls (the common case): all of the arguments fit in a single packet • The server's reply, and the client's next RPC, each operate as an implicit ACK • Explicit ACKs are required only if the call runs long or there is a long interval between calls
Simple Calls [Diagram: the CLIENT sends Call to the SERVER; the server's Response/ACK acknowledges the call; the client's next Call/ACK acknowledges the response, and the chain continues with another Response/ACK]
Complex Calls [Diagram: the CLIENT sends Call (pkt 0) and the SERVER replies ACK pkt 0; the client sends Data (pkt 1), acknowledged by ACK pkt 1, then Data (pkt 2); the server's Response/ACK acknowledges the final packet, and is itself acknowledged by an explicit ACK or a new call]
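The implicit-ACK rule for simple calls can be written down as a tiny decision procedure. The function name, parameters, and the one-second threshold are all made up for illustration; the idea from the slides is only that responses and subsequent calls double as ACKs, and explicit ACK packets appear only when the call runs long or the next call is far off.

```python
# Sketch of the implicit-ACK rule for a one-packet call.
def packets_for_exchange(call_duration, interval_to_next_call, threshold=1.0):
    """Return the packet types exchanged for one simple call."""
    packets = ["Call"]
    if call_duration > threshold:
        # Call runs long: the server must explicitly ACK the call packet.
        packets.append("explicit ACK of Call")
    packets.append("Response (implicit ACK of Call)")
    if interval_to_next_call > threshold:
        # No new call coming soon: the client must explicitly ACK.
        packets.append("explicit ACK of Response")
    else:
        packets.append("next Call (implicit ACK of Response)")
    return packets

print(packets_for_exchange(0.1, 0.1))  # fast call, busy client: no explicit ACKs
print(packets_for_exchange(5.0, 5.0))  # slow call, idle client: two explicit ACKs
```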
Keeping it Light • A connection is just shared state • Reduce process creation/swapping • Idle server processes are kept around • Each packet carries a process identifier to reduce swaps • Full scheme: no processes created and only four process swaps per call • RPC runs directly on top of Ethernet
THE NEED FOR SPEED • RPC performance cost is a barrier (Cedar RPC requires about 1.1 ms for a 0-arg call!) • Peregrine RPC (about nine years later) manages a 0-arg call in 573 µs!
A Few Definitions • Hardware latency – sum of the network penalties for the call and result packets • Network penalty – time to deliver a packet; greater than the raw transmission time because of controller and interrupt overhead • Network transmission time – packet size divided by raw network speed • Network RPC – RPC between two machines • Local RPC – RPC between separate threads on one machine
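The relationship between these quantities is simple arithmetic. The numbers below are made up for illustration (a 10 Mbit/s Ethernet and an assumed half-millisecond per-packet overhead), not Peregrine's measurements.

```python
# Toy arithmetic relating transmission time, network penalty, and
# hardware latency, with hypothetical numbers.
bandwidth_bits_per_sec = 10_000_000        # 10 Mbit/s Ethernet
packet_bits = 8 * 1500                     # one 1500-byte packet

transmission_time = packet_bits / bandwidth_bits_per_sec  # raw wire time
overhead = 0.0005                          # assumed controller/interrupt cost (s)
network_penalty = transmission_time + overhead
hardware_latency = 2 * network_penalty     # call packet + result packet

print(f"{transmission_time * 1e6:.0f} us on the wire, "
      f"{network_penalty * 1e6:.0f} us penalty per packet, "
      f"{hardware_latency * 1e6:.0f} us hardware latency")
```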
Peregrine RPC • Supports full functionality of RPC • Network RPC performance close to HW latency • Also supports efficient local RPC
Messing with the Guts • Three General Optimizations • Three RPC-Specific Optimizations
General Optimizations • Arguments are transmitted without intermediate copies • No data conversion when client and server share the same data representation • Packet header templates avoid recomputing headers on every call
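The header-template idea can be sketched concretely: compute the constant header fields once per connection, then patch only the per-call fields. The field layout here (ports, procedure id, sequence number) and the struct formats are invented for the sketch, not Peregrine's wire format.

```python
# Sketch of a precomputed packet-header template.
import struct

# Layout (big-endian): src_port (H), dst_port (H), proc_id (H), seq (I)
HEADER_FMT = struct.Struct("!HHHI")

# Built once per connection: the ports never change.
template = bytearray(HEADER_FMT.pack(5000, 6000, 0, 0))

def make_header(proc_id, seq):
    """Patch only the per-call fields instead of rebuilding the header."""
    struct.pack_into("!HI", template, 4, proc_id, seq)
    return bytes(template)

h = make_header(3, 42)
assert HEADER_FMT.unpack(h) == (5000, 6000, 3, 42)
print(h.hex())
```

The saving is small per call, but for a null RPC measured in hundreds of microseconds, shaving fixed per-packet work out of the critical path is exactly where the time goes.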
RPC Specific Optimizations • No thread-specific state is saved between calls in the server • Server arguments are mapped (not copied) • No copying in the critical path of multi-packet arguments
I think this is COOL • To avoid copying arguments from a single-packet RPC, Peregrine arranges instead to use the packet buffer itself as the server thread’s stack • Any pointers are replaced with server-appropriate pointers (Cedar RPC didn’t support this…)
This is cool too • Multi-packet RPC’s use blast protocol (selective retransmission) • Data is transmitted in parallel with data copy • Last packet is mapped into place
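Selective retransmission in a blast protocol can be sketched as follows. The function names and the loss model (a set of dropped packet indices) are invented; the point from the slides is that the sender blasts all packets without waiting for per-packet ACKs, and the receiver then asks only for the specific packets that were lost.

```python
# Sketch of blast transmission with selective retransmission.
def blast_send(packets, lossy_indices):
    """First pass: send everything at once; the network drops some packets."""
    return {i: p for i, p in enumerate(packets) if i not in lossy_indices}

def receive_with_selective_retransmit(packets, lossy_indices):
    """Receiver identifies the gaps and requests only those packets."""
    received = blast_send(packets, lossy_indices)
    missing = [i for i in range(len(packets)) if i not in received]
    for i in missing:                 # retransmit only the lost packets
        received[i] = packets[i]
    return [received[i] for i in range(len(packets))], missing

data, retransmitted = receive_with_selective_retransmit(["a", "b", "c", "d"], {1, 3})
print(data, retransmitted)  # all data recovered; only the lost packets resent
```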
Fast Multi-Packet Receive [Diagram: headers and data for packets 0–3; packet 0's buffer is sent last and is remapped into place at the server along a page boundary, while the data of packets 1–3 are copied into the buffer at the server]
Cedar RPC Summary • Cedar RPC introduced practical RPC • Demonstrated easy semantics • Identified major design issues • Established RPC as effective primitive
Peregrine RPC Summary • Same RPC semantics (with the addition of pointers) • Significantly faster than Cedar RPC and others • General optimizations (e.g., precomputed headers) • RPC-specific optimizations (e.g., no copying in the multi-packet critical path)
Observations • RPC is a very “transparent” mechanism – it acts like a local call • However, RPC requires a deep understanding of hardware to tune • In short, RPC requires sophistication in its presentation as well as its operation to be viable