Remote Procedure Calls (RPC) - Swati Agarwal
RPC – an overview • Request / reply mechanism • Procedure call across disjoint address spaces • [Diagram: the client sends a request, the server performs the computation and returns a reply]
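The request/reply exchange in the diagram is exactly what RPC hides behind an ordinary procedure call. Below is a minimal sketch of that pattern, not taken from the paper: the transport (JSON over a TCP socket), the port, and all names are illustrative assumptions.

```python
# Minimal request/reply sketch (hypothetical; not the paper's code).
import json, socket

def rpc_call(addr, proc, *args):
    """Client: send a request naming the procedure, block for the reply."""
    with socket.create_connection(addr) as s:
        s.sendall(json.dumps({"proc": proc, "args": args}).encode())
        s.shutdown(socket.SHUT_WR)                     # request fully sent
        return json.loads(s.makefile().read())["result"]

def serve(port, procs):
    """Server: receive a request, run the computation, send the reply."""
    srv = socket.create_server(("", port))
    while True:
        conn, _ = srv.accept()
        with conn:
            req = json.loads(conn.makefile().read())
            result = procs[req["proc"]](*req["args"])  # the computation
            conn.sendall(json.dumps({"result": result}).encode())

# serve(9000, {"add": lambda a, b: a + b})       # on the server machine
# rpc_call(("server", 9000), "add", 2, 3)        # on the client machine
```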
Why RPC? • Function-oriented protocols • Telnet, FTP • cannot express "execute function Y with arguments X1, X2 on machine Z" • Each application must construct its desired program interface • and build a run-time environment – format outgoing commands, interface with the IPC facility, parse incoming responses
Why RPC? (cont.) • Why not give transparency to programmers? • Make programmers' lives easy! • Distributed applications become easier to build • Solution – formalize a separate protocol • Idea proposed by J. E. White in 1976
Implementing Remote Procedure Calls – Andrew Birrell, B. J. Nelson • Design issues and how they can be addressed • Goals: • Show that RPC can make distributed computation easy • Make RPC communication efficient • Provide secure communication with RPC
Issues faced by designers • Binding • Communication protocol • Dealing with failures – network / server crash • Addressable arguments • Integration with existing systems • Data integrity and security
Issue: Binding • Naming – how to specify what to bind to? • Location – how to find the callee's address, and how to tell the callee which procedure to invoke? • Possible solutions: specify network addresses in applications, use some form of broadcast protocol, or use some naming system
Issue: Binding – Solution • Grapevine • A distributed and reliable database for naming people, machines and services • Used for naming the services exported by a server – solves the naming problem • Primarily used for delivery of messages (mail); locating a callee is similar to locating a mailbox – addresses the location problem • Also used for authentication
Binding (cont.) • The exporting machine keeps no per-importer state • Importing has no effect on the exporter • Bindings are broken if the exporter crashes • Grapevine allows several binding choices (see the sketch below): • Specify a network address as the instance • Specify both type and instance of the interface • Specify only the type of the interface – the most flexible option
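A hedged sketch of this export/import flow, with the Grapevine database faked as an in-memory dict; the registry shape and all names are our assumptions, not the paper's interfaces.

```python
# Binding sketch: a Grapevine-like name database faked as a dict.
registry = {}   # (interface_type, instance) -> network address

def export(interface_type, instance, address):
    """Server side: record the exported interface in the database."""
    registry[(interface_type, instance)] = address

def import_binding(interface_type, instance=None):
    """Client side: resolve a binding.  Naming only the type is the most
    flexible choice – any machine exporting that type will do."""
    if instance is not None:
        return registry[(interface_type, instance)]
    for (itype, _), addr in registry.items():
        if itype == interface_type:
            return addr
    raise LookupError("no exporter of " + interface_type)

# export("FileServer", "serverA", ("serverA.example.com", 9000))
# addr = import_binding("FileServer")    # type-only lookup
```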
Issue: Packet-level Transport Protocol • Design a specialized protocol? • Minimize latency • Maintaining state information (as connection-based protocols do) is unacceptable – it would grow with the number of clients • Required semantics: • Exactly once – if the call returns • Otherwise report an exception
Simple Calls • Arguments / Results fit in one packet
Simple Calls (cont.) • The client retransmits until an ack is received • The result packet acts as the ack (likewise for the callee: the next call packet is a sufficient ack) • The callee maintains a table of the last call ID per caller, so duplicate call packets can be discarded (see the sketch below) • This shared state acts as the connection – no special connection establishment is required • Call IDs must stay unique even if the caller restarts – a conversation identifier distinguishes machine incarnations
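A sketch, under our own assumptions about the key shape, of the callee-side call table just described: one remembered call ID per (caller, conversation) is enough to reject duplicates, and this small table is the whole "connection".

```python
# Callee-side duplicate suppression via the last-call-ID table.
last_call_id = {}   # (caller machine/process, conversation id) -> sequence no.

def accept_call(caller, conversation, seq):
    """True if this call packet is new; False if it is a retransmitted
    duplicate to be discarded (the result packet serves as its ack)."""
    key = (caller, conversation)
    if seq <= last_call_id.get(key, -1):
        return False                  # already executed this call
    last_call_id[key] = seq           # remember it: the shared "connection"
    return True
```

The conversation identifier in the key is what keeps a restarted caller, whose sequence numbers start over, from colliding with its previous incarnation.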
Advantages • No special connection establishment • In the idle state: • Callee: only the call-ID table is stored • Caller: a single counter suffices (for sequence numbers) • No concern about the state of the connection – no ping packets required • No explicit connection termination
Complicated Calls • The caller retransmits until acknowledged • For complicated calls, packets are modified to request explicit acks • While waiting for a long call, the caller sends probes until it gets a response; the callee must respond to each probe • The type of failure (communication vs. server crash) can thus be judged and the corresponding exception reported (see the sketch below)
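A rough sketch of the caller's probing loop; `poll_result` and `send_probe` are hypothetical callables standing in for the runtime's machinery, and the interval and probe count are arbitrary.

```python
import time

def wait_for_result(poll_result, send_probe, interval=1.0, max_probes=10):
    """Probe the callee while a long call runs; a missing probe ack means
    a communication failure or server crash rather than a slow call."""
    for _ in range(max_probes):
        result = poll_result()          # non-blocking check for the result
        if result is not None:
            return result
        send_probe()                    # callee must acknowledge each probe
        time.sleep(interval)
    raise ConnectionError("probe unacknowledged: server or network failure")
```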
Exception Handling • Emulates local procedure exceptions – the caller is notified • The callee can transmit an exception packet instead of a result packet • An exception packet is handled like a new call packet, but instead of invoking a new call it raises the exception in the appropriate process (sketched below) • A "call failed" exception may also be raised by the RPCRuntime itself – this has no analogue in local calls
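One way to picture the reply dispatch, with a packet shape we invented purely for illustration: a reply is either a result or an exception, and the runtime re-raises the latter in the caller as if the call had been local.

```python
class RemoteError(Exception):
    """Raised in the caller's process to mirror the callee's exception."""

def handle_reply(packet):
    if packet.get("kind") == "result":
        return packet["value"]
    if packet.get("kind") == "exception":      # exception packet from callee
        raise RemoteError(packet["name"])      # re-raised as if local
    raise RemoteError("call failed")           # raised by the RPCRuntime
```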
Processes – optimizations • Process creation and swaps are expensive • Idle server processes also handle incoming packets • Packets carry source / destination PIDs, which subsequent call packets can reuse • Packets can be dispatched to waiting processes directly from the interrupt handler
Other optimizations • RPC packets bypass the software layers of the normal protocol hierarchy • RPC is intended to become the dominant communication protocol • Security • Encryption-based security for calls is possible • Grapevine can be used as an authentication server
Performance • Measurements were made for remote calls between Dorado computers connected by an Ethernet (3 Mbps)
Performance Summary • The cost is mainly RPC overhead, not the local call • For small packets, RPC overhead dominates • For large packets, transmission time dominates – here protocols other than RPC have the advantage • High data rates are achieved by interleaving parallel remote calls from multiple processes • The cost of exporting / importing was not measured
Summary • The RPC package is fully implemented and in use • The package is convenient to use • It should encourage development of new distributed applications formerly considered infeasible
Performance of Firefly RPC – M. Schroeder, M. Burrows • RPC has already gained wide acceptance • Goals: • Measure the performance of intermachine RPC • Analyze the implementation and account for the latency • Estimate how fast it could be
RPC in Firefly • RPC is the primary communication paradigm • Used for all communication with another address space, whether on the same or a different machine • Uses stub procedures • Automatically generated from Modula-2+ interface definitions
Measurements • Null procedure • No arguments and no results • Measures the base latency of the RPC mechanism • MaxResult procedure • Measures server-to-caller throughput by returning the maximum packet size allowed • MaxArg procedure • The mirror image of MaxResult: measures throughput in the opposite direction (a timing sketch follows)
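A small harness for this style of measurement, reusing the hypothetical `rpc_call` sketch from earlier; the procedure names and packet size are stand-ins for the paper's tests, not its actual code.

```python
import time

MAX_PACKET = 1514   # assumed maximum packet payload for this sketch

def time_calls(call, n=1000):
    """Average seconds per call over n back-to-back calls."""
    start = time.perf_counter()
    for _ in range(n):
        call()
    return (time.perf_counter() - start) / n

# Base latency, server->caller throughput, caller->server throughput:
# time_calls(lambda: rpc_call(addr, "Null"))
# time_calls(lambda: rpc_call(addr, "MaxResult"))
# time_calls(lambda: rpc_call(addr, "MaxArg", "x" * MAX_PACKET))
```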
Latency and Throughput • The base latency of RPC is 2.66 ms; 7 threads can do ~740 calls/sec • The latency for MaxResult is 6.35 ms; 4 threads can achieve 4.65 Mbits/sec • This is the data transfer rate applications see, since all data transfer uses RPC
Marshalling Time • Most arguments and results are copied directly • A few complex types call library marshalling procedures • For simple arguments, marshalling time scales linearly with the number and size of arguments / results
Marshalling Time (cont.) • Much slower when library marshalling procedures are called (see the sketch below)
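The two regimes could look like the following sketch (our illustration; the Firefly stubs are generated Modula-2+ code, not Python): simple fixed-size values are packed straight into the buffer by inline stub code, while complex types pay for a call into a per-type library procedure.

```python
import struct

def marshal_simple(buf: bytearray, a: int, b: int):
    """Inline stub code: direct copy, cost linear in argument size."""
    buf.extend(struct.pack("<ii", a, b))

def marshal_complex(buf: bytearray, value, marshal_proc):
    """Complex type: delegate to a library marshalling procedure –
    the much slower path measured on this slide."""
    marshal_proc(buf, value)
```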
Analysis of performance • Steps in the fast path (95% of RPCs): • Caller: obtains a buffer (Starter), marshals arguments, transmits the packet and waits (Transporter) • Server: unmarshals arguments, calls the server procedure, marshals results, sends the results • Caller: unmarshals results, frees the packet (Ender)
Transporter • Fill in the RPC header of the call packet • Call the Sender, which fills in the other headers • Send the packet on the Ethernet (queue it, notify the Ethernet controller) • Register the outstanding call in the RPC call table and wait for the result packet (the wait is not part of the RPC fast path) • A packet-arrival interrupt on the server wakes a server thread (Receiver) • The result is returned the same way (send + receive) – a commented walk-through follows
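Tying the Starter/Transporter/Ender steps together as code; every helper on the hypothetical `rt` runtime object below is a stand-in we named after the routine on the slide, not an actual Firefly interface.

```python
def fast_path_call(rt, args):
    """Caller's side of the fast path taken by ~95% of RPCs."""
    pkt = rt.starter()               # Starter: obtain a packet buffer
    rt.marshal(pkt, args)            # stub copies the arguments into it
    rt.fill_rpc_header(pkt)          # Transporter fills the RPC header
    rt.sender(pkt)                   # Sender fills the remaining headers
    rt.ethernet_send(pkt)            # queue packet, notify the controller
    reply = rt.wait(pkt)             # register in call table, await result
    result = rt.unmarshal(reply)     # stub unmarshals the results
    rt.ender(reply)                  # Ender: free the packet buffer
    return result
```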
Reducing Latency • Direct assignments are used rather than calls to library marshalling procedures • Starter, Transporter and Ender are reached through procedure variables, not through a table lookup • The interrupt routine wakes the correct thread directly • The OS doesn't demultiplex the incoming packet • For Null(), going through the OS would take 4.5 ms
Reducing Latency (cont.) • Packet buffer management scheme: • The server stub can retain the call packet for the result • The waiting thread keeps its packet buffer, which can be used for retransmission • Packet buffers reside in memory shared by everyone – security can be an issue • The RPC call table is also shared
Improvements • The fast-path code was written in assembly rather than Modula-2+ • Sped it up by a factor of 3 • Application behavior unchanged
Proposed Improvements • Different network controller • Saves 11% on Null() and 28% on MaxResult • Faster network (100 Mbps Ethernet) • Null – 4%, MaxResult – 18% • Faster CPUs • Null – 52%, MaxResult – 36% • Omit UDP checksums • Risky: the Ethernet controller occasionally makes errors • Redesign the RPC protocol
Proposed Improvements (cont.) • Omit the layering on IP and UDP • Busy-wait in the caller and server threads – the time for a thread wakeup can be saved (sketched below) • Recode the RPC run-time routines
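The busy-wait idea in miniature; `poll` and `block` are hypothetical callables for a non-blocking reply check and a true blocking wait, and the spin limit is arbitrary.

```python
def await_reply(poll, block, spin_limit=100_000):
    """Spin first so the thread-wakeup cost is avoided when the reply
    returns quickly; fall back to a real blocking wait otherwise."""
    for _ in range(spin_limit):
        result = poll()
        if result is not None:
            return result
    return block()
```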
Effect of processors • Problem: 20 ms latency on a uniprocessor, which has to wait for a dropped packet to be resent • Solution: take a 100 µs penalty on the multiprocessor in exchange for reasonable uniprocessor performance
Effect of processors (cont.) • Uniprocessor latency increases sharply • The Firefly RPC fast-path implementation exists only for a multiprocessor
Summary • Concentrates on the performance of RPC – understanding where the time is spent • The resulting performance is good, but not demonstrably better than others' • Faster implementations exist, but on different processors • Performance would be worse on a multi-user computer, where packet buffers cannot be shared