Advanced Operating Systems Lecture 10: RPC (Remote Procedure Call) University of Tehran Dept. of EE and Computer Engineering By: Dr. Nasser Yazdani Distributed Operating Systems
Communication in distributed systems • How processes communicate in DS. • References • Chapter 2 of the textbook • Andrew D. Birrell & Bruce J. Nelson, “Implementing Remote Procedure Calls” • Brian Bershad, Thomas Anderson, et al., “Lightweight Remote Procedure Call” • “Computer Networks: A Systems Approach”, Section 5.3 Distributed Operating Systems
Outline • Why RPC • Local Procedure Call • Call semantics • Different implementations • Lightweight Remote Procedure Call (LRPC) Distributed Operating Systems
Communication Models • Communication of processes in a distributed system environment. • Communication is done at a higher level of abstraction. • Remote procedure call (RPC) • Remote object invocation • Message-passing queues • Support for continuous media or streams Distributed Operating Systems
Remote procedure call • A remote procedure call makes a call to a remote service look like a local call • RPC makes transparent whether server is local or remote • RPC allows applications to become distributed transparently • RPC makes architecture of remote machine transparent • Well-known method to transfer control and data among processes running on different machines Distributed Operating Systems
Developing with RPC • Define APIs between modules • Split application based on function, ease of development, and ease of maintenance • Don’t worry whether modules run locally or remotely • Decide what runs locally and remotely • Decision may even be at run-time • Make APIs bullet proof • Deal with partial failures Distributed Operating Systems
Goal of RPC • Big goal: Transparency. • Make distributed computing look like centralized computing • Allow remote services to be called as procedures • Transparency with regard to location, implementation, language • Issues • How to pass parameters • Bindings • Semantics in face of errors • Two classes: integrated into the programming language, or provided separately Distributed Operating Systems
Benefits of RPC • Clean and simple semantics for building distributed computing. • Transparent distributed computing: existing programs don't need to be modified • Efficient communication! • Generality: enforces well-defined interfaces • Allows portable interfaces: plug together separately written programs at RPC boundaries, e.g. NFS and X clients and servers Distributed Operating Systems
Conventional Procedure Call • a) Parameter passing in a local procedure call: the stack before the call to read • b) The stack while the called procedure is active Distributed Operating Systems
Parameter Passing • Local procedure parameter passing • Call-by-value • Call-by-reference: arrays, complex data structures • Copy and store • Remote procedure calls simulate this through: • Stubs – proxies • Flattening – marshalling • Related issue: global variables are not allowed in RPCs Distributed Operating Systems
Client and Server Stubs • Principle of RPC between a client and server program. Distributed Operating Systems
Stubs • Client makes procedure call (just like a local procedure call) to the client stub • Server is written as a standard procedure • Stubs take care of packaging arguments and sending messages • Packaging is called marshalling • Stub compiler generates stub automatically from specs in an Interface Definition Language (IDL) • Simplifies programmer task Distributed Operating Systems
Steps of a Remote Procedure Call • Client procedure calls client stub in normal way • Client stub builds message, calls local OS • Client's OS sends message to remote OS • Remote OS gives message to server stub • Server stub unpacks parameters, calls server • Server does work, returns result to the stub • Server stub packs it in message, calls local OS • Server's OS sends message to client's OS • Client's OS gives message to client stub • Stub unpacks result, returns to client Distributed Operating Systems
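The ten steps above can be sketched in-process as a toy example. This is a minimal illustration, not a real RPC system: JSON stands in for marshalling, and a direct function call stands in for the two operating systems and the network (steps 3, 4, 8, 9). All names here (`server_add`, `client_stub`, `server_stub`) are hypothetical.

```python
import json

def server_add(a, b):          # step 6: the actual server procedure does the work
    return a + b

def server_stub(message):      # steps 5-7: unpack parameters, call server, pack result
    req = json.loads(message)
    result = server_add(*req["args"])
    return json.dumps({"result": result})

def client_stub(proc, *args):  # steps 1-2 and 10: build message, later unpack result
    request = json.dumps({"proc": proc, "args": args})
    reply = server_stub(request)   # steps 3-4 and 8-9: stands in for both kernels
    return json.loads(reply)["result"]

# The client makes what looks like an ordinary local procedure call:
print(client_stub("add", 2, 3))  # → 5
```

The point of the stubs is that neither the client code nor `server_add` mentions messages at all; only the stubs touch the wire format.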
Passing Value Parameters (1) • Steps involved in doing remote computation through RPC Distributed Operating Systems
Passing Value Parameters (2) • Original message on the Pentium • The message after receipt on the SPARC • The message after being inverted. The little numbers in boxes indicate the address of each byte Distributed Operating Systems
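The Pentium/SPARC byte-order problem can be demonstrated directly with Python's `struct` module: the same 32-bit integer has different byte layouts on a little-endian and a big-endian machine, and reinterpreting one as the other garbles the value.

```python
import struct

value = 5
little = struct.pack("<i", value)   # little-endian layout (e.g., Pentium)
big    = struct.pack(">i", value)   # big-endian layout (e.g., SPARC)
print(little.hex())  # 05000000
print(big.hex())     # 00000005

# Naively reading the little-endian bytes as big-endian garbles the value:
print(struct.unpack(">i", little)[0])  # 83886080  (0x05000000)

# Decoding each with its own byte order recovers 5 in both cases,
# which is why stubs convert to an agreed wire representation (XDR uses big-endian):
assert struct.unpack("<i", little)[0] == struct.unpack(">i", big)[0] == 5
```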
Parameter Specification and Stub Generation • A procedure • The corresponding message. Distributed Operating Systems
Doors • The principle of using doors as IPC mechanism. Distributed Operating Systems
Asynchronous RPC (1) • Interconnection in a traditional RPC • The interaction using asynchronous RPC • Why Asynchronous RPC? Distributed Operating Systems
Asynchronous RPC (2) • A client and server interacting through two asynchronous RPCs Distributed Operating Systems
DCE RPC • Distributed Computing Environment developed by Open Software Foundation. • It is a middleware between network Operating Systems and distributed application. • Distributed file service • Directory service • Security service • Distributed time service Distributed Operating Systems
Binding a Client to a Server • Client-to-server binding in DCE. Distributed Operating Systems
Writing a Client and a Server • uuidgen: a program that generates a prototype IDL (Interface Definition Language) file. Distributed Operating Systems
Marshalling • Problem: different machines have different data formats • Intel: little endian, SPARC: big endian • Solution: use a standard representation • Example: external data representation (XDR) • Problem: how do we pass pointers? • If it points to a well-defined data structure, pass a copy and the server stub passes a pointer to the local copy • What about data structures containing pointers? • Prohibit • Chase pointers over network • Marshalling: transform parameters/results into a byte stream Distributed Operating Systems
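A standard representation like XDR can be sketched with explicit packing rules: everything is big-endian, strings carry a 4-byte length prefix, and data is padded to 4-byte boundaries. The helpers below are a hypothetical, simplified illustration of XDR-style string encoding, not Sun's actual library.

```python
import struct

def xdr_pack_string(s):
    """XDR-style string: big-endian 4-byte length, bytes, padding to 4-byte boundary."""
    data = s.encode()
    pad = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * pad

def xdr_unpack_string(buf, off=0):
    """Return (string, offset just past the padded string)."""
    (n,) = struct.unpack_from(">I", buf, off)
    s = buf[off + 4 : off + 4 + n].decode()
    pad = (4 - n % 4) % 4
    return s, off + 4 + n + pad

# Marshal a hypothetical read(filename, nbytes) request into one byte stream:
wire = xdr_pack_string("read") + struct.pack(">i", 1024)
name, off = xdr_unpack_string(wire)
(count,) = struct.unpack_from(">i", wire, off)
print(name, count)  # read 1024
```

Because both sides agree on this one canonical format, it no longer matters whether either machine is little- or big-endian.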
Binding • Problem: how does a client locate a server? • Use Bindings • Server • Export server interface during initialization • Send name, version no, unique identifier, handle (address) to binder • Client • First RPC: send message to binder to import server interface • Binder: check to see if server has exported interface • Return handle and unique identifier to client Distributed Operating Systems
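The export/import protocol above amounts to a lookup table keyed by interface name and version. This is a toy in-memory binder for illustration (the class and field names are hypothetical); a real binder such as Grapevine or a port mapper is a separate networked service.

```python
class Binder:
    """Toy binder: servers export interfaces, clients import them."""
    def __init__(self):
        self.table = {}   # (name, version) -> (unique_id, handle)

    def export(self, name, version, unique_id, handle):
        # Server at initialization: register name, version, id, and address.
        self.table[(name, version)] = (unique_id, handle)

    def import_(self, name, version):
        # Client on first RPC: ask whether a server has exported this interface.
        if (name, version) not in self.table:
            raise LookupError(f"no server exported {name} v{version}")
        return self.table[(name, version)]

binder = Binder()
binder.export("fileserver", 2, unique_id=0x42, handle="10.0.0.7:1234")
uid, handle = binder.import_("fileserver", 2)
print(handle)  # 10.0.0.7:1234
```

The version number in the key is what lets old and new servers coexist; a client importing ("fileserver", 1) would fail here rather than silently reach an incompatible implementation.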
Binding: Comments • Exporting and importing incurs overheads • Binder can be a bottleneck • Use multiple binders • Binder can do load balancing Distributed Operating Systems
Failure Semantics • Client unable to locate server: return error • Lost request messages: simple timeout mechanisms • Lost replies: timeout mechanisms • Make operation idempotent • Use sequence numbers, mark retransmissions • Server failures: did failure occur before or after operation? • At least once semantics (SUNRPC) • At most once • No guarantee • Exactly once: desirable but difficult to achieve Distributed Operating Systems
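The timeout mechanisms above give at-least-once behavior on the client side: resend until a reply arrives or retries are exhausted. The sketch below simulates a lossy network with a list of drops; all names are hypothetical.

```python
def call_with_retry(send, request, retries=3):
    """At-least-once client: resend on timeout. The server may execute the
    request more than once, so this is only safe for idempotent operations
    (or a server that filters duplicates)."""
    for attempt in range(retries):
        reply = send(request)          # stands in for transmit + wait with timeout
        if reply is not None:
            return reply
    raise TimeoutError("client unable to reach server")

# A lossy 'network' that drops the first two messages, then delivers:
drops = [True, True, False]
def lossy_send(request):
    return None if drops.pop(0) else "done: " + request

result = call_with_retry(lossy_send, "remove(file)")
print(result)  # done: remove(file)
```

Note what this does not tell the client on failure: after the final timeout it cannot distinguish "server never executed the call" from "server executed it but the reply was lost", which is exactly why the server-failure cases above need their own semantics.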
Failure Semantics • Client failure: what happens to the server computation? • Referred to as an orphan • Extermination: log at client stub and explicitly kill orphans • Overhead of maintaining disk logs • Reincarnation: Divide time into epochs between failures and delete computations from old epochs • Gentle reincarnation: upon a new epoch broadcast, try to locate owner first (delete only if no owner) • Expiration: give each RPC a fixed quantum T; explicitly request extensions • Periodic checks with client during long computations Distributed Operating Systems
Implementation Issues • Choice of protocol [affects communication costs] • Use existing protocol (UDP) or design from scratch • Packet size restrictions • Reliability in case of multiple packet messages • Flow control • Using TCP: too much overhead • Setup time • Keeping communication alive • State information Distributed Operating Systems
Implementation Issues(2) • Copying costs are dominant overheads • Need at least 2 copies per message • From client to NIC and from server NIC to server • As many as 7 copies • Stack in stub – message buffer in stub – kernel – NIC – medium – NIC – kernel – stub – server • Scatter-gather operations can reduce overheads Distributed Operating Systems
RPC vs. LPC • 4 properties of distributed computing that make achieving transparency difficult: • Partial failures • Concurrency • Latency • Memory access Distributed Operating Systems
Case Study: SUNRPC • One of the most widely used RPC systems • Developed for use with NFS • Built on top of UDP or TCP • TCP: stream is divided into records • UDP: max packet size < 8912 bytes • UDP: timeout plus limited number of retransmissions • TCP: return error if connection is terminated by server • Multiple arguments marshaled into a single structure • At-least-once semantics if reply received, at-least-zero semantics if no reply. With UDP tries at-most-once • Use SUN’s eXternal Data Representation (XDR) • Big endian order for 32 bit integers, handle arbitrarily large data structures Distributed Operating Systems
Binder: Port Mapper • Server start-up: create port • Server stub calls svc_register to register prog. #, version # with local port mapper • Port mapper stores prog #, version #, and port • Client start-up: call clnt_create to locate server port • Upon return, client can call procedures at the server Distributed Operating Systems
rpcgen: generating stubs • Q_xdr.c: does XDR conversion Distributed Operating Systems
Partial failures • In local computing: • if machine fails, application fails • In distributed computing: • if a machine fails, part of application fails • one cannot tell the difference between a machine failure and network failure • How to make partial failures transparent to client? Distributed Operating Systems
Strawman solution • Make remote behavior identical to local behavior: • Every partial failure results in complete failure • You abort and reboot the whole system • You wait patiently until system is repaired • Problems with this solution: • Many catastrophic failures • Clients block for long periods • System might not be able to recover Distributed Operating Systems
Real solution: break transparency • Possible semantics for RPC: • Exactly-once • Impossible in practice • More than once: • Only for idempotent operations • At most once • Zero, don’t know, or once • Zero or once • Transactional semantics • At-most-once most practical • But different from LPC Distributed Operating Systems
Where RPC transparency breaks • True concurrency • Clients run truly concurrently: client() { if (exists(file)) if (!remove(file)) abort(“remove failed??”); } • RPC latency is high • Orders of magnitude larger than LPC’s • Memory access • Pointers are local to an address space Distributed Operating Systems
RPC implementation • Stub compiler • Generates stubs for client and server • Language dependent • Compile into machine-independent format • E.g., XDR • Format describes types and values • RPC protocol • RPC transport Distributed Operating Systems
RPC protocol • Guarantee at-most-once semantics by tagging requests and response with a nonce • RPC request header: • Request nonce • Service Identifier • Call identifier • Protocol: • Client resends after time out • Server maintains table of nonces and replies Distributed Operating Systems
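The server's table of nonces and replies is what turns retransmission into at-most-once semantics: a duplicate request replays the cached reply instead of re-executing the procedure. A minimal sketch (class and names hypothetical):

```python
class AtMostOnceServer:
    """Filters duplicate requests by nonce; replays the cached reply."""
    def __init__(self):
        self.replies = {}   # nonce -> cached reply

    def handle(self, nonce, op, arg):
        if nonce in self.replies:      # retransmission: do NOT re-execute
            return self.replies[nonce]
        result = op(arg)               # execute at most once
        self.replies[nonce] = result
        return result

counter = {"n": 0}
def increment(_):                      # a deliberately non-idempotent operation
    counter["n"] += 1
    return counter["n"]

srv = AtMostOnceServer()
first  = srv.handle(nonce=7, op=increment, arg=None)
replay = srv.handle(nonce=7, op=increment, arg=None)   # client timed out, resent
print(first, replay, counter["n"])  # 1 1 1
```

A real implementation must also bound the table (e.g., discard entries once the client acknowledges the reply), otherwise it grows without limit.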
RPC transport • Use reliable transport layer • Flow control • Congestion control • Reliable message transfer • Combine RPC and transport protocol • Reduce number of messages • RPC response can also function as acknowledgement for message transport protocol Distributed Operating Systems
Implementing RPC (Birrell et al.) • The primary purpose: to make distributed computation easy; remove unnecessary difficulties • Two secondary aims • Efficiency (within a small factor of the underlying network transmission time) • Powerful semantics without loss of simplicity or efficiency • Secure communication Distributed Operating Systems
Implementing RPC (Options) • Shared address space among computers? • Is it feasible? • Integration of remote address space • Acceptable efficiency? • Keep RPC close to procedure call • No timeout Distributed Operating Systems
Implementing RPC • 1st solid implementation, not 1st mention • – Semantics in the face of failure • – Pointers • – Language issues • – Binding • – Communication issues (networking tangents) • Did you see those HW specs? • “very powerful” == 1000x slower & smaller Distributed Operating Systems
RPC Semantics 1 • Delivery guarantees. • “Maybe call”: • Clients cannot tell for sure whether remote procedure was executed or not due to message loss, server crash, etc. • Usually not acceptable. Distributed Operating Systems
RPC Semantics 2 • “At-least-once” call: • Remote procedure executed at least once, but maybe more than once. • Retransmissions but no duplicate filtering. • Idempotent operations OK; e.g., reading data that is read-only. Distributed Operating Systems
RPC Semantics 3 • “At-most-once” call • Most appropriate for non-idempotent operations. • Remote procedure executed 0 or 1 time, i.e., exactly once or not at all. • Use of retransmissions and duplicate filtering. • Example: Birrell et al. implementation. • Use of probes to check if server crashed. Distributed Operating Systems
RPC Implementation (Birrell et al.) • Figure: caller side (user, user stub, RPC runtime) and callee side (RPC runtime, server stub, server) • Call packet: user makes the call, user stub packs the arguments, runtime transmits; callee runtime receives, server stub unpacks the call, server does the work • Result packet: server stub packs the result, runtime transmits the return; caller runtime receives, user stub unpacks the result, call returns to the user Distributed Operating Systems
Implementing RPC • Extra compiler pass on code generates stubs • – Client/server terminology & thinking Distributed Operating Systems
Implementing RPC (Rendezvous, binding) • How does client find the appropriate server? • Birrell & Nelson use a registry (Grapevine) • Server publishes interface (type & instance) • Client names service (& instance) • Fairly sophisticated then, common now • Simpler schemes: portmap/IANA (WW names) • Still a source of complexity & insecurity Distributed Operating Systems
Implementing RPC (Marshalling) • Representation • Processor arch dependence (big vs little endian) • Always canonical or optimize for instances • Pointers • No shared memory, so what about pointer args? • Make RPC read accesses for all server dereferences? • Slow! • Copy fully expanded data structure across? • Slow & incorrect if data structure is dynamically written • Disallow pointers? Common • Forces programmers to plan remote/local Distributed Operating Systems
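The "copy fully expanded data structure" option above can be sketched for a linked list: the client chases the pointers once, ships the values, and the server rebuilds an equivalent structure with its own local pointers. All names are hypothetical; note the slide's caveat still applies, since the copy goes stale if the original is mutated concurrently.

```python
import json

class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next          # a pointer, meaningless in another address space

def flatten(node):
    """Chase pointers on the sender and ship values, not addresses."""
    out = []
    while node is not None:
        out.append(node.value)
        node = node.next
    return json.dumps(out)

def rebuild(wire):
    """Receiver reconstructs the structure with its own local pointers."""
    head = None
    for v in reversed(json.loads(wire)):
        head = Node(v, head)
    return head

lst = Node(1, Node(2, Node(3)))
copy = rebuild(flatten(lst))
print(copy.value, copy.next.value, copy.next.next.value)  # 1 2 3
```

This only works because the list is acyclic and fully value-like; cyclic or shared substructures need explicit handling, which is one reason many systems simply disallow pointer arguments.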