This presentation covers the basics of communication in distributed systems and introduces the concept of Remote Procedure Calls (RPC). It explains the advantages of using RPC to hide communication details and focuses on a high-level interface. The presentation also discusses the challenges of RPC, such as passing parameters and handling failures.
Introduction to Distributed Systems Slides for CSCI 3171 Lectures E. W. Grundke
References D. E. Comer, Computer Networks and Internets, 3rd ed. (Chapter 35: RPC and Middleware), Prentice-Hall 2001. A. Tanenbaum and M. van Steen (TvS), Distributed Systems: Principles and Paradigms, Prentice-Hall 2002. G. Coulouris, J. Dollimore and T. Kindberg (CDK), Distributed Systems: Concepts and Design, Addison-Wesley 2001.
Acknowledgement Some slides from: TvS: http://www.prenhall.com/divisions/esm/app/author_tanenbaum/custom/dist_sys_1e/ CDK: http://www.cdk3.net/ig/beida/index.html
COMMUNICATION 4.1.1 Layered Protocols: Lower-Level Protocols, Transport Protocols, Higher-Level Protocols, Middleware Protocols 4.1.2 Types of Communication • An electronic mail system is a typical example in which communication is persistent. With persistent communication, a message that has been submitted for transmission is stored by the communication middleware for as long as it takes to deliver it to the receiver.
Middleware is… ‘The role of middleware is to ease the task of designing, programming and managing distributed applications by providing a simple, consistent and integrated distributed programming environment. Essentially, middleware is a distributed software layer, or “platform” which abstracts over the complexity and heterogeneity of the underlying distributed environment with its multitude of network technologies, machine architectures, operating systems and programming languages.’ — IEEE Distributed Systems Online <http://dsonline.computer.org/middleware/>
COMMUNICATION • In contrast, with transient communication, a message is stored by the communication system only as long as the sending and receiving applications are executing. • Besides being persistent or transient, communication can also be asynchronous or synchronous. • The characteristic feature of asynchronous communication is that a sender continues immediately after it has submitted its message for transmission. This means that the message is (temporarily) stored by the middleware immediately upon submission. • With synchronous communication, the sender is blocked until its request is known to be accepted. There are essentially three points where synchronization can take place.
COMMUNICATION • First, the sender may be blocked until the middleware notifies it that it will take over transmission of the request. • Second, the sender may synchronize until its request has been delivered to the intended recipient. • Third, synchronization may take place by letting the sender wait until its request has been fully processed, that is, up to the time that the recipient returns a response.
Motivation Writing clients and servers is error-prone (certainly in C!): much low-level detail, yet common basic patterns. Instead: • hide communications behind a ‘function call’ • specify a high-level interface only • use an automated tool to generate the actual client/server code Advantage: • focus programmer attention on the application, not on the communications • familiar function-calling paradigm
What is RPC? Call a procedure (function, subroutine, method, …) in a program running on a remote machine, while hiding communication details from the programmer. Note: Think C, not Java! We deal with objects later!
RPC What Birrell and Nelson suggested was allowing programs to call procedures located on other machines. When a process on machine A calls a procedure on machine B, the calling process on A is suspended, and execution of the called procedure takes place on B. Information can be transported from the caller to the callee in the parameters and can come back in the procedure result. No message passing at all is visible to the programmer. This method is known as Remote Procedure Call, or often just RPC. Problems • The calling and called procedures run on different machines and execute in different address spaces, which causes complications. • Parameters and results also have to be passed, which can be complicated, especially if the machines are not identical. • Finally, either or both machines can crash, and each of the possible failures causes different problems.
Standards for RPC RFC 1057: Remote Procedure Call RFC 1014: External Data Representation Author: Sun Microsystems Inc. Others: see Comer. Sun RPC Demo with the rpcgen tool: • http://www.eng.auburn.edu/department/cse/classes/cse605/examples/rpc/stevens/SUNrpc.html • 20 Oct 2002 archived copy
Conventional Procedure Call • Parameter passing in a local procedure call: the stack before the call to read • The stack while the called procedure is active TvS 2.7
Consider a call in C like count = read(fd, buf, nbytes); where fd is an integer indicating a file, buf is an array of characters into which data are read, and nbytes is another integer telling how many bytes to read. If the call is made from the main program, the stack will be as shown in Fig. 4-5(a) before the call. To make the call, the caller pushes the parameters onto the stack in order, last one first, as shown in Fig. 4-5(b). (The reason that C compilers push the parameters in reverse order has to do with printf: by doing so, printf can always locate its first parameter, the format string.) After the read procedure has finished running, it puts the return value in a register, removes the return address, and transfers control back to the caller. The caller then removes the parameters from the stack, returning the stack to the original state it had before the call.
Conventional Parameter Passing Techniques • Call-by-value Several things are worth noting. For one, in C, parameters can be call-by-value or call-by-reference. A value parameter, such as fd or nbytes, is simply copied to the stack as shown in Fig. 4-5(b). To the called procedure, a value parameter is just an initialized local variable. The called procedure may modify it, but such changes do not affect the original value at the calling side. • Call-by-reference A reference parameter in C is a pointer to a variable (i.e., the address of the variable), rather than the value of the variable. In the call to read, the second parameter is a reference parameter because arrays are always passed by reference in C. What is actually pushed onto the stack is the address of the character array. If the called procedure uses this parameter to store something into the character array, it does modify the array in the calling procedure. The difference between call-by-value and call-by-reference is quite important for RPC, as we shall see.
Call-by-copy/restore One other parameter passing mechanism also exists, although it is not used in C. It is called call-by-copy/restore. It consists of having the variable copied to the stack by the caller, as in call-by-value, and then copied back after the call, overwriting the caller's original value. Under most conditions, this achieves exactly the same effect as call-by-reference, but in some situations, such as the same parameter being present multiple times in the parameter list, the semantics are different. The call-by-copy/restore mechanism is not used in many languages.
Complications for Remote Calls How to make it look like a function call, but actually use a client and server? Answer: use ‘stubs’ (‘proxies’) How to handle parameters and return values? Platform differences (e.g. endian issues) Pass-by-reference Answer: use ‘external data representation’
Timing (Synchronous RPC) RPC between a client and server program. TvS 2.8
Steps of a Remote Procedure Call • Client procedure calls client stub in normal way • Client stub builds message, calls local OS • Client's OS sends message to remote OS • Remote OS gives message to server stub • Server stub unpacks parameters, calls server • Server does work, returns result to the stub • Server stub packs it in message, calls local OS • Server's OS sends message to client's OS • Client's OS gives message to client stub • Stub unpacks result, returns to client TvS 2.9
Parameter Passing The function of the client stub is to take its parameters, pack them into a message, and send them to the server stub. While this sounds straightforward, it is not quite as simple as it at first appears. In this section we will look at some of the issues concerned with parameter passing in RPC systems. Passing Value Parameters Packing parameters into a message is called parameter marshaling.
Passing Value Parameters Steps involved in doing remote computation through RPC 2-8 TvS 2.10
Parameter Specification and Stub Generation • A procedure • The corresponding message. TvS 2.12
Passing Reference Parameters Reference variables (pointers): • pointers to arrays • pointers to structures (objects without methods) What if the structure contains other pointers? The server may need a whole ‘graph’ of structures! ‘Parameter marshalling’
Interface Definition Language (IDL) Specifies an interface types constants procedures parameter data types Does not specify an implementation Compiled into client and server stubs
Asynchronous RPC • The interconnection between client and server in a traditional RPC • The interaction using asynchronous RPC 2-12 TvS 2.14
Asynchronous RPC: Deferred Synchronous RPC A client and server interacting through two asynchronous RPCs TvS 2.15
Distributed Computing Environment (DCE) A middleware system Developed by The Open Group (previously OSF) Includes distributed file service directory service security service distributed time service Adopted by Microsoft for distributed computing
DCE: Binding a Client to a Server 2-15 TvS 2.17
Motivation Data in running programs: Not just primitives, but arrays, pointers, lists, trees, etc. In general: complex graphs of interconnected structures or objects. Data being transmitted: Sequential! Pointers make no sense. Structures must be flattened. All the heterogeneities must be hidden! (endian, binary formats, etc.) CDK 4.3
What is an External Data Representation? • ‘An agreed standard for the representation of data structures and primitive values.’ Internal to external: ‘marshalling’ External to internal: ‘unmarshalling’ Examples: Sun XDR CORBA’s Common Data Representation (CDR) Java Object Serialization
CORBA CDR Defined in CORBA 2.0 in 1998 Primitive types: Standard data types, both big/little endian, conversion by the receiver. Constructed types: sequence, string, array, struct, enumerated, union (not objects) Data types are not specified in the external format: receiver is assumed to have access to the definition (via IDL). (unlike Java Object Serialization!) CDK 4.3
CORBA CDR Example
Index in sequence of bytes   4 bytes wide   Notes
0-3                          5              Length of string
4-7                          "Smit"         "Smith"
8-11                         "h___"
12-15                        6              Length of string
16-19                        "Lond"         "London"
20-23                        "on__"
24-27                        1934           Unsigned long
The flattened form represents a Person struct with value: {"Smith", "London", 1934} CDK 4.3