Distributed Operating Systems

Distributed Operating Systems Luai M. Malhis Computer Engineering Department Most of the course materials are taken from http://lass.cs.umass.edu/~shenoy/courses/677 CS677: Distributed OS

Course Outline • Introduction (today) • What, why, why not? • Basics • Interprocess Communication • RPCs, RMI, message- and stream-oriented communication • Processes and their scheduling • Thread/process scheduling, code/process migration • Naming and location management • Entities, addresses, access points CS677: Distributed OS

Course Outline • Canonical problems and solutions • Mutual exclusion, leader election, clock synchronization, … • Resource sharing, replication and consistency • DSM, DFS, consistency issues, caching and replication • Fault-tolerance • Security in distributed Systems • Distributed middleware • Advanced topics: web, multimedia, real-time and mobile systems CS677: Distributed OS

Misc. Course Details • Textbook: Distributed Systems by Tannenbaum and Van Steen, Prentice Hall 2001 • Grading • 4-5 Homeworks (20%), 3-4 programming assignments (35%) • 1 mid-term and 1 final (40%), class participation (5%) • Course mailing list: cs677@cs.umass.edu • You need to add yourself to this list! [ see class web page ] • Pre-requisites • Undergrad course in operating systems • Good programming skills in a high-level prog. language CS677: Distributed OS

Definition of a Distributed System • A distributed system: • Multiple connected CPUs working together • A collection of independent computers that appears to its users as a single coherent system • Examples: parallel machines, networked machines CS677: Distributed OS

Advantages and Disadvantages • Advantages • Communication and resource sharing possible • Economics – price-performace ratio • Relibility, scalability • Potential for incremental growth • Disadvantages • Distribution-aware PLs, OSs and applications • Network connectivity essential • Security and privacy CS677: Distributed OS

Transparency in a Distributed System Different forms of transparency in a distributed system. CS677: Distributed OS

Scalability Problems Examples of scalability limitations. CS677: Distributed OS

Hardware Concepts: Multiprocessors (1) • Multiprocessor dimensions • Memory: could be shared or be private to each CPU • Interconnect: could be shared (bus-based) or switched • A bus-based multiprocessor. CS677: Distributed OS

Multiprocessors (2) • A crossbar switch b) An omega switching network 1.8 CS677: Distributed OS

Homogeneous Multicomputer Systems • Grid b) Hypercube 1-9 CS677: Distributed OS

Distributed Systems Models • Minicomputer model (e.g., early networks) • Each user has local machine • Local processing but can fetch remote data (files, databases) • Workstation model (e.g., Sprite) • Processing can also migrate • Client-server Model (e.g., V system, world wide web) • User has local workstation • Powerful workstations serve as servers (file, print, DB servers) • Processor pool model (e.g., Amoeba, Plan 9) • Terminals are Xterms or diskless terminals • Pool of backend processors handle processing CS677: Distributed OS

Uniprocessor Operating Systems • An OS acts as a resource manager or an arbitrator • Manages CPU, I/O devices, memory • OS provides a virtual interface that is easier to use than hardware • Structure of uniprocessor operating systems • Monolithic (e.g., MS-DOS, early UNIX) • One large kernel that handles everything • Layered design • Functionality is decomposed into N layers • Each layer uses services of layer N-1 and implements new service(s) for layer N+1 CS677: Distributed OS

Uniprocessor Operating Systems • Microkernel architecture • Small kernel • user-level servers implement additional functionality CS677: Distributed OS

Distributed Operating System • Manages resources in a distributed system • Seamlessly and transparently to the user • Looks to the user like a centralized OS • But operates on multiple independent CPUs • Provides transparency • Location, migration, concurrency, replication,… • Presents users with a virtual uniprocessor CS677: Distributed OS

Types of Distributed OSs CS677: Distributed OS

Multiprocessor Operating Systems • Like a uniprocessor operating system • Manages multiple CPUs transparently to the user • Each processor has its own hardware cache • Maintain consistency of cached data CS677: Distributed OS

Multicomputer Operating Systems 1.14 CS677: Distributed OS

Network Operating System 1-19 CS677: Distributed OS

Network Operating System • Employs a client-server model • Minimal OS kernel • Additional functionality as user processes 1-20 CS677: Distributed OS

Middleware-based Systems • General structure of a distributed system as middleware. 1-22 CS677: Distributed OS

Comparison between Systems CS677: Distributed OS

Communication in Distributed Systems • Issues in communication (today) • Message-oriented Communication • Remote Procedure Calls • Transparency but poor for passing references • Remote Method Invocation • RMIs are essentially RPCs but specific to remote objects • System wide references passed as parameters • Stream-oriented Communication CS677: Distributed OS

Communication Between Processes • Unstructured communication • Use shared memory or shared data structures • Structured communication • Use explicit messages (IPCs) • Distributed Systems: both need low-level communication support (why?) CS677: Distributed OS

Communication Protocols • Protocols are agreements/rules on communication • Protocols could be connection-oriented or connectionless 2-1 CS677: Distributed OS

Layered Protocols • A typical message as it appears on the network. 2-2 CS677: Distributed OS

Client-Server TCP • Normal operation of TCP. • Transactional TCP. 2-4 CS677: Distributed OS

Middleware Protocols • Middleware: layer that resides between an OS and an application • May implement general-purpose protocols that warrant their own layers • Example: distributed commit 2-5 CS677: Distributed OS

Client-Server Communication Model kernel kernel kernel kernel • Structure: group of servers offering service to clients • Based on a request/response paradigm • Techniques: • Socket, remote procedure calls (RPC), Remote Method Invocation (RMI) file server process server terminal server client CS677: Distributed OS

Issues in Client-Server Communication • Addressing • Blocking versus non-blocking • Buffered versus unbuffered • Reliable versus unreliable • Server architecture: concurrent versus sequential • Scalability CS677: Distributed OS

Addressing Issues user user user server server server NS • Question: how is the server located? • Hard-wired address • Machine address and process address are known a priori • Broadcast-based • Server chooses address from a sparse address space • Client broadcasts request • Can cache response for future • Locate address via name server CS677: Distributed OS

Blocking versus Non-blocking • Blocking communication (synchronous) • Send blocks until message is actually sent • Receive blocks until message is actually received • Non-blocking communication (asynchronous) • Send returns immediately • Return does not block either • Examples: CS677: Distributed OS

Buffering Issues user user server server • Unbuffered communication • Server must call receive before client can call send • Buffered communication • Client send to a mailbox • Server receives from a mailbox CS677: Distributed OS

Reliability request • Unreliable channel • Need acknowledgements (ACKs) • Applications handle ACKs • ACKs for both request and reply • Reliable channel • Reply acts as ACK for request • Explicit ACK for response • Reliable communication on unreliable channels • Transport protocol handles lost messages ACK Server User reply ACK request Server User reply ACK CS677: Distributed OS

Server Architecture • Sequential • Serve one request at a time • Can service multiple requests by employing events and asynchronous communication • Concurrent • Server spawns a process or thread to service each request • Can also use a pre-spawned pool of threads/processes (apache) • Thus servers could be • Pure-sequential, event-based, thread-based, process-based • Discussion: which architecture is most efficient? CS677: Distributed OS

Scalability • Question:How can you scale the server capacity? • Buy bigger machine! • Replicate • Distribute data and/or algorithms • Ship code instead of data • Cache CS677: Distributed OS

To Push or Pull ? • Client-pull architecture • Clients pull data from servers (by sending requests) • Example: HTTP • Pro: stateless servers, failures are each to handle • Con: limited scalability • Server-push architecture • Servers push data to client • Example: video streaming, stock tickers • Pro: more scalable, Con: stateful servers, less resilient to failure • When/how-often to push or pull? CS677: Distributed OS

Remote Procedure Calls • Goal: Make distributed computing look like centralized computing • Allow remote services to be called as procedures • Transparency with regard to location, implementation, language • Issues • How to pass parameters • Bindings • Semantics in face of errors • Two classes: integrated into prog, language and separate CS677: Distributed OS

Conventional Procedure Call Parameter passing in a local procedure call: the stack before the call to read b) The stack while the called procedure is active CS677: Distributed OS

Parameter Passing • Local procedure parameter passing • Call-by-value • Call-by-reference: arrays, complex data structures • Remote procedure calls simulate this through: • Stubs – proxies • Flattening – marshalling • Related issue: global variables are not allowed in RPCs CS677: Distributed OS

Client and Server Stubs • Principle of RPC between a client and server program. CS677: Distributed OS

Stubs • Client makes procedure call (just like a local procedure call) to the client stub • Server is written as a standard procedure • Stubs take care of packaging arguments and sending messages • Packaging is called marshalling • Stub compiler generates stub automatically from specs in an Interface Definition Language (IDL) • Simplifies programmer task CS677: Distributed OS

Steps of a Remote Procedure Call • Client procedure calls client stub in normal way • Client stub builds message, calls local OS • Client's OS sends message to remote OS • Remote OS gives message to server stub • Server stub unpacks parameters, calls server • Server does work, returns result to the stub • Server stub packs it in message, calls local OS • Server's OS sends message to client's OS • Client's OS gives message to client stub • Stub unpacks result, returns to client CS677: Distributed OS

Example of an RPC 2-8 CS677: Distributed OS

Marshalling • Problem: different machines have different data formats • Intel: little endian, SPARC: big endian • Solution: use a standard representation • Example: external data representation (XDR) • Problem: how do we pass pointers? • If it points to a well-defined data structure, pass a copy and the server stub passes a pointer to the local copy • What about data structures containing pointers? • Prohibit • Chase pointers over network • Marshalling: transform parameters/results into a byte stream CS677: Distributed OS

Binding • Problem: how does a client locate a server? • Use Bindings • Server • Export server interface during initialization • Send name, version no, unique identifier, handle (address) to binder • Client • First RPC: send message to binder to import server interface • Binder: check to see if server has exported interface • Return handle and unique identifier to client CS677: Distributed OS

Failure Semantics • Client unable to locate server: return error • Lost request messages: simple timeout mechanisms • Lost replies: timeout mechanisms • Make operation idempotent • Use sequence numbers, mark retransmissions • Server failures: did failure occur before or after operation? • At least once semantics (SUNRPC) • At most once • No guarantee • Exactly once: desirable but difficult to achieve CS677: Distributed OS

Failure Semantics • Client failure: what happens to the server computation? • Referred to as an orphan • Extermination: log at client stub and explicitly kill orphans • Overhead of maintaining disk logs • Reincarnation: Divide time into epochs between failures and delete computations from old epochs • Gentle reincarnation: upon a new epoch broadcast, try to locate owner first (delete only if no owner) • Expiration: give each RPC a fixed quantum T; explicitly request extensions • Periodic checks with client during long computations CS677: Distributed OS

Implementation Issues • Choice of protocol [affects communication costs] • Use existing protocol (UDP) or design from scratch • Packet size restrictions • Reliability in case of multiple packet messages • Flow control • Copying costs are dominant overheads • Need at least 2 copies per message • From client to NIC and from server NIC to server • As many as 7 copies • Stack in stub – message buffer in stub – kernel – NIC – medium – NIC – kernel – stub – server • Scatter-gather operations can reduce overheads CS677: Distributed OS

Case Study: SUNRPC • One of the most widely used RPC systems • Developed for use with NFS • Built on top of UDP or TCP • TCP: stream is divided into records • UDP: max packet size < 8912 bytes • UDP: timeout plus limited number of retransmissions • TCP: return error if connection is terminated by server • Multiple arguments marshaled into a single structure • At-least-once semantics if reply received, at-least-zero semantics if no reply. With UDP tries at-most-once • Use SUN’s eXternal Data Representation (XDR) • Big endian order for 32 bit integers, handle arbitrarily large data structures CS677: Distributed OS

Distributed Operating Systems