CSCI 633: Advanced Operating Systems
Dept. of Computer Science, CSU San Marcos
Fall 2003, Kayhan Erciyes

The Plan: Applied Stuff
• Introduction to distributed systems
  • Overview, definitions, characteristics, issues, challenges
• A practical developer’s overview of networking
  • Characteristics of IP, TCP, UDP
  • Writing networked applications: UDP vs. TCP vs. higher level approaches
• Then on to more theoretical aspects of distributed computing

The Plan: Theoretical Stuff
• Theoretical Foundations
  • Fundamental Limitations
  • Causality
  • Logical clocks (logical, vector, matrix clocks)
  • Global states
• Algorithms for distributed mutual exclusion
• Distributed Shared Memory (DSM)
• Topics in fault tolerance and reliability

Distributed Systems: Intro
• Distributed System:
  • Autonomous Computers + Network
  • Communication via message-passing
  • No shared memory
  • No global clock
• Range:
  • Two PCs connected by $25 worth of networking hardware
  • Beowulf clusters: racks (or stacks) of PCs connected by high-speed networking
  • Millions of computers, connected by diverse networking technologies ranging from modems to gigabit connections (the Internet)

Network Operating Systems
• Network operating systems extend sequential operating systems to provide:
  • Resource sharing (files, devices, …)
  • Interoperability (email, remote command execution, remote login, …)
• User is generally aware of machine boundaries (do “this” on “that” machine)

DOS vs. NOS (Distributed vs. Network Operating Systems)
• (Virtual) Transparency: the ability to see what you want to see, and not see what you consider to be of no interest
• An exaggeration:
  • DOSs allow you to “slide” toward a “one large machine” view of the network of computers

Unix/“Distributed Unix”
• Unix: pervasive when cheap network technologies became available (1970s), so a logical choice for building “distributed systems”
• Extensions to Unix which provided interprocess communication were a principal building block of early distributed systems
  • Unix sockets API

Distributed Unix, Cont.
• Problems with distributed Unix, though:
  • Monolithic kernels
  • Scattered process information
  • Process migration, checkpointing difficult
• These problems stem from taking a tool in wide use and “molding” it to fit a new need
• Why not “kill” Unix and use a modern, “this is what I do well” distributed OS?
  • Commercial pressures.
  • Too much code already in use.

Desirable Characteristics
• These are the “selling points” for distributed systems, answers to the question “What can they provide for me?”
• Scalability
  • Want to be able to pile on more hardware as needed to tackle bigger problems without rewriting applications
  • E.g., in Parallel Virtual Machine (PVM), add more processors to share workload of smaller pieces of a large computation

Desirable: Fault Tolerance
• Higher availability and more resilience to faults than uniprocessor/shared memory multiprocessor solutions
• Redundancy is the key: it’s present in all fault tolerance schemes, e.g.,
  • replicated servers (e.g., for file or database storage)
  • snapshots of application state for recovery

Desirable: Transparency
• (Virtual) Transparency
  • Want distributed system to appear to be one big, seamless machine, however...
  • “A distributed system is a system on which I cannot get any work done because a machine I’ve never heard of is down.” (L. Lamport)
• Fault tolerance/transparency must be considered together
  • Don’t want “transparent” components failing and preventing work from getting done

Desirable: Concurrency
• Distributed systems can bring a lot of hardware to bear on difficult or time-consuming applications
• MIMD situations (e.g. file server, compute server, web server machines)
  • Multiple Instruction, Multiple Data: the distributed software components are different
• SIMD situations (e.g. parallel rendering applications for computer graphics)
  • Single Instruction, Multiple Data: many instances of one software component, each performing the same operation on its own data

Desirable: Resource Sharing
• Resource: display, printer, disk, CD-ROM, applications
  • “One Cadillac instead of 12 Yugos”
• Distributed systems allow resources to be shared freely (or not), regardless of their location in the system
• Can drastically reduce cost, improve utilization of resources, reduce administration nightmare
• Security issues must be considered; security issues arise in distributed systems which don’t exist in isolated systems

Desirable: “Openness”
• Heterogeneous hardware and software can be used to build systems and solve complicated problems
• Published protocols and interfaces make putting together the diverse pieces possible
  • Which protocols are spoken?
  • What data formats are used?
  • Where are you?
• Example: WWW. Diverse machines “speak” a standard protocol: HTTP. “Open” extensions include CGI (Common Gateway Interface)
• Example: Universal Plug and Play (UPnP), Service Location Protocol (SLP) for building highly dynamic client/server systems

Advantages in Brief
• The potential for building large, scalable, fault-tolerant “computers” with huge resources from commodity machines
  • Commodity “supercomputers”
• In many circumstances, individual machines can still be used for traditional tasks
  • E.g., no reason individual users couldn’t read mail on one node of the Beowulf cluster…
• Web-based supercomputing

A Few of the Challenges
• No shared memory =>
  • an unfamiliar programming model
  • application state is spread around
  • existing algorithms may be inappropriate
  • object-based distributed computing helps
• No perfectly synchronized time source =>
  • difficult to order events
  • difficult to say “do something NOW” to the entire system
  • We’re stuck with the speed of light (?)
• More complicated failure modes than single machines!
  • Much easier for things to be “half broken”

“Failure” has Many Meanings
• Halting failure: component simply stops
• Fail-stop: halting failures with ability to detect failures
• Omission failure: failure to send/receive a message
• Network failure: network link breaks
• Network partition: network fragments into two or more disjoint subnetworks
• Timing failure: action early/late; clock fails, etc.
• Byzantine failure: arbitrary “malicious” behavior
  • This one models random, worst-case behavior

Topic Switch: Networking Basics
• Network programming
• Goal: be able to implement networked/distributed software rather than just talk about it
  • solve real problems
  • design client/server protocols
  • evaluate proposed solutions experimentally

Network Protocols
• Protocol: set of rules and data formats which make communication possible
  • A “language” for communication
• Protocols are typically constructed using layers, with more abstract services provided by higher-level layers
• Bottom layer(s) are the actual network hardware

Networking Performance Parameters
• Latency: time to transfer an “empty” message
• Bandwidth or data transfer rate: how many bits/sec can be transferred (how thick the “pipe” is)
    message_transfer_time = latency + msg_length / data_transfer_rate
• Consider: a modem connection vs. a van of magnetic tapes traveling an interstate highway
• QoS: Quality of Service (bandwidth/latency guarantees for particular connections)
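The transfer-time formula above can be tried out with a few lines of Java. This is only a sketch: the class name TransferTime and the latency and bandwidth figures in main() are illustrative assumptions, not measurements from the course.

```java
// Sketch: message_transfer_time = latency + msg_length / data_transfer_rate
final class TransferTime {
    /** Latency in seconds, message length in bits, rate in bits per second. */
    static double seconds(double latencySec, double msgLengthBits, double rateBitsPerSec) {
        return latencySec + msgLengthBits / rateBitsPerSec;
    }

    public static void main(String[] args) {
        double oneMegabyte = 8.0 * 1_000_000;   // message length in bits
        // Assumed, round-number link parameters for illustration only.
        double modem = seconds(0.100, oneMegabyte, 56_000);        // 56 kbit/s modem
        double lan   = seconds(0.001, oneMegabyte, 100_000_000);   // 100 Mbit/s LAN
        System.out.printf("modem: %.1f s, LAN: %.3f s%n", modem, lan);
    }
}
```

For short messages the latency term dominates; for long messages the bandwidth term does, which is why the van of tapes can beat the modem on bulk transfers.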

OSI Protocol Stack
• OSI: Open Systems Interconnection
• Application: application interfaces (httpd, ftp)
• Presentation: network representation for data
• Session: connections, encryption
• Transport: messages → packets
• Network: network-specific packets, routing
• Data Link: transmission of packets between “directly” connected machines + error issues
• Physical: hardware (“I can touch it”)

Communication Through Layers
[Figure: two hosts side by side, each with the seven layers Application, Presentation, Session, Transport, Network, Data Link, Physical]

TCP/IP Protocol Stack
[Figure: the TCP/IP stack (Application; UDP and TCP; IP; Physical) beside the corresponding Application, Transport, and Network layers]
• The OSI stack is good as a model for understanding networks
• Layers in “real” network stacks aren’t so differentiated
• TCP/IP stack has won primarily because of the free implementation shipped in early versions of BSD Unix
• Addresses above IP are (port, address) combinations

Transport Protocols
• UDP (User Datagram Protocol)
  • Connectionless
  • Fast setup
  • Easy one-to-many communication
  • Datagram-oriented (discrete, bounded-size chunks of data)
  • Packets may be reordered
  • Packets may be lost (no flow control, bad packets dropped)
  • Packets may be duplicated
  • (Absolute) maximum datagram length: 64K
    • Usable maximum is more complicated
    • 8K is generally safe for modern systems

Transport Protocols, Cont.
• TCP (Transmission Control Protocol)
  • Connection-oriented
  • Byte stream-oriented
  • Slower setup
  • Consumes file handles: one per connection
  • Flow control, automatic retransmission
  • No packet reordering (delivery is FIFO)
  • No packet loss
  • No duplication
  • Theoretically “no” limit on size of objects that can be dumped into a TCP stream
    • In practice, limits exist
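Because TCP is byte stream-oriented, it does not preserve message boundaries the way UDP datagrams do; an application that wants discrete messages has to frame them itself. One common approach, sketched below, is a length prefix. The class and method names (Framing, writeMessage, readMessage, roundTrip) are hypothetical, and an in-memory byte array stands in for the socket streams so the example runs without a network.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Sketch: length-prefixed message framing on top of a byte stream.
final class Framing {
    /** Writes a 4-byte length followed by the UTF-8 payload. */
    static void writeMessage(DataOutputStream out, String msg) throws IOException {
        byte[] payload = msg.getBytes(StandardCharsets.UTF_8);
        out.writeInt(payload.length);
        out.write(payload);
        out.flush();
    }

    /** Reads one framed message; readFully() blocks until the whole payload has arrived. */
    static String readMessage(DataInputStream in) throws IOException {
        int len = in.readInt();
        if (len < 0) throw new IOException("negative frame length: " + len);
        byte[] payload = new byte[len];
        in.readFully(payload);
        return new String(payload, StandardCharsets.UTF_8);
    }

    /** Sends all messages through one in-memory "stream" and reads them back. */
    static String[] roundTrip(String... msgs) {
        try {
            ByteArrayOutputStream wire = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(wire);
            for (String m : msgs) writeMessage(out, m);

            DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(wire.toByteArray()));
            String[] got = new String[msgs.length];
            for (int i = 0; i < got.length; i++) got[i] = readMessage(in);
            return got;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String s : roundTrip("first", "second message", "")) {
            System.out.println("[" + s + "]");
        }
    }
}
```

With real sockets the same two methods would wrap the streams returned by getOutputStream() and getInputStream(); a production protocol would also cap the accepted frame length.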

Unix Sockets: TCP and UDP from a Programming Perspective
• First the standard Unix system calls for C, then from a Java perspective
• Unix C Server (in the signatures, “|” separates alternatives; it is not a bitwise OR):
  • int socket(PF_INET | PF_UNIX, SOCK_STREAM | SOCK_DGRAM, …)
  • int bind(socket, localaddr …)
  • int listen(socket, queuelength)
  • int accept(socket, remoteaddr)
• select( … ) allows a set of sockets to be checked to determine if input is available
  • Allows service of multiple clients without multithreading

Unix Sockets, Cont.
• Unix C Client:
  • int socket(PF_INET | PF_UNIX, SOCK_STREAM | SOCK_DGRAM, …)
  • int connect(socket, remoteaddr)
  • Unlike the server, the client typically doesn’t care which local port it uses; the system selects one
• Then data is transmitted and received (for both client and server) with:
  • write(socket, message, len, …)
  • read(socket, buffer, len, …)

SERVER

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/select.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    void Shutdown(const char *msg);  // helper not shown on these slides
    int ReadAndEcho(int handle);

    void ServeEchoClients(int port) {
        int i, found;
        int alive;           // client still around after read?
        int sock;            // socket for listening
        int newconn;         // socket for new client
        int highest;         // highest handle in use; needed for select()
        int ready;           // number of ready sockets (from select() call)
        int connected[100];  // handle only 100 simultaneous clients; 0 marks a free slot
        fd_set socks;        // sockets ready for reading, for select() call
        struct sockaddr_in server_address;  // structure for bind() call
        int reuse = 1;       // avoid port in use problems

        // initialize sockets stuff
        sock = socket(AF_INET, SOCK_STREAM, 0);
        if (sock < 0) {
            Shutdown("echo_server: socket() call failed. Can't continue.");
        }
        setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof(reuse));
        memset((char *) &server_address, 0, sizeof(server_address));
        server_address.sin_family = AF_INET;
        server_address.sin_addr.s_addr = htonl(INADDR_ANY);
        server_address.sin_port = htons(port);
        if (bind(sock, (struct sockaddr *) &server_address, sizeof(server_address)) < 0) {
            Shutdown("echo_server: bind() call failed. Can't continue.");
        }

        listen(sock, 15);    // 15 is queue length for incoming connections
        highest = sock;
        memset((char *) &connected, 0, sizeof(connected));
        printf("echo_server: Listening...\n");
        while (1) {
            FD_ZERO(&socks);       // initialize set of sockets to monitor
            FD_SET(sock, &socks);  // always care about listening socket
            // also care about sockets for connected clients
            for (i = 0; i < 100; i++) {
                if (connected[i] != 0) {
                    FD_SET(connected[i], &socks);
                    if (connected[i] > highest) {
                        highest = connected[i];
                    }
                }
            }
            ready = select(highest + 1, &socks, NULL, NULL, NULL);
            if (ready < 0) {
                Shutdown("echo_server: select() call failed. Can't continue.");
            }

            // see who's knocking at our (socket) door...
            if (FD_ISSET(sock, &socks)) {  // new client
                newconn = accept(sock, NULL, NULL);
                if (newconn < 0) {
                    printf("** FAILED TO CONNECT TO NEW CLIENT **\n");
                } else {
                    // find a home for new client socket
                    found = 0;
                    for (i = 0; i < 100 && !found; i++) {
                        if (connected[i] == 0) {
                            printf("echo_server: Connected to new client.\n");
                            connected[i] = newconn;
                            found = 1;
                        }
                    }
                    if (!found) {
                        printf("echo_server: OVERLOADED.\n");
                        close(newconn);
                    }
                }
            }

            // check connected clients, deal with one line for each ready client
            for (i = 0; i < 100; i++) {
                // skip free slots so that descriptor 0 is never tested by mistake
                if (connected[i] != 0 && FD_ISSET(connected[i], &socks)) {
                    alive = ReadAndEcho(connected[i]);
                    if (!alive) {
                        close(connected[i]);
                        connected[i] = 0;  // client hung up
                    }
                }
            }
        }
    }

    int ReadAndEcho(int handle) {
        char c = -1;
        int count = 1;
        int ret = 1;

        printf("echo_server: Reading, hoping for \\n...\n");
        count = read(handle, &c, 1);         // read one char
        while (c != '\n' && count > 0) {
            count = write(handle, &c, 1);    // echo it
            if (count > 0) {                 // write() returns -1 on error
                putchar(c);
                count = read(handle, &c, 1); // read one char
            }
        }
        if (count <= 0) {                    // 0: client hung up; < 0: I/O error
            printf("echo_server: Client hung up.\n");
            ret = 0;
        } else {                             // echo final \n
            count = write(handle, &c, 1);
            putchar('\n');
        }
        printf("echo_server: Returning to listening state.\n");
        return ret;
    }

CLIENT

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <netdb.h>
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    void Shutdown(const char *msg);  // helper not shown on these slides

    void EchoClient(char *ip, int port) {
        struct sockaddr_in them;       // address of server
        int sock;                      // socket for communication w/ server
        int err;
        char buf[512];
        char c = 0;
        int count;
        struct hostent *remip = NULL;  // will use this one...
        in_addr_t remip2;              // or this one as the binary remote addr

        memset(&them, 0, sizeof(them));
        them.sin_family = AF_INET;
        them.sin_port = htons(port);   // hton*() convert integer byte order

        // try inet_addr() call first; some unixes freak if we provide a
        // dotted numeric IP address to gethostbyname()
        remip2 = inet_addr(ip);
        if (remip2 == INADDR_NONE) {   // not a dotted numeric address: resolve the name
            remip = gethostbyname(ip);
            if (remip == NULL) {
                herror(NULL);
                Shutdown("Couldn't initialize connection parameters.");
            }
            memcpy(&(them.sin_addr.s_addr), remip->h_addr, remip->h_length);
        } else {
            them.sin_addr.s_addr = remip2;
        }

        if ((sock = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
            printf("echo_client: socket() failed with error %d.\n", sock);
            Shutdown("Can't continue.");
        }
        if ((err = connect(sock, (struct sockaddr *) &them, sizeof(struct sockaddr_in)))) {
            printf("echo_client: connect() failed with error %d.\n", err);
            Shutdown("Can't continue.");
        }

        printf("echo_client: \".\" on a line by itself disconnects.\n");
        // fgets() replaces the unsafe gets(); it keeps the trailing newline,
        // which is exactly the terminator the server expects
        while (fgets(buf, sizeof(buf), stdin) != NULL && buf[0] != '.') {
            write(sock, buf, strlen(buf));     // transmit
            // get response one char at a time
            printf("echo_client: Service response:\"");
            count = read(sock, &c, 1);
            while (count > 0 && c != '\n') {
                putchar(c);
                count = read(sock, &c, 1);     // read one char
            }
            printf("\"\n");
            if (count <= 0) {
                printf("echo_client: Server hung up. How rude!\n");
                break;
            }
        }
        close(sock);
    }  // end of EchoClient

Java IPC
• TCP and UDP socket protocols have separate interfaces in Java
• More abstract than standard Unix interface, but interoperable and almost as powerful
• Far more portable
• Examples that follow:
  • Simple Java TCP Client
  • Simple Java TCP Server
  • Simple Java UDP datagram send/receive

Simple TCP Client in Java

    try {
        s = new Socket(servhostname, port);
        out = new DataOutputStream(s.getOutputStream());
        in = new DataInputStream(s.getInputStream());
    } catch (Exception e) { /* error */ }
    // do standard I/O operations on 'in' and 'out'

Simple TCP Server in Java

    try {
        servsock = new ServerSocket(listenport);
    } catch (Exception e) { /* error */ }
    ...
    while ( … ) {
        try {
            cl = servsock.accept();
            out = new DataOutputStream(cl.getOutputStream());
            in = new DataInputStream(cl.getInputStream());
            // do standard I/O operations on 'in' and 'out'
            …
            cl.close();
        } catch (Exception e) { /* error for this client connection */ }
    }

Limitations
• Simple client/server are single threaded
  • Affects server most, since it can service only one client at a time
  • Other clients are blocked while server is busy
  • “Bad” client can tie up server forever
• The stream-socket API in java.net has no select(); java.nio (Java 1.4 and later) adds a Selector for that purpose
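The java.net classes used on these slides offer nothing like select(), but the java.nio package (added in Java 1.4) provides Selector, which lets one thread watch a listening socket and many clients, much as the C echo server does. The following is a minimal sketch under that assumption; the class SelectEcho and its methods serve and echoOnce are hypothetical names, and the write loop is a simplification that a real server would replace with OP_WRITE handling.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

// Sketch: single-threaded echo server built on java.nio's Selector.
final class SelectEcho {
    /** Echo loop: one Selector watches the listener and every connected client. */
    static void serve(ServerSocketChannel listener) throws IOException {
        Selector selector = Selector.open();
        listener.configureBlocking(false);
        listener.register(selector, SelectionKey.OP_ACCEPT);
        ByteBuffer buf = ByteBuffer.allocate(512);
        while (true) {
            selector.select();                       // block until a channel is ready
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (!key.isValid()) continue;
                if (key.isAcceptable()) {            // new client
                    SocketChannel client = listener.accept();
                    if (client != null) {
                        client.configureBlocking(false);
                        client.register(selector, SelectionKey.OP_READ);
                    }
                } else if (key.isReadable()) {       // data from an existing client
                    SocketChannel client = (SocketChannel) key.channel();
                    try {
                        buf.clear();
                        int n = client.read(buf);
                        if (n < 0) { client.close(); continue; }   // client hung up
                        buf.flip();
                        // Simplification: spins if the socket's send buffer is full.
                        while (buf.hasRemaining()) client.write(buf);
                    } catch (IOException e) {
                        client.close();              // drop only the broken client
                    }
                }
            }
        }
    }

    /** Starts the server on an ephemeral loopback port and echoes one line through it. */
    static String echoOnce(String line) {
        try (ServerSocketChannel listener = ServerSocketChannel.open()) {
            listener.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            int port = listener.socket().getLocalPort();
            Thread server = new Thread(() -> {
                try { serve(listener); } catch (IOException e) { /* listener closed */ }
            });
            server.setDaemon(true);                  // do not keep the JVM alive
            server.start();
            try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port)) {
                s.setSoTimeout(5000);                // never hang forever in the demo
                byte[] out = (line + "\n").getBytes(StandardCharsets.UTF_8);
                s.getOutputStream().write(out);
                s.getOutputStream().flush();
                byte[] in = new byte[out.length];
                new DataInputStream(s.getInputStream()).readFully(in);
                return new String(in, StandardCharsets.UTF_8).trim();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(echoOnce("hello, selector"));
    }
}
```

The trade-off against the thread-per-client design is the usual one: a selector loop avoids per-client threads but must never block while serving any single client.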

Simple UDP Client/Server in Java

    byte[] buf = new byte[MAXDGRAMSIZE];
    sock = new DatagramSocket(myport);
    while (...) {
        try {
            // incoming datagram
            DatagramPacket ingram = new DatagramPacket(buf, buf.length);
            sock.receive(ingram);
            …
            // outgoing datagram
            DatagramPacket outgram =
                new DatagramPacket(buf, buf.length, theiraddr, theirport);
            sock.send(outgram);
        } catch (UnknownHostException uhe) {
            …
        } catch (IOException ioe) { }
    }

Multithreading for the TCP Server

    public class MTServer {
        public static void main(String[] args) {
            int port = 2345;
            // passed to ServerSocket as the backlog (queue length for pending connections)
            final int MaxClients = 35;
            ...
            ServerSocket serversocket = null;
            try {
                serversocket = new ServerSocket(port, MaxClients);
                while (true) {
                    Socket sock = serversocket.accept();
                    ServerThread thr = new ServerThread(sock ...);
                    thr.start();
                }
            } catch (Exception e) { /* server must die */ }
        }
    }

Multithreading, Cont.

    public class ServerThread extends Thread {
        private Socket sock;
        private DataInputStream in = null;
        private DataOutputStream out = null;

        public ServerThread(Socket sock, ... ) {
            this.sock = sock;
            ...
            try {
                in = new DataInputStream(this.sock.getInputStream());
                out = new DataOutputStream(this.sock.getOutputStream());
            } catch (Exception e) { /* oops */ }
        }

        public void run() {
            boolean bye = false;
            try {
                while (!bye) {
                    String command = in.readUTF();
                    if (command.equals("BYE")) {
                        bye = true;
                        …
                    } else if (…) {
                        …
                    }
                }
            } catch (Exception e) {
                /* death for this client connection */
            } finally {
                /* cleanup for this client; close() can itself throw */
                try {
                    out.close();
                    in.close();
                } catch (IOException e) { /* nothing more to do */ }
            }
        }
    }

Robust TCP
• Want to detect broken connections, avoid client and/or server “hangs”
• Timeouts (simple)
• Unix “keepalive” timers (SO_KEEPALIVE)
  • Evil!
  • Default is generally several hours!
  • Timeout period generally global, hard to configure
• “Connection exercises”
  • Heartbeats

Java, TCP, and Robust Client/Server
• The following read can hang indefinitely if the server dies

    try {
        s = input.readUTF();
        // ...
    } catch (Exception e) {
        System.out.println("Broken connection");
    }

TCP: Maintaining Control with Timeouts

    try {
        gotstr = false;
        while (!gotstr) {
            try {
                sock.setSoTimeout(90000);  // 90s timeout
                s = input.readUTF();
                gotstr = true;
            } catch (InterruptedIOException ie) {
                // at least I'm not stuck!
                // can do other processing here
            }
        }
    } catch (Exception e) {
        System.out.println("Broken connection");
    }

Server Side “Probing”

    // Detects a broken connection eventually: a write to a dead peer fails,
    // though not necessarily on the first attempt
    try {
        gotstr = false;
        while (!gotstr) {
            try {
                sock.setSoTimeout(90000);
                s = input.readUTF();
                gotstr = true;
            } catch (InterruptedIOException ie) {
                // timeout...tempt fate by writing
                System.out.println("Connection check");
                // client must ignore
                output.writeUTF("$!$");
                System.out.println("Connection OK");
            }
        }
    } catch (Exception e) {
        System.out.println("Broken connection");
    }

Client Side “Probing”

    // Client wants to read a string...
    // Assumes that a response is expected w/in 30s
    boolean significant = false;
    while (!significant) {
        socket.setSoTimeout(30000);  // 30s timeout
        try {
            s = input.readUTF();
            significant = (!s.equals("$!$"));  // skip the server's probe messages
        } catch (InterruptedIOException ie) {
            // assume connection is broken
            throw new IOException("Server is down?");
        }
    }

Want more?
• See my examples at www.cs.uno.edu/~golden/teach.html
• For a good FAQ on sockets programming: http://www.developerweb.net/sock-faq/
• The late W. Richard Stevens’s books are classics for network programming (search for his name)

Higher-level Communication
• MOM (Message-oriented Middleware)
• Message-passing libraries
  • PVM
  • MPI
• Spaces
  • Linda
• Object-based approaches
  • RMI
  • CORBA
