COP 5611 Operating Systems Spring 2010

Dan C. Marinescu
Office: HEC 439 B
Office hours: M-Wd 2:00-3:00 PM

Lecture 4
• Last time: Names and three basic abstractions
• Today:
  • Soft modularity
  • Procedure call conventions
  • Memory map
  • Errors and soft modularity
  • Enforcing modularity with clients and services
  • Communication with messages isolates modules
  • Case studies: WWW, X11
  • Heterogeneity
  • Intermediaries
  • Remote Procedure Calls
  • Domain Name Service (DNS)
  • Virtualization

Soft modularity
• Soft modularity: divide a program into procedures that call each other.
• Hard to debug; if one of the modules has an infinite loop, a call never returns.
• The caller and the callee are in the same address space and may misuse the stack.
• Naming conflicts and wrong context specification.

Procedure call convention
• Caller
  • saves on the stack (after each operation it adjusts the stack pointer, SP):
    • registers
    • arguments
    • return address
  • transfers control to the callee (jumps to its starting address)
• Callee
  • loads the arguments from the stack
  • carries out the desired calculation and loads the result in a register (R0)
  • transfers control back to the caller by loading the return address into the PC
• Caller
  • adjusts the stack
  • restores its registers

Example:
procedure MEASURE (func)
    start_time ← GET_TIME (SECONDS)
    func ()
    end_time ← GET_TIME (SECONDS)
    return (end_time - start_time)

procedure GET_TIME (units)
    time ← CLOCK
    time ← CONVERT_TO_UNITS (time, units)
    return time
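The two procedures above can be written as ordinary Python functions. This is only a sketch: `time.monotonic()` stands in for CLOCK, and the table of conversion factors is an assumption that is not part of the slide.

```python
import time

# Assumed conversion table; the slide only names CONVERT_TO_UNITS.
_FACTORS = {"seconds": 1.0, "milliseconds": 1000.0}

def get_time(units="seconds"):
    """Return the current clock reading converted to the requested units."""
    now = time.monotonic()            # plays the role of CLOCK
    return now * _FACTORS[units]      # plays the role of CONVERT_TO_UNITS

def measure(func):
    """Return how long a call to func() takes, in seconds."""
    start_time = get_time("seconds")
    func()
    end_time = get_time("seconds")
    return end_time - start_time
```

For instance, `measure(lambda: time.sleep(0.05))` returns a value close to 0.05. Note that MEASURE never returns if `func` loops forever, which is the soft-modularity weakness discussed in this lecture.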

Machine code for MEASURE
100 ST R1, SP       // save content of R1 on the stack
104 ADD 4, SP       // increment stack pointer
108 ST R2, SP       // save content of R2 on the stack
112 ADD 4, SP       // increment stack pointer
116 LA R1, SECONDS  // load address of the argument in R1
120 ST R1, SP       // store address of the argument on the stack
124 ADD 4, SP       // increment stack pointer
128 LA R1, 148      // load return address in R1
132 ST R1, SP       // store return address on the stack
136 ADD 4, SP       // adjust top stack pointer
140 LA R1, 200      // load address of GET_TIME in R1
144 JMP R1          // transfer control to GET_TIME
148 S 4, SP         // decrement stack pointer
152 L R2, SP        // restore the contents of R2
156 S 4, SP         // decrement stack pointer
160 L R1, SP        // restore the contents of R1
164 S 4, SP         // decrement stack pointer
168 ST R0, start    // store result passed by GET_TIME in R0 into start

Machine code for GET_TIME
200 L R1, SP        // load address of the stack pointer in R1
204 S R1, 8         // increment stack pointer
208 L R2, R1        // load address of the argument in R2
212 code for the body of GET_TIME
216 code for the body of GET_TIME
220 L R0, time      // load in R0 the result
224 L R1, SP        // reload in R1 address of the stack pointer
228 S R1, 4         // decrement the stack pointer
232 L PC, R1        // load return address from stack into PC

Soft modularity allows errors to propagate
• Conventions between caller and callee regarding register usage:
  • The caller passes the argument (the address of the variable SECONDS) in register R1.
  • The callee returns the value of the result in register R0.
  • The callee uses register R2, so the caller must save it before transferring control to the callee.
• Potential problems caused by soft modularity:
  • The callee is expected to leave the stack pointer as it was set by the caller, but the callee may corrupt it.
  • The transfer of control: the callee may return to the wrong address.
  • The caller may attempt to get the result from the wrong register.
  • The callee may use registers that the caller has not saved on the stack before transferring control to the callee.
  • An error of the callee will affect the caller.
  • If the caller and the callee agree to communicate using global variables, then changes to these variables will affect other modules.
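A toy illustration of how a callee error reaches the caller when both share a stack. The Python list `stack` stands in for the hardware stack, and all function names here are hypothetical; this is a sketch of the failure mode, not a model of the machine code above.

```python
stack = []  # shared by caller and callee, like the hardware stack

def well_behaved_callee():
    """Pops exactly its own argument and leaves the rest of the stack alone."""
    arg = stack.pop()
    return arg * 2

def buggy_callee():
    """Pops its argument and, by mistake, one more entry owned by the caller."""
    arg = stack.pop()
    stack.pop()               # bug: destroys the caller's saved value
    return arg * 2

def caller(callee):
    """Save private data, pass an argument, call, then try to restore."""
    stack.append("caller-private data")   # like saving a register
    stack.append(21)                       # the argument
    result = callee()
    restored = stack.pop() if stack else None
    return result, restored
```

With the well-behaved callee the caller gets its saved value back; with the buggy one the result looks fine but the saved value is gone, and nothing in the language prevented it.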

Strongly typed languages help enforce modularity
• Provide:
  • Strong guarantees about the run-time behavior of a program before program execution, whether provided by static analysis, the execution semantics of the language, or another mechanism.
  • Type safety; that is, at compile or run time, the rejection of operations or function calls which attempt to disregard data types.
  • The guarantee that a well-defined error or exceptional behavior occurs as soon as a type-matching failure happens at runtime.
  • The compiler ensures that operations only occur on operand types that are valid for the operation.
  • The type of a given data object does not vary over that object's lifetime. For example, class instances may not have their class altered.
  • The absence of ways to evade the type system. Such evasions are possible in languages that allow programmer access to the underlying representation of values, i.e., their bit pattern.
• A programming language is strongly typed if type conversions are allowed only when an explicit notation, often called a cast, is used to indicate the desire of converting one type to another.
• The strictest form disallows any kind of type conversion: values of one type cannot be converted to another type, explicitly or implicitly.
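One of the properties listed above, a well-defined error as soon as a type mismatch happens at run time, can be observed in Python, which checks types dynamically rather than at compile time. The function names below are made up for the illustration.

```python
def add_seconds(start, delta):
    """Add two time values; both are expected to be numbers."""
    return start + delta

def try_add(start, delta):
    """Report the well-defined error raised on a type mismatch."""
    try:
        return add_seconds(start, delta)
    except TypeError as exc:
        return f"rejected: {type(exc).__name__}"
```

`try_add(10, 5)` returns 15, while `try_add(10, "5")` is rejected with a `TypeError` instead of silently reinterpreting the string; the conversion has to be requested explicitly, as in `try_add(10, int("5"))`.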

Soft modularity may be affected by other factors
• Different modules are written in different languages
• Errors in the run-time support
• Errors in the compiler

Enforced modularity
• Enforced modularity: force modules to interact only by sending messages.
• The client/service organization makes it more difficult:
  • For programmers to violate modularity: the only way two modules interact is by means of messages; names within one module are not visible outside the module.
  • For errors to propagate: clients and services are independent modules and may fail separately.
  • To attack the system: if messages are checked carefully, the attacker has a very hard time.
• Other advantages:
  • The system is more robust; the servers are stateless.
  • Resources can be managed more effectively.

Client/service organization
• Not only separates functions but also enforces this separation.
• No globally shared state (e.g., through the stack).
• Errors can propagate from client to service, and vice versa, only if the messages are not properly checked.
• A client can use time-outs to detect a non-responsive service and take another course of action.
• The separation of abstraction from implementation is clearer; the client needs to know only the format of the message, and the implementation of the service may change.
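The time-out idea can be sketched with a thread and a queue. This is an illustration under assumptions: a thread in the same process stands in for a service on another machine, and `echo_service` and `hung_service` are hypothetical services invented for the example.

```python
import queue
import threading

def call_with_timeout(service, request, timeout_s):
    """Send a request and wait at most timeout_s seconds for the reply."""
    reply_box = queue.Queue(maxsize=1)

    def worker():
        reply_box.put(service(request))

    # Daemon thread: a hung service does not keep the program alive.
    threading.Thread(target=worker, daemon=True).start()
    try:
        return ("ok", reply_box.get(timeout=timeout_s))
    except queue.Empty:
        return ("timeout", None)    # the client can now try something else

def echo_service(request):
    """A responsive service."""
    return request.upper()

def hung_service(request):
    """A service stuck forever, e.g., in an infinite loop."""
    threading.Event().wait()
```

With a procedure call, the hung service would block the caller forever; here `call_with_timeout(hung_service, "ping", 0.1)` gives control back to the client after the time-out expires.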

A client-service system: the World Wide Web
• The information in each page is encoded and formatted according to some standard, e.g.,
  • images: GIF, JPEG
  • video: MPEG
  • audio: MP3
• The Web is based upon a “pull” paradigm: the server has the resources and the client pulls them from the server.
• The Web server, also called an HTTP server, listens at a well-known port, port 80, for connections from clients.
• HTTP uses TCP to establish a connection between the client and the server.
• Some pages are created on the fly; others have to be fetched from the disk.

[Figure: Client-server interactions in HTTP]
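The message exchange between an HTTP client and server can be sketched at the level of the bytes on the TCP connection. The helper names are hypothetical and only the request line, the `Host` header, and the status line are handled; a real client would normally use a library such as `urllib.request`.

```python
def build_get_request(host, path="/"):
    """Format the bytes a client sends to pull a page from a server."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")

def parse_status_line(response_bytes):
    """Split the first line of a response into version, code, and reason."""
    status_line = response_bytes.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason
```

The client and the server share nothing except the format of these messages, which is what allows either side to be reimplemented independently.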

Heterogeneity in client-server systems
• The client and the service may run on systems with different:
  • internal data representation, e.g., big versus little endian
  • processor architecture, e.g., 32-bit/64-bit addressing
  • operating systems, e.g., versions of Linux, Mac OS, etc.
  • libraries
• Multiple clients and services are provided/available on systems with different characteristics:
  • the same service may be provided by multiple systems;
  • a service may in turn use services available on other systems;
  • the same client may use multiple services.
• Marshaling/unmarshaling: conversion of the information in a message into a canonical representation and back.
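Marshaling and unmarshaling can be sketched with Python's `struct` module. The message layout (a 32-bit request id, a 64-bit float, and a length-prefixed name) is an assumption made up for this example; the `!` prefix selects network (big-endian) byte order with no padding, which serves as the canonical representation.

```python
import struct

# Assumed layout: uint32 request id, float64 value, uint16 name length.
_HEADER = struct.Struct("!IdH")

def marshal(request_id, value, name):
    """Convert native values into a canonical byte string."""
    name_bytes = name.encode("utf-8")
    return _HEADER.pack(request_id, value, len(name_bytes)) + name_bytes

def unmarshal(message):
    """Recover the native values from the canonical byte string."""
    request_id, value, name_len = _HEADER.unpack_from(message, 0)
    start = _HEADER.size
    name = message[start:start + name_len].decode("utf-8")
    return request_id, value, name
```

Because both sides agree on the canonical form, `unmarshal(marshal(7, 2.5, "SECONDS"))` yields `(7, 2.5, "SECONDS")` regardless of the byte order of the machines involved.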

[Figure: Little endian and big endian]
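The difference between the two byte orders can be seen by packing the same 32-bit integer both ways. The value `0x0A0B0C0D` is an arbitrary choice for the illustration.

```python
import struct
import sys

value = 0x0A0B0C0D

big = struct.pack(">I", value)      # most significant byte first
little = struct.pack("<I", value)   # least significant byte first

# Reading little-endian bytes as if they were big-endian gives a wrong value.
misread = int.from_bytes(little, "big")

# Byte order of the machine running this code: "little" or "big".
native_order = sys.byteorder
```

Here `big` is `b"\x0a\x0b\x0c\x0d"` and `little` is `b"\x0d\x0c\x0b\x0a"`; `misread` is `0x0D0C0B0A`, which is why a message must state, or standardize, its byte order.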

Timing; response time
• The client and the service are connected via a communication channel. The response time is a function of the latency and the bandwidth of the communication channel.
• Distinguish between
  • service time
  • communication time
• Synchronous call: the client blocks waiting for the response. Easier to manage.
• Asynchronous call: the client does not block.
• Multi-threading and asynchronous calls.
• Message buffering
  • in kernel space (to allow clients to make asynchronous calls)
  • in user space (before sending)
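A back-of-the-envelope model of the response time. The formula is an assumption chosen for illustration: one latency in each direction, transmission time equal to message size divided by bandwidth, plus the service time. It ignores queueing, protocol overhead, and connection setup.

```python
def response_time(latency_s, request_bits, reply_bits, bandwidth_bps, service_time_s):
    """Estimate the response time of one synchronous request/reply exchange."""
    communication_time = (
        2 * latency_s                                  # request + reply propagation
        + (request_bits + reply_bits) / bandwidth_bps  # transmission time
    )
    return communication_time + service_time_s
```

For example, with 50 ms latency, an 8,000-bit request, an 80,000-bit reply, a 1 Mbps channel, and 20 ms of service time, the estimate is 0.1 + 0.088 + 0.02 = 0.208 s; communication time dominates service time.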

Example: the X Window System (X11)
• X11: software system and network protocol that provides a GUI for networked computers.
• Developed as part of Project Athena at MIT in 1984.
• Separates
  • the service program, which manipulates the display, from
  • the client program, which uses the display.
• An application running on one machine can access the display on a different computer.
• Clients operate asynchronously; multiple requests can be sent, so the display rate could be much higher than the rate between the client and the server.

Intermediaries
• What if the sender and the receiver of a message are not active at the same time?
• Intermediaries support buffered communication and allow more flexibility; the intermediary may decide how to sort messages.
• The sender and the receiver may:
  • push a message
  • pull a message
• Example: the mail service
  • The sender pushes a message into his/her outbox.
  • The outbox pushes it to the inbox of the recipient.
  • The recipient pulls it whenever he/she wants.
• The publish/subscribe paradigm: the sender notifies an event service when it has produced a message. Recipients subscribe to the events, and when the events occur the messages are delivered.
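Both styles of intermediary can be sketched in a few lines. `MailService` and `EventService` are hypothetical in-memory classes invented for this illustration; real systems would persist messages and run on separate machines.

```python
from collections import defaultdict, deque

class MailService:
    """Buffers messages so sender and recipient need not be active together."""

    def __init__(self):
        self._inboxes = defaultdict(deque)

    def push(self, recipient, message):
        """Sender side: deposit a message in the recipient's inbox."""
        self._inboxes[recipient].append(message)

    def pull(self, recipient):
        """Recipient side: fetch the oldest message, or None if there is none."""
        inbox = self._inboxes[recipient]
        return inbox.popleft() if inbox else None

class EventService:
    """Delivers each published message to every subscriber of its topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Notify all subscribers; return how many were notified."""
        for callback in self._subscribers[topic]:
            callback(message)
        return len(self._subscribers[topic])
```

In the mail case the recipient decides when to pull; in the publish/subscribe case the intermediary pushes to whoever has registered interest.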

Trusted intermediary
• Trusted service acting as an intermediary among multiple clients.
• Enforces modularity: a fault of one client does not affect other clients.
• Examples:
  • File systems
  • Mail systems
• Supports thin clients: a significant part of client functionality is transferred to the intermediary.
  • In a thin client/server system, the only software installed on the thin client is the user interface, certain frequently used applications, and a networked operating system.
  • Because the load on the thin client is light, it can be a very small, low-powered device with lower costs to purchase and to operate per seat.
  • The server, or a cluster of servers, carries the full weight of all the applications, services, and data.
  • By keeping a few servers busy and many thin clients lightly loaded, users can expect easier system management and lower costs, as well as all the advantages of networked computing: central storage/backup and easier security.
  • Because the thin clients are relatively passive and low-maintenance, but numerous, the entire system is simpler and easier to install and to operate.
  • As the cost of hardware plunges and the cost of employing a technician, buying energy, and disposing of waste rises, the advantages of thin clients grow.
  • From the user's perspective, the interaction with monitor, keyboard, and cursor changes little from using a thick client.

Editor is a client ofFile service which is a client ofBlock-storage serviceFile service is a trusted intermediary. 25

Peer-to-peer systems
• Decentralized architecture without a trusted intermediary.
• Peers are both suppliers and consumers of resources, in contrast to the traditional client-server model where servers supply and clients consume.
• Peer-to-peer systems often implement an application-layer overlay network on top of the native or physical network topology. Such overlays are used for indexing and peer discovery. Content is typically exchanged directly over the underlying IP network.
• Anonymous peer-to-peer systems implement extra routing layers to obscure the identity of the source or destination of queries.
• In structured peer-to-peer networks, connections in the overlay are fixed. They typically use distributed hash table (DHT) based indexing, such as in the Chord system developed at MIT.
• Unstructured peer-to-peer networks do not provide any algorithm for organization or optimization of network connections.
• Advantages:
  • use of spare resources at many sites
  • difficult to censor content
• Disadvantage:
  • finding information in a large peer-to-peer network is hard.

Remote procedure call (RPC)
• Supports inter-process communication of remotely located processes and allows implementation of client-server systems (RFC 1831).
• Preserves the semantics of a local procedure call.
• To use an RPC, a process may use a special service: PORTMAP or RPCBIND, available at port 111. A new RPC service uses the portmapper to register. The portmapper also allows a service lookup. If the process knows the port number of the RPC service, it may call it directly.
• RPC/TCP and also RPC/UDP.
• Messages
  • must be well structured and contain the identification of the specific RPC
  • are addressed to an RPC daemon listening at an RPC port.
• A machine-independent representation of data: the external data representation standard (XDR).

Stub
• Unburdens a user from implementation details of the RPC; it hides:
  • the marshaling of the arguments
  • the communication details
• The client calls the client stub, which:
  • marshals the arguments of the call into messages
  • sends the message
  • waits for the response
  • when the response arrives, unmarshals the results
  • returns to the client
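The steps carried out by the client stub can be sketched as follows. Several things here are assumptions for the sake of a runnable example: JSON replaces XDR as the wire format, a direct function call replaces the network transport, and `get_time_service` returns a fixed clock value so that the result is reproducible.

```python
import json

def get_time_service(units):
    """Hypothetical remote procedure; uses a fixed clock reading of 1200 s."""
    clock_s = 1200
    return clock_s if units == "seconds" else clock_s // 60

PROCEDURES = {"GET_TIME": get_time_service}

def service_stub(request_bytes):
    """Service side: unmarshal the request, call the procedure, marshal the reply."""
    request = json.loads(request_bytes.decode("utf-8"))
    result = PROCEDURES[request["proc"]](*request["args"])
    return json.dumps({"result": result}).encode("utf-8")

def client_stub(transport, proc, *args):
    """Client side: marshal, send, wait for the response, unmarshal, return."""
    request_bytes = json.dumps({"proc": proc, "args": list(args)}).encode("utf-8")
    reply_bytes = transport(request_bytes)      # send and wait for the response
    return json.loads(reply_bytes.decode("utf-8"))["result"]

def get_time(units):
    """What the client program calls; it looks like a local procedure."""
    return client_stub(service_stub, "GET_TIME", units)
```

The caller writes `get_time("seconds")` exactly as it would for a local procedure; only bytes cross the boundary between `client_stub` and `service_stub`.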

RPCs differ from ordinary procedure calls
• RPCs
  • reduce the so-called fate sharing between caller and callee
  • have different semantics (see next slide)
  • take longer
• Global variables and RPC do not mix well.

RPC semantics
• At least once
  • the client stub resends a message up to a given number of times until it receives a message from the server; there is no guarantee of a response
  • the server may end up executing a request more than once
  • suitable for side-effect-free operations
• At most once
  • a message is acted upon at most once
  • if the timeout set for receiving the response expires, an error code is delivered to the client
  • the server must keep a history of the time-stamps of all messages; messages may arrive out of order
  • suitable for operations that have side effects
• Exactly once
  • implement at-most-once semantics and request an acknowledgment from the server.
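The difference between the first two semantics shows up when replies are lost. The sketch below is a simulation under assumptions: `make_channel` is a made-up channel that always delivers the request but drops the first few replies, and the server suppresses duplicates by remembering request ids rather than time-stamps.

```python
class Server:
    """A service whose operation has a side effect (it increments a counter)."""

    def __init__(self):
        self.counter = 0
        self.seen = {}              # request id -> reply already computed

    def increment(self):
        self.counter += 1
        return self.counter

    def increment_at_most_once(self, request_id):
        """Execute a given request id only once; replay the old reply after that."""
        if request_id not in self.seen:
            self.seen[request_id] = self.increment()
        return self.seen[request_id]

def make_channel(handler, replies_to_drop):
    """Return a send function that loses the first replies_to_drop replies."""
    state = {"left": replies_to_drop}

    def send(*args):
        reply = handler(*args)      # the request always reaches the server
        if state["left"] > 0:
            state["left"] -= 1
            return None             # the reply is lost; the client times out
        return reply

    return send

def call_with_retries(send, *args, max_tries=5):
    """Client stub: resend until a reply arrives or the tries run out."""
    for _ in range(max_tries):
        reply = send(*args)
        if reply is not None:
            return reply
    raise TimeoutError("no reply from server")
```

With two lost replies, plain retries execute `increment` three times, whereas the duplicate-suppressing server executes it once and replays the stored reply.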

The client-server architecture and the Internet
• The client-server architecture
  • allows development of many distributed applications
  • supports basic functions in the Internet, e.g., the DNS
• To resolve the name of a host, such as athena.cs.mit.edu, means to find the IP address of this host.

IP addresses
• An IP address serves two functions:
  • host identification and
  • location addressing.
• All communication in the Internet must use the IP protocol. The IP addresses are used by the IP protocol to route messages from the source to the destination through the Internet.
• IPv4
  • uses 32-bit addresses; the address space is limited to 4,294,967,296 (2^32) possible unique addresses.
  • addresses for special purposes: private networks (~18 million addresses); multicast addresses (~270 million addresses).
  • addresses are represented in dot-decimal notation (e.g., 218.96.17.12).
• IPv6
  • uses 128-bit addresses; the address space is limited to 2^128 possible unique addresses.
  • No “flag day”
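The IPv4 figures on this slide can be checked with Python's standard `ipaddress` module. The three private blocks listed are those reserved by RFC 1918, and `224.0.0.0/4` is the multicast block; the helper name `dotted_to_int` is made up for the example.

```python
import ipaddress

def dotted_to_int(text):
    """Convert dot-decimal notation to the underlying 32-bit integer."""
    return int(ipaddress.IPv4Address(text))

ipv4_space = 2 ** 32    # number of distinct 32-bit addresses

PRIVATE_BLOCKS = ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"]
private_total = sum(
    ipaddress.ip_network(block).num_addresses for block in PRIVATE_BLOCKS
)
multicast_total = ipaddress.ip_network("224.0.0.0/4").num_addresses
```

The totals come to 17,891,328 private addresses and 268,435,456 multicast addresses, matching the approximate figures of 18 million and 270 million.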

Strategies for name resolution
• Distribute to all parties a copy of the directory mapping names to physical/logical addresses. The strategy does not scale well:
  • when the population is very large, the directory is very large and the network traffic to distribute it would be horrendous
  • the number of updates is proportional to the population and would add to the traffic
• Central directory: easy to update, but it does not scale well; “hot spot” contention.
• Distributed directory: more sophisticated to implement, but used successfully for DNS.

Domain Name System
• Domain Name System (DNS): general-purpose name management system
  • Hierarchically structured
  • Maps user-friendly host names to IP addresses
• Domain Name Service (DNS)
  • A database editor generates tables of bindings, and these tables are then distributed to DNS servers.
  • Propagation takes time (hours).
  • Supports both relative and absolute paths
• DNS architecture: a hierarchical distributed database and an associated set of protocols that define:
  • A mechanism for querying and updating the database.
  • A mechanism for replicating the information in the database among servers.
  • A schema of the database.
• DNS has a referral architecture, somewhat complicated due to the need to optimize.

DNS Dictionary
• Domain name: an identification label that defines a realm of administrative autonomy, authority, or control in the Internet, based on the Domain Name System. The top-level domains (TLDs) are the highest level of domain names of the Internet; they form the DNS root zone. There are 20 generic top-level domains and 248 country code top-level domains.
• Authoritative name server: gives original, first-hand, definitive answers; holds either the name record or a referral record for the name.
• Authoritative record: first-hand information about a host name.
• Naming authority: an Internet administrative authority allowed to add authoritative records to a name server.
• Referral record: binds a hierarchical region of the DNS name space to another server that could help resolve the name.
• Recursive name service: a DNS server takes it upon itself to resolve a name rather than provide a referral record.
• Idempotent action: an action that can be interrupted and restarted from the beginning any number of times and still produce the same result as if the action had run to completion without interruption.

How DNS works
• A client sends a request to resolve a name to a domain name server.
• The server examines the collection of the domains it is responsible for:
  • If it finds the name record, it returns the record.
  • Else it searches a set of referral records:
    • It starts with the most significant component of the requested domain name and looks for the referral record that matches the most components.
    • If found, it returns the referral record.
    • Else it returns “not found”.
• Example on the next slide (left diagram): the system ginger.cs.pedantic.edu tries to resolve the name ginger.Scholarly.edu.
• Important: each host must have the address of a domain name server when it is connected to the Internet. This address could be:
  • provided by the ISP (Internet Service Provider)
  • hardwired into the browser
  • generated when the system was installed
  • selected by the user
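The lookup procedure above can be sketched with in-memory name servers. This is a simplified model under assumptions: the `NameServer` class, the three-server hierarchy, and the address `192.0.2.7` (taken from a range reserved for documentation) are all invented for the example, and a referral is represented by a direct reference to the next server rather than by its address.

```python
class NameServer:
    """Holds name records and referral records for the domains it serves."""

    def __init__(self, name_records=None, referrals=None):
        self.name_records = name_records or {}   # host name -> IP address
        self.referrals = referrals or {}         # domain suffix -> NameServer

    def lookup(self, name):
        name = name.lower()                      # DNS names are case-insensitive
        if name in self.name_records:
            return ("address", self.name_records[name])
        labels = name.split(".")
        # Try the longest suffix first, i.e., the referral matching most components.
        for i in range(len(labels)):
            suffix = ".".join(labels[i:])
            if suffix in self.referrals:
                return ("referral", self.referrals[suffix])
        return ("not found", None)

def resolve(server, name, max_hops=10):
    """Client side: follow referrals until a name record is found."""
    for _ in range(max_hops):
        kind, value = server.lookup(name)
        if kind == "address":
            return value
        if kind != "referral":
            return None
        server = value                           # ask the referred-to server next
    return None

# A made-up three-level hierarchy: root -> edu -> scholarly.edu
scholarly = NameServer(name_records={"ginger.scholarly.edu": "192.0.2.7"})
edu = NameServer(referrals={"scholarly.edu": scholarly})
root = NameServer(referrals={"edu": edu})
```

Calling `resolve(root, "ginger.Scholarly.edu")` follows two referrals and returns the made-up address, while a name with no matching referral resolves to `None`. In this sketch the client follows the referrals itself; a recursive name server would do that work on the client's behalf.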

The virtues of DNS
• Distributed responsibility: any DNS name server may act as a naming authority and
  • add authoritative records (see the example on the previous slide, the right diagram)
  • create lower-level naming domains; e.g., UCF can create EECS, EECS can create ComputingFrontiers, etc.
• Robustness
  • High level of replication of the name servers
    • There are some 80 replicas of the root name server
    • Each organization with a name server has 2-4 replicas
  • Stateless name servers: a name server does not maintain any state; its public interface is idempotent
  • A DNS server is a dedicated computer running relatively simple code, thus less likely to fail

More virtues and some failings of DNS
• Flexibility
  • The same name may be bound to several IP addresses. Needed to
    • ensure replication of services
    • improve performance; see, for example, the content delivery services provided by Akamai
  • Allows synonyms
    • a computer may appear to be in two different domains
  • Indirect names
• Lack of authentication: DNS does not use protocols to authenticate the response to a DNS request. One can impersonate a DNS server and provide a fake response.
• Does not guarantee accuracy: a DNS cache may hold obsolete information.
