EEC-681/781 Distributed Computing Systems

EEC-681/781 Distributed Computing Systems
Lecture 8
Wenbing Zhao
wenbing@ieee.org
Cleveland State University

Outline
• Midterm #1 results
• Processes and threads
• Clients and servers
EEC-681: Distributed Computing Systems

Midterm #1 Results
• Max: 98
• Min: 72
• Mean: 89
• P1 mean: 37/40
• P2 mean: 34/40
• P3 mean: 18/20

Process
• Communication takes place between processes
• A process is a program in execution
• For an OS, process management and scheduling are most important
• For distributed systems, other issues are equally or more important
  • Multithreading
  • Client-server organization
  • Code migration
  • Software agents

Process
• An operating system creates a number of virtual processors, each one for running a different program
• To keep track of these virtual processors, the OS maintains a process table
  • Entries hold CPU register values, memory maps, open files, accounting info, privileges, etc.

Process
• The OS ensures concurrency transparency for different processes that share the same CPU and other hardware resources
• Each process has its own address space
• Switching the CPU between two processes is expensive
  • The OS must save the CPU context, modify the registers of the memory management unit (MMU), and invalidate address translation caches such as the translation lookaside buffer (TLB)

Motivation to Use a Finer Granularity
• It is hard to program a single-threaded process for efficient distributed computing
  • Difficult to use non-blocking system calls
• Could have used a pool of processes, but
  • Creation/deletion of a process is expensive
  • Inter-process communication (IPC) is expensive

Introduction to Threads
• Thread: a minimal software processor in whose context a series of instructions can be executed
• Saving a thread context implies stopping the current execution and saving all the data needed to continue the execution at a later stage
• A process can have one or more threads
• Threads of a process share the same address space => for user-level threads, context switching can be done entirely independently of the operating system

Context Switching
• Creating and destroying threads is much cheaper than doing so for processes
• Process switching is generally more expensive as it involves getting the OS in the loop, i.e., trapping to the kernel

Threads and Distributed Systems
• Multithreaded clients:
  • Hiding network latency
• Multithreaded servers:
  • Improved performance
  • Better structure

Multithreaded Clients
• Multithreaded clients: hiding network latency
• Multithreaded Web client:
  • Web browser scans an incoming HTML page, and finds that more files need to be fetched
  • Each file is fetched by a separate thread, each doing a (blocking) HTTP request
  • As files come in, the browser displays them
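The Web-client idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `fetch` and `fetch_all` are hypothetical names, and the blocking HTTP request is simulated with `time.sleep` so that no real network access is needed.

```python
import threading
import time

def fetch(name, delay, results):
    """Stand-in for a blocking HTTP request (simulated; no real network I/O)."""
    time.sleep(delay)                      # the thread blocks here, like a blocking read
    results[name] = f"<contents of {name}>"

def fetch_all(files, delay=0.2):
    """Fetch every file in its own thread; return (results, elapsed seconds)."""
    results = {}
    threads = [threading.Thread(target=fetch, args=(name, delay, results))
               for name in files]
    start = time.monotonic()
    for t in threads:
        t.start()                          # all requests are now in flight together
    for t in threads:
        t.join()                           # wait until every file has arrived
    return results, time.monotonic() - start

if __name__ == "__main__":
    results, elapsed = fetch_all(["logo.png", "style.css", "script.js"])
    print(sorted(results), f"{elapsed:.2f}s")
```

Because the simulated requests overlap, the total time should be close to one request's delay rather than the sum of all delays, which is the latency hiding the slide refers to.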

Multithreaded Servers
• Improve performance:
  • Starting a thread to handle an incoming request is much cheaper than starting a new process
  • A multithreaded server can scale well to a multiprocessor system
  • Hide network latency by reacting to the next request while the reply to the previous one is being sent

Multithreaded Servers
• Better server structure:
  • Using simple, well-understood blocking calls simplifies the overall structure
  • Multithreaded programs can be smaller and easier to understand due to simplified flow of control

Multithreaded Servers
• Dispatcher/worker model
• Thread-per-object
• Thread-per-request
• Thread-per-client
• Thread pool
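As a rough sketch of the dispatcher/worker model combined with a thread pool, the following Python code lets one dispatcher place requests on a shared queue that a fixed set of worker threads drains. The names `handle`, `worker`, and `serve` are hypothetical, and the "request handling" is a trivial placeholder.

```python
import queue
import threading

def handle(request):
    """Hypothetical request handler: it simply upper-cases the request text."""
    return request.upper()

def worker(requests, replies):
    """Worker thread: take requests off the shared queue until told to stop."""
    while True:
        item = requests.get()
        if item is None:                   # sentinel from the dispatcher: shut down
            break
        req_id, request = item
        replies.put((req_id, handle(request)))

def serve(incoming, num_workers=3):
    """Dispatcher: hand each incoming request to a fixed pool of workers."""
    requests, replies = queue.Queue(), queue.Queue()
    pool = [threading.Thread(target=worker, args=(requests, replies))
            for _ in range(num_workers)]
    for t in pool:
        t.start()
    for req_id, request in enumerate(incoming):
        requests.put((req_id, request))    # dispatch without waiting for a reply
    for _ in pool:
        requests.put(None)                 # one sentinel per worker
    for t in pool:
        t.join()
    results = dict(replies.get() for _ in range(len(incoming)))
    return [results[i] for i in range(len(incoming))]   # replies in request order
```

A thread-per-request server would instead create a new thread inside the dispatch loop; the pool avoids that creation cost by reusing the same workers.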

Multithreaded Servers
• Three ways to construct a server:

Client-Side Software
• User interface
  • X Window System
  • Model-View-Controller pattern
• Providing distribution transparency

The X Window System
• X distinguishes two types of applications
  • Normal application
    • Can request creation of a window
    • Mouse and keystroke events are captured when a window is active
  • X window manager
    • Given special permission to manipulate the entire screen
    • Determines the look and feel
• X applications and the X kernel interact through the X protocol
  • Supports Unix-domain and TCP/IP sockets
  • X terminals

Model-View-Controller Pattern
• Invented in a Smalltalk context for decoupling the graphical interface of an application from the code that actually does the work
• MVC was originally developed to map the traditional input, processing, output roles into the GUI realm:
    Input      --> Processing --> Output
    Controller --> Model      --> View

Model-View-Controller
• Model: manages one or more data elements, responds to queries about its state, and responds to instructions to change state
• View: responsible for mapping graphics onto a device. Multiple views might be attached to the same model
• Controller: responsible for mapping end-user action to application response
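The three roles can be illustrated with a small Python sketch. Everything here is hypothetical and chosen for brevity: a counter serves as the model, a list of text lines stands in for the output device, and single-character "key presses" stand in for end-user actions.

```python
class CounterModel:
    """Model: owns the data, answers state queries, and notifies attached views."""
    def __init__(self):
        self._value = 0
        self._views = []

    def attach(self, view):
        self._views.append(view)

    def value(self):
        return self._value

    def increment(self, amount):
        self._value += amount
        for view in self._views:           # every attached view is refreshed
            view.refresh(self)

class TextView:
    """View: maps model state onto an output 'device' (here, a list of lines)."""
    def __init__(self, label):
        self.label = label
        self.lines = []

    def refresh(self, model):
        self.lines.append(f"{self.label}: {model.value()}")

class CounterController:
    """Controller: maps end-user actions to instructions for the model."""
    def __init__(self, model):
        self.model = model

    def on_key(self, key):
        if key == "+":
            self.model.increment(1)
        elif key == "-":
            self.model.increment(-1)

def demo():
    """Attach two views to one model and replay the key presses '+', '+', '-'."""
    model = CounterModel()
    view_a, view_b = TextView("A"), TextView("B")
    model.attach(view_a)
    model.attach(view_b)
    controller = CounterController(model)
    for key in "++-":
        controller.on_key(key)
    return model, view_a, view_b
```

Note that the controller never touches a view and the views never change the model, which is the decoupling the pattern is meant to provide.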

Client-Side Software: Providing Distribution Transparency
• Access transparency: client-side stubs for RPCs and RMIs
• Location/migration transparency: let client-side software keep track of actual location
• Replication transparency: multiple invocations handled by client stub
• Failure transparency: mask server and communication failures
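A minimal sketch of how a client-side stub might provide replication and failure transparency is shown below. It assumes in-process stand-ins for remote servers; `Replica`, `ReplicaDown`, and `LookupStub` are hypothetical names, and a real stub would marshal the call into network messages instead of calling a method directly.

```python
class ReplicaDown(Exception):
    """Raised by a (simulated) replica that cannot be reached."""

class Replica:
    """Stand-in for a remote server; `up` simulates whether it is reachable."""
    def __init__(self, name, data, up=True):
        self.name, self.data, self.up = name, data, up
        self.calls = 0

    def lookup(self, key):
        if not self.up:
            raise ReplicaDown(self.name)
        self.calls += 1
        return self.data[key]

class LookupStub:
    """Client-side stub: the caller sees one lookup(), not replicas or failures."""
    def __init__(self, replicas):
        self._replicas = list(replicas)

    def lookup(self, key):
        replies = []
        for replica in self._replicas:     # one invocation per replica
            try:
                replies.append(replica.lookup(key))
            except ReplicaDown:
                pass                       # mask the failure from the caller
        if not replies:
            raise ReplicaDown("all replicas unavailable")
        return replies[0]                  # hand a single reply back to the caller
```

The application calls `stub.lookup("x")` exactly as it would call a local object; only when every replica is unreachable does a failure become visible.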

Server-Side Software
• Basic model: A server is a process that waits for incoming service requests at a specific transport address
• A server typically listens on a well-known port:
    ftp-data  20  File Transfer [Default Data]
    ftp       21  File Transfer [Control]
    ssh       22  Secure Shell
    telnet    23  Telnet
    smtp      25  Simple Mail Transfer

Server-Side Software
• Superservers: Servers that listen to several ports, i.e., provide several independent services
• When a service request comes in, they start a subprocess to handle the request

Server-Side Software
• Iterative vs. concurrent servers:
  • Iterative servers can handle only one client at a time
  • Concurrent servers can handle multiple clients at the same time
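The difference between the two server types can be observed with Python's standard `socketserver` module: `TCPServer` handles one connection at a time, while `ThreadingTCPServer` starts a thread per connection. The sketch below is an illustration only; `EchoHandler`, `start_server`, and `echo_with_two_clients` are hypothetical names, and the server binds to an OS-chosen port on localhost.

```python
import socket
import socketserver
import threading

class EchoHandler(socketserver.StreamRequestHandler):
    """Handle one client: echo each line back until the client closes."""
    def handle(self):
        try:
            for line in self.rfile:
                self.wfile.write(line)
        except OSError:
            pass                           # the client went away mid-conversation

def start_server(concurrent):
    """Start an echo server on an OS-chosen localhost port; return the server."""
    cls = socketserver.ThreadingTCPServer if concurrent else socketserver.TCPServer
    server = cls(("127.0.0.1", 0), EchoHandler)
    server.daemon_threads = True           # only meaningful for the threading server
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def echo_with_two_clients(server):
    """Open two connections at once; only the second one sends a line."""
    addr = server.server_address
    with socket.create_connection(addr) as first, \
         socket.create_connection(addr) as second:
        second.settimeout(1.0)
        second.sendall(b"hello\n")
        try:
            return second.recv(1024)       # reply arrives if the server is concurrent
        except socket.timeout:
            return None                    # iterative server is still busy with `first`
```

With `start_server(concurrent=True)` the second client is expected to get its echo right away; with `concurrent=False` it waits behind the idle first client and times out.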

Servers and State
• Stateless servers: Never keep accurate information about the status of a client after having handled a request
• Consequences:
  • Clients and servers are completely independent
  • State inconsistencies due to client or server crashes are reduced
  • Possible loss of performance because, e.g., a server cannot anticipate client behavior
• Question: Does connection-oriented communication fit into a stateless design?

Servers and State
• Stateful servers: Keep track of the status of their clients
  • Record that a file has been opened, so that prefetching can be done
  • Know which data a client has cached, and allow clients to keep local copies of shared data
• The performance of stateful servers can be extremely high (from a particular client’s point of view)
• Drawbacks
  • Crash recovery is a lot more challenging
  • Less scalable
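To make the contrast concrete, here is a small Python sketch of a file service in both styles. It is an assumption-laden illustration: the class and method names are hypothetical, files live in an in-memory dictionary, and a crash is simulated by clearing the per-client table.

```python
class StatelessFileServer:
    """Every request carries all it needs: file name, offset, and length."""
    def __init__(self, files):
        self._files = files                # file name -> bytes

    def read(self, name, offset, length):
        return self._files[name][offset:offset + length]

class StatefulFileServer:
    """Remembers, per client, which file is open and the current position."""
    def __init__(self, files):
        self._files = files
        self._open = {}                    # client id -> [file name, position]

    def open(self, client, name):
        self._open[client] = [name, 0]

    def read(self, client, length):
        name, pos = self._open[client]     # raises KeyError if the state was lost
        data = self._files[name][pos:pos + length]
        self._open[client][1] = pos + len(data)
        return data

    def crash_and_restart(self):
        self._open.clear()                 # per-client state does not survive the crash
```

After `crash_and_restart()`, a stateless client simply repeats its request, while a stateful client's next `read` fails until it reopens the file, which hints at why crash recovery is harder for stateful servers.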

Practical Implementation of Servers
• Servers need to maintain clients' state
• Where to store such state? Database systems
• Solution: three-tier architecture
  • Application servers interface directly to clients and execute according to business logic
  • Data (state) is stored in the data access tier so that the application servers can be made stateless
• E.g., Web-page personalization using cookies

Reasons for Migrating Code
• Load balancing
  • Migrate processes from heavily loaded machines to lightly loaded machines
• Minimize communication
  • Move code from client to server
  • Move code from server to client
• Parallel execution
  • Web crawlers
• Flexibility
  • Dynamically configure distributed systems

Strong and Weak Mobility
• Process components:
  • Code segment: set of instructions that make up the program
  • Resource/data segment: contains references to external resources needed by the process, such as files, devices, other processes
  • Execution segment: contains the current execution state of a process such as private data, stack, program counter

Strong and Weak Mobility
• Weak mobility: Move only the code and data segments; execution starts from the beginning after migration
• Strong mobility: Move the entire component, including its execution state
  • Migration: move the entire process from one machine to another
  • Cloning: start a clone, and set it in the same execution state
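Weak mobility can be sketched in Python by shipping source text plus initial data and starting the code from its entry point at the target. This is a toy illustration: `migrate_weak`, `run_at_target`, and the convention that shipped code defines `main(data)` are all assumptions, the "transfer" is a plain function call, and `exec` on untrusted code would be unsafe in a real system.

```python
def run_at_target(code, data):
    """Target machine: load the shipped code and start it from its entry point."""
    namespace = {}
    exec(code, namespace)                  # load the code segment (unsafe for untrusted code)
    # Execution always begins at main(); no stack or program counter was transferred.
    return namespace["main"](dict(data))

def migrate_weak(code, data):
    """'Send' the code and data segments; the transfer is simulated by a call."""
    return run_at_target(code, data)

# Code segment shipped as text; by assumption it must define main(data).
CODE = '''
def main(data):
    total = 0
    for x in data["numbers"]:
        total += x
    return total
'''

if __name__ == "__main__":
    print(migrate_weak(CODE, {"numbers": [1, 2, 3]}))
```

Strong mobility would additionally require capturing and restoring the execution segment (stack, program counter), which this sketch deliberately does not attempt.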

Process-to-Resource Binding
• By identifier: the process requires a specific instance of a resource
  • A specific web page or a remote file
  • Local communication endpoint
• By value: the process requires the value of a resource
  • Shared library
  • Memory
• By type: the process requires that only a type of resource is available
  • A color monitor
  • A printer

Resource-to-Machine Binding
• Fixed: the resource cannot be migrated
  • Local devices
  • Local communication endpoints
• Fastened: the resource can, in principle, be migrated but only at high cost
  • Local databases
  • Complete web site
• Unattached: the resource can easily be moved along with the process
  • A cache
  • Files

Migration in Heterogeneous Systems
• Challenges:
  • The target machine may not be suitable to execute the migrated code
  • The definition of process/thread/processor context is highly dependent on local hardware, operating system and runtime system
• Solution: Make use of an abstract machine that is implemented on different platforms
  • Interpreted languages running on a virtual machine (Java/JVM; scripting languages)
  • Virtual machine

What’s an Agent?
• An agent is an autonomous process capable of reacting to, and initiating changes in, its environment, possibly in collaboration with users and other agents
  • Collaborative agent: collaborates with others in a multiagent system
  • Mobile agent: can move between machines
  • Interface agent: assists users at the user-interface level
  • Information agent: manages information from physically different sources

Agent Technology
• The general model of an agent platform
  • Intra-platform communication
  • Management: keeps track of where the agents on this platform are (mapping agent ID to port)
  • Directory: mapping of agent names & attributes to agent IDs
  • ACC: Agent Communication Channel, used to communicate with other platforms
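The two mappings named above (names and attributes to agent IDs, and agent IDs to ports) can be sketched with two dictionaries. This is a simplified illustration and not a real agent-platform API: `AgentPlatform` and its methods are hypothetical, IDs are small integers, and the ACC is left out.

```python
class AgentPlatform:
    """Toy platform with a directory component and a management component."""
    def __init__(self):
        self._directory = {}               # agent name -> (agent ID, attributes)
        self._management = {}              # agent ID -> port on this platform
        self._next_id = 1

    def register(self, name, port, **attributes):
        """Create an ID for a new agent and record it in both components."""
        agent_id = self._next_id
        self._next_id += 1
        self._directory[name] = (agent_id, attributes)
        self._management[agent_id] = port
        return agent_id

    def find_by_name(self, name):
        """Directory lookup: agent name -> agent ID."""
        return self._directory[name][0]

    def find_by_attribute(self, key, value):
        """Directory lookup: attribute value -> sorted list of matching agent IDs."""
        return sorted(agent_id
                      for agent_id, attrs in self._directory.values()
                      if attrs.get(key) == value)

    def locate(self, agent_id):
        """Management lookup: agent ID -> port where the agent can be reached."""
        return self._management[agent_id]
```

A message to a named agent would first go through `find_by_name` and then `locate`; agents on other platforms would be reached through the ACC instead.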
