Processes Introduction to Distributed Systems Preliminary Version, Not Final
What Is A Process?
• Broadly speaking, what is a process?
• Some notion of execution.
• Something that has a sequence of instructions, and changes the state of the system as a result of those instructions.
• Some notion of containment.
• A process is usually thought of as “owning” certain resources.
• A process has a set of instructions (the “program”).
• “A program in execution”.
• What is a thread then?
• Has a minimal execution context, but everything else belongs to the containing process.
Context Switching
• Switching execution between caller and callee: the context is the minimal collection of values stored in the registers of a processor.
• Switching execution between threads: the minimal collection of values stored in registers, plus maybe some thread state.
• Switching between processes: the thread context plus address space values.
• Observation 1: Threads share the same address space. Do we need OS involvement?
• Observation 2: Process switching is generally more expensive. OS is involved. (Why?)
• Observation 3: Creating and destroying threads is cheaper than doing it for a process.
Context Switching in Large Apps
• A large app is commonly a set of cooperating processes, communicating via IPC, which is expensive.
• S1: Switch from user to kernel.
• S2: Switch from context A to context B.
• S3: Switch from kernel to user.
• Textbooks suggest that IPC requires a context switch. Does IPC always require a context switch?
Threads
Introduction to Threads
• Why use threads?
• Do multiple things at once, like maintain responsiveness.
• Easier to structure.
• Do we have to use threads? Why not processes? How do you do multiple things at once?
• Why prefer processes to threads?
• Take advantage of multicores.
Threads and OS
• Issue: Should an OS kernel provide threads or should they be implemented as part of a user-level package?
• User-space:
• Nothing to do with the kernel. Can be very efficient.
• But everything done by a thread affects the whole process. So what happens when a thread blocks on a syscall?
• Can we use multiple CPUs/cores?
• Kernel solution: The kernel implements threads. Every thread operation is a system call.
• Operations that block a thread are no longer a problem. The kernel schedules another.
• External events are simple. The kernel unblocks the thread.
• Less efficient.
• Multicore works.
• Thread pools.
Hybrid (Solaris)
• Use two levels. Multiplex user threads on top of LWPs (kernel threads).
[Figure: threads in user space multiplexed on lightweight processes in kernel space.]
• When a user-level thread does a syscall, the LWP blocks. Thread is bound to LWP.
• Kernel schedules another LWP.
• Context switches can occur at the user level.
• When there are no threads to schedule, an LWP may be removed.
• So, how well does this work in practice?
Threads and Distributed Systems
• Multithreaded clients: Main issue is hiding network latency.
• Multithreaded web client:
• Browser scans HTML, and finds more files that need to be fetched.
• Each file is fetched by a separate thread, each issuing an HTTP request.
• As files come in, the browser displays them.
• Multiple request-response calls to other machines (RPC):
• Client issues several calls, each one by a different thread.
• Waits till all return.
• If calls are to different servers, will have significant speedup.
Fetching Images
• Suppose there are ten images in a page. How should they be fetched?
• Sequentially:
    fetch_sequential() {
        for (int i = 0; i < 10; i++) {
            int sockfd = ...;                 /* connect for image i */
            write(sockfd, "HTTP GET ...");    /* send the request */
            n = read_till_socket_closed(sockfd, jpeg[i], 100K);  /* blocks until done */
        }
    }
• Concurrently:
    fetch_concurrent() {
        int thread_ids[10];
        for (int i = 0; i < 10; i++) {
            thread_ids[i] = start_read_thread(urls[i], jpeg[i]);  /* one thread per image */
        }
        for (int i = 0; i < 10; i++) {
            wait_for_thread(thread_ids[i]);   /* join all fetches */
        }
    }
• Which is faster?
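The two strategies above can be tried out with POSIX threads. The following is only a sketch under stated assumptions: `fetch_one` is a hypothetical stand-in that sleeps for 20 ms instead of talking to a real server, and `struct fetch_job`, `NUM_IMAGES`, and the fake byte counts are invented for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <time.h>

#define NUM_IMAGES 10

/* Hypothetical job record: which image to fetch and how many bytes came back. */
struct fetch_job {
    int index;
    size_t bytes;
};

/* Stand-in for one HTTP fetch: sleep to mimic a network round trip,
 * then record a fake payload size. No real I/O happens here. */
static void *fetch_one(void *arg) {
    struct fetch_job *job = arg;
    struct timespec delay;
    delay.tv_sec = 0;
    delay.tv_nsec = 20L * 1000 * 1000;   /* 20 ms, an assumed latency */
    nanosleep(&delay, NULL);
    job->bytes = 1000 + (size_t)job->index;
    return NULL;
}

/* One request at a time: the waits add up, roughly NUM_IMAGES round trips. */
size_t fetch_sequential(struct fetch_job jobs[NUM_IMAGES]) {
    size_t total = 0;
    assert(jobs != NULL);
    for (int i = 0; i < NUM_IMAGES; i++) {
        jobs[i].index = i;
        jobs[i].bytes = 0;
        fetch_one(&jobs[i]);
        total += jobs[i].bytes;
    }
    return total;
}

/* One thread per request: the waits overlap, so the total is closer
 * to a single round trip plus thread start-up cost. */
size_t fetch_concurrent(struct fetch_job jobs[NUM_IMAGES]) {
    pthread_t tids[NUM_IMAGES];
    int started[NUM_IMAGES];
    size_t total = 0;
    assert(jobs != NULL);
    for (int i = 0; i < NUM_IMAGES; i++) {
        jobs[i].index = i;
        jobs[i].bytes = 0;
        started[i] = (pthread_create(&tids[i], NULL, fetch_one, &jobs[i]) == 0);
    }
    for (int i = 0; i < NUM_IMAGES; i++) {
        if (started[i])
            pthread_join(tids[i], NULL);   /* wait for this fetch to finish */
        total += jobs[i].bytes;
    }
    return total;
}
```

Both functions return the same total; only the elapsed time differs. With real servers the speedup also depends on bandwidth and server load, which this sketch ignores.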
FSM
• Can also multiplex multiple requests on a single thread.
• Read any available input.
• Process the input chunk.
• Save state of that request.
• Loop.
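The loop above can be sketched as a small state machine per request. This is an illustration under assumptions, not a server: input availability is simulated by a string per request, whereas a real server would ask the OS which sockets are readable (for example with `poll`). The names `struct request`, `step`, and `run_event_loop`, and the header/body split, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Where a request is in its (assumed) lifecycle. */
enum req_state { READING_HEADER, READING_BODY, DONE };

struct request {
    enum req_state state;
    const char *input;    /* simulated bytes that "arrive" over time */
    size_t pos;           /* how much input has been consumed */
    size_t body_bytes;    /* bytes seen after the header line */
};

/* Process at most `chunk` bytes of one request, then return.
 * All progress is saved in *r, so the single thread can move on. */
static void step(struct request *r, size_t chunk) {
    for (size_t i = 0; i < chunk; i++) {
        char c = r->input[r->pos];
        if (c == '\0') {              /* simulated end of stream */
            r->state = DONE;
            return;
        }
        r->pos++;
        if (r->state == READING_HEADER) {
            if (c == '\n')
                r->state = READING_BODY;   /* header line finished */
        } else {
            r->body_bytes++;
        }
    }
}

/* Single-threaded loop: give every unfinished request one chunk of
 * service per round. Returns the number of rounds that did work. */
size_t run_event_loop(struct request *reqs, size_t n, size_t chunk) {
    size_t rounds = 0;
    assert(chunk > 0);                /* a zero chunk would never finish */
    for (;;) {
        size_t pending = 0;
        for (size_t i = 0; i < n; i++) {
            if (reqs[i].state == DONE)
                continue;
            step(&reqs[i], chunk);
            pending++;
        }
        if (pending == 0)
            break;
        rounds++;
    }
    return rounds;
}
```

The point of the design is that no call blocks on one request: each `step` returns after a bounded amount of work, and the saved state is what lets the thread resume that request later.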
• Multithreaded servers: Main issue is performance (throughput) and structure.
• Improve performance:
• Starting a thread to handle an incoming request is cheaper than starting a process.
• A single-threaded server can’t take advantage of a multiprocessor.
• Hide network latency. Other work can be done while a request is coming in.
• Better structure:
• Using simple blocking I/O calls is easier.
• Multithreaded programs tend to be simpler.
• This is a controversial area.
Multithreaded Servers
• A multithreaded server organized in a dispatcher/worker model.
Multithreaded Servers
• Three ways to construct a server.
Virtualization
Describing Systems
• Often you need to describe a system very precisely, so that someone can determine, with a high degree of accuracy:
• Whether or not the system satisfies the requirements.
• How to interface hardware or software to the system.
• How do you do it? How do you organize your description?
• First, put a black box around your system.
• Think in terms of operations. What can you do to your system? Operations can be categorized into:
• Those that change the outcome of later operations. They modify the system in some way.
• Those that do not. These operations are only used to observe the system.
• Based on the operations, figure out what is "observable".
• The things that are observable imply some kind of "state".
• Note that this state may or may not closely resemble your real state. It could be an abstraction.
• Okay, so now, you are almost there.
• You first define your system state. Maybe you have some registers, some memory, some files, database, etc.
• Then you define your operations. For each operation, you define how it changes the state.
• Voilà, you are done!
• The fancy term for this is “operational semantics”.
• You specify your system in terms of state, and a set of operations. For each operation, you describe how it changes the state.
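As a small illustration of describing a system by its state and operations, here is a sketch for a bounded stack. The choice of a stack, and the names `struct stack_state`, `op_push`, `op_pop`, `op_top`, and `op_size`, are assumptions made for this example only.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_CAP 16

/* The abstract state: everything later operations can depend on. */
struct stack_state {
    int items[STACK_CAP];
    size_t depth;
};

/* Mutator: changes the outcome of later operations. */
void op_push(struct stack_state *s, int value) {
    assert(s->depth < STACK_CAP);      /* precondition: not full */
    s->items[s->depth++] = value;
}

/* Mutator that also reports a value: removes and returns the top. */
int op_pop(struct stack_state *s) {
    assert(s->depth > 0);              /* precondition: not empty */
    return s->items[--s->depth];
}

/* Observer: reads the state without changing it. */
int op_top(const struct stack_state *s) {
    assert(s->depth > 0);
    return s->items[s->depth - 1];
}

/* Observer: number of items currently held. */
size_t op_size(const struct stack_state *s) {
    return s->depth;
}
```

Calling an observer twice in a row gives the same answer, while each mutator may change what the observers report afterwards. The real implementation could be a linked list instead of an array; the description in terms of state and operations would stay the same.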
Intuition About Virtualization
• Make something look like something else.
• Make it look like there is more than one of a particular thing.
Virtualization
• Originally developed by IBM.
• Virtualization is increasingly important.
• Ease of portability.
• Isolation of failing or attacked components.
• Ease of running different configurations, versions, etc.
• Replicate whole web site to edge server.
• Below, the word "interface" is used in the general sense.
[Figure: (a) General organization between a program, interface, and system. (b) General organization of virtualizing system A on top of system B.]
Architecture of VMs
• Computer systems offer different levels of interfaces.
• Interface between hardware and software, non-privileged.
• Interface between hardware and software, privileged.
• System calls.
• Libraries.
• Virtualization can take place at all of these.
• Two kinds of VMs.
• Process VM: A program is compiled to intermediate (portable) code, which is then executed by a runtime system.
• VMM: A separate software layer mimics the instruction set of hardware: a complete OS and its apps can be supported.
Implementation
• Let’s say we want to run a Solaris executable on a Windows machine.
• What needs to be done?
• Machine instructions are different, so each one must be interpreted:
    while (not end of execution) {
        ins = read_instruction;            /* fetch and decode */
        switch (ins.opcode) {
        case STORE:
            mem[ins.address] = reg[ins.reg_num];
            break;
        …
        }
    }
• System calls are different too, and must be emulated.
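The interpreter loop above can be made concrete with a toy instruction set. This is a sketch under assumptions: the four opcodes, the `struct instruction` layout, and the register and memory sizes are invented for illustration and do not correspond to any real machine.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_REGS 4
#define MEM_WORDS 16

/* Hypothetical guest instruction set. */
enum opcode { OP_LOADI, OP_ADD, OP_STORE, OP_HALT };

struct instruction {
    enum opcode opcode;
    int reg_num;   /* destination register (source register for STORE) */
    int operand;   /* LOADI: immediate; ADD: source register; STORE: address */
};

/* Guest-visible state, held in ordinary host memory. */
struct guest {
    int reg[NUM_REGS];
    int mem[MEM_WORDS];
};

/* Fetch, decode, and execute until HALT or end of program.
 * Returns the number of instructions executed, or -1 on a bad instruction. */
int run_guest(struct guest *g, const struct instruction *prog, size_t len) {
    size_t pc = 0;
    int executed = 0;
    assert(g != NULL && prog != NULL);
    while (pc < len) {
        struct instruction ins = prog[pc++];
        if (ins.reg_num < 0 || ins.reg_num >= NUM_REGS)
            return -1;
        switch (ins.opcode) {
        case OP_LOADI:
            g->reg[ins.reg_num] = ins.operand;
            break;
        case OP_ADD:
            if (ins.operand < 0 || ins.operand >= NUM_REGS)
                return -1;
            g->reg[ins.reg_num] += g->reg[ins.operand];
            break;
        case OP_STORE:
            if (ins.operand < 0 || ins.operand >= MEM_WORDS)
                return -1;
            g->mem[ins.operand] = g->reg[ins.reg_num];
            break;
        case OP_HALT:
            return executed + 1;       /* count the HALT itself */
        default:
            return -1;
        }
        executed++;
    }
    return executed;
}
```

Every guest instruction costs several host instructions here, which is why pure interpretation is slow compared with running native code.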
Clients
Networked User Interfaces
• There are two approaches to building a client.
• For every application, create a client part and a server part.
• Client runs on local machine, such as a PDA.
• Create a reusable GUI toolkit that runs on the client. GUI can be directly manipulated by the server-side application code.
• This is the thin-client approach.
Thick client
• The protocol is application-specific.
• For re-use, can be layered, but at the top, it is application-specific.
Thin client
Example: The X Window System
• Protocol tends to be heavyweight.
• Other examples of similar systems?
• VNC
• Remote desktop
Client-Side Software
• Often tailored for distribution transparency.
• Access transparency: client-side stubs for RPCs.
• Location/migration transparency: Let client-side software keep track of actual location.
• Replication transparency: Multiple invocations handled by client-side stub.
• Failure transparency: Can often be placed only at client.
• Transparent replication of a server using a client-side solution.
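One way to picture replication transparency at the client is a stub that sends the same request to every replica and hands a single reply back to the application. The sketch below assumes replicas are plain function pointers rather than remote servers; `replica_fn`, `stub_invoke`, and the two sample replicas are hypothetical names for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* A replica takes a request and writes a reply; returns 0 on success. */
typedef int (*replica_fn)(int request, int *reply);

/* Sample replica that answers by doubling the request (assumed behavior). */
int replica_ok(int request, int *reply) {
    *reply = request * 2;
    return 0;
}

/* Sample replica that is unreachable. */
int replica_down(int request, int *reply) {
    (void)request;
    (void)reply;
    return -1;
}

/* Client-side stub: invoke every replica, keep the first successful
 * reply, and report how many replicas answered. The caller sees one
 * call and one reply, regardless of how many replicas exist. */
int stub_invoke(const replica_fn *replicas, size_t n, int request, int *reply) {
    int answered = 0;
    assert(replicas != NULL && reply != NULL);
    for (size_t i = 0; i < n; i++) {
        int r;
        if (replicas[i](request, &r) == 0) {
            if (answered == 0)
                *reply = r;            /* first good reply wins */
            answered++;
        }
    }
    return answered;
}
```

Taking the first reply is only one possible policy. A stub could instead compare replies or require a majority, at the cost of waiting for more replicas.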
Servers
Servers: General Organization
• Basic model: A server is a process that waits for incoming service requests at a specific end point (port).
• Iterative vs. concurrent.
• Which end point?
• Either well-known.
• Or registry.
• Metaservers (superservers): Listen to multiple end points, then spawn the right server.
Binding Using Registry
Metaservers
Stateless Server
• What does this mean?
• A stateless server does not maintain state in the middle tier.
• State is of course still maintained in the back-end (like your bank account).
• Suppose you are buying something on Amazon, and there is a three-step procedure to buy.
• What does it mean to do this statelessly?
• How about statefully?
• Session state vs. permanent state.
Cookies
• What are cookies?
• Cookies and related things can serve two purposes:
• They can be used to correlate the current client operation with a previous operation.
• They can be used to store state.
• For example, you could put exactly what you were buying, and which step of the checkout process you were in.
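Storing checkout state in a cookie can be sketched as encoding a small record into a string and decoding it on the next request. The `step=...;item=...` layout and the names `struct checkout_state`, `cookie_encode`, and `cookie_decode` are assumptions for this example, not a standard cookie format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Session state the server would otherwise have to remember itself. */
struct checkout_state {
    int step;          /* which step of the checkout the client is on */
    char item[32];     /* what is being bought; must not contain ';' */
};

/* Write the state as "step=<n>;item=<name>".
 * Returns 0 on success, -1 if the buffer is too small. */
int cookie_encode(const struct checkout_state *st, char *buf, size_t buflen) {
    int n;
    assert(st != NULL && buf != NULL);
    n = snprintf(buf, buflen, "step=%d;item=%s", st->step, st->item);
    return (n >= 0 && (size_t)n < buflen) ? 0 : -1;
}

/* Parse a cookie produced by cookie_encode.
 * Returns 0 on success, -1 if the text does not match the layout. */
int cookie_decode(const char *cookie, struct checkout_state *st) {
    assert(cookie != NULL && st != NULL);
    /* %31[^;] reads at most 31 non-';' characters and NUL-terminates. */
    if (sscanf(cookie, "step=%d;item=%31[^;]", &st->step, st->item) != 2)
        return -1;
    return 0;
}
```

Because the client holds this string, it can also edit it. A server that relies on such state would need to validate or sign it, which this sketch does not do.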
Server Clusters
• Servers can be organized into clusters, to improve performance.
• Typical organization below, into three tiers.
• Tiers 2 and 3 can be merged.
• To maintain transparency, a switch needs to make it look like the client is talking to a single entity.
• Whose IP address goes in the return packet?
Distributed Servers
• We can be even more distributed.
• But over a wide area network, the situation is too dynamic to use TCP handoff.
• Instead, use Mobile IP.
• Are the servers really moving around?
• Mobile IP
• A server has a home address (HoA), where it can always be contacted.
• It leaves behind a care-of address (CoA), which says where it actually is.
• Application still uses HoA.
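The HoA/CoA idea can be pictured as a binding table: traffic addressed to the home address is redirected to the registered care-of address, if any. The sketch below is a deliberate simplification that treats addresses as strings and ignores tunneling, lifetimes, and authentication; `struct home_agent`, `ha_register`, and `ha_route` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_BINDINGS 8
#define ADDR_LEN 46    /* enough for a textual IPv4 or IPv6 address */

struct binding {
    char hoa[ADDR_LEN];   /* home address: what applications use */
    char coa[ADDR_LEN];   /* care-of address: where the server is now */
};

struct home_agent {
    struct binding table[MAX_BINDINGS];
    size_t count;
};

/* Record or update the care-of address for a home address.
 * Returns 0 on success, -1 if an address is too long or the table is full. */
int ha_register(struct home_agent *ha, const char *hoa, const char *coa) {
    assert(ha != NULL && hoa != NULL && coa != NULL);
    if (strlen(hoa) >= ADDR_LEN || strlen(coa) >= ADDR_LEN)
        return -1;
    for (size_t i = 0; i < ha->count; i++) {
        if (strcmp(ha->table[i].hoa, hoa) == 0) {
            strcpy(ha->table[i].coa, coa);     /* server moved again */
            return 0;
        }
    }
    if (ha->count == MAX_BINDINGS)
        return -1;
    strcpy(ha->table[ha->count].hoa, hoa);
    strcpy(ha->table[ha->count].coa, coa);
    ha->count++;
    return 0;
}

/* Where should a packet for `hoa` go? The care-of address if one is
 * registered, otherwise the home address itself. */
const char *ha_route(const struct home_agent *ha, const char *hoa) {
    assert(ha != NULL && hoa != NULL);
    for (size_t i = 0; i < ha->count; i++) {
        if (strcmp(ha->table[i].hoa, hoa) == 0)
            return ha->table[i].coa;
    }
    return hoa;
}
```

The application only ever names the home address; the lookup is what keeps the current location hidden from it.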
Managing Server Clusters
• Most common: do the same thing as usual.
• Quite painful if you have 128 nodes.
• Next step: provide a single management framework that will let you monitor the whole cluster, and distribute updates en masse.
• Works for medium-sized clusters. What if you have 5,000 nodes?
• Need continuous repair, essentially autonomic computing.
Example: PlanetLab
• The basic organization of a PlanetLab node.
• A set of Vservers, each on a different node, is a slice.
• PlanetLab management issues:
• Nodes belong to different organizations.
• Each organization should be allowed to specify who is allowed to run applications on their nodes,
• and restrict resource usage appropriately.
• Monitoring tools available assume a very specific combination of hardware and software.
• All tailored to be used within a single organization.
• Programs from different slices but running on the same node should not interfere with each other.
Node manager
• Separate vserver.
• Task: create other vservers and control resource allocation.
• No policy decisions.
• Resource specification (rspec):
• Specifies a time interval during which a specific resource is available.
• Identified via a 128-bit ID, the resource capability (rcap).
• Given rcap, node manager can look up rspec locally.
• Resources bound to slices.
• Slice associated with service provider.
• A slice is identified by (principal_id, slice_tag): principal_id identifies the provider, and slice_tag is chosen by the provider.
• Slice creation service (SCS) runs on node, receives creation requests from some slice authority.
• SCS contacts node manager. Node manager cannot be contacted directly. (Separation of mechanism from policy.)
• To create a slice, a service provider will contact a slice authority and ask it to create a slice.
• Also have management authorities that monitor nodes, make sure they are running the right software, etc.
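The local rcap-to-rspec lookup described above could look roughly like the following. Only the 128-bit ID and the time interval come from the slide; the field names, the table size, and the functions `nm_add`, `nm_lookup`, and `rspec_available_at` are assumptions for illustration and are not PlanetLab's actual data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RCAP_BYTES 16    /* 128 bits */
#define MAX_RSPECS 8

/* Resource specification: a resource and the interval it is available. */
struct rspec {
    const char *resource;   /* hypothetical label, e.g. a CPU share */
    long valid_from;        /* start of interval (assumed: seconds) */
    long valid_until;       /* end of interval, exclusive */
};

/* Resource capability: the 128-bit ID that names an rspec. */
struct rcap {
    uint8_t id[RCAP_BYTES];
};

/* The node manager's local table. */
struct node_manager {
    struct rcap caps[MAX_RSPECS];
    struct rspec specs[MAX_RSPECS];
    size_t count;
};

/* Register an rspec under an rcap. Returns 0, or -1 if the table is full. */
int nm_add(struct node_manager *nm, const struct rcap *cap, struct rspec spec) {
    assert(nm != NULL && cap != NULL);
    if (nm->count == MAX_RSPECS)
        return -1;
    nm->caps[nm->count] = *cap;
    nm->specs[nm->count] = spec;
    nm->count++;
    return 0;
}

/* Given an rcap, find the rspec locally; NULL if the rcap is unknown. */
const struct rspec *nm_lookup(const struct node_manager *nm, const struct rcap *cap) {
    assert(nm != NULL && cap != NULL);
    for (size_t i = 0; i < nm->count; i++) {
        if (memcmp(nm->caps[i].id, cap->id, RCAP_BYTES) == 0)
            return &nm->specs[i];
    }
    return NULL;
}

/* Is the resource available at time t? */
int rspec_available_at(const struct rspec *spec, long t) {
    return spec->valid_from <= t && t < spec->valid_until;
}
```

Note that the table only answers questions about what exists; deciding who may obtain an rcap is left to other components, in line with the "no policy decisions" bullet.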
Relationships between PlanetLab entities:
• (1) A node owner puts its node under the regime of a management authority, possibly restricting usage where appropriate.
• (2) A management authority provides the necessary software to add a node to PlanetLab.
• (3) A service provider registers itself with a management authority, trusting it to provide well-behaving nodes.
• (4) A service provider contacts a slice authority to create a slice on a collection of nodes.
• (5) The slice authority needs to authenticate the service provider.
• (6) A node owner provides a slice creation service for a slice authority to create slices. It essentially delegates resource management to the slice authority.
• (7) A management authority delegates the creation of slices to a slice authority.
[Figure: relationships among the node owner, management authority, service provider, and slice authority, with arrows labeled by the numbers above.]
Code Migration
Approaches
• Why code migration?
• Moving from heavily loaded to lightly loaded machines.
• Also, to minimize communication costs.
• Moving code to data, rather than data to code.
• Late binding for a protocol. (Download it.)
• Do you use it?