Servers and Threads
Jeff Chase, Duke University
Processes and threads

[Figure: a process's virtual address space, containing code, data, and a stack for the main thread, plus optional additional threads, each with its own stack.]

Each process has a virtual address space (VAS): a private name space for the virtual memory it uses. The VAS is both a "sandbox" and a "lockbox": it limits what the process can see/do, and protects its data from others.

From now on, we suppose that a process could have additional threads. We are not concerned with how to implement them, but we presume that they can all make system calls and block independently. Each process has a thread bound to the VAS, with stacks (user and kernel). If we say a process does something, we really mean its thread does it. The kernel can suspend/restart the thread wherever and whenever it wants.
Threads: a familiar metaphor

Think of browser tabs. Page links and the back button navigate a "stack" of pages in each tab. Each tab has its own stack. One tab is active at any given time. You create/destroy tabs as needed. You switch between tabs at your whim.

Similarly, each thread has a separate stack. The OS switches between threads at its whim. One thread is active per CPU core at any given time.
Threads
• A thread is a stream of control, defined by CPU register context (PC, SP, …).
  • Note: process "context" is thread context plus protected registers defining the current VAS, e.g., ASID or "page table base register(s)".
  • Generally, "context" means the register values and referenced memory state (stack, page tables).
• Multiple threads can execute independently:
  • They can run in parallel on multiple CPUs (physical concurrency)…
  • …or arbitrarily interleaved on a single CPU (logical concurrency).
• Each thread must have its own stack.
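To make "thread context" concrete, here is a hypothetical per-thread record in C. The fields mirror the register context described above, but the names and sizes are invented for illustration; no particular OS lays it out this way.

    /* Hypothetical thread control block (TCB): a sketch of per-thread context. */
    struct tcb {
        unsigned long pc;         /* saved program counter */
        unsigned long sp;         /* saved stack pointer: points into this thread's stack */
        unsigned long regs[16];   /* other saved general-purpose registers */
        void *stack_base;         /* each thread must have its own stack */
    };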
Two threads sharing a CPU

[Figure: concept vs. reality — conceptually the two threads run continuously; in reality one CPU interleaves them, with a context switch at each changeover.]
Two threads: closer look

[Figure: an address space (from 0 to high addresses) holding program code, a common runtime library, data, and two stacks, one per thread. One thread is running on the CPU core, with its registers (R0…Rn, PC, SP) loaded; the other is "on deck" and ready to run.]
Thread context switch

[Figure: to switch out the running thread and switch in the other, the system (1) saves the outgoing thread's registers to memory, then (2) loads the incoming thread's registers, including its PC and SP, so it resumes on its own stack in the shared address space.]
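As a runnable user-space illustration of "save registers, load registers", here is a small program using the POSIX ucontext API (getcontext/makecontext/swapcontext), which performs exactly this save/restore. The API is obsolescent in recent POSIX but still available on Linux; this is a sketch of the mechanism, not how a kernel does it.

    #include <stdio.h>
    #include <ucontext.h>

    static ucontext_t main_ctx, thread_ctx;
    static char thread_stack[64 * 1024];       /* each thread needs its own stack */

    static void thread_func(void) {
        printf("thread: running on its own stack\n");
        swapcontext(&thread_ctx, &main_ctx);   /* save our registers, load main's */
    }

    int main(void) {
        getcontext(&thread_ctx);               /* initialize the context struct */
        thread_ctx.uc_stack.ss_sp = thread_stack;
        thread_ctx.uc_stack.ss_size = sizeof thread_stack;
        thread_ctx.uc_link = &main_ctx;
        makecontext(&thread_ctx, thread_func, 0);

        printf("main: switching out\n");
        swapcontext(&main_ctx, &thread_ctx);   /* 1. save registers; 2. load registers */
        printf("main: switched back in\n");
        return 0;
    }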
Thread states and transitions

[Figure: state diagram with states ready, running, blocked, and exited. A running thread may EXIT (→ exited), be STOPped or preempted (→ ready), or sleep when it waits — on wait, read, write, listen, receive, etc. (→ blocked); wakeup moves a blocked thread back to ready.]

The kernel process/thread scheduler governs these transitions. Sleep and wakeup are internal primitives. Wakeup adds a thread to the scheduler's ready pool: a set of threads in the ready state.
CPU Scheduling 101

The OS scheduler makes a sequence of "moves".
• Next move: if a CPU core is idle, pick a ready thread t from the ready pool and dispatch it (run it).
• The scheduler's choice is "nondeterministic".
• The scheduler's choice determines the interleaving of execution.

[Figure: Wakeup moves blocked threads into the ready pool; the scheduler calls GetNextToRun and SWITCH() to dispatch; a running thread leaves the CPU if its timer expires or it waits/yields/terminates.]
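To make the scheduler's "moves" concrete, here is a compilable C sketch of a ready pool with the Wakeup and GetNextToRun operations named in the slide. All types here are hypothetical, and a real dispatch would end with a context switch (the SWITCH() in the figure).

    #include <stddef.h>

    /* Hypothetical scheduler state: a sketch, not real kernel code. */
    enum state { READY, RUNNING, BLOCKED, EXITED };

    struct thread {
        enum state state;
        struct thread *next;            /* link in the ready pool */
    };

    static struct thread *ready_pool;   /* the set of READY threads */

    /* Wakeup: move a blocked thread into the ready pool. */
    void Wakeup(struct thread *t) {
        t->state = READY;
        t->next = ready_pool;
        ready_pool = t;
    }

    /* GetNextToRun: pick some ready thread; which one is the scheduler's
       (nondeterministic) choice. Here: simply the head of the list. */
    struct thread *GetNextToRun(void) {
        struct thread *t = ready_pool;
        if (t != NULL) {
            ready_pool = t->next;
            t->state = RUNNING;         /* a real kernel would now SWITCH() to t */
        }
        return t;
    }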
Event-driven programming
• Some of the goals of threads can be met by using an event-driven programming model.
• An event-driven program executes a sequence of events. The program consists of a set of handlers for those events.
  • e.g., Unix signals (see the sketch below)
• The program executes sequentially (no concurrency), but the interleaving of handler executions is determined by the event order.
• Pure event-driven programming can simplify management of inherently concurrent activities.
  • E.g., I/O, user interaction, children, client requests
• Some of these needs can be met using either threads or event-driven programming. But often we need both.
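Here is a tiny pure event-driven program using Unix signals: the only "events" are SIGINT deliveries, and handle_sigint is the handler registered for them. The structure (register handlers, then wait for events in a loop) is the point; the program itself is a minimal sketch.

    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static volatile sig_atomic_t nevents = 0;

    static void handle_sigint(int sig) {
        (void)sig;
        nevents++;                 /* handlers should do minimal, async-safe work */
    }

    int main(void) {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = handle_sigint;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGINT, &sa, NULL);   /* register the handler for SIGINT events */

        for (;;) {
            pause();               /* block until the next event (signal) arrives */
            printf("handled %d event(s) so far\n", (int)nevents);
        }
    }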
Event-driven programming vs. threads
• Often we can choose between event-driven and threaded structures.
• So it has been common for academics and developers to argue the relative merits of "event-driven programming vs. threads".
• But they are not mutually exclusive.
• Anyway, we need both: to get real parallelism on real systems (e.g., multicore), we need some kind of threads underneath anyway.
• We often use event-driven programming built above threads and/or combined with threads in a hybrid model.
  • For example, each thread may be event-driven, or multiple threads may rendezvous on a shared event queue.
• We illustrate the continuum by looking first at Android and then at concurrency management in servers (e.g., the Apache Web server).
Android app: main event loop
• The main thread of an Android app is called the Activity Thread.
• It receives a sequence of events and invokes their handlers.
• It is also called the "UI thread" because it receives all User Interface events: screen taps, clicks, swipes, etc.
• All UI calls must be made by the UI thread: the UI lib is not thread-safe.
• MS-Windows apps are similar.
• The UI thread must not block! If it blocks, then the app becomes unresponsive to user input: bad.
Android event loop: a closer look
• The main thread delivers UI events and intents to Activity components.
• It also delivers events (broadcast intents) to Receiver components.
• Handlers defined for these components must not block.
• The handlers execute serially, in event arrival order.
• Note: Service and ContentProvider components receive invocations from other apps (i.e., they are servers). These invocations run on different threads… more on that later.

[Figure: the main event loop dispatches UI clicks and intents to Activity and Receiver components by invoking component-defined handlers.]
Event-driven programming
• This "design pattern" is called event-driven (event-based) programming.
• In its pure form the thread never blocks, except to wait for the next event, whatever it is.
• We can think of the program as a set of handlers: the system upcalls a handler to dispatch each event.
• Note: here we are using the term "event" to refer to any notification:
  • arriving input
  • asynchronous I/O completion
  • subscribed events
  • child stop/exit, "signals", etc.

[Figure: a stream of events dispatched by invoking handlers (upcalls).]
Android event classes: some details
• Android defines a set of classes (Looper, Message, MessageQueue, Handler) for event-driven programming in conjunction with threads.
• A thread may have at most one Looper, bound to a MessageQueue.
• Each Looper has exactly one thread and exactly one MessageQueue.
• The Looper has an interface to register Handlers.
• There may be any number of Handlers registered per Looper.
• These classes are used for the UI thread, but have other uses as well.

[These Android details are provided for completeness.]
Android: adding services (simplified)

[Figure: the main/UI thread runs the main event loop, dispatching UI clicks and intents to Activity and Receiver components; a separate binder thread pool handles incoming binder messages for Service and Provider components.]
Pool of event-driven threads
• Android Binder receives a sequence of events (intents) in each process.
• They include incoming intents on provider and service components.
• Handlers for these intents may block. Therefore the app lib uses a pool of threads to invoke the Handlers for these incoming events.
• Many Android apps don't have these kinds of components: those apps can use a simple event-driven programming model and don't need to know about threads at all.
• But apps having these component types use a different design pattern: pool of event-driven threads.
• This pattern is also common in multi-threaded servers, which poll socket descriptors listening for new requests. Let's take a look.
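Here is a C sketch of the "pool of event-driven threads" pattern: N worker threads block on a shared event queue, and each dequeued event's handler may itself block while the other workers keep serving. The queue and event types are hypothetical (not Android's classes); events are assumed to be heap-allocated by the poster.

    #include <pthread.h>
    #include <stdlib.h>

    struct event { struct event *next; void (*handler)(struct event *); };

    static struct event *queue_head;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t nonempty = PTHREAD_COND_INITIALIZER;

    static void *worker(void *arg) {
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&lock);
            while (queue_head == NULL)
                pthread_cond_wait(&nonempty, &lock);  /* idle until an event posts */
            struct event *e = queue_head;
            queue_head = e->next;
            pthread_mutex_unlock(&lock);
            e->handler(e);   /* handler may block: other workers keep serving */
            free(e);
        }
        return NULL;
    }

    void post_event(struct event *e) {
        pthread_mutex_lock(&lock);
        e->next = queue_head;
        queue_head = e;
        pthread_cond_signal(&nonempty);   /* wake one idle worker */
        pthread_mutex_unlock(&lock);
    }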
Multi-threaded RPC server [OpenGroup, late 1980s]
Ideal event poll API: Poll()
• Delivers: returns exactly one event (message or notification), in its entirety, ready for service (dispatch).
• Idles: blocks iff there is no event ready for dispatch.
• Consumes: returns each posted event at most once.
• Combines: any of many kinds of events (a poll set) may be returned through a single call to poll.
• Synchronizes: may be shared by multiple processes or threads (handlers are thread-safe as well).
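No standard Unix call has this exact contract. As a point of reference, the five properties might translate into a hypothetical C signature like this (struct poll_set and struct event are invented names for illustration only):

    struct poll_set;    /* a registered set of event sources of many kinds */
    struct event;       /* one complete event, ready for dispatch */

    /* Blocks iff no event is ready; returns each posted event at most once,
       in its entirety; safe to call concurrently from many threads. */
    struct event *Poll(struct poll_set *set);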
A look ahead
• Various systems use various combinations of threaded/blocking and event-driven models.
• Unix made some choices, and then more choices.
• These choices failed for networked servers, which require effective concurrent handling of requests.
• They failed because they violate each of the five properties for "ideal" event handling.
• There is a large body of work addressing the resulting problems. Servers mostly work now.
• More about server performance and Unix/Linux later.
• The Android Binder model is closer to the ideal.
Classic Unix
• Single-threaded processes.
• Blocking system calls.
  • Synchronous I/O: the calling process blocks until each I/O request is "complete".
• Each blocking call waits for only a single kind of event on a single object:
  • a process or a file descriptor (e.g., file or socket).
• Add signals when that model does not work.
• With sockets: add the select system call to monitor I/O on sets of sockets or other file descriptors.
• select was slow for large poll sets. Now we have various variants: poll, epoll, pollet, kqueue. None are ideal.
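As an example of one of those variants, here is a minimal Linux epoll loop; listen_fd is assumed to be an already-listening socket set up elsewhere. Note how even epoll falls short of the "ideal" Poll(): it reports that a descriptor is ready, not that a complete event has arrived.

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/epoll.h>

    void event_loop(int listen_fd) {
        int ep = epoll_create1(0);
        if (ep < 0) { perror("epoll_create1"); exit(1); }

        struct epoll_event ev = { .events = EPOLLIN, .data.fd = listen_fd };
        epoll_ctl(ep, EPOLL_CTL_ADD, listen_fd, &ev);   /* add fd to the poll set */

        for (;;) {
            struct epoll_event ready[64];
            int n = epoll_wait(ep, ready, 64, -1);  /* block until something is ready */
            for (int i = 0; i < n; i++) {
                /* Readiness only: the fd is readable, but the event is not
                   delivered "in its entirety" as the ideal API would require. */
                printf("fd %d is ready\n", ready[i].data.fd);
            }
        }
    }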
Inside your Web server

Server operations (Apache, Tomcat/Java, etc.):
• create socket(s)
• bind to port number(s)
• listen to advertise the port
• wait for a client to arrive on the port (select/poll/epoll of ports)
• accept the client connection
• read or recv the request
• write or send the response
• close the client socket

[Figure: the server application sits above kernel queues: the listen queue and accept queue for connections, packet queues for the network, and a disk queue for file I/O.]
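The first three operations might look like this in C. Port 8080 and the backlog of 128 are arbitrary illustrative choices, and error handling is omitted for brevity.

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>

    int make_listener(void) {
        int sock = socket(AF_INET, SOCK_STREAM, 0);   /* create the socket */

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof addr);
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(8080);                  /* bind to a port number */

        bind(sock, (struct sockaddr *)&addr, sizeof addr);
        listen(sock, 128);                            /* advertise; 128 = listen queue length */
        return sock;     /* pass this to the accept loop on the next slide */
    }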
Accept loop

while (1) {
    int acceptsock = accept(sock, NULL, NULL);   /* block until a client arrives */
    char *input = malloc(1024);
    int n = recv(acceptsock, input, 1023, 0);    /* leave room for a terminator */
    if (n < 0)
        n = 0;
    input[n] = '\0';                             /* terminate before parsing */
    int is_html = 0;
    char *contents = handle(input, &is_html);    /* parse and handle the request */
    free(input);
    …send response…
    close(acceptsock);
    free(contents);                              /* assuming handle() allocates it */
}

If a server is listening on only one port/socket ("listener"), then it can skip the select/poll/epoll.
Handling a request

[Figure: pipeline of steps — Accept Client Connection, Read HTTP Request Header, Find File, Send HTTP Response Header, Read File, Send Data. Accepting the connection and reading the request may block waiting on the network; finding and reading the file may block waiting on disk I/O.]

We want to be able to process requests concurrently.
Web server (serial process)
• Option 1: could handle requests serially.
• Easy to program, but painfully slow (why?)

[Timeline: R1 arrives; the server (WS) receives R1 and issues disk request 1a; R2 arrives but must wait; 1a completes; R1 completes; only then does the server receive R2.]
Web server (event-driven)
• Option 2: use asynchronous I/O.
• Fast, but hard to program (why?)

[Timeline: R1 arrives; the server (WS) receives R1 and starts disk request 1a asynchronously; R2 arrives and the server receives R2 while 1a is still in progress; 1a completes; R1 completes.]
Web server (multi-process)
• Option 3: assign one thread per request.
• Where is each request's state stored?

[Timeline: R1 arrives and worker WS1 receives it, issuing disk request 1a; R2 arrives and worker WS2 receives it concurrently; 1a completes; R1 completes.]
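A sketch of Option 3 with POSIX threads: each accepted connection gets its own thread, so each request's state lives in the local variables on that thread's stack. handle_request is a hypothetical handler assumed to be defined elsewhere.

    #include <pthread.h>
    #include <unistd.h>

    extern void handle_request(int sock);   /* may block on disk or network */

    static void *serve_one(void *arg) {
        int sock = (int)(long)arg;          /* request state lives on this stack */
        handle_request(sock);
        close(sock);
        return NULL;
    }

    void spawn_for_request(int acceptsock) {
        pthread_t t;
        pthread_create(&t, NULL, serve_one, (void *)(long)acceptsock);
        pthread_detach(t);                  /* no one joins request threads */
    }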
Concurrency and pipelining

[Figure: before — the CPU, disk, and network are used one at a time, each idle while the others work; after — requests are pipelined so the CPU, disk, and network stages overlap.]