Combining Events and Threads for Scalable Network Services
Peng Li and Steve Zdancewic, University of Pennsylvania
PLDI 2007, San Diego
Overview
Haskell: a lazy, purely functional programming language (http://www.haskell.org)
• A Haskell framework for massively concurrent network applications
  - Servers, P2P systems, load generators
• Massive concurrency ::= 1,000 threads? (easy)
    | 10,000 threads? (common)
    | 100,000 threads? (challenging)
    | 1,000,000 threads? (20 years later?)
    | 10,000,000 threads? (in 15 minutes)
• How to write such programs?
  - The very first decision to make: the programming model
  - Shall we use threads or events?
Threads vs. Events

The multithreaded model
• One thread ↔ one client
• Synchronous I/O
• Scheduling: OS/runtime libs

    int send_data(int fd1, int fd2) {
      while (!EOF(fd1)) {
        size = read_chunk(fd1, buf, count);   /* read from the source descriptor */
        write_chunk(fd2, buf, size);          /* write to the destination descriptor */
      }
      …

The event-driven model
• One thread ↔ 10000 clients
• Asynchronous I/O
• Scheduling: programmer

    while (1) {
      nfds = epoll_wait(kdpfd, events, MAXEVT, -1);
      for (n = 0; n < nfds; ++n)
        handle_event(events[n]);
      …

“Why events are a bad idea (for high-concurrency servers)” [HotOS 2003]
“Why threads are a bad idea (for most purposes)” [USENIX ATC 1999]
Can we get the best of both worlds?
One application program:
• Programming with each client: threads
  - Synchronous I/O
  - Intuitive control-flow primitives
• Resource scheduling: events
  - Written as part of the application
  - Tailored to the application’s needs
The bridge between threads and events? Some kind of “continuation” support.
Roads to lightweight, application-level concurrency
• Direct language support for continuations:
  - Good if you have them
• Source-to-source CPS translations
  - Requires hacking on compiler/runtime
  - Often not very elegant
• Other solutions?
  - (no language support)
  - (no compiler/runtime hacks)
The poor man’s concurrency monad
• “A poor man’s concurrency monad” by Koen Claessen, JFP 1999 (Functional Pearl)
• The thread interface:
  - The CPS monad
• The event interface:
  - A lazy, tree-like data structure called “trace”
  - Example trace node: SYS_NBIO(write_nb)
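The two interfaces above can be sketched in a few lines of Haskell. This is an illustrative sketch, not the authors' code: only SYS_NBIO appears on the slide, so the other constructors (SYS_FORK, SYS_YIELD, SYS_RET) and the helpers sysNbio, sysYield, sysFork, buildTrace, outline, and example are assumed names.

```haskell
-- Event interface: a lazy tree of system calls. Each node names a request
-- and holds the rest of the thread as an unevaluated sub-trace.
data Trace
  = SYS_NBIO (IO Trace)   -- run a nonblocking I/O action, then continue
  | SYS_FORK Trace Trace  -- child trace, parent trace (assumed constructor)
  | SYS_YIELD Trace       -- let other threads run (assumed constructor)
  | SYS_RET               -- thread finished (assumed constructor)

-- Thread interface: a CPS computation that, given its continuation,
-- produces a trace.
newtype M a = M { runM :: (a -> Trace) -> Trace }

instance Functor M where
  fmap f (M g) = M (\k -> g (k . f))

instance Applicative M where
  pure x = M (\k -> k x)
  M f <*> M g = M (\k -> f (\h -> g (k . h)))

instance Monad M where
  M g >>= f = M (\k -> g (\x -> runM (f x) k))

-- "System calls": each one emits a trace node and passes control on.
sysNbio :: IO a -> M a
sysNbio io = M (\k -> SYS_NBIO (fmap k io))

sysYield :: M ()
sysYield = M (\k -> SYS_YIELD (k ()))

sysFork :: M () -> M ()
sysFork child = M (\k -> SYS_FORK (buildTrace child) (k ()))

-- Close a thread with a final continuation to obtain its trace.
buildTrace :: M a -> Trace
buildTrace (M g) = g (const SYS_RET)

-- Name up to n nodes along the parent path. It stops at SYS_NBIO because
-- the rest of the trace only exists after the I/O action has run.
outline :: Int -> Trace -> [String]
outline 0 _ = []
outline n t = case t of
  SYS_NBIO _      -> ["NBIO"]
  SYS_FORK _ par  -> "FORK" : outline (n - 1) par
  SYS_YIELD next  -> "YIELD" : outline (n - 1) next
  SYS_RET         -> ["RET"]

-- Thread code is written in ordinary do-notation.
example :: M ()
example = do
  sysFork sysYield
  sysYield
  sysYield
```

Loading this in GHCi and evaluating `outline 10 (buildTrace example)` gives `["FORK","YIELD","YIELD","RET"]`: the do-block reads like threaded code, while a scheduler only ever sees the trace.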
Questions on the poor man’s approach
Does it work for high-performance network services (using a pure, lazy, functional language)?
• How does the design scale up to real systems?
  - Symmetric multiprocessing? Synchronization? I/O?
• How cheap is it?
  - How much does a poor man’s thread cost?
• How poor is it?
  - Does it offer acceptable performance?
Our experiment
A high-performance Haskell framework for massively concurrent network services!
• Supported features:
  - Linux Asynchronous I/O (AIO)
  - epoll() and nonblocking I/O
  - OS thread pools
  - SMP support
  - Thread synchronization primitives
• Applications developed:
  - I/O benchmarks on FIFO pipes / disk head scheduling
  - A simple web server for static files
  - HTTP load generator
  - Prototype of an application-level TCP stack
We used the Glasgow Haskell Compiler (GHC).
Multithreaded code example
(Code listing not preserved; annotations from the slide:)
• Nested function calls
• Exception handling
• Conditional branches
• Synchronous call to I/O lib
• Recursion
Event-driven code example
(Code listing not preserved; annotations from the slide:)
• A wrapper function to the C library call using the Haskell Foreign Function Interface (FFI)
• An event loop running in a separate OS thread
• Put events in queues for processing in other OS threads
A complete event-driven I/O subsystem
• One “virtual processor” event loop for each CPU
• Haskell Foreign Function Interface (FFI)
• Each event loop runs in a separate OS thread
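To show how an event loop can drive monadic threads, here is a minimal single-loop sketch in Haskell. It is an assumption-laden illustration, not the framework's implementation: it uses one loop and a plain list as the ready queue, has no epoll, AIO, FFI, or OS threads, and all names other than SYS_NBIO (eventLoop, worker, demo, and the remaining constructors and helpers) are hypothetical.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Trace and CPS monad, as in the poor man's concurrency monad.
data Trace
  = SYS_NBIO (IO Trace)   -- run an I/O action, then continue
  | SYS_FORK Trace Trace  -- child trace, parent trace
  | SYS_YIELD Trace       -- give up the processor
  | SYS_RET               -- thread finished

newtype M a = M { runM :: (a -> Trace) -> Trace }

instance Functor M where
  fmap f (M g) = M (\k -> g (k . f))

instance Applicative M where
  pure x = M (\k -> k x)
  M f <*> M g = M (\k -> f (\h -> g (k . h)))

instance Monad M where
  M g >>= f = M (\k -> g (\x -> runM (f x) k))

sysNbio :: IO a -> M a
sysNbio io = M (\k -> SYS_NBIO (fmap k io))

sysYield :: M ()
sysYield = M (\k -> SYS_YIELD (k ()))

sysFork :: M () -> M ()
sysFork child = M (\k -> SYS_FORK (buildTrace child) (k ()))

buildTrace :: M a -> Trace
buildTrace (M g) = g (const SYS_RET)

-- The event loop: take the first ready trace, interpret its head node,
-- and put whatever remains at the back of the queue (round-robin).
eventLoop :: [Trace] -> IO ()
eventLoop [] = pure ()
eventLoop (t : ready) = case t of
  SYS_NBIO io        -> io >>= \next -> eventLoop (ready ++ [next])
  SYS_FORK child par -> eventLoop (ready ++ [par, child])
  SYS_YIELD next     -> eventLoop (ready ++ [next])
  SYS_RET            -> eventLoop ready

-- A thread written in direct style: log n entries, yielding after each.
worker :: IORef [String] -> String -> Int -> M ()
worker logRef name n = go 1
  where
    go i
      | i > n = pure ()
      | otherwise = do
          sysNbio (modifyIORef' logRef ((name ++ show i) :))
          sysYield
          go (i + 1)

-- Fork worker "a", continue as worker "b", and return the log in order.
demo :: IO [String]
demo = do
  logRef <- newIORef []
  eventLoop [buildTrace (sysFork (worker logRef "a" 2) >> worker logRef "b" 2)]
  reverse <$> readIORef logRef
```

Evaluating `demo` in GHCi returns `["b1","a1","b2","a2"]`: the parent is queued ahead of the forked child, and the two workers alternate at each sysYield. The scheduling policy lives entirely in eventLoop, so an application could swap in a different queue discipline without touching the thread code.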
Modular and customizable I/O system (add a TCP stack if you like)
• Define / interpret TCP syscalls (22 lines)
• Event loop for incoming packets (7 lines)
• Event loop for timers (9 lines)
How cheap is a poor man’s thread?
• Minimal memory consumption: 48 bytes
  - Each thread just loops and does nothing
• Actual size determined by thread-local states
  - Even an Ethernet packet can be >1,000 bytes…
• Pay as you go: only pay for things needed
In contrast:
• A Linux POSIX thread’s stack is 2MB by default
• The state-of-the-art user-level thread system (Capriccio) uses at least a few KB for each thread
Observation: The poor man’s thread is extremely memory-efficient (challenging most event-driven systems)
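For a sense of scale, the per-thread figures above can be multiplied out for the 10,000,000-thread target from the overview. The Haskell sketch below is plain arithmetic under stated assumptions: it treats 2MB as 2 × 1024 × 1024 bytes, uses the minimal 48-byte figure (no thread-local state), and all names are made up for illustration.

```haskell
-- Back-of-the-envelope memory totals; per-thread sizes come from the slide.
threads :: Integer
threads = 10000000            -- 10,000,000 threads

monadicBytes :: Integer
monadicBytes = threads * 48   -- minimal monadic thread: 48 bytes each

nptlDefaultBytes :: Integer
nptlDefaultBytes = threads * 2 * 1024 * 1024  -- assumes 2MB = 2 MiB of stack each

-- Convert bytes to whole mebibytes (rounding down).
toMiB :: Integer -> Integer
toMiB b = b `div` (1024 * 1024)
```

Here `toMiB monadicBytes` is 457 (about 480 million bytes in total), while `toMiB nptlDefaultBytes` is 20,000,000, roughly 19 TiB of stack space if every thread reserved the default.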
I/O scalability test
• Comparison against the Linux Native POSIX Thread Library (NPTL)
  - Highly optimized OS thread implementation
  - Each NPTL thread’s stack limited to 32KB
• Mini-benchmarks used:
  - Disk head scheduling (all threads running)
  - FIFO pipe scalability with idle threads (128 threads running)
How poor is the poor man’s monad?
• Not too shabby
  - Benchmarks show performance comparable to (if not higher than) existing, optimized systems
• An elegant design is more important than a 10% performance improvement
• Added benefit: type safety for many dangerous things
  - Continuations, thread queues, schedulers, asynchronous I/O
Related Work
• We are motivated by two projects:
  - Twisted: the Python event-driven framework for scalable internet applications
    (the programmer must write code in CPS)
  - Capriccio: a high-performance user-level thread system for network servers
    (requires C compiler hacks; difficult to customize, e.g. adding SMP support)
• Continuation-based concurrency
  - [Wand 80], [Shivers 97], …
• Other languages and programming models:
  - CML, Erlang, …
Conclusion
• Haskell and the poor man’s concurrency monad are a promising solution for high-performance, massively concurrent networking applications: get the best of both threads and events!
• This poor man’s approach is actually very cheap, and not so poor!
http://www.cis.upenn.edu/~lipeng/homepage/unify.html