Computer Systems Principles: Concurrency Patterns
Emery Berger and Mark Corner
University of Massachusetts Amherst
Web Server
• Client (browser)
  • Requests HTML, images
• Server
  • Caches requests
  • Sends to client
[Figure: client and web server; request for http://server/Easter-bunny/200x100/75.jpg; cache label "not found"]
Possible Implementation
    while (true) {
      wait for connection;                // net
      read from socket & parse URL;       // cpu
      look up URL contents in cache;      // cpu
      if (!in cache) {
        fetch from disk / execute CGI;    // disk
        put in cache;                     // cpu
      }
      send data to client;                // net
    }
Problem: Concurrency
• Sequential is fine until:
  • More clients
  • Bigger server
    • Multicores, multiprocessors
• Goals:
  • Hide latency of I/O
    • Don’t keep clients waiting
  • Improve throughput
    • Serve up more pages
[Figure: web server with multiple clients]
Building Concurrent Apps
• Patterns / Architectures
  • Thread pools
  • Producer-consumer
  • “Bag of tasks”
  • Worker threads (work stealing)
• Goals:
  • Minimize latency
  • Maximize parallelism
  • Keep programs simple to write and maintain
Thread Pools
• Thread creation is relatively expensive
• Instead: use a pool of threads
  • When a new task arrives, get a thread from the pool to work on it; block if the pool is empty
• Faster with many tasks
• Limits max threads (and thus resources)
• (ThreadPoolExecutor class in Java)
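The thread-pool idea above can be sketched with the standard java.util.concurrent executors. This is a minimal illustration, not the course's implementation: the class name ThreadPoolSketch and the handle method (which just returns the URL length as a stand-in for real request handling) are hypothetical. Note one difference from the slide: with Executors.newFixedThreadPool, tasks submitted while all threads are busy wait in the pool's internal queue rather than blocking the submitter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ThreadPoolSketch {
    // Hypothetical stand-in for handling one request: returns the URL length.
    static int handle(String url) {
        return url.length();
    }

    // Runs every URL on a fixed pool of poolSize threads and sums the results.
    static int serveAll(List<String> urls, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (String url : urls) {
                Callable<Integer> task = () -> handle(url);
                pending.add(pool.submit(task)); // reuses a pooled thread
            }
            int total = 0;
            for (Future<Integer> f : pending) {
                total += f.get(); // waits for that task to finish
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown(); // no new tasks; lets pool threads exit
        }
    }

    public static void main(String[] args) {
        System.out.println(serveAll(List.of("/a.html", "/img/b.jpg"), 4));
    }
}
```

The pool size caps the number of threads regardless of how many requests arrive, which is the resource limit the slide refers to.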
Producer-Consumer
• Can get pipeline parallelism:
  • One thread (producer) does work
    • E.g., I/O
  • and hands it off to other thread (consumer)
• LinkedBlockingQueue: put() blocks if a bounded queue is full; take() blocks if the queue is empty
  • poll() with no timeout returns null immediately when the queue is empty, so the queue.poll() calls in the pseudocode on these slides stand for a blocking take
[Figure: producer → queue → consumer]
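The blocking behaviour of the queue matters for the pseudocode on the following slides, so here is a small sketch of what each LinkedBlockingQueue call does on an empty or full queue. The class name QueueSemanticsSketch and the capacity of 1 are choices made for this illustration only.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

final class QueueSemanticsSketch {
    // Records what each call returns on a queue with capacity 1.
    static String demo() {
        LinkedBlockingQueue<String> q = new LinkedBlockingQueue<>(1);
        StringBuilder log = new StringBuilder();
        try {
            log.append(q.poll()).append(',');      // empty: returns null at once
            q.put("a");                            // room available: does not block
            log.append(q.offer("b")).append(',');  // full: offer() refuses instead of blocking
            log.append(q.take()).append(',');      // non-empty: returns "a"
            // Empty again: timed poll waits briefly, then gives up with null.
            log.append(q.poll(10, TimeUnit.MILLISECONDS));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

A consumer that should sleep until work arrives would therefore call take() (or a timed poll), while put() is the call that makes a producer wait for space in a bounded queue.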
Producer-Consumer Web Server
• Use 2 threads: producer & consumer
• queue.put(x) and x = queue.poll();

Original:
    while (true) {
      wait for connection;
      read from socket & parse URL;
      look up URL contents in cache;
      if (!in cache) {
        fetch from disk / execute CGI;
        put in cache;
      }
      send data to client;
    }

Producer:
    while (true) {
      do something…
      queue.put (x);
    }

Consumer:
    while (true) {
      x = queue.poll();
      do something…
    }
Producer-Consumer Web Server
• Pair of threads: one reads, one writes

Producer:
    while (true) {
      wait for connection;
      read from socket & parse URL;
      queue.put (URL);
    }

Consumer:
    while (true) {
      URL = queue.poll();
      look up URL contents in cache;
      if (!in cache) {
        fetch from disk / execute CGI;
        put in cache;
      }
      send data to client;
    }
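A minimal Java sketch of the two-thread version above, under stated assumptions: the list of URLs stands in for incoming connections, "served ..." strings stand in for sending data to a client, and the DONE sentinel is a hypothetical way to end the otherwise infinite loops so the example terminates. The consumer uses take(), which blocks until the producer has handed something over.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class ProducerConsumerSketch {
    private static final String DONE = "<done>"; // hypothetical end-of-input marker

    static List<String> run(List<String> urls) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(2); // small bound
        List<String> served = new ArrayList<>(); // touched only by the consumer

        Thread producer = new Thread(() -> {
            try {
                for (String url : urls) {
                    queue.put(url); // blocks while the queue is full
                }
                queue.put(DONE);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                // take() blocks while the queue is empty.
                for (String url = queue.take(); !url.equals(DONE); url = queue.take()) {
                    served.add("served " + url);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join(); // after join, the consumer's writes are visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return served;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("/a", "/b", "/c")));
    }
}
```

With one producer and one consumer on a FIFO queue, requests are served in arrival order, which is the "order of operations matters" case this pattern suits.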
Producer-Consumer Web Server
• More parallelism: optimizes the common case (cache hit)

Thread 1 (reader):
    while (true) {
      wait for connection;
      read from socket & parse URL;
      queue1.put (URL);
    }

Thread 2 (cache lookup):
    while (true) {
      URL = queue1.poll();
      look up URL contents in cache;
      if (!in cache) {
        queue2.put (URL);
        continue;   // hand the miss to thread 3; keep serving other requests
      }
      send data to client;
    }

Thread 3 (cache miss):
    while (true) {
      URL = queue2.poll();
      fetch from disk / execute CGI;
      put in cache;
      send data to client;
    }
When to Use Producer-Consumer
• Works well for pairs of threads
  • Best if producer & consumer are symmetric
    • Proceed roughly at same rate
  • Order of operations matters
• Not as good for
  • Many threads
  • Order doesn’t matter
  • Different rates of progress
Producer-Consumer Web Server
• Should balance load across threads

Thread 1 (reader):
    while (true) {
      wait for connection;
      read from socket & parse URL;
      queue1.put (URL);
    }

Thread 2 (cache lookup):
    while (true) {
      URL = queue1.poll();
      look up URL contents in cache;
      if (!in cache) {
        queue2.put (URL);
        continue;   // a miss is answered by thread 3, not here
      }
      send data to client;
    }

Thread 3 (cache miss):
    while (true) {
      URL = queue2.poll();
      fetch from disk / execute CGI;
      put in cache;
      send data to client;
    }
Bag of Tasks
• Collection of mostly independent tasks
• Bag could also be LinkedBlockingQueue (put, poll)
[Figure: addWork feeding a shared bag; four workers drawing tasks from it]
Exercise: Restructure into BOT
• Re-structure this into bag of tasks:
  • addWork & worker threads
  • t = bag.poll() or bag.put(t)

    while (true) {
      wait for connection;
      read from socket & parse URL;
      look up URL contents in cache;
      if (!in cache) {
        fetch from disk / execute CGI;
        put in cache;
      }
      send data to client;
    }
Exercise: Restructure into BOT
• Re-structure this into bag of tasks:
  • addWork & worker
  • t = bag.poll() or bag.put(t)

addWork:
    while (true) {
      wait for connection;
      read from socket & parse URL;
      t.URL = URL;
      t.sock = socket;
      bag.put (t);
    }

Worker:
    while (true) {
      t = bag.poll();
      look up t.URL contents in cache;
      if (!in cache) {
        fetch from disk / execute CGI;
        put in cache;
      }
      send data to client via t.sock;
    }
Bag of Tasks Web Server
• Re-structure this into bag of tasks:
  • t = bag.poll() or bag.put(t)

addWork:
    while (true) {
      wait for connection;
      read from socket & parse URL;
      bag.put (URL);
    }

worker:
    while (true) {
      URL = bag.poll();
      look up URL contents in cache;
      if (!in cache) {
        fetch from disk / execute CGI;
        put in cache;
      }
      send data to client;
    }
[Figure: addWork feeding the bag; several workers running the worker loop]
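The addWork/worker split above might look like this in Java. It is a sketch under assumptions, not the course's solution: URLs in a list stand in for connections, the STOP marker (one per worker) is a hypothetical shutdown mechanism added so the example terminates, and results are sorted only because several workers finish in no particular order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class BagOfTasksSketch {
    private static final String STOP = "<stop>"; // hypothetical shutdown marker

    static List<String> run(List<String> urls, int workers) {
        BlockingQueue<String> bag = new LinkedBlockingQueue<>();
        Queue<String> served = new ConcurrentLinkedQueue<>(); // shared by all workers
        List<Thread> threads = new ArrayList<>();

        for (int i = 0; i < workers; i++) {
            Thread t = new Thread(() -> {
                try {
                    // Every worker runs the same loop: take any task from the bag.
                    for (String url = bag.take(); !url.equals(STOP); url = bag.take()) {
                        served.add("served " + url);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            threads.add(t);
        }

        try {
            for (String url : urls) {
                bag.put(url);  // the addWork role
            }
            for (int i = 0; i < workers; i++) {
                bag.put(STOP); // each worker consumes exactly one marker
            }
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        List<String> result = new ArrayList<>(served);
        Collections.sort(result); // completion order is nondeterministic
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("/c", "/a", "/b"), 3));
    }
}
```

All workers share one queue here, which keeps the code simple but means every put and take goes through the same structure.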
Bag of Tasks vs. Prod/Consumer
• Exploits more parallelism
• Even with coarse-grained threads
  • Don’t have to break up tasks too finely
• What does task size affect?
  • Possibly latency… smaller might be better
• Easy to change or add new functionality
• But: one major performance problem…
What’s the Problem?
• Contention: single lock on structure
  • Bottleneck to scalability
[Figure: addWork and four workers all accessing the single shared bag]
Work Queues
• Each thread has its own work queue (deque)
  • No single point of contention
• Threads are now generic “executors”
  • Tasks (balls): blue = parse, yellow = connect…
• Now what?
[Figure: four executors, each with its own deque of tasks]
Work Stealing
• When a thread runs out of work, steal work from the top of a random other thread’s deque
• Provably efficient load-balancing algorithm (expected running time within a constant factor of optimal for fully strict computations)
[Figure: four workers, each with its own deque; an idle worker stealing from another]
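A deliberately simple sketch of the stealing rule, assuming one ConcurrentLinkedDeque per worker: the owner takes work from one end (pollLast) and an idle worker takes from the other end (pollFirst, treated here as the "top") of a randomly chosen deque. All names are hypothetical, every task starts on worker 0's deque so the other workers can only make progress by stealing, and idle workers simply yield and retry, which a production scheduler would not do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;

final class WorkStealingSketch {
    // Runs `tasks` trivial tasks on `workers` threads; returns how many ran.
    static int run(int tasks, int workers) {
        List<ConcurrentLinkedDeque<Runnable>> deques = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            deques.add(new ConcurrentLinkedDeque<>());
        }
        AtomicInteger remaining = new AtomicInteger(tasks);
        AtomicInteger done = new AtomicInteger();

        // Unbalanced start: everything sits on worker 0's deque.
        for (int i = 0; i < tasks; i++) {
            deques.get(0).addLast(() -> { done.incrementAndGet(); });
        }

        List<Thread> threads = new ArrayList<>();
        for (int id = 0; id < workers; id++) {
            final int me = id;
            Thread t = new Thread(() -> {
                while (remaining.get() > 0) {
                    Runnable task = deques.get(me).pollLast(); // own work first
                    if (task == null) {
                        // Out of work: try to steal from a random deque.
                        int victim = ThreadLocalRandom.current().nextInt(workers);
                        task = deques.get(victim).pollFirst();
                    }
                    if (task != null) {
                        task.run();
                        remaining.decrementAndGet();
                    } else {
                        Thread.yield(); // nothing found this round; retry
                    }
                }
            });
            t.start();
            threads.add(t);
        }

        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println(run(100, 4));
    }
}
```

In practice Java code would more likely rely on java.util.concurrent.ForkJoinPool, which is a work-stealing pool, than hand-roll the deques as done here.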
Work Stealing Web Server
• Re-structure: readURL, lookUp, addToCache, output
• myQueue.put(new readURL (url))

    while (true) {
      wait for connection;
      read from socket & parse URL;
      look up URL contents in cache;
      if (!in cache) {
        fetch from disk / execute CGI;
        put in cache;
      }
      send data to client;
    }
Task types: readURL, lookUp, addToCache, output

    class Work {
    public:
      virtual void run() = 0;   // each task type supplies its own run()
      virtual ~Work() {}
    };

    class readURL : public Work {
    public:
      void run() {…}
      readURL (socket s) {…}
    };
[Figure: a worker with the task types readURL, lookUp, addToCache, output]
    class readURL : public Work {
    public:
      void run() {
        read from socket, f = get file;
        myQueue.put (new lookUp(_s, f));
      }
      readURL (socket s) { _s = s; }
    private:
      socket _s;
    };

    class lookUp : public Work {
    public:
      void run() {
        look in cache for file _f;
        if (!found)
          myQueue.put (new addToCache(_s, _f));   // pass the socket along for the reply
        else
          myQueue.put (new Output(_s, cont));
      }
      lookUp (socket s, string f) { _s = s; _f = f; }
    private:
      socket _s;
      string _f;
    };

    class addToCache : public Work {
    public:
      void run() {
        fetch file _f from disk into cont;
        add file to cache (hashmap);
        myQueue.put (new Output(_s, cont));
      }
      addToCache (socket s, string f) { _s = s; _f = f; }
    private:
      socket _s;
      string _f;
    };
Work Stealing Web Server
• Re-structure: readURL, lookUp, addToCache, output
• myQueue.put(new readURL (url))

    readURL(url) {
      wait for connection;
      read from socket & parse URL;
      myQueue.put (new lookUp (URL));
    }

    lookUp(url) {
      look up URL contents in cache;
      if (!in cache) {
        myQueue.put (new addToCache (URL));
      } else {
        myQueue.put (new output(contents));
      }
    }

    addToCache(URL) {
      fetch from disk / execute CGI;
      put in cache;
      myQueue.put (new output(contents));
    }
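To show how each task enqueues its successor, here is a single-threaded Java sketch of the chain readURL, lookUp, addToCache, output. It only illustrates the task structure: one ArrayDeque stands in for a worker's myQueue, a HashMap stands in for the cache, the "contents of ..." string is a placeholder for the disk or CGI fetch, and the trace list exists purely so the order of execution can be inspected. None of these names come from the original slides beyond the four task names and myQueue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TaskChainSketch {
    interface Work { void run(); }

    private final Deque<Work> myQueue = new ArrayDeque<>();
    private final Map<String, String> cache = new HashMap<>();
    final List<String> trace = new ArrayList<>(); // records which tasks ran

    Work readURL(String url) {
        return () -> {
            trace.add("readURL");
            myQueue.addLast(lookUp(url));
        };
    }

    Work lookUp(String url) {
        return () -> {
            trace.add("lookUp");
            String contents = cache.get(url);
            // Miss: schedule the fetch. Hit: go straight to output.
            myQueue.addLast(contents == null ? addToCache(url) : output(contents));
        };
    }

    Work addToCache(String url) {
        return () -> {
            trace.add("addToCache");
            String contents = "contents of " + url; // placeholder for disk / CGI
            cache.put(url, contents);
            myQueue.addLast(output(contents));
        };
    }

    Work output(String contents) {
        return () -> trace.add("output"); // placeholder for sending to the client
    }

    // A generic executor loop: run tasks until the queue is empty.
    void drain() {
        Work w;
        while ((w = myQueue.pollFirst()) != null) {
            w.run();
        }
    }

    // Serves the same URL twice: first a cache miss, then a hit.
    static List<String> serveTwice(String url) {
        TaskChainSketch s = new TaskChainSketch();
        s.myQueue.addLast(s.readURL(url));
        s.drain();
        s.myQueue.addLast(s.readURL(url));
        s.drain();
        return s.trace;
    }

    public static void main(String[] args) {
        System.out.println(serveTwice("/a"));
    }
}
```

The executor loop knows nothing about web serving; it just runs whatever Work it finds, which is the separation of thread logic from functionality that the pattern aims for.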
Work Stealing
• Works great for heterogeneous tasks
  • Convert addWork and worker into units of work (different colors)
• Flexible: can easily re-define tasks
  • Coarse, fine-grained, anything in-between
• Automatic load balancing
• Separates thread logic from functionality
• Popular model for structuring servers