Explore the complexities, challenges, and strategies in designing distributed operating systems. Dive into client/server architecture, performance analysis, asynchronous I/O, multiprocessors, and uniprocessors. Understand the coordination, scalability, and security aspects of modern operating systems. Enhance your system research skills and explore real-world applications. Taught by experienced lecturers from MIT, this course offers hands-on experience and in-depth knowledge to tackle the intricacies of distributed systems.
6.894: Distributed Operating System Engineering Lecturers: Frans Kaashoek (kaashoek@mit.edu) Robert Morris (rtm@lcs.mit.edu) TA: Jinyang Li (jinyang@lcs.mit.edu) www.pdos.lcs.mit.edu/6.894
Operating System • Software that turns silicon into something useful • Provides applications with a programming interface • Manages hardware resources on behalf of applications
Distributed Operating System • The holy grail: transparency • Provide applications with a virtual machine consisting of many processors distributed around the network • Distributed OS engineering is difficult: • Failures • High degree of concurrency • Long latencies • New classes of security attacks
Client/Server Architecture • A modular architecture to structure distributed systems • Clients request services from servers • Client and servers communicate with messages • Servers are typically trusted • Other architectures • Peer-to-peer (decentralized) • Single address space
6.894 topics • Client-server components • Remote procedure call, threads, address spaces, etc. • Storage • File systems, transactions • Security • Confidentiality, authentication, etc. • Scalable servers
6.894 is an advanced 6.033 • Perform actual systems research • Perform a research project • Study recent research papers • Design systems for real workloads • New abstractions, protocols, data structures, algorithms, etc. • Build a real system (lab) • Real enough that you can use it
Internet video-on-demand server • Example to study issues and overview 6.894 • Requirements: • Low and high-quality video • Many users, spread around the Internet • Last mile bandwidth may be low • Access control
Client and server structure
    Client() {
        fd = connect("server");
        write(fd, "video.mpg");
        while (!eof(fd)) {
            read(fd, buf);
            display(buf);
        }
    }
    Server() {
        while (1) {
            cfd = accept();
            read(cfd, name);
            fd = open(name);
            while (!eof(fd)) {
                read(fd, block);
                write(cfd, block);
            }
            close(cfd);
            close(fd);
        }
    }
Performance “analysis” • Server capacity: • Network (100 Mbit/s) • Disk (20 Mbyte/s) • Obtained performance: one client stream • Server is limited by software structure • If a video is 200 Kbit/s, server should be able to support more than one client.
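A back-of-the-envelope check of the capacity claim, using only the figures on this slide. It assumes decimal units (1 Mbit = 1000 Kbit, 1 Mbyte = 8 Mbit) and ignores protocol overhead and disk seeks, so the results are upper bounds rather than measurements.

```latex
\[
N_{\text{net}} = \frac{100\ \text{Mbit/s}}{0.2\ \text{Mbit/s}} = 500,
\qquad
N_{\text{disk}} = \frac{20\ \text{Mbyte/s} \times 8\ \text{bit/byte}}{0.2\ \text{Mbit/s}} = 800
\]
```

Under these assumptions the network saturates first, at about 500 streams, so a server that handles one stream at a time uses roughly 0.2% of its network capacity.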
Better single-server performance • Goal: run at server’s hardware speed • Disk or network should be bottleneck • Method: • Pipeline blocks of each request • Multiplex requests from multiple clients • Two implementation approaches: • Multithreaded server • Asynchronous I/O
Multithreaded server
    server() {
        while (1) {
            cfd = accept();
            read(cfd, name);
            fd = open(name);
            while (!eof(fd)) {
                read(fd, block);
                write(cfd, block);
            }
            close(cfd);
            close(fd);
        }
    }
    for (i = 0; i < 10; i++)
        fork(server);
• When waiting for I/O, thread scheduler runs another thread • All shared data must be protected by locks • Release locks when blocking
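The locking rules on this slide can be sketched with POSIX threads. This is a minimal illustration, not the course's code: `run_workers`, `blocks_sent`, and `copy_one_block` are hypothetical names, and `copy_one_block` is an empty stand-in for the blocking `read`/`write` pair. The point is that the lock covers only the shared counter and is never held across the call that would block.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 64

/* Shared state touched by every worker: must be protected by a lock. */
static long blocks_sent = 0;
static pthread_mutex_t stats_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for the blocking read()/write() pair. */
static void copy_one_block(void) { /* would block on disk or network here */ }

static void *worker(void *arg) {
    long nblocks = *(long *)arg;
    for (long i = 0; i < nblocks; i++) {
        copy_one_block();                 /* no lock held while blocking */
        pthread_mutex_lock(&stats_lock);  /* short critical section */
        blocks_sent++;
        pthread_mutex_unlock(&stats_lock);
    }
    return NULL;
}

/* Start nworkers threads, wait for them, and return the shared total
 * (or -1 on error). */
long run_workers(int nworkers, long blocks_per_worker) {
    pthread_t tids[MAX_WORKERS];
    int started = 0;
    if (nworkers < 1 || nworkers > MAX_WORKERS) return -1;
    pthread_mutex_lock(&stats_lock);
    blocks_sent = 0;
    pthread_mutex_unlock(&stats_lock);
    for (; started < nworkers; started++)
        if (pthread_create(&tids[started], NULL, worker, &blocks_per_worker) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return started == nworkers ? blocks_sent : -1;
}
```

Calling `run_workers(10, 1000)` mirrors the ten forked servers above. Without the mutex the increments could race and the total would be unreliable. Some toolchains need `-pthread` when compiling.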
Asynchronous I/O
    struct callback {
        bool (*is_ready)();
        void (*cb)(void *arg);
        void *arg;
    };
    main() {
        while (1) {
            for (c = each callback) {
                if (c->is_ready())
                    c->cb(c->arg);
            }
        }
    }
• Code is structured as a collection of handlers • Handlers are nonblocking • Create new handlers for blocking operations • When operation completes, call handler
Asynchronous server
    init() { on_accept(accept_cb); }
    accept_cb() { cfd = accept(); on_readable(cfd, name_cb); }
    on_readable(fd, fn) {
        c = new callback(test_readable, fn, fd);
        add c to callback list;
    }
    name_cb(cfd) { read(cfd, name); fd = open(name); on_readable(fd, read_cb); }
    read_cb(cfd, fd) { read(fd, block); on_writable(cfd, write_cb); }
    write_cb(cfd, fd) { write(cfd, block); on_readable(fd, read_cb); }
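The handler style can be made concrete with a small event loop. The sketch below is one possible rendering, not the course's library: it uses POSIX `poll()` as the readiness test, a pipe as a stand-in for a network connection, and one-shot handlers that re-register themselves. The names `on_readable` and `read_cb` follow the slide, but their signatures, `event_loop`, `struct stream`, and `count_bytes_via_loop` are assumptions made for illustration.

```c
#include <poll.h>
#include <string.h>
#include <unistd.h>

#define MAX_CB 16

typedef void (*handler_fn)(int fd, void *arg);

struct callback {
    int fd;             /* descriptor to watch for readability */
    handler_fn handler; /* called once when fd is readable */
    void *arg;          /* state is passed explicitly, not kept on a stack */
};

static struct callback cbs[MAX_CB];
static int ncbs = 0;

/* Register a one-shot handler; returns 0, or -1 if the table is full. */
int on_readable(int fd, handler_fn fn, void *arg) {
    if (ncbs == MAX_CB) return -1;
    cbs[ncbs++] = (struct callback){ fd, fn, arg };
    return 0;
}

/* Dispatch until no handlers remain. The only blocking call is poll(). */
void event_loop(void) {
    while (ncbs > 0) {
        struct pollfd pfds[MAX_CB];
        int n = ncbs;
        for (int i = 0; i < n; i++)
            pfds[i] = (struct pollfd){ .fd = cbs[i].fd, .events = POLLIN };
        if (poll(pfds, n, -1) < 0) return;
        /* Walk backwards so removing entry i leaves unvisited entries alone. */
        for (int i = n - 1; i >= 0; i--) {
            if (!(pfds[i].revents & (POLLIN | POLLHUP | POLLERR))) continue;
            struct callback c = cbs[i];
            cbs[i] = cbs[--ncbs];   /* one-shot: unregister before calling */
            c.handler(c.fd, c.arg); /* handler may register new callbacks */
        }
    }
}

struct stream { long total; };

/* Handler: consume what is available, then re-register until EOF. */
static void read_cb(int fd, void *arg) {
    struct stream *s = arg;
    char block[4];                  /* tiny on purpose: forces several events */
    ssize_t n = read(fd, block, sizeof block);
    if (n > 0) {
        s->total += n;
        on_readable(fd, read_cb, s);
    } else {
        close(fd);                  /* EOF or error: stop watching */
    }
}

/* Push a short msg through a pipe and count the bytes the handlers see. */
long count_bytes_via_loop(const char *msg) {
    int p[2];
    struct stream s = { 0 };
    size_t len = strlen(msg);
    if (pipe(p) != 0) return -1;
    if (write(p[1], msg, len) != (ssize_t)len) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    close(p[1]);                    /* reader sees EOF after the buffered bytes */
    on_readable(p[0], read_cb, &s);
    event_loop();
    return s.total;
}
```

For example, `count_bytes_via_loop("video.mpg")` delivers the nine bytes in three handler invocations and a fourth invocation observes end of file. The message must fit in the pipe buffer, since nothing drains the pipe while it is being written.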
Multithreaded vs. Async
    Multithreaded                        Async
    Hard to program                      Hard to program
    Locking code                         Callback code
    Need to know what blocks             Need to know what blocks
    Coordination explicit                Coordination implicit
    State stored on thread’s stack       State passed around explicitly
    Memory allocation implicit           Memory allocation explicit
    Context switch may be expensive      Lightweight context switch
    Multiprocessors                      Uniprocessors
Coordination example • Threaded server: • Thread for network interface • Interrupt wakes up network thread • Buffer shared between server threads and network thread, protected by locks and condition variables • Asynchronous I/O: • Poll for packets • How often to poll? • Or, interrupt generates an event • Be careful: disable interrupts when manipulating callback queue
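The threaded half of this example, a buffer shared between the network thread and server threads, might look like the following with POSIX mutexes and condition variables. It is a sketch under assumptions: packets are plain integers, the queue capacity is arbitrary, and `pktq`, `network_thread`, and `sum_through_queue` are hypothetical names.

```c
#include <pthread.h>

#define QCAP 8   /* arbitrary capacity chosen for the sketch */

struct pktq {
    int buf[QCAP];
    int head, count;
    pthread_mutex_t lock;
    pthread_cond_t nonempty, nonfull;
};

void pktq_init(struct pktq *q) {
    q->head = q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->nonempty, NULL);
    pthread_cond_init(&q->nonfull, NULL);
}

/* Network thread side: wait while the buffer is full, then append. */
void pktq_put(struct pktq *q, int pkt) {
    pthread_mutex_lock(&q->lock);
    while (q->count == QCAP)
        pthread_cond_wait(&q->nonfull, &q->lock); /* lock released while asleep */
    q->buf[(q->head + q->count) % QCAP] = pkt;
    q->count++;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->lock);
}

/* Server thread side: wait while the buffer is empty, then remove. */
int pktq_get(struct pktq *q) {
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->nonempty, &q->lock);
    int pkt = q->buf[q->head];
    q->head = (q->head + 1) % QCAP;
    q->count--;
    pthread_cond_signal(&q->nonfull);
    pthread_mutex_unlock(&q->lock);
    return pkt;
}

struct producer_args { struct pktq *q; int n; };

/* Stands in for the interrupt-driven network thread: emits packets 1..n. */
static void *network_thread(void *arg) {
    struct producer_args *a = arg;
    for (int i = 1; i <= a->n; i++)
        pktq_put(a->q, i);
    return NULL;
}

/* Run one producer and consume n packets here; returns their sum, or -1. */
long sum_through_queue(int n) {
    struct pktq q;
    struct producer_args a = { &q, n };
    pthread_t tid;
    long sum = 0;
    pktq_init(&q);
    if (pthread_create(&tid, NULL, network_thread, &a) != 0) return -1;
    for (int i = 0; i < n; i++)
        sum += pktq_get(&q);
    pthread_join(tid, NULL);
    return sum;
}
```

The `while` loops around `pthread_cond_wait` recheck the condition after every wakeup, which is the usual guard against spurious wakeups. Because the buffer holds only eight packets, the producer and consumer are forced to hand off repeatedly.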
Scheduling: polling vs. interrupts • Maintain peak performance under heavy load • Interrupt model can lead to livelock • Solution: • Use interrupts under low load (good latency) • Use polling under heavy load (good throughput) • Polling is typically more efficient than interrupts • Fits naturally into asynchronous I/O model
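One way to express the "interrupts under low load, polling under heavy load" rule is a mode switch driven by the receive backlog. The slide does not say how load is measured or when to switch, so the backlog metric, the two thresholds, and the hysteresis gap between them are all assumptions made for this sketch.

```c
enum rx_mode { RX_INTERRUPT, RX_POLL };

/* Hypothetical thresholds; a real driver would tune these. */
#define HIGH_WATER 32   /* backlog at which interrupts are masked */
#define LOW_WATER   4   /* backlog at which interrupts are re-enabled */

/* Decide the next receive mode from the current mode and queue backlog.
 * The gap between the thresholds keeps the mode from flapping. */
enum rx_mode next_mode(enum rx_mode cur, int backlog) {
    if (cur == RX_INTERRUPT && backlog >= HIGH_WATER) return RX_POLL;
    if (cur == RX_POLL && backlog <= LOW_WATER) return RX_INTERRUPT;
    return cur;
}
```

With these values, a backlog of 40 packets moves the receiver into polling mode, and it stays there until the backlog drains to 4 or fewer.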
Other design issues • Disk scheduling • Elevator algorithm • Memory management • File system buffer cache • Address spaces (VM management) • Fault isolate different servers • Efficient local communication? • Efficient transfers between disk and networks • Avoid copies
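The elevator algorithm named under disk scheduling can be sketched as a reordering of pending requests. This version is a simplified assumption: requests are cylinder numbers, the head starts at `head` moving toward higher cylinders, and no new requests arrive during the sweep. `elevator_order` is a hypothetical name.

```c
#include <stdlib.h>
#include <string.h>

static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Reorder reqs[0..n) in place into service order: everything at or above
 * the head in ascending order, then everything below it in descending
 * order. Returns 0 on success, -1 if memory allocation fails. */
int elevator_order(int *reqs, size_t n, int head) {
    if (n == 0) return 0;
    int *tmp = malloc(n * sizeof *tmp);
    if (!tmp) return -1;
    qsort(reqs, n, sizeof *reqs, cmp_int);
    size_t split = 0, k = 0;
    while (split < n && reqs[split] < head)
        split++;                               /* first request >= head */
    for (size_t i = split; i < n; i++)
        tmp[k++] = reqs[i];                    /* upward sweep */
    for (size_t i = split; i > 0; i--)
        tmp[k++] = reqs[i - 1];                /* downward sweep */
    memcpy(reqs, tmp, n * sizeof *reqs);
    free(tmp);
    return 0;
}
```

With the head at cylinder 53 and requests 98, 183, 37, 122, 14, 124, 65, 67, the service order becomes 65, 67, 98, 122, 124, 183, 37, 14, so the arm changes direction once instead of seeking back and forth.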
More than one processor • Problem: single machine may not scale to enough clients • Solutions: • Multiprocessors • Helps when CPU is bottleneck • Server clusters • Helps when bandwidth between server and backbone is high • Distributed server clusters • Helps when bandwidth between client and distant server is low
Clusters • Naming transparency • Server cluster transparent to client? • Server selection • Metrics: CPU load, presence of data • Consistency • Partition data • Availability • More processors can decrease reliability • Replicate data (makes consistency more difficult)
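Server selection with the two metrics listed here could be sketched as follows. The slide names the metrics but not how to combine them, so the policy (prefer a server that already holds the data, then break ties by lowest CPU load) is an assumption, as are `struct server` and `pick_server`.

```c
#include <stddef.h>

struct server {
    double cpu_load; /* 0.0 (idle) to 1.0 (saturated) */
    int has_data;    /* nonzero if the requested video is stored locally */
};

/* Return the index of the chosen server, or -1 if the array is empty. */
int pick_server(const struct server *s, size_t n) {
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        int same_data = (!s[i].has_data) == (best >= 0 && !s[best].has_data);
        if (best < 0
            || (s[i].has_data && !s[best].has_data)
            || (same_data && s[i].cpu_load < s[best].cpu_load))
            best = (int)i;
    }
    return best;
}
```

A different weighting, for instance sending requests to an idle server without the data when every replica is overloaded, would be equally defensible. That choice is part of the accuracy and load tradeoff a real cluster has to make.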
Distributed clusters • Replication policies • Data distribution • Consistency • Network monitoring and modeling • Global load balancing • Tradeoff between accuracy, latency, and network load
Making it secure: access control • Redo design: don’t add on • Firewalls: insecure and break many things • CPU cycles are an issue • A secure HTTP server can do about 10-20 connections a second • Pulls in other global issues • Name to key binding • Key management infrastructure
Example summary • Pipelining of disk and network requests • Need a lot of sophisticated software infrastructure • Replication for reliability and performance • Need sophisticated protocols • Difficult: We did it for one application • What if data changes rapidly? • Lack of abstractions!
6.894 lab: real systems • Multi-finger (due next week) • Asynchronous I/O • HTTP proxy • High-performance proxy • Cache, consistency, etc. • Open-ended file system project • Research