
15-440 Distributed Systems

This lecture provides an introduction to distributed systems, including their features such as message-based communication and expandability, as well as their goals such as resource availability and distribution transparency.


Presentation Transcript


  1. 15-440 Distributed Systems Lecture 1 – Introduction to Distributed Systems

  2. What Is A Distributed System? “A collection of independent computers that appears to its users as a single coherent system.” • Features: • No shared memory – message-based communication • Each runs its own local OS • Heterogeneity • Expandability • Ideal: to present a single-system image: • The distributed system “looks like” a single computer rather than a collection of separate computers.

  3. Definition of a Distributed System Figure 1-1. A distributed system organized as middleware. The middleware layer runs on all machines, and offers a uniform interface to the system

  4. Distributed Systems: Goals • Resource Availability: remote access to resources • Distribution Transparency: single system image • Access, Location, Migration, Replication, Failure, … • Openness: services according to standards (RPC) • Scalability: size, geographic, admin domains, … • Examples of a Distributed System? • Web search on Google • DNS: decentralized, scalable, robust to failures, ... • ...

  5. 15-440 Distributed Systems Lecture 2 & 3 – 15-441 in 2 Days

  6. Packet Switching – Statistical Multiplexing • Switches arbitrate between inputs • Can send from any input that’s ready • Links are never idle when there is traffic to send • (Efficiency!)

  7. Model of a communication channel • Latency - how long does it take for the first bit to reach the destination? • Capacity - how many bits/sec can we push through? (often termed “bandwidth”) • Jitter - how much variation in latency? • Loss / Reliability - can the channel drop packets? • Reordering - can packets arrive out of order?
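
As a back-of-the-envelope illustration, here is a minimal Go sketch (with made-up numbers) of how latency and capacity combine: the last bit arrives roughly one propagation delay plus size/bandwidth after the first bit is sent.

```go
package main

import "fmt"

func main() {
	// Hypothetical channel parameters, not from the slides.
	latency := 0.050          // one-way propagation delay: 50 ms
	bandwidth := 10_000_000.0 // capacity: 10 Mbit/s
	sizeBits := 8_000_000.0   // message size: 1 MB = 8 Mbit

	// The first bit arrives after the latency; the last bit arrives
	// after we have also pushed every bit through the link.
	transferTime := latency + sizeBits/bandwidth
	fmt.Printf("transfer time: %.3f s\n", transferTime) // 0.850 s
}
```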

  8. Packet Switching • Source sends information as self-contained packets that have an address. • Source may have to break up a single message into multiple packets • Each packet travels independently to the destination host. • Switches use the address in the packet to determine how to forward the packets • Store and forward • Analogy: a letter in surface mail.

  9. Internet • An inter-net: a network of networks. • Networks are connected using routers that support communication in a hierarchical fashion • Often need other special devices at the boundaries for security, accounting, … • The Internet: the interconnected set of networks of the Internet Service Providers (ISPs) • About 17,000 different networks make up the Internet

  10. Network Service Model • What is the service model for an inter-network? • Defines what promises the network makes for any transmission • Defines what type of failures to expect • Ethernet/Internet: best-effort – packets can get lost, etc.

  11. Possible Failure models • Fail-stop: • When something goes wrong, the process stops / crashes / etc. • Fail-slow or fail-stutter: • Performance may vary on failures as well • Byzantine: • Anything that can go wrong, will. • Including malicious entities taking over your computers and making them do whatever they want. • These models are useful for proving things; • The real world typically has a bit of everything. • Deciding which model to use is important!

  12. What is Layering? • Modular approach to network functionality • Example stack, top to bottom: • Application • Application-to-application channels • Host-to-host connectivity • Link hardware

  13. IP Layering • Relatively simple • Layers: Application, Transport, Network, Link, Physical • Hosts implement the full stack; a bridge/switch participates only up to the link layer, and a router/gateway up to the network layer

  14. Protocol Demultiplexing • Multiple choices at each layer • The link layer’s type field selects the network protocol (IP, IPX, …) • IP’s protocol field selects the transport (TCP, UDP) • The TCP/UDP port number selects the application (FTP, HTTP, NV, TFTP) • The network layer runs over many underlying networks (NET1, NET2, …, NETn)
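
A toy Go sketch of the demultiplexing idea: each layer reads one header field and uses it to pick the next handler. The IP protocol numbers for TCP (6) and UDP (17) are real; the dispatch table itself is made up for illustration.

```go
package main

import "fmt"

type handler func(payload []byte)

// IP's protocol field selects the transport handler; TCP/UDP would then
// demultiplex again on the port number in the same way.
var byIPProtocol = map[byte]handler{
	6:  func(p []byte) { fmt.Println("TCP segment:", len(p), "bytes") },
	17: func(p []byte) { fmt.Println("UDP datagram:", len(p), "bytes") },
}

func deliver(protocolField byte, payload []byte) {
	if h, ok := byIPProtocol[protocolField]; ok {
		h(payload)
	} else {
		fmt.Println("no handler; drop")
	}
}

func main() {
	deliver(17, []byte("hello")) // IP protocol 17 = UDP
}
```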

  15. Goals [Clark88] • Connect existing networks: initially ARPANET and the ARPA packet radio network • Survivability: ensure communication service even in the presence of network and router failures • Support multiple types of services • Must accommodate a variety of networks • Allow distributed management • Allow host attachment with a low level of effort • Be cost effective • Allow resource accountability

  16. Goal 1: Survivability • If network is disrupted and reconfigured… • Communicating entities should not care! • No higher-level state reconfiguration • How to achieve such reliability? • Where can communication state be stored?

  17. CIDR IP Address Allocation • Provider is given 201.10.0.0/21 • Provider sub-allocates it as 201.10.0.0/22, 201.10.4.0/24, 201.10.5.0/24, and 201.10.6.0/23
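
A quick way to sanity-check the allocation: a Go sketch using the standard library's net.ParseCIDR to confirm each sub-block falls inside the provider's /21. (The four blocks in fact exactly partition 201.10.0.0 – 201.10.7.255.)

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// The provider block and sub-allocations from the slide.
	_, provider, _ := net.ParseCIDR("201.10.0.0/21")
	subs := []string{
		"201.10.0.0/22", "201.10.4.0/24",
		"201.10.5.0/24", "201.10.6.0/23",
	}
	for _, s := range subs {
		ip, _, _ := net.ParseCIDR(s)
		fmt.Println(s, "inside /21:", provider.Contains(ip))
	}
}
```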

  18. Ethernet Frame Structure (cont.) • Addresses: 6 bytes • Each adapter is given a globally unique address at manufacturing time • Address space is allocated to manufacturers • 24 bits identify the manufacturer, e.g., 0:0:15:* → 3Com adapter • Frame is received by all adapters on a LAN and dropped if the address does not match • Special addresses: • Broadcast – FF:FF:FF:FF:FF:FF is “everybody” • A range of addresses is allocated to multicast • Adapter maintains a list of multicast groups the node is interested in

  19. End-to-End Argument • Deals with where to place functionality • Inside the network (in switching elements) • At the edges • Argument • If you have to implement a function end-to-end anyway (e.g., because it requires the knowledge and help of the end-point host or application), don’t implement it inside the communication system • Unless there’s a compelling performance enhancement • Key motivation for the split of functionality between TCP, UDP and IP Further Reading: “End-to-End Arguments in System Design.” Saltzer, Reed, and Clark.

  20. User Datagram Protocol (UDP): An Analogy • Postal Mail: • Single mailbox to receive letters • Unreliable • Not necessarily in-order delivery • Letters sent independently • Must address each letter • UDP: • Single socket to receive messages • No guarantee of delivery • Not necessarily in-order delivery • Datagram – independent packets • Must address each packet • Example UDP applications: multimedia, voice over IP
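
A minimal Go sketch of the UDP model, with one socket as the "mailbox"; the loopback address and port are arbitrary choices for the sketch.

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// Receiver: one socket receives datagrams from anyone (the "mailbox").
	pc, err := net.ListenPacket("udp", "127.0.0.1:9999") // port is arbitrary
	if err != nil {
		panic(err)
	}
	defer pc.Close()

	// Sender: each datagram is independent and may be lost or reordered.
	conn, err := net.Dial("udp", "127.0.0.1:9999")
	if err != nil {
		panic(err)
	}
	conn.Write([]byte("datagram 1")) // no delivery guarantee
	conn.Close()

	buf := make([]byte, 1500)
	n, addr, _ := pc.ReadFrom(buf) // must note the sender to address a reply
	fmt.Printf("got %q from %v\n", buf[:n], addr)
}
```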

  21. Transmission Control Protocol (TCP): An Analogy TCP • Reliable – guarantee delivery • Byte stream – in-order delivery • Connection-oriented – single socket per connection • Setup connection followed by data transfer Telephone Call • Guaranteed delivery • In-order delivery • Connection-oriented • Setup connection followed by conversation Example TCP applications Web, Email, Telnet

  22. 15-440 Distributed Systems Lecture 5 – Classical Synchronization

  23. Classic synchronization primitives • Basics of concurrency • Correctness (achieves mutex, no deadlock, no livelock) • Efficiency, no spinlocks or wasted resources • Fairness • Synchronization mechanisms • Semaphores (P() and V() operations) • Mutex (binary semaphore) • Condition Variables (allow a thread to sleep) • Must be accompanied by a mutex • Wait and Signal operations • Work through examples again + Go primitives (see the sketch below)
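
A minimal Go sketch of these primitives: a counting semaphore rendered as a buffered channel (one idiomatic Go translation of P()/V()), alongside the standard library's mutex and condition variable.

```go
package main

import (
	"fmt"
	"sync"
)

// A counting semaphore built from a buffered channel:
// sends are P() (acquire), receives are V() (release).
type semaphore chan struct{}

func (s semaphore) P() { s <- struct{}{} }
func (s semaphore) V() { <-s }

func main() {
	sem := make(semaphore, 2) // at most 2 holders at once

	var mu sync.Mutex         // Go's mutex (a binary semaphore)
	cond := sync.NewCond(&mu) // condition variable tied to mu
	ready := false

	go func() {
		mu.Lock()
		ready = true
		cond.Signal() // wake one waiter
		mu.Unlock()
	}()

	mu.Lock()
	for !ready { // always re-check the predicate after waking
		cond.Wait() // atomically releases mu and sleeps
	}
	mu.Unlock()

	sem.P()
	fmt.Println("in critical section")
	sem.V()
}
```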

  24. 15-440 Distributed Systems Lecture 6 – RPC

  25. RPC Goals • Ease of programming • Hide complexity • Automates the task of implementing distributed computation • Familiar model for programmers (just make a function call) Historical note: Seems obvious in retrospect, but RPC was only invented in the ’80s. See Birrell & Nelson, “Implementing Remote Procedure Call” ... or Bruce Nelson, Ph.D. Thesis, Carnegie Mellon University: Remote Procedure Call, 1981 :)

  26. Passing Value Parameters (1) • The steps involved in doing a remote computation through RPC.

  27. Stubs: obtaining transparency • From the API, the compiler generates stubs for a procedure on the client and server • Client stub • Marshals arguments into machine-independent format • Sends request to server • Waits for response • Unmarshals result and returns to caller • Server stub • Unmarshals arguments and builds stack frame • Calls procedure • Server stub marshals results and sends reply
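
A minimal sketch of the stub idea using Go's standard net/rpc package, which plays the role of the stub compiler here: marshaling and unmarshaling happen behind the scenes, so the call looks local. The service name and port are arbitrary.

```go
package main

import (
	"fmt"
	"net"
	"net/rpc"
)

type Args struct{ A, B int }

type Arith int

func (t *Arith) Multiply(args *Args, reply *int) error {
	*reply = args.A * args.B
	return nil
}

func main() {
	rpc.Register(new(Arith))
	ln, err := net.Listen("tcp", "127.0.0.1:1234") // port is arbitrary
	if err != nil {
		panic(err)
	}
	go rpc.Accept(ln) // server stub: unmarshal, call, marshal reply

	client, err := rpc.Dial("tcp", "127.0.0.1:1234")
	if err != nil {
		panic(err)
	}
	var product int
	// Client stub: marshal args, send, wait, unmarshal the result.
	client.Call("Arith.Multiply", &Args{A: 7, B: 8}, &product)
	fmt.Println("7*8 =", product)
}
```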

  28. Real solution: break transparency • Possible semantics for RPC: • Exactly-once • Impossible in practice • At least once: • Only for idempotent operations • At most once • Zero, don’t know, or once • Zero or once • Transactional semantics
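
A sketch of the at-most-once idea: the server caches replies by request ID, so a retransmitted request is answered from the cache instead of re-executing the operation. All names and fields here are made up for illustration.

```go
package main

import "fmt"

type request struct {
	id  int // unique per logical call; reused on retransmission
	arg int
}

type server struct {
	seen    map[int]int // request ID -> cached reply
	counter int         // some non-idempotent state
}

func (s *server) handle(r request) int {
	if reply, ok := s.seen[r.id]; ok {
		return reply // duplicate: do NOT execute again
	}
	s.counter += r.arg // executes at most once per request ID
	s.seen[r.id] = s.counter
	return s.counter
}

func main() {
	s := &server{seen: make(map[int]int)}
	fmt.Println(s.handle(request{id: 1, arg: 5})) // 5
	fmt.Println(s.handle(request{id: 1, arg: 5})) // 5 again, not 10
}
```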

  29. Asynchronous RPC (3) • A client and server interacting through two asynchronous RPCs.

  30. Important Lessons • Procedure calls • Simple way to pass control and data • Elegant, transparent way to distribute applications • Not the only way… • Hard to provide true transparency • Failures • Performance • Memory access • Etc. • How to deal with a hard problem → give up and let the programmer deal with it • “Worse is better”

  31. 15-440 Distributed Systems Lecture 7,8 – Distributed File Systems

  32. Why DFSs? • Why Distributed File Systems: • Data sharing among multiple users • User mobility • Location transparency • Backups and centralized management • Examples: NFS (v1 – v4), AFS, CODA, LBFS • Idea: Provide file system interfaces to remote FSs • Challenges: heterogeneity, scale, security, concurrency, … • Non-Challenges: AFS meant for campus community • Virtual File Systems: pluggable file systems • Use RPCs

  33. NFS vs AFS Goals • AFS: Global distributed file system • “One AFS”, like “one Internet” • LARGE numbers of clients and servers (1000s of clients cache files) • Global namespace (organized as cells) => location transparency • Clients w/disks => cache, write sharing rare, callbacks • Open-to-close consistency (session semantics) • NFS: Very popular Network File System • NFSv4 meant for wide area • Naming: per-client view (/home/yuvraj/…) • Caches data in memory, not on disk, write-through cache • Consistency model: buffer data (eventual, ~30 seconds) • Requires significant resources as users scale

  34. DFS Important bits (1) • Distributed filesystems almost always involve a tradeoff: consistency, performance, scalability. • We’ve learned a lot since NFS and AFS (and can implement faster, etc.), but the general lesson holds. Especially in the wide-area. • We’ll see a related tradeoff, also involving consistency, in a while: the CAP tradeoff. Consistency, Availability, Partition-resilience.

  35. DFS Important Bits (2) • Client-side caching is a fundamental technique to improve scalability and performance • But raises important questions of cache consistency • Timeouts and callbacks are common methods for providing (some forms of) consistency. • AFS picked close-to-open consistency as a good balance of usability (the model seems intuitive to users), performance, etc. • AFS authors argued that apps with highly concurrent, shared access, like databases, needed a different model

  36. Coda Summary • Distributed File System built for mobility • Disconnected operation key idea • Puts scalability and availability before data consistency • Unlike NFS • Assumes that inconsistent updates are very infrequent • Introduced disconnected operation mode, file hoarding, and the idea of “reintegration”

  37. Low Bandwidth File System: Key Ideas • A network file system for slow or wide-area networks • Exploits similarities between files • Avoids sending data that can be found in the server’s file system or the client’s cache • Uses Rabin fingerprints on file content (file chunks) • Can deal with shifted byte offsets when parts of a file change • Also uses conventional compression and caching • Requires 90% less bandwidth than traditional network file systems
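
A highly simplified sketch of content-defined chunking. LBFS uses Rabin fingerprints over a sliding window; here a toy additive hash over a 16-byte window stands in for them, and the parameters are made up. Because boundaries depend only on nearby content, an insertion early in a file shifts byte offsets without changing the chunks past the next boundary.

```go
package main

import "fmt"

func chunk(data []byte) [][]byte {
	const window = 16
	const mask = 0x3F // boundary at ~1/64 of positions (made-up parameter)
	var chunks [][]byte
	var h uint32
	start := 0
	for i, b := range data {
		h += uint32(b)
		if i >= window {
			h -= uint32(data[i-window]) // slide the window forward
		}
		// Declare a boundary when the hash hits the magic pattern,
		// enforcing a minimum chunk size of one window.
		if h&mask == 0 && i+1-start >= window {
			chunks = append(chunks, data[start:i+1])
			start = i + 1
		}
	}
	if start < len(data) {
		chunks = append(chunks, data[start:]) // trailing partial chunk
	}
	return chunks
}

func main() {
	data := []byte("the quick brown fox jumps over the lazy dog")
	fmt.Println("chunks:", len(chunk(data)))
}
```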

  38. 15-440 Distributed Systems Lecture 9 – Time Synchronization

  39. Impact of Clock Synchronization • When each machine has its own clock, an event that occurred after another event may nevertheless be assigned an earlier time.

  40. Clocks in a Distributed System • Computer clocks are not generally in perfect agreement • Skew: the difference between the times on two clocks (at any instant) • Computer clocks are subject to clock drift (they count time at different rates) • Clock drift rate: the difference per unit of time from some ideal reference clock • Ordinary quartz clocks drift by about 1 sec in 11-12 days (10^-6 secs/sec) • High-precision quartz clocks have a drift rate of about 10^-7 or 10^-8 secs/sec

  41. Perfect networks • Messages always arrive, with propagation delay exactly d • Sender sends time T in a message • Receiver sets clock to T+d • Synchronization is exact

  42. Cristian’s Time Sync • A time server S receives signals from a UTC source • Process p requests the time in message m_r and receives t in message m_t from S • p sets its clock to t + RTT/2 • Accuracy ± (RTT/2 − min): • because the earliest time S can put t in message m_t is min after p sent m_r • the latest time was min before m_t arrived at p • the time by S’s clock when m_t arrives is in the range [t + min, t + RTT − min] • RTT (Tround) is the round trip time recorded by p; min is an estimated minimum one-way transmission time
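
A client-side sketch of the algorithm in Go; askServer is a hypothetical stand-in for the m_r/m_t exchange with S.

```go
package main

import (
	"fmt"
	"time"
)

// askServer pretends to be the request/reply exchange with time server S.
func askServer() time.Time {
	return time.Now().Add(42 * time.Millisecond) // pretend S's clock is ahead
}

func main() {
	t0 := time.Now()
	serverTime := askServer() // S's clock reading t, carried in reply m_t
	rtt := time.Since(t0)

	// Estimate: t was read roughly RTT/2 before the reply arrived,
	// so the current server time is about t + RTT/2.
	estimate := serverTime.Add(rtt / 2)
	fmt.Println("set clock to:", estimate, "(accuracy ± (RTT/2 − min))")
}
```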

  43. Berkeley algorithm • Cristian’s algorithm: • a single time server might fail, so they suggest the use of a group of synchronized servers • it does not deal with faulty servers • Berkeley algorithm (also 1989) • An algorithm for internal synchronization of a group of computers • A master polls to collect clock values from the others (slaves) • The master uses round trip times to estimate the slaves’ clock values • It takes an average (eliminating any with above-average round trip times or faulty clocks) • It sends the required adjustment to the slaves (better than sending the time, which depends on the round trip time) • Measurements • 15 computers, clock synchronization to 20-25 millisecs, drift rate < 2×10^-5 • If the master fails, a new master can be elected to take over (not in bounded time)

  44. NTP Protocol • All modes use UDP • Each message bears timestamps of recent events: • Local times of Send and Receive of the previous message • Local time of Send of the current message • Recipient notes the time of receipt T3 (we have T0, T1, T2, T3) • (Figure: client sends message m at T0, server receives it at T1; server sends m' at T2, client receives it at T3)
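
With the timestamps as labeled above, the standard NTP estimates for one exchange are offset = ((T1 − T0) + (T2 − T3)) / 2 and delay = (T3 − T0) − (T2 − T1). A small Go sketch with made-up numbers:

```go
package main

import (
	"fmt"
	"time"
)

// ntpEstimate computes the standard per-exchange NTP estimates: the client
// sends at T0, the server receives at T1 and replies at T2, and the client
// receives at T3.
func ntpEstimate(t0, t1, t2, t3 time.Time) (offset, delay time.Duration) {
	offset = (t1.Sub(t0) + t2.Sub(t3)) / 2 // how far our clock trails the server's
	delay = t3.Sub(t0) - t2.Sub(t1)        // round trip minus server processing time
	return
}

func main() {
	base := time.Now()
	// Hypothetical run: true offset 30 ms, 10 ms each way, 5 ms at the server.
	t0 := base
	t1 := base.Add(10*time.Millisecond + 30*time.Millisecond)
	t2 := t1.Add(5 * time.Millisecond)
	t3 := base.Add(25 * time.Millisecond)
	offset, delay := ntpEstimate(t0, t1, t2, t3)
	fmt.Println("offset:", offset, "delay:", delay) // 30ms, 20ms
}
```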

  45. Logical time and logical clocks (Lamport 1978) • Instead of synchronizing clocks, event ordering can be used • If two events occurred at the same process pi (i = 1, 2, … N), then they occurred in the order observed by pi; this is the definition of the relation “→i” • When a message m is sent between two processes, send(m) happens before receive(m) • The happened-before relation is transitive • The happened-before relation is the relation of causal ordering

  46. Total-order Lamport clocks • Many systems require a total-ordering of events, not a partial-ordering • Use Lamport’s algorithm, but break ties using the process ID • L(e) = M * Li(e) + i • M = maximum number of processes • i = process ID
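
A minimal Go sketch of this tie-breaking rule; M = 10 is an assumed bound on the number of processes for the example.

```go
package main

import "fmt"

const M = 10 // maximum number of processes (assumed for the sketch)

type process struct {
	id int
	l  int // local Lamport clock Li
}

// localEvent ticks the clock and returns the totally ordered key
// L(e) = M*Li(e) + i, which breaks ties between equal Li by process ID.
func (p *process) localEvent() int {
	p.l++
	return M*p.l + p.id
}

// receive applies the Lamport rule: clock = max(local, message) + 1.
func (p *process) receive(msgClock int) int {
	if msgClock > p.l {
		p.l = msgClock
	}
	p.l++
	return M*p.l + p.id
}

func main() {
	a := &process{id: 1}
	b := &process{id: 2}
	e1 := a.localEvent()         // key = 10*1 + 1 = 11
	e2 := b.receive(a.l)         // key = 10*2 + 2 = 22
	fmt.Println(e1, e2, e1 < e2) // the send is ordered before the receive
}
```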

  47. Vector Clocks • Note that e → e’ implies V(e) < V(e’). The converse is also true • Can you see a pair of parallel events? • c || e (parallel) because neither V(c) <= V(e) nor V(e) <= V(c)
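
A sketch of the comparison rule in Go: V(e) <= V(e') iff every component is <=, and if neither direction holds the events are concurrent. The clocks for c and e are hypothetical values chosen so that neither dominates.

```go
package main

import "fmt"

// lessEq reports whether a <= b componentwise (assumes equal length).
func lessEq(a, b []int) bool {
	for i := range a {
		if a[i] > b[i] {
			return false
		}
	}
	return true
}

func relation(a, b []int) string {
	switch {
	case lessEq(a, b) && !lessEq(b, a):
		return "a -> b"
	case lessEq(b, a) && !lessEq(a, b):
		return "b -> a"
	case lessEq(a, b) && lessEq(b, a):
		return "a == b"
	default:
		return "a || b (concurrent)"
	}
}

func main() {
	// Hypothetical clocks for events c and e at two of three processes.
	c := []int{2, 0, 0}
	e := []int{0, 1, 0}
	fmt.Println(relation(c, e)) // neither dominates: concurrent
}
```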

  48. 15-440 Distributed Systems Lecture 10 – Mutual Exclusion

  49. Example: Totally-Ordered Multicasting • Two replicas of an account start at $1,000: a San Francisco customer adds $100 while the NY bank adds 1% interest • If the two updates are applied in different orders, San Fran will have $1,111 and NY will have $1,110 • Updating a replicated database this way leaves it in an inconsistent state • Can use Lamport’s clocks to totally order the updates

  50. Mutual Exclusion: A Centralized Algorithm (1) — pseudocode from the slide (a Go sketch follows):

  @ Client — Acquire:
      Send (Request, i) to coordinator
      Wait for reply

  @ Server:
      while true:
          m = Receive()
          if m == (Request, i):
              if Available():
                  Send (Grant) to i
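
A Go sketch of the same algorithm, with the coordinator as a goroutine and a reply channel standing in for the Grant message; all names are made up for the sketch.

```go
package main

import "fmt"

type req struct {
	id    int
	grant chan struct{}
}

// coordinator grants the lock to one requester at a time and queues the rest.
func coordinator(requests chan req, done chan int) {
	var queue []req
	busy := false
	for {
		select {
		case r := <-requests:
			if !busy {
				busy = true
				r.grant <- struct{}{} // lock free: grant immediately
			} else {
				queue = append(queue, r) // lock held: queue the request
			}
		case <-done:
			if len(queue) > 0 {
				next := queue[0]
				queue = queue[1:]
				next.grant <- struct{}{} // pass the lock on release
			} else {
				busy = false
			}
		}
	}
}

func main() {
	requests := make(chan req)
	done := make(chan int)
	go coordinator(requests, done)

	me := req{id: 1, grant: make(chan struct{})}
	requests <- me // Acquire: Send (Request, i)
	<-me.grant     // Wait for reply
	fmt.Println("client 1 in critical section")
	done <- 1 // Release
}
```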
