
Introduction CS 188 Distributed Systems January 6, 2015



  1. Introduction CS 188 Distributed Systems January 6, 2015

  2. Description of Class • Topics covered and structure of class • Grading • Reading materials • Student assignments • Office hours • Web page

  3. Topics to Be Covered • Distributed systems • Basic principles • Distributed systems algorithms • Important case studies • Focusing on distributed storage systems • For specificity

  4. Specific Topics • Concepts and basic architectures • Concurrent processes and synchronization methods • IPC • Distributed processes • Distributed file systems • Models of distributed computation

  5. Specific Topics, Cont’d • Synchronization and election • Distributed agreement • Replicated data management • Checkpoint and recovery • Distributed system security • Cloud computing

  6. Pre-Requisites for Class • Assumes understanding of material in • CS111 (Operating Systems) • CS118 (Computer Networks) • Without having taken these classes, you will be at a serious disadvantage • Thursday class will briefly review relevant stuff from these courses

  7. Structure of Class • A bit different • Some lectures • Some in-class discussion/design sections • Grading based on tests and group projects

  8. The Basic Class Plan • I will lecture on some topic in distributed systems on Tuesdays • There will be a taped lecture you should watch before Thursdays • Thursday classes will be interactive • Discussing how to make use of the ideas discussed in earlier lectures

  9. Taped Lectures • In Powerpoint • Made available from the class web page • Intention is that you view this lecture before the Thursday class • They will start in week 2

  10. The Core Example • Distributed storage • How can we make effective use of data and storage space on multiple machines? • One example of an important distributed system problem • That touches on many of the key challenges in distributed systems

  11. Investigating Distributed Storage • We’ll look at different design alternatives • Some embodied in well-known code • Others not necessarily implemented at all • We’ll discuss how we should go about designing such a system

  12. How Will We Start? • We’ll start simple • I want to access data stored on another machine • How do I get to the data? • What problems am I likely to face? • Then we’ll add interesting complications

  13. Some Complications • What if multiple parties use writeable data? • What if machines/networks are of limited power? • What if we keep multiple copies of the data? • What if the scale of our system is very large? • What if there are failures? • Temporary • Permanent • What if we don’t trust everyone in the system equally?

  14. The Format of the Discussions • Not lectures by me • In-class discussions • Which, hopefully, everyone will participate in • Not graded • But if you want to learn about distributed systems, be there and join in

  15. Grading • Midterm - 20% • Project - 40% • Final - 40%

  16. Reading Materials • No textbook • Online readings will be assigned on the course web page

  17. Office Hours • Tuesday/Thursday 1-2 PM • Held in 3532F Boelter Hall • Other times available by prior arrangement • I’m usually around

  18. Class Web Page http://lever.cs.ucla.edu/classes/188_winter15 • Slides for classes will be posted there • By 5 PM the previous afternoon • Readings will be posted there • With links to papers • Also links to other interesting info

  19. Class Projects • Groups of 4-5 students • Implementation of software relevant to distributed systems • Must be demonstrated • Accompanied by a short (10 page) report • Topic to be chosen by end of week 3 • Must submit 1 paragraph description

  20. More on the Projects • Largely handled by our TA • Turker Garip • He will present a set of possible projects • Groups will choose from among those projects • Groups meet regularly with Turker • Details at first recitation section

  21. Tests • Midterm and final • Format will be determined and discussed later

  22. Introduction • What is a distributed system? • Basic issues in distributed systems • Basic architectures for distributed systems

  23. What Is a Distributed System? • A system with more than one active machine • Typically connected by a network • Typically cooperating • Either on one specific task • Or to give the illusion of a bigger, more powerful machine

  24. Some Examples • An office’s local area network • A client/server system • A peer file sharing service • A factory’s industrial control system • A cloud computing environment • Specialized services like DNS

  25. Problems With Distributed Systems • Computations involving multiple machines inherently more difficult • They make resource control harder • They make coordination harder • They make security harder

  26. Basic Issues in Distributed Systems • Transparency • Naming issues • Consistency issues • Failure and recovery issues • Heterogeneity • Security

  27. Transparency • One goal of most distributed systems is to hide the distribution • Make the system look like one computer • Make the network and multiple CPUs transparent • Elusive, difficult, not always as desirable as it seems

  28. Goals of Transparency • Hide where processes execute • Hide where data is stored • Hide where IPC goes to/comes from • Hide effects of failures

  29. Access Transparency • Uniform method of access to local and remote resources • User doesn’t have to worry about where his resources are located • Implies that system must try to make very different operations look the same
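
The uniform-access idea above can be sketched in a few lines. This is an illustrative sketch, not any particular system's API: all class and path names (`LocalStore`, `RemoteStore`, the `/net/` prefix) are hypothetical. The point is that callers use one `read`/`write` interface while a thin layer routes each name to the right store.

```python
# Hedged sketch of access transparency: local and remote resources are
# assumed to answer the same read()/write() calls; a dispatcher hides
# which is which from the caller.

class LocalStore:
    def __init__(self):
        self.files = {}
    def read(self, name):
        return self.files[name]
    def write(self, name, data):
        self.files[name] = data

class RemoteStore:
    """Stand-in for a store reached over the network (simulated here;
    a real system would issue RPCs behind the same method signatures)."""
    def __init__(self):
        self.files = {}
    def read(self, name):
        return self.files[name]
    def write(self, name, data):
        self.files[name] = data

class TransparentFS:
    """Routes each name to a store; callers never see the difference."""
    def __init__(self, local, remote, remote_prefix="/net/"):
        self.local, self.remote, self.prefix = local, remote, remote_prefix
    def _pick(self, name):
        return self.remote if name.startswith(self.prefix) else self.local
    def read(self, name):
        return self._pick(name).read(name)
    def write(self, name, data):
        self._pick(name).write(name, data)

fs = TransparentFS(LocalStore(), RemoteStore())
fs.write("/tmp/a", b"local data")
fs.write("/net/server1/b", b"remote data")
print(fs.read("/tmp/a"))           # same call shape for both reads
print(fs.read("/net/server1/b"))
```

Note what the dispatcher must paper over in a real system: the remote path has very different latency and failure behavior, which is exactly why full transparency is hard.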

  30. Location Transparency • Sometimes called name transparency • Users don’t know where resources are located • Users don’t worry about locations of objects • Users don’t worry when objects move • But system worries a lot

  31. Failure Transparency • System looks the same even when components fail • User is insulated from effects of failures • System works like crazy to pretend five machines are the same as six

  32. Why Is Transparency Important? • Distributed systems are hard • People find even single machines too complicated • The system must handle all unpleasant details that it can

  33. Why Is Transparency Hard? • Transparency implies that the system worries about all the nasty issues • Different local/remote overheads • Hiding/handling failures • Translating user-level names to physical locations • The nasty issues are hard

  34. So What? • Aren’t software systems supposed to handle the nasty details? • Yes, but . . . • If the state of the technology isn’t capable of handling them, transparency can be expensive and constraining • We aren’t smart enough to provide full transparency yet

  35. Naming • One of the key recurring problems in distributed systems • How do you name both local and remote resources? • How do you resolve the names to physical locations?

  36. Naming Local and Remote Resources • Does the resource have the same name locally and remotely? • If not, hard to work with remote resources • If so, requires keeping distributed data consistent

  37. Resolving Names • Standard operating systems can resolve names for local resources • File system names, process names, etc. • All required information is local • How does the system map the name of a remote resource to its remote location? • Keep all required information local? • Or find it remotely?
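
The "keep it local or find it remotely" choice can be sketched as a two-step resolver. This is a hypothetical illustration (the `DirectoryService` and the node names are invented): check a local table first, fall back to a simulated remote directory, and cache what you learn.

```python
# Hedged sketch of two-step name resolution: local table first,
# then a (simulated) remote directory service, with caching.

class DirectoryService:
    """Stand-in for a remote directory mapping names to node locations."""
    def __init__(self, table):
        self.table = table
    def lookup(self, name):
        return self.table.get(name)

class Resolver:
    def __init__(self, local_table, directory):
        self.local = local_table
        self.directory = directory
    def resolve(self, name):
        if name in self.local:             # all required info is local
            return self.local[name]
        loc = self.directory.lookup(name)  # otherwise find it remotely
        if loc is not None:
            self.local[name] = loc         # cache for the next lookup
        return loc

directory = DirectoryService({"/files/x": "node2", "/files/y": "node3"})
resolver = Resolver({"/proc/init": "node1"}, directory)
print(resolver.resolve("/proc/init"))   # local hit
print(resolver.resolve("/files/x"))     # remote lookup, then cached
```

The cache is where the consistency trouble on the following slides begins: once a location is cached locally, the object can move and the cached entry silently goes stale.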

  38. A Simple Example in Distributed Naming • (Diagram: File X is created on Node 1; a Read File X request arrives at Node 2) • How does node 2 even know File X exists?

  39. Problems in Naming • Naming remote resources • Consistency issues • Scaling issues • Name conflicts • Most of these are related to general problems of keeping distributed data consistent

  40. Consistency • Many distributed systems support distributed computations • User computations running at more than one node • Even if only a data storage system, writes, creates, and deletes raise issues • Unlike multi-process jobs on one node, processes don’t share memory • Only accessible over a network • How do you ensure they’re synchronized?
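
Because the processes share no memory, synchronization has to be arranged by messages. One classic arrangement is a central lock server; the sketch below simulates it with direct calls (names are invented, and a real version would use RPCs and handle client crashes, which this sketch does not).

```python
# Hedged sketch of message-based mutual exclusion via a central lock
# server. acquire() either grants the lock or queues the caller; on
# release() the next waiter is granted and would be notified by message.

class LockServer:
    def __init__(self):
        self.holder = None
        self.waiting = []
    def acquire(self, client):
        if self.holder is None:
            self.holder = client
            return True           # granted immediately
        self.waiting.append(client)
        return False              # must wait for a grant message
    def release(self, client):
        assert self.holder == client, "only the holder may release"
        self.holder = self.waiting.pop(0) if self.waiting else None
        return self.holder        # next client to notify, if any

ls = LockServer()
print(ls.acquire("p1"))   # True: p1 now holds the lock
print(ls.acquire("p2"))   # False: p2 is queued
print(ls.release("p1"))   # "p2": p2 is granted next
```

The single server makes the logic trivial but is also a single point of failure, which previews the failure and recovery issues discussed below.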

  41. Typical Problems in Consistency • Consistency in name spaces • Detecting file creations/deletions • Consistency in saved data • If caches/replication used, how does update to one copy change others? • Consistency in system state • Can I even reach agreement on what nodes are working?
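
The "how does an update to one copy change the others" question can be made concrete with a minimal write-update sketch. This is illustrative only (one invented primary, two invented replicas, no failures or concurrency): the primary pushes every write to all copies so readers of any replica see the latest value.

```python
# Hedged sketch of write-update replication: the primary applies each
# write locally and then propagates it to every replica.

class Replica:
    def __init__(self):
        self.data = {}
    def apply(self, key, value):
        self.data[key] = value

class Primary:
    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas
    def write(self, key, value):
        self.data[key] = value
        for r in self.replicas:   # push the update to every copy
            r.apply(key, value)

r1, r2 = Replica(), Replica()
p = Primary([r1, r2])
p.write("x", 1)
p.write("x", 2)
print(r1.data["x"], r2.data["x"])   # both copies hold the latest write
```

The sketch dodges the hard cases the course will cover: what if a replica is unreachable mid-propagation, or two primaries write concurrently?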

  42. Failure and Recovery • In single machine system, failure typically halts entire system • In distributed system, one failed machine doesn’t halt the system • But what if the failed machine was performing part of a distributed computation?
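
Before a system can recover from a failed machine, it must notice the failure, and over a network it can only ever *suspect* one. A common building block is a heartbeat-based detector; the sketch below is illustrative (the timeout value and node names are arbitrary choices, not from any real system).

```python
# Hedged sketch of heartbeat-based failure detection: a node that has
# not reported within the timeout is suspected (not proven) failed.

class FailureDetector:
    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}
    def heartbeat(self, node, now):
        self.last_seen[node] = now
    def suspected(self, now):
        return [n for n, t in self.last_seen.items()
                if now - t > self.timeout]

fd = FailureDetector(timeout=5)
fd.heartbeat("node1", now=0)
fd.heartbeat("node2", now=0)
fd.heartbeat("node1", now=4)      # node1 keeps reporting
print(fd.suspected(now=8))        # node2 has been silent too long
```

Note the fundamental ambiguity: a suspected node may merely be slow or partitioned, not dead, which is why distributed agreement (covered later) is subtle.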

  43. Heterogeneity • What if not all the machines in the distributed system are the same? • Different processors, different CPU speeds, different configurations, etc. • Even seemingly homogeneous things are actually heterogeneous • Causes great problems

  44. Security Challenges for Distributed Systems • Machines are doing things for remote machines • Do you know who you’re talking to? • Do you understand what he’s asking you to do? • Can you limit your risk? • Can you protect the distributed service that spans several machines?

  45. Distributed System Architectures • Workstation-server model • Peer workstation model • Cloud model • Parallel computer model

  46. Workstation-Server Model • Some machines are dedicated for user client use • Some machines are specially designated servers • Servers have special abilities and responsibilities

  47. Characteristics of Workstation-Server Model • User workstations often lightly utilized • Waste of resources • High response when needed • Servers may be temporarily or permanently overloaded • Failure of important servers can have serious consequences

  48. Server Systems and Load • Many services are very popular • One machine can’t handle all the load • Typically, divide the load among several machines • But that becomes complicated if clients need to worry about it • Transparent load balancing usually required
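
One simple way to divide load transparently is for the client library to hash each request key onto a server, so callers never choose a server themselves. The sketch below is illustrative (the server names are invented); it uses plain modulo hashing for clarity.

```python
# Hedged sketch of transparent load spreading: a stable hash maps each
# request key to one of several servers.

import hashlib

SERVERS = ["server-a", "server-b", "server-c"]   # hypothetical names

def pick_server(key, servers=SERVERS):
    # Stable hash: the same key always lands on the same server.
    digest = hashlib.sha256(key.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]

chosen = {k: pick_server(k) for k in ["alice", "bob", "carol", "dave"]}
print(chosen)
```

A known weakness of the modulo scheme: adding or removing a server remaps most keys. Consistent hashing, which moves only a small fraction of keys on membership change, is the usual fix in real systems.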

  49. Peer Model • System is made up of individual users’ machines • Each servicing a particular user • Machines may act as servers for each other • But most machines not formally servers

  50. Characteristics of Peer Model • Matches what most people want • My machine interoperates seamlessly with everyone else’s • Scaling challenges • Especially beyond LAN scale • Some peer services popular • NFS • Peer file sharing systems
