Introduction • CS 188 Distributed Systems • January 6, 2015
Description of Class • Topics covered and structure of class • Grading • Reading materials • Student assignments • Office hours • Web page
Topics to Be Covered • Distributed systems • Basic principles • Distributed systems algorithms • Important case studies • Focusing on distributed storage systems • For specificity
Specific Topics • Concepts and basic architectures • Concurrent processes and synchronization methods • IPC • Distributed processes • Distributed file systems • Models of distributed computation
Specific Topics, Cont. • Synchronization and election • Distributed agreement • Replicated data management • Checkpoint and recovery • Distributed system security • Cloud computing
Pre-Requisites for Class • Assumes understanding of material in • CS111 (Operating Systems) • CS118 (Computer Networks) • Without having taken these classes, you will be at a serious disadvantage • Thursday class will briefly review relevant stuff from these courses
Structure of Class • A bit different • Some lectures • Some in-class discussion/design sections • Grading based on tests and group projects
The Basic Class Plan • I will lecture on some topic in distributed systems on Tuesdays • There will be a taped lecture you should watch before Thursdays • Thursday classes will be interactive • Discussing how to make use of the ideas discussed in earlier lectures
Taped Lectures • In PowerPoint • Made available from the class web page • Intention is that you view each lecture before the Thursday class • They will start in week 2
The Core Example • Distributed storage • How can we make effective use of data and storage space on multiple machines? • One example of an important distributed system problem • That touches on many of the key challenges in distributed systems
Investigating Distributed Storage • We’ll look at different design alternatives • Some embodied in well-known code • Others not necessarily implemented at all • We’ll discuss how we should go about designing such a system
How Will We Start? • We’ll start simple • I want to access data stored on another machine • How do I get to the data? • What problems am I likely to face? • Then we’ll add interesting complications
Some Complications • What if multiple parties use writeable data? • What if machines/networks are of limited power? • What if we keep multiple copies of the data? • What if the scale of our system is very large? • What if there are failures? • Temporary • Permanent • What if we don’t trust everyone in the system equally?
The Format of the Discussions • Not lectures by me • In-class discussions • Which, hopefully, everyone will participate in • Not graded • But if you want to learn about distributed systems, be there and join in
Grading • Midterm - 20% • Project - 40% • Final - 40%
Reading Materials • No textbook • Online readings will be assigned on the course web page
Office Hours • Tuesday/Thursday 1-2 PM • Held in 3532F Boelter Hall • Other times available by prior arrangement • I’m usually around
Class Web Page http://lever.cs.ucla.edu/classes/188_winter15 • Slides for classes will be posted there • By 5 PM the previous afternoon • Readings will be posted there • With links to papers • Also links to other interesting info
Class Projects • Groups of 4-5 students • Implementation of software relevant to distributed systems • Must be demonstrated • Accompanied by a short (10 page) report • Topic to be chosen by end of week 3 • Must submit 1 paragraph description
More on the Projects • Largely handled by our TA • Turker Garip • He will present a set of possible projects • Groups will choose from among those projects • Groups meet regularly with Turker • Details at first recitation section
Tests • Midterm and final • Format will be determined and discussed later
Introduction • What is a distributed system? • Basic issues in distributed systems • Basic architectures for distributed systems
What Is a Distributed System? • A system with more than one active machine • Typically connected by a network • Typically cooperating • Either on one specific task • Or to give the illusion of a bigger, more powerful machine
Some Examples • An office’s local area network • A client/server system • A peer file sharing service • A factory’s industrial control system • A cloud computing environment • Specialized services like DNS
Problems With Distributed Systems • Computations involving multiple machines inherently more difficult • They make resource control harder • They make coordination harder • They make security harder
Basic Issues in Distributed Systems • Transparency • Naming issues • Consistency issues • Failure and recovery issues • Heterogeneity • Security
Transparency • One goal of most distributed systems is to hide the distribution • Make the system look like one computer • Make the network and multiple CPUs transparent • Elusive, difficult, not always as desirable as it seems
Goals of Transparency • Hide where processes execute • Hide where data is stored • Hide where IPC goes to/comes from • Hide effects of failures
Access Transparency • Uniform method of access to local and remote resources • User doesn’t have to worry about where his resources are located • Implies that system must try to make very different operations look the same
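A minimal sketch of access transparency, assuming a hypothetical `Resource` interface: the caller uses the same `read()` call whether the data is local or remote. The "remote node" here is only a dictionary standing in for a network request, and every class and function name is illustrative rather than taken from a real system.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class Resource(ABC):
    """Uniform interface: callers use read() without knowing the location."""

    @abstractmethod
    def read(self) -> bytes:
        ...


class LocalFile(Resource):
    """A file on this machine, read through the ordinary file system."""

    def __init__(self, path: str) -> None:
        self.path = path

    def read(self) -> bytes:
        with open(self.path, "rb") as f:
            return f.read()


class RemoteFile(Resource):
    """Stand-in for a file on another machine (hypothetical).

    A real system would send a request over the network; here the remote
    node is simulated by a dictionary.
    """

    def __init__(self, remote_store: dict, name: str) -> None:
        self.remote_store = remote_store
        self.name = name

    def read(self) -> bytes:
        return self.remote_store[self.name]


def word_count(resource: Resource) -> int:
    # Same code path for local and remote data.
    return len(resource.read().split())


def demo() -> tuple:
    """Count words in one local and one simulated remote resource."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "notes.txt")
        with open(path, "wb") as f:
            f.write(b"stored on this machine")
        local = LocalFile(path)
        remote = RemoteFile({"notes.txt": b"stored on another machine"}, "notes.txt")
        return word_count(local), word_count(remote)
```

The caller never branches on location, but the two `read()` methods do very different work, which is the cost the slide points to.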
Location Transparency • Sometimes called name transparency • Users don’t know where resources are located • Users don’t worry about locations of objects • Users don’t worry when objects move • But system worries a lot
Failure Transparency • System looks the same even when components fail • User is insulated from effects of failures • System works like crazy to pretend five machines are the same as six
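One simple way to approximate failure transparency is to keep several copies of the data and quietly fall back to another copy when a machine is down. The sketch below assumes in-memory replicas and a hypothetical `NodeDown` exception; it illustrates the idea only and is not a description of any particular system.

```python
class NodeDown(Exception):
    """Raised when a (simulated) machine cannot be reached."""


class Replica:
    """One machine holding a copy of the data (simulated in memory)."""

    def __init__(self, name: str, data: dict, up: bool = True) -> None:
        self.name = name
        self.data = data
        self.up = up

    def get(self, key: str) -> str:
        if not self.up:
            raise NodeDown(self.name)
        return self.data[key]


def transparent_get(replicas: list, key: str) -> str:
    """Return the value from the first reachable replica.

    The caller sees a failure only if every replica is down.
    """
    for replica in replicas:
        try:
            return replica.get(key)
        except NodeDown:
            continue  # hide this failure and try the next copy
    raise NodeDown("all replicas unavailable")
```

With three replicas and the first one marked down, `transparent_get` still returns the value, so the user cannot tell that a machine failed. The sketch ignores the harder question of keeping the copies identical.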
Why Is Transparency Important? • Distributed systems are hard • People find even single machines too complicated • The system must handle all unpleasant details that it can
Why Is Transparency Hard? • Transparency implies that the system worries about all the nasty issues • Different local/remote overheads • Hiding/handling failures • Translating user-level names to physical locations • The nasty issues are hard
So What? • Aren’t software systems supposed to handle the nasty details? • Yes, but . . . • If the state of the technology isn’t capable of handling them, transparency can be expensive and constraining • We aren’t smart enough to provide full transparency yet
Naming • One of the key recurring problems in distributed systems • How do you name both local and remote resources? • How do you resolve the names to physical locations?
Naming Local and Remote Resources • Does the resource have the same name locally and remotely? • If not, hard to work with remote resources • If so, requires keeping distributed data consistent
Resolving Names • Standard operating systems can resolve names for local resources • File system names, process names, etc. • All required information is local • How does the system map the name of a remote resource to its remote location? • Keep all required information local? • Or find it remotely?
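The two options on this slide can be sketched as a resolver that checks a local table first and asks a remote directory only when the name is not local. `NameServer` and `Resolver` are hypothetical names, and the remote lookup is an ordinary method call standing in for a network request.

```python
class NameServer:
    """Hypothetical remote directory mapping names to (node, path)."""

    def __init__(self) -> None:
        self.table = {}

    def register(self, name: str, node: str, path: str) -> None:
        self.table[name] = (node, path)

    def lookup(self, name: str):
        return self.table.get(name)


class Resolver:
    """Resolve a name locally when possible, remotely otherwise."""

    def __init__(self, local_names: dict, name_server: NameServer) -> None:
        self.local_names = dict(local_names)
        self.name_server = name_server
        self.remote_lookups = 0  # counts simulated network round trips

    def resolve(self, name: str) -> tuple:
        # Case 1: all required information is local.
        if name in self.local_names:
            return ("local", self.local_names[name])
        # Case 2: find the information remotely.
        self.remote_lookups += 1
        location = self.name_server.lookup(name)
        if location is None:
            raise KeyError(name)
        return location
```

Keeping a local copy of the remote table would avoid the round trip, at the price of having to keep that copy consistent with the name server.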
A Simple Example in Distributed Naming • [Figure: Node 1 and Node 2; labels: Read File X, Create File X, X] • How does node 2 even know File X exists?
Problems in Naming • Naming remote resources • Consistency issues • Scaling issues • Name conflicts • Most of these are related to general problems of keeping distributed data consistent
Consistency • Many distributed systems support distributed computations • User computations running at more than one node • Even if only a data storage system, writes, creates, and deletes raise issues • Unlike multi-process jobs on one node, processes don’t share memory • Only accessible over a network • How do you ensure they’re synchronized?
Typical Problems in Consistency • Consistency in name spaces • Detecting file creations/deletions • Consistency in saved data • If caches/replication used, how does update to one copy change others? • Consistency in system state • Can I even reach agreement on what nodes are working?
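The caching question can be made concrete with a small sketch. It assumes a single server that stamps each write with a version number and a client that caches what it reads; the class names and the version check are illustrative assumptions, not a specific protocol from the course.

```python
class Server:
    """Holds the authoritative copy and a version number for it."""

    def __init__(self) -> None:
        self.value = None
        self.version = 0

    def write(self, value: str) -> None:
        self.value = value
        self.version += 1

    def read(self) -> tuple:
        return self.value, self.version


class CachingClient:
    """Client that keeps a local copy of the server's value."""

    def __init__(self, server: Server) -> None:
        self.server = server
        self.cached = None  # (value, version) once filled

    def read_cached(self) -> str:
        # Fast, but never notices later updates on the server.
        if self.cached is None:
            self.cached = self.server.read()
        return self.cached[0]

    def read_validated(self) -> str:
        # Compare versions first; this costs a round trip on every read.
        if self.cached is None or self.cached[1] != self.server.version:
            self.cached = self.server.read()
        return self.cached[0]
```

After a second write on the server, `read_cached` still returns the old value while `read_validated` returns the new one. The trade-off is a stale copy versus an extra message per read.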
Failure and Recovery • In single machine system, failure typically halts entire system • In distributed system, one failed machine doesn’t halt the system • But what if the failed machine was performing part of a distributed computation?
Heterogeneity • What if not all the machines in the distributed system are the same? • Different processors, different CPU speeds, different configurations, etc. • Even seemingly homogeneous things are actually heterogeneous • Causes great problems
Security Challenges for Distributed Systems • Machines are doing things for remote machines • Do you know who you’re talking to? • Do you understand what he’s asking you to do? • Can you limit your risk? • Can you protect the distributed service that spans several machines?
Distributed System Architectures • Workstation-server model • Peer workstation model • Cloud model • Parallel computer model
Workstation-Server Model • Some machines are dedicated for user client use • Some machines are specially designated servers • Servers have special abilities and responsibilities
Characteristics of Workstation-Server Model • User workstations often lightly utilized • Waste of resources • High responsiveness when needed • Servers may be temporarily or permanently overloaded • Failure of important servers can have serious consequences
Server Systems and Load • Many services are very popular • One machine can’t handle all the load • Typically, divide the load among several machines • But that becomes complicated if clients need to worry about it • Transparent load balancing usually required
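A minimal sketch of transparent load balancing, assuming a round-robin policy (one of several possible policies): clients call a single front end, which spreads requests across the servers behind it. The `LoadBalancer` class and the server names are hypothetical.

```python
from collections import Counter
from itertools import cycle


class LoadBalancer:
    """Single entry point that hides which server handles a request."""

    def __init__(self, servers: list) -> None:
        self._next_server = cycle(servers)  # round-robin order
        self.handled = Counter()            # requests seen per server

    def handle(self, request) -> str:
        server = next(self._next_server)
        self.handled[server] += 1
        # A real front end would forward the request over the network.
        return f"{server} handled {request}"
```

Six requests sent to a balancer with three servers end up two per server, and the client never has to choose a server itself.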
Peer Model • System is made up of individual users’ machines • Each servicing a particular user • Machines may act as servers for each other • But most machines not formally servers
Characteristics of Peer Model • Matches what most people want • My machine interoperates seamlessly with everyone else’s • Scaling challenges • Especially beyond LAN scale • Some peer services popular • NFS • Peer file sharing systems