UNIVERSITY of WISCONSIN-MADISON Computer Sciences Department • Introduction to CS739: Distributed Systems • CS 739 Distributed Systems • Andrea C. Arpaci-Dusseau • What are distributed systems? • What are the benefits and challenges? • How will CS739 be structured? • Readings, Writeups, Presentations • Projects
Goals of Course • Learn about challenges and existing techniques for building distributed systems and services • Read and discuss influential papers from SOSP, OSDI, NSDI • Gain some experience programming in a distributed environment • Warm-up project • Final project
What is a Distributed System? • Leslie Lamport says: “You know you have one when the crash of a computer you never heard of stops you from doing any work” • More technical definition: “Collection of independent computers that appears to its users as a single coherent system” • How are parallel, distributed, networked systems different? • All contain nodes (processing, memory, disk) connected with network • Spectrum from more unified to less unified: parallel, distributed, networked • Consider distributed services as well…
Benefits of Distributed Systems • Great price/performance • Leverage commodity components (nodes and networks) • Use many, many of them • Incremental scalability • Can add x% new nodes (or disks or memory) to improve performance by x% • Improved availability • Continue operating when some nodes stop working • Improved reliability • Deliver correct results when some nodes misbehave or corrupt data • Allow geographically distributed individuals to share data or cooperate
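The availability benefit can be made concrete with a little arithmetic. The sketch below rests on two assumptions that are not in the slides: nodes fail independently, and the service stays up as long as at least one replica is up. The function name and the 99% figure are illustrative only.

```python
def replicated_availability(node_availability: float, replicas: int) -> float:
    """Probability that at least one of `replicas` nodes is up.

    Assumes independent failures (an idealization; real failures are
    often correlated, e.g. through shared power or network).
    """
    if not 0.0 <= node_availability <= 1.0:
        raise ValueError("node_availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # The service is down only if every replica is down at the same time.
    probability_all_down = (1.0 - node_availability) ** replicas
    return 1.0 - probability_all_down


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, replicated_availability(0.99, n))
```

Under these assumptions, two replicas of a node that is up 99% of the time give roughly 99.99% availability. Correlated failures, which the challenges slide hints at with network partitions, would make the real number lower.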
Distributed System Challenges • Lack of global state information • Different nodes have different views of the system • What are the contents of file A? • How many jobs are running on node X? • Which nodes are currently part of the system? • Nodes see delays, different orderings of messages, lost messages, network partitions • Tension with goal of “single coherent system” • Handling slow, failed, and misbehaving nodes • How do you avoid slow nodes? • How do you get back data or work from a failed node? • When nodes disagree, how do you know who is wrong? • Tension with goal of “available and reliable” • When is it okay to have some centralized components? • Simplifies state management, but creates a single point of failure and a performance bottleneck
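A toy simulation can illustrate how different message orderings lead nodes to different answers to "What are the contents of file A?". This is a minimal sketch, not a real protocol: the `Replica` class, the last-delivered-write-wins rule, and the message format are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A node that stores files and applies writes in delivery order."""
    name: str
    files: dict = field(default_factory=dict)

    def deliver(self, message):
        filename, contents = message
        # Hypothetical rule: the most recently delivered write wins.
        self.files[filename] = contents


def simulate():
    """Deliver the same two writes to two replicas in different orders."""
    write_1 = ("A", "version from client 1")
    write_2 = ("A", "version from client 2")

    node_x = Replica("X")
    node_y = Replica("Y")

    # The network delivers the writes to X in one order...
    for message in (write_1, write_2):
        node_x.deliver(message)
    # ...and to Y in the opposite order.
    for message in (write_2, write_1):
        node_y.deliver(message)

    return node_x.files["A"], node_y.files["A"]


if __name__ == "__main__":
    contents_at_x, contents_at_y = simulate()
    print("X sees:", contents_at_x)
    print("Y sees:", contents_at_y)
```

Both replicas received exactly the same messages, yet they disagree about file A. Neither node can tell from its own state alone that the other holds a different value, which is the tension with the "single coherent system" goal.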
Content of 739 • Distributed systems courses can be very different… • Theoretical: distributed algorithms (e.g., to allow nodes to come to consensus or agreement) • 4 lectures • Practical: distributed programming (e.g., using RPC, Java RMI, CORBA, DCOM, MPI, PVM) • Warm-up project • Research systems: new ideas for making distributed systems better • Focus of course • Implemented systems with new conceptual ideas • Recent papers in top systems conferences (SOSP, OSDI, NSDI)
Learning by Reading • Intense reading list; assume sophisticated reader (736) • Usually cover 1 fascinating paper per class • No exams • Three types of classes • Formal lecture: Only for 4 theory topics • Discussions: Most papers • I ask questions, expect everyone to enthusiastically participate; fairly casual • Task 1: Read paper 2-3 times before class • Task 2: Email write-up to me BEFORE class • Task 3: Take turns being scribe (about 2 times in semester) • Write up notes from discussion in LaTeX • Post to web page within 72 hours
Learning by Reading (cont) • Types of classes (cont) • Group-led lectures: 4 topics • Small group gives overview of about 3-4 related papers • Topics: • Distributed system analysis • Process migration • Programming environments • Specialized distributed services • Advantages • Good practice for giving presentations • Learn about topic in slightly more depth • Tasks • Group: • Finalize related papers (1 week before) • Present to me (2 days before) • Use slides • Everyone else: Skim papers • Handout: State preferences by next week
Course Topics: Reading List • Distributed Operating Systems (Survey, Amoeba vs Sprite) • Network File Systems (NFS, Coda, LBFS) • Theory: Time, Ordering, and Distributed Snapshots (2 Lamport papers) • Analysis of Distributed Systems (1 + Group Presentation) • Programming Environments (DSM, MapReduce, Group) • Process Migration (1 + Group) • Specialized Distributed Services (Porcupine + Group) • SPRING BREAK • Theory: Consensus (Byzantine failures and fail-stop processors) • Cluster-based File Systems (Petal+Frangipani and GoogleFS) • Communication Primitives (RPC vs U-Net) • P2P Systems (Measurement, CFS, Amazon, Pangaea, LOCKSS) • Miscellaneous: Trust, Recovery, Mistakes, Speculation, Sensor Networks
Learning by Doing • Warm-up Project • Goal: Become familiar with existing distributed programming environments • Examples: Hadoop (open-source MapReduce), MPI, PVM • Task 0: Get environment running • Task 1: Implement simple application (e.g., sorting) • Task 2: Report enough numbers to show that you did something • Final Project • Goal 1: Experience with “research process” in general • Work on open-ended project, unknown result • New idea where you don’t know if it will work • Goal 2: Learn about specific topic in depth • Topic from my list or your own choice; work with project partner • Deliverables: 20-minute talk, short research paper
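As a rough idea of the shape a sorting application for Task 1 might take, here is a single-process sketch of a split, sort, and merge pattern. It does not use Hadoop, MPI, or PVM. The function names are hypothetical, and each call to `worker_sort` stands in for work that would run on a separate node in a real environment.

```python
import heapq


def split(records, num_workers):
    """Deal records round-robin into one chunk per (simulated) worker."""
    return [records[i::num_workers] for i in range(num_workers)]


def worker_sort(chunk):
    """Work done independently on one node: sort the local chunk."""
    return sorted(chunk)


def distributed_sort(records, num_workers=4):
    """Sort by splitting, sorting each chunk, and merging the sorted runs.

    Everything runs sequentially in one process here; only the structure
    mirrors a distributed job.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    sorted_runs = [worker_sort(chunk) for chunk in split(records, num_workers)]
    # heapq.merge lazily combines already-sorted inputs into one sorted stream.
    return list(heapq.merge(*sorted_runs))


if __name__ == "__main__":
    data = [42, 7, 19, 3, 88, 7, 61, 0, 25]
    print(distributed_sort(data, num_workers=3))
```

In an actual warm-up project, the interesting numbers would come from timing this pattern on real nodes while varying the number of workers and the input size.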
Agenda for Next Class • See website: www.cs.wisc.edu/~cs739-1 • Read: • Survey: Distributed Operating Systems, Andrew S. Tanenbaum and Robbert Van Renesse, ACM Computing Surveys, Volume 17, Issue 4 (December 1985), pp 419-470 • Long paper: Focus on Sections 1 and 2 • Answer question: • What were the goals of distributed systems at this time? Which design issue (i.e., communication primitives, naming and protection, resource management, fault tolerance, services) seems most challenging (or interesting)? Why? • Email answer to me with Subject cs739: Survey • Think about group presentation papers