
Introduction to CS739: Distributed Systems

UNIVERSITY of WISCONSIN-MADISON, Computer Sciences Department. Introduction to CS739: Distributed Systems. CS 739 Distributed Systems. Andrea C. Arpaci-Dusseau. What are distributed systems? What are the benefits and challenges? How will CS739 be structured?


Presentation Transcript


  1. UNIVERSITY of WISCONSIN-MADISON, Computer Sciences Department
     Introduction to CS739: Distributed Systems
     CS 739 Distributed Systems
     Andrea C. Arpaci-Dusseau
     • What are distributed systems?
     • What are the benefits and challenges?
     • How will CS739 be structured?
       • Readings, Writeups, Presentations
       • Projects

  2. Goals of Course
     • Learn about challenges and existing techniques for building distributed systems and services
     • Read and discuss influential papers from SOSP, OSDI, NSDI
     • Gain some experience programming in distributed environment
       • Warm-up project
       • Final project

  3. What is a Distributed System?
     • Leslie Lamport says: “You know you have one when the crash of a computer you never heard of stops you from doing any work”
     • More technical definition: “Collection of independent computers that appears to its users as a single coherent system”
     • How are parallel, distributed, networked systems different?
       • All contain nodes (processing, memory, disk) connected with network
       • [Figure: spectrum from more unified to less unified: parallel, distributed, networked]
     • Consider distributed services as well…

  4. Benefits of Distributed Systems
     • Great price/performance
       • Leverage commodity components (nodes and networks)
       • Use many, many of them
     • Incremental scalability
       • Can add x% new nodes (or disks or memory) to improve performance x%
     • Improved availability
       • Continue operating when some nodes stop working
     • Improved reliability
       • Deliver correct results when some nodes misbehave, corrupt data
     • Allow geographically-distributed individuals to share data or cooperate
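The availability benefit can be made concrete with a small calculation. The sketch below is not from the slides: it assumes each node is up independently with the same probability (the hypothetical parameter `p_up`) and that the service keeps working as long as at least one replica is up. Real failures are often correlated, so treat the numbers as an optimistic bound.

```python
def availability(p_up: float, replicas: int) -> float:
    """Probability that at least one of `replicas` nodes is up.

    Assumes (hypothetically) that nodes fail independently and that
    each one is up with probability `p_up`.
    """
    # The service is down only if every replica is down at once.
    p_all_down = (1.0 - p_up) ** replicas
    return 1.0 - p_all_down


if __name__ == "__main__":
    # Each added replica multiplies the chance of total outage by (1 - p_up).
    for n in (1, 2, 3):
        print(f"{n} replica(s): availability = {availability(0.99, n):.6f}")
```

Under these assumptions, every extra commodity node shrinks the outage probability by a constant factor, which is the intuition behind "use many, many of them".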

  5. Distributed System Challenges
     • Lack of global state information
       • Different nodes have different view of system
         • What are the contents of file A?
         • How many jobs are running on node X?
         • Which nodes are currently part of the system?
       • See delays, different ordering of messages, lost messages, network partitions
       • Tension with goal of “single coherent system”
     • Handling slow, failed and misbehaving nodes
       • How do you avoid slow nodes?
       • How do you get back data or work from failed node?
       • When nodes disagree, how do you know who is wrong?
       • Tension with goal of “available and reliable”
     • When is it okay to have some centralized components?
       • Simplifies state management, but single point-of-failure and performance bottleneck
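The "what are the contents of file A?" question can be illustrated with a toy simulation. This is only a sketch under stated assumptions: the node names, the last-write-wins rule, and the `apply_writes` helper are all hypothetical and are not taken from the course material. It shows three nodes that see the same two writes reordered or dropped, and so end up with different views.

```python
def apply_writes(messages):
    """Apply (file, contents) writes in arrival order; the last write wins."""
    state = {}
    for name, contents in messages:
        state[name] = contents
    return state


# Two clients write to file A at about the same time.
write_1 = ("A", "hello")
write_2 = ("A", "world")

# Hypothetical network behavior: each node sees a different delivery.
node_x = apply_writes([write_1, write_2])  # in order
node_y = apply_writes([write_2, write_1])  # reordered
node_z = apply_writes([write_1])           # write_2 was lost

# Each node gives a different answer to "what are the contents of file A?"
views = {"X": node_x["A"], "Y": node_y["A"], "Z": node_z["A"]}
print(views)
```

No node here is faulty; the disagreement comes purely from message ordering and loss, which is why presenting a "single coherent system" takes extra coordination.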

  6. Content of 739
     • Distributed system courses can be very different…
     • Theoretical: distributed algorithms (e.g., to allow nodes to come to consensus or agreement)
       • 4 lectures
     • Practical: distributed programming (e.g., using RPC, JAVA RMI, CORBA, DCOM, MPI, PVM)
       • Warm-up project
     • Research systems: new ideas for making distributed systems better
       • Focus of course
       • Implemented systems with new conceptual ideas
       • Recent papers in top systems conferences (SOSP, OSDI, NSDI)

  7. Learning by Reading
     • Intense reading list; assume sophisticated reader (736)
     • Usually cover 1 fascinating paper per class
     • No exams
     • Three types of classes
       • Formal lecture: Only for 4 theory topics
       • Discussions: Most papers
         • I ask questions, expect everyone to enthusiastically participate; fairly casual
         • Task 1: Read paper 2-3 times before class
         • Task 2: Email write-up to me BEFORE class
         • Task 3: Take turns being scribe (about 2 times in semester)
           • Write up notes from discussion in LaTeX
           • Post to web page within 72 hours

  8. Learning by Reading (cont)
     • Types of classes (cont)
       • Group-led lectures: 4 topics
         • Small group gives overview of about 3-4 related papers
         • Topics:
           • Distributed system analysis
           • Process migration
           • Programming environments
           • Specialized distributed services
         • Advantages
           • Good practice for giving presentations
           • Learn about topic in slightly more depth
         • Tasks
           • Group:
             • Finalize related papers (1 week before)
             • Present to me (2 days before)
             • Use slides
           • Everyone else: Skim papers
     • Handout: State preferences by next week

  9. Course Topics: Reading List
     • Distributed Operating Systems (Survey, Amoeba vs Sprite)
     • Network File Systems (NFS, Coda, LBFS)
     • Theory: Time, Ordering, and Distributed Snapshots (2 Lamport papers)
     • Analysis of Distributed Systems (1 + Group Presentation)
     • Programming Environments (DSM, MapReduce, Group)
     • Process Migration (1 + Group)
     • Specialized Distributed Services (Porcupine + Group)
     • SPRING BREAK
     • Theory: Consensus (Byzantine failures and fail-stop processors)
     • Cluster-based File Systems (Petal+Frangipani and GoogleFS)
     • Communication Primitives (RPC vs U-Net)
     • P2P Systems (Measurement, CFS, Amazon, Pangaea, LOCKSS)
     • Miscellaneous: Trust, Recovery, Mistakes, Speculation, Sensor Networks

  10. Learning by Doing
     • Warm-up Project
       • Goal: Become familiar with existing distributed programming environments
       • Examples: Hadoop (open-source MapReduce), MPI, PVM
       • Task 0: Get environment running
       • Task 1: Implement simple application (e.g., sorting)
       • Task 2: Report sufficient numbers to indicate did something
     • Final Project
       • Goal 1: Experience with “research process” in general
         • Work on open-ended project, unknown result
         • New idea where don’t know if it will work
       • Goal 2: Learn about specific topic in depth
         • Topic from my list or your own choice; work with project partner
       • Deliverables: 20 minute talk, short research paper
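To give a feel for the warm-up task (a simple application such as sorting in a MapReduce-style environment), here is a rough single-process sketch. It is not Hadoop code and uses no Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `mapreduce_sort`) and the hand-picked `boundaries` are assumptions made for illustration. The idea is range partitioning: the map step tags each record with a partition, each partition is sorted on its own (as a separate reducer could do), and concatenating partitions in order yields a globally sorted result.

```python
from collections import defaultdict


def map_phase(records, boundaries):
    """Tag each record with a partition index based on its key range."""
    for value in records:
        # Partition index = number of boundaries that are <= value.
        partition = sum(1 for b in boundaries if value >= b)
        yield partition, value


def shuffle(pairs):
    """Group values by partition, as the framework would between phases."""
    groups = defaultdict(list)
    for partition, value in pairs:
        groups[partition].append(value)
    return groups


def reduce_phase(groups, num_partitions):
    """Sort each partition independently, then concatenate in partition order."""
    output = []
    for partition in range(num_partitions):
        output.extend(sorted(groups.get(partition, [])))
    return output


def mapreduce_sort(records, boundaries):
    """Sort `records` using range partitioning; `boundaries` must be ascending."""
    groups = shuffle(map_phase(records, boundaries))
    return reduce_phase(groups, len(boundaries) + 1)


if __name__ == "__main__":
    data = [25, 3, 17, 10, 42, 8]
    print(mapreduce_sort(data, boundaries=[10, 20]))
```

In a real environment each partition would live on a different node, and the choice of boundaries would decide how evenly the work is spread; timing runs with different node counts is one way to produce the numbers that Task 2 asks for.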

  11. Agenda for Next Class
     • See website: www.cs.wisc.edu/~cs739-1
     • Read:
       • Survey: Distributed Operating Systems, Andrew S. Tanenbaum and Robbert Van Renesse, ACM Computing Surveys, Volume 17, Issue 4 (December 1985), pp 419-470
       • Long paper: Focus on Sections 1 and 2
     • Answer question:
       • What were the goals of distributed systems at this time? Which design issue (i.e., communication primitives, naming and protection, resource management, fault tolerance, services) seems most challenging (or interesting)? Why?
     • Email answer to me with Subject cs739: Survey
     • Think about group presentation papers
