Introduction
Readings • Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, 3rd edn. • Note: all figures are from this book
Who Can Take This Class? • Anyone with some interest in distributed systems and data management • Prerequisite: • Undergraduate operating systems course (or undergraduate networking course) • The course involves programming, and you should be willing to learn new technologies
Course Structure • Lectures • Papers will be assigned for you to review • You are expected to participate in class • Three assignments • Programming • Group work • Survey paper and presentation
Course Readings • No official textbook • Lectures are based (mostly) on research papers • UWO has online subscriptions to many of the conferences and journals in which these papers appear
[Figure: A typical portion of the Internet, showing an intranet, an ISP, a backbone and a satellite link; legend: desktop computer, server, network link]
Services and the Internet • Increasingly, applications are moving from the PC to the Internet, e.g., • Email – Gmail, Yahoo • Photo management – Picasa, Kodak, Shutterbug • Word processing – Google Apps • Why? • Less work on the user’s behalf • Potentially lower cost for the user
Data Centers • Supporting this move from the PC to the “Internet” requires a large number of servers, storage, network support, etc. • Companies like Amazon, Google and eBay run data centers with tens of thousands of machines • For users to trust these systems, a number of issues must be addressed, e.g., failure handling
Reasons for Distributed Systems • Share data and information despite geography • Examples: • Internet: Wikipedia, email, web sites • Intranet: networked file systems • Availability • Hardware can fail: power outages, disk failures, memory corruption • Software can fail: bugs, misconfiguration • Achieving high availability requires replicating data/computation with automatic failover • Aggregate the resources of many computers • CPU: Dryad, MapReduce • Bandwidth: Akamai CDN, BitTorrent
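To make the "aggregate CPU" point concrete, here is a minimal single-process sketch of the MapReduce programming model (map, shuffle, reduce) applied to word count. It is an illustration under stated assumptions, not the API of Google's MapReduce or Microsoft's Dryad: threads from Python's standard library stand in for the worker machines a real cluster would use, and all function names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(document: str) -> list[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all intermediate values by their key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine every value emitted for one key."""
    return key, sum(values)


def word_count(splits: list[str], workers: int = 4) -> dict[str, int]:
    # Threads are a stand-in for separate machines (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, splits))
    intermediate = [pair for chunk in mapped for pair in chunk]
    groups = shuffle(intermediate)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        reduced = list(pool.map(lambda kv: reduce_phase(*kv), groups.items()))
    return dict(reduced)


print(word_count(["the cat sat", "the dog sat", "a cat ran"]))
```

The appeal of the model is that map tasks are independent and reduce tasks only need the values for their own keys, so both phases can be spread across many machines.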
Challenges • Consistency • Sharing data consistently among multiple readers/writers • Fault tolerance • Keeping the system available despite node or network failures • Security • Authenticating clients or servers • Defending against or auditing misbehaving servers
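One common way for a client and server that already share a secret key to authenticate each other's messages is a message authentication code (MAC). The sketch below uses Python's standard `hmac` module with HMAC-SHA256; the key and the request format are hypothetical. It detects tampering in transit, but it does not address key distribution, replayed messages or a compromised server.

```python
import hashlib
import hmac

# Hypothetical pre-shared key; a real system would provision and rotate keys securely.
SHARED_KEY = b"example-shared-secret"


def sign(message: bytes, key: bytes = SHARED_KEY) -> bytes:
    """Compute an HMAC-SHA256 tag that travels alongside the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(message: bytes, tag: bytes, key: bytes = SHARED_KEY) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)


request = b"WRITE /home/alice/notes.txt offset=0 data=hello"
tag = sign(request)
print(verify(request, tag))   # True: the message arrived unchanged

tampered = request.replace(b"alice", b"mallory")
print(verify(tampered, tag))  # False: the modification is detected
```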
Challenges • System design • Right interface or abstraction? • Implementation • Maximize concurrency • Identify bottlenecks • Reduce load on the bottleneck resource
Case Study: Distributed File System • A distributed file system transparently gives multiple clients shared access to common storage • NFS is one of the earliest examples • NFS Design • A file server stores all data and handles requests from clients
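The "one server stores everything and handles client requests" design can be sketched as a toy in-memory server. This is a hypothetical illustration, not the NFS protocol: there is no networking, and the operation names and `Request` fields are assumptions. Each request names the file and offset explicitly, so the server keeps no per-client session state, loosely in the spirit of NFS versions 2 and 3, whose servers were designed to be stateless.

```python
from dataclasses import dataclass


@dataclass
class Request:
    op: str            # "read" or "write" (hypothetical operation names)
    path: str
    offset: int = 0
    data: bytes = b""
    count: int = 0


class FileServer:
    """Toy central file server: holds all file data and answers client requests."""

    def __init__(self) -> None:
        self.files: dict[str, bytearray] = {}

    def handle(self, req: Request) -> bytes:
        # Every request carries path and offset, so no per-client state is kept.
        if req.op == "write":
            buf = self.files.setdefault(req.path, bytearray())
            end = req.offset + len(req.data)
            if len(buf) < end:
                buf.extend(b"\x00" * (end - len(buf)))  # zero-fill any gap
            buf[req.offset:end] = req.data
            return b"OK"
        if req.op == "read":
            buf = self.files.get(req.path, bytearray())
            return bytes(buf[req.offset:req.offset + req.count])
        raise ValueError(f"unknown op: {req.op}")


server = FileServer()
server.handle(Request("write", "/shared/todo.txt", data=b"buy milk"))
print(server.handle(Request("read", "/shared/todo.txt", offset=4, count=4)))  # b'milk'
```

Because every client goes through this one process, it is both a single point of failure and a potential bottleneck, which motivates the replication questions that follow.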
Case Study • Consistency • When a client edits a shared file, when do other clients see the results? • Fault Tolerance • How to keep the system running when the file server is down? • Replicate data at multiple servers • How to update replicated data? • How to fail over among replicas? • How to maintain consistency?
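The replication questions above can be made concrete with a deliberately naive sketch: the client writes to every reachable replica and reads from the first replica that answers. `Replica` and `ReplicatedClient` are hypothetical in-memory stand-ins for servers, and the `up` flag simulates a crash. The usage shows failover working, and also how a replica that missed an update can later serve stale data.

```python
class Replica:
    """Hypothetical storage server holding one copy of the data."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.up = True  # set to False to simulate a crashed server
        self.data: dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        self.data[key] = value

    def get(self, key: str) -> str:
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        return self.data[key]


class ReplicatedClient:
    """Write to every reachable replica; read from the first one that responds."""

    def __init__(self, replicas: list[Replica]) -> None:
        self.replicas = replicas

    def write(self, key: str, value: str) -> int:
        acks = 0
        for replica in self.replicas:
            try:
                replica.put(key, value)
                acks += 1
            except ConnectionError:
                pass  # a down replica silently misses this update
        if acks == 0:
            raise ConnectionError("no replica accepted the write")
        return acks

    def read(self, key: str) -> str:
        for replica in self.replicas:  # fail over in a fixed order
            try:
                return replica.get(key)
            except ConnectionError:
                continue
        raise ConnectionError("all replicas are down")


primary, backup = Replica("primary"), Replica("backup")
client = ReplicatedClient([primary, backup])

client.write("report.txt", "v1")   # both replicas store v1
primary.up = False
print(client.read("report.txt"))   # v1, served by backup after failover

primary.up, backup.up = True, False
client.write("report.txt", "v2")   # only primary receives v2
primary.up, backup.up = False, True
print(client.read("report.txt"))   # v1: backup missed the update (stale read)
```

The final read is exactly the consistency problem the slide raises: failover keeps the system available, but without a protocol for bringing the recovered replica up to date, clients can observe old data.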
Case Study • Security • An adversary can manipulate messages • How to authenticate? • An adversary may compromise machines • Can the file system remain correct despite a few compromised nodes? • Implementation • The file server should serve multiple clients concurrently • Server threads need to modify shared state • Avoid race conditions
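A race condition arises when server threads perform an unprotected read-modify-write on shared state. The minimal sketch below assumes a hypothetical open-file table shared by all server threads and guards it with a `threading.Lock`, so concurrent increments cannot be lost. The class and function names are illustrative only.

```python
import threading


class OpenFileTable:
    """Hypothetical shared server state: how many opens each file has received."""

    def __init__(self) -> None:
        self._counts: dict[str, int] = {}
        self._lock = threading.Lock()

    def open(self, path: str) -> None:
        # The read-modify-write below must be atomic; without the lock, two
        # threads could read the same old count and one increment would be lost.
        with self._lock:
            self._counts[path] = self._counts.get(path, 0) + 1

    def count(self, path: str) -> int:
        with self._lock:
            return self._counts.get(path, 0)


def serve_client(table: OpenFileTable, requests: int) -> None:
    """Simulate one server thread handling a client's open requests."""
    for _ in range(requests):
        table.open("/shared/log.txt")


table = OpenFileTable()
threads = [threading.Thread(target=serve_client, args=(table, 10_000)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(table.count("/shared/log.txt"))  # 80000
```

A single coarse lock is the simplest correct choice; it also serializes every update, so a busy server might move to finer-grained locking (for example, one lock per file) to regain concurrency.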
Other Examples • It’s not just file servers • What about companies like Amazon? Facebook? Google?
Summary • We took a brief, high-level look at distributed systems and data management • In this course we will discuss various architectures and algorithms, and how they are used by major companies like Amazon and Google