This lecture covers the concepts and challenges of distributed file systems, including naming and transparency, remote file access, caching, and server with or without state. Different naming strategies and their pros and cons are discussed, along with remote file caching and cache consistency. The case study of Sun's Network File System (NFS) is also presented.
Operating Systems, CMPSCI 377 • Lecture 21: Distributed File Systems • Emery Berger, University of Massachusetts, Amherst
Distributed File Systems • Most common use of distributed systems • Idea: • Given set of disks attached to different nodes, share as if all were attached to every node • Examples: • Edlab: one server, diskless workstations on LAN • AppleShare: nodes are servers with disk & client
Distributed File Systems: Issues • Naming & transparency • Remote file access • Caching • Server with state or without • Replication
Naming & Transparency • Issues • How are files named? • Do filenames reveal location? • Do filenames change if file moves? • Do filenames change if user moves?
Transparency • Location transparency: filename does not reveal physical storage location • Location independence: filename need not change if file’s storage location changes • In practice: • Most naming schemes do not have location independence • Many have location transparency
Naming Strategies: Absolute Names • <machine name, pathname> • Examples: AppleShare, Windows NT • Advantages: • Easy to find fully specified filename • Easy to add & delete new names • No global state • Scales easily • Disadvantages: • User must know complete name – aware of which files are local & which are remote • File is location dependent (cannot move) • Makes sharing harder • Not fault-tolerant
Naming Strategies: Mount Points • Mount points (NFS – Network File System) • Each host has set of local names for remote locations • Mount table (/etc/fstab): specifies <remote pathname @ machine name, local pathname> • At boot: bind local name to remote • Users refer to local pathnames • NFS manages mapping
Mount Points: Pros & Cons • Advantages: • Location transparent • Remote name can change across reboots • Disadvantages: • Single unified strategy hard to maintain • Same file can have different names
NFS Example • Partial contents of /etc/fstab for Edlab:
/usr1/mail@elux3.cs.umass.edu:/var/spool/mail
/users/users1@elsrv1:/users/users1
/courses/cs300@elsrv3:/courses/cs300
/rcf/common@elsrv1:/exp/rcf/common
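The mapping that the mount table provides can be sketched in a few lines of Python. This is only an illustration, not how NFS itself is implemented: it assumes each entry has the layout `remote pathname@machine:local pathname` shown in the Edlab example, and the names `Mount`, `parse_entry`, and `resolve` are hypothetical. A local pathname is matched against the longest mount point that prefixes it.

```python
from __future__ import annotations
from typing import NamedTuple, Optional

class Mount(NamedTuple):
    remote_path: str   # pathname on the exporting machine
    host: str          # machine that holds the files
    local_path: str    # name users see on this node

def parse_entry(entry: str) -> Mount:
    """Parse 'remote_path@host:local_path' (layout assumed from the slide)."""
    remote, rest = entry.split("@", 1)
    host, local = rest.split(":", 1)
    return Mount(remote, host, local)

def resolve(path: str, table: list[Mount]) -> Optional[tuple[str, str]]:
    """Map a local pathname to (host, remote pathname); None means purely local."""
    best = None
    for m in table:
        prefix = m.local_path.rstrip("/")
        if path == prefix or path.startswith(prefix + "/"):
            # Prefer the most specific (longest) mount point.
            if best is None or len(prefix) > len(best.local_path.rstrip("/")):
                best = m
    if best is None:
        return None
    suffix = path[len(best.local_path.rstrip("/")):]
    return best.host, best.remote_path.rstrip("/") + suffix

EDLAB = [parse_entry(e) for e in (
    "/usr1/mail@elux3.cs.umass.edu:/var/spool/mail",
    "/users/users1@elsrv1:/users/users1",
    "/courses/cs300@elsrv3:/courses/cs300",
    "/rcf/common@elsrv1:/exp/rcf/common",
)]
```

With this table, a user who opens `/var/spool/mail/alice` never sees the machine name; the lookup alone decides that the file lives on `elux3.cs.umass.edu` under `/usr1/mail`.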
Naming Strategies: Global Name Space • Single name space: • No matter which node you are on, filenames remain the same • Examples: • AFS (CMU’s Andrew File System) • Sprite (Berkeley) • Client: gets filename structure from server(s) • When users access files, server sends copies to workstation, where they are cached
Global Name Space: Pros & Cons • Advantages: • Naming – consistent & easy to keep consistent • Ensures all files look the same regardless of where you log in • Late binding of names ⇒ moving them is easier • Disadvantages: • Difficult for OS to keep files consistent (caching) • Global name space may limit flexibility • Performance issues
Remote File Access & Caching • Can access files: • Remotely (remote service): server performs each access and returns results via RPC • Locally (caching): transfer part of file to client, perform access on the local copy • Caching issues: • Where & when are file blocks cached? • When are modifications propagated back to remote file? • What happens when multiple clients cache same file?
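The difference between remote service and caching shows up in how many requests cross the network. The sketch below is illustrative only: `RemoteFile` stands in for the server, `BLOCK_SIZE` is deliberately tiny, and all names are hypothetical. The caching reader fetches whole blocks once and serves later reads from its local copy.

```python
BLOCK_SIZE = 4  # tiny on purpose, to keep the example easy to trace

class RemoteFile:
    """Stand-in for a file on the server; counts requests it receives."""
    def __init__(self, data):
        self.data = data
        self.requests = 0

    def read(self, offset, count):
        self.requests += 1
        return self.data[offset:offset + count]

def read_remote_service(remote, offset, count):
    """Remote service: every read is one request to the server."""
    return remote.read(offset, count)

class CachingReader:
    """Caching: fetch whole blocks once, then read from the local copies."""
    def __init__(self, remote):
        self.remote = remote
        self.blocks = {}  # block number -> block contents

    def read(self, offset, count):
        out = bytearray()
        pos, end = offset, offset + count
        while pos < end:
            block_no = pos // BLOCK_SIZE
            if block_no not in self.blocks:  # cache miss: one remote request
                self.blocks[block_no] = self.remote.read(
                    block_no * BLOCK_SIZE, BLOCK_SIZE)
            start = pos - block_no * BLOCK_SIZE
            take = min(end - pos, BLOCK_SIZE - start)
            chunk = self.blocks[block_no][start:start + take]
            if not chunk:  # reached end of file
                break
            out += chunk
            pos += len(chunk)
        return bytes(out)
```

Repeating the same read through `CachingReader` costs no further requests, which is exactly why the consistency questions listed on the slide arise: the server no longer sees every access.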
Remote File Caching • Local disk: • Reduces access time (compared to remote) • Safe if node fails • Difficult to keep local copy consistent with remote copy • Requires client to have disk! • Local memory: • Quick access time • Works without disks • Difficult to keep local copy consistent with remote copy • Smaller cache size • Not fault-tolerant
Cache Update Policies • Write-through: write to remote disk • Reliable • Low performance: every write is a remote service call • Write-back: write only to cache • Write to disk on evictions, periodic sync • Quick • Reduces network traffic (repeated writes to same block) • User machine crashes ⇒ data loss
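The trade-off between the two policies can be made concrete with a minimal sketch. The classes below are hypothetical stand-ins (no real networking or eviction): `RemoteDisk` counts the writes that reach the server, so repeated writes to one block show the traffic difference, and anything still marked dirty is what a client crash would lose.

```python
class RemoteDisk:
    """Stand-in for the server's disk; counts writes that cross the network."""
    def __init__(self):
        self.blocks = {}
        self.writes = 0

    def write(self, block_no, data):
        self.blocks[block_no] = data
        self.writes += 1

class WriteThroughCache:
    def __init__(self, remote):
        self.remote = remote
        self.cache = {}

    def write(self, block_no, data):
        self.cache[block_no] = data
        self.remote.write(block_no, data)  # every write goes to the server

class WriteBackCache:
    def __init__(self, remote):
        self.remote = remote
        self.cache = {}
        self.dirty = set()  # blocks modified locally but not yet on the server

    def write(self, block_no, data):
        self.cache[block_no] = data
        self.dirty.add(block_no)  # no network traffic yet

    def flush(self):
        """Called on eviction or periodic sync: push dirty blocks to the server."""
        for block_no in sorted(self.dirty):
            self.remote.write(block_no, self.cache[block_no])
        self.dirty.clear()
```

Three writes to the same block cost three remote writes under write-through but only one (at flush time) under write-back; until that flush, the server's copy is stale.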
Cache Consistency • Client-initiated consistency: client contacts server and checks consistency • Can check every access • Can check at given intervals • Can check only upon opening a file • Server-initiated consistency: server detects potential conflicts, invalidates caches • Server needs to know which clients have cached which parts of which files • Server must know which clients are readers & which are writers
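Both approaches can be sketched with one toy server and client. This is a simplified illustration under stated assumptions: files carry a version number (an assumption, not something the slides specify), whole files are cached rather than parts, and the names `Server`, `Client`, `check`, and `invalidate` are hypothetical. The client-initiated path compares versions on open; the server-initiated path requires the server to remember who caches what.

```python
class Server:
    def __init__(self):
        self.files = {}    # name -> (version, data)
        self.cachers = {}  # name -> set of clients holding a cached copy

    def version(self, name):
        return self.files[name][0]

    def fetch(self, name, client):
        self.cachers.setdefault(name, set()).add(client)  # state kept per client
        return self.files[name]

    def write(self, name, data, invalidate=False):
        version = self.files.get(name, (0, b""))[0] + 1
        self.files[name] = (version, data)
        if invalidate:  # server-initiated: notify every caching client
            for client in self.cachers.pop(name, set()):
                client.invalidate(name)

class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}    # name -> (version, data)

    def invalidate(self, name):
        self.cache.pop(name, None)

    def open(self, name, check=True):
        cached = self.cache.get(name)
        # Client-initiated: compare versions with the server when opening.
        stale = cached is not None and check and cached[0] != self.server.version(name)
        if cached is None or stale:
            cached = self.server.fetch(name, self)
            self.cache[name] = cached
        return cached[1]
```

Opening with `check=False` models a client that skips validation: it sees stale data unless the server invalidated its copy, which is the cost of checking less often.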
Case Study: Sun’s Network File System • NFS: standard for distributed UNIX file access • Designed to run on LANs • Nodes are both servers & clients • Servers have no state • Uses mount protocol to make global name local • /etc/exports: lists local names server willing to export • /etc/fstab: lists global names that local nodes import • Corresponding global name must be in /etc/exports on server
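What "servers have no state" buys can be shown with a toy contrast. This sketch is not the NFS wire protocol; it only assumes, as NFS read requests do, that each request carries the file handle, offset, and count. The class names and the `restart` method are hypothetical.

```python
class StatelessServer:
    """Each read names the file, offset, and count; nothing is remembered."""
    def __init__(self, files):
        self.files = files  # file handle -> contents

    def read(self, handle, offset, count):
        return self.files[handle][offset:offset + count]

    def restart(self):
        pass  # no per-client state to lose

class StatefulServer:
    """Server tracks an open-file table with a current offset per session."""
    def __init__(self, files):
        self.files = files
        self.sessions = {}  # session id -> [handle, offset]

    def open(self, handle):
        session = len(self.sessions) + 1
        self.sessions[session] = [handle, 0]
        return session

    def read(self, session, count):
        handle, offset = self.sessions[session]  # KeyError after a restart
        self.sessions[session][1] = offset + count
        return self.files[handle][offset:offset + count]

    def restart(self):
        self.sessions.clear()  # a crash wipes the open-file table
```

After a restart, a client of the stateless server simply resends its request with the offset it was tracking itself; a client of the stateful server holds a session id the server no longer recognizes.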
NFS Implementation • NFS defines set of RPC operations for remote file access: • Directory search, reading directory entries • Manipulating links & directories • Accessing file attributes • Reading/writing files • Does not rely on node homogeneity • Heterogeneous nodes support NFS mount & remote access protocols using RPC • Users may need to know different names depending on which node they log on to
Summary • Distributed File Systems • Naming & transparency • Remote file access • Caching • Server with state or without • Replication