210 likes | 418 Views
Operating Systems CMPSCI 377 Lecture 21: Distributed File Systems. Emery Berger University of Massachusetts Amherst. Distributed File Systems. Most common use of distributed systems Idea: Given set of disks attached to different nodes, share as if all were attached to every node Examples:
E N D
Operating SystemsCMPSCI 377Lecture 21: Distributed File Systems Emery Berger University of Massachusetts Amherst
Distributed File Systems • Most common use of distributed systems • Idea: • Given set of disks attached to different nodes,share as if all were attached to every node • Examples: • Edlab: one server, workstations on LAN • AppleShare: nodes are servers with disk & client
Distributed File Systems: Issues • Naming & transparency • Remote file access • Caching • Server with state or without • Replication
Naming & Transparency • Issues • How are files named? • Do filenames reveal location? • Do filenames change if file moves? • Do filenames change if user moves?
Transparency • Location transparency: filename does not reveal physical storage location • Location independence: filename need not change if file’s storage location changes • In practice: • Most naming schemes do not have location independence • Many have location transparency
Naming Strategies:Absolute Names • Disadvantages: • User must know complete name – aware of which files are local & which are remote • File is location dependent (cannot move) • Makes sharing harder • Not fault-tolerant • <machine name, pathname> • Examples: AppleShare, Windows NT • Advantages: • Easy to find fully specified filename • Easy to add & delete new names • No global state • Scales easily
Naming Strategies:Mount Points • Mount points (NFS – Network File System) • Each host has set of local names for remote locations • Mount table (/etc/fstab): specifies <remote pathname @ machine name, local pathname> • At boot: bind local name to remote • Users refer to local pathnames • NFS manages mapping
Mount Points: Pros & Cons • Advantages: • Location transparent • Remote name can change across reboots • Disadvantages: • Single unified strategy hard to maintain • Same file can have different names
NFS Example • Partial contents of /etc/fstab for Edlab: /usr1/mail@elux3.cs.umass.edu:/var/spool/mail /users/users1@elsrv1:/users/users1 /courses/cs300@elsrv3:/courses/cs300 /rcf/common@elsrv1:/exp/rcf/common
Naming Strategies:Global Name Space • Single name space: • Examples: • AFS (CMU’s Andrew File System) • Sprite (Berkeley) • No matter which node you are on,filenames remain the same • Client: gets filename structure from server(s) • When users access files, server sends copies to workstation, where they are cached
Global Name Space: Pros & Cons • Advantages: • Naming – consistent • Ensures all files are same regardless of where you login • Late binding of names ) moving them is easier • Disadvantages: • Difficult for OS to keep files consistent (caching) • Global name space may limit flexibility • Performance issues
Distributed File Systems: Issues • Naming & transparency • Remote file access • Caching • Server with state or without • Replication
Remote File Access & Caching • Can access files: • Remotely: returns results using RPC = remote service • Transfer part of file, perform local access = caching • Caching issues: • Where & when are file blocks cached? • When are modifications propagated back to remote file? • What happens when multiple clients cache same file?
Remote File Caching • Local disk: • Reduces access time (compared to remote) • Safe if node fails • Difficult to keep local copy consistent with remote copy • Requires client to have disk! • Local memory: • Quick access time • Works without disks • Difficult to keep local copy consistent with remote copy • Smaller cache size • Not fault-tolerant
Cache Update Policies • Write-through: write to remote disk • Reliable • Low-performance = remote service for all writes • Write-back: write only to cache • Write to disk on evictions, periodic synch • Quick • Reduces network traffic (repeated writes to same block) • User machine crashes ) data loss
Cache Consistency • Client-initiated consistency:client contacts server and checks consistency • Can check every access, • at given intervals, • only upon opening a file • Server-initiated consistency:server detects potential conflicts, invalidates caches • Server needs to know which clients have cached which parts of which files • which clients are readers & which are writers
Case Study:Sun’s Network File System • NFS: standard for distributed UNIX file access • Designed to run on LANs • Nodes are both servers & clients • Servers have no state • Uses mount protocol to make global name local • /etc/exports: lists local names server willing to export • /etc/fstab: lists global names that local nodes import • Corresponding global name must be in /etc/exports on server
NFS Implementation • NFS defines set of RPC operations for remote file access: • Directory search, reading directory entries • Manipulating links & directories • Accessing file attributes • Reading/writing files • Does not rely on node homogeneity • Heterogeneous nodes support NFS mount & remote access protocols using RPC • Users may need to know different names depending upon which node they log on
Summary • Distributed File Systems • Naming & transparency • Remote file access • Caching • Server with state or without • Replication