Distributed File Systems Group A5 Amit Sharma Dhaval Sanghvi Ali Abbas
Outline • What is a DFS • Requirements of a DFS • Sun Network File System • History • Architecture • Protocols • Implementation
Basics • File: named collection of logically related data • Unix file: an uninterpreted sequence of bytes • File system: • Provides a logical view of data and storage functions • User-friendly interface • Provides facility to create, modify, organize, and delete files • Provides sharing among users in a controlled manner • Provides protection
What is a DFS • A distributed implementation of the classical time-sharing model of a file system, in which multiple users share files and storage resources. • The overall storage space managed by a DFS consists of different, remotely located, smaller storage spaces.
Requirements • Transparency: • Access transparency • Location transparency • Mobility transparency • Failure transparency • Performance transparency
Other Requirements • Scaling • Security • Hardware and operating system heterogeneity
Sun’s Network File System • Introduced by Sun Microsystems in 1985 • Sun published the protocol and licensed reference implementation • Since then, NFS has been supported by every Unix variant
NFS design objectives • Machine and OS independence, no recompilation of applications • Crash recovery • Transparent access • Reasonable performance (comparable to local FS)
NFS - The basic idea • Allow an arbitrary collection of clients and servers to share a common file system • In most cases all clients and servers are on the same LAN • Each machine can be both a client and a server • Each NFS server exports one or more of its directories for access by remote clients
NFS - The basic idea (cont.) • When a directory is made available, so are all of its subdirectories • Whole directory trees are exported by NFS as a unit • The list of directories a server exports is maintained in the /etc/exports file • Communication uses RPC with XDR data encoding
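The subtree rule can be sketched as a simple path check. This is only an illustration, not how a real NFS server parses /etc/exports; the function name `is_exported` and the sample directories are assumptions made for the example.

```python
from pathlib import PurePosixPath

def is_exported(path, exported_dirs):
    """Return True if path is an exported directory or lies beneath one."""
    target = PurePosixPath(path)
    for root in exported_dirs:
        root_path = PurePosixPath(root)
        # A directory is exported together with its whole subtree.
        if target == root_path or root_path in target.parents:
            return True
    return False

# Hypothetical export list, standing in for the directories named in /etc/exports.
exports = ["/home", "/usr/local"]
print(is_exported("/home/alice/notes.txt", exports))  # under /home
print(is_exported("/usr/bin", exports))               # not under any export
```

Comparing path components rather than string prefixes keeps a directory such as /homework from being mistaken for part of /home.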
NFS - How do we get the files • Mount protocol • Clients access shared file systems by mounting them from an NFS server machine • Where? At a mount point • Mount point? An empty directory or subdirectory, created as a place to attach a remote file system.
How do we get the files (cont.) • server returns a file handle to the client. • The file handle contains fields uniquely identifying • the file system type (ext2, vfat, Novell, BSD, NeXTSTEP..) • the disk • the i-node number of the directory • and security information
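A file handle with the fields listed above might be modelled as follows. The layout (four unsigned 32-bit integers in network byte order) and all field names are hypothetical; real handles are opaque byte strings whose format is private to the server.

```python
import struct
from dataclasses import dataclass

# Hypothetical fixed layout: four unsigned 32-bit integers, network byte order.
_LAYOUT = "!IIII"

@dataclass(frozen=True)
class FileHandle:
    fs_type: int    # code for the file system type
    device: int     # which disk holds the file
    inode: int      # i-node number of the file or directory
    security: int   # placeholder for the security information

    def pack(self) -> bytes:
        """Encode the handle as the opaque bytes a client would store."""
        return struct.pack(_LAYOUT, self.fs_type, self.device, self.inode, self.security)

    @classmethod
    def unpack(cls, raw: bytes) -> "FileHandle":
        """Decode bytes sent back by a client (server side only)."""
        return cls(*struct.unpack(_LAYOUT, raw))

handle = FileHandle(fs_type=1, device=8, inode=4711, security=3)
opaque = handle.pack()
print(len(opaque), FileHandle.unpack(opaque) == handle)
```

The client never interprets these bytes; it only returns them to the server with each later request.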
How do we get the files (cont.) • The server daemons: • nfsd: the NFS daemon, which services file requests from NFS clients. • mountd: the NFS mount daemon, which handles mount requests from clients and returns the initial file handle. • portmap: the portmapper daemon, which lets NFS clients find out which port the NFS server is using.
VFS • VFS allows diverse specific file systems to coexist in a file tree, isolating all FS-dependencies in pluggable filesystem modules. • VFS was an internal kernel restructuring with no effect on the syscall interface. • VFS layer maintains a table with one entry for each open file
VFS 2 • The VFS layer has an entry called a v-node (virtual i-node) for every open file. • V-nodes are used to tell whether the file is local or remote. • A v-node points either to an i-node, when the file is on the local disk, or to an r-node in the NFS client code, when the reference is to data on a remote disk. • All state information on open files is stored on the client's side.
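The v-node indirection can be sketched as a small dispatch structure. The class names mirror the slide's terms, but the fields and methods are assumptions for illustration and do not reflect any kernel's actual data structures.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class INode:
    number: int
    def read(self) -> str:
        return f"local read from i-node {self.number}"

@dataclass
class RNode:
    server: str
    handle: bytes   # opaque file handle obtained from the server
    def read(self) -> str:
        return f"remote read from {self.server} via NFS client"

@dataclass
class VNode:
    target: Union[INode, RNode]

    @property
    def is_remote(self) -> bool:
        return isinstance(self.target, RNode)

    def read(self) -> str:
        # The caller never needs to know which kind of node is behind the v-node.
        return self.target.read()

# Hypothetical open-file table: descriptor -> v-node.
open_files = {
    3: VNode(INode(number=120)),
    4: VNode(RNode(server="fileserver", handle=b"\x00\x01")),
}
for fd, vnode in open_files.items():
    print(fd, vnode.is_remote, vnode.read())
```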
Vnode use • To mount a remote file system, the system administrator (or /etc/rc) calls the mount program • The kernel constructs a v-node for the remote directory and asks the NFS client code to create an r-node in its internal tables. • A v-node in the client VFS points either to a local i-node or to an r-node.
NFS implementation • Servers are stateless: each request carries complete information and does not rely on previous state • Most requests are idempotent, so a client can safely repeat one • The user’s identity must be verified for each request • Most UNIX system calls are supported except open and close, which would require the server to keep state
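A stateless read can be sketched as a request that names the file, offset, and length itself. The class `StatelessFileServer` and the handle string "fh1" are hypothetical; this toy keeps file contents in memory only to show that no open-file table is involved.

```python
class StatelessFileServer:
    """Toy server: stores file contents but no per-client or per-open state."""

    def __init__(self, files):
        self.files = files  # maps a file handle to the file's bytes

    def read(self, handle, offset, count):
        # Everything needed is in the request, so no earlier open is required.
        return self.files[handle][offset:offset + count]

server = StatelessFileServer({"fh1": b"distributed file systems"})
first = server.read("fh1", 0, 11)
again = server.read("fh1", 0, 11)   # a retransmitted duplicate of the same request
print(first, first == again)
```

Because the server remembers nothing between calls, it can crash and restart without the client having to reopen anything.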
Idempotent • idem·po·tent • Pronunciation: 'I -d&m-"pO-t&nt • Date: 1870 • : relating to or being a mathematical quantity which when applied to itself under a given binary operation (as multiplication) equals itself; • also : relating to or being an operation under which a mathematical quantity is idempotent
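In the NFS setting the term is applied to requests: repeating an idempotent request leaves the same result as issuing it once. The sketch below contrasts a write at an absolute offset with an append; both helper functions are illustrative assumptions, not NFS procedures.

```python
def write_at(data: bytes, offset: int, chunk: bytes) -> bytes:
    """Idempotent: overwrite chunk at an absolute offset."""
    padded = data.ljust(offset, b"\x00")   # zero-fill if writing past the end
    return padded[:offset] + chunk + padded[offset + len(chunk):]

def append(data: bytes, chunk: bytes) -> bytes:
    """Not idempotent: every application makes the file longer."""
    return data + chunk

original = b"hello world"
once = write_at(original, 6, b"there")
twice = write_at(once, 6, b"there")
print(once == twice)   # repeating the offset write changes nothing further

appended_once = append(original, b"!")
appended_twice = append(appended_once, b"!")
print(appended_once == appended_twice)   # a repeated append gives a different file
```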
Semantics of file sharing • On a single processor, when a read follows a write, the value returned by the read is the value just written. • In a distributed system with caching, obsolete values may be returned.
Semantics of file sharing • Method: Comment • UNIX semantics: Every operation on a file is instantly visible to all processes • Session semantics: No changes are visible to other processes until the file is closed • Immutable files: No updates are possible; simplifies sharing and replication • Transaction: All changes occur atomically • NFS implements session semantics
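Session semantics can be illustrated with a toy model in which each open takes a private copy and publishes it on close. The classes `SharedFile` and `Session` are assumptions for the example and say nothing about how an NFS client is actually built.

```python
class SharedFile:
    def __init__(self, content=""):
        self.content = content

class Session:
    """Work on a private copy taken at open; publish it at close."""

    def __init__(self, shared):
        self.shared = shared
        self.local = shared.content   # snapshot taken when the file is opened

    def write(self, text):
        self.local += text            # visible only inside this session

    def read(self):
        return self.local

    def close(self):
        self.shared.content = self.local

f = SharedFile("v1")
writer = Session(f)
writer.write("+edit")
before = Session(f).read()   # opened while the writer is still open
writer.close()
after = Session(f).read()    # opened after the writer closed
print(before, after)
```

Under UNIX semantics the first reader would already have seen the edit; here it appears only to sessions opened after the close.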
Caching • The cache consistency problem: cached data may become stale if cached data is updated elsewhere in the network. • NFS solution: • Timestamp invalidation. Timestamp each cache entry, and periodically query the server: “has this file changed since time t?”; invalidate cache if stale.
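Timestamp invalidation can be sketched as a comparison between the cached modification time and the server's current one. The classes and the `fetches` counter are hypothetical, and this version checks on every read for simplicity, whereas the slide describes a periodic check.

```python
class Server:
    def __init__(self):
        self.data = {}   # name -> (mtime, contents)

    def write(self, name, contents, now):
        self.data[name] = (now, contents)

    def getattr(self, name):
        return self.data[name][0]   # only the modification time

    def read(self, name):
        return self.data[name]

class ClientCache:
    def __init__(self, server):
        self.server = server
        self.entries = {}   # name -> (mtime, contents)
        self.fetches = 0    # counts full reads from the server

    def read(self, name):
        cached = self.entries.get(name)
        # "Has this file changed since time t?" is answered by comparing mtimes.
        if cached is not None and cached[0] == self.server.getattr(name):
            return cached[1]
        self.fetches += 1
        self.entries[name] = self.server.read(name)
        return self.entries[name][1]

server = Server()
server.write("a.txt", "old", now=1)
client = ClientCache(server)
print(client.read("a.txt"), client.read("a.txt"), client.fetches)
server.write("a.txt", "new", now=2)   # update made elsewhere in the network
print(client.read("a.txt"), client.fetches)
```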
NFS Client Caching • Where? In the main memory of clients • What? File blocks, translations of file names to vnodes, and attributes of files and directories. • (1) File blocks: cached with the file's timestamp (when last modified on the server). • After a certain age, blocks have to be validated with the server • Delayed-write policy: modified blocks are flushed to the server after a certain delay
NFS Client Caching • Clients do not free delayed-write blocks until the server confirms that the data have been written to disk. • (2) Caching of file names to vnodes for remote directory access • speeds up the lookup procedure • (3) Caching of file and directory attributes • updated when new attributes received from server, discarded after certain time
NFS Client Caching • Writes: • The block is marked dirty and scheduled for flushing. • Flushing happens when the file is closed or a sync occurs at the client. • What if multiple clients write to the same file at the same time? • Readers can get either version (or parts of both); the outcome is arbitrary, just as with concurrent writes on normal Unix. • Problem: writes are delayed at the client. If a write happens at time t and the close happens at t’, other clients might not see the new data until t’.
Cache validation • Validation check performed: • at file open • whenever the server is contacted to get a new block • after a timeout (3 s for file blocks, 30 s for directories) • Done for all files (even if not being shared). • Expensive! • Potentially, file attributes are fetched every 3 s. • If needed, all blocks are invalidated.
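The decision of when to revalidate can be sketched as a small function. The two timeouts come from the slide; the function name, its parameters, and the use of plain numbers for time are assumptions for illustration.

```python
FILE_TIMEOUT = 3.0         # seconds, for file blocks
DIRECTORY_TIMEOUT = 30.0   # seconds, for directories

def needs_validation(kind, last_validated, now, at_open=False):
    """Decide whether a cached entry must be rechecked with the server."""
    if at_open:
        return True   # a check is always made when the file is opened
    timeout = DIRECTORY_TIMEOUT if kind == "directory" else FILE_TIMEOUT
    return now - last_validated >= timeout

print(needs_validation("file", last_validated=0.0, now=2.0))
print(needs_validation("file", last_validated=0.0, now=3.0))
print(needs_validation("directory", last_validated=0.0, now=10.0))
```

A file that is read continuously would trigger an attribute request roughly every three seconds under this rule, which is the cost the slide calls expensive.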
Locking in NFS • NFS supports file locking • Applications can use locks to ensure consistency • Through version 3, locking was not part of the NFS protocol itself; a separate Network Lock Manager (NLM) protocol handled it • NFS v4 supports locking as part of the protocol, with these operations: • Lock: Creates a lock for a range of bytes (non-blocking) • Lockt: Tests whether a conflicting lock has been granted • Locku: Removes a lock from a range of bytes • Renew: Renews the lease on a specified lock
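The Lock, Lockt, and Locku operations can be sketched with an in-memory table of byte ranges. This is a simplified model under stated assumptions: all locks are exclusive, Locku removes only an exactly matching range, and leases (Renew) are left out. The class `LockTable` is hypothetical.

```python
class LockTable:
    def __init__(self):
        self.locks = []   # list of (owner, start, end), with end exclusive

    def _conflict(self, owner, start, end):
        for other, s, e in self.locks:
            # Two ranges overlap when each starts before the other ends.
            if other != owner and start < e and s < end:
                return (other, s, e)
        return None

    def lockt(self, owner, start, end):
        """Test only: return the conflicting lock, or None."""
        return self._conflict(owner, start, end)

    def lock(self, owner, start, end):
        """Non-blocking: return False at once if the range is taken."""
        if self._conflict(owner, start, end) is not None:
            return False
        self.locks.append((owner, start, end))
        return True

    def locku(self, owner, start, end):
        """Remove the owner's lock on exactly this range."""
        self.locks = [held for held in self.locks if held != (owner, start, end)]

table = LockTable()
print(table.lock("A", 0, 100))     # granted
print(table.lock("B", 50, 150))    # refused, overlaps A's range
print(table.lockt("B", 50, 150))   # reports A's conflicting lock
table.locku("A", 0, 100)
print(table.lock("B", 50, 150))    # granted after A unlocks
```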
NFS score card • Pros: • simple • highly portable • Cons: • Not Secure • Locking is not good • Sometimes inconsistent • Clients maintain 2 caches, one for file attributes (i-nodes) and one for file data. Caching can be nasty
Summary • How do we make it fast? • Answer: caching, read-ahead • How do we make it reliable? What if a message is dropped? What if the server crashes? • Answer: client retransmits request until it receives a response. • How do we preserve file system semantics in the presence of failures and/or sharing by multiple clients? • Answer: well, we don’t, at least not completely.
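The retransmission answer can be sketched as a retry loop. The helper names are hypothetical, a lost message is modelled as a `None` reply, and the attempt limit is an added assumption so the example terminates; the wait between attempts that a real client would use is omitted.

```python
def call_with_retry(send, request, max_attempts=5):
    """Resend request until send returns a reply; None models a lost message."""
    for attempt in range(1, max_attempts + 1):
        reply = send(request)
        if reply is not None:
            return reply, attempt
    raise TimeoutError(f"no reply after {max_attempts} attempts")

def make_lossy_server(drops):
    """Hypothetical server stub that loses the first `drops` requests."""
    state = {"seen": 0}

    def send(request):
        state["seen"] += 1
        if state["seen"] <= drops:
            return None   # request or reply was dropped
        return f"ok:{request}"

    return send

reply, attempts = call_with_retry(make_lossy_server(drops=2), "getattr fh1")
print(reply, attempts)
```

Retrying blindly like this is safe only because the requests are idempotent and the server is stateless, so a duplicate that does arrive causes no harm.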
Alternatives to NFS • Andrew File System - CMU, now IBM • Sprite • Coda • Distributed File System • Remote File System • Netware - Novell based file system