UNIX Internals – the New Frontiers Distributed File Systems
Difference between DOS and DFS • A distributed OS looks like a centralized OS but runs simultaneously on multiple machines; it may provide a file system shared by all its host machines • A distributed FS is a software layer that manages communication between conventional operating systems and file systems
General Characteristics of DFS • Network transparency • Location transparency & Location independence • User Mobility • Fault tolerance • Scalability • File mobility
Design Considerations • Name Space • Stateful or stateless • Semantics of sharing • UNIX semantics • Session semantics • Remote access method
Network File System (NFS) • Based on the client-server model • Clients and servers communicate via remote procedure calls (RPC)
User Perspective • An NFS server exports one or more file systems • Hard mount: the client retries until it gets a reply • Soft mount: the client gives up and returns an error after a timeout • Spongy mount: hard for the mount itself, soft for subsequent I/O • Commands: • mount -t nfs nfssrv:/usr /usr • mount -t nfs nfssrv:/usr/u1 /u1 • mount -t nfs nfssrv:/usr /users • mount -t nfs nfssrv:/usr/local /usr/local
Design goals • Not restricted to UNIX • Not dependent on any particular hardware • Simple recovery mechanisms • Transparent access to remote files • UNIX semantics • NFS performance comparable to that of a local disk • Transport-independent
NFS components • NFS protocol • RPC protocol • XDR (External Data Representation) • NFS server code • NFS client code • Mount protocol • Daemon processes (nfsd, mountd, biod) • NLM (Network Lock Manager) & NSM (Network Status Monitor)
Statelessness • Each request is independent • This makes crash recovery simple, for both client crashes and server crashes • Problem: the server must commit all modifications to stable storage before replying to a request
10.4 The protocol suite • Why XDR? Machines differ in their internal representation of data elements: byte order (little-endian vs. big-endian), sizes of types, and whether data is treated as opaque (a byte stream) or as typed values
XDR • Integers: 32 bits, byte 0 leftmost (most significant); signed integers use 2's complement • Variable-length opaque data: length (4B), data NULL-padded to a 4-byte boundary • Strings: length (4B), ASCII characters, NULL-padded • Arrays: size (4B), elements of the same type • Structures: fields in natural order • (A small encoding sketch follows)
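A minimal sketch of the string encoding rule above: a 4-byte big-endian length followed by the bytes, zero-padded to a 4-byte boundary. The helper name is illustrative; it is not the libc XDR API.

    /* Illustrative XDR-style string encoder (not the real xdr_string()). */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    static size_t xdr_encode_string(const char *s, unsigned char *buf)
    {
        uint32_t len = (uint32_t)strlen(s);
        size_t padded = (len + 3) & ~(size_t)3;   /* round up to multiple of 4 */

        buf[0] = (len >> 24) & 0xff;              /* length, most significant */
        buf[1] = (len >> 16) & 0xff;              /* byte first (big-endian)  */
        buf[2] = (len >> 8)  & 0xff;
        buf[3] = len & 0xff;

        memcpy(buf + 4, s, len);
        memset(buf + 4 + len, 0, padded - len);   /* NULL padding */
        return 4 + padded;
    }

    int main(void)
    {
        unsigned char buf[64];
        size_t n = xdr_encode_string("nfs", buf);
        for (size_t i = 0; i < n; i++)
            printf("%02x ", buf[i]);              /* 00 00 00 03 6e 66 73 00 */
        printf("\n");
        return 0;
    }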
RPC • Specify the format of communications between the client and the server. • SUN RPC: synchronous requests only. • Implemented on UDP/IP. • Authentication to identify callers • AUTH _NULL, AUTH _UNIX, AUTH_SHORT, AUTH _DES, and AUTH _KERB • RPC language compiler: rpcgen
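A hedged sketch of a Sun RPC client call: it pings an NFS server's NULL procedure over UDP. It assumes the classic Sun RPC client interface (clnt_create/clnt_call, provided by libtirpc on modern systems); 100003 is the well-known NFS program number, and the host name is illustrative.

    #include <rpc/rpc.h>
    #include <stdio.h>

    #define NFS_PROGRAM 100003UL   /* well-known NFS RPC program number */
    #define NFS_VERSION 2UL

    int main(int argc, char *argv[])
    {
        const char *host = argc > 1 ? argv[1] : "nfssrv";   /* example host */
        struct timeval tv = { 5, 0 };

        CLIENT *clnt = clnt_create(host, NFS_PROGRAM, NFS_VERSION, "udp");
        if (clnt == NULL) {
            clnt_pcreateerror(host);
            return 1;
        }

        /* Procedure 0 (NULLPROC) takes no arguments and returns nothing;
         * it is conventionally used as a ping. */
        if (clnt_call(clnt, NULLPROC, (xdrproc_t)xdr_void, NULL,
                      (xdrproc_t)xdr_void, NULL, tv) != RPC_SUCCESS) {
            clnt_perror(clnt, "NFS NULL call");
            clnt_destroy(clnt);
            return 1;
        }
        printf("%s answers NFS v%lu NULL\n", host, NFS_VERSION);
        clnt_destroy(clnt);
        return 0;
    }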
10.5 NFS Implementation • Control Flow • Vnode • Rnode
File Handle • The server assigns a file handle in response to lookup, create, or mkdir; subsequent I/O operations use it • A file handle = opaque 32B object = <file system ID, inode number, generation number> • The generation number detects stale handles (the inode has been freed and reallocated to another file) • (A layout sketch follows)
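A hedged sketch of what a v2 file handle might carry internally; to clients it is just 32 opaque bytes, and the field names here are illustrative, not the actual server layout.

    #include <stdint.h>

    #define NFS_FHSIZE 32                /* v2 file handles are 32 bytes */

    struct nfs_fh_layout {
        uint32_t fsid;                   /* exported file system ID */
        uint32_t inode;                  /* inode number within that FS */
        uint32_t generation;             /* detects a reused (stale) inode */
        uint8_t  pad[NFS_FHSIZE - 12];   /* server-private, opaque to clients */
    };

    /* Generation check: the server compares the handle's generation with the
     * on-disk inode's current generation and rejects the handle as stale if
     * they differ (the inode was freed and reallocated to another file). */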
The mount operation • nfs_mount() sends an RPC request whose argument is the pathname to mount • The mountd daemon translates the pathname, performs its checks, and replies with a file handle on success • The client initializes the vfs and records the server's name and address • Allocates an rnode & vnode for the mounted root • The server must still check access rights on each request
Pathname Lookup • Client: initiates lookups during open, create & stat • Starting from the current or root directory, it proceeds one component at a time, sending a request whenever the component lies in an NFS directory • Server: file handle -> FS ID -> vfs -> VGET -> vnode -> VOP_LOOKUP -> child vnode; then VOP_GETATTR and VOP_FID -> file handle • Reply message = status + file handle + file attributes • Client: receives the reply, allocates an rnode + vnode, copies the info, and proceeds to the next component • (A control-flow sketch follows)
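A self-contained sketch of the component-at-a-time walk described above, with the RPC stubbed out: one LOOKUP per component, and each reply's file handle becomes the parent for the next step. Path and names are illustrative.

    #include <stdio.h>
    #include <string.h>

    static void lookup_rpc(const char *parent, const char *name)
    {
        /* The real client sends <parent file handle, component name> and
         * gets back <status, file handle, attributes>. */
        printf("LOOKUP  parent=%-16s name=%s\n", parent, name);
    }

    int main(void)
    {
        char path[] = "usr/local/bin/cc";      /* example path on an NFS mount */
        char parent[64] = "<root fh>";

        for (char *comp = strtok(path, "/"); comp; comp = strtok(NULL, "/")) {
            lookup_rpc(parent, comp);
            /* the returned handle becomes the parent for the next component */
            snprintf(parent, sizeof parent, "<fh of %s>", comp);
        }
        return 0;
    }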
10.6 UNIX Semantics • Statelessness causes a few incompatibilities between NFS and UNIX • Open-file permissions: UNIX checks permissions at open; NFS checks them on each read and write • In NFS, the server always allows the owner of a file to read or write it • What about writing to a write-protected file? The client saves the attributes containing the file permissions at open time
Deletion of open files • The server has no idea which files clients have open • The client renames a file that is deleted while open, and actually deletes it on the last close • What if the deletion happens from a different machine?
Reads and Writes • UNIX locks the vnode at the start of an I/O operation • NFS clients can only lock the vnode on the same machine, so NFS offers no protection against overlapping I/O requests from different clients • Locking via the NLM (Network Lock Manager) protocol is only advisory (see the example below)
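A small example of advisory locking with fcntl(); on an NFS mount the kernel forwards such locks to the server via NLM. Because the locks are advisory, a process that never asks for a lock can still read or write the file. The file path is hypothetical.

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/u1/shared.dat", O_RDWR);   /* hypothetical NFS path */
        if (fd < 0) { perror("open"); return 1; }

        struct flock fl = {
            .l_type   = F_WRLCK,     /* exclusive (write) lock */
            .l_whence = SEEK_SET,
            .l_start  = 0,
            .l_len    = 0,           /* 0 = lock the whole file */
        };

        if (fcntl(fd, F_SETLKW, &fl) == -1) {      /* block until granted */
            perror("fcntl");
            close(fd);
            return 1;
        }

        /* ... I/O that must not overlap with other cooperating clients ... */

        fl.l_type = F_UNLCK;
        fcntl(fd, F_SETLK, &fl);                   /* release the lock */
        close(fd);
        return 0;
    }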
10.7 NFS Performance • Bottlenecks • Writes must be committed to stable storage • Fetching of file attributes requires one RPC call per file • Processing retransmitted requests adds to the load on the server
Client-side caching • Caches both blocks and file attributes • To avoid stale data, the kernel keeps an expiry time (60 seconds) after which it rechecks the file's modification time • This reduces but does not eliminate the problem • (A validity-check sketch follows)
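A hedged sketch of the expiry check just described: cached attributes are trusted inside a fixed window, after which the client re-fetches them and compares modification times. The structure and names are illustrative, not the actual rnode layout.

    #include <stdbool.h>
    #include <time.h>

    #define ATTR_TIMEOUT 60              /* seconds before attributes go stale */

    struct cached_attrs {
        time_t fetched_at;               /* when the attributes were last fetched */
        time_t mtime;                    /* server-reported modification time */
    };

    /* server_mtime is what a fresh GETATTR just returned. */
    static bool cache_still_valid(const struct cached_attrs *c, time_t server_mtime)
    {
        if (time(NULL) - c->fetched_at < ATTR_TIMEOUT)
            return true;                 /* inside the expiry window: trust it */

        /* Window expired: if the server's mtime is unchanged, cached blocks
         * may still be used; otherwise they must be discarded. */
        return server_mtime == c->mtime;
    }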
Deferral of writes • Asynchronous writes for full blocks • Delayed writes for partial blocks • Delayed writes are flushed on close, or every 30 seconds by the biod daemon • The server can use an NVRAM buffer and flush it to disk later • Write-gathering: the server waits briefly, gathers several write requests to the same file, processes them together, and replies to each (sketched below)
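A toy sketch of write-gathering under the assumptions above: several writes to the same file are collected, flushed to disk in one pass, and then each request gets its own reply. The request layout is illustrative.

    #include <stdio.h>

    struct write_req { int xid; long offset; int len; };

    int main(void)
    {
        /* three writes to the same file arrive close together */
        struct write_req q[] = { {1, 0, 8192}, {2, 8192, 8192}, {3, 16384, 8192} };
        int n = sizeof q / sizeof q[0];

        /* gather: wait briefly (omitted), then flush the whole range once */
        long lo = q[0].offset, hi = q[n - 1].offset + q[n - 1].len;
        printf("flush file range [%ld, %ld) to disk once\n", lo, hi);

        /* reply individually to each gathered request */
        for (int i = 0; i < n; i++)
            printf("reply success to xid %d\n", q[i].xid);
        return 0;
    }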
The retransmissions cache • Requests are either idempotent or nonidempotent • Problem with nonidempotent requests: the client sends a remove; the server removes the file and sends a success reply, but the reply is lost; the client retransmits the remove; the server processes it again, the remove fails, and the server sends a remove-failure reply; the client receives an error even though the file was removed • Old server-side retransmissions (xid) cache: entries are keyed by xid, procedure number, & client ID, and the cache is checked only when a request fails
New implementation • Caches all requests • Checks xid, procedure number, client ID, state field & timestamp • If the request is still in progress, discard the retransmission; if it is done and the timestamp falls within the throwaway window (3-6 s), discard it as well • Otherwise, reprocess the request if it is idempotent • For a nonidempotent request, check whether the file has been modified: if not, send a success reply; otherwise, retry the request • (A lookup sketch follows)
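A hedged sketch of the duplicate-request check described above: entries keyed by <xid, procedure, client> carry a state and a timestamp, and retransmissions of in-progress or recently completed requests are silently dropped. Structure and names are illustrative.

    #include <stdbool.h>
    #include <time.h>

    #define THROWAWAY_WINDOW 6           /* seconds, per the 3-6 s range above */

    enum req_state { IN_PROGRESS, DONE };

    struct dup_cache_entry {
        unsigned xid;                    /* transaction ID from the client */
        unsigned proc;                   /* RPC procedure number */
        unsigned client_id;
        enum req_state state;
        time_t done_at;                  /* when the reply was sent */
    };

    /* Returns true if the retransmission should simply be discarded. */
    static bool discard_retransmission(const struct dup_cache_entry *e)
    {
        if (e->state == IN_PROGRESS)
            return true;                              /* still being processed */
        if (time(NULL) - e->done_at <= THROWAWAY_WINDOW)
            return true;                              /* reply just went out */
        return false;   /* otherwise: redo if idempotent, else check the file */
    }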
10.9 NFS Security • NFS access control happens at mount time and on each request • Controlled by an exports list • Mount: the server checks the list and denies ineligible clients • Request: authentication information in AUTH_UNIX form (UID, GID) • Loophole: an impostor can forge a <UID, GID> pair to access the files of others
UID Remapping • A translation map for each client • The same UID may map to a different UID on the server • UIDs with no entry in the map become nobody • Can be implemented at the RPC level or at the NFS level • The map can be merged with the /etc/exports file • (A lookup sketch follows)
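A minimal sketch of per-client UID remapping with a nobody fallback; the table layout, function name, and the 65534 value for nobody are illustrative assumptions, not the actual implementation.

    #include <sys/types.h>

    #define NOBODY_UID ((uid_t)65534)    /* common choice for "nobody" */

    struct uid_map_entry {
        uid_t client_uid;
        uid_t server_uid;
    };

    static uid_t remap_uid(const struct uid_map_entry *map, int n, uid_t client_uid)
    {
        for (int i = 0; i < n; i++)
            if (map[i].client_uid == client_uid)
                return map[i].server_uid;
        return NOBODY_UID;               /* not in the map: treat as nobody */
    }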
Root Remapping • Maps the superuser to nobody • Prevents a client's superuser from freely accessing files on the server • The UNIX security framework was designed for an isolated, multi-user environment in which the users trust each other
10.10 NFS Version 3 • Commit request: the client writes and the kernel sends asynchronous write requests; the server saves the data in its local cache and replies immediately; the client holds its copy of the data until the process closes the file and a commit request is sent; the server then flushes the data to disk (see the sketch below) • File size: from 32 bits (4 GB) to 64 bits (2^34 GB) • READDIRPLUS = LOOKUP + GETATTR: returns names, file handles, and file attributes
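A hedged sketch of the write/commit exchange just described: the client issues unstable (asynchronous) writes, the server may reply from its cache, and the data is only guaranteed on disk after a later commit succeeds. The rpc_* helpers are stand-ins for the real client code.

    #include <stdio.h>

    static void rpc_write_unstable(unsigned long off, unsigned len)
    {
        printf("WRITE  offset=%lu len=%u stability=UNSTABLE\n", off, len);
        /* the server caches the data and replies immediately */
    }

    static void rpc_commit(unsigned long off, unsigned len)
    {
        printf("COMMIT offset=%lu len=%u\n", off, len);
        /* the server flushes the cached range to disk before replying;
         * only now may the client drop its own copy of the data */
    }

    int main(void)
    {
        for (int i = 0; i < 3; i++)
            rpc_write_unstable(8192UL * i, 8192);   /* asynchronous writes */
        rpc_commit(0, 3 * 8192);                    /* e.g. when the file is closed */
        return 0;
    }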
Other DFS • The Andrew File System (10.15 – 10.17) • The DCE Distributed File System (10.18 – 10.18.5)