
Examples of Remote File Systems CS 188 Distributed Systems January 29, 2015

This presentation provides an overview of remote file systems, including CIFS, NFS, and AFS, discussing their architectures, benefits, and implementation details.

Presentation Transcript


  1. Examples of Remote File Systems CS 188 Distributed Systems January 29, 2015

  2. Introduction • Details on actual remote file systems • CIFS • NFS • AFS

  3. Common Internet File System • Originally a proprietary Microsoft Protocol • For use in Windows environments • Now a standard usable on most platforms • Designed to enable “work group” computing • Group of PCs sharing same data, printers • Any PC can export its resources to the group • They chose a peer solution • Though they treat it as client/server • Any machine can act as client or server • Work group is the union of those resources

  4. CIFS Architecture • Standard remote file access architecture • Based on SMB protocol • State-full per-user client/server sessions • Password or challenge/response authentication • Server tracks open files, offsets, updates • Makes server fail-over much more difficult • Opportunistic locking • Client can cache file if nobody else using/writing it • Otherwise all reads/writes must be synchronous • Servers regularly advertise what they export • Enabling clients to “browse” the workgroup
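
A rough sketch of the opportunistic-locking decision, in Python with made-up class and message names; this illustrates the idea only, not the real SMB/CIFS message flow.

    # Hypothetical sketch of a CIFS-style oplock decision; real SMB messages differ.

    class OplockServer:
        def __init__(self):
            self.open_files = {}    # path -> set of client ids that have the file open

        def open_file(self, client, path):
            holders = self.open_files.setdefault(path, set())
            # Grant an exclusive oplock only if nobody else has the file open;
            # the client may then cache reads and writes locally.
            grant_oplock = (len(holders) == 0)
            if not grant_oplock:
                # Someone else is using the file: break any existing oplock so
                # further reads and writes go synchronously through the server.
                for other in holders:
                    self.send_oplock_break(other, path)
            holders.add(client)
            return grant_oplock

        def send_oplock_break(self, client, path):
            print(f"oplock break -> {client} for {path}")   # stand-in for the real notification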

  5. Benefits of Opportunistic Locking • A big performance win • Getting permission from server before each write is a huge expense • In both time and server loading • If no conflicting file use 99.99% of the time, opportunistic locks greatly reduce overhead • When they can’t be used, CIFS does provide correct centralized serialization

  6. CIFS Pros and Cons • Performance/Scalability • Opportunistic locks enable good performance when shared access is rare • Otherwise, forced synchronous I/O is slow • Transparency • Very good, especially the global name space • Conflict Prevention • File/record locking and synchronous writes work well • Robustness • State-full servers make seamless fail-over difficult

  7. The Network File System (NFS) • Transparent, heterogeneous file system sharing • Local and remote files are indistinguishable • Peer-to-peer and client-server sharing • Disk-full clients can export file systems to others • Able to support diskless (or dataless) clients • Minimal client-side administration • High efficiency and high availability • Read performance competitive with local disks • Scalable to huge numbers of clients • Seamless fail-over for all readers and some writers

  8. NFS Example • (Diagram) Node 1, the NFS client, mounts the directory bar from Node 2, the NFS server, at its local directory A: mount(bar, node2, A) • After the mount, open(/A/two) on the client reaches the same file as open(/bar/two) on the server

  9. NFS Implementation • Code at both client and server • Client code implements a virtual file system • Translates opens, reads, etc. into RPC operations • Server converts incoming RPC requests to operations on local files
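
A rough sketch of the client side of that translation: the virtual file system layer packages a local read into a self-contained remote procedure call. The request dictionary and the call() transport below are illustrative stand-ins, not the actual XDR-encoded NFS messages.

    # Illustrative only: real NFS READ calls are defined by the RPC/XDR protocol, not a dict.

    def vfs_read(nfs_client, file_handle, offset, length):
        """Client-side VFS layer: turn a local read() into a remote procedure call."""
        request = {
            "proc": "READ",
            "fhandle": file_handle,   # opaque handle returned by an earlier lookup
            "offset": offset,         # self-contained: no server-side file position
            "count": length,
        }
        reply = nfs_client.call(request)   # hypothetical transport; blocks until the reply arrives
        return reply["data"]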

  10. NFS Handles • When a file is opened at the client, the NFS server creates a file handle • Opaque to that client • Meaningful to the server • Client names file by providing handle to server • File handles can become stale • Typically when file they point to disappears/changes inode numbers
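
The sketch below suggests why handles go stale; the Handle fields and helper names are assumptions for illustration, since real servers encode handles in their own opaque formats (commonly a filesystem id, an inode number, and an inode generation counter).

    # Illustrative handle layout and stale-handle check; not the real NFS structures.

    from collections import namedtuple

    Handle = namedtuple("Handle", ["fsid", "inode", "generation"])

    class StaleHandleError(Exception):
        pass

    def handle_for(server, inode):
        # Returned to the client at lookup time; the client treats it as opaque.
        return Handle(server.fsid, inode.number, inode.generation)

    def resolve_handle(server, handle):
        inode = server.inode_table.get(handle.inode)
        if inode is None or inode.generation != handle.generation:
            # The file was removed (and its inode number perhaps reused),
            # so the old handle no longer names anything valid.
            raise StaleHandleError("stale file handle")
        return inode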

  11. NFS Processes • In addition to virtual file system/RPC code, NFS uses long-running processes • At the application level, but usually only stubs • Which call special NFS kernel code • nfsd daemons - server daemons that accept RPC calls for NFS • rpc.mountd daemons - server daemons that handle mount requests • biod daemons - optional client daemons that can improve performance

  12. The NFS Protocol • Relies on idempotent operations and a stateless server • Built on top of a remote procedure call protocol • With eXternal Data Representation, server binding • Versions of RPC over both TCP and UDP • Optional encryption (may be provided at lower level) • Scope – most normal file operations • Lookup (open), read, write, read-directory, stat, etc. • Some operations not quite the same as local • Supports client or server-side authentication • Supports client-side caching of file contents • Locking and auto-mounting done with another protocol

  13. NFS From the Client Side • User issues a normal file operation • Like chmod() • Passes through VFS to client-side NFS implementation • Client-side implementation formats and sends RPC packet to server • Actually, it arranges for the client process itself to send it, so the client process blocks

  14. NFS From the Server Side • Server side’s file system isn’t NFS • EXT3, BTRFS, or some other local file system, typically working off disk • This may be a very different file system than what’s on the client • rpc.mountd and nfsd map incoming RPC calls into VFS calls on local file system • Again, most of the code in the kernel • Servers keep no state on previous operations • So NFS server operations must be self-contained

  15. Implications of Statelessness • RPC requests must completely describe operations • NFS requests must be idempotent • Stateless transport protocol (e.g., UDP) is OK, at least for small requests • Servers need not worry about client crashes • Server crashes won’t leave junk lying around
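
A small sketch of what idempotence buys: the retry loop below can blindly resend a request after a timeout, because the request fully describes the operation and applying it twice has the same effect as applying it once. The send transport and message format are hypothetical.

    # Hypothetical retry loop over an unreliable transport.

    def call_with_retry(send, request, retries=3):
        for _ in range(retries):
            reply = send(request)    # returns None if the request or the reply was lost
            if reply is not None:
                return reply         # a replayed idempotent request did no harm
        raise TimeoutError("server not responding")

    write_request = {"proc": "WRITE", "fhandle": "opaque-handle",
                     "offset": 4096, "data": b"hello"}
    # Applying this WRITE twice leaves the file in the same state as applying it once.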

  16. One Very Important Implication of NFS Statelessness • Servers don’t know what files clients think are open • Unlike many other remote file systems • Such as CIFS • Makes it harder to provide certain semantics to the remote users • But easier for normal server operations • And recovery from failures

  17. Sleazy NFS Tricks • NFS does lots of tricks to make it look like normal POSIX file semantics • E.g., if client unlinks file he has open, send rename to server rather than remove • Perform actual remove when file is closed • Won’t work if file removed on server • What happens if client crashes?
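
A rough sketch of that rename trick from the client’s side; the helper methods and the hidden-name pattern are placeholders for illustration, not the real client internals.

    # Hypothetical client-side "silly rename": unlink of an open file becomes a
    # rename, and the real remove is deferred until the file is closed.

    def unlink(client, path):
        if client.is_open(path):
            hidden = path + ".nfs-hidden"              # placeholder for the generated temporary name
            client.rpc_rename(path, hidden)            # server keeps the data under the hidden name
            client.remember_pending_remove(path, hidden)
        else:
            client.rpc_remove(path)

    def close(client, path):
        hidden = client.pending_remove(path)
        if hidden is not None:
            client.rpc_remove(hidden)                  # now do the real remove at the server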

  18. NFS Performance • How does NFS avoid always going across the net? • Obviously, cache the data on the client • Done through an internal buffer cache • NFS knows what it has kept there • Responds from cache, when it can • Different caching strategies for data and metadata • biod does read-ahead for sequential access • Tending to pre-fill the cache

  19. Why is Caching File Data Important? • Reads often done in small increments • E.g., 128 bytes • Each network round trip involves multiple RPC packets • Which is expensive • NFS client usually asks server for much more data • Say, 8K bytes • Which it stores internally • If client wants the next 128 bytes, no need to go over the network
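
A minimal sketch of that idea: fetch one large block over the network and satisfy small sequential reads from memory. The block size and the rpc_read callback are illustrative assumptions, and reads that span a block boundary are ignored to keep the sketch short.

    # Client-side read cache sketch (ignores reads that cross a block boundary).

    BLOCK = 8192

    class CachedFile:
        def __init__(self, rpc_read):
            self.rpc_read = rpc_read   # function(offset, count) -> bytes, goes over the network
            self.cache = {}            # block start offset -> bytes

        def read(self, offset, count):
            start = (offset // BLOCK) * BLOCK
            if start not in self.cache:
                self.cache[start] = self.rpc_read(start, BLOCK)   # one network round trip
            block = self.cache[start]
            rel = offset - start
            return block[rel:rel + count]     # e.g., 128 bytes served with no network traffic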

  20. NFS File Attribute Caching • Attribute caching very important for performance • Many applications get and set file attributes frequently • So they need to do it fast • NFS internally caches attributes • Changes to cached attributes not written back immediately • Typically after 30-60 seconds
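
A sketch of attribute caching with a time-to-live; the 30-second figure is just an example in the range mentioned above, and rpc_getattr stands in for the real attribute-fetching call.

    # Attribute cache with a TTL; entries older than ATTR_TTL are refreshed from the server.

    import time

    ATTR_TTL = 30.0    # seconds; illustrative value

    class AttrCache:
        def __init__(self, rpc_getattr):
            self.rpc_getattr = rpc_getattr    # function(handle) -> attrs, goes over the network
            self.entries = {}                 # handle -> (attrs, time cached)

        def getattr(self, handle):
            cached = self.entries.get(handle)
            if cached and time.time() - cached[1] < ATTR_TTL:
                return cached[0]              # fast path: no server round trip
            attrs = self.rpc_getattr(handle)
            self.entries[handle] = (attrs, time.time())
            return attrs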

  21. NFS Authentication • How can we trust NFS clients to authenticate themselves? • NFS is not designed for direct use by user applications • It permits one operating system instance to access files belonging to another OS instance • If we trust the remote OS to see the files, might as well trust it to authenticate the user • Obviously, don’t use NFS if you don’t trust the remote OS . . .

  22. NFS and Updates • An NFS server does not prevent conflicting updates • As with local file systems, this is the application’s job • Auxiliary server/protocol for file and record locking • All leases are maintained on the lock server • All lock/unlock operations handled by lock server • Locking integrated into the basic protocol in NFS version 4 • Client/network failure handling • Server can break locks if client dies or times out • “Stale-handle” errors inform client of broken lock • Client responses to these errors are application specific • Lock server failure handling is very complex
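
From an application’s point of view, locking a file on an NFS mount looks like ordinary POSIX advisory locking; the kernel forwards the request to the separate lock service (or, with NFSv4, uses the protocol’s built-in locking). A minimal example using Python’s fcntl module on a Unix system, where the path is just an example of a file on an NFS mount:

    # Ordinary POSIX advisory locking; over NFS this goes through the lock service.
    import fcntl

    with open("/mnt/nfs/shared.dat", "r+b") as f:     # example path on an NFS mount
        fcntl.lockf(f, fcntl.LOCK_EX)                 # blocks until the exclusive lock is granted
        try:
            f.seek(0)
            f.write(b"update")                        # safe against other cooperating (locking) writers
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)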

  23. NFS Pros and Cons • Transparency/Heterogeneity • Local/remote transparency is excellent • NFS works with all major OSes and FSes • Performance • Read performance may be better than local disk • Write performance slower than local disk • Robustness • Transparent fail-over capability for readers • Recoverable fail-over capability for writers

  24. The Andrew File System • AFS • Developed at CMU • Designed originally to support student and faculty use • Generally, large numbers of users of a single organization • Uses a client/server model • Makes use of whole-file caching

  25. Basic AFS Approach • Use dedicated file server machines to store files • Several, to share the load • All files stored at servers permanently • Except workstation config and temporary files • Users’ personal files stored at servers • Only make files available to client workstations on demand • Assume reasonable level of reliability and connectivity

  26. AFS Basics • Designed for scalability, performance • Large numbers of clients (~5-10K) and few servers • Needed performance of local file systems • Very low per-client load imposed on servers • No administration or back-up for client disks • Master files reside on a file server • Local file system is used as a local cache • Local reads satisfied from cache when possible • Files are only read from server if not in cache • Simple synchronization of updates

  27. AFS Architecture • (Diagram) On the client, the Andrew cache manager sits above a local EXT3 file system used only as a cache; on the server, the Andrew agent and relay sit above the remote server file system (also EXT3) • The two sides talk through socket I/O over UDP or TCP on IP, with MAC and NIC drivers below, and block I/O and disk drivers under each local file system

  28. Server File Storage • Each file is stored at one server • Files organized into hierarchical subtrees • All servers maintain a map of which subtrees are at which servers • Clients asking any server for a file can be directed to the right server

  29. Multiple File Copies in AFS • Server always keeps a copy of each file • Multiple clients might also be caching a copy • Clients check for local copies in cache at open time • If no local copy exists, fetch it from server • If local copy exists, see if it is still up-to-date • Compare file size and modification time with server • Optimizations reduce overhead of checking • Subscribe/broadcast change notifications • Time-to-live on cached file attributes and contents
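
A sketch of that open-time validity check; CHECK_TTL and the cache/server helper methods are illustrative assumptions, not the real AFS interfaces.

    # Open-time check: reuse the cached copy unless the server's size/mtime differ.

    import time

    CHECK_TTL = 60.0    # seconds between checks against the server; illustrative

    def open_cached(cache, server, path):
        entry = cache.get(path)
        if entry is None:
            entry = cache.fetch_whole_file(server, path)       # no local copy: fetch it
        elif time.time() - entry.last_checked > CHECK_TTL:
            size, mtime = server.stat(path)
            if (size, mtime) != (entry.size, entry.mtime):
                entry = cache.fetch_whole_file(server, path)   # out of date: refetch
            else:
                entry.last_checked = time.time()               # still current
        return entry.local_file                                # all further I/O is local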

  30. AFS and Updates • Updates made directly to the local cached copy (only) • Send updates to server when file is closed • Wait for all changes to be completed • File may be deleted before it is closed • E.g., temporary files that servers need not know about • When server receives update, uses callback mechanism to provide consistency
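
A rough sketch of write-on-close, with made-up helper names: writes touch only the local cached copy, and the whole file is shipped back to the server when it is closed, unless it was deleted first.

    # Updates stay local until close; files deleted before close never reach the server.

    def afs_write(entry, offset, data):
        entry.local_file.seek(offset)
        entry.local_file.write(data)     # no server traffic here
        entry.dirty = True

    def afs_close(entry, server, path):
        if entry.deleted:
            return                       # e.g., a temporary file the server need never see
        if entry.dirty:
            entry.local_file.seek(0)
            server.store(path, entry.local_file.read())   # hypothetical: ship the updated file back
            entry.dirty = False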

  31. AFS Callbacks • Servers keep track of who cached a file • If one cached copy is updated, cache invalidation messages sent to all others • Clients receiving a callback message discard their cached copy • If further activity on file, get a new copy from the server • There could be problems . . .
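
A sketch of the server-side bookkeeping, with illustrative names: record which clients cache each file, and send an invalidation to everyone except the client whose update just arrived.

    # Callback bookkeeping sketch; the RPC and storage calls are stand-ins.

    class CallbackServer:
        def __init__(self):
            self.cached_by = {}                      # path -> set of client ids caching the file

        def register_cache(self, client, path):
            self.cached_by.setdefault(path, set()).add(client)

        def store_update(self, writer, path, data):
            self.write_to_disk(path, data)
            for client in self.cached_by.get(path, set()) - {writer}:
                self.send_callback(client, path)     # "invalidate(foo)": discard your cached copy
            self.cached_by[path] = {writer}          # only the writer's copy is known to be current

        def write_to_disk(self, path, data):
            pass                                     # placeholder for the server's local file write

        def send_callback(self, client, path):
            print(f"invalidate {path} -> {client}")  # stand-in for the real callback message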

  32. AFS Pros and Cons • Performance and Scalability • All file access by user/applications is local • Update checking (with time-to-live) is relatively cheap • Both fetch and update propagation are very efficient • Minimal per-client server load (once cache filled) • Robustness • No server fail-over, but have local copies of most files • Transparency • Mostly perfect - all file access operations are local • Pray that we don't have any update conflicts • AFS is still fairly widely used

  33. A Diversion Into Generality • The problem AFS addresses via callbacks must be addressed by any remote file system • What is the nature of the problem? • What are the choices for addressing it? • What choices do real systems choose?

  34. Illustrating the Problem • (Diagram) A file server holds file foo • File client A and file client B each do open(foo) and then write(foo) to their own copies • Caption: “We might be in trouble”

  35. What Happens Next? • (Diagram) File client A does close(foo), but file client B still holds a different version of foo • Caption: “What does system do now?”

  36. AFS Callbacks • (Diagram) The AFS server’s record for file foo lists client A and client B; these caching records allow the server to perform callbacks • When one client’s update arrives, the server sends invalidate(foo) to the other AFS client • What does B do now?

  37. The AFS Solution • Allow conflicts to occur • Locking can be used to prevent them • But AFS locking is advisory, not mandatory • Conflicts handled at the client • Originally action not specified • Later versions of AFS included automated and manual conflict resolution tools

  38. Conflicts and Other File Systems • This problem is not unique to AFS • Any file system that allows caching faces this issue • Possible options: • Don’t allow multiple nodes to cache • Invalidate all cached copies before writes • Obtain locks to ensure no conflicts • Detect and handle conflicts • Allow multiple versions of a file

  39. Implications of the Choices • Allow only one node to cache • No conflicts possible, so good consistency • Only one site can access file at a time, so concurrency is poor • Requires records at server • Complications in face of failures

  40. Implications of the Choices • Invalidate all cached copies before write • No conflicts possible, so good consistency • Concurrency good when nobody writes • Updates delayed while handling invalidation of other copies • Requires records at server • Complications in face of failures

  41. Implications of the Choices • Obtain locks to ensure no conflicts • No conflicts possible, so good consistency • Good concurrency if no one obtains a write lock • Updates not delayed (once lock obtained) • Introduces possibility of deadlock • Leases remove that • Requires records at server • Complications in face of failures

  42. Implications of the Choices • Detect and handle conflicts • Conflicting updates are possible and may be hard to handle • Possibly requiring human intervention • Good concurrency for reads and writes • Minimal recordkeeping at server • Maybe none in some cases • Failures increase chances of problems, but otherwise no extra complications

  43. Implications of the Choices • Allow multiple versions of a file • Excellent concurrency for both read and write • No extra complications for failures • Weird model of file behavior that can confuse users • Requires substantial recordkeeping

  44. Some Other Systems’ Choices • CIFS – use locking to prevent concurrency problems (option 3) • NFS – allow concurrent updates, but minimize chances and detect them (option 4) • Ficus – similar to AFS • Lotus Notes – treat updates as “comments” on original (option 5)

  45. One More (Crazy?) Option • Allow anyone to write their cached copy • Detect conflicting writes • Use some precedence mechanism to select one write to keep • Roll back all the others • What about actions based on those writes occurring, though?

  46. A Generalization To Storage Issues • We were discussing distributed caching • Where there is one node that has the one true copy of the data • What if that’s not the case? • What if there is no primary copy? • How do our solutions change?

  47. Making It More General • This issue isn’t just about caching for file systems • The cached copies are distributed state • The underlying issue is consistency of distributed state • So the problem arises in other distributed systems problems • Including problems not related to distributed storage

  48. For Example, • Four nodes are performing a distributed computation • There are two different algorithms they could be using • One works well for simple data • One is more suitable for complex data • What if 2 nodes choose to use algorithm A and 2 nodes choose to use algorithm B? • A very similar problem

  49. Something to Bear In Mind • The example shown had two cached copies • Your solution must work for more than two • Unless your system only supports two users • Ideally, it should work for as many copies as possible • This point is true for any proposed solution to any distributed systems problem

  50. Comparing Sample Systems • How do: • CIFS • NFS • AFS • handle similar problems with similar goals?
