File Systems IV CS 423, Fall 2007 Klara Nahrstedt/Sam King
Administrative • MP3 deadline, November 5, 2007 • Today discussion • Basic Concepts • RPC, Reliability/Failure, State, Replication • Examples of Distributed File Systems • NFS, AFS, Google
Remote Procedure Call • RPC servers call arbitrary functions in a DLL or executable, with arguments passed over the network and return values sent back over the network • Example exchange — Client: foo.dll,bar(4, 10, "hello") → Server: "returned_string"; Client: foo.dll,baz(42) → Server: err: no such function …
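The exchange above can be sketched as a server-side dispatch table: the client names a module and function, the server looks the pair up, invokes it with the supplied arguments, and returns either the value or an error. This is a minimal illustration, not a real RPC transport; the registry and the `bar` implementation are hypothetical.

```python
# Minimal sketch of server-side RPC dispatch, mirroring the slide's
# example. The (module, function) registry and bar's behavior are
# illustrative placeholders, not a real wire protocol.

REGISTRY = {
    ("foo.dll", "bar"): lambda a, b, s: f"returned_{s}",  # hypothetical
}

def rpc_dispatch(module, func, *args):
    """Look up (module, func) and invoke it; unknown names get an error."""
    target = REGISTRY.get((module, func))
    if target is None:
        return "err: no such function"
    return target(*args)

print(rpc_dispatch("foo.dll", "bar", 4, 10, "hello"))  # -> returned_hello
print(rpc_dispatch("foo.dll", "baz", 42))              # -> err: no such function
```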
Possible Interfaces RPC can be used with two basic interfaces: synchronous and asynchronous Synchronous RPC is a “remote function call” – the client blocks and waits for the return value Asynchronous RPC is a “remote thread spawn”
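The two interfaces can be contrasted with a small local sketch, where a slow function stands in for the remote call (the function and delay are illustrative assumptions, not part of any real RPC system):

```python
# Sketch of synchronous vs. asynchronous RPC interfaces. remote_square
# is a local stand-in for a function that actually lives on a server.
from concurrent.futures import ThreadPoolExecutor
import time

def remote_square(x):
    time.sleep(0.05)          # simulated network + compute delay
    return x * x

# Synchronous: "remote function call" -- caller blocks for the value.
result = remote_square(7)

# Asynchronous: "remote thread spawn" -- caller gets a handle at once
# and rendezvous with the result later.
with ThreadPoolExecutor() as pool:
    future = pool.submit(remote_square, 7)
    # ... caller is free to do other work here ...
    async_result = future.result()

print(result, async_result)   # -> 49 49
```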
Wrapper Functions • Writing rpc_call(foo.dll, bar, arg0, arg1..) directly is poor form: confusing code, breaks abstraction • A wrapper “stub” function makes the code cleaner: the programmer writes bar(arg0, arg1); and the stub makes the RPC “under the hood”
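The stub idea can be sketched as follows; `rpc_call` here is a hypothetical placeholder for the real transport, and `bar` is the generated wrapper the programmer actually calls:

```python
# Sketch of a wrapper "stub": the caller sees an ordinary local
# function, and the RPC plumbing is hidden inside it. rpc_call and its
# string "wire format" are illustrative stand-ins for a real transport.

def rpc_call(module, func, *args):
    # Stand-in for: marshal the arguments, send, await the reply.
    return f"{module}.{func}({', '.join(map(str, args))})"

def bar(arg0, arg1):
    """Stub: looks like a local call, performs the RPC under the hood."""
    return rpc_call("foo.dll", "bar", arg0, arg1)

print(bar(4, 10))   # -> foo.dll.bar(4, 10)
```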
More Design Considerations Who can call RPC functions? Anybody? How do you handle multiple versions of a function? Need to marshal objects How do you handle error conditions? Numerous protocols: DCOM, CORBA, JRMI…
Characteristics of Reliable DFS • Fault-tolerant • Highly available • Recoverable • Consistent • Scalable • Predictable performance • Secure
Failures • Hardware failures • Happen less frequently now • Software failures • Software bugs account for estimated 25-35% of unplanned downtime • Residual bugs in mature systems • Heisenbug – disappears or alters characteristics when observed or researched • Bohrbug – does not disappear or alter its characteristics when researched – manifests itself under well-defined conditions
Specific Failures in DFS • Halting failures • Fail-stop • Omission failures • Network failures • Network partition failures • Timing failures • Byzantine failures
8 Fallacies of Distributed Computing • The network is reliable • Latency is zero • Bandwidth is infinite • The network is secure • Topology does not change • There is one administrator • Transport cost is zero • The network is homogeneous
Stateful versus Stateless Service • Stateful • Server records which client is accessing which file • What are the advantages and disadvantages? • UNIX is stateful • Stateless • Each request is independent of previous requests (the request itself carries the state info) • What are the advantages and disadvantages? • NFS is stateless
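The contrast can be sketched in a few lines: a stateless server answers each self-describing request in isolation, while a stateful server keeps a per-client cursor for open files. The file handles, data, and client names below are illustrative.

```python
# Sketch contrasting stateless (NFS-like) and stateful (UNIX-like)
# file service. All names and data are illustrative.

DATA = {"fh42": b"hello world"}    # file handle -> file contents

def stateless_read(handle, offset, count):
    # The request carries everything the server needs: handle, offset,
    # count. Any request can be served with no memory of the client.
    return DATA[handle][offset:offset + count]

class StatefulServer:
    def __init__(self):
        self.cursors = {}           # per-client open-file state
    def open(self, client, handle):
        self.cursors[client] = (handle, 0)
    def read(self, client, count):
        handle, off = self.cursors[client]   # server remembers position
        self.cursors[client] = (handle, off + count)
        return DATA[handle][off:off + count]

print(stateless_read("fh42", 6, 5))   # -> b'world'
```

If a stateless server crashes and restarts, clients simply retry; a stateful server must somehow reconstruct its cursor table, which is one of the trade-offs the slide asks about.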
File Replication • Replicas of same file reside on failure-independent machines • Improves availability • Replicas should be invisible, yet distinguished at lower levels • Updates to replicas must be duplicated -- need exactly once semantics. • Demand replication -- build a cache of whole file
Network File System (NFS) • Arbitrary collection of clients and servers share a common file system • Multiple file system trees on different machines can be mounted together • Mount procedure • OS is given the name of the device and the location within the file structure at which to attach the file system (called ‘mount point’ ) • OS verifies that the device contains a valid file system • OS notes in its directory structure that a file system is mounted at the specified mount point
Major Layers of NFS Architecture • vnode -- network wide unique (like an inode but for a network) • RPC and NFS Service layer -- NFS Protocol • Path name look up (past mount point) requires RPC per name. • client cache of remote vnodes for remote directory names • Can Client access another server through a server?
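The one-RPC-per-name lookup can be sketched as a loop: each path component costs one (simulated) lookup call that maps a directory handle plus a name to a child handle. The directory tree and handle names below are illustrative.

```python
# Sketch of component-at-a-time path lookup past a mount point: one
# lookup RPC per path component. TREE stands in for the server's
# directory structure; all handles are illustrative.

TREE = {
    ("root", "usr"): "h_usr",
    ("h_usr", "local"): "h_local",
    ("h_local", "bin"): "h_bin",
}

def lookup_rpc(dir_handle, name):
    # Stand-in for one round trip to the server.
    return TREE[(dir_handle, name)]

def resolve(path, start="root"):
    handle = start
    for component in path.strip("/").split("/"):
        handle = lookup_rpc(handle, component)   # one RPC each
    return handle

print(resolve("/usr/local/bin"))   # -> h_bin, after three lookup RPCs
```

This per-component cost is exactly why clients cache remote vnodes for directory names, as noted above.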
NFS Protocols (2 client-server protocols) • First NFS protocol handles mounting • A client can send a path name to a server and request permission to mount that directory somewhere in its directory hierarchy • The place where it is to be mounted is not contained in the message, as the server does not care where it is mounted • If the path name is legal and the directory specified has been exported, the server returns a file handle to the client • File handle contains fields uniquely identifying the file system type, disk, i-node number of the directory, security information
NFS Protocols • Second NFS Protocol is for directory and file access • Clients send messages to servers to manipulate directories, read and write files • Clients access file attributes • NFS ‘read’ operation • Lookup operation – returns file handle • Read operation – uses file handle to read the file • Advantage: stateless server !!!!
NFS Caching • File blocks and file-attribute caches • Attributes used only if up to date. Discarded after 60 seconds. • Read-ahead and delayed write techniques used. • Delayed write used even for concurrent access (not UNIX semantics.) • New files may not be visible for 30 seconds. • Updated files may not be visible to systems with file open for reading for a while.
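The attribute-cache rule above (trust cached attributes only while fresh, discard after 60 seconds) can be sketched with a small TTL cache; the fetch function and timestamps are illustrative assumptions.

```python
# Sketch of an NFS-style attribute cache with the 60-second timeout
# from the slide. fetch_attrs stands in for the RPC to the server.

TTL = 60.0   # seconds, per the slide

class AttrCache:
    def __init__(self, fetch):
        self.fetch = fetch       # RPC that fetches attributes
        self.entries = {}        # path -> (attrs, time cached)
    def get(self, path, now):
        hit = self.entries.get(path)
        if hit and now - hit[1] < TTL:
            return hit[0]        # fresh enough: served from cache
        attrs = self.fetch(path) # stale or missing: refetch
        self.entries[path] = (attrs, now)
        return attrs

fetch_count = 0
def fetch_attrs(path):
    global fetch_count
    fetch_count += 1
    return {"size": 1024}

cache = AttrCache(fetch_attrs)
cache.get("/f", now=0.0)    # miss: one RPC to the server
cache.get("/f", now=59.0)   # hit: still within 60 s
cache.get("/f", now=61.0)   # miss: entry expired, refetched
print(fetch_count)          # -> 2
```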
SUN Network File System • Uses UDP/IP protocol and a stateless server • A remote file system is mounted over a local file system directory • The local file system directory is no longer visible • The mount command uses the name of the remote machine • No concurrency control mechanisms; modified data must be committed to the server’s disk before the request returns to the client, to avoid problems • Works on heterogeneous machines by using a machine-independent RPC
Andrew File System Architecture • [Figure: AFS namespace — desktop computers with user caches, admin servers, and volumes under /afs/hq.firm/ (e.g., User/alice, Solaris/bin, Group/research, Transarc.com/pub)]
AFS (1) • AFS – Andrew File System • Workstations grouped into cells • Note position of Venus (client cache manager) and Vice (server processes) • Client's view
AFS (2) • Aimed at scalability • Clients are not servers • Local name space and shared name space • Local name space is root file system • Whole file caching • Clients may access files from any workstation using same name space
AFS (3) • Security imposed at server interfaces -- no client programs run on servers. • Access lists for files • Client workstation interacts with servers only during opening and closing of files • Reading and writing bytes performed by kernel • AFS used by NCSA Web server over FDDI
Google File System (GFS) • Design Assumptions • Component failures are the norm rather than the exception • Inexpensive commodity components • Files are huge by traditional standards • Multi-GB files are common • High sustained bandwidth is more important than low latency "The Google File System", SOSP 2003
GFS (Design Assumptions) • Most files are mutated by appending new data rather than overwriting existing data • Once written, the files are only read, and often only sequentially • Two types of reads – large streaming reads and small random reads • Efficient implementation • Allow multiple clients to read the same file • Atomicity with minimal synchronization overhead is essential
GFS Design Parameters • Client and Chunkserver can run on the same machine • Files divided into fixed-sized chunks (64 MB) • Chunk handle • What are the tradeoffs of the 64 MB chunk size? • Chunkserver stores chunks on local disks as Linux files • Chunks are replicated on multiple chunkservers • Master maintains all file system metadata – stateful (namespace, access control, mapping file to chunks, current location)
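The fixed 64 MB chunk size means a client can turn any (file, byte offset) into a chunk index with simple arithmetic before asking the master for that chunk's handle and replica locations. A minimal sketch of that arithmetic (the offsets are illustrative):

```python
# Sketch of how a byte offset maps onto GFS's fixed-size chunks. A
# larger chunk size means fewer master lookups and less metadata, at
# the cost of hotspots and internal fragmentation for small files.

CHUNK_SIZE = 64 * 1024 * 1024    # 64 MB, per the GFS design

def chunk_index(offset):
    """Which chunk of the file holds this byte?"""
    return offset // CHUNK_SIZE

def chunk_offset(offset):
    """Where within that chunk does the byte fall?"""
    return offset % CHUNK_SIZE

# A read at byte 200,000,000 falls in chunk 2 (the third chunk):
print(chunk_index(200_000_000), chunk_offset(200_000_000))
```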
GFS Design Parameters • Client’s code implements GFS API and communicates with master and chunkservers to read and write • Client communicates with master for metadata information • Client communicates with chunkservers for data over TCP/IP • No data caching!!!
GFS Design Parameters • Single Master • Sophisticated chunk placement • Replication decisions using global knowledge • Minimal involvement in reads and writes • Metadata: file and chunk namespaces, mapping from files to chunks and locations of each chunk’s replicas. • Heartbeat protocol between master and chunkservers
GFS Consistency Model • Relaxed consistency model • File namespace mutations (e.g., file creation) are atomic (handled by the master) • Data mutations (e.g., writes or record appends) are atomic • Each mutation is performed at all of a chunk’s replicas • Stale replica detection • Use chunk version numbers to distinguish up-to-date from stale replicas
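Stale-replica detection with version numbers can be sketched as a simple comparison against the master's authoritative version: any replica reporting an older number missed a mutation and is stale. The chunk and server names below are illustrative.

```python
# Sketch of stale-replica detection via chunk version numbers: the
# master tracks the current version of each chunk, and a replica whose
# reported version is older is flagged as stale. Names are illustrative.

master_version = {"chunk1": 3}                   # authoritative versions
replica_versions = {"s1": 3, "s2": 3, "s3": 2}   # s3 missed a mutation

def stale_replicas(chunk):
    current = master_version[chunk]
    return [s for s, v in replica_versions.items() if v < current]

print(stale_replicas("chunk1"))   # -> ['s3']
```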