Distributed File Systems
Yih-Kuen Tsay
Dept. of Information Management
National Taiwan University

Purposes of a Distributed File System
• Sharing of storage and information across a network
• Convenience (and efficiency) of a conventional file system
• Persistent storage that most other services (e.g., Web servers) need

Properties of Storage Systems
Other properties include availability, timing guarantees, etc.
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

Files
• Files are an abstraction of permanent storage.
• A file is typically defined as a sequence of similar-sized data items along with a set of attributes.
• A directory is a file that provides a mapping from text names to internal file identifiers.

File Attributes
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

File Systems
• Responsible for the (a) organization, (b) storage, (c) retrieval, (d) naming, (e) sharing, and (f) protection of files.
• Provide a set of programming operations that characterize the file abstraction, particularly operations to read and write subsequences of data items beginning at any point of a file.

File System Modules
A basic distributed file system implements all of the above plus modules for client-server communication and distributed naming and location of files.
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

UNIX File Operations
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

Distributed File System Requirements
• Transparency: access, location, mobility, performance, and scaling transparency
• Concurrency (and Consistency)
• Replication/Caching (and Consistency)
• Hardware/operating system heterogeneity
• Fault-Tolerance
• Security (Access Control, Authentication)
• Efficiency

A File Service Architecture
Note: The modules communicate with one another by remote procedure calls.
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

File Service Components
• Flat file service: implementing operations on the contents of files, which are referred to by unique file identifiers (UFIDs)
• Directory service: mapping text names of files (including directories) to their UFIDs
• Client module: integrating and extending the previous two services under a single application programming interface
* Why is this structure more open and configurable?

Flat File Service Operations
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

Difference from UNIX
• Immediate access to files using UFIDs (without open or close)
• A read or write starts at the position indicated by a parameter, not at a server-held file pointer
• All operations, except create, are repeatable (idempotent)
• Together, these properties allow a stateless server implementation
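
The points above can be illustrated with a minimal in-memory sketch. The class name `FlatFileService`, its method names, and the use of small integers as UFIDs are hypothetical choices made for this example; they are not taken from the slides or from any real file service.

```python
import itertools


class FlatFileService:
    """Toy in-memory flat file service keyed by UFID (all names hypothetical)."""

    def __init__(self):
        self._files = {}                    # UFID -> bytearray holding the file contents
        self._next_id = itertools.count(1)  # stand-in for real UFID generation

    def create(self):
        # Not repeatable: every call yields a new file with a new UFID.
        ufid = next(self._next_id)
        self._files[ufid] = bytearray()
        return ufid

    def read(self, ufid, pos, n):
        # No open/close and no server-side file pointer: the caller supplies the position.
        return bytes(self._files[ufid][pos:pos + n])

    def write(self, ufid, pos, data):
        # Writing the same bytes at the same position twice leaves the same state.
        buf = self._files[ufid]
        if len(buf) < pos:
            buf.extend(b"\x00" * (pos - len(buf)))  # pad any gap with zero bytes
        buf[pos:pos + len(data)] = data


service = FlatFileService()
f = service.create()
service.write(f, 0, b"hello world")
service.write(f, 6, b"there")
service.write(f, 6, b"there")   # a retransmitted request is harmless to repeat
print(service.read(f, 0, 11))
```

Because every request carries the UFID and the position, the server keeps no per-client state between calls, so a client can simply resend a request after a timeout or a server restart.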

Access Control
• Conventional access rights checks (at open calls) are not feasible, because the service has no open operation
• Two ‘stateless’ approaches:
  * Capability (by manipulating the UFID)
  * User identity sent with every request (adopted in NFS and AFS)
• Main problem: forged requests; some authentication mechanism is needed

Capabilities and UFIDs
A capability is a binary value that acts as an access key; it can be encoded in the UFID.
• Basic construction of a UFID: file group id + file number + random number
• Additional field: permissions
• Additional field: encryption of the permission field
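
A minimal sketch of such a capability follows. It substitutes a keyed digest (HMAC-SHA-256 from the Python standard library) for the "encryption of the permission field"; that substitution, the tuple layout, the string-valued permissions, and the names `SERVER_KEY`, `make_ufid`, `restrict`, and `check_ufid` are all assumptions made for illustration.

```python
import hashlib
import hmac
import secrets

SERVER_KEY = secrets.token_bytes(32)   # hypothetical secret known only to the file server


def _seal(group_id, file_no, rand, perms):
    # Keyed digest that binds the permission field to the other UFID fields.
    msg = f"{group_id}:{file_no}:{rand}:{perms}".encode()
    return hmac.new(SERVER_KEY, msg, hashlib.sha256).hexdigest()


def make_ufid(group_id, file_no, perms):
    rand = secrets.randbits(32)        # random part makes UFIDs hard to guess
    return (group_id, file_no, rand, perms, _seal(group_id, file_no, rand, perms))


def restrict(ufid, perms):
    # Server-side: derive a weaker capability for the same file.
    group_id, file_no, rand, _, _ = ufid
    return (group_id, file_no, rand, perms, _seal(group_id, file_no, rand, perms))


def check_ufid(ufid, wanted):
    group_id, file_no, rand, perms, seal = ufid
    genuine = hmac.compare_digest(seal, _seal(group_id, file_no, rand, perms))
    return genuine and wanted in perms


owner = make_ufid(7, 42, "rw")
reader = restrict(owner, "r")
forged = reader[:3] + ("rw",) + reader[4:]   # a client edits its own permission field
print(check_ufid(owner, "w"), check_ufid(reader, "w"), check_ufid(forged, "w"))
```

The server can check each request from the UFID alone, without remembering which clients hold which rights, which is what makes the approach stateless.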

Directory Service Operations
Note: Each directory is stored as an ordinary file with a UFID.
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.
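
The operations table itself is not reproduced here. As a rough sketch of the idea that a directory maps text names to UFIDs and is itself addressed by a UFID, the following toy class may help. The names `DirectoryService`, `new_directory`, `add_name`, `lookup`, and `resolve` are hypothetical, and directories are held in a dictionary rather than stored through a flat file service, which is a simplification.

```python
class DirectoryService:
    """Toy directory service: each directory is itself addressed by a UFID."""

    def __init__(self):
        self._dirs = {}     # directory UFID -> {text name: UFID}
        self._next = 1      # stand-in for UFIDs issued by a flat file service

    def new_directory(self):
        ufid, self._next = self._next, self._next + 1
        self._dirs[ufid] = {}
        return ufid

    def add_name(self, dir_ufid, name, ufid):
        if name in self._dirs[dir_ufid]:
            raise FileExistsError(name)
        self._dirs[dir_ufid][name] = ufid

    def lookup(self, dir_ufid, name):
        try:
            return self._dirs[dir_ufid][name]
        except KeyError:
            raise FileNotFoundError(name) from None

    def resolve(self, root, path):
        # A client module could walk a pathname one component at a time.
        ufid = root
        for part in path.strip("/").split("/"):
            ufid = self.lookup(ufid, part)
        return ufid


ds = DirectoryService()
root = ds.new_directory()
home = ds.new_directory()
ds.add_name(root, "home", home)
ds.add_name(home, "notes.txt", 99)   # 99 is a made-up UFID of an ordinary file
print(ds.resolve(root, "/home/notes.txt"))
```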

The Network File System (NFS)
• Introduced by Sun Microsystems in 1985, now an Internet standard
• Runs on top of RPC (RFC 1831)
• Implemented on most operating systems
• Version described here: UNIX implementation of NFS Version 3 (RFC 1813, June 1995)
• Most recent version: NFS Version 4 (RFC 3010, December 2000)

NFS Architecture
Note: Each computer can act as both a client and a server.
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

The Virtual File System Module
• Access transparency
• File handles (file identifiers):
  • ‘filesystem identifier’ + ‘i-node number’ + ‘i-node generation number’
• One VFS structure for each mounted filesystem
  • relates a remote filesystem (identified by its file handle obtained at mount time) to a local directory on which it is mounted
• One v-node per open file
  • indicates whether a file is local (i-node) or remote (file handle)
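
The structures above might be modelled as follows. This is only a sketch: the class and field names, the `current_generation` table, and the sample numbers are hypothetical, and the staleness check assumes the generation number is incremented whenever an i-node number is reused.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileHandle:
    filesystem_id: int
    inode_number: int
    inode_generation: int   # assumed to change when the i-node number is reused


@dataclass
class VNode:
    local_inode: Optional[int] = None            # set for a local file
    remote_handle: Optional[FileHandle] = None   # set for a remote file

    def is_remote(self):
        return self.remote_handle is not None


# Server-side view: (filesystem id, i-node number) -> current generation (made-up values).
current_generation = {(1, 5012): 3}


def is_stale(handle):
    key = (handle.filesystem_id, handle.inode_number)
    return current_generation.get(key) != handle.inode_generation


old = FileHandle(1, 5012, 2)     # issued before the i-node was reused
fresh = FileHandle(1, 5012, 3)
print(is_stale(old), is_stale(fresh))
print(VNode(local_inode=77).is_remote(), VNode(remote_handle=fresh).is_remote())
```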

The NFS Client Module in UNIX
• Integrated with the kernel
• Emulates the UNIX file system primitives
• A single client module serves all user-level processes
• The encryption key for authentication is stored in the kernel
• Caches file blocks

NFS Server Operations
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

NFS Server Operations (cont’d)
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

Remote File Accesses
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

File System Information in UNIX
saturn:~ 35 % df -k
Filesystem                      kbytes    capacity  Mounted on
/dev/dsk/c0t3d0s0               143903    91%       /
/dev/dsk/c0t3d0s6               267943    99%       /usr
/dev/dsk/c0t3d0s3               15383     3%        /tmp
galaxy:/usr/local.real          4030440   53%       /usr/local
lucky:/var/mail.real            564648    86%       /var/mail
cosmos:/home.real/student/xxx   3941760   60%       /home/xxx
galaxy:/home.real/faculty/yyy   2964512   51%       /home/yyy
* Note: The output of ‘df -k’ has been edited.

Caching
• Server caching
  • read-ahead
  • write-through
  • delayed-write with the commit operation
• Client caching
  • cache validation (freshness interval and validation timestamp, modification timestamp and getattr, …)
  • bio-daemon (for read-ahead and delayed-write caching at the client side)
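
Client cache validation can be sketched as a timestamp check. The version below assumes the usual rule that an entry is usable if it was validated within the freshness interval, or if the server's modification time still matches the one recorded at the client. The names `CacheEntry`, `is_valid`, and `fake_getattr`, and the 3-second interval, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    data: bytes
    tc: float         # time the entry was last validated
    tm_client: float  # modification time of the file as recorded at the client


def is_valid(entry, now, freshness, getattr_mtime):
    """Return True if the cached block may be used at time `now`."""
    if now - entry.tc < freshness:
        return True                      # still fresh: no server contact needed
    tm_server = getattr_mtime()          # stands in for a getattr request to the server
    if tm_server == entry.tm_client:
        entry.tc = now                   # revalidated: restart the freshness interval
        return True
    return False                         # file changed at the server: refetch the block


server = {"mtime": 50.0, "getattr_calls": 0}   # simulated server state


def fake_getattr():
    server["getattr_calls"] += 1
    return server["mtime"]


entry = CacheEntry(b"block", tc=100.0, tm_client=50.0)
print(is_valid(entry, 102.0, 3.0, fake_getattr))   # inside the freshness interval
print(is_valid(entry, 110.0, 3.0, fake_getattr))   # interval expired, file unchanged
server["mtime"] = 60.0
print(is_valid(entry, 120.0, 3.0, fake_getattr))   # interval expired, file modified
print(server["getattr_calls"])
```

A short freshness interval improves consistency but increases the number of getattr calls, which is the trade-off the client has to tune.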

Achievements of NFS
• Access and location transparency
• Mobility transparency (partially)
• Read-only file replication: the automounter
• Fault-tolerance: stateless servers, the automounter
• Efficiency: caching of disk blocks (main problem: frequent use of getattr)
Nonachievements: scalability, concurrency and consistency, security (Kerberos), ...

The Andrew File System (AFS)
• Developed at CMU
• Current versions: AFS-2, AFS-3
• Compatible with NFS
• Main achievement over (older) NFS: better scalability by minimizing client-server communication
• Key characteristics: whole-file serving and caching (partial file caching allowed in AFS-3)

Observations on UNIX File Usage
• Files are mostly small
• Read operations are more common than writes
• Sequential accesses are more common than random accesses
• Most files are written by one user
• Files are referenced in bursts

AFS Architecture
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

AFS File Name Space
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

System Call Interception in AFS
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

AFS System Calls Implementation
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

Cache Consistency
• A callback promise is provided when Vice supplies a copy of a file to a Venus process
• The callback promise stored with the cached copy is in either the valid or the cancelled state
• When Venus handles an open, it checks the cache.
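
A toy model of callback promises is sketched below. It assumes that Venus refetches a file on open when the cached copy is missing or its promise has been cancelled, that Vice cancels the promises of all other clients when an updated copy is stored, and that the writer's own copy stays valid. The method names (`fetch`, `store`, `break_callback`) and data layout are hypothetical and do not reproduce the real Vice interface.

```python
class Vice:
    """Toy file server that tracks callback promises per file."""

    def __init__(self):
        self.files = {}       # file name -> contents
        self.promises = {}    # file name -> set of Venus clients holding a promise

    def fetch(self, name, venus):
        self.promises.setdefault(name, set()).add(venus)   # issue a callback promise
        return self.files[name]

    def store(self, name, data, writer):
        self.files[name] = data
        for venus in self.promises.get(name, set()) - {writer}:
            venus.break_callback(name)                     # cancel the other promises
        self.promises[name] = {writer}


class Venus:
    """Toy client cache manager."""

    def __init__(self, vice):
        self.vice = vice
        self.cache = {}       # file name -> [data, promise_is_valid]

    def open(self, name):
        entry = self.cache.get(name)
        if entry is None or not entry[1]:                  # missing, or promise cancelled
            self.cache[name] = [self.vice.fetch(name, self), True]
        return self.cache[name][0]

    def close(self, name, data):
        self.cache[name] = [data, True]
        self.vice.store(name, data, self)                  # send the updated copy back

    def break_callback(self, name):
        if name in self.cache:
            self.cache[name][1] = False


vice = Vice()
vice.files["f"] = b"v1"
a, b = Venus(vice), Venus(vice)
a.open("f")
b.open("f")
a.close("f", b"v2")
print(b.cache["f"][1])   # b's promise after a's update
print(b.open("f"))       # b refetches on its next open
```

While a promise is valid, an open is served from the local cache with no message to the server, which is where the scalability gain comes from.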

The Vice Service Interface
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

Enhancements to NFS and AFS
• Spritely NFS
  • add open and close, use callbacks
• NQNFS (Not Quite NFS)
  • use callbacks and leases
• WebNFS
  • allow browsers and other applications to interact with an NFS server directly
• NFS Version 4 (RFC 3010, December 2000)
  • incorporating all of the above and more
• DCE/DFS (based on AFS)
  • use callbacks and write tokens (with a lifetime)

New Features of NFS Version 4
• Adoption of the RPCSEC_GSS (RFC 2203) security protocol
• Multiple operations in one request
• Better migration and replication abilities
  • A client may query the location(s) of a file system.
• Introduction of open and close operations
• Lease-based file locking
• Callback-based delegation of files
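
The idea behind lease-based locking can be sketched as a lock that lapses unless its holder keeps renewing it, so a crashed client cannot block others indefinitely. The class below is a deliberately simplified model with one lease per lock and an explicit `now` argument in place of a real clock; the names and the 30-second lease are hypothetical and this is not the NFS Version 4 protocol itself.

```python
class LeaseLockTable:
    """Toy lock table in which every lock carries an expiry time."""

    def __init__(self, lease_seconds):
        self.lease_seconds = lease_seconds
        self._locks = {}   # file name -> (holder, expiry time)

    def acquire(self, name, client, now):
        held = self._locks.get(name)
        if held and held[0] != client and held[1] > now:
            return False                                   # someone else holds a live lease
        self._locks[name] = (client, now + self.lease_seconds)
        return True

    def renew(self, name, client, now):
        held = self._locks.get(name)
        if held and held[0] == client and held[1] > now:
            self._locks[name] = (client, now + self.lease_seconds)
            return True
        return False                                       # lease already lapsed or not held


table = LeaseLockTable(lease_seconds=30)
print(table.acquire("f", "A", now=0))    # A takes the lock
print(table.acquire("f", "B", now=10))   # B is refused while A's lease is live
print(table.renew("f", "A", now=20))     # A renews; the lease now runs to t=50
print(table.acquire("f", "B", now=51))   # A stopped renewing, so B gets the lock
```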

New Design Approaches
• Background
  • high-performance storage technology (e.g., RAID)
  • log-structured file systems (e.g., Sprite, BSD LFS)
  • high-performance switched networks (e.g., ATM, high-speed Ethernet)
• Goals: high scalability and fault-tolerance
• Main ideas: distribute file data among many nodes, separate responsibilities, …
• Constraints: high level of trust

More Recent File System Designs
• xFS
  • Serverless: all data, metadata, and control can be located anywhere in the system; any machine can take over the responsibilities of a failed one
• Frangipani
  • Two-layer structure
    • the Petal distributed virtual disk system
    • the Frangipani server module
Both designs utilize RAID-style striping, log-structured file storage, etc.
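
RAID-style striping with parity can be illustrated in a few lines. The sketch below splits one log segment into equal fragments for a set of data nodes, computes an XOR parity fragment, and rebuilds a lost fragment from the survivors. The function names, the zero-byte padding, and the three-node layout are assumptions for the example, not details of xFS or Petal.

```python
from functools import reduce


def xor_bytes(a, b):
    # Bytewise XOR of two equal-length fragments.
    return bytes(x ^ y for x, y in zip(a, b))


def stripe(segment, data_nodes):
    """Split a segment into `data_nodes` equal fragments plus one parity fragment."""
    size = -(-len(segment) // data_nodes)                  # ceiling division
    frags = [segment[i * size:(i + 1) * size].ljust(size, b"\x00")
             for i in range(data_nodes)]
    parity = reduce(xor_bytes, frags)
    return frags, parity


def rebuild(frags, parity, lost):
    """Recover fragment number `lost` from the other fragments and the parity."""
    survivors = [f for i, f in enumerate(frags) if i != lost]
    return reduce(xor_bytes, survivors, parity)


frags, parity = stripe(b"log segment data", data_nodes=3)
print(frags)
print(rebuild(frags, parity, lost=1) == frags[1])
```

Writing whole log segments at once lets the parity be computed from data already in memory, which avoids the read-modify-write cost of updating parity for small writes.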

Log-based Striping in xFS
Source: T.E. Anderson et al., Serverless Network File Systems, ACM TOCS 1996

An xFS Configuration
Source: T.E. Anderson et al., Serverless Network File Systems, ACM TOCS 1996

A Frangipani Configuration
Source: C.A. Thekkath et al., Frangipani, A Scalable Distributed File System, ACM SOSP 1997

Storage Systems
Source: G.A. Gibson and R. van Meter, Network Attached Storage Architecture, CACM, November 2000.

NAS and SAN
Note: the difference is disappearing.
Source: G.A. Gibson and R. van Meter, Network Attached Storage Architecture, CACM, November 2000.

Bandwidth for Disk Access
Source: E. Riedel, Storage Systems, Queue, June 2003.

Increasing the Bandwidth
Source: E. Riedel, Storage Systems, Queue, June 2003.

Virtualization in SAN
Source: E. Riedel, Storage Systems, Queue, June 2003.

Requirements for Storage Systems
• Basic requirements: resource consolidation, rapid deployment, central management, convenient backup, high availability, data sharing
• Geographic separation
• Security against an increasing risk of unauthorized access
• Performance scalable with capacity (accesses per second or megabytes per second)