1 / 80

Distributed File Systems

Distributed File Systems. NFS AFS SMB. http://en.wikipedia.org/wiki/Network_file_system. DFS Introduction. Distributed File Systems (DFS). Any computer file system that supports sharing of files and other resources as persistent storage over a computer network

Pat_Xavi
Download Presentation

Distributed File Systems

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Distributed File Systems NFS AFS SMB

  2. http://en.wikipedia.org/wiki/Network_file_system DFS Introduction

  3. Distributed File Systems(DFS) • Any computer file system that supports sharing of files and other resources as persistent storage over a computer network • First file servers were developed in the 1970s • Network File System (NFS) • Created at Sun Microsystems • 1985 • First widely used distributed file system • Other notable distributed file systems • Andrew File System (AFS) • Server Message Block SMB • AKA CIFS

  4. Distribution

  5. Distribution • Distributed File System (DFS) • Network file system with: • Clients, servers, and storage devices • Spread throughout the machines of a distributed system or intranet • Service activity occurs across the network • System has multiple and independent storage devices • Some DFS servers run on dedicated machines • Some machines can be both a server and a client • A DFS can be implemented: • As part of a distributed operating system • By a software layer • Managing the communication between conventional operating systems and file systems

  6. Distribution • Distinctive feature of a DFS: • System with • many - and - • autonomous • Clients and Servers

  7. Note: • NFS is a DFS for the Unix family • Network File System • Originated in the Unix world • Implementations for other OSes • NFS sometimes used as a synonym for DFS • Be careful of the distinction

  8. DFS attributes

  9. Transparency • A DFS should appear to its users to be the same as a conventional, centralized file system • Multiplicity and dispersion of its servers and storage devices should be invisible • Client interface used by programs should not distinguish between local and remote files • DFS locates the files and arranges the transport of data

  10. Performance • Most important performance measurement of a DFS: • Amount of time needed to satisfy service requests • Typical performance constraints: • Conventional file systems (local drives): • Disk-access time • Small amount of CPU-processing time • DFS: • Additional overhead due to the distributed structure • Includes: • Time to deliver the request to a server • Time to deliver the response to the client • For each direction, a CPU overhead of running the communication protocol software

  11. Performance • Performance of DFS can be viewed as one dimension of its transparency • Ideally, it would be comparable to that of a conventional file system

  12. Concurrent file updates • DFS should allow • Multiple client processes • On multiple machines • Access and update the same files • Updates to the file from one client should not interfere with access and updates from other clients • Concurrency control or locking may be: • Built into the file system • Provided by an add-on protocol

  13. Examples

  14. Sample Distributed File Systems • Client-server file systems • 9P • Amazon S3 • Coda • Secure File System (SFS) • Self-certifying File System (SFS)

  15. Sample Distributed File Systems • Distributed file systems • Andrew File System (AFS), OpenAFS • DCE Distributed File System • Remote file access protocol used with the Distributed Computing Environment • Google File System • File system that is optimized for Google's core storage needs • Hadoop • Java software framework that supports distributed applications running on large clusters of commodity computers • Kosmos FS • High performance distributed file system. It is also integrated with hadoop. • Lustre • Distributed file system for large scale cluster computing • Microsoft Distributed File System • Set of client and server services for organizing many distributed SMB file shares into a distributed file system • GlusterFS • Platform for developing clustering applications • Parallel Virtual File System • Parallel file system that focuses on high performance access to large data sets

  16. Network File Systems

  17. http://en.wikipedia.org/wiki/Network_File_System_%28protocol%29http://en.wikipedia.org/wiki/Network_File_System_%28protocol%29 NFS Protocol

  18. Network File SystemProtocol • Network file system protocol: • Allows a user on a client computer to access files over a network • As if the network devices were attached to its local disks • Originally developed by Sun Microsystems in 1984 • NFS builds on Open Network Computing Remote Procedure Call system • ONC RPC • Network File System protocol specified in: • RFC 1094 • RFC 1813 • RFC 3530 (which obsoletes RFC 3010)

  19. Versions and variations

  20. Versions and variations • Version 1 • Used only for Sun in-house experimental purposes • Not released to public

  21. Versions and variations • Version 2 • Originally operated entirely over UDP • Defined in RFC 1094 • March 1989 • Designers meant to keep the protocol stateless • Locking (for example) implemented outside of the core protocol

  22. Versions and variations • Version 3 (RFC 1813, June 1995) added: • Support for 64-bit file sizes and offsets • Handle files larger than 4 gigabytes (GB) • 32 bit address limit • Support for asynchronous writes on the server • Improve write performance • Additional file attributes in many replies • Avoid the need to re-fetch them • READDIRPLUS operation • Get file handles and attributes • Along with file names when scanning a directory • Assorted other improvements

  23. Versions and variations • At the introduction of Version 3 • Vendor support for TCP as a transport-layer protocol began increasing • Several vendors had already added support for NFS Version 2 with TCP as a transport • Sun Microsystems added support for TCP as a transport for NFS at the same time it added support for Version 3 • Using TCP as a transport made using NFS over a WAN more feasible

  24. Versions and variations • Version 4 • First version developed with the IETF • After Sun Microsystems handed over the development of the NFS protocols

  25. Versions and variations • Version 4 includes: • Performance improvements • Mandates strong security • Introduces a stateful protocol • Defined by • RFC 3010, December 2000 • Revised in RFC 3530, April 2003

  26. Versions and variations • Various side-band protocols have become associated with NFS, including: • Byte-range advisory Network Lock Manager (NLM) protocol • Added to support UNIX System V file-locking APIs • Remote quota reporting (RQUOTAD) protocol • Allow NFS-users to view their data-storage quotas on NFS servers

  27. Versions and variations • WebNFS • Allows NFS to • Integrate more easily into Web-browsers • Enable operation through firewalls • Was an extension to Version 2 and Version 3 • Sun Microsystems open sourced their WebNFS implementation • via https://yanfs.dev.java.net/

  28. Platforms

  29. Platforms • NFS • Most common with UNIX like OSes • Other software platforms: • Mac OS • Microsoft Windows • Novell NetWare • IBM AS/400

  30. Platforms • Alternative remote file access protocols include • Server Message Block (SMB) protocol • Also known as CIFS • Apple Filing Protocol (AFP) • NetWare Core Protocol (NCP) • OS/400 File Server file system (QFileSvr.400) • Note: • SMB and NetWare Core Protocol (NCP) are most common on Microsoft Windows • AFP is most common on (old) Macintosh systems • QFileSvr.400 is most common on AS/400 systems

  31. Typical implementation

  32. Typical implementation • Unix-style scenario: • Client • Requires access to data • Server • Serves data

  33. Typical implementation • Server Side • Server • NFS daemon (nfsd) • Make data generically available to clients • Server administrator • Determines what resources to make available • Exporting names and parameters of directories • Typically using the • /etc/exports configuration file • exportfs command • Server security-administration • Recognizes and approves validated clients • Server network configuration • Lets appropriate clients can negotiate with it through any firewall system

  34. Typical implementation • Client Side • Client machine • Requests access to exported data • Issues a mount command • Users on the client machine • View and interact with mounted file systems on the server • Within the parameters permitted • Permissions, etc. • Note: automating the NFS mounting process is typical • E.g. using /etc/fstaband/or automounting facilities

  35. NFS Protocol development versus Competing protocols

  36. NFS Protocol development versus competing protocols • 1980s • NFS and ONC figured prominently in the network-computing war between Sun Microsystems and Apollo Computer, and later the UNIX wars (ca 1987-1996) between AT&T and Sun on one side, and Digital Equipment, HP, and IBM on the other. • During the development of the ONC protocol (called SunRPC at the time), only Apollo's Network Computing System (NCS) offered comparable functionality. Two competing groups developed over fundamental differences in the two remote procedure call systems. Arguments focused on the method for data-encoding — ONC's External Data Representation (XDR) always rendered integers in big-endian order, even if both peers of the connection had little-endian machine-architectures, whereas NCS's method attempted to avoid byte-swap whenever two peers shared a common endianness in their machine-architectures. An industry-group called the Network Computing Forum formed (March 1987) in an (ultimately unsuccessful) attempt to reconcile the two network-computing environments. • Later, Sun and AT&T announced that the two firms would jointly develop AT&T's next version of UNIX: System V Release 4. This caused many of AT&T's other licensees of UNIX System V to become concerned that this would put Sun in an advantaged position, and it ultimately led to Digital Equipment, HP, IBM, and others forming the Open Software Foundation (OSF) in 1988. Ironically, Sun and AT&T had previously competed over Sun's NFS versus AT&T's Remote File System (RFS), and the quick adoption of NFS over RFS by Digital Equipment, HP, IBM, and many other computer vendors tipped the majority of users in favor of NFS. • OSF solicited the proposals for various technologies, including the remote procedure call (RPC) system and the remote file access protocol. In the end, proposals for these two requirements, called respectively, the Distributed Computing Environment (DCE), and the Distributed File System (DFS) won over Sun's proposed ONC and NFS. DCE derived from a suite of technologies, including NCS and Kerberos. DFS used DCE as the RPC and derived from AFS.

  37. Protocol development versus competing protocols • 1990s • Sun Microsystems and the Internet Society (ISOC) reached an agreement to cede "change control" of ONC RPC so that ISOC's engineering-standards body, the Internet Engineering Task Force (IETF), could publish standards documents (RFCs) documenting the ONC RPC protocols and could extend ONC RPC. OSF attempted to make DCE RPC an IETF standard, but ultimately proved unwilling to give up change-control. Later, the IETF chose to extend ONC RPC by adding a new authentication flavor, RPCSEC GSS, in order to meet IETF's requirements that protocol standards have adequate security. • Later, Sun and ISOC reached a similar agreement to give ISOC change control over NFS, although writing the contract carefully to exclude NFS version 2 and version 3. Instead, ISOC gained the right to add new versions to the NFS protocol, which resulted in IETF specifying NFS version 4 in 2003. • 2000s • By the 21st century, neither DFS nor AFS had achieved any major commercial success as compared to CIFS or NFS. IBM, which had previously acquired the primary commercial vendor of DFS and AFS, Transarc, donated most of the AFS source code to the free software community in 2000. The OpenAFS project lives on. In early 2005, IBM announced end of sales for AFS and DFS.

  38. Present • NFSv4.1 presents new opportunities for the NAS community • Primary addition is Parallel NFS (pNFS) • A method of introducing data access parallelism • Protocol defines a method of separating the meta-data of a file system from the location of the file data • Meta-data: names and attributes • Goes beyond simple name/data separation • Defines method of striping the data amongst a set of data servers • Very different from the traditional NFS server • Holds the names of files and their data under the single umbrella of the server • Products exist that are multi-node NFS servers • Limited participation by the client in separation of meta-data and data • NFSv4.1 client can be enabled to be a direct participant in the exact location of file data • Avoid sole interaction with the single NFS server when moving data

  39. Present • NFSv4.1 pNFS server: • A collection or community of server resources or components • Community members are controlled by the meta-data server • pNFS client still accesses a single meta-data server for traversal or interaction with the namespace • Client moves data to and from the server • Directly interact with the set of data servers belonging to the pNFS server community

  40. Present • NFSv4.1 provides: • Sessions • Directory Delegation and Notifications • Multi-server Namespace • ACL/SACL/DACL • access control lists • system access control list • discretionary access control list • Retention Attributions • SECINFO_NO_NAME

  41. Future • NFSv4.2 • Under development • Some features are currently useable

  42. AFS http://en.wikipedia.org/wiki/Andrew_File_System

  43. Andrew File System(AFS) • Distributed networked file system • Developed by Carnegie Mellon University • Part of the Andrew Project • Named for • Andrew Carnegie • Andrew Mellon • Primary used in distributed computing

  44. Features

  45. Features • AFS has several benefits over traditional networked file systems • Particularly in the areas of security and scalability • Not uncommon for enterprise AFS cells to exceed 50,000 clients • Uses Kerberos for authentication • Implements access control lists on directories for users and groups • Each client caches files on the local file system for increased speed on subsequent requests for the same file • Allows limited file system access in the event of a server crash or a network outage

  46. Features • Read and write operations on an open file are directed only to the locally cached copy • When a modified file is closed • Only the changed portions copied back to the file server • Cache consistency maintained by callback • When a file is cached • Server makes a note of this • Will inform the client if the file is updated by someone else • After any client, server, or network failure (including a time-out) • Callbacks are discarded and must be re-established • Re-establishing a callback involves: • Status check • Does not require re-reading the file itself

  47. Features • Consequence of the whole file locking strategy • AFS does not support large shared databases or record updating within files shared between client systems • Deliberate design decision • Based on the perceived needs of the university computing environment • Leads to the use of a single file per message in the original email system for the Andrew Project, the Andrew Message System, rather than the more customary single file per mailbox

  48. Features • A significant feature of AFS is the volume • Tree of files • Sub-directories • AFS mountpoints • Links to other AFS volumes • Volumes are created by administrators • Linked at a specific named path in an AFS cell • Once created, users of the filesystem may create directories and files as usual without concern for the physical location of the volume • A volume may have a quota assigned to it in order to limit the amount of space consumed. • As needed, AFS administrators can move that volume to another server and disk location without the need to notify users • Moving can occur while files in that volume are even being used

  49. Features • AFS volumes can be replicated to read-only cloned copies • When accessing files in a read-only volume, a client system will retrieve data from any read-only copy • If a copy becomes unavailable, clients will look for any remaining copies • Users of that data are unaware of the location of the read-only copy • Administrators can create and relocate copies as needed • AFS command suite guarantees that all read-only volumes contain exact copies of the original read-write volume at the time the read-only copy was created

  50. Features • File name space on an Andrew workstation is partitioned into a shared and local name space • Shared name space identical on all workstations • Usually mounted as /afson the Unix file system • Local name space unique to each workstation • Contains only temporary files needed for workstation initialization and symbolic links to files in the shared name space

More Related