
PVFS (Parallel Virtual File System)


Presentation Transcript


  1. PVFS (Parallel Virtual File System) Bohao She CSS 534 Spring, 2014

  2. Background
  • First developed in 1993 by Walt Ligon and Eric Blumer.
  • Development was conducted jointly by the Parallel Architecture Research Laboratory at Clemson University and the Mathematics and Computer Science Division at Argonne National Laboratory.
  • Funded by NASA Goddard Space Flight Center Code 930 and the National Computational Science Alliance through the National Science Foundation's Partnerships for Advanced Computational Infrastructure.
  • Based on Vesta, a parallel file system developed at IBM.

  3. Features
  • Object-based design
    • All PVFS server requests involve objects called dataspaces.
    • A dataspace can be used to hold:
      • File data
      • File metadata
      • Directory metadata
      • Directory entries
      • Symbolic links
    • Every dataspace in the file system has a unique handle.
    • A dataspace has two components (see the sketch below):
      • A bytestream, typically used to hold file data
      • A set of key/value pairs, typically used to hold metadata
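
  To make the dataspace concept concrete, the following is a minimal sketch of its unique handle and two components. All type and field names here are illustrative assumptions, not definitions from the PVFS sources.

  /* Minimal sketch of a dataspace: a unique handle plus a bytestream and
   * a set of key/value pairs. Names are illustrative, not PVFS source code. */
  #include <stddef.h>
  #include <stdint.h>

  typedef uint64_t pvfs_handle_t;      /* unique handle per dataspace */

  struct kv_pair {                     /* one key/value metadata entry */
      const char *key;
      const void *value;
      size_t      value_len;
  };

  struct dataspace {
      pvfs_handle_t   handle;          /* unique within the file system */
      unsigned char  *bytestream;      /* component 1: typically file data */
      size_t          bytestream_len;
      struct kv_pair *keyvals;         /* component 2: typically metadata */
      size_t          keyval_count;
  };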

  4. Features (cont.)
  • Separation of data and metadata
    • A client contacts a metadata server once, then accesses the data servers without further interaction with the metadata server.
    • This removes a critical bottleneck from the system.
  • MPI-based requests
    • When a client program requests data from PVFS, it can supply a description of the data based on MPI_Datatypes (see the MPI-IO example below).
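
  For instance, with MPI-IO a client can describe a strided access pattern with a derived datatype and issue it as a single collective read. The file path, block sizes, and counts below are assumed values for illustration.

  /* Strided read through MPI-IO: the derived datatype describes the access
   * pattern, so the file system receives one request instead of many small
   * ones. File name and sizes are assumed values. */
  #include <mpi.h>

  int main(int argc, char **argv)
  {
      MPI_File     fh;
      MPI_Datatype filetype;
      double       buf[1024];

      MPI_Init(&argc, &argv);

      /* 64 blocks of 16 doubles, one block out of every 64 doubles */
      MPI_Type_vector(64, 16, 64, MPI_DOUBLE, &filetype);
      MPI_Type_commit(&filetype);

      MPI_File_open(MPI_COMM_WORLD, "/mnt/pvfs/dataset.bin",
                    MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);
      MPI_File_set_view(fh, 0, MPI_DOUBLE, filetype, "native", MPI_INFO_NULL);
      MPI_File_read_all(fh, buf, 1024, MPI_DOUBLE, MPI_STATUS_IGNORE);

      MPI_File_close(&fh);
      MPI_Type_free(&filetype);
      MPI_Finalize();
      return 0;
  }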

  5. Features (cont.)
  • Multiple network support
    • PVFS uses a networking layer named the Buffered Message Interface (BMI), which provides a non-blocking message interface designed specifically for file systems (a sketch of this post/test style follows below).
    • BMI has implementation modules for a number of different networks used in high-performance computing, including TCP/IP, Myrinet, InfiniBand, and Portals.
  • Stateless
    • Servers do not share any state with each other or with clients, so if a server crashes, another can be started in its place.
    • Updates are performed without using locks.
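
  The non-blocking style means a client posts an operation and checks for completion later instead of blocking in the network layer. The sketch below only illustrates that pattern: every name in it is invented, it is not the real BMI API, and the "network" is faked so the example runs standalone.

  /* Hypothetical post/test sketch in the style of a non-blocking message
   * layer. None of these names are real BMI functions, and the "network"
   * completes immediately so the example is self-contained. */
  #include <stdio.h>

  struct msg_op { int done; };                 /* stand-in for a posted op */

  /* post a send; a real layer would return before delivery completes */
  static void post_send(struct msg_op *op, const void *buf, unsigned len)
  {
      (void)buf; (void)len;
      op->done = 0;
  }

  /* completion test; the fake network completes on the first poll */
  static int test_done(struct msg_op *op)
  {
      op->done = 1;
      return op->done;
  }

  int main(void)
  {
      struct msg_op op;
      const char req[] = "read request";

      post_send(&op, req, sizeof req);         /* does not block */
      while (!test_done(&op)) {
          /* a real client could overlap computation or service other
           * file system requests here instead of blocking */
      }
      puts("request completed");
      return 0;
  }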

  6. Features (cont.)
  • User-level implementation
    • Clients and servers run at user level.
    • An optional kernel module allows a PVFS file system to be mounted like any other file system (see the example below).
    • Alternatively, programs can be linked directly to user-level interfaces such as MPI-IO or the POSIX-like interface.
    • This makes PVFS easy to install and less prone to causing system crashes.
  • System-level interface
    • The PVFS interface is designed to integrate at the system level.
    • It exposes many of the features of the underlying file system so that higher-level interfaces can take advantage of them if desired.
    • It is similar to the Linux VFS, making it easy to implement as a mountable file system.
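
  Once the kernel module is loaded and a PVFS volume is mounted, an unmodified program uses ordinary POSIX calls; the mount point /mnt/pvfs below is an assumed path.

  /* An unmodified application reading from a mounted PVFS file system with
   * ordinary POSIX calls. The mount point /mnt/pvfs is an assumed path. */
  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
      char buf[4096];
      ssize_t n;

      /* the kernel module and pvfsd forward these calls to the PVFS servers */
      int fd = open("/mnt/pvfs/results.dat", O_RDONLY);
      if (fd < 0) {
          perror("open");
          return 1;
      }
      while ((n = read(fd, buf, sizeof buf)) > 0)
          fwrite(buf, 1, (size_t)n, stdout);   /* copy the file to stdout */
      close(fd);
      return 0;
  }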

  7. Architecture
  • There are four major components to the PVFS system:
    • Metadata server (MGR)
    • I/O server (ION or iod)
    • PVFS native API (libpvfs, on the compute nodes, CN)
    • PVFS Linux kernel support
  Figure courtesy of P. H. Carns, W. B. Ligon III, R. B. Ross, and R. Thakur, "PVFS: A Parallel File System For Linux Clusters", Proceedings of the 4th Annual Linux Showcase and Conference, Atlanta, GA, October 2000, pp. 317-327

  8. Architecture (cont.)
  • The metadata server, MGR, manages all file metadata for PVFS files. Metadata is information that describes a file, such as its name, its place in the directory hierarchy, its owner, and how it is distributed across nodes in the system.
  • By having a daemon that atomically operates on file metadata, we avoid many of the shortcomings of storage area network approaches, which have to implement complex locking schemes to ensure that metadata stays consistent in the face of multiple accesses.
  Figures courtesy of P. H. Carns, W. B. Ligon III, R. B. Ross, and R. Thakur, "PVFS: A Parallel File System For Linux Clusters", Proceedings of the 4th Annual Linux Showcase and Conference, Atlanta, GA, October 2000, pp. 317-327
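
  As a rough picture of what the manager keeps per file, the record below lists the fields mentioned above; the striping fields mirror the description on slide 12, and every name is an illustrative assumption rather than PVFS source code.

  /* Illustrative per-file metadata record; field names are assumptions,
   * not PVFS source code. */
  #include <stdint.h>
  #include <sys/types.h>

  struct pvfs_file_meta {
      uint64_t inode;       /* assigned by the manager; also names the local
                               files kept by the I/O daemons (slide 12) */
      char     name[256];   /* file name within the directory hierarchy */
      uid_t    owner;       /* owning user */
      gid_t    group;       /* owning group */
      /* how the file is distributed (striped) across the I/O nodes */
      uint32_t base;        /* first I/O node holding data */
      uint32_t pcount;      /* number of I/O nodes the file spans */
      uint32_t ssize;       /* stripe size in bytes */
  };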

  9. Architecture (cont.)
  • The I/O server, ION or iod, handles storing and retrieving file data on local disks connected to the node.
  • These servers actually create files on an existing file system on the local node, and they use the traditional read(), write(), and mmap() calls to access these files.
  • This means that one can use whatever local file system one likes for storing this data. One can even use software or hardware RAID support on the node to tolerate disk failures transparently and to create extremely large file systems.
  Figure courtesy of P. H. Carns, W. B. Ligon III, R. B. Ross, and R. Thakur, "PVFS: A Parallel File System For Linux Clusters", Proceedings of the 4th Annual Linux Showcase and Conference, Atlanta, GA, October 2000, pp. 317-327
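
  To make the local-storage idea concrete, the following sketch writes one stripe into an ordinary local file named after the PVFS inode number, in the way the I/O daemon's storage is described here. The storage directory and file-naming scheme are assumptions for illustration.

  /* Sketch of an I/O-daemon-style store: one stripe of a PVFS file goes
   * into an ordinary local file named after the PVFS inode number.
   * The directory and naming scheme are assumptions. */
  #include <fcntl.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <unistd.h>

  /* write len bytes of stripe data at local_off within this node's
     local file for PVFS inode `inode` */
  static int store_stripe(uint64_t inode, off_t local_off,
                          const void *data, size_t len)
  {
      char path[64];
      snprintf(path, sizeof path, "/pvfs_data/f%llu",
               (unsigned long long)inode);    /* e.g. /pvfs_data/f1092157504 */

      int fd = open(path, O_WRONLY | O_CREAT, 0644);
      if (fd < 0)
          return -1;
      ssize_t n = pwrite(fd, data, len, local_off);  /* plain POSIX I/O */
      close(fd);
      return n == (ssize_t)len ? 0 : -1;
  }

  int main(void)
  {
      const char chunk[] = "this node's portion of the file";
      /* inode number taken from the example on slide 12 */
      return store_stripe(1092157504ULL, 0, chunk, sizeof chunk - 1) ? 1 : 0;
  }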

  10. Architecture (cont.)
  • The PVFS native API, libpvfs, provides user-space access to the PVFS servers. It handles the scatter/gather operations necessary to move data between user buffers and PVFS servers, keeping these operations transparent to the user.
  • For metadata operations, applications communicate through the library with the metadata server.
  • For data access, the metadata server is eliminated from the access path and the I/O servers are contacted directly.
  Figures courtesy of P. H. Carns, W. B. Ligon III, R. B. Ross, and R. Thakur, "PVFS: A Parallel File System For Linux Clusters", Proceedings of the 4th Annual Linux Showcase and Conference, Atlanta, GA, October 2000, pp. 317-327

  11. Architecture (cont.)
  • The PVFS Linux kernel support provides the functionality necessary to mount PVFS file systems on Linux nodes. It includes a loadable module, an optional kernel patch to eliminate a memory copy, and a daemon (pvfsd) that accesses the PVFS file system on behalf of applications.
  • This allows existing programs to access PVFS files without any modification. This support is not necessary for applications to use PVFS, but it provides an extremely convenient means of interacting with the system.
  Figure courtesy of P. H. Carns, W. B. Ligon III, R. B. Ross, and R. Thakur, "PVFS: A Parallel File System For Linux Clusters", Proceedings of the 4th Annual Linux Showcase and Conference, Atlanta, GA, October 2000, pp. 317-327

  12. File striping and partitioning
  • Though there are six IONs in this example, the file is striped across only three IONs, starting from node 2, because the file's metadata specifies that striping.
  • Each I/O daemon stores its portion of the PVFS file in a file on its ION's local file system.
  • The name of this file is based on the inode number that the manager assigned to the PVFS file; it is 1092157504 in this example. A worked offset-to-ION mapping is sketched below.
  Left images courtesy of P. H. Carns, W. B. Ligon III, R. B. Ross, and R. Thakur, "PVFS: A Parallel File System For Linux Clusters", Proceedings of the 4th Annual Linux Showcase and Conference, Atlanta, GA, October 2000, pp. 317-327. Right image courtesy of Munehiro Fukuda, "CSS534 Parallel Programming in Grid and Cloud, Lecture 11: File Management," UW Bothell, 2014, p. 17
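
  Under the round-robin striping this slide describes (a starting node, a number of nodes, and a stripe size), a logical file offset maps to an ION and a local offset as sketched below; the parameter names and the 64 KB stripe size are assumptions.

  /* Round-robin striping: map a logical file offset to (ION, local offset).
   * Parameter names and the 64 KB stripe size are assumptions. */
  #include <stdint.h>
  #include <stdio.h>

  struct stripe_loc { uint32_t ion; uint64_t local_off; };

  static struct stripe_loc locate(uint64_t off,     /* logical file offset  */
                                  uint32_t base,    /* first ION with data  */
                                  uint32_t pcount,  /* IONs the file spans  */
                                  uint32_t nions,   /* IONs in the system   */
                                  uint64_t ssize)   /* stripe size in bytes */
  {
      uint64_t stripe = off / ssize;                /* which stripe unit    */
      struct stripe_loc loc;
      loc.ion = (base + (uint32_t)(stripe % pcount)) % nions;
      /* each ION holds every pcount-th stripe unit, packed contiguously */
      loc.local_off = (stripe / pcount) * ssize + off % ssize;
      return loc;
  }

  int main(void)
  {
      /* matches the slide: 6 IONs total, striped over 3 starting at node 2 */
      for (uint64_t off = 0; off < 4 * 65536; off += 65536) {
          struct stripe_loc l = locate(off, 2, 3, 6, 65536);
          printf("offset %8llu -> ION %u, local offset %llu\n",
                 (unsigned long long)off, l.ion,
                 (unsigned long long)l.local_off);
      }
      return 0;
  }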

  13. Performance over Ethernet
  Data courtesy of P. H. Carns, W. B. Ligon III, R. B. Ross, and R. Thakur, "PVFS: A Parallel File System For Linux Clusters", Proceedings of the 4th Annual Linux Showcase and Conference, Atlanta, GA, October 2000, pp. 317-327

  14. Performance over Myrinet
  Data courtesy of P. H. Carns, W. B. Ligon III, R. B. Ross, and R. Thakur, "PVFS: A Parallel File System For Linux Clusters", Proceedings of the 4th Annual Linux Showcase and Conference, Atlanta, GA, October 2000, pp. 317-327

  15. Pros and Cons
  • Pros:
    • Higher cluster performance than NFS.
    • Many hard drives act as one large hard drive.
    • Works with current software.
    • Best when reading/writing large amounts of data.
  • Cons:
    • Multiple points of failure.
    • Poor performance when using the kernel module.
    • Not as good for “interactive” work.

  16. Questions?
