This presentation discusses the pNFS extension for NFSv4, which enables scalable I/O by incorporating object storage. It explores the problem of scalable I/O, reviews the proposal, and explains the status and next steps. Additionally, it delves into object storage and the object security scheme.
pNFS extension for NFSv4
IETF-62, March 2005
Brent Welch, welch@panasas.com
Abstract
• pNFS extends NFSv4
• Scalable I/O problem
• Brief review of the proposal
• Status and next steps
• Object Storage and pNFS
• Object security scheme
pNFS
• Extension to NFSv4
  • NFS is THE file system standard
  • Fewest additions that enable parallel I/O from clients
• Layouts are the key additional abstraction (sketched below)
  • Describe where data is located and how it is organized
  • Clients access storage directly, in parallel
• Generalized support for asymmetric solutions
  • Files: Clean way to do filer virtualization, eliminate the bottleneck
  • Objects: Standard way to do object-based file systems
  • Blocks: Standard way to do block-based SAN file systems
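As a rough illustration of what a layout conveys, here is a minimal sketch in Python; the field names and types are illustrative, not taken from the pNFS drafts.

```python
from dataclasses import dataclass
from enum import Enum

class LayoutType(Enum):
    BLOCKS = 1   # SBC block maps (SAN file systems)
    OBJECTS = 2  # T10 OSD objects
    FILES = 3    # subfiles on NFSv4 data servers

@dataclass
class Layout:
    layout_type: LayoutType   # which flavor of storage the map refers to
    byte_range: tuple         # (offset, length) of the file this layout covers
    device_ids: list          # opaque device IDs, resolved via GETDEVINFO
    layout_body: bytes        # type-specific striping / block map, opaque to generic client code
```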
Scalable I/O Problem
• Storage for 1000's of active clients => lots of bandwidth
• Scaling capacity through 100's of TB and into PB
• File Server model has good sharing and manageability
  • but it is hard to scale
• Many other proprietary solutions
  • GPFS, CXFS, StorNext, Panasas, Sistina, Sanergy, …
  • Everyone has their own client
• Would like a standards-based solution => pNFS
Scaling and the Client
• Gary Grider's rule of thumb for HPC
  • 1 GByte/sec for each Teraflop of computing power (see the sizing sketch below)
  • 2000 3.2 GHz processors => 6 TF => 6 GB/sec
  • One file server with 48 GE NICs? I don't think so.
  • 100 GB/sec I/O system in '08 or '09 for a 100 TF cluster
• Making movies
  • 1000 node rendering farm, plus 100's of desktops at night
• Oil and Gas
  • 100's to 1000's of clients
  • Lots of large files (10's of GB to TB each)
• EDA, Compile Farms, Life Sciences …
• Everyone has a Linux cluster these days
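A quick back-of-the-envelope check of the numbers above, assuming roughly one flop per cycle per processor and about 125 MB/s per saturated GigE link (purely illustrative):

```python
# Gary Grider's rule of thumb: ~1 GB/s of I/O per teraflop of compute.
processors, clock_hz = 2000, 3.2e9
teraflops = processors * clock_hz / 1e12          # ~6.4 TF, assuming ~1 flop/cycle
required_gb_per_s = teraflops * 1.0               # ~6.4 GB/s of aggregate I/O
gige_links = required_gb_per_s * 1e9 / 125e6      # ~51 fully saturated GigE NICs
print(f"{teraflops:.1f} TF -> {required_gb_per_s:.1f} GB/s -> ~{gige_links:.0f} GigE links")
```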
Asymmetric File Systems
• Control Path vs. Data Path ("out-of-band" control)
• Metadata servers (Control)
  • File name space
  • Access control (ACL checking)
  • Sharing and coordination
• Storage Devices (Data)
  • Clients access storage directly
• SAN Filesystems
  • CXFS, EMC HighRoad, Sanergy, SAN FS
• Object Storage File Systems
  • Panasas, Lustre
[Diagram: clients speak NFSv4 + pNFS ops to the pNFS server, use the storage protocol directly against the storage devices, and the pNFS server manages the devices via a storage management protocol]
NFS and Cluster File Systems
• Currently, the NFS head can be a client of a cluster file system (instead of a client of the local file system)
• NFSv4 adds more state, makes integration with a cluster file system more interesting
[Diagram: many NFS clients connect to multiple NFS "heads", which sit alongside native clients as clients of a shared cluster file system]
pNFS and Cluster File Systems
• Replace proprietary cluster file system protocols with pNFS
• Lots of room for innovation inside the cluster file system
[Diagram: many pNFS clients and native clients all access cluster file system nodes that export NFSv4 + pNFS]
pNFS Ops Summary
• GETDEVINFO
  • Maps from the opaque device ID used in layout data structures to the storage protocol type and necessary addressing information for that device
• LAYOUTGET
  • Fetch location and access control information (i.e., capabilities)
• LAYOUTCOMMIT
  • Commit write activity. New file size and attributes visible on storage.
• LAYOUTRELEASE
  • Give up the lease on the layout
• CB_LAYOUTRETURN
  • Server callback to recall the layout lease
A client-side sequence using these operations is sketched below.
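A hypothetical client-side write path might look like the following; the method names on `client` and `layout` are invented for illustration and are not the on-the-wire protocol. CB_LAYOUTRETURN is omitted because it is initiated by the server, not the client.

```python
def pnfs_write(client, filehandle, offset, data):
    # Ask the metadata server for a layout (location + capabilities) covering this range.
    layout = client.LAYOUTGET(filehandle, offset, len(data))

    # Resolve each opaque device ID to a storage protocol type and address.
    for dev_id in layout.device_ids:
        layout.bind_device(dev_id, client.GETDEVINFO(dev_id))

    # Perform direct, parallel I/O to the storage devices named in the layout.
    layout.write(offset, data)

    # Make the new file size and attributes visible, then give up the layout lease.
    client.LAYOUTCOMMIT(filehandle, layout)
    client.LAYOUTRELEASE(filehandle, layout)
```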
Multiple Data Server Protocols
• BE INCLUSIVE !!
  • Broaden the market reach
• Three (or more) flavors of out-of-band metadata attributes:
  • BLOCKS: SBC/FCP/FC or SBC/iSCSI … for files built on blocks
  • OBJECTS: OSD/iSCSI/TCP/IP/GE for files built on objects
  • FILES: NFS/ONCRPC/TCP/IP/GE for files built on subfiles
• Inode-level encapsulation in server and client code
[Diagram: client apps sit on a pNFS file system with a per-type layout driver (1. SBC for blocks, 2. OSD for objects, 3. NFS for files); the pNFS server, backed by a local filesystem, grants and revokes layouts over NFSv4 extended with orthogonal layout metadata attributes]
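One way to picture the client-side layout driver is a simple dispatch table keyed by layout type; this reuses the hypothetical Layout/LayoutType sketch above and is not tied to any real implementation.

```python
LAYOUT_DRIVERS = {}   # LayoutType -> driver object implementing write()/read()

def register_driver(layout_type, driver):
    LAYOUT_DRIVERS[layout_type] = driver

def layout_write(layout, offset, data):
    # The generic pNFS client stays protocol-agnostic; the type-specific driver
    # (SBC, OSD, or NFS data server) knows how to talk to the storage devices.
    return LAYOUT_DRIVERS[layout.layout_type].write(layout, offset, data)
```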
March 2005 Connectathon
• NetApp server prototype by Garth Goodson
  • NFSv4 for the data protocol
  • Layouts are an array of filehandles and stateIDs
  • Storage Management protocol is also NFSv4
• Client prototype by Sun
  • Solaris 10 variant
Object Storage
• Object interface is midway between files and blocks (think inode)
  • Create, Delete, Read, Write, GetAttr, SetAttr, …
  • Objects have numeric IDs, not pathnames
  • Objects have a capability-based security scheme (details in a moment)
  • Objects have extensible attributes to hold high-level FS information
• Based on NASD and OBSD research out of CMU (Gibson et al.)
• SNIA/T10 standards based. V1 complete, V2 in progress.
Objects as Building Blocks
[Diagram: an object comprises user data, attributes, and a layout; its interface is an ID <dev,grp,obj> with Read/Write, Create/Delete, and Getattr/Setattr operations, capability-based; the file component stripes files across storage nodes]
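To make that interface concrete, here is a toy in-memory object store with the operations named above; it is only a sketch and does not follow the T10 OSD command encoding, security checks, or attribute pages.

```python
from dataclasses import dataclass, field

@dataclass
class StorageObject:
    data: bytearray = field(default_factory=bytearray)
    attrs: dict = field(default_factory=dict)      # extensible attributes (e.g., FS metadata)

class ObjectStore:
    def __init__(self):
        self.objects = {}                          # numeric ID (dev, grp, obj) -> object

    def create(self, oid):
        self.objects[oid] = StorageObject()

    def delete(self, oid):
        del self.objects[oid]

    def write(self, oid, offset, data):
        buf = self.objects[oid].data
        if offset > len(buf):                      # zero-fill any hole before the write
            buf.extend(b"\0" * (offset - len(buf)))
        buf[offset:offset + len(data)] = data

    def read(self, oid, offset, length):
        return bytes(self.objects[oid].data[offset:offset + length])

    def setattr(self, oid, key, value):
        self.objects[oid].attrs[key] = value

    def getattr(self, oid, key):
        return self.objects[oid].attrs.get(key)
```

A metadata manager could, for instance, create an object and record the file owner with setattr((1, 0, 42), "owner", "welch") before handing clients a capability for the data.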
Objects and pNFS
• Clients speak pNFS to the metadata server
  • Or they will, eventually
• Clients speak iSCSI/OSD to the data servers (OSDs)
• Files are striped across multiple objects (see the striping sketch below)
• Metadata manager speaks iSCSI/OSD to the data servers
• Metadata manager uses extended attributes on objects to store metadata
  • High-level file system attributes like owners and ACLs
  • Storage maps
• Directories are just files stored in objects
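The slides do not give the striping function, but a simple round-robin map, assuming illustrative parameters (64 KB stripe units over 4 component objects), would look like this:

```python
def stripe_map(file_offset, stripe_unit=64 * 1024, num_objects=4):
    # Round-robin striping: which component object a byte lands in, and where.
    stripe_number = file_offset // stripe_unit
    object_index = stripe_number % num_objects
    object_offset = (stripe_number // num_objects) * stripe_unit + file_offset % stripe_unit
    return object_index, object_offset

print(stripe_map(300_000))   # -> (0, 103392): byte 300,000 lives in object 0 at offset 103,392
```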
Object Security (1)
• Clean security model based on shared, secret device keys
  • Part of the T10 standard
  • Metadata manager knows the device keys
  • Metadata manager signs capabilities, OSD verifies them (explained on the next slide)
• Metadata server returns a capability to the client
  • Capability encodes: object ID, operation, expire time, data range, cap version, and a signature
  • Data range allows serialization over file data, as well as different rights to different attributes
  • Cap version allows revocation by changing it on the object
• May want to protect the capability with privacy (encryption) to avoid snooping on the path between metadata manager and client
Object Security (2)
• Capability Request (client to metadata manager)
  • CapRequest = object ID, operation, data range
• Capability signed with the device key known by the metadata manager
  • CapSignature = {CapRequest, Expires, CapVersion}_KeyDevice
  • CapSignature used as a signing key (!)
• OSD Command (client to OSD)
  • Request contains {CapRequest, Expires, CapVersion} + other details + a nonce to prevent replay attacks
  • RequestSignature = {Request}_CapSignature
  • OSD can compute CapSignature by signing CapRequest with its own key
  • OSD can then verify RequestSignature
• Caches and other tricks used to make this go fast
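The two-level signing chain can be sketched with HMACs; the MAC algorithm, field encodings, and key sizes below are stand-ins (the real choices are defined by the T10 OSD security model), so treat this only as an illustration of the flow.

```python
import hmac, hashlib, os

def sign(key, message):
    return hmac.new(key, message, hashlib.sha1).digest()

# Metadata manager: shares device_key with the OSD, never with the client.
device_key = os.urandom(20)
cap = b"objID=0x2a|op=READ|range=0-65535|expires=1234567890|capver=3"
cap_signature = sign(device_key, cap)              # returned to the client; doubles as a signing key

# Client: signs its OSD request (capability fields + other details + anti-replay nonce).
request = cap + b"|nonce=" + os.urandom(8).hex().encode()
request_signature = sign(cap_signature, request)

# OSD: recomputes CapSignature from its own device key, then verifies the request signature.
assert hmac.compare_digest(request_signature, sign(sign(device_key, cap), request))
```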
Scalable Bandwidth
Per Shelf Bandwidth
Scalable NFS
Rendering load from 1000 clients
Status
• pNFS ad-hoc working group
  • Dec '03 Ann Arbor, April '04 FAST, Aug '04 IETF, Sept '04 Pittsburgh
• Internet Drafts
  • draft-gibson-pnfs-problem-statement-01.txt, July 2004
  • draft-gibson-pnfs-reqs-00.txt, October 2004
  • draft-welch-pnfs-ops-00.txt, October 2004
  • draft-welch-pnfs-ops-01.txt, March 2005
• Next Steps
  • Add NFSv4 layout and storage protocol to the current draft
  • Add text for additional error and recovery cases
  • Add separate drafts for object storage and block storage
Backup
Object Storage File System
• Out of band control path
• Direct data path
"Out-of-band" Value Proposition
• Out-of-band allows a client to use more than one storage address for a given file, directory, or closely linked set of files
• Parallel I/O direct from client to multiple storage devices
• Scalable capacity: file/dir uses space on all storage: can get big
• Capacity balancing: file/dir uses space on all storage: evenly
• Load balancing: dynamic access to file/dir over all storage: evenly
• Scalable bandwidth: dynamic access to file/dir over all storage: big
• Lower latency under load: no bottleneck developing deep queues
• Cost-effectiveness at scale: use streamlined storage servers
• pNFS standard leads to standard client SW: share client support $$$
Scaling and the Server
• Tension between sharing and throughput
  • File server provides semantics, including sharing
  • Direct attach I/O provides throughput, no sharing
• File server is a bottleneck between clients and storage
  • Pressure to make the server ever faster and more expensive
  • Clustered NAS solutions, e.g., Spinnaker
• SAN filesystems provide sharing and direct access
  • Asymmetric, out-of-band system with distinct control and data paths
  • Proprietary solutions, vendor-specific clients
  • Physical security model, which we'd like to improve
[Diagram: many clients funneling through a single file server to reach storage]
Symmetric File Systems
• Distribute storage among all the clients
  • GPFS (AIX), GFS, PVFS (user level)
• Reliability Issues
  • Compute nodes less reliable because of the disk
  • Storage less reliable, unless replication schemes are employed
• Scalability Issues
  • Stealing cycles from clients, which have other work to do
  • Coupling of computing and storage
  • Like the early days of engineering workstations and private storage