Implementing NFSv4 for HPSS: the GANESHA architecture
Philippe DENIEL, CEA/DAM, philippe.deniel@cea.fr
Scope of this presentation
This presentation has three goals:
• Describe the NFSv4 enhancements compared to NFSv2 and NFSv3.
• Describe the design and architecture of our implementation of the NFS protocols on top of HPSS: the GANESHA architecture.
• Show how these new features can be of interest to the HPSS community.
NFSv4 is an IETF protocol
• NFSv2 was known as a "home made" protocol, designed by Sun Microsystems in 1984.
• NFSv3 was discussed more widely, and several companies took part in the design of the protocol.
• NFSv4 is the result of an IETF working group, like TCP, UDP or IPv6.
• The design process started in 1997 and ended with the publication of RFC 3530 in April 2003.
A more integrated protocol
• NFSv4 is defined by RFC 3530.
• NFSv4 is a standalone protocol; it requires no ancillary protocol. The Mount protocol, NLM and NFS Stat are no longer needed.
• Port 2049 is the only resource required by NFSv4 (this value is written explicitly in the RFC), which makes NFSv4 firewall friendly.
• NFSv4 is not bound to Unix semantics: a file system object's attributes are presented as self-described bitmaps, not as a Unix-like structure.
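The "self-described bitmap" idea can be sketched as follows. This is a minimal Python illustration, not GANESHA code: the attribute numbers are the ones assigned in RFC 3530, while the helper names `encode_bitmap` and `decode_bitmap` are hypothetical and exist only for this example.

```python
# Attribute numbers as assigned by RFC 3530 (a small subset).
FATTR4_TYPE = 1
FATTR4_SIZE = 4
FATTR4_MODE = 33
FATTR4_OWNER = 36


def encode_bitmap(attrs):
    """Return a list of 32-bit words; attribute n sets bit (n % 32) of word (n // 32)."""
    words = []
    for n in attrs:
        word, bit = divmod(n, 32)
        while len(words) <= word:
            words.append(0)
        words[word] |= 1 << bit
    return words


def decode_bitmap(words):
    """Return the sorted attribute numbers whose bits are set."""
    return [
        32 * i + bit
        for i, w in enumerate(words)
        for bit in range(32)
        if w & (1 << bit)
    ]


# A client asking only for type, size and owner sends two words.
requested = encode_bitmap([FATTR4_TYPE, FATTR4_SIZE, FATTR4_OWNER])
print([hex(w) for w in requested])
print(decode_bitmap(requested))
```

Because the reply carries the same kind of bitmap, a server exporting a file system with fewer attributes can simply leave the corresponding bits unset.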
Designed for the Internet
NFSv4 is designed to work on high-latency, low-bandwidth networks.
• NFSv2 and v3: latency ~1 ms, rate ~MB/sec, distance ~100 meters. Designed for LAN.
• NFSv4: latency ~100 ms, rate ~KB/sec, distance ~1000 km. Designed for WAN.
Cross-Platform interoperability
• The semantics are not necessarily those of a Unix-like system.
• There is no link to Unix structures; information is managed as bitmaps.
• Users and groups are managed as strings, not ids.
• NFSv4 can export file systems with reduced attributes (like PC FAT), or with extended attributes.
• Access Control Lists are natively supported. The ACL model suits the needs of diverse ACL models (POSIX and NT).
• Windows OPEN/CLOSE semantics are supported.
• Non-Latin characters are supported via UTF-8.
• NFSv4 will support minor versioning for protocol extensions.
• NFSv4 is ready for IPv6.
• NFSv4 is not dedicated to Unix clients.
[Figure: attribute bitmap 0010011; the user string Philippe.deniel@cea.fr shown alongside UNIX uid = 6734 and NT sid = 12-34-5678-9012]
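A minimal sketch of what "users managed as strings" means on a Unix server: the wire carries a `user@domain` string and each side translates it to its own local identity. The mapping table, the domain value and the function names below are hypothetical; only the user name and uid come from the slide. Folding names to lower case is an assumption of this sketch.

```python
# Hypothetical local mapping table; the name and uid are the ones shown on the slide.
LOCAL_USERS = {"philippe.deniel": 6734}
LOCAL_DOMAIN = "cea.fr"  # assumed domain for this example


def owner_to_uid(owner):
    """Map an NFSv4 'user@domain' string to a local numeric uid, or None if unknown."""
    name, sep, domain = owner.rpartition("@")
    if not sep or domain.lower() != LOCAL_DOMAIN:
        return None
    return LOCAL_USERS.get(name.lower())


def uid_to_owner(uid):
    """Map a local uid back to the 'user@domain' string sent on the wire."""
    for name, known_uid in LOCAL_USERS.items():
        if known_uid == uid:
            return f"{name}@{LOCAL_DOMAIN}"
    return None


print(owner_to_uid("Philippe.deniel@cea.fr"))
print(uid_to_owner(6734))
```

A non-Unix peer would apply the same idea with its own table, for example mapping the string to an NT SID instead of a uid.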
NFSv4 is security oriented
• NFSv4 is firewall friendly: only ports 2049/tcp and 2049/udp are used; no other port is required.
• NFSv4 is ONC/RPC based, and RPCSEC_GSS is explicitly supported.
• Any security mechanism with a GSSAPI integration can be used with RPCSEC_GSS: krb5, LIPKEY, SPKM-3.
• NFSv4 is connection oriented, so connection-based security (like SSL) is possible.
• RFC 3530 recommends not using SSL, but using LIPKEY via GSSAPI and RPCSEC_GSS instead.
NFSv4 Caching capabilities
• NFSv4 compound requests are lists of elementary operations to be performed on the server.
• The client can do many things in one call: the client/server dialog is reduced and becomes more flexible.
• The client can build the request that best fits its caching policy.
• Some elementary operations are dedicated to implementing cache validation on the client side (OP_VERIFY, OP_NVERIFY).
• An NFSv4 client can handle a file locally, in its cache, for a given time period (delegation mechanism).
• NFSv4 could be very efficient at large scale through proxy caching: proxies with policies similar to HTTP proxies can cache files accessed by a group of clients.
[Figure: client/server exchanges compared for NFSv2/3 and NFSv4]
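The compound mechanism and its use for cache validation can be illustrated with a toy evaluator. This is a sketch under stated assumptions, not a protocol implementation: the in-memory `TREE`, the tuple encoding of operations and the function `run_compound` are all hypothetical. The operation and status names are those of RFC 3530, as is the rule that evaluation stops at the first failing operation.

```python
# Toy in-memory "file system": path tuple -> attributes. Purely illustrative.
TREE = {
    (): {"type": "dir", "change": 1},
    ("home",): {"type": "dir", "change": 7},
    ("home", "data.txt"): {"type": "file", "change": 42, "size": 1024},
}


def run_compound(ops):
    """Evaluate operations in order; stop at the first one that fails."""
    current = None  # the "current filehandle" shared by successive operations
    results = []
    for name, arg in ops:
        status, value = "NFS4_OK", None
        if name == "PUTROOTFH":
            current = ()
        elif current is None:
            status = "NFS4ERR_NOFILEHANDLE"
        elif name == "LOOKUP":
            if current + (arg,) in TREE:
                current = current + (arg,)
            else:
                status = "NFS4ERR_NOENT"
        elif name == "GETATTR":
            value = dict(TREE[current])
        elif name == "NVERIFY":
            # Succeeds only if the attributes differ from what the client cached.
            attrs = TREE[current]
            if all(attrs.get(k) == v for k, v in arg.items()):
                status = "NFS4ERR_SAME"
        else:
            status = "NFS4ERR_OP_ILLEGAL"
        results.append((name, status, value))
        if status != "NFS4_OK":
            break
    return results


# One round trip: walk to the file, and fetch attributes only if it changed.
request = [
    ("PUTROOTFH", None),
    ("LOOKUP", "home"),
    ("LOOKUP", "data.txt"),
    ("NVERIFY", {"change": 42}),  # value the client has in its cache
    ("GETATTR", None),
]
for entry in run_compound(request):
    print(entry)
```

In this run the cached change value is still current, so the compound stops at NVERIFY and the client learns that its cached copy is valid without transferring the attributes again.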
NFSv4 and its use with HPSS (1)
The NFSv4 protocol can be of interest with HPSS on several points:
• Aggressive caching on both the client and server sides reduces the number of requests sent to the HPSS systems.
• HPSS-specific information could be obtained, on a per-file-handle basis, through NFSv4 named attributes:
  • Class of Service (set / get)
  • Storage Class related information (get only)
  • Migration state (get only)
  • Site local attributes (set / get / create)
NFSv4 and its use with HPSS (2)
• Native support of ACLs.
• Secured mount points could be used to safely share files between remote sites (Kerberos 5 and possibly SPKM-3 or LIPKEY).
• Both TCP and UDP are supported as RPC transport layers.
• Fileset semantics and junction traversal are natively supported, with potential server indirection and security renegotiation.
• Non-Unix clients can be used (see Hummingbird's NFS Maestro).
The GANESHA Architecture
GANESHA is a generic NFS server design. It has several components:
• RPC Layer: manages ONC RPC / RPCSEC_GSS requests.
• FSAL: File System Abstraction Layer, provides access to the exported file system.
• Cache Inode layer and File Content layer: cache the metadata and information related to the managed FSAL objects.
• Admin Layer: generic interface for administrative operations on the daemon (stop, start, statistics, ...).
• ECL: External Control Layer, provides a way to administer the server from the outside, on a client/server basis.
• Internal Logging / External Logging.
• Memory management: resources are allocated at server boot and managed internally.
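The role of the FSAL can be illustrated with a small abstraction-layer sketch. Every name here (`FSAL`, `MemoryFSAL`, `resolve`) is hypothetical and chosen for the example; this is not GANESHA's actual interface. The point is only that the upper layers call an abstract interface while a backend-specific module supplies the implementation.

```python
from abc import ABC, abstractmethod


class FSAL(ABC):
    """Hypothetical minimal interface the upper layers would program against."""

    @abstractmethod
    def lookup(self, parent, name):
        """Return an opaque handle for `name` under `parent`, or None."""

    @abstractmethod
    def getattr(self, handle):
        """Return a dict of attributes for `handle`."""


class MemoryFSAL(FSAL):
    """Stand-in backend; a real one would call the HPSS client API instead."""

    ROOT = "/"

    def __init__(self, files):
        self._files = files  # path -> attribute dict

    def lookup(self, parent, name):
        path = parent.rstrip("/") + "/" + name
        return path if path in self._files else None

    def getattr(self, handle):
        return dict(self._files[handle])


def resolve(fsal, root, components):
    """Protocol-layer helper that only uses the FSAL interface."""
    handle = root
    for name in components:
        handle = fsal.lookup(handle, name)
        if handle is None:
            return None
    return handle


fs = MemoryFSAL({
    "/": {"type": "dir"},
    "/home": {"type": "dir"},
    "/home/a": {"type": "file", "size": 3},
})
handle = resolve(fs, MemoryFSAL.ROOT, ["home", "a"])
print(handle, fs.getattr(handle))
```

Swapping `MemoryFSAL` for another subclass leaves `resolve` untouched, which is the property that lets the other layers be reused with different file systems.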
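Architecture of the GANESHA NFS Server
[Figure: block diagram. Client requests reach the RPC Dispatcher, with RPCSEC_GSS and the Dup Req Layer. Protocol modules: Mount V1/V3, NFS V2/V3, NFS V4. These issue cache fs operations to the Metadata Cache and the File Content layer, which issue fs operations to the File System Abstraction Layer, itself on top of the HPSS CLAPI. Supporting blocks: Logging (Syslog API), Admin (External Control API), Hash Tables, Security (GSSAPI).]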
Architecture's objectives
• The FSAL semantics are close to the NFSv4 semantics (to reduce structure conversion).
• Cache Inode uses red-black-tree-based hash tables, designed to manage 10e5 to 10e6 entries at the same time: the objective is to keep a few days of production in this cache.
• The File Content cache will be preserved across server crash recovery.
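The structure named above, a hash table whose buckets are ordered trees, can be approximated in a few lines. This sketch is an assumption-laden stand-in: Python's standard library has no red-black tree, so each bucket is a sorted list searched with `bisect`, which gives logarithmic lookups within a bucket but linear-time insertions. The class name `BucketedCache` and the sample file-handle keys are hypothetical.

```python
import bisect


class BucketedCache:
    """Hash table whose buckets are kept sorted by key.

    A sorted list searched with bisect stands in for the red-black tree of
    the slide: lookups are O(log n) per bucket, but insertions are O(n).
    Keys must be hashable and mutually comparable (for example bytes).
    """

    def __init__(self, n_buckets=64):
        self._keys = [[] for _ in range(n_buckets)]
        self._vals = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        return hash(key) % len(self._keys)

    def put(self, key, value):
        b = self._bucket(key)
        keys, vals = self._keys[b], self._vals[b]
        i = bisect.bisect_left(keys, key)
        if i < len(keys) and keys[i] == key:
            vals[i] = value  # refresh an existing entry
        else:
            keys.insert(i, key)
            vals.insert(i, value)

    def get(self, key, default=None):
        b = self._bucket(key)
        keys = self._keys[b]
        i = bisect.bisect_left(keys, key)
        if i < len(keys) and keys[i] == key:
            return self._vals[b][i]
        return default

    def __len__(self):
        return sum(len(k) for k in self._keys)


cache = BucketedCache(n_buckets=4)
for i in range(100):
    cache.put(b"fh%03d" % i, {"size": i})
print(len(cache), cache.get(b"fh042"))
```

Hashing spreads entries across buckets, and the ordered structure inside each bucket keeps lookups cheap even when a bucket holds many entries, which matters when the table is sized for a very large number of cached objects.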
NFS Daemon architecture
[Figure: request flow between CLIENT and SERVER, with the numbered steps below]
0: Client/server authentication (GSSAPI; authentication management, e.g. Krb5).
1: The client sends a request to the server.
2: The dispatcher thread stores each request into a tbuf entry. V4 and V2-V3 requests are separated. The dispatcher does authentication (based on NFS creds).
3: The request is decoded and waits in a tbuf entry, either in the V4 tbuf list or in the V2/V3 tbuf list.
4: One available worker thread picks up a waiting tbuf entry.
5: The requested operation is done in HPSS, through the FSAL. A V2/V3 request makes one call per request; a V4 bulk request makes several calls to HPSS.
6: The results of the request are returned to the client by the worker thread.
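The dispatcher/worker flow can be sketched with Python's `queue` and `threading` modules. This is an illustration of the pattern only, under several assumptions: the request dictionaries, `backend_call` and `serve` are hypothetical, the queues merely stand in for the tbuf lists, and dedicating a fixed set of workers to each list is a choice made for this sketch rather than something stated on the slide.

```python
import queue
import threading

v4_pending = queue.Queue()   # stands in for the "V4 tbuf list"
v23_pending = queue.Queue()  # stands in for the "V2/V3 tbuf list"
replies = queue.Queue()


def dispatch(request):
    """Dispatcher: file each decoded request in the list for its protocol version."""
    target = v4_pending if request["version"] == 4 else v23_pending
    target.put(request)


def backend_call(op):
    """Placeholder for a call into the file system layer (HPSS via the FSAL)."""
    return f"done:{op}"


def worker(pending):
    """Worker: take a waiting entry, perform its operation(s), queue the reply."""
    while True:
        request = pending.get()
        if request is None:  # sentinel: shut the worker down
            return
        # A V2/V3 request carries one operation; a V4 compound carries several.
        results = [backend_call(op) for op in request["ops"]]
        replies.put((request["xid"], results))


def serve(requests, workers_per_list=2):
    """Start the workers, dispatch all requests, then collect the replies."""
    threads = []
    for pending in (v4_pending, v23_pending):
        for _ in range(workers_per_list):
            t = threading.Thread(target=worker, args=(pending,))
            t.start()
            threads.append((t, pending))
    for request in requests:
        dispatch(request)
    for _, pending in threads:
        pending.put(None)  # one sentinel per worker, queued after the requests
    for t, _ in threads:
        t.join()
    out = {}
    while not replies.empty():
        xid, results = replies.get()
        out[xid] = results
    return out


answers = serve([
    {"xid": 1, "version": 3, "ops": ["GETATTR"]},
    {"xid": 2, "version": 4, "ops": ["PUTFH", "LOOKUP", "GETATTR"]},
])
print(answers)
```

Keeping the two lists separate means a long V4 compound never sits in front of a short V2/V3 request in the same queue.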
GANESHA and HPSS
• The FSAL layer is specific to the use of the NFS server with HPSS. It will be released under the terms of the HPSS licence.
• The other layers will have no dependency on HPSS. They will be provided under the terms of the CeCILL licence (CEA's free software licence). They can be freely used with other FSAL modules based on other file systems.
What is available now?
• 06/05: development of the HPSS/FSAL, Cache Inode and File Content layers almost complete.
• Summer 05: complete the integration with the RPC layers. The daemon should be NFSv2/NFSv3 capable and implement a subset of the NFSv4 protocol that is sufficient to be functional. Security support fully provided.
• Autumn 05: delegation support and named attributes support to be added to the daemon. Validation of non-Unix clients.
• 12/05: first full version of the product.
• After that: other FSAL modules to be developed. A "small files pack" specific FSAL to be added.
Small file management: a possible solution
• Hypothesis: the small files are located in the same directory tree.
• A file containing a file system image (mountable on loopback) is created. The directory tree with the small files is stored in the file system internal to this file.
• The file system image is stored in HPSS (as a big file), but is seen as an NFSv4 fileset, with an explicit export entry in the daemon configuration file.
• When accessed for the first time, the big "fs image" file is stored in a cache dedicated to "fs image" caching. Further operations on this fileset are done inside the fs image (the NFS daemon can browse the fs image in user space).
• HPSS sees only a big file; the small files exist only as they are seen by the NFS daemon inside the "fs image".
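The mechanism above can be mimicked in a few lines, with one important substitution: a tar archive stands in for the loopback-mountable file system image, because it can be browsed in user space with the standard library alone. The names `pack_tree` and `ImageFileset`, and the use of a plain directory to play the role of HPSS, are assumptions made for this sketch.

```python
import os
import shutil
import tarfile


def pack_tree(src_dir, image_path):
    """Pack a directory tree of small files into one big image file."""
    with tarfile.open(image_path, "w") as tar:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full = os.path.join(root, name)
                rel = os.path.relpath(full, src_dir).replace(os.sep, "/")
                tar.add(full, arcname=rel)


class ImageFileset:
    """Serve small files from a cached copy of the image, in user space."""

    def __init__(self, stored_image, cache_dir):
        self._stored = stored_image  # where the big file lives (HPSS on the slide)
        self._cached = os.path.join(cache_dir, os.path.basename(stored_image))
        self.fetches = 0  # how many times the big file was brought into the cache

    def _image(self):
        if not os.path.exists(self._cached):
            # First access: bring the whole image into the dedicated cache.
            shutil.copyfile(self._stored, self._cached)
            self.fetches += 1
        return self._cached

    def read(self, relpath):
        """Return the content of one small file stored inside the image."""
        with tarfile.open(self._image(), "r") as tar:
            member = tar.extractfile(relpath)
            return member.read()
```

With this layout the backing store only ever handles one large object, and every access after the first is served from the cached image. A usage run would pack a directory with `pack_tree`, create an `ImageFileset` on the resulting file, and call `read("sub/b.txt")` repeatedly while `fetches` stays at 1.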
Future evolution of GANESHA
• Other FSALs will be provided:
  • FSAL for ext3fs and ReiserFS
  • FSAL for LUSTRE
  • FSAL implementing a user-space NFSv4 client, to build an NFSv4 proxy (server on one side, client on the other)
• Other GSSAPI qualities of protection will be supported as soon as they are supported by Linux.