Federated DAFS: Scalable Cluster-based Direct Access File Servers
Murali Rangarajan, Suresh Gopalakrishnan, Ashok Arumugam, Rabita Sarker (Rutgers University)
Liviu Iftode (University of Maryland)
Network File Servers
[Diagram: clients access an NFS file server over TCP/IP]
• OS involvement increases latency & overhead
• TCP/UDP protocol processing
• Memory-to-memory copying
User-level Memory Mapped Communication
[Diagram: applications on two hosts send and receive directly through their NICs, bypassing the OS on the data path]
• Application has direct access to the network interface
• OS involved only in connection setup to ensure protection
• Performance benefits: zero-copy, low overhead
Virtual Interface Architecture
• Data transfer from user space
• Setup & memory registration through the kernel
• Communication models
• Send/Receive: a pair of descriptor queues
• Remote DMA: receive operation not required
[Diagram: the application posts to send, receive, and completion queues through the VI Provider Library; the Kernel Agent handles setup and memory registration; the VI NIC performs the transfers]
Direct Access File System Model
[Diagram: the DAFS client exposes a file access API over application buffers through user-level VIPL; the DAFS server accesses its buffers through KVIPL; both sides talk to the VI NIC driver and NIC directly]
Goal: High-performance DAFS Server • Cluster-based DAFS Server • Direct access to network-attached storage distributed across server cluster • Clusters of commodity computers - Good performance at low cost • User-level communication for server clustering • Low-overhead mechanism • Lightweight protocol for file access across cluster
Outline • Portable DAFS client and server implementation • Clustering DAFS servers – Federated DAFS • Performance Evaluation
User-space DAFS Implementation
• DAFS client and server in user space
• DAFS API primitives translate to RPCs on the server
• Staged event-driven architecture
• Portable across Linux, FreeBSD and Solaris
[Diagram: applications link the DAFS client library; requests and responses travel over the VI network to DAFS servers backed by the local file system]
[Diagram: clients send connection requests to the server's Connection Manager; accepted connections are serviced by a pool of protocol threads that handle DAFS API requests and responses]
Client-Server Communication
• VI channel established at client initialization
• VIA Send/Receive used except for dafs_read
• Zero-copy data transfers
• Emulation of RDMA Read used for dafs_read
• Scatter/gather I/O used in dafs_write
[Diagram: dafs_write sends the request and the application buffer to the server in one gather send; dafs_read sends a request and the server places the data directly into the client buffer]
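As a rough illustration of the request/response framing this implies, the sketch below shows hypothetical C message structures. The field names and sizes are assumptions for illustration only, not the actual DAFS wire format.

```c
/* Hypothetical request/response framing for the user-space DAFS RPCs.
 * Field names and sizes are illustrative assumptions, not the real
 * DAFS wire format. */
#include <stdint.h>

enum dafs_op { DAFS_OPEN, DAFS_READ, DAFS_WRITE, DAFS_CLOSE, DAFS_UNLINK };

struct dafs_request {
    uint32_t op;          /* one of enum dafs_op */
    uint32_t fd;          /* server-side handle for an open file */
    uint64_t offset;      /* file offset for read/write */
    uint32_t length;      /* bytes requested or supplied */
    uint64_t client_buf;  /* registered client buffer address, so the
                             server can place read data directly into it
                             (the RDMA Read emulation used for dafs_read) */
    char     path[256];   /* pathname for open/unlink */
};

struct dafs_response {
    uint32_t status;      /* 0 on success, error code otherwise */
    uint32_t length;      /* bytes actually transferred */
    /* For dafs_write the data travels with the request via a
       scatter/gather send; for dafs_read the server writes data
       straight into client_buf, so no bulk payload follows here. */
};
```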
Asynchronous I/O Implementation • Applications use I/O descriptors to submit asynchronous read/write requests • Read/write call returns immediately to application • Result stored in I/O descriptor on completion • Applications need to use I/O descriptors to wait/poll for completion
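A minimal sketch of how such an I/O-descriptor interface might look from the application side. The names (dafs_iodesc, dafs_aio_read, dafs_aio_wait) are hypothetical; for brevity the stand-in completes the request synchronously with pread(), whereas the real client library posts the request over VIA and fills in the descriptor when the server's response arrives.

```c
/* Sketch of an asynchronous I/O descriptor interface.  Names are
 * hypothetical; the synchronous pread() is only a stand-in for the
 * real asynchronous path. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

struct dafs_iodesc {
    int     done;      /* set when the operation has completed */
    int     status;    /* 0 on success, -1 on error */
    ssize_t nbytes;    /* bytes transferred */
};

/* Submit a read; the call returns to the application immediately. */
static int dafs_aio_read(int fd, void *buf, size_t len, off_t off,
                         struct dafs_iodesc *iod)
{
    iod->nbytes = pread(fd, buf, len, off);   /* stand-in for the async path */
    iod->status = (iod->nbytes < 0) ? -1 : 0;
    iod->done   = 1;
    return 0;
}

/* Wait/poll on the descriptor for completion. */
static int dafs_aio_wait(struct dafs_iodesc *iod)
{
    while (!iod->done)
        ;                                     /* a real client would block or poll */
    return iod->status;
}

int main(void)
{
    char buf[4096];
    struct dafs_iodesc iod = { 0 };
    int fd = open("/etc/hostname", O_RDONLY);

    dafs_aio_read(fd, buf, sizeof buf, 0, &iod);
    /* ... application overlaps other work while the request is in flight ... */
    dafs_aio_wait(&iod);
    printf("read %zd bytes\n", iod.nbytes);
    close(fd);
    return 0;
}
```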
Benefits of Clustering
[Diagram: three configurations compared - a single DAFS server, standalone DAFS servers on a cluster serving clients independently, and clustered DAFS servers in which a clustering layer sits between each DAFS server and its local file system]
Clustering DAFS Servers Using FedFS • Federated File System (FedFS) • Federation of local file systems on cluster nodes • Extend the benefits of DAFS to cluster-based servers • Low overhead protocol over SAN
FedFS Goals • Global name space across the cluster • Created dynamically for each distributed application • Load balancing • Dynamic Reconfiguration
Virtual Directory (VD)
• Union of all local directories with the same pathname
• Each VD is mapped to a manager node
• Determined using a hash function on the pathname
• Manager constructs and maintains the VD
[Diagram: the local /usr directories on each node, containing file1 and file2, merge into one virtual directory /usr]
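A minimal sketch of mapping a pathname to its manager node by hashing, as described above. The hash choice (FNV-1a) and the node-count parameter are assumptions for illustration; FedFS does not necessarily use this particular function.

```c
/* Map a pathname to its manager node with a hash on the pathname.
 * FNV-1a is an arbitrary illustrative choice. */
#include <stdint.h>
#include <stdio.h>

static unsigned manager_node(const char *path, unsigned nnodes)
{
    uint64_t h = 1469598103934665603ULL;      /* FNV-1a offset basis */
    for (; *path; path++) {
        h ^= (unsigned char)*path;
        h *= 1099511628211ULL;                /* FNV-1a prime */
    }
    return (unsigned)(h % nnodes);            /* node id in [0, nnodes) */
}

int main(void)
{
    /* Every node computes the same manager for the same pathname,
       so no central lookup service is needed. */
    printf("/usr       -> node %u\n", manager_node("/usr", 8));
    printf("/usr/file1 -> node %u\n", manager_node("/usr/file1", 8));
    return 0;
}
```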
Constructing a VD • Constructed on first access to directory • Manager performs dirmerge to merge real directory info on cluster nodes into a VD • Summary of real directory info is generated and exchanged at initialization • Cached in memory and updated on directory modifying operations
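To make the dirmerge step concrete, here is a small sketch that merges per-node summaries of a real directory into one virtual-directory listing. Representing each node's summary as a sorted array of entry names is an assumption made only for this illustration, not the FedFS summary format.

```c
/* Sketch of dirmerge: combine per-node summaries of a real directory
 * into one virtual-directory listing. */
#include <stdio.h>
#include <string.h>

/* Merge two sorted name lists, dropping duplicates (the same entry
 * reported by two nodes appears once in the VD). */
static void dirmerge(const char **a, int na, const char **b, int nb)
{
    int i = 0, j = 0;
    while (i < na || j < nb) {
        int cmp = (i == na) ?  1 :
                  (j == nb) ? -1 : strcmp(a[i], b[j]);
        if (cmp < 0)       printf("%s\n", a[i++]);
        else if (cmp > 0)  printf("%s\n", b[j++]);
        else             { printf("%s\n", a[i++]); j++; }
    }
}

int main(void)
{
    /* Summaries of /usr as seen by two cluster nodes. */
    const char *node0[] = { "file1", "file3" };
    const char *node1[] = { "file1", "file2" };
    dirmerge(node0, 2, node1, 2);   /* VD(/usr) = file1 file2 file3 */
    return 0;
}
```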
File Access in FedFS
• Each file mapped to a manager
• Determined using a hash on the pathname
• Manager maintains information about the file
• Request the manager for the location (home) of the file
• Access the file from its home
[Diagram: the receiving DAFS server queries manager(f1) over the VI network, then fetches f1 from home(f1)]
Optimizing File Access • Directory Table (DT) to cache file information • File information cached after first lookup • Cache of name space distributed across cluster • Block level in-memory data cache • Data blocks cached on first access • LRU Replacement
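A minimal sketch of the Directory Table lookup described above: cache a file's home node after the first manager query so later accesses skip the round trip. The fixed-size table and the query_manager() helper are hypothetical, introduced only for illustration.

```c
/* Sketch of the Directory Table (DT): cache a file's home node after
 * the first lookup so subsequent accesses avoid the manager round trip. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DT_SLOTS 1024

struct dt_entry {
    char *path;      /* NULL means the slot is empty */
    int   home;      /* node that stores the file */
};

static struct dt_entry dt[DT_SLOTS];

/* Stand-in: in FedFS this is a request/response exchange with the
 * file's manager node; here it just returns a fixed answer. */
static int query_manager(const char *path) { (void)path; return 0; }

static unsigned dt_hash(const char *p)
{
    unsigned h = 5381;
    while (*p) h = h * 33 + (unsigned char)*p++;
    return h % DT_SLOTS;
}

static int lookup_home(const char *path)
{
    struct dt_entry *e = &dt[dt_hash(path)];
    if (e->path && strcmp(e->path, path) == 0)
        return e->home;                 /* hit: no communication needed */

    int home = query_manager(path);     /* miss: one round trip */
    free(e->path);
    e->path = strdup(path);
    e->home = home;
    return home;
}

int main(void)
{
    /* First lookup asks the manager; the second is served from the DT. */
    printf("home(/usr/file1) = %d\n", lookup_home("/usr/file1"));
    printf("home(/usr/file1) = %d\n", lookup_home("/usr/file1"));
    return 0;
}
```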
Communication in FedFS
• Two VI channels between any pair of server nodes
• Send/Receive for request/response
• RDMA exclusively for data transfer
• Descriptors and buffers registered at initialization
[Diagram: one channel carries Send/Receive request/response messages; the other carries RDMA transfers of response data into a pre-registered buffer]
Performance Evaluation
[Diagram: experimental setup - client applications with the DAFS client library connect over the VI network to DAFS servers running FedFS over the local file system]
Experimental Platform
• Eight-node server cluster
• 800 MHz PIII, 512 MB SDRAM, 9 GB 10K RPM SCSI
• Clients
• Dual processor (300 MHz PII), 512 MB SDRAM
• Linux 2.4
• Servers and clients equipped with Emulex cLAN adapters
• 32-port Emulex switch in full-bandwidth configuration
SAN Performance Characteristics
• VIA latency and bandwidth
• poll used for the latency measurement, wait for the bandwidth measurement
Workloads • Postmark – Synthetic benchmark • Short-lived small files • Mix of metadata-intensive operations • Benchmark outline • Create a pool of files • Perform transactions – READ/WRITE paired with CREATE/DELETE • Delete created files
Workload Details
• Each client performs 30,000 transactions
• Each transaction: a READ paired with a CREATE or DELETE
• READ = open, read, close
• CREATE = open, write, close
• DELETE = unlink
• Multiple clients used to reach maximum throughput
• Clients distribute requests to servers using a hash function on pathnames
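A minimal sketch of one client transaction as described above, with plain POSIX calls standing in for the DAFS client API. The file names, 4 KB size, and strict alternation between CREATE and DELETE are illustrative assumptions.

```c
/* Sketch of one Postmark-style transaction: a READ paired with either
 * a CREATE or a DELETE.  POSIX calls stand in for the DAFS client API. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void do_read(const char *path)
{
    char buf[4096];
    int fd = open(path, O_RDONLY);                 /* READ = open, read, close */
    if (fd >= 0) { read(fd, buf, sizeof buf); close(fd); }
}

static void do_create(const char *path)
{
    char buf[4096] = { 0 };
    int fd = open(path, O_CREAT | O_WRONLY, 0644); /* CREATE = open, write, close */
    if (fd >= 0) { write(fd, buf, sizeof buf); close(fd); }
}

int main(void)
{
    char existing[64], fresh[64];
    for (int i = 0; i < 30000; i++) {
        snprintf(existing, sizeof existing, "pool/file%d", rand() % 1000);
        do_read(existing);                          /* every transaction reads */

        if (i % 2 == 0) {
            snprintf(fresh, sizeof fresh, "pool/new%d", i);
            do_create(fresh);                       /* ...paired with a CREATE */
        } else {
            snprintf(fresh, sizeof fresh, "pool/new%d", i - 1);
            unlink(fresh);                          /* ...or a DELETE of the last created file */
        }
    }
    return 0;
}
```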
Base Case (Single Server)
• Maximum throughput
• 5075 transactions/second
• Average time per transaction
• At the client ~ 200 µs
• On the server ~ 100 µs
FedFS Overheads • Files are physically placed on the node which receives client requests • Only metadata operations may involve communication • first open(file) • delete(file) • Observed communication overhead • Average of one roundtrip message among servers per transaction
Other Workloads
• Client requests are not sent to the node where the file resides
• All files created outside Federated DAFS
• Only READ operations (open, read, close)
• Potential increase in communication overhead
• Optimized coherence protocol minimizes communication
• Avoids communication at open and close in the common case
• Data caching reduces the frequency of communication for remote data access
Postmark Read Throughput
• Each transaction = READ
[Chart: read throughput results]
Communication Overhead Without Caching
• Without caching, each read results in a remote fetch
• Each remote fetch costs ~65 µs
• Request message (< 256 B) + response message (4096 B)
Work in Progress • Study other application workloads • Optimized coherence protocols to minimize communication in Federated DAFS • File migration • Alleviate performance degradation from communication overheads • Balance load • Dynamic reconfiguration of cluster • Study DAFS over a Wide Area Network
Conclusions • Efficient user-level DAFS implementation • Low overhead user-level communication used to provide lightweight clustering protocol (FedFS) • Federated DAFS minimizes overheads by reducing communication among server nodes in the cluster • Speedups of 3 on 4-node and 5 on 8-node clusters demonstrated using Federated DAFS
Thanks Distributed Computing Laboratory http://discolab.rutgers.edu