Distributed File System: Data Storage for Networks Large and Small
Pei Cao, Cisco Systems, Inc.
Review: DFS Design Considerations • Name space construction • AAA • Operator batching • Client caching • Data consistency • Locking
Summing it Up: CIFS as an Example • Network transport in CIFS • Uses SMB (Server Message Block) messages over a reliable, connection-oriented transport • TCP directly (port 445) • NetBIOS over TCP (port 139) • Uses persistent connections called "sessions" • If a session breaks, the client performs the recovery (see the framing sketch below)
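A minimal sketch of the message framing, assuming direct SMB over TCP (port 445): each SMB message is prefixed with a 4-byte header (a zero byte plus a 24-bit big-endian length) so message boundaries survive TCP's byte-stream semantics. The helper names are illustrative, not a real client API.

```python
import socket
import struct

def send_smb_message(sock: socket.socket, smb_payload: bytes) -> None:
    """Frame one SMB message for direct-TCP transport (port 445)."""
    assert len(smb_payload) < (1 << 24)           # length must fit in 24 bits
    header = struct.pack(">I", len(smb_payload))  # high byte is always zero
    sock.sendall(header + smb_payload)

def recv_smb_message(sock: socket.socket) -> bytes:
    """Read exactly one framed SMB message from the session stream."""
    header = _recv_exact(sock, 4)
    (length,) = struct.unpack(">I", header)
    return _recv_exact(sock, length & 0x00FFFFFF)

def _recv_exact(sock: socket.socket, n: int) -> bytes:
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            # Broken session: per CIFS, recovery is the client's job.
            raise ConnectionError("session broken; client must reconnect")
        buf += chunk
    return buf
```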
Design Choices in CIFS • Name space construction: • Per-client linkage, with multiple methods for server resolution • file://fs.xyz.com/users/alice/stuff.doc • \\cifsserver\users\alice\stuff.doc • E:\stuff.doc • CIFS also offers a "redirection" method • A share can be replicated on multiple servers or moved • Client open → server replies "STATUS_DFS_PATH_NOT_COVERED" → client issues "TRANS2_DFS_GET_REFERRAL" → server replies with a referral to the new server (sketched below)
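The referral flow above amounts to a small client-side retry loop. A hedged sketch follows; `smb_open`, `trans2_dfs_get_referral`, and `connect` are illustrative stand-ins, not a real SMB library API.

```python
from dataclasses import dataclass

@dataclass
class Referral:                       # what TRANS2_DFS_GET_REFERRAL returns
    target_server: str
    rewritten_path: str

def open_following_referrals(conn, path, connect, max_hops=8):
    """Follow DFS referrals until some server actually covers the path."""
    for _ in range(max_hops):
        status, handle = conn.smb_open(path)
        if status != "STATUS_DFS_PATH_NOT_COVERED":
            return handle             # opened (or failed with a real error)
        ref = conn.trans2_dfs_get_referral(path)
        conn = connect(ref.target_server)   # reconnect to the referred server
        path = ref.rewritten_path
    raise RuntimeError("too many DFS referral hops")
```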
Design Choices in CIFS • AAA: Kerberos • Older systems use NTLM • Operator batching: supported • These commands have "AndX" variations: TREE_CONNECT, OPEN, CREATE, READ, WRITE, LOCK • The server implicitly takes the results of preceding operations as input to subsequent operations • The first command that encounters an error stops all subsequent processing in the batch (see the sketch below)
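A sketch of the server-side batch semantics: one command's result (e.g., the file handle from OPEN_ANDX) implicitly feeds the next (e.g., READ_ANDX), and the first error aborts the rest of the chain. `session.execute` is a hypothetical helper, not a real API.

```python
def run_andx_chain(session, commands):
    """Process an AndX batch in order, carrying results forward."""
    results, carried_fid = [], None
    for cmd in commands:
        # The handle produced by the previous command is the implicit
        # input to this one (the "AndX" chaining rule).
        status, output, carried_fid = session.execute(cmd, implicit_fid=carried_fid)
        results.append((cmd, status, output))
        if status != "STATUS_OK":
            break                     # remaining commands are not processed
    return results
```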
Design Choices in CIFS • Client caching • Caches both file data and file metadata; write-back cache; can read ahead • Offers strong cache consistency using an invalidation-based approach • Data access consistency • Oplocks: similar to "tokens" in AFS v3 • "Level II oplock": read-only data lock • "Exclusive oplock": exclusive read/write data lock • "Batch oplock": exclusive read/write "open" lock plus data lock plus metadata lock • Clients transition among the oplock levels as sharing changes (see the sketch below) • Observation: can have a hierarchy of lock managers
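One way to picture the client's side of an oplock transition, as a sketch: on a break from an exclusive level, write-back data must be flushed before acknowledging; on a break to no oplock, cached data may be stale and is discarded. The cache structure and `flush` callback are illustrative, not CIFS wire format.

```python
class OplockCache:
    """Illustrative client cache reacting to oplock break notifications."""

    def __init__(self, level="none"):
        self.level = level          # "batch", "exclusive", "level2", "none"
        self.dirty = {}             # offset -> bytes (write-back cache)
        self.clean = {}             # offset -> bytes (read cache)

    def on_oplock_break(self, new_level, flush):
        # Losing write permission: push write-back data to the server first.
        if self.level in ("batch", "exclusive") and new_level in ("level2", "none"):
            for off, data in sorted(self.dirty.items()):
                flush(off, data)
            self.dirty.clear()
        # Losing read permission too: cached contents may now be stale.
        if new_level == "none":
            self.clean.clear()
        self.level = new_level      # acknowledge the break at the new level
```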
Design Choices in CIFS • File and data record locking • Offers "shared" (read-only) and "exclusive" (read/write) locks • Part of the file system; mandatory, not advisory • Can lock either a whole file or a byte range within the file • A lock request can specify a timeout for waiting • Enables atomic writes via "AndX" batching: "lock/write/unlock" as a batched command sequence (sketched below) • Additional capability: "directory change notification"
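A sketch of the lock/write/unlock batch, assuming a hypothetical `session.send_batch` API. Because the first error stops an AndX chain, a lock timeout means the write never executes, which is what makes the sequence atomic.

```python
def atomic_write(session, fid, offset, data, timeout_ms=5000):
    """Atomic region update as one batched command sequence:
    lock the byte range, write it, unlock it."""
    batch = [
        ("LOCKING_ANDX", dict(fid=fid, offset=offset, length=len(data),
                              exclusive=True, timeout_ms=timeout_ms)),
        ("WRITE_ANDX",   dict(fid=fid, offset=offset, data=data)),
        ("LOCKING_ANDX", dict(fid=fid, offset=offset, length=len(data),
                              unlock=True)),
    ]
    # If the lock times out, the chain stops before WRITE_ANDX runs.
    return session.send_batch(batch)
```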
DFS for Mobile Networks • What properties of a DFS are desirable: • Handles frequent connection and disconnection • Enables clients to operate in a disconnected state for an extended period of time • Provides ways to resolve/merge conflicts
Design Issues for DFS in Mobile Networks • What should be kept in the client cache? • How to update the client's cached copies with changes made on the server? • How to upload changes made by the client to the server? • How to resolve conflicts when more than one client changes a file while disconnected?
Example System: Coda • Client cache content: • The user can specify which directories should always be cached on the client • Also caches recently used files • Cache replacement: walk over the cached items every 10 minutes to reevaluate their priorities (sketched below) • Updates from server to client: • The server keeps a log of callbacks that couldn't be delivered and delivers them when the client reconnects
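A rough sketch of such a periodic "hoard walk": each cached object's priority mixes the user's hoard priority with recency of use, and the lowest-priority objects are evicted when the cache is over quota. The formula, weights, and cache interface here are illustrative, not Coda's actual implementation.

```python
import time

HOARD_WALK_INTERVAL = 600           # Coda reevaluates roughly every 10 minutes

def hoard_walk(cache, hoard_db, now=None):
    """Recompute priorities, then evict until the cache fits its quota."""
    now = now or time.time()
    for obj in cache.objects():
        hoard_p = hoard_db.get(obj.path, 0)            # user-assigned priority
        recency = 1.0 / (1.0 + (now - obj.last_used))  # decays with idle time
        obj.priority = 0.75 * hoard_p + 0.25 * recency # illustrative weights
    while cache.bytes_used() > cache.quota:
        cache.evict(min(cache.objects(), key=lambda o: o.priority))
```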
Coda File System • Uploading changes from client to server • The client has to keep a "replay log" • Contents of the "replay log" • Ways to reduce the "replay log" size (see the sketch below) • Handling conflicts • Detecting conflicts • Resolving conflicts
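Two classic log reductions can be sketched as follows (the entry format is illustrative): repeated stores of the same file keep only the last one, and a file created and then removed while disconnected drops out of the log entirely, since the server never needs to see it.

```python
def optimize_replay_log(log):
    """Shrink a Coda-style replay log before reintegration.
    Each entry is (op, path, payload), with op in CREATE/STORE/REMOVE/..."""
    out = []
    for entry in log:
        op, path, _ = entry
        if op == "STORE":
            # A newer STORE supersedes any earlier STORE of the same file.
            out = [e for e in out if not (e[0] == "STORE" and e[1] == path)]
        if op == "REMOVE":
            created_here = any(e[0] == "CREATE" and e[1] == path for e in out)
            # Drop all pending operations on the removed file.
            out = [e for e in out if e[1] != path]
            if created_here:
                continue            # file never existed on the server
        out.append(entry)
    return out
```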
Performance Issues in File Servers • Components of server load • Network protocol handling • File system implementation • Disk accesses • Read operations • Metadata • Data • Write operations • Metadata • Data • Workload characterization
DFS for High-Speed Networks: DAFS • Proposal from Network Appliance and other companies • Goal: eliminate memory copies and protocol-processing overhead • Standard implementation: network buffers → file system buffer cache → user-level application buffers • Designed to take advantage of RDMA (Remote Direct Memory Access) network protocols • The network transport provides direct memory-to-memory transfer • Protocol processing is provided in hardware • Suitable for high-bandwidth, low-error-rate, low-latency networks
DAFS Protocol • Data read by the client: • The server issues an RDMA transfer that copies file data directly into the application buffer • Data write by the client: • The server issues an RDMA transfer that copies the application buffer into server memory • Implementation: • As a library linked into the user application, interfacing with the RDMA network library directly • Eliminates two data copies • As a new file system implementation in the kernel • Eliminates one data copy • Performance advantage: • Example: 90 usec/op in NFS vs. 25 usec/op in DAFS (see the sketch below)
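A sketch of the client side of a direct read, assuming a hypothetical user-level RDMA library; `session.rdma`, `send_request`, and the descriptor interface are illustrative, not the real DAFS API. The point is the data path: the server's NIC writes file data straight into the application's buffer, with no kernel buffer cache or socket copy on the client.

```python
def dafs_read(session, file_handle, offset, length, app_buffer):
    """Client-side direct read in the DAFS style (illustrative API)."""
    region = session.rdma.register_memory(app_buffer)      # pin + map for DMA
    try:
        # The request carries the buffer's RDMA descriptor; the server
        # RDMA-writes file data directly into app_buffer.
        reply = session.send_request("DAFS_READ",
                                     handle=file_handle,
                                     offset=offset,
                                     length=length,
                                     remote_region=region.descriptor())
        return reply.bytes_transferred
    finally:
        session.rdma.deregister_memory(region)             # unpin when done
```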
DAFS Features • Session-based • Offers authentication of client machines • Flow control by the server • Stateful lock implementation with leases • Offers atomic writes • Offers operator batching
Clustered File Servers • Goal: scalability in file service • Build a high-performance file service from a collection of cheap file servers • Methods for partitioning the workload (dispatch sketched below): • Each server can support one "subtree" • Advantages • Disadvantages • Each server can support a group of clients • Advantages • Disadvantages • Client requests are sent to servers in a round-robin or load-balanced fashion • Advantages • Disadvantages
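Two of the dispatch policies can be sketched in a few lines; the server names and hashing details are illustrative. Hash partitioning spreads load evenly but scatters a directory's files across servers; round-robin is simplest but lets the same file hit different servers, which forces the servers to keep their caches consistent.

```python
import hashlib

SERVERS = ["fs0", "fs1", "fs2", "fs3"]      # illustrative cluster members

def server_for(path: str) -> str:
    """Hash-partitioned dispatch: every client deterministically maps a
    path to one server, so no directory of assignments is needed."""
    digest = hashlib.sha1(path.encode()).digest()
    return SERVERS[int.from_bytes(digest[:4], "big") % len(SERVERS)]

def round_robin(counter=[0]):
    """Round-robin dispatch (the mutable default keeps state across calls,
    standing in for a load balancer's counter)."""
    counter[0] = (counter[0] + 1) % len(SERVERS)
    return SERVERS[counter[0]]
```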
Non-Subtree-Partition Clustered File Servers • Design issues • On which disks should the data be stored? • Management of memory cache in file servers • Data consistency management • Metadata operation consistency • Data operation consistency • Server failure management • Single server failure fault tolerance • Disk failure fault tolerance
Mapping Between Disks and Servers • Direct-attached disks • Network-attached disks • Fibre Channel-attached disks • iSCSI-attached disks • Managing the network-attached disks: the "volume manager"
Functionalities of a Volume Manager • Groups multiple disk partitions into a "logical" disk volume • A volume can expand or shrink in size without affecting existing data • A volume can be RAID-0/1/5 (RAID-1/5 tolerate disk failures) • A volume can offer "snapshot" functionality for easy backup • Volumes are "self-evident": their configuration is recorded on the disks themselves • (Block mapping sketched below)
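As an example of the logical-to-physical translation a volume manager maintains, here is a RAID-0 style striping function. The stripe size and layout are illustrative, not any particular volume manager's format.

```python
def map_block(logical_block: int, disks: int, stripe_blocks: int = 16):
    """Map a logical volume block to (disk, physical block) under RAID-0
    striping: blocks are grouped into stripe units of `stripe_blocks`,
    and consecutive units rotate round-robin across the member disks."""
    stripe = logical_block // stripe_blocks      # which stripe unit
    within = logical_block % stripe_blocks       # offset inside the unit
    disk   = stripe % disks                      # rotate across disks
    pblock = (stripe // disks) * stripe_blocks + within
    return disk, pblock

# e.g., with 3 disks and 16-block stripe units, logical block 40 is in
# stripe unit 2, so it lands on disk 2 at physical block 8:
assert map_block(40, disks=3) == (2, 8)
```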
Implementations of Volume Manager • In-kernel implementation • Example: Linux volume manager, Veritas volume manager, etc. • Disk server implementation • Example: EMC storage systems
Serverless File Systems • Serverless file systems in the WAN • Motivation: peer-to-peer storage; never lose a file • Serverless file systems in the LAN • Motivation: clients are powerful enough to act like servers; use all clients' memory to cache file data