FARSITE: Federated, Available and Reliable Storage for an Incompletely Trusted Environment

This Microsoft Research paper discusses a distributed file system that operates without a central server, using client machines to store and maintain files and directories. The system ensures availability and security through encryption, replication, and Byzantine-fault tolerance.

ccaraballo

Presentation Transcript


  1. FARSITE: Federated, Available and Reliable Storage for an Incompletely Trusted Environment A. Adya, W. J. Bolosky, M. Castro, G. Cermak, R. Chaiken, J. R. Douceur, J. Howell, J. R. Lorch, M. Theimer, R. P. Wattenhofer Microsoft Research

  2. Paper highlights • Paper discusses a distributed file system lacking a central server • Files and directories reside on client machines • Files are encrypted and replicated • Directory metadata are maintained by Byzantine-replicated finite state machines

  3. Serverless file systems • Idea is not new • xFS (Anderson et al. SOSP 1995) • Objective is to utilize free disk space and processing power of client machines • Two major issues are • Availability of files • Security

  4. Design assumptions (I) • Farsite is intended to run on the desktops of a large corporation or a university: • Maximum scale of ~10^5 machines • Interconnected by a high-bandwidth low-latency network • Most machines up most of the time • Uncorrelated machine failures

  5. Design assumptions (II) • No files are both • Read by many users and • Frequently updated by at least one user (very infrequent in Windows NT file system) • Small but significant fraction of users will maliciously attempt to destroy or corrupt file data and metadata

  6. Design assumptions (III) • Large fraction of users may independently attempt unauthorized accesses • Each machine is under the control of its immediate user • Cannot be subverted by other people • No user sensitive data persist after logout or system reboot • Not true for any commodity OS

  7. Enabling technology trends (I) • General increase in unused disk capacity, measured on 4800 desktops at Microsoft Research • Unused disk space in three successive yearly measurements: 49%, 50%, 58%

  8. Enabling technology trends (II) • Lowered cost of cryptographic operations: • Can now encrypt data at 72MB/s • Faster than disk sequential I/O bandwidth (32MB/s)

  9. Namespace roots • Farsite provides hierarchical directory namespaces • Each namespace has its own root • Each root has a unique root name • Each root is managed by a designated set of machines forming a Byzantine-fault-tolerant group • No need for a protected set of machines

  10. Trust and certification (I) • Basic Requirements • Users must trust the machines that offer to present data or metadata • Machines must trust the validity of requests from remote users • System security must trust that machines that claim to be distinct are truly distinct • To prevent Sybil attacks

  11. Sybil attacks • (Douceur 2002) • Possible whenever redundancy is used to increase security • Single rogue entity can • Pretend to be many and • End up controlling a large part of the system • Cannot prevent them without a logically centralized authority certifying identities

  12. Trust and certification (II) • Farsite manages trust through public-key cryptographic certificates • Namespace certificates • User certificates • Machine certificates

  13. Trust and certification (III) • Bootstrapped by fiat: • Machines told to accept certificates that can be authenticated with some public keys • Holders of the associated private keys are called Certification Authorities (CAs) • Certificates created either by CAs themselves or by users authorized to create certificates

  14. Trust and certification (IV) • User private keys are • Encrypted with a symmetric key derived from user password • Stored in a globally-readable directory in Farsite • Does not require users to modify their behavior • User or machine keys can be revoked
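
The password-derived key on slide 14 can be illustrated with a standard key-derivation function. This is a minimal sketch, assuming PBKDF2-HMAC-SHA256 from the Python standard library; the slides do not say which derivation Farsite uses, and the function name, salt size, and iteration count below are illustrative choices only.

```python
import hashlib
import os


def derive_wrapping_key(password: str, salt: bytes, iterations: int = 200_000) -> bytes:
    """Derive a 32-byte symmetric key from a user password.

    Assumption: PBKDF2-HMAC-SHA256 stands in for whatever derivation
    Farsite actually uses; the slides do not specify it.
    """
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


# The salt is not secret; it could be stored next to the encrypted private key.
salt = os.urandom(16)
wrapping_key = derive_wrapping_key("example password", salt)
```

Because the key depends only on the password and the stored salt, the encrypted private key can sit in a globally readable directory and still be unlocked from any machine the user logs in to.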

  15. Handling malicious behaviors • Most fault-tolerant file systems do not protect users’ files against malicious behaviors of hosts • They assume that a host will either behave correctly or crash • Malicious behaviors are often called Byzantine failures • One or more hosts act as if they were controlled by very clever traitors

  16. System architecture (I) • Each Farsite client will deal with two different sets of hosts • A set of machines constituting a directory group • A set of machines acting as file hosts • In practice, all three roles (client, directory group member, file host) are shared by all machines

  17. System architecture (II) • [Figure: one client, a directory group of four members, and three file hosts] • Client sees one directory group

  18. The directory group (I) • Replicates directories on directory members • Directory integrity enforced through a Byzantine-fault-tolerant protocol • Works as long as fewer than one-third of the hosts misbehave in any manner (“traitors”) • Requires a minimum of four hosts to tolerate one misbehaving host
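
The "fewer than one-third" rule on slide 18 is the usual 3f + 1 bound for Byzantine agreement. A small sketch of the arithmetic follows; the function names are illustrative and not taken from Farsite.

```python
def min_group_size(traitors: int) -> int:
    """Smallest group that tolerates the given number of misbehaving hosts."""
    return 3 * traitors + 1


def max_traitors(group_size: int) -> int:
    """Largest number of misbehaving hosts a group of this size tolerates."""
    return (group_size - 1) // 3
```

With one traitor the minimum is four hosts, matching the slide; growing a group from four to six members does not raise the number of tolerated traitors, which only increases again at seven.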

  19. The directory group (II) • Decisions for all operations that are not determined by the client request are made through a cryptographically secure distributed random number generator • Issues leases on files to clients • Promise not to allow any incompatible access to the file during the duration of the lease without notifying the client

  20. The directory group (III) • Directory groups can split: • Randomly select a group of machines they know • Tell them to form a new directory group • Delegate a portion of their namespace to new group • User and directory group authenticate each other

  21. The file hosts (I) • Farsite stores encrypted replicas of each file to ensure file integrity and file availability • Continuously monitors host availability and relocates replicas whenever necessary • Does not allow all replicas of a given file to reside on hosts owned by the same user • Files that were recently accessed by a client are cached locally (for “roughly one week”)

  22. The file hosts (II) • Farsite does not use voting: • Correct replicas are identified by the directory host • Farsite does not update all replicas of a file at once: • Would be too slow • Uses a background update mechanism instead

  23. Semantic differences • Unlike NTFS, Farsite • Puts a limit on the number of clients that can have a file open for write • Allows a directory to be renamed even if there is an open handle on a file in the directory or any of its descendants • Uses background (“lazy”) propagation of directory updates

  24. Reliability and availability (I) • Through redundancy • Metadata stored in a directory group of RD members remain accessible if no more than (RD - 1) / 3 members fail • Data replicated on RF file hosts remain accessible as long as one of these hosts remains alive
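
The two rules on slide 24 can be turned into a rough availability estimate. The sketch below assumes each machine is unavailable independently with the same probability `p_down` (in line with the "uncorrelated machine failures" design assumption) and treats every failure as a simple outage; the function names and the sample numbers are illustrative, not figures from the paper.

```python
from math import comb


def file_unavailability(p_down: float, r_f: int) -> float:
    """Probability that all R_F replicas of a file are down at once."""
    return p_down ** r_f


def directory_unavailability(p_down: float, r_d: int) -> float:
    """Probability that more than floor((R_D - 1) / 3) group members are down."""
    tolerated = (r_d - 1) // 3
    return sum(
        comb(r_d, k) * p_down ** k * (1 - p_down) ** (r_d - k)
        for k in range(tolerated + 1, r_d + 1)
    )
```

Under these assumptions a file with three replicas on machines that are each down 10% of the time is unreachable about 0.1% of the time, while a four-member directory group is unavailable about 5.2% of the time. That gap is consistent with the more aggressive directory migration mentioned on the next slide.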

  25. Reliability and availability (II) • Farsite migrates duties of machines that have been unavailable for a long period of time to new machines (regeneration) • More aggressive approach to directory migration than to file-host migration • Farsite continuously monitors host availability and relocates replicas whenever necessary • Clients cache files for a week after last access

  26. Security (I) • Write access control enforced through Access Control Lists managed by directory group • Requires Byzantine agreement • Read access control achieved through strong cryptography • File is encrypted with symmetric file key • File key is encrypted with public keys of all authorized users

  27. Security (II) • Same technique is applied to directory names • Members of directory group cannot read them • To ensure file integrity, Farsite stores a copy of a Merkle hash tree over the file data blocks in the directory group that manages the file’s metadata

  28. What is a Merkle hash tree? (I) • Consider a file made up of four blocks: A, B, C and D • We successively compute: • a = leaf_hash(A), …, d = leaf_hash(D) • p = inner_hash(a, b), q = inner_hash(c, d) • r = inner_hash(p, q) • Recomputing r (the root hash) and comparing it with its supposed value will detect any tampering

  29. What is a Merkle hash tree? (II) • [Figure: binary tree over blocks A, B, C, D] • Leaves: a = leaf_hash(A), b = leaf_hash(B), c = leaf_hash(C), d = leaf_hash(D) • Inner nodes: p = inner_hash(a, b), q = inner_hash(c, d) • Root: r = inner_hash(p, q)
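
The four-block construction on slides 28 and 29 can be written out directly. This is a minimal sketch, assuming SHA-256 for both `leaf_hash` and `inner_hash`, with a one-byte prefix to keep leaf and inner hashes distinct; the slides do not say which hash functions Farsite uses, and promoting an unpaired node unchanged is likewise an illustrative choice.

```python
import hashlib


def leaf_hash(block: bytes) -> bytes:
    # Assumption: SHA-256 with a 0x00 prefix marks a leaf.
    return hashlib.sha256(b"\x00" + block).digest()


def inner_hash(left: bytes, right: bytes) -> bytes:
    # Assumption: SHA-256 with a 0x01 prefix marks an inner node.
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_root(blocks: list) -> bytes:
    """Return the root hash over a non-empty list of data blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    level = [leaf_hash(block) for block in blocks]
    while len(level) > 1:
        # Hash adjacent pairs; an unpaired last node moves up unchanged.
        parents = [inner_hash(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            parents.append(level[-1])
        level = parents
    return level[0]


# The slide's example: r = inner_hash(inner_hash(a, b), inner_hash(c, d)).
root = merkle_root([b"A", b"B", b"C", b"D"])
tampered = merkle_root([b"A", b"B", b"X", b"D"])
```

Changing any block changes its leaf hash and therefore every hash on the path to the root, so `tampered` differs from `root`.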

  30. Durability (I) • File creations, deletions and renames are not immediately forwarded to directory group • High cost of Byzantine protocol • First stored in a log on client • Much as in Coda disconnected mode • Log is pushed back to directory group • At fixed intervals • Whenever a lease is recalled
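
The client-side log on slide 30 can be sketched as a small buffer that is pushed either when a fixed interval has elapsed or when a lease is recalled. The class name, the `push` callback, and passing the clock in as `now` are all hypothetical choices made for this illustration; the slides do not describe the actual interface.

```python
class ClientUpdateLog:
    """Buffers metadata updates locally and pushes them in batches."""

    def __init__(self, push, interval_s: float):
        self._push = push              # hypothetical callable that sends a batch to the directory group
        self._interval_s = interval_s  # fixed push interval in seconds
        self._pending = []
        self._last_push = 0.0

    def record(self, op, now: float) -> None:
        """Log a create, delete or rename; push if the interval has elapsed."""
        self._pending.append(op)
        if now - self._last_push >= self._interval_s:
            self.flush(now)

    def on_lease_recall(self, now: float) -> None:
        """A recalled lease forces the log out immediately."""
        self.flush(now)

    def flush(self, now: float) -> None:
        if self._pending:
            self._push(list(self._pending))
            self._pending.clear()
        self._last_push = now
```

Batching this way pays the cost of the Byzantine protocol once per push rather than once per operation.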

  31. Durability (II) • When a client reboots, it needs to send its committed updates to the directory group and have them accepted as authentic • Client will generate an authenticator key which it will distribute among members of the directory group • Can use this key to sign each committed update
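
The authenticator key on slide 31 behaves like a message-authentication key. The sketch below assumes HMAC-SHA256 and a key handed out whole to whoever verifies; how Farsite actually distributes the key among directory group members is not described on the slide, and all function names are illustrative.

```python
import hashlib
import hmac
import secrets


def new_authenticator_key() -> bytes:
    """Generate a fresh random 32-byte authenticator key."""
    return secrets.token_bytes(32)


def sign_update(key: bytes, update: bytes) -> bytes:
    """Tag a committed update so it can later be accepted as authentic."""
    return hmac.new(key, update, hashlib.sha256).digest()


def verify_update(key: bytes, update: bytes, tag: bytes) -> bool:
    """Check a tag using a constant-time comparison."""
    return hmac.compare_digest(sign_update(key, update), tag)
```

After a reboot the client can resend each logged update with its tag, and a verifier holding the key can reject any update that was altered while it sat on disk.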

  32. Consistency (I) • Directory group uses a lease mechanism: • Data read/write leases • Data read-only leases • Concurrent write accesses are handled by redirecting them to a single client machine • Guarantees correctness • Not scalable
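
The two data lease types on slide 32 imply a simple compatibility rule: any number of read-only leases may coexist, but a read/write lease excludes every other holder. The sketch below only grants or refuses; a real directory group would instead recall conflicting leases or redirect writers, and the class and constant names are hypothetical.

```python
from dataclasses import dataclass, field

READ_ONLY = "read-only"
READ_WRITE = "read/write"


@dataclass
class LeaseTable:
    # Maps a file path to {client: lease mode}.
    holders: dict = field(default_factory=dict)

    def grant(self, path: str, client: str, mode: str) -> bool:
        """Grant the lease if it is compatible with leases held by other clients."""
        current = self.holders.setdefault(path, {})
        others = {c: m for c, m in current.items() if c != client}
        if mode == READ_WRITE and others:
            return False  # a writer must be the only lease holder
        if READ_WRITE in others.values():
            return False  # someone else already holds the write lease
        current[client] = mode
        return True

    def release(self, path: str, client: str) -> None:
        self.holders.get(path, {}).pop(client, None)
```

Two readers can share a file, a third client asking for a read/write lease is refused until both release, and once it holds the write lease new readers are refused in turn.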

  33. Consistency (II) • Leases have variable granularity • Single file • Entire subtree • No good way to handle read/write lease expiration on a disconnected client • The fundamental paper on leases is C. G. Gray, D. R. Cheriton: Leases: An Efficient Fault-Tolerant Mechanism for Distributed File Cache Consistency. SOSP 1989: pp. 202-210

  34. Consistency (III) • Special name leases for files and directories • A name lease on a directory allows holder to create files and subdirectories under that directory with any non-extant name • More special-purpose leases were introduced to implement Windows file sharing semantics

  35. Scalability • Ensured through • Hint-based pathname translation: Hints are data items that are useful when they are correct and cause no harm when they are incorrect • Think of a phone number • Delayed-directory change notification
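
The hint idea on slide 35 can be sketched as a cache whose entries are always checked before use and replaced when stale. Here `authoritative_lookup` and `is_correct` are hypothetical callables standing in for the full pathname translation and for the contacted group confirming that it manages the path; neither name comes from Farsite.

```python
class HintCache:
    """Caches path-to-directory-group hints that are verified on every use."""

    def __init__(self, authoritative_lookup):
        self._lookup = authoritative_lookup  # slow but always correct
        self._hints = {}

    def resolve(self, path, is_correct):
        hint = self._hints.get(path)
        if hint is not None and is_correct(path, hint):
            return hint  # correct hint: the slow lookup is skipped
        # Missing or stale hint: costs one extra step but causes no harm.
        fresh = self._lookup(path)
        self._hints[path] = fresh
        return fresh
```

As with a remembered phone number, a stale hint only costs a wasted call before the client falls back to the authoritative lookup.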

  36. Efficiency • Space efficiency: • Almost 50% of disk space could be reclaimed by eliminating duplicate files • Farsite detects files with duplicate contents and co-locates them in same set of file hosts • Performance: • Achieved through caching and delaying updates
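
Duplicate detection as described on slide 36 can be illustrated by grouping files on a content digest. This sketch assumes SHA-256 over plaintext bytes held in memory; the slides do not say how Farsite identifies duplicates, particularly given that file hosts only store encrypted replicas, so treat the mechanism here as a stand-in.

```python
import hashlib
from collections import defaultdict


def group_duplicates(files: dict) -> list:
    """Return sorted groups of file names whose contents are identical.

    `files` maps a file name to its content as bytes.
    """
    by_digest = defaultdict(list)
    for name, content in files.items():
        by_digest[hashlib.sha256(content).hexdigest()].append(name)
    return sorted(sorted(names) for names in by_digest.values() if len(names) > 1)
```

Each returned group is a candidate for co-location on one set of file hosts, so that a single stored copy per host can serve every file in the group.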

  37. Evaluation • Designed to scale up to 10^5 machines • Roughly 300 new machines per day • Andrew benchmark two times slower than NTFS • Still to do • Implement disk quotas • Have mechanism to measure machine availability
