The Coda File System Jeff Chheng Jun Du
Overview of Coda • Distributed file system • Designed for scalability, security, and high availability • Descendant of version 2 of the Andrew File System (AFS), so it follows the same organization • Three components: Virtue, Venus, and Vice
Overview of Coda • Virtue: the client workstation • Venus: the client-side process running on each Virtue machine • Vice: the file servers
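Communication • Remote procedure calls with RPC2 • Provides reliable RPC on top of unreliable UDP • Supports “side effects”: hooks for an application-specific protocol that runs alongside the RPC • Invalidations can be sent to clients one at a time (in series) or to all clients at once (in parallel, using MultiRPC)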
Processes • Clients represented by Venus processes • Servers represented by Vice processes • Both processes are organized as a collection of concurrent threads • Threads are non-preemptive
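To illustrate what "non-preemptive" means on the Processes slide, here is a minimal sketch in Python using generators as cooperative threads. It is not Coda's thread package; the names `worker` and `run` are hypothetical. A thread keeps the processor until it yields voluntarily, so no locking is needed between yield points.

```python
from collections import deque


def worker(name, steps, trace):
    """A cooperative thread: does one unit of work, then yields control."""
    for i in range(steps):
        trace.append(f"{name}:{i}")
        yield  # voluntarily give up the processor; nothing can interrupt us before this


def run(threads):
    """Round-robin scheduler: resume each thread until it yields or finishes."""
    ready = deque(threads)
    while ready:
        thread = ready.popleft()
        try:
            next(thread)          # run until the thread's next yield
            ready.append(thread)  # still has work: back of the queue
        except StopIteration:
            pass                  # thread finished; drop it


trace = []
run([worker("a", 2, trace), worker("b", 1, trace)])
print(trace)
```

The scheduler never takes control away from a running thread, which is the property the slide refers to.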
Naming • Files are grouped into volumes • A volume is like a Unix disk partition, but with smaller granularity • Volumes can be mounted • Naming inherited from server’s name space
File Identification • Files are copied and moved across multiple servers • ID is needed to track file to physical location • Replicated Volume Identifier (RVID) for logical volumes • Volume Identifier (VID) for physical volumes
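The two-level identification above can be sketched as two lookups: an RVID names a logical volume, which maps to one VID per physical replica, and each VID maps to a server. This is an illustrative sketch only; the table names, identifier values, server names, and the `vnode` field are assumptions, not Coda's actual data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileId:
    rvid: int   # Replicated Volume Identifier: names the logical volume
    vnode: int  # hypothetical per-volume file handle


# Hypothetical lookup tables with made-up values.
REPLICATION_DB = {0x7F000001: [0x01000010, 0x02000010]}          # RVID -> VIDs
LOCATION_DB = {0x01000010: "server-a", 0x02000010: "server-b"}   # VID -> server


def locate(fid):
    """Return (server, VID) pairs for every physical replica holding the file."""
    vids = REPLICATION_DB[fid.rvid]
    return [(LOCATION_DB[vid], vid) for vid in vids]


print(locate(FileId(rvid=0x7F000001, vnode=42)))
```

Because the file identifier carries only the logical RVID, replicas can be added or moved by updating the tables without changing any file identifier.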
Synchronization • Many distributed file systems (e.g., AFS) support session semantics • Coda attempts to support transactional semantics • Goal: keep a large DFS usable when some or all file servers are temporarily unavailable
Sharing Files • When client opens file, file is transferred to client’s machine • When file is opened for writing, no other clients may open file • When file is opened for reading, others can open for reading or writing
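The open rules on this slide can be written as a small admission check. The sketch below is a simplification under stated assumptions: the class `FileServerEntry` and its methods are hypothetical, and the actual file transfer to the client is omitted.

```python
class FileServerEntry:
    """Tracks who has a file open, following the rules on the slide."""

    def __init__(self):
        self.writer = None    # client currently holding the file open for writing
        self.readers = set()  # clients holding the file open for reading

    def open(self, client, mode):
        if mode not in ("r", "w"):
            raise ValueError("mode must be 'r' or 'w'")
        # Once a writer holds the file, no other client may open it.
        if self.writer is not None and self.writer != client:
            return False
        if mode == "w":
            self.writer = client  # allowed even while others are reading
        else:
            self.readers.add(client)
        return True

    def close(self, client):
        if self.writer == client:
            self.writer = None
        self.readers.discard(client)


entry = FileServerEntry()
print(entry.open("A", "r"), entry.open("B", "w"), entry.open("C", "r"))
```

Readers that opened the file before the writer keep working on the copy they already received.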
Client Caching • Clients always cache entire files • Cache coherence maintained with callbacks • Servers record a callback promise for each client that caches a file • Updating a file breaks the promise for other clients • On open, a client checks whether its promise still holds to decide if the cached copy must be refetched
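A minimal sketch of the callback mechanism, assuming an in-memory server and direct method calls in place of RPC. The classes `Server` and `Client` and all method names are hypothetical; the sketch only shows when a promise is recorded, when it is broken, and how the client uses it.

```python
class Server:
    def __init__(self):
        self.files = {}     # file name -> contents
        self.promises = {}  # file name -> clients holding a callback promise

    def fetch(self, client, name):
        # Handing out a copy records a callback promise for that client.
        self.promises.setdefault(name, set()).add(client)
        return self.files[name]

    def store(self, client, name, data):
        self.files[name] = data
        # An update breaks the promise for every other client.
        for other in self.promises.get(name, set()) - {client}:
            other.break_callback(name)
        self.promises[name] = {client}


class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.valid = set()  # names whose callback promise still holds
        self.fetches = 0

    def break_callback(self, name):
        self.valid.discard(name)

    def open(self, name):
        if name not in self.valid:  # promise broken or never held: refetch
            self.cache[name] = self.server.fetch(self, name)
            self.valid.add(name)
            self.fetches += 1
        return self.cache[name]

    def write(self, name, data):
        self.cache[name] = data
        self.valid.add(name)
        self.server.store(self, name, data)
```

As long as the promise holds, repeated opens are served from the local cache without contacting the server.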
Server Replication • Volume Storage Group (VSG): collection of servers with a copy of the volume • Accessible VSG (AVSG): the servers in a VSG that a client can currently contact • If the AVSG is empty, the client is considered disconnected • Client reads from one member of the AVSG and sends updates in parallel to all members
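The VSG/AVSG relationship and the read-one, write-all pattern can be sketched as follows. This is a simplified model: `Replica`, `avsg`, `read`, and `write` are hypothetical names, reachability is a boolean flag, and the updates are applied in a loop rather than in parallel.

```python
class Replica:
    def __init__(self, name, up=True):
        self.name = name
        self.up = up    # whether the client can currently reach this server
        self.data = {}


def avsg(vsg):
    """The accessible subset of the volume storage group."""
    return [r for r in vsg if r.up]


def read(vsg, path):
    members = avsg(vsg)
    if not members:
        raise ConnectionError("disconnected: AVSG is empty")
    return members[0].data[path]  # read from a single member


def write(vsg, path, value):
    members = avsg(vsg)
    if not members:
        raise ConnectionError("disconnected: AVSG is empty")
    for r in members:             # sent in parallel in Coda; sequential here
        r.data[path] = value
    return [r.name for r in members]


vsg = [Replica("s1"), Replica("s2"), Replica("s3", up=False)]
print(write(vsg, "f", "x"), read(vsg, "f"))
```

Note that `s3` misses the update while it is unreachable, which is exactly the situation the next slide addresses.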
Server Replication Problem • What happens when two clients access two different AVSGs for the same file? • Use optimistic strategy for replication • Inconsistency detected and resolved with Coda version vector • Conflict resolution can be automated, but might require user intervention
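Conflict detection with version vectors comes down to an element-wise comparison: if one vector is greater than or equal to the other in every position, it is simply newer; if each is ahead in some position, the replicas were updated independently and conflict. A minimal sketch, assuming one counter per server in a fixed order (the function name `compare` and the example vectors are illustrative):

```python
def compare(a, b):
    """Compare two version vectors with one counter per server."""
    if len(a) != len(b):
        raise ValueError("version vectors must have the same length")
    a_ge = all(x >= y for x, y in zip(a, b))
    b_ge = all(y >= x for x, y in zip(a, b))
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "a dominates"  # b is stale and can be brought up to date
    if b_ge:
        return "b dominates"
    return "conflict"         # concurrent updates: needs resolution


# Three servers start at [1, 1, 1]. One client updates servers 1 and 2,
# another client updates only server 3.
print(compare([2, 2, 1], [1, 1, 2]))
print(compare([2, 2, 1], [1, 1, 1]))
```

The dominated case can be repaired automatically by copying the newer replica; only the conflict case may need user intervention.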
Working While Disconnected • Unlike NFS, the client simply uses its local copy when disconnected • Closing a file while disconnected always succeeds • Modifications are transferred to the server when the connection is reestablished • Reintegration is mostly automatic but may need user intervention • In practice, write-sharing is rare
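The disconnected workflow can be sketched as a local log that is replayed on reconnection. This is an illustrative model, not Venus's actual reintegration protocol: the class `DisconnectedClient`, the use of a plain dict as the server, and the single integer version per file are all assumptions.

```python
class DisconnectedClient:
    def __init__(self, cache):
        # path -> (data, server version seen when the file was cached)
        self.cache = dict(cache)
        self.log = {}  # path -> (latest data, base version); one entry per file

    def write_and_close(self, path, data):
        """Always succeeds: the update is applied locally and logged."""
        _, base = self.cache[path]
        self.cache[path] = (data, base)
        self.log[path] = (data, base)

    def reintegrate(self, server):
        """Replay logged updates; return paths that need manual resolution."""
        conflicts = []
        for path, (data, base) in self.log.items():
            _, current = server[path]
            if current == base:  # nobody else changed the file meanwhile
                server[path] = (data, current + 1)
                self.cache[path] = (data, current + 1)
            else:
                conflicts.append(path)
        self.log.clear()
        return conflicts


server = {"a": ("old", 1), "b": ("old", 1)}
client = DisconnectedClient(server)
client.write_and_close("a", "new")
client.write_and_close("b", "mine")
server["b"] = ("theirs", 2)        # someone else updates b in the meantime
print(client.reintegrate(server))  # paths left for the user to resolve
```

Since write-sharing is rare in practice, most replays take the automatic branch.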
Security • Mutual authentication with secret-key cryptosystem • Setting up a secure channel requires a secret token • Access is granted to disconnected clients
Access Control • Access control associated with directories (but not subdirectories) • Operations under access control: read, write, lookup, insert, delete, and administer • Execution never happens server-side, only client-side, so no permissions for it • Coda maintains info on users and groups • Negative rights possible
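One way to picture negative rights is as a second list that is subtracted from whatever the positive entries grant. The sketch below assumes that negative entries take precedence over positive ones; the function name, the dict-based ACL layout, and the example users and groups are hypothetical.

```python
RIGHTS = {"read", "write", "lookup", "insert", "delete", "administer"}


def effective_rights(user, groups, positive, negative):
    """Rights on a directory: everything granted minus everything denied."""
    principals = {user} | set(groups)
    granted, denied = set(), set()
    for p in principals:
        granted |= positive.get(p, set())
        denied |= negative.get(p, set())
    return granted - denied  # assumption: a negative right always wins


positive = {"staff": {"read", "lookup", "write"}}
negative = {"mallory": {"write"}}

print(sorted(effective_rights("mallory", ["staff"], positive, negative)))
print(sorted(effective_rights("alice", ["staff"], positive, negative)))
```

Negative rights make it possible to revoke access for one user quickly without first removing that user from every group.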
Drawbacks • Client-side (Venus) resource exhaustion: the file cache fills with modified files, or RVM (recoverable virtual memory) space becomes full • Not entirely scalable yet beyond 20-30 users and a few servers • Limited stability with systems containing terabytes of data
Improvements • Increase cache and RVM size or allow them to be stored in removable media • Compress file cache and RVM contents • Allow users to selectively back out of updates