Disconnected Operation In The Coda File System James J Kistler & M Satyanarayanan Carnegie Mellon University

Presented By Prashanth L Anmol N M Yulong Liu

Outline
• Introduction
• Design Overview
• Design Rationale
• Detailed Design
• Status & Evaluation
• Future Work
• Conclusion

Introduction
• Disconnected operation: a temporary deviation from normal operation as a client of a shared repository
• Why? To enhance availability
• How? By caching data

Design Overview
• Coda application area: academic and research environments; not intended for highly concurrent, fine-granularity data access or for safety-critical systems
• Definitions
  • Venus: the cache manager on each client
  • Volume: a subtree of the Coda namespace, mapped to individual file servers
  • VSG (volume storage group): the set of replication sites for a volume
  • AVSG (accessible VSG): the subset of a VSG that is currently accessible
  • Callback: when a workstation caches a file or directory, the server promises to notify it before allowing modification by others
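The callback promise defined above can be illustrated with a toy model. This is a minimal sketch, not Coda's actual protocol: the `Server` and `Client` classes, their method names, and the in-memory dictionaries are all assumptions made for illustration.

```python
class Server:
    """Toy file server that tracks callback promises per file (hypothetical)."""

    def __init__(self):
        self.files = {}      # path -> contents
        self.callbacks = {}  # path -> set of clients holding a callback

    def fetch(self, client, path):
        # Caching a file establishes a callback promise for that client.
        self.callbacks.setdefault(path, set()).add(client)
        return self.files[path]

    def store(self, writer, path, data):
        # Break the callbacks held by other clients before applying the update.
        for client in self.callbacks.pop(path, set()):
            if client is not writer:
                client.break_callback(path)
        self.files[path] = data
        self.callbacks[path] = {writer}


class Client:
    """Toy workstation cache (hypothetical stand-in for Venus)."""

    def __init__(self, name):
        self.name = name
        self.cache = {}  # path -> cached contents

    def read(self, server, path):
        # Serve from the cache while the callback is intact; otherwise refetch.
        if path not in self.cache:
            self.cache[path] = server.fetch(self, path)
        return self.cache[path]

    def write(self, server, path, data):
        self.cache[path] = data
        server.store(self, path, data)

    def break_callback(self, path):
        # The server revoked its promise, so the cached copy is discarded.
        self.cache.pop(path, None)
```

In this sketch a reader keeps using its cached copy without contacting the server until another client writes the file, at which point the cached copy is dropped and the next read refetches it.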

Design Rationale
• Scalability
• Portable Workstations
• First- vs. Second-Class Replication
• Optimistic vs. Pessimistic Replica Control

Design Rationale: Scalability
• Prepare for growth a priori rather than as an afterthought
• Place functionality on clients rather than servers, except where security or integrity would be compromised
• Avoid system-wide rapid change

Design Rationale: Portable Workstations
• Manual caching --> automatic caching
• How can future file access needs be predicted well?
Design Rationale: First- vs. Second-Class Replication
• First-class replicas on servers are of higher quality than second-class replicas on clients: more persistent, widely known, secure, available, complete, and accurate
• The cache coherence protocol balances performance and scalability against quality
• During disconnected operation, data quality degrades for second-class replicas but is preserved for first-class replicas. Is it true?

Design Rationale: Optimistic vs. Pessimistic Replica Control
• Central to the design of disconnected operation
• Pessimistic: disallow or restrict reads and writes so that no conflicts arise; the client acquires control (a lock) prior to disconnection
  • Exclusive control
  • Shared control
• Related problems
  • Acquiring control: involuntary vs. voluntary disconnection
  • Retaining control: brief vs. extended, shared vs. exclusive

Design Rationale: Optimistic Replica Control
• Optimistic: permit reads and writes anywhere, accept potential conflicts, and detect and resolve them after they occur
• Provides the highest possible availability of data
• Application dependent: the Unix file system has a low degree of write-sharing. Other systems?
• Conflict resolution
  • Resolve automatically when possible
  • Otherwise repair manually, which is an annoyance. Cost?

Design and Implementation
• Client Structure
• Venus States
• Hoarding
  • Prioritized Cache Management
  • Hoard Walking
• Emulation
  • Logging
  • Persistence
  • Resource Exhaustion
• Reintegration
  • Replay Algorithm
  • Conflict Handling

Client Structure
• Venus: a user-level process
  • Advantage: portable and easy to debug
  • Disadvantage: lower performance
• Venus intercepts Unix system calls via the Sun Vnode interface
• A Mini Cache is used to filter out many kernel-Venus interactions
• The Mini Cache does not support remote access, disconnected operation, or server replication
• Mini Cache state changes may also be initiated by Venus on events such as callback breaks

Venus States
• Hoarding: during normal connection
• Emulation: during disconnection
• Reintegration: upon reconnection after a disconnection
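The three states form a simple cycle, which can be written down as a transition table. The following is a minimal sketch under that reading of the slide; the event names (`disconnect`, `reconnect`, `replay_done`) are hypothetical labels, not identifiers from Coda.

```python
from enum import Enum


class State(Enum):
    HOARDING = "hoarding"            # normal, connected operation
    EMULATION = "emulation"          # disconnected operation
    REINTEGRATION = "reintegration"  # resynchronizing after reconnection


# Event names are assumptions chosen for this illustration.
TRANSITIONS = {
    (State.HOARDING, "disconnect"): State.EMULATION,
    (State.EMULATION, "reconnect"): State.REINTEGRATION,
    (State.REINTEGRATION, "replay_done"): State.HOARDING,
}


def next_state(state, event):
    """Return the state after `event`; events that do not apply leave it unchanged."""
    return TRANSITIONS.get((state, event), state)
```

Keeping the transitions in a table makes it easy to see that Venus is in exactly one state at a time and that reintegration is the only path from emulation back to hoarding.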

Hoarding
• Hoard useful data in anticipation of disconnection
• Must balance the needs of connected and disconnected operation: cache currently used files to improve performance, and also cache critical files to be prepared for disconnection
• Reasons that make hoarding difficult
  • File reference behavior cannot be predicted with certainty
  • Disconnections and reconnections are often unpredictable
  • The true cost of a cache miss during disconnection is hard to measure
  • Activity at other clients must be accounted for
  • Cache space is finite
• Possible solutions
  • Use a prioritized algorithm for cache management
  • Periodically re-evaluate which objects merit retention (hoard walking)

Prioritized Cache Management
• Use both implicit and explicit information for cache management
  • Implicit: recent reference history, as in traditional caching algorithms
  • Explicit: a per-workstation hoard database (HDB), whose entries are pathnames identifying objects of interest to the user of the workstation
• A simple front end lets the user update and customize the HDB
  • Supports meta-expansion of HDB entries
  • An entry may optionally indicate a priority
• Objects with lower priority are deleted when cache space is needed
• Cache management is hierarchical: directories with cached children are assigned infinite priority
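One way to combine the implicit and explicit sources is a weighted sum, sketched below. The weighting `alpha`, the recency formula, the scale factor of 100, and the dictionary field names are all assumptions for illustration; the slides do not give the actual function Venus uses.

```python
import math


def current_priority(entry, now, alpha=0.5):
    """Combine explicit hoard priority with implicit recency (hypothetical formula)."""
    # Hierarchical cache management: a directory with cached children
    # is never chosen for eviction.
    if entry["is_dir"] and entry["cached_children"] > 0:
        return math.inf
    # Implicit information: more recently referenced objects score higher.
    recency = 1.0 / (1 + now - entry["last_ref"])
    # Explicit information: the priority given in the HDB entry (0 if none).
    return alpha * entry["hoard_priority"] + (1 - alpha) * 100 * recency


def pick_victim(cache, now):
    """Return the name of the lowest-priority cached object."""
    return min(cache, key=lambda name: current_priority(cache[name], now))
```

With this weighting, an old file that the user hoarded explicitly still outranks a recently used but unhoarded file, which is the behavior a purely recency-based policy cannot provide.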

Hoard Walking
• Why do we need hoard walking? To ensure that no uncached object has a higher priority than a cached object
• Steps: a hoard walk runs every 10 minutes
  • Phase 1: re-evaluate the name bindings of HDB entries to reflect update activity
  • Phase 2: re-evaluate the priorities of all entries to restore equilibrium
• Optimizations
  • Files and symbolic links: purge the object on a callback break and refetch it on demand or during the next hoard walk
  • Directories: do not purge on a callback break; mark the cached copy as suspicious instead
  • A callback break on a directory typically means that an entry has been added to or deleted from the directory
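The equilibrium condition can be sketched as a single pass that compares cached and uncached candidates. This is a simplified illustration of the priority phase only: it assumes every object occupies one cache slot and that priorities are already computed, and the function and parameter names are hypothetical.

```python
def hoard_walk(cached, candidates, capacity):
    """Restore equilibrium for a cache of `capacity` equally sized slots.

    cached:     set of names currently in the cache
    candidates: dict mapping every known name (cached or not) to its priority
    Returns (to_fetch, to_evict) as sorted lists of names.
    """
    # Rank all candidates from highest to lowest priority.
    ranked = sorted(candidates, key=lambda name: candidates[name], reverse=True)
    # In equilibrium the cache holds exactly the top-ranked objects.
    keep = set(ranked[:capacity])
    to_fetch = sorted(keep - cached)
    to_evict = sorted(cached - keep)
    return to_fetch, to_evict
```

After applying the returned fetches and evictions, no uncached object has a higher priority than any cached object, which is the invariant the walk is meant to restore.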

Emulation
• Actions performed
  • Venus takes responsibility for access and semantic checks
  • Venus generates temporary file identifiers
• Logging
  • Venus maintains a replay log with sufficient information to replay update activity when it reintegrates
  • Several optimizations are applied, such as reducing the log length and maintaining a copy of the log in the cache
• Persistence
  • Back up the cache and related data structures in non-volatile storage
  • RVM: Recoverable Virtual Memory
  • Provides bounded persistence, where the bound is the period between log flushes
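One log-length optimization can be sketched as follows: a new store record for a file supersedes any earlier store record for the same file, since only the final contents need to be replayed. The `ReplayLog` class and its tuple record format are assumptions for illustration, not Venus's actual log layout.

```python
class ReplayLog:
    """Toy replay log kept during emulation (hypothetical record format)."""

    def __init__(self):
        self.records = []  # list of (operation, path, data) tuples, in order

    def append(self, op, path, data=None):
        if op == "store":
            # An earlier store of the same file is made redundant by this one,
            # so drop it to keep the log short.
            self.records = [
                r for r in self.records
                if not (r[0] == "store" and r[1] == path)
            ]
        self.records.append((op, path, data))
```

Repeatedly saving the same file while disconnected therefore costs one log record rather than one per save.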

Emulation: Resource Exhaustion
• Compress the file cache and RVM contents
• Back out updates made while disconnected
• Use removable media
Reintegration
• Venus changes roles from pseudo-server to cache manager
• Replay algorithm
• Conflict handling

Reintegration
• How long does reintegration take?
• How large a local disk does one need?
• How likely are conflicts?
Duration of Reintegration
• The reintegration process has 3 steps
  • Allocating permanent fids
  • Replay at the servers
  • Second phase of the update protocol
• Benchmarks used: the Andrew benchmark and a Venus make
• Observations
  • Total reintegration time is roughly the same for the two tasks
  • Reintegration for the Venus make takes longer
  • Neither task involves any think time
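The replay step at the servers, together with conflict detection, can be sketched as an all-or-nothing pass over the log. This is a simplified model, not Coda's replay algorithm: the per-object version counter, the record layout, and the function name are assumptions for illustration.

```python
def reintegrate(server, log):
    """Replay a client's log against the server, all or nothing.

    server: dict mapping path -> (version, data)
    log:    list of (path, base_version, data), where base_version is the
            version the client expects the object to have at that point
    Returns (True, None) on success or (False, conflicting_path) on conflict.
    """
    staged = dict(server)  # work on a copy so a failed replay changes nothing
    for path, base_version, data in log:
        current_version = staged.get(path, (0, None))[0]
        if current_version != base_version:
            # Another client updated the object during the disconnection.
            return False, path
        staged[path] = (current_version + 1, data)
    # Commit: every record validated, so install all updates together.
    server.clear()
    server.update(staged)
    return True, None
```

A conflict on any single record aborts the whole replay in this sketch, leaving the server state untouched so the conflict can be handled separately.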

Status & Evaluation
Cache Size
• Observation: the disk must be large enough to support both explicit and implicit sources of hoarding information
• Future work intended
  • Cache size requirements for longer periods of disconnection
  • Sampling a broader range of user activity
  • Evaluating the effect of hoarding
Likelihood of Conflicts
• Metric used: replace the AFS server by the Coda server
• Observations

Conclusion
• Related work: Cedar, FACE, PCMAIL, etc.
• Future work
  • Adding transactional support
  • Supporting weakly connected operation
• Strengths
  • Tried and tested
  • Optimistic replication
• Weaknesses
  • Relevance in the current scenario
• Relevance
  • Application dependent
  • Imply certain concepts
