1 / 35

Improving File System Synchrony

Explore how to balance performance and durability concerns in file systems, comparing local and distributed scenarios with a user-centric view and innovative implementations.

kricardo
Download Presentation

Improving File System Synchrony

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Improving File System Synchrony CS 614 Lecture – Fall 2007 – Tuesday October 16 By Jonathan Winter

  2. Introduction Tues. Oct 16, 2007 – CS 614 – Improving File System Synchrony

  3. Motivation • File system I/O is a major performance bottleneck. • On chip computation and caches access are very fast. • Even main memory accesses are comparatively quick. • Distributed computing further exacerbates the problem. • Durability and fault-tolerance are also key concerns. • Files systems depend on mechanical disks. • Common source of crashes, data loss, and system incoherence. • Distributed file systems further complicate reliability issues. • Typically performance and durability must be traded off. • Ease-of-use is also important. • Synchronous I/O semantics make programming easier. Tues. Oct 16, 2007 – CS 614 – Improving File System Synchrony

  4. Outline • Overview of Local File System I/O Scenario • Overview of Distributed File System Scenario • A User-Centric View of Synchronous I/O • Similarities and Differences in Problem Domains • Details of the Speculator Infrastructure • Implementation of External Synchrony • Benchmark Descriptions • Performance Results • Conclusions Tues. Oct 16, 2007 – CS 614 – Improving File System Synchrony

  5. Local File System I/O • Traditional File Systems Come in Two Flavors. • Synchronous provide durability guarantees by blocking. • OS crashes and power failures will not cause data loss. • File modifications are ordered providing determinism. • Blocking and sequential execution for ordering reduces performance. • Asynchronous files systems don’t block on modifications. • Commit can occur long after completion. • Users can view output later invalidated by a crash. • Synchronization can be enforced through explicit commands (fsync). • fsync does not protect against data loss on a typical desktop OS. • Performance is higher through buffering and group commit. • ext3 is a standard journaling Linux local file system. • Can be configured in async, sync, and durable modes. Tues. Oct 16, 2007 – CS 614 – Improving File System Synchrony

  6. Distributed File Systems • Distributed file systems typically use synchronous I/O. • Provides straightforward abstraction of single namespace. • Enables cache coherence and durability. • Synchronous messages have long latencies over network. • Weaker consistency used (close-to-open) for speed. • Common systems include AFS and NFS. • Earlier research by authors created the Blue File System. • Provides single-copy semantics. • Distributed nature of network is transparent. Tues. Oct 16, 2007 – CS 614 – Improving File System Synchrony

  7. User-Centric View of Synchrony • Synchronous I/O assumes an application-centric view. • Durability of file system guaranteed for application. • System views application as external entity. • Application state must be kept consistent. • Application must not see uncommitted results. • Application must block on distributed file I/O. • User-centric view considers application state as internal. • Observable output must be synchronous. • Kernel and applications are both internal state. • Internal implementation can run asynchronously. • Only external output to the screen or network must be synchronous. • Execution of internal components can be speculative. • Results of speculative execution cannot be observed by outside. Tues. Oct 16, 2007 – CS 614 – Improving File System Synchrony

  8. Similarities Between Scenarios • Both local and distributed file system solutions require buffering output to user and external environment. • Asynchrony of implementation hidden from user. • Durability must be preserved in the presence of faults. • Speculative execution must not be seen until commit. • Both speculation and external synchrony require dependence tracking for uncommitted process and kernel state. • Tracking allows speculative execution rollback misspeculations. • Tracking determines which data should not yet be user visible. • Asynchronous implementation allows computation, IPC, I/O messages, network communication, and disk writes to overlap. • Major source of systems’ performance improvement. Tues. Oct 16, 2007 – CS 614 – Improving File System Synchrony

  9. Differences Between Scenarios • Speculative execution in distributed file systems requires checkpointing and recovery on misspeculation. • External synchrony does not speculate, it just allows internal state to run ahead of the output to user. • Speculative execution must block in some situations. • Checkpointing challenges limit the kinds of supported IPC. • Shared memory was not implemented in distributed setting. • External synchrony conservatively assumes all readers in shared memory inherit dependencies. Tues. Oct 16, 2007 – CS 614 – Improving File System Synchrony

  10. Details of the Speculator Infrastructure • Major bottlenecks in original NFS are blocking of processes and serialization of network traffic. • Speculation allows for concurrency of computation and I/O. Tues. Oct 16, 2007 – CS 614 – Improving File System Synchrony

  11. Conditions for Success of Speculations • File systems chosen as first target for Speculator because: • Results of speculative operations are highly predictable. • Clients cache data and concurrent updates are rare. • Speculating that cached data is valid is successful most of the time. • Network I/O much slower than checkpointing. • Checkpoint is low overhead and a lot speculative work can be completed in the time that the cached data is verified. • Computers have spare resources available for speculation. • Processors are idle significant portions of the time. • Extra memory is available for checkpoints. • Spare resources are available for use to speed up I/O throughput. Tues. Oct 16, 2007 – CS 614 – Improving File System Synchrony

  12. Speculation Interface • Speculator requires modifications to system calls to allow for speculative distributed I/O and to propagate dependencies. • Interface is designed to encapsulate implementation details. • Speculator provides: • create_speculation • commit_speculation • fail_speculation • Speculator doesn’t worry about details of speculative hypotheses. • Distributed file system is oblivious to checkpointing and recovery. • Partitioning of responsibilities allows for easy modification of internal implementation and expansion of support for IPC. Tues. Oct 16, 2007 – CS 614 – Improving File System Synchrony

  13. Speculation Implementation • Checkpointing performed by executing a copy-on-write fork. • Must save the state of open file descriptors and copy signals. • Forked child only run if speculation fails, discarded otherwise. • If speculation fails, child fork is given identify of original process. • Two data structures added to kernel to track speculative state. • Speculation structure created by create_speculation to track the set of kernel objects that depend on the new speculation. • The undo log is an ordered list of speculative operations with information to allow speculative operations to be undone. • Multiple speculations can be started for the same process, with multiple speculation structures and checkpoints. • If a previous speculation was read only, checkpoints are shared. • New checkpoints are required every 500ms to cap recovery time. Tues. Oct 16, 2007 – CS 614 – Improving File System Synchrony

  14. Ensuring Correct Speculative Execution • Two invariants must hold for correct execution. • Speculative state should not be visible to the user or external devices. • Output to screen, network, and other interfaces must be buffered. • Processes cannot view speculative state unless they are registered as dependent upon that state. • Non-speculative processes must block or become speculative when viewing speculative state. • Blocking can always be used to ensure correctness. • System calls that do not modify state or modify only private state can be performed speculatively unmodified. • Speculation flags set in file system superblocks and for read and write system calls to indicate dependency relationships. Tues. Oct 16, 2007 – CS 614 – Improving File System Synchrony

  15. Multi-Process Speculation • To extend the amount of possible speculative work, a speculative process can perform inter-process communication. • Dependencies must propagate from a process P to an object X when P modifies X and P depends on speculation that X does not. • Typically propagations are bi-directional between objects. • A commit_speculation will deleted the associated speculation structure and removed related undo log entries. • Fail_speculation will atomically perform rollback. • The undo log, undo entries, and speculations are generic. • Undo log entries point to type-specific state and functions to implement type-specific rollback for different forms of IPC. Tues. Oct 16, 2007 – CS 614 – Improving File System Synchrony

  16. Causal Dependency Propagation Tues. Oct 16, 2007 – CS 614 – Improving File System Synchrony

  17. Forms of Supported IPC • Distributed file system objects. • Cached copies used speculatively, deleted and retrieved if stale. • Local memory file system – RAMFS was modified. • Modified ext3 to allow speculation for local disk file system. • Speculative data never written to disk. • Calling fdatasync blocks the process. • Processes can observe speculative metadata in ext3 superblocks, bitmaps, and group descriptors. Metadata can be written to disk. • ext3 journal modified to separate speculative and non-speculative data in compound transactions. • Pipes and fifos handled like local file systems. Tues. Oct 16, 2007 – CS 614 – Improving File System Synchrony

  18. Forms of Supported IPC (continued) • Unix sockets propagate dependencies bi-directionally • Signals challenging because exitting process cannot restart. • Signaling processes are checkpointed and managed with queue. • During fork, child inherits all dependencies of parent. • Exiting processes not deallocated until all dependencies are resolved. • Other forms of IPC not supported: • System V IPC, futexes, and shared memory. • Processes block to ensure proper behavior. Tues. Oct 16, 2007 – CS 614 – Improving File System Synchrony

  19. Using Speculation in the File System • For read operations, cached version of file is required. • Speculation assumes file has not been modified, • RPCs changed from synchronous to asynchronous. • Server with full knowledge managing mutating operations. • Server permits other processes to see speculatively changed files only if the cached version matches the server version. • Server must process messages in same ordering as clients see. • Server never stores speculative data. • Clients group commit multiple operations with one disk write. • NFS modified to support Speculator (keeps close-to-open). • Blue File System modified to show speculation can enabled strong consistency and safety as well as good performance. Tues. Oct 16, 2007 – CS 614 – Improving File System Synchrony

  20. External Synchrony • Goal: Provide the reliability and ease-of-use of synchronous I/O and the performance of asynchronous. • Implementation called xsyncfs build on top of ext3. • File system transactions are completed in non-blocking manner but output is not allowed to be externalized. • All output is buffered in the OS to be released when all disk transactions depended on commit. • Processes with commit dependencies propagate output restrictions when interacting with other processes through IPC. • Xsyncfs uses output-triggered commits to balance throughput and latency. Tues. Oct 16, 2007 – CS 614 – Improving File System Synchrony

  21. Example of External Synchrony Tues. Oct 16, 2007 – CS 614 – Improving File System Synchrony

  22. External Synchrony Design Overview • Synchrony defined by externally observable behavior. • I/O is externally synchronous if output cannot be distinguished from output that could be produced from synchronous I/O. • Requires values of external outputs to be the same. • Outputs must occur in same causal order as defined by Lamport’s happens before relation. • Disk commits are considered external output. • File system does all the same processing as for synchronous. • Need not commit the modification to disk before returning. • Two optimizations made to improve performance. • Group committing is used (commits are atomic). • External output is buffered and processes continue execution. • Output guaranteed to be committed every 5 seconds. Tues. Oct 16, 2007 – CS 614 – Improving File System Synchrony

  23. External Synchrony Implementation • Xsyncfs leverages Speculator infrastructure for output buffering and dependency tracking for uncommitted state. • Checkpointing and rollback features unneeded and are disabled. • Speculator tracks commit dependencies between processes and uncommitted file system transactions. • Processes interacting with the dependent process are marked as dependent on the same set of uncommitted transactions. • Many-to-many relationships between objects tracked in undo logs. • ext3 operates in journaled mode. • Multiple modifications are grouped into compound transactions. • Single transaction active at any time and committed atomically. • Likewise only one transaction can be committing at a time. Tues. Oct 16, 2007 – CS 614 – Improving File System Synchrony

  24. External Synchrony Data Structures Tues. Oct 16, 2007 – CS 614 – Improving File System Synchrony

  25. Some Additional Issues • External synchrony must be augmented to support explicit synchronization operations such as sync and fdatasync. • A commit dependency is created between the calling process and active transaction, creating a visible event causing a commit. • Xsyncfs does not require application modification. • Programmers can write the same code as for synchronous. • Explicit synchronization is not needed. • Programmers don’t need to added group commit to the code. • Hand-tuned code can provide benefits in when programmers have specialized information. • However, xsyncfs has global information about external output which can be used to optimize commit throughput. Tues. Oct 16, 2007 – CS 614 – Improving File System Synchrony

  26. Evaluation Methodology • All experiments run on Pentium 4 processors. • RedHat Enterprise Linux release 3 (kernel 2.4.2.1) used. • Speculative execution evaluated for two scenarios. • First has no delay and second assumes 30ms round trip. • Packets routed through NISTnet network emulator. Tues. Oct 16, 2007 – CS 614 – Improving File System Synchrony

  27. Durability Experiment • Desired to obtain confirmation that ext3 does not guarantee durability. • Test consists of continuously writing to local file system and sending UDP messages after each write completes. • Power is cut during experiment and the file system state and log are compared. • ext3 did not provide durability when mounted asynchronously or synchronously and even when fsync commands where issued after writes. • The problem is that modifications are only written to the hard drive cache and not the platter unless write barriers are employed. Tues. Oct 16, 2007 – CS 614 – Improving File System Synchrony

  28. Workload Descriptions • PostMark Benchmark • Performs hundreds or thousands of transactions consisting of file reads, writes, creates, and deletes, and then removes all the files. • Replicates small file workloads of electronic mail, netnews, and web-based commerce. • Good test of file system throughput since there is little output or computation. • Apache Build Benchmark • Benchmark untars Apache 2.0.48 source tree, runs configure in an object directory, runs make, and then removes all the files. • File system must balance throughput and latency since there is a lot of screen output interleaved with disk I/O and computation. Tues. Oct 16, 2007 – CS 614 – Improving File System Synchrony

  29. Workload Descriptions (continued) • MySQL Benchmark • Runs OSDL TPC-C benchmark with MySQL 5.0.16 and the InnoDB storage engine. • Used to see how xsyncfs performs when application performs its own group commit strategy. • Both MySQL and TPC-C client are multi-threaded so this measures xsyncfs’s support for shared memory as well. • SPECweb99 Benchmark • Provides a network intensive application with 50 clients, saturating the server. • High level of network traffic challenges xsyncfs because the messages externalize state. Tues. Oct 16, 2007 – CS 614 – Improving File System Synchrony

  30. PostMark File System Benchmark Results Tues. Oct 16, 2007 – CS 614 – Improving File System Synchrony

  31. Apache Build Benchmark Results Tues. Oct 16, 2007 – CS 614 – Improving File System Synchrony

  32. MySQL and SPECweb99 Results Tues. Oct 16, 2007 – CS 614 – Improving File System Synchrony

  33. Average Latency of HTTP Requests • xsyncfs adds less than 33ms of delay to a request, less than the 50 ms commonly cited perception threshold. • xsyncfs performance significantly better on large request sizes. Tues. Oct 16, 2007 – CS 614 – Improving File System Synchrony

  34. Benefit of Output-Triggered Commits • The goal is to assess the speedup of this lazy approach to commits. • Output-triggered commits allows grouping but can cost latency. • Output-triggered commits perform better on all benchmarks except SPECweb99 where there is so much traffic that both policies have similar behavior. Tues. Oct 16, 2007 – CS 614 – Improving File System Synchrony

  35. Conclusions • Speed need not be sacrificed for durability and ease-of-use. • Both papers succeed in developing a system that achieves performance near that of an asynchronous implementation with the fault-tolerance and simplicity of the synchronous abstraction. • Key insight is user-centric view abstraction. • Speculator infrastructure provides powerful functionality through dependency tracking and checkpointing/rollback. • Papers focus on using system to speed up local and distributed file systems but many other applications are possible. • Amazing order-of-magnitude speedups are achieved. • Simple ideas that surprisingly took until 2005 to be developed. • Why didn’t I think of this for my research?  Tues. Oct 16, 2007 – CS 614 – Improving File System Synchrony

More Related