
File System Performance



Presentation Transcript


  1. File System Performance (CSE451, Andrew Whitaker)

  2. Ways to Improve Performance
     • Access the disk less
       • Caching!
     • Be smarter about accessing the disk
       • Turn small operations into large operations
       • Turn scattered operations into sequential operations

  3. Technique #1: Caching
     • Memory is MUCH faster than disk
     • So, cache whatever we can in memory
       • File buffers
       • i-nodes
       • Directory entries (name => i-node)
     • Caching reads is a no-brainer
     • Caching writes is more interesting…
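A minimal sketch of the caching idea, as an in-memory block cache. The slide does not name an eviction policy, so least-recently-used is assumed here, and all names are illustrative:

    # A toy block cache: hits avoid the disk entirely; misses fall back to
    # a caller-supplied read function. LRU eviction is an assumption.
    from collections import OrderedDict

    class BlockCache:
        def __init__(self, capacity, read_from_disk):
            self.capacity = capacity
            self.read_from_disk = read_from_disk  # fallback for misses
            self.blocks = OrderedDict()           # block number -> bytes

        def read(self, block_no):
            if block_no in self.blocks:
                self.blocks.move_to_end(block_no)  # mark as recently used
                return self.blocks[block_no]       # hit: no disk access
            data = self.read_from_disk(block_no)   # miss: go to disk
            self.blocks[block_no] = data
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)    # evict least recently used
            return data

    cache = BlockCache(2, lambda n: b"data for block %d" % n)
    cache.read(0)   # miss: reads from "disk"
    cache.read(0)   # hit: served from memory

The same structure works for the other caches the slide lists (i-nodes, directory entries); only the key and value types change.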

  4. Caching Writes
     • Two options:
       • Synchronous: data is immediately written out to disk
         • AKA: write-through
       • Asynchronous: disk writes are delayed
         • AKA: write-back
     • Programmer’s perspective: what does it mean when the “write” system call returns?
       • With asynchronous writes, the data has not necessarily hit the disk

  5. Why Use Asynchronous Writes?
     • Allows us to batch up multiple writes to the same block
     • Allows for better overlap of CPU and I/O
       • CPU does not stall waiting for the disk
     • Allows the disk scheduler to make better decisions
       • Application: write(a); write(b); write(c);
       • Disk: write(b); write(a); write(c);
     • Most data updates in UNIX systems use asynchronous writes by default
       • Programmer can override: fsync(fd);
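A short illustration of the fsync point, using Python's os module (which wraps the corresponding UNIX calls): write() returning only means the kernel's buffer cache has the data; fsync blocks until the disk does.

    import os

    fd = os.open("data.txt", os.O_WRONLY | os.O_CREAT, 0o644)
    os.write(fd, b"hello")   # returns once the data is in the buffer cache
    os.fsync(fd)             # blocks until the data is actually on disk
    os.close(fd)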

  6. Problems with Asynchronous Writes
     • File system state can be lost during a crash
       • Missing blocks, missing files, missing directories, storage leaks, etc.
     • For this reason, meta-data updates tend to be done synchronously
       • File/directory creation or deletion

  7. Consistency Problems
     • Problems still arise, even with synchronous meta-data updates
     • For example, file creation must modify an i-node and a directory entry
       • Initialize the i-node
       • Record the <fileName, i-node> mapping in the directory
     • Disks do not support atomic updates spanning multiple blocks
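A toy sketch of the two-step create described above. The dictionaries are in-memory stand-ins for on-disk structures, not a real format; the point is only that a crash can land between the two writes:

    inodes = {}       # i-node number -> metadata
    directory = {}    # file name -> i-node number

    def create_file(name, inum):
        inodes[inum] = {"link_count": 1}   # step 1: initialize the i-node
        # <-- a crash here leaves an i-node with no directory entry pointing at it
        directory[name] = inum             # step 2: record the <fileName, i-node> mapping

    create_file("Andrew.txt", 7)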

  8. Dealing with Consistency Problems
     • Always keep the disk in a “safe” state
     • Run a recovery program (like fsck) on startup

  9. i-check: File Consistency
     • Is each block on exactly one list?
     • Create a bit vector with as many entries as there are blocks
     • Follow the free list and each i-node’s block list
     • When a block is encountered, examine its bit
       • If the bit was 0, set it to 1
       • If the bit was already 1:
         • If the block is both in a file and on the free list, remove it from the free list and cross your fingers
         • If the block is in two files, call support!
     • If there are any 0’s left at the end, put those blocks on the free list
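A runnable sketch of the i-check pass on made-up data. Block 9 deliberately appears both in a file and on the free list, and several blocks are leaked:

    NUM_BLOCKS = 16
    free_list = [0, 1, 2, 9]
    files = {"Andrew.txt": [3, 4], "Colin.txt": [5, 9]}   # block 9 is also free: bad

    seen = [0] * NUM_BLOCKS                # the bit vector, one entry per block
    for block in free_list:
        seen[block] = 1
    for name, blocks in files.items():     # follow each i-node's block list
        for block in blocks:
            if seen[block] and block in free_list:
                free_list.remove(block)    # in a file AND free: drop from free list
            elif seen[block]:
                print("block", block, "is in two files: call support!")
            seen[block] = 1
    # any block whose bit is still 0 is leaked: return it to the free list
    free_list += [b for b in range(NUM_BLOCKS) if not seen[b]]
    print("repaired free list:", sorted(free_list))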

  10. d-check: Directory Consistency
     • Do the directories form a tree?
       • Cycles are bad!
     • Does the link count of each file (i-node) equal the number of directory links to it?
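A sketch of the link-count half of d-check on toy data (the cycle check is omitted for brevity):

    link_count = {1: 2, 2: 1}                   # i-node number -> stored link count
    directories = {
        "/":     {"a.txt": 1, "b.txt": 2},
        "/home": {"a-again.txt": 1},            # hard link to i-node 1
    }

    observed = {}                               # count directory links per i-node
    for entries in directories.values():
        for inum in entries.values():
            observed[inum] = observed.get(inum, 0) + 1

    for inum, stored in link_count.items():
        if observed.get(inum, 0) != stored:
            print("i-node", inum, "has an inconsistent link count")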

  11. Technique #2: Better Data Layout
     • Recall basic file system structure:
       • Meta-data: i-nodes, free block list
       • Data: file data, directory data
     • [Diagram: a meta-data region followed by the data region]
     • Note: i-nodes are far from the data blocks they describe

  12. Cylinder Groups
     • Basic idea: group commonly accessed data and meta-data together
       • This reduces seeks
     • Details:
       • Disk is partitioned into groups of cylinders
       • Data blocks from a file are all placed in the same cylinder group
       • Files in the same directory are placed in the same cylinder group
       • The i-node for a file is placed in the same cylinder group as the file’s data
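A rough sketch of the allocation policy. The group size and free map are invented values; the point is only the "try the same group first, then spill over" structure:

    BLOCKS_PER_GROUP = 8
    free = set(range(32)) - {0, 1, 2}   # blocks 0-2 already in use

    def allocate_near(block_no):
        group = block_no // BLOCKS_PER_GROUP
        start = group * BLOCKS_PER_GROUP
        for b in range(start, start + BLOCKS_PER_GROUP):   # same group first
            if b in free:
                free.remove(b)
                return b
        b = min(free)                                      # group full: spill over
        free.remove(b)
        return b

    print(allocate_near(2))   # -> 3, in the same cylinder group as block 2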

  13. Cylinder Group Analysis
     • Reduces or eliminates seeks for some common access patterns
     • Does not address rotational delay
     • Performance is workload dependent
     • Performance degrades if cylinder groups become full
       • Partial solution: proactively reserve space

  14. Log-Structured File System
     • Let’s assume all reads are cached
       • An iffy assumption, but let’s suspend disbelief
     • Q: How can we turn all writes into large, sequential writes?
     • Insight: this is possible if the location of data on disk can change

  15. A Conventional File System
     • Files live at fixed locations
       • So, file system writes must use seeks
     • For example:
       • Write to Christine.txt
       • Write to Andrew.txt
       • Write to Colin.txt

  16. Log-Structured File System
     • Use the disk as an append-only log
       • All writes go at the end of the log
     • The location of a file changes over time
     • Old data is not over-written
       • Until the file system becomes full
     • [Diagram: the log grows in one direction, containing Christine.txt, Andrew.txt, Colin.txt, then a rewritten copy of Christine.txt at the log’s head]
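A bare-bones sketch of the append-only discipline, with a Python list standing in for the disk:

    log = []   # the "disk": an append-only sequence of blocks

    def log_write(data):
        log.append(data)        # all writes go at the end of the log
        return len(log) - 1     # the block's (new) address

    a1 = log_write(b"Christine v1")
    a2 = log_write(b"Andrew v1")
    a3 = log_write(b"Christine v2")   # Christine.txt moved; its old copy is dead
    print("Christine.txt moved from block", a1, "to block", a3)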

  17. LFS Details
     • Everything gets written to the log
       • File data, i-nodes, directories
     • LFS tries to buffer many small writes into large segments
       • Typically 512KB or 1MB

  18. How Can This Possibly Work?
     • Q: If nothing lives at a fixed location, how do we find “the data”?
     • A: Add a layer of indirection: an i-node map
       • Maps from i-node number to current location
       • The map resides at a fixed location on disk
         • NOT in the log!
       • The map is cached in memory for performance
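Extending the same toy log with an i-node map. A dict stands in for the map, which in a real LFS lives at a fixed disk location and is cached in memory:

    log = []
    imap = {}   # i-node number -> current address in the log

    def write_inode(inum, inode_bytes):
        log.append(inode_bytes)
        imap[inum] = len(log) - 1     # the map always points at the newest copy

    def read_inode(inum):
        return log[imap[inum]]        # one lookup finds the moving target

    write_inode(7, b"inode v1")
    write_inode(7, b"inode v2")       # i-node 7 moved; the map followed it
    assert read_inode(7) == b"inode v2"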

  19. What Happens When the Disk Gets Full?
     • Partial solution: the disk is managed in segments, which are threaded on disk
       • Basically, a linked list
     • But this re-introduces seeks!

  20. Segment Cleaner
     • Goal: make scattered segments contiguous again
     • Approach:
       • Read a segment
       • Write its live data to the end of the log
       • Presto: the segment is now clean
     • This is very expensive
       • Every live byte is read and written
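A toy sketch of cleaning. Liveness is assumed to be known here (a real LFS records it in segment summary information); note that every live block is both read and rewritten:

    log = [("A", True), ("B", False), ("C", True), ("D", True)]   # (data, live?)
    SEGMENT = slice(0, 2)   # clean the segment holding the first two blocks

    def clean(segment):
        live = [blk for blk in log[segment] if blk[1]]   # every live byte is read...
        log.extend(live)                                 # ...and written again: expensive
        log[segment] = [None] * len(log[segment])        # presto: the segment is clean

    clean(SEGMENT)
    print(log)   # "A" now lives at the end of the log; its old segment is reusable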

  21. LFS Analysis
     • For reads, LFS and a traditional FS are largely equivalent
     • LFS has better performance for small writes and meta-data operations
     • The LFS cleaner has a large impact on performance
       • How important is this?

  22. LFS in Practice
     • LFS is implemented, but not widely used
     • Reasons?
       • Assumptions about read behavior were not valid
         • Reads have not gone away
       • Performance improvements were not sufficient to offset increased complexity and higher variability
     • LFS comeback?
       • See Jim Gray’s article
