File System Performance

CSE451, Andrew Whitaker

Ways to Improve Performance
• Access the disk less
  • Caching!
• Be smarter about accessing the disk
  • Turn small operations into large operations
  • Turn scattered operations into sequential operations

Technique #1: Caching
• Memory is MUCH faster than disk
• So, cache whatever we can in memory
  • File buffers
  • i-nodes
  • Directory entries (name => i-node)
• Caching reads is a no-brainer
• Caching writes is more interesting…
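To make "cache whatever we can" concrete, here is a minimal sketch of a read cache for disk blocks. It assumes a hypothetical direct-mapped design over a simulated disk; the names `cache_read`, `disk_read` and `CACHE_SLOTS` are illustrative only. Real buffer caches typically use a hash table with an LRU-style replacement policy instead of a fixed slot per block.

```c
#include <stdbool.h>
#include <string.h>

#define BLOCK_SIZE 512
#define NUM_DISK_BLOCKS 1024
#define CACHE_SLOTS 64

/* Simulated disk: an in-memory array plus a counter of "slow" accesses. */
static unsigned char disk[NUM_DISK_BLOCKS][BLOCK_SIZE];
static int disk_reads = 0;

static void disk_read(int blockno, unsigned char *buf) {
    disk_reads++;                       /* each call stands in for a slow I/O */
    memcpy(buf, disk[blockno], BLOCK_SIZE);
}

/* One cache slot holds an in-memory copy of one disk block. */
struct cache_slot {
    bool valid;
    int blockno;
    unsigned char data[BLOCK_SIZE];
};

static struct cache_slot cache[CACHE_SLOTS];

/* Direct-mapped lookup: block b may only live in slot b % CACHE_SLOTS. */
void cache_read(int blockno, unsigned char *buf) {
    struct cache_slot *s = &cache[blockno % CACHE_SLOTS];
    if (!s->valid || s->blockno != blockno) {
        disk_read(blockno, s->data);    /* miss: fetch, evicting the old block */
        s->blockno = blockno;
        s->valid = true;
    }
    memcpy(buf, s->data, BLOCK_SIZE);   /* serve the caller from memory */
}
```

Repeated reads of a hot block cost one disk access; two blocks that map to the same slot evict each other, which is the usual price of a direct-mapped design.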

Caching Writes
• Two options
  • Synchronous: data is immediately written out to disk
    • AKA: write-through
  • Asynchronous: disk writes are delayed
    • AKA: write-back
• Programmer’s perspective: what does it mean when the “write” system call returns?
  • With asynchronous writes, the data has not necessarily hit the disk
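From the programmer's side, the standard POSIX way to get write-through behavior for one file is to follow `write` with `fsync`. The sketch below assumes a POSIX system; `durable_write` is a hypothetical helper name, while `open`, `write`, `fsync` and `close` are the real system calls.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Write len bytes to path; report success only after the kernel says the
 * data has been flushed to stable storage. Returns 0 on success, -1 on error. */
int durable_write(const char *path, const void *buf, size_t len) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        /* write() may return as soon as the bytes are in the kernel's cache */
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted by a signal: retry */
            close(fd);
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    /* fsync() blocks until the file's data and metadata reach the device. */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

One caveat: the Linux fsync(2) manual page notes that syncing a file does not necessarily make its directory entry durable; for a newly created file, an explicit `fsync` on a descriptor for the containing directory is also needed.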

Why Use Asynchronous Writes?
• Allows us to batch up multiple writes to the same block
• Allows for better overlap of CPU and I/O
  • CPU does not stall waiting for the disk
• Allows the disk scheduler to make better decisions
  • Application: write(a); write(b); write(c);
  • Disk: write(b); write(a); write(c);
• Most data updates in UNIX systems use asynchronous writes by default
  • Programmer can override: fsync(fd);
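The first and third benefits can be seen in a small write-back buffer. This is a minimal sketch under simplifying assumptions: a fixed table of dirty blocks, a simulated disk that only records flush order, and block numbers that correspond to physical position. The names `wb_write` and `wb_flush` are hypothetical. A second write to a dirty block replaces the buffered copy (batching), and the flush issues blocks in ascending order so the head sweeps in one direction.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE 512
#define MAX_DIRTY 128

struct dirty_block {
    int blockno;
    unsigned char data[BLOCK_SIZE];
};

static struct dirty_block dirty[MAX_DIRTY];
static int ndirty = 0;

/* Order in which blocks reached the (simulated) disk during the last flush. */
static int flushed_order[MAX_DIRTY];
static int nflushed = 0;

/* Delayed write: remember the contents and return immediately. */
bool wb_write(int blockno, const unsigned char *data) {
    for (int i = 0; i < ndirty; i++) {
        if (dirty[i].blockno == blockno) {     /* already dirty: just replace */
            memcpy(dirty[i].data, data, BLOCK_SIZE);
            return true;
        }
    }
    if (ndirty == MAX_DIRTY)
        return false;                          /* caller must flush first */
    dirty[ndirty].blockno = blockno;
    memcpy(dirty[ndirty].data, data, BLOCK_SIZE);
    ndirty++;
    return true;
}

static int by_blockno(const void *a, const void *b) {
    const struct dirty_block *x = a, *y = b;
    return (x->blockno > y->blockno) - (x->blockno < y->blockno);
}

/* Flush every dirty block, sorted by block number rather than in the
 * order the application happened to write them. */
void wb_flush(void) {
    nflushed = 0;
    qsort(dirty, (size_t)ndirty, sizeof dirty[0], by_blockno);
    for (int i = 0; i < ndirty; i++)
        flushed_order[nflushed++] = dirty[i].blockno;  /* stand-in for disk I/O */
    ndirty = 0;
}
```

Four application writes to blocks 30, 10, 30 and 20 turn into three disk writes, issued as 10, 20, 30.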

Problems with Asynchronous Writes
• File system state can be lost during a crash
  • Missing blocks, missing files, missing directories, storage leaks, etc.
• For this reason, meta-data updates tend to be done synchronously
  • File/directory creation or deletion

Consistency Problems
• Problems still arise, even with synchronous meta-data updates
• For example, file creation must modify an i-node and a directory entry
  • Initialize the i-node
  • Record the <fileName, i-node> mapping in the directory
• Disks offer no atomic way to update both blocks at once: a crash between the two writes leaves only one of them on disk

Dealing with Consistency Problems
• Always keep the disk in a “safe” state
• Run a recovery program (like fsck) on startup
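One way to keep the disk "safe" during file creation is to order the two synchronous writes so that a crash between them can only leak an i-node, never leave a directory entry pointing at garbage. The sketch below simulates this with in-memory tables standing in for on-disk structures; `create_file`, `crash_between` and the table layouts are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

#define NINODES 16
#define NDIRENTS 16

struct inode    { bool allocated; int size; };
struct dir_slot { bool used; char name[16]; int inum; };

/* The "disk": whatever has been written here survives a crash. */
static struct inode itable[NINODES];
static struct dir_slot directory[NDIRENTS];

/* Create a file using a safe write order. If crash_between is true,
 * simulate power loss after the first write. Returns the i-node number,
 * or -1 on failure or crash. */
int create_file(const char *name, bool crash_between) {
    int inum = -1, slot = -1;
    for (int i = 0; i < NINODES && inum < 0; i++)
        if (!itable[i].allocated) inum = i;
    for (int i = 0; i < NDIRENTS && slot < 0; i++)
        if (!directory[i].used) slot = i;
    if (inum < 0 || slot < 0)
        return -1;

    /* Write 1: initialize the i-node first. */
    itable[inum].allocated = true;
    itable[inum].size = 0;

    if (crash_between)
        return -1;                      /* write 2 never happens */

    /* Write 2: only now publish the <fileName, i-node> mapping. */
    strncpy(directory[slot].name, name, sizeof directory[slot].name - 1);
    directory[slot].name[sizeof directory[slot].name - 1] = '\0';
    directory[slot].inum = inum;
    directory[slot].used = true;
    return inum;
}

/* The invariant this ordering protects: no entry names a free i-node. */
bool no_dangling_entries(void) {
    for (int i = 0; i < NDIRENTS; i++)
        if (directory[i].used && !itable[directory[i].inum].allocated)
            return false;
    return true;
}
```

With the order reversed, a crash between the writes would leave a name pointing at an uninitialized i-node. With this order, the worst case is an allocated i-node that no directory references, which a recovery program such as fsck can reclaim.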

i-check: File Consistency
• Is each block on exactly one list?
  • Create a bit vector with as many entries as there are blocks
  • Follow the free list and each i-node block list
  • When a block is encountered, examine its bit
    • If the bit was 0, set it to 1
    • If the bit was already 1
      • If the block is both in a file and on the free list, remove it from the free list and cross your fingers
      • If the block is in two files, call support!
  • If there are any 0’s left at the end, put those blocks on the free list
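The i-check steps above can be sketched in C. To keep it short, this assumes the free list is represented as one flag per block (`on_free`) and each file as an array of block numbers; `icheck`, `struct file_blocks` and `MAX_BLOCKS` are hypothetical names, and all block numbers are assumed to be below `nblocks`, which must not exceed `MAX_BLOCKS`.

```c
#include <stdbool.h>

#define MAX_BLOCKS 4096

struct file_blocks { const int *blocks; int count; };

struct icheck_result {
    int removed_from_free;   /* blocks found both in a file and on the free list */
    int doubly_allocated;    /* blocks claimed by two files: "call support!" */
    int added_to_free;       /* blocks on no list at all (leaks) */
};

static bool bit_is_set(const unsigned char *bits, int b) {
    return (bits[b / 8] >> (b % 8)) & 1;
}

static void set_bit(unsigned char *bits, int b) {
    bits[b / 8] |= (unsigned char)(1u << (b % 8));
}

struct icheck_result icheck(int nblocks, bool on_free[],
                            const struct file_blocks files[], int nfiles) {
    struct icheck_result r = {0, 0, 0};
    unsigned char seen[(MAX_BLOCKS + 7) / 8] = {0};   /* the bit vector */

    /* Follow the free list. */
    for (int b = 0; b < nblocks; b++)
        if (on_free[b])
            set_bit(seen, b);

    /* Follow each i-node's block list. */
    for (int f = 0; f < nfiles; f++) {
        for (int i = 0; i < files[f].count; i++) {
            int b = files[f].blocks[i];
            if (!bit_is_set(seen, b)) {       /* bit was 0: set it */
                set_bit(seen, b);
            } else if (on_free[b]) {          /* in a file and free: trust the file */
                on_free[b] = false;
                r.removed_from_free++;
            } else {                          /* already in another file */
                r.doubly_allocated++;
            }
        }
    }

    /* Any bit still 0 is a leaked block: return it to the free list. */
    for (int b = 0; b < nblocks; b++) {
        if (!bit_is_set(seen, b)) {
            on_free[b] = true;
            r.added_to_free++;
        }
    }
    return r;
}
```

The free list is walked first so that, when a file's block is found already marked, checking `on_free` tells the two "bit was already 1" cases apart.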

d-check: Directory Consistency
• Do the directories form a tree?
  • Cycles are bad!
• Does the link count of each file (i-node) equal the number of directory links to it?
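The link-count half of d-check amounts to recounting: tally the directory entries that name each i-node and compare the tally with the stored count. This sketch covers only that half (not cycle detection) and assumes a flat array of directory entries; `dcheck_link_counts` and `struct dir_entry` are hypothetical names, and i-node numbers are assumed to be below `MAX_INODES`.

```c
#include <stddef.h>

#define MAX_INODES 1024

/* One <name, i-node> link found while walking the directories. */
struct dir_entry { const char *name; int inum; };

/* Recount links and correct any i-node whose stored count disagrees.
 * Returns the number of i-nodes that were fixed. */
int dcheck_link_counts(int ninodes, int link_count[],
                       const struct dir_entry entries[], int nentries) {
    int actual[MAX_INODES] = {0};
    for (int i = 0; i < nentries; i++)
        actual[entries[i].inum]++;

    int fixed = 0;
    for (int n = 0; n < ninodes; n++) {
        if (link_count[n] != actual[n]) {
            link_count[n] = actual[n];   /* trust the directories */
            fixed++;
        }
    }
    return fixed;
}
```

The sketch treats every entry the same way. A real fsck also has to account for "." and ".." entries in directories, and it moves an allocated i-node that ends up with no links into lost+found instead of silently keeping it.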

Technique #2: Better Data Layout
• Recall basic file system structure:
  • Meta-data: i-nodes, free block list
  • Data: file data, directory data
[Figure: disk layout with a Metadata region followed by a Data region]
• Note: i-nodes are far from the data blocks they describe

Cylinder Groups
• Basic idea: group commonly accessed data and meta-data together
  • This reduces seeks
• Details:
  • Disk is partitioned into groups of cylinders
  • Data blocks from a file are placed in the same cylinder group, when space allows
  • Files in the same directory are placed in the same cylinder group
  • The i-node for a file is placed in the same cylinder group as the file’s data
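The placement rule can be sketched as a block allocator that prefers one cylinder group (say, the group holding the file's i-node) and spills into the following groups only when that group is full. The group sizes and the names `alloc_block` and `group_of` are hypothetical, and the simple linear fallback is a simplification; real allocators use more elaborate searches.

```c
#include <stdbool.h>

#define NGROUPS 4
#define BLOCKS_PER_GROUP 8

static bool used[NGROUPS][BLOCKS_PER_GROUP];

/* Allocate a block, preferring preferred_group. Returns a global block
 * number, or -1 if every group is full. */
int alloc_block(int preferred_group) {
    for (int k = 0; k < NGROUPS; k++) {
        int g = (preferred_group + k) % NGROUPS;   /* preferred group first */
        for (int b = 0; b < BLOCKS_PER_GROUP; b++) {
            if (!used[g][b]) {
                used[g][b] = true;
                return g * BLOCKS_PER_GROUP + b;
            }
        }
    }
    return -1;
}

int group_of(int blockno) { return blockno / BLOCKS_PER_GROUP; }
```

Once the preferred group fills, new blocks land in a neighboring group and the seek savings start to erode, which is the degradation noted on the next slide.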

Cylinder Group Analysis
• Reduces or eliminates seeks for some common access patterns
• Does not address rotational delay
• Performance is workload dependent
• Performance degrades if cylinder groups become full
  • Partial solution: proactively reserve space

Log-Structured File System
• Let’s assume all reads are cached
  • An iffy assumption, but let’s suspend disbelief
• Q: How can we turn all writes into large, sequential writes?
• Insight: this is possible if the location of data on disk can change

A Conventional File System
• Files live at fixed locations
  • So, writes to different files require seeks
• For example:
  • Write to Christine.txt
  • Write to Andrew.txt
  • Write to Colin.txt

Log-Structured File System
• Use the disk as an append-only log
  • All writes go at the end of the log
• The location of a file changes over time
• Old data is not over-written
  • Until the file system becomes full
[Figure: the log grows left to right, holding writes to Christine.txt, Andrew.txt, Colin.txt, then Christine.txt again]

LFS Details
• Everything gets written to the log
  • File data, i-nodes, directories
• LFS tries to buffer many small writes into large segments
  • Typically 512 KB or 1 MB
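Segment buffering can be sketched as an in-memory segment that absorbs small block writes and goes to the tail of the log in one large transfer when it fills (or when the caller syncs). The sizes below assume 4 KB blocks and 512 KB segments on a tiny simulated disk; `lfs_append`, `lfs_sync` and the counters are hypothetical names.

```c
#include <string.h>

#define BLOCK_SIZE 4096
#define SEG_BLOCKS 128                  /* 128 x 4 KB = 512 KB per segment */
#define LOG_SEGMENTS 8                  /* a tiny 4 MB "disk" for illustration */

static unsigned char log_disk[LOG_SEGMENTS][SEG_BLOCKS][BLOCK_SIZE];
static unsigned char seg_buf[SEG_BLOCKS][BLOCK_SIZE];  /* segment being filled */
static int seg_fill = 0;                /* blocks currently buffered */
static int log_tail = 0;                /* next segment slot on disk */
static int large_writes = 0;            /* sequential segment writes issued */

/* Write the buffered blocks to the tail of the log in one transfer. */
static void flush_segment(void) {
    if (seg_fill == 0 || log_tail == LOG_SEGMENTS)
        return;
    memcpy(log_disk[log_tail], seg_buf, (size_t)seg_fill * BLOCK_SIZE);
    log_tail++;
    large_writes++;
    seg_fill = 0;
}

/* Append one block (file data, an i-node, or a directory block: all go to
 * the log). Returns the block's log address, or -1 if the log is full. */
int lfs_append(const unsigned char *block) {
    if (seg_fill == SEG_BLOCKS)
        flush_segment();                /* buffer full: one big write */
    if (log_tail == LOG_SEGMENTS)
        return -1;                      /* out of space: the cleaner must run */
    int addr = log_tail * SEG_BLOCKS + seg_fill;
    memcpy(seg_buf[seg_fill], block, BLOCK_SIZE);
    seg_fill++;
    return addr;
}

/* Force out a partially filled segment, e.g. when an application syncs. */
void lfs_sync(void) { flush_segment(); }
```

With these sizes, 128 small block writes cost a single disk transfer instead of 128 separate (and possibly scattered) ones.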

How Can This Possibly Work?
• Q: If nothing lives at a fixed location, how do we find “the data”?
• A: Add a layer of indirection: an i-node map
  • Maps from i-node number to current location
  • The map resides at a fixed location on disk
    • NOT in the log!
  • The map is cached in memory for performance
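The indirection can be sketched with a toy log: rewriting a file appends new data and a fresh copy of its i-node, and the i-node map entry is the only thing updated in place. This assumes one data block per file and small in-memory arrays standing in for the disk; `lfs_write_file`, `lfs_read_file` and the record layout are hypothetical, and the map is simply kept in memory here.

```c
#include <string.h>

#define MAX_INODES 64
#define LOG_CAPACITY 1024

/* Simplified i-node: log address of the file's single data block. */
struct inode { int data_addr; int size; };

/* Each log slot holds either a data block or an i-node. */
struct log_entry {
    int is_inode;
    struct inode inode;
    char data[64];
};

static struct log_entry log_area[LOG_CAPACITY];
static int tail = 0;

/* i-node map: i-node number -> log address of its newest i-node copy. */
static int imap[MAX_INODES];

static int append(const struct log_entry *e) {
    if (tail == LOG_CAPACITY)
        return -1;
    log_area[tail] = *e;
    return tail++;
}

/* Rewrite file inum: append data, append a new i-node, update the map.
 * The old copies remain in the log as garbage. */
int lfs_write_file(int inum, const char *text) {
    struct log_entry d = {0};
    strncpy(d.data, text, sizeof d.data - 1);
    int daddr = append(&d);
    if (daddr < 0)
        return -1;

    struct log_entry in = {0};
    in.is_inode = 1;
    in.inode.data_addr = daddr;
    in.inode.size = (int)strlen(d.data);
    int iaddr = append(&in);
    if (iaddr < 0)
        return -1;

    imap[inum] = iaddr;                 /* the only in-place update */
    return 0;
}

/* Read (assumes inum has been written): map lookup, then follow the i-node. */
const char *lfs_read_file(int inum) {
    const struct inode *ino = &log_area[imap[inum]].inode;
    return log_area[ino->data_addr].data;
}
```

After a second write to the same file, reads follow the map to the new i-node, while the first version's data is still sitting, unreferenced, at the start of the log.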

What Happens When the Disk Gets Full?
• Partial solution: manage the disk in segments and thread the log through whichever segments are free
  • Basically, a linked list of segments
  • But, this re-introduces seeks!

Segment Cleaner
• Goal: make scattered segments contiguous again
• Approach:
  • Read a segment
  • Write live data to the end of the log
  • Presto: the segment is now clean
• This is very expensive
  • Each live byte is read and written
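The cleaning approach can be sketched as follows. The sketch assumes each block in the log records which logical block it holds (a hypothetical integer `key`, standing in for an <i-node, offset> pair; Sprite LFS kept this identity in per-segment summary blocks), and a block is live only if it is still the newest copy of that key. `log_write`, `clean_segment` and the sizes are hypothetical.

```c
#include <stdbool.h>

#define NSEGS 8
#define SEG_BLOCKS 4
#define NKEYS 32

struct block { int key; };

static struct block disk[NSEGS][SEG_BLOCKS];
static bool in_use[NSEGS];              /* false = clean segment */
static int where[NKEYS];                /* key -> address of newest copy */
static int tail_seg = 0, tail_slot = 0; /* next append position */
static int blocks_read = 0, blocks_written = 0;

/* Move the tail to the next clean segment, or -1 if there is none. */
static void advance_tail(void) {
    tail_slot = 0;
    for (int k = 1; k <= NSEGS; k++) {
        int s = (tail_seg + k) % NSEGS;
        if (!in_use[s]) { tail_seg = s; return; }
    }
    tail_seg = -1;
}

/* Append a block for key; any older copy becomes garbage. */
int log_write(int key) {
    if (tail_seg < 0)
        return -1;
    int addr = tail_seg * SEG_BLOCKS + tail_slot;
    in_use[tail_seg] = true;
    disk[tail_seg][tail_slot].key = key;
    where[key] = addr;
    if (++tail_slot == SEG_BLOCKS)
        advance_tail();
    return addr;
}

/* Read every block of seg, copy live ones to the tail, then mark seg clean.
 * Returns the number of live blocks copied, or -1 if the log ran out. */
int clean_segment(int seg) {
    if (seg == tail_seg || !in_use[seg])
        return 0;
    int live = 0;
    for (int i = 0; i < SEG_BLOCKS; i++) {
        blocks_read++;                              /* whole segment is read */
        int key = disk[seg][i].key;
        if (where[key] == seg * SEG_BLOCKS + i) {   /* still the newest copy? */
            if (log_write(key) < 0)
                return -1;
            blocks_written++;                       /* live data written again */
            live++;
        }
    }
    in_use[seg] = false;
    if (tail_seg < 0) { tail_seg = seg; tail_slot = 0; }
    return live;
}
```

Cleaning a segment in which three of four blocks have been overwritten reads four blocks and rewrites one; the fuller the victim segment, the more of its contents must be copied, which is why victim selection matters.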

LFS Analysis
• For reads, LFS and a traditional FS are largely equivalent
• LFS has better performance for small writes and meta-data operations
• The LFS cleaner has a large impact on performance
  • How important is this?
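One way to quantify the cleaner's impact is the write-cost model from the original Sprite LFS work: count the bytes moved per byte of new data, assuming the cleaner reads a whole segment of size N whose live fraction is u (0 < u < 1), rewrites the live part, and the freed space is then filled with new data.

```latex
\begin{aligned}
\text{bytes moved} &= \underbrace{N}_{\text{read segment}}
                    + \underbrace{uN}_{\text{rewrite live data}}
                    + \underbrace{(1-u)N}_{\text{write new data}} = 2N \\[4pt]
\text{write cost} &= \frac{2N}{(1-u)N} = \frac{2}{1-u}
\end{aligned}
```

For example, cleaning segments that are half live (u = 0.5) gives a write cost of 4, and at u = 0.8 it rises to 10. An empty segment (u = 0) need not be read at all, so its cost is 1. The model ignores seek and rotational delay, so it says how much extra data is transferred, not exactly how much time is spent.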

LFS in Practice
• LFS has been implemented, but is not widely used
• Reasons?
  • Assumptions about read behavior were not valid
    • Reads have not gone away
  • Performance improvements were not sufficient to offset the increased complexity and higher variability
• LFS comeback?
  • See Jim Gray’s article
