AN IMPLEMENTATION OF A LOG-STRUCTURED FILE SYSTEM FOR UNIX

Margo Seltzer, Harvard U. • Keith Bostic, U. C. Berkeley • Marshall Kirk McKusick, U. C. Berkeley • Carl Staelin, HP Labs

Overview • Paper presents a redesign and implementation of the Sprite LFS • BSD-LFS is • Faster than the conventional UNIX FFS (the “fast” file system of the early 80’s) • Not as fast as an enhanced version of FFS with read and write clustering

Historical Perspective • Early UNIX FS used small block sizes and did not try to optimize block placement • The UNIX FFS • Increased block sizes • Added cylinder groups • Incorporated rotational disk positioning to reduce delays when accessing sequential blocks

Limitations of FFS (I) • Synchronous file deletion and creation • Makes file system recoverable after a crash • Same result can be achieved through NVRAM hardware or logging software

Limitations of FFS (II) • Seek times between I/O requests for different files • Hurt performance most when the vast majority of files are small • FFS does not address the problem

Log-Structured File Systems • Attempt to address both limitations of FFS • Store all data in a single, continuous log • Optimized for • Writes • Files written in their entirety over a short period of time • Access to files that were created or modified at the same time

General Organization • Disk is partitioned into segments • Writes are always sequential within a segment • Segment cleaner maintains a pool of empty (“clean”) segments through disk compaction • “Live” data existing in a set of segments is regrouped into a smaller subset of segments
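The organization above can be sketched as a toy model. The assumptions are fixed-size segments counted in blocks, blocks identified by name only, and a dictionary that records where each block's live copy sits. `SegmentedLog` and `clean_segments()` are hypothetical names and do not come from BSD-LFS. Rewriting a block leaves a dead copy behind, and cleaning copies the remaining live blocks to the head of the log before it returns the victims to the clean pool.

```python
class SegmentedLog:
    """Toy LFS disk: fixed-size segments, each an append-only list of block IDs."""

    def __init__(self, num_segments, blocks_per_segment):
        self.blocks_per_segment = blocks_per_segment
        self.segments = [[] for _ in range(num_segments)]
        self.clean = list(range(1, num_segments))  # pool of empty ("clean") segments
        self.current = 0                           # segment currently being appended to
        self.location = {}                         # block ID -> (segment, offset) of live copy

    def write(self, block_id):
        """Append sequentially; rewriting a block leaves its old copy behind as dead data."""
        if len(self.segments[self.current]) == self.blocks_per_segment:
            if not self.clean:
                raise RuntimeError("no clean segments left")
            self.current = self.clean.pop(0)
        seg = self.segments[self.current]
        seg.append(block_id)
        self.location[block_id] = (self.current, len(seg) - 1)

    def live_blocks(self, s):
        """Blocks in segment s whose newest copy is still the one stored there."""
        return [b for i, b in enumerate(self.segments[s]) if self.location.get(b) == (s, i)]

    def clean_segments(self, victims):
        """Regroup live data from `victims` at the log head, then mark the victims clean."""
        assert self.current not in victims, "never clean the segment being written"
        live = [b for s in victims for b in self.live_blocks(s)]
        for b in live:
            self.write(b)  # copy live data forward before freeing its old segment
        for s in victims:
            self.segments[s] = []
            self.clean.append(s)
```

For example, writing `a b a c d` into 2-block segments leaves only `b` live in segment 0. Cleaning that segment moves one block and frees the whole segment.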


LFS Data Structures • Superblock: • Same function as the one used by FFS • I-node map: • Maps i-node numbers to disk addresses • Segment usage table: • Shows the number of live bytes in each segment and the segment's last modification time • Checkpoints: • Created every time the system does a sync()
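The following is a minimal sketch of how these structures interact, under a few assumptions. Disk addresses are modeled as `(segment, offset)` pairs, every i-node has the same on-disk size (`INODE_BYTES`, assumed here), and a checkpoint is an in-memory snapshot. The class and field names are hypothetical. Writing a new copy of an i-node updates the i-node map and moves its live bytes from the old segment to the new one. A later `sync()` does not disturb the earlier checkpoint.

```python
import copy
from dataclasses import dataclass, field

INODE_BYTES = 128  # assumption: fixed on-disk i-node size, used for live-byte accounting


@dataclass
class SegUsage:
    live_bytes: int = 0  # bytes in this segment still referenced by the file system
    last_mod: int = 0    # time of the most recent write into this segment


@dataclass
class LFSMetadata:
    imap: dict = field(default_factory=dict)        # i-node number -> (segment, offset)
    usage: dict = field(default_factory=dict)       # segment number -> SegUsage
    checkpoint: dict = field(default_factory=dict)  # last snapshot taken by sync()

    def write_inode(self, ino, seg, offset, now):
        """Log a new copy of i-node `ino`; its previous copy stops being live."""
        old = self.imap.get(ino)
        if old is not None:
            self.usage[old[0]].live_bytes -= INODE_BYTES
        u = self.usage.setdefault(seg, SegUsage())
        u.live_bytes += INODE_BYTES
        u.last_mod = now
        self.imap[ino] = (seg, offset)

    def sync(self, now):
        """Checkpoint: snapshot the i-node map and segment usage table."""
        self.checkpoint = {
            "time": now,
            "imap": dict(self.imap),
            "usage": copy.deepcopy(self.usage),
        }
```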

Limitations of Sprite LFS • Recovery does not verify the consistency of the file system directory structure • LFS consumes “excessive amounts” of main memory [by 1993 standards] • Write requests are successful even if there is insufficient disk space • Segment validation is hardware dependent • All file systems use a single cleaner and a single cleaning policy • No measure of the cleaner overhead

Recovery (I) • Two major aspects • Bringing the file system to a physically consistent state • Verifying the logical structure of the file system • FFS achieves both goals through fsck • Rebuilds the whole file system • Verifies the directory structure and all block pointers

Recovery (II) • Sprite LFS uses a two-step recovery process: • First initializes all file structures from the most recent checkpoint • Then “rolls forward” to incorporate all subsequent modifications • Done by reading each segment written after the last checkpoint, in time order
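The two steps can be sketched as follows. The sketch assumes each on-disk segment carries a write timestamp and a map of the i-node addresses it contains; these dictionary layouts are hypothetical and are not the Sprite LFS on-disk format. Segments older than the checkpoint are skipped, and newer ones are applied oldest first, so the latest copy of each i-node wins.

```python
def recover(checkpoint, segments):
    """Two-step recovery sketch.

    checkpoint: {"time": t, "imap": {ino: addr}}
    segments:   list of {"time": t, "inodes": {ino: addr}} as found on disk
    """
    # Step 1: initialize in-memory state from the most recent checkpoint.
    imap = dict(checkpoint["imap"])
    # Step 2: roll forward through segments written after the checkpoint, oldest first.
    newer = [s for s in segments if s["time"] > checkpoint["time"]]
    for seg in sorted(newer, key=lambda s: s["time"]):
        imap.update(seg["inodes"])
    return imap
```

Nothing in this loop checks directory consistency. It only restores the most recent i-node locations.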

Recovery (III) • Standard LFS recovery does not verify the directory structure • Weakness to be addressed in BSD-LFS

Memory Consumption • Sprite LFS reserves “large amounts” of main memory including four half-megabyte segments and many buffers • BSD-LFS: • Does not use special staging buffers • Does not reserve two read-only segments that can be reclaimed without any I/O • Implements cleaner as a user-level process

Block Accounting • Sprite LFS maintains a count of disk blocks available for physical writing • Blocks written to the cache but not yet written to disk do not affect that count • What if a block is “successfully” written to the cache but the disk becomes full before the block is actually written? • BSD-LFS keeps a separate count of disk blocks that are not yet committed to any dirty block in the cache
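The second counter can be illustrated with a small sketch. The class and method names are hypothetical, and the model ignores overwrites of blocks that already have disk space. A write into the cache must reserve uncommitted blocks up front, so it fails with `ENOSPC` when the space is already promised. A single physical-free count would still read 10 at that moment and would accept the write.

```python
import errno


class BlockAccounting:
    """Sketch of the two counters; the names are illustrative, not BSD-LFS identifiers."""

    def __init__(self, free_blocks):
        self.physical_free = free_blocks  # blocks not yet written to on disk
        self.uncommitted = free_blocks    # blocks not yet promised to any dirty cached block

    def cache_write(self, nblocks):
        """Dirtying new blocks reserves space now, so a full disk fails the write immediately."""
        if nblocks > self.uncommitted:
            raise OSError(errno.ENOSPC, "no space left for dirty blocks")
        self.uncommitted -= nblocks

    def flush(self, nblocks):
        """Writing reserved dirty blocks to the log consumes physical space."""
        self.physical_free -= nblocks
```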

Segment Structure (I) • Sprite LFS places segment summary blocks at the end of the segment • The write containing the segment summary validates the whole segment • Makes two incorrect assumptions • Controller will not reorder write requests • Disk will always write the contents of a buffer in the order presented

Segment Structure (II) • BSD-LFS • Does not make these assumptions • Segment blocks can be written in any order • Segment summary is in front of each partial segment and contains a checksum of four bytes of every block in the partial segment • Partial segments constitute the atomic recovery units of BSD-LFS
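The following sketch shows an order-independent validity check. It samples the first four bytes of every block and stores a checksum in a summary. The summary layout here (block count plus checksum as little-endian 32-bit integers) is hypothetical, and CRC-32 from `zlib` stands in for whatever checksum function BSD-LFS actually uses. Validity depends only on which contents reached the disk, not on the order in which the controller wrote them.

```python
import struct
import zlib


def summary_checksum(blocks):
    """Checksum over the first four bytes of every block in the partial segment."""
    return zlib.crc32(b"".join(block[:4] for block in blocks))


def make_summary(blocks):
    """Hypothetical summary layout: block count and checksum as two little-endian u32s."""
    return struct.pack("<II", len(blocks), summary_checksum(blocks))


def partial_segment_valid(summary, blocks_on_disk):
    """Recovery check: valid only if every block carries the contents the summary expects,
    regardless of the order in which the disk wrote them."""
    count, checksum = struct.unpack("<II", summary)
    return count == len(blocks_on_disk) and checksum == summary_checksum(blocks_on_disk)
```

Sampling only four bytes per block keeps the check cheap. The trade-off is that a stale block whose first four bytes happen to match would go undetected.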

File System Verification • BSD-LFS offers two recovery strategies • Quick roll forward from the last checkpoint • Complete consistency check of the file system • Recovers lost or corrupted data • Same functionality as FFS fsck • Takes a long time to run • Can be run in the background

The Cleaner • BSD-LFS makes it possible to implement the cleaner as a user process • Allows for multiple cleaning policies • Makes it easier to experiment with new policies
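The benefit of a user-level cleaner is that the policy becomes an ordinary, swappable function. The sketch below assumes each segment is described by its live fraction `u` and last write time `mtime`, taken from the segment usage table. `pick_victims` and the dictionary layout are hypothetical. The two policies are a greedy one (emptiest first) and the cost-benefit ratio (1 - u) · age / (1 + u) from the Sprite LFS work.

```python
def greedy(seg, now):
    """Score = fraction of the segment that cleaning would reclaim; emptiest first."""
    return 1.0 - seg["u"]


def cost_benefit(seg, now):
    """Sprite LFS cost-benefit ratio: (1 - u) * age / (1 + u)."""
    age = now - seg["mtime"]
    return (1.0 - seg["u"]) * age / (1.0 + seg["u"])


def pick_victims(segments, policy, now, n):
    """Rank segments by the chosen policy and return the ids of the n best candidates.
    A user-level cleaner can change `policy` without modifying the kernel."""
    ranked = sorted(segments, key=lambda s: policy(s, now), reverse=True)
    return [s["id"] for s in ranked[:n]]
```

With one young, 30%-full segment and one old, 60%-full segment, the greedy policy picks the young one. Cost-benefit picks the old, cold one, because its live data is unlikely to change soon.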

Implementation Issues • BSD-LFS uses on-disk data structures that are nearly identical to those used by FFS • Existing performance tools can continue to function with only minor modification • Makes the system easier to implement and maintain • Two types of operations • VFS operations affect the whole file system • Vnode operations affect individual files

More Implementation Issues • BSD-LFS does not implement block fragments • Fragments are less needed, since block sizes could be smaller • Still want large blocks to keep the metadata-to-data ratio low • BSD-LFS should (but does not yet) allocate progressively larger blocks

The Buffer Cache (I) • Had to modify the FFS buffer cache • Cannot assume that cache blocks can be flushed one at a time • Would destroy any performance advantage of LFS • LFS may need extra memory to write modified metadata and partial segment summary blocks

The Buffer Cache (II) • Cache blocks do not have a disk address until they are written to the disk • Violates the assumption that all blocks have disk addresses • Cannot use disk addresses to access indirect blocks • BSD-LFS numbers metadata blocks with negative logical block numbers
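The following is a simplified sketch of why negative numbering helps. If the cache is keyed by `(inode, logical block number)` instead of by disk address, a dirty indirect block can be cached before the log assigns it an address. The numbering rule in `indirect_lbn` is hypothetical: it gives each run of `NINDIR` data blocks past the direct pointers its own negative number, which is not the exact BSD scheme. `NINDIR` assumes 4 KB blocks with 4-byte addresses.

```python
NDADDR = 12    # direct block pointers per i-node, as in FFS
NINDIR = 1024  # assumption: pointers per indirect block (4 KB blocks, 4-byte addresses)


def indirect_lbn(lbn):
    """Hypothetical numbering: negative logical block number of the indirect block
    that maps data block `lbn` (data blocks always use lbn >= 0)."""
    if lbn < NDADDR:
        return None  # reached through a direct pointer in the i-node
    return -(1 + (lbn - NDADDR) // NINDIR)


class BufferCache:
    def __init__(self):
        self.bufs = {}  # (ino, lbn) -> {"data": bytes, "daddr": None until written}

    def getblk(self, ino, lbn):
        """Look up or create a buffer by logical name; no disk address is required."""
        return self.bufs.setdefault((ino, lbn), {"data": b"", "daddr": None})

    def assign_addresses(self, next_daddr):
        """At segment-write time, give each unaddressed buffer the next log address."""
        for key in sorted(self.bufs):
            buf = self.bufs[key]
            if buf["daddr"] is None:
                buf["daddr"] = next_daddr
                next_daddr += 1
        return next_daddr
```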

The IFILE • Sprite-LFS maintained the i-node map and segment usage table as kernel data structures written to disk at checkpoint time • BSD-LFS places both data structures in a read-only file visible in the file system • Allows unlimited number of i-nodes • Cleaner can be migrated into user space • I-node map also contains a list of free i-nodes
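This sketch shows how keeping the i-node map in a file allows an unbounded number of i-nodes while still threading a free list through the map entries. Several assumptions apply. Entries are Python dictionaries rather than on-disk records, the file grows one entry at a time (a real implementation would grow by whole blocks), and `IFile`, `alloc`, and `free` are hypothetical names.

```python
UNASSIGNED = -1  # placeholder disk address for an i-node with no copy on disk


class IFile:
    """Toy IFILE: i-node map entries plus a free-list head, all stored as file data."""

    def __init__(self):
        self.entries = []      # index = i-node number; entry = {"daddr", "nextfree"}
        self.free_head = None  # first free i-node number, or None

    def alloc(self):
        if self.free_head is None:
            # No free entry: extend the file, so the i-node count is not fixed in advance.
            self.entries.append({"daddr": UNASSIGNED, "nextfree": None})
            return len(self.entries) - 1
        ino = self.free_head
        self.free_head = self.entries[ino]["nextfree"]
        self.entries[ino]["nextfree"] = None
        return ino

    def free(self, ino):
        """Push the i-node onto the free list kept inside the map itself."""
        self.entries[ino] = {"daddr": UNASSIGNED, "nextfree": self.free_head}
        self.free_head = ino
```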

Directory Operations (I) • BSD-LFS does not retain synchronous behavior of directory operations (create, link, mkdir, …) • Sprite-LFS maintains ordering of directory operations by maintaining a directory operation log inside the file system log • Before any directory updates are written to disk, it writes a log entry describing that operation

Directory Operations (II) • BSD-LFS has a unit of atomicity: the partial segment • It does not have a mechanism that guarantees that all i-nodes involved in a directory operation will fit into a single partial segment • BSD-LFS allows operations to span partial segments

Directory Operations (III) • Introduces a new recovery restriction • Cannot roll forward a partial segment that has an unfinished directory operation if the partial segment that completes the directory operation did not make it to disk (segment batching)
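The restriction can be sketched as a roll-forward loop. The loop holds back partial segments from the start of an unfinished directory operation until the partial segment that completes it is found. The per-partial flags `"open"` and `"close"` are hypothetical labels, not BSD-LFS on-disk flags. Once a batch is open, every later partial is held as well, so updates are still replayed in log order. If the log ends before the closing partial, the held batch is dropped.

```python
def roll_forward(partials):
    """Replay partial segments in log order.

    Each partial is {"updates": {...}, "dirop": None | "open" | "close"} (hypothetical flags).
    """
    state, held = {}, []
    for ps in partials:
        if held or ps["dirop"] == "open":
            held.append(ps)
            if ps["dirop"] != "close":
                continue
            # The directory operation is complete: replay the whole batch in order.
            for h in held:
                state.update(h["updates"])
            held = []
        else:
            state.update(ps["updates"])
    return state  # anything still in `held` belongs to an unfinished dirop and is skipped
```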

COMPARISON • BSD-LFS was found to perform • Better than the 4BSD FFS in a variety of benchmarks • Not significantly worse than FFS in any test • EFS, a version of FFS with read and write clustering, was found to provide comparable and sometimes superior performance to BSD-LFS

EFS • Extended version of FFS • Provides extent-based file system behavior • Parameter maxcontig specifies how many logically sequential disk blocks should be allocated contiguously • Large maxcontig is same as track allocation • EFS accumulates sequential dirty buffers in the cache before writing them as a cluster
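Cluster formation can be sketched as follows. The sketch assumes that contiguous allocation makes logically sequential blocks physically adjacent, so each run becomes one I/O. It groups sorted dirty logical block numbers into runs of at most `maxcontig` blocks. `build_clusters` is a hypothetical name, not an FFS routine.

```python
def build_clusters(dirty_lbns, maxcontig):
    """Group logically sequential dirty blocks into write clusters of at most maxcontig blocks."""
    clusters, run = [], []
    for lbn in sorted(set(dirty_lbns)):
        if run and lbn == run[-1] + 1 and len(run) < maxcontig:
            run.append(lbn)  # extends the current sequential run
        else:
            if run:
                clusters.append(run)  # gap or size limit reached: emit one cluster write
            run = [lbn]
    if run:
        clusters.append(run)
    return clusters
```

With `maxcontig = 4`, the dirty blocks 0 through 4, 7, 8 and 20 become four writes: `[0..3]`, `[4]`, `[7, 8]` and `[20]`.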

Multi-user Andrew Benchmark • We measure execution times • LFS performs well in phases 1 and 2 (mostly writes) and poorly in phase 5 (random I/O)

CONCLUSIONS • An LFS operates best when it can write out many dirty buffers at once • Requires more buffer space in main memory • Delayed allocation in BSD-LFS complicates accounting of available free space • Issue was not correctly handled by Sprite-LFS • Cleaner might sometimes consume more disk space than it frees • Must reserve additional disk space
