Jeff's Filesystem Papers Review Part II. Review of "The Design and Implementation of a Log-Structured File System"
The Design and Implementation of a Log-Structured File System • By Mendel Rosenblum and John K. Ousterhout • UC Berkeley • Ousterhout introduced the idea in an earlier paper with a few different configurations; this paper describes the concept as it exists after the authors implemented it in the Sprite operating system (as Sprite LFS). • Empirical measurements were taken after the implementation, and they support the claim that LFS is a good idea. This presentation is an academic review; the ideas presented are either quotes or paraphrases of the reviewed document.
Intro • Why? (problem statement) • CPUs are getting faster • Memory is getting faster • Disks are not • Amdahl's Law • Bottlenecks move around: as the CPU gets faster, the bottleneck moves to memory or disk, etc. • We need to find a way to use disks more efficiently • Assumption • Caching files in RAM improves the READ performance of a filesystem significantly more than its WRITE performance. • Therefore disk activity will become more write-centric in the future.
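To make the Amdahl's Law point concrete, here is a small worked example (the 90%/10% split and the 10x figure are hypothetical numbers, not from the paper):

```python
# Amdahl's Law: overall speedup when only a fraction p of the work is sped up by factor s.
def amdahl_speedup(p, s):
    return 1.0 / ((1.0 - p) + p / s)

# Hypothetical workload: 90% of the time is CPU work that gets 10x faster,
# the remaining 10% is disk I/O that does not improve at all.
print(amdahl_speedup(p=0.9, s=10))   # ~5.26x overall, not 10x -- the disk becomes the bottleneck
```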
2.1 • Disk improvement is in the areas of price/capacity and physical size, not in seek time • Even if transfer bandwidth improves, seek time will still be the killer • Memory is getting cheaper/faster... therefore use a memory cache to ease the disk bottleneck. • Caches • What is the difference between a cache and a buffer? • A buffer sits between two devices of different speeds; a cache speeds up subsequent accesses to the same or nearby data. • Caches can reorder writes so they go to disk more efficiently.
2.2 Workloads • 3 classes of file access patterns (from a different paper) • scientific processing - read and write large files sequentially • transaction processing - many simultaneous requests for small chunks of data • engineering/office applications - access a large number of small files in sequence • Engineering/office is the killer, and that is what LFS is designed for.
2.3 Problems with Existing FSs • UNIX FFS (Fast File System... also Berkeley-developed) • puts file data sequentially on disk • inode data is at a fixed location on disk • directory data is at yet another location • A total of 5 seeks to create a new file (bad). • File data is written asynchronously so that the program can continue without waiting for the FS, BUT • metadata is written synchronously, so the program blocks when manipulating things like inode data.
3 LFS • Buffer a sequence of FS changes in the file cache and then write them sequentially to disk in one chunk to avoid seeks. • Essentially all data is merely a log entry. • This creates 2 problems... • How to read from the log • How to keep free space on disk • i.e., if you just keep writing forward forever, you will eventually wrap around at the end of the disk.
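A minimal sketch of that write path, assuming a simplified in-memory segment buffer (the class and method names, and the use of io.BytesIO as the "disk", are illustrative assumptions, not Sprite LFS code):

```python
import io

# Buffer FS changes in memory, then write them to disk as one large
# sequential chunk instead of many small seek-separated writes.
class SegmentBuffer:
    def __init__(self, segment_size, device):
        self.segment_size = segment_size
        self.device = device              # anything with a sequential write() method
        self.pending = []                 # buffered log entries (data blocks, inodes, ...)
        self.pending_bytes = 0

    def append(self, entry: bytes):
        self.pending.append(entry)
        self.pending_bytes += len(entry)
        if self.pending_bytes >= self.segment_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.device.write(b"".join(self.pending))   # one sequential write
            self.pending, self.pending_bytes = [], 0

log = SegmentBuffer(segment_size=512 * 1024, device=io.BytesIO())
log.append(b"file data block...")
log.append(b"updated inode...")
log.flush()
```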
3.1 How to Read • Reads are at the same speed as FFS once the inode is located. • Getting to the inode is what slows down FFS and is where LFS does better. • FFS keeps the inode in a static portion of the disk • unrelated to the physical location of the data. • LFS stores the inode in proximity to its data at the head (end) of the log. • Because of this, another (but much smaller) map of the inodes is needed, the inode map. • It is so small that it is kept in cache all the time, so lookups do not cause extra seeks. • A fixed checkpoint region on disk records where the inode map blocks live.
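A rough sketch of the resulting read path (the `inode_map` dictionary, the example address, and the `disk` object's methods are illustrative stand-ins, not the actual Sprite LFS structures):

```python
# Read path: inode number -> in-memory inode map -> inode in the log -> data block.
# The inode map records where in the log each inode was most recently written.
inode_map = {7: 104_857_600}                  # inode number -> disk address of that inode

def read_block(inumber, block_index, disk):
    inode_addr = inode_map[inumber]           # in-memory lookup, no disk access
    inode = disk.read_inode(inode_addr)       # from here on, same cost as FFS
    return disk.read(inode.block_addrs[block_index])
```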
3.2 Free Space Management • Log wraparound. • Choices: • Don't defragment, just write to the next free block (thread the log) • GC-style: stop everything and copy • Incremental: continuously copy • Solution - Segments • Divide the disk into segments • segment size chosen for optimal usage • a segment is written contiguously, and the disk is compacted segment by segment to avoid fragmentation. • This defragmentation is known as segment cleaning.
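A sketch of the segment layout, with hypothetical sizes (the paper uses large segments, on the order of hundreds of kilobytes to a megabyte, so that seek time is amortized over one big transfer):

```python
# The disk is divided into fixed-size segments; the log always writes into an
# entirely clean segment, sequentially, and threads from segment to segment.
SEGMENT_SIZE = 512 * 1024                     # hypothetical: 512 KB segments
DISK_SIZE = 1 << 30                           # hypothetical: 1 GB disk
NUM_SEGMENTS = DISK_SIZE // SEGMENT_SIZE

clean_segments = set(range(NUM_SEGMENTS))     # segments containing no live data

def allocate_segment():
    # Grab a clean segment for the log to fill; cleaning (next slide) refills this set.
    return clean_segments.pop()
```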
3.3 Segment Cleaning • It should be pretty obvious how to do it • 3 steps • Read a number of non-clean segments into memory • keep only the live (in-use) portion of the data • Write the live data back to disk into clean segments • Other logistical considerations in segment cleaning • update inodes • update fixed structures such as the checkpoint region • remember these are in cache, and as we will see later they are dumped to disk at predetermined intervals. • There is other bookkeeping as well; each segment carries a summary header. Read the paper for details.
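A sketch of those three steps as a loop (the `disk` and `log` objects and their methods are assumed stand-ins for the paper's segment summary machinery):

```python
# The three-step cleaning pass: read dirty segments, keep only live blocks,
# write the live blocks back to the log so the old segments become clean.
def clean(segments_to_clean, disk, log):
    live_blocks = []
    for seg in segments_to_clean:                 # 1. read non-clean segments into memory
        for block in disk.read_segment(seg):
            if disk.is_live(block):               # 2. keep only the live (in-use) data
                live_blocks.append(block)
    log.write_sequentially(live_blocks)           # 3. write live data into clean segments
    disk.mark_clean(segments_to_clean)            # the cleaned segments are reusable
```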
3.4 Segment Cleaning - how to configure • When to do it? • At low priority, or when disk space is needed • the authors chose to clean when disk space is needed, using watermarks. • How many segments to clean at one time? • The more segments cleaned at one time, the more intelligent the cleaning can be and the better organized the disk. • governed by the watermarks chosen above • Which segments to clean? ...coming • Since you can write the data back to disk any way you want, you should write it back in the manner most efficient for its predicted use... coming
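A sketch of the watermark trigger, with hypothetical threshold values (the paper describes starting to clean when the number of clean segments falls below one threshold and stopping once another is exceeded):

```python
# Watermark-driven cleaning: kick in when clean segments run low, clean in
# batches until comfortably above the high watermark.
LOW_WATERMARK = 20          # hypothetical: start cleaning below this many clean segments
HIGH_WATERMARK = 60         # hypothetical: stop cleaning once this many are clean

def maybe_clean(fs):
    if fs.num_clean_segments() < LOW_WATERMARK:
        while fs.num_clean_segments() < HIGH_WATERMARK:
            victims = fs.choose_segments_to_clean()   # selection policy: see 3.5-6
            fs.clean(victims)
```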
3.5-6 Determination of the Segment-Cleaning Configuration • Here the authors turned to empirical studies. • They wrote a simulator and experimented with segment-cleaning configurations to determine a good policy. • Results/Conclusions • differentiate between hot and cold segments based on past history • A hot segment is one that is likely to be written again soon • A cold segment is one that is unlikely to be written again • They came up with a policy called cost-benefit • it cleans cold segments once they are around 75% utilized • but waits until hot segments drop to around 15% utilization before cleaning them • The utilization and the "temperature" of a segment are maintained in an in-memory table (the segment usage table).
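The cost-benefit policy ranks segments by the ratio (1 - u) * age / (1 + u), where u is the segment's utilization and age is the age of its youngest data. A sketch of that selection (the tuple-based segment table here is an illustrative stand-in for the segment usage table):

```python
# Cost-benefit cleaning: reading a candidate segment costs 1 and writing its
# live fraction back costs u, hence the (1 + u) denominator; the benefit is
# the free space gained (1 - u) weighted by how long it will likely stay free (age).
def cost_benefit(utilization, age):
    return (1.0 - utilization) * age / (1.0 + utilization)

def choose_segments_to_clean(segment_table, how_many):
    # segment_table: iterable of (segment_id, utilization, age) tuples.
    ranked = sorted(segment_table, key=lambda s: cost_benefit(s[1], s[2]), reverse=True)
    return [seg_id for seg_id, _, _ in ranked[:how_many]]

# Example: a cold, fairly full segment can beat a hot, emptier one.
print(choose_segments_to_clean([(0, 0.75, 100), (1, 0.15, 2)], how_many=1))   # -> [0]
```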
4 Crash Recovery • FFS • The major problem is that the entire disk must be scanned • the most-recently-written data could be anywhere on disk • LFS • The most-recently-written data is at one known location on disk (the end of the log). • Uses checkpoints and roll-forward to maintain consistency. • Borrowed ideas from database technology.
4.1 Crash Recovery • Checkpoint Region • 2 copies maintained at fixed locations on disk • written to alternately, in case of a crash while updating checkpoint data • At points in time: • I/O is blocked • all cached data is written to the end of the log • checkpoint data from cache is written to disk • I/O is then re-enabled • this could instead be triggered by the amount of data written rather than by elapsed time • note the similarity to GC techniques. • Skipping the roll-forward technique, as it is quite complex and depends on segment header info; it just enhances checkpointing. Read the paper for more detail.
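A sketch of the alternating checkpoint writes (the field names, fixed addresses, and `disk` object are hypothetical; the idea of two regions distinguished by timestamp is from the paper):

```python
import time

# Two checkpoint regions at fixed disk addresses are written in turn, so a crash
# during one update still leaves the other, older checkpoint intact. Recovery
# uses whichever region holds the most recent complete checkpoint.
REGION_A, REGION_B = 0, 8192            # hypothetical fixed addresses
CHECKPOINT_ADDRS = [REGION_A, REGION_B]
next_region = 0

def write_checkpoint(disk, inode_map_blocks, usage_table_blocks, log_tail):
    global next_region
    record = {
        "timestamp": time.time(),
        "inode_map_blocks": inode_map_blocks,        # where the inode map lives in the log
        "segment_usage_blocks": usage_table_blocks,
        "last_segment_written": log_tail,
    }
    disk.write_at(CHECKPOINT_ADDRS[next_region], record)
    next_region = 1 - next_region                    # alternate on every checkpoint

def recover_checkpoint(disk):
    candidates = [disk.read_at(addr) for addr in CHECKPOINT_ADDRS]
    return max((r for r in candidates if r), key=lambda r: r["timestamp"])
```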
5 Empirical Test Results • Comparison to FFS/SunOS • Basically, LFS is significantly better for small files and better than or as good as FFS for large files in all cases except: • large files that were originally written randomly and are later read sequentially. • Crash recovery was not rigorously tested empirically against FFS/SunOS.