A Fast File System for Unix • Presented By: Parang Saraf • Marshall K. McKusick, William N. Joy, Samuel J. Leffler and Robert S. Fabry • Computer Systems Research Group, UCB • CS 5204: Operating Systems, Virginia Tech
About the Paper • Considered one of the most fundamental papers in operating systems • Has been cited around 930 times • Describes a new file system for Unix
Traditional File System • File system developed at Bell Laboratories • A file system is described by its super-block, which records: • Number of data blocks • Maximum number of files • Pointer to the free list (a linked list of all free blocks) • Disk drive is divided into partitions • Each disk partition may contain one file system • A file system never spans multiple partitions
Traditional File System – Inode • Each file has a descriptor associated with it – Inode. • Information includes: • Ownership of the file • Time stamps marking last modification and access time • Array of indices pointing to the data blocks • Direct Blocks – 8 • Indirect Blocks – Singly, Doubly and Triply
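To make the direct/indirect scheme concrete, here is a minimal Python sketch that maps a logical block number to the pointer level used to reach it and computes the largest addressable file. It assumes 512-byte blocks and 4-byte block addresses (so 128 addresses fit in one indirect block) along with the 8 direct pointers from the slide; the names `locate` and `max_file_bytes` are hypothetical.

```python
BLOCK_SIZE = 512                   # bytes per block (traditional file system)
ADDRS_PER_BLOCK = BLOCK_SIZE // 4  # assumed 4-byte addresses: 128 per indirect block
NDIRECT = 8                        # direct pointers held in the inode itself


def locate(lbn: int) -> tuple[str, list[int]]:
    """Map a logical block number to the pointer level that reaches it and
    the index followed at each level of the lookup."""
    if lbn < 0:
        raise ValueError("negative block number")
    if lbn < NDIRECT:
        return "direct", [lbn]
    lbn -= NDIRECT
    span = 1
    for level in ("single", "double", "triple"):
        span *= ADDRS_PER_BLOCK            # blocks reachable through this level
        if lbn < span:
            digits = []
            step = span // ADDRS_PER_BLOCK
            while step >= 1:               # base-128 digits, most significant first
                digits.append(lbn // step)
                lbn %= step
                step //= ADDRS_PER_BLOCK
            return level, digits
        lbn -= span
    raise ValueError("block number beyond the triple indirect range")


def max_file_bytes() -> int:
    n = ADDRS_PER_BLOCK
    return (NDIRECT + n + n**2 + n**3) * BLOCK_SIZE


print(locate(5))                 # ('direct', [5])
print(locate(8 + 128 + 129))     # ('double', [1, 1])
print(max_file_bytes())          # 1082200064
```

Under these assumptions the limit is (8 + 128 + 128^2 + 128^3) × 512 bytes, roughly 1 GB, and every access beyond the first 8 blocks costs at least one extra indirect-block read.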
Traditional File System – Problem • Inode information is segregated from data • Long seek from a file's inode to its data • Files in a single directory are typically not allocated consecutive inode slots • Operations on the inodes of several files in a directory therefore access many non-consecutive blocks of inodes • Sub-optimal allocation of data blocks • Small block size – 512 bytes • Many seeks – the next sequential block is often not on the same cylinder • Limited read-ahead
Old File System • Developed at Berkeley • Increased throughput • Changed the basic block size from 512 bytes to 1024 bytes • Each disk transfer accessed twice as much data • Fewer indirect blocks were needed • Increased reliability • Staged modifications to critical file system information so that they could either be completed or repaired cleanly after a crash
Old File System – Problem • Old file system was still using just 4% of disk bandwidth • Main problem – scrambled free list • Initially ordered for optimal access • Scrambled as files were created and removed • Eventually becomes entirely random – blocks are allocated randomly • A newly created file system provides transfer rates of up to 175 KB/s • The rate deteriorates to 30 KB/s after a few weeks of moderate use • Possible solution – dump, rebuild and restore the file system to remove the fragmentation
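The scrambling effect can be reproduced with a toy model. This sketch assumes the free list behaves as a stack (blocks are taken from the head and freed blocks are pushed back onto it); the class and function names are illustrative, not taken from the original code.

```python
class FreeListFS:
    """Toy model of a free list (hypothetical, not the original implementation).

    alloc() takes blocks from the head; release() pushes freed blocks back
    onto the head, as a singly linked list would.
    """

    def __init__(self, nblocks: int):
        # Freshly made file system: ordered so blocks come out 0, 1, 2, ...
        self.free = list(range(nblocks - 1, -1, -1))

    def alloc(self, n: int) -> list[int]:
        return [self.free.pop() for _ in range(n)]

    def release(self, blocks: list[int]) -> None:
        self.free.extend(blocks)


def discontinuities(blocks: list[int]) -> int:
    """Count places where the next logical block is not the next physical block."""
    return sum(1 for a, b in zip(blocks, blocks[1:]) if b != a + 1)


fs = FreeListFS(64)
a, b, c, d = (fs.alloc(4) for _ in range(4))   # blocks 0-3, 4-7, 8-11, 12-15
print(discontinuities(a))                      # 0
fs.release(b)                                  # delete two of the files
fs.release(d)
new_file = fs.alloc(8)
print(new_file, discontinuities(new_file))     # [15, 14, 13, 12, 7, 6, 5, 4] 7
```

Even in this tiny example, a file created after two deletions is laid out backwards and split across two distant regions, so sequential reads turn into seeks.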
New File System • Each disk drive contains one or more file systems • A File System is described by its super-block, located at the beginning of the disk partition • Super-block is replicated to protect against catastrophic loss • Block size is any power of two >= 4096 bytes • Decided at the time of file system creation and can’t be changed • File Systems can have different block sizes
New File System – Cylinder Groups • Comprise one or more consecutive cylinders • A disk partition is divided into one or more cylinder groups • Each cylinder group has associated book-keeping information: • A redundant copy of the super-block • Space for inodes • A bit map describing available blocks – replaces the free list • Summary information describing the usage of data blocks
New File System – Cylinder Groups • Contain a static number of inodes: • Allocated at file system creation time • Default policy – one inode for each 2048 bytes of space in the cylinder group • Book-keeping information begins at a varying offset from the beginning of the cylinder group • The redundant information spirals down into the pack • Any single track, cylinder or platter can be lost without losing all copies of the super-block
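A small sketch of the two cylinder-group calculations above, under a made-up geometry: the inode count follows the 2048-bytes-per-inode default, while the one-track-per-group shift of the book-keeping area is a simplified assumption about how the spiral is produced. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Geometry:
    # Hypothetical disk geometry, chosen only for illustration.
    bytes_per_track: int
    tracks_per_cylinder: int      # one track per recording surface
    cylinders_per_group: int

    @property
    def group_bytes(self) -> int:
        return self.bytes_per_track * self.tracks_per_cylinder * self.cylinders_per_group


def inodes_per_group(g: Geometry, bytes_per_inode: int = 2048) -> int:
    """Default policy: one inode per 2048 bytes of cylinder group space."""
    return g.group_bytes // bytes_per_inode


def bookkeeping_offset(g: Geometry, cg: int) -> int:
    """Byte offset of group cg's book-keeping area from the start of the group.

    Simplified assumption: each group places it one track further than the
    previous group, wrapping within a cylinder, so successive super-block
    copies land on different tracks (different recording surfaces).
    """
    return (cg % g.tracks_per_cylinder) * g.bytes_per_track


g = Geometry(bytes_per_track=16384, tracks_per_cylinder=4, cylinders_per_group=16)
print(inodes_per_group(g))                               # 512
print([bookkeeping_offset(g, cg) for cg in range(5)])    # [0, 16384, 32768, 49152, 0]
```

With 4 tracks per cylinder, groups 0 to 3 put their super-block copies on four different surfaces, so losing any one surface still leaves copies elsewhere.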
New File System – Key Contributions • Optimizing storage utilization • File System Parameterization • Layout Policies
Optimizing Storage Utilization • New 4096-byte blocks – each transfer moves 4 times as much data as a 1024-byte block • Problem with large blocks: • Wasted space due to small files
Optimizing Storage Utilization • Solution: • Divide a 4096-byte block into 2, 4 or 8 fragments to accommodate small files • Fragment size is specified at the time the file system is created • The block map records free space at the fragment level
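The arithmetic behind fragments can be sketched as follows, assuming only the last partial block of a file is kept in fragments. The helper names are hypothetical; the block size rules come from the slides, while the check that a fragment is at least one 512-byte disk sector is an added assumption.

```python
def layout(size: int, block: int = 4096, frag: int = 1024) -> tuple[int, int]:
    """Return (full blocks, tail fragments) for a file of `size` bytes."""
    full, tail = divmod(size, block)
    frags = -(-tail // frag)          # ceiling division
    return full, frags


def wasted(size: int, block: int = 4096, frag: int = 1024) -> int:
    """Bytes allocated but not used by the file's data."""
    full, frags = layout(size, block, frag)
    return full * block + frags * frag - size


def check_sizes(block: int, frag: int, sector: int = 512) -> None:
    """Block: power of two >= 4096, holding 1 (unfragmented), 2, 4 or 8
    fragments. Fragment >= sector size is an assumption."""
    if block < 4096 or block & (block - 1):
        raise ValueError("block size must be a power of two >= 4096")
    if block % frag or block // frag not in (1, 2, 4, 8):
        raise ValueError("a block must hold 1, 2, 4 or 8 fragments")
    if frag < sector:
        raise ValueError("fragment smaller than a disk sector")


print(layout(11000))                 # (2, 3)
print(wasted(11000))                 # 264
print(wasted(11000, 4096, 4096))     # 1288
```

For example, an 11000-byte file on a 4096/1024 file system takes two full blocks plus three fragments and wastes 264 bytes, while plain 4096-byte blocks would waste 1288 bytes.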
Optimizing Storage Utilization • Free List vs Bitmap
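One advantage of a bitmap over a free list is that free neighbours are visible directly. This hypothetical search looks for a run of free fragments that stays inside a single block, since a file's tail fragments must share one block; representing the map as a list of booleans is a simplification.

```python
def find_free_run(bitmap: list[bool], frags_per_block: int, need: int) -> int:
    """Return the first fragment index where `need` consecutive free
    fragments lie inside one block, or -1 if there is none.

    bitmap[i] is True when fragment i is free (simplified block map).
    """
    if not 1 <= need <= frags_per_block:
        raise ValueError("a run must fit inside one block")
    for block_start in range(0, len(bitmap), frags_per_block):
        block = bitmap[block_start:block_start + frags_per_block]
        run = 0
        for offset, free in enumerate(block):
            run = run + 1 if free else 0     # length of the free run ending here
            if run == need:
                return block_start + offset - need + 1
    return -1


F, T = False, True
bm = [F, T, F, T,   F, T, T, F,   T, T, T, T]   # three blocks of 4 fragments
print(find_free_run(bm, 4, 1))   # 1
print(find_free_run(bm, 4, 2))   # 5
print(find_free_run(bm, 4, 3))   # 8
```

A run that crosses a block boundary (fragments 3 and 4 above, for instance) is deliberately never returned.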
Optimizing Storage Utilization • Space allocation: • Space is allocated when a program does a write system call • Three possible conditions: • Enough space is left in an already allocated block or fragment – the new data is written there • File contains no fragmented blocks – allocate new blocks, and fragments for any remaining partial block • File contains one or more fragments with insufficient space for the new data – a new block (or run of fragments) is allocated, the old fragments are copied into it and the new data is appended
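The three conditions can be expressed as a classification of a write that grows a file. This is a simplified model with hypothetical names: a file occupies full blocks plus fragments for its last partial block, with 4096-byte blocks and 1024-byte fragments by default.

```python
def allocated_bytes(size: int, block: int, frag: int) -> int:
    """Space a file of `size` bytes occupies: full blocks plus tail fragments."""
    full, tail = divmod(size, block)
    return full * block + -(-tail // frag) * frag


def growth_case(old_size: int, new_size: int,
                block: int = 4096, frag: int = 1024) -> int:
    """Classify growing a file from old_size to new_size bytes."""
    if new_size <= allocated_bytes(old_size, block, frag):
        return 1   # fits in an already allocated block or fragment
    if old_size % block == 0:
        return 2   # no fragmented block: new blocks, then fragments for the tail
    return 3       # tail fragments too small: new space found, old fragments copied


print(growth_case(1000, 1024))   # 1
print(growth_case(4096, 5000))   # 2
print(growth_case(1000, 3000))   # 3
```

Case 3 is the costly one: repeatedly extending a file by small writes can copy the same tail data several times before it settles into a full block.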
Optimizing Storage Utilization • Free space reserve • Minimum acceptable percentage of file system blocks that should be free – 10% • Once free space falls to this level, only the system administrator can allocate blocks • Important for the layout policies to be effective • If free space drops to zero, file system throughput tends to be cut in half because the blocks of a file can no longer be localized
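A minimal sketch of the reserve check, assuming a 10% reserve and a simple administrator flag; the function name and signature are hypothetical.

```python
def may_allocate(free_blocks: int, total_blocks: int, is_admin: bool,
                 minfree: float = 0.10) -> bool:
    """Ordinary users may allocate only while free space exceeds the
    reserve; the administrator may also use the reserved blocks."""
    if free_blocks <= 0:
        return False                       # nothing left for anyone
    reserve = int(total_blocks * minfree)  # blocks held back for layout policies
    return is_admin or free_blocks > reserve


print(may_allocate(200, 1000, is_admin=False))   # True
print(may_allocate(100, 1000, is_admin=False))   # False
print(may_allocate(100, 1000, is_admin=True))    # True
```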
Optimizing Storage Utilization • Wasted space comparison • Space wasted by the 4096/1024-byte new file system is about the same as by the 1024-byte old file system • New file system uses less space for indexing large files • Uses the same amount of space for small files • The free space reserve should also be counted as wasted space
File System Parameterization • Optimal block allocation based on hardware parameters: • Speed of the processor • Hardware support for mass storage transfers • Characteristics of the mass storage devices • Tries to allocate new blocks on the same cylinder as the previous block of the file • Block allocation depends on whether the processor has an input/output channel or not
File System Parameterization • Accessing which data is faster? • Depends on whether the processor has an I/O channel or not
File System Parameterization • Rotationally Optimal Blocks • Processors without I/O channels must field an interrupt and then prepare for a new disk transfer • The disk keeps rotating during this time • Blocks are placed so that this rotation is accounted for before the start of the next disk transfer • Cylinder group summary information includes a count of available blocks at each of 8 rotational positions • The super-block contains a vector of lists called rotational layout tables – used by the system when allocating new blocks
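Rotationally optimal placement reduces to simple arithmetic. This sketch, with hypothetical parameter names and example numbers, computes how many blocks to leave between consecutive blocks of a file so the next one arrives under the head once the processor has set up the next transfer, and buckets blocks into the 8 rotational positions kept in the summary information.

```python
import math


def skip_blocks(rpm: int, blocks_per_track: int, delay_ms: float) -> int:
    """Blocks to leave empty between consecutive blocks of a file, given the
    time `delay_ms` the processor needs before it can start the next transfer."""
    ms_per_rev = 60_000 / rpm
    ms_per_block = ms_per_rev / blocks_per_track
    return math.ceil(delay_ms / ms_per_block)


def rotational_position(block_in_track: int, blocks_per_track: int,
                        npos: int = 8) -> int:
    """Bucket a block's place on its track into one of `npos` positions."""
    return block_in_track * npos // blocks_per_track


print(skip_blocks(3600, 8, 4.0))        # 2
print(rotational_position(5, 16))       # 2
```

For a 3600 rpm disk with 8 blocks per track, a 4 ms setup delay means skipping 2 blocks; with no setup delay (as with an I/O channel) the gap drops to zero.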
Layout Policies • Layout policies divided into two distinct parts: • Global Policies • Local Allocation Routines • Two allocable resources: • Inodes • Data Blocks
Layout Policies • Global Policies • Use file-system-wide summary information to make decisions about the placement of new inodes and data blocks • Try to localize data that is concurrently accessed while spreading out unrelated data • Inodes: • Places the inodes of all files in a directory in the same cylinder group • Places a new directory in a cylinder group that has a greater than average number of free inodes and the smallest number of directories already in it – spreads directories, and hence their files, across the disk
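The directory-placement rule can be sketched as a selection over per-group summary counts. The data structure and the fallback for the case where no group is above average are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class CylinderGroup:
    free_inodes: int
    directories: int


def pick_group_for_new_directory(groups: list[CylinderGroup]) -> int:
    """Among groups with more free inodes than average, choose the one with
    the fewest directories (ties go to the lowest index)."""
    avg = sum(g.free_inodes for g in groups) / len(groups)
    candidates = [i for i, g in enumerate(groups) if g.free_inodes > avg]
    if not candidates:                       # assumed fallback: all groups equal
        candidates = list(range(len(groups)))
    return min(candidates, key=lambda i: groups[i].directories)


groups = [CylinderGroup(100, 1), CylinderGroup(300, 5),
          CylinderGroup(250, 2), CylinderGroup(50, 0)]
print(pick_group_for_new_directory(groups))   # 2
```

Group 3 has no directories but too few free inodes, so it is skipped; this keeps room for the files that will later be placed next to the new directory.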
Layout Policies • Global Policies • Data Blocks: • Try to place all data blocks for a file in the same cylinder group • None of the cylinder groups should ever become completely full • Heuristic solution – redirect block allocation to a different cylinder group when a file exceeds 48 KB and at every megabyte thereafter • The cost of one long seek per megabyte is small • The new cylinder group is chosen from those with a greater than average number of free blocks left • Finally, the local allocation routines are called to allocate the block
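A sketch of the data-block redirection heuristic. It reads "every megabyte thereafter" as each further megabyte past the first 48 KB, and picks the next above-average group after the current one; both that reading and the wrap-around search order are assumptions, and the names are hypothetical.

```python
KB = 1024
MB = 1024 * KB


def section(offset: int, first: int = 48 * KB, step: int = 1 * MB) -> int:
    """Section of the file a byte offset falls in; each new section is a
    point where allocation moves to another cylinder group."""
    if offset < first:
        return 0
    return 1 + (offset - first) // step


def pick_group_for_section(free_blocks: list[int], current: int) -> int:
    """Next group after `current` (wrapping) with above-average free blocks."""
    n = len(free_blocks)
    avg = sum(free_blocks) / n
    for k in range(1, n + 1):
        cg = (current + k) % n
        if free_blocks[cg] > avg:
            return cg
    return current                 # all groups equal: stay put


print(section(48 * KB - 1), section(48 * KB), section(48 * KB + MB))   # 0 1 2
print(pick_group_for_section([10, 50, 5, 60], current=1))              # 3
```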
Layout Policies • Local Allocation Routines • Allocate a free block as requested by the global layout policies • Use a four-level allocation strategy • First level – use the next free block that is rotationally closest to the requested block on the same cylinder
Layout Policies • Local Allocation Routines • Second level – if there are no free blocks on the same cylinder, a free block in the same cylinder group is selected
Layout Policies • Local Allocation Routines • Third level – if the cylinder group is full, hash the cylinder group number with a quadratic hash function to find another cylinder group in which to look for a free block • Fourth level – if the hash fails, do an exhaustive search of all cylinder groups • Quadratic hash • Used because of its speed in finding unused slots in nearly full hash tables • File systems parameterized to maintain 10% free space rarely need it
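The four-level search can be sketched as below. `try_group` is a hypothetical callback standing in for the within-group search (levels 1 and 2); the probe sequence used for the quadratic rehash (adding 1, 2, 4, ... to the group number) is an assumption for illustration.

```python
from typing import Callable, Optional


def hash_alloc(ncg: int, preferred: int,
               try_group: Callable[[int], Optional[int]]) -> Optional[int]:
    """Find a free block, starting in the preferred cylinder group.

    try_group(cg) returns a block number or None.
    """
    result = try_group(preferred)           # levels 1 and 2
    if result is not None:
        return result
    cg, i = preferred, 1
    while i < ncg:                          # level 3: quadratic rehash
        cg = (cg + i) % ncg
        result = try_group(cg)
        if result is not None:
            return result
        i *= 2
    for k in range(1, ncg):                 # level 4: exhaustive search
        result = try_group((preferred + k) % ncg)
        if result is not None:
            return result
    return None


visited = []

def only_group_5(cg: int) -> Optional[int]:
    visited.append(cg)
    return 500 if cg == 5 else None

print(hash_alloc(8, 0, only_group_5), visited)   # 500 [0, 1, 3, 7, 1, 2, 3, 4, 5]
```

Because the probe steps grow geometrically, a nearly full file system reaches distant groups after only a few probes, and the exhaustive pass guarantees a block is found if one exists anywhere.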
Performance • Measured Throughput
Performance • List directory command performance • For large directories containing many directories, disk accesses for inodes are cut by a factor of two • For large directories containing only files, disk accesses for inodes are cut by a factor of eight • Both reads and writes are faster in the new file system • Because larger block sizes are used • Allocation overhead is higher, but the cost per byte allocated is about the same • In the new file system, the reading rate is always at least as fast as the writing rate • Writes are slower with 4096-byte blocks than with 8192-byte blocks • In the old file system, writing was 50% faster than reading
New File System – Limitations • Throughput is limited by the memory-to-memory copy operations required to move data from disk buffers in the system's address space to data buffers in the user's address space • Possible remedy – align the buffers in both address spaces so the transfer can be done without copying • Blocks are allocated to a file one at a time • Possible remedy – pre-allocate several blocks at once and release the unused ones when the file is closed
Functional Enhancements • Long File Name • File Locking • Symbolic Links • Rename • Quotas
Long File Name • Maximum length of a file name is 255 characters • Directories are allocated in 512-byte units called chunks • Chunks are broken into variable-length directory entries • Each entry contains the information needed to map a file name to its inode • The first three fields are fixed length – inode number, size of the entry and length of the file name
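Directory entry sizes and their packing into chunks can be sketched as follows. The field widths (4-byte inode number, 2-byte entry size, 2-byte name length) and the padding of the NUL-terminated name to a 4-byte boundary are assumptions about the on-disk layout; the function names are hypothetical.

```python
def dirent_size(name: str) -> int:
    """Bytes for one entry: three fixed-length fields (assumed 4 + 2 + 2
    bytes) plus the NUL-terminated name padded to a multiple of 4."""
    n = len(name.encode())
    if not 1 <= n <= 255:
        raise ValueError("names are 1 to 255 bytes long")
    header = 4 + 2 + 2
    return header + ((n + 1 + 3) & ~3)


def pack_into_chunks(names: list[str], chunk: int = 512) -> list[list[str]]:
    """Place entries in order, never letting one straddle a chunk boundary."""
    chunks: list[list[str]] = [[]]
    used = 0
    for name in names:
        size = dirent_size(name)
        if used + size > chunk:
            chunks.append([])
            used = 0
        chunks[-1].append(name)
        used += size
    return chunks


print(dirent_size("a"), dirent_size("hello"))          # 12 16
print(len(pack_into_chunks(["x" * 255, "y" * 255])))   # 2
```

Because no entry crosses a chunk boundary, each 512-byte chunk can be written to disk on its own, and a maximum-length name (264 bytes here) always fits in a single chunk.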
File Locking • Hard Lock – always enforced when a program tries to access a file • Advisory shared or exclusive locks – requested by the programs • System administrator privilege can override locks • No deadlock detection is attempted
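Advisory locking is still visible on present-day Unix-like systems through the flock() call, which Python exposes as `fcntl.flock`. This demonstration assumes Linux or a BSD; it shows a non-blocking exclusive request failing against a shared lock, and a program that ignores locking writing anyway. The file name is arbitrary.

```python
import errno
import fcntl
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "shared.dat")
with open(path, "w") as f:
    f.write("original\n")

reader = open(path)
fcntl.flock(reader, fcntl.LOCK_SH)      # shared lock, as a reader would take

writer = open(path, "a")                # separate open: its lock requests conflict
try:
    # Non-blocking request for an exclusive lock; conflicts with the shared lock.
    fcntl.flock(writer, fcntl.LOCK_EX | fcntl.LOCK_NB)
    exclusive_granted = True
except OSError as e:
    if e.errno not in (errno.EAGAIN, errno.EWOULDBLOCK, errno.EACCES):
        raise
    exclusive_granted = False

# The lock is only advisory: a program that never asks for it can still write.
writer.write("appended without the lock\n")
writer.close()

fcntl.flock(reader, fcntl.LOCK_UN)
reader.close()
print(exclusive_granted)                # False
```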
Symbolic Links • A symbolic link is implemented as a file that contains a pathname • The pathname can be relative or absolute • On encountering a symbolic link while interpreting a component of a pathname, the contents of the symbolic link are prepended to the rest of the pathname
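The prepend-and-continue rule can be sketched with a toy namespace held in a dictionary. The `resolve` function, the `links` mapping and the limit of 8 links per lookup (which stops loops) are hypothetical; a real kernel also checks that each component exists and is a directory.

```python
import posixpath


def resolve(path: str, links: dict[str, str], max_links: int = 8) -> str:
    """Resolve an absolute `path` where `links` maps the absolute path of
    each symbolic link to its stored contents."""
    parts = [p for p in path.split("/") if p]
    resolved = "/"
    followed = 0
    while parts:
        name = parts.pop(0)
        candidate = posixpath.normpath(posixpath.join(resolved, name))
        if candidate in links:
            followed += 1
            if followed > max_links:
                raise OSError("too many levels of symbolic links")
            target = links[candidate]
            # Prepend the link's contents to the rest of the pathname.
            parts = [p for p in target.split("/") if p] + parts
            if target.startswith("/"):
                resolved = "/"      # absolute contents: restart at the root
            # relative contents: keep interpreting from the link's directory
        else:
            resolved = candidate
    return resolved


links = {"/usr/lib": "/lib", "/home/u/docs": "../shared/docs"}
print(resolve("/usr/lib/libc.a", links))       # /lib/libc.a
print(resolve("/home/u/docs/a.txt", links))    # /home/shared/docs/a.txt
```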
Rename • Old file system required three system calls to rename a file • If the system crashed between these calls, the target file could be left with only its temporary name • New rename system call that guarantees the existence of the target name • Renaming works on both directories and files
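The same guarantee is what makes the write-then-rename pattern safe. In Python, `os.replace` maps to rename(2) on POSIX systems; the helper below is a sketch with a hypothetical name, not a complete safe-save utility.

```python
import os
import tempfile


def save_atomically(path: str, data: str) -> None:
    """Write a new version under a temporary name, then rename it over the
    target in one call, so `path` always names either the old or new file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)   # same file system as the target
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())                # data on disk before the rename
        os.replace(tmp, path)                   # single atomic rename
    except BaseException:
        os.unlink(tmp)
        raise
```

`tempfile.mkstemp` creates the file with mode 0600, so a real tool may want to copy the original file's permissions before the rename.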
Quotas • Old file system – any single user can allocate all the available space in the file system • Quotas restrict the amount of file system resources that a user can obtain • Limits are set on both the number of inodes and the number of disk blocks • Hard and soft limits
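A sketch of a soft/hard limit check with hypothetical names: exceeding the soft limit only earns a warning, while a request that would pass the hard limit is refused. The same check would be applied separately to inodes and to disk blocks.

```python
from dataclasses import dataclass


@dataclass
class Limit:
    soft: int
    hard: int


def check_quota(usage: int, request: int, limit: Limit) -> str:
    """Return 'ok', 'warn' (over the soft limit) or 'deny' (past the hard limit)."""
    after = usage + request
    if after > limit.hard:
        return "deny"
    if after > limit.soft:
        return "warn"
    return "ok"


blocks = Limit(soft=100, hard=120)
print(check_quota(90, 5, blocks), check_quota(90, 20, blocks), check_quota(90, 40, blocks))
# ok warn deny
```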
Key Take-Away points • Substantially higher throughput rates – large block size • Flexible allocation policies • Better locality of reference • Less wastage • Adapted to wide range of peripheral and processor characteristics