File System Implementation Methods Explained

Chapter 14: File-System Implementation

Chapter 14: File-System Implementation • Directory Implementation • Allocation Methods • Free-Space Management • Efficiency and Performance • Recovery

Objectives • To describe the details of implementing local file systems and directory structures • To describe the implementation of remote file systems • To discuss block allocation and free-block algorithms and trade-offs

Disk Space Allocation Methods • A disk is a direct-access device and thus gives us flexibility in the implementation of files. • The main issue is how to allocate space to these files so that disk space is utilized effectively and files can be accessed quickly. • Three major methods of allocating disk space are in wide • Contiguous • Linked • Indexed

Contiguous Allocation Each file occupies a set of contiguous blocks If the blocks are allocated to the file in such a way that all the logical blocks of the file get the contiguous physical block in the hard disk then such allocation scheme is known as contiguous allocation. • Best performance in most cases • Simple – only starting location (block #) and length (number of blocks) are required • Problems include: • Finding space for file, • Knowing file size (max size), • External fragmentation, • Need for compaction • Off-line (downtime) or • On-line

Contiguous Allocation (Cont.) • Mapping from logical to physical (assume block size if 512) • Block to be accessed = “starting address” + Q • Displacement into block = R Q LA/512 R Advantages • It is simple to implement. • We will get Excellent read performance. • Supports Random Access into files. Disadvantages • This method suffers from both internal and external fragmentation. This makes it inefficient in terms of memory utilization. • Increasing file size is difficult because it depends on the availability of contiguous memory at a particular instance.

Extent-Based Systems • Many newer file systems (i.e., Veritas File System) use a modified contiguous allocation scheme • Extent-based file systems allocate disk blocks in chunks called extents. • An extentis a contiguous block of disks • Extents are allocated for file allocation • A file consists of one or more extents

Linked Allocation Each file is a linked list of blocks In this scheme, each file is a linked list of disk blocks which need not be contiguous. The disk blocks can be scattered anywhere on the disk.The directory entry contains a pointer to the starting and the ending file block. Each block contains a pointer to the next block occupied by the file. • No external fragmentation • Each block contains pointer to next block • Free space management system called when new block needed • Locating a block can take many I/Os and disk seeks • Reliability can be a problem (what if one of the pointers get corrupted?) • Improve efficiency by clustering blocks into groups but increases internal fragmentation

pointer block = Linked Allocation (Cont.) • Each file is a linked list of disk blocks: blocks may be scattered anywhere on the disk • Mapping: • Block to be accessed is the Qth block in the linked chain of blocks representing the file • Displacement into block = R + 1 Q LA/511 R

Linked Allocation (Cont.)

Linked Allocation (Cont.) • Advantages: • This is very flexible in terms of file size. File size can be increased easily since the system does not have to look for a contiguous chunk of memory. • This method does not suffer from external fragmentation. This makes it relatively better in terms of memory utilization. • Disadvantages: • Because the file blocks are distributed randomly on the disk, a large number of seeks are needed to access every block individually. This makes linked allocation slower. • It does not support random or direct access. We can not directly access the blocks of a file. A block k of a file can be accessed by traversing k blocks sequentially (sequential access ) from the starting block of the file via block pointers. • Pointers required in the linked allocation incur some extra overhead.

Indexed Allocation • Indexed allocation • Each file has its own index block(s) of pointers to its data blocks • Logical view

Example of Indexed Allocation Instead of maintaining a file allocation table of all the disk pointers, Indexed allocation scheme stores all the disk pointers in one of the blocks called as indexed block. Indexed block doesn't hold the file data, but it holds the pointers to all the disk blocks allocated to that particular file. Directory entry will only contain the index block address.

Example of Indexed Allocation • Advantages: • This supports direct access to the blocks occupied by the file and therefore provides fast access to the file blocks. • It overcomes the problem of external fragmentation. • Disadvantages: • A bad index block could cause the lost of entire file. • Size of a file depends upon the number of pointers, a index block can hold. • Having an index block for a small file is totally wastage. • More pointer overhead

Indexed Allocation (Cont.) • Need an index table • Random access • Dynamic access without external fragmentation, but have overhead of index block • Mapping from logical to physical in a file of maximum size of 256K bytes and block size of 512 bytes. We need only 1 block for index table • Q = displacement into index table • R = displacement into block Q LA/512 R

Indexed Allocation – Large Files • Mapping from logical to physical in a file of unbounded length (block size of 512 words) • Can be accomplished with 3 different methods: • Linked scheme. • Multilevel index • Combined scheme.

Indexed Allocation – Linked Scheme Link blocks of index table Q1 • Q1= block of index table • R1is used as follows: LA / (512 x 511) R1 Q2 R1 / 512 R2 • Q2 = displacement into block of index table • R2 displacement into block of file:

Indexed Allocation – Multilevel Index Two-level index (4K blocks could store 1,024 four-byte pointers in outer index  1,048,567 data blocks and file size of up to 4GB) Q1 LA / (512 x 512) R1 • Q1 = displacement into outer-index • R1 is used as follows: Q2 R1 / 512 R2 • Q2 = displacement into block of index table • R2 displacement into block of file:

Indexed Allocation – Multilevel (Cont.)

Combined Scheme: UNIX UFS • 4K bytes per block, 32-bit addresses • Keep the first 15 pointers of the index block in the file’s inode • The first 12 of these point to direct blocks; i.e., they contain addresses of the first 12 data blocks of the file. No need for a separate index block small files (of no more than 12 blocks). Up to 48 KB of data can be accessed directly. • The next three pointers point to indirect blocks. • The first points to a single indirect block, which is an index block containing not data but the addresses of blocks that do contain data. • The second points to a double indirect block, which contains the address of a block that contains the addresses of blocks that contain pointers to the actual data blocks. • The last pointer contains the address of a triple indirect block.

Combined Scheme: UNIX UFS Example

Performance • Best method depends on file access type • Contiguous great for sequential and random • Linked good for sequential, not random • Declare access type at creation -> select either contiguous or linked • Indexed more complex • Single block access could require 2 index block reads then data block read

Performance (Cont.) • Adding instructions to the execution path to save one disk I/O is reasonable • Intel Core i7 Extreme Edition 990x (2011) at 3.46Ghz = 159,000 MIPS • http://en.wikipedia.org/wiki/Instructions_per_second • Typical disk drive at 250 I/Os per second • 159,000 MIPS / 250 = 630 million instructions during one disk I/O • Fast SSD drives provide 60,000 IOPS • 159,000 MIPS / 60,000 = 2.65 millions instructions during one disk I/O

Free-Space Management • File system maintains free-space list to track available blocks/clusters • (Using term “block” for simplicity) • Four different methods: • Bit map • Free list • Grouping • Counting

Free-Space Management • A file system is responsible to allocate the free blocks to the file therefore it has to keep track of all the free blocks present in the disk. There are mainly two approaches by using which, the free blocks in the disk are managed. 1. Bit Vector • In this approach, the free space list is implemented as a bit map vector. It contains the number of bits where each bit represents each block. • If the block is empty then the bit is 1 otherwise it is 0. Initially all the blocks are empty therefore each bit in the bit map vector contains 1. • LAs the space allocation proceeds, the file system starts allocating blocks to the files and setting the respective bit to 0. 2. Linked List • It is another approach for free space management. This approach suggests linking together all the free blocks and keeping a pointer in the cache which points to the first free block. • Therefore, all the free blocks on the disks will be linked together with a pointer. Whenever a block gets allocated, its previous free block will be linked to its next free block.

Free-Space Management – bit map • Bit vector or bit map (n blocks) 0 1 2 n-1 … 1  block[i] free 0  block[i] occupied bit[i] =  Block number calculation (number of bits per word) * (number of 0-value words) + offset of first 1 bit CPUs have instructions to return offset within word of first “1” bit

Bit Map (Cont.) • Bit map requires extra space • Example: block size = 4KB = 212 bytes disk size = 240 bytes (1 terabyte) n = 240/212 = 228 bits (or 32MB) if clusters of 4 blocks -> 8MB of memory • But -- easy to get contiguous files

Free List • Linked list (free list) • Cannot get contiguous space easily • No waste of space • No need to traverse the entire list (if # free blocks recorded)

Grouping and Counting • Grouping • Modify linked list to store address of next n-1 free blocks in first free block, plus a pointer to next block that contains free-block-pointers (like this one) • Counting • Because space is frequently contiguously used and freed, with contiguous-allocation allocation, extents, or clustering • Keep address of first free block and count of following free blocks • Free space list then has entries containing addresses and counts

Recovery • Consistency checking– compares data in directory structure with data blocks on disk, and tries to fix inconsistencies • Can be slow and sometimes fails • Use system programs to back updata from disk to another storage device (magnetic tape, other magnetic disk, optical) • Recover lost file or disk by restoringdata from backup

End of Chapter 14

File System Implementation Methods Explained

File System Implementation Methods Explained

Presentation Transcript

Chapter 10: File System

ThreadOS: File System Implementation

Chapter 12: File System Implementation

File-System Implementation

Operating Systems

The Design and Implementation of a Log-Structured File System

Chapter 10: File-System Interface Chapter 11: File System Implementation

Chapter 11: File System Implementation

Chapter 12: File System Implementation

File System Implementation

Chapter 11: File System Implementation

Chapter 11: File System Implementation

Chapter 12: File System Implementation

Chapter 11: File System Implementation

Chapter 12: File System Implementation

Chapter 12: File System Implementation

File System Implementation

Chapter 11: File System Implementation

Chapter 10: File-System Interface

Chapter 11: File-System Implementation

Chapter 11: File System Implementation

Chapter 11: File System Implementation