Understanding File System Challenges and Implementation

School of Computing Science Simon Fraser University CMPT 300: Operating Systems I Chapters 10, 11, 12: File System and Disk Scheduling Dr. Mohamed Hefeeda

Objectives • Understand how to store and manage information on secondary storage systems • Understand file system: • Interface • Structure • Implementation • Note: file system is the most visible part of the OS to users

Secondary Storage Systems • Various storage media • Magnetic disks • Magnetic tapes • Optical disks • …. • Each medium has different physical characteristics • Storing bits on disks is different from storing them on CDs • Yet, OS provides a uniform logical view of storage to users • To efficiently store, locate, and retrieve data from a storage system, OS creates one or more file systems on it

File System Challenges • File systems involve two design problems • File system interface: how the file system looks to users • Define a file, file attributes, operations on files, and how files are organized into directories • File system implementation: algorithms and data structures to map the logical file system onto physical devices • Block allocation, free-space management, searching a directory, data caching, …

File System: Layered Structure • Layers, from top to bottom: Application Programs, Logical File System, File-organization Module, Device Drivers, Storage Devices • Logical File System • Interface: file and directory structure • Maintains pointers to logical block addresses • File-organization Module • Implementation: block allocation, … • Maps logical into physical addresses • Device Drivers • Implementation: device-specific instructions • Writes specific bit patterns to device controller

File System Interface: File Concept • From the user’s perspective, a file is the smallest storage unit • A file is a named collection of related information recorded on secondary storage • Information stored in a file could be of various types: • Text, numeric data • Binary data • Source code • Executable programs • …

File Attributes • Name – only information kept in human-readable form • Identifier – unique tag (number) that identifies the file within the file system • Type – needed for systems that support different types • Location – pointer to file location on device • Size – current file size • Protection – controls who can read, write, and execute • Time, date, and user identification – data for protection, security, and usage monitoring • Information about files is kept in a directory, which is also maintained on the disk • Each file has an entry in the directory

File Operations • Create • Write • Read • Reposition within file • Delete • Truncate • More operations (e.g., copy) can be composed of these primitives • To perform these operations, we open the file (details later)

File System Interface: Directory Concept • Directory is a logical grouping of files • A directory contains an entry for each file under it • Some systems (UNIX) treat directories just as files • In fact, UNIX treats everything as a file • Operations on a directory • Search for a file • Create a file • Delete a file • List a directory • Rename a file • Traverse the file system

Directory Structure • Design the directory structure to achieve • Efficiency – locating a file quickly • Naming – convenient to users • Two users can have the same name for different files • The same file can have several different names (aliases, links) • Grouping – logical grouping of files by properties (e.g., all Java programs, all games, …) • Tree-structured directories are the most common

Tree-Structured Directories

Tree-Structured Directories (cont’d) • Efficient searching • Grouping capability • Things get complicated when we start adding links • Directory is no longer a tree → acyclic-graph structure

Acyclic-Graph Directories • When a file is deleted while some links still point to it → dangling pointers!

Acyclic-Graph Directories (cont’d) • Solution for dangling links in Unix • Symbolic link • Just leave the dangling pointer for the user to delete • Try: • $ ln -s file.txt file_symLink.txt • $ ls -l • $ rm file.txt • $ ls -l • Hard link • Keep a reference count on the file • Only delete the physical file when all links to it are deleted • Try: • $ ln file.txt file_link.txt • $ ls -l • $ rm file.txt • $ ls -l • Links may even create a cycle, turning the directory structure into a general graph
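
The two "Try" experiments can also be reproduced programmatically. The following is a minimal sketch, assuming a POSIX-like system where `os.symlink` and `os.link` are permitted; the function name `link_demo` and the returned dictionary keys are illustrative, not part of the slides.

```python
import os
import tempfile

def link_demo():
    """Create a file, a symbolic link and a hard link, then remove the original name."""
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "file.txt")
        sym = os.path.join(d, "file_symLink.txt")
        hard = os.path.join(d, "file_link.txt")
        with open(target, "w") as f:
            f.write("hello")
        os.symlink(target, sym)                    # like: ln -s file.txt file_symLink.txt
        os.link(target, hard)                      # like: ln file.txt file_link.txt
        links_before = os.stat(target).st_nlink    # reference count kept by the file system
        os.remove(target)                          # like: rm file.txt
        with open(hard) as f:
            data = f.read()                        # data is still reachable via the hard link
        return {
            "links_before": links_before,
            "links_after": os.stat(hard).st_nlink,
            "symlink_entry_exists": os.path.lexists(sym),  # the link itself is still there
            "symlink_resolves": os.path.exists(sym),       # but it now dangles
            "hard_link_data": data,
        }
```

The symbolic link survives as a dangling entry, while the hard link keeps the data alive until the reference count drops to zero.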

General Graph Directory

General Graph Directory (cont’d) • Suppose we are backing up the entire file system or searching for a file through the directory • With links, we may visit the same subdirectory several times • Very costly (remember, the directory is stored on disk) • We may even loop forever if we have cycles! • Solution? Simply bypass links during directory traversal!
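
Bypassing links during traversal can be sketched with Python's `os.walk`, whose `followlinks=False` setting (the default) does not descend into symbolically linked directories. This is an illustration under the assumption of a POSIX-like system; the helper names `list_files` and `demo` and the directory names are hypothetical.

```python
import os
import tempfile

def list_files(root):
    """Return relative paths of files under root without entering symlinked directories."""
    found = []
    # followlinks=False keeps os.walk from descending through links to directories,
    # so a link that points back up the tree cannot cause an endless loop.
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=False):
        for name in filenames:
            found.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(found)

def demo():
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "users", "bob"))
        with open(os.path.join(root, "users", "bob", "notes.txt"), "w") as f:
            f.write("x")
        # A link back to the root creates a cycle in the directory graph.
        os.symlink(root, os.path.join(root, "users", "bob", "loop"))
        return list_files(root)
```

Each file is visited once even though the link makes the directory graph cyclic.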

File System Mounting • [Figure: directory tree (root, users, bob, alice, data, code) and a file system on a storage device] • A file system must be mounted before it can be accessed • OS is given name of the device and a mount point • OS checks device to make sure it has a valid file system • Then, OS makes the new file system available

Virtual File Systems • Multiple file systems can be mounted at the same time (typical) • Disk: UFS (Unix), NTFS (Windows), ext2 (Linux), ext3, … • CD: ISO 9660 • File systems on other machines, e.g., Network File System (NFS) • Each file system has its own file and directory structure, allocation methods, algorithms and data structures, … • To shield users from all these differences, OS implements a virtual file system (VFS) layer • VFS provides a common interface (API) to all file systems • E.g., applications use open(), read(), write(), … without worrying about which file system(s) is (are) being used

Virtual File System

File System Implementation • To implement a file system, we need • On-disk structures, e.g., • directory structure, number of blocks, location of free blocks, boot information, … • (In addition to data blocks, of course) • In-memory structures to: • improve performance (caching) • manage file system

On-disk Structures • Boot block: information to boot the OS • Volume control block: information about the volume (partition) • number of blocks, block size, free block count, … • UFS calls it superblock • NTFS stores this info in the master file table • File control block (FCB): per file, details about the file, e.g., • size, location of data blocks, file permissions, ownership • UFS calls it inode • NTFS stores this info in the master file table (organized like a relational database) • Directory structure: how files are organized into directories • UFS directories map file names to inode numbers • NTFS stores this info in the master file table

On-disk Structures • [Figure: on-disk layout labeled with boot block, superblock, directory structure, file, file control block, data block, and free block]

In-memory Structures • Mount table: info on each mounted volume (partition) • Directory-structure cache: info on recently accessed directories • System-wide open-file table: contains a copy of the FCB of each open file in the system • And info on which process is currently using which file • Per-process open-file table: contains an entry for each file opened by this process, which has • a pointer to the corresponding entry in the system-wide open-file table • and info regarding the usage of the file by this process, e.g., current file pointer, open mode (read, write), …

Opening a File • Search the directory to find the file control block • May need to bring (from disk) multiple directory blocks into memory, if they are not already cached • Consider the case: open("/dir1/dir2/dir3/file.txt") • Create an entry in the per-process open-file table (PFT) • Check whether the system-wide open-file table has an entry for this file • If it does • Increment its reference count • Make the entry in PFT point to this entry • If it does not • Create a new entry, set its reference count to 1 • Make the entry in PFT point to the new entry • Return a pointer (file descriptor) to the entry in PFT • Successive file operations (read, write, …) use the file descriptor
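
The bookkeeping in these steps can be simulated with two in-memory tables. The sketch below is a toy model, not how any real kernel stores its tables: the dictionary layout, the function name `open_file`, and the FCB fields (`size`, `start_block`) are all assumptions made for illustration.

```python
def open_file(path, directory, system_table, process_table, mode="r"):
    """Simulate open(): return a file descriptor (index into the per-process table)."""
    fcb = directory[path]              # directory search; KeyError if the file is missing
    entry = system_table.get(path)
    if entry is None:
        # First opener: copy the FCB into the system-wide table.
        entry = {"fcb": dict(fcb), "ref_count": 1}
        system_table[path] = entry
    else:
        entry["ref_count"] += 1        # file is already open somewhere in the system
    # Per-process entry: pointer to the system-wide entry plus per-process state.
    process_table.append({"system_entry": entry, "offset": 0, "mode": mode})
    return len(process_table) - 1

# Two hypothetical processes open the same file.
directory = {"/dir1/dir2/dir3/file.txt": {"size": 1300, "start_block": 14}}
system_table = {}
proc_a, proc_b = [], []
fd_a = open_file("/dir1/dir2/dir3/file.txt", directory, system_table, proc_a)
fd_b = open_file("/dir1/dir2/dir3/file.txt", directory, system_table, proc_b, mode="w")
```

Both processes end up sharing one system-wide entry whose reference count is 2, while each keeps its own offset and open mode.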

Opening and Reading from a File

Creating a File • Allocate a new file control block (FCB) • For faster file creation, FCBs are usually pre-allocated → find a free FCB • Read relevant directory blocks into memory • Update them to reflect the new file and write them back to disk • Allocate free blocks for the file's data • How do we allocate free blocks to files? • And how do we know where the free blocks are?

Allocation Methods • Problem: Allocate free blocks to files • Given: Disks allow random access of blocks • Objectives: Efficient disk space utilization, and fast file access • Three common allocation methods • Contiguous • Linked • Indexed

Contiguous Allocation • Each file occupies a set of contiguous blocks • Needs only start address (block #) and length (number of blocks) • Mapping of logical address (LA) • Block size = 512 bytes • LA / 512 gives quotient Q and remainder R • Physical block = Q + start • Offset within block = R
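
The mapping is a single integer division. A minimal sketch, using the slide's 512-byte block size; the function name and the example file (start block 14, length 3) are made up for illustration.

```python
BLOCK_SIZE = 512  # bytes, as assumed on the slide

def contiguous_map(logical_address, start_block, length_blocks):
    """Map a logical byte address to (physical block, offset) for a contiguous file."""
    q, r = divmod(logical_address, BLOCK_SIZE)   # Q = quotient, R = remainder
    if q >= length_blocks:
        raise ValueError("logical address is past the end of the file")
    return start_block + q, r

# Hypothetical file: starts at block 14 and is 3 blocks long.
block, offset = contiguous_map(1300, start_block=14, length_blocks=3)
```

Because the physical block is computed directly, random access costs no extra disk reads.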

Contiguous Allocation (cont’d) • Pros • Simple • Supports random access efficiently • Minimal disk head seeks → fast • Cons? • External fragmentation • Files may not be able to grow

Linked Allocation • [Figure: a block made up of a pointer and data] • Each file is a linked list of blocks • Blocks could be anywhere • Each block has a pointer to the next block • Need start block and end block (to append to file)

Linked Allocation (cont’d) • Mapping of logical addresses • Assume pointer takes 1 byte, and block size is 512 bytes, so each block holds 511 bytes of data • LA / 511 gives quotient Q and remainder R • Physical block is at Qth location in the chain • But how do we get to it? Traverse the chain! • Offset within block = R + 1 (the pointer occupies the first byte of the block)
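
The chain traversal can be sketched as follows, under the slide's assumptions (512-byte blocks, a 1-byte pointer stored at the start of each block). The `next_block` dictionary stands in for the pointers that would really be read from disk, and the block numbers are invented for the example.

```python
DATA_BYTES_PER_BLOCK = 511  # 512-byte block minus the 1-byte pointer

def linked_map(logical_address, start_block, next_block):
    """Map a logical byte address to (physical block, offset) by walking the chain."""
    q, r = divmod(logical_address, DATA_BYTES_PER_BLOCK)
    block = start_block
    for _ in range(q):              # each hop would cost one disk read in practice
        block = next_block[block]
    return block, r + 1             # +1 skips the pointer byte at the start of the block

# Hypothetical chain of blocks: 9 -> 16 -> 1 -> 10 -> 25
chain = {9: 16, 16: 1, 1: 10, 10: 25}
block, offset = linked_map(1300, start_block=9, next_block=chain)
```

Reaching the Qth block takes Q hops, which is why random access is costly with linked allocation.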

Linked Allocation (cont’d) • Pros • No waste of space (except for pointers) • Simple: need only start and end addresses • Supports dynamic growing of files • Cons • No random access (or very costly to support) • Reliability: if one block is corrupted, the chain is broken

Indexed Allocation • Bring all pointers together into an index block

Indexed Allocation (cont'd) • Mapping of logical addresses • LA / 512 gives quotient Q and remainder R (block size = 512 bytes) • Q = displacement into index block • R = offset within the block • Pros • Supports random access • Supports dynamic growing of files • No external fragmentation • Cons • Overhead of index blocks • A file of one or a few data blocks needs an index block • How do we choose the size of index blocks?
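
With an index block, the lookup is one division plus one table access. A minimal sketch, assuming 512-byte blocks; the index block is modeled as a Python list and its contents are invented for the example.

```python
BLOCK_SIZE = 512  # bytes

def indexed_map(logical_address, index_block):
    """Map a logical byte address to (physical block, offset) through an index block."""
    q, r = divmod(logical_address, BLOCK_SIZE)
    return index_block[q], r        # Q selects the index entry, R is the offset

# Hypothetical index block listing the file's data blocks in order.
index_block = [9, 16, 1, 10, 25]
block, offset = indexed_map(1300, index_block)
```

Unlike the linked scheme, no chain has to be followed, so random access stays cheap.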

Indexed Allocation (cont'd) • First, consider a file with one index block • Assume each pointer takes 4 bytes, and block size is 512 bytes • What is the maximum file size supported? • Index block may have up to 512/4 = 128 entries → max file size = 128 * 512 bytes = 64 KB • Now how do we support larger files? • Increase size of index blocks → wastes space for small files • Better solutions?

Indexed Allocation (cont’d) • Linked index blocks • Last word in index block points to another index block • May need to traverse the index linked list (long access time) • Multilevel index • First-level index block points to a set of second-level index blocks which refer to data blocks • Shorter access time but more space overhead • Combined (used in Unix File System) • Multilevel and linked • Each file has an index block (inode), which contains • Pointers that point to data blocks directly (for small files) • Pointers that point to index blocks, which in turn may point to either data blocks or another level of index blocks • UNIX supports up to three levels of index blocks

Combined Scheme: UNIX inode • Assume a block size of 4 KB, 4-byte pointers, 12 direct entries, and 1 single, 1 double, and 1 triple indirect entry. What is the max file size supported? • Each index block holds 4 KB / 4 bytes = 1024 pointers • Max file size = (12 + 1024 + 1024*1024 + 1024*1024*1024) * 4 KB, which is far more than a 32-bit file pointer can address (4 GB)!
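
The same arithmetic covers both the single-index-block case and the combined inode scheme. The helper below is a sketch; its name and parameters are illustrative, and it simply counts how many data blocks are reachable.

```python
def max_file_size(block_size, pointer_size, direct, indirect_levels):
    """Maximum file size in bytes for an index structure.

    indirect_levels lists the depth of each indirect pointer,
    e.g. (1, 2, 3) for one single, one double and one triple indirect entry.
    """
    pointers_per_block = block_size // pointer_size
    blocks = direct
    for level in indirect_levels:
        blocks += pointers_per_block ** level   # data blocks reachable at this depth
    return blocks * block_size

single_index = max_file_size(512, 4, direct=0, indirect_levels=(1,))
unix_inode = max_file_size(4096, 4, direct=12, indirect_levels=(1, 2, 3))
```

The triple-indirect term dominates: it alone contributes 1024^3 blocks of 4 KB, which is 4 TB.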

How Do We Know Where Free Blocks Are? • Bit map • Every block has a bit: 0 = occupied, 1 = free • 00011110 01100000 00001111 1………..1 • Simple to implement • Easy to find contiguous blocks • Supported by hardware • Single instruction to find offset of first bit with value 1 in a word (of 32 bits) → fast searching • Disadvantages • Bit map is stored on disk → slow to access • Solution: cache it in memory • Bit maps are not small for large disks → waste of space • 40-GB disk with 1-KB blocks → 40 M blocks → 5-MB bitmap • This makes it difficult to cache the entire bitmap
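
A word-at-a-time search of the bit map can be sketched as below. Two assumptions are made for the illustration: the bit map is a list of 32-bit integers, and the least significant bit of each word stands for the lowest-numbered block in that word. The bit trick `w & -w` plays the role of the hardware "find first set bit" instruction.

```python
WORD_BITS = 32

def first_free_block(words):
    """Return the number of the first free block (bit value 1), or None if the disk is full."""
    for i, w in enumerate(words):
        if w != 0:                                   # skip words with no free block
            offset = (w & -w).bit_length() - 1       # position of the lowest set bit
            return i * WORD_BITS + offset
    return None

def bitmap_size_bytes(disk_bytes, block_bytes):
    """Size of the bit map: one bit per block."""
    return (disk_bytes // block_bytes) // 8

free = first_free_block([0, 0, 0b1000])              # two full words, then block 3 of word 2
size = bitmap_size_bytes(40 * 2**30, 2**10)          # 40-GB disk, 1-KB blocks
```

The size helper reproduces the slide's 5-MB figure for a 40-GB disk with 1-KB blocks.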

How Do We Know Where Free Blocks Are? • Linked List • No waste of disk space • But, not easy to get contiguous space

Disk Scheduling • Processes issue disk read/write requests • Kernel maps these requests to physical block addresses • These requests are sent to the disk controller • Problem: If there are multiple outstanding requests (in a disk queue), which one should be serviced first? • Objectives • Fast disk access time • High disk bandwidth (#bytes/sec transferred between disk and memory) • Fairness (maybe!) • Before presenting scheduling algorithms, let us understand the structure and operation of magnetic disks

Disk Physical Structure • Several platters, each divided into circular tracks, which are subdivided into sectors • Head moves horizontally from one track to another • Disk rotates at high speed (60 to 200 times/sec) • Tracks accessed at the same head position make a cylinder • Drive can be directly attached to the computer via an I/O bus (EIDE, ATA, SCSI), or attached through the network (iSCSI)

Disk Logical Structure • Disk is viewed as a one-dimensional array of logical blocks • The logical block is the smallest unit of transfer • Block = sector • The array of blocks is mapped into sectors of the disk sequentially: • Block 0 is at the first sector of the first track on the outermost cylinder • Mapping proceeds in order through that track, • Then the rest of the tracks in that cylinder, • Then through the rest of the cylinders from outermost to innermost • Block Address: <cylinder, track, sector>
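
The sequential mapping can be written as two divisions. This sketch assumes an idealized geometry in which every track has the same number of sectors (real disks often vary this by zone) and in which cylinder 0 is the outermost one; the function name and the geometry values are hypothetical.

```python
def block_to_address(block, tracks_per_cylinder, sectors_per_track):
    """Map a logical block number to <cylinder, track, sector>."""
    blocks_per_cylinder = tracks_per_cylinder * sectors_per_track
    cylinder, rest = divmod(block, blocks_per_cylinder)   # fill a cylinder before moving on
    track, sector = divmod(rest, sectors_per_track)       # fill a track before the next one
    return cylinder, track, sector

# Hypothetical geometry: 4 tracks per cylinder, 8 sectors per track.
address = block_to_address(37, tracks_per_cylinder=4, sectors_per_track=8)
```

Consecutive block numbers stay on the same cylinder as long as possible, which keeps sequential reads from moving the head.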

Disk Operation • Accessing (reading/writing) a block • Move the head to the desired track (seek time) • Wait for the desired sector to rotate under the head (rotational latency) • Transfer the block to a local buffer, then to main memory (transfer time) • We try to minimize the seek time, which is proportional to the seek distance (distance moved by the head)

Disk Scheduling Algorithms • Several algorithms exist to schedule the servicing of disk I/O requests • FCFS • SSTF • SCAN, C-SCAN • LOOK, C-LOOK • We illustrate them with a request queue (cylinders 0 to 199): 98, 183, 37, 122, 14, 124, 65, 67 • Assume initial head position at cylinder 53

First Come First Served • Find the total head movements to service the request queue: 98, 183, 37, 122, 14, 124, 65, 67 • Let us work it out, starting from cylinder 53: 45 + 85 + 146 + 85 + 108 + 110 + 59 + 2 • Total head movements = 640 cylinders
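
The FCFS total can be checked with a few lines of code. A minimal sketch; the function name is illustrative, and the head start position of 53 comes from the earlier slide.

```python
def fcfs_head_movement(start, requests):
    """Total cylinders travelled when requests are serviced in arrival order."""
    total, position = 0, start
    for cylinder in requests:
        total += abs(cylinder - position)
        position = cylinder
    return total

queue = [98, 183, 37, 122, 14, 124, 65, 67]
total = fcfs_head_movement(53, queue)
```

The large swings (for example 183 to 37 and 14 to 124) account for most of the movement.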

Shortest Seek Time First • Select request with minimum seek time from current head position • Total head movements = 236 cylinders • May cause starvation of some requests
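
SSTF can be sketched as a greedy loop that always picks the pending request closest to the head. The function name is illustrative; seek time is approximated by cylinder distance, and ties (none occur in this queue) go to the request that arrived first.

```python
def sstf(start, requests):
    """Return (service order, total head movement) for shortest-seek-time-first."""
    pending = list(requests)
    position, total, order = start, 0, []
    while pending:
        nearest = min(pending, key=lambda c: abs(c - position))  # closest pending request
        pending.remove(nearest)
        total += abs(nearest - position)
        position = nearest
        order.append(nearest)
    return order, total

order, total = sstf(53, [98, 183, 37, 122, 14, 124, 65, 67])
```

A steady stream of requests near the head would keep postponing a far-away request such as 183, which is the starvation risk noted on the slide.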

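SCAN • Disk arm starts at one end and moves toward the other end, servicing requests • When it gets to the other end, movement is reversed • Total head movements = 208 cylinders if the head first moves toward cylinder 0 and reverses at the last request (cylinder 14): 39 + 169 • If the arm travels all the way to cylinder 0 before reversing, the total is 53 + 183 = 236 cylinders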
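
The total depends on where the arm turns around, so the sketch below computes it under both conventions. It assumes the head initially moves toward cylinder 0 (the slides do not state the direction); the function name and the `go_to_edge` flag are illustrative.

```python
def scan_toward_zero(start, requests, go_to_edge=True):
    """Return (service order, total head movement) for a sweep toward cylinder 0, then back up.

    go_to_edge=True  : the arm travels to cylinder 0 before reversing (strict SCAN).
    go_to_edge=False : the arm reverses at the lowest pending request (LOOK behaviour).
    """
    lower = sorted((c for c in requests if c <= start), reverse=True)
    upper = sorted(c for c in requests if c > start)
    order = lower + upper
    if go_to_edge:
        turn_point = 0
    else:
        turn_point = lower[-1] if lower else start
    total = start - turn_point                 # downward sweep
    if upper:
        total += upper[-1] - turn_point        # upward sweep to the highest request
    return order, total

queue = [98, 183, 37, 122, 14, 124, 65, 67]
order, strict_total = scan_toward_zero(53, queue, go_to_edge=True)
_, look_total = scan_toward_zero(53, queue, go_to_edge=False)
```

The service order is the same in both cases; only the extra travel between cylinder 14 and cylinder 0 differs.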

Circular SCAN (C-SCAN) • Provides a more uniform wait time than SCAN • The head moves from one end to the other, servicing requests as it goes • When it reaches the other end, however, it immediately returns to the beginning of the disk, without servicing any requests on the return trip • Treats cylinders as a circular list that wraps around from the last cylinder to the first one

C-SCAN (cont’d)
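
C-SCAN on the example queue can be sketched as follows. Two assumptions are made because the slides do not state them: the head sweeps toward higher cylinder numbers, and the return trip from cylinder 199 to cylinder 0 is counted as head movement. The function name is illustrative.

```python
def c_scan(start, requests, max_cylinder=199):
    """Return (service order, total head movement) for C-SCAN sweeping upward."""
    upper = sorted(c for c in requests if c >= start)   # serviced on the first sweep
    lower = sorted(c for c in requests if c < start)    # serviced after wrapping around
    order = upper + lower
    total = max_cylinder - start          # sweep up to the last cylinder
    if lower:
        total += max_cylinder             # return to cylinder 0 without servicing requests
        total += lower[-1]                # sweep up to the last wrapped-around request
    return order, total

order, total = c_scan(53, [98, 183, 37, 122, 14, 124, 65, 67])
```

Requests just behind the head (14 and 37 here) wait for a full wrap-around, but no request waits longer than roughly one full sweep, which is the source of the more uniform wait time.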

C-LOOK • Version of C-SCAN • Arm only goes as far as the last request in each direction, then reverses direction immediately, without first going all the way to the end of the disk
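
A C-LOOK sketch on the same queue, assuming the head sweeps toward higher cylinder numbers and that the jump back to the lowest pending request counts as head movement. The function name is illustrative.

```python
def c_look(start, requests):
    """Return (service order, total head movement) for C-LOOK sweeping upward."""
    upper = sorted(c for c in requests if c >= start)   # serviced on the way up
    lower = sorted(c for c in requests if c < start)    # serviced after jumping back
    order = upper + lower
    total, position = 0, start
    for cylinder in order:
        total += abs(cylinder - position)   # includes the jump from the highest to the lowest request
        position = cylinder
    return order, total

order, total = c_look(53, [98, 183, 37, 122, 14, 124, 65, 67])
```

The arm turns at cylinder 183 and jumps to cylinder 14, never visiting cylinders 199 or 0.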
