190 likes | 207 Views
CS 346 – Chapter 12. File systems Structure Information to maintain How to access a file Directory implementation Disk allocation methods efficient use, quick access Managing free space Efficiency, performance Recovery. Structure. File system is usually built on top of disks
E N D
CS 346 – Chapter 12 • File systems • Structure • Information to maintain • How to access a file • Directory implementation • Disk allocation methods efficient use, quick access • Managing free space • Efficiency, performance • Recovery
Structure • File system is usually built on top of disks • Medium can be rewritten in place • Relatively easy to move to another place on disk • Purpose of file system • Provide a user interface to access files • Define a mapping between logical files and space on a secondary storage device • FS have several levels/layers of abstraction & functionality, e.g. 4 • Logical file system • File organization module • Basic file system • I/O control
Layers • Logical file system • Maintain file’s metadata: inside a “file control block” aka “inode” • Directory data structure • File-organization module • Translates between logical and physical data blocks of a file. In other words it knows everybody’s real address. • e.g. logical block numbers might always start 0 • Basic file system • Manipulate specific sectors on disk. • Maintain buffers for file I/O • I/O control • Device drivers give machine-language commands to device to accomplish the file I/O. • (Different file systems can use the same device drivers.)
FS information On disk: • Boot (control) block = first block on a volume. Give inst on how to load the OS. • Volume control block = “superblock” • Statistics about the volume: # of blocks, their size, how many are free and which ones • Directory data structure: point to each file • File control block (inode) for each file (contains what info?) In memory: • Which volumes are currently mounted • Cache of recently accessed directories (faster access) • Which files are currently open • Per process • System-wide • Buffers holding currently processing file I/O
Opening file • open( ) system call passes file name to logical FS • See if anyone else already has this file opened. • How? • What if it is? • If not already open, search the directory • If found, • copy file control block (inode) to system-wide open file table • Set pointer in process’ open file table (Why not the inode?) • Also in process’ table: dynamic stuff like initialize current location within file, whether opened for read or write, etc. Should we copy this to inode also? • open( ) returns file descriptor (i.e. pointer to per-process table entry). Use this for future I/O on this file.
Multiple file systems • We generally don’t have 1 file system in charge of the entire disk • Disks usually have partitions… • Raw partition • Where you don’t want/need to have files • Ex. Swap space; information related to backups • Boot partition – should be treated special / separate • Contains program to ask user which OS to boot • Multiple OS can give rise to different FS • Use “virtual file system” to manage multiple FS • Hide from user the fact > 1 FS
Virtual FS • See Figure 12.4 • Purpose: act as an interface between the logical FS the user interacts with, and the actual local/remote file system • Defines essential object types, for example • File metadata, e.g. inode • Info about an open file • Superblock: info about an entire file system • Directory entries • For each type of object, set of operations defined, to be implemented by individual FS • Ex. For a file: open, read, write, …
Directory rep’n • A question of which data structure to use • Linear list? • Essentially an array of pointers (we point to the data blocks) • Advantage / Disadvantage? • Other data structures are possible: any good? • Sorted list • Binary search tree; B-tree • Hash table
1. Contiguous allocation • Advantage – few seeks needed on disk • Ex. We would like a file to reside entirely in one track if possible • If you know disk address of first block, and length of file, you know where entire file is. • Problems • Where to put a new file: dynamic storage allocation: best fit, worst fit, first fit • External fragmentation • Can’t predict a file deletion that would give you a better fit • Don’t know size of brand new file • Preallocating extra space: internal fragmentation • Can compact (defragment) files. Tedious operation. • File “extents”: a modification to contiguous scheme
2. Linked allocation • File is linked list of disk blocks (Figure 11.6) • File’s directory entry points to first & last blocks (in addition to maintaining other file attributes) • Avoids disadvantages of contiguous allocation • No fragmentation, don’t need to know size in advance, … • Criticism • Linked list inefficient to access data “directly” as opposed to sequentially. Ex. Editor requests to go to 3 millionth line. • What if 1 of the pointers becomes damaged? • Minor overhead from the pointer in each block. Can define “clusters” of the file to be contiguous blocks, but this suffers some fragmentation.
File allocation table • Located at start of disk • Table has an entry for each disk block • Has fixed size and is indexed by disk number • Purpose of these table entries is to point to next block in a file, like emulating a linked list with an array • File’s directory entry contains the starting block number. • See Figure 11.7 • Performance problem: • Need to do 2 seek operations every time you go to a new block in the file. Why? • Direct access with FAT is faster than pure linked allocation. Why?
3. Indexed allocation • The file on disk begins with an index block • Index block contains pointers to the various disk blocks containing the actual data of the file. • When file first created, all pointers set to null. One by one, they get initialized as file grows. • File’s directory entry contains block number of the index block. See Figure 11.8 • If all blocks on disk are exactly 1KB in size, how big of a file can we support using this scheme?
Bigger files • Linked indexed allocation: Can continue pointers in another index block. In general, can have a linked list of index blocks. • How big a file can we have with 2 linked index blocks? • Multilevel indexed allocation • Begin with a first-level index block. Entries contain addresses of second-level index blocks. • Each second-level index block has pointers to actual file data. • How big a file can we have? • Direct & indirect indexed allocation • File’s directory entry can hold several block numbers itself. • Followed by: single indirect block, double indirect block, triple indirect block. See figure 11.9
Free space • As long as volume is mounted, system maintains free-space “list” of unused blocks. Question is how to represent this info. • Bit vector: how much space? Keep it in memory? • Collect all free blocks into linked list. • We don’t typically traverse this list. Just grab/insert one. • Grouping technique • First “free” block used to store addresses of n – 1 actual free blocks. Last address stores location of another indirect block of free addresses. • Counting: store address along with number of contiguous free blocks
Efficiency • Where should inodes be on the disk? All in one place or scattered about? • Using linked allocation (treating data blocks as a linked list on disk) • How to keep a lid on the number of nodes in a list? • How to reduce internal fragmentation? • File metadata may include the last time file accessed • How expensive is this operation? Response/alternative? • Size of pointers (locations holding address) • Should the system’s global tables (process, open files) be fixed or variable length?
Performance Some techniques to optimize disk usage • Disk controller: store contents of a whole track • Why? What is necessary to accomplish this? • Buffer cache and page cache • “Cache” a file’s data blocks / physical pages of virtual memory • Caching the pages may be more efficient: • Pages can be individually larger than individual data blocks • Fewer computational steps to do virtual memory access than interfacing with the file system. • Not too efficient to employ both kinds of caches • “Double caching problem” with memory-mapped I/O: data first arrives into a page cache because the device is paged… and then copied to/from buffer cache
Performance (2) • Do you want writes to be synchronous or asynchronous? • Pass a parameter to open( ) system call to specify which you want. • Which is better typically? • When one is preferred over the other? • Page replacement: default policy like LRU may be bad in some situations • Consider sequential access to a file: Remove a page as soon as the next one is in use. Request the next few pages in advance.
Recovery • Need to protect from • Loss of data • Inconsistency (corruption) of data – resulting from what? • Consistency checking • Scan file metadata, see if it all makes sense • See if data blocks match a file correctly: traverse all pointers • Check free block list • What if something is wrong? • Is some information more critical than others? What extra protection to give? • Log transactions: tell which operations are pending, not complete • Full vs. incremental backups
Network file system • Principles • Each machine has its own file system • Client-server relationships may appear anywhere • Sharing only affects the client • To access remote directory, mount it • A remote directory is inserted in place of an existing (empty) directory whose contents now becomes hidden. • It will then look & behave like part of your local file system • Supports user mobility: access your files anywhere in the network • Protocol • Server has list of valid file systems that can be made available, and access rights (for each possible client)