
Understanding File System Challenges and Implementation

Learn about file system interface, structure, and implementation. Explore challenges and solutions of managing data efficiently on storage devices. Understand the layered structure and file attributes. Discover directory concepts and operations.

Presentation Transcript


  1. School of Computing Science, Simon Fraser University • CMPT 300: Operating Systems I • Chapters 10, 11, 12: File System and Disk Scheduling • Dr. Mohamed Hefeeda

  2. Objectives • Understand how to store and manage information on secondary storage systems • Understand file system: • Interface • Structure • Implementation • Note: file system is the most visible part of the OS to users

  3. Secondary Storage Systems • Various storage media • Magnetic disks • Magnetic tapes • Optical disks • …. • Each medium has different physical characteristics • Storing bits on disks is different from storing them on CDs • Yet, OS provides a uniform logical view of storage to users • To efficiently store, locate, and retrieve data from a storage system, OS creates one or more file systems on it

  4. File System Challenges • File systems involve two design problems • File system interface: how the file system looks to users • Define a file, file attributes, operations on files, and how files are organized into directories • File system implementation: algorithms and data structures to map the logical file system onto physical devices • Block allocation, free-space management, searching a directory, data caching, …

  5. File System: Layered Structure • Layers, top to bottom: Application Programs → Logical File System → File-organization Module → Device Drivers → Storage Devices • Logical File System • Interface: file and directory structure • Maintains pointers to logical block addresses • File-organization Module • Implementation: block allocation, … • Maps logical into physical addresses • Device Drivers • Implementation: device-specific instructions • Writes specific bit patterns to device controller

  6. File System Interface: File Concept • From the user’s perspective, a file is the smallest storage unit • A file is a named collection of related information recorded on secondary storage • Information stored in a file could be of various types: • Text, numeric data • Binary data • Source code • Executable programs • …

  7. File Attributes • Name – only information kept in human-readable form • Identifier – unique tag (number) that identifies the file within the file system • Type – needed for systems that support different types • Location – pointer to file location on device • Size – current file size • Protection – controls who can read, write, and execute • Time, date, and user identification – data for protection, security, and usage monitoring • Information about files is kept in a directory, which is maintained on the disk as well • Each file has an entry in the directory

  8. File Operations • Create • Write • Read • Reposition within file • Delete • Truncate • More operations (e.g., copy) can be composed of these primitives • To perform these operations, we open the file (details later)

  9. File System Interface: Directory Concept • Directory is a logical grouping of files • A directory contains an entry for each file under it • Some systems (UNIX) treat directories just as files • In fact, UNIX treats everything as a file • Operations on a directory • Search for a file • Create a file • Delete a file • List a directory • Rename a file • Traverse the file system

  10. Directory Structure • Design the directory structure to achieve • Efficiency – locating a file quickly • Naming – convenient to users • Two users can have same name for different files • The same file can have several different names (aliases, links) • Grouping – logical grouping of files by properties, (e.g., all Java programs, all games, …) • Tree-structured directories are the most common

  11. Tree-Structured Directories

  12. Tree-Structured Directories (cont’d) • Efficient searching • Grouping capability • Things get complicated when we start adding links • Directory is no longer a tree → acyclic-graph structure

  13. Acyclic-Graph Directories • When a file is deleted while some links still point to it → dangling pointers!

  14. Acyclic-Graph Directories (cont’d) • Solution for dangling links in Unix • Symbolic link • Just leave the dangling pointer for the user to delete • Try: • $ ln -s file.txt file_symLink.txt • $ ls -l • $ rm file.txt • $ ls -l • Hard link • Keep a reference count on the file • Only delete the physical file when all links to it are deleted • Try: • $ ln file.txt file_link.txt • $ ls -l • $ rm file.txt • $ ls -l • Links may even create a cycle, turning the directory structure into a general graph
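The shell experiment on this slide can also be reproduced programmatically. The following is a minimal sketch, assuming a POSIX-like system where the temporary directory supports both hard and symbolic links; the function name `link_demo` is made up for illustration.

```python
import os
import tempfile

def link_demo():
    """Create a file, a hard link, and a symbolic link; delete the original.

    Returns (link count before delete, link count after delete,
             content read through the hard link, whether the symlink dangles).
    """
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "file.txt")
        hard = os.path.join(d, "file_link.txt")
        soft = os.path.join(d, "file_symLink.txt")

        with open(original, "w") as f:
            f.write("hello")

        os.link(original, hard)      # same effect as: ln file.txt file_link.txt
        os.symlink(original, soft)   # same effect as: ln -s file.txt file_symLink.txt

        links_before = os.stat(original).st_nlink   # reference count on the file
        os.remove(original)                         # same effect as: rm file.txt
        links_after = os.stat(hard).st_nlink

        # The hard link still reaches the data: the count has not dropped to zero.
        with open(hard) as f:
            content = f.read()

        # The symbolic link still exists as a directory entry, but its target is gone.
        dangling = os.path.islink(soft) and not os.path.exists(soft)
        return links_before, links_after, content, dangling

if __name__ == "__main__":
    print(link_demo())
```

The reference count drops from 2 to 1 when the original name is removed, so the data survives, while the symbolic link is left pointing at a name that no longer exists.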

  15. General Graph Directory

  16. General Graph Directory (cont’d) • Suppose we are backing up the entire file system or searching for a file through the directory • With links, we may visit the same subdirectory several times • Very costly (remember the directory is stored on disk) • We may even loop forever if we have cycles! • Solution? Simply: • Bypass links during directory traversal!
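As an illustration of bypassing links during traversal, here is a small sketch using Python's `os.walk`, which does not descend into symbolic links to directories when `followlinks=False`. The directory names and the helper names are hypothetical, and the sketch assumes a system that permits creating symbolic links.

```python
import os
import tempfile

def list_dirs_skipping_links(root):
    """Return all directories reachable from root without following symlinks."""
    found = []
    # followlinks=False: symlinked directories are reported but never entered,
    # so a link that points back up the tree cannot cause an endless loop.
    for dirpath, _dirnames, _filenames in os.walk(root, followlinks=False):
        found.append(os.path.relpath(dirpath, root))
    return sorted(found)

def traversal_demo():
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "a", "b"))
        # A link from a/b/loop back to the root creates a cycle in the graph.
        os.symlink(root, os.path.join(root, "a", "b", "loop"))
        return list_dirs_skipping_links(root)

if __name__ == "__main__":
    print(traversal_demo())
```

Each real directory is visited exactly once even though the link creates a cycle.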

  17. File System Mounting • A file system must be mounted before it can be accessed • OS is given the name of the device and a mount point • OS checks the device to make sure it has a valid file system • Then, OS makes the new file system available • (Figure: file system on a storage device, shown as a tree with nodes root, users, bob, alice, data, code)

  18. Virtual File Systems • Multiple file systems can be mounted at the same time (typical) • disk: UFS (Unix), NTFS (Windows), ext2 (Linux), ext3, … • CD: ISO 9660 • File systems on other machines, e.g., Network File System (NFS) • Each file system has its own file and directory structure, allocation methods, algorithms and data structures, … • To shield users from all these differences, OS implements a virtual file system (VFS) layer • VFS provides a common interface (API) to all file systems • E.g., applications use open(), read(), write(), … without worrying about which file system(s) is (are) being used

  19. Virtual File System

  20. File System Implementation • To implement a file system, we need • On-disk structures, e.g., • directory structure, number of blocks, location of free blocks, boot information, … • (In addition to data blocks, of course) • In-memory structures to: • improve performance (caching) • manage file system

  21. On-disk Structures • Boot block: information to boot the OS • Volume control block: information about the volume (partition) • number of blocks, block size, free block count, … • UFS calls it superblock • NTFS stores this info in the master file table (relational database) • File control block (FCB): per file, details about the file, e.g., • size, location of data blocks, file permissions, ownership • UFS calls it inode • NTFS stores this info in the master file table • Directory structure: how files are organized into directories • UFS stores file names with their inode numbers • NTFS stores this info in the master file table

  22. On-disk Structures • (Figure labels: boot block, superblock, directory structure, file, file control block, data block, free block)

  23. In-memory Structures • Mount table: info on each mounted volume (partition) • Directory-structure cache: info on recently accessed directories • System-wide open-file table: contains a copy of the FCB of each open file in the system • And info on which process is currently using which file • Per-process open-file table: contains an entry for each file opened by this process, which has • a pointer to the corresponding entry in the system-wide open-file table • and info regarding the usage of the file by this process, e.g., current file pointer, open mode (read, write), …

  24. Opening a File • Search the directory to find the file control block • May need to bring (from disk) multiple directory blocks into memory, if they are not already cached • Consider the case: open("/dir1/dir2/dir3/file.txt") • Create an entry in the per-process open-file table (PFT) • Check whether the system-wide open-file table has an entry for this file • If it does • Increment its reference count • Make the entry in PFT point to this entry • If it does not • Create a new entry, set its reference count to 1 • Make the entry in PFT point to the new entry • Return a pointer (file descriptor) to the entry in PFT • Successive file operations (read, write, …) use the file descriptor
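The bookkeeping in these steps can be sketched as a toy model. This is only an illustration, not how any particular kernel implements it: the class names `SystemWideTable` and `Process`, the dictionary layout, and the stand-in FCB are all assumptions, and the directory search is replaced by a placeholder.

```python
class SystemWideTable:
    """Toy system-wide open-file table: one entry per open file."""

    def __init__(self):
        self.entries = {}  # file name -> {"fcb": ..., "ref_count": int}

    def acquire(self, name, fcb):
        entry = self.entries.get(name)
        if entry is None:
            # File not open anywhere yet: create a new entry.
            entry = {"fcb": fcb, "ref_count": 0}
            self.entries[name] = entry
        entry["ref_count"] += 1  # becomes 1 for a new entry
        return entry


class Process:
    """Toy process holding a per-process open-file table (PFT)."""

    def __init__(self, system_table):
        self.system_table = system_table
        self.pft = []  # the index into this list plays the role of a file descriptor

    def open(self, name, mode="r"):
        fcb = {"name": name}  # placeholder for the directory search that finds the FCB
        entry = self.system_table.acquire(name, fcb)
        # Per-process state: pointer to the shared entry, file pointer, open mode.
        self.pft.append({"system_entry": entry, "offset": 0, "mode": mode})
        return len(self.pft) - 1


if __name__ == "__main__":
    table = SystemWideTable()
    p1, p2 = Process(table), Process(table)
    fd1 = p1.open("/dir1/dir2/dir3/file.txt")
    fd2 = p2.open("/dir1/dir2/dir3/file.txt", mode="w")
    print(fd1, fd2, table.entries["/dir1/dir2/dir3/file.txt"]["ref_count"])
```

Both processes share one system-wide entry, while each keeps its own file pointer and open mode in its PFT.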

  25. Opening and Reading from a File

  26. Creating a File • Allocate a new file control block (FCB) • For faster file creation, FCBs are usually pre-allocated → find a free FCB • Read relevant directory blocks into memory • Update them to reflect the new file and write them back to disk • Allocate free blocks for the data of the file • How do we allocate free blocks to files? And • How do we know where the free blocks are?

  27. Allocation Methods • Problem: Allocate free blocks to files • Given: Disks allow random access of blocks • Objectives: Efficient disk space utilization, and fast file access • Three common allocation methods • Contiguous • Linked • Indexed

  28. Contiguous Allocation • Each file occupies a set of contiguous blocks • Needs only start address (block #) and length (number of blocks) • Mapping of logical address (LA): LA / 512 gives quotient Q and remainder R • Physical block = Q + start • Offset within block = R • Block size = 512 bytes
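A short sketch of this mapping, assuming 512-byte blocks as on the slide. The function name `contiguous_map` and the bounds check against the file length are illustrative additions.

```python
BLOCK_SIZE = 512  # bytes per block, as assumed on the slide

def contiguous_map(logical_address, start_block, length_blocks):
    """Map a byte offset within a file to (physical block, offset within block)."""
    q, r = divmod(logical_address, BLOCK_SIZE)  # Q = LA div 512, R = LA mod 512
    if q >= length_blocks:
        raise ValueError("logical address is beyond the end of the file")
    return start_block + q, r

if __name__ == "__main__":
    # A file that starts at block 10 and is 5 blocks long.
    print(contiguous_map(1300, start_block=10, length_blocks=5))
```

Because the physical block is computed directly from the start address, no disk access is needed to locate it, which is why contiguous allocation supports random access efficiently.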

  29. Contiguous Allocation (cont’d) • Pros • Simple • Supports random access efficiently • Minimal disk head seeks → fast • Cons? • External fragmentation • Files may not be able to grow

  30. Linked Allocation • Each file is a linked list of blocks • Blocks could be anywhere • Each block has a pointer to the next block • Need start block and end block (to append to file) • (Figure: each block holds a pointer followed by data)

  31. Linked Allocation (cont’d) • Mapping of logical addresses: LA / 511 gives quotient Q and remainder R • Physical block is at the Qth location in the chain • But how do we get to it? Traverse the chain! • Offset within block = R + 1 • Assume the pointer takes 1 byte and the block size is 512 bytes, so each block holds 511 bytes of data
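The chain traversal can be sketched as follows, under the slide's assumptions (512-byte blocks, a 1-byte pointer stored at the start of each block). The `next_block` dictionary is a hypothetical stand-in for reading each block's pointer from disk, and the block numbers are invented for the example.

```python
BLOCK_SIZE = 512
POINTER_SIZE = 1                               # pointer occupies the first byte of each block
DATA_PER_BLOCK = BLOCK_SIZE - POINTER_SIZE     # 511 data bytes per block

def linked_map(logical_address, start_block, next_block):
    """Map a byte offset to (physical block, offset within block) by walking the chain."""
    q, r = divmod(logical_address, DATA_PER_BLOCK)
    block = start_block
    for _ in range(q):                 # one pointer lookup (a disk read) per hop
        block = next_block[block]
        if block is None:
            raise ValueError("logical address is beyond the end of the file")
    return block, r + POINTER_SIZE     # skip over the pointer byte

if __name__ == "__main__":
    # Hypothetical file occupying blocks 9 -> 16 -> 1 -> 10 -> 25.
    chain = {9: 16, 16: 1, 1: 10, 10: 25, 25: None}
    print(linked_map(1300, start_block=9, next_block=chain))
```

Reaching the Qth block costs Q pointer lookups, which is the reason random access is expensive with linked allocation.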

  32. Linked Allocation (cont’d) • Pros • No waste of space (except for pointers) • Simple: need only start and end addresses • Supports dynamic growing of files • Cons • No random access (or very costly to support) • Reliability: one block is corrupted, the chain is broken

  33. Indexed Allocation • Bring all pointers together into an index block

  34. Indexed Allocation (cont'd) • Mapping of logical addresses: LA / 512 gives quotient Q and remainder R • Q = displacement into index block • R = offset within the block • Pros • Supports random access • Supports dynamic growing of files • No external fragmentation • Cons • Overhead of index blocks • A file of one or a few data blocks needs an index block • How do we choose the size of index blocks?

  35. Indexed Allocation (cont'd) • First, consider a file with one index block • Assume each pointer takes 4 bytes, and block size is 512 bytes • What is the maximum file size supported? • Index block may have up to 512/4 = 128 entries → max file size = 128 * 512 bytes = 64 KB • Now how do we support larger files? • Increase size of index blocks → waste space for small files • Better solutions?

  36. Indexed Allocation (cont’d) • Linked index blocks • Last word in index block points to another index block • May need to traverse the index linked list (long access time) • Multilevel index • First-level index block points to a set of second-level index blocks which refer to data blocks • Shorter access time but more space overhead • Combined (used in Unix File System) • Multilevel and linked • Each file has an index block (inode), which contains • Pointers that point to data blocks directly (for small files) • Pointers that point to index blocks, which in turn may point to either data blocks or another level of index blocks • UNIX supports up to three levels of index blocks

  37. Combined Scheme: UNIX inode • Assume block size of 4 KB, 4-byte pointers, 12 direct entries, 1 single, 1 double, and 1 triple indirect: what is the max file size supported? • (12 + 1024 + 1024*1024 + 1024*1024*1024) * 4 KB >> what a 32-bit file pointer can address (= 4 GB)!
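The arithmetic on this slide can be checked with a few lines of code. This is a sketch under the slide's stated assumptions; the function name `max_file_blocks` and its parameters are illustrative.

```python
def max_file_blocks(block_size, pointer_size, direct, indirect_levels):
    """Number of data blocks addressable with `direct` direct pointers plus one
    pointer each for single, double, ... indirect blocks up to `indirect_levels`."""
    ptrs_per_block = block_size // pointer_size
    indirect = sum(ptrs_per_block ** level for level in range(1, indirect_levels + 1))
    return direct + indirect

if __name__ == "__main__":
    block_size = 4 * 1024
    blocks = max_file_blocks(block_size, pointer_size=4, direct=12, indirect_levels=3)
    size_bytes = blocks * block_size
    print(blocks, size_bytes, size_bytes > 2 ** 32)
```

With these numbers the inode can address a little over 4 TB of data, far more than the 4 GB reachable through a 32-bit file pointer. The same function with 512-byte blocks, 4-byte pointers, no direct entries, and one level gives the 128-block (64 KB) limit of a single index block.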

  38. How Do We Know Where Free Blocks Are? • Bit map • Every block has a bit: 0 = occupied, 1 = free • 00011110 01100000 00001111 1………..1 • Simple to implement • Easy to find contiguous blocks • Supported by hardware • Single instruction to find offset of first bit with value 1 in a word (of 32 bits) → fast searching • Disadvantages • Bit map is stored on disk → slow to access • Solution: cache it in memory • Bit maps are not small for large disks → waste of space • 40-GB disk with 1-KB blocks → 40 M blocks → 5-MB bitmap • This makes it difficult to cache the entire bitmap
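A sketch of the bitmap search and of the size calculation. It assumes 32-bit words with block 0 stored in the leftmost (most significant) bit, matching the way the bit string is written on the slide; the function names are illustrative, and Python's `int.bit_length` stands in for the hardware "find first set bit" instruction.

```python
WORD_BITS = 32

def first_free_block(bitmap_words):
    """Return the number of the first free block (bit value 1), or None if none is free."""
    for index, word in enumerate(bitmap_words):
        if word != 0:  # skip words whose blocks are all occupied
            # Offset of the first 1 bit, counting from the leftmost bit of the word.
            offset = WORD_BITS - word.bit_length()
            return index * WORD_BITS + offset
    return None

def bitmap_bytes(disk_bytes, block_bytes):
    """Size of a one-bit-per-block bitmap, rounded up to whole bytes."""
    blocks = disk_bytes // block_bytes
    return (blocks + 7) // 8

if __name__ == "__main__":
    # 0x1E600FFF is 00011110 01100000 00001111 11111111 in binary.
    print(first_free_block([0, 0x1E600FFF]))
    print(bitmap_bytes(40 * 2**30, 2**10))
```

The second function reproduces the slide's estimate: 40 M one-bit entries occupy 5 MB.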

  39. How Do We Know Where Free Blocks Are? • Linked List • No waste of disk space • But, not easy to get contiguous space

  40. Disk Scheduling • Processes issue disk read/write requests • Kernel maps these requests to physical block addresses • These requests are sent to the disk controller • Problem: If there are multiple outstanding requests (in a disk queue), which one should be serviced first? • Objectives • Fast disk access time • High disk bandwidth (#bytes/sec transferred between disk and memory) • Fairness (maybe!) • Before presenting scheduling algorithms, let us understand the structure and operation of magnetic disks

  41. Disk Physical Structure • Several platters, each divided into circular tracks, which are subdivided into sectors • Head moves horizontally from one track to another • Disk rotates at high speed (60 to 200 times/sec) • Tracks accessed at the same head position make a cylinder • Drive can be directly attached to the computer via an I/O bus (EIDE, ATA, SCSI), or it could be attached through the network (iSCSI)

  42. Disk Logical Structure • Disk is viewed as a one-dimensional array of logical blocks • The logical block is the smallest unit of transfer • Block = sector • The array of blocks is mapped into sectors of the disk sequentially: • Block 0 is at the first sector of the first track on the outermost cylinder • Mapping proceeds in order through that track, • Then the rest of the tracks in that cylinder, • Then through the rest of the cylinders from outermost to innermost • Block Address: <cylinder, track, sector>
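The sequential mapping from a logical block number to a <cylinder, track, sector> address can be sketched as below. This assumes an idealized geometry with the same number of sectors on every track and cylinder 0 as the outermost cylinder; real drives vary the sector count per zone, so the geometry values here are purely hypothetical.

```python
def block_to_address(block, tracks_per_cylinder, sectors_per_track):
    """Map a logical block number to (cylinder, track, sector), filling each track,
    then each track of the cylinder, then moving to the next cylinder inward."""
    blocks_per_cylinder = tracks_per_cylinder * sectors_per_track
    cylinder, rest = divmod(block, blocks_per_cylinder)
    track, sector = divmod(rest, sectors_per_track)
    return cylinder, track, sector

if __name__ == "__main__":
    # Hypothetical geometry: 4 tracks per cylinder, 63 sectors per track.
    for b in (0, 62, 63, 252, 300):
        print(b, block_to_address(b, tracks_per_cylinder=4, sectors_per_track=63))
```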

  43. Disk Operation • Accessing (reading/writing) a block • Move the head to the desired track (seek time) • Wait for the desired sector to rotate under the head (rotational latency time) • Transfer the block to a local buffer, then to main memory (transfer time) • We try to minimize the seek time, which is proportional to the seek distance (distance moved by the head)

  44. Disk Scheduling Algorithms • Several algorithms exist to schedule the servicing of disk I/O requests • FCFS • SSTF • SCAN, C-SCAN • LOOK, C-LOOK • We illustrate them with a request queue (cylinders 0 to 199): 98, 183, 37, 122, 14, 124, 65, 67 • Assume initial head position at cylinder 53

  45. First Come First Served • Find the total head movements to service the request queue: 98, 183, 37, 122, 14, 124, 65, 67 • Let us work it out, starting at cylinder 53: 45 + 85 + 146 + 85 + 108 + 110 + 59 + 2 • Total head movements = 640 cylinders
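The FCFS total can be verified with a minimal sketch; the function name `fcfs_total` is illustrative, and the head is assumed to start at cylinder 53 as stated on the previous slide.

```python
def fcfs_total(head, requests):
    """Total head movement (in cylinders) when requests are served in arrival order."""
    total = 0
    for cylinder in requests:
        total += abs(cylinder - head)  # distance of this seek
        head = cylinder
    return total

if __name__ == "__main__":
    queue = [98, 183, 37, 122, 14, 124, 65, 67]
    print(fcfs_total(53, queue))  # prints 640
```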

  46. Shortest Seek Time First • Select request with minimum seek time from current head position • Total head movements = 236 cylinders • May cause starvation of some requests
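A sketch of SSTF on the same queue, with the head starting at cylinder 53. The function name `sstf` is illustrative, and how ties are broken is an assumption (the request that appears first in the queue wins); no tie occurs in this example.

```python
def sstf(head, requests):
    """Serve the pending request closest to the current head position each time.
    Returns (service order, total head movement in cylinders)."""
    pending = list(requests)
    order, total = [], 0
    while pending:
        nearest = min(pending, key=lambda c: abs(c - head))
        total += abs(nearest - head)
        head = nearest
        pending.remove(nearest)
        order.append(nearest)
    return order, total

if __name__ == "__main__":
    queue = [98, 183, 37, 122, 14, 124, 65, 67]
    print(sstf(53, queue))
```

The far request at cylinder 183 is served last, which hints at why a steady stream of nearby requests could starve distant ones.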

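  47. SCAN • Disk arm starts at one end and moves toward the other end, servicing requests • When it gets to the other end, movement is reversed • Total head movements = 236 cylinders if the arm first sweeps from cylinder 53 down to cylinder 0 and then up to 183 (53 + 183) • The commonly quoted 208 cylinders assumes the arm reverses at cylinder 14, the last request in that direction, which is LOOK behavior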
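The head movement for a sweep can be computed with the sketch below. It assumes the arm starts at cylinder 53 and moves toward cylinder 0 first; the function name and the `go_to_end` flag are illustrative. With `go_to_end=True` the arm travels all the way to cylinder 0 before reversing (SCAN as described); with `go_to_end=False` it reverses at the last request in that direction (LOOK).

```python
def sweep_total(head, requests, low_end=0, go_to_end=True):
    """Total head movement when the arm first moves toward lower cylinders,
    then reverses and moves toward higher cylinders."""
    lower = [c for c in requests if c <= head]
    upper = [c for c in requests if c > head]
    if go_to_end:
        turn = low_end                          # SCAN: travel to the end of the disk
    else:
        turn = min(lower) if lower else head    # LOOK: stop at the last request
    total = head - turn                         # downward sweep
    if upper:
        total += max(upper) - turn              # upward sweep to the farthest request
    return total

if __name__ == "__main__":
    queue = [98, 183, 37, 122, 14, 124, 65, 67]
    print(sweep_total(53, queue, go_to_end=True))
    print(sweep_total(53, queue, go_to_end=False))
```

For this queue the full sweep to cylinder 0 costs 236 cylinders, while reversing at cylinder 14 costs 208.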

  48. Circular SCAN (C-SCAN) • Provides a more uniform wait time than SCAN • The head moves from one end to the other, servicing requests as it goes • When it reaches the other end, however, it immediately returns to the beginning of the disk, without servicing any requests on the return trip • Treats cylinders as a circular list that wraps around from the last cylinder to the first one

  49. C-SCAN (cont’d)

  50. C-LOOK • Version of C-SCAN • Arm only goes as far as the last request in its direction of travel • Then returns immediately to the first pending request at the other end, without going all the way to the end of the disk
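C-SCAN and C-LOOK can be compared on the example queue with the sketch below. Several choices here are assumptions made for illustration: the head starts at cylinder 53 and moves toward higher cylinders, the disk spans cylinders 0 to 199, and the return seek is counted in the total even though no requests are served during it.

```python
def circular_sweep(head, requests, first=0, last=199, look=False):
    """Return (service order, total head movement) for C-SCAN (look=False)
    or C-LOOK (look=True), sweeping toward higher cylinders."""
    upper = sorted(c for c in requests if c >= head)
    lower = sorted(c for c in requests if c < head)
    order = upper + lower          # wrap around: low cylinders are served after the jump
    top = upper[-1] if upper else head

    if not lower:
        # Nothing behind the head: a single upward sweep is enough (a simplification).
        return order, top - head

    if look:
        # C-LOOK: go up to the last request, jump back to the lowest request.
        total = (top - head) + (top - lower[0]) + (lower[-1] - lower[0])
    else:
        # C-SCAN: go to the last cylinder, return to the first, then sweep up.
        total = (last - head) + (last - first) + (lower[-1] - first)
    return order, total

if __name__ == "__main__":
    queue = [98, 183, 37, 122, 14, 124, 65, 67]
    print(circular_sweep(53, queue, look=False))
    print(circular_sweep(53, queue, look=True))
```

Both variants serve the requests in the same order; C-LOOK simply avoids the trips to the physical ends of the disk.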
