530 likes | 552 Views
Explore I/O subsystems from OS responsibilities to device drivers, and delve into disk storage systems, including important concepts like file abstraction, persistence, seek optimization, and more.
E N D
Review – I/O subsystems • OS responsibilities to device drivers • Device API, buffering, naming, interrupt handling • Installing device drivers • Static vs. dynamic • Discovering devices that are present • Polling and identification • Bitmap display device • Bitblt to “paint” characters on screen • Window management File Systems
Network I/O & Project #3 • Socket abstraction • An I/O stream to a network connection • Connection: • A serial conversation between processes on separate machines • Separate from all other connections • Project #3: simple web server • Listen on a port; open a socket; make a connection File Systems
Review – Disks • Implementer of File abstraction • Storage of large amounts of data for very long times • Persistence, reliability • Controlled like I/O devices, but integral part of information storage subsystem • Rapidly increasing capacities, dropping prices • $0.5–$6.0 per gigabyte • Slowly improving transfer rates, seek performance • Only a factor of 5-10 in three decades! File Systems
Review – Disks (continued) • Organized into cylinders, tracks, sectors • Random access • Any sector can be read & written independently of any other • Very high bandwidth for consecutive reads or writes • Seek time is (often) dominating factor in performance • Bad blocks are a fact of life • Most detected during formatting • Some occur during operation • Controller or OS must step around them • Seek optimization algorithms • Popular study topics, less popular in real systems • Long seek queues system is out of balance File Systems
Files and File Systems CS-3013 & CS-502Operating Systems File Systems
File(an abstraction) • A (potentially) large amount of information or data that lives a (potentially) very long time • Often much larger than the memory of the computer • Often much longer than any computation • Sometimes longer than life of machine itself • (Usually) organized as a linear array of bytes or blocks • Internal structure is imposed by application • (Occasionally) blocks may be variable length • (Often) requiring concurrent access by multiple processes • Even by processes on different machines! File Systems
File Systems and Disks • User view • File is a named,persistent collection of data • OS & file system view • File is collection of disk blocks • File System maps file names and offsets to disk blocks • Outline for this section • Files • Directories • Implementation of Files • Implementation of Directories File Systems
Fundamental ambiguity • Is the file the “container of the information” or the “information” itself? • Almost all systems confuse the two. • Almost all people confuse the two. File Systems
Example – Suppose that you e-mail me a document • Later, how do either of us know that we are using the same version of the document? • Windows/Outlook/Exchange/MacOS: • Time-stamp is a pretty good indication that they are • Time-stamps preserved on copy, drag and drop, transmission via e-mail, etc. • Unix/Linux • By default, time-stamps not preserved on copy, ftp, e-mail, etc. • Time-stamp associated with container, not with information File Systems
Rule of Thumb • Almost always, application thinks in terms of the information • Most systems think in terms of containers Professional Guidance: Be aware of the distinction, even when the system is not File Systems
Name: Although the name is not always what you think it is! Type: May be encoded in the name (e.g., .cpp, .txt) Dates: Creation, updated, last accessed, etc. (Usually) associated with container Better if associated with content Size: Length in number of bytes; occasionally rounded up Protection: Owner, group, etc. Authority to read, update, extend, etc. Locks: For managing concurrent access … Attributes of Files File Systems
File Metadata • Information about a file • Maintained by the file system • Separate from file itself • Possibly attached to the file • E.g., in block #-1 • Used by OS in variety of ways • Occasionally used by application File Systems
File Types File Systems
Non-attribute of files – Location • Example 1: mv ~lauer/project1.doc ~cs502/public_html/S06 • Example 2: • System moves file from disk block 10,000 to disk block 20,000 • System restores a file from backup • May or may not be reflected in metadata File Systems
Attribute anomaly – Unix File Naming ln ~lauer/project1.doc ~cs502/public_html/S06 ln ~lauer/project1.doc NewProject1.doc • Unix hard links allow one file to have more than one name and/or location • The real name of a Unix file is its i-node – an internal name known only to the OS; points to metadata • Hard links are not used very often in modern Unix practice • Usually safe to regard last element of path as name File Systems
Operations on Files • Open, Close • Gain or relinquish access to a file • OS returns a file handle – an internal data structure letting it cache internal information needed for efficient file access • Read, Write, Truncate • Read: return a sequence of n bytes from file • Write: replace n bytes in file, and/or append to end • Truncate: throw away all but the first n bytes of file • Seek, Tell • Seek: reposition file pointer for subsequent reads and writes • Tell: get current file pointer • Create, Delete: • Conjure up a new file; or blow away an existing one File Systems
File – a very powerful abstraction • Documents, code • Databases • Very large, possibly spanning multiple disks • Streams • Input, output, keyboard, display • Pipes, network connections, … • Virtual memory backing store • … File Systems
File Access Methods • Sequential access • Read all bytes/records from the beginning • Cannot jump around, could possibly rewind or back up • Convenient when medium was magnetic tape or punched cards • Random access • Bytes/records read in any order • Essential for data base systems • Read can be … • move file pointer (seek), then read or … • read and then move file marker • Keyed or indexed access – access items in file based on the contents of an (part of an) item in the file • Provided in older commercial operating systems (IBM ISAM) • (Usually) handled separately by database applications in modern systems File Systems
Questions? File Systems
Directory – A Special Kind of File • A tool for users & applications to organize and find files • User-friendly names • Names that are meaningful over long periods of time • The data structure for OS to locate files (i.e., containers) on disk File Systems
Directory structures • Single level • One directory per system, one entry pointing to each file • Small, single-user or single-use systems • PDA, cell phone, etc. • Two-level • Single “master” directory per system • Each entry points to one single-level directory per user • Uncommon in modern operating systems • Hierarchical • Any directory entry may point to • Individual file • Another directory • Used in most modern operating systems File Systems
Directory Considerations • Efficiency – locating a file quickly. • Naming – convenient to users. • Separate users can use same name for separate files. • The same file can have different names for different users. • Names need only be unique within a directory • Grouping – logical grouping of files by properties • e.g., all Java programs, all games, … File Systems
Directory Organization – Hierarchical • Most systems support idea of current (working) directory • Absolute names – fully qualified from root of file system • /usr/group/foo.c • Relative names – specified with respect to working directory • foo.c • A special name – the working directory itself • “.” • Modified Hierarchical – Acyclic Graph (no loops) and General Graph • Allow directories and files to have multiple names • Links are file names (directory entries) that point to existing (source) files File Systems
Links • Hard links: bi-directional relationship between file names and file • A hard link is directory entry that points to a source file’s metadata • Metadata maintains reference count of the number of hard links pointing to it – link reference count • Link reference count is decremented when a hard link is deleted • File data is deleted and space freed when the link reference count goes to zero • Symbolic (soft) links: uni-directional relationship between a file name and the file • Directory entry points to new metadata • Metadata points to the source file • If the source file is deleted, the link pointer is invalid File Systems
Path Name Translation • Assume that I want to open “/home/lauer/foo.c” fd = open(“/home/lauer/foo.c”, O_RDWR); • File System does the following • Opens directory “/” – the root directory is in a known place on disk • Search root directory for the directory home and get its location • Open home and search for the directory lauer and get its location • Open lauer and search for the file foo.c and get its location • Open the file foo.c • Note that the process needs the appropriate permissions • File Systems spend a lot of time walking down directory paths • This is why open calls are separate from other file operations • File System attempts to cache prefix lookups to speed up common searches File Systems
Directory Operations • Create: • Make a new directory • Add, Delete entry: • Invoked by file create & destroy, directory create & destroy • Find, List: • Search or enumerate directory entries • Rename: • Change name of an entry without changing anything else about it • Link, Unlink: • Add or remove entry pointing to another entry elsewhere • Introduces possibility of loops in directory graph • Destroy: • Removes directory; must be empty File Systems
More on Directories • Fundamental mechanism for interpreting file names in an operating system • Orphan: a file not named in any directory • Cannot be opened by any application (or even OS) • (Often) does not even have name! • Tools • FSCK – check & repair file system, find orphans • Delete_on_close attribute • Special directory entry: “..” parent in hierarchy • Essential for maintaining integrity of directory system • Useful for relative naming File Systems
Break File Systems
Implementation of Files • Map file abstraction to physical disk blocks • Some goals • Efficient in time, space, use of disk resources • Fast enough for application requirements • Effective for a wide variety of file sizes • Many small files (< 1 page) • Huge files (100’s of gigabytes, terabytes, spanning disks) • Everything in between File Systems
File Allocation Schemes • Contiguous • Blocks of file stored in consecutive disk sectors • Directory points to first entry • Linked • Blocks of file scattered across disk, as linked list • Directory points to first entry • Indexed • Separate index block contains pointers to file blocks • Directory points to index block File Systems
Contiguous Allocation • Ideal for large, static files • Databases, fixed system structures, OS code • CD-ROM, DVD • Simple address calculation • Block address sector address • Fast multi-block reads and writes • Minimize seeks between blocks • Prone to fragmentation when … • Files come and go • Files change size • Similar to unpaged virtual memory File Systems
Bad Block Management –Contiguous Allocation • Bad blocks must be concealed • Foul up the block-to-sector calculation • Methods • Spare sectors in each track, remapped by formatting • Look-aside list of bad sectors • Check each sector request against hash table • If present, substitute a replacement sector behind the scene • Handling • Disk controller, invisible to OS • Lower levels of OS file system; invisible to application File Systems
Contiguous Allocation – Extents • Extent: a contiguously allocated subset of a file • Directory entry contains • (For file with one extent) the extent itself • (For file with multiple extents) pointer to an extent block describing multiple extents • Advantages • Speed, ease of address calculation of contiguous file • Avoids (some of) the fragmentation issues • Can be extended to support files across multiple disks • Disadvantages • Degenerates to indexed allocation in Unix-like systems File Systems
Blocks scattered across disk Each block contains pointer to next block Directory points to first and last blocks Sector header: Pointer to next block ID and block number of file Linked Allocation File Systems
Linked Allocation • Advantages • No space fragmentation! • Easy to create, extend files • Ideal for lots of small files • Disadvantages • Lots of disk arm movement • Space taken up by links • Sequential access only! File Systems
Variation on Linked Allocation – File Allocation Table • Instead of a link oneach block, put alllinks in one table • the FAT (File Allocation Table) • One entry per physical block in disk • Directory points to first & last blocks of file • Each block points to next block (or EOF) File Systems
FAT File Systems • Advantages • Advantages of Linked File System • FAT can be cached in memory • Searchable at CPU speeds, pseudo-random access • Disadvantages • Limited size, not suitable for very large disks • FAT cache describes entire disk, not just open files! • Not fast enough for large databases • Used in MS-DOS, early Windows systems File Systems
Disk Defragmentation • Re-organize blocks in disk so that file is (mostly) contiguous • Link or FAT organization preserved • Minimizes disk arm movement during sequential accesses File Systems
Bad Block Management –Linked and FAT Systems • In OS:– format all sectors of disk • Don’t reserve any spare sectors • Allocate bad blocks to a hidden file for the purpose • If a block becomes bad, append to the hidden file • Advantages • Very simple • No look-aside or sector remapping needed • Totally transparent without any hidden mechanism File Systems
Indexed Allocation • i-node: • Part of file metadata • Data structure lists the sector address of each block of file • Advantages • True random access • Only i-nodes of open files need to be cached • Supports small and large files File Systems
Unix i-nodes • Direct blocks: • Pointers to first n sectors • Single indirect table: • Extra block containing pointers to blocks n+1 .. n+m • Double indirect table: • Extra block containing single indirect blocks • … File Systems
Indexed Allocation • Access to every block of file is via i-node • Bad block management • Same as Linked/FAT systems • Disadvantage • Not as fast as contiguous allocation for large databases • Requires reference to i-node for every access vs. • Simple calculation of block to sector address File Systems
Questions? File Systems
Implementation of Directories • A list of [name, information] pairs • Must be scalable from very few entries to very many • Name: • User-friendly, variable length • Any language • Fast access by name • Information: • File metadata (itself) • Pointer to file metadata block (or i-node) on disk • Pointer to first & last blocks of file • Pointer to extent block(s) • … File Systems
name1 attributes name2 attributes name3 attributes name4 attributes … … Very Simple Directory • Short, fixed length names • Attribute & disk addresses contained in directory • MS-DOS, etc. File Systems
name1 name2 name3 name4 … Simple Directory i-node i-node i-node i-node Data structurescontaining attributes • Short, fixed length names • Attributes in separate blocks (e.g., i-nodes) • Attribute pointers are disk addresses (or i-node numbers) • Older Unix versions File Systems
attributes attributes attributes attributes … … name1 longer_name3 very_long_name4 name2 … More Interesting Directory • Variable length file names • Stored in heap at end • Modern Unix, Windows • Linear or logarithmic search for name • Compaction needed after • Deletion, Rename File Systems
Very Large Directories • Hash-table implementation • Each hash chain like a small directory with variable-length names • Must be sorted for listing File Systems
Free Block Management • Bitmap • Very compact on disk • Expensive to search • Supports contiguous allocation • Free list • Linked list of free blocks • Each block contains pointer to next free block • Only head of list needs to be cached in memory • Very fast to search and allocate • Contiguous allocation vary difficult File Systems
Free Block ManagementBit Vector 0 1 2 n-1 … 0 block[i] free 1 block[i] occupied bit[i] = Free block number calculation (number of bits per word) * (number of 0-value words) + offset of first 1 bit File Systems