File Systems CSE451 Andrew Whitaker
Outline • File System Interface • The programmer/user’s perspective • File System Implementation
File System Goal #1 • Allow a single disk (or partition) to be treated as many smaller storage containers • Files can have arbitrary size • Files can grow and shrink • Size is not stated up front
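File System Goal #2 • Provide a hierarchical name-space for referring to files • Key idea: directories as containers for files [Figure: directory tree rooted at /, showing home/, var/, tmp/, usr/ and the entries chris, andrew, kris, with one “path” through the tree labeled]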
File System Goal #3 • Protected sharing of information • Allow users / programs to share data • Provide access control mechanisms to limit sharing
drwxr-xr-x 4 gaetano  www     4096 Mar 15 2005 sewpc
drwxrwx--x 4 zahorjan www     4096 Mar 15 2005 software
drwxrwxr-x 9 levy     www     4096 Mar 16 2005 sosp16
-rw------- 1 lazowska www     2006 Oct  9 1998 staff
drwxrwxr-x 3 beame    ctheory 4096 Jun  1 2002 stoc96
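The mode strings in the listing pack a file type and three permission triples (owner, group, other) into one field. As a rough illustration, the sketch below uses Python's standard `stat` module to build two numeric modes and render them the way `ls -l` does. The octal values are chosen here to match the "sewpc" and "staff" entries, and the helper names `describe` and `other_can_read` are made up for this example.

```python
import stat

# Directory: owner may read/write/search, everyone else may read/search.
# Chosen to match the "sewpc" entry in the listing.
dir_mode = stat.S_IFDIR | 0o755

# Regular file private to its owner, matching the "staff" entry.
file_mode = stat.S_IFREG | 0o600

def describe(mode):
    """Render a numeric mode as the 10-character string shown by `ls -l`."""
    return stat.filemode(mode)

def other_can_read(mode):
    """True if users outside the owner and the group may read the file."""
    return bool(mode & stat.S_IROTH)

print(describe(dir_mode), other_can_read(dir_mode))
print(describe(file_mode), other_can_read(file_mode))
```

The access-control check itself is a bit test against the relevant triple, which is why it is cheap enough to run on every open.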
Workload Characteristics • Most files are small • Median size ≈ 4 KB • A few files are very large • A “heavy-tailed” distribution • Most files are read sequentially • Many files are quickly deleted • Windows NT: 80% of newly created files are deleted within 4 seconds
File System Implementation • Let’s start simple: • No directories • All files are at the “root” • Files are identified by a unique number
Blocks and Sectors • Disk exposes sectors (512 bytes) • Files are built from blocks of 1+ sectors • File system maps from “virtual” blocks (within a file) to physical disk blocks [Figure: the blocks of file 1 and file 2 mapped onto blocks of the disk]
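The arithmetic behind blocks and sectors can be sketched in a few lines. The 512-byte sector comes from the slide; the choice of 8 sectors per block (4 KB blocks) and the function names `locate` and `sectors_of` are assumptions made only for this example.

```python
SECTOR_SIZE = 512        # bytes per sector, as stated on the slide
SECTORS_PER_BLOCK = 8    # assumption for this sketch: 4 KB blocks
BLOCK_SIZE = SECTOR_SIZE * SECTORS_PER_BLOCK

def locate(file_offset):
    """Split a byte offset within a file into (virtual block #, offset in block)."""
    return divmod(file_offset, BLOCK_SIZE)

def sectors_of(physical_block):
    """Sector numbers that make up one physical block."""
    first = physical_block * SECTORS_PER_BLOCK
    return range(first, first + SECTORS_PER_BLOCK)

print(locate(10000))          # which virtual block holds byte 10000
print(list(sectors_of(5)))    # sectors backing physical block 5
```

Going from the virtual block number to a physical block is the file system's job and is not shown here; this only covers the fixed-size arithmetic on either side of that mapping.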
déjà vu: File Systems versus Paging • Similarity: chunk-based allocation • Address spaces are built from pages • Files built from blocks • These are often the same size! • OS maintains the mapping between virtual and physical resources • Page tables map from virtual page to physical frame • File system maps from “virtual” block to physical disk block
Differences Between Paging and File Systems • Persistence • File system state must survive restarts • Translation performance • Virtual address translation must be very fast (done at processor speed) • Block mapping can be much slower • Layout issues • Disk performance is highly influenced by layout • Paging performance is (largely) unaffected • Any page frame is as good as any other • Files rarely have holes
Basic Disk Layout • Data region contains actual file data • Metadata region contains information about files and the file system • Block size • Block mappings (virtual block to physical block) • Protection information [Figure: disk divided into a Metadata region and a Data region]
Approach #1: Pre-allocated Disk Partitions • On file creation, carve out a contiguous disk allocation • Record the partition info in the meta-data region Note: this is exactly like base/limit registers for memory
Problems With Static Partitions • Must know (or guess) file size in advance • Penalty for getting this wrong is high • Tends to create external fragmentation • Space between partitions • Major advantage: perfect data layout • Contiguous layout is optimal for sequential reads and writes [Figure: disk holding file 0 through file 4, each in its own contiguous partition]
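External fragmentation is easiest to see with numbers. The following is a minimal sketch, assuming a first-fit placement policy (the slides do not say which policy is used) and a made-up 100-block disk; `first_fit` and `free_blocks` are hypothetical helper names.

```python
def first_fit(extents, disk_blocks, size):
    """Find the first gap of `size` contiguous free blocks.

    extents: list of (start, length) pairs for files already on disk.
    Returns the starting block of the gap, or None if no gap is big enough.
    """
    cursor = 0
    for start, length in sorted(extents):
        if start - cursor >= size:
            return cursor
        cursor = start + length
    return cursor if disk_blocks - cursor >= size else None

def free_blocks(extents, disk_blocks):
    """Total number of unallocated blocks, contiguous or not."""
    return disk_blocks - sum(length for _, length in extents)

# Three files on a 100-block disk, leaving three separate 10-block gaps.
files = [(0, 30), (40, 30), (80, 10)]
print(free_blocks(files, 100))     # total free space
print(first_fit(files, 100, 10))   # a 10-block file still fits
print(first_fit(files, 100, 20))   # a 20-block file does not
```

In this layout 30 blocks are free in total, yet a 20-block file cannot be placed because no single gap is large enough. That is the "space between partitions" problem from the slide.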
Alternative to Static Partitions • Allocate disk space lazily • Allow for block allocations that are not contiguous • Eliminates external fragmentation • But, results in sub-optimal data layout [Figure: a file's blocks scattered across non-contiguous disk blocks] Challenge: must keep track of virtual-to-physical block mappings
Approach #2: Block Tables (Silberschatz: Index Blocks) • In the meta-data region, maintain an array of block tables • Block table maintains the mappings from virtual file blocks to physical disk blocks [Figure: array of block tables, one each for file 0, file 1, file 2, file 3, …]
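Possible Block Table Implementation [Figure: a block address is split into a virtual block # and an offset; the block table maps the virtual block # to a physical block #, which is combined with the same offset to form the physical address within the disk data region (Block 0, Block 1, Block 2, Block 3, Block 4, …)] What does this remind you of?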
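The translation in the figure can be sketched as a table lookup plus some arithmetic. This is an illustration only: the 4 KB block size, the example table contents, and the name `translate` are assumptions, and a real file system would keep the table on disk in the metadata region.

```python
BLOCK_SIZE = 4096  # assumption for this sketch

def translate(block_table, file_offset):
    """Map a byte offset within a file to a byte address in the data region.

    block_table[i] holds the physical block # backing virtual block i.
    """
    if file_offset < 0:
        raise ValueError("offset must be non-negative")
    vblock, offset = divmod(file_offset, BLOCK_SIZE)
    if vblock >= len(block_table):
        raise IndexError("offset is past the last mapped block")
    return block_table[vblock] * BLOCK_SIZE + offset

# Virtual blocks 0, 1, 2 live in physical blocks 7, 2, 9 (not contiguous).
table = [7, 2, 9]
print(translate(table, 0))
print(translate(table, 4100))
```

The offset passes through unchanged while the block number is replaced, which is the same split used by page-table translation.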
Analyzing Block Tables • This is very close to what UNIX does! • “Block table” is called an inode • One remaining problem: choosing the block table size • Small size prohibits large files • Large size wastes space for small files • Solution: multi-level block-tables • Allocate a small number of mappings in the inode • Allow for indirection to supply mappings for larger files
UNIX i-nodes (Unix Version 7) • Each i-node contains 13 pointers • The first 10 are “direct” • Pointers to real data blocks • The 11th pointer is a “single indirect block” • A pointer to a block full of pointers to real data blocks • The 12th pointer is a “doubly indirect block” • A pointer to a block full of pointers to blocks full of pointers to real data blocks • The 13th pointer is a “triply indirect block” • You get the idea…
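To get a feel for how far 13 pointers reach, here is a small sketch that counts mappable blocks and classifies a virtual block number by the level of indirection needed. The block and pointer sizes are parameters, not facts from the slides. The example values (512-byte blocks, 4-byte pointers, so 128 pointers per block) are an assumption for illustration, and the function names are hypothetical.

```python
DIRECT = 10  # number of direct pointers, from the slide

def pointers_per_block(block_size, pointer_size):
    """How many block pointers fit in one indirect block."""
    return block_size // pointer_size

def max_file_blocks(n):
    """Blocks reachable via direct, single, double and triple indirection."""
    return DIRECT + n + n**2 + n**3

def level_for(vblock, n):
    """Return 0 for a direct block, or 1, 2, 3 for the indirection depth."""
    if vblock < 0:
        raise ValueError("block number must be non-negative")
    if vblock < DIRECT:
        return 0
    vblock -= DIRECT
    for level in (1, 2, 3):
        if vblock < n**level:
            return level
        vblock -= n**level
    raise ValueError("block number exceeds what the i-node can map")

n = pointers_per_block(512, 4)   # assumed sizes, giving n = 128
print(max_file_blocks(n))
print([level_for(b, n) for b in (9, 10, 137, 138)])
```

Small files never leave level 0, so they pay no extra disk reads for indirection. Only large files pay for the deeper levels.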
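i-nodes, Visualized [Figure: an i-node with pointer slots labeled 0, 1, …, 10, 11, 12, with the later slots fanning out through indirect blocks to data blocks] Q: How is this different from multi-level page tables?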
Checkpoint • What we have • Arbitrary size files that can grow and shrink dynamically • What we don’t have • File names • Directories
Completing the File System • Let’s create special files that contain the mappings from file names to numbers • Let’s call these files “directories”
UNIX Directory Implementation • Directories are implemented as files • They contain mappings from file names to i-nodes • Directories can contain other directories • This gives us the file system hierarchy • The root directory has a well-known i-node
Path name translation • Let’s say you want to open “/one/two/three.txt”: fd = open("/one/two/three.txt", O_RDWR); • What goes on inside the file system? • Read the i-node for “/” • Read the directory contents for this i-node • Read the i-node for “one” • Read the directory contents for this i-node • Read the i-node for “two” • Read the directory contents for this i-node • Find the i-node for “three.txt” • Create an open-file entry for this i-node
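The walk above can be mimicked with a toy in-memory "disk". Everything in this sketch is hypothetical: the i-node numbers, the choice of 2 for the root, the `inodes` dictionary standing in for on-disk structures, and the name `resolve`. It only illustrates the component-by-component lookup and stops before the open-file entry is created.

```python
ROOT_INODE = 2  # assumption: the "well-known" root i-node number

# Toy disk: i-node number -> directory (name -> i-node number) or file bytes.
inodes = {
    2: {"one": 5},
    5: {"two": 8},
    8: {"three.txt": 13},
    13: b"hello",
}

def resolve(path):
    """Translate an absolute path to an i-node number, one component at a time."""
    if not path.startswith("/"):
        raise ValueError("this sketch handles absolute paths only")
    current = ROOT_INODE
    for name in path.split("/"):
        if not name:
            continue  # empty pieces come from the leading or doubled slashes
        directory = inodes[current]          # "read the i-node / directory contents"
        if not isinstance(directory, dict):
            raise NotADirectoryError(name)
        if name not in directory:
            raise FileNotFoundError(path)
        current = directory[name]            # follow the name -> i-node mapping
    return current

print(resolve("/one/two/three.txt"))
```

Each path component costs at least one i-node read and one directory read, which is one reason real systems cache recent name lookups.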
File Links • The same file can have multiple names • Because every file is uniquely identified by a number
Hard Link • A hard link is a mapping from a file name (path) to an i-node • Stored in a directory file • Each link refers to the same file • If “foo.txt” and “bar.txt” are hard links to the same i-node, open("foo.txt") is equivalent to open("bar.txt") • What happens on deletion? • Each i-node contains a reference count • On link deletion, decrement the ref count • When the count reaches zero, the OS releases the file
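The reference-count behavior can be observed from user space. The sketch below assumes a POSIX system and a temporary directory on a file system that supports hard links; `hard_link_demo` is a made-up name, while `os.link`, `os.unlink`, and the `st_ino` / `st_nlink` fields of `os.stat` are standard library calls.

```python
import os
import tempfile

def hard_link_demo():
    """Return (same_inode, links_before_unlink, links_after_unlink, data)."""
    with tempfile.TemporaryDirectory() as d:
        foo = os.path.join(d, "foo.txt")
        bar = os.path.join(d, "bar.txt")
        with open(foo, "w") as f:
            f.write("shared data")

        os.link(foo, bar)  # second name for the same i-node
        same_inode = os.stat(foo).st_ino == os.stat(bar).st_ino
        before = os.stat(foo).st_nlink

        os.unlink(foo)     # removes one name and decrements the count
        after = os.stat(bar).st_nlink
        with open(bar) as f:
            data = f.read()  # the file is still reachable through bar.txt
        return same_inode, before, after, data

print(hard_link_demo())
```

Deleting "foo.txt" only removes a directory entry. The data survives until the last link is gone.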
Soft Links • Problems with hard links: • They can’t span file systems (why?) • They can’t refer to directories (why?) • Soft links address these issues • A soft link is a file containing a complete path • When the OS encounters a soft link, it re-writes the path to include the linked location • Note: soft links do not modify the i-node ref count • This makes it possible to have “broken” soft links
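A broken soft link is easy to produce deliberately. This sketch again assumes a POSIX system where unprivileged users may create symbolic links; `soft_link_demo` is a hypothetical name, and the calls used (`os.symlink`, `os.readlink`, `os.path.islink`, `os.path.exists`) are from the standard library.

```python
import os
import tempfile

def soft_link_demo():
    """Return (link_stores_path, target_link_count, link_is_dangling)."""
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "target.txt")
        link = os.path.join(d, "link.txt")
        with open(target, "w") as f:
            f.write("data")

        os.symlink(target, link)
        # The link is a small file whose content is the target's path.
        stores_path = os.readlink(link) == target
        # Creating the soft link did not touch the target's reference count.
        link_count = os.stat(target).st_nlink

        os.unlink(target)
        # The link itself still exists, but following it now fails.
        dangling = os.path.islink(link) and not os.path.exists(link)
        return stores_path, link_count, dangling

print(soft_link_demo())
```

Because the link holds a path rather than an i-node number, it can name something on another file system, and nothing stops the target from disappearing underneath it.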
Summary • Files serve as a virtualized storage abstraction • Arbitrary size • Grow and shrink dynamically • The process of mapping from virtual to physical blocks resembles page tables • With some key differences • In UNIX, files are identified by number • Directories are files that map from names to numbers