CS 6560 Operating System Design Lecture 11 File Systems
References (please read)
Web
• http://www.tldp.org/LDP/tlk/fs/filesystem.html
• http://www.tldp.org/HOWTO/Filesystems-HOWTO.html
• http://www.linuxjournal.com/article.php?sid=2108
Also available in BB
• KLEIMAN, S. R. 1986. Vnodes: An architecture for multiple file system types in Sun UNIX. In USENIX Conference Proceedings (June). USENIX, Berkeley, Calif., 238-247
Building Understanding • An architecture describes components and connections among these components of a system. • For many systems, there are multiple ways to view a system. • Example: file systems
File System Levels of Abstraction • User Level: Files and directories that a user sees in a hierarchical naming space. • Mounted Device Level: A collection of devices, each of which holds a separate file system. • File System Level: i-node tables and associated data blocks. • Block Level: An array of data blocks. • Physical Level: Sectors spinning on a hard disk, CD-ROM, or DVD (and even static data on a USB device).
Files • Files are named units of persistent storage of information. (Here, persistent means that the names and the information do not disappear when the computer is shut down and then turned back on.) • Files can be structured, but we will follow the lead of Thompson and Ritchie and only consider the case where they are arrays of bytes. • Basic file operations are the classic Unix-like ones: open, close, read (so many bytes), write (so many bytes), seek, and execute. • In Unix-like systems, files exist independently of their names.
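The byte-array view and the basic operations above can be tried from user space with the POSIX calls open, write, lseek, read, and close. The following is a minimal sketch, not part of the lecture: the function name seek_and_read_demo and the /tmp/fsdemoXXXXXX template are made up for illustration. It also removes the file's name while the file is still open, which shows a file existing independently of its name.

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write "hello world" to a scratch file, seek to byte 6, and read 5 bytes
 * back into out (which must hold at least 6 bytes).
 * Returns the number of bytes read, or -1 on error.
 * The function name and the file name template are hypothetical. */
ssize_t seek_and_read_demo(char *out)
{
    char path[] = "/tmp/fsdemoXXXXXX";
    const char msg[] = "hello world";
    ssize_t n = -1;

    int fd = mkstemp(path);          /* creates and opens a unique file */
    if (fd < 0)
        return -1;
    unlink(path);                    /* drop the name; the open file lives on */

    if (write(fd, msg, strlen(msg)) == (ssize_t)strlen(msg) &&
        lseek(fd, 6, SEEK_SET) == 6) {
        n = read(fd, out, 5);        /* read "so many bytes" from position 6 */
        if (n >= 0)
            out[n] = '\0';
    }
    close(fd);
    return n;
}
```

Calling seek_and_read_demo with a small buffer should leave the bytes that start at offset 6 of the message in that buffer.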
Thompson Model: Single Naming Space • There is just one naming space in which files are named. • It is structured as a rooted directed graph (digraph) whose vertices are the files and whose edges are directory links.
Thompson Model: Mounting • This naming space is constructed by mounting devices together at directory links. • Each device has its own file system. • One file system serves as the root file system. Its root is the root of the entire digraph.
Thompson Model: File System • Within each file system, file names are also structured as a rooted directed graph (digraph) whose vertices are the files and whose edges are directory links. • Directories are files that supply and name the edges to this graph. Each directory contains a list of links that link a filename to an inode number. • The inode number (or i-number) uniquely identifies the file within a particular device (filesystem). The pair of device • Several filesystems can be mounted together to form the larger rooted digraph. The root of one file system is grafted to a directory in another.
Thompson Model: Internal Structure • Internally a Thompson file system consists of: • Boot block: can contain a boot program. • Superblock: a block that contains management data for the entire filesystem, including a list of free inodes. • I-node table: an indexed table of inodes (spans several blocks, with originally 64 bytes per inode). Each inode is a reusable structure that contains the attributes for one file, but no file name. Not all elements of the inode list are used; some are free. • Data blocks: contain the actual data.
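Because the i-node table is an indexed array of fixed-size entries, locating an inode on disk is simple arithmetic. The sketch below assumes 512-byte blocks, a table that starts at block 2 (after the boot block and the superblock), and inode numbers that start at 1. Only the 64-byte inode size comes from the slide; the other constants and all names are illustrative assumptions.

```c
#include <stdint.h>

enum {
    FS_BLOCK_SIZE     = 512, /* assumed block size in bytes */
    FS_INODE_SIZE     = 64,  /* original on-disk inode size (from the slide) */
    FS_ITABLE_START   = 2,   /* assumed: block 0 = boot block, block 1 = superblock */
    FS_INODES_PER_BLK = FS_BLOCK_SIZE / FS_INODE_SIZE
};

/* Block number that holds inode `ino` (inode numbers assumed to start at 1). */
uint32_t inode_block(uint32_t ino)
{
    return FS_ITABLE_START + (ino - 1) / FS_INODES_PER_BLK;
}

/* Byte offset of inode `ino` inside that block. */
uint32_t inode_offset(uint32_t ino)
{
    return ((ino - 1) % FS_INODES_PER_BLK) * FS_INODE_SIZE;
}
```

Under these assumptions eight inodes fit in one block, so inodes 1 through 8 live in block 2 and inode 9 starts block 3.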
Classic Unix File System [Figure: Thompson model of a filesystem]
Thompson Implementation of Directories • A directory is a list of 16-byte directory entries. • Each directory entry consists of: • i-node number (2 bytes) • name (14 characters max) • Directories were files that could be read. [Figure: directory entry layout: i-node number (2 bytes), file name (14 bytes)]
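The 16-byte entry described above maps directly onto a C structure, and a name lookup is a linear scan over an array of such entries. This is a sketch for illustration only: the names v_dirent and dir_lookup are invented here, and treating inode number 0 as an unused slot is an assumption about the on-disk convention.

```c
#include <stdint.h>
#include <string.h>

#define DIRSIZ 14                       /* maximum name length in bytes */

struct v_dirent {                       /* hypothetical name for the 16-byte entry */
    uint16_t d_ino;                     /* i-node number; 0 assumed to mean "unused" */
    char     d_name[DIRSIZ];            /* NUL-padded; no terminator if 14 chars long */
};

/* Scan n directory entries for `name`.
 * Returns the i-node number, or 0 if the name is absent or too long. */
uint16_t dir_lookup(const struct v_dirent *dir, size_t n, const char *name)
{
    char padded[DIRSIZ];
    size_t len = strlen(name);

    if (len > DIRSIZ)
        return 0;
    memset(padded, 0, sizeof padded);   /* build the NUL-padded on-disk form */
    memcpy(padded, name, len);

    for (size_t i = 0; i < n; i++) {
        if (dir[i].d_ino != 0 &&
            memcmp(dir[i].d_name, padded, DIRSIZ) == 0)
            return dir[i].d_ino;
    }
    return 0;
}
```

Note that the entry stores only a name and a number; every other attribute of the file lives in the inode that the number selects.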
Linux File systems • Linux Virtual File System (today) • Linux Ext2 (and Ext3) physical file systems
File System Case Study: Linux VFS • Linux Virtual File System • Serves as a common abstract interface between the system call interface and the actual file systems. • VFS provides uniform access to a large number of different real file systems such as MSDOS, VFAT, NTFS, Apple, OS/2, NFS, Ext2, Ext3, … • Supports special filesystems such as proc, pipefs, ramfs, tmpfs, sysfs. • Works with the buffer cache. • Supported by slab caches for inodes and directory links. • Grew from Sun Microsystems' vnode architecture (1986) (see KLEIMAN, S. R. 1986. Vnodes: An architecture for multiple file system types in Sun UNIX. In USENIX Conference Proceedings (June). USENIX, Berkeley, Calif., 238-247) (on BB)
Components of the FS [Figure: VFS with its directory cache and inode cache, above the real filesystems (EXT2, EXT3, VFAT, NFS, NTFS, proc), which sit above the buffer cache and the disk drivers]
Common File System Model • VFS uses a common file system model in which every file is accessed in the same way. • This is modeled after the original concepts of Thompson and Ritchie’s Unix file system. • It appears to users (via shells and application programs) as • Filesystems • Files • Directories • Paths • Symbolic links
Filesystems • Each filesystem • Represents one storage device for files. • Has a unique device id. • Appears in two ways • as a directed graph whose vertices are its files and edges are hard links. Internal nodes are called directories. • as a list of its files, indexed by inode number. • Has a root that is one of its own directories and when mounted has a mount point that points to a directory on another filesystem.
Files • Files have a type: regular file, directory, pipe, character device, block device, … • Files are essentially treated as linear arrays of bytes with open, close, read, write, seek, and lock operations (although not every operation is available for each file). • Each file has an inode that stores a set of attributes, which include inode number, permissions, time stamps, ownership, and type (including the type directory and the type symbolic link). • File attributes do not include the file name.
How these relate • Files are organized in non-overlapping filesystems. • Each file is associated with an inode number that uniquely identifies it within its filesystem.
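From user space, the device id and inode number of a file are visible through the POSIX stat call as the st_dev and st_ino fields. The helper below is a small sketch (the name same_file is made up) that uses this pair to decide whether two paths name the same file.

```c
#include <sys/stat.h>

/* Returns 1 if both paths resolve to the same file (same device id and
 * same inode number), 0 if they do not, and -1 if either stat call fails.
 * The function name is hypothetical. */
int same_file(const char *a, const char *b)
{
    struct stat sa, sb;

    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

Comparing the inode number alone would not be enough, because two files on different filesystems may carry the same inode number.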
Directories • Directories are special types of files. Each directory appears as an internal node of some filesystem. • Each directory has a set of directory entries consisting of two special internal links named “.” and “..” and all outgoing hard links. • The entry “.” links a directory to itself. • The entry “..” is an incoming link that specifies the parent directory. The parent directory has an entry which is an outgoing link to this directory. • Each outgoing link has a filename that is a string satisfying some rules regarding admissible characters and length. (These rules depend upon the filesystem type.) • Each entry has an inode number that uniquely identifies a file within the same filesystem. • Directories cannot be accessed with read and write operations; they are accessed through “opendir” and “readdir” operations. (This is a change from the original Thompson model.)
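A short sketch of the opendir and readdir interface mentioned above: the function walks the entries of one directory and counts the outgoing links, skipping the two special entries. The function name is invented for illustration, and whether readdir reports “.” and “..” at all can depend on the filesystem, so the code only skips them if they appear.

```c
#include <dirent.h>
#include <string.h>

/* Count the entries of a directory other than "." and "..".
 * Returns -1 if the directory cannot be opened.
 * The function name is hypothetical. */
long count_dir_entries(const char *path)
{
    DIR *d = opendir(path);
    struct dirent *e;
    long n = 0;

    if (d == NULL)
        return -1;
    while ((e = readdir(d)) != NULL) {
        /* e->d_name is the filename of this directory entry */
        if (strcmp(e->d_name, ".") != 0 && strcmp(e->d_name, "..") != 0)
            n++;
    }
    closedir(d);
    return n;
}
```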
Paths • Paths can be used as arguments to operations such as open, chdir, and mkdir. • Paths consist of filenames separated by “/”s. • They may begin with “/”, “.”, “..”, and “~” (the “~” is expanded by the shell, not by the kernel). • Paths are absolute (relative to the root) if they begin with “/”, or relative (relative to the current directory) if they don’t. • Paths beginning with “/” and containing no “..” and “.” define the tree. (Any “.” is ignored, but initial “.” is treated much like an absolute path.) (example?) • Paths may include symbolic links and mount points, as well as hard links (example?)
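As a purely lexical illustration of these rules, the sketch below classifies a path as absolute or relative and counts its filename components, ignoring empty components (from repeated or trailing slashes) and “.” components. It does not consult the filesystem, so symbolic links and mount points are not resolved. Both function names are made up.

```c
#include <stddef.h>

/* A path is absolute exactly when it begins with '/'. */
int path_is_absolute(const char *path)
{
    return path[0] == '/';
}

/* Count filename components, skipping empty components and ".".
 * A ".." component is counted like any other name. */
int path_component_count(const char *path)
{
    const char *p = path;
    int count = 0;

    while (*p != '\0') {
        while (*p == '/')                    /* skip separators */
            p++;
        const char *start = p;
        while (*p != '\0' && *p != '/')      /* scan one component */
            p++;
        size_t len = (size_t)(p - start);
        if (len == 0)                        /* only slashes were left */
            break;
        if (!(len == 1 && start[0] == '.'))  /* "." is ignored */
            count++;
    }
    return count;
}
```

For example, the path /usr//bin/./ls is absolute and has the three components usr, bin, and ls.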
Symbolic Links • Each symbolic link is a file that connects a directory in one filesystem to a file in a possibly different filesystem. • Symbolic links relate a directory entry to a path.
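That a symbolic link simply stores a path can be observed with the POSIX calls symlink, readlink, and lstat. In the sketch below, the function name and the /tmp/fslinkXXXXXX template are invented for illustration. The target does not have to exist, because the link holds a path and not an inode number.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create a symbolic link containing `target` inside a scratch directory,
 * read the stored path back into out, and check that lstat reports the
 * link itself as a symbolic link. Returns 0 on success, -1 on error.
 * The function name and directory template are hypothetical. */
int symlink_roundtrip(const char *target, char *out, size_t outlen)
{
    char dir[] = "/tmp/fslinkXXXXXX";
    char link_path[64];
    struct stat st;
    int ok = -1;

    if (outlen == 0 || mkdtemp(dir) == NULL)
        return -1;
    snprintf(link_path, sizeof link_path, "%s/link", dir);

    if (symlink(target, link_path) == 0) {
        ssize_t n = readlink(link_path, out, outlen - 1);
        if (n >= 0 && lstat(link_path, &st) == 0 && S_ISLNK(st.st_mode)) {
            out[n] = '\0';           /* readlink does not add a terminator */
            ok = 0;
        }
        unlink(link_path);
    }
    rmdir(dir);
    return ok;
}
```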
Directory Tree • The entire system of files is organized like a tree (digraph) for each process. Several nodes of this tree may share the same file due to multiple hard and symbolic links. • Each process has two access points to this tree: a root directory and a current directory. These correspond to paths that begin with “/” and “.” • System calls such as creat, open, chdir, mkdir, unlink (used by the rm command), and rmdir, together with the readdir operation, operate on this tree.
Mount Operations • The “mount” command (system call “mount”) mounts a filesystem specified by a block special device file and by a mount point. The “/etc/fstab” file assists with this and determines mounts at boot time. • Mounting adds the filesystem and attaches it at the mount point. • The “umount” command (system call “umount”) unmounts a filesystem. • Only the superuser can mount and umount filesystems.
VFS Internal Structure • Linux VFS has the following internal objects that exist in the virtual memory of the kernel. These objects have operation lists that consist of function pointers to functions in the real file systems. • superblock objects • inode objects • file objects • dentry objects • Linux VFS also has structure types that are not treated as objects. • vfsmount • file_system_type
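The idea of an object carrying an operation list of function pointers can be sketched in user-space C. Everything below is a toy model with invented names (demo_file, demo_file_ops, demo_vfs_read); it is not the kernel's actual definition. Two pretend filesystems supply different read functions, and the generic layer calls through the table without knowing which one it has.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

struct demo_file;                            /* toy stand-in for a file object */

struct demo_file_ops {                       /* toy operation list */
    size_t (*read)(struct demo_file *f, char *buf, size_t len);
};

struct demo_file {
    const struct demo_file_ops *ops;         /* filled in by the "real" filesystem */
    const char *data;                        /* backing bytes for this sketch */
    size_t pos;                              /* current file position */
};

/* Pretend filesystem A: returns the stored bytes unchanged. */
static size_t plain_read(struct demo_file *f, char *buf, size_t len)
{
    size_t left = strlen(f->data) - f->pos;
    size_t n = len < left ? len : left;

    memcpy(buf, f->data + f->pos, n);
    f->pos += n;
    return n;
}

/* Pretend filesystem B: returns the stored bytes in upper case. */
static size_t upper_read(struct demo_file *f, char *buf, size_t len)
{
    size_t n = plain_read(f, buf, len);

    for (size_t i = 0; i < n; i++)
        buf[i] = (char)toupper((unsigned char)buf[i]);
    return n;
}

const struct demo_file_ops plain_ops = { .read = plain_read };
const struct demo_file_ops upper_ops = { .read = upper_read };

/* Generic layer: dispatches through the operation list. */
size_t demo_vfs_read(struct demo_file *f, char *buf, size_t len)
{
    return f->ops->read(f, buf, len);
}
```

The design choice illustrated here is that the caller of demo_vfs_read never needs a switch on the filesystem type; adding a new filesystem means supplying a new operation list.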
VFS File Objects and Structures • superblock object • Represents an entire mounted filesystem • inode object • Represents a particular file in a mounted filesystem • file object • Represents an instance of an opened file • dentry object • Represents a path component (name, inode) • vfsmount • Represents a mount point • file_system_type • Represents a filesystem type
Superblock Objects • A superblock object stores information and operations on a mounted file system. • For disk-based file systems, this corresponds to the file system control block (FSCB) or superblock on the disk. • Information includes: block size, maximum file size, filesystem type, disk synch status, flags, mount point, reference count. • Operations act on the inodes (read, write, release, delete) in the filesystem and the superblock itself (release, write, get statistics). (see the textbook for details.) • The superblocks are organized in a list headed by a global variable “super_blocks” and a list for each file system type. • They can head lists of associated file objects and inode objects.
The super_operations Structure
struct super_operations {
    /* fill the structure */
    void (*read_inode) (struct inode *);
    int (*notify_change) (struct inode *, struct iattr *);
    void (*write_inode) (struct inode *);
    void (*put_inode) (struct inode *);
    void (*put_super) (struct super_block *);
    void (*write_super) (struct super_block *);
    void (*statfs) (struct super_block *, struct statfs *, int);
    int (*remount_fs) (struct super_block *, int *, char *);
};
Superblock operations
Inode Objects • An inode object stores information and operations about a specific file. • Each inode object belongs to a mounted filesystem. • Each inode is associated with an inode number that uniquely identifies it within its filesystem. • Inode information includes file attributes such as time, ownership, size (but no filename). • Operations include create new disk inode, lookup directory entry, create new inode object of various types, create a hard link, create a symbolic link, move files within filesystem, follow symbolic links, truncate files, check permissions. • All inode objects are contained in a kernel virtual memory slab cache called the inode cache. • Inode objects can head lists of dentries and buffers. • Inode objects can point to block or character device drivers.
File Objects • Each file object stores information and operations about an opened file and the process that opened it. • Among the information stored in a file object is a file position for reading and writing. • File operations include seek, read, write, read directory, ioctl, poll, memory map, open, release, flush, synch with disk, lock (see the book for details). • Each file object is associated with a dentry object and a mounted file system. • Each file object can be part of a doubly linked list.
Dentry Objects • A dentry object stores information about a hard link. • The dentry objects are organized in a container called the dentry cache. • Memory for dentry objects is managed by the memory manager’s slab allocator. • Information in a dentry object includes the filename as it appears as a component of paths. • Dentry objects have a state: used, unused, or invalid. • Each dentry object can point to an inode object and a superblock object. • Dentry operations include revalidate, hash (for fast lookup in the dcache), name comparison, delete dentry, release dentry. (See the textbook for details.)
The Dentry Cache • The dentry cache consists of dentry objects organized in three ways: • Active dentry objects organized like a tree with a root and parent-child relationships maintained at each dentry object. This corresponds to the absolute paths without “..” and “.”. • A least recently used list for memory management. • A hash table that provides fast lookup from path to dentry. • The dentry cache provides a front end to an inode cache.
VFS and Processes • Recall: Each process has an entry in the process table called a process descriptor, implemented as the “task_struct” type. • The entire process table is implemented as a linked list and also as a hash table. • Each process descriptor contains management (scheduling) information about the process and pointers to other structures including: tty (terminal driver), fs (virtual filesystem root and current directory entries), files (currently open files), mm (virtual memory descriptor), sig (signal handling info).
Process Descriptor [Figure: the process descriptor (scheduling info, process hierarchy info, fs, files, mm, sig, …) with its associated structures: fs_struct, files_struct (Open Files), mm_struct (Memory Descriptor), and signal_struct]
fs_struct • This table specifies the dentry objects of the process’ root and current directory. • It also contains the process’ umask (a bit mask used to automatically turn off permissions when creating files).
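The effect of the umask is plain bit arithmetic: every permission bit that is set in the mask is cleared from the mode requested at file creation. A minimal sketch, with a made-up function name:

```c
#include <sys/types.h>

/* Permission bits a new file actually receives: the requested mode with
 * every bit of the umask turned off. The function name is hypothetical. */
mode_t apply_umask(mode_t requested, mode_t mask)
{
    return requested & ~mask & 07777;
}
```

With the common mask 022, a request for mode 0666 yields 0644: the write permission for group and others is turned off.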
Process Descriptor’s fs_struct [Figure: the process descriptor (scheduling info, process hierarchy info, fs, files, mm, sig, …) and its fs_struct with root and current entries; the diagram also shows two vfsmount structures, two dentry objects, and two superblock objects]
files_struct • This table specifies the opened files of a process • It contains a pointer to an array of file objects, indexed by the file descriptor, returned from creating the file or inherited. • It contains bit maps to indicate which file objects are active and which are to be closed on exec. • It also contains the current and max number of file objects and the number of processes sharing this table.
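The table described above can be modeled in a few lines: an array indexed by file descriptor plus two bit maps. The sketch below is a user-space toy with invented names and a fixed size of 32 descriptors; the real files_struct differs. It follows the POSIX rule that a new descriptor is the lowest-numbered one not currently in use.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_FDS 32

struct demo_files {                   /* toy model, not the kernel structure */
    void    *fd[DEMO_MAX_FDS];        /* stand-ins for pointers to file objects */
    uint32_t open_fds;                /* bit i set: descriptor i is in use */
    uint32_t close_on_exec;           /* bit i set: close descriptor i on exec */
};

/* Install `file` at the lowest free descriptor; return it, or -1 if full. */
int demo_fd_install(struct demo_files *t, void *file)
{
    for (int i = 0; i < DEMO_MAX_FDS; i++) {
        if (!(t->open_fds & (UINT32_C(1) << i))) {
            t->open_fds |= UINT32_C(1) << i;
            t->fd[i] = file;
            return i;
        }
    }
    return -1;
}

/* Release descriptor i and clear its flags. */
void demo_fd_close(struct demo_files *t, int i)
{
    if (i < 0 || i >= DEMO_MAX_FDS)
        return;
    t->open_fds &= ~(UINT32_C(1) << i);
    t->close_on_exec &= ~(UINT32_C(1) << i);
    t->fd[i] = NULL;
}

/* Model of exec: drop every descriptor flagged close-on-exec. */
void demo_exec(struct demo_files *t)
{
    for (int i = 0; i < DEMO_MAX_FDS; i++) {
        if (t->close_on_exec & (UINT32_C(1) << i))
            demo_fd_close(t, i);
    }
}
```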
Open Files of a Process [Figure: the process descriptor's files field points to files_struct (Open Files), which is indexed by fd and refers to file objects (open files, each with f_pos); file objects refer to dentry objects (directory links), which refer to inode objects (representing the actual files)]
Relationships • The open files table may be shared by several processes. This happens when processes share their address space (threads). • Each open files table has an array of open files (file objects), indexed by file descriptor (returned from opening the file). • Each open file (file object) can be shared by several descriptors and by several open files tables, and hence by several processes. This happens with dup and some redirection (several descriptors in one table) and when a process forks (parent and child share the same open files). • Each open file (file object) has one dentry object. • Each dentry object can be shared by several file objects. This happens when the same file is opened more than once. • Each dentry object has one inode object. • Each inode object can be shared by several dentry objects. This happens because of hard links. (A symbolic link has its own inode and refers to its target by path.) • Each inode has one superblock object, making it belong to one mounted filesystem.
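One of these sharing relationships can be observed from user space through the file position, which is stored in the file object. On POSIX systems a descriptor obtained with dup refers to the same open file as the original, so both see one position, while a second open of the same path creates a separate open file with its own position. The sketch below uses a made-up function name and a hypothetical /tmp/fsdupXXXXXX template.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Write 6 bytes through one descriptor, then report the file position seen
 * through a dup'ed descriptor and through a separately opened descriptor.
 * Returns 0 on success, -1 on error.
 * The function name and file name template are hypothetical. */
int offsets_demo(off_t *via_dup, off_t *via_reopen)
{
    char path[] = "/tmp/fsdupXXXXXX";
    int ok = -1;

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    int fd_dup = dup(fd);                 /* same open file, new descriptor */
    int fd_again = open(path, O_RDONLY);  /* a second, independent open file */
    unlink(path);

    if (fd_dup >= 0 && fd_again >= 0 && write(fd, "abcdef", 6) == 6) {
        *via_dup = lseek(fd_dup, 0, SEEK_CUR);
        *via_reopen = lseek(fd_again, 0, SEEK_CUR);
        ok = 0;
    }
    if (fd_dup >= 0)
        close(fd_dup);
    if (fd_again >= 0)
        close(fd_again);
    close(fd);
    return ok;
}
```

After the write, the position reported through the dup'ed descriptor has advanced with the original, while the separately opened descriptor is still at the start of the file.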
file_system_type objects • The system maintains a (linked) list of valid filesystem types. Each filesystem type is represented by a file_system_type object. • This object has a name for the file system type. • This object has a method for creating a superblock object. • It also points to the module that governs this filesystem type (if modularized). • This object heads a list of superblock objects that belong to this filesystem type.
The file_system_type Structure
struct file_system_type {
    struct super_block *(*read_super) (struct super_block *, void *, int);
    const char *name;
    int requires_dev;
    struct file_system_type *next;   /* there's a linked list of types */
};
File system types
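The linked list of filesystem types can be illustrated with a small registry: one function appends a type unless its name is already present, another looks a type up by name (as a mount request would need to do). This is a user-space sketch with invented names (demo_fs_type, demo_register_filesystem, demo_find_filesystem), not the kernel's code.

```c
#include <stddef.h>
#include <string.h>

struct demo_fs_type {                 /* toy counterpart of file_system_type */
    const char *name;
    struct demo_fs_type *next;        /* singly linked list of types */
};

static struct demo_fs_type *demo_fs_types = NULL;   /* list head */

/* Append `fs` to the list. Returns 0 on success, or -1 if a type with the
 * same name is already registered. */
int demo_register_filesystem(struct demo_fs_type *fs)
{
    struct demo_fs_type **p = &demo_fs_types;

    while (*p != NULL) {
        if (strcmp((*p)->name, fs->name) == 0)
            return -1;
        p = &(*p)->next;
    }
    fs->next = NULL;
    *p = fs;
    return 0;
}

/* Find a registered type by name; returns NULL if it is unknown. */
struct demo_fs_type *demo_find_filesystem(const char *name)
{
    for (struct demo_fs_type *t = demo_fs_types; t != NULL; t = t->next) {
        if (strcmp(t->name, name) == 0)
            return t;
    }
    return NULL;
}
```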
Mounted Filesystems by Type [Figure: three file_system_type objects, each heading a list of two superblock objects; each superblock object is shown with the dentry of its mount]