File System Interface and Implementations
Fred Kuhns
CS523 – Operating Systems
FS Framework in UNIX
• Provides persistent storage
• Facilities for managing data
  • file: abstraction for a data container; supports sequential and random access
  • file system: permits organizing, manipulating and accessing files
• The user interface specifies the behavior and semantics of the relevant system calls
• Abstractions exported by the interface: files, directories, file descriptors and different file systems
Kernel, Files and Directories
• The kernel provides control operations to name, organize and control access to files, but it does not interpret their contents
• Each running program has an associated current working directory, which permits the use of relative pathnames; otherwise complete pathnames are required
• A file is viewed as a collection of bytes
• Applications requiring more structure must define and implement it themselves
Kernel, Files and Directories (cont.)
• Files and directories form a hierarchical, tree-structured name space
• Because a file may have several hard links, this "tree" is really a directed acyclic graph
• The directory entry for a file is known as a hard link
• Files may also have symbolic links
• A file may have one or more links
• POSIX defines the library routines opendir(), readdir(), rewinddir() and closedir():
  struct dirent { ino_t d_ino; char d_name[NAME_MAX + 1]; };
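The POSIX directory routines can be exercised with a short loop. The sketch below assumes a POSIX system and uses a helper name of our own choosing, count_dir_entries. It opens a directory, walks its entries with readdir() until it returns NULL, and closes the stream. Each struct dirent it sees corresponds to one hard link (an inode number plus a name).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stddef.h>

/* Hypothetical helper: counts the entries readdir() returns for `path`
 * (on most systems this includes "." and ".."). Returns -1 if the
 * directory cannot be opened. */
int count_dir_entries(const char *path)
{
    assert(path != NULL);
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL)
        count++;            /* ent->d_ino and ent->d_name describe one hard link */

    closedir(dir);
    return count;
}
```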
File and Directory Organization
[Figure: example name space rooted at /, with entries bin, etc, dev, usr and vmunix; directory entries are (hard) links; the path /usr/local/bin/bash is traced through usr, local and bin]
File Attributes
• Type: directory, regular file, FIFO, symbolic link, special (device) file
• Reference count: number of hard links {link(), unlink()}
• Size in bytes
• Device id: the device the file resides on
• Inode number: one inode per file; inode numbers are unique within a disk partition (device id)
• Ownership: user and group id {chown()}
• Access modes: permissions and mode flags {chmod()}
  • {read, write, execute} for {owner, group, other}
• Timestamps: three different timestamps: last access, last modification, and last attribute (status) change {utime()}
Permissions and Modes
• Three mode flags: {suid, sgid, sticky}
• suid
  • File: if set and the file is executable, the process's effective user id is set to the file owner's id
  • Directory: not used
• sgid
  • File: if set and the file is executable, the effective group id is set to the file's group. If sgid is set but the file is not executable, mandatory file/record locking is enabled
  • Directory: if set, new files inherit the group of the directory; otherwise they get the group of the creator
• sticky
  • File: if set on an executable file, a copy of the program is kept in the swap area
  • Directory: if set and the directory is writable, a file may be removed or renamed only if EUID = owner of the file/directory or the process has write permission for the file. Otherwise any process with write permission on the directory may remove or rename files.
User View of Files
• File descriptors {open(), dup(), dup2(), fork()}
  • All I/O is through file descriptors
  • A descriptor references an open file object
  • Descriptors are per-process objects
  • Descriptors may be duplicated {dup(), dup2()}, copied on fork {fork()} or passed to an unrelated process {see ioctl(), or sendmsg() and recvmsg()}, permitting multiple descriptors to reference one object
• File object: holds context
  • Created by an open() system call
  • Stores the file offset
  • References a vnode
• vnode: abstract representation of a file
How it works
• fd = open(path, oflag, mode);
• lseek(), read() and write() affect the offset stored in the open file object
[Figure: the per-process file descriptor table (entries 0 to 5, each holding a uf_ofile pointer) points to open file objects {*f_vnode, f_offset, f_count, ...}, each of which points to a vnode/vfs, the in-memory representation of the file]
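Because the offset lives in the open file object rather than in the descriptor, two descriptors created by dup() share it. The sketch below assumes a POSIX system with a writable temporary directory, and the helper name offset_seen_by_dup is ours. It writes through one descriptor and then asks the duplicate for its current offset with lseek(fd2, 0, SEEK_CUR).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: writes `msg` through one descriptor and returns the
 * offset observed through a dup() of it, or -1 on error. */
long offset_seen_by_dup(const char *msg)
{
    assert(msg != NULL);
    FILE *fp = tmpfile();               /* anonymous temporary file */
    if (fp == NULL)
        return -1;

    int fd = fileno(fp);
    int fd2 = dup(fd);                  /* new descriptor, same open file object */
    long result = -1;
    if (fd2 >= 0) {
        size_t len = strlen(msg);
        if (write(fd, msg, len) == (ssize_t)len)
            result = (long)lseek(fd2, 0, SEEK_CUR);   /* shared f_offset */
        close(fd2);
    }
    fclose(fp);                         /* also closes fd */
    return result;
}
```

Two independent open() calls on the same path would instead create two file objects, each with its own offset.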
Overview
[Figure, example from Solaris: system calls enter the vnode interface, beneath which sit the file system types /proc, tmpfs, UFS, HSFS, PCFS, RFS, NFS and swapfs, together with the devices (cdrom, diskette, disk), the process address space and anonymous memory]
File Systems
• A file hierarchy is composed of one or more file systems
• One file system is designated the root file system
• Other file systems are attached at mount points
• A file cannot span multiple file systems
• A file system resides on one logical disk
Logical Disks
• Viewed as a linear sequence of fixed-size, randomly accessible blocks
• The device driver maps FS blocks to the underlying storage device
• File systems are created using the newfs or mkfs utilities
• A file system must reside on a logical disk; however, a logical disk need not contain a file system (for example, the swap device)
• Typically a logical disk corresponds to a partition of a physical disk. However, a logical disk may:
  • map to multiple physical disks
  • be mirrored on several physical disks
  • be striped across multiple disks, or use other RAID techniques
File Abstraction
• Abstracts different types of I/O objects
  • for example directories, symbolic links, disks, terminals, printers and pseudo-devices (memory, pipes, sockets, etc.)
• The control interface includes fstat(), ioctl() and fcntl()
• Symbolic links: the file contains a pathname to the linked file/directory {lstat(), symlink(), readlink()}
• Pipe and FIFO files:
  • A FIFO is created using mknod() and lives in the file system name space
  • A pipe is created using pipe() and persists as long as it is open for reading or writing
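An unnamed pipe has no entry in the file system name space. It exists only as two descriptors: fds[0] for reading and fds[1] for writing. The sketch below uses a helper name of our own, pipe_roundtrip, and has one process play both ends. The message must therefore fit in the pipe buffer. Closing the write end before reading means read() returns the buffered data and would then see end-of-file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: writes `msg` into a new pipe and reads it back into
 * `buf` (NUL-terminated). Returns the number of bytes read, or -1. */
ssize_t pipe_roundtrip(const char *msg, char *buf, size_t bufsize)
{
    assert(msg != NULL && buf != NULL && bufsize > 0);
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    size_t len = strlen(msg);
    ssize_t n = -1;
    if (write(fds[1], msg, len) == (ssize_t)len) {
        close(fds[1]);                 /* no writers left: reader sees EOF after data */
        fds[1] = -1;
        n = read(fds[0], buf, bufsize - 1);
        if (n >= 0)
            buf[n] = '\0';
    }
    if (fds[1] >= 0)
        close(fds[1]);
    close(fds[0]);                     /* last close: the pipe ceases to exist */
    return n;
}
```

A FIFO behaves the same way once opened, but it is created with a name (mknod(), or mkfifo() on POSIX systems) that other processes can open later.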
OO Style Interfaces
• Abstract base class: struct interface_t with common functions (open(), close()), common data (type, count), pure virtual functions (*ops, a NULL pointer) and private data (*data, a NULL pointer)
• Instance of a derived class: the same struct interface_t, with *ops pointing to the implementation's functions {my_read(), my_write(), my_init(), my_open(), …} and *data pointing to its private data {device_no, free_list, lock, …}
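In C this pattern is a struct holding common data, a pointer to a table of function pointers (the "pure virtual" operations) and an opaque private-data pointer. The sketch below borrows interface_t, my_read, my_write, my_init and device_no from the slide. The ops table layout, interface_open, interface_read and interface_write are hypothetical names added for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* "Pure virtual" operations that every implementation must supply. */
struct interface_ops {
    int (*op_read)(void *data);
    int (*op_write)(void *data, int value);
};

/* Abstract base object: common data plus implementation pointers. */
struct interface_t {
    int type;                          /* common data */
    int count;
    const struct interface_ops *ops;   /* NULL in the abstract base */
    void *data;                        /* NULL in the abstract base */
};

/* Common function, implemented once for all derived types. */
void interface_open(struct interface_t *obj) { obj->count++; }

/* Hypothetical derived type: a device that stores one integer. */
struct my_private { int device_no; int value; };

static int my_read(void *data) { return ((struct my_private *)data)->value; }
static int my_write(void *data, int value)
{
    ((struct my_private *)data)->value = value;
    return 0;
}
static const struct interface_ops my_ops = { my_read, my_write };

/* Turns `obj` into an instance of the derived type. */
void my_init(struct interface_t *obj, struct my_private *priv, int device_no)
{
    priv->device_no = device_no;
    priv->value = 0;
    obj->type = 1;
    obj->count = 0;
    obj->ops = &my_ops;                /* fill in the "virtual" functions */
    obj->data = priv;
}

/* Generic code dispatches through ops without knowing the concrete type. */
int interface_read(struct interface_t *obj)
{
    assert(obj->ops != NULL);
    return obj->ops->op_read(obj->data);
}

int interface_write(struct interface_t *obj, int value)
{
    assert(obj->ops != NULL);
    return obj->ops->op_write(obj->data, value);
}
```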
Sun’s (SVR4) Vfs/Vnode Framework
• Concurrently support multiple file system types
• Transparent interoperation of different file systems within one file hierarchy
• Enable file sharing over a network
• Abstract interface allowing easy integration of new file systems by vendors
Objectives
• Each operation is performed on behalf of the current process
• Support serialized access, i.e. locking
• Must be stateless
• Must be reentrant
• Encourage use of global resources (cache, buffer)
• Support client-server architectures
• Use dynamic storage allocation
Vnode/vfs interface
• Defines abstract interfaces
• vfs: fundamental abstraction representing a file system to the kernel
  • Contains pointers to file-system-dependent operations such as mount and unmount
• vnode: fundamental abstraction representing a file in the kernel
  • Defines the interface to the file and points to file-system-specific routines; reference counted
  • Accessed in two ways:
    1) I/O-related system calls
    2) pathname traversal
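The vnode combines a reference count, a pointer to a per-file-system operations vector, and private data such as the in-core inode. Kernel code calls through VOP_* macros, so it never needs to know which file system it is talking to. The sketch below is a simplified model, not the SVR4 source. VN_HOLD and VN_RELE follow the SVR4 naming but are reduced to plain counters, while vop_getsize, VOP_GETSIZE and the "mem" file system are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct vnode;

/* Per-file-system operations vector (a small, hypothetical subset). */
struct vnodeops {
    int  (*vop_getsize)(struct vnode *vp, long *sizep);
    void (*vop_inactive)(struct vnode *vp);   /* called when v_count drops to 0 */
};

struct vnode {
    int v_count;                     /* reference count */
    const struct vnodeops *v_op;     /* file-system-specific routines */
    void *v_data;                    /* e.g. the in-core inode */
};

/* Generic code dispatches through the vector without knowing the fs type. */
#define VOP_GETSIZE(vp, sizep) ((vp)->v_op->vop_getsize((vp), (sizep)))
#define VN_HOLD(vp)            ((vp)->v_count++)

static void vn_rele(struct vnode *vp)
{
    assert(vp->v_count > 0);
    if (--vp->v_count == 0)
        vp->v_op->vop_inactive(vp);  /* fs may now free or cache its inode */
}
#define VN_RELE(vp) vn_rele(vp)

/* Hypothetical "mem" file system: private data is a size and a flag. */
struct mem_inode { long size; int inactive; };

static int mem_getsize(struct vnode *vp, long *sizep)
{
    *sizep = ((struct mem_inode *)vp->v_data)->size;
    return 0;
}

static void mem_inactive(struct vnode *vp)
{
    ((struct mem_inode *)vp->v_data)->inactive = 1;
}

static const struct vnodeops mem_vnodeops = { mem_getsize, mem_inactive };
```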
vfs Overview
[Figure: rootvfs points to a linked list of struct vfs {*vfs_next, *vfs_vnodecovered, *vfs_ops, *vfs_data, …}, one for the root file system (/) and one for the file system mounted on /usr. Each vfs points to its fs-dependent struct vfsops {*vfs_mount, *vfs_root, …} and to fs-private data. Each struct vnode {*v_vfsp, *v_vfsmountedhere, …} points to its vfs through v_vfsp; the /usr vnode's v_vfsmountedhere points to the mounted vfs, whose vfs_vnodecovered points back to /usr.]
Mounting a FS
• mount(spec, dir, flags, type, dataptr, datalen);
• SVR4 uses a global virtual file system switch table (vfssw) to find the operations for the requested type
• The file-system-specific mount then:
  • allocates and initializes private data
  • initializes the vfs struct
  • locates and initializes the root vnode of the FS in memory (VFS_ROOT)
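The mount steps can be modeled in a few lines. First look up the type name in a switch table. Then allocate a vfs and let the file system's mount operation build its private data and root vnode. Finally, link the new vfs both to the covered vnode and onto the mounted-fs list. Field names such as vfs_next, vfs_vnodecovered, vfs_ops, vfs_data and v_vfsmountedhere come from the slides. The "toyfs" file system, do_mount, the vfssw_entry layout and vfs_list are hypothetical simplifications.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct vfs;
struct vnode {
    struct vfs *v_vfsp;             /* file system this vnode belongs to */
    struct vfs *v_vfsmountedhere;   /* fs mounted on this vnode, if any */
};

struct vfsops {
    int (*vfs_mount)(struct vfs *vfsp);
    struct vnode *(*vfs_root)(struct vfs *vfsp);
};

struct vfs {
    struct vfs *vfs_next;           /* list of mounted file systems */
    struct vnode *vfs_vnodecovered; /* vnode this fs is mounted on */
    const struct vfsops *vfs_ops;
    void *vfs_data;                 /* fs-private data */
};

#define VFS_ROOT(vfsp) ((vfsp)->vfs_ops->vfs_root(vfsp))

/* Hypothetical toy fs: its private data is simply its root vnode. */
static int toy_mount(struct vfs *vfsp)
{
    struct vnode *root = calloc(1, sizeof *root);
    if (root == NULL)
        return -1;
    root->v_vfsp = vfsp;
    vfsp->vfs_data = root;          /* allocate and initialize private data */
    return 0;
}
static struct vnode *toy_root(struct vfs *vfsp) { return vfsp->vfs_data; }
static const struct vfsops toy_vfsops = { toy_mount, toy_root };

/* Virtual file system switch: maps a type name to its operations. */
struct vfssw_entry { const char *name; const struct vfsops *ops; };
static const struct vfssw_entry vfssw[] = { { "toyfs", &toy_vfsops } };

struct vfs *vfs_list = NULL;        /* head of the mounted-fs list */

/* Mounts a file system of `type` on directory vnode `dir`. */
struct vfs *do_mount(const char *type, struct vnode *dir)
{
    assert(type != NULL && dir != NULL);
    const struct vfsops *ops = NULL;
    for (size_t i = 0; i < sizeof vfssw / sizeof vfssw[0]; i++)
        if (strcmp(vfssw[i].name, type) == 0)
            ops = vfssw[i].ops;
    if (ops == NULL || dir->v_vfsmountedhere != NULL)
        return NULL;                /* unknown type, or already a mount point */

    struct vfs *vfsp = calloc(1, sizeof *vfsp);
    if (vfsp == NULL)
        return NULL;
    vfsp->vfs_ops = ops;            /* initialize the vfs struct */
    vfsp->vfs_vnodecovered = dir;
    if (ops->vfs_mount(vfsp) != 0) {
        free(vfsp);
        return NULL;
    }
    dir->v_vfsmountedhere = vfsp;   /* path traversal now crosses here */
    vfsp->vfs_next = vfs_list;
    vfs_list = vfsp;
    return vfsp;
}
```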
Pathname traversal
• For each path component, path traversal performs the following:
  • Verify the current vnode is a directory; if not, stop (ENOTDIR)
  • Invoke VOP_LOOKUP (ufs_lookup() for UFS)
    • if the component is found, return a pointer to its vnode
    • if it is not found and it is the last component, return the vnode of the parent directory
    • otherwise (not found and not the last component), return an ENOENT error
  • If a component corresponds to a mount point, locate the root vnode of the mounted fs
  • If a component is a symbolic link, substitute the link's contents and continue with the rest of the path
• vnode reference counts are incremented during lookup
• A directory lookup cache (name to vnode) may be used
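The core loop works over an in-memory tree. It splits the path into components, checks that the current vnode is a directory, looks up the component, and follows mount points to the mounted root. It takes a reference on the vnode it moves to and drops the one on the directory it leaves. This is a simplified sketch with hypothetical names (resolve, lookup_component, v_mountedroot). It omits symbolic links, "..", the create case that returns the parent, and the directory lookup cache. lookup_component stands in for VOP_LOOKUP.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAXENT 4

struct vnode {
    int v_isdir;
    int v_count;                      /* reference count */
    struct vnode *v_mountedroot;      /* root of fs mounted here, or NULL */
    const char *names[MAXENT];        /* directory entries (directories only) */
    struct vnode *children[MAXENT];
};

/* Stand-in for VOP_LOOKUP: search one directory for one component. */
static struct vnode *lookup_component(struct vnode *dir, const char *name, size_t len)
{
    for (int i = 0; i < MAXENT; i++)
        if (dir->names[i] != NULL && strlen(dir->names[i]) == len &&
            strncmp(dir->names[i], name, len) == 0)
            return dir->children[i];
    return NULL;
}

/* Resolves '/'-separated `path` starting at `start`. On success stores a
 * held vnode in *result and returns 0; otherwise returns an errno value.
 * The caller's own reference on `start` is never touched. */
int resolve(struct vnode *start, const char *path, struct vnode **result)
{
    assert(start != NULL && path != NULL && result != NULL);
    struct vnode *vp = start;
    const char *p = path;
    for (;;) {
        while (*p == '/')
            p++;                          /* skip separators */
        if (*p == '\0')
            break;                        /* no more components */
        size_t len = strcspn(p, "/");     /* length of this component */
        struct vnode *next = NULL;
        int err = 0;
        if (!vp->v_isdir)
            err = ENOTDIR;                /* only directories can be searched */
        else if ((next = lookup_component(vp, p, len)) == NULL)
            err = ENOENT;
        if (vp != start)
            vp->v_count--;                /* release the directory we leave */
        if (err != 0)
            return err;
        while (next->v_mountedroot != NULL)
            next = next->v_mountedroot;   /* mount point: go to mounted root */
        next->v_count++;                  /* reference taken during lookup */
        vp = next;
        p += len;
    }
    *result = vp;
    return 0;
}
```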
Other vfs/vnode interfaces
• 4.4BSD vfs/vnode interface
  • adds state to the interface
  • enhanced lookup
  • vnode locking across multiple operations
• OSF/1
  • uses timestamps to optimize lookups
Local File Systems
• s5fs: System V file system, based on the original implementation
• FFS/UFS: BSD-developed file system with optimized disk usage algorithms
S5fs - Disk layout
• Viewed as a linear array of blocks
• Typical disk block size: 512, 1024 or 2048 bytes
• A block's physical block number is its index in the array
• The disk itself addresses data by cylinder, track and sector
• The first few blocks are the boot area, followed by the superblock and the inode list (fixed size)
Disk Layout
[Figure: disk geometry showing platters, heads, tracks, sectors and cylinders, annotated with rotational speed and disk seek time]
S5fs disk layout
[Figure: on-disk order: boot area, superblock, inode list, data]
• Boot area: code to initialize (bootstrap) the system
• Superblock: metadata for the file system: size of the FS, size of the inode list, number of free blocks/inodes, free block/inode lists
• Inode list: linear array of 64-byte inode structs
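Because the inode list is a fixed-size array of 64-byte entries at a known place, an inode number converts directly to a disk block and byte offset. The sketch below assumes a classic s5fs arrangement: block 0 is the boot area, block 1 the superblock, the inode list starts at block 2, and inode numbers start at 1. The names s5_inode_location and inode_loc are hypothetical.

```c
#include <assert.h>

#define S5_INODE_SIZE  64   /* bytes per on-disk inode */
#define S5_ILIST_START 2    /* assumed: block 0 boot area, block 1 superblock */

struct inode_loc {
    unsigned long block;    /* disk block holding the inode */
    unsigned offset;        /* byte offset of the inode inside that block */
};

/* Locates on-disk inode `ino` (numbered from 1) for a given block size. */
struct inode_loc s5_inode_location(unsigned long ino, unsigned block_size)
{
    assert(ino >= 1 && block_size % S5_INODE_SIZE == 0);
    unsigned long per_block = block_size / S5_INODE_SIZE;
    struct inode_loc loc;
    loc.block  = S5_ILIST_START + (ino - 1) / per_block;
    loc.offset = (unsigned)((ino - 1) % per_block) * S5_INODE_SIZE;
    return loc;
}
```

With 512-byte blocks there are 8 inodes per block, so inode 9 is the first inode in block 3. This direct arithmetic is also why the inode list cannot grow after the file system is created.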
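s5fs - some details
• On-disk inode (field sizes in bytes, 64 bytes in total): di_mode (2), di_nlinks (2), di_uid (2), di_gid (2), di_size (4), di_addr (39), di_gen (1), di_atime (4), di_mtime (4), di_ctime (4)
• Directory: an array of 16-byte entries, each a 2-byte inode number followed by a 14-byte name. Example directory:
    name      inode
    .         8
    ..        45
    ""        0   (unused entry)
    myfile    123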
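Looking up a name in an s5fs directory is a linear scan of these 16-byte entries. Entries whose inode number is 0 are skipped. Names are compared over at most 14 bytes, because a full-length name has no terminating NUL. The sketch below works on entries already in memory and ignores on-disk byte order. The names s5_direct and s5_dir_lookup are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define S5_DIRSIZ 14

/* 16-byte directory entry: 2-byte inode number + 14-byte name. Shorter
 * names are NUL-padded; a 14-byte name has no NUL; d_ino == 0 = unused. */
struct s5_direct {
    uint16_t d_ino;
    char     d_name[S5_DIRSIZ];
};

/* Returns the inode number for `name` among `n` entries, or 0 if absent. */
uint16_t s5_dir_lookup(const struct s5_direct *dir, size_t n, const char *name)
{
    assert(dir != NULL && name != NULL);
    if (strlen(name) > S5_DIRSIZ)
        return 0;                        /* could never have been stored */
    for (size_t i = 0; i < n; i++) {
        if (dir[i].d_ino == 0)
            continue;                    /* free slot, e.g. after unlink */
        if (strncmp(dir[i].d_name, name, S5_DIRSIZ) == 0)
            return dir[i].d_ino;
    }
    return 0;
}
```

The 14-byte name field is the source of the 14-character file name limit listed among the problems with s5fs.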
Locating file data blocks
• di_addr holds 13 block addresses of 3 bytes each (13 × 3 = 39 bytes):
  • entries 0 to 9: direct blocks
  • entry 10: single indirect (256 blocks)
  • entry 11: double indirect (256 × 256 = 64K blocks)
  • entry 12: triple indirect (256 × 256 × 256 = 16M blocks)
• Assume 1024-byte blocks, so each indirect block holds 256 block addresses
• 3 bytes per address => 2^24 = 16M addressable blocks, or 16 GB of data
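Mapping a logical block number to its di_addr slot and indirect-block indices means subtracting the capacity of each level in turn. The sketch below assumes the slide's geometry: 10 direct addresses and 256 addresses per indirect block. s5_bmap is a hypothetical name. idx[0] is the di_addr slot, and idx[1..level] are the slots used inside each indirect block read along the way.

```c
#include <assert.h>

#define NDIRECT 10
#define NINDIR  256UL     /* addresses per indirect block (1024-byte blocks) */

/* Returns 0 for a direct block, 1..3 for single/double/triple indirect,
 * or -1 if `lbn` is beyond the largest file. Fills idx[0..level]. */
int s5_bmap(unsigned long lbn, unsigned long idx[4])
{
    assert(idx != 0);
    if (lbn < NDIRECT) {
        idx[0] = lbn;                          /* di_addr[0..9] */
        return 0;
    }
    lbn -= NDIRECT;
    unsigned long span = NINDIR;               /* blocks reachable at this level */
    for (int level = 1; level <= 3; level++) {
        if (lbn < span) {
            idx[0] = NDIRECT + (unsigned long)level - 1;   /* slot 10, 11 or 12 */
            for (int k = level; k >= 1; k--) {
                idx[k] = lbn % NINDIR;         /* slot in the k-th indirect block */
                lbn /= NINDIR;
            }
            return level;
        }
        lbn -= span;                           /* skip this level's capacity */
        span *= NINDIR;
    }
    return -1;
}
```

For example, logical block 566 lies 300 blocks into the double-indirect range. It is reached through di_addr[11], then slot 1 of the double-indirect block, then slot 44 of the next block. The largest mappable file is 10 + 256 + 65536 + 16777216 blocks.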
S5fs Kernel Implementation
• In-core inodes also include the vnode, device id, inode number and flags
• Inode lookup uses hash queues based on the inode number (and possibly the device number)
• The kernel locks an inode for reading/writing
• read/write use the buffer cache or VM
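An in-core inode lookup hashes (device, inode number) to a chain. It walks the chain for a match and allocates a new in-core inode on a miss. This sketch borrows the traditional iget name but is not the System V code. The hash function, the chain field i_hnext and the use of calloc are assumptions. It also omits reading the on-disk inode, locking, and reuse of free in-core inodes.

```c
#include <assert.h>
#include <stdlib.h>

#define NHASH 64

struct inode {
    struct inode *i_hnext;     /* next inode on the same hash chain */
    unsigned long i_dev;       /* device id */
    unsigned long i_number;    /* inode number */
    int i_count;               /* in-core reference count */
};

static struct inode *ihash[NHASH];    /* hash queue heads */

static unsigned ihash_index(unsigned long dev, unsigned long ino)
{
    return (unsigned)((dev * 31 + ino) % NHASH);
}

/* Returns the in-core inode for (dev, ino), creating it on a miss. */
struct inode *iget(unsigned long dev, unsigned long ino)
{
    unsigned h = ihash_index(dev, ino);
    for (struct inode *ip = ihash[h]; ip != NULL; ip = ip->i_hnext) {
        if (ip->i_dev == dev && ip->i_number == ino) {
            assert(ip->i_count >= 0);
            ip->i_count++;                 /* hit: share the existing copy */
            return ip;
        }
    }
    struct inode *ip = calloc(1, sizeof *ip);
    if (ip == NULL)
        return NULL;
    ip->i_dev = dev;
    ip->i_number = ino;
    ip->i_count = 1;
    ip->i_hnext = ihash[h];                /* push onto the chain */
    ihash[h] = ip;
    return ip;
}
```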
Problems with s5fs
• Superblock: contains essential information but is not replicated
• On-disk inodes: inodes are physically located at the front of the disk, which may result in long seek times between an inode and its data
• Disk block allocation: free block order is not optimized (the blocks of a file may not be “close”)
• Disk block size: 512 or 1024 byte blocks
• File name size: maximum of 14 characters
Berkeley Fast File System - FFS
• The disk partition is divided into cylinder groups
• The superblock is restructured and replicated across the partition
  • it holds constant information
  • cylinder group summary information, such as free inodes and free blocks, is kept separately
• Supports block fragments: typical block size is 8KB, and a fragment can be as small as 512B
• Long file names
• New disk block allocation strategy
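Fragments let FFS use large blocks for throughput without wasting most of a block on a file's tail. The arithmetic is simple: whole blocks for the bulk of the file, and just enough fragments for the remainder. For example, an 18000-byte file with 8KB blocks and 1KB fragments needs 2 blocks plus 2 fragments. That wastes 432 bytes, where rounding up to 3 full blocks would waste 6576 bytes. The sketch below uses a hypothetical helper, ffs_layout, and ignores the further restrictions a real FFS places on where fragments may be used.

```c
#include <assert.h>

/* Splits a file of `size` bytes into full blocks plus trailing fragments.
 * Assumes block_size is a multiple of frag_size. */
void ffs_layout(unsigned long size, unsigned long block_size,
                unsigned long frag_size,
                unsigned long *full_blocks, unsigned long *frags)
{
    assert(frag_size > 0 && block_size % frag_size == 0);
    *full_blocks = size / block_size;
    unsigned long tail = size % block_size;          /* partial last block */
    *frags = (tail + frag_size - 1) / frag_size;     /* round the tail up */
}
```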
FFS Allocation strategy
• Goal: collocate related data/information
• Attempt to locate file inodes in the same cylinder group as their directory
• Create new directories in different cylinder groups
  • choose from the list of groups with above-average free inode counts
• Attempt to place a file's data blocks and inode in the same cylinder group
• Change cylinder group when the file size reaches 48KB, and thereafter every 1MB
• Allocate sequential blocks at rotationally optimal positions
• Choose the cylinder group with the “best” free count
Is FFS/UFS Better?
• Measurements have shown substantial performance benefits over s5fs
• However, FFS is sub-optimal when the disk is nearly full, so 10% of the disk is always kept free
• Modern disks, however, no longer match the underlying assumptions of FFS
Traditional Buffer Cache
[Figure: buffers linked both on hash queues keyed by (device, block number) and on a free list kept in LRU order]
Other Limitations of s5fs and FFS
• Performance: hardware designs and modern architectures have redefined the computing environment
• Crash recovery: do you like waiting for fsck?
• Security: do we need more than just 7 bits?
• File size limitations
Performance Issues
• FFS has a target rotational delay, which estimates the time the kernel spends setting up the next read/write
  • the alternative is to read/write an entire track
  • factor in that many disks have built-in caches
• Because of the buffer cache, most disk I/O operations are writes. Given locality-of-reference assumptions, most writes should be deferred.
• Metadata writes are synchronous
• Disk head seeks are expensive
Sun-FFS (cluster)
• Goal: cluster I/O operations to improve performance
• Keeps the FFS disk block allocator
• Assumes rotational interleaving is not necessary:
  • sets the rotational delay to 0
• Stores the cluster size in the superblock, overloading the maxcontig field
• Read clustering: reads in physically contiguous blocks of a file, up to maxcontig blocks
• Write clustering: pages are left in the cache until either a synchronous write is necessary or contigsize blocks can be written
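The benefit of write clustering shows up in how many separate disk writes a set of dirty blocks needs. Without clustering it is one write per block. With clustering, each run of physically contiguous blocks, capped at maxcontig, becomes one write. The sketch below counts writes for a sorted list of dirty block numbers. count_clustered_writes is a hypothetical helper, not Sun's code.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the writes needed to flush `n` dirty block numbers (sorted,
 * no duplicates) when contiguous blocks are merged into clusters of at
 * most `maxcontig` blocks. */
size_t count_clustered_writes(const unsigned long *blocks, size_t n,
                              size_t maxcontig)
{
    assert(maxcontig >= 1);
    size_t writes = 0;
    size_t i = 0;
    while (i < n) {
        size_t run = 1;                   /* cluster starts at blocks[i] */
        while (i + run < n && run < maxcontig &&
               blocks[i + run] == blocks[i + run - 1] + 1)
            run++;                        /* extend while physically contiguous */
        writes++;
        i += run;
    }
    return writes;
}
```

For dirty blocks {0, 1, 2, 3, 7, 8, 20}, a maxcontig of 3 gives 4 writes ([0-2], [3], [7-8], [20]), while a maxcontig of 1 (no clustering) gives 7.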
4.4BSD Log-Structured FS
• The entire disk is dedicated to the log, which completely describes the file system
• The log is divided into segments, each pointing to the next (segments need not be contiguous)
• All writes go to the tail of the log
• Garbage collection by a cleaner daemon permits the log to wrap around
• A segment describes a physical partition of the disk and is composed of partial segments
BSD-LFS
• Directory and inode structures are retained; the issue is locating inodes
• Inodes are written to disk as part of the log; a modified inode is written to a new location on disk
• Requires a new data structure, the inode map: a map of all inodes and their locations on disk. The map is periodically written to disk (checkpointed).
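The inode map replaces the fixed inode-list arithmetic of s5fs. Every modified inode is appended at the log tail, and the map is updated to point at the newest copy. The older copy becomes garbage for the cleaner. The sketch below is a deliberately tiny model: one block per inode, addresses as plain counters, and hypothetical names (struct lfs, lfs_write_inode, lfs_inode_addr).

```c
#include <assert.h>

#define LFS_MAX_INODES 128

/* Hypothetical, much-simplified LFS state. */
struct lfs {
    unsigned long imap[LFS_MAX_INODES];  /* inode number -> disk address, 0 = none */
    unsigned long log_tail;              /* next free block address in the log */
};

/* Appends a new copy of inode `ino` at the log tail and points the inode
 * map at it. Returns the address written. */
unsigned long lfs_write_inode(struct lfs *fs, unsigned ino)
{
    assert(fs != 0 && ino < LFS_MAX_INODES);
    unsigned long addr = fs->log_tail++;   /* append; never overwrite in place */
    fs->imap[ino] = addr;                  /* old copy is now garbage */
    return addr;
}

/* Locating an inode is a single inode-map lookup. */
unsigned long lfs_inode_addr(const struct lfs *fs, unsigned ino)
{
    assert(fs != 0 && ino < LFS_MAX_INODES);
    return fs->imap[ino];
}
```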
Segments
• Segment usage table: contains the number of bytes stored in each segment and its time of last modification
• A partial segment is an atomic write and contains:
  • a checksum
  • for each file with data blocks in the segment: the inode number, version and logical block numbers
  • the disk address of each inode contained in the partial segment
Example Write
• Dirty buffers are collected until they fill a segment
• Logical blocks are ordered, inodes are updated, and the segment is written to the tail of the log. Old copies of the file blocks and inode are now free and available to the garbage collector.
Log-structured FS
• Requires a large cache for read efficiency
• Write efficiency is obtained because the system always writes to the end of the log
  • Why does this help?
  • How does performance compare to Sun-FFS?
• What about crash recovery?
  • locate the checkpointed imap and segment table, then update them from subsequent log entries (relying on timestamps)
  • cycle through timestamps until reaching the last checkpoint
Garbage Collection
• The log wraps from the end to the start of the disk, necessitating GC
• GC reads a segment and identifies valid entries, which are written to the tail, allowing the segment to be freed
• GC is implemented by a cleaner process that uses the ifile (a system file holding the imap and segment table)
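The cleaner's key test is liveness. A block recorded in a segment is still valid only if the file currently maps that logical block to this same disk address. Otherwise a newer copy exists elsewhere in the log, or the file is gone. The sketch below uses a flat "current" table standing in for the imap plus inode block pointers. It moves live blocks to the log tail by assigning them new addresses. struct blk_ref and lfs_clean_segment are hypothetical names, and real BSD-LFS reads this information from segment summaries and the ifile.

```c
#include <assert.h>
#include <stddef.h>

/* Which file block a log block holds and where it was written. */
struct blk_ref {
    unsigned ino;
    unsigned long lbn;
    unsigned long addr;
};

/* Cleans one segment: an entry is live only if `current` still maps
 * (ino, lbn) to the address recorded in the segment. Live blocks get new
 * addresses at the log tail. Returns the number of blocks moved; the
 * whole segment can then be reused. */
size_t lfs_clean_segment(const struct blk_ref *seg, size_t nseg,
                         struct blk_ref *current, size_t ncur,
                         unsigned long *log_tail)
{
    assert(log_tail != NULL);
    size_t moved = 0;
    for (size_t i = 0; i < nseg; i++) {
        for (size_t j = 0; j < ncur; j++) {
            if (current[j].ino == seg[i].ino && current[j].lbn == seg[i].lbn) {
                if (current[j].addr == seg[i].addr) {
                    current[j].addr = (*log_tail)++;   /* rewrite at the tail */
                    moved++;
                }
                break;   /* found the file block; stale if addresses differ */
            }
        }
    }
    return moved;
}
```

In a segment holding blocks at addresses 100, 101 and 102, where the file's block 1 has since been rewritten at 250 and inode 7 has been deleted, only the block at 100 is live and gets copied.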
Assessing BSD-LFS
• Changed metadata may not all fit into a single partial segment, which complicates recovery
• Blocks are allocated when a segment is written to disk, so the system must ensure blocks will be available when it is time to write
• Requires a large physical memory for the large cache
• BSD-LFS is superior to FFS, but compared to Sun-FFS its advantages are less clear:
  • BSD-LFS is faster at metadata operations
  • Sun-FFS is faster with I/O-intensive applications
  • they are comparable for general-purpose use
4.4BSD Portal FS
[Figure: a user process opens /p/<path>; the portal file system passes <path> to the portal daemon over sockets, and the daemon returns an open file descriptor (fd) to the user process]
Stackable Filesystems
• For a given mount point, there may now be many file systems
[Figure: applications access /mylocal (MyFS), which is stacked on /local (UFS)]