Local Filesystems CIS657
Filesystem Layers
(Figure: three stacked layers, top to bottom: Stackable Filesystems, Local Filesystems, File Stores.)
• Stackable filesystems provide composable operations and flexible name spaces
• Local filesystems are the “foundations” of the name space
• File stores deal with the layout of the blocks on the disk
Common Features in Local Filesystems
• Hierarchical naming
• Locking
• Quotas
• Attribute management
• Protection
• Provided by UFS (the Unix File System) in BSD Unix, which sits on top of the Berkeley Fast File System (FFS)
Filesystem Operations
• Pathname searching: lookup
• Name creation: create, mknod, link, symlink, mkdir
• Name change/deletion: rename, remove, rmdir
• Attribute manipulation: access, getattr, setattr
• Object interpretation: open, readdir, readlink, mmap, close
• Process control: advlock, ioctl, select
• Object management: lock, unlock, inactive, reclaim, abortop
Lookup
• We’ve seen filesystem-independent lookup in our discussion of vnodes
• Defer filesystem-dependent discussion until we see inodes
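Name Creation
• create: creates regular files and AF_LOCAL (UNIX-domain) sockets
• link: adds a new name (a hard link) for an existing object
• symlink: creates a symbolic link, a new object that holds a pathname
• mknod: creates block and character special device files
• mkdir: creates directories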
Name Change & Deletion
• rename: deletes a name (not the object!) in one location and creates a new name for the same object in another
• remove: removes a name; if this was the last name for the object, the object is removed too (once no process still has it open)
• rmdir: removes directories
Attribute Manipulation
• getattr: gets attributes such as
  • Number of links
  • Timestamps
  • Flags
  • Uid, gid
  • Etc.
• setattr: sets attributes (not all of them can be set)
• access: checks whether a process can read, write, or execute the file
Object Interpretation
• open/close: for special files, tell the device driver about device activation or shutdown
• readdir: reads the filesystem-specific directory format and returns the entries in a standard form
• readlink: returns the contents of a symlink
• mmap: prepares an object for mapping into a process address space
Process Control of Objects
• select: finds out whether an object is ready for I/O
• ioctl: passes a request to a specialized device (the catch-all call)
• advlock: gets or releases an advisory lock on an object (what’s an advisory lock?)
Object Management
• inactive/reclaim: covered under the vnode discussion
• lock/unlock: lock and unlock objects (such as directories); ignored in stateless filesystems such as NFS
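Inodes
• Index nodes (inodes) are the core of the local filesystem’s management of files
(Figure: among the kernel data structures, a process descriptor points to an open file entry, which points to a vnode, which points to the in-memory inode; on disk, the inode points to the file’s data.)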
Inode Fields
• Type and access mode
• File’s owner
• Group-access identifier
• Timestamps of last read and last write
• Timestamp of most recent inode update
• File size
• Number of blocks used by the file (including indirect blocks)
• Number of references to the file
• Flags (e.g., immutable)
• Generation number
• Block size of data blocks for the inode
• Size of extended attributes
What’s missing? Why?
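(Figure: an on-disk inode with fields for mode, owners, timestamps, size, block count, block size, reference count, flags, generation number, extended-attribute size, and extended-attribute blocks, plus block pointers. The direct pointers reference data blocks; the single, double, and triple indirect pointers reference one, two, or three levels of pointer blocks that finally lead to data blocks.)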
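The pointer structure in the figure can be sketched as a function that reports which inode pointer reaches a given logical block and which index to follow at each level. This is a minimal sketch, not UFS code: the name block_path is hypothetical, and 12 direct pointers, 32 KiB blocks, and 8-byte block pointers are assumptions (roughly UFS2-like).

```python
NDIRECT = 12  # assumed number of direct block pointers (as in BSD UFS)


def block_path(lbn: int, block_size: int = 32768, ptr_size: int = 8) -> tuple[str, list[int]]:
    """Return which inode pointer reaches logical block `lbn`, plus the
    index to follow at each level of indirection (hypothetical helper)."""
    if lbn < 0:
        raise ValueError("logical block number must be non-negative")
    nptr = block_size // ptr_size  # pointers that fit in one indirect block
    if lbn < NDIRECT:
        return "direct", [lbn]
    lbn -= NDIRECT
    for depth, kind in enumerate(("single", "double", "triple"), start=1):
        span = nptr ** depth  # data blocks reachable through this pointer
        if lbn < span:
            path = []
            for level in range(depth - 1, -1, -1):
                path.append(lbn // nptr ** level)  # index within this pointer block
                lbn %= nptr ** level
            return kind, path
        lbn -= span
    raise ValueError("block lies beyond the triple-indirect range")
```

Under these assumptions each indirect block holds 32768 / 8 = 4096 pointers, so logical block 12 is the first one reached through the single indirect pointer and block 12 + 4096 is the first one that needs the double indirect pointer.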
Inode Management
• Like most other things, inodes are cached
• Kept on a hash table in the kernel (hashed on inode and device numbers)
• When a vnode’s inactive() or reclaim() is called, the call passes through to the filesystem to inactivate or reclaim the underlying inode
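A toy model of the inode cache described above: a hash table keyed on (device, inode number), with a reference count that drops to zero at the point where inactive() would pass through, and eviction standing in for reclaim(). The class and method names are hypothetical; this is a sketch of the idea, not the BSD code.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Inode:
    dev: int
    ino: int
    usecount: int = 0  # references currently held through vnodes


class InodeCache:
    """Toy in-core inode cache: a hash table keyed on (device, inode number)."""

    def __init__(self, nbuckets: int = 64):
        self.buckets = [[] for _ in range(nbuckets)]

    def _bucket(self, dev: int, ino: int) -> list:
        return self.buckets[hash((dev, ino)) % len(self.buckets)]

    def get(self, dev: int, ino: int, read_from_disk) -> Inode:
        bucket = self._bucket(dev, ino)
        for ip in bucket:
            if ip.dev == dev and ip.ino == ino:  # cache hit
                ip.usecount += 1
                return ip
        ip = read_from_disk(dev, ino)  # cache miss: read the on-disk inode
        ip.usecount = 1
        bucket.append(ip)
        return ip

    def release(self, ip: Inode) -> bool:
        """Drop a reference; True means the inode is now inactive
        (the point at which inactive() would be passed through)."""
        ip.usecount -= 1
        return ip.usecount == 0

    def reclaim(self, ip: Inode) -> None:
        """Evict the inode from the hash table, as reclaim() would."""
        self._bucket(ip.dev, ip.ino).remove(ip)
```

Note that an inactive inode stays in the table, so a later lookup still hits the cache; only reclaim forces the next lookup back to disk.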
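Naming
Consider this simple filesystem tree:
(Figure: the root directory (inode 2) contains vmunix and usr; usr contains foo and bin; bin contains ex, groff, and vi. Every directory also has . and .. entries, and each name is labeled with its inode number.)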
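Directories
• Allocated in chunks
• Each chunk holds at least one directory entry
• Entries may not span chunks
• The entries in a chunk form a linked list (each entry’s size gives the offset of the next one)
• Each entry contains:
  • Index into the inode structure (the inode number)
  • Type of the entry
  • Size of the entry
  • Size of the file name in bytes
  • Name of the file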
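Directory Chunks and Entries
(Figure: a directory chunk with three entries, each shown as inode number, type, name length, name: [#, file, 5, foo.c], [#, dir, 3, bar], [#, file, 3, biz]. An empty directory chunk holds a single entry with inode number 0.)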
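The chunk format can be sketched with Python’s struct module. This is a minimal sketch under stated assumptions: the header (32-bit inode number, 16-bit entry size, 8-bit type, 8-bit name length) is modeled loosely on BSD’s struct direct, and the 512-byte chunk size, little-endian byte order, and 4-byte alignment are assumptions. The helper names are hypothetical.

```python
import struct

CHUNK = 512                    # assumed chunk size (BSD calls it DIRBLKSIZ)
HDR = struct.Struct("<IHBB")   # inode number, entry size, type, name length
DT_DIR, DT_REG = 4, 8          # directory-entry type codes


def entry_size(namelen: int) -> int:
    """Header + name + NUL, rounded up to a 4-byte boundary."""
    return (HDR.size + namelen + 1 + 3) & ~3


def build_chunk(entries: list[tuple[int, int, str]]) -> bytes:
    """Pack (ino, type, name) entries into one chunk; the last entry's
    size is stretched so the entries exactly cover the chunk."""
    if not entries:
        # Empty chunk: a single unused entry (inode 0) spans the whole chunk
        return HDR.pack(0, CHUNK, 0, 0) + bytes(CHUNK - HDR.size)
    out = bytearray()
    for i, (ino, dtype, name) in enumerate(entries):
        raw = name.encode()
        size = entry_size(len(raw))
        if len(out) + size > CHUNK:
            raise ValueError("entries do not fit in one chunk")
        reclen = CHUNK - len(out) if i == len(entries) - 1 else size
        out += HDR.pack(ino, reclen, dtype, len(raw)) + raw
        out += bytes(size - HDR.size - len(raw))  # NUL terminator and padding
    out += bytes(CHUNK - len(out))                # free space owned by the last entry
    return bytes(out)


def parse_chunk(chunk: bytes) -> list[tuple[int, int, str]]:
    """Follow the entry sizes through the chunk, skipping unused (inode 0) entries."""
    entries, off = [], 0
    while off < len(chunk):
        ino, reclen, dtype, namlen = HDR.unpack_from(chunk, off)
        if reclen == 0:
            raise ValueError("corrupt entry: zero size")
        if ino != 0:
            start = off + HDR.size
            entries.append((ino, dtype, chunk[start:start + namlen].decode()))
        off += reclen
    return entries
```

For example, build_chunk([(20, DT_REG, "foo.c"), (21, DT_DIR, "bar"), (22, DT_REG, "biz")]) produces a chunk like the one in the figure (the inode numbers here are made up). Because the last entry owns the free space, a new entry can be added by shrinking that entry’s size rather than rewriting the chunk.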
Name Lookups
• A (the most?) common request is name lookup in a directory
• The kernel iterates through the directory entries:
  • Compare name lengths
  • If they match, compare the names
• The result goes into the name cache (remember: it holds both positive entries for names found and negative entries for names not found)
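The lookup loop and the positive/negative name cache can be sketched as follows. The function and class names are hypothetical, directories are modeled as lists of (inode, name) pairs, and unlike a real kernel cache this sketch never invalidates entries when names are created or removed.

```python
from typing import Optional


def scan_directory(entries: list[tuple[int, str]], name: str) -> Optional[int]:
    """Linear scan: compare name lengths first and names only on a length match."""
    target = name.encode()
    for ino, entry_name in entries:
        raw = entry_name.encode()
        if len(raw) != len(target):
            continue  # cheap test rejects most entries
        if raw == target:
            return ino
    return None


class NameCache:
    """Caches hits (positive entries) and misses (negative entries)."""

    def __init__(self) -> None:
        self.table: dict[tuple[int, str], Optional[int]] = {}

    def lookup(self, dir_ino: int, entries: list[tuple[int, str]], name: str) -> Optional[int]:
        key = (dir_ino, name)
        if key in self.table:
            return self.table[key]  # None here is a cached negative entry
        ino = scan_directory(entries, name)
        self.table[key] = ino       # remember the result, found or not
        return ino
```

Caching the misses matters because programs often probe for names that do not exist (for example, searching a list of directories for a command), and a negative entry answers those probes without rescanning the directory.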
Looking Up All Entries in a Directory
• The kernel optimizes requests for all entries in a directory by maintaining a “last lookup” offset
• The next search starts at the last lookup offset
• Makes sequential access O(n) instead of O(n²)
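The effect of the “last lookup” offset can be demonstrated by counting name comparisons when every entry is looked up in directory order, as happens when a program such as ls -l stats each name it reads. This is a minimal model; the Directory class and its compares counter are hypothetical.

```python
class Directory:
    """Directory entries searched linearly, optionally starting at the
    offset where the previous successful lookup ended."""

    def __init__(self, names: list[str], use_offset: bool = True):
        self.names = list(names)
        self.use_offset = use_offset
        self.last = 0        # offset of the last entry found
        self.compares = 0    # name comparisons performed (for the demo)

    def lookup(self, name: str):
        n = len(self.names)
        start = self.last if self.use_offset else 0
        for step in range(n):
            i = (start + step) % n  # wrap around to cover the whole directory
            self.compares += 1
            if self.names[i] == name:
                self.last = i
                return i
        return None
```

For 100 entries looked up in order, starting at the last offset costs 1 + 2 × 99 = 199 comparisons, while always starting from the beginning costs 1 + 2 + … + 100 = 5050.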
Pathname Translation: /usr/bin/vi • See pg. 312 in book & overhead
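Translating /usr/bin/vi amounts to starting at the root (or current) directory and looking up one component at a time, with each entry’s inode becoming the next directory to search. The sketch below uses the tree from the Naming slide. The root is inode 2 as in UFS, but the other inode numbers and the name lookup_path are hypothetical.

```python
ROOT_INO = 2  # in UFS the root directory is inode 2

# Hypothetical directory contents: directory inode -> {name: inode}
DIRS = {
    2: {".": 2, "..": 2, "vmunix": 4, "usr": 5},
    5: {".": 5, "..": 2, "foo": 6, "bin": 7},
    7: {".": 7, "..": 5, "ex": 9, "groff": 10, "vi": 11},
}


def lookup_path(path: str, cwd: int = ROOT_INO) -> int:
    """Translate a pathname to an inode number, one component at a time."""
    ino = ROOT_INO if path.startswith("/") else cwd
    for comp in path.split("/"):
        if not comp:
            continue  # leading "/" or repeated slashes
        entries = DIRS.get(ino)
        if entries is None:
            raise NotADirectoryError(comp)  # tried to search a non-directory
        if comp not in entries:
            raise FileNotFoundError(comp)
        ino = entries[comp]  # this entry's inode is the next directory to search
    return ino
```

Here lookup_path("/usr/bin/vi") walks inode 2 → 5 → 7 → 11, and lookup_path("bin/vi", cwd=5) reaches the same inode from /usr. Because . and .. are ordinary entries, no special case is needed for them.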
Links
• One inode per file
• Multiple names possible: links
• A directory entry is a hard link
• When the last link to a file is removed (and no process still has it open), the inode is deallocated
• Recap: name == link == directory entry
Links in Action: Initial Situation
(Figure: /home/sjc/foo, /home/pam/biz, and /home/sdo/bar are directory entries for the same file inode, whose reference count is 3.)
Links in Action: “rm /home/pam/biz”
(Figure: the biz entry is gone from /home/pam; /home/sjc/foo and /home/sdo/bar still name the file inode, whose reference count is now 2.)
Links in Action: “touch /home/pam/biz”
(Figure: /home/pam/biz now names a new file inode with reference count 1; the original inode, still named by foo and bar, keeps reference count 2.)
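To Reestablish the Link
• Use the ln (link) command: ln /home/sjc/foo /home/pam/biz
• If biz was recreated by touch, remove it first: ln will not replace an existing name
(Figure: foo, biz, and bar again name the same file inode, whose reference count is back to 3.)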
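The rm / touch / ln sequence can be replayed against a toy model in which names map to inode numbers and each inode carries a link count. The ToyFS class, its flat name table, and its inode numbering are hypothetical, and open-file references are ignored.

```python
import itertools


class ToyFS:
    """Flat map of pathnames to inode numbers, with a link count per inode."""

    def __init__(self) -> None:
        self.names: dict[str, int] = {}     # pathname -> inode number
        self.nlink: dict[int, int] = {}     # inode number -> link count
        self._inos = itertools.count(100)   # hypothetical inode allocator

    def touch(self, path: str) -> int:
        if path in self.names:              # existing name: same inode
            return self.names[path]
        ino = next(self._inos)              # new name: brand-new inode
        self.names[path] = ino
        self.nlink[ino] = 1
        return ino

    def ln(self, existing: str, new: str) -> None:
        if new in self.names:
            raise FileExistsError(new)      # ln will not replace an existing name
        ino = self.names[existing]
        self.names[new] = ino
        self.nlink[ino] += 1

    def rm(self, path: str) -> None:
        ino = self.names.pop(path)
        self.nlink[ino] -= 1
        if self.nlink[ino] == 0:
            del self.nlink[ino]             # last link gone: free the inode
```

Replaying the slides: after touching foo and linking biz and bar to it, the inode’s count is 3; rm biz drops it to 2; touch biz creates a second inode with count 1; and ln only restores the count to 3 after that new biz is removed.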
Symbolic (Soft) Links
• “Just files”
• The “type” field in the directory entry indicates that this is a symlink
• The file contains a pathname
• To follow the link, prepend the file’s contents to the remainder of the pathname being translated
  • If the contents are an absolute path, use that path
  • If relative, interpret it relative to the directory where the link was found
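The “prepend the contents to the remainder” rule can be sketched as a resolver over a table of symlinks. This is a minimal model under assumptions: every non-symlink component is treated as an ordinary existing directory or file, the function name resolve is hypothetical, and the limit of 32 links per lookup stands in for the kernel’s loop protection.

```python
import errno

MAXSYMLINKS = 32  # assumed cap on links followed in one lookup


def resolve(path: str, symlinks: dict[str, str]) -> str:
    """Expand symbolic links in an absolute path, component by component.

    `symlinks` maps a link's absolute path to its contents; every other
    component is assumed to be an ordinary directory or file.
    """
    followed = 0
    done: list[str] = []                            # resolved prefix
    todo = [c for c in path.split("/") if c]        # remaining components
    while todo:
        comp = todo.pop(0)
        if comp == ".":
            continue
        if comp == "..":
            if done:
                done.pop()
            continue
        candidate = "/" + "/".join(done + [comp])
        if candidate in symlinks:
            followed += 1
            if followed > MAXSYMLINKS:              # catches symlink loops
                raise OSError(errno.ELOOP, "too many levels of symbolic links", path)
            contents = symlinks[candidate]
            # Prepend the link's contents to the rest of the pathname
            todo = [c for c in contents.split("/") if c] + todo
            if contents.startswith("/"):
                done = []                           # absolute: restart at the root
            # relative: keep `done`, i.e. continue from the link's directory
            continue
        done.append(comp)
    return "/" + "/".join(done)
```

With {"/home/pam/biz": "/home/sjc/foo"}, resolving /home/pam/biz yields /home/sjc/foo; a relative link such as "../sjc/foo" stored in /home/sdo resolves to the same place. The counter is what turns a loop like a → b → a into an ELOOP error instead of an endless walk.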
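Example Symbolic Links
(Figure: /home/sjc/foo names a file inode with reference count 1; /home/sdo/bleargh names a separate symlink inode, also with reference count 1, whose contents are the pathname /home/sjc/foo.)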
Symlinks in Action: Initial Situation
(Figure: /home/sjc/foo names a file inode with reference count 1; /home/pam/biz names a symlink inode with reference count 1 whose contents are /home/sjc/foo.)
Symlinks in Action: “rm /home/sjc/foo”
(Figure: foo is gone and its file inode is freed, marked X; /home/pam/biz still contains /home/sjc/foo, which now names nothing.)
Symlinks in Action: “touch /home/sjc/foo”
(Figure: /home/sjc/foo names a new file inode with reference count 1, and /home/pam/biz, still containing /home/sjc/foo, now resolves to it.)
Note: foo is now a new file.
Treatment of Symbolic Links
• In almost all cases, a system call on a symbolic link is passed through to the file the symlink references
• Symbolic links can form loops in the filesystem (hard links can’t)
• Symbolic links can refer to other filesystems (hard links can’t)
• A shell often tracks traversal of symbolic links through the “cd” command. Why?
Quotas
• Limit the amount of file space used by
  • Users
  • Groups
• Hard limit: the level of usage at which no further allocation can be done
• Soft limit: the level of usage at which a warning is generated; if usage stays above the soft limit longer than a grace period, the soft limit is enforced like the hard limit
Quotas II
• Separate quotas for data blocks and for inodes
• Quotas are checked at allocation time
  • Check the user quota first
  • Then check the group quota
  • If either fails, return an error up as if the filesystem were full
• Quotas are kept in files in the root of the filesystem
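The allocation-time check can be sketched as follows: user quota first, then group quota, with the soft limit starting a grace-period clock and becoming a hard limit once the clock runs out. This is a sketch under assumptions: the one-week grace period, the convention that a limit of 0 means “no limit”, and the names Quota, check, and allocate are all hypothetical. The same check would run separately for block counts and inode counts.

```python
from dataclasses import dataclass
from typing import Optional

GRACE = 7 * 24 * 3600  # assumed grace period: one week, in seconds


@dataclass
class Quota:
    soft: int                 # warning level (0 = no limit, an assumption here)
    hard: int                 # ceiling (0 = no limit)
    used: int = 0
    deadline: Optional[float] = None  # when the soft limit starts being enforced


def check(q: Quota, amount: int, now: float) -> bool:
    """Would allocating `amount` more units stay within quota q?"""
    new = q.used + amount
    if q.hard and new > q.hard:
        return False
    if q.soft and new > q.soft:
        if q.deadline is None:
            q.deadline = now + GRACE   # first time over: warn and start the clock
        elif now > q.deadline:
            return False               # over too long: soft limit acts as hard
    return True


def allocate(user: Quota, group: Quota, amount: int, now: float) -> bool:
    """Check the user quota first, then the group quota."""
    if not check(user, amount, now) or not check(group, amount, now):
        return False  # reported to the caller as though the filesystem were full
    user.used += amount
    group.used += amount
    return True
```

For simplicity the sketch never clears the deadline when usage drops back under the soft limit; a fuller version would reset it at that point.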
Quota File Structures
(Figure: the struct mount for / points to its struct ufs_mount, which holds the vnode for /quota.user; the struct mount for /usr points to its struct ufs_mount, which holds the vnodes for /usr/quota.user and /usr/quota.group; the struct mount for /arch points to a struct nfs_mount.)
A Quota Record
(Figure: a quota file is an array of records indexed by uid, from uid 0 through uid n.)
Each record holds:
• Block quota (soft limit)
• Block quota (hard limit)
• Current number of blocks
• Time to begin enforcing the block quota
• Inode quota (soft limit)
• Inode quota (hard limit)
• Current number of inodes
• Time to begin enforcing the inode quota
dquot entries
• dquot entries hold active quotas in kernel memory (a cache, as usual)
• Fast lookup via a hash table
• Loaded when a file is first opened for writing
• Checked on each write
• Pointers to the dquot structs are saved in the inode
File Locking
• Locks (advisory!) can be placed on an arbitrary byte range in a file
• The range-lock list for a file gives its active locks
• A list of pending locks hangs off each active lock
Example Lock Structures
(Figure: the inode’s i_lockf field heads a list of active locks linked through lf_next: EX by ID 1 on bytes 1-3, SH by ID 2 on bytes 7-12, and SH by ID 3 on bytes 7-14. Pending requests hang off the active locks’ lf_block lists: SH by ID 4 on bytes 3-10, and EX by ID 1 on bytes 9-12.)
Another Lock… Deadlock!
(Figure: the same structures, plus a new request, SH by ID 2 on bytes 3-5. It would block on ID 1’s EX lock on bytes 1-3, while ID 1’s pending EX request on bytes 9-12 is already waiting on ID 2’s SH lock on bytes 7-12.)
• Check for deadlock by looking for cycles.
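Looking for cycles can be sketched as a search of a waits-for graph: a request waits on the owners of conflicting active locks, each pending request adds edges from its owner to the owners it conflicts with, and a path back to the requester means deadlock. This is a hypothetical sketch, not the kernel’s lock code; ranges are inclusive as on the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lock:
    owner: int   # lock ID (process) that holds or requests the lock
    kind: str    # "SH" (shared) or "EX" (exclusive)
    start: int   # first byte
    end: int     # last byte, inclusive as on the slides


def conflicts(a: Lock, b: Lock) -> bool:
    overlap = a.start <= b.end and b.start <= a.end
    return overlap and a.owner != b.owner and "EX" in (a.kind, b.kind)


def would_deadlock(request: Lock, active: list[Lock], pending: list[Lock]) -> bool:
    """Search the waits-for graph for a path from the request back to its owner."""

    def blockers(req: Lock) -> set[int]:
        return {lk.owner for lk in active if conflicts(req, lk)}

    waits_for: dict[int, set[int]] = {}
    for p in pending:  # each pending request waits on the owners it conflicts with
        waits_for.setdefault(p.owner, set()).update(blockers(p))

    stack, seen = list(blockers(request)), set()
    while stack:
        owner = stack.pop()
        if owner == request.owner:
            return True   # cycle: the requester would end up waiting on itself
        if owner in seen:
            continue
        seen.add(owner)
        stack.extend(waits_for.get(owner, ()))
    return False
```

With the slide’s locks, ID 2’s request for SH on bytes 3-5 waits on ID 1, which already waits on ID 2 (and ID 3), so would_deadlock returns True; the same request from an uninvolved ID 5 would simply block.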
Five Possibilities for Lock Overlap on Acquisition
• Direct: exact match
• Subset: the new lock range is entirely contained within the old lock range
• Superset: the new lock range entirely contains the old lock range
• Extend Past: the new lock range starts part way into and extends past the old lock range
• Extend Into: the new lock range starts before and extends into the old lock range
Five Possibilities for Lock Overlap on Acquisition II
(Figure: for each case (Exact, Subset, Superset, Past, Into), the existing lock, the new request, and what the existing lock becomes.)
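The five cases can be told apart with a handful of comparisons on inclusive byte ranges. The function name classify is hypothetical; it returns "none" when the ranges do not overlap at all.

```python
def classify(existing: tuple[int, int], new: tuple[int, int]) -> str:
    """Classify how a new inclusive byte range overlaps an existing one."""
    es, ee = existing
    ns, ne = new
    if ne < es or ns > ee:
        return "none"
    if (ns, ne) == (es, ee):
        return "exact"
    if es <= ns and ne <= ee:
        return "subset"       # new lies inside existing
    if ns <= es and ee <= ne:
        return "superset"     # new covers existing
    if ns > es:
        return "extend past"  # starts inside existing, ends beyond it
    return "extend into"      # starts before existing, ends inside it
```

For the upcoming example, a new range of 3-13 is “extend past” for an existing lock on 1-3, “superset” for 5-10, and “extend into” for 12-19.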
Lock Overlap Example: Initial State
(Figure: active locks on the inode, all held by ID 1: EX on bytes 1-3, SH on bytes 5-10, and SH on bytes 12-19. A pending EX request by ID 2 on bytes 3-12 waits on an lf_block list.)
New Request
(Figure: EX by ID 1 on bytes 3-13.)
Result
(Figure: ID 1’s active locks are now EX on bytes 1-2, EX on bytes 3-13, and SH on bytes 14-19; ID 2’s pending EX request on bytes 3-12 is still waiting.)
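The result above can be reproduced by a small function that grants a lock to an owner and trims, splits, or drops that owner’s overlapping locks. This is a hypothetical sketch, not the kernel’s implementation: conflict checking against other owners is omitted, locks are (owner, kind, start, end) tuples with inclusive ends, and, matching the figure, adjacent ranges of the same type are not merged.

```python
def set_lock(locks, owner, kind, start, end):
    """Grant [start, end] to `owner`, replacing that owner's overlapping ranges.

    locks: list of (owner, kind, start, end) tuples with inclusive ends.
    Other owners' locks are left alone (conflict checking is omitted here).
    """
    result = []
    for lo, lk, ls, le in locks:
        if lo != owner or le < start or ls > end:
            result.append((lo, lk, ls, le))         # no overlap: keep as-is
            continue
        if ls < start:
            result.append((lo, lk, ls, start - 1))  # piece before the new range
        if le > end:
            result.append((lo, lk, end + 1, le))    # piece after the new range
        # the overlapped middle is dropped; the new lock covers it
    result.append((owner, kind, start, end))
    return sorted(result, key=lambda t: (t[2], t[0]))
```

Applying EX 3-13 for ID 1 to the initial state trims EX 1-3 to 1-2 (extend past), drops SH 5-10 (superset), and trims SH 12-19 to 14-19 (extend into), giving the three locks shown in the result.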
Five Possibilities for Request/Lock Overlap on Release
• Direct: exact match
• Subset: the release request is entirely contained within the lock range
• Superset: the release request entirely contains the lock range
• Extend Past: the release request starts part way into and extends past the lock range
• Extend Into: the release request starts before and extends into the lock range
Five Possibilities for Request/Lock Overlap on Release II
(Figure: for each case (Exact, Subset, Superset, Past, Into), the existing lock, the release request, and what the existing lock becomes.)
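Release can be sketched the same way: each of the owner’s locks that overlaps the request is removed (exact, superset), split in two (subset), or trimmed at one end (extend past, extend into). The function name release and the tuple representation are hypothetical, and a real implementation would also wake any requests pending on the released bytes.

```python
def release(locks, owner, start, end):
    """Release bytes [start, end] held by `owner`; returns the new lock list.

    locks: list of (owner, kind, start, end) tuples with inclusive ends.
    """
    result = []
    for lo, lk, ls, le in locks:
        if lo != owner or le < start or ls > end:
            result.append((lo, lk, ls, le))         # untouched lock
            continue
        if ls < start:                              # keep the part before the hole
            result.append((lo, lk, ls, start - 1))
        if le > end:                                # keep the part after the hole
            result.append((lo, lk, end + 1, le))
    return result
```

For example, releasing bytes 6-8 from a lock on 3-13 leaves two locks, 3-5 and 9-13, while releasing 1-2 from a lock on exactly 1-2 removes it.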