Linux Virtual File System. Peter J. Braam. Aims. Present the data structures in Linux VFS Provide information about flow of control Describe methods and invariants needed to implement a new file system Illustrate with some examples.
Linux Virtual File System Peter J. Braam P.J.Braam/CMU -- 1
Aims • Present the data structures in Linux VFS • Provide information about flow of control • Describe methods and invariants needed to implement a new file system • Illustrate with some examples P.J.Braam/CMU -- 2
History
• BSD implemented VFS for NFS: the aim was to dispatch to different filesystems
• VMS had an elaborate filesystem
• NT/Win95 have VFS-type interfaces
• Newer systems integrate VM with the buffer cache
[Diagram: file access enters the VFS, which dispatches to nfs (over udp), ufs (on disk) and Coda (via Venus)]
P.J.Braam/CMU -- 3
Linux Filesystems
Media based:
• ext2 - Linux native
• ufs - BSD
• fat - DOS FS
• vfat - Win95
• hpfs - OS/2
• minix - well….
• isofs - CDROM
• sysv - Sysv Unix
• hfs - Macintosh
• affs - Amiga Fast FS
• NTFS - NT’s FS
• adfs - Acorn-strongarm
Network:
• nfs
• Coda
• AFS - Andrew FS
• smbfs - LanManager
• ncpfs - Novell
Special ones:
• procfs - /proc
• umsdos - Unix in DOS
• userfs - redirector to user
P.J.Braam/CMU -- 4
Linux Filesystems (ctd)
Forthcoming:
• devfs - device file system
• DFS - DCE distributed FS
Varia:
• cfs - crypt filesystem
• cfs - cache filesystem
• ftpfs - ftp filesystem
• mailfs - mail filesystem
• pgfs - Postgres versioning file system
Linux servers (unrelated to the VFS!):
• NFS - user & kernel
• Coda
• AppleShare - netatalk/CAP
• SMB - samba
• NCP - Novell
P.J.Braam/CMU -- 5
Usefulness: “Linux is Obsolete” - Andrew Tanenbaum P.J.Braam/CMU -- 6
Linux VFS
• Multiple interfaces build up VFS: files, dentries, inodes, superblock, quota
• VFS can do all caching & provides utility functions to the FS
• FS provides methods to VFS; many are optional
[Diagram labels: File access, VFS, nfs, ext2fs, Coda FS, disk, udp, Venus]
P.J.Braam/CMU -- 7
User level file access • Typical user level types and code: • pathnames: “/myfile” • file descriptors: fd = open(“/myfile”…) • attributes in struct stat: stat(“/myfile”, &mybuf), chmod, chown... • offsets: write, read, lseek • directory handles: DIR *dh = opendir(“/mydir”) • directory entries: struct dirent *ent = readdir(dh) P.J.Braam/CMU -- 8
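The user-level types listed above can be exercised together in a few lines of C. The following is a minimal sketch, assuming a POSIX system; the helper names `write_and_stat` and `count_entries` are illustrative and not part of the slides.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Write msg to path through a file descriptor, check the offset with
 * lseek(), then read the size back through stat().
 * Returns the size reported by stat(), or -1 on any error. */
long write_and_stat(const char *path, const char *msg)
{
    assert(path != NULL && msg != NULL);

    int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0644);
    if (fd < 0)
        return -1;

    size_t len = strlen(msg);
    ssize_t n = write(fd, msg, len);
    off_t pos = lseek(fd, 0, SEEK_CUR);   /* current offset after the write */
    close(fd);
    if (n != (ssize_t)len || pos != (off_t)len)
        return -1;

    struct stat st;                        /* attributes in struct stat */
    if (stat(path, &st) != 0)
        return -1;
    return (long)st.st_size;
}

/* Count directory entries other than "." and "..".
 * Returns the count, or -1 if the directory cannot be opened. */
int count_entries(const char *dirpath)
{
    assert(dirpath != NULL);

    DIR *dh = opendir(dirpath);            /* directory handle */
    if (dh == NULL)
        return -1;

    int n = 0;
    struct dirent *ent;                    /* directory entry */
    while ((ent = readdir(dh)) != NULL) {
        if (strcmp(ent->d_name, ".") != 0 && strcmp(ent->d_name, "..") != 0)
            n++;
    }
    closedir(dh);
    return n;
}
```

Each call here ends up in the VFS: the pathname is resolved to a dentry, the descriptor refers to a file structure, and the attributes come from the inode.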
VFS • Manages kernel level file abstractions in one format for all file systems • Receives system call requests from user level (e.g. write, open, stat, link) • Interacts with a specific file system based on mount point traversal • Receives requests from other parts of the kernel, mostly from memory management P.J.Braam/CMU -- 9
File system level • Individual File Systems • responsible for managing file & directory data • responsible for managing meta-data: timestamps, owners, protection etc • translates data between • particular FS data: e.g. disk data, NFS data, Coda/AFS data • VFS data: attributes etc in standard format • e.g. nfs_getattr(….) returns attributes in VFS format, acquires attributes in NFS format to do so. P.J.Braam/CMU -- 10
Anatomy of stat system call
sys_stat(path, buf)
{
    dentry = namei(path);                      /* establish VFS data */
    if ( dentry == NULL )
        return -ENOENT;
    inode = dentry->d_inode;
    rc = inode->i_op->i_permission(inode);     /* call into inode layer of filesystem */
    if ( rc ) {
        dput(dentry);                          /* drop the reference taken by namei */
        return -EPERM;
    }
    rc = inode->i_op->i_getattr(inode, buf);   /* call into inode layer of filesystem */
    dput(dentry);
    return rc;
}
P.J.Braam/CMU -- 11
Anatomy of fstatfs system call
sys_fstatfs(fd, buf)    /* for things like "df" */
{
    file = fget(fd);                               /* translate fd to VFS data structure */
    if ( file == NULL )
        return -EBADF;
    superb = file->f_dentry->d_inode->i_sb;
    rc = superb->sb_op->sb_statfs(superb, buf);    /* call into superblock layer of filesystem */
    fput(file);                                    /* release the reference taken by fget */
    return rc;
}
P.J.Braam/CMU -- 12
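From user space, the superblock-level path can be reached with a call in the statfs family. The sketch below uses the POSIX `fstatvfs` call rather than the Linux-specific `fstatfs` named on the slide; the helper name `fs_block_size` is illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/statvfs.h>
#include <unistd.h>

/* Return the block size of the filesystem that holds path,
 * or 0 if the path cannot be opened or queried. */
unsigned long fs_block_size(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return 0;

    struct statvfs buf;
    int rc = fstatvfs(fd, &buf);   /* fd -> file -> inode -> superblock */
    close(fd);

    return rc == 0 ? (unsigned long)buf.f_bsize : 0;
}
```

The same structure also carries block and inode counts (`f_blocks`, `f_bfree`, `f_files`), which is the information a tool like `df` reports.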
Data structures
VFS data structures for:
• VFS handle to the file: inode (BSD: vnode)
• User instantiated file handle: file (BSD: file)
• The whole filesystem: superblock (BSD: vfs)
• A name to inode translation: dentry
P.J.Braam/CMU -- 13
Shorthand method notation • super block methods: sss_methodname • inode methods: iii_methodname • dentry methods: ddd_methodname • file methods: fff_methodname • instead of inode->i_op->lookup we write iii_lookup P.J.Braam/CMU -- 14
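The shorthand stands for a call through a table of function pointers that the file system attaches to each object. The following is a minimal user-space sketch of that dispatch, including a fallback for a method the file system leaves out; all `toy_` and `ramfs_` names are hypothetical and the structures are far simpler than the kernel's.

```c
#include <assert.h>
#include <stddef.h>

struct toy_inode;

/* Method table supplied by a file system; entries may be NULL. */
struct toy_inode_ops {
    int (*getattr)(struct toy_inode *inode, long *size_out);
};

struct toy_inode {
    long ino;
    const struct toy_inode_ops *i_op;   /* which FS implements this inode */
    long fs_private_size;               /* stand-in for FS-specific data */
};

/* A file system's own implementation of the method. */
static int ramfs_getattr(struct toy_inode *inode, long *size_out)
{
    *size_out = inode->fs_private_size;
    return 0;
}

static const struct toy_inode_ops ramfs_iops = { .getattr = ramfs_getattr };
static const struct toy_inode_ops empty_iops = { .getattr = NULL };

/* "iii_getattr" in the shorthand: dispatch through inode->i_op,
 * falling back to a generic default when the method is absent. */
int vfs_getattr(struct toy_inode *inode, long *size_out)
{
    assert(inode != NULL && size_out != NULL);

    if (inode->i_op != NULL && inode->i_op->getattr != NULL)
        return inode->i_op->getattr(inode, size_out);

    *size_out = 0;   /* generic default chosen for this sketch */
    return 0;
}
```

The caller never needs to know which file system owns the inode; it only follows the pointer stored in the object.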
namei
VFS side (calls into the FS are marked):
struct dentry *namei(parent, name)
{
    if (dentry = d_lookup(parent, name))    /* hashes with ddd_hash(parent, name) (FS) */
        ddd_revalidate(dentry);             /* FS */
    else
        iii_lookup(parent, name);           /* FS; calls iget */
}
struct inode *iget(ino, dev)
{
    /* try cache else .. */
    sss_read_inode(…);                      /* FS */
}
P.J.Braam/CMU -- 15
Superblocks • Handle metadata only (attributes etc) • Responsible for retrieving and storing metadata from the FS media or peers • Struct superblocks hold things like: • device, blocksize, dirty flags, list of dirty inodes • super operations • wait queue • pointer to the root inode of this FS P.J.Braam/CMU -- 16
Super Operations (sss_) • Ops on Inodes: • read_inode • put_inode • write_inode • delete_inode • clear_inode • notify_change • Superblock manips: • read_super (mount) • put_super (unmount) • write_super (unmount) • statfs (attributes) P.J.Braam/CMU -- 17
Inodes • Inodes are VFS abstraction for the file • Inode has operations (iii_methods) • VFS maintains an inode cache, NOT the individual FS’s (compare NT, BSD etc) • Inodes contain an FS specific area where: • ext2 stores disk block numbers etc • AFS would store the FID • Extraordinary inode ops are good for dealing with stale NFS file handles etc. P.J.Braam/CMU -- 18
What’s inside an inode - 1
• caching: list_head i_hash, list_head i_list, list_head i_dentry
• int i_count
• identifies file: long i_ino, int i_dev
• usual stuff: {m,a,c}time, {u,g}id, mode, size, n_link
P.J.Braam/CMU -- 19
What’s inside an inode - 2
• which FS: superblock i_sb, inode_ops i_op
• waiting: wait objects, semaphore, lock
• for mmap, networking: vm_area_struct, pipe/socket info, page information
• FS specific info (blockno’s, fids etc): union { ext2fs_inode_info i_ext2; nfs_inode_info i_nfs; coda_inode_info i_coda; .. } u
P.J.Braam/CMU -- 20
Inode state
• Inode can be on one or two lists:
- (hash & in_use) or (hash & dirty) or unused
• inode has a use count i_count
• Transitions
- unused -> hash: iget calls sss_read_inode
- dirty -> in_use: sss_write_inode
- hash -> unused: calls sss_clear_inode, but if i_nlink = 0, iput calls sss_delete_inode when i_count falls to 0
P.J.Braam/CMU -- 21
Inode Cache
Players:
1. iget: if i_count>0 ++
2. iput: if i_count>1 --
3. free_inodes
4. syncing inodes
[Diagram: the inode_hashtable links used, dirty and unused inodes to FS storage. Labelled transitions: sss_read_inode (iget); sss_write_inode (sync one); mark_inode_dirty (media fs only); sss_clear_inode (freeing inos) or sss_delete_inode (iput)]
P.J.Braam/CMU -- 22
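The use-count rules for iget and iput can be illustrated with a small user-space model. This is only a sketch under simplifying assumptions: a fixed array stands in for the hash table, there is no dirty list, and every `toy_` name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_CACHE_SLOTS 8

struct cached_inode {
    long i_ino;
    int  i_count;   /* use count */
    int  i_nlink;   /* 0 means the file has been unlinked */
    int  hashed;    /* 1 while the slot holds a cached inode */
};

static struct cached_inode cache[TOY_CACHE_SLOTS];
static int reads;     /* how often the "FS" had to read an inode */
static int deletes;   /* how often the "FS" was asked to delete one */

/* Stand-ins for sss_read_inode and sss_delete_inode. */
static void toy_read_inode(struct cached_inode *inode)
{
    inode->i_nlink = 1;
    reads++;
}

static void toy_delete_inode(struct cached_inode *inode)
{
    (void)inode;
    deletes++;
}

/* iget: return the cached inode with its count raised, or fill a free slot. */
struct cached_inode *toy_iget(long ino)
{
    struct cached_inode *free_slot = NULL;

    for (int i = 0; i < TOY_CACHE_SLOTS; i++) {
        if (cache[i].hashed && cache[i].i_ino == ino) {
            cache[i].i_count++;
            return &cache[i];
        }
        if (!cache[i].hashed && free_slot == NULL)
            free_slot = &cache[i];
    }
    if (free_slot == NULL)
        return NULL;   /* a real cache would reclaim unused inodes here */

    free_slot->i_ino = ino;
    free_slot->i_count = 1;
    free_slot->hashed = 1;
    toy_read_inode(free_slot);
    return free_slot;
}

/* iput: drop one reference; delete only when unused and unlinked. */
void toy_iput(struct cached_inode *inode)
{
    assert(inode != NULL && inode->i_count > 0);

    if (--inode->i_count > 0)
        return;
    if (inode->i_nlink == 0) {
        toy_delete_inode(inode);
        inode->hashed = 0;
    }
    /* otherwise the inode stays cached with i_count == 0 ("unused") */
}
```

In this model a second `toy_iget` for the same number costs no read, and an inode whose count drops to zero stays available for reuse unless its link count is also zero.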
Sales
• Red Hat Software sold 240,000 copies of Red Hat Linux in 1997 and expects to reach 400,000 in 1998.
• Estimates of installed servers (InfoWorld):
- Linux: 7 million
- OS/2: 5 million
- Macintosh: 1 million
P.J.Braam/CMU -- 23
Inode operations (iii_)
• lookup: return inode; calls iget
• creation/removal: create, link, unlink, symlink, mkdir, rmdir, mknod, rename
• symbolic links: readlink, follow link
• pages: readpage, writepage, updatepage - read or write page; generic for mediafs
• bmap - return disk block number of logical block
• special operations: revalidate (see dentry section), truncate, permission
P.J.Braam/CMU -- 24
Dentry world • Dentry is a name to inode translation structure • Cached aggressively by VFS • Eliminates lookups by FS & private caches • timing on Coda FS: ls -lR 1000 files after priming cache • linux 2.0.32: 7.2secs • linux 2.1.92: 0.6secs • disk fs: less benefit, NFS even more • Negative entries! • Namei is dramatically simplified P.J.Braam/CMU -- 25
Inside dentry’s • name • pointer to inode • pointer to parent dentry • list head of children • chains for lots of lists • use count P.J.Braam/CMU -- 26
Dentry associated lists
[Diagram legend: inodes and dentries; the dentry-inode relationship is the d_inode pointer; the dentry tree relationship is the d_parent pointer; each inode carries an i_dentry list head]
• d_child chains: place: d_alloc; remove: d_prune, d_invalidate, d_put
• d_alias chains: place: d_instantiate; remove: dentry_iput
P.J.Braam/CMU -- 27
Dcache
• namei tries cache: d_lookup
- ddd_compare
• Success: ddd_revalidate
- d_invalidate if fails
- proceed if success
• Failure: iii_lookup
- find inode
- iget
- sss_read_inode
• finish:
- d_add
- can give negative entry in dcache
[Diagram: dentry_hashtable (d_hash chains), indexed by dhash(parent, name); namei, iii_lookup and d_add on the insertion side; prune, d_invalidate and d_drop on the removal side; unused dentries on d_lru chains]
P.J.Braam/CMU -- 28
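The lookup flow, including negative entries, can be modelled in user space. The sketch below assumes a single directory, a linear array instead of hash chains, and names shorter than 32 bytes; the `toy_` functions are hypothetical stand-ins for d_lookup, d_add and iii_lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DCACHE_SLOTS 16
#define TOY_NAME_LEN 32

struct toy_dentry {
    int  used;
    char name[TOY_NAME_LEN];
    long ino;              /* 0 marks a negative entry: name known not to exist */
};

static struct toy_dentry dcache[DCACHE_SLOTS];
static int fs_lookups;     /* how often the file system had to be asked */

/* Stand-in for iii_lookup: this toy FS contains exactly one file. */
static long toy_fs_lookup(const char *name)
{
    fs_lookups++;
    return strcmp(name, "myfile") == 0 ? 42 : 0;
}

/* Stand-in for d_lookup: search the cache by name. */
static struct toy_dentry *toy_d_lookup(const char *name)
{
    for (int i = 0; i < DCACHE_SLOTS; i++)
        if (dcache[i].used && strcmp(dcache[i].name, name) == 0)
            return &dcache[i];
    return NULL;
}

/* Stand-in for d_add: cache the result, positive or negative. */
static struct toy_dentry *toy_d_add(const char *name, long ino)
{
    for (int i = 0; i < DCACHE_SLOTS; i++) {
        if (!dcache[i].used) {
            dcache[i].used = 1;
            strncpy(dcache[i].name, name, TOY_NAME_LEN - 1);
            dcache[i].name[TOY_NAME_LEN - 1] = '\0';
            dcache[i].ino = ino;
            return &dcache[i];
        }
    }
    return NULL;           /* cache full; a real dcache would prune */
}

/* namei: try the cache first, fall back to the FS, then cache the answer.
 * Returns the inode number, or 0 if the name does not exist. */
long toy_namei(const char *name)
{
    assert(name != NULL);

    struct toy_dentry *d = toy_d_lookup(name);
    if (d == NULL) {
        long ino = toy_fs_lookup(name);
        d = toy_d_add(name, ino);
        if (d == NULL)
            return ino;
    }
    return d->ino;
}
```

Repeated lookups of a missing name reach the file system only once, which is the point of keeping negative entries.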
Dentry methods • ddd_revalidate: can force new lookup • ddd_hash: compute hash value of name • ddd_compare: are names equal? • ddd_delete, ddd_put, ddd_iput: FS cleanup opportunity P.J.Braam/CMU -- 29
Dentry particulars: • ddd_hash and ddd_compare have to deal with extraordinary cases for msdos/vfat: • case insensitive • long and short filename pleasantries • ddd_revalidate -- can force new lookup if inode not in use: • used for NFS/SMBfs aging • used for Coda/AFS callbacks P.J.Braam/CMU -- 30
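A case-insensitive file system has to keep its hash and compare methods consistent: names that compare equal must also hash to the same value, or the cache will miss entries. The following is a minimal sketch of such a pair for ASCII names; the function names and the hash formula are illustrative and are not taken from the msdos or vfat code.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hash a name after folding each byte to lower case, so that
 * "README.TXT" and "readme.txt" land on the same hash chain. */
unsigned long ci_hash(const char *name, size_t len)
{
    assert(name != NULL);

    unsigned long h = 0;
    for (size_t i = 0; i < len; i++)
        h = h * 31 + (unsigned long)tolower((unsigned char)name[i]);
    return h;
}

/* Return 1 if the two names are equal ignoring case, 0 otherwise. */
int ci_names_equal(const char *a, size_t alen, const char *b, size_t blen)
{
    assert(a != NULL && b != NULL);

    if (alen != blen)
        return 0;
    for (size_t i = 0; i < alen; i++)
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    return 1;
}
```

Long and short filename aliases need more than case folding, so a real implementation has additional work to do beyond this sketch.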
Style: “Dijkstra probably hates me” - Linus Torvalds P.J.Braam/CMU -- 31
Memory mapping
• vm_area structure has
- vm_operations
- inode, addresses etc.
• vm_operations
- map, unmap
- swapin, swapout
- nopage -- read when page isn’t in VM
• mmap
- calls on iii_readpage
- keeps a use count on the inode until unmap
P.J.Braam/CMU -- 32
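Seen from user space, this path is triggered by mmap followed by the first access to each page. The following is a minimal sketch, assuming a POSIX system; the helper name `mmap_sum_bytes` is illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Map a file read-only and add up its bytes.
 * Returns the sum, or -1 on error or for an empty file. */
long mmap_sum_bytes(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }

    unsigned char *p = mmap(NULL, (size_t)st.st_size, PROT_READ,
                            MAP_PRIVATE, fd, 0);
    close(fd);   /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return -1;

    long sum = 0;
    for (off_t i = 0; i < st.st_size; i++)
        sum += p[i];   /* the first touch of each page faults it in */

    munmap(p, (size_t)st.st_size);
    return sum;
}
```

Closing the descriptor before reading works because the mapping holds its own reference to the file until it is unmapped, which corresponds to the use count mentioned above.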