1 / 32

Linux Virtual File System

Linux Virtual File System. Peter J. Braam. Aims. Present the data structures in Linux VFS Provide information about flow of control Describe methods and invariants needed to implement a new file system Illustrate with some examples.

khuyen
Download Presentation

Linux Virtual File System

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Linux Virtual File System Peter J. Braam P.J.Braam/CMU -- 1

  2. Aims • Present the data structures in Linux VFS • Provide information about flow of control • Describe methods and invariants needed to implement a new file system • Illustrate with some examples P.J.Braam/CMU -- 2

  3. BSD implemented VFS for NFS: aim dispatch to different filesystems VMS had elaborate filesystem NT/Win95 have VFS type interfaces Newer systems integrate VM with buffer cache. History File access VFS nfs ufs Coda disk udp Venus P.J.Braam/CMU -- 3

  4. Media based ext2 - Linux native ufs - BSD fat - DOS FS vfat - win 95 hpfs - OS/2 minix - well…. Isofs - CDROM sysv - Sysv Unix hfs - Macintosh affs - Amiga Fast FS NTFS - NT’s FS adfs - Acorn-strongarm Network nfs Coda AFS - Andrew FS smbfs - LanManager ncpfs - Novell Special ones procfs -/proc umsdos - Unix in DOS userfs - redirector to user Linux Filesystems P.J.Braam/CMU -- 4

  5. Forthcoming: devfs - device file system DFS - DCE distributed FS Varia: cfs - crypt filesystem cfs - cache filesystem ftpfs - ftp filesystem mailfs - mail filesystem pgfs - Postgres versioning file system Linux Filesystems (ctd) • Linux serves (unrelated to the VFS!) • NFS - user & kernel • Coda • AppleShare - netatalk/CAP • SMB - samba • NCP - Novell P.J.Braam/CMU -- 5

  6. Usefulness Linux is Obsolete Andrew Tanenbaum P.J.Braam/CMU -- 6

  7. Multiple interfaces build up VFS: files dentries inodes superblock quota VFS can do all caching & provides utility fctns to FS FS provides methods to VFS; many are optional Linux VFS File access VFS nfs VFS ext2fs Coda FS VFS disk udp Venus P.J.Braam/CMU -- 7

  8. User level file access • Typical user level types and code: • pathnames: “/myfile” • file descriptors: fd = open(“/myfile”…) • attributes in struct stat: stat(“/myfile”, &mybuf), chmod, chown... • offsets: write, read, lseek • directory handles: DIR *dh = opendir(“/mydir”) • directory entries: struct dirent *ent = readdir(dh) P.J.Braam/CMU -- 8

  9. VFS • Manages kernel level file abstractions in one format for all file systems • Receives system call requests from user level (e.g. write, open, stat, link) • Interacts with a specific file system based on mount point traversal • Receives requests from other parts of the kernel, mostly from memory management P.J.Braam/CMU -- 9

  10. File system level • Individual File Systems • responsible for managing file & directory data • responsible for managing meta-data: timestamps, owners, protection etc • translates data between • particular FS data: e.g. disk data, NFS data, Coda/AFS data • VFS data: attributes etc in standard format • e.g. nfs_getattr(….) returns attributes in VFS format, acquires attributes in NFS format to do so. P.J.Braam/CMU -- 10

  11. Anatomy of stat system call sys_stat(path, buf) { dentry = namei(path); if ( dentry == NULL ) return -ENOENT; inode = dentry->d_inode; rc =inode->i_op->i_permission(inode); if ( rc ) return -EPERM; rc = inode->i_op->i_getattr(inode, buf); dput(dentry); return rc; } Establish VFS data Call into inode layer of filesystem Call into inode layer of filesystem P.J.Braam/CMU -- 11

  12. Anatomy of fstatfs system call sys_fstatfs(fd, buf) { /* for things like “df” */ file = fget(fd); if ( file == NULL ) return -EBADF; superb = file->f_dentry->d_inode->i_super; rc = superb->sb_op->sb_statfs(sb, buf); return rc; } Translate fd to VFS data structure Call into superblock layer of filesystem P.J.Braam/CMU -- 12

  13. VFS data structures for: VFS handle to the file: inode (BSD: vnode) User instantiated file handle: file (BSD: file) The whole filesystem: superblock (BSD: vfs) A name to inode translation: dentry Data structures P.J.Braam/CMU -- 13

  14. Shorthand method notation • super block methods: sss_methodname • inode methods: iii_methodname • dentry methods: ddd_methodname • file methods: fff_methodname • instead of : inode i_op lookup we write iii_lookup P.J.Braam/CMU -- 14

  15. namei FS VFS struct dentry *namei(parent, name) { if (dentry = d_lookup(parent,name)) else ddd_hash(parent, name) ddd_revalidate(dentry) iii_lookup(parent, name) sss_read_inode(…) struct inode *iget(ino, dev) { /* try cache else .. */ } P.J.Braam/CMU -- 15

  16. Superblocks • Handle metadata only (attributes etc) • Responsible for retrieving and storing metadata from the FS media or peers • Struct superblocks hold things like: • device, blocksize, dirty flags, list of dirty inodes • super operations • wait queue • pointer to the root inode of this FS P.J.Braam/CMU -- 16

  17. Super Operations (sss_) • Ops on Inodes: • read_inode • put_inode • write_inode • delete_inode • clear_inode • notify_change • Superblock manips: • read_super (mount) • put_super (unmount) • write_super (unmount) • statfs (attributes) P.J.Braam/CMU -- 17

  18. Inodes • Inodes are VFS abstraction for the file • Inode has operations (iii_methods) • VFS maintains an inode cache, NOT the individual FS’s (compare NT, BSD etc) • Inodes contain an FS specific area where: • ext2 stores disk block numbers etc • AFS would store the FID • Extraordinary inode ops are good for dealing with stale NFS file handles etc. P.J.Braam/CMU -- 18

  19. What’s inside an inode - 1 list_head i_hash list_head i_list list_head i_dentry int i_count long i_ino int i_dev {m,a,c}time {u,g}id mode size n_link caching Identifies file Usual stuff P.J.Braam/CMU -- 19

  20. What’s inside an inode -2 superblock i_sb inode_ops i_op wait objects, semaphore lock vm_area_struct pipe/socket info page information union { ext2fs_inode_info i_ext2 nfs_inode_info i_nfs coda_inode_info i_coda ..} u Which FS For mmap, networking waiting FS Specific info: blockno’s fids etc P.J.Braam/CMU -- 20

  21. Inode state • Inode can be on one or two lists: • (hash & in_use) or (hash & dirty ) or unused • inode has a use count i_count • Transitions • unused  hash: iget calls sss_read_inode • dirty in_use: sss_write_inode • hashunused: call on sss_clear_inode, but if i_nlink = 0: iput calls sss_delete_inode when i_count falls to 0 P.J.Braam/CMU -- 21

  22. Inode Cache Fs storage Fs storage Fs storage Players: 1. iget: if i_count>0 ++ 2. iput: if i_count>1 - - 3. free_inodes 4. syncing inodes Inode_hashtable sss_clear_inode (freeing inos) or sss_delete_inode (iput) sss_read_inode (iget) Unused inodes Dirty inodes sss_write_inode (sync one) media fs only (mark_inode_dirty) Used inodes P.J.Braam/CMU -- 22

  23. Sales Red Hat Software sold 240,000 copies of Red Hat Linux in 1997 and expects to reach 400,000 in 1998.Estimates of installed servers (InfoWorld):- Linux: 7 million- OS/2: 5 million- Macintosh: 1 million P.J.Braam/CMU -- 23

  24. lookup: return inode calls iget creation/removal create link unlink symlink mkdir rmdir mknod rename symbolic links readlink follow link pages readpage, writepage, updatepage - read or write page. Generic for mediafs. bmap - return disk block number of logical block special operations revalidate - see dentry sect truncate permission Inode operations (iii_) P.J.Braam/CMU -- 24

  25. Dentry world • Dentry is a name to inode translation structure • Cached agressively by VFS • Eliminates lookups by FS & private caches • timing on Coda FS: ls -lR 1000 files after priming cache • linux 2.0.32: 7.2secs • linux 2.1.92: 0.6secs • disk fs: less benefit, NFS even more • Negative entries! • Namei is dramatically simplified P.J.Braam/CMU -- 25

  26. Inside dentry’s • name • pointer to inode • pointer to parent dentry • list head of children • chains for lots of lists • use count P.J.Braam/CMU -- 26

  27. Dentry associated lists Legend: inode dentry dentry inode relationship dentry tree relationship inode I_dentry list head inode i_dentry list head = d_inode pointer = d_parent pointer d_child chains place: d_alloc remove: d_prune, d_invalidate, d_put d_alias chains place: d_instantiate remove: dentry_iput P.J.Braam/CMU -- 27

  28. Dcache dentry_hashtable (d_hash chains) • namei tries cache: d_lookup • ddd_compare • Success: ddd_revalidate • d_invalidate if fails • proceed if success • Failure: iii_lookup • find inode • iget • sss_read_inode • finish: • d_add • can give negative entry in dcache dhash(parent, name) list head prune d_invalidate d_drop namei iii_lookup d_add unused dentries (d_lru chains) P.J.Braam/CMU -- 28

  29. Dentry methods • ddd_revalidate: can force new lookup • ddd_hash: compute hash value of name • ddd_compare: are names equal? • ddd_delete, ddd_put, ddd_iput: FS cleanup opportunity P.J.Braam/CMU -- 29

  30. Dentry particulars: • ddd_hash and ddd_compare have to deal with extraordinary cases for msdos/vfat: • case insensitive • long and short filename pleasantries • ddd_revalidate -- can force new lookup if inode not in use: • used for NFS/SMBfs aging • used for Coda/AFS callbacks P.J.Braam/CMU -- 30

  31. Style Dijkstra probably hates me Linus Torvalds P.J.Braam/CMU -- 31

  32. Memory mapping • vm_area structure has • vm_operations • inode, addresses etc. • vm_operations • map, unmap • swapin, swapout • nopage -- read when page isn’t in VM • mmap • calls on iii_readpage • keeps a use count on the inode until unmap P.J.Braam/CMU -- 32

More Related