Lecture 48: Virtual File System (VFS)
Ch. 12 Virtual File System (VFS)
File System: Many Different Implementations
• Typical on-disk layout: bootblock | superblock (per-FS metadata, e.g. free list) | inode list (per-file metadata) | data blocks
• Each file system implements it differently: FS A keeps free space in a bitmap, FS B in a linked list, FS C . . .
• Disk file systems:
  • MS-DOS, System V, VFS, NTFS, UDF, JFS …
• Network file systems:
  • NFS, Coda, AFS, SMB, NCP …
Mounting: several physical file systems, each with its own bootblock, superblock, inode list and data blocks, are combined logically into one directory tree.
• /bin → Linux file system (/, /usr, /bin, /dev; /bin/date, /bin/sh, /bin/getty)
• /bin/date/x → DOS file system
• /usr/a → Solaris file system
• /dev/lan/y → NFS
** The pathname determines the file system type.
• Linux can access 40+ different file systems
• Linux provides a standard interface for all file systems
• Linux has a VFS (virtual) layer above the actual (physical) file systems
• VFS defines standard objects (superblock, inode, file, dentry):
  • standard data structures
  • standard operations, e.g. superblock: get-free-space(); inode: update-inode(); file: file-open()
[Figure: VFS "standard objects" layered above the DOS, Linux and Solaris file systems mounted in one tree]
Example: $ cp /a/b/c /x/y/z
User space, cp(1) issues system calls such as read(2):
    f1 = open("/a/b/c", …);      /* on a Windows (MS-DOS) file system */
    f2 = create("/x/y/z", …);    /* on a Solaris file system */
    read(f1, buf, …);            /* Windows */
    write(f2, buf, …);           /* Solaris */
Kernel, VFS layer: open(), read(), write()
• A single virtual interface at the top (open(), read(), write(), …) for multiple heterogeneous file systems
• The VFS passes requests such as read() down to the actual physical file system:
  • Linux_read() for the Ext2 file system (e.g. /bin/ls)
  • DOS_read() for the MS-DOS file system (/a/b/c)
  • Solaris_read() for the Solaris file system (/x/y/z)
User process
↓
Kernel: VFS layer
• standard operations: read() / write() / open() / …
• standard data structures: superblock / inode / …
↓
Actual physical file system layer: different implementations
• Ext2: its own inode/superblock/file, its own read() / write() / open()
• Solaris: its own inode/superblock/file, its own read() / write() / open()
• Windows: its own inode/superblock/file, its own read() / write() / open()
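The layering can be sketched in user-space C. This is a minimal illustration under simplifying assumptions, not kernel code: the trimmed-down struct file and struct file_operations model only a read hook, and linux_read()/dos_read() are hypothetical stand-ins named after the Linux_read()/DOS_read() boxes on the cp slide. The VFS entry point only follows the function pointer; it never checks which file system it is talking to.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct file;

/* Simplified, hypothetical stand-in for struct file_operations: only read is modeled. */
struct file_operations {
    long (*read)(struct file *f, char *buf, size_t count);
};

/* Simplified stand-in for struct file: just the pointer to its operation table. */
struct file {
    const struct file_operations *f_op;
};

/* Two "physical" file systems implementing the same operation differently. */
static long linux_read(struct file *f, char *buf, size_t count)
{
    (void)f;
    return snprintf(buf, count, "linux data");
}

static long dos_read(struct file *f, char *buf, size_t count)
{
    (void)f;
    return snprintf(buf, count, "dos data");
}

static const struct file_operations linux_fops = { .read = linux_read };
static const struct file_operations dos_fops = { .read = dos_read };

/* The VFS layer: one entry point that knows nothing about the concrete file system. */
long vfs_read(struct file *f, char *buf, size_t count)
{
    assert(f != NULL && f->f_op != NULL && f->f_op->read != NULL);
    return f->f_op->read(f, buf, count);
}
```

Adding a third file system means adding one more operation table; vfs_read() itself does not change.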
Remember task_struct?

/* Open file table structure */
struct files_struct {
    atomic_t count;
    spinlock_t file_lock;
    struct file **fd;
    struct file *fd_array[];
};

struct task_struct {
    volatile long state;
    struct thread_info *thread_info;
    unsigned long flags;
    int prio, static_prio;
    struct list_head tasks;
    struct mm_struct *mm, *active_mm;
    struct task_struct *parent;
    struct list_head children;
    struct list_head sibling;
    struct tty_struct *tty;
    /* ipc stuff */
    struct sysv_sem sysvsem;
    /* CPU-specific state of this task */
    struct thread_struct thread;
    /* open file information */
    struct files_struct *files;
    /* file system information */
    struct fs_struct *fs;
    /* namespace */
    struct namespace *namespace;
};

/* Root file system, present working directory */
struct fs_struct {
    atomic_t count;
    rwlock_t lock;
    int umask;
    struct dentry *root, *pwd, *altroot;   /* each task/user is allowed to have a different tree */
    struct vfsmount *rootmnt, *pwdmnt;
};
Per-process data structures (task_struct points to each of them)
• struct files_struct { open files }
• struct fs_struct { root directory of the process, present working directory }
• struct namespace { mounted file systems }
Hence each process (PA, PB, …) may have
• a unique root directory
• a unique view of the mounted file systems
• a unique file system hierarchy
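What "unique root directory" means for name resolution can be sketched as follows. This is a simplified, hypothetical model: the real fs_struct holds dentry and vfsmount pointers, whereas struct fs_view and resolve() below use plain path strings, and ".." is not handled. An absolute name starts at the process's own root, a relative name starts at its pwd.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, string-based stand-in for fs_struct. Both fields are absolute paths. */
struct fs_view {
    const char *root;  /* process root, as a path in the global tree */
    const char *pwd;   /* working directory, as seen from the process root */
};

/* Translate `path` as seen by one process into a path in the global tree.
 * Returns 0 on success, -1 if `out` is too small. ".." is not handled. */
int resolve(const struct fs_view *fs, const char *path, char *out, size_t n)
{
    assert(fs->root[0] == '/' && fs->pwd[0] == '/');
    const char *root = strcmp(fs->root, "/") == 0 ? "" : fs->root;
    int len;

    if (path[0] == '/')                      /* absolute: start at the process root */
        len = snprintf(out, n, "%s%s", root, path);
    else if (strcmp(fs->pwd, "/") == 0)      /* relative, and pwd is the root */
        len = snprintf(out, n, "%s/%s", root, path);
    else                                     /* relative: start at pwd */
        len = snprintf(out, n, "%s%s/%s", root, fs->pwd, path);
    return (len >= 0 && (size_t)len < n) ? 0 : -1;
}
```

For example, a process PB whose root is /jail sees /etc/passwd as the global /jail/etc/passwd, while a process PA with root / sees the real /etc/passwd.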
Linux VFS Standard Objects
VFS defines standard objects (data structure + operations):
• superblock object
  • file system control block
• inode object
  • file control block
• file object
  • offset, and the interaction between an open file and a process
• dentry object
  • mapping info: (pathname → inode), e.g. /a/b/c/d/e → inode #7
(Superblock and inode objects have on-disk counterparts; file and dentry objects exist only in memory.)
1. Superblock Object
Linux's notion of an "object": data + operations.
Data:
struct super_block {
    unsigned long s_blocksize, s_maxbytes;
    int s_count;
    struct file_system_type *s_type;
    struct super_operations *s_op;
    …
};
Operations:
struct super_operations {
    struct inode *(*alloc_inode)(…);
    void (*destroy_inode)(…);
    void (*read_inode)(…);
    void (*dirty_inode)(…);
    …
};
• The superblock knows where the inodes are placed on disk.
2. File Object
struct file {
    struct list_head f_list;
    struct dentry *f_dentry;
    struct vfsmount *f_vfsmnt;
    struct file_operations *f_op;   /* pointer to file operation table */
    atomic_t f_count;
    unsigned int f_flags;
    mode_t f_mode;                  /* process access mode */
    loff_t f_pos;                   /* position: current file offset */
    struct fown_struct f_owner;
    unsigned int f_uid;
    int f_error;
    struct file_ra_state f_ra;      /* read ahead */
    unsigned long f_version;
    struct address_space *f_mapping;
};
[Figure: file table entries, each holding its own offset and pointing to an inode in the inode cache]
Operations on File Object
struct file_operations {
    struct module *owner;
    loff_t (*llseek)(struct file *, loff_t, int);
    ssize_t (*read)(struct file *, char __user *, size_t, loff_t *);
    ssize_t (*write)(struct file *, const char __user *, size_t, loff_t *);
    int (*readdir)(struct file *, void *, filldir_t);
    int (*ioctl)(struct inode *, struct file *, …);
    int (*mmap)(struct file *, struct vm_area_struct *);
    int (*open)(struct inode *, struct file *);
    ssize_t (*readv)(struct file *, const struct iovec *, …);
    ssize_t (*writev)(struct file *, const struct iovec *, …);
    …
};
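Because the offset lives in the file object and not in the inode, two opens of the same file read independently. The sketch below assumes a hypothetical in-memory inode whose contents stand in for its data blocks, and a simplified struct file that points at the inode directly (the real one goes through f_dentry->d_inode); file_read() mimics what a read method does with the file position.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory inode: the file's contents stand in for its data blocks. */
struct inode {
    const char *data;
    size_t i_size;
};

/* Simplified file object: the real struct file reaches the inode via
 * f_dentry->d_inode; here it points at the inode directly. */
struct file {
    struct inode *inode;
    long long f_pos;     /* per-open offset, like loff_t f_pos */
};

/* Copy up to `count` bytes starting at this file object's own offset, then advance it. */
size_t file_read(struct file *f, char *buf, size_t count)
{
    assert(f->f_pos >= 0);
    size_t pos = (size_t)f->f_pos;
    if (pos >= f->inode->i_size)
        return 0;                          /* end of file */
    size_t n = f->inode->i_size - pos;
    if (n > count)
        n = count;
    memcpy(buf, f->inode->data + pos, n);
    f->f_pos += (long long)n;
    return n;
}
```

Two struct file objects that share one struct inode therefore keep separate f_pos values.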
3. Inode Object
Data:
struct inode {
    struct list_head i_dentry;
    unsigned long i_ino;
    atomic_t i_count;
    umode_t i_mode;
    struct inode_operations *i_op;
    struct file_operations *i_fop;
    …
};
Operations:
struct inode_operations {
    int (*create)(…);
    struct dentry *(*lookup)(…);
    int (*permission)(…);
    int (*getattr)(…);
    int (*setxattr)(…);
    int (*mkdir)(…);
    int (*link)(…);
};
Operations on Inode Object
• int create(struct inode *, struct dentry *, int)
  • Create a new inode for the given dentry object
• struct dentry *lookup(struct inode *dir, …)
  • Search the directory for an inode matching the name in the given dentry
• int link(struct dentry *, struct inode *, …)
  • Create a hard link to the file
• int unlink(struct inode *dir, struct dentry *)
  • Remove the inode named by the given dentry from the directory
• int symlink(…), int mkdir(…), int rmdir(…), …
4. Dentry Object: path "components"
(1) A user program invokes
  • create("/a/b/c/d/e") or
  • open("/a/b/c/d/e")
  The path /a/b/c/d/e has the components /, /a, /a/b, /a/b/c, /a/b/c/d, /a/b/c/d/e.
(2) The kernel needs the inode of /a/b/c/d (for create) or of /a/b/c/d/e (for open).
(3) It starts from the root and walks downward, one component of the pathname at a time; for each component it must
  • fetch metadata from disk
  • fetch data blocks from disk
  → many disk I/Os
4. Dentry Object (cont.)
(4) Save the lookup result for every pathname component:
    /             inode 0
    /a            inode 217
    /a/b          inode 3
    /a/b/c        inode 7
    /a/b/c/d      inode 83
    /a/b/c/d/e    inode 116
Each saved entry is a struct dentry holding (pathname: i-number), e.g. /a/b/c: inode #7.
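The set of components that such a lookup visits, and therefore the set of dentries it can save, is simply every prefix of the pathname. A small sketch, assuming absolute paths and a hypothetical fixed-size output array:

```c
#include <assert.h>
#include <string.h>

#define MAX_PREFIXES 32
#define MAX_PATH_LEN 256

/* Fill out[] with every prefix of an absolute path that a component-by-component
 * lookup visits, e.g. "/a/b/c" -> "/", "/a", "/a/b", "/a/b/c". Returns the count. */
int path_prefixes(const char *path, char out[][MAX_PATH_LEN], int max)
{
    assert(path[0] == '/' && max >= 1);
    size_t len = strlen(path);
    assert(len < MAX_PATH_LEN);
    int n = 0;
    strcpy(out[n++], "/");                            /* the root is always visited */
    for (size_t i = 1; i <= len && n < max; i++) {
        /* a component ends at a '/' or at the end of the string */
        if ((i == len || path[i] == '/') && path[i - 1] != '/') {
            memcpy(out[n], path, i);
            out[n][i] = '\0';
            n++;
        }
    }
    return n;
}
```

For /a/b/c/d/e this yields the six entries in the table above.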
Where should we place the dentry (struct dentry, pathname: i-number, e.g. /a/b/c: inode #7)?
[Figure: per-process table (fd_array[], fds 0 to 4) → (system) file table (offset, inode) → inode table → device queue]
Where should we place the dentry? In a dentry cache, between the file table and the inode cache.
[Figure: task_struct of process PA → fd_array[] (fds 0 to 4) → file table (offset) → dentry cache (/a, /a/b/c) → inode cache → device queue]
Back to the File Object
task_struct → fd → file → (f_dentry) → dentry → (d_inode) → inode
The dentry object holds the mapping info (pathname → inode), e.g. /a/b/c/d/e → inode #7.
struct file {
    struct list_head f_list;
    struct dentry *f_dentry;        /* dentry object associated with this file */
    struct vfsmount *f_vfsmnt;
    struct file_operations *f_op;   /* pointer to file operation table */
    …
    loff_t f_pos;                   /* position: current file offset */
    …
};
[Figure: file table entries for /a/b/c and /e/f, each with its own offset, pointing through the dentry cache to the inode cache]
struct dentry {
    atomic_t d_count;                 // usage count
    struct inode *d_inode;            // associated inode
    unsigned long d_vfs_flags;        // dentry cache flags
    struct dentry *d_parent;          // dentry of parent
    struct list_head d_child;         // link in the parent's list of children
    struct list_head d_subdirs;       // subdirectories
    struct qstr d_name;               // name of this file; qstr = length + array[char]
    struct dentry_operations *d_op;   // dentry operations table
    int d_mounted;                    // is this a mount point?
    …
};
[Figure: example tree (/, p, a, m, b, c, x, y, z) showing the parent/child links of /a/b; task_struct → fd → file → f_dentry → dentry cache → d_inode → inode cache]
Dentry Object
• Pathname lookup ("/a/b/c/d" → inode 83)
  • is a time-consuming operation (disk I/O)
• Save it for future use (locality: cd /a/b; vi /a/b/c; gcc /a/b/c)
• Save every component of a pathname lookup
  • e.g. after the "/a/b/c/d" lookup, the kernel creates and saves 5 dentries: /, /a, /a/b, /a/b/c, /a/b/c/d
• The kernel allocates one struct dentry per component
• Each dentry object points to the corresponding inode
• A dentry does not correspond to any on-disk data structure
• Dentry cache: LRU replacement
[Figure: dentry cache with / → inode 0, /a → inode 217, /a/b → inode 3, /a/b/c → inode 7, /a/b/c/d → inode 83, between the file table and the inode cache]
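LRU replacement in the dentry cache can be illustrated with a deliberately tiny model. This is a hypothetical sketch, not the kernel's dcache: entries are keyed by full pathname, the capacity is four, and recency is tracked with a logical clock instead of the kernel's list-based LRU.

```c
#include <assert.h>
#include <string.h>

#define DCACHE_SIZE 4
#define NAME_MAX_LEN 64

struct dentry_entry {
    char name[NAME_MAX_LEN];   /* full pathname (simplification) */
    long ino;                  /* inode number */
    unsigned long last_used;   /* logical clock value of the last access */
    int in_use;
};

static struct dentry_entry dcache[DCACHE_SIZE];
static unsigned long dclock;

/* Return the inode number on a hit (refreshing recency), or -1 on a miss. */
long dcache_lookup(const char *name)
{
    for (int i = 0; i < DCACHE_SIZE; i++)
        if (dcache[i].in_use && strcmp(dcache[i].name, name) == 0) {
            dcache[i].last_used = ++dclock;
            return dcache[i].ino;
        }
    return -1;
}

/* Insert or update an entry; when the cache is full, evict the least recently used one. */
void dcache_insert(const char *name, long ino)
{
    assert(strlen(name) < NAME_MAX_LEN);
    int victim = -1;
    for (int i = 0; i < DCACHE_SIZE; i++)          /* already cached? update in place */
        if (dcache[i].in_use && strcmp(dcache[i].name, name) == 0) {
            victim = i;
            break;
        }
    if (victim < 0) {
        victim = 0;
        for (int i = 0; i < DCACHE_SIZE; i++) {
            if (!dcache[i].in_use) {               /* prefer a free slot */
                victim = i;
                break;
            }
            if (dcache[i].last_used < dcache[victim].last_used)
                victim = i;                        /* otherwise the oldest entry */
        }
    }
    strcpy(dcache[victim].name, name);
    dcache[victim].ino = ino;
    dcache[victim].last_used = ++dclock;
    dcache[victim].in_use = 1;
}
```

With four slots holding /, /a, /a/b and /a/b/c, looking up / and then inserting /a/b/c/d evicts /a, the entry that was used least recently.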
Operations on Dentry Object
• int d_revalidate(struct dentry *, int)
  • Determine whether the dentry is still valid (e.g. not deleted)
• int d_compare(struct dentry *, struct qstr …);
  • Called by the VFS to compare two filenames
• int d_delete(struct dentry *);
  • Called by the VFS when the dentry's d_count drops to zero
• void d_release(struct dentry *);
  • Free the given dentry
• int d_hash(struct dentry *, struct qstr *)
  • Create a hash value from the given dentry
Dentry cache lookup
• User process: fd = open("/a/b/c")
• The VFS must look up
  • the [/a/b/c] component: "is it in the dcache?"
    • Hit → done
    • Miss → look up the parent component [/a/b]
  • the [/a/b] component: "is it in the dcache?"
    • same as above
  • [/a]
  • [/]
• Return the corresponding file descriptor
[Figure: tree / → a → b → c, with the dentry cache, inode cache and file table (/a/b/c and /e/f entries)]
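The hit/miss walk above can be sketched as a recursive function: on a miss, resolve the parent first, then read just this component from "disk" and cache it. Everything here is a hypothetical user-space model: the disk is a small table keyed by full pathname, disk_reads counts simulated I/Os, and unlike the real dcache this cache never evicts and does not keep negative entries.

```c
#include <assert.h>
#include <string.h>

#define CACHE_MAX 16
#define PATH_MAX_LEN 64

/* Hypothetical "disk": directory contents keyed by full pathname for brevity. */
static const struct { const char *path; long ino; } disk[] = {
    { "/", 0 }, { "/a", 217 }, { "/a/b", 3 }, { "/a/b/c", 7 },
};
static int disk_reads;                 /* counts simulated disk I/Os */

static long disk_lookup(const char *path)
{
    disk_reads++;
    for (size_t i = 0; i < sizeof disk / sizeof disk[0]; i++)
        if (strcmp(disk[i].path, path) == 0)
            return disk[i].ino;
    return -1;                         /* no such file */
}

/* A tiny dentry cache without eviction. */
static char cache_name[CACHE_MAX][PATH_MAX_LEN];
static long cache_ino[CACHE_MAX];
static int cache_n;

static int cache_find(const char *path, long *ino)
{
    for (int i = 0; i < cache_n; i++)
        if (strcmp(cache_name[i], path) == 0) {
            *ino = cache_ino[i];
            return 1;
        }
    return 0;
}

/* Resolve an absolute path without a trailing '/'. */
long path_lookup(const char *path)
{
    long ino;
    assert(path[0] == '/' && strlen(path) < PATH_MAX_LEN);
    if (cache_find(path, &ino))
        return ino;                                   /* hit: done */
    if (strcmp(path, "/") != 0) {                     /* miss: resolve the parent first */
        char parent[PATH_MAX_LEN];
        const char *slash = strrchr(path, '/');
        size_t plen = (slash == path) ? 1 : (size_t)(slash - path);
        memcpy(parent, path, plen);
        parent[plen] = '\0';
        if (path_lookup(parent) < 0)
            return -1;                                /* parent does not exist */
    }
    ino = disk_lookup(path);                          /* read this component from disk */
    if (ino >= 0) {
        assert(cache_n < CACHE_MAX);
        strcpy(cache_name[cache_n], path);
        cache_ino[cache_n++] = ino;
    }
    return ino;
}
```

The first open("/a/b/c") costs four simulated disk reads (/, /a, /a/b, /a/b/c); a second lookup of the same path costs none.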
Dentry State
• Used state
  • d_count is positive → the dentry is in use
  • valid inode ("invalid" means a deleted inode)
• Unused state
  • d_count = 0
  • valid inode, but no one is using it now
  • kept in case it is needed again (then the lookup is quick)
• Negative state
  • not associated with a valid inode
  • kept so that future lookups of the same (nonexistent) name resolve quickly
[Figure: dentry cache, inode cache and file table (/a/b/c, /e/f): which dentries are used, which are negative?]
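The three states follow directly from two fields, d_count and d_inode. A minimal classification sketch, assuming a trimmed-down struct dentry and a hypothetical enum that is not part of the kernel API:

```c
#include <assert.h>
#include <stddef.h>

struct inode { unsigned long i_ino; };

/* Trimmed-down dentry: only the two fields that decide the state. */
struct dentry {
    int d_count;            /* usage count */
    struct inode *d_inode;  /* NULL for a negative dentry */
};

/* Hypothetical labels for the three states on the slide. */
enum dentry_state { DENTRY_USED, DENTRY_UNUSED, DENTRY_NEGATIVE };

enum dentry_state dentry_state(const struct dentry *d)
{
    assert(d->d_count >= 0);
    if (d->d_inode == NULL)
        return DENTRY_NEGATIVE;            /* no valid inode behind this name */
    return d->d_count > 0 ? DENTRY_USED    /* someone holds a reference */
                          : DENTRY_UNUSED; /* cached, but idle */
}
```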
What is in f_op?
sys_read(unsigned int fd, char __user *buf, size_t count)   /* system call */
{
    ret = vfs_read(file, buf, …);   /* to access many different file systems */
}
vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos)
{
    ret = file->f_op->read(file, buf, count, pos);   /* What is in f_op? */
}
• What is stored in f_op? (Which function pointer has been assigned?)
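The path from a file descriptor to f_op can be sketched in user space. Several names are assumptions for illustration: the explicit task parameter replaces the kernel's implicit current task, FD_TABLE_SIZE is a hypothetical table size, and zero_read() is a made-up read method that behaves like /dev/zero. The -EBADF/-EINVAL error convention follows the kernel's negative-errno style.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define FD_TABLE_SIZE 8   /* hypothetical, small per-process table */

struct file;
struct file_operations {
    long (*read)(struct file *, char *, size_t, long long *);
};
struct file { const struct file_operations *f_op; long long f_pos; };
struct files_struct { struct file *fd_array[FD_TABLE_SIZE]; };
struct task_struct { struct files_struct *files; };

/* A made-up read method behaving like /dev/zero. */
static long zero_read(struct file *f, char *buf, size_t count, long long *pos)
{
    (void)f;
    memset(buf, 0, count);
    *pos += (long long)count;
    return (long)count;
}
static const struct file_operations zero_fops = { .read = zero_read };

/* VFS: no knowledge of the file system, only the function pointer. */
long vfs_read(struct file *file, char *buf, size_t count, long long *pos)
{
    if (file->f_op == NULL || file->f_op->read == NULL)
        return -EINVAL;
    return file->f_op->read(file, buf, count, pos);
}

/* System call: fd → fd_array[fd] → struct file → vfs_read(). */
long sys_read(struct task_struct *task, unsigned int fd, char *buf, size_t count)
{
    if (fd >= FD_TABLE_SIZE || task->files->fd_array[fd] == NULL)
        return -EBADF;
    struct file *file = task->files->fd_array[fd];
    long long pos = file->f_pos;
    long ret = vfs_read(file, buf, count, &pos);
    if (ret >= 0)
        file->f_pos = pos;          /* commit the new offset */
    return ret;
}
```

Which function runs depends only on what was stored in f_op before sys_read() was ever called.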
Who fills i_op? The one who reads the inode from disk.
Who reads the inode from disk? The superblock-related methods.
Superblock
Data:
struct super_block {
    unsigned long s_blocksize, s_maxbytes;
    int s_count;
    struct file_system_type *s_type;
    struct super_operations *s_op;
    …
};
Operations:
struct super_operations {
    struct inode *(*alloc_inode)(…);
    void (*destroy_inode)(…);
    void (*read_inode)(…);
    void (*dirty_inode)(…);
    …
};
• read_inode() reads the inode from disk using information in the superblock.
• But which function runs: ext2_read_inode(), nfs_read_inode() or fat_read_inode()? That depends on which file system the pathname belongs to (in the mounted tree, /a, /bin, /usr, /dev, /x, /y, /b may lie on different file systems).
• Different file systems → different functions behind the same superblock operation.
When the superblock is loaded (mounted) from disk, s_op is set to that file system's methods.
Data:
struct super_block {
    struct super_operations *s_op;
    …
};
Operations: each entry of struct super_operations holds a function address:
struct super_operations {
    (*alloc_inode)(…);     /* function address */
    (*destroy_inode)(…);   /* function address */
    (*read_inode)(…);      /* function address */
    (*dirty_inode)(…);     /* function address */
    …
};
For ext2, s_op points to ext2 methods:
struct super_operations ext2_sops = {
    .alloc_inode   = ext2_alloc_inode,
    .destroy_inode = ext2_destroy_inode,
    .read_inode    = ext2_read_inode,
    .write_inode   = ext2_write_inode,
    …
};
FAT (fat_sops) and NFS (nfs_sops) define their own tables in the same way, pointing to FAT- and NFS-specific functions.
• read_inode() then reads the inode from disk using information in the superblock.
Fill the inode with ext2 operations
When an inode is read from an ext2 file system (using information in the superblock), the inode is filled with ext2 methods:
void ext2_read_inode(struct inode *inode)
{
    if (S_ISREG(inode->i_mode)) {           /* is it a regular file? */
        inode->i_op  = &ext2_file_inode_operations;
        inode->i_fop = &ext2_file_operations;
    } else if (S_ISDIR(inode->i_mode)) {    /* is it a directory? */
        inode->i_op  = &ext2_dir_inode_operations;
        inode->i_fop = &ext2_dir_operations;
    } …
}
struct inode {
    struct list_head i_dentry;
    unsigned long i_ino;
    atomic_t i_count;
    umode_t i_mode;
    struct inode_operations *i_op;    /* now points to ext2 operations */
    struct file_operations *i_fop;    /* now points to ext2 operations */
};
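The mode-based choice of operation tables can be exercised outside the kernel. In this sketch the four ext2_* tables are placeholders that carry only a name (real ones hold function pointers), ext2_read_inode_sketch() is a hypothetical name, and i_mode is assumed to have been read from disk already; S_ISREG and S_ISDIR are the standard POSIX macros from <sys/stat.h>.

```c
#define _XOPEN_SOURCE 700   /* for S_IFREG / S_IFDIR in <sys/stat.h> */
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Placeholder operation tables: real ones hold function pointers. */
struct inode_operations { const char *name; };
struct file_operations { const char *name; };

static const struct inode_operations ext2_file_inode_operations = { "ext2 file inode ops" };
static const struct file_operations ext2_file_operations = { "ext2 file ops" };
static const struct inode_operations ext2_dir_inode_operations = { "ext2 dir inode ops" };
static const struct file_operations ext2_dir_operations = { "ext2 dir ops" };

struct inode {
    unsigned long i_ino;
    mode_t i_mode;
    const struct inode_operations *i_op;
    const struct file_operations *i_fop;
};

/* Bind operation tables according to the file type stored in i_mode. */
void ext2_read_inode_sketch(struct inode *inode)
{
    assert(inode != NULL);
    if (S_ISREG(inode->i_mode)) {            /* regular file */
        inode->i_op = &ext2_file_inode_operations;
        inode->i_fop = &ext2_file_operations;
    } else if (S_ISDIR(inode->i_mode)) {     /* directory */
        inode->i_op = &ext2_dir_inode_operations;
        inode->i_fop = &ext2_dir_operations;
    } else {                                 /* symlinks, devices, ...: not modeled */
        inode->i_op = NULL;
        inode->i_fop = NULL;
    }
}
```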
Inode Object
struct inode {
    struct list_head i_dentry;
    unsigned long i_ino;
    atomic_t i_count;
    umode_t i_mode;
    struct inode_operations *i_op;
    struct file_operations *i_fop;
    …
};
struct inode_operations {
    int (*create)(…);
    struct dentry *(*lookup)(…);
    int (*permission)(…);
    int (*getattr)(…);
};
struct file_operations {
    ssize_t (*read)(struct file *, …);
    ssize_t (*write)(struct file *, …);
    int (*ioctl)(…);
    int (*open)(…);
    loff_t (*llseek)(…);
    …
};
inode: i_op, i_ino, …, i_fop
• inode_operations (pointers to functions):
  • .create → solaris_create(), ext2_create() or fat_create()
  • .lookup → solaris_lookup(), ext2_lookup() or fat_lookup()
• file_operations (pointers to functions):
  • .open → solaris_open(), ext2_open() or fat_open()
  • .read → solaris_read(), ext2_read() or fat_read()
Who initializes the superblock? The mount system call.
asmlinkage long sys_mount(…)
    do_mount((char *)dev_page, dir_page, …);
        do_add_mount(&nd, type_page, …);        /* add new file system */
            do_kern_mount(type, flags, name, data);

do_kern_mount(const char *fstype, int flags, const char *name, void *data)
{
    create a new superblock object
    fill the fields of the superblock object
}

struct super_operations {
    struct inode *(*alloc_inode)(struct super_block *sb);
    void (*destroy_inode)(struct inode *);
    void (*read_inode)(struct inode *);
};
struct super_block {
    struct super_operations *s_op;
};
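The two steps inside do_kern_mount() (create a superblock, fill its fields) can be sketched with a hypothetical registry that maps a file system type name to its super_operations table. kern_mount_sketch(), fs_types[] and the toy read_inode methods are illustrative assumptions; the real kernel looks up a struct file_system_type and reads the on-disk superblock, which is not modeled here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct inode { unsigned long i_ino; const char *fs; };
struct super_operations { void (*read_inode)(struct inode *); };
struct super_block { const struct super_operations *s_op; };

/* Toy per-file-system read_inode methods: they only record which one ran. */
static void ext2_read_inode(struct inode *i) { i->fs = "ext2"; }
static void fat_read_inode(struct inode *i) { i->fs = "fat"; }

static const struct super_operations ext2_sops = { .read_inode = ext2_read_inode };
static const struct super_operations fat_sops = { .read_inode = fat_read_inode };

/* Hypothetical registry of known file system types. */
static const struct {
    const char *name;
    const struct super_operations *sops;
} fs_types[] = {
    { "ext2", &ext2_sops },
    { "vfat", &fat_sops },
};

/* Create a superblock for `fstype` and bind its s_op; NULL if the type is unknown. */
struct super_block *kern_mount_sketch(const char *fstype)
{
    for (size_t i = 0; i < sizeof fs_types / sizeof fs_types[0]; i++) {
        if (strcmp(fs_types[i].name, fstype) == 0) {
            struct super_block *sb = malloc(sizeof *sb);   /* create a new superblock object */
            if (sb == NULL)
                return NULL;
            sb->s_op = fs_types[i].sops;                    /* fill its fields */
            return sb;
        }
    }
    return NULL;                                            /* unknown file system type */
}
```

After mounting, every later sb->s_op->read_inode() call lands in the mounted file system's own function.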
Code reading & binding is dynamic
• UNIX: static devswtab[]
• Linux: each file system has a different implementation, and the pathname (given at run time) determines the file system
• i_fop is set when the kernel loads the inode from disk (inode: system-wide)
• f_op ← i_fop when a process opens the file (file: per process)
• open(), read(), write() then go through f_op to the file-system-specific operations struct
[Figure: Linux, Solaris and FAT file systems, each supplying its own operations]
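The second binding step, copying i_fop into f_op at open time, can be sketched as below. open_sketch() and fat_read() are hypothetical names, and the real open path does far more (permission checks, calling f_op->open, installing the fd). The sketch shows the split the slide draws: one system-wide inode, one file object per open, both pointing at the same operation table.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct file;
struct file_operations { int (*read)(struct file *, char *, int); };

/* System-wide: one inode per file, i_fop bound when it was read from disk. */
struct inode { unsigned long i_ino; const struct file_operations *i_fop; };

/* Per open: each open() gets its own file object. */
struct file { struct inode *inode; const struct file_operations *f_op; long long f_pos; };

/* Hypothetical FAT read method (does nothing here). */
static int fat_read(struct file *f, char *buf, int n) { (void)f; (void)buf; (void)n; return 0; }
static const struct file_operations fat_file_operations = { .read = fat_read };

/* Binding performed at open() time: f_op <- i_fop. */
struct file *open_sketch(struct inode *inode)
{
    assert(inode != NULL && inode->i_fop != NULL);
    struct file *f = calloc(1, sizeof *f);
    if (f == NULL)
        return NULL;
    f->inode = inode;
    f->f_op = inode->i_fop;   /* copied from the inode */
    f->f_pos = 0;             /* each open starts at offset 0 */
    return f;
}
```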
You have to know the pathname
(Same sys_read() → vfs_read() → file->f_op->read() code as on the "What is in f_op?" slide.)
• What is stored in f_op? (Which function pointer has been assigned?)
• You have to know the pathname of the file
• You have to know which mounted file system it belongs to
• open() prepares these entries (some are shared): task_struct → file (f_op) → dentry → inode (i_fop), together with the superblock
"I know it is an ext2 file system"
(Same sys_read()/vfs_read() code and the same questions as on the previous slides: what is stored in f_op, what is the pathname of the file, which mounted file system does it belong to.)
• Find the assignments to i_fop in the kernel source, e.g.
    egrep -r 'i_fop[[:space:]]*=' fs/
  which turns up assignments such as i_fop = &coda_file_operations, i_fop = &ext2_file_operations, …