Linux Filesystem. 9561573 葉昌國 2007/4/26. Ext2 and Ext3 Filesystem. Ext2 Disk Data Structures File size limit Extended Attributes of an Inode How Various File Types Use Disk Blocks Ext2 file types Search policy Data Blocks Addressing File holes The Ext3 Filesystem.
Ext2 Disk Data Structures • The first block is reserved for the partition boot sector • The rest is split into block groups • The superblock and the group descriptors are duplicated in each block group • The kernel uses only the copies in block group 0 • Number of block groups ≈ s/(8×b), where s is the partition size in blocks and b is the block size in bytes (Source: Understanding the Linux Kernel, 3rd Ed.)
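The block group estimate can be checked with a short calculation. This is a minimal sketch, assuming that each group's block bitmap fits in a single block (so a group covers at most 8 × b blocks); the function name and the sample partition size are illustrative only.

```python
def block_group_count(partition_bytes, block_size):
    """Estimate the number of Ext2 block groups.

    Each group's block bitmap occupies one block, so a group
    covers at most 8 * block_size blocks.
    """
    total_blocks = partition_bytes // block_size
    blocks_per_group = 8 * block_size
    # Round up: a final, partially filled group still counts.
    return -(-total_blocks // blocks_per_group)


# Example: an 8 GiB partition with 4 KiB blocks
print(block_group_count(8 * 1024**3, 4096))
```

With 4 KiB blocks each group spans 32,768 blocks (128 MiB), which is why larger block sizes lead to fewer, bigger groups.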
Superblock & Group descriptor • Superblock • Stores basic information about the filesystem • Group descriptor • Each block group has its own group descriptor
Inode • All inodes have the same size: 128 bytes • The block number of the first inode table block is stored in the group descriptor’s bg_inode_table field • i_blocks counts the file’s data blocks in units of 512 bytes
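Because every inode is 128 bytes and each group descriptor records where its inode table starts, the on-disk position of an inode follows from simple arithmetic. The sketch below is an illustration under assumptions: `inodes_per_group` and the `inode_table_blocks` list (standing in for each group's bg_inode_table value) are hypothetical inputs, not values read from a real disk.

```python
INODE_SIZE = 128  # bytes, as stated on the slide


def locate_inode(ino, inodes_per_group, block_size, inode_table_blocks):
    """Return (group, byte_offset) of inode number `ino`.

    inode_table_blocks[g] plays the role of bg_inode_table in the
    group descriptor of group g (first block of its inode table).
    """
    if ino < 1:
        raise ValueError("inode numbers start at 1")
    # Inode numbers are 1-based; groups and table slots are 0-based.
    group, index = divmod(ino - 1, inodes_per_group)
    offset = inode_table_blocks[group] * block_size + index * INODE_SIZE
    return group, offset


# Example: two groups whose inode tables start at blocks 5 and 32773
print(locate_inode(8194, 8192, 4096, [5, 32773]))
```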
File size limit • i_size is 32 bits wide, so the maximum file size would be 4 GB • The most significant bit of i_size is not used, so the limit is 2 GB • i_dir_acl is not used for regular files • so it serves as a 32-bit extension of i_size • On 32-bit machines, O_LARGEFILE must be set when opening a large file
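Treating i_dir_acl as the upper half of the size gives a 64-bit value. The following is a small sketch of that combination; the function name is made up for illustration and the inputs are plain integers rather than fields read from a disk inode.

```python
def ext2_file_size(i_size, i_dir_acl):
    """Combine the two 32-bit fields into one 64-bit file size.

    i_size holds the low 32 bits; for regular files i_dir_acl
    is reused for the high 32 bits.
    """
    low = i_size & 0xFFFFFFFF
    high = i_dir_acl & 0xFFFFFFFF
    return (high << 32) | low


# A file of exactly 4 GiB needs the extension field
print(ext2_file_size(0, 1))
```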
Extended Attributes of an Inode • Intended to support access control lists (ACLs) • i_file_acl points to the block containing the extended attributes (Source: Understanding the Linux Kernel, 3rd Ed.)
How Various File Types Use Disk Blocks • Regular file • Needs data blocks only once it has data • Directory • Stores file names and the corresponding inode numbers in data blocks • Symbolic link • If the path name is shorter than 60 characters, it is stored in the i_block field of the inode; otherwise one data block is needed • Device file, pipe, and socket • No data blocks are required for these kinds of files. All the necessary information is stored in the inode
Ext2 file types [table not reproduced] (Source: Understanding the Linux Kernel, 3rd Ed.)
Directory • For efficiency, rec_len is always a multiple of 4; \0 characters are used for padding • Deleting a directory entry just sets its inode field to 0 and increases the previous entry’s rec_len • EXT2_NAME_LEN = 255 (Source: Understanding the Linux Kernel, 3rd Ed.)
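The padding rule is easier to see on a concrete entry. The sketch below assumes the usual on-disk layout of an Ext2 directory entry (4-byte inode number, 2-byte rec_len, 1-byte name length, 1-byte file type, then the name, all little-endian); the helper name and the sample inode numbers are illustrative.

```python
import struct

EXT2_NAME_LEN = 255
EXT2_FT_REG_FILE = 1  # file type value for a regular file
EXT2_FT_DIR = 2       # file type value for a directory


def pack_dir_entry(inode, name, file_type=EXT2_FT_REG_FILE):
    """Build one directory entry with the smallest legal rec_len."""
    raw = name.encode()
    if not 0 < len(raw) <= EXT2_NAME_LEN:
        raise ValueError("bad name length")
    # 8-byte header plus the name, rounded up to a multiple of 4
    rec_len = (8 + len(raw) + 3) & ~3
    header = struct.pack("<IHBB", inode, rec_len, len(raw), file_type)
    # ljust pads the entry with \0 bytes up to rec_len
    return (header + raw).ljust(rec_len, b"\0")


entry = pack_dir_entry(22, "oldfile")
print(len(entry))  # total entry length, a multiple of 4
```

Deleting an entry then amounts to zeroing its inode field and adding its rec_len to the rec_len of the entry before it, so a directory scan simply skips over the freed space.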
Ext2 Memory Data Structures [table not reproduced] (Source: Understanding the Linux Kernel, 3rd Ed.)
Managing Ext2 Disk Space • Space management must make every effort to avoid file fragmentation • Space management must be time-efficient
Creating inodes • ext2_new_inode() creates a new inode • It spreads unrelated directories among different block groups
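Search policy • If the inode is a directory • find_group_orlov() is invoked • A top-level directory (its parent is the filesystem root) goes to a block group whose free inodes and free blocks are both above average • A nested directory is put in its parent’s group if that group • does not contain too many directories • has a sufficient number of free inodes • has a small debt (s_debts: +1 for each new directory, −1 for each new file) • If no suitable group is found, the search starts from the parent’s group and picks the first group with more free inodes than average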
Search policy cont’d • If the inode is not a directory • find_group_other() is invoked • Logarithmic search for a group with a free inode • The search starts from the parent directory’s group and moves progressively farther away • i mod n, (i+1) mod n, (i+1+2) mod n, (i+1+2+4) mod n, and so on • If that fails, an exhaustive linear search is used • read_inode_bitmap() is then invoked to read the group’s inode bitmap and find the first null bit
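The probing order of the logarithmic search can be written out in a few lines. This is a simplified sketch, not the kernel routine: it only checks free-inode counts supplied in a plain list, and both function names are hypothetical.

```python
def probe_order(parent_group, ngroups):
    """Groups visited by the logarithmic phase, in order:
    i, i+1, i+1+2, i+1+2+4, ... (all modulo ngroups)."""
    order = [parent_group % ngroups]
    group, step = parent_group, 1
    while step < ngroups:
        group = (group + step) % ngroups
        order.append(group)
        step *= 2
    return order


def find_group(parent_group, free_inodes):
    """Return a group with a free inode, or -1 if there is none.

    free_inodes[g] is the number of free inodes in group g.
    """
    n = len(free_inodes)
    for g in probe_order(parent_group, n):
        if free_inodes[g] > 0:
            return g
    # Fallback: exhaustive linear search starting after the parent.
    for k in range(1, n + 1):
        g = (parent_group + k) % n
        if free_inodes[g] > 0:
            return g
    return -1


print(probe_order(0, 16))
```

Doubling the step keeps the number of probes small while still preferring groups near the parent directory.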
Data Blocks Addressing • The i_block[EXT2_N_BLOCKS] array in the inode contains 15 elements • The first 12 elements are direct pointers to data blocks • The other three are single, double, and triple indirect pointers • Why use this method? • Data in a small file can be reached with two disk accesses (the inode, then the data block) • A big file may need three or four disk accesses • The dentry, inode, and page caches reduce the number of disk accesses
Data Blocks Addressing cont’d • Each indirect block holds b/4 block numbers, where b is the block size in bytes and each block number is 4 bytes (Source: Understanding the Linux Kernel, 3rd Ed.)
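The b/4 figure determines how far each level of indirection reaches. The sketch below works this out for a given block size; it considers only the addressing scheme (12 direct pointers plus three indirect levels) and ignores other limits such as the width of i_size. Function names are illustrative.

```python
DIRECT = 12  # direct pointers in i_block


def addressing_level(file_block, block_size):
    """Number of indirect blocks between the inode and the given
    file block number (0 = direct, 1 to 3 = levels of indirection)."""
    per_block = block_size // 4  # block numbers per indirect block
    if file_block < DIRECT:
        return 0
    file_block -= DIRECT
    for level in (1, 2, 3):
        span = per_block ** level
        if file_block < span:
            return level
        file_block -= span
    raise ValueError("beyond triple indirection")


def max_addressable_bytes(block_size):
    """Largest file size the 15 i_block entries can address."""
    p = block_size // 4
    return (DIRECT + p + p**2 + p**3) * block_size


# With 1 KiB blocks the scheme reaches roughly 16 GiB
print(max_addressable_bytes(1024))
```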
File holes • $ echo -n "X" | dd of=/tmp/hole bs=1024 seek=6 • /tmp/hole now has 6,145 characters: 6,144 null characters forming the hole, followed by X (Source: Understanding the Linux Kernel, 3rd Ed.)
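The same file can be produced from a program by seeking past the end before writing. This is a minimal sketch that uses a temporary directory instead of /tmp/hole; whether the skipped range actually occupies no disk blocks depends on the underlying filesystem.

```python
import os
import tempfile


def make_hole_file(path, hole_bytes=6 * 1024):
    """Write a single 'X' after seeking past hole_bytes, like
    dd with bs=1024 seek=6; the skipped range is never written."""
    with open(path, "wb") as f:
        f.seek(hole_bytes)
        f.write(b"X")
    return os.stat(path)


with tempfile.TemporaryDirectory() as tmpdir:
    st = make_hole_file(os.path.join(tmpdir, "hole"))
    print(st.st_size)  # logical file size in bytes
```

Reading the file back returns null bytes for the hole, so applications cannot tell a hole from a run of zeros that was written explicitly.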
Allocating a Data Block • The Ext2 filesystem preallocates data blocks • A file gets not only the requested block but also up to eight adjacent blocks • ext2_alloc_block() tries the preallocated blocks first • If the goal or goal+1 is free, it is allocated • Otherwise, if the goal is busy, the preallocated blocks are discarded • and ext2_new_block() is invoked
The Ext3 Filesystem • Goals of enhancing Ext2 into Ext3 • To be a journaling filesystem • To stay compatible with the old Ext2 filesystem • A journaling filesystem reduces the e2fsck scan time after an unexpected power-off or a crash • Each high-level change to the filesystem is performed in two steps • First, a copy of the blocks to be written is stored in the journal • Once that data is committed to the journal, the blocks are written in the filesystem • When the transfer to the filesystem completes, the copies in the journal are discarded
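The two-step update can be modeled with a toy in-memory journal. This is only a sketch of the idea under strong simplifications: "disk" and "journal" are Python dictionaries, the class and method names are invented, and nothing here reflects the actual Ext3 journal format.

```python
class ToyJournal:
    """In-memory model: disk and journal map block numbers to data."""

    def __init__(self):
        self.disk = {}
        self.journal = {}
        self.committed = False

    def write_to_journal(self, blocks):
        # Step 1: copy the blocks to be written into the journal.
        self.journal.update(blocks)

    def commit(self):
        # The journal copy is now complete and may be replayed.
        self.committed = True

    def checkpoint(self):
        # Step 2: write committed blocks in the filesystem, then
        # discard the journal copies. Uncommitted data is dropped.
        if self.committed:
            self.disk.update(self.journal)
        self.journal = {}
        self.committed = False

    def recover(self):
        # After a crash: replay a committed journal, ignore the rest.
        self.checkpoint()


j = ToyJournal()
j.write_to_journal({7: b"new data"})
j.commit()
j.recover()
print(j.disk)
```

A crash before the commit loses the change but leaves the filesystem consistent; a crash after the commit is repaired by replaying the journal, which is why a full e2fsck scan is not needed.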
Three journaling modes • Journal • All filesystem data and metadata changes are logged into the journal • Safest but slowest • Ordered (Ext3 default) • Only changes to filesystem metadata are logged into the journal • Data blocks are written to disk before the related metadata • Writeback • Only changes to filesystem metadata are logged • Fastest
Virtual Filesystem Introduction VFS Support Filesystem classes Common File Model Superblock Object Inode Object File Object Dentry Object Dentry Cache Files Associated with a Process Filesystem Types
Introduction Hanning Gao Polytechnic University Scholarship for Service
Introduction • $ cp /floppy/TEST /tmp/test (Source: Understanding the Linux Kernel, 3rd Ed.)
VFS Support Filesystem classes • Disk-based filesystems • Second Extended Filesystem (Ext2), Ext3 (Linux) • sysv (Unix), UFS (BSD, Solaris) • VFAT (Windows 95), NTFS (Windows NT) • ISO9660 (CD-ROM), UDF (DVD) • HPFS (IBM OS/2), HFS (Apple Macintosh)
VFS Support Filesystem classes • Network filesystems • Allow easy access to files in filesystems that belong to other networked computers • NFS, Coda, AFS, CIFS (Windows), NCP (Novell) • Special filesystems • Do not manage disk space • Provide a simple interface to access the contents of some kernel data structures • /proc in Linux
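Common File Model • Each directory is regarded as a normal file • Some non-Unix filesystems use a File Allocation Table (FAT) to store the position of each file in the directory tree, so their directories are not files; Linux builds the corresponding directory files on the fly as objects in kernel memory • Each operation is reached through a pointer • read() becomes file->f_op->read(…); • Each filesystem has its own read, write, etc. functions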
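The pointer-per-operation idea is a dispatch table: the generic layer calls through f_op without knowing which filesystem is behind it. The sketch below imitates that in Python; the two "filesystems" and all names other than f_op and f_pos are invented for illustration.

```python
class FileOperations:
    """Table of per-filesystem methods (stand-in for f_op)."""

    def __init__(self, read):
        self.read = read


class File:
    def __init__(self, f_op, data=b""):
        self.f_op = f_op
        self.data = data
        self.f_pos = 0  # current file offset


def toy_disk_read(file, count):
    # Disk-like filesystem: return stored bytes from the offset.
    chunk = file.data[file.f_pos:file.f_pos + count]
    file.f_pos += len(chunk)
    return chunk


def toy_special_read(file, count):
    # Special filesystem: content is generated, not stored.
    generated = b"generated on demand\n"
    chunk = generated[file.f_pos:file.f_pos + count]
    file.f_pos += len(chunk)
    return chunk


def sys_read(file, count):
    # Generic layer: dispatch through the table, whatever the type.
    return file.f_op.read(file, count)


f = File(FileOperations(read=toy_disk_read), b"hello world")
print(sys_read(f, 5))
```

Adding a new filesystem then only requires a new table of functions; the generic entry point stays unchanged.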
Common File Model Cont’d • Superblock object • Information concerning a mounted filesystem • Corresponds to a filesystem control block stored on disk • Inode object • Information about a specific file • Identified by its inode number • Corresponds to a file control block stored on disk • File object • Interaction between an open file and a process • Exists only in kernel memory • Dentry object • Links a directory entry with the corresponding file
Common File Model Cont’d [figure not reproduced] (Source: Understanding the Linux Kernel, 3rd Ed.)
Superblock Object • Superblock objects are linked in a circular doubly linked list • All dirty superblocks are periodically copied to disk • A linked list holds the dirty inodes • Filesystem-specific data (e.g. ext2_sb_info) is cached in memory to improve performance
Inode Object • A list links the inodes that have the same hash key • A list links all inodes • i_count = 0 means the inode is unused • Inodes are also included in inode_hashtable to speed up searches • If i_state is I_DIRTY_SYNC, I_DIRTY_DATASYNC, or I_DIRTY_PAGES, the corresponding disk inode must be updated
Inode Object Cont’d • Circular doubly linked lists • inode_unused (i_count = 0): unused inodes • inode_in_use (i_count > 0): in-use inodes • s_dirty (in the superblock): dirty inodes • inode_operations • create, lookup, link, unlink, symlink, mkdir, • rmdir, mknod, etc.
File Object • f_op: pointer to the file operation table • f_count: the file object's reference counter • f_pos: current file offset (file pointer), where the next operation will take place
File Object Cont’d • Circular doubly linked lists • free_list: unused • anon_list: in use but not assigned to a superblock • s_files: in use and assigned to a superblock • file_operations • read, write, open, mmap, etc. • No dirty field (file objects exist only in memory)
Dentry Object Cont’d • The kernel creates a dentry object for every component of a pathname • Dentry objects have no corresponding image on disk, so the dentry struct has no dirty flag • Created with kmem_cache_alloc() • Destroyed with kmem_cache_free() • Four states • free: contains no valid information • unused: d_count = 0, d_inode still points to the associated inode • in use: d_count > 0, d_inode points to the associated inode • negative: d_inode = NULL
Dentry Cache • Keeps in memory the dentry objects that are no longer in use but might be needed later • A set of dentry objects in the in-use, unused, or negative state • A hash table to quickly derive the dentry object associated with a given filename and a given directory • Also acts as a controller for the inode cache • Unused dentries • are kept in a doubly linked "Least Recently Used" list • In-use dentries • are inserted into the i_dentry list of the corresponding inode object (an inode can have several hard links)
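The combination of a hash table and an LRU list of unused entries can be sketched as follows. This is a toy model with invented names: keys are (directory, filename) pairs, a stored inode of None stands for a negative dentry, and eviction simply drops the oldest unused entry once a small limit is exceeded.

```python
from collections import OrderedDict


class ToyDentryCache:
    """Hash table plus LRU list of unused entries (simplified)."""

    def __init__(self, max_unused=2):
        self.table = {}           # (parent, name) -> [inode, d_count]
        self.lru = OrderedDict()  # keys of unused entries, oldest first
        self.max_unused = max_unused

    def get(self, parent, name, inode=None):
        """Look up an entry, creating it on a miss; marks it in use."""
        key = (parent, name)
        entry = self.table.setdefault(key, [inode, 0])
        entry[1] += 1
        self.lru.pop(key, None)  # in-use entries leave the LRU list
        return entry[0]

    def put(self, parent, name):
        """Release an entry; unused entries go to the LRU tail."""
        key = (parent, name)
        entry = self.table[key]
        entry[1] -= 1
        if entry[1] == 0:
            self.lru[key] = True
            while len(self.lru) > self.max_unused:
                oldest, _ = self.lru.popitem(last=False)
                del self.table[oldest]


cache = ToyDentryCache()
cache.get("/", "tmp", inode=10)
cache.put("/", "tmp")
print(cache.get("/", "tmp"))  # served from the cache
```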
Files Associated with a Process • Each process’s current working directory and root directory are stored in an fs_struct • (include/linux/fs_struct.h) • count: number of processes sharing this table • umask: file permissions
Files Associated with a Process Cont’d • The files currently opened by a process are described by a files_struct, whose address is stored in the files field of the process descriptor • (include/linux/file.h)
Files Associated with a Process Cont’d • max_fds defaults to 32 • Maximum number of files one process can open • NR_OPEN = 1048576 • fget() is invoked when the kernel starts using a file object • fput() is invoked when the kernel finishes using a file object (Source: Understanding the Linux Kernel, 3rd Ed.)
Filesystem Types • Filesystem Type Registration • register_filesystem • unregister_filesystem • get_fs_type
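The three registration functions manage a list of known filesystem types keyed by name. The sketch below mirrors that behavior in Python; it reuses the kernel function names for readability, but the class, the list, and the error codes returned here are assumptions of this toy model, not the kernel implementation.

```python
import errno


class FileSystemType:
    def __init__(self, name):
        self.name = name


_file_systems = []  # registered filesystem types


def register_filesystem(fs):
    """Add a type; refuse a second type with the same name."""
    if any(t.name == fs.name for t in _file_systems):
        return -errno.EBUSY
    _file_systems.append(fs)
    return 0


def unregister_filesystem(fs):
    """Remove a previously registered type."""
    if fs not in _file_systems:
        return -errno.EINVAL
    _file_systems.remove(fs)
    return 0


def get_fs_type(name):
    """Find a registered type by name, or None."""
    for t in _file_systems:
        if t.name == name:
            return t
    return None
```

A mount request would start with a get_fs_type() lookup on the type name, which is why a filesystem must be registered before it can be mounted.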
Filesystem mount Mount Generic Filesystem Mount root Filesystem Mounting the rootfs filesystem Mounting the real root filesystem Unmounting a Filesystem Pathname Lookup Filesystem system calls
A Mounted Root Filesystem Li-Shien Chen’s filesystem slide
Filesystem Mounting • vfsmount (include/linux/mount.h) • Mount flags: MNT_NOSUID, MNT_NODEV, MNT_NOEXEC
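Mount Generic Filesystem • sys_mount() calls do_mount() • do_mount() calls path_lookup() to find the mount point, then do_new_mount() • do_new_mount() calls do_kern_mount(), which calls vfs_kern_mount()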
Mount root Filesystem • Step 1 • The kernel mounts the special rootfs filesystem, which simply provides an empty directory that serves as the initial mount point • Step 2 • The kernel mounts the real root filesystem over the empty directory • Why does the kernel bother to mount the rootfs filesystem before the real one? • The rootfs filesystem allows the kernel to easily change the real root filesystem