Linux Filesystems: The Ext2 and Ext3 Filesystems 9662815 林承諺 9662822 關啟邦
Outline • General characteristics of Ext2 filesystem • Ext2 disk data structures • Ext2 memory data structures • Ext2 methods • Managing Ext2 disk space • Creating inodes • Data block addressing • Allocating data blocks • The Ext3 filesystem • References
General Characteristics of Ext2 (1/3) • Ext2: the second extended filesystem • One of several filesystems supported by Linux • Ext2, Ext3, Ext4, FAT, Minix, UDF, … • History of Ext filesystems • Minix -> Ext FS -> Ext2 (1994) -> Ext3 -> Ext4 • Why do we report on Ext2 rather than the others? • The only filesystem introduced in the textbook • Used on almost every Linux system • Lots of good practices • Fast performance
General Characteristics of Ext2 (2/3) • Features that contribute to efficiency • Choose the optimal block size when creating the filesystem, depending on the expected average file length • Larger blocks: fewer disk transfers but more internal fragmentation • Choose how many inodes to allow for a partition of a given size when creating the filesystem, depending on the number of files to be stored • Maximize the effectively usable disk space (file/data block ratio) • Partition disk blocks into groups (block groups) • Each group includes data blocks and inodes in adjacent tracks • Reduce disk seek time for accessing files • Preallocate disk data blocks (block preallocation) • Store data in adjacent blocks to reduce file fragmentation • Fast symbolic links are supported • Symbolic links with short pathnames can be stored in the inode • Can be translated without reading data blocks
General Characteristics of Ext2 (3/3) • Features being considered for inclusion • Block fragmentation • Allow several small files to be stored in different fragments of the same block • Handling of transparently compressed and encrypted files • Allow users to transparently store compressed and/or encrypted versions of their files on disk • Logical deletion • Easy recovery of removed files (like the Recycle Bin in Windows) • Journaling • Avoid the time-consuming check when a filesystem is abruptly unmounted • Ext3 adopts journaling
Ext2 Disk Data Structures (1/3) • Ext2 partitions are formed by blocks • The first block of any Ext2 partition is reserved for the partition boot sector (not managed by Ext2) • The rest is split into block groups (BG) • Block groups of an Ext2 partition • All the BGs have the same size and are stored sequentially • The location of a BG can be derived from its integer index • The group concept reduces file fragmentation by keeping the data blocks of a file in the same group if possible
Ext2 Disk Data Structures (2/3) • Blocks within a block group • Each block in a block group contains one of the following kinds of information • Superblock (SB): filesystem info. (system-wide) • Group descriptors (GD): group info. (group-wide) • Data block bitmap: indicates which blocks are free or used • Inode bitmap: indicates which inodes are free or used • Inode table: table of inodes that store file info. • Data block: used to store data • Both SB and GD are duplicated in each block group • 1 block for the SB & n blocks for the GDs of the whole partition • Only the SB & GDs in block group 0 are used by the kernel; the copies in the other groups serve consistency checks and recovery (e2fsck)
Ext2 Disk Data Structures (3/3) • How many groups? • Depends on partition size & block size • Main constraint: • For efficiency (space & time) Ext2 keeps the block bitmap in a single block • In each block group, there can be at most 8×b blocks • b: block size in bytes; s: partition size in blocks • Total number of block groups ≈ s/(8×b), rounded up • For example, • A 32-GB Ext2 filesystem with 4-KB block size • A 4-KB block bitmap describes 32K data blocks (32K × 4 KB = 128 MB) • At most 32K data blocks in a block group • 32 GB / 128 MB = 256 block groups • At least 256 block groups for this partition
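The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch of the formula s/(8×b); the function name `block_group_count` is made up for illustration, and it assumes every group is as large as a one-block bitmap allows.

```python
def block_group_count(partition_bytes: int, block_size: int) -> int:
    """Minimum number of block groups when each group's bitmap fits in one block."""
    total_blocks = partition_bytes // block_size   # s: partition size in blocks
    blocks_per_group = 8 * block_size              # one bitmap block holds 8*b bits
    # Ceiling division: a partial last group still counts as a group.
    return -(-total_blocks // blocks_per_group)


# The 32-GB / 4-KB example from the slide.
print(block_group_count(32 * 2**30, 4096))
```

A real filesystem may be created with smaller groups than this maximum, which is why the slide says "at least".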
Ext2 Disk Data Structures (1/5) • Superblock • Fields stored in the ext2_super_block structure (ext2_fs.h) • Important fields: (fields start with “s_”) • s_inodes_count, s_blocks_count, … // block & inode counts • s_blocks_per_group, s_frags_per_group, … // frag vs. block • s_mnt_count, s_max_mnt_count, s_state, … // mounting info.
Ext2 Disk Data Structures (2/5) • Group descriptor & bitmap • Each block group has its own GD stored in an ext2_group_desc structure (ext2_fs.h) • Important fields: (fields start with “bg_”) • bg_free_blocks_count, bg_free_inodes_count, bg_used_dirs_count, … • Used when allocating new inodes and data blocks to determine the most suitable block group
Ext2 Disk Data Structures (3/5) • Inode table • Consists of a series of consecutive blocks that store inodes • Each inode is stored in an ext2_inode structure (ext2_fs.h) • All inodes have the same size: 128 bytes • A 1024-byte block contains 8 inodes • Important fields: (fields start with “i_”) • i_size, i_blocks, … // file size in bytes & blocks • i_block // pointer array to data blocks
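Because all inodes have the same size, the position of an inode follows from its number by integer arithmetic. The sketch below assumes 128-byte inodes as stated on the slide; `locate_inode` and the `inodes_per_group` value used in the call are hypothetical and chosen only for illustration.

```python
INODE_SIZE = 128  # bytes, as stated on the slide


def locate_inode(ino: int, inodes_per_group: int, block_size: int):
    """Return (block group, block index inside that group's inode table, byte offset)."""
    index = ino - 1                          # inode numbers start at 1
    group = index // inodes_per_group        # which block group holds the inode
    slot = index % inodes_per_group          # position inside that group's table
    inodes_per_block = block_size // INODE_SIZE
    return group, slot // inodes_per_block, (slot % inodes_per_block) * INODE_SIZE


# With 1024-byte blocks, 8 inodes share one table block.
print(locate_inode(12, inodes_per_group=2048, block_size=1024))
```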
Ext2 Disk Data Structures (4/5) • How various file types use disk blocks • Ext2 file types: 8 type values (0 = unknown, plus the 7 types below) • Regular file, Directory • Character device, Block device • Named pipe, Socket, Symbolic link • Different types of files recognized by Ext2 use data blocks in different ways • Storage requirements for each type • Regular file: • Needs data blocks only when it starts to have data • Directory: • Uses data blocks to store filenames and the inode numbers of files • Symbolic link: • If the pathname is <= 60 characters, no data block is needed • Device file, named pipe, and socket: • All necessary information is stored in the inode
Ext2 Disk Data Structures (5/5) • Directory in Ext2 • A kind of file whose data blocks store filenames with their inode numbers • ext2_dir_entry_2 structure (ext2_fs.h) • Important fields: • inode: inode number • rec_len: length of this dir entry; added to the entry's offset, it acts as a pointer to the next valid dir entry • The length of a dir entry is a multiple of 4 bytes for efficiency • Delete a dir entry • Set inode to 0 • Increase rec_len of the previous valid entry so that it skips the deleted one
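The role of rec_len is easier to see on a small in-memory directory block. The sketch below assumes the usual on-disk layout of ext2_dir_entry_2 (32-bit inode, 16-bit rec_len, 8-bit name_len, 8-bit file_type, then the name, little-endian); the helper names `make_entry`, `list_entries`, and `delete_entry` are invented for this example and are not kernel functions.

```python
import struct

HEADER = struct.Struct("<IHBB")  # inode, rec_len, name_len, file_type


def make_entry(inode, name, file_type, rec_len=None):
    """Build one directory entry, padded to a multiple of 4 bytes."""
    raw = name.encode()
    min_len = (HEADER.size + len(raw) + 3) & ~3
    rec_len = rec_len or min_len
    body = raw.ljust(rec_len - HEADER.size, b"\0")
    return HEADER.pack(inode, rec_len, len(raw), file_type) + body


def list_entries(block):
    """Walk the block by following rec_len; entries with inode 0 are skipped."""
    found, offset = [], 0
    while offset < len(block):
        inode, rec_len, name_len, _ = HEADER.unpack_from(block, offset)
        if rec_len < HEADER.size:
            raise ValueError("corrupt rec_len")
        if inode != 0:
            start = offset + HEADER.size
            found.append((inode, block[start:start + name_len].decode()))
        offset += rec_len
    return found


def delete_entry(block, name):
    """Set inode to 0 and grow the previous entry's rec_len over the hole."""
    buf, offset, prev = bytearray(block), 0, None
    while offset < len(buf):
        inode, rec_len, name_len, _ = HEADER.unpack_from(buf, offset)
        start = offset + HEADER.size
        if inode != 0 and bytes(buf[start:start + name_len]).decode() == name:
            struct.pack_into("<I", buf, offset, 0)
            if prev is not None:
                prev_len = struct.unpack_from("<H", buf, prev + 4)[0]
                struct.pack_into("<H", buf, prev + 4, prev_len + rec_len)
            return bytes(buf)
        prev, offset = offset, offset + rec_len
    raise FileNotFoundError(name)


block = (make_entry(2, ".", 2) + make_entry(2, "..", 2)
         + make_entry(12, "a.txt", 1) + make_entry(13, "b", 1))
print(list_entries(delete_entry(block, "a.txt")))
```

Note that deletion does not move any bytes: the block keeps its size and the freed space simply becomes padding of the preceding entry.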
Ext2 Memory Data Structures • How often some data structures change: • Decrease s_free_inodes_count & bg_free_inodes_count when creating files (fields in SB & GD) • Modify s_free_blocks_count & bg_free_blocks_count when appending data to an existing file • To avoid many subsequent disk read operations • Most data structures are copied into RAM when the FS is mounted • Three caching modes • Dynamic: data is kept in a cache as long as the associated object is in use
Ext2 Methods • Many of the VFS methods have a corresponding Ext2 implementation • Ext2 superblock operations • read_inode, write_inode, … • ext2_sops array of pointers (super.c) • Ext2 inode operations • Inode operations depend on the type of the file to which the inode refers • ext2_file_inode_operations (file.c), ext2_dir_inode_operations (namei.c), … • Ext2 file operations • Most of them are implemented by generic functions • ext2_file_operations table (file.c)
Managing Ext2 Disk Space (1/4) • How Ext2 allocates and deallocates inodes and data blocks • Two main problems: • Fragmentation • Small pieces of files located in non-adjacent blocks • Time-efficiency • Quickly derive the logical block number from a file offset • Limit the number of accesses to addressing tables • Important operations • Creating inodes: ext2_new_inode() • Deleting inodes: ext2_free_inode() • Data block addressing • Derive the logical block number of the corresponding data block from a file offset “f” inside a file (2 steps) • Allocating a data block: ext2_alloc_block() • Releasing a data block: ext2_free_blocks()
Managing Ext2 Disk Space (2/4) • Creating inodes • ext2_new_inode() (ialloc.c) • Create an Ext2 disk inode, returning the address of the corresponding VFS inode object • Two parameters of ext2_new_inode(): • dir: VFS inode object of the directory into which the new inode is going to be inserted • mode: the file type of the new inode • Carefully select the BG that will contain the new inode by • Spreading unrelated directories among different groups • Putting files into the same group as their parent directory • Debt parameter for every block group • Balances the number of regular files and directories in a BG
ext2_new_inode() (ialloc.c) (1/3) • Actions: • Invoke new_inode() to allocate a new VFS inode object • Initialize the related VFS super_block and Ext2 data structures (in RAM & on disk) • If the new inode is a directory • Find a suitable BG for the directory with Orlov’s algorithm • If the new inode is not a directory • Find a suitable BG having a free inode with a logarithmic search or, failing that, an exhaustive linear search
Algorithms to Find a Block Group • Orlov’s algorithm to find a BG for a directory • Directories that have the root as parent should be spread among all block groups (unrelated directories) • Nested directories (whose parent is not the root) should be put in the group of the parent if it satisfies the following • The group doesn’t contain too many directories • The group has sufficient free inodes • The group has a small debt • debt increases when a directory is added; debt decreases when a file of another type is added • If no good group has been found, search linearly starting from the block group that includes the parent • Logarithmic search algorithm • Search log(n) block groups, where n is the total number of block groups, jumping further ahead each time until it finds an available block group • If the search starts from block group i: • i mod n, (i+1) mod n, (i+1+2) mod n, (i+1+2+4) mod n, … • Exhaustive linear search • Start from the block group that includes the parent directory dir
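The probe order of the logarithmic search, followed by the linear fallback, can be sketched as below. This is a simplified model and not the kernel code: `probe_sequence` and `find_group` are hypothetical names, the only availability test is "has a free inode", and the linear scan starts at the parent's group as the slide describes.

```python
def probe_sequence(start: int, ngroups: int):
    """Groups visited by the logarithmic search: i, i+1, i+1+2, i+1+2+4, ... (mod n)."""
    sequence = [start % ngroups]
    group, step = start, 1
    while step < ngroups:
        group = (group + step) % ngroups
        sequence.append(group)
        step *= 2                      # jump twice as far each time
    return sequence


def find_group(parent_group: int, free_inodes):
    """Return a group with a free inode, or None if the filesystem is full."""
    n = len(free_inodes)
    for group in probe_sequence(parent_group, n):       # logarithmic search
        if free_inodes[group] > 0:
            return group
    for k in range(n):                                   # exhaustive linear search
        group = (parent_group + k) % n
        if free_inodes[group] > 0:
            return group
    return None


print(probe_sequence(3, 16))
```

With 16 groups the first phase looks at only five of them, which is why the exhaustive scan is still needed as a last resort.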
ext2_new_inode() (ialloc.c) (2/3) • Actions: • Invoke read_inode_bitmap() to get the inode bitmap of the selected BG • Search for the first null bit (first free inode) • Allocate the disk inode • Set the bit & mark the buffer that contains the inode bitmap as dirty • Invoke sync_dirty_buffer() if the filesystem is mounted with MS_SYNCHRONOUS
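Searching for the first null bit and setting it can be modelled on a plain byte array. This is an illustrative sketch only: `allocate_first_free` is a made-up name, and it assumes bit 0 of byte 0 describes the first inode of the group.

```python
def allocate_first_free(bitmap: bytearray):
    """Find the first 0 bit, set it to 1, and return its index (None if full)."""
    for byte_index, byte in enumerate(bitmap):
        if byte == 0xFF:
            continue                       # all eight entries in use, skip quickly
        for bit in range(8):
            if not byte & (1 << bit):
                bitmap[byte_index] |= 1 << bit   # mark as allocated
                return byte_index * 8 + bit
    return None


bitmap = bytearray([0xFF, 0b00000111, 0x00])
print(allocate_first_free(bitmap))
```

The returned value is an index inside the group; turning it into an inode number would additionally need the group index and the number of inodes per group.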
ext2_new_inode() (ialloc.c) (3/3) • Actions: • Set up BG & SB variables • Decrease bg_free_inodes_count • If the new inode is a directory, increase bg_used_dirs_count • Adjust the group's entry in the s_debts array (increase for a directory, decrease for other file types) • Decrease s_freeinodes_counter of ext2_sb_info • If the new inode is a directory, increase s_dirs_counter • Set s_dirt of the SB to 1 and mark the buffer dirty • Set s_dirt of the VFS SB object to 1 • Initialize the fields of the inode data structure • struct ext2_inode_info *ei; • Finished
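The per-group bookkeeping listed above can be summarized in a small model. The `Group` class and `account_new_inode` function are hypothetical stand-ins for the group descriptor and the s_debts entry; the cap of 255 on the debt is an assumption made here (one byte per group), and the filesystem-wide counters are left out.

```python
from dataclasses import dataclass


@dataclass
class Group:
    free_inodes: int      # stands in for bg_free_inodes_count
    used_dirs: int = 0    # stands in for bg_used_dirs_count
    debt: int = 0         # stands in for the group's s_debts entry


def account_new_inode(group: Group, is_dir: bool) -> None:
    """Update the group counters after one inode has been allocated in it."""
    group.free_inodes -= 1
    if is_dir:
        group.used_dirs += 1
        group.debt = min(group.debt + 1, 255)   # directories raise the debt
    else:
        group.debt = max(group.debt - 1, 0)     # other file types pay it back


g = Group(free_inodes=10)
account_new_inode(g, is_dir=True)
account_new_inode(g, is_dir=False)
print(g)
```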
Managing Ext2 Disk Space (3/4) • Data block addressing • Each nonempty regular file consists of a group of data blocks • Such blocks may be referred to by either • Their relative position inside the file: the file block number • Their position inside the disk partition: the logical block number • Derive the “logical block number” of a data block from a file offset “f” in two steps • Derive the “file block number” (index of the block within the file) from the offset f • Translate the “file block number” to the “logical block number” through the i_block[] table
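The first of the two steps is a plain integer division. A minimal sketch, with `split_offset` as a hypothetical helper name:

```python
def split_offset(f: int, block_size: int):
    """Return (file block number, byte position inside that block) for offset f."""
    return f // block_size, f % block_size


# Byte 5000 of a file on a filesystem with 1024-byte blocks.
print(split_offset(5000, 1024))
```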
Data block addressing • Four different types for the 15 components of the i_block[EXT2_N_BLOCKS] array (pointer array to blocks) • 0-11: • logical block numbers of data blocks (direct addressing) • 12: • logical block number of an indirect block • The indirect block contains logical block numbers of data blocks • 13: • double indirection • 14: • triple indirection
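Which of the four addressing types applies to a given file block number depends only on the block size. The sketch below assumes 4-byte logical block numbers, so an indirect block of b bytes holds b/4 entries; `addressing_path` is a hypothetical name, and it returns indexes only, without reading any disk block.

```python
DIRECT_SLOTS = 12  # i_block[0..11] hold direct block numbers


def addressing_path(file_block: int, block_size: int):
    """Return (kind, indexes): the i_block slot, then the index at each indirection level."""
    if file_block < 0:
        raise ValueError("negative file block number")
    per_block = block_size // 4          # block numbers stored in one indirect block
    if file_block < DIRECT_SLOTS:
        return "direct", [file_block]
    file_block -= DIRECT_SLOTS
    if file_block < per_block:
        return "indirect", [12, file_block]
    file_block -= per_block
    if file_block < per_block ** 2:
        return "double", [13, file_block // per_block, file_block % per_block]
    file_block -= per_block ** 2
    if file_block < per_block ** 3:
        return "triple", [14,
                          file_block // per_block ** 2,
                          (file_block // per_block) % per_block,
                          file_block % per_block]
    raise ValueError("file block number beyond triple indirection")


print(addressing_path(300, 1024))
```

Small files stay within the first branch and need no extra disk access, which is the time-efficiency goal stated earlier in the deck.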
Managing Ext2 Disk Space (4/4) • Allocating a data block (1/2) • ext2_get_block() & ext2_get_blocks() (inode.c) • Locate a block holding data for a regular file • If the block does not exist, automatically allocate it to the file with ext2_alloc_block() • Find the preferred position of the new block (goal) with ext2_find_goal() and pass it to ext2_alloc_block() • The preferred position (goal) favors sequential allocation of blocks • ext2_alloc_blocks() (inode.c) • Try to allocate a group of up to eight adjacent blocks (preallocation) • i_prealloc_count & i_prealloc_block fields in ext2_inode_info • Invoke ext2_new_block() to search for a free block inside the Ext2 partition • If necessary, also allocate the blocks used for indirect addressing
Managing Ext2 Disk Space (4/4) • Allocating a data block (2/2) • … • ext2_new_block() & ext2_new_blocks() (balloc.c) • Search for a free block inside the Ext2 partition with 3 strategies • If the preferred block (goal) is free • Allocate the block • If the goal is not free • Check whether one of the next blocks after the preferred block is free • If no free block is found in the near vicinity of the preferred block • Consider all block groups (starting from the one that includes the goal)
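The three strategies can be tried in order on a toy bitmap. This is a sketch under simplifying assumptions, not the balloc.c logic: the partition is a flat list of booleans, `new_block` is a hypothetical name, and the `window` parameter (how many following blocks count as "near") is invented for the example.

```python
def new_block(used, blocks_per_group: int, goal: int, window: int = 8):
    """Pick a free block near goal, mark it used, and return it (None if none is free)."""
    total = len(used)
    pick = None
    if not used[goal]:
        pick = goal                                    # strategy 1: the goal itself
    else:
        group_end = min((goal // blocks_per_group + 1) * blocks_per_group, total)
        near = range(goal + 1, min(goal + 1 + window, group_end))
        pick = next((b for b in near if not used[b]), None)   # strategy 2: just after goal
        if pick is None:
            ngroups = -(-total // blocks_per_group)
            first = goal // blocks_per_group
            for k in range(ngroups):                   # strategy 3: every group in turn
                g = (first + k) % ngroups
                lo, hi = g * blocks_per_group, min((g + 1) * blocks_per_group, total)
                pick = next((b for b in range(lo, hi) if not used[b]), None)
                if pick is not None:
                    break
    if pick is not None:
        used[pick] = True
    return pick


used = [False] * 32
print(new_block(used, 8, goal=10), new_block(used, 8, goal=10))
```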
The Ext3 Filesystem (1/2) • Ext3 has been designed with two simple concepts: • To be a journaling filesystem • To be compatible with the old Ext2 filesystem • Journaling filesystems • Updates to filesystem blocks might be kept in dynamic memory for long periods of time before being flushed to disk • Dramatic events like a power failure can leave the filesystem in an inconsistent state • A consistency check is run before mounting if the filesystem has not been properly unmounted (exhaustive and time-consuming) • Inconsistency is due to the loss of data structures stored in memory • Was it properly unmounted? s_mount_state field in Ext2 • Checking time depends mainly on the number of files & directories to be examined • A journaling filesystem avoids this problem by looking instead in a special disk area, named the “journal”, that contains the most recent disk write operations
The Ext3 Filesystem (2/2) • The Ext3 journaling filesystem • Perform any high-level change to the filesystem in three steps • A copy of the blocks to be written is stored in the journal • When the I/O data transfer to the journal is completed (commit), the blocks are written to the filesystem • When the I/O data transfer to the filesystem terminates, the copies in the journal are discarded • While recovering after a system failure, the e2fsck program distinguishes the following two cases: • System failure occurred before a commit to the journal (ignore the incomplete copies) • System failure occurred after a commit to the journal (recover by writing the copies to the filesystem) • Three different journaling modes • Journal: • All data and metadata changes are logged into the journal • Ordered: • Only metadata changes are logged; data blocks are written to disk before the related metadata • Writeback: • Only metadata changes are logged
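The two recovery cases can be illustrated with a toy model in which the disk and the journal are plain Python objects. Nothing here reflects the real journal format or e2fsck internals; the `recover` function and the `"blocks"` / `"committed"` keys are assumptions made for this sketch.

```python
def recover(disk: dict, journal: list) -> dict:
    """Replay committed transactions onto the disk and drop everything else."""
    for txn in journal:
        if txn["committed"]:
            disk.update(txn["blocks"])   # failure after commit: copy blocks to the filesystem
        # failure before commit: the partial copy is simply ignored
    journal.clear()                      # journal copies are discarded after recovery
    return disk


disk = {1: "old-1", 2: "old-2"}
journal = [
    {"blocks": {1: "new-1"}, "committed": True},
    {"blocks": {2: "new-2"}, "committed": False},
]
print(recover(disk, journal))
```

Either way the filesystem ends up consistent: block 1 takes its new contents, while block 2 keeps the old ones as if the second change had never started.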
References • Operating System Concepts: • Chapter 11: File-System Interface • Chapter 12: File-System Implementation • Understanding the Linux Kernel: • Chapter 12: The Virtual Filesystem • Chapter 18: The Ext2 and Ext3 Filesystems • Websites: • Orlov block allocator: http://en.wikipedia.org/wiki/Orlov_block_allocator • Linux source code: Linux-2.6.28.7