File System
Long-term Information Storage • Must store large amounts of data • Information stored must survive the termination of the process using it • Multiple processes must be able to access the information concurrently
File Structure • Three kinds of files • byte sequence • record sequence • tree
File Types (a) An executable file (b) An archive
File Access • Sequential access • read all bytes/records from the beginning • cannot jump around, but can rewind or back up • convenient when the medium was magnetic tape • Random access • bytes/records read in any order • essential for database systems • a read can be … • move the file marker (seek), then read, or … • read and then move the file marker
Memory-Mapped Files • map() and unmap() • map a file onto a portion of the address space • read() and write() are replaced with memory operations • implementation • page tables map the file like ordinary pages • same sharing/protection as pages • issues • interaction between the file system and VM when two processes access the same file via different methods
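As a concrete illustration of the mapping idea, here is a minimal C sketch using the POSIX mmap()/munmap() calls (the slide's map()/unmap()); the line-counting loop and the thin error handling are just for brevity.

```c
/* Minimal sketch of memory-mapped file access on a POSIX system:
 * the file's bytes are reached through ordinary loads instead of read(). */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    if (argc != 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) { close(fd); return 1; }

    /* Map the whole file read-only; the pages are backed by the file itself. */
    char *p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    size_t newlines = 0;
    for (off_t i = 0; i < st.st_size; i++)   /* memory access, not read() */
        if (p[i] == '\n')
            newlines++;
    printf("%zu newlines\n", newlines);

    munmap(p, st.st_size);
    close(fd);
    return 0;
}
```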
File System Implementation A possible file system layout
Implementing Files (1) (a) Contiguous allocation of disk space for 7 files (b) State of the disk after files D and E have been removed
Implementing Files (2) Storing a file as a linked list of disk blocks
Implementing Files (3) Linked list allocation using a file allocation table in RAM
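A toy sketch of the idea, assuming a made-up in-memory FAT and "disk" (the entry width, end-of-chain marker, and block size are illustrative, not the real MS-DOS format): the chain for one file is followed entirely in RAM, so no disk reads are needed just to find the next block.

```c
/* Toy illustration of linked-list allocation via a file allocation table in RAM. */
#include <stdint.h>
#include <stdio.h>

#define NBLOCKS 16
#define FAT_EOF 0xFFFFu          /* hypothetical end-of-chain marker */

static uint16_t fat[NBLOCKS];    /* fat[b] = next block of the file, or FAT_EOF */
static char disk[NBLOCKS][8];    /* pretend each disk block holds 8 bytes */

int main(void)
{
    /* File "A" occupies blocks 4 -> 7 -> 2, chained through the FAT. */
    fat[4] = 7;  fat[7] = 2;  fat[2] = FAT_EOF;
    snprintf(disk[4], 8, "hel");
    snprintf(disk[7], 8, "lo ");
    snprintf(disk[2], 8, "fat");

    /* Random access still means walking the chain from the start,
     * but the chain itself lives entirely in memory. */
    for (uint16_t b = 4; b != FAT_EOF; b = fat[b])
        printf("%s", disk[b]);
    printf("\n");
    return 0;
}
```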
Implementing Files (4) An example i-node
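A sketch of what such an i-node might look like as a C struct, with the classic layout of direct entries plus indirect blocks; the field names, counts, and block size here are illustrative rather than any particular system's on-disk format.

```c
/* Sketch of an i-node with direct and indirect block addresses. */
#include <stdint.h>
#include <stdio.h>

#define NDIRECT 10
#define BLOCK_SIZE 4096
#define PTRS_PER_BLOCK (BLOCK_SIZE / sizeof(uint32_t))

struct inode {
    uint16_t mode;                /* file type and protection bits              */
    uint16_t nlink;               /* number of directory entries pointing here  */
    uint32_t uid, gid;
    uint64_t size;                /* file size in bytes                         */
    uint64_t atime, mtime, ctime; /* timestamps                                 */
    uint32_t direct[NDIRECT];     /* first NDIRECT block addresses              */
    uint32_t single_indirect;     /* block full of block addresses              */
    uint32_t double_indirect;     /* block of pointers to indirect blocks       */
};

int main(void)
{
    uint64_t max = (uint64_t)NDIRECT * BLOCK_SIZE
                 + (uint64_t)PTRS_PER_BLOCK * BLOCK_SIZE
                 + (uint64_t)PTRS_PER_BLOCK * PTRS_PER_BLOCK * BLOCK_SIZE;
    printf("i-node is %zu bytes; max file size with this layout: %llu bytes\n",
           sizeof(struct inode), (unsigned long long)max);
    return 0;
}
```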
Implementing Directories (1) (a) A simple directory: fixed-size entries, with disk addresses and attributes held in the directory entry (b) A directory in which each entry just refers to an i-node
Implementing Directories (2) • Two ways of handling long file names in directory • (a) In-line • (b) In a heap
Shared Files (1) File system containing a shared file
Shared Files (2) (a) Situation prior to linking (b) After the link is created (c) After the original owner removes the file
Disk Space Management (1) • Dark line (left-hand scale) gives the data rate of a disk • Dotted line (right-hand scale) gives disk space efficiency • All files are 2 KB; the horizontal axis is the block size
Disk Space Management (2) (a) Storing the free list on a linked list (b) A bit map
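A minimal sketch of the bit-map alternative in C, one bit per block with 1 meaning free; the block count and bit ordering are arbitrary choices for illustration.

```c
/* Sketch of free-space tracking with a bit map: one bit per disk block. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NBLOCKS 1024

static uint8_t bitmap[NBLOCKS / 8];

static int  block_is_free(int b) { return (bitmap[b / 8] >> (b % 8)) & 1; }
static void mark_used(int b)     { bitmap[b / 8] &= ~(1u << (b % 8)); }
static void mark_free(int b)     { bitmap[b / 8] |=  (1u << (b % 8)); }

/* Allocate the first free block, or return -1 if the disk is full. */
static int alloc_block(void)
{
    for (int b = 0; b < NBLOCKS; b++)
        if (block_is_free(b)) { mark_used(b); return b; }
    return -1;
}

int main(void)
{
    memset(bitmap, 0xFF, sizeof(bitmap));   /* start with every block free */
    int a = alloc_block(), b = alloc_block();
    printf("allocated blocks %d and %d\n", a, b);
    mark_free(a);                           /* freeing flips a single bit back */
    printf("block %d free again? %s\n", a, block_is_free(a) ? "yes" : "no");
    return 0;
}
```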
Disk Space Management (3) (a) Almost-full block of pointers to free disk blocks in RAM - three blocks of pointers on disk (b) Result of freeing a 3-block file (c) Alternative strategy for handling 3 free blocks - shaded entries are pointers to free disk blocks
Disk Space Management (4) Quotas for keeping track of each user’s disk use
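A sketch of the per-user quota record the figure implies, written as a C struct; the soft/hard split is the usual convention, but the field names and exact layout here are illustrative.

```c
/* Illustrative per-user quota record: soft limits trigger warnings,
 * hard limits are never allowed to be exceeded. */
struct quota_record {
    long soft_block_limit;     /* warn when block usage passes this        */
    long hard_block_limit;     /* never allow block usage past this        */
    long blocks_in_use;        /* current number of blocks charged to user */
    int  block_warnings_left;  /* warnings remaining before lockout        */

    long soft_file_limit;
    long hard_file_limit;
    long files_in_use;
    int  file_warnings_left;
};
```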
fsck - blocks • File system states: (a) consistent (b) missing block (block 2 in the figure): add it to the free list (c) block duplicated in the free list (block 4): can only happen with a linked free list, not a bitmap; rebuild the free list (d) data block present in two files (block 5): deleting either file would put a block that is still in use onto the free list; solution: copy the data to a new block so each file gets its own copy
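The block check can be sketched with two counters per block, one counting appearances in files and one counting appearances on the free list; the toy scan results below are hypothetical and stand in for actually walking the i-nodes and the free list.

```c
/* Sketch of the fsck block-consistency check with two counters per block. */
#include <stdio.h>

#define NBLOCKS 8

static int in_use[NBLOCKS];   /* how many files claim each block            */
static int in_free[NBLOCKS];  /* how many times each block is on free list  */

int main(void)
{
    /* Hypothetical scan results for an 8-block disk: block 2 is missing,
     * block 4 is twice on the free list, block 5 belongs to two files. */
    int used_by_files[NBLOCKS] = {1, 0, 0, 1, 0, 2, 0, 1};
    int on_free_list[NBLOCKS]  = {0, 1, 0, 0, 2, 0, 1, 0};
    for (int b = 0; b < NBLOCKS; b++) {
        in_use[b]  = used_by_files[b];
        in_free[b] = on_free_list[b];
    }

    for (int b = 0; b < NBLOCKS; b++) {
        if (in_use[b] == 0 && in_free[b] == 0)
            printf("block %d missing: add it to the free list\n", b);
        else if (in_free[b] > 1)
            printf("block %d duplicated in free list: rebuild the free list\n", b);
        else if (in_use[b] > 1)
            printf("block %d in several files: copy it to a fresh block\n", b);
    }
    return 0;
}
```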
fsck - files • examines the directory system • keeps a counter per file (i-node), incremented for every directory entry that names it • a hard link makes one file appear in multiple directories • compares each counted value with the link count stored in the i-node • if the i-node's link count is higher: the i-node will never be deleted even when no directory references it; fix the stored count • if the directory count is higher: the i-node could be freed while a directory entry still points to it; fix the stored count
Buffer Cache The block cache data structures
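A sketch of the usual data structures behind such a cache, assuming a hash table keyed by (device, block number) for fast lookup plus an LRU list threaded through the same buffers for replacement; the names, sizes, and hash function are illustrative.

```c
/* Sketch of block-cache data structures: hash chains for lookup,
 * an LRU list (via lru_prev/lru_next) for choosing a victim. */
#include <stddef.h>
#include <stdint.h>

#define NHASH 64
#define BLOCK_SIZE 4096

struct buf {
    uint32_t dev, blockno;            /* which disk block the buffer holds      */
    int dirty;                        /* must be written back before reuse      */
    struct buf *hash_next;            /* chain of buffers in the same bucket    */
    struct buf *lru_prev, *lru_next;  /* doubly linked LRU list for replacement */
    uint8_t data[BLOCK_SIZE];
};

static struct buf *hash_table[NHASH]; /* (dev, blockno) -> buffer */

struct buf *cache_lookup(uint32_t dev, uint32_t blockno)
{
    for (struct buf *b = hash_table[(dev ^ blockno) % NHASH]; b != NULL; b = b->hash_next)
        if (b->dev == dev && b->blockno == blockno)
            return b;   /* hit: the caller also moves it to the front of the LRU list */
    return NULL;        /* miss: evict the buffer at the LRU tail and read from disk  */
}
```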
File System Performance (2) • Two ways of placing i-nodes to reduce seek time: • (a) i-nodes placed at the start of the disk • (b) disk divided into cylinder groups, each with its own blocks and i-nodes
The MS-DOS File System (1) The MS-DOS directory entry
The MS-DOS File System (2) • Maximum partition size for different block sizes • The empty boxes represent forbidden combinations
The Windows 98 File System (1) • The extended MS-DOS directory entry used in Windows 98 • the 32-bit block address is split across two fields of the entry • with long names, a file has two names, e.g. My Document → MYDOCU~1
The Windows 98 File System (2) An example of how a long name is stored in Windows 98
The UNIX V7 File System (1) A UNIX V7 directory entry
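The V7 entry is simple enough to show directly: a 2-byte i-node number followed by a 14-character name, 16 bytes in all. The struct below is a sketch of that layout (the d_ino/d_name field names follow the usual convention).

```c
/* The classic UNIX V7 directory entry: 16 bytes per entry. */
#include <stdint.h>
#include <stdio.h>

struct v7_dirent {
    uint16_t d_ino;        /* i-node number; 0 means the slot is unused            */
    char     d_name[14];   /* file name, padded with '\0', not always terminated   */
};

int main(void)
{
    printf("entry size: %zu bytes\n", sizeof(struct v7_dirent));  /* 16 */
    return 0;
}
```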
The UNIX V7 File System (2) A UNIX i-node
The UNIX V7 File System (3) The steps in looking up /usr/ast/mbox
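A toy sketch of that lookup in C: starting from the root i-node, each path component is resolved in the directory found so far. The in-memory directory table and the i-node numbers used here (6 for /usr, 26 for /usr/ast, 60 for mbox) are illustrative stand-ins for reading real directory blocks.

```c
/* Sketch of looking up /usr/ast/mbox one component at a time. */
#include <stdio.h>
#include <string.h>

#define ROOT_INO 1

struct dent { int dir_ino; const char *name; int ino; };

/* Toy directory contents: (parent i-node, name) -> i-node. */
static struct dent dirs[] = {
    { 1, "usr", 6 }, { 6, "ast", 26 }, { 26, "mbox", 60 },
};

static int dir_lookup(int dir_ino, const char *name)
{
    for (size_t i = 0; i < sizeof(dirs) / sizeof(dirs[0]); i++)
        if (dirs[i].dir_ino == dir_ino && strcmp(dirs[i].name, name) == 0)
            return dirs[i].ino;
    return -1;                                       /* component not found */
}

static int namei(const char *path)
{
    char buf[256];
    strncpy(buf, path, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';

    int ino = ROOT_INO;                              /* lookup starts at the root */
    for (char *c = strtok(buf, "/"); c && ino >= 0; c = strtok(NULL, "/"))
        ino = dir_lookup(ino, c);                    /* usr -> ast -> mbox */
    return ino;
}

int main(void)
{
    printf("/usr/ast/mbox -> i-node %d\n", namei("/usr/ast/mbox"));
    return 0;
}
```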
DEMOS (Cray-1) • in the normal case, contiguous allocation • flexibility for non-contiguous allocation • the file header holds a table of (base, size) entries (10 of them) • each entry describes a block group (a group of blocks) that is contiguous on disk
DEMOS (2) • if a file needs more than 10 block groups, a flag is set in the file header: BIGFILE (max 10 GB) • each of the 10 block groups then contains pointers to block groups • pros & cons • + easy to find free block groups (small bitmap) • + free areas merge automatically • - when the disk comes close to full: • no long runs of blocks (fragmentation) • CPU overhead to find a free block • so keep some of the disk in reserve • experience suggests about 10%
Transactions in File System • reliability from unreliable components • concepts • atomicity : all or nothing • durability : once it happens, it is there • serializability : transactions appear to happen one by one
Transactions in File System (2) • Motivation • file systems have lots of data structures • bitmap for free blocks • directory • file header • indirect blocks • data blocks • for performance reasons, all must be cached • read requests are easy • what about writes?
Transactions in File System • Writes to the cache • write-through: the cache does not help write performance • write-back: data can be lost on a crash • a single file operation usually involves multiple updates • what happens if a crash occurs between updates? • e.g. 1: move a file between directories • delete the file from the old directory • add the file to the new directory • e.g. 2: create a new file • allocate space on disk for the header and data • write the new header to disk • add the new file to the directory
Transactions in File System • Unix approach (ad hoc) • meta-data consistency • synchronous write-through • multiple updates are done in a specific order • after a crash, the fsck program fixes up anything that was in progress • file created but not yet in a directory => delete the file • blocks allocated but not marked in the bitmap => update the bitmap • user data consistency • written back to disk every 30 seconds, or on user request • no guarantee that blocks reach the disk in any particular order • no support for transactions • a user may want multiple file operations done as a unit
Transactions in File System • Write-ahead logging • almost all file systems designed since 1985 use write-ahead logging • Windows NT, Solaris, OSF/1, etc. • mechanism • operation: • write all changes in a transaction to the log • send the file changes to disk • reclaim the log space • on a crash, read the log: • if the log record isn't complete, apply no changes • if the log record is completely written, apply all its changes to the disk • if the log is empty, don't worry: all updates have already reached the disk
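A minimal sketch of that ordering for a single logged change, assuming ordinary POSIX files for the log and the data area; the record format and block size are invented for illustration. The essential point is that fsync() makes the log record durable before the in-place write starts.

```c
/* Write-ahead logging sketch: log first, then data, then reclaim. */
#define _XOPEN_SOURCE 700
#include <sys/types.h>
#include <unistd.h>

struct log_record {
    long txn_id;              /* which transaction this record belongs to */
    long block_no;            /* which data block the change applies to   */
    char new_contents[512];   /* after-image of the block                 */
};

/* Returns 0 on success.  The key ordering: the log record reaches
 * stable storage before the in-place data write is allowed to start. */
int apply_change(int log_fd, int data_fd, const struct log_record *rec)
{
    if (write(log_fd, rec, sizeof(*rec)) != (ssize_t)sizeof(*rec)) return -1;
    if (fsync(log_fd) < 0) return -1;            /* 1. log entry is durable first  */

    off_t off = (off_t)rec->block_no * (off_t)sizeof(rec->new_contents);
    if (pwrite(data_fd, rec->new_contents,
               sizeof(rec->new_contents), off) < 0) return -1;
    if (fsync(data_fd) < 0) return -1;           /* 2. the data itself reaches disk */

    /* 3. the log space for this record may now be reclaimed (not shown). */
    return 0;
}
```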
Log-Structured File Systems • Idea • write data once • the log is the only copy of the data • as you modify disk blocks, append them to the log • put everything (data blocks, file headers, etc.) in the log • Data fetch • if data must be read from disk, it is read from the log • keep a map in memory • tells you where everything is • the map itself is also written to the log for crash recovery
Log-Structured File Systems • Advantages • all writes are sequential!! • no seeks, except for reads, and those can be handled by the cache • the cache keeps getting bigger • in the extreme case, disk I/O is only for writes, which are sequential • same problem as contiguous allocation • many files are deleted within the first 5 minutes • need garbage collection • if the disk fills up, problem!! • keep the disk under-utilized
Log-Structured File Systems • Mechanism • issues for implementing the log • how to retrieve information from the log • how to keep enough free space for the log • cache file changes, then write them to disk sequentially in a single operation • fast writes • information retrieval • i-node map, reachable from a fixed checkpoint region • it holds the locations of the i-nodes contained in each write • most of it is cached in memory • fast reads
Log Examples: FFS vs. LFS • in FFS, each i-node is at a fixed location on disk • an index into the i-node set is sufficient to find it • in LFS, a map is needed to locate an i-node, since i-nodes are mixed with data and directories in the log • (Figure: FFS with i-nodes in a fixed area followed by data and directory blocks, versus the LFS log interleaving data blocks, i-nodes, directories, and the i-node map.)
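A toy sketch of the i-node map just described: an array from i-node number to the i-node's current address in the log, updated whenever the i-node is rewritten at the log head. The sizes and offset values are illustrative.

```c
/* Sketch of the LFS i-node map: i-node number -> location in the log. */
#include <stdint.h>
#include <stdio.h>

#define MAX_INODES 1024
#define UNUSED_ADDR UINT64_MAX

static uint64_t inode_map[MAX_INODES];   /* i-node number -> byte offset in the log */

/* When an i-node is rewritten at the log head, only its map entry changes. */
static void imap_update(uint32_t ino, uint64_t log_addr) { inode_map[ino] = log_addr; }

/* A read first consults the map, then fetches the i-node from the log. */
static uint64_t imap_lookup(uint32_t ino) { return inode_map[ino]; }

int main(void)
{
    for (int i = 0; i < MAX_INODES; i++) inode_map[i] = UNUSED_ADDR;

    imap_update(26, 4096);     /* i-node 26 written at log offset 4096   */
    imap_update(26, 81920);    /* rewritten later, further down the log  */
    printf("i-node 26 now lives at log offset %llu\n",
           (unsigned long long)imap_lookup(26));
    return 0;
}
```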
Log-Structured File Systems • Space management • holes are left by deleted files • threading • use the dispersed holes like a linked list • fragmentation will get worse • copying • copy live data out of the log to leave a large hole • expensive, especially for long-lived files
Segments in LFS • Concept • clean segments are linked (threading) • segments with holes may be copied into a clean segment • collect long-lived files into the same segment • Cleaning policy • when? a low watermark on the number of clean segments • how many segments? up to a high watermark • which segments? the most fragmented ones • how to group files? • files in the same directory • age sort: sort by last modification time
Recovery • checkpoints and roll-forward (NOT roll-back!!) • possible since all file operations are in the log • checkpoint • a position in the log at which all file system structures are consistent and complete • contains • addresses of the i-node map blocks • segment usage table • current time • checkpoint region • contains the checkpoint • placed at a fixed location on disk
Recovery (2) • operation • 1. write out all modified information to disk • 2. write out the checkpoint region • on a crash: • roll forward through the operations logged after the last checkpoint • if the crash occurs while writing a checkpoint: • keep the old checkpoint • hence two checkpoint regions are needed, written alternately
Roll-Forward • recover as much information as possible • from the segment summary blocks: • a new i-node found: there must be data blocks before it, so just update the i-node map • data blocks without an i-node: ignore them, since we don't know whether the data blocks are complete • each i-node has a count of how many directory entries refer to it • the count may be updated while the directory is not yet written, or • the directory may be written while the count is not yet updated • solution: a special write-ahead log of directory operations
Informed Prefetching • Prefetching • memory prefetching (into cache memory) • disk prefetching (into the memory buffer) • disk latency is larger by orders of magnitude • Pros & cons of prefetching • reduces latency when the prefetched data is accessed • file cache space is wasted if the prefetched data is never used • difficult to know when the prefetched data will be used • interference with other cached data and with virtual memory is difficult to understand • Assumptions • disk parallelism is under-utilized • applications provide hints
Limits of RAID • RAID increases disk throughput when the workload can be processed in parallel • very large accesses • multiple concurrent accesses • Many real I/O workloads are not parallel • get a byte from a file • think • get another byte from (the same or another) file • such a workload touches only a single disk at a time
Real I/O Workload • Recent trends • faster CPUs generate I/O requests more often • programs favor larger data objects • the file cache hit ratio matters more than before • Most workloads are dominated by reads • writes can be done in the background, in parallel (as in Linux) • processes block on reads • most access patterns are predictable • let's use that predictability as "hints"
Overview • The application discloses its future resource requirements • the system makes the final decisions, not the application • Disclosing hints are issued through ioctl() • file specifier • file name or file descriptor • pattern specifier • sequential • list of <offset, length> pairs • What to do with the disclosing hints • parallelize the I/O requests for RAID • keep the hinted data in the cache • schedule the disk to reduce seek time
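The ioctl-based interface above is specific to the informed-prefetching research systems, so no exact call is shown here; as a point of comparison, POSIX exposes a similar "disclose your access pattern" idea through posix_fadvise(), sketched below.

```c
/* Sketch of access-pattern disclosure with the standard posix_fadvise() call. */
#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    if (argc != 2) return 1;
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    /* Whole-file sequential hint: the kernel may prefetch more aggressively. */
    posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    /* <offset, length> style hint: this range will be needed soon. */
    posix_fadvise(fd, 4096, 65536, POSIX_FADV_WILLNEED);

    /* ... the application then reads as usual; the hints only advise. */
    close(fd);
    return 0;
}
```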