Lecture 07: Loading File into Memory
Topics: DMA, buffer replacement, LRU
Buffer
[Figure: CPU, memory, buffer, DMA, disk sector]
• each buffer holds one disk block (sector)
• kernel has N buffers, shared by all
• OS needs information about each buffer
  • user: Clinton, Bob, ... (who is using this buffer now)
  • hw: device, sector number
  • state: free/used (empty/waiting/reading/full/locked/writing/dirty)
• "buffer header" (struct)
  • stores all information about each buffer
  • points to the actual buffer
• buffer header has link fields (doubly linked)
  • device_list, free_buffer_list, I/O_wait_list
"Buffer Cache"
[Figure: CPU, memory, buffer, DMA, disk sector]
• Managed like a CPU cache
  • read ahead (reada)
  • delayed write (dwrite)
• dwrite
  • on update, just set the "dirty bit" in the buffer cache
  • write to disk later (when the buffer is being replaced)
• reada
  • prefetch the next block if the offset moves sequentially
• dirty: the data came from disk and the memory copy was modified later, so the disk copy and the memory copy now differ
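The delayed-write idea above can be sketched in a few lines of C. This is only an illustration under stated assumptions: `struct cbuf`, `buf_load`, `buf_update`, `buf_release`, the in-memory `disk` array, and the tiny block size are all hypothetical names invented for the sketch, not part of any real kernel.

```c
#include <assert.h>
#include <string.h>

#define BLOCK_SIZE 16           /* tiny block size, for illustration only */
#define NBLOCKS    4

/* Hypothetical in-memory "disk" so the sketch is self-contained. */
static char disk[NBLOCKS][BLOCK_SIZE];
static int  disk_writes;        /* counts simulated disk writes */

struct cbuf {                   /* hypothetical, simplified buffer header */
    int  blkno;                 /* which disk block this buffer caches */
    int  dirty;                 /* 1 if the memory copy differs from disk */
    char data[BLOCK_SIZE];
};

/* Load a block from disk; both copies match, so the buffer is clean. */
void buf_load(struct cbuf *b, int blkno)
{
    assert(blkno >= 0 && blkno < NBLOCKS);
    b->blkno = blkno;
    b->dirty = 0;
    memcpy(b->data, disk[blkno], BLOCK_SIZE);
}

/* Delayed write: change only the memory copy and set the dirty bit. */
void buf_update(struct cbuf *b, const char *src, size_t n)
{
    assert(n <= BLOCK_SIZE);
    memcpy(b->data, src, n);
    b->dirty = 1;
}

/* Called when the buffer is about to be replaced: flush only if dirty. */
void buf_release(struct cbuf *b)
{
    if (b->dirty) {
        memcpy(disk[b->blkno], b->data, BLOCK_SIZE);
        disk_writes++;
        b->dirty = 0;
    }
}
```

Two calls to `buf_update` followed by one `buf_release` cost a single simulated disk write, which is the traffic saving that delayed write is meant to provide.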
Delayed Write: Pros & Cons
[Figure: CPU, memory, buffer, DMA, disk sector]
• Good performance
  • much disk traffic can be saved
• Complex reliability
  • logically a single piece of information
  • physically many copies (disk, buffer), so inconsistency is possible
  • If the system crashes ...
[Figure: power versus time t, marking (1) problem detected & interrupt and (2) computer full stop]
Emergency action is taken during the period between (1) and (2).
How many disk blocks can you save during this interval?
Crash ...
• Only a few blocks can be saved
• What happens if they cannot be saved? If lost, the following goes wrong:
  • superblock: which block is free/occupied?
  • inode: pointer to file data block
  • data block
    • if a directory: subtree structure
    • if a regular file: just the file content
• metadata are more important
  • superblock, directory, inode
[Figure: disk layout showing superblock, root directory, inodes, data blocks, holes, and occupied blocks]
Damage: what if this block becomes a bad block?
Crash ...
• In a program: the sync(2) system call
  • sync(2) flushes (writes to disk) dirty buffers
  • on return, disk I/O is not finished (the writes are only queued)
  • so call sync(2) twice: the 2nd return guarantees the flush
• At the keyboard
  • the update daemon calls sync(2) every 30 seconds (periodic)
  • halt(8), shutdown(8) call sync(2) (run by the super user)
  • try man 8 intro …. (before logoff)
• Caution
  • Do not power down without sync(2) or halt(8)
  • Otherwise powering down amounts to a crash. What if the system crashes?
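A minimal sketch of the "sync twice" advice from this slide, in C. `sync()` is the real POSIX call declared in `<unistd.h>`; the wrapper `flush_dirty_buffers` is a hypothetical helper added only for illustration. Whether the second return really guarantees completion depends on the system, so treat this as the slide's rule of thumb rather than a portable guarantee.

```c
#define _XOPEN_SOURCE 500       /* expose sync() under strict compiler modes */
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper: issue sync(2) `times` times, return how many were issued. */
int flush_dirty_buffers(int times)
{
    assert(times > 0);
    int issued = 0;
    for (int i = 0; i < times; i++) {
        sync();                 /* ask the kernel to write out dirty buffers */
        issued++;
    }
    return issued;
}
```

A caller following the slide would use `flush_dirty_buffers(2)` before halting the machine.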
fsck(8)
• file system check: check & repair the file system
• performed at system bootup time
• start from the root inode: mark all occupied blocks
• start from the superblock: mark all free blocks
• something is wrong if:
  • some block has no incoming arc (unreachable)
  • some block has many incoming arcs (reached many times)
• lost+found
• Very time-consuming: with 10 ms per block and 1 GB / 1 KB ≈ 1 mega blocks, 10 ms × 1 mega = 10 mega ms = 10,000 sec !!!
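The two marking passes and the "incoming arc" test can be sketched as a reference count per block. This is a toy model, not how any real fsck is implemented: the 8-block disk, `check_blocks`, and `enum blk_status` are hypothetical, and the two input arrays stand in for "blocks reached from the root inode" and "blocks on the superblock's free list".

```c
#include <assert.h>
#include <stddef.h>

#define NBLK 8                  /* hypothetical tiny disk of 8 blocks */

enum blk_status { BLK_OK, BLK_UNREACHABLE, BLK_DUPLICATE };

/*
 * used[]  : block numbers reached by walking from the root inode
 * freeb[] : block numbers found on the superblock's free list
 * status[]: receives the verdict for every block
 * Returns the number of inconsistent blocks.
 */
int check_blocks(const int *used, size_t nused,
                 const int *freeb, size_t nfree,
                 enum blk_status status[NBLK])
{
    int refs[NBLK] = {0};       /* incoming arcs per block */
    int bad = 0;

    for (size_t i = 0; i < nused; i++) {
        assert(used[i] >= 0 && used[i] < NBLK);
        refs[used[i]]++;
    }
    for (size_t i = 0; i < nfree; i++) {
        assert(freeb[i] >= 0 && freeb[i] < NBLK);
        refs[freeb[i]]++;
    }
    for (int b = 0; b < NBLK; b++) {
        if (refs[b] == 0)
            status[b] = BLK_UNREACHABLE;    /* no incoming arc */
        else if (refs[b] > 1)
            status[b] = BLK_DUPLICATE;      /* reached many times */
        else
            status[b] = BLK_OK;
        if (status[b] != BLK_OK)
            bad++;
    }
    return bad;
}
```

Every block must be touched once, which is why the running time grows with the number of blocks on the disk.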
Design Goal
• The original UNIX file system design was
  • cheap, with good performance
  • adequately reliable for a school or software house
• on a power fault (power interruption)
  • at most 30 seconds' worth of work is gone
  • the most important metadata are saved
• timesharing market (school, software house)
• UNIX for a bank?
  • Need to solve these problems
[Figure: timeline with a flush every 30 sec; on power down, some contents since the last flush are lost]
Modern systems
• System V
  • To reduce boot time (minimize downtime)
  • On successful return from sync(2), create the /fastboot file
  • if /fastboot exists, the system was shut down cleanly (don't fsck)
  • After a successful boot, remove the /fastboot file
  • If /fastboot doesn't exist, do fsck (only for the file systems in /etc/fstab)
• Log Structured File System
  • collect dirty nodes in one big segment (~track size)
  • periodically write this log to disk
  • fast: no seek/rotational delay
  • recovery is fast & complete
[Figure: memory, buffer, DMA, disk sector]
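The /fastboot protocol amounts to a marker file with three operations. The sketch below assumes hypothetical helper names (`need_fsck`, `mark_clean_shutdown`, `clear_marker`) and takes the marker path as a parameter so it can be tried on any writable path; only `access`, `fopen`, `fclose`, and `remove` are real library calls.

```c
#define _XOPEN_SOURCE 500
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Boot time: returns 1 if fsck should run, 0 if the marker shows a clean shutdown. */
int need_fsck(const char *marker)
{
    assert(marker != NULL);
    return access(marker, F_OK) != 0;
}

/* Shutdown, after sync(2) has returned: create the marker. Returns 0 on success. */
int mark_clean_shutdown(const char *marker)
{
    FILE *f = fopen(marker, "w");
    if (f == NULL)
        return -1;
    return fclose(f) == 0 ? 0 : -1;
}

/* After a successful boot: remove the marker so a later crash is detected. */
int clear_marker(const char *marker)
{
    return remove(marker);
}
```

Because the marker is removed right after boot, a crash at any later point leaves no marker behind and the next boot runs fsck.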
Issues
[Figure: "remove b": a directory with entries a, b, dev, bin and i-numbers 7, 9, 11, 45; the inode of b with pointers[ ] to its data blocks]
• Transactional guarantee
  • Write all, or no write at all
  • "Account A → Account B (transfer $100)"
  • Atomic transaction
  • Write both or cancel both
• Ordering guarantee
  • "Delete file A"
    1. Modify the parent directory's data block (file name A)
    2. Release file A's inode (addresses of data block sectors, …)
    3. Release file A's data block
  • Suggested order: (3 2 1)
  • Otherwise, A's inode exists, pointer exists, wrong data …
  • Write the next block to disk only if the previous write is complete (synchronous write)
** Reference: Vahalia, 11.7.2
Some buffers are linked to the free buffer pool
[Figure: free buffers 22 23 88 83 14 25 45 32 74 37 11 19]
Some buffers are allocated to a device
[Figure: Disk 3 holds buffers 18 11 43 23 33 15 44 54 64 97 10 99]
Allocate buffers to whom?
[Figure: Process 1, buffer cache, user, inode, offset, dev, CPU; labels "Linux" and "UNIX"]
Buffer header has a flag
[Figure: Disk 3 holds buffers 18 11 43 23 33 15 44 54 64 97 10 99]
• Among the buffers allocated to a device ...
  • some will do DMA (waiting): the I/O wait queue, within the device list
  • some are currently doing DMA
  • others have done DMA
Some buffers are waiting for disk I/O
[Figure: Disk 3; buffers 18 11 43 23 have done DMA; buffers 33 15 44 54 are on the I/O wait queue, waiting to do DMA]
    struct buf {
        int         b_flags;    /* see defines below */
        struct buf *b_forw;     /* headed by devtab of b_dev */
        struct buf *b_back;     /*  "  */
        struct buf *av_forw;    /* position on free list, */
        struct buf *av_back;    /*     if not BUSY */
        int         b_dev;      /* major+minor device name */
        char       *b_blkno;    /* block # on device */
        int         b_wcount;   /* transfer count (usu. words) */
        char        b_error;    /* returned after I/O */
        char       *b_addr;     /* low order core address */
        char       *b_xmem;     /* high order core address */
    } buf[NBUF];

    struct buf bfreelist;
    struct devtab {
        char        d_active;   /* busy flag */
        char        d_errcnt;   /* error count (for recovery) */
        struct buf *b_forw;     /* first buffer for this dev */
        struct buf *b_back;     /* last buffer for this dev */
        struct buf *d_actf;     /* head of I/O queue */
        struct buf *d_actl;     /* tail of I/O queue */
    };

[Figure: struct devtab (d_active, b_forw, b_back, d_actf, d_actl); b_forw/b_back head the device list 18 11 43 23 33 15 44 54 64 97 10 99; d_actf/d_actl head the I/O waiting buffers]
Remember: the OS kernel is a plain C program with variables and functions
[Figure: Process 1, Process 2, Process 3 above the kernel; one PCB per process; hardware: CPU, mem, disk, tty]
Kernel Data Structure
[Figure: Process 1 with user, inode, offset; devswtab, devtab, buffer cache, disk_read( ); on disk: superblock, inode, data; directory tree / with bin, etc, cc, date, sh, getty, passwd]
Each buffer header has 4 link fields
• a buf can belong to two doubly linked lists at a time
• read(fd) system call
  • get offset
  • get inode
    • check access permission (rwx rwx rwx)
    • mapping: offset → sector address
    • get major/minor device number
  • search buffer cache (buffer header has disk & sector #)
    • start from the device table, traverse the links
    • compare each buffer with the sector address
    • if already in the buffer cache, done
    • on a miss, arrange to read from disk
[Figure: user, file, inode, dev, fd, offset]
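The "start from the device table, traverse the links" step can be sketched as a walk over the device list. The structs below keep only the fields the search needs and use a NULL-terminated list and an `int` block number for simplicity; these simplifications, and the helper names `dev_insert` and `incore_lookup`, are assumptions made for the sketch.

```c
#include <assert.h>
#include <stddef.h>

struct buf {                    /* simplified: only the fields the search needs */
    struct buf *b_forw;         /* next buffer on the same device list */
    struct buf *b_back;         /* previous buffer on the same device list */
    int         b_dev;          /* device number */
    int         b_blkno;        /* block (sector) number, int for simplicity */
};

struct devtab {                 /* simplified device table entry */
    struct buf *b_forw;         /* first buffer for this device */
    struct buf *b_back;         /* last buffer for this device */
};

/* Link bp at the head of the device list (NULL-terminated in this sketch). */
void dev_insert(struct devtab *dp, struct buf *bp)
{
    assert(dp != NULL && bp != NULL);
    bp->b_back = NULL;
    bp->b_forw = dp->b_forw;
    if (dp->b_forw != NULL)
        dp->b_forw->b_back = bp;
    else
        dp->b_back = bp;
    dp->b_forw = bp;
}

/* Hypothetical helper: return the cached buffer for (dev, blkno), or NULL on a miss. */
struct buf *incore_lookup(const struct devtab *dp, int dev, int blkno)
{
    for (struct buf *bp = dp->b_forw; bp != NULL; bp = bp->b_forw)
        if (bp->b_dev == dev && bp->b_blkno == blkno)
            return bp;          /* hit: block is already in the buffer cache */
    return NULL;                /* miss: caller must arrange a disk read */
}
```

A NULL return corresponds to the "if miss, then arrange to read from disk" branch of the slide.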
read() system call
    fd → offset → inode → device → search buffer list
    if (hit) then
        done                            /* return data from buffer cache */
    else                                /* buffer cache miss: must read disk */
        if (free buf available) then    /* using this free buffer, read disk */
            get buf; read disk; fill buf; done
        else                            /* need replacement first */
            get the least recently used (LRU) buffer
            if (dirty) then write old content first     /* delayed write */
            read disk; fill buf; done
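The hit / free buffer / LRU replacement flow above can be tried out with a toy cache. Everything here is an assumption made for illustration: the two-buffer cache, the in-memory `disk` array, the logical clock used to track recency, and the names `cache_read` and `cache_update`. A real kernel keeps the free list in LRU order instead of scanning timestamps.

```c
#include <assert.h>
#include <string.h>

#define NBUF  2                 /* tiny cache so replacement happens quickly */
#define NBLK  8
#define BSIZE 8

static char disk[NBLK][BSIZE];  /* hypothetical in-memory "disk" */
static int  disk_reads, disk_writes;

struct cbuf {
    int  valid;                 /* 0 = free buffer */
    int  blkno;
    int  dirty;
    unsigned long last_used;    /* logical clock value at last access */
    char data[BSIZE];
};

static struct cbuf cache[NBUF];
static unsigned long clock_ticks;

/* Return the buffer holding blkno, reading it from disk on a miss. */
struct cbuf *cache_read(int blkno)
{
    assert(blkno >= 0 && blkno < NBLK);
    struct cbuf *victim = &cache[0];

    for (int i = 0; i < NBUF; i++) {
        struct cbuf *b = &cache[i];
        if (b->valid && b->blkno == blkno) {        /* hit */
            b->last_used = ++clock_ticks;
            return b;
        }
        /* Prefer a free buffer; otherwise track the least recently used one. */
        if (!b->valid) {
            if (victim->valid)
                victim = b;
        } else if (victim->valid && b->last_used < victim->last_used) {
            victim = b;
        }
    }
    if (victim->valid && victim->dirty) {           /* delayed write happens now */
        memcpy(disk[victim->blkno], victim->data, BSIZE);
        disk_writes++;
    }
    memcpy(victim->data, disk[blkno], BSIZE);       /* read disk, fill buffer */
    disk_reads++;
    victim->valid = 1;
    victim->blkno = blkno;
    victim->dirty = 0;
    victim->last_used = ++clock_ticks;
    return victim;
}

/* Update a block in memory only; the disk write is delayed until replacement. */
void cache_update(int blkno, const char *src, size_t n)
{
    assert(n <= BSIZE);
    struct cbuf *b = cache_read(blkno);
    memcpy(b->data, src, n);
    b->dirty = 1;
}
```

With two buffers, reading blocks 1 and 2, updating block 2, touching block 1 again, and then reading block 3 evicts block 2 and triggers its delayed write.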
Mounting
• A system can have many file systems
• Compare with Windows {C: D: E: ...}

(Logically) At bootup time, specify which file system to boot as the "root file system"
[Figure: three file systems FS 1, FS 2, FS 3, each laid out as boot block, superblock, inode list, data blocks]

(Logically) "root file system"
[Figure: FS 1 is the root file system with tree / → bin, etc, usr; files date, sh, getty, passwd; devices dsk1, dsk2, dsk3 refer to FS 1, FS 2, FS 3; Windows equivalent: C: D: E:]
• Now all files under the root file system can be accessed
• But how do we access files in other file systems?

(Logically) Mount it!
[Figure: FS 3 (/dev/dsk3), with its own tree / → bin, include, src and files banner, yacc, stdio.h, uts, is attached to the root file system tree of FS 1]
System call: mount (path1, path2, option)
• path1: device special file, e.g. /dev/dsk3 (which)
• path2: mount point, e.g. /usr (where)
• option: e.g. read-only (how)
• After mounting, /dev/dsk3 is accessed as /usr
[Figure: root tree / → etc, usr, bin with date, sh, getty, passwd (i-numbers in disk 1, with its root and superblock); under /usr: bin, include, src with banner, yacc, stdio.h, uts (i-numbers in disk 2, with its root and superblock)]
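Mount Table Entry
• Purpose:
  • resolve pathname
  • locate superblock
• Fields: inode of the mount point (/usr), inode of the mounted root, superblock, device number
[Figure: root tree / → usr, etc, bin with date, sh, getty, passwd; mounted tree under /usr: bin, include, src with banner, yacc, stdio.h, uts]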
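How a mount table entry helps resolve a pathname can be sketched as a lookup that swaps the mount-point inode for the root inode of the mounted file system. The entry layout, the table size, and the helpers `mount_add` and `cross_mount_point` are hypothetical simplifications; inodes are represented only by (device, i-number) pairs.

```c
#include <assert.h>
#include <stddef.h>

#define NMOUNT 4

struct mount_entry {            /* hypothetical, simplified mount table entry */
    int in_use;
    int dev;                    /* device number of the mounted file system */
    int mounted_on_dev;         /* device holding the mount-point inode */
    int mounted_on_ino;         /* i-number of the mount point (e.g. /usr) */
    int root_ino;               /* i-number of the mounted file system's root */
};

static struct mount_entry mount_table[NMOUNT];

/* Record that `dev` is mounted on inode (on_dev, on_ino). Returns 0, or -1 if full. */
int mount_add(int dev, int on_dev, int on_ino, int root_ino)
{
    for (int i = 0; i < NMOUNT; i++) {
        if (!mount_table[i].in_use) {
            mount_table[i] = (struct mount_entry){1, dev, on_dev, on_ino, root_ino};
            return 0;
        }
    }
    return -1;
}

/* During pathname resolution: if (dev, ino) is a mount point,
 * continue from the root inode of the mounted file system instead. */
void cross_mount_point(int *dev, int *ino)
{
    assert(dev != NULL && ino != NULL);
    for (int i = 0; i < NMOUNT; i++) {
        const struct mount_entry *m = &mount_table[i];
        if (m->in_use && m->mounted_on_dev == *dev && m->mounted_on_ino == *ino) {
            *dev = m->dev;
            *ino = m->root_ino;
            return;
        }
    }
}
```

An inode that is not a mount point is left unchanged, so the resolver can call `cross_mount_point` after every path component.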
Relationship between Tables
[Figure: buffer cache, inode table, and mount table; the mount table entry points to a buf holding the superblock, to the mounted-on inode (inode of /usr), and to the root inode (inode of the dsk 3 root)]
Disk File System
• Boot block
• Superblock: pointers to free space on disk
• inode list: pointers to data blocks
• data block
• mounting a file system