Today

Welcome
• File Systems, Project #3
• Reading: Chapter 4 MOS
• P1: Exam, Basics, Simple File Allocation
• Break
• P2: Indexed Allocation, Free space mgmt, Proj3

Exam
• Max 96
• Mean ~72
• Median 76
• Andy will post full answers over the weekend; next week you can find him or me with questions

Last time
• Input/Output: device/OS interaction
• Focus on disk device: latency issue, block device
• File system abstraction

CSci 5103 Operating Systems
File systems: Chap 4 basics

Goals/Characteristics of FS
• Large amount of data
• Persistence
• Concurrent sharing
• Protected
• Efficient

Issues
• How do you find information?
• How do you keep one user from accessing another’s data?
• How do you know which disk blocks are free or … taken?

File Concept
• Files are an OS abstraction for persistent data on disk
• Contiguous logical address space (think: byte offsets)
• In reality, a large file may have pieces stored on different disk sectors and cylinders (i.e., it may not be contiguous)
• What does this remind you of? Paging

File Operations
• create
• write
• read
• seek
• delete
• open
• close
• get/set attributes

Access Methods
• Sequential Access
  • read next
  • write next
• Direct/Random Access
  • read n
  • write n
  • position to n, then read next / write next
• n = relative block number or byte

Open File Table
• System-wide open file table kept in kernel; each entry stores:
  • file location on disk
  • current file pointer
  • file size
  • open count
• Each process keeps track of its own open files with pointers to the system-wide open file table entries
• Unix: int fd -> index into process file table -> ptr into system-wide open file table
• fd = open("foo.txt", O_RDONLY);
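A minimal sketch of the two tables described on this slide, as a toy simulation and not kernel code. The names `OpenFileEntry`, `Process`, and `system_table` are made up for illustration, and descriptors start at 0 here (a real process already has 0, 1, 2 in use). It shows that a descriptor is just an index, and that a descriptor inherited across fork shares the same file pointer.

```python
from dataclasses import dataclass


@dataclass
class OpenFileEntry:
    """One entry of the simulated system-wide open file table."""
    disk_location: int   # hypothetical starting block on disk
    size: int            # file size in bytes
    offset: int = 0      # current file pointer
    open_count: int = 0  # number of descriptors referring to this entry


class Process:
    """Holds a per-process table: fd (an index) -> system-wide entry."""

    def __init__(self, fd_table=None):
        self.fd_table = list(fd_table or [])

    def open(self, system_table, disk_location, size):
        entry = OpenFileEntry(disk_location, size, open_count=1)
        system_table.append(entry)
        self.fd_table.append(entry)
        return len(self.fd_table) - 1   # the fd is just an index

    def fork(self):
        # The child gets a copy of the fd table that points at the same entries.
        for entry in self.fd_table:
            entry.open_count += 1
        return Process(self.fd_table)

    def read(self, fd, nbytes):
        entry = self.fd_table[fd]
        n = min(nbytes, entry.size - entry.offset)
        entry.offset += n               # advances the shared file pointer
        return n


system_table = []
parent = Process()
fd = parent.open(system_table, disk_location=9, size=100)
child = parent.fork()
first = parent.read(fd, 30)    # parent reads 30 bytes
second = child.read(fd, 100)   # child continues where the parent stopped
```

Because both processes reach the same entry, the child's read starts at offset 30 and returns only the remaining 70 bytes.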

Unix File Descriptors Illustrated
[Figure: a process in user space holds file descriptors that index its file descriptor table; entries point into the kernel's system open file table, which refers to files, pipes, sockets, and ttys]

Directory Structure
• A collection of nodes containing information about all files
• In most file systems, directories are also files
• [Figure: a directory pointing to files F1, F2, F3, F4, …, Fn]
• Both the directory structure and the files reside on disk
• Why is the directory structure on disk? Directory operations? Issues?

Single-Level Directory
• A single directory for all users
• Problems? Naming problem, grouping problem, scale?

Two-Level Directory
• Separate directory for each user
• Path name is needed to share files
• Can have the same file name for different users
• Efficient searching
• No grouping capability: still fairly flat, no subdirs
• Scale?

Tree-Structured Directories
• Efficient searching
• Grouping capability
• Current directory (working directory)
  • convenience: absolute path names can get long!
  • relative naming
  • cd /spell/mail/prog
  • cat list
• Advantages?

Acyclic-Graph Directories • Have shared subdirectories and files.

Acyclic-Graph Directories (Cont.)
• Two different names (aliasing): problem?
• Deleting dict/count leaves a dangling pointer!
• Solution:
  • Backpointers, so we can delete all pointers
  • Or just delete the link/name (semantics of delete are changed)

Acyclic-Graph Directories (Cont.)
• If delete means delete the link, when does the file actually get deleted?
  • when there are no more links to it
  • must keep reference counts as part of file meta-data

General Graph Directory
• create and link are different
• Cycles can be a problem: why?
• How can cycles be detected during path parsing?

Types of Links
• Hard link
  - multiple (path) names refer to the same file/dir (actual storage)
  - delete/remove: uses reference counts
• Soft/symbolic link
  - the link file (target) is special: it stores the path to the source file
  - can link directories
  - can link across filesystems (won't work for hard links)
  - delete source -> dangling reference
• Unix: man ln
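The difference between the two link types can be observed with Python's `os.link` and `os.symlink`. This is a small sketch that assumes a POSIX system whose temporary directory supports hard links; the file names and the `link_demo` helper are made up for illustration.

```python
import os
import tempfile


def link_demo():
    """Create a file, a hard link and a symbolic link, then delete the source."""
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "source.txt")
        hard = os.path.join(d, "hard.txt")
        soft = os.path.join(d, "soft.txt")
        with open(src, "w") as f:
            f.write("data")

        os.link(src, hard)      # second name for the same i-node
        os.symlink(src, soft)   # separate file that stores the path to src
        links_before = os.stat(src).st_nlink

        os.remove(src)          # removes one name; data stays while links remain
        with open(hard) as f:
            hard_content = f.read()

        return {
            "links_before": links_before,
            "links_after": os.stat(hard).st_nlink,
            "hard_content": hard_content,
            # islink is True (the link file exists); exists follows the link and fails
            "soft_dangling": os.path.islink(soft) and not os.path.exists(soft),
        }


result = link_demo()
```

The reference count drops when the source name is removed, the data stays reachable through the hard link, and the symbolic link is left dangling.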

Protection
• File owner/creator should be able to control:
  • what can be done
  • by whom
• Types of access
  • Read
  • Write
  • Execute
  • Append
  • Delete
  • List

Access Lists and Groups (Unix)
• Mode of access: read, write, execute
• Three classes of users:
                         R W X
  a) owner access   7 => 1 1 1
  b) group access   6 => 1 1 0
  c) public access  1 => 0 0 1
• For directories, permissions have slightly different meaning
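As a small illustration of how the three octal digits map to R/W/X bits, here is a sketch of a decoder. The function name `mode_to_string` is made up for this example; the standard library's `stat.filemode` does a similar job for full mode words.

```python
def mode_to_string(mode):
    """Turn a 9-bit permission value (e.g. 0o761) into an 'rwxrw---x' string."""
    out = []
    for shift in (6, 3, 0):            # owner, group, public
        bits = (mode >> shift) & 0b111
        out.append("r" if bits & 0b100 else "-")
        out.append("w" if bits & 0b010 else "-")
        out.append("x" if bits & 0b001 else "-")
    return "".join(out)


slide_example = mode_to_string(0o761)   # owner 7, group 6, public 1
read_only = mode_to_string(0o444)       # read-only for all three classes
```

With the slide's values (7, 6, 1) the result is `rwxrw---x`.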

Unix Special Files
• Prompt> ls -l /etc/passwd
• -r--r--r-- 1 root sys 415 May 12 2004 /etc/passwd
  (the "1" after the mode string is the number of links)
• First character of the mode string:
  • -: regular file
  • d: directory
  • l: symbolic link
  • b/c: block/char devices
  • few others

File System Layout
[Figure: disk layout; the meta-data area holds file system info: # i-nodes, bitmaps, …]

CSci 5103 Operating Systems
File system implementation: Chap 4

File-System Structure
• File structure
  • Logical storage unit
  • Collection of related information
• File system resides on secondary storage (disks)
• File control block: storage structure consisting of information about a file (kept within directory)
• User issues a file logical address LA … turned into physical disk address
• Typical disk block is 4K; disk operations allow random seek

File-System Structure
• File size constrained by address range of machine; a 32-bit file pointer/offset can address only 4 GB
  • though on some filesystems, files can exceed this
• OS keeps track of free disk blocks (this information is also stored on the disk itself)

Contiguous Allocation
• Each file occupies a set of contiguous blocks/sectors on the disk
• Simple: only starting location (disk block #) and length (number of blocks or bytes) are required
• Accessing block b+1 after block b normally requires no disk head movement, subject to the disk scheduling algorithm (BIG win)
• How is this different from paging?
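With contiguous allocation, turning a logical byte address into a disk block is plain arithmetic on the (start, length) pair. A minimal sketch, with a hypothetical function name and a 4K block size chosen only for the example:

```python
def contiguous_lookup(start_block, length_blocks, logical_addr, block_size=4096):
    """Map a logical byte address to (physical block #, offset within block)."""
    if logical_addr < 0:
        raise ValueError("negative logical address")
    index, offset = divmod(logical_addr, block_size)
    if index >= length_blocks:
        raise ValueError("address past end of file")
    return start_block + index, offset


# File starts at block 14 and is 3 blocks long; byte 5000 is in its second block.
where = contiguous_lookup(start_block=14, length_blocks=3, logical_addr=5000)
```

No table lookup is needed, which is why random access is cheap under this scheme.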

Example

Contiguous Allocation
• Problems
  • Wasteful of space (dynamic storage-allocation problem)
  • How much space to allocate for a file? Files cannot grow easily
  • Must allocate a worst-case amount of space … can cause internal fragmentation

Linked Allocation
• Each file is a linked list of disk blocks; blocks may be scattered anywhere on the disk
• [Figure: each disk block = pointer + data]

Allocate as needed, link together; e.g., file starts at block 9
[Figure: directory table with columns File, start, length; contiguous example entry: count, 2, 2]

Linked Allocation (Cont.)
• Simple: need only starting address
• Free-space management: less waste of space
• Files can grow
• Problems:
  - Random access more expensive: to find the i-th logical file block, have to follow pointers (lots of reads/seeks)
  - Pointers take up space (relative to contiguous)
  - Reliability problems: bad block -> bad pointer -> file is hosed

Linked Allocation (Cont.)
• File Allocation Table (FAT) stored on disk (MS-DOS, OS/2)
• Example: directory entry (name, start block = 217); the FAT is indexed by physical block:
    FAT[217] = 618
    FAT[618] = 339
    FAT[339] = -1 (end-of-file)
• Grow: look for first 0 entry in FAT
• Random access improved if the FAT is cached in memory (read the FAT … stored contiguously at well-known location on disk)
• Could have two seeks per block unless cached in memory
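The chain in this slide's example (start block 217, then 618, then 339, then end-of-file) can be followed with a few lines of code. This is only a sketch: a real FAT is an array indexed by physical block number, while a dict is used here to keep the example small, and the function names are made up.

```python
EOF = -1  # end-of-file marker, as in the slide's example

# FAT indexed by physical block: each entry names the next block of the file.
fat = {217: 618, 618: 339, 339: EOF}


def fat_chain(fat, start):
    """Return every physical block of a file, in logical order."""
    blocks = []
    block = start
    while block != EOF:
        blocks.append(block)
        block = fat[block]
    return blocks


def fat_block(fat, start, i):
    """Find the i-th logical block (0-based) by following i pointers."""
    block = start
    for _ in range(i):
        block = fat[block]
        if block == EOF:
            raise IndexError("logical block past end of file")
    return block


chain = fat_chain(fat, 217)
third = fat_block(fat, 217, 2)
```

Random access still follows pointers, but the pointers all live in one table, so the walk needs no disk seeks when that table is cached in memory.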

Problems with FAT
• FAT is like a big page table
• 40 GB disk and 4K disk blocks -> 10M FAT entries -> 40 MB of space (assuming 4 bytes per entry)
• Why is this a concern?
  • not only on disk but in memory!
  • needs to be completely in memory
  • memories are getting large, but disk capacity is growing faster
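The 40 MB figure can be checked with a short calculation. The 4-byte entry size is an assumption (the slide does not state it, but it is what makes the numbers work out), and `fat_size_bytes` is a name made up for this sketch.

```python
KB, MB, GB = 2**10, 2**20, 2**30


def fat_size_bytes(disk_bytes, block_bytes, entry_bytes=4):
    """One FAT entry per disk block; entry_bytes=4 is an assumed entry size."""
    return (disk_bytes // block_bytes) * entry_bytes


entries = (40 * GB) // (4 * KB)               # number of disk blocks
table_bytes = fat_size_bytes(40 * GB, 4 * KB)  # size of the whole FAT
```

The table grows linearly with disk size, which is the concern raised on the slide.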

Break

Indexed Allocation
• Want efficient random access with pointer blocks, and to be more memory efficient
• Brings all pointers together into the index block/table
• Unlike FAT, index block is stored on a per-file basis
  • like a mini-FAT (sized to fit into a disk block)
• Logical view: [Figure: index table]

Example of Indexed Allocation
• [Figure: directory entry holds the address of the index block]
• How is random access done?
• Index blocks are usually fixed-size arrays: some wasted space

Indexed Allocation (Cont.)
• Random access and reduced external fragmentation, but have overhead of index block
• How big? => Want it on a single disk block
• Example:
  • suppose max file size is 1024 KB with a block size of 2 KB
  • how big is the index table?
  • file contains at most 1024K/2K = 512 blocks
  • assuming index table stores only 4-byte pointers, index table takes up 2 KB or 1 block
• If we insist on the index table fitting in a disk block: problem?
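The example's arithmetic, and the limit it implies, can be written out as follows. The function names are made up for this sketch; the 4-byte pointer size is taken from the slide.

```python
KB = 2**10


def index_table_bytes(max_file_bytes, block_bytes, ptr_bytes=4):
    """Space needed for one pointer per data block of the largest file."""
    return (max_file_bytes // block_bytes) * ptr_bytes


def single_index_block_limit(block_bytes, ptr_bytes=4):
    """Largest file if the whole index must fit in a single disk block."""
    return (block_bytes // ptr_bytes) * block_bytes


table = index_table_bytes(1024 * KB, 2 * KB)   # the slide's example
limit = single_index_block_limit(2 * KB)       # cap imposed by one index block
```

With 2 KB blocks the index fits exactly in one block, and that same constraint caps the file size at 1024 KB, which is the problem the slide's last question points at.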

Indexed Allocation – Mapping (Cont.)
• Most files are small, hence a small index block should be ok, but how to store large files?
• Chain index blocks together
  • Linked scheme: link blocks of index table (no limit on size)
  • Last entry of index block is pointer to next index block; nil pointer marks the end
• As with linked allocation, may have to traverse a few pointers for random access (but fewer)

Indexed Allocation – Mapping (Cont.)
• Two-level indexing [Figure: outer-index -> index table -> file]
• Faster access on average: given logical block #, jump right to index block; still want each table to fit on a disk block
• How big a file can I store with 4K blocks?
• With 4K blocks, an index block stores 1024 4-byte pointers; two levels give 1024 x 1024 = 1M data block pointers => file of 4 GB
• How can we make files bigger?
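A sketch of the two-level mapping under the slide's numbers (4K blocks, 4-byte pointers). The function names are hypothetical; the point is that a logical block number splits into an outer-index slot and a slot within the chosen index block.

```python
def two_level_lookup(logical_block, block_size=4096, ptr_size=4):
    """Split a logical block # into (outer-index slot, index-block slot)."""
    per_block = block_size // ptr_size          # 1024 pointers per index block
    if not 0 <= logical_block < per_block * per_block:
        raise ValueError("logical block out of range for two levels")
    return divmod(logical_block, per_block)


def two_level_max_file(block_size=4096, ptr_size=4):
    """Largest file addressable with an outer index plus one level of index blocks."""
    per_block = block_size // ptr_size
    return per_block * per_block * block_size


slots = two_level_lookup(1025)       # second index block, second entry
max_bytes = two_level_max_file()
```

Two reads (outer index, then index block) locate any data block, regardless of where it sits in the file.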

Combined Scheme: UNIX inode (4K bytes per block)
• Directory entry stores an inode# that “points” to this structure
• Pointers to first n blocks stored directly: faster
• After that, have to go via indirection
• Allows much larger files to be stored
• Motivation for this?

Disk Free-Space Management
• Bit vector (n blocks), bits numbered 0, 1, 2, …, n-1
    bit[i] = 0 => block[i] free
    bit[i] = 1 => block[i] occupied
• Bit map requires extra space. Example:
    block size = 2^12 bytes
    disk size = 2^30 bytes (1 gigabyte)
    n = 2^30 / 2^12 = 2^18 bits (or 32K bytes)
• Easy to find first free block or get contiguous blocks
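A minimal bit-vector sketch using the slide's convention (0 = free, 1 = occupied) and its example sizes. The class and method names are made up for illustration, and the linear scan is the simplest possible search, not an optimized one.

```python
class BitMap:
    """Free-space bit vector: bit i is 0 if block i is free, 1 if occupied."""

    def __init__(self, n_blocks):
        self.n = n_blocks
        self.bits = bytearray((n_blocks + 7) // 8)   # all blocks start free

    def is_free(self, i):
        return (self.bits[i // 8] >> (i % 8)) & 1 == 0

    def mark_used(self, i):
        self.bits[i // 8] |= 1 << (i % 8)

    def mark_free(self, i):
        self.bits[i // 8] &= ~(1 << (i % 8)) & 0xFF

    def first_free(self):
        """Linear scan for the first free block; None if the disk is full."""
        for i in range(self.n):
            if self.is_free(i):
                return i
        return None


n = 2**30 // 2**12          # 1 GB disk, 4 KB blocks -> 2^18 blocks
bitmap = BitMap(n)
bitmap.mark_used(0)
bitmap.mark_used(1)
next_free = bitmap.first_free()
```

The map itself occupies 2^18 / 8 = 32K bytes, matching the figure on the slide.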

Consistency
• Bit map will be cached
• Three updates when allocating block i:
  • allocate[i]: i-node updated to include new block i
  • mem[i] <= 1
  • disk[i] <= 1
• What order (hint: think crash)?
  1. disk[i]
  2. allocate[i]
  3. mem[i]

Directory Implementation
• Linear list of file names with pointers to file control blocks/inodes
  • simple to program
  • time-consuming to find a file!
  • sort them or …
• Hash table: linear list with hash data structure
  • decreases directory search time
  • collisions: situations where two file names hash to the same location
• Max file name: length is stored for compactness
  • Linux: 255 per name, 4096 for the total path
• [Figure: directory entries “foo”, “bar”]

File lookup should be in log block 0 typically Example of inode fetch for /usr/ast/mbox… through directory

Optimization
• How can we improve lookups?
• Cache directory i-nodes

Remove a file
• Steps:
  1. Remove the file from its directory.
  2. Release the i-node to the pool of free i-nodes.
  3. Return all the disk blocks to the pool of free disk blocks.

Representing Large Files in Unix
• [Figure: inode with direct block map (12 entries), indirect block, double indirect block]
• Inodes are 128 bytes, packed into blocks
• Each inode has 68 bytes of attributes and 15 block map entries
• Suppose block size = 8KB:
  • 12 direct block map entries in the inode can map 96KB of data
  • One indirect block (referenced by the inode) can map 16MB of data
  • One double indirect block pointer in inode maps 2K indirect blocks
  • maximum file size is 96KB + 16MB + 2K*16MB + 2K*…
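The first three terms of the maximum-file-size sum can be reproduced with a short calculation. The 4-byte pointer size is an assumption, chosen because it yields the slide's 2K pointers per 8KB block; the function name is made up, and the sketch stops at the double indirect level because the slide elides the rest.

```python
KB, MB, GB = 2**10, 2**20, 2**30


def inode_capacity(block_size=8 * KB, ptr_size=4, n_direct=12):
    """Bytes mapped by the direct, single indirect and double indirect entries."""
    ptrs_per_block = block_size // ptr_size     # 2K pointers per 8KB block
    direct = n_direct * block_size              # 12 direct entries
    single = ptrs_per_block * block_size        # one indirect block
    double = ptrs_per_block * single            # 2K indirect blocks
    return direct, single, double


direct, single, double = inode_capacity()
total_so_far = direct + single + double
```

Small files need only the direct entries, so most accesses avoid any indirection, while the indirect levels let the same 128-byte inode describe very large files.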
